# Scraping basics for Selenium

If you feel comfortable with scraping, you're free to skip this notebook.

## Part 0: Imports

Import what you need to use Selenium, and start up a new Chrome to use for scraping. You might want to copy from the [Selenium snippets](http://jonathansoma.com/lede/foundations-2018/classes/selenium/selenium-snippets/) page.

**You only need to run `driver = webdriver.Chrome(...)` once.** Every time you run it, you'll open a new Chrome instance, so you'll only need to run it again if you close the window (or want another Chrome, for some reason).

In [36]:
import pandas as pd

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select

from webdriver_manager.chrome import ChromeDriverManager

In [37]:
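from selenium.webdriver.chrome.service import Service  # Selenium 4 takes the driver path via a Service
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))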



Could not get version for google-chrome with the any command: /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --version
Current google-chrome version is UNKNOWN
Get LATEST chromedriver version for UNKNOWN google-chrome
Driver [/Users/prinzmagtulis/.wdm/drivers/chromedriver/mac64/96.0.4664.45/chromedriver] found in cache


## Part 1: Scraping by class

Scrape the content at http://jonathansoma.com/lede/static/by-class.html, printing out the title, subhead, and byline.

In [38]:
driver.get("https://jonathansoma.com/lede/static/by-class.html")

In [39]:
title = driver.find_element(By.CLASS_NAME, "title")
print(title.text)

How to Scrape Things


In [40]:
subhead = driver.find_element(By.CLASS_NAME, "subhead")
print(subhead.text)

Some Supplemental Materials


In [41]:
byline = driver.find_element(By.CLASS_NAME, "byline")
print(byline.text)

By Jonathan Soma


## Part 2: Scraping using tags

Scrape the content at http://jonathansoma.com/lede/static/by-tag.html, printing out the title, subhead, and byline.

In [42]:
driver.get("http://jonathansoma.com/lede/static/by-tag.html")

In [43]:
title = driver.find_element(By.TAG_NAME, "h1")
print(title.text)

How to Scrape Things


In [44]:
subhead = driver.find_element(By.TAG_NAME, "h3")
print(subhead.text)

Some Supplemental Materials


In [45]:
byline = driver.find_element(By.TAG_NAME, "p")
print(byline.text)

By Jonathan Soma


## Part 3: Scraping using a single tag

Scrape the content at http://jonathansoma.com/lede/static/by-list.html, printing out the title, subhead, and byline.

> **This will be important for the next few:** if you scrape multiples, you get a list. Even though it's Selenium, you can use things like `[0]`, `[1]`, `[-1]`, etc., just like you would for a normal list.

In [46]:
driver.get("http://jonathansoma.com/lede/static/by-list.html")

In [49]:
tags = driver.find_elements(By.TAG_NAME, "p")
tags[0].text

'How to Scrape Things'

In [50]:
tags[1].text

'Some Supplemental Materials'

In [51]:
tags[2].text

'By Jonathan Soma'
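
To illustrate the note about list indexing without needing a browser, here is a small sketch on a plain Python list. The strings are copied from the outputs above and stand in for the `.text` of each element that `find_elements` returns; the variable names are made up for the example.

```python
# Plain strings standing in for the .text of the three <p> elements.
texts = [
    "How to Scrape Things",
    "Some Supplemental Materials",
    "By Jonathan Soma",
]

first = texts[0]   # first item
second = texts[1]  # second item
last = texts[-1]   # negative indexes count from the end

print(first)
print(second)
print(last)
```

With three items in the list, `[-1]` and `[2]` point at the same element, which is handy when you want the last item and don't know how long the list is.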

## Part 4: Scraping a single table row

Scrape the content at http://jonathansoma.com/lede/static/single-table-row.html, printing out the title, subhead, and byline.

In [53]:
driver.get("http://jonathansoma.com/lede/static/single-table-row.html")

In [54]:
table = driver.find_elements(By.TAG_NAME, "td")

In [56]:
table[0].text

'How to Scrape Things'

In [57]:
table[1].text

'Some Supplemental Materials'

In [58]:
table[2].text

'By Jonathan Soma'

## Part 5: Saving into a dictionary

Scrape the content at http://jonathansoma.com/lede/static/single-table-row.html, saving the title, subhead, and byline into a single dictionary called `book`.

> Don't use pandas for this one!

In [60]:
driver.get("https://jonathansoma.com/lede/static/single-table-row.html")

In [64]:
tag = driver.find_elements(By.TAG_NAME, "td")

In [65]:
book = {}
book['title'] = tag[0].text
book['subhead'] = tag[1].text
book['byline'] = tag[2].text
print(book)

{'title': 'How to Scrape Things', 'subhead': 'Some Supplemental Materials', 'byline': 'By Jonathan Soma'}
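
A more compact way to build the same dictionary is to pair a list of keys with the cell texts using `zip`. This is only a sketch: plain strings stand in for the scraped cells, and `cell_texts` and `keys` are names made up for the example. With Selenium, `cell_texts` could be built as `[cell.text for cell in tag]`.

```python
# Plain strings standing in for the .text of the three <td> elements.
cell_texts = [
    "How to Scrape Things",
    "Some Supplemental Materials",
    "By Jonathan Soma",
]
keys = ["title", "subhead", "byline"]

# zip pairs each key with the text in the same position.
book = dict(zip(keys, cell_texts))
print(book)
```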


## Part 6: Scraping multiple table rows

Scrape the content at http://jonathansoma.com/lede/static/multiple-table-rows.html, printing out each title, subhead, and byline.

> You won't use pandas for this one, either!

In [66]:
driver.get("http://jonathansoma.com/lede/static/multiple-table-rows.html")

In [70]:
rows = driver.find_elements(By.TAG_NAME, "tr")

for row in rows:
    cells = row.find_elements(By.TAG_NAME, "td")
    print("Title:", cells[0].text)
    print("Subhead:", cells[1].text)
    print("Byline:", cells[2].text)
    print("-----------")

Title: How to Scrape Things
Subhead: Some Supplemental Materials
Byline: By Jonathan Soma
-----------
Title: How to Scrape Many Things
Subhead: But, Is It Even Possible?
Byline: By Sonathan Joma
-----------
Title: The End of Scraping
Subhead: Let's All Use CSV Files
Byline: By Amos Nathanos
-----------


## Part 7: Scraping an actual table

Scrape the content at http://jonathansoma.com/lede/static/the-actual-table.html, creating a list of dictionaries.

> Don't use pandas here, either!

In [None]:
# Didn't answer No. 7 since you already uploaded the video for this.

In [None]:
# Answered my way until 6, before your video.
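
For reference, here is one possible sketch of the list-of-dictionaries step. It is not the solution from the video. It assumes each data row on the page has exactly three `<td>` cells (title, subhead, byline) and that any header row is made of `<th>` cells; the page's real structure was not checked here. The helper `rows_to_books` is a made-up name, and the sample rows are reused from the Part 6 output so the code runs without a browser.

```python
def rows_to_books(rows_of_text):
    """Build a list of dictionaries from rows of cell text.

    rows_of_text holds one list of strings per table row.
    """
    books = []
    for texts in rows_of_text:
        # Assumption: data rows have exactly three <td> cells. A header row
        # made of <th> cells gives an empty list and is skipped.
        if len(texts) != 3:
            continue
        books.append({"title": texts[0], "subhead": texts[1], "byline": texts[2]})
    return books


# With the driver from Part 0 (not run here), the input could be collected as:
#   driver.get("http://jonathansoma.com/lede/static/the-actual-table.html")
#   rows = driver.find_elements(By.TAG_NAME, "tr")
#   rows_of_text = [[td.text for td in row.find_elements(By.TAG_NAME, "td")]
#                   for row in rows]

# Sample input reused from the Part 6 output, plus an empty header-like row.
sample = [
    [],
    ["How to Scrape Things", "Some Supplemental Materials", "By Jonathan Soma"],
    ["How to Scrape Many Things", "But, Is It Even Possible?", "By Sonathan Joma"],
]
books = rows_to_books(sample)
print(books)
```

Keeping the dictionary-building step separate from the Selenium calls makes it easy to try out on plain lists before pointing it at a live page.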

## Part 8: Scraping multiple table rows into a list of dictionaries

Scrape the content at http://jonathansoma.com/lede/static/the-actual-table.html, creating a pandas DataFrame.

> There are two ways to do this one! One uses just pandas, the other one uses the result from Part 7.
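
The two ways might look roughly like the sketch below. The `books` list is a stand-in for a Part 7 result, with rows reused from the Part 6 output rather than scraped from the actual page. The pandas-only route is shown as comments because it needs a live `driver` and an HTML parser such as lxml installed.

```python
import pandas as pd

# Stand-in for the Part 7 result; rows reused from the Part 6 output.
books = [
    {"title": "How to Scrape Things", "subhead": "Some Supplemental Materials", "byline": "By Jonathan Soma"},
    {"title": "How to Scrape Many Things", "subhead": "But, Is It Even Possible?", "byline": "By Sonathan Joma"},
    {"title": "The End of Scraping", "subhead": "Let's All Use CSV Files", "byline": "By Amos Nathanos"},
]

# Way 1: a list of dictionaries becomes one row per dictionary,
# with the dictionary keys as column names.
df = pd.DataFrame(books)
print(df)

# Way 2 (pandas only, not run here): read_html returns a list with one
# DataFrame per <table> it finds in the HTML.
#   from io import StringIO
#   tables = pd.read_html(StringIO(driver.page_source))
#   df = tables[0]
```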

## Part 9: Scraping into a file

Scrape the content at http://jonathansoma.com/lede/static/the-actual-table.html and save it as `output.csv`
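
One way to write the file is sketched below using the standard library's `csv` module rather than pandas, so it runs without a browser or extra packages. If you already have a DataFrame, `df.to_csv("output.csv", index=False)` should produce an equivalent file. The `books` list is again a stand-in reused from the Part 6 output, not data scraped from the actual page.

```python
import csv

# Stand-in for the scraped rows; reused from the Part 6 output.
books = [
    {"title": "How to Scrape Things", "subhead": "Some Supplemental Materials", "byline": "By Jonathan Soma"},
    {"title": "How to Scrape Many Things", "subhead": "But, Is It Even Possible?", "byline": "By Sonathan Joma"},
    {"title": "The End of Scraping", "subhead": "Let's All Use CSV Files", "byline": "By Amos Nathanos"},
]

# newline="" lets the csv module control line endings itself.
with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "subhead", "byline"])
    writer.writeheader()      # first line: title,subhead,byline
    writer.writerows(books)   # one line per dictionary
```

Values that contain commas, such as "But, Is It Even Possible?", are quoted automatically, so the file reads back into the same three columns.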