# Intro to Python 3: Bonus
### Scrape from multiple pages

Below are the relevant portions of our last script, with some added comments. We're going to add a step to the scraping of each table row: follow the link to that reactor's individual page and collect additional data for the CSV.

The difficulty is that the data we need is buried in a paragraph of text. We'll break the paragraph apart into a list of its component lines and then isolate the data from those. We'll also do it with a function so that we don't have to write the same code repeatedly.
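To make the paragraph-splitting idea concrete, here is a minimal sketch. The sample text is invented for illustration and only assumes that each fact sits on its own line in a `Label: value` form; the real reactor pages may be laid out differently.

```python
# Hypothetical text, standing in for what a table cell's .text might contain.
paragraph = """
Location: Hypothetical Town, ST
Licensed MWt: 3,000
Reactor Vendor/Type: Example Vendor
Containment Type: Dry, Ambient Pressure
"""

# Split on line breaks, trim stray whitespace and drop the empty lines.
lines = [line.strip() for line in paragraph.split('\n') if line.strip()]

for line in lines:
    print(line)
```

Each item in `lines` is now a single `Label: value` string, which is much easier to search than one long block of text.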

In [1]:
import requests
from bs4 import BeautifulSoup
import unicodecsv as csv
import time

The function we're going to write and what it does:
```python
def finder(list_to_search, term): # take two arguments: a list and a search string
    for item in list_to_search: # loop through each item in the list
        if term.upper() in item.upper(): # if the search string appears in the list item ...
            return item.split(':')[1].strip() # ... then return the part that appears AFTER the colon
```
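A quick way to check the function is to run it against a small hand-made list. The list below is made up for illustration; only `finder` itself comes from the lesson.

```python
def finder(list_to_search, term):
    for item in list_to_search:
        if term.upper() in item.upper():
            return item.split(':')[1].strip()


# Hypothetical lines, not taken from a real reactor page.
sample_lines = [
    'Location: Hypothetical Town, ST',
    'Licensed MWt: 3,000',
    'Containment Type: Dry, Ambient Pressure',
]

megawatts = finder(sample_lines, 'licensed mwt')   # case does not matter
containment = finder(sample_lines, 'Containment')  # a partial label is enough
missing = finder(sample_lines, 'Vendor')           # no match, so the result is None

print(megawatts)
print(containment)
print(missing)
```

Two things to keep in mind: the search term is matched anywhere in the line, so pick one that appears only in the line you want, and because the function takes the piece at index 1 after splitting on every colon, a value that itself contains a colon would be cut short. Using `item.split(':', 1)` is one way around that.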

In [2]:
# text-finding function goes here


In [3]:
url = "http://www.nrc.gov/reactors/operating/list-power-reactor-units.html"
main_page = requests.get(url)
soup = BeautifulSoup(main_page.content, 'html.parser')

reactors_table = soup.find('table')

In [4]:
scraped = []

for row in reactors_table.find_all('tr')[1:]:
    cells = row.find_all('td')
    reactor_name = cells[0].contents[0].text
    link = 'http://www.nrc.gov' + cells[0].contents[0].get('href')
    docket = cells[0].contents[2]
    license = cells[1].text
    reactor_type = cells[2].text
    location = cells[3].text
    owner = cells[4].text
    region = cells[5].text
    # add steps to the loop here
    # get the individual reactor page with requests
    
    # run the response through BeautifulSoup so that it can be navigated
    
    # isolate the table cell with the text we want to pick over and then split it up on line breaks
    
    # use the new function to grab the megawattage, vendor and containment type
    
    # print an informational status message to yourself
    
    # add these to the list that will ultimately be written to CSV
    
    scraped.append([reactor_name, link, docket, license, reactor_type, location, owner, region])
    # IMPORTANT: pause for a couple of seconds between page requests
    

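One way the commented steps in the loop might be filled in is sketched below. The choice of table cell (`find_all('td')[1]`), the search terms, the helper names `parse_details` and `scrape_details`, and the two-second delay are all assumptions for illustration; inspect a real reactor page before relying on any of them.

```python
import time

import requests
from bs4 import BeautifulSoup


def finder(list_to_search, term):
    for item in list_to_search:
        if term.upper() in item.upper():
            return item.split(':')[1].strip()


def parse_details(html):
    """Pull three values out of one reactor page's HTML."""
    page_soup = BeautifulSoup(html, 'html.parser')
    # ASSUMPTION: the text we want sits in the second cell of the first table.
    cell = page_soup.find('table').find_all('td')[1]
    lines = cell.text.split('\n')
    # ASSUMPTION: these search terms match the labels used on the page.
    megawatts = finder(lines, 'MWt')
    vendor = finder(lines, 'Vendor')
    containment = finder(lines, 'Containment')
    return megawatts, vendor, containment


def scrape_details(link, pause=2):
    """Fetch one reactor page, parse it, then wait before returning."""
    response = requests.get(link)
    details = parse_details(response.content)
    print('Scraped', link)  # informational status message
    time.sleep(pause)       # be polite: space out the page requests
    return details
```

Inside the loop, this could be used as `megawatts, vendor, containment = scrape_details(link)`, with the three new values added to the end of the list passed to `scraped.append()`.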
In [5]:
with open('reactor_data.csv', 'wb') as outfile:
    writer = csv.writer(outfile)
    # add the new columns to the header row
    writer.writerow(['reactor_name', 'link', 'docket', 'license', 'reactor_type', 'location', 'owner', 'region'])
    writer.writerows(scraped)
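When new values are added to each row, the header row needs matching columns in the same order. The sketch below shows the idea with one made-up row. It swaps in the standard library's `csv` module and an in-memory `io.StringIO` buffer so that it runs without writing a file, and the three new column names are hypothetical.

```python
import csv
import io

# The original eight columns plus three hypothetical new ones.
header = ['reactor_name', 'link', 'docket', 'license', 'reactor_type',
          'location', 'owner', 'region',
          'megawatts', 'vendor', 'containment']

# One invented row; every value here is a placeholder.
scraped = [
    ['Example Unit 1', 'http://example.com/unit1', '05000000', 'DPR-0',
     'PWR', 'Hypothetical Town, ST', 'Example Power Co.', '4',
     '3,000', 'Example Vendor', 'Dry, Ambient Pressure'],
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(scraped)

csv_text = buffer.getvalue()
print(csv_text)
```

If the header and the rows have different lengths, the CSV will still be written, but the columns will be misaligned when the file is opened, so it is worth checking that the two match.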