## Intro to Python 3
### Scraping data

While code-free tools are handy in a pinch, scripts written in Python or another language are more flexible and adaptable. They can also run automatically in the background on a schedule. And you don't have to worry about a service or tool disappearing and taking all your hard work with it.

Here are the steps we'll take together in Python to scrape a table from a web page:

1. Fetch a web page
2. Make the HTML into something Python can navigate
3. Isolate a table
4. Loop through each row (and cell), extracting the text
5. Write all the data to a CSV

Instead of just powering through and hoping for the best, we'll mess around a bit as we go so you can see what each step is doing.

In [None]:
# import modules to facilitate the scrape


`requests` is great at playing web browser. For more information, check out the [full documentation](http://docs.python-requests.org/en/master/).

```python
requests.get('some URL')
# navigates to a site and sends you the response

response.content
# a way requests serves up the page's source code (HTML)
```
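
As a rough sketch of how those two pieces fit together, the function below wraps `requests.get` and hands back `response.content`. The function name `fetch_page` and the 30-second timeout are illustrative choices, not something this lesson prescribes.

```python
import requests


def fetch_page(url, timeout=30):
    """Fetch a URL and return the raw bytes of the response body."""
    # timeout is an assumed value; without one, requests will wait indefinitely
    response = requests.get(url, timeout=timeout)
    # raise an exception on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.content


# Example usage (needs network access):
# html = fetch_page('http://www.nrc.gov/reactors/operating/list-power-reactor-units.html')
# print(html[:200])
```

Checking the status before touching the content is optional, but it makes a failed request obvious right away rather than several steps later.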

We are going to be getting data on nuclear reactors operating in the U.S.: http://www.nrc.gov/reactors/operating/list-power-reactor-units.html

In [None]:
# fetch the contents of the web page with requests


In [None]:
# let BeautifulSoup parse the content of that page


Two key ways to isolate specific sections of the web page in question with `BeautifulSoup`:
```python
soup.find('some HTML tag')
# returns the first tag that matches

soup.find_all('some HTML tag')
# returns a list of all tags that match
```

(`BeautifulSoup` also has [detailed documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) for the various ways in which it can parse HTML and XML.)
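
To see the difference between the two, here is a small sketch that runs both against a made-up snippet of HTML (a stand-in, not the actual NRC page). It assumes `BeautifulSoup` is installed as the `bs4` package and uses Python's built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# a tiny stand-in page (made up for illustration, not the NRC markup)
html = """
<html><body>
  <table>
    <tr><th>Plant</th><th>Type</th></tr>
    <tr><td>Arkansas Nuclear 1</td><td>PWR</td></tr>
  </table>
</body></html>
"""

# 'html.parser' ships with Python, so there is nothing extra to install
soup = BeautifulSoup(html, 'html.parser')

first_row = soup.find('tr')       # a single tag: the header row
all_rows = soup.find_all('tr')    # a list of both rows
missing = soup.find('p')          # None, because the page has no <p> tag

print(first_row)
print(len(all_rows))
print(missing)
```

Note that `.find` gives back `None` when nothing matches, so it's worth printing the result before building on it.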

In [None]:
# snip out the table and pass it to a new variable


In [None]:
# print reactor_table to verify we have the right thing


In [None]:
# use .find_all to create a list of rows in the table


In [None]:
# isolate the second row and print it


One of our table's rows, with a little shading and indentation:

```html
<tr valign="top">
    <td scope="row"><a href="/info-finder/reactors/ano1.html">Arkansas Nuclear 1</a><br/>05000313</td>
    <td align="center">DPR-51</td>
    <td>PWR</td>
    <td>6 miles WNW of Russellville,  AR</td>
    <td>Entergy Nuclear Operations, Inc.</td>
    <td align="middle">4</td>
</tr>
```

In [None]:
# use .find_all again to generate a list of the row's cells and return it


BeautifulSoup has a few other methods that are helpful for extracting the information _inside_ of tags:
```python
soup.contents
# breaks up everything in a tag into a fresh list (useful when you have more than text in a cell)

soup.text
# returns the text in a tag as a string

soup.get('some attribute')
# returns the attribute (useful for getting URLs, for example)
```
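
Here is one way those three could be used on the first cell of the sample row shown earlier. This is a sketch under the assumption that the cell always holds a link, a line break and a docket number, in that order; the variable names are our own.

```python
from bs4 import BeautifulSoup

# the first cell from the sample row shown earlier
cell_html = ('<td scope="row"><a href="/info-finder/reactors/ano1.html">'
             'Arkansas Nuclear 1</a><br/>05000313</td>')
cell = BeautifulSoup(cell_html, 'html.parser').find('td')

parts = cell.contents            # [<a> tag, <br/> tag, docket number string]
link = parts[0]

name = link.text                 # the text inside the <a> tag
url = link.get('href')           # the value of the href attribute
docket = parts[2].strip()        # the bare string after the <br/>

print(name, url, docket)
```

Calling `.text` on the whole cell would run the name and docket number together, which is why splitting the cell with `.contents` first is handy here.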

In [None]:
# let's break apart the contents of the first column: the name, the link and the docket number


OK, now for the tricky part. We need to loop through each row in the table and extract the contents of each cell. We'll set up an empty list beforehand and append each row of extracted data to it as a list.
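
The general pattern can be sketched on a toy table first. The header names and the second plant below are placeholders invented for the example; only the Arkansas Nuclear 1 values come from the sample row shown earlier.

```python
from bs4 import BeautifulSoup

# toy table: the header names and the second plant are placeholders
html = """
<table>
  <tr><th>Plant</th><th>License</th><th>Type</th></tr>
  <tr><td>Arkansas Nuclear 1</td><td>DPR-51</td><td>PWR</td></tr>
  <tr><td>Example Plant 2</td><td>XX-00</td><td>BWR</td></tr>
</table>
"""
table = BeautifulSoup(html, 'html.parser').find('table')

data = []  # will hold one list per row

# [1:] slices off the first <tr>, which is the header
for row in table.find_all('tr')[1:]:
    cells = row.find_all('td')
    # .text pulls the string out of each cell; .strip() trims stray whitespace
    data.append([cell.text.strip() for cell in cells])

print(data)
```

The real table needs a bit more care in the first column, where the name, link and docket number share a single cell.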

In [None]:
# make an empty list to hold the data


# a for loop is going to take us through every row in the table EXCEPT the header
# combining two steps: the list it pulls from will be created by a .find_all for 'tr' tags

    
    # .find_all 'td' tags in the row and put them into a variable

    
    # extract the cell contents

    
    # append the collected data to the empty list


It's been great, of course, but now we need to get all the data out of the script and into a usable format. We're using `unicodecsv`, which glosses over Python 2's shortcomings in dealing with the Unicode characters that exist in this table, letting us write them with ease. (Python 3's built-in `csv` module handles Unicode text on its own.)

```python
open('some file', 'read/write/append?')
# open a file and tell Python how to treat it

csv.writer('some file we opened')
# make a writer object that can move information from your script to a file in CSV form

writer_obj.writerow('some list of strings')
# write a single row

writer_obj.writerows('some list of lists of strings')
# write a bunch of rows
```

`unicodecsv` works a lot like the standard `csv` Python module, so check out [the documentation](https://docs.python.org/2/library/csv.html) for more examples of how it all works. 
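
As a minimal sketch of the open-then-write pattern, the example below uses Python 3's standard `csv` module rather than `unicodecsv`, since the two expose the same `writer` interface. The helper name `write_csv`, the header names and the temporary output path are all assumptions made for the example.

```python
import csv
import os
import tempfile


def write_csv(path, header, rows):
    """Write a header row plus a list of data rows to a CSV file."""
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='', encoding='utf-8') as outfile:
        writer = csv.writer(outfile)
        writer.writerow(header)   # one row
        writer.writerows(rows)    # many rows at once


# placeholder header names and a throwaway output path
header = ['plant', 'license', 'type']
rows = [['Arkansas Nuclear 1', 'DPR-51', 'PWR']]
out_path = os.path.join(tempfile.mkdtemp(), 'reactors.csv')
write_csv(out_path, header, rows)

# read the file back to confirm what was written
with open(out_path, newline='', encoding='utf-8') as infile:
    print(list(csv.reader(infile)))
```

Opening the file in a `with` block means Python closes it automatically once the rows are written, even if something goes wrong partway through.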

In [None]:
# open a file and write our data to it
