### We're going to cover the basics of interacting with a webpage


#### Table Parser Code

Run the table parser code before continuing with the lesson.


In [None]:
#import the HTML parser, which builds a tree of tags and their contents
import lxml.html as ET

#Let's make a function that reads tables and gets the useful information
#source_code is the source code for the page
#table_number is which table we should parse if there are multiple tables on the page.
#The default value for table_number is 0, meaning retrieve the first table
def table_reader(source_code,table_number=0):
    
    #send the page html to the html parser
    doc = ET.fromstring(source_code)
    
    #make an empty list to save our table into
    data=[]
    
    #find the requested <table> and collect its row elements, which are the <tr> tags
    #table_number indicates which table on the page to retrieve in case there are many
    rows = doc.xpath("//table")[table_number].findall("tr")
    
    #go through the list of table rows    
    for row in rows:
        #append to our data all of the data in the cells of the row
        data.append([c.text_content() for c in row.getchildren()])
    
    #return the data list
    return data
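
To check the parser without touching a live site, here is a minimal sketch that runs the same lxml calls on a small made-up page. The HTML, `sample_html`, and `sample_data` are invented for illustration, and the cells are read by iterating over each row rather than calling `getchildren()`.

```python
import lxml.html as ET

# A made-up page with one small table (not taken from any real site)
sample_html = """
<html><body>
<table>
<tr><th>Ward</th><th>Votes</th></tr>
<tr><td>1</td><td>100</td></tr>
<tr><td>2</td><td>250</td></tr>
</table>
</body></html>
"""

doc = ET.fromstring(sample_html)

# Same steps as table_reader: pick the first table, then its <tr> rows
rows = doc.xpath("//table")[0].findall("tr")

# Iterating over a row yields its cells (<th> or <td>)
sample_data = [[cell.text_content() for cell in row] for row in rows]

print(sample_data)
```

The result is a list of lists with one inner list per table row, and every cell is a string.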

#### Interacting with Websites Automatically

Sometimes, the data we need is on a page that doesn't have its own URL.

Rather, we first have to interact with a form, select the necessary information, and submit it by pressing a button.

Look at this website, for example:
http://www.chicagoelections.com/en/election3.asp

Try to access the Mayoral Results for the 2015 Municipal Elections using your browser.

After you access the results, try to get back to that results page directly without going through the form process.

Is it even possible?
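
One likely reason, though the lesson does not confirm it for this site, is that the form sends its selection with an HTTP POST request. The sketch below uses only the standard library and makes no network call; it assumes the field name "D3" from later in the lesson, and the value "Mayor" is used purely for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

base_url = "http://www.chicagoelections.com/en/election3.asp"

# Illustrative field and value only; assumed, not verified against the site
fields = {"D3": "Mayor"}

# GET: the selection is part of the URL, so the results page could be bookmarked
get_url = base_url + "?" + urlencode(fields)

# POST: the selection travels in the request body, so the URL stays the same
post_request = Request(base_url, data=urlencode(fields).encode("ascii"))

print(get_url)
print(post_request.get_method(), post_request.get_full_url())
```

If the site works this way, the results URL by itself carries no trace of what was selected, which is why we need a tool that can fill in and submit the form.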

#### Automating the web browsing process with Mechanize

We can use the mechanize library to emulate a browser, and interact with websites in an automated way.

Make sure you have mechanize installed. If you do not, open a command-line terminal and enter: pip install mechanize.

#### Creating the browser in Python

We'll first import the mechanize library.

Then, we'll create an object called br that is our browser.

This object can interact with webpages and perform the commands we give it.

In [None]:
import mechanize

#start the mechanize browser
br = mechanize.Browser()

#### How to make the automated browser look more convincing to websites

We have to tell the br object to ignore the "robots.txt" text file that some websites publish.

The lines below bypass the robots.txt restriction and let our browser present itself as a real browser.

In [None]:
#some websites have a txt file that says not to use a bot
#the next line tells mechanize to ignore it
br.set_handle_robots(False)
#don't treat <meta http-equiv> tags in the page as HTTP headers
br.set_handle_equiv(False)

#tell the mechanize browser to pretend it's a real Firefox Browser by setting the header information that is passed when a browser makes a request
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
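
To see what this header replaces, here is a small sketch using the standard library's `urllib.request` opener, which exposes an `addheaders` list of the same shape. This is a stand-in for illustration, not mechanize itself, and no request is sent.

```python
import urllib.request

opener = urllib.request.build_opener()

# By default the client announces itself as a Python script
default_headers = list(opener.addheaders)
print(default_headers)

# Replace the default with the same browser-like User-agent used in the lesson
opener.addheaders = [(
    "User-agent",
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) "
    "Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1",
)]
print(opener.addheaders)
```

A site that filters on the User-agent header would see the first value as a script and the second as a Firefox browser.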

#### Accessing a website through an automated browser

Now, we are ready to access websites.

We will tell our browser to open the website.

We can have it print out the URL it is on to verify.

In [None]:
#Have the mechanize browser open the webpage
br.open("http://www.chicagoelections.com/en/election3.asp")

#Print out the url
br.geturl()

#### Interacting with forms and buttons on websites

Because we are on the website, we can access the form.

In mechanize, we can tell our browser to select a form by its name

To know the name to pass on, we have to go to the website ourselves and see the name of the form in the page source.

The name of the form is "form1"

Once the form is selected, it is available as br.form, which shows us the elements we can interact with

In [None]:
#There is a form whose name is "form1"
#Tell the browser to select it
br.select_form(name="form1")

#### Selecting an item from a drop-down menu

Let's interact with the drop-down menu by selecting one of its elements and setting its value

In [None]:
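#There is a dropdown menu called "D3". It gives a list of election results to choose from.
#Set the value to the 2015 Municipal General election
#The value must match the option exactly, including its trailing spaces
br.form['D3'] = ["2015 Municipal General - 2/24/15                  "]
#print the selected form to see its controls and the current selection
print(br.form)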
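
Finding the form name and the exact option values by reading the page source can be tedious. As a rough sketch, the lxml parser from earlier can list them. The markup below is hypothetical, loosely modeled on the lesson's form and not copied from the site; `form_names` and `option_values` are illustrative names.

```python
import lxml.html as ET

# Hypothetical form markup for illustration only
sample_form = """
<html><body>
<form name="form1" method="post">
<select name="D3">
<option value="2015 Municipal General - 2/24/15   ">2015 Municipal General</option>
<option value="2012 General Election - 11/6/12   ">2012 General Election</option>
</select>
</form>
</body></html>
"""

doc = ET.fromstring(sample_form)

# Names of every form on the page
form_names = doc.xpath("//form/@name")

# Value of every option in the dropdown named "D3"
option_values = doc.xpath("//select[@name='D3']/option/@value")

print(form_names)
for value in option_values:
    # repr() makes any padding inside the value visible
    print(repr(value))
```

Printing with `repr()` is a deliberate choice here: padding such as the trailing spaces in the lesson's option value is easy to miss otherwise.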

#### Submitting the form
We can tell the browser to submit the form, which should take us to the next page

In [None]:
#Submit the form and be taken to the next page, which asks which election results we want
response = br.submit()

#print out the url to show it's a new page
br.geturl()

#### Repeating the process on the next page
This new page has another form on it.

The form wants us to select which results we want from the election we chose in the first form.

We'll select the mayor option this time

In [None]:
br.select_form(name="form1")
br.form['D3'] = ["Mayor"]

#### Submitting the second form

We'll submit this form, and be taken to the page that contains the table we want

In [None]:
response = br.submit()
br.geturl()

#### Extracting the table information
The last step is to get the page source code and pass it on to our table reader function

The table reader function will return a list of lists that we'll call "tabledata"

In [None]:
#read() already returns the raw page source as a byte string, which lxml can parse directly
content_string = response.read()
tabledata = table_reader(content_string)

#### Saving the results in a dataframe

We'll cut off some unnecessary rows from the tabledata object and make a pandas DataFrame from it

In [None]:
#remove unnecessary title rows: the slice below drops the first two rows and the last four
#we just want the information in between (column headers, and data)
data_no_title = tabledata[2:-4]

import pandas as pd
mayorvotes_df = pd.DataFrame(data_no_title[1:],columns=data_no_title[0])

mayorvotes_df.head(10)
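
Every cell that comes out of the table reader is text, so numeric columns usually need converting before any arithmetic. The sketch below shows one way to do that on invented rows; the column names "Ward", "Votes", and "%" and all values are assumptions, not the real election table.

```python
import pandas as pd

# Invented rows in the same shape as the cleaned table: header row first, then data
sample_rows = [
    ["Ward", "Votes", "%"],
    ["1", "1200", "45.5%"],
    ["2", "900", "38.0%"],
]

sample_df = pd.DataFrame(sample_rows[1:], columns=sample_rows[0])

# Scraped cells arrive as strings, so convert them before doing arithmetic
sample_df["Votes"] = pd.to_numeric(sample_df["Votes"])
sample_df["%"] = pd.to_numeric(sample_df["%"].str.rstrip("%"))

print(sample_df.dtypes)
print(sample_df["Votes"].sum())
```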

#### Try it yourself

Imagine that, as a researcher, you were interested in seeing how the mayoral preferences of a Ward relate to how Democratic/Republican a Ward is.

Write code that accesses the data from the <U> 2012 General Election </U> and creates a DataFrame that contains the data table for how many and what percentage of people in each Ward voted for each presidential candidate.

<U> Merge the presidential voting dataframe with the data frame we have for mayoral results </U>. Remember the syntax for merging two dataframes is: pd.merge(dataframe1, dataframe2, on=keycolumn)
For the merge function, you provide the two dataframes and the name of the column that serves as the key, which tells merge how to line up the rows so everything matches.

The end result of your script should be a single pandas dataframe with the presidential results and mayoral results for each ward.
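
As a reminder of how the merge behaves, here is a minimal sketch on two invented dataframes. The names `mayor_sample` and `president_sample`, the column names, and all numbers are made up; your real tables will have different columns.

```python
import pandas as pd

# Invented mayoral results, one row per ward
mayor_sample = pd.DataFrame({
    "Ward": ["1", "2", "3"],
    "Mayor votes": [1200, 900, 700],
})

# Invented presidential results for the same wards, in a different order
president_sample = pd.DataFrame({
    "Ward": ["3", "1", "2"],
    "President votes": [2100, 3000, 2500],
})

# Rows are lined up by matching values in the "Ward" column, not by position
merged = pd.merge(mayor_sample, president_sample, on="Ward")

print(merged)
```

The key column needs the same type in both dataframes. Ward numbers scraped from a page are strings, so a string "1" will not match an integer 1.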





