<a href="https://colab.research.google.com/github/nalderto/POL300-Public/blob/master/modules/module-2/module-2-project/module-2-project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module 2 Project

Instead of having a quiz this week, we are going to have a short project.  This is going to be very similar to the Python exercises, but we won't be covering any additional content.  You are going to use the Python skills covered in modules 1 and 2.  

For this project, we are going to be looking at some data hosted by the Brookings Institution on the cost of winning a Congressional seat.  Michael Malbin, Brendan Glavin, and the Campaign Finance Institute aggregated the data on how much successful Senate and House candidates spent to win their elections.

This is a snapshot of what the data looks like.  As you can see, it is broken down by election year and chamber.
![image.png](https://user-images.githubusercontent.com/25762130/89834483-f1ceda80-db30-11ea-8594-399b0a09c573.png)

We have provided the code to retrieve the data.  All you need to do is write the code to process the data and answer the following questions.  

Once you have passed all the test cases, download the .ipynb file using the instructions in module 1.  Then, upload it to the "Python Module 2 Project" assignment on Brightspace.

## Obtaining the Data
In Python, there are a ton of ways to import data.  For this project, we are going to be using a couple of Python modules.  Modules are essentially collections of functions that other people have created that we can use.  We are going to discuss modules in more detail during the next assignment.  We are going to be using the `requests` and `csv` modules to obtain the data.  

The `requests` module allows us to download things from the Internet.  In this case, we are downloading a CSV (Comma Separated Values) file, which is essentially a table whose values are separated by commas.  We are going to cover this in more detail in Python module 3, but here is a brief explanation.  We call the `get()` function from the `requests` module to "get" the data from the web address stored in the `url` variable.  Then we use `.content` to get the raw content of that page.  Finally, we use `.decode('ascii')`, which converts those raw bytes into text we can read.
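
To make the decoding step concrete without touching the network, here is a minimal sketch.  The `raw_bytes` value is a made-up stand-in for what `.content` would hand back; it is not the real Brookings data.

```python
# Hypothetical stand-in for the bytes that requests.get(url).content would return
raw_bytes = b"Year,Chamber\n2018,House\n2018,Senate\n"

print(type(raw_bytes))  # a bytes object, not yet a string

# .decode('ascii') turns the bytes into an ordinary Python string
text = raw_bytes.decode('ascii')

print(type(text))
print(text)
```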

The next line creates a DictReader object from the CSV file retrieved above.  This is essentially a list of dictionaries.  Each dictionary has four keys: `Year`, `Chamber`, `NominalDollars`, and `2018Dollars`.  The corresponding values come from one row of the CSV file.  The `.splitlines()` function splits the string of the CSV file into a list of strings, with each entry representing a row in the CSV file.  This is needed for `csv.DictReader` to work.  We also use `list()` to convert the DictReader into an actual list, since you have learned how to use lists.
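
The same steps can be tried on a tiny CSV string.  This is only a sketch: the `sample_csv` text and its numbers are invented for illustration, though the column names match the ones described above.

```python
import csv

# Made-up CSV text with the same four columns as the project data
sample_csv = (
    "Year,Chamber,NominalDollars,2018Dollars\n"
    "2018,House,100,100\n"
    "2016,Senate,250,262.5\n"
)

lines = sample_csv.splitlines()     # one string per row, header first
rows = list(csv.DictReader(lines))  # one dictionary per data row

for row in rows:
    # Every value is read in as a string, including the numbers
    print(row['Year'], row['Chamber'], row['2018Dollars'])
```

The header line supplies the dictionary keys, so it does not appear as a row of its own.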

**Don't worry too much if this doesn't make sense.  In a later Python lesson, we will discuss Pandas, which is a different package that makes managing tabular data easy.  We are using this approach to get you comfortable with the skills you have learned so far.**

Run the code below, and you will see the data from the Brookings CSV file printed.

**NOTE: You must run this cell for the rest of the project to work!**

In [3]:
import requests # This module lets us download things from the internet
import csv # This module allows us to work with CSV files

# This is the URL of the CSV file on the Brookings Institution website
url = "https://www.brookings.edu/wp-content/uploads/2017/01/vitalstats_ch3_tbl1.csv"

response = requests.get(url).content.decode('ascii') # Downloads the CSV from the URL
reader = list(csv.DictReader(response.splitlines())) # Converts the CSV into a list of dictionaries

for row in reader: # Iterates through each of the dictionaries in the list
    print(row)

OrderedDict([('Year', '2018'), ('Chamber', 'House'), ('NominalDollars', '2092822'), ('2018Dollars', '2092822')])
OrderedDict([('Year', '2016'), ('Chamber', 'House'), ('NominalDollars', '1516021'), ('2018Dollars', '1586134.926')])
OrderedDict([('Year', '2014'), ('Chamber', 'House'), ('NominalDollars', '1466533'), ('2018Dollars', '1555558.521')])
OrderedDict([('Year', '2012'), ('Chamber', 'House'), ('NominalDollars', '1596953'), ('2018Dollars', '1746587.79')])
OrderedDict([('Year', '2010'), ('Chamber', 'House'), ('NominalDollars', '1434760'), ('2018Dollars', '1652228.232')])
OrderedDict([('Year', '2008'), ('Chamber', 'House'), ('NominalDollars', '1362239.138'), ('2018Dollars', '1588773.882')])
OrderedDict([('Year', '2006'), ('Chamber', 'House'), ('NominalDollars', '1259791'), ('2018Dollars', '1569158.426')])
OrderedDict([('Year', '2004'), ('Chamber', 'House'), ('NominalDollars', '1038390.91'), ('2018Dollars', '1380345.295')])
OrderedDict([('Year', '2002'), ('Chamber', 'House'), ('Nominal

## Example
Before you delve into writing code to answer the questions below, we are going to give an example of how to process the above data.  You can use this example as a guide when creating your Python solutions.   

For this example, we are going to find the row with the largest `2018Dollars` value.  We are going to return the entire dictionary for simplicity's sake. 

In [5]:
def find_max_2018_dollars(data):
    largest = data[0] # We temporarily set largest to the first element in the list
    for row in data:
        # We check if the "row" 2018Dollars is larger than the one set in the "largest" variable.
        # float() is used because some values contain a decimal point (e.g. '1586134.926'),
        # which int() cannot convert.
        if float(row['2018Dollars']) > float(largest['2018Dollars']):
            largest = row # If so, we set "largest" equal to "row"
    return largest

print(find_max_2018_dollars(reader))

KeyError: ignored

This function begins by setting the `largest` variable to the first row in our data set.  This gives us a benchmark to compare later rows against.  We then iterate through each of the dictionaries in the `data` list variable.  Recall that `data` is just a list of dictionaries.  Each `row` is a dictionary with the following keys: `Year`, `Chamber`, `NominalDollars`, and `2018Dollars`.  With each new `row`, we compare its `2018Dollars` value with the largest `2018Dollars` value we have seen so far.  

Notice how we have to convert `row['2018Dollars']` to a number before comparing.  This is to ensure we are doing a numeric comparison.  Otherwise, Python will do an alphabetic comparison, since `csv.DictReader()` read in each of the numerical values as a string (notice the quotation marks wrapping each of the values).  Consider the following:

In [None]:
print('9' > '10000')
print(int('9') > int('10000'))
print(9 > 10000)

We get this strange result in the first line because "9" appears later than "1" in alphanumeric order.  However, if the numbers are not strings (no quotation marks), we get the value we expect.  The `int()` function converts a string containing a whole number into an actual number; `float()` does the same for strings that include a decimal point.
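
One caveat worth seeing in action: `int()` only accepts strings that look like whole numbers.  The sketch below uses a value in the same format as the ones printed earlier to show what happens with a decimal point and how `float()` handles it.

```python
value = '1586134.926'  # same format as some of the 2018Dollars values above

try:
    int(value)  # int() rejects strings that contain a decimal point
except ValueError as error:
    print('int() failed:', error)

as_float = float(value)  # float() accepts both '9' and '1586134.926'
print(as_float > 9)
print(float('9') > float('10000'))
```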

Returning to the example code, if we find a `2018Dollars` value that is larger, then we set the `largest` variable to the newly determined largest `2018Dollars` dictionary.  We continue this process until we run out of rows.  Once this happens, the `for` loop ends, and we return the `largest` dictionary.

Use the example as your guide, as each of the following problems can be solved with some slight changes to the example code.

## Exercise 1
For this problem, write a function that will go through each of the rows in the dataset and find the row with the lowest `2018Dollars` value.  Just as we did with the example, return the entire dictionary for that row.  The function declaration is provided for you.  The `data` parameter will be the list of dictionaries, one per row, from the CSV file we loaded earlier.

**Concepts: Lists, Dictionaries, Relational Operators, `for` Loops**

In [None]:
def find_min_2018_dollars(data):
    # Type your code here 
    
    
    
    
    
    
    
# IGNORE BELOW
# Test Cases
import unittest 
  
class TestCases(unittest.TestCase):  
    def test(self):
        smallest = {'Year': '1990', 'Chamber': 'House', 'NominalDollars': '423245', '2018Dollars': '777213'}
        self.assertTrue(find_min_2018_dollars(reader) == smallest)

        

if __name__ == '__main__': 
    unittest.main(argv=[''], exit=False)

## Exercise 2
Find which year has the highest `2018Dollars` value for the **House**.  Please return just the year, instead of the entire row.

This problem is a bit different from the example.  Instead of initializing the largest value seen so far to the first row, set it to 0.  If the first row belongs to the Senate, using it as the initial value will result in the wrong answer.  

**Hint: Add an `if` statement to check if a row is "House" in the "Chamber" key.**
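
As a small illustration of the hint, the sketch below filters a list of dictionaries by the value stored under one key.  The `pets` data and its keys are made up for this example and are unrelated to the project data.

```python
# Hypothetical data: a list of dictionaries, just like `reader`
pets = [
    {'Name': 'Rex', 'Species': 'dog'},
    {'Name': 'Tom', 'Species': 'cat'},
    {'Name': 'Felix', 'Species': 'cat'},
]

cat_names = []
for pet in pets:
    # Only act on the rows whose 'Species' value is 'cat'
    if pet['Species'] == 'cat':
        cat_names.append(pet['Name'])

print(cat_names)
```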

In [None]:
def find_max_house(data):
    # Type your code here

    

    
    
    
    
    
    
# IGNORE BELOW
# Test Cases
import unittest 
  
class TestCases(unittest.TestCase):  
    def test(self):
        self.assertTrue(int(find_max_house(reader)) == 2012)
        

if __name__ == '__main__': 
    unittest.main(argv=['first-arg-is-ignored'], exit=False)

## Exercise 3
Which chamber of Congress spends more on average to win elections?  For this problem, you are going to find the average `2018Dollars` value across all the years of available data for each chamber separately.  You are going to return `chamber, average, difference`, where `average` is the higher of the two averages and `difference` is the gap between them.  For instance, if you find that the House spends \$222,000 on average while the Senate spends \$200,000, then you would return `"House", 222000, 22000`.

1. Start by creating four variables to keep track of the House sum, Senate sum, count of House rows, and count of Senate rows.  Each of these variables will be initialized to `0`.  When calculating each average, the sum will be the numerator and the row count will be the denominator.

2. Iterate through the rows of the data.  For each row, use an `if` statement to check whether its "Chamber" value is "House" or "Senate".  You can access that value with syntax similar to `name_of_dict['Chamber']`.  Depending on the chamber, add the row's "2018Dollars" value (converted to a number) to the respective sum and increment the appropriate row counter.

3. After iterating through all the rows, create two new variables for the average "2018Dollars" for both the House and Senate.

4. Compare the two chambers' "2018Dollars" averages, and return the appropriate values (`chamber, average, difference`).  If the Senate average is greater, then the `chamber` value would be "Senate".  If the House average is greater, then the `chamber` value would be "House".

**Hint: Remember that you can return multiple values with `return value1, value2, value3`.**
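
Here is a minimal sketch of returning several values at once.  The `min_and_max` function is a hypothetical helper written only for this illustration.

```python
def min_and_max(values):
    # Separating values with commas returns them together as one tuple
    return min(values), max(values)

result = min_and_max([3, 1, 4])
print(result)  # (1, 4)

# The tuple can also be unpacked into separate variables
low, high = result
print(low, high)
```

Because the values come back as a tuple, a result like this can be compared directly against another tuple with `==`.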

In [None]:
def find_bigger_spender(data):
    # Type your code here 
    
    
    
    
    
    
    
    
    
# IGNORE BELOW
# Test Cases
import unittest 
  
class TestCases(unittest.TestCase):  
    def test(self):
        answer = ('Senate', 8111022.6875, 6908696.3125)
        self.assertTrue(find_bigger_spender(reader) == answer)

        

if __name__ == '__main__': 
    unittest.main(argv=['first-arg-is-ignored'], exit=False)