# An Analysis of Political Contributions During the 2020 House of Representatives Election

In this part, you will obtain as much data as you can on the campaign contributions received by each candidate. This data is available through the website https://www.opensecrets.org/. At the end of the project, your group will give a presentation of your findings.

1. Start by scraping the data from the summary page for Tennessee's 2nd District, which is available at https://www.opensecrets.org/races/summary?cycle=2020&id=TN02&spec=N.
    * The data that we want is contained in the "Total Raised and Spent" table.
    * Make a DataFrame showing, for each candidate:
        * the candidate's name
        * the candidate's party
        * state
        * district number
        * whether the candidate was an incumbent
        * whether the candidate won the race
        * the total amount raised by that candidate (as a numeric variable)
        * the total amount spent by the candidate (as a numeric variable)

In [3]:
import requests
import pandas as pd
from bs4 import BeautifulSoup as BS
import re

In [5]:
URL = 'https://www.opensecrets.org/races/summary?cycle=2020&id=TN02&spec=N'

def political_df_creation(URL):
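    response = requests.get(URL)
    response.raise_for_status()  # stop on an HTTP error instead of parsing an error page
    soup = BS(response.text, 'html.parser')  # name the parser explicitly to avoid bs4's warning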
    
    district = pd.read_html(str(soup.find('table', attrs = {'class' : 'DataTable'})))[0]
    district['Name'] = (
        district['Candidate'].\
        str.extract(r'([A-Za-z]+\s+[A-Za-z]+)\s', expand = True)
    )
    district['Party'] = (
        district['Candidate'].\
        str.extract(r'(\([A-Z]\))', expand = True)
    )
    # Flag incumbents and winners with vectorized string matching. This avoids
    # chained assignment (district[col].loc[...] = ...), which may not update the DataFrame.
    district['Incumbent'] = district['Candidate'].str.contains(r'\sIncumbent', regex = True)
    district['Winner'] = district['Candidate'].str.contains(r'\sWinner', regex = True)
        
    district = (
        district.drop(columns = ['Candidate', 'Cash on Hand', 'Last Report'])
    )

    district['Raised'] = (
        district['Raised'].\
        str.replace(',', '', regex = False).\
        str.replace('$', '', regex = False).astype('int64')
    )
    district['Spent'] = (
        district['Spent'].\
        str.replace(',', '', regex = False).\
        str.replace('$', '', regex = False).astype('int64')
    )

    homelink = str(soup.find('link', href=True))
    state = re.compile(r'id=([A-Z]+)\d*')
    state = state.search(homelink)
    district['State'] = state.group(1)

    district_number = re.compile(r'id=[A-Z]+(\d*)')
    district_number = district_number.search(homelink)
    district['District'] = district_number.group(1)

    # Move 'Raised' and 'Spent' (the first two columns) to the end.
    cols = district.columns.tolist()
    district = district[cols[2:] + cols[:2]]
    return district


Unnamed: 0,Name,Party,Incumbent,Winner,State,District,Raised,Spent
0,Tim Burchett,(R),True,True,TN,2,1336276,878488
1,Renee Hoyos,(D),False,False,TN,2,812784,816793


2. Once you have working code for Tennessee's 2nd District, expand on your code to capture all of Tennessee's districts.

In [15]:
# Tennessee's districts are numbered TN01 through TN09.
districts = [
    political_df_creation(f'https://www.opensecrets.org/races/summary?cycle=2020&id=TN{i:02d}&spec=N')
    for i in range(1, 10)
]
# Stack the per-district tables; concat is the idiomatic way to append rows.
all_districts = pd.concat(districts, ignore_index = True)
all_districts

Unnamed: 0,Name,Party,Incumbent,Winner,State,District,Raised,Spent
0,Diana Harshbarger,(R),False,True,TN,1,2126946,1869100
1,Blair Nicole,(D),False,False,TN,1,140209,134995
2,Tim Burchett,(R),True,True,TN,2,1336276,878488
3,Renee Hoyos,(D),False,False,TN,2,812784,816793
4,Chuck Fleischmann,(R),True,True,TN,3,1051653,381411
5,Meg Gorman,(D),False,False,TN,3,85843,77760
6,Scott Desjarlais,(R),True,True,TN,4,331464,392499
7,Christopher Hale,(D),False,False,TN,4,308731,302996
8,Jim Cooper,(D),True,True,TN,5,936569,1332131
9,John Rose,(R),True,True,TN,6,1050429,625688


3. Once you have working code for all of Tennessee's districts, expand on it to capture all states and districts. The number of representatives each state has can be found in a table on this page: https://www.britannica.com/topic/United-States-House-of-Representatives-Seats-by-State-1787120
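
One possible way to approach this step is to build the full list of race URLs from a mapping of state abbreviation to number of seats, then loop over it with a pause between requests. The sketch below only builds the URLs and wraps the loop. `SEATS` is a deliberately tiny, hypothetical subset that you would fill in from the Britannica table, `race_urls` and `scrape_all` are illustrative names, and the assumption that single-district states use district `01` should be checked against the site.

```python
import time

BASE_URL = "https://www.opensecrets.org/races/summary?cycle=2020&id={state}{district:02d}&spec=N"

# Hypothetical subset for illustration; fill in every state from the seats table.
SEATS = {"TN": 9, "WY": 1}


def race_urls(seats):
    """Return one summary-page URL per (state, district) pair."""
    return [
        BASE_URL.format(state=state, district=district)
        for state, count in seats.items()
        for district in range(1, count + 1)
    ]


def scrape_all(urls, scrape_one, delay=1.0):
    """Call scrape_one(url) for each URL, pausing between requests.

    Failures are collected rather than raised, so one bad page
    does not stop the whole run.
    """
    results, failures = [], []
    for url in urls:
        try:
            results.append(scrape_one(url))
        except Exception as error:  # broad on purpose: record the failure and keep going
            failures.append((url, error))
        time.sleep(delay)
    return results, failures
```

Passing `political_df_creation` as `scrape_one` would return a list of DataFrames that `pd.concat` can stack, and the `failures` list shows which races need a second look.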

4. Using your scraped data, investigate different relationships between candidates and the amount of money they raised. Here are some suggestions to get you started, but feel free to pose your own questions or do additional exploration:  
    a. How often does the candidate who raised more money win a race?  
    b. How often does the candidate who spent more money win a race?  
    c. Does the difference between either money raised or money spent seem to influence the likelihood of a candidate winning a race?  
    d. How often does the incumbent candidate win a race?  
    e. Can you detect any relationship between amount of money raised and the incumbent status of a candidate?
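
As a starting point for questions (a), (b), and (d), the sketch below computes win rates with a `groupby` over each race. It uses only the Tennessee rows shown in the output above, so the numbers describe that small sample rather than the full data set. The function names are illustrative, and races with a single listed candidate are counted as wins for that candidate, which you may prefer to exclude.

```python
import pandas as pd

# Rows copied from the Tennessee output shown earlier (districts 1-6 as displayed).
columns = ["Name", "Party", "Incumbent", "Winner", "State", "District", "Raised", "Spent"]
rows = [
    ("Diana Harshbarger", "(R)", False, True, "TN", 1, 2126946, 1869100),
    ("Blair Nicole", "(D)", False, False, "TN", 1, 140209, 134995),
    ("Tim Burchett", "(R)", True, True, "TN", 2, 1336276, 878488),
    ("Renee Hoyos", "(D)", False, False, "TN", 2, 812784, 816793),
    ("Chuck Fleischmann", "(R)", True, True, "TN", 3, 1051653, 381411),
    ("Meg Gorman", "(D)", False, False, "TN", 3, 85843, 77760),
    ("Scott Desjarlais", "(R)", True, True, "TN", 4, 331464, 392499),
    ("Christopher Hale", "(D)", False, False, "TN", 4, 308731, 302996),
    ("Jim Cooper", "(D)", True, True, "TN", 5, 936569, 1332131),
    ("John Rose", "(R)", True, True, "TN", 6, 1050429, 625688),
]
races = pd.DataFrame(rows, columns=columns)


def top_money_win_rate(df, money_col):
    """Share of races won by the candidate with the largest value in money_col."""
    # idxmax gives, for each race, the row label of the candidate with the most money.
    top_rows = df.groupby(["State", "District"])[money_col].idxmax()
    return float(df.loc[top_rows, "Winner"].mean())


def incumbent_win_rate(df):
    """Share of incumbents who won their race."""
    return float(df.loc[df["Incumbent"], "Winner"].mean())


raised_rate = top_money_win_rate(races, "Raised")
spent_rate = top_money_win_rate(races, "Spent")
incumbent_rate = incumbent_win_rate(races)
```

The same functions should work unchanged on the full national DataFrame, provided it keeps these column names and stores `Incumbent` and `Winner` as booleans.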

### Bonus Questions:

If you complete all of the above, you can attempt these challenging bonus questions.

OpenSecrets also gives a detailed breakdown of contributions by source. For example, for Tennessee's 2nd District, this is located at https://www.opensecrets.org/races/candidates?cycle=2020&id=TN02&spec=N

Scrape these pages to get information on contributions by source. See if you can find anything interesting in terms of the source of contributions. Some examples to get you started:
* What does the overall distribution of funding sources look like?
* Is there any detectable difference in contribution sources between Democratic and Republican candidates?
* Do the funding sources for either the winning candidate or incumbent candidate differ from the other candidates?
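
Once the contributions are scraped, comparing candidates is easier if each source is expressed as a share of that candidate's total. The sketch below shows one way to do that, assuming you have arranged the data in long format with one row per candidate and source. The column names, source labels, and every amount here are hypothetical placeholders, not OpenSecrets figures.

```python
import pandas as pd

# HYPOTHETICAL placeholder rows for illustration only; not real contribution data.
contributions = pd.DataFrame({
    "Name": ["Candidate A"] * 3 + ["Candidate B"] * 3,
    "Party": ["R"] * 3 + ["D"] * 3,
    "Source": ["Small individual", "Large individual", "PAC"] * 2,
    "Amount": [100, 300, 600, 500, 400, 100],
})


def source_shares(df):
    """Return a wide table: one row per candidate, one column per source, values are shares."""
    # Each candidate's total, repeated on every row for that candidate.
    totals = df.groupby("Name")["Amount"].transform("sum")
    with_share = df.assign(Share=df["Amount"] / totals)
    return with_share.pivot_table(
        index=["Party", "Name"], columns="Source", values="Share", fill_value=0.0
    )


shares = source_shares(contributions)
# Average share per source within each party, for a first Democratic vs. Republican comparison.
party_means = shares.groupby(level="Party").mean()
```

Working with shares rather than dollar amounts keeps a few very well-funded candidates from dominating the comparison.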