# An Analysis of Political Contributions During the 2020 House of Representatives Election

In this part, you will obtain as much data as you can on the campaign contributions received by each candidate. This data is available through the website https://www.opensecrets.org/.


In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
from io import StringIO
from IPython.display import HTML  # public import path; IPython.core.display is deprecated for this
import tqdm


### Part 1: Data Gathering
1. Start by acquiring the data from Tennessee's 7th District, which is available at https://www.opensecrets.org/races/summary?cycle=2020&id=TN07&spec=N. If you click the "Download .csv file" button, you get a CSV for this district. However, we don't want to click this button for every district. Instead, we'll use Python to automate the process. Start by sending a GET request to the download button's URL, https://www.opensecrets.org/races/summary.csv?cycle=2020&id=TN07. Convert the result to a DataFrame.
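
The download URL above appears to follow a simple pattern: a cycle year plus an ID made of the state abbreviation and a two-digit district number. The sketch below builds that URL from its parts. The helper name `build_summary_url` is hypothetical, and the zero-padded ID format is an assumption inferred from the single `TN07` example.

```python
from urllib.parse import urlencode

# Endpoint taken from the "Download .csv file" URL quoted above.
SUMMARY_CSV_URL = "https://www.opensecrets.org/races/summary.csv"


def build_summary_url(state_abbr: str, district: int, cycle: int = 2020) -> str:
    """Return the CSV download URL for one district (hypothetical helper)."""
    # Assumption: the race ID is the state abbreviation followed by a
    # zero-padded, two-digit district number, as in "TN07".
    race_id = f"{state_abbr.upper()}{district:02d}"
    return f"{SUMMARY_CSV_URL}?{urlencode({'cycle': cycle, 'id': race_id})}"


print(build_summary_url("TN", 7))
```

For Tennessee's 7th District this reproduces the URL given in the task, so the same helper can be reused once more districts are needed.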


In [2]:
URL = 'https://www.opensecrets.org/races/summary.csv?cycle=2020&id=TN07'
response = requests.get(URL)
# Raise an HTTPError right away if the download failed.
response.raise_for_status()
# The endpoint returns CSV text, so it goes straight to pandas; no HTML parsing is needed.
TN_district7 = pd.read_csv(StringIO(response.text), sep=',')

2. Once you have working code for Tennessee's 7th District, expand on your code to capture all of Tennessee's districts into a single DataFrame. Make sure that you can distinguish which district each result came from. Export the results to a csv file.
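
One possible way to approach this step is to loop over the district numbers, read each CSV, and tag the rows before stacking them. This is a minimal sketch, not the notebook's actual solution: the helper names `fetch_district_csv` and `collect_state` are hypothetical, the ID format is assumed from the `TN07` example, and the `fetch` parameter exists only so the stacking logic can be tried without network access.

```python
from io import StringIO

import pandas as pd


def fetch_district_csv(state_abbr, district, cycle=2020):
    """Download one district's summary CSV and return it as text."""
    import requests  # imported lazily so the stacking logic also works offline

    url = "https://www.opensecrets.org/races/summary.csv"
    # Assumption: IDs look like "TN07" (abbreviation + two-digit district).
    params = {"cycle": cycle, "id": f"{state_abbr}{district:02d}"}
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.text


def collect_state(state_abbr, n_districts, fetch=fetch_district_csv):
    """Stack every district of one state into a single DataFrame."""
    frames = []
    for district in range(1, n_districts + 1):
        df = pd.read_csv(StringIO(fetch(state_abbr, district)))
        # Tag each row so its source district stays identifiable.
        df.insert(0, "State_Abbreviation", state_abbr)
        df.insert(1, "District", district)
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Tennessee had nine House districts in the 2020 cycle.
# tn_df = collect_state("TN", 9)
# tn_df.to_csv("tn_districts.csv", index=False)
```

The two inserted columns satisfy the requirement that each row can be traced back to its district. The output file name in the commented usage is only an example.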


In [3]:
URL = 'https://en.wikipedia.org/wiki/List_of_U.S._state_and_territory_abbreviations'
states_df = pd.read_html(URL)[1]
states_df.columns = states_df.columns.map(lambda x: x[1])
states_df = (
    states_df
    .reset_index()
    .drop(columns = ['index', 'Status of region', 'Unnamed: 2_level_1', 'Unnamed: 4_level_1', 'Unnamed: 5_level_1', 'Unnamed: 6_level_1', 'GPO', 'AP', 'Other abbreviations'])
    .dropna()
    .rename(columns = {'Name': 'State', 'Unnamed: 3_level_1': 'Abbreviation'})
    .drop(0).reset_index(drop=True)
)
states_abr_dict = states_df.set_index('State')['Abbreviation'].to_dict()
# states_abr_dict

3. Once you have working code for all of Tennessee's districts, expand on it to capture all states and districts. The number of districts for each state can be found at https://en.wikipedia.org/wiki/2020_United_States_House_of_Representatives_elections. You may also find the table of state abbreviations here helpful: https://en.wikipedia.org/wiki/List_of_U.S._state_and_territory_abbreviations. Export a csv file for each state.
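
Once the seat count for each state is known, the full list of races can be enumerated before any request is sent. The sketch below is one hedged way to do that. The function `iter_race_ids` is hypothetical, and it assumes that single-seat (at-large) states use district number 1, which should be checked against the site.

```python
def iter_race_ids(seats_by_state):
    """Yield (state_abbr, district, race_id) for every House seat."""
    for abbr, seats in sorted(seats_by_state.items()):
        # Assumption: at-large states are numbered as district 1.
        for district in range(1, int(seats) + 1):
            yield abbr, district, f"{abbr}{district:02d}"


# Two states shown for illustration; in the notebook the counts would come
# from the Wikipedia table of seats joined to the state abbreviations.
example_seats = {"TN": 9, "AK": 1}
for abbr, district, race_id in iter_race_ids(example_seats):
    print(abbr, district, race_id)
```

Generating the IDs first makes it easy to wrap the download loop in a progress bar or to retry only the races that failed.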


In [4]:
URL = 'https://en.wikipedia.org/wiki/2020_United_States_House_of_Representatives_elections'
response = requests.get(URL)
# Raise an HTTPError right away if the page could not be fetched.
response.raise_for_status()
soup = BeautifulSoup(response.text, features="html.parser")

In [5]:
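# Serialize only the "wikitable" tables so pandas does not have to parse the whole page.
tables_html = str(soup.find_all('table', attrs={'class' : 'wikitable'}))
# The second wikitable is used here as the per-state seat table; missing values become '-'.
all_states_df = pd.read_html(StringIO(tables_html))[1].fillna('-')
# Keep the second level of the two-level header as the column names.
all_states_df.columns = all_states_df.columns.map(lambda x: x[1])
all_states_df = all_states_df.drop(columns=['Seats', 'Change'])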

In [6]:
URL = 'https://en.wikipedia.org/wiki/List_of_U.S._state_and_territory_abbreviations'
states_df = pd.read_html(URL)[1]
states_df.columns = states_df.columns.map(lambda x: x[1])
states_df = (
    states_df
    .reset_index()
    .drop(columns = ['index', 'Status of region', 'Unnamed: 2_level_1', 'Unnamed: 4_level_1', 'Unnamed: 5_level_1', 'Unnamed: 6_level_1', 'GPO', 'AP', 'Other abbreviations'])
    .dropna()
    .rename(columns = {'Name': 'State', 'Unnamed: 3_level_1': 'Abbreviation'})
    .drop(0).reset_index(drop=True)
)

In [7]:
state_representatives_df = pd.merge(left=all_states_df, right=states_df, on='State')
# state_representatives_df.head(3)

4. Finally, combine all of the data you've gathered together into a single DataFrame.
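
If the per-state CSV files from the previous step are on disk, combining them can be as simple as reading each file and concatenating the results. This is a minimal sketch under that assumption. The function `combine_state_files`, the directory layout, and the `*.csv` pattern are all hypothetical.

```python
from pathlib import Path

import pandas as pd


def combine_state_files(csv_dir, pattern="*.csv"):
    """Concatenate every per-state CSV in `csv_dir` into one DataFrame."""
    paths = sorted(Path(csv_dir).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {csv_dir}")
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index renumbers the rows so indices from separate files do not collide.
    return pd.concat(frames, ignore_index=True)


# all_df = combine_state_files("state_csvs")
# all_df.to_csv("all_states_reps.csv", index=False)
```

Sorting the paths keeps the row order reproducible between runs, which helps when comparing exports.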

In [8]:
# functions moved to data_pull.py
import data_pull

In [9]:
data_pull.retrieve_2020_state_district_data('arizind', 2)

No state by this name. Assuming you meant Arizona.


| | State_Abbreviation | District | cid | FirstLastP | Rcpts | Spent | PACs | Indivs | Cand | Other | ... | Result | CRPICO | State | IncCID | Incumbent | primarydate | DistIDCurr | capeye | sort | SmLgIndivsNote |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AZ | 2 | N00029260 | Ann Kirkpatrick (D) | 1849861.51 | 1384764.75 | 690001.0 | 1182656.78 | 0.0 | -22796.27 | ... | W | I | Arizona | | | 2020-08-04 00:00:00 +0000 | AZ02 | 0 | 1 | N |
| 1 | AZ | 2 | N00042605 | Brandon Martin (R) | 374820.55 | 381067.07 | 6000.0 | 374040.55 | 0.0 | -5220.0 | ... | L | C | Arizona | | | 2020-08-04 00:00:00 +0000 | | 0 | 2 | N |
| 2 | AZ | 2 | N00044751 | Iman-Utopia Layjou Bah (I) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | | C | Arizona | | | 2020-08-04 00:00:00 +0000 | | 0 | 2 | N |
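
The message printed above suggests that `data_pull` tolerates misspelled state names. Since `data_pull.py` is not shown here, the following is only a guess at how such matching could work, using `difflib.get_close_matches` from the standard library. The function `resolve_state` and the short list of states are hypothetical.

```python
from difflib import get_close_matches


def resolve_state(name, known_states, cutoff=0.6):
    """Return the known state name closest to `name`, or None if none is close."""
    lookup = {state.lower(): state for state in known_states}
    key = name.strip().lower()
    if key in lookup:
        return lookup[key]  # exact match, ignoring case
    matches = get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    if not matches:
        return None
    guess = lookup[matches[0]]
    print(f"No state by this name. Assuming you meant {guess}.")
    return guess


states = ["Arizona", "Arkansas", "Indiana", "Tennessee"]
resolve_state("arizind", states)
```

The `cutoff` value controls how forgiving the match is. A lower value accepts worse typos but raises the chance of picking the wrong state.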


In [10]:
# data_pull.get_all_data(state_representatives_df)#.to_csv('all_states_reps.csv', index=False)