# An Analysis of Political Contributions During the 2020 House of Representatives Election

In this part, you will obtain as much data as you can on the campaign contributions received by each candidate. This data is available through the website https://www.opensecrets.org/.

### Part 1: Data Gathering

#### 1. Start by acquiring the data from Tennessee's 7th District, which is available at https://www.opensecrets.org/races/summary?cycle=2020&id=TN07&spec=N. Clicking "Download .csv file" on that page gives you a CSV for this district. However, we don't want to click this button for every district, so we'll use Python to automate the process. Start by sending a GET request to the download button URL, https://www.opensecrets.org/races/summary.csv?cycle=2020&id=TN07. Convert the result to a DataFrame.

In [57]:
import requests
import pandas as pd
from bs4 import BeautifulSoup
import io
from requests.exceptions import HTTPError
from IPython.core.display import HTML

In [59]:
url = 'https://www.opensecrets.org/races/summary.csv?cycle=2020&id=TN07'

# Wrap every HTTP request in try/except.

# response.raise_for_status() raises an HTTPError for 4xx and 5xx status codes.
# For a successful request it returns without raising, and the else branch runs.

try:
    response = requests.get(url)
    response.raise_for_status()
except HTTPError as http_err:
    print(f"HTTP error occurred: {http_err}")
except Exception as err:
    print(f"Other error occurred: {err}")
else:
    data = response.content.decode('utf8')
    df = pd.read_csv(io.StringIO(data))
    # In the output below, DistIDCurr is filled in only for the incumbent,
    # so add our own district ID column (TN07) to tag every row for later use.
    df['DistIDCurrTN'] = 'TN07'

df.head(2)


Unnamed: 0,cid,FirstLastP,Rcpts,Spent,PACs,Indivs,Cand,Other,EndCash,LgIndivs,...,CRPICO,State,IncCID,Incumbent,primarydate,DistIDCurr,capeye,sort,SmLgIndivsNote,DistIDCurrTN
0,N00041873,Mark Green (R),1194960.47,935486.67,171900.0,819151.42,0.0,203909.05,287888.55,819151.42,...,I,Tennessee,,,2020-08-06 00:00:00 +0000,TN07,0,1,N,TN07
1,N00045536,Kiran Sreepada (D),206644.28,207190.98,4000.0,202644.28,0.0,0.0,0.0,179129.75,...,C,Tennessee,,,2020-08-06 00:00:00 +0000,,0,2,N,TN07
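
The decode-and-parse step can be tried on its own, without a network call. The sketch below is a minimal illustration under stated assumptions: `csv_text_to_frame` is a hypothetical helper that is not part of the notebook, and the CSV string is typed by hand from a few columns of the two rows shown above (the real download has many more columns).

```python
import io

import pandas as pd

# Hand-typed sample that mimics a few columns of the OpenSecrets CSV.
# The values are copied from the output above; the real file has more columns.
SAMPLE_CSV = (
    "cid,FirstLastP,Rcpts,Spent,DistIDCurr\n"
    "N00041873,Mark Green (R),1194960.47,935486.67,TN07\n"
    "N00045536,Kiran Sreepada (D),206644.28,207190.98,\n"
)


def csv_text_to_frame(csv_text, district_id):
    """Parse CSV text into a DataFrame and tag every row with district_id.

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    frame = pd.read_csv(io.StringIO(csv_text))
    frame["DistIDCurrTN"] = district_id
    return frame


sample_df = csv_text_to_frame(SAMPLE_CSV, "TN07")
print(sample_df[["FirstLastP", "DistIDCurr", "DistIDCurrTN"]])
```

The empty `DistIDCurr` field of the second row is read as a missing value, which is why a separate, fully populated district column is useful.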


#### 2. Once you have working code for Tennessee's 7th District, expand on your code to capture all of Tennessee's districts into a single DataFrame. Make sure that you can distinguish which district each result came from. Export the results to a csv file.


In [62]:
# Define a function that returns the DataFrame for one district

def getDistrictData(State_n_DistrictCode):
    url = f'https://www.opensecrets.org/races/summary.csv?cycle=2020&id={State_n_DistrictCode}'

    # Wrap every HTTP request in try/except.
    # response.raise_for_status() raises an HTTPError for 4xx and 5xx status codes.
    # For a successful request it returns without raising, and the else branch runs.

    try:
        response = requests.get(url)
        response.raise_for_status()
    except HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
    except Exception as err:
        print(f"Other error occurred: {err}")
    else:
        data = response.content.decode('utf8')
        df = pd.read_csv(io.StringIO(data))
        # Tag every row with its district code (for example TN07) so that we can use it later
        df['DistIDCurrTN'] = State_n_DistrictCode
        return df

    # The request failed, so there is no DataFrame; pd.concat drops None entries silently
    return None


# Define a dictionary with the state and its district codes.
districtList = ['01','02','03','04','05','06','07','08','09']
StateDistricts = {
    "state":'TN',
    "Districts": districtList
}

# Collect one DataFrame per district so that we can concatenate them afterwards
frameList=[]

for district in StateDistricts['Districts']:
    state_n_district_code = f'{StateDistricts["state"]}{district}'
    frameList.append(getDistrictData(state_n_district_code))

StateDistrictframe = pd.concat(frameList)
StateDistrictframe

Unnamed: 0,cid,FirstLastP,Rcpts,Spent,PACs,Indivs,Cand,Other,EndCash,LgIndivs,...,CRPICO,State,IncCID,Incumbent,primarydate,DistIDCurr,capeye,sort,SmLgIndivsNote,DistIDCurrTN
0,N00046688,Diana Harshbarger (R),2126945.6,1869099.77,222800.0,359728.5,1461293.0,83124.1,257845.83,315489.1,...,O,Tennessee,,,2020-08-06 00:00:00 +0000,,0,2,N,TN01
1,N00046686,Blair Nicole Walsingham (D),140209.14,134994.55,1520.0,138689.14,0.0,0.0,5214.59,70085.2,...,O,Tennessee,,,2020-08-06 00:00:00 +0000,,0,2,N,TN01
2,N00047760,Steve Holder (I),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,O,Tennessee,,,2020-08-06 00:00:00 +0000,,0,2,N,TN01
0,N00041594,Tim Burchett (R),1336275.75,878487.63,269535.0,1072845.61,0.0,-6104.86,593677.72,729831.26,...,I,Tennessee,,,2020-08-06 00:00:00 +0000,TN02,0,1,N,TN02
1,N00041699,Renee Hoyos (D),812783.86,816793.15,3100.0,807459.01,0.0,2224.85,209.82,807459.01,...,C,Tennessee,,,2020-08-06 00:00:00 +0000,,0,2,N,TN02
2,N00047761,Matthew Campbell (I),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,C,Tennessee,,,2020-08-06 00:00:00 +0000,,0,2,N,TN02
0,N00030815,Chuck Fleischmann (R),1051653.39,381411.2,453858.46,603344.93,0.0,-5550.0,1880341.32,599059.93,...,I,Tennessee,,,2020-08-06 00:00:00 +0000,TN03,0,1,N,TN03
1,N00046911,Meg Gorman (D),85843.21,77759.83,2671.6,81271.61,2000.0,-100.0,8083.38,50245.2,...,C,Tennessee,,,2020-08-06 00:00:00 +0000,,0,2,N,TN03
2,N00046589,Nancy Baxley (I),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,C,Tennessee,,,2020-08-06 00:00:00 +0000,,0,2,N,TN03
3,N00047762,Amber Hysell (I),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,C,Tennessee,,,2020-08-06 00:00:00 +0000,,0,2,N,TN03
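
The task also asks for a CSV export of the combined Tennessee data. The sketch below shows one possible way to do that, using two tiny hand-made frames in place of the downloaded ones. It labels the rows through the `keys` mechanism of `pd.concat` rather than a column added per frame, which is an alternative to the approach in the cell above. The file name `TN.csv` and the temporary output directory are assumptions made for the sketch.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-ins for the per-district frames (values copied from the output above).
tn01 = pd.DataFrame({"FirstLastP": ["Diana Harshbarger (R)"], "Rcpts": [2126945.6]})
tn02 = pd.DataFrame({"FirstLastP": ["Tim Burchett (R)"], "Rcpts": [1336275.75]})

# Passing a dict to pd.concat labels each block with its key,
# which is another way to keep track of the district each row came from.
combined = pd.concat({"TN01": tn01, "TN02": tn02}, names=["District", "Row"])
combined = combined.reset_index(level="District")

# Write to a temporary directory here; the file name TN.csv is an assumption.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "TN.csv")
combined.to_csv(out_path, index=False)

# Read the file back to confirm that the district label survived the export.
round_trip = pd.read_csv(out_path)
print(round_trip)
```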


#### 3. Once you have working code for all of Tennessee's districts, expand on it to capture all states and districts. The number of districts for each state can be found at https://en.wikipedia.org/wiki/2020_United_States_House_of_Representatives_elections. You may also find the table of state abbreviations here helpful: https://en.wikipedia.org/wiki/List_of_U.S._state_and_territory_abbreviations. Export a csv file for each state.

Gathering Lists of US State Abbreviations and Number of Districts for Each State:

In [77]:
# Number of representatives (districts) for each state, taken from
# https://en.wikipedia.org/wiki/2020_United_States_House_of_Representatives_elections
# Scrape the table from the wiki page; its values are used later to build the OpenSecrets URLs

wiki_rep_url = 'https://en.wikipedia.org/wiki/2020_United_States_House_of_Representatives_elections'

r = requests.get(wiki_rep_url)
wiki_rep_soup = BeautifulSoup(r.text, features="html.parser")

table_html_rep_wiki = str(wiki_rep_soup.find('table', attrs={'class':['wikitable', 'sortable jquery-tablesorter'], 'style':'text-align:center'}))
HTML(table_html_rep_wiki)

# Convert the HTML table to a DataFrame
wiki_rep_df = pd.read_html(io.StringIO(str(table_html_rep_wiki)))[0]
wiki_rep_df

# Keep only the state and total-seats columns
wiki_rep_df_2Columns = wiki_rep_df[['State', 'Total seats']]
wiki_rep_df_2Columns

# Flatten the two-level column header by writing the data to CSV without a header and reading it back with new column names
wiki_rep_df_2Columns_SingleLevel = wiki_rep_df_2Columns.to_csv(header=None,index=False)
wiki_rep_df_2Columns_SingleLevel_df = pd.read_csv(io.StringIO(wiki_rep_df_2Columns_SingleLevel), names=['US State', 'Number of Districts'])
wiki_rep_df_2Columns_SingleLevel_df.head(2)

Unnamed: 0,US State,Number of Districts
0,Alabama,7
1,Alaska,1
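
The CSV round trip above is one way to get rid of the two-level header. A possible alternative is to work on the column index directly. The sketch below assumes a stand-in frame whose two header levels simply repeat the column name; the actual second-level labels of the Wikipedia table are not shown in this notebook, so treat them as an assumption.

```python
import pandas as pd

# Stand-in for the parsed Wikipedia table: a two-level column header.
# The exact second-level labels are an assumption made for this sketch.
columns = pd.MultiIndex.from_tuples(
    [("State", "State"), ("Total seats", "Total seats")]
)
wiki_like = pd.DataFrame([["Alabama", 7], ["Alaska", 1]], columns=columns)

# Keep only the top level of the header, then rename to the names used above.
flat = wiki_like.copy()
flat.columns = flat.columns.get_level_values(0)
flat = flat.rename(
    columns={"State": "US State", "Total seats": "Number of Districts"}
)

print(flat)
```

This keeps the numeric column as integers throughout, whereas the round trip relies on `pd.read_csv` to infer the types again.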


In [83]:
# State abbreviations for the 50 US states
# Scrape the page that lists them
StateAbbrev_url = 'https://www.worldatlas.com/geography/usa-states.html'

r = requests.get(StateAbbrev_url)
StateAbbrev_soup = BeautifulSoup(r.text, features="html.parser")

table_html_StateAbbrev = str(StateAbbrev_soup.find('table'))
HTML(table_html_StateAbbrev)

# converting html table into a pandas df
StateAbbrev_df = pd.read_html(io.StringIO(str(table_html_StateAbbrev)))[0]
StateAbbrev_df.head(2)

Unnamed: 0,US State,Abbreviation
0,Alabama,AL
1,Alaska,AK


In [85]:
# Merged state abbreviation with the number of representatives for each state in 2020

Merged_StateAbbrev_n_DistrictCode_df = pd.merge(StateAbbrev_df, wiki_rep_df_2Columns_SingleLevel_df, on = "US State", how = "inner").drop(['US State'], axis=1)
dist_num = Merged_StateAbbrev_n_DistrictCode_df
dist_num.head(2)

Unnamed: 0,Abbreviation,Number of Districts
0,AL,7
1,AK,1
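
An inner merge silently drops any state whose name is spelled differently in the two scraped tables. A quick way to check for that is an outer merge with `indicator=True`, sketched below on hand-made data. The trailing space in `"Arizona "` is deliberate and simulates a scraping mismatch; it is not something observed in the real tables.

```python
import pandas as pd

abbrev = pd.DataFrame(
    {"US State": ["Alabama", "Alaska", "Arizona"], "Abbreviation": ["AL", "AK", "AZ"]}
)
# "Arizona " has a trailing space on purpose, to simulate a scraping mismatch.
seats = pd.DataFrame(
    {"US State": ["Alabama", "Alaska", "Arizona "], "Number of Districts": [7, 1, 9]}
)

# The _merge column says whether each row was found in both tables or only one.
check = pd.merge(abbrev, seats, on="US State", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched)
```

If `unmatched` is empty for the real tables, the inner merge loses nothing. Otherwise the listed names need cleaning (for example with `.str.strip()`) before merging.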


In [95]:
# State abbreviation series
st_abbr = Merged_StateAbbrev_n_DistrictCode_df['Abbreviation']

# Number-of-districts series
dist_num = Merged_StateAbbrev_n_DistrictCode_df['Number of Districts'] 

# Base of the CSV download URL
open_secrets_csv_url_base = 'https://www.opensecrets.org/races/summary.csv?cycle=2020&id='

# List that will hold one URL per district
urls_st = []

# Build the URL for each district's CSV. Districts are listed in descending order,
# so district 01 is always the last one for its state; the per-state export relies on this.
for i in range(0, len(st_abbr)):    
    for d in range(dist_num[i], 0, -1):
        urls_st.append(open_secrets_csv_url_base + st_abbr[i] + str(d).zfill(2)) 
    
urls_st[:6]

['https://www.opensecrets.org/races/summary.csv?cycle=2020&id=AL07',
 'https://www.opensecrets.org/races/summary.csv?cycle=2020&id=AL06',
 'https://www.opensecrets.org/races/summary.csv?cycle=2020&id=AL05',
 'https://www.opensecrets.org/races/summary.csv?cycle=2020&id=AL04',
 'https://www.opensecrets.org/races/summary.csv?cycle=2020&id=AL03',
 'https://www.opensecrets.org/races/summary.csv?cycle=2020&id=AL02']
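
The URL pattern is the base URL, then the state abbreviation, then a two-digit district number. It can be checked in isolation with plain Python. In the sketch below, `build_district_urls` is a hypothetical helper and the district counts passed to it are made up for the example.

```python
BASE_URL = "https://www.opensecrets.org/races/summary.csv?cycle=2020&id="


def build_district_urls(district_counts, base_url=BASE_URL):
    """Return one URL per district, highest district number first within each state.

    district_counts maps a state abbreviation to its number of districts.
    Hypothetical helper for illustration.
    """
    urls = []
    for abbreviation, count in district_counts.items():
        for district in range(count, 0, -1):
            # :02d zero-pads the district number, like str(d).zfill(2)
            urls.append(f"{base_url}{abbreviation}{district:02d}")
    return urls


# Made-up counts, just to show the ordering and the padding.
example_urls = build_district_urls({"AL": 2, "AK": 1})
for url in example_urls:
    print(url)
```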

In [99]:
def getDistrictData(Full_URL):
    url = Full_URL

    # Wrap every HTTP request in try/except.
    # response.raise_for_status() raises an HTTPError for 4xx and 5xx status codes.
    # For a successful request it returns without raising, and the else branch runs.

    try:
        response = requests.get(url, timeout=5)
        response.raise_for_status()
    except HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
    except Exception as err:
        print(f"Other error occurred: {err}")
    else:
        all_data = response.content.decode('utf8')
        district_df = pd.read_csv(io.StringIO(all_data))
        # Tag every row with its district ID (for example TN07), taken from the end of the URL
        district_df['DistIDCurr'] = Full_URL[-4:]
        return district_df

    # The request failed, so there is no DataFrame to return
    return None
    
# frameList collects every district; stateList collects the districts of the current state
frameList=[]
stateList=[]
for Full_URL_iteration in urls_st:
    stateDisticts_df = getDistrictData(Full_URL_iteration)
    if stateDisticts_df is not None:  # skip districts whose download failed
        frameList.append(stateDisticts_df)
        stateList.append(stateDisticts_df)
    # urls_st lists each state's districts in descending order, so 01 closes the state
    if Full_URL_iteration[-2:] == '01' and stateList:
        StateDistrictframe = pd.concat(stateList, ignore_index=True)
        # Make sure the ../data directory exists before running this cell
        StateDistrictframe.to_csv(f'../data/{Full_URL_iteration[-4:-2]}.csv', index=False)
        stateList.clear()
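
The per-state export above relies on district 01 arriving last for each state. A possible alternative that does not depend on the download order is to concatenate everything first and then split by the state prefix of the district ID. The sketch below uses a small hand-made frame and a temporary folder in place of `../data`, both of which are assumptions for the example.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the concatenated district data; only the columns needed here.
all_districts = pd.DataFrame(
    {
        "FirstLastP": ["Terri Sewell (D)", "Gary Palmer (R)", "Mark Green (R)"],
        "DistIDCurr": ["AL07", "AL06", "TN07"],
    }
)

out_dir = tempfile.mkdtemp()  # temporary folder instead of ../data for this sketch

# The first two characters of the district ID are the state abbreviation.
state_codes = all_districts["DistIDCurr"].str[:2]
for state, state_frame in all_districts.groupby(state_codes):
    state_frame.to_csv(os.path.join(out_dir, f"{state}.csv"), index=False)

written = sorted(os.listdir(out_dir))
print(written)
```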


#### 4. Finally, combine all of the data you've gathered together into a single DataFrame.

In [100]:
finalFrame = pd.concat(frameList, ignore_index=True)
finalFrame.head()

Unnamed: 0,cid,FirstLastP,Rcpts,Spent,PACs,Indivs,Cand,Other,EndCash,LgIndivs,...,Result,CRPICO,State,IncCID,Incumbent,primarydate,DistIDCurr,capeye,sort,SmLgIndivsNote
0,N00030622,Terri Sewell (D),2168165.01,1495957.14,1760802.74,407562.27,0.0,-200.0,2243480.25,379899.0,...,W,I,Alabama,,,2020-03-03 00:00:00 +0000,AL07,0,1,N
1,N00035691,Gary Palmer (R),907218.78,909082.2,397600.0,469170.0,0.0,40448.78,370687.86,468075.0,...,W,I,Alabama,,,2020-03-03 00:00:00 +0000,AL06,0,1,N
2,N00043975,Kaynen Pellegrino (I),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,C,Alabama,,,2020-03-03 00:00:00 +0000,AL06,0,2,N
3,N00030910,Mo Brooks (R),655364.8,210045.13,250020.0,417354.83,0.0,-12010.03,1137501.18,395551.0,...,W,I,Alabama,,,2020-03-03 00:00:00 +0000,AL05,0,1,N
4,N00003028,Robert B Aderholt (R),1255076.11,1323812.08,739300.0,541121.78,0.0,-25345.67,647004.39,479663.0,...,W,I,Alabama,,,2020-03-03 00:00:00 +0000,AL04,0,1,N
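
Because failed requests only print a message, it can be worth checking that the combined frame covers every expected district. The sketch below is one hedged way to do that: `final_like` and `expected` are small hand-made stand-ins for `finalFrame` and the scraped district counts, and the deliberately incomplete Alabama data is made up to show what a gap looks like.

```python
import pandas as pd

# Stand-in for finalFrame, keeping only the district ID column.
final_like = pd.DataFrame({"DistIDCurr": ["AL07", "AL06", "AL06", "AK01"]})
# Expected district counts; in the notebook these come from the merged table.
expected = pd.Series({"AL": 7, "AK": 1}, name="expected")

# Count the distinct districts found per state (several candidates share a district).
found = (
    final_like.drop_duplicates("DistIDCurr")["DistIDCurr"]
    .str[:2]
    .value_counts()
    .rename("found")
)

report = pd.concat([expected, found], axis=1).fillna(0).astype(int)
report["missing"] = report["expected"] - report["found"]
print(report)
```

Any state with a non-zero `missing` value has districts that were not downloaded and may need to be requested again.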
