<h1>Part 1: Data Gathering</h1>

1. Start by acquiring the data for Tennessee's 7th District, which is available at https://www.opensecrets.org/races/summary?cycle=2020&id=TN07&spec=N. Clicking the "Download .csv file" button on that page downloads a csv for this district. However, we don't want to click this button for every district. Instead, we'll use Python to automate the process. Start by sending a GET request to the download button's URL, https://www.opensecrets.org/races/summary.csv?cycle=2020&id=TN07. Convert the result to a DataFrame.

In [3]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
import io

In [4]:
URL = 'https://www.opensecrets.org/races/summary.csv?cycle=2020&id=TN07'
response = requests.get(URL)
response.raise_for_status()  # stop early if the download failed
# The response body is already plain CSV text, so it can be read directly
# without passing it through an HTML parser first.
csv_file = io.StringIO(response.text)
df = pd.read_csv(csv_file)
df

Unnamed: 0,cid,FirstLastP,Rcpts,Spent,PACs,Indivs,Cand,Other,EndCash,LgIndivs,...,Result,CRPICO,State,IncCID,Incumbent,primarydate,DistIDCurr,capeye,sort,SmLgIndivsNote
0,N00041873,Mark Green (R),1194960.47,935486.67,171900.0,819151.42,0.0,203909.05,287888.55,819151.42,...,W,I,Tennessee,,,2020-08-06 00:00:00 +0000,TN07,0,1,N
1,N00045536,Kiran Sreepada (D),206644.28,207190.98,4000.0,202644.28,0.0,0.0,0.0,179129.75,...,L,C,Tennessee,,,2020-08-06 00:00:00 +0000,,0,2,N
2,N00047077,Ronald Brown (I),1750.0,0.0,0.0,1750.0,0.0,0.0,9006.0,300.0,...,L,C,Tennessee,,,2020-08-06 00:00:00 +0000,,0,2,N
3,N00046592,Scott Vieira Jr (I),655.47,1048.51,10.0,45.0,35.0,565.47,-196.52,0.0,...,L,C,Tennessee,,,2020-08-06 00:00:00 +0000,,0,2,N
4,N00045535,Benjamin Estes (3),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,C,Tennessee,,,2020-08-06 00:00:00 +0000,,0,2,N


2. Once you have working code for Tennessee's 7th District, expand on your code to capture all of Tennessee's districts into a single DataFrame. Make sure that you can distinguish which district each result came from. Export the results to a csv file.

In [6]:
base_url = "https://www.opensecrets.org/races/summary?cycle=2020&id=TN{}&spec=N"

district_links = []

for district in range(1, 10):
    district_code = f"{district:02d}"
    url = base_url.format(district_code)
    district_links.append(url)


for link in district_links:
    print(link)

https://www.opensecrets.org/races/summary?cycle=2020&id=TN01&spec=N
https://www.opensecrets.org/races/summary?cycle=2020&id=TN02&spec=N
https://www.opensecrets.org/races/summary?cycle=2020&id=TN03&spec=N
https://www.opensecrets.org/races/summary?cycle=2020&id=TN04&spec=N
https://www.opensecrets.org/races/summary?cycle=2020&id=TN05&spec=N
https://www.opensecrets.org/races/summary?cycle=2020&id=TN06&spec=N
https://www.opensecrets.org/races/summary?cycle=2020&id=TN07&spec=N
https://www.opensecrets.org/races/summary?cycle=2020&id=TN08&spec=N
https://www.opensecrets.org/races/summary?cycle=2020&id=TN09&spec=N
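
The district URLs above all follow one pattern, so the query string can also be assembled from its parts instead of being hand-formatted. A minimal sketch using only the standard library; the helper name `district_csv_url` and its parameters are hypothetical and are not part of OpenSecrets or of the notebook itself:

```python
from urllib.parse import urlencode

BASE = "https://www.opensecrets.org/races/summary.csv"

def district_csv_url(state, district, cycle=2020):
    """Build the CSV download URL for one district (hypothetical helper)."""
    # District IDs are the state abbreviation plus a zero-padded number, e.g. TN07
    query = urlencode({"cycle": cycle, "id": f"{state}{district:02d}"})
    return f"{BASE}?{query}"

# Tennessee's nine districts
tn_urls = [district_csv_url("TN", d) for d in range(1, 10)]
print(tn_urls[6])
```

Keeping the state and district as separate arguments makes the same helper reusable once the loop is extended beyond Tennessee.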


In [7]:
base_url = "https://www.opensecrets.org/races/summary.csv?cycle=2020&id=TN{}"

district_data = []

# Tennessee has 9 districts, numbered 01 through 09
for district in range(1, 10):
    district_code = f"{district:02d}"
    url = base_url.format(district_code)

    response = requests.get(url)

    if response.status_code == 200:
        # The body is plain CSV text, so read it directly
        df = pd.read_csv(io.StringIO(response.text))

        # Record which district these rows came from
        df['District'] = district_code

        district_data.append(df)
    else:
        print(f"Failed to retrieve data for district {district_code} from {url}")

final_df = pd.concat(district_data, ignore_index=True)

# Export the combined results; the file name here is an arbitrary choice
final_df.to_csv('tn_districts_2020.csv', index=False)
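
Because a failed download is only printed, it may be worth confirming that every district actually made it into the combined frame. A minimal sketch on a small stand-in frame; the sample rows and the helper name `missing_districts` are illustrative assumptions, not real output:

```python
import pandas as pd

def missing_districts(df, n_districts, column="District"):
    """Return the zero-padded district codes that have no rows in df."""
    expected = {f"{d:02d}" for d in range(1, n_districts + 1)}
    # Normalise to two-character strings so 1, "1" and "01" all match
    present = set(df[column].astype(str).str.zfill(2))
    return sorted(expected - present)

# Stand-in for the combined frame: district 02 is deliberately absent
sample = pd.DataFrame({
    "FirstLastP": ["Candidate A", "Candidate B", "Candidate C"],
    "District": ["01", "01", "03"],
})
print(missing_districts(sample, 3))
```

For the Tennessee frame, the equivalent call would be `missing_districts(final_df, 9)`, and an empty list would indicate that all nine districts are present.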

3. Once you have working code for all of Tennessee's districts, expand on it to capture all states and districts. The number of districts for each state can be found at https://en.wikipedia.org/wiki/2020_United_States_House_of_Representatives_elections. You may also find this table of state abbreviations helpful: https://en.wikipedia.org/wiki/List_of_U.S._state_and_territory_abbreviations. Export a csv file for each state.

4. Finally, combine all of the data you've gathered together into a single DataFrame.

In [18]:
base_url = "https://www.opensecrets.org/races/summary.csv?cycle=2020&id={}"
states_and_districts = {
    "AL": 7, "AK": 1, "AZ": 9, "AR": 4, "CA": 53, "CO": 7, "CT": 5, "DE": 1,
    "FL": 27, "GA": 14, "HI": 2, "ID": 2, "IL": 18, "IN": 9, "IA": 4, "KS": 4,
    "KY": 6, "LA": 6, "ME": 2, "MD": 8, "MA": 9, "MI": 14, "MN": 8, "MS": 4,
    "MO": 8, "MT": 1, "NE": 3, "NV": 4, "NH": 2, "NJ": 12, "NM": 3, "NY": 27,
    "NC": 13, "ND": 1, "OH": 16, "OK": 5, "OR": 5, "PA": 18, "RI": 2, "SC": 7,
    "SD": 1, "TN": 9, "TX": 36, "UT": 4, "VT": 1, "VA": 11, "WA": 10, "WV": 3,
    "WI": 8, "WY": 1
}

all_links = []
for state, num_districts in states_and_districts.items():
    for district in range(1, num_districts + 1):
        url = base_url.format(f"{state}{district:02d}")
        all_links.append((url, state, district))

frames = []  # one DataFrame per district
error_links = []
for link, state, district in all_links:
    try:
        response = requests.get(link)

        if response.status_code == 200:
            # The body is plain CSV text, so read it directly
            df = pd.read_csv(io.StringIO(response.text))
            df['District'] = f"{district:02d}"
            df['State_Abv'] = state
            frames.append(df)

            print(f"Info for {link}.")
            print(df.head())

        else:
            print(f"Failure to complete for {link}. Status Code: {response.status_code}")
            error_links.append(link)

    except Exception as e:
        print(f"Error for {link}: {e}")
        error_links.append(link)

full_df = pd.concat(frames, ignore_index=True)
full_df.to_csv('full_df.csv', index=False)
# Reading the file back infers District as an integer, so "01" is shown as 1
pd.read_csv('full_df.csv')

Info for https://www.opensecrets.org/races/summary.csv?cycle=2020&id=AL01.
         cid          FirstLastP       Rcpts       Spent      PACs  \
0  N00044245      Jerry Carl (R)  1971321.50  1859348.91  387000.0   
1  N00044750  James Averhart (D)    80094.95    78973.24       0.0   

       Indivs      Cand      Other    EndCash   LgIndivs  ...    State IncCID  \
0  1044195.95  434655.5  105470.05  111972.59  999616.34  ...  Alabama    NaN   
1    50849.95   29245.0       0.00    1121.71   37954.77  ...  Alabama    NaN   

  Incumbent                primarydate DistIDCurr capeye sort  SmLgIndivsNote  \
0       NaN  2020-03-03 00:00:00 +0000                 0    2               N   
1       NaN  2020-03-03 00:00:00 +0000                 0    2               N   

   District State_Abv  
0        01        AL  
1        01        AL  

[2 rows x 26 columns]
Info for https://www.opensecrets.org/races/summary.csv?cycle=2020&id=AL02.
         cid               FirstLastP      Rcpts      Sp

Unnamed: 0,cid,FirstLastP,Rcpts,Spent,PACs,Indivs,Cand,Other,EndCash,LgIndivs,...,State,IncCID,Incumbent,primarydate,DistIDCurr,capeye,sort,SmLgIndivsNote,District,State_Abv
0,N00044245,Jerry Carl (R),1971321.50,1859348.91,387000.00,1044195.95,434655.50,105470.05,111972.59,999616.34,...,Alabama,,,2020-03-03 00:00:00 +0000,,0,2,N,1,AL
1,N00044750,James Averhart (D),80094.95,78973.24,0.00,50849.95,29245.00,0.00,1121.71,37954.77,...,Alabama,,,2020-03-03 00:00:00 +0000,,0,2,N,1,AL
2,N00041295,Barry Moore (R),650806.75,669367.70,230281.65,408536.20,11500.00,488.90,-13633.28,346328.65,...,Alabama,,,2020-03-03 00:00:00 +0000,,0,2,N,2,AL
3,N00045944,Phyllis Harvey-Hall (D),56049.68,55988.07,2032.00,42411.95,10575.41,1030.32,0.00,27105.15,...,Alabama,,,2020-03-03 00:00:00 +0000,,0,2,N,2,AL
4,N00045631,John Page (L),0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,...,Alabama,,,2020-03-03 00:00:00 +0000,,0,2,N,2,AL
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1259,N00035504,Liz Cheney (R),3003883.34,3060166.78,1292490.00,1169995.46,0.00,541397.88,153567.15,980348.72,...,Wyoming,,,2020-08-18 00:00:00 +0000,WY01,0,1,N,1,WY
1260,N00047272,Lynnette Grey Bull (D),134597.32,132234.75,2800.00,130197.32,0.00,1600.00,2362.57,65975.00,...,Wyoming,,,2020-08-18 00:00:00 +0000,,0,2,N,1,WY
1261,N00047207,Zoilo Adalia (3),0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,...,Wyoming,,,2020-08-18 00:00:00 +0000,,0,2,N,1,WY
1262,N00035139,Richard Brubaker (L),0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,...,Wyoming,,,2020-08-18 00:00:00 +0000,,0,2,N,1,WY
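
Step 3 also asks for one csv file per state, and the table above shows District losing its leading zero once the combined file is read back. Both could be handled along the following lines. This is a sketch on stand-in rows, not real OpenSecrets data; the helper name `export_by_state` and the `<state>_2020.csv` file naming are assumptions made for illustration:

```python
import tempfile
from pathlib import Path

import pandas as pd

def export_by_state(df, out_dir, state_col="State_Abv"):
    """Write one CSV per state and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for state, group in df.groupby(state_col):
        path = out_dir / f"{state}_2020.csv"  # file naming is an arbitrary choice
        group.to_csv(path, index=False)
        paths.append(path)
    return paths

# Stand-in rows, not real OpenSecrets data
sample = pd.DataFrame({
    "FirstLastP": ["Candidate A", "Candidate B", "Candidate C"],
    "District": ["01", "02", "01"],
    "State_Abv": ["AL", "AL", "WY"],
})

with tempfile.TemporaryDirectory() as tmp:
    written = export_by_state(sample, tmp)
    # dtype=str keeps "01" from being read back as the integer 1
    al = pd.read_csv(Path(tmp) / "AL_2020.csv", dtype={"District": str})

print([p.name for p in written])
print(al["District"].tolist())
```

With the real data, the call would be `export_by_state(full_df, "some_output_folder")`, where the folder name is up to you.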
