# An Analysis of Political Contributions During the 2020 House of Representatives Election, Part 2

Now, you'll take the data that you gathered and analyze it, presenting your results at the end.


In [2]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
from io import StringIO
from IPython.core.display import HTML
import tqdm


### Part 2: Exploratory Data Analysis
Using your scraped data, investigate different relationships between candidates and the amount of money they raised. Here are some suggestions to get you started, but feel free to pose your own questions or do additional exploration:

 
    a. How often does the candidate who raised more money win a race? 

In [83]:
# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
all_df = pd.read_csv('../open-secrets-bayesian-butterfingers/Data/All_2020_Election_Data.csv')
all_df

Unnamed: 0,State_Abbreviation,District,cid,FirstLast,Party,Rcpts,Spent,PACs,Indivs,Cand,...,Result,CRPICO,State,IncCID,Incumbent,primarydate,DistIDCurr,capeye,sort,SmLgIndivsNote
0,AL,1,N00044245,Jerry Carl,R,1971321.50,1859348.91,387000.00,1044195.95,434655.50,...,W,O,Alabama,,,2020-03-03 00:00:00 +0000,,0,2,N
1,AL,1,N00044750,James Averhart,D,80094.95,78973.24,0.00,50849.95,29245.00,...,L,O,Alabama,,,2020-03-03 00:00:00 +0000,,0,2,N
2,AL,2,N00041295,Barry Moore,R,650806.75,669367.70,230281.65,408536.20,11500.00,...,W,O,Alabama,,,2020-03-03 00:00:00 +0000,,0,2,N
3,AL,2,N00045944,Phyllis Harvey-Hall,D,56049.68,55988.07,2032.00,42411.95,10575.41,...,L,O,Alabama,,,2020-03-03 00:00:00 +0000,,0,2,N
4,AL,2,N00045631,John Page,L,0.00,0.00,0.00,0.00,0.00,...,,O,Alabama,,,2020-03-03 00:00:00 +0000,,0,2,N
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1259,WY,1,N00035504,Liz Cheney,R,3003883.34,3060166.78,1292490.00,1169995.46,0.00,...,W,I,Wyoming,,,2020-08-18 00:00:00 +0000,WY01,0,1,N
1260,WY,1,N00047272,Lynnette Grey Bull,D,134597.32,132234.75,2800.00,130197.32,0.00,...,L,C,Wyoming,,,2020-08-18 00:00:00 +0000,,0,2,N
1261,WY,1,N00047207,Zoilo Adalia,3,0.00,0.00,0.00,0.00,0.00,...,,C,Wyoming,,,2020-08-18 00:00:00 +0000,,0,2,N
1262,WY,1,N00035139,Richard Brubaker,L,0.00,0.00,0.00,0.00,0.00,...,,C,Wyoming,,,2020-08-18 00:00:00 +0000,,0,2,N


In [78]:
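# Map results to booleans so that wins can be summed
win_lose_dict = {'W': True, 'L': False}
# Row of the top fundraiser (largest Rcpts) in each (State, District) race
highest_raisers = all_df.loc[all_df.groupby(['State', 'District'])['Rcpts'].idxmax()]
wins = highest_raisers['Result'].map(win_lose_dict).sum()
# highest_raisers has one row per race, so its length is the number of races
win_pct = wins / len(highest_raisers) * 100
print(f'The candidate who raised the most money won {win_pct:.2f}% of the time')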

The candidate who raised the most money won 88.28% of the time


 
    b. How often does the candidate who spent more money win a race? 

In [47]:
all_df = pd.read_csv('../open-secrets-bayesian-butterfingers/Data/All_2020_Election_Data.csv')

In [79]:
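# Row of the top spender (largest Spent) in each (State, District) race
highest_spenders = all_df.loc[all_df.groupby(['State', 'District'])['Spent'].idxmax()]
wins = highest_spenders['Result'].map(win_lose_dict).sum()
# highest_spenders has one row per race, so its length is the number of races
win_pct = wins / len(highest_spenders) * 100
print(f'The candidate who spent the most money won {win_pct:.2f}% of the time')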

The candidate who spent the most money won 87.59% of the time


 
    c. Does the difference between either money raised or money spent seem to influence the likelihood of a candidate winning a race? 

In [82]:
# Row of the candidate with the highest EndCash value in each (State, District) race
highest_diffs = all_df.loc[all_df.groupby(['State', 'District'])['EndCash'].idxmax()]
wins = highest_diffs['Result'].map(win_lose_dict).sum()
win_pct = wins / len(highest_diffs) * 100
print(f'The candidate who had the highest EndCash won {win_pct:.2f}% of the time')

The candidate who had the highest EndCash won 94.25% of the time
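
The cell above ranks candidates by `EndCash` rather than by a funding gap. One possible way to look at the gap itself, sketched here on a small made-up table, is to compute each candidate's share of the receipts in their race and compare win rates across share bands. Only `State`, `District`, `Rcpts`, and `Result` come from the data above; the sample values, the `rcpts_share` and `won` column names, and the bin edges are illustrative assumptions.

```python
import pandas as pd

# Made-up three-race sample reusing column names from all_df (values are illustrative)
sample = pd.DataFrame({
    "State": ["AL", "AL", "WY", "WY", "TX", "TX"],
    "District": [1, 1, 1, 1, 2, 2],
    "Rcpts": [900.0, 100.0, 600.0, 400.0, 450.0, 550.0],
    "Result": ["W", "L", "W", "L", "W", "L"],
})

# Total receipts in each race, broadcast back to every candidate row
race_total = sample.groupby(["State", "District"])["Rcpts"].transform("sum")
sample["rcpts_share"] = sample["Rcpts"] / race_total
sample["won"] = sample["Result"] == "W"

# Win rate within each band of receipts share (bin edges are an arbitrary choice)
bands = pd.cut(sample["rcpts_share"], bins=[0, 0.25, 0.5, 0.75, 1.0])
win_rate_by_band = sample.groupby(bands, observed=True)["won"].mean()
print(win_rate_by_band)
```

On the real data, candidates with zero receipts would fall outside the first bin (passing `include_lowest=True` to `pd.cut` keeps them), and races whose total receipts are zero would produce `NaN` shares. The same pattern applies to `Spent`.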


 
    d. How often does the incumbent candidate win a race? 
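
One way to approach this question is sketched below on a tiny made-up table. The `Incumbent` column is empty in the rows displayed earlier, so the sketch assumes that `CRPICO` encodes candidate status as `I` (incumbent), `C` (challenger), and `O` (open seat); check that reading against the data source before relying on it. The helper name `incumbent_win_rate` is hypothetical.

```python
import pandas as pd

# Made-up sample reusing column names from all_df (values are illustrative)
sample = pd.DataFrame({
    "FirstLast": ["A", "B", "C", "D", "E", "F"],
    "CRPICO": ["I", "C", "I", "C", "O", "O"],
    "Result": ["W", "L", "L", "W", "W", "L"],
})

def incumbent_win_rate(df):
    """Percentage of incumbents (assumed CRPICO == 'I') whose Result is 'W'."""
    incumbents = df[df["CRPICO"] == "I"]
    # Ignore incumbents with no recorded result
    decided = incumbents.dropna(subset=["Result"])
    return (decided["Result"] == "W").mean() * 100

print(f"Incumbents won {incumbent_win_rate(sample):.2f}% of the time")
```

Calling `incumbent_win_rate(all_df)` would apply the same calculation to the full dataset.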

 
    e. Can you detect any relationship between amount of money raised and the incumbent status of a candidate?
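
A simple starting point, sketched on made-up numbers, is to summarize `Rcpts` separately for each candidate status. As before, treating `CRPICO` values `I`, `C`, and `O` as incumbent, challenger, and open seat is an assumption, and the dollar amounts are illustrative only.

```python
import pandas as pd

# Made-up sample reusing column names from all_df (values are illustrative)
sample = pd.DataFrame({
    "CRPICO": ["I", "I", "C", "C", "O", "O"],
    "Rcpts": [3_000_000.0, 2_000_000.0, 100_000.0, 300_000.0, 1_900_000.0, 100_000.0],
})

# Count, median, and mean receipts for each candidate status
summary = sample.groupby("CRPICO")["Rcpts"].agg(["count", "median", "mean"])
print(summary)
```

The median is worth reporting alongside the mean because fundraising totals tend to be heavily skewed by a few very large campaigns.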



### Part 3: Statistical Modeling
Fit a logistic regression model to see if the amount spent has a statistically significant impact on the probability of winning an election.  
Feel free to brainstorm ways to set up your model, but a suggestion to get started would be to calculate, for each candidate, the percentage of the total amount spent in their race that was spent by them and use this as your predictor variable of interest. Hint: you may find the `transform` method (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.transform.html) in combination with `groupby` useful to find the total spending by race.  
Don't forget to include the incumbent variable in your model.  
After fitting your model, interpret the meaning of the coefficients you get.  




## Deliverable

Prepare a 10-12 minute presentation of your findings. This presentation should focus on the exploratory analysis and statistical modeling portions of this project and not on the web scraping components. Thus, you should not include any code in your presentation. Your presentation should be done using PowerPoint/Google Slides or other presentation software.