Analyze This! Campaign Finance Analysis

Overview

The purpose of this analysis is to build a functional relational database from bulk election data, train a machine learning model that can accurately predict election results, create publicly accessible dashboards, and explore how funding is distributed across parties, candidates, and the United States. We will then present our insights to a group of peers in a 12-minute presentation.

Outline

Communication Protocols

Team "Analyze This!" has discussed our communication protocols and agreed to the following:

We will use Slack as our primary communication tool, in a dedicated team channel.
We will meet over Zoom at least twice a week to discuss project progress.
We will post in our Slack channel to discuss any new pull requests and to request team approval.
We will share in Slack any difficult life circumstances that may prevent us from completing a task, so that the team can step in to help.

Machine Learning Model

US Federal Campaign Finance data 1990-2016 https://www.kaggle.com/datasets/jeegarmaru/campaign-contributions-19902016

• Overview - Machine learning will be applied to this data set to address our main problem statement: predict the winner of a political race from selected features in the data. We will test and compare several models to see which performs best in terms of accuracy and fit.

MODELS:

- Supervised:
	•Random Forest Classifier
	•Logistic Regression
	•Neural Network (likely too complex for this situation).

- Un-Supervised:
	•KMeans Clustering
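To compare the supervised candidates on equal footing, as the overview describes, one option is k-fold cross-validation. The sketch below is a minimal illustration, not the team's code: it uses scikit-learn's make_classification as a synthetic stand-in for the real features, and the fold count and model settings are assumptions.

```python
# Minimal sketch: compare candidate classifiers with 5-fold cross-validation.
# Synthetic data stands in for the real campaign-finance features (assumption).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix X and binary target y (1 = win, 0 = loss).
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # Logistic regression is scale-sensitive, so it is wrapped with a scaler.
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 folds is a fairer comparison than a single split.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Averaging over folds reduces the chance that one lucky train/test split decides which model "wins".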

PREPROCESSING: Clean and encode categorical variables, bucketing rare values where necessary. Scale/standardize numeric features, and apply PCA before clustering. Join the candidates table to the pacs and individual_contributions tables so that features from multiple tables can be included. We may also restrict the data to the most recent cycles, 2000-2016, both because inflation erodes the value of older dollar amounts and because recent data is the most relevant.
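Two of these steps, restricting to 2000-2016 and bucketing rare categories, can be sketched with pandas. This is an illustration under assumptions: the `cycle` column name, the toy values, and the `min_count` threshold are hypothetical, not taken from the project's notebooks.

```python
import pandas as pd


def bucket_rare(series: pd.Series, min_count: int, other: str = "Other") -> pd.Series:
    """Replace categories seen fewer than min_count times with a single label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare; replace the rest with `other`.
    return series.where(~series.isin(rare), other)


# Toy candidates table; the `cycle` column name and all values are assumptions.
candidates = pd.DataFrame({
    "cycle": [1998, 2000, 2002, 2004, 2008, 2012, 2016],
    "party": ["D", "R", "D", "D", "R", "L", "3"],
    "raised_total": [1.2e5, 3.4e5, 2.0e5, 2.5e5, 5.1e5, 1.0e4, 5.0e3],
})

# Keep only the 2000-2016 cycles, as the preprocessing plan suggests.
recent = candidates[candidates["cycle"].between(2000, 2016)].copy()
recent["party"] = bucket_rare(recent["party"], min_count=2)
print(recent)
```

Here "L" and "3" each appear once after filtering, so both collapse into "Other", which keeps the one-hot encoding from producing near-empty columns.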

FEATURES: c.party • c.dist_id_run_for • c.CRPICO • c.NOPACS • c.raised_from_pacs • c.raised_from_individuals • c.raised_total • c.raised_unitemized • p.pacid • p.Amount • p.type

TARGET: c.result (1 = W, 0 = L)

RESULT: We hope our supervised models will accurately predict election results. Our unsupervised approach may help us group candidates by fiscal activity and better understand the politics and power dynamics at play.

Database

In this project we will use a Kaggle data set containing campaign finance data from 1990 onward. The data was originally sourced from OpenSecrets, a reputable nonpartisan, independent, nonprofit organization that has operated since 1996. Through Kaggle we have access to several data sets, including candidates, backer information, committee information, PAC information, and more, which we will tie together with SQL via common keys. We will do this with inner joins, creating new tables that hold the information needed to analyze election results in terms of financial support.

We cleaned this data by removing duplicate rows and nulls, converting dates to datetime format, and selecting only the relevant columns.
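The cleaning steps map directly onto pandas methods. The sketch below is a minimal illustration: the table and its column names (`cand_id`, `amount`, `date`, `memo`) are hypothetical, and the real column selection depends on each source table.

```python
import pandas as pd

# Toy contributions table; column names and values are illustrative assumptions.
raw = pd.DataFrame({
    "cand_id": ["N001", "N001", "N002", None],
    "amount": [500, 500, 1000, 250],
    "date": ["2016-03-01", "2016-03-01", "2016-04-15", "2016-05-01"],
    "memo": ["a", "a", "b", "c"],
})

clean = (
    raw.drop_duplicates()  # remove exact duplicate rows
    .dropna()  # drop rows containing any null value
    .assign(date=lambda df: pd.to_datetime(df["date"]))  # parse date strings
    .filter(items=["cand_id", "amount", "date"])  # keep only relevant columns
)
print(clean.dtypes)
```

Dropping duplicates before nulls is a design choice here; the order does not change the result for exact duplicates, but it keeps the intent of each step readable.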

We used PostgreSQL to create the connections throughout our data and assemble a cohesive, clear picture. When connecting the tables, we noticed that the candidate ID alone was not a unique key: many candidates ran in more than one election cycle and received donations from more than one source, which created a many-to-many relationship. To create a one-to-many relationship and attribute each donation to the correct election, we created a new key combining the candidate ID and the year the candidate ran. This ensured that contributions were matched to the correct year and that the data was portrayed accurately.
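The composite-key idea can be demonstrated end to end with Python's built-in sqlite3 module. SQLite is only a stand-in for the project's PostgreSQL database here, and the schema, the `cand_cycle_id` column name, and the `ID_YEAR` key format are hypothetical choices for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical minimal schema: one candidates row per candidate per cycle.
cur.executescript("""
CREATE TABLE candidates (
    cand_id TEXT, cycle INTEGER, result TEXT,
    cand_cycle_id TEXT PRIMARY KEY  -- composite key: candidate ID + year
);
CREATE TABLE contributions (
    contrib_id INTEGER PRIMARY KEY,
    cand_id TEXT, cycle INTEGER, amount REAL,
    cand_cycle_id TEXT REFERENCES candidates(cand_cycle_id)
);
""")

cur.executemany(
    "INSERT INTO candidates (cand_id, cycle, result, cand_cycle_id) VALUES (?, ?, ?, ?)",
    [(c, y, r, f"{c}_{y}") for c, y, r in
     [("N001", 2014, "W"), ("N001", 2016, "L"), ("N002", 2016, "W")]],
)
cur.executemany(
    "INSERT INTO contributions (cand_id, cycle, amount, cand_cycle_id) VALUES (?, ?, ?, ?)",
    [(c, y, a, f"{c}_{y}") for c, y, a in
     [("N001", 2014, 500.0), ("N001", 2016, 250.0),
      ("N001", 2016, 100.0), ("N002", 2016, 1000.0)]],
)

# Joining on cand_id alone would mix N001's 2014 and 2016 money;
# the composite key attributes each contribution to the right cycle.
rows = cur.execute("""
    SELECT c.cand_cycle_id, c.result, SUM(k.amount) AS raised
    FROM candidates AS c
    INNER JOIN contributions AS k ON k.cand_cycle_id = c.cand_cycle_id
    GROUP BY c.cand_cycle_id, c.result
    ORDER BY c.cand_cycle_id
""").fetchall()
print(rows)
conn.close()
```

In PostgreSQL the same effect could also be achieved with a two-column primary key on (cand_id, cycle) instead of a concatenated string; the concatenated form simply mirrors the single new key described above.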

SQL Database

Entity Relationship Diagram

PgAdmin4 - Postgres Table

Machine Learning Model - Random Forest Classifier

Feature Selection

Encode categorical variables with OneHotEncoder()

Split data to train/test groups

Standardize Data with StandardScaler()

Train/Test - analyze accuracy score
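The steps above can be sketched as a single scikit-learn pipeline. This is a hedged illustration on random synthetic data, not the project's notebook: the lowercase column names, the I/C/O coding assumed for CRPICO (incumbent / challenger / open seat), and all hyperparameters are assumptions. Unlike the listed order, the split happens first so the encoder and scaler are fit only on training rows, which avoids leaking test information.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 400
# Synthetic stand-in for the joined candidate table (values are random).
df = pd.DataFrame({
    "party": rng.choice(["D", "R", "3"], size=n),
    "crpico": rng.choice(["I", "C", "O"], size=n),  # assumed: incumbent/challenger/open
    "raised_total": rng.lognormal(12, 1, size=n),
})
# Made-up target loosely tied to incumbency, for illustration only.
df["result"] = ((df["crpico"] == "I") | (rng.random(n) < 0.2)).astype(int)

X, y = df.drop(columns="result"), df["result"]
# Split first so preprocessing is fit on training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["party", "crpico"]),
    ("num", StandardScaler(), ["raised_total"]),
])
model = Pipeline([
    ("prep", preprocess),
    ("rfc", RandomForestClassifier(n_estimators=200, random_state=1)),
])
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.3f}")
```

Scaling is not strictly required for a random forest, but keeping it in the pipeline matches the listed steps and lets the same preprocessing feed other models.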

Predict 2022 Election

Raw Data
Data downloaded from the Federal Election Commission.

Columns were selected, formatted, cleaned, and combined to derive columns matching the structure of the training data; the result was then preprocessed and fed to the RFC model.
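A minimal sketch of this alignment step, assuming hypothetical source column names and a hypothetical training column list (the real ones live in the project's notebooks). Here raised_total is approximated as PAC plus individual money purely for illustration; the training data's total may include other sources.

```python
import pandas as pd

# Hypothetical training-time column order (assumption).
TRAIN_COLUMNS = ["party", "crpico", "raised_from_pacs", "raised_from_individuals", "raised_total"]

# Toy 2022 rows; all column names and values here are hypothetical.
raw_2022 = pd.DataFrame({
    "candidate_name": ["SMITH, A", "JONES, B"],
    "party_affiliation": ["DEM", "REP"],
    "incumbent_challenger": ["I", "C"],
    "pac_contributions": [50000.0, 12000.0],
    "individual_contributions": [150000.0, 30000.0],
})

renamed = raw_2022.rename(columns={
    "party_affiliation": "party",
    "incumbent_challenger": "crpico",
    "pac_contributions": "raised_from_pacs",
    "individual_contributions": "raised_from_individuals",
})
# Map party codes onto the training labels; anything else becomes third party.
renamed["party"] = renamed["party"].map({"DEM": "D", "REP": "R"}).fillna("3")
# Simplified total (assumption): PAC plus individual contributions.
renamed["raised_total"] = renamed["raised_from_pacs"] + renamed["raised_from_individuals"]

# reindex drops extra columns and fixes the order the fitted model expects.
aligned = renamed.reindex(columns=TRAIN_COLUMNS)
print(aligned)
```

If the model was fit inside a pipeline with named columns, matching names and dtypes matters more than order, but reindexing makes both explicit.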

Prediction Results - first 5 rows

Number of Wins vs. Losses

Seat Requirement Differentials
- difference in expected seats and predicted winning seats

Adjusted to fulfill State seat requirements
68 candidates were adjusted, 40 from L to W (those with the highest raised_total) and 28 from W to L (those with the lowest raised_total), in an attempt to satisfy the seat requirements. The result is an improvement overall but not exact. The full list of adjusted politicians is in ML_modeling_v1.ipynb.
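The adjustment rule can be sketched per state: compare required seats with predicted winners, promote the best-funded predicted losers when there are too few winners, and demote the least-funded predicted winners when there are too many. This is a simplified variant under assumptions (hypothetical `state`, `pred`, and per-state `seats` inputs); it forces each state's count to match, so it will not necessarily reproduce the 68 adjustments made in the notebook.

```python
import pandas as pd


def adjust_to_seats(df: pd.DataFrame, seats: dict[str, int]) -> pd.DataFrame:
    """Flip predictions so each state's predicted wins equal its required seats."""
    out = df.copy()
    for state, required in seats.items():
        in_state = out["state"] == state
        wins = int(out.loc[in_state, "pred"].sum())
        if wins < required:
            # Too few winners: promote the best-funded predicted losers.
            losers = out[in_state & (out["pred"] == 0)]
            flip = losers.nlargest(required - wins, "raised_total").index
            out.loc[flip, "pred"] = 1
        elif wins > required:
            # Too many winners: demote the least-funded predicted winners.
            winners = out[in_state & (out["pred"] == 1)]
            flip = winners.nsmallest(wins - required, "raised_total").index
            out.loc[flip, "pred"] = 0
    return out


# Toy predictions (values are made up); differential = required - predicted wins.
preds = pd.DataFrame({
    "state": ["VT", "VT", "WY", "WY", "WY"],
    "raised_total": [100, 300, 50, 400, 200],
    "pred": [0, 0, 1, 1, 1],
})
adjusted = adjust_to_seats(preds, {"VT": 1, "WY": 1})
print(adjusted)
```

Before adjustment the differentials are +1 for VT and -2 for WY; afterwards both are 0.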

Seat Requirement Differentials - after adjustment

KMeans Clustering

Feature Selection - added State

Dimensionality Reduction - Principal Component Analysis (PCA)

Elbow Curve, n=5

Clustering 3D Graph
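The clustering workflow (scale, reduce with PCA, pick k from an elbow curve, then cluster) can be sketched with scikit-learn. Random data stands in for the encoded candidate features, and the three PCA components (to match a 3D plot) and the k range are assumptions; "n=5" is read here as the chosen number of clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Synthetic stand-in for encoded candidate features (random, for illustration).
X = rng.normal(size=(300, 10))

X_scaled = StandardScaler().fit_transform(X)
# Three components so the clusters can be shown in a 3D scatter plot.
X_pca = PCA(n_components=3, random_state=7).fit_transform(X_scaled)

# Elbow curve: inertia (within-cluster sum of squares) for k = 1..10.
inertia = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=7).fit(X_pca)
    inertia.append(km.inertia_)

# Fit the final model with the k read off the elbow (assumed to be 5).
labels = KMeans(n_clusters=5, n_init=10, random_state=7).fit_predict(X_pca)
print([round(v, 1) for v in inertia])
```

Plotting `inertia` against k and looking for the bend where gains flatten out gives the elbow; the cluster labels can then color the 3D scatter of `X_pca`.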

Web Page - Link

Page 1 - Prediction Results + 2022 Candidate Dashboard

Page 2 - Past Data Dashboard

Conclusion

Our initial hypothesis was that funding sources would influence election outcomes. We found that they do; however, incumbency status was perhaps the strongest indicator of a successful campaign. Overall, this was an insightful analysis of election data. With a 93% accuracy score, we can be fairly confident in the predictions. Our dashboards show the distribution of funding across the United States, each party's presence in the races and its resources, how capital is distributed over time, and more. We invite you to explore our website, peer into the crystal ball, and return after the votes are counted to compare our predictions with reality. Will your candidate win? Only time will tell...

Contributors

Lora Leonida
Neekoh Tablate
Ryan Knauff
Cayli Swartz
Marshall Miley

Repository: loraleonida/Campaign-Finance-Analysis