This project is a part of the Data Science Working Group at Code for San Francisco in collaboration with CSUMB Computer Science students who are completing their capstone project. Other DSWG projects can be found at the main GitHub repo.
-- Project Status: [Kicked Off!]
Campaign finance in the U.S. is the key to the system of corruption that has now wrecked our government. Members of Congress and candidates for Congress spend between 30% and 70% of their time raising money to get themselves (re)elected. But who actually contributes to these campaigns, and how many contributors are there?
It turns out that only a tiny fraction of the 1% are actually "relevant funders" of congressional campaigns. In other words, 150,000 Americans wield enormous power over this government. Furthermore, our government is supposed to represent the public, but with so few people making meaningful financial contributions, how do we know that our elected officials are not answering to the special demands these "funders" make?
This challenge and the problems we face are described beautifully in Lawrence Lessig's TED Talk, in which he presents campaign finance in America as the number one issue blocking progress on every other issue.
The goals of this project are to use data and technology to (1) provide more transparency of campaign finance at the local, state, or even federal level and (2) investigate how campaign finance contributions affect elected officials' behavior. Our current problem statements can be found here.
Challenge.gov is currently sponsoring a Congressional Data Competition. The Challenge framing is quite broad: the goal is to create an application, website visualization, or other digital creation that helps analyze Congressional data. As an optional component of this project, we can submit a deliverable to this competition (there is a $5,000 prize)!
- In partnership with CSUMB Computer Science students completing their capstone project, we will be analyzing Congressional data with a focus on campaign finance data.
- CSUMB CST 499 Capstone Project: https://sites.google.com/a/csumb.edu/cst-499-computer-science-capstone-course/mentors-partners
- Partner contact: Erik Eldridge, [@erikeldridge]
- Inferential Statistics
- Machine Learning
- Data Visualization
- Predictive Modeling
- Pandas, Jupyter
- Mode Analytics
This project broadly decomposes into client/server and data eng/sci tasks:
We recently extracted the client/server code into its own repo (for cloning efficiency on Travis).
The data eng/sci code is housed in this repo.
Needs of this project
- Project Leads (from Code for San Francisco): We need project leads who are willing to be a point of contact for the CSUMB students and an engaged partner in scoping out the problem. We are also considering a "Support Rotation" (see the proposed schedule below), in which a team of project leads from C4SF rotates each week as the mentor.
Other Roles Include:
- frontend developers
- data exploration/descriptive statistics
- data processing/cleaning
- statistical modeling
Please go to the Onboarding docs to start contributing to this project!
Contributing DSWG Members
Team Leads (Contacts) : [Full Name](https://github.com/[github handle])(@slackHandle)
Code for San Francisco Support Rotation
|Week|Name|Slack Handle|
|---|---|---|
|01/03/2018 - 01/09/2018|Vincent La|@vincela14|
|01/10/2018 - 01/16/2018|Erik Eldridge|@erikeldridge|
|01/17/2018 - 01/23/2018|Vincent La|@vincela14|
|01/24/2018 - 01/30/2018|Vincent La|@vincela14|
|01/31/2018 - 02/06/2018|Vincent La|@vincela14|
|02/07/2018 - 02/13/2018|Vincent La|@vincela14|
|FUTURE DATE|YOUR NAME|YOUR SLACK HANDLE|
|[Full Name](https://github.com/[github handle])||@johnDoe|
|[Full Name](https://github.com/[github handle])||@janeDoe|
- If you haven't joined the SF Brigade Slack, you can do that here.
- Our slack channel is
- We'll use a Trello dashboard to organize work
- Feel free to contact team leads with any questions or if you are interested in contributing!
Note that while the main focus of this project is campaign finance, there are undoubtedly other very interesting questions that congressional data can help answer. Some additional ideas include:
- Voting patterns - How has your Congressional representative voted over time? Do any factors correlate with a yes vote? Can we predict how she’ll vote on the next bill? How confident are we in the prediction? Can we establish a voting preference profile, e.g. trained on voter recommendations, and generate an alert when a prediction conflicts with our preference?
- Visualizing Gerrymandering - Can we show evidence of racial gerrymandering, or of other illegal or unethical gerrymandering, using socio-demographic splits?
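To make the "Voting patterns" idea a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the vote records are made up, the `predict` and `conflicts` helpers are hypothetical names, and the "model" is only a majority-vote baseline per topic, not a method this project has adopted. It is meant as a starting point that a real model (and real congressional data) could replace.

```python
from collections import defaultdict

# Hypothetical vote records for one representative: (year, topic, vote).
# These rows are invented for illustration; real data would come from a
# congressional voting dataset.
VOTES = [
    (2015, "energy", "yes"),
    (2015, "health", "no"),
    (2016, "energy", "yes"),
    (2016, "health", "no"),
    (2017, "energy", "no"),
    (2017, "health", "no"),
    (2017, "energy", "yes"),
]


def yes_rate_by_topic(votes):
    """Return {topic: (share of yes votes, number of votes)}."""
    counts = defaultdict(lambda: [0, 0])  # topic -> [yes votes, total votes]
    for _year, topic, vote in votes:
        counts[topic][0] += vote == "yes"
        counts[topic][1] += 1
    return {t: (yes / total, total) for t, (yes, total) in counts.items()}


def predict(votes, topic):
    """Naive baseline: predict the majority past vote on this topic.

    The confidence is simply the share of past votes on the topic that
    match the prediction. Unknown topics return (None, 0.0).
    """
    rates = yes_rate_by_topic(votes)
    if topic not in rates:
        return None, 0.0
    yes_share, _n = rates[topic]
    if yes_share >= 0.5:
        return "yes", yes_share
    return "no", 1.0 - yes_share


def conflicts(votes, preferences, min_confidence=0.6):
    """List (topic, predicted vote, confidence) where the prediction
    disagrees with our stated preference and is confident enough."""
    alerts = []
    for topic, wanted in preferences.items():
        predicted, confidence = predict(votes, topic)
        if predicted is None or predicted == wanted:
            continue
        if confidence >= min_confidence:
            alerts.append((topic, predicted, confidence))
    return alerts


# Hypothetical voter preference profile: the vote we would like per topic.
PREFERENCES = {"energy": "yes", "health": "yes"}

print(predict(VOTES, "energy"))
print(conflicts(VOTES, PREFERENCES))
```

With so few votes per topic, the confidence number here is not a calibrated probability. A fuller version would likely use bill-level features and a proper classifier, for example in a Pandas/Jupyter workflow.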