CSUMB, in collaboration with the C4SF Data Science Working Group, works to solve problems with Congressional data


datasci-congressional-data


This project is a part of the Data Science Working Group at Code for San Francisco in collaboration with CSUMB Computer Science students who are completing their capstone project. Other DSWG projects can be found at the main GitHub repo.

Project Status: Kicked Off!

Project Intro/Objective

Campaign finance in the U.S. is the key to the system of corruption that has now wrecked our government. Members of Congress and congressional candidates spend anywhere from 30% to 70% of their time raising money to get themselves (re)elected. But who actually contributes to these campaigns, and how many people are they?

It turns out that only a tiny fraction of the 1% are actually "relevant funders" of congressional campaigns. In other words, 150,000 Americans wield enormous power over this government. Our government is supposed to represent the public, but with so few people making meaningful financial contributions, how do we know whether our elected officials are answering to the special demands these "funders" make?

This challenge and the problems we face are described beautifully in Lawrence Lessig's TED Talk, in which he argues that campaign finance in America is the number one issue blocking progress on every other issue.

The goals of this project are to use data and technology to (1) provide more transparency into campaign finance at the local, state, or even federal level and (2) investigate how campaign contributions affect elected officials' behavior. Our current problem statements can be found here.
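As a first step toward goal (1), one might measure how concentrated giving is: sum contributions per donor, then ask what share of all money comes from the largest donors. The sketch below assumes a hypothetical list of `(donor_id, amount)` pairs rather than the real FEC file layout, and the `threshold` in `count_donors_above` is an illustrative parameter, not Lessig's definition of a "relevant funder". The function names are hypothetical.

```python
from collections import defaultdict


def donor_totals(contributions):
    """Sum contribution amounts per donor.

    `contributions` is an iterable of (donor_id, amount) pairs. This record
    shape is a hypothetical simplification, not the real FEC file layout.
    """
    totals = defaultdict(float)
    for donor_id, amount in contributions:
        totals[donor_id] += amount
    return dict(totals)


def top_donor_share(contributions, top_n):
    """Fraction of all money given by the `top_n` largest donors."""
    totals = donor_totals(contributions)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0
    largest = sorted(totals.values(), reverse=True)[:top_n]
    return sum(largest) / grand_total


def count_donors_above(contributions, threshold):
    """Number of donors whose combined giving exceeds `threshold` dollars.

    The threshold is a hypothetical parameter chosen by the caller.
    """
    return sum(1 for total in donor_totals(contributions).values() if total > threshold)


# Toy data invented for illustration only.
sample = [("a", 2700), ("a", 2700), ("b", 100), ("c", 50), ("d", 50), ("b", 100)]
print(top_donor_share(sample, top_n=1))  # 5400 / 5700, about 0.947
print(count_donors_above(sample, 100))   # donors "a" and "b" -> 2
```

With real data the same two numbers (top-donor share and donor count above a cutoff) could feed the dashboards directly; in practice the per-donor aggregation would more likely run as a `GROUP BY` in PostgreSQL or a Pandas `groupby`.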

Challenge.gov is currently sponsoring a Congressional Data Competition. The challenge framing is quite broad: the goal is to create an application, website, visualization, or other digital creation that helps analyze Congressional data. As an optional component of this project, we could prepare a deliverable to submit to this competition (there is a $5,000 prize)!


Methods Used

  • Inferential Statistics
  • Machine Learning
  • Data Visualization
  • Predictive Modeling
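Any predictive model of election outcomes from campaign finance data needs a naive baseline to beat. A minimal sketch, assuming a hypothetical per-race format (a dict mapping candidate name to total dollars raised): predict that the top fundraiser wins, then measure how often that is right. This is an illustrative yardstick, not the project's actual model.

```python
def predict_winner(race):
    """Predict a race's winner as the candidate who raised the most money.

    `race` maps candidate name -> total dollars raised (hypothetical format).
    On a tie, max() returns the first candidate encountered.
    """
    return max(race, key=race.get)


def baseline_accuracy(races, winners):
    """Fraction of races in which the top fundraiser actually won."""
    if len(races) != len(winners):
        raise ValueError("races and winners must have the same length")
    if not races:
        return 0.0
    hits = sum(predict_winner(race) == winner for race, winner in zip(races, winners))
    return hits / len(races)


# Two invented races: the top fundraiser wins the first and loses the second.
races = [{"A": 1_000_000, "B": 250_000}, {"C": 400_000, "D": 600_000}]
winners = ["A", "C"]
print(baseline_accuracy(races, winners))  # 0.5
```

A machine learning model that cannot beat this "money wins" rule on held-out races is probably not learning much beyond fundraising totals.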

Technologies

  • Python
  • PostgreSQL
  • Pandas, Jupyter
  • Mode Analytics

Project Description

This project broadly decomposes into client/server and data engineering/data science tasks:

[Figure: project overview diagram; see the overview diagram source.]

We recently extracted the client/server code into its own repo (for cloning efficiency on Travis CI).

The data eng/sci code is housed in this repo.

Needs of this project

  • Project Leads (from Code for San Francisco): We need project leads who are willing to be a point of contact for the CSUMB students and to be engaged partners in scoping out the problem. We are also considering a "Support Rotation" (see the proposed schedule below), in which a team of C4SF project leads would take turns, one week at a time, serving as the mentor.

Other Roles Include:

  • frontend developers
  • data exploration/descriptive statistics
  • data processing/cleaning
  • statistical modeling
  • writeup/reporting

Getting Started

Please go to the Onboarding docs to start contributing to this project!

Project History

December 2017
  • Project Formulation: Problem Statements Created
  • Ongoing Team Recruitment
January 2018
  • Two Student Teams from California State University, Monterey Bay Join (9 Students Total)
  • Problem Statement Refinement
  • Set Up Postgres DB on Microsoft Azure
February 2018
  • Project Pitch at Code for San Francisco Demo Night
  • Mock Dashboards Up on Mode Analytics
  • First Strawman Machine Learning Model Produced, Predicting Election Results from Campaign Finance Data
  • Begin Working on Web App
March 2018
  • Project Pitch at Open Data Day SF
  • Improvements to Underlying Data Model
  • Begin Working on Deployment of Web App

Contributing DSWG Members

Team Leads (Contacts): [Full Name](https://github.com/[github handle])(@slackHandle)

Code for San Francisco Support Rotation

Week                       Name             Slack Handle
01/03/2018 - 01/09/2018    Vincent La       @vincela14
01/10/2018 - 01/16/2018    Erik Eldridge    @erikeldridge
01/17/2018 - 01/23/2018    Vincent La       @vincela14
01/24/2018 - 01/30/2018    Vincent La       @vincela14
01/31/2018 - 02/06/2018    Vincent La       @vincela14
02/07/2018 - 02/13/2018    Vincent La       @vincela14

Other Members:

Name                                               Slack Handle
[Full Name](https://github.com/[github handle])    @johnDoe
[Full Name](https://github.com/[github handle])    @janeDoe

Contact

  • If you haven't joined the SF Brigade Slack, you can do that here.
  • Our slack channel is #datasci-congressdata
  • We'll use a Trello board to organize work
  • Feel free to contact team leads with any questions or if you are interested in contributing!


Note that while the main focus of this project is campaign finance, there are undoubtedly other interesting questions to explore with congressional data. Some additional ideas include:

  1. Voting patterns - How has your Congressional representative voted over time? Do any factors correlate with a yes vote? Can we predict how she’ll vote on the next bill? How confident are we in the prediction? Can we establish a voting preference profile, e.g. trained on voter recommendations, and generate an alert when a prediction conflicts with our preference?
  2. Visualizing Gerrymandering - e.g., can we show evidence of racial gerrymandering, or of other illegal or unethical gerrymandering along socio-demographic lines?
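The voting-patterns idea (predict a vote, attach a confidence, and alert when the prediction conflicts with a preference) can be sketched with a deliberately naive rule: predict that a representative votes the way they most often voted on past bills with the same topic, and use that majority share as the "confidence". The `(topic, vote)` history format, the topic labels, and the `min_confidence` cutoff are all hypothetical; a real version would likely use a trained model and calibrated probabilities.

```python
from collections import Counter


def predict_vote(past_votes, topic):
    """Predict a representative's vote on a new bill about `topic`.

    `past_votes` is a list of (topic, vote) pairs with vote in {"yes", "no"};
    both the format and the majority-vote rule are hypothetical stand-ins.
    Returns (predicted_vote, confidence), or (None, 0.0) with no history.
    """
    counts = Counter(vote for t, vote in past_votes if t == topic)
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    vote, n = counts.most_common(1)[0]
    return vote, n / total  # confidence = share of past votes in the majority


def should_alert(past_votes, topic, preferred_vote, min_confidence=0.7):
    """Flag a bill when the likely vote conflicts with the user's preference."""
    predicted, confidence = predict_vote(past_votes, topic)
    return (predicted is not None
            and predicted != preferred_vote
            and confidence >= min_confidence)


# Invented voting history for one representative.
history = [("energy", "yes"), ("energy", "yes"), ("energy", "no"), ("health", "no")]
print(predict_vote(history, "energy"))  # predicts "yes" with confidence 2/3
print(should_alert(history, "energy", preferred_vote="no", min_confidence=0.6))  # True
```

Keeping the prediction and the alert rule in separate functions means the naive predictor could later be swapped for a learned model without changing how alerts are generated.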