Trying to forecast state legislative elections based on data available from the Colorado Secretary of State's office
This project entails projections of both Senate and House legislative races, and is broken into four Jupyter notebooks.
2018_reg_results_trends.ipynb looks at whether the outcome for each County-District combination can be forecast from registration data alone. There is a clear trend:
although a simple logistic regression on these data produces a wide range of predictions.
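As a rough illustration of what a registration-only logistic regression looks like, here is a minimal pure-Python sketch. It is not the notebook's code: the single feature (Democratic minus Republican share of registered voters), the function names, and the six data points are all made up for illustration.

```python
import math

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(win) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def win_probability(w, b, x):
    """Predicted probability of a Democratic win at registration margin x."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# Hypothetical toy data, NOT real Colorado results:
# registration margin (Dem share minus Rep share) and whether the Democrat won.
margins = [-0.30, -0.15, -0.05, 0.02, 0.10, 0.25]
dem_won = [0, 0, 0, 1, 0, 1]

w, b = fit_logistic(margins, dem_won)
for m in (-0.2, 0.0, 0.2):
    print(f"margin {m:+.2f}: P(Dem win) = {win_probability(w, b, m):.2f}")
```

With so few, noisy points the fitted curve is shallow, so many districts land at intermediate probabilities rather than near 0 or 1.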
To run the models, we first have to clean the raw spreadsheets from the Colorado SoS's office. This is done with the data_cleaning_house.ipynb and data_cleaning_senate.ipynb notebooks, each of which should be run once to generate the cleaned data for the model notebooks.
Once the data are cleaned, we fit a neural network with a single hidden layer to predict, for each COUNTY-DISTRICT, the vote share of each party in the current election. The features are the vote share from the prior election, the registration composition at the prior election, and the registration composition for the current election. Because the data are not strongly correlated, our model is an ensemble of neural networks, each trained on only part of the data. This leads to a spread in predictions, which can be used to determine whether a race is close (lots of overlap in the figure below) or safe. Close races should receive more resources than safe races.
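The ensemble idea can be sketched in a few lines of pure Python. This is a simplified illustration under stated assumptions, not the model notebooks' implementation: it predicts only a two-party Democratic share (so "overlap" reduces to the ensemble's range straddling 50%), the network size, learning rate, and 70% subsample fraction are arbitrary, and the toy rows are invented.

```python
import math
import random

def train_net(rows, hidden=4, epochs=200, lr=0.1, rng=None):
    """Fit a one-hidden-layer net (tanh hidden units, sigmoid output) by SGD
    on squared error. rows: list of (features, dem_share in [0, 1])."""
    rng = rng or random.Random(0)
    n_in = len(rows[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
             for j in range(hidden)]
        out = 1.0 / (1.0 + math.exp(-(sum(w * hj for w, hj in zip(w2, h)) + b2)))
        return h, out

    for _ in range(epochs):
        for x, y in rows:
            h, out = forward(x)
            d_out = (out - y) * out * (1.0 - out)  # error signal at the output
            for j in range(hidden):
                d_h = d_out * w2[j] * (1.0 - h[j] ** 2)  # uses w2 before update
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_h
                for i in range(n_in):
                    w1[j][i] -= lr * d_h * x[i]
            b2 -= lr * d_out
    return lambda x: forward(x)[1]

def fit_ensemble(rows, n_models=5, frac=0.7, seed=0):
    """Train each network on a random subset (without replacement) of the rows."""
    rng = random.Random(seed)
    k = max(1, int(frac * len(rows)))
    return [train_net(rng.sample(rows, k), rng=rng) for _ in range(n_models)]

def spread(models, x):
    """Lowest and highest predicted Democratic share across the ensemble."""
    preds = [m(x) for m in models]
    return min(preds), max(preds)

def is_close(models, x, threshold=0.5):
    """Call a race close when the ensemble's range straddles the threshold."""
    lo, hi = spread(models, x)
    return lo <= threshold <= hi

# Hypothetical toy rows, NOT real data. Features: prior Dem vote share,
# prior Dem/Rep registration shares, current Dem/Rep registration shares.
rows = [
    ([0.62, 0.40, 0.25, 0.41, 0.24], 0.64),
    ([0.35, 0.22, 0.45, 0.23, 0.44], 0.36),
    ([0.51, 0.33, 0.32, 0.34, 0.31], 0.53),
    ([0.47, 0.30, 0.35, 0.30, 0.36], 0.45),
    ([0.70, 0.48, 0.18, 0.49, 0.18], 0.71),
    ([0.28, 0.18, 0.50, 0.18, 0.51], 0.27),
]
models = fit_ensemble(rows)
lo, hi = spread(models, [0.50, 0.32, 0.33, 0.33, 0.32])
print(f"predicted Dem share between {lo:.2f} and {hi:.2f}")
```

Because every network sees a different subset, the width of the `(lo, hi)` range is a crude measure of how much the prediction depends on which districts were used for training.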