<a href="https://www.kaggle.com/code/mikedelong/2020-2024-electoral-college?scriptVersionId=146502959" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
import pandas as pd
voting_df = pd.read_csv(filepath_or_buffer='/kaggle/input/2020-us-presidential-election-results-by-state/voting.csv')
voting_df.head(n=5)

Unnamed: 0,state,state_abr,trump_pct,biden_pct,trump_vote,biden_vote,trump_win,biden_win
0,Alaska,AK,53.1,43.0,189543,153502,1,0
1,Hawaii,HI,34.3,63.7,196864,366130,0,1
2,Washington,WA,39.0,58.4,1584651,2369612,0,1
3,Oregon,OR,40.7,56.9,958448,1340383,0,1
4,California,CA,34.3,63.5,5982194,11082293,0,1


We want to get the Electoral College result from 2020 and roll it forward to 2024 based on changes in the Electoral College due to reapportionment after the 2020 Census.

In [2]:
college_df = pd.read_csv(filepath_or_buffer='/kaggle/input/electoralcollege/Electoral_College.csv', usecols=['Year', 'State', 'Votes'])
college_df['State'] = college_df['State'].replace(to_replace={'D.C.': 'District of Columbia'})
college_df.head(n=5)

Unnamed: 0,Year,State,Votes
0,1788,Alabama,
1,1792,Alabama,
2,1796,Alabama,
3,1800,Alabama,
4,1804,Alabama,


In [3]:
ec2020_df = college_df[college_df['Year']==2020].drop(columns=['Year',]).copy()
ec2020_df['Votes'] = ec2020_df['Votes'].astype(int)
# we need the codes to build a choropleth
ec2020_df = ec2020_df.merge(how='inner', on='State', right=pd.read_csv(filepath_or_buffer='/kaggle/input/electoralcollege/StateCode.csv'))
ec2020_df.head(n=5)

Unnamed: 0,State,Votes,Code
0,Alabama,9,AL
1,Alaska,3,AK
2,Arizona,11,AZ
3,Arkansas,6,AR
4,California,55,CA
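
An inner merge drops rows silently when the state names on the two sides differ, which is why the `D.C.` rename above matters. A minimal sketch of one way to surface such mismatches, using toy stand-in frames (the data and the helper name `unmatched_states` are illustrative assumptions, not part of the notebook's inputs):

```python
import pandas as pd

# Toy stand-ins for the two tables; the rows here are illustrative only.
left = pd.DataFrame({'State': ['Alabama', 'Alaska', 'D.C.'], 'Votes': [9, 3, 3]})
right = pd.DataFrame({'State': ['Alabama', 'Alaska', 'District of Columbia'],
                      'Code': ['AL', 'AK', 'DC']})

def unmatched_states(left_df, right_df, on='State'):
    """Return the rows that an inner merge on `on` would drop."""
    # An outer merge keeps every row; the indicator column records which side it came from.
    merged = left_df.merge(right_df, how='outer', on=on, indicator=True)
    return merged.loc[merged['_merge'] != 'both', [on, '_merge']]

missing = unmatched_states(left, right)
print(missing)
```

An empty result would mean the inner merge keeps every row.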


In [4]:
from plotly.express import choropleth
choropleth(data_frame=ec2020_df, locations='Code', color='Votes', projection='albers usa', locationmode='USA-states')

In [5]:
result_2020_df = voting_df[['state', 'trump_win', 'biden_win']].rename(columns={'state' : 'State'}).merge(how='inner', on='State', right=ec2020_df).copy()
result_2020_df['DEC votes'] = result_2020_df['biden_win'] * result_2020_df['Votes']
result_2020_df['REC votes'] = result_2020_df['trump_win'] * result_2020_df['Votes']
result_2020_df.head(n=5)

Unnamed: 0,State,trump_win,biden_win,Votes,Code,DEC votes,REC votes
0,Alaska,1,0,3,AK,0,3
1,Hawaii,0,1,4,HI,4,0
2,Washington,0,1,12,WA,12,0
3,Oregon,0,1,7,OR,7,0
4,California,0,1,55,CA,55,0


In [6]:
choropleth(data_frame=result_2020_df[result_2020_df['biden_win']==1], locations='Code', color='Votes', projection='albers usa',
           locationmode='USA-states', color_continuous_scale='blues',
          title='Where do Democratic Party Electoral College votes come from?')

In [7]:
choropleth(data_frame=result_2020_df[result_2020_df['trump_win']==1], locations='Code', color='Votes', projection='albers usa',
           locationmode='USA-states', color_continuous_scale='reds',
          title='Where do Republican Electoral College votes come from?')

Now we can check the result against the historical result: D 306 - 232 R.

In [8]:
result_2020_df['DEC votes'].sum(), result_2020_df['REC votes'].sum()

(306, 232)

We can introduce the changes to the Electoral College due to the 2020 Census.

In [9]:
changes = {
    'California': -1,
    'Colorado': 1,
    'Florida' : 1,
    'Illinois': -1,
    'Michigan': -1,
    'Montana': 1,
    'New York': -1,
    'North Carolina': 1,
    'Ohio': -1,
    'Oregon': 1,
    'Pennsylvania': -1,
    'Texas': 2,
    'West Virginia': -1
}
# the gains and losses should cancel out
sum(changes.values())

0

In [10]:
result_2024_df = result_2020_df.copy().drop(columns=['DEC votes', 'REC votes'])
result_2024_df['Votes'] = result_2020_df.apply(axis=1, func=lambda x: x['Votes'] if x['State'] not in changes.keys() else x['Votes'] + changes[x['State']])
result_2024_df.head()

Unnamed: 0,State,trump_win,biden_win,Votes,Code
0,Alaska,1,0,3,AK
1,Hawaii,0,1,4,HI
2,Washington,0,1,12,WA
3,Oregon,0,1,8,OR
4,California,0,1,54,CA
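
The row-wise `apply` above can also be written as a vectorized lookup. The following is a sketch on a toy frame, assuming a hypothetical helper `roll_forward` and a reduced `toy_changes` dict. States missing from the dict get a delta of zero.

```python
import pandas as pd

# Toy frame with 2020 vote counts for four states.
toy = pd.DataFrame({'State': ['California', 'Texas', 'Alaska', 'Ohio'],
                    'Votes': [55, 38, 3, 18]})
# A reduced copy of the reapportionment deltas, for illustration.
toy_changes = {'California': -1, 'Texas': 2, 'Ohio': -1}

def roll_forward(df, deltas):
    """Add each state's delta to its vote count; states not in `deltas` are unchanged."""
    out = df.copy()
    # map() yields NaN for states without a delta, so fill those with 0 before adding.
    out['Votes'] = out['Votes'] + out['State'].map(deltas).fillna(0).astype(int)
    return out

rolled = roll_forward(toy, toy_changes)
print(rolled)
```

Because the deltas in this toy dict sum to zero, the total number of votes is the same before and after.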


In [11]:
choropleth(data_frame=result_2024_df, locations='Code', color='Votes', projection='albers usa', locationmode='USA-states')

Only seven of the 538 votes have been reapportioned, so it isn't surprising that this map looks the same as the 2020 map.

Our base scenario is that no states flip and the only change is due to reapportionment.

In [12]:
result_2024_df['DEC votes'] = result_2024_df['biden_win'] * result_2024_df['Votes']
result_2024_df['REC votes'] = result_2024_df['trump_win'] * result_2024_df['Votes']

result_2024_df['DEC votes'].sum(), result_2024_df['REC votes'].sum()

(303, 235)

D 303 - 235 R is our base case for 2024.
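
The three-vote shift can be cross-checked by summing the deltas separately for each party's 2020 states. This sketch copies the `changes` dict from above and hard-codes the affected states Biden carried in 2020 (the set `biden_states` is an assumption written out by hand, not read from the data):

```python
# Reapportionment deltas, copied from the `changes` dict above.
changes = {
    'California': -1, 'Colorado': 1, 'Florida': 1, 'Illinois': -1,
    'Michigan': -1, 'Montana': 1, 'New York': -1, 'North Carolina': 1,
    'Ohio': -1, 'Oregon': 1, 'Pennsylvania': -1, 'Texas': 2, 'West Virginia': -1,
}
# Assumption: the affected states carried by Biden in 2020, hard-coded for this check.
biden_states = {'California', 'Colorado', 'Illinois', 'Michigan',
                'New York', 'Oregon', 'Pennsylvania'}

# Net change in votes for the states each party won in 2020.
d_shift = sum(delta for state, delta in changes.items() if state in biden_states)
r_shift = sum(delta for state, delta in changes.items() if state not in biden_states)

print(306 + d_shift, 232 + r_shift)  # 303 235
```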

In [13]:
# these are the states that 270 to Win has as competitive in their polling average as of October 13, 2023
# and their D margin

battleground_states = {'North Carolina' : -1, 'Nevada': 1, 'Georgia': -1, 'Pennsylvania': -1.8, 
                       'Texas': -2, 'Arizona': -2, 'Michigan': -2, 'Wisconsin': -2, 'Minnesota': -3,
                      'Virginia': 3}
battleground_df = pd.DataFrame.from_dict(data=battleground_states, orient='index', 
                                         columns=['D margin']).reset_index().rename(columns={'index': 'State'})
battleground_df = battleground_df.merge(how='inner', on='State', right=result_2024_df[['State', 'Votes', 'Code']])
battleground_df

Unnamed: 0,State,D margin,Votes,Code
0,North Carolina,-1.0,16,NC
1,Nevada,1.0,6,NV
2,Georgia,-1.0,16,GA
3,Pennsylvania,-1.8,19,PA
4,Texas,-2.0,40,TX
5,Arizona,-2.0,11,AZ
6,Michigan,-2.0,15,MI
7,Wisconsin,-2.0,10,WI
8,Minnesota,-3.0,10,MN
9,Virginia,3.0,13,VA


What is the net D margin in electoral votes (D votes minus R votes) if every battleground state follows its polling average?

In [14]:
from numpy import sign
(sign(battleground_df['D margin']) * battleground_df['Votes']).sum()

-118.0

How many votes are at stake in the battleground states?

In [15]:
sum(battleground_df['Votes'])

156
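
A signed sum like the one above is a net margin, not a vote count. If D and R are the two parties' battleground votes, then D - R is the net margin and D + R is the number of votes available, so D = (available + net) / 2. A small sketch of that conversion, where the helper name `split_from_net` is hypothetical and no state is assumed to be tied:

```python
def split_from_net(net, available):
    """Convert a net margin (D minus R) into a (D, R) pair that sums to `available`."""
    if (available + net) % 2 != 0:
        # With no ties, net and available always have the same parity.
        raise ValueError('net and available must have the same parity')
    d_votes = (available + net) // 2
    return d_votes, available - d_votes

print(split_from_net(net=-118, available=156))  # (19, 137)
```

For the figures above this gives 19 votes for D (Nevada and Virginia) and 137 for R.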

In [16]:
battleground = int((sign(battleground_df['D margin']) * battleground_df['Votes']).sum())
battleground_available = sum(battleground_df['Votes'])
# the signed sum is the net margin (D votes minus R votes), so D holds half of (available + net)
d_total = (battleground_available + battleground) // 2
r_total = battleground_available - d_total
print('D  {} - {} R'.format(d_total, r_total))

D  19 - 137 R


In [17]:
from random import random
from random import seed
from numpy import array
from collections import Counter

seed(a=2023)

margin = battleground_df['D margin'].values
votes = battleground_df['Votes'].values
def realize() -> float:
    # shift each margin by -1, 0, or +1 point with equal probability
    iterate = [1-(int(9*random())//3) for _ in range(len(margin))]
    # a margin shifted to exactly zero counts as a tie and contributes no votes
    return (sign(margin + array(iterate)) * votes).sum()

realizations = [realize() for _ in range(1000)]
Counter(realizations)

Counter({-102.0: 315,
         -118.0: 275,
         -124.0: 154,
         -108.0: 143,
         -86.0: 84,
         -92.0: 29})

In [18]:
from plotly.express import histogram
histogram(x=realizations)

Let's add the safe states back in and check that the totals sum to 538.

In [19]:
safe_df = result_2024_df[~result_2024_df['State'].isin(battleground_df['State'])]
print(safe_df['DEC votes'].sum() + d_total, safe_df['REC votes'].sum() + r_total)

222 316


Let's prototype with a uniform distribution of outcomes within the margin of error, shifting each polling margin by up to 1.58 points in either direction.

In [20]:
def scenario(base: array, weights: array, safe: int) -> int:
    # shift each margin by a uniform draw from [-1.58, 1.58)
    shift = array([3.16 * random() - (3.16/2) for _ in range(len(base))])
    # net battleground margin: D votes minus R votes
    net = int((weights * sign(base + shift)).sum())
    # D holds half of (available + net); add the safe D votes on top
    return safe + (int(sum(weights)) + net) // 2

scenarios = [scenario(base=battleground_df['D margin'].values, weights=battleground_df['Votes'].values, 
                      safe=int(safe_df['DEC votes'].sum())) for _ in range(10000)]

In [21]:
histogram(x=[scenario(base=[0], weights=[1], safe=99) for _ in range(10000)], nbins=538)

In [22]:
histogram(x=scenarios, color = [item >= 270 for item in scenarios], nbins=80)

In [23]:
pd.DataFrame(data={'scenario': scenarios, 'D win': [item >= 270 for item in scenarios]})['D win'].value_counts(normalize=True)

D win
False    1.0
Name: proportion, dtype: float64

With a shift of at most 1.58 points, only North Carolina, Nevada, and Georgia can change hands, so D tops out at 203 safe votes plus 51 battleground votes, or 254, which is short of 270.
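
Under a uniform shift the per-state outcome has a closed form: for a polling margin m and a shift drawn uniformly from [-h, h), the chance that the shifted margin is positive is (m + h) / (2h), clipped to the range 0 to 1. The sketch below applies that to the battleground margins listed earlier. The helper name `d_win_probability` is an assumption, and h is taken as half of the 3.16-point window used in `scenario`.

```python
import numpy as np

HALF_WIDTH = 3.16 / 2  # half of the 3.16-point window used in scenario()

# Polling margins copied from the battleground_states dict above.
margins = {'North Carolina': -1, 'Nevada': 1, 'Georgia': -1, 'Pennsylvania': -1.8,
           'Texas': -2, 'Arizona': -2, 'Michigan': -2, 'Wisconsin': -2,
           'Minnesota': -3, 'Virginia': 3}

def d_win_probability(margin, half_width=HALF_WIDTH):
    """P(margin + shift > 0) for a shift drawn uniformly from [-half_width, half_width)."""
    return float(np.clip((margin + half_width) / (2 * half_width), 0.0, 1.0))

probabilities = {state: d_win_probability(m) for state, m in margins.items()}
for state, p in probabilities.items():
    print(f'{state}: {p:.3f}')
```

States whose margin exceeds 1.58 points in absolute value never flip under this model, so their probability is exactly 0 or 1.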