<a href="https://www.kaggle.com/code/mikedelong/python-eda-first-look?scriptVersionId=148117185" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
import pandas as pd
trimmed_df = pd.read_csv(filepath_or_buffer='/kaggle/input/moving-democrats/shor_data_trimmed.csv', thousands=',')
# add a fake but sensible DC row so that the Electoral College totals come out right
dc_df = pd.DataFrame(data={'state': ['DC'], 'population': [5500000], 'biden': [317323], 'trump': [18586], 'difference': [317323-18586],
                          'two_way': [0.5], 'expected_share': [0.92], 'baseline_winning': [0.99], })
electoral_college_df = pd.read_csv(filepath_or_buffer='/kaggle/input/2024-electoral-college-votes-available/2024_Electoral_College.csv').rename(columns={'Abbreviation': 'state', 'State': 'Name'})
df = pd.concat([trimmed_df, dc_df]).merge(right=electoral_college_df, how='inner', on='state').rename(columns={'Total': 'EC Votes'})
df['expected_win'] = df['expected_share'] > 0.5
df.head()

,state,population,biden,trump,difference,two_way,expected_share,baseline_winning,Name,EC Votes,expected_win
0,NH,1359711,424921,365654,59267,0.53748,0.51448,0.65546,New Hampshire,4,True
1,AK,731545,153778,189951,-36173,0.44738,0.42438,0.01836,Alaska,3,False
2,ME,1344212,435072,360737,74335,0.5467,0.5237,0.7437,Maine,4,True
3,NV,3080156,703486,669890,33596,0.51223,0.48923,0.38305,Nevada,6,False
4,NM,2096829,501614,401894,99720,0.55518,0.53218,0.81302,New Mexico,5,True
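The inner merge above silently drops any state that appears in only one of the two frames, which is why the DC row matters for the Electoral College total. A minimal sketch of how that could be checked, assuming a small stand-in for each frame (the names `results`, `college`, `checked`, `unmatched`, and `matched_votes` are hypothetical, and only a handful of states are included):

```python
import pandas as pd

# Small stand-ins for the notebook's frames (hypothetical names, subset of states).
results = pd.DataFrame({'state': ['NH', 'AK', 'DC'],
                        'expected_share': [0.51448, 0.42438, 0.92]})
college = pd.DataFrame({'state': ['NH', 'AK', 'DC', 'ME'],
                        'EC Votes': [4, 3, 3, 4]})

# An outer merge with indicator=True labels every row as
# 'both', 'left_only', or 'right_only' in a '_merge' column.
checked = results.merge(right=college, how='outer', on='state', indicator=True)

# Rows that an inner merge would have dropped.
unmatched = checked[checked['_merge'] != 'both']
print(unmatched[['state', '_merge']])

# Electoral votes carried by the rows that did match.
matched_votes = int(checked.loc[checked['_merge'] == 'both', 'EC Votes'].sum())
print(matched_votes)
```

On the full data, an empty `unmatched` frame and a matched total of 538 would suggest that no state was lost in the merge.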


This is geographic data, and we have the state abbreviations, so we can make some choropleths.

In [2]:
from plotly.express import choropleth
for column in df.drop(columns=['state', 'Name']).columns:
    choropleth(data_frame=df, locations='state', locationmode='USA-states', color=column, projection='albers usa', title=column).show()

In [3]:
from plotly.express import imshow
imshow(img=df.drop(columns=['Name', 'state']).corr())

The correlations don't tell us anything we don't already know, but they look cool.

In [4]:
from plotly.express import bar
bar(data_frame=df.sort_values(by='baseline_winning'), x='state', y='baseline_winning', color='EC Votes')

In [5]:
bar(data_frame=df.sort_values(by='expected_share'), x='state', y='expected_share', color='population')

In [6]:
from plotly.express import scatter
scatter(data_frame=df, x='biden', y='trump', color='expected_share', size='population', log_x=True, log_y=True, trendline='ols')

In [7]:
from plotly.express import histogram
histogram(data_frame=df, x='baseline_winning', nbins=50, color='expected_win')

Not a lot of states are up for grabs.

In [8]:
bar(data_frame=df.sort_values(by='two_way'), x='state', y='two_way', color='population')

In [9]:
scatter(data_frame=df, x='two_way', y='expected_share', color='baseline_winning', size='population', hover_name='state')

It is not true that smaller states vote R and larger states vote D. It is more accurate to say that, within each of the R and D cohorts, smaller states sit toward the R end and larger states toward the D end.
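One way to put a number on the within-cohort pattern is to correlate `population` with `two_way` separately for each value of `expected_win`. The sketch below is only illustrative: it uses the five rows shown in the `df.head()` output, which is far too few to support any conclusion, and the names `sample` and `within_cohort` are hypothetical.

```python
import pandas as pd

# The five rows from the df.head() output, keeping only the columns needed here.
sample = pd.DataFrame({
    'state': ['NH', 'AK', 'ME', 'NV', 'NM'],
    'population': [1359711, 731545, 1344212, 3080156, 2096829],
    'two_way': [0.53748, 0.44738, 0.5467, 0.51223, 0.55518],
    'expected_win': [True, False, True, False, True],
})

# Pearson correlation between population and two_way share, computed per cohort.
within_cohort = {
    bool(cohort): group['population'].corr(group['two_way'])
    for cohort, group in sample.groupby('expected_win')
}
print(within_cohort)
```

Running the same dictionary comprehension on the full `df` would give one correlation per cohort; a positive value in both would be consistent with the reading above.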

In [10]:
histogram(data_frame=df, x='expected_share', nbins=50, color='expected_win')

After some investigation we are pretty close to a nut graf; this plot shows that Rs should expect to win more states after adjusting for internal migration.

In [11]:
df[['expected_win', 'EC Votes']].groupby(by='expected_win').sum()

expected_win,EC Votes
False,312
True,226


This is a crude approximation because it doesn't attempt to account for uncertainty, undecided voters, or third-party effects, but circumstances look dire for 2024 Ds: 226 electoral votes against 312, with 270 needed to win.
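The tally above uses a hard threshold on `expected_share`, so a state at 0.501 counts the same as one at 0.9. A rough way to soften that is to weight each state's electoral votes by a win probability. The sketch below assumes that `baseline_winning` can be read as the probability of a D win, which the notebook does not confirm, and it uses only the five rows shown in the `df.head()` output. The names `sample`, `hard_tally`, and `soft_tally` are hypothetical.

```python
import pandas as pd

# The five rows from the df.head() output, keeping only the columns needed here.
sample = pd.DataFrame({
    'state': ['NH', 'AK', 'ME', 'NV', 'NM'],
    'expected_share': [0.51448, 0.42438, 0.5237, 0.48923, 0.53218],
    'baseline_winning': [0.65546, 0.01836, 0.7437, 0.38305, 0.81302],
    'EC Votes': [4, 3, 4, 6, 5],
})

# Hard tally, as in the notebook: a state counts fully if expected_share > 0.5.
hard_tally = int(sample.loc[sample['expected_share'] > 0.5, 'EC Votes'].sum())

# Soft tally. ASSUMPTION: baseline_winning is a D win probability, so each
# state's electoral votes are weighted by that probability.
soft_tally = float((sample['baseline_winning'] * sample['EC Votes']).sum())

print(hard_tally, round(soft_tally, 2))
```

This still treats states as independent and ignores the district-level allocation in Maine and Nebraska, so it remains a crude estimate rather than a forecast.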