# Evolving Election Uncertainty

In [None]:
#hide
import pandas as pd 
import numpy as np
import altair as alt
import datetime as dt
alt.themes.enable('fivethirtyeight')
pd.options.mode.chained_assignment = None

In [None]:
#hide
# Function for pulling historical price data for a PredictIt market.
# Supported timespans: '24h', '7d', '30d', '90d'
def data(market, time='90d'):
    url = 'https://www.predictit.org/Resource/DownloadMarketChartData?marketid='+str(market)+'&timespan='+str(time)
    df = pd.read_csv(url)
    return df

What are the odds that the Democratic nominee for president will win the 2032 election? If you had to guess now, you'd probably say 50-50. The US has a very stable two-party system, presidential elections are generally competitive, and of the past 10 elections, the Democrat has won 4. 

Now what are the odds that the Democratic nominee for president will win the 2020 election? In this case we have a lot more data to go on. Joe Biden has a significant lead in the polls. Donald Trump has a significant advantage due to the mechanics of the Electoral College, but most observers agree that Biden should be favored to win. A week before the election, FiveThirtyEight has Joe Biden with an 88% chance of victory, the Economist says 96%, and the betting market PredictIt says 63%. The odds differ, but all agree that a Biden win is more probable than a Trump victory. 

This makes intuitive sense. A probability estimate for an event can become more certain as time goes on, because less time remains for developments that could change the outcome. Think of the win probability in a football game. A team with a 7-point lead is generally more likely to win, but its chances are much higher if it holds that lead with one minute left than if it goes into halftime up a touchdown. All else equal, more time on the clock means more uncertainty. 
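The football intuition can be made concrete with a toy model. Assume, purely for illustration, that the leader's margin follows a driftless Gaussian random walk with a per-day standard deviation `daily_sd`. The margin at the finish is then normally distributed around today's lead with standard deviation `daily_sd * sqrt(days_left)`, so the leader's win probability is Φ(lead / (daily_sd · √days_left)). The function names and parameter values below are hypothetical and are not fitted to any game or poll.

```python
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def win_probability(lead: float, days_left: float, daily_sd: float) -> float:
    """P(final margin > 0) when the margin follows a driftless Gaussian random walk.

    Illustrative assumptions: lead and daily_sd share the same units (points in a
    game, percentage points in a poll) and daily shocks are independent.
    """
    if days_left <= 0:
        # No time left: the current lead is the final result
        return 1.0 if lead > 0 else (0.5 if lead == 0 else 0.0)
    return normal_cdf(lead / (daily_sd * sqrt(days_left)))

# The same hypothetical 7-point lead, evaluated at different distances from the finish
for days in (90, 30, 7, 1):
    print(days, round(win_probability(lead=7.0, days_left=days, daily_sd=1.0), 3))
```

Holding the lead fixed, the win probability climbs toward certainty as `days_left` shrinks, which is the shape a forecast should take if the underlying race stays put.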

You can see this principle quite clearly in the output of the FiveThirtyEight model. While Biden's [polling lead](https://projects.fivethirtyeight.com/polls/president-general/national/) has been fairly stable throughout the campaign, his odds of victory have risen steadily (though not without interruption) from late August onwards. 


In [None]:
#hide_input
fte_pres_winner = pd.read_csv("https://projects.fivethirtyeight.com/2020-general-data/presidential_national_toplines_2020.csv")
fte_pres_winner['Date'] = pd.to_datetime(fte_pres_winner['timestamp'])

# Electoral College win probabilities for the challenger (Biden) and the incumbent (Trump)
fte_pres_winner['Biden'] = fte_pres_winner.ecwin_chal
fte_pres_winner['Trump'] = fte_pres_winner.ecwin_inc
fte_pres_winner = fte_pres_winner[["Date", "Trump", "Biden"]]

# Reshape from wide (one column per candidate) to long (one row per date and candidate)
fte_pres_winner = pd.melt(fte_pres_winner, id_vars = "Date", value_vars=["Trump", "Biden"], 
                          var_name='Candidate', value_name='Win Probability')

# Keep only the period covered by the charts
fte_pres_winner = fte_pres_winner[fte_pres_winner['Date'] > pd.to_datetime('2020-08-11')]



In [None]:
#hide_input
alt.Chart(fte_pres_winner).mark_line().encode(
    x='Date',
    y=alt.Y('Win Probability',
        scale=alt.Scale(domain=[0,1])
    ),
    color='Candidate',
    tooltip=['Date', "Candidate", "Win Probability"], 
).properties(
    title = "FiveThirtyEight Presidential Probabilities"
)

The win probabilities implied by prices on the PredictIt betting market, however, do not exhibit this pattern. They have remained remarkably stable over time, with the notable exception of late September and early October, which coincides with the first debate (widely considered a Biden victory) and Trump's COVID diagnosis. The PredictIt market reacted sharply to those events, but its prices have not moved toward greater certainty as election day approaches. 

In [None]:
#hide_input
pit_pres_winner = data(3698)

pit_pres_winner['Candidate']= pit_pres_winner['ContractName'].astype(str)
# Closing prices carry a leading dollar sign; strip it and read the price as a probability
pit_pres_winner['Win Probability']= pit_pres_winner['CloseSharePrice'].str.lstrip('$').astype(float)
pit_pres_winner['Date']= pd.to_datetime(pit_pres_winner['Date'])

pit_pres_winner = pit_pres_winner[pit_pres_winner['Candidate'].isin(['Biden', 'Trump'])]

pit_pres_winner = pit_pres_winner[["Date", "Candidate", "Win Probability"]]

alt.Chart(pit_pres_winner).mark_line().encode(
    x='Date',
    y=alt.Y('Win Probability',
        scale=alt.Scale(domain=[0,1])
    ),
    color='Candidate',
    tooltip=['Date', 'Win Probability', 'Candidate']
).properties(
    title = "PredictIt Presidential Probabilities"
)
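The cell above reads each contract's closing price directly as a win probability: a share pays $1 if its candidate wins, so a $0.63 price reads as roughly a 63% chance. One caveat, offered as an optional refinement rather than something this analysis does, is that prices of mutually exclusive PredictIt contracts need not sum to exactly $1, since trading fees and bid/ask spreads can push the total above it. A common adjustment rescales the prices on each date so the contracts kept in the frame sum to 1. The sketch assumes a long frame shaped like `pit_pres_winner`; the helper name and the toy prices are hypothetical.

```python
import pandas as pd

def normalize_implied_probabilities(df: pd.DataFrame,
                                    price_col: str = "Win Probability") -> pd.DataFrame:
    """Rescale prices on each date so the contracts in df sum to 1 (hypothetical helper).

    Assumes a long frame with one row per (Date, Candidate) and prices in dollars.
    """
    out = df.copy()
    # Sum of contract prices on each date, broadcast back to every row
    daily_total = out.groupby("Date")[price_col].transform("sum")
    out["Normalized Probability"] = out[price_col] / daily_total
    return out

# Toy prices (illustrative, not PredictIt data) that sum to $1.05
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-10-27", "2020-10-27"]),
    "Candidate": ["Biden", "Trump"],
    "Win Probability": [0.66, 0.39],
})
print(normalize_implied_probabilities(toy))
```

Because only the Biden and Trump contracts are kept above, this rescaling would also fold in whatever small probability the market assigned to other contracts.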

Here are Biden's chances of victory according to both sources, side by side, to illustrate the difference: 

In [None]:
#hide_input
#Join Datasets
fte_pres_winner["Date"] = fte_pres_winner["Date"].dt.date
pit_pres_winner["Date"] = pit_pres_winner["Date"].dt.date

merged = pd.merge(pit_pres_winner,fte_pres_winner, on=['Date','Candidate'], suffixes=[" pit", " fte"])
merged = merged.rename(columns={"Win Probability pit": "PredictIt", "Win Probability fte": "FiveThirtyEight"})

merged = pd.melt(merged, id_vars = ["Date", "Candidate"], value_vars=["PredictIt", "FiveThirtyEight"], 
                 var_name='Source', value_name='Win Probability')

merged["Date"] = pd.to_datetime(merged.Date)
# .copy() so later column assignments modify an independent frame, not a view of merged
merged_biden = merged[merged["Candidate"] == "Biden"].copy()
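`pd.merge` performs an inner join by default, so only dates on which both PredictIt and FiveThirtyEight have a row survive into `merged`. The toy example below (dates and values are made up) shows the effect, and shows how an outer join with `indicator=True` can be used to audit which dates drop out of the comparison.

```python
import pandas as pd

# Toy long-format frames standing in for the PredictIt and FiveThirtyEight series
pit = pd.DataFrame({"Date": ["2020-10-01", "2020-10-02", "2020-10-03"],
                    "Candidate": "Biden",
                    "Win Probability": [0.60, 0.62, 0.63]})
fte = pd.DataFrame({"Date": ["2020-10-02", "2020-10-03", "2020-10-04"],
                    "Candidate": "Biden",
                    "Win Probability": [0.82, 0.83, 0.85]})

# Default how="inner": only (Date, Candidate) pairs present in both frames are kept
inner = pd.merge(pit, fte, on=["Date", "Candidate"], suffixes=(" pit", " fte"))
print(inner["Date"].tolist())

# An outer join with indicator=True labels each row 'left_only', 'right_only' or 'both'
audit = pd.merge(pit, fte, on=["Date", "Candidate"], how="outer",
                 suffixes=(" pit", " fte"), indicator=True)
print(audit[["Date", "_merge"]])
```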

In [None]:
#hide_input
alt.Chart(merged_biden).mark_line().encode(
    x='Date',
    y=alt.Y('Win Probability',
        scale=alt.Scale(domain=[0,1])
    ),
    color=alt.Color('Source',
                   scale=alt.Scale(
                       domain=['FiveThirtyEight', 'PredictIt'],
                       range=['rgb(222,119,72)', 'rgb(70,159,184)'])),   
    tooltip=['Date', "Source", "Win Probability"], 
).properties(
    title = "Biden's Election Probabilities"
)

PredictIt has been more bullish on Trump throughout the campaign. Putting aside whether PredictIt is correctly pricing the probability of a Trump win, this chart indexes Joe Biden's win probability to 100 on the first date in the sample, separately for each source, to illustrate the difference in the trend over time: 

In [None]:
#hide_input
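# Sort by date so "first" refers to the earliest observation, then index each source to 100
merged_biden = merged_biden.sort_values("Date")
merged_biden["Win Probability (Indexed)"] = \
100 * merged_biden["Win Probability"] / merged_biden.groupby(["Source", "Candidate"])["Win Probability"].transform("first")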
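The indexing divides each source's series by its own first observation and multiplies by 100, i.e. index_t = 100 · p_t / p_first, so both lines start at 100 and a reading of 110 means a 10% relative increase in win probability. A toy frame (values made up) shows that `groupby(...).transform("first")` broadcasts each group's first value back to every row. Note that "first" means first in the frame's current row order, so the data is assumed to be sorted by date.

```python
import pandas as pd

# Toy frame, assumed already sorted by date within each source
toy = pd.DataFrame({
    "Source": ["PredictIt"] * 3 + ["FiveThirtyEight"] * 3,
    "Candidate": "Biden",
    "Win Probability": [0.60, 0.66, 0.63, 0.70, 0.77, 0.84],
})

# transform("first") returns a Series aligned to toy's index holding each group's first value
first = toy.groupby(["Source", "Candidate"])["Win Probability"].transform("first")
toy["Win Probability (Indexed)"] = 100 * toy["Win Probability"] / first
print(toy)
```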

In [None]:
#hide_input
alt.Chart(merged_biden).mark_line().encode(
    x='Date',
    y=alt.Y('Win Probability (Indexed)',
        scale=alt.Scale(domain=[70,130])
    ),
    color=alt.Color('Source',
                   scale=alt.Scale(
            domain=['FiveThirtyEight', 'PredictIt'],
            range=['rgb(222,119,72)', 'rgb(70,159,184)'])),
    tooltip=['Date', "Source", "Win Probability (Indexed)"], 
).properties(
    title = "Biden's Election Probabilities (Indexed)"
)

Now that we've removed the difference in levels between the FiveThirtyEight model and PredictIt, we can clearly see how the two sources differ in their reading of the race over time. Both saw the race as fairly stable from August through early September. Then Biden's odds began to rise in the FiveThirtyEight model but barely budged on PredictIt. The first debate and Trump's COVID diagnosis, in late September and early October, showed up quickly in the PredictIt probability and a bit later in the FiveThirtyEight probability (note the acceleration of Biden's upward trajectory in early October). After those events Biden continued to gain in the FiveThirtyEight model, while PredictIt movement settled down, with a modest tick back toward Trump. 
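To put a rough number on the divergence described above, one option (not part of the original analysis) is to fit a straight line to each source's indexed series over a chosen window and compare the slopes in index points per day. The sketch assumes a frame shaped like `merged_biden`, with a datetime `Date` column plus `Source` and `Win Probability (Indexed)` columns; the function name, the window, and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def trend_per_day(df: pd.DataFrame, start: str, end: str,
                  value_col: str = "Win Probability (Indexed)") -> pd.Series:
    """Least-squares slope of value_col per day, by Source, within [start, end].

    Each source needs at least two observations in the window for the fit.
    """
    window = df[(df["Date"] >= pd.Timestamp(start)) & (df["Date"] <= pd.Timestamp(end))]
    slopes = {}
    for source, grp in window.groupby("Source"):
        # Days elapsed since the first observation in the window
        days = (grp["Date"] - grp["Date"].min()).dt.days.to_numpy(dtype=float)
        slope, _intercept = np.polyfit(days, grp[value_col].to_numpy(dtype=float), 1)
        slopes[source] = slope
    return pd.Series(slopes, name="slope_per_day")

# Toy data: one series rising 0.5 index points a day, the other flat
dates = pd.date_range("2020-10-05", periods=10, freq="D")
toy = pd.concat([
    pd.DataFrame({"Date": dates, "Source": "FiveThirtyEight",
                  "Win Probability (Indexed)": 100 + 0.5 * np.arange(10)}),
    pd.DataFrame({"Date": dates, "Source": "PredictIt",
                  "Win Probability (Indexed)": np.full(10, 105.0)}),
])
print(trend_per_day(toy, "2020-10-05", "2020-10-14"))
```

A linear slope is a blunt summary: it will not capture the sudden late-September jump in the PredictIt series, so it is best applied to windows on either side of that event.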

Given the structure of the prediction being made here, I'm inclined to say that the movement of probabilities in the FiveThirtyEight model makes more sense. As election day approaches (and more and more people vote early), the potential for surprising events that could shake up the race decreases, and an accurate assessment of the probabilities should reflect that. This suggests an inefficiency in the PredictIt market. 