# 2020 US Presidential Elections Winner Simulator
This project uses state-by-state predictions from [PredictIt](https://www.predictit.org/) to simulate many hypothetical elections to create an overall prediction: *how often Joe Biden will win the US election*.  

If you are unfamiliar with PredictIt, it is a prediction market where traders buy and sell shares tied to the outcomes of political events around the world. The price of a share reflects what traders believe is the likelihood of that event.

## Imports

In [52]:
import csv
import io
import json
import numpy
import urllib.request
import re

We need two sets of data: the number of electoral votes allocated to each state (plus the District of Columbia), and the market data from PredictIt.

## Download Electoral Votes Data
Get the number of electoral votes assigned to each state and the District of Columbia.

In [53]:
url_open = urllib.request.urlopen("https://raw.githubusercontent.com/peterhhchan/us-elections-2020/master/notebooks/PredictItSimulator/states.csv")
reader = csv.reader(io.TextIOWrapper(url_open, encoding = 'utf-8'), delimiter=',')

electoral_votes = {}
for v,n,a in reader:
    electoral_votes[a]=int(v)
    
print (electoral_votes)

{'CA': 55, 'TX': 38, 'FL': 29, 'NY': 29, 'IL': 20, 'PA': 20, 'OH': 18, 'GA': 16, 'MI': 16, 'NC': 15, 'NJ': 14, 'VA': 13, 'WA': 12, 'AZ': 11, 'IN': 11, 'MA': 11, 'TN': 11, 'MD': 10, 'MN': 10, 'MO': 10, 'WI': 10, 'AL': 9, 'CO': 9, 'SC': 9, 'KY': 8, 'LA': 8, 'CT': 7, 'OK': 7, 'OR': 7, 'AR': 6, 'IA': 6, 'KS': 6, 'MS': 6, 'NV': 6, 'UT': 6, 'NE': 5, 'NM': 5, 'WV': 5, 'HI': 4, 'ID': 4, 'ME': 4, 'NH': 4, 'RI': 4, 'AK': 3, 'DE': 3, 'MT': 3, 'ND': 3, 'SD': 3, 'VT': 3, 'WY': 3, 'DC': 3}
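
A quick sanity check on the parsed table is that the votes sum to 538, the size of the Electoral College. The sketch below assumes the same three-column row layout (votes, name, abbreviation) that the loop above unpacks. The inline rows and the `parse_electoral_votes` helper are hypothetical stand-ins for the downloaded file, not part of the original notebook.

```python
import csv
import io

def parse_electoral_votes(text):
    """Parse 'votes,name,abbreviation' rows into {abbreviation: votes}."""
    reader = csv.reader(io.StringIO(text), delimiter=',')
    return {abbr: int(votes) for votes, _name, abbr in reader}

# Hypothetical three-row sample in the assumed column order.
sample = "55,California,CA\n38,Texas,TX\n3,District of Columbia,DC\n"
sample_votes = parse_electoral_votes(sample)
print(sample_votes)

# With the full table loaded, the total should equal 538:
# assert sum(electoral_votes.values()) == 538
```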


## Download Market Data
Grab the latest share prices on PredictIt. Prices range from 1 cent to 99 cents (0.01 - 0.99). Roughly, a price of 1 cent implies Biden has a 1% chance of winning that state, and a price of 99 cents implies Biden has a 99% chance of winning that state.

In [54]:
prices = {}
markets = json.loads(urllib.request.urlopen("https://www.predictit.org/api/marketdata/all/").read())['markets']
for m in markets:
    sn = m['shortName']
    ## Raw string so the escaped question mark is passed to the regex engine as-is
    match = re.search(r"^Which party will win ([A-Z]{2})( in)? 2020\?$", sn)
    if match:
        for c in m['contracts']:
            if c['name'] == 'Democratic':
                prices[match[1]] = c['lastTradePrice']

print (prices)

{'WI': 0.75, 'PA': 0.72, 'FL': 0.57, 'MI': 0.79, 'AZ': 0.69, 'MN': 0.8, 'NH': 0.8, 'NC': 0.55, 'OH': 0.48, 'NV': 0.8, 'VA': 0.91, 'IA': 0.43, 'GA': 0.45, 'CO': 0.9, 'TX': 0.33, 'ME': 0.89, 'IN': 0.1, 'NM': 0.9, 'NJ': 0.95, 'MO': 0.16, 'OR': 0.93, 'UT': 0.08, 'TN': 0.07, 'CT': 0.95, 'AK': 0.23, 'KY': 0.06, 'MD': 0.95, 'MA': 0.97, 'AR': 0.07, 'WA': 0.95, 'MT': 0.14, 'SC': 0.2, 'CA': 0.95, 'NY': 0.94, 'IL': 0.94, 'WV': 0.05, 'OK': 0.04, 'LA': 0.08, 'ID': 0.04, 'NE': 0.07, 'AL': 0.06, 'KS': 0.12, 'MS': 0.1, 'RI': 0.96, 'HI': 0.96, 'WY': 0.04, 'VT': 0.96, 'DE': 0.96, 'ND': 0.04, 'SD': 0.07, 'DC': 0.98}
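
To see what the pattern above accepts, here is a small standalone sketch. The short names passed in are made up for illustration (they are not taken from the PredictIt API), and `state_from_short_name` is a hypothetical helper.

```python
import re

# Raw string so that "\?" reaches the regex engine unchanged.
STATE_MARKET = re.compile(r"^Which party will win ([A-Z]{2})( in)? 2020\?$")

def state_from_short_name(short_name):
    """Return the two-letter state code, or None if this is not a state market."""
    match = STATE_MARKET.search(short_name)
    return match[1] if match else None

# Hypothetical short names, for illustration only.
print(state_from_short_name("Which party will win PA in 2020?"))              # PA
print(state_from_short_name("Which party will win FL 2020?"))                 # FL
print(state_from_short_name("Who will win the 2020 presidential election?"))  # None
```

The optional `( in)?` group is what lets both phrasings match; anything that is not a two-letter uppercase code is skipped.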


## Calculate the Implied Probability
PredictIt charges a 10% fee on profits. A savvy bettor understands that, to break even on a 50-cent contract, the contract must win more than 52.6% of the time. Here we convert each price to its implied probability.

In [55]:
win_prob = {}
for s, price in prices.items():
    p = float (price)
    ## predictit charges a 10% fee on profits
    ## the net profit on a 50 cent contract is 45 cents
    ## the bettor must win 52.63%+ of the time to breakeven
    win_prob[s] = p / ((1 - p) * 0.9 + p)
print (win_prob)

{'WI': 0.7692307692307693, 'PA': 0.7407407407407407, 'FL': 0.5956112852664576, 'MI': 0.8069458631256384, 'AZ': 0.7120743034055727, 'MN': 0.8163265306122449, 'NH': 0.8163265306122449, 'NC': 0.5759162303664922, 'OH': 0.5063291139240507, 'NV': 0.8163265306122449, 'VA': 0.9182643794147326, 'IA': 0.45599151643690344, 'GA': 0.47619047619047616, 'CO': 0.9090909090909092, 'TX': 0.3536977491961415, 'ME': 0.8998988877654196, 'IN': 0.10989010989010989, 'NM': 0.9090909090909092, 'NJ': 0.9547738693467336, 'MO': 0.17467248908296942, 'OR': 0.9365558912386708, 'UT': 0.0881057268722467, 'TN': 0.07717750826901874, 'CT': 0.9547738693467336, 'AK': 0.24918743228602383, 'KY': 0.06622516556291391, 'MD': 0.9547738693467336, 'MA': 0.9729187562688064, 'AR': 0.07717750826901874, 'WA': 0.9547738693467336, 'MT': 0.15317286652078776, 'SC': 0.21739130434782605, 'CA': 0.9547738693467336, 'NY': 0.9456740442655935, 'IL': 0.9456740442655935, 'WV': 0.05524861878453039, 'OK': 0.04424778761061947, 'LA': 0.0881057268722467,
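
The conversion can be written as a small function, which makes the break-even reasoning explicit. This is a sketch under the same 10% fee assumption used above; the `implied_probability` name and the `fee` parameter are introduced here for illustration.

```python
def implied_probability(price, fee=0.1):
    """Break-even win probability for a contract bought at `price` (in dollars).

    A win pays (1 - price) before the fee, so the net profit is
    (1 - price) * (1 - fee); a loss costs `price`.
    Break-even for win probability q: q * net_profit = (1 - q) * price.
    """
    net_profit = (1 - price) * (1 - fee)
    return price / (net_profit + price)

print(round(implied_probability(0.50), 4))  # 0.5263
```

Solving the break-even equation for `q` gives `price / (net_profit + price)`, which is the same expression used in the cell above.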

## Adjust the Probabilities 
If you think the markets are incorrect, this section is where you could adjust the predictions.

Here we apply a global bias. If we think the markets are overvaluing Biden's chances by 25%, we would set our bias to `0.8` (that is, 1 / 1.25).

In [56]:
bias = 1
def apply_bias():
    for s, prob in win_prob.items():
        win_prob [s] = prob * bias
apply_bias()
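
Note that `apply_bias` modifies `win_prob` in place, so re-running the cell compounds the bias, and a bias above 1 can push a value past 1. A possible alternative, sketched here with a hypothetical `apply_bias_clipped` helper, returns a new dictionary and caps each result to the range 0 to 1. The clipping is an assumption added for this sketch, not something the notebook does.

```python
def apply_bias_clipped(probs, bias):
    """Return a new dict with each probability scaled by `bias`, capped to [0, 1]."""
    return {state: min(1.0, max(0.0, p * bias)) for state, p in probs.items()}

# Made-up probabilities, for illustration only.
example = {'FL': 0.6, 'CA': 0.95}
print(apply_bias_clipped(example, 0.8))   # both values scaled down
print(apply_bias_clipped(example, 1.25))  # CA would exceed 1 and is capped
```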

Next, we can smooth out our simulations by applying some assumptions to our predictions.

 * If the market prediction is above `win_threshold`, Biden will always win that state.
 * If the market prediction is below `lose_threshold`, Biden will always lose that state.

An interesting set of parameters to use is to set `win_threshold` to *0.8*, and `lose_threshold` to *0.5*.

In [57]:
win_threshold  = 0.95
lose_threshold = 0.05

In [58]:
def apply_thresholds():
    for s, prob in win_prob.items():
        if prob > win_threshold:   
            win_prob [s] = 1.0
        elif prob < lose_threshold: 
            win_prob [s] = 0

apply_thresholds() 
print (win_prob)

{'WI': 0.7692307692307693, 'PA': 0.7407407407407407, 'FL': 0.5956112852664576, 'MI': 0.8069458631256384, 'AZ': 0.7120743034055727, 'MN': 0.8163265306122449, 'NH': 0.8163265306122449, 'NC': 0.5759162303664922, 'OH': 0.5063291139240507, 'NV': 0.8163265306122449, 'VA': 0.9182643794147326, 'IA': 0.45599151643690344, 'GA': 0.47619047619047616, 'CO': 0.9090909090909092, 'TX': 0.3536977491961415, 'ME': 0.8998988877654196, 'IN': 0.10989010989010989, 'NM': 0.9090909090909092, 'NJ': 1.0, 'MO': 0.17467248908296942, 'OR': 0.9365558912386708, 'UT': 0.0881057268722467, 'TN': 0.07717750826901874, 'CT': 1.0, 'AK': 0.24918743228602383, 'KY': 0.06622516556291391, 'MD': 1.0, 'MA': 1.0, 'AR': 0.07717750826901874, 'WA': 1.0, 'MT': 0.15317286652078776, 'SC': 0.21739130434782605, 'CA': 1.0, 'NY': 0.9456740442655935, 'IL': 0.9456740442655935, 'WV': 0.05524861878453039, 'OK': 0, 'LA': 0.0881057268722467, 'ID': 0, 'NE': 0.07717750826901874, 'AL': 0.06622516556291391, 'KS': 0.13157894736842105, 'MS': 0.109890109
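
The thresholding rule can also be expressed as a function that leaves its input untouched, which makes it easy to try different thresholds side by side. This is a minimal sketch; `snap_to_thresholds` and the three example probabilities are hypothetical.

```python
def snap_to_thresholds(probs, win_threshold, lose_threshold):
    """Return a copy of `probs` with near-certain outcomes snapped to 1.0 or 0.0."""
    snapped = {}
    for state, p in probs.items():
        if p > win_threshold:
            snapped[state] = 1.0
        elif p < lose_threshold:
            snapped[state] = 0.0
        else:
            snapped[state] = p
    return snapped

# Made-up probabilities, for illustration only.
example = {'MA': 0.97, 'FL': 0.60, 'OK': 0.04}
print(snap_to_thresholds(example, 0.95, 0.05))
# {'MA': 1.0, 'FL': 0.6, 'OK': 0.0}
```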

Alternatively, you can override a single state's probability directly, e.g. `win_prob['FL'] = 0.5`, which means Biden has a 50% chance of winning Florida.

In [59]:
## Override Democrat's chances of winning
# win_prob['FL'] = 0.5
# win_prob['AZ'] = 0.5
# win_prob['NC'] = 0.5
# win_prob['WI'] = 0.5
# win_prob['MI'] = 0.5
# win_prob['PA'] = 0.5
# win_prob['MN'] = 0.5

## Prepare the Data
1. Arrange the data alphabetically

In [60]:
win_prob_sorted = sorted(win_prob.items())
print (win_prob_sorted)

[('AK', 0.24918743228602383), ('AL', 0.06622516556291391), ('AR', 0.07717750826901874), ('AZ', 0.7120743034055727), ('CA', 1.0), ('CO', 0.9090909090909092), ('CT', 1.0), ('DC', 1.0), ('DE', 1.0), ('FL', 0.5956112852664576), ('GA', 0.47619047619047616), ('HI', 1.0), ('IA', 0.45599151643690344), ('ID', 0), ('IL', 0.9456740442655935), ('IN', 0.10989010989010989), ('KS', 0.13157894736842105), ('KY', 0.06622516556291391), ('LA', 0.0881057268722467), ('MA', 1.0), ('MD', 1.0), ('ME', 0.8998988877654196), ('MI', 0.8069458631256384), ('MN', 0.8163265306122449), ('MO', 0.17467248908296942), ('MS', 0.10989010989010989), ('MT', 0.15317286652078776), ('NC', 0.5759162303664922), ('ND', 0), ('NE', 0.07717750826901874), ('NH', 0.8163265306122449), ('NJ', 1.0), ('NM', 0.9090909090909092), ('NV', 0.8163265306122449), ('NY', 0.9456740442655935), ('OH', 0.5063291139240507), ('OK', 0), ('OR', 0.9365558912386708), ('PA', 0.7407407407407407), ('RI', 1.0), ('SC', 0.21739130434782605), ('SD', 0.077177508269018

2. Create a `numpy` array of the win probabilities

In [61]:
ps_sorted = numpy.fromiter(dict(win_prob_sorted).values(), dtype=float)
print (ps_sorted)

[0.24918743 0.06622517 0.07717751 0.7120743  1.         0.90909091
 1.         1.         1.         0.59561129 0.47619048 1.
 0.45599152 0.         0.94567404 0.10989011 0.13157895 0.06622517
 0.08810573 1.         1.         0.89989889 0.80694586 0.81632653
 0.17467249 0.10989011 0.15317287 0.57591623 0.         0.07717751
 0.81632653 1.         0.90909091 0.81632653 0.94567404 0.50632911
 0.         0.93655589 0.74074074 1.         0.2173913  0.07717751
 0.07717751 0.35369775 0.08810573 0.91826438 1.         1.
 0.76923077 0.05524862 0.        ]


3. Create a `numpy` array of the electoral votes

In [62]:
vs_sorted = numpy.fromiter ((v for _,v in sorted(electoral_votes.items())), int)    
print (vs_sorted)

[ 3  9  6 11 55  9  7  3  3 29 16  4  6  4 20 11  6  8  8 11 10  4 16 10
 10  6  3 15  3  5  4 14  5  6 29 18  7  7 20  4  9  3 11 38  6 13  3 12
 10  5  3]


Combining steps 1-3 into a function

In [63]:
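def update_arrays():
    ## Rebind the module-level names; without `global` these would be discarded locals
    global win_prob_sorted, ps_sorted, vs_sorted
    win_prob_sorted = sorted(win_prob.items())
    ps_sorted = numpy.fromiter(dict(win_prob_sorted).values(), dtype=float)
    vs_sorted = numpy.fromiter ((v for _,v in sorted(electoral_votes.items())), int)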
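
The probability and vote arrays line up position by position only because both dictionaries hold exactly the same state codes. One way to make that assumption explicit is sketched below; `build_arrays` is a hypothetical helper that checks the keys and returns the arrays instead of relying on module-level names.

```python
import numpy

def build_arrays(win_prob, electoral_votes):
    """Return (states, probabilities, votes) in one shared alphabetical order."""
    if set(win_prob) != set(electoral_votes):
        raise ValueError("win_prob and electoral_votes cover different states")
    states = sorted(win_prob)
    ps = numpy.array([win_prob[s] for s in states], dtype=float)
    vs = numpy.array([electoral_votes[s] for s in states], dtype=int)
    return states, ps, vs

# Made-up two-state input, deliberately given in different key orders.
states, ps, vs = build_arrays({'TX': 0.35, 'CA': 1.0}, {'CA': 55, 'TX': 38})
print(states, ps, vs)
```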

## Create and Run the Simulation
1. Generate a bunch of random numbers, and compare the numbers to the implied probabilities
2. If the implied probability is greater than the random number, give Biden the number of votes that corresponds to that state
3. Sum up all the votes and see if that number is bigger than 269
4. Run steps 1-3 repeatedly and calculate how often Biden wins

In [64]:
def simulate(n):
    update_arrays()
    
    rs = numpy.random.rand(n, len(win_prob_sorted))
    sims_won = 0
    for r in rs:
        wins = numpy.greater(ps_sorted , r)
        total_votes_won = numpy.sum(numpy.multiply(wins, vs_sorted))
        if total_votes_won > 269: ## Tie breaks go to republicans
            sims_won+=1

    return sims_won / n
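
The same four steps can be done without the Python loop by letting `numpy` broadcast the comparison across all simulated elections at once. This is a sketch, not the notebook's implementation: `simulate_vectorized` and its `seed` parameter are hypothetical, and it takes the arrays as arguments rather than reading the module-level ones.

```python
import numpy

def simulate_vectorized(ps, vs, n, seed=None):
    """Estimate how often the summed votes exceed 269 over n simulated elections."""
    ps = numpy.asarray(ps, dtype=float)
    vs = numpy.asarray(vs, dtype=int)
    rng = numpy.random.default_rng(seed)
    draws = rng.random((n, len(ps)))      # one row per simulated election
    wins = draws < ps                     # True where the state is won
    totals = (wins * vs).sum(axis=1)      # electoral votes won in each election
    return float(numpy.mean(totals > 269))

# Toy check with made-up states: a certain win of 270 votes always clears 269.
print(simulate_vectorized([1.0, 0.0], [270, 268], 1000))  # 1.0
```

As in the loop version, a tie at 269 is not counted as a win.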

Run the simulation!

In [65]:
simulate (100000)

0.98289

## Limitations
1. The biggest problem with this simulator is that it treats the individual results as independent events, which they are not.  

2. Nebraska and Maine use the congressional district method to assign their electoral votes. In our simulations, however, their votes are assigned winner-take-all.
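
To illustrate the first limitation, one common way to introduce dependence between states is to add a shared "national" shock to every state's draw while keeping each state's individual win probability unchanged (a Gaussian copula). The sketch below is only an illustration: `simulate_correlated` is a hypothetical function, and the correlation parameter `rho` is an arbitrary assumption, not something estimated from data.

```python
import numpy
from statistics import NormalDist

def simulate_correlated(ps, vs, n, rho=0.5, seed=None):
    """Like the independent simulation, but states share a common shock.

    rho = 0 reproduces independent states; rho = 1 makes all states move together.
    """
    vs = numpy.asarray(vs, dtype=int)
    rng = numpy.random.default_rng(seed)
    inv = NormalDist().inv_cdf
    # Convert each probability to a cutoff on the standard normal scale.
    cutoffs = numpy.array([
        -numpy.inf if p <= 0 else numpy.inf if p >= 1 else inv(p) for p in ps
    ])
    common = rng.standard_normal((n, 1))         # shared national shock
    own = rng.standard_normal((n, len(cutoffs))) # state-specific noise
    z = numpy.sqrt(rho) * common + numpy.sqrt(1 - rho) * own
    wins = z < cutoffs                           # each state is still won with probability p
    totals = (wins * vs).sum(axis=1)
    return float(numpy.mean(totals > 269))

# Two made-up toss-up states that must both be won to reach 270.
print(simulate_correlated([0.5, 0.5], [135, 135], 20000, rho=0.0, seed=0))
print(simulate_correlated([0.5, 0.5], [135, 135], 20000, rho=1.0, seed=0))
```

With independent states both are won about a quarter of the time; with fully correlated states they are won together about half the time, which shows how much the independence assumption can move the final number.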

## Further Reading
https://www.270towin.com/