## Expected Turnout

To predict turnout we have a number of different models that we can use. In the cells below you can select a turnout model and tune the parameter for statewide expected turnout. 

---
Our different models are as follows:

1. `exp_16` - models precinct level turnout with the same turnout numbers as 2016
2. `exp_08` - models precinct level turnout with the same turnout numbers as 2008
3. `exp_avg` - models precinct level turnout with the average of the precinct level turnout numbers from 2016 and 2008
4. `exp_percent_16` - models precinct level turnout using the percent of statewide vote per precicnt in 2016 and a parameter for statewide turnout (`expected_statewide_turnout`) that can be adjusted.
5. `exp_percent_08` - models precinct level turnout using the percent of statewide vote per precicnt in 2008 and a parameter for statewide turnout (`expected_statewide_turnout`) that can be adjusted.
6. `exp_percent_avg` - models precinct level turnout using the average of the percents of statewide vote per precicnt in 2016 and 2008, and a parameter for statewide turnout (`expected_statewide_turnout`) that can be adjusted.7.
7. `overall_avg` - models precinct level turnout using the average of `exp_16`, `exp_08`, `exp_percent_16`, and `exp_percent_08`.

By default `expected_statewide_turnout` is set to 300,000.




In [129]:
def exp_16 (row):
    return row['count16']

def exp_08 (row):
    return row['count08']

def exp_avg (row):
    return (exp_16(row) + exp_08(row)) / 2

def exp_percent_16 (row):
    return (row['count16'] / statewide_turnout_16) * expected_statewide_turnout

def exp_percent_08 (row):
    return (row['count08'] / statewide_turnout_08) * expected_statewide_turnout

def exp_percent_avg (row):
    return (exp_percent_16(row) + exp_percent_08(row)) / 2

def overall_avg (row):
    return (exp_16(row) + exp_08(row) + exp_percent_16(row) + exp_percent_08(row)) / 4

In [130]:
### SET THE TURNOUT MODEL YOU WANT TO USE ###
turnout_model = overall_avg
### SET THE EXPECTED STATEWIDE TURNOUT IF YOUR MODEL DEPENDS ON IT ###
expected_statewide_turnout = 300000

## Viability

To predict viablity for EW we first predict how many caucus goers we expect to turnout for EW and then compare this number against the viability threshold calculated from our expected turnout.

To predict viability for other candidtes we assume if they have at least 70% of the viability threshold in a given precinct then that candidate will be viable.

In calculating 'Distance to Next Delegate' for single delegate precincts we use 50% + 1 as the distance and the viability threshold. This percent (50%) can be changed.

---
We predict the number of caucus goers for EW by weighting the number of 'Committed Warren' and 'Lean Warren' IDs we have. We than assume a flake rate for the weighted sum. The variables `committed_warren_weight` and `lean_warren_weight` can be set to adjust the weights of this prediction. The variable `flake_rate` can be set to adjust the flake rate (Note, the flake rate is set as the percent of people who do show up).

By default, `committed_warren_weight` is set to `0.9`, `lean_warren_weight` is set to `0.5`, and `flake_rate` is set to 0.85.

For other candidates, the 70% estimation can be changed by adjusting the `viability_percent` variable. 

The 50% threshold for single delgate precicnts can be set by adjusting the `single_delegate_viablity_percent` varaible.

In [131]:
### SET THE DESIRED VIABILITY WEIGHTS HERE ###
committed_warren_weight = 0.9
lean_warren_weight = 0.8
flake_rate = 0.85

viability_percent = 0.7

single_delegate_viablity_percent = 0.5

In [132]:
import numpy as np
import pandas as pd
import math
import civis
import json
import datetime

lee = [948327, 947318, 948329] 

In [133]:
# Setup API client
client = civis.APIClient()

# Import precinct data
to_drop = ['clinton_16', 'hubbell_18', 'clinton_hubbell_sum', 'reporting_multiplier','percent_of_statewide_vote']
sql = "SELECT * FROM analytics_ia.precinct_data"
precinct_data = civis.io.read_civis_sql(sql, "Warren for MA", use_pandas=True, client=client)
precinct_data.drop(to_drop, inplace=True, axis=1)

# Import first choice ID data
sql = "select van_precinct_id, survey_response_name, count(*) from analytics_ia.vansync_responses where mrr_all = 1 and survey_question_name = '1st Choice Caucus' group by 1,2"    
fc = civis.io.read_civis_sql(sql, "Warren for MA", use_pandas=True, client=client)
# Set dtype for columns to float
cols = fc.columns.drop('survey_response_name')
fc[cols] = fc[cols].astype(np.float32)
# Pivot on van_precinct_id
fc = fc.pivot(index='van_precinct_id', columns='survey_response_name', values='count')
# Reset dtype for columns to float
cols = fc.columns
fc[cols] = fc[cols].astype(np.float32)

# Import caucus history data 
sql = "SELECT van_precinct_id, SUM(case when caucus_attendee_2016 = 1 then 1 else 0 end) count16, SUM(case when caucus_attendee_2008 = 1 then 1 else 0 end) count08 FROM phoenix_caucus_history_ia.person_caucus_attendance ca LEFT JOIN phoenix_ia.person p ON ca.person_id = p.person_id GROUP BY van_precinct_id"
caucus_history = civis.io.read_civis_sql(sql, "Warren for MA", use_pandas=True, client=client)

# Import Organizer Turfs
sql = "select van_precinct_id, fo_name from vansync_ia.turf"
turfs = civis.io.read_civis_sql(sql, "Warren for MA", use_pandas=True, client=client)
turfs.set_index('van_precinct_id', inplace=True)

In [134]:
# Rename columns
precinct_data.rename(index=str, columns={"congressional_district": "Congressional District", 
                                    "precinct_id": "Precinct ID", 
                                    "county": "County",
                                    "precinct_code": "Precinct Code",
                                    "sos_precinct_name": "Sec. State Precinct Name",
                                    "delegates_to_county_conv": "Delegates to County Conv",
                                    "state_delegate_equivalence_sde": "State Delegate Equivalence (SDE)"}, inplace=True)

In [135]:
# Create new df of historical caucus data and precicnt data
df = pd.merge(precinct_data, caucus_history, left_on='Precinct ID', right_on='van_precinct_id')
# Set the index of the new df
df.set_index('Precinct ID', inplace=True)


In [136]:
# Add expected turnout to df
# First, calculate statewide turnouts for certain models
statewide_turnout_16 = caucus_history['count16'].sum()
statewide_turnout_08 = caucus_history['count08'].sum()
# Then apply selected model to each row
df['Expected Turnout'] = df.apply(turnout_model, axis=1)
df['Expected Turnout'] = df['Expected Turnout'].apply(lambda row: round(row, 0))
# Remove historical caucus data after turnout calculations are complete
columns = ['van_precinct_id', 'count16', 'count08']
df.drop(columns, inplace=True, axis=1)

In [137]:
def viability_threshold(num_del):
    if num_del == 0:
        return 0.0
    elif num_del == 1:
        return 0.0
    elif num_del == 2:
        return 0.25
    elif num_del == 3:
        return 1 / 6
    else:
        return 0.15

In [138]:
# Calculate SDE per person based on turnout model
df['SDE per Person'] = df.apply(lambda row : row['State Delegate Equivalence (SDE)'] / row['Expected Turnout'], axis=1)
df['SDE per Person'] = df['SDE per Person'].apply(lambda row: round(row, 3))
# Assign viablity thresholds based on the number of County Convention delegates and the turnout model
df['Viability Threshold'] = df.apply(lambda row : math.ceil(row['Expected Turnout'] * viability_threshold(row['Delegates to County Conv'])), axis=1).astype(np.float32)

In [139]:
# Add Committed Warren and Lean Warren from fc to new merged df
df = pd.merge(df, fc[['Committed Warren']], how="left", left_index=True, right_index=True)
df = pd.merge(df, fc[['Lean Warren']], how="left", left_index=True, right_index=True)
# Add Viability Threshold to fc
fc = pd.merge(fc, df[['Viability Threshold']], how='left', left_index=True, right_index=True)
fc.fillna(0, inplace=True)

In [140]:
# Calculate whether or not EW is viable for each precinct
df['Expected Warren Turnout'] = df.apply(lambda row: flake_rate * (row['Committed Warren'] * committed_warren_weight) + (row['Lean Warren'] * lean_warren_weight), axis=1)
df['Expected Warren Turnout'] = df['Expected Warren Turnout'].apply(lambda row: round(row, 0))
df['Warren Viable'] = df.apply(lambda row: row['Viability Threshold'] <= row['Expected Warren Turnout'] and row['Expected Warren Turnout'] != 0, axis=1)

In [142]:
# Calculate the number of other viable candidates in each precinct
to_drop = ['Committed Warren', 'Lean Warren', 'GOP', 'Other Dem', 'Undecided', 'Refused to say', 'Viability Threshold']
candidates = fc.columns.drop(to_drop)

def get_other_viable (row):
    num_viable = 0
    for candidate in candidates:
        if (row[candidate] >= viability_percent * row['Viability Threshold']) and (row[candidate] != 0):
            num_viable += 1
    return num_viable

fc['Other Viable Candidates'] = fc.apply(get_other_viable, axis=1)

# Calculate the total turnout across other viable candidates
def get_turnout (row):
    ID_turnout = 0
    viable_turnout = 0
    for candidate in candidates:
        ID_turnout += row[candidate]
        # If the candidate has IDs above the viablity threshold
        if (row[candidate]) >= row['Viability Threshold'] and (row[candidate] != 0):
            # Add the number of IDs to the expected ID turnout
            viable_turnout += row[candidate]
        # If the candidate has IDs above the viability percent 
        if (row[candidate] >= viability_percent * row['Viability Threshold']) and (row[candidate] != 0):
            # Add the viability threhold to the expected turnout
            viable_turnout += row['Viability Threshold']
    return ID_turnout, viable_turnout

# This is the raw turnout based on IDs
def get_ID_turnout (row):
    x, y = get_turnout (row)
    return x

# This is the adjusted turnout rounding up if a candidate has more than the viablity percent
# This only includes candidates we think will be viable
def get_viable_turnout (row):
    x, y = get_turnout (row)
    return y

fc['Partial ID Turnout'] = fc.apply(get_ID_turnout, axis=1)
fc['Other Candidates Viable Turnout'] = fc.apply(get_viable_turnout, axis=1)

df = pd.merge(df, fc[['Other Viable Candidates']], how='left', left_index=True, right_index=True)
df = pd.merge(df, fc[['Other Candidates Viable Turnout']], how='left', left_index=True, right_index=True)

# Calculate total ID turnout by adding ID turnout for other candidates to expected warren turnout
df = pd.merge(df, fc[['Partial ID Turnout']], how='left', left_index=True, right_index=True)
df['Total ID Turnout'] = df.apply(lambda row: row['Partial ID Turnout'] + row['Expected Warren Turnout'], axis=1)
df.fillna(0, inplace=True)

In [143]:
# Calculate expected number of warren delegates based on exprected warren turnout

def expected_dels (row):
    et = row['Expected Turnout']
    ew = row['Expected Warren Turnout']
    oc = row['Other Candidates Viable Turnout']
    id_turnout = ew + oc
    num_del = row['Delegates to County Conv']
    
    # Return 0 if not viable
    if (not row['Warren Viable']):
        return 0
    
    exp_del = num_del * (ew) / max(et, id_turnout)
    if (exp_del % 1) >= 0.5:
        return math.ceil(exp_del)
    else:
        return math.floor(exp_del)

df['Expected Warren Delegates'] = df.apply(expected_dels, axis=1)    

In [144]:
def distance_to_viability (row):
    et = row['Expected Turnout']
    ew = row['Expected Warren Turnout']
    oc = row['Other Candidates Viable Turnout']
    num_other = row['Other Viable Candidates']
    id_turnout = ew + oc
    num_del = row['Delegates to County Conv']
    
    n = row['Expected Warren Delegates'] + 0.5
    
    # Lee County
    if num_del == 0:
        return None
    # One delegate precinct
    if num_del == 1:
        return None
    # More than one delegate precicnt
    else:
        return math.ceil(row['Viability Threshold'] - ew)

df['Distance to Viability'] = df.apply(distance_to_viability, axis=1)      

In [145]:
def distance_to_next_delegate (row):
    et = row['Expected Turnout']
    ew = row['Expected Warren Turnout']
    oc = row['Other Candidates Viable Turnout']
    num_other = row['Other Viable Candidates']
    id_turnout = ew + oc
    num_del = row['Delegates to County Conv']
    
    n = row['Expected Warren Delegates'] + 0.5
    
    # Lee County
    if num_del == 0:
        return None
    
    # One delegate precinct
    elif num_del == 1:
        # Distance to 50% + 1 of expected turnout or id_turnout (whichever is higher)
        return math.ceil((max(et, id_turnout) * (0.5)) + 1 - ew)
    
    # More than one delegate precicnt
    else:
        
        # If we are not yet viable return the distance to viability
        if (not row['Warren Viable']):
            return math.ceil(row['Viability Threshold'] - ew)
        
        else:
        
            # When there are more viable candidates than there are delegates
            if (num_other + 1 > num_del):
                return -1

            # Calculate distance to next assuming expected turnout
            dist_to_15 = math.ceil(((n * et) - (num_del * ew)) / num_del)

            # If the total is still less than or equal to expected turnout
            if (id_turnout + dist_to_15 <= et):
                return dist_to_15

            # Otherwise calculate distance to next with id_turnout
            else:
                return math.ceil(((n * (ew + oc)) - (num_del * ew)) / (num_del - n))
            
df['Distance to Next Delegate'] = df.apply(distance_to_next_delegate, axis=1)      

In [146]:
# Drop columns only used for internal calculations
df.drop(['Other Candidates Viable Turnout', 'Partial ID Turnout'], inplace=True, axis=1)
# Add organizer turfs
df = pd.merge(df, turfs[['fo_name']], how='left', left_index=True, right_index=True)
# Sort columns
df.sort_values(['fo_name', 'Distance to Next Delegate', 'State Delegate Equivalence (SDE)'], inplace=True)

In [147]:
# Calculate county level sums
county_totals = df.groupby(['County']).sum()
# Drop unnessecary columns
to_drop = ['Congressional District', 'SDE per Person','Warren Viable', 'Other Viable Candidates']
county_totals.drop(to_drop, inplace=True, axis=1)
# Find county level counts
county_totals['Total Precincts']  = df.groupby('County').size()
county_totals['Viable Precincts']  = df.groupby('County')['Warren Viable'].apply(lambda x: x[x == True].count())
# Find county level means
county_means = df.groupby(['County']).mean()
# Rename means
county_means.rename(index=str, columns={"Viability Threshold": "Mean Viability Threshold", 
                                        "Distance to Next Delegate": "Mean Distance to Next Delegate",}, inplace=True)
# Merge means to totals
county_totals = pd.merge(county_totals, county_means[["Mean Viability Threshold", "Mean Distance to Next Delegate"]], how='left', left_index=True, right_index=True)

In [148]:
# Round data
df['State Delegate Equivalence (SDE)'] = df['State Delegate Equivalence (SDE)'].apply(lambda row: round(row, 3))

county_totals['State Delegate Equivalence (SDE)'] = county_totals['State Delegate Equivalence (SDE)'].apply(lambda row: round(row, 0))
county_totals['Mean Viability Threshold'] = county_totals['Mean Viability Threshold'].apply(lambda row: round(row, 3))

In [149]:
county_totals

Unnamed: 0_level_0,Delegates to County Conv,State Delegate Equivalence (SDE),Expected Turnout,Viability Threshold,Committed Warren,Lean Warren,Expected Warren Turnout,Total ID Turnout,Expected Warren Delegates,Distance to Viability,Distance to Next Delegate,Total Precincts,Viable Precincts,Mean Viability Threshold,Mean Distance to Next Delegate
County,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
Adair,51,4.0,473.0,73.0,0.0,4.0,0.0,0.0,0,73.0,73.0,5,0,14.600,14.600000
Adams,35,3.0,239.0,38.0,2.0,4.0,3.0,3.0,0,35.0,35.0,5,0,7.600,7.000000
Allamakee,90,7.0,835.0,131.0,6.0,18.0,14.0,16.0,0,117.0,117.0,11,0,11.909,10.636364
Appanoose,75,6.0,870.0,139.0,0.0,6.0,0.0,0.0,0,139.0,139.0,12,0,11.583,11.583333
Audubon,30,3.0,493.0,75.0,2.0,2.0,3.0,3.0,0,72.0,72.0,2,0,37.500,36.000000
Benton,200,15.0,1741.0,271.0,10.0,27.0,16.0,23.0,0,255.0,255.0,19,0,14.263,13.421053
Black Hawk,500,101.0,10116.0,1543.0,93.0,251.0,234.0,333.0,0,1309.0,1342.0,62,0,24.887,21.645161
Boone,100,19.0,2347.0,363.0,25.0,60.0,65.0,93.0,0,298.0,298.0,15,0,24.200,19.866667
Bremer,75,17.0,1722.0,261.0,8.0,6.0,5.0,8.0,0,256.0,271.0,13,0,20.077,20.846154
Buchanan,95,13.0,1492.0,232.0,2.0,16.0,5.0,9.0,0,227.0,227.0,15,0,15.467,15.133333


In [150]:
df

Unnamed: 0_level_0,Congressional District,County,Precinct Code,Sec. State Precinct Name,Delegates to County Conv,State Delegate Equivalence (SDE),Expected Turnout,SDE per Person,Viability Threshold,Committed Warren,Lean Warren,Expected Warren Turnout,Warren Viable,Other Viable Candidates,Total ID Turnout,Expected Warren Delegates,Distance to Viability,Distance to Next Delegate,fo_name
Precinct ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
947795,3,Pottawattamie,CB 13,CB 13,10,1.200,123.0,0.010,19.0,2.0,11.0,10.0,False,0.0,14.0,0,9.0,9.0,01A
947786,3,Pottawattamie,CB 04,CB 04,10,1.200,109.0,0.011,17.0,2.0,8.0,8.0,False,0.0,11.0,0,9.0,9.0,01A
1593735,3,Pottawattamie,CL 1,Carter Lake 1,7,0.840,84.0,0.010,13.0,1.0,3.0,3.0,False,0.0,6.0,0,10.0,10.0,01A
947797,3,Pottawattamie,CB 15,CB 15,11,1.320,101.0,0.013,16.0,1.0,6.0,6.0,False,0.0,11.0,0,10.0,10.0,01A
1593734,3,Pottawattamie,CB 21,CB 21,8,0.960,82.0,0.012,13.0,0.0,3.0,0.0,False,0.0,0.0,0,13.0,13.0,01A
1593736,3,Pottawattamie,CL 2,Carter Lake 2,7,0.840,92.0,0.009,14.0,0.0,2.0,0.0,False,0.0,0.0,0,14.0,14.0,01A
947783,3,Pottawattamie,CB 01,CB 01,9,1.080,88.0,0.012,14.0,0.0,2.0,0.0,False,0.0,0.0,0,14.0,14.0,01A
947796,3,Pottawattamie,CB 14,CB 14,10,1.200,93.0,0.013,14.0,0.0,0.0,0.0,False,0.0,0.0,0,14.0,14.0,01A
947799,3,Pottawattamie,CB 17,CB 17,15,1.800,100.0,0.018,15.0,0.0,5.0,0.0,False,0.0,0.0,0,15.0,15.0,01A
947785,3,Pottawattamie,CB 03,CB 03,10,1.200,103.0,0.012,16.0,0.0,1.0,0.0,False,0.0,0.0,0,16.0,16.0,01A
