<a href="https://colab.research.google.com/github/ellenwterry/PoliticalAnalysis/blob/main/Models_in_Campaign_Planning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Campaigning in a Period of Polarization and Confounder Issues**

Steve Schmidt's post "The Fever Will Break" (https://substack.com/@steveschmidt) suggested that there is some potential for softening in the current state of polarization. It would be nice to get back to debating real policy again, but whether or not that happens, the campaigns that hurdle the primaries will need to navigate positioning in a polarized general election. And while some districts get a high Cook score, the key to overall success lies in the swing districts.

In these scenarios, the persuasion universe gets squeezed and elections will tighten up. So, the game becomes supporting GOTV with the base *(especially on the Dem side, with structural disadvantages)* **AND** managing votes in the margins *(an area where most campaigns are just not very comfortable)*.

Additionally, there are significant confounders at play. Voters turned out in force in the Ohio special election, making a significant statement about the reproductive rights issue *(note: L2-Data research found strong evidence of Republican crossover voters)*. Meanwhile, Republicans are cultivating *"Religious Freedom", Crime, "Parental Rights" and Economy* voters, enabled, as usual, by broadcast networks and religious denominations.

So, campaigns with limited resources are walking into a minefield with a lot of questions. I'm here to say that I don't have any answers, but I do have a few suggestions on how to frame the questions.

Here's the scenario: Let's suppose that we're working on a campaign in the general in a tight swing district. Let's also suppose that, in spite of how pleasant it is to talk to our supportive base, we realize that we're going to have to win some swing votes to win the election. Let's also suppose that, like most campaigns, we have good data on our base, and not-so-good data on independents and opposition *(you know, the ones we need to win)*. So the first questions are: where are those voters, how do we talk to them, what do we talk about, and will they show up and vote for us if we spend the effort?

First, where do we get the data? The party is going to feed you great data with voter demos and history, but where do we get the voters on the margin? There are pay services *(e.g., L2, TargetSmart)*, and the campaign will have to decide whether or not to spend money on that resource. The other option is to begin with Secretary of State (SOS) and Census data and do the legwork yourself (keeping resource requirements in mind).

This scenario assumes that you get data from the party, integrate that with SOS and Census data and other sources, and survey that base to get the issues and direction you need. Then you get your ass out there and do the work.  











In [None]:
# Install the packages that this notebook pip-installs on Colab
!pip install nest-asyncio pystan corner geopy pygris

# PyStan 3 uses asyncio; nest_asyncio lets it run inside a notebook's event loop
import nest_asyncio
nest_asyncio.apply()

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

import patsy
from sklearn.linear_model import LogisticRegression
import stan

import plotly.express as px
import plotly.graph_objects as go

import geopandas as gpd
import folium
from geopy.geocoders import Nominatim
from pygris import core_based_statistical_areas, tracts

from google.colab import files


In [2]:
url = 'https://raw.githubusercontent.com/ellenwterry/PoliticalAnalysis/main/BaseVote.csv'
VoteBase = pd.read_csv(url)

In [3]:
# Clean up the data a bit: recode each categorical column as integers

codes = {'NR':0, 'M': 1, 'F': 2}
VoteBase['Sex'] = VoteBase['Sex'].map(codes)

VoteBase['Age'] = VoteBase.Age.astype('int32')

codes = {'NR':0, 'R': 1, 'D':2}
VoteBase['LastPrimary'] = VoteBase['LastPrimary'].map(codes)

codes = {'NR':0, 'HS': 1, 'Some College':2, 'Bachelor':3, 'Masters':4, 'Doctorate':5}
VoteBase['Education'] = VoteBase['Education'].map(codes)

codes = {'NR':0, 'Under 50k': 1, '50k-100k':2, '100k-200k':3, '200k-300k':4, '300k-500k':5, 'Over 500k':6}
VoteBase['HHIncome'] = VoteBase['HHIncome'].map(codes)

codes = {'NR':0,'Protestant': 1, 'Catholic':2, 'Other':3, 'None':4}
VoteBase['ReligiousAffil'] = VoteBase['ReligiousAffil'].map(codes)

# Target: 1 = supports D, 0 = supports R
codes = {'R':0, 'D': 1}
VoteBase['Support24'] = VoteBase['Support24'].map(codes)
# NOTE: NAs were excluded from sample so that algorithms could score using logistic scale - 2nd pass will use imputed values

codes = {'NR':0, 'RFree':1, 'Crime':2, 'Parents':3, 'Economy':4, 'Womens':5, 'Education':6, 'Democracy':7}
VoteBase['TopIssue'] = VoteBase['TopIssue'].map(codes)

# This is for the second data source (later)
codes = {'NR':0, 'Signed':1}
VoteBase['RRPetition'] = VoteBase['RRPetition'].map(codes)
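
One hazard in the recoding above: `Series.map` with a dict quietly turns any value that is missing from the dict into NaN. A minimal sketch of a guard, using a hypothetical helper `map_codes` and a toy Series rather than the real file:

```python
import pandas as pd

def map_codes(series, codes):
    """Map categories to integer codes; fail loudly if anything is left unmapped."""
    mapped = series.map(codes)
    # Anything that is NaN after mapping was either missing or not in `codes`
    bad = series[mapped.isna()].unique()
    if len(bad) > 0:
        raise ValueError(f"Unmapped values in {series.name}: {list(bad)}")
    return mapped.astype("int64")

# Toy data (not from BaseVote.csv)
sex = pd.Series(["M", "F", "NR", "F"], name="Sex")
sex_codes = {"NR": 0, "M": 1, "F": 2}
encoded = map_codes(sex, sex_codes)
print(encoded.tolist())
```

Whether to raise, or to send unmapped values to the `NR` code, is a judgment call that depends on how clean the source file is.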

In [5]:
yDf = VoteBase['Support24']
Xmatrix = patsy.dmatrix('Age + Sex + Education + HHIncome+ ReligiousAffil + LastPrimary + TopIssue', VoteBase)
rows = Xmatrix.shape[0]
columns = Xmatrix.shape[1]
columns, rows

(8, 2498)
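
The coefficient vectors later in the notebook are indexed by position, so it helps to know which column is which. The sketch below assumes patsy puts the intercept first and keeps the numeric terms in formula order; `column_names` and `col_index` are illustrative names, and `Xmatrix.design_info.column_names` is the place to confirm the order on the real matrix.

```python
# Assumed column order for the formula
# 'Age + Sex + Education + HHIncome + ReligiousAffil + LastPrimary + TopIssue'
column_names = ["Intercept", "Age", "Sex", "Education", "HHIncome",
                "ReligiousAffil", "LastPrimary", "TopIssue"]

# Position of each coefficient in the beta vector (0-based, as NumPy indexes it)
col_index = {name: i for i, name in enumerate(column_names)}

for name, i in col_index.items():
    # Stan numbers the same coefficients from 1 (beta.1 ... beta.8)
    print(f"beta[{i}] (Stan: beta.{i + 1}) -> {name}")
```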

In [6]:
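# scikit-learn logistic regression, just for comparison with the Stan model below.
# Note: LogisticRegression applies L2 regularization by default (C=1.0) and fits its
# own intercept in addition to the intercept column that patsy adds to Xmatrix.
model = LogisticRegression(max_iter = 100000)
model.fit(Xmatrix, yDf)
VoteBase['SciPyModelPred'] = model.predict(Xmatrix)
VoteBase['SciPyProb'] = model.predict_proba(Xmatrix)[:,1]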

In [7]:
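stanMod = """
data {
  int N_train;                  // number of voters
  int K;                        // number of design-matrix columns
  // Array declarations use the current syntax; the older form
  // 'int y_train[N_train]' is no longer accepted by recent Stan releases.
  array[N_train] int y_train;   // 1 = supports D, 0 = supports R
  matrix[N_train, K] x_train;
  array[K] real p_b;            // prior means
  array[K] real p_sb;           // prior standard deviations
}
parameters {
  vector[K] beta;
}
transformed parameters {
  vector[N_train] y_hat;        // linear predictor on the logit scale
  for(n in 1:N_train)
    y_hat[n] = x_train[n]*beta;
}
model {
  target += normal_lpdf(beta | p_b, p_sb);
  target += bernoulli_lpmf(y_train | inv_logit(y_hat));
}
"""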

In [8]:
# Create uninformed priors: Normal(0, 2) on each of the 8 coefficients

priors = np.repeat(0, 8)
priorsSigma = np.repeat(2, 8)

mData = {
         "N_train": rows,
         "K": np.shape(Xmatrix)[1],
         "y_train": np.array(yDf),
         "x_train": Xmatrix,
          "p_b": priors,
          "p_sb": priorsSigma,
         }
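
A Normal(0, 2) prior lives on the log-odds scale, so it is worth a quick look at what it allows. A sketch using only the standard library; the two-standard-deviation band is just my choice for illustration:

```python
import math

prior_mean, prior_sd = 0.0, 2.0

# Roughly 95% of the prior mass lies within two standard deviations
low, high = prior_mean - 2 * prior_sd, prior_mean + 2 * prior_sd

# Convert log-odds to odds ratios per one-unit change in a predictor
or_low, or_high = math.exp(low), math.exp(high)

print(f"log-odds band:   [{low}, {high}]")
print(f"odds-ratio band: [{or_low:.3f}, {or_high:.1f}]")
```

That band is wide for a one-step change in a coded demographic, and wider still for Age, where one unit is a single year. In practice it means the data, not the prior, drive this first fit.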

In [None]:
# Compile the Model
postr = stan.build(stanMod, data = mData, random_seed = 1)

In [None]:
# Multiply prior and data - sample the posterior
fit = postr.sample(num_chains=4, num_samples=1000)
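
What the sampler explores is the sum of the two `target +=` terms in the model: an independent normal log prior on each coefficient plus a Bernoulli log-likelihood on the logit scale. The sketch below evaluates that same quantity in NumPy on made-up data; `log_posterior` and the toy arrays are illustrative and not part of the notebook.

```python
import numpy as np

def log_posterior(beta, X, y, prior_mean, prior_sd):
    """Log prior + log likelihood for a logistic regression."""
    # Normal log density of each coefficient under its prior, summed
    z = (beta - prior_mean) / prior_sd
    log_prior = np.sum(-0.5 * z**2 - np.log(prior_sd) - 0.5 * np.log(2 * np.pi))
    # Bernoulli log likelihood with p = inv_logit(eta), written stably:
    # log p = -log(1 + exp(-eta)) and log(1 - p) = -log(1 + exp(eta))
    eta = X @ beta
    log_lik = np.sum(-y * np.logaddexp(0, -eta) - (1 - y) * np.logaddexp(0, eta))
    return log_prior + log_lik

# Toy data: an intercept column plus one predictor
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0, 1, 1])
beta = np.array([-0.5, 1.0])

lp = log_posterior(beta, X, y, prior_mean=np.zeros(2), prior_sd=np.full(2, 2.0))
print(lp)
```

Tightening `prior_sd` makes the first term dominate, which is the mechanism behind the informed-prior runs later on.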

In [11]:
# Get parameters: posterior mean and standard deviation of each coefficient.
# They are stored as priors/priorsSigma because later runs reuse them as priors.
df = fit.to_frame()
betaCols = [f'beta.{k}' for k in range(1, 9)]
priors = df[betaCols].mean().to_numpy()
priorsSigma = df[betaCols].std().to_numpy()

def stable_sigmoid(x):
  # Select the form of the logistic function that is accurate for each sign of x.
  # np.where evaluates both branches, so NumPy may still emit overflow warnings.
  return np.where(x >= 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))

# Create inferred votes from the posterior-mean coefficients
stanProb = stable_sigmoid(np.asarray(Xmatrix) @ priors)

def updateInferredVote(x):
  # Count the voter as a supporter (1) when the predicted probability is at least .5
  return 1 if x >= .5 else 0

VoteBase['StanProb'] = stanProb
VoteBase['StanVote'] = VoteBase['StanProb'].apply(updateInferredVote)
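
The cell above plugs the posterior mean of `beta` into the sigmoid. Another option is to compute a probability for every posterior draw and average those, which carries the coefficient uncertainty into each voter's score. A sketch on simulated draws; the arrays are stand-ins, and with the real fit the draws would come from the `beta.*` columns of `fit.to_frame()`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: 4000 posterior draws of 3 coefficients, and 5 voters
draws = rng.normal(loc=[-1.0, 0.5, 0.2], scale=[0.3, 0.1, 0.1], size=(4000, 3))
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Plug-in: sigmoid of the linear predictor at the posterior mean
plug_in = sigmoid(X @ draws.mean(axis=0))

# Posterior-averaged: sigmoid for every draw, then average over draws
averaged = sigmoid(draws @ X.T).mean(axis=0)

# Vectorized equivalent of the >= .5 rule used for StanVote
votes = (averaged >= 0.5).astype(int)
print(np.round(plug_in, 3), np.round(averaged, 3), votes)
```

With tight posteriors the two scores are nearly identical; they drift apart as the posterior widens, mostly for voters far from .5.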


In [12]:
priors

array([-4.17153888, -0.00773496,  0.11916086,  0.08832741,  0.78382778,
        0.5216264 ,  0.28251111,  0.20040936])
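
Coefficients on the logit scale are easier to talk about as odds ratios. The sketch below exponentiates the posterior means printed above. The labels assume patsy's intercept-first, formula-order layout, and because the predictors are integer codes, each ratio is "per one step in the code", which only reads naturally where the coding is ordinal.

```python
import numpy as np

# Assumed column order (check Xmatrix.design_info.column_names)
labels = ["Intercept", "Age", "Sex", "Education", "HHIncome",
          "ReligiousAffil", "LastPrimary", "TopIssue"]

# Posterior means copied from the output above
post_mean = np.array([-4.17153888, -0.00773496, 0.11916086, 0.08832741,
                      0.78382778, 0.5216264, 0.28251111, 0.20040936])

# exp(beta) is the multiplicative change in odds per one-unit increase
odds_ratios = np.exp(post_mean)

for name, b, o in zip(labels, post_mean, odds_ratios):
    print(f"{name:15s} beta={b: .3f}  odds ratio={o:.3f}")
```

For example, the coefficient of about 0.78 at index 4 corresponds to an odds ratio of roughly 2.2 per step.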

In [14]:
# Visualize impact of religious affiliation
fig = px.box(VoteBase, x='ReligiousAffil', y='StanProb', points="all")
fig.update_layout(
    width=1000,
    height=600,
    plot_bgcolor = "white",
    xaxis = dict(
        tickmode = 'array',
        tickvals = [0, 1, 2, 3, 4],
        ticktext = ['NR', 'Protestant', 'Catholic', 'Other', 'None']
    )
)
fig.show()

In [15]:
# ... and of top issue
# TODO: update for added dimensions

fig = px.box(VoteBase, x='TopIssue', y='StanProb', points="all")
fig.update_layout(
    width=1000,
    height=600,
    plot_bgcolor = "white",
    xaxis = dict(
        tickmode = 'array',
        tickvals = [0, 1, 2, 3, 4, 5, 6, 7],
        ticktext = ['NR', 'RFree', 'Crime', 'Parents', 'Economy', 'Womens', 'Education', 'Democracy']
    )

)
fig.show()



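Let's change the prior on ReligiousAffil and see what happens.

In [16]:
Ipriors = priors.copy()
# NOTE: Ipriors is indexed by design-matrix column. If patsy keeps the formula order
# (Intercept, Age, Sex, Education, HHIncome, ReligiousAffil, LastPrimary, TopIssue),
# index 2 is the Sex coefficient and ReligiousAffil is index 5. The outputs below
# were produced with index 2.
Ipriors[2] = 0.4

# Tight prior standard deviation on every coefficient
IpriorsSigma = np.repeat(.02, 8)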

Discuss exogenous factors and awareness of correlated independent variables.

In [17]:
#priors, priorsSigma
Ipriors, IpriorsSigma


(array([-4.17153888, -0.00773496,  0.4       ,  0.08832741,  0.78382778,
         0.5216264 ,  0.28251111,  0.20040936]),
 array([0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02]))

In [18]:
mData = {
         "N_train": rows,
         "K": np.shape(Xmatrix)[1],
         "y_train": np.array(yDf),
         "x_train": Xmatrix,
          "p_b": Ipriors,
          "p_sb": IpriorsSigma,
         }


In [None]:
# Compile the Model
postr = stan.build(stanMod, data = mData, random_seed = 1)

In [None]:
# Multiply prior and data - sample the posterior
fit = postr.sample(num_chains=4, num_samples=1000)

In [21]:
# Get parameters from the informed-prior fit
df = fit.to_frame()
betaCols = [f'beta.{k}' for k in range(1, 9)]
Ipriors = df[betaCols].mean().to_numpy()
IpriorsSigma = df[betaCols].std().to_numpy()

# Create inferred votes (stable_sigmoid and updateInferredVote are defined in the earlier cell)
IstanProb = stable_sigmoid(np.asarray(Xmatrix) @ Ipriors)

VoteBase['IStanProb'] = IstanProb
VoteBase['IStanVote'] = VoteBase['IStanProb'].apply(updateInferredVote)
Ipriors

array([-4.1754757 , -0.01344554,  0.38783849,  0.08312154,  0.77561788,
        0.51691695,  0.28239639,  0.20167023])
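
Why does the coefficient at index 2 land at about 0.388, so close to the 0.4 prior? With a prior standard deviation of 0.02, the prior carries far more precision than the data. Here is a rough check that treats the first fit (mean 0.119, standard deviation 0.128) as a Gaussian summary of what the data say and ignores correlation between coefficients. It is an approximation, not what Stan computes.

```python
import math

# Weak-prior fit for the coefficient at index 2 (from the first run)
data_mean, data_sd = 0.11916086, 0.12841
# Informed prior set before the second run
prior_mean, prior_sd = 0.4, 0.02

# Precision = 1 / variance; the combined mean is the precision-weighted average
w_data, w_prior = 1 / data_sd**2, 1 / prior_sd**2
approx_mean = (w_data * data_mean + w_prior * prior_mean) / (w_data + w_prior)
approx_sd = math.sqrt(1 / (w_data + w_prior))

print(f"approx posterior mean {approx_mean:.3f}, sd {approx_sd:.4f}")
```

The approximation gives a mean near 0.39 and a standard deviation near 0.02, in line with the sampled values. The prior is doing nearly all of the work here, so the choice of 0.4 deserves real justification.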

In [22]:
# Predictions:
VoteBase['StanVote'].sum(), VoteBase['IStanVote'].sum()

(1287, 1300)

Now let's add some new data

In [23]:
Xmatrix = patsy.dmatrix('Age + Sex + Education + HHIncome+ ReligiousAffil + LastPrimary + TopIssue + RRPetition', VoteBase)

rows = Xmatrix.shape[0]
columns = Xmatrix.shape[1]
columns, rows

(9, 2498)

In [24]:
Ipriors

array([-4.1754757 , -0.01344554,  0.38783849,  0.08312154,  0.77561788,
        0.51691695,  0.28239639,  0.20167023])

In [25]:
# Previous posterior means/sds as priors, plus an informed prior on the new RRPetition coefficient
priors2 = np.append(Ipriors[0:8], .4)
priorsSigma2 = np.append(IpriorsSigma[0:8], .02)

In [26]:
Ipriors, priors2

(array([-4.1754757 , -0.01344554,  0.38783849,  0.08312154,  0.77561788,
         0.51691695,  0.28239639,  0.20167023]),
 array([-4.1754757 , -0.01344554,  0.38783849,  0.08312154,  0.77561788,
         0.51691695,  0.28239639,  0.20167023,  0.4       ]))

In [27]:
mData = {
         "N_train": rows,
         "K": np.shape(Xmatrix)[1],
         "y_train": np.array(yDf),
         "x_train": Xmatrix,
          "p_b": priors2,
          "p_sb": priorsSigma2,
         }

In [None]:
postr = stan.build(stanMod, data = mData, random_seed = 1)

In [None]:
fit = postr.sample(num_chains=4, num_samples=1000)

In [30]:
# Get parameters from the fit that includes RRPetition (9 coefficients)
df = fit.to_frame()
betaCols = [f'beta.{k}' for k in range(1, 10)]
priors2 = df[betaCols].mean().to_numpy()
priorsSigma2 = df[betaCols].std().to_numpy()

# Create inferred votes (stable_sigmoid and updateInferredVote are defined in the earlier cell)
stanProb2 = stable_sigmoid(np.asarray(Xmatrix) @ priors2)

VoteBase['StanProb2'] = stanProb2
VoteBase['StanVote2'] = VoteBase['StanProb2'].apply(updateInferredVote)

In [31]:
priors, Ipriors, priors2

(array([-4.17153888, -0.00773496,  0.11916086,  0.08832741,  0.78382778,
         0.5216264 ,  0.28251111,  0.20040936]),
 array([-4.1754757 , -0.01344554,  0.38783849,  0.08312154,  0.77561788,
         0.51691695,  0.28239639,  0.20167023]),
 array([-4.17952377, -0.01324794,  0.37554212,  0.07952228,  0.77222776,
         0.51387094,  0.28271033,  0.20110105,  0.40622553]))

In [32]:
priors = np.append(priors, np.nan)
Ipriors = np.append(Ipriors, np.nan)

priorsSigma = np.append(priorsSigma, np.nan)
IpriorsSigma = np.append(IpriorsSigma, np.nan)

priorsDF = pd.DataFrame({"priors": priors, "priorsSigma": priorsSigma, "Ipriors": Ipriors, "IpriorsSigma": IpriorsSigma, "priors2": priors2, "priorsSigma2": priorsSigma2})

priorsDF


|   | priors | priorsSigma | Ipriors | IpriorsSigma | priors2 | priorsSigma2 |
|---|---|---|---|---|---|---|
| 0 | -4.171539 | 0.317584 | -4.175476 | 0.02025 | -4.179524 | 0.019949 |
| 1 | -0.007735 | 0.003466 | -0.013446 | 0.001616 | -0.013248 | 0.001153 |
| 2 | 0.119161 | 0.12841 | 0.387838 | 0.019827 | 0.375542 | 0.0191 |
| 3 | 0.088327 | 0.057676 | 0.083122 | 0.018352 | 0.079522 | 0.016779 |
| 4 | 0.783828 | 0.048032 | 0.775618 | 0.016866 | 0.772228 | 0.014355 |
| 5 | 0.521626 | 0.064354 | 0.516917 | 0.018968 | 0.513871 | 0.017828 |
| 6 | 0.282511 | 0.106926 | 0.282396 | 0.019421 | 0.28271 | 0.019212 |
| 7 | 0.200409 | 0.035263 | 0.20167 | 0.01639 | 0.201101 | 0.014491 |
| 8 | NaN | NaN | NaN | NaN | 0.406226 | 0.020187 |


In [33]:
VoteBase['StanVote'].sum(), VoteBase['IStanVote'].sum(), VoteBase['StanVote2'].sum()

(1287, 1300, 1309)

In [34]:
fig = go.Figure()
fig.add_trace(go.Histogram(x=VoteBase['StanProb'], marker_color='#DEDCDC', opacity=0.05, name = "Uninformed Priors"))
fig.add_trace(go.Histogram(x=VoteBase['IStanProb'], marker_color='#94CDD7', opacity=0.1, name = "Balance Priors"))
fig.add_trace(go.Histogram(x=VoteBase['StanProb2'], marker_color='#378796', opacity=1, name = "Petition Data"))
fig.update_layout(barmode='overlay')
fig.update_traces(opacity=0.9, nbinsx=100)
fig.update_layout(
    autosize=False,
    width=800,
    height=400,
    plot_bgcolor = "white",
)
fig.show()

In [35]:
bins = [0, .4, .5, .6, .7, .8, .9, 1]
labels = ['None', 'Grp 1', 'Grp 2', 'Grp 3', 'Grp 4', 'Grp 5', 'Grp 6']
VoteBase['Group'] = pd.cut(x = VoteBase['StanProb2'], bins = bins, labels = labels, include_lowest = False)
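
A note on the bin edges: `pd.cut` builds right-closed intervals by default, so a score of exactly .5 falls in `(.4, .5]` and is labeled 'Grp 1', and with `include_lowest=False` a score of exactly 0 gets no group at all. A small check on toy probabilities (the values are made up to sit on and between the edges):

```python
import pandas as pd

bins = [0, .4, .5, .6, .7, .8, .9, 1]
labels = ['None', 'Grp 1', 'Grp 2', 'Grp 3', 'Grp 4', 'Grp 5', 'Grp 6']

# Toy probabilities chosen to sit on and between bin edges
probs = pd.Series([0.0, 0.4, 0.45, 0.5, 0.55, 0.95, 1.0])
groups = pd.cut(x=probs, bins=bins, labels=labels, include_lowest=False)

print(pd.DataFrame({"prob": probs, "group": groups}))
```

So 'Grp 1' holds voters the model scores just below or at the .5 line, while 'Grp 2' and 'Grp 3' are the lean-support bands above it.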

In [36]:
VoteBase.to_csv('VoteBaseCompare.csv')
files.download("VoteBaseCompare.csv")
priorsDF.to_csv("PriorsDF.csv")
files.download("PriorsDF.csv")


Now it's time for some action: let's go talk to the right people.

In [37]:
Group1 = VoteBase.loc[(VoteBase['Group'] == 'Grp 1')]
Group2 = VoteBase.loc[(VoteBase['Group'] == 'Grp 2')]
Group3 = VoteBase.loc[(VoteBase['Group'] == 'Grp 3')]

Group1 = Group1.reset_index()
Group2 = Group2.reset_index()
Group3 = Group3.reset_index()

In [46]:
fig = go.Figure()
fig.add_trace(go.Histogram(x=Group1['StanProb2'], marker_color='#F47811', opacity=0.05, name = "Group 1"))
fig.add_trace(go.Histogram(x=Group2['StanProb2'], marker_color='#19F411', opacity=0.1, name = "Group 2"))
fig.add_trace(go.Histogram(x=Group3['StanProb2'], marker_color='#1177F4', opacity=1, name = "Group 3"))
fig.update_layout(barmode='overlay')
fig.update_traces(opacity=0.9, nbinsx=10)
fig.update_layout(
    autosize=False,
    width=600,
    height=800,
    plot_bgcolor = "white",
)
fig.show()

In [58]:
# Plot each group's voters on a map, one color per group
m = folium.Map(location=[41.0, -73.5], zoom_start=12)
for group, color in [(Group1, "orange"), (Group2, "green"), (Group3, "blue")]:
  for lat, lon in zip(group['Latitude'], group['Longitude']):
    folium.CircleMarker([lat, lon], radius = 1, weight = 2, color = color).add_to(m)
m
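
The map center above is hard-coded. If the district changes, one option is to center on the mean of the points being plotted and pass the result as `location=` to `folium.Map`. A sketch with toy coordinates standing in for the `Latitude` and `Longitude` columns:

```python
import pandas as pd

# Toy coordinates (not real voter locations)
pts = pd.DataFrame({
    "Latitude": [41.02, 40.98, 41.05],
    "Longitude": [-73.52, -73.47, -73.55],
})

# Drop rows that failed geocoding, then average what is left
valid = pts.dropna(subset=["Latitude", "Longitude"])
center = [valid["Latitude"].mean(), valid["Longitude"].mean()]
print(center)
```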