<a href="https://colab.research.google.com/github/ellenwterry/PoliticalAnalysis/blob/main/Campaign_Planning_Models_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Campaign Planning in a Polarized and Confounded Environment**

It is fair to say that 2024 promises to be a challenging campaign environment. One of the two parties is fully committed to creating chaos, as if they don't believe the other party will have answers. I know I don't have answers, but I've started asking questions, and I know how to find out if they're the right questions: begin with reasonable beliefs, and constantly test them against evidence *(classic Bayesian philosophy)*.


Of course, many campaigns will not have major challenges - partisanship is at an all-time high, and gerrymandered districts ensure a smooth ride for some - but this makes the swing districts and states all the more critical. And there are some cracks appearing in the partisan walls. Republicans' dependence on an aggressive evangelist base has amplified a range of issues that are not playing well in many districts - even traditionally strong ones: e.g., voters turned out at an unusually high rate in the Ohio special election in August, making a significant statement about women's healthcare concerns *(note: L2-Data research made note of Republican crossover voters - https://www.pbs.org/newshour/politics/this-analysis-shows-which-voters-rejected-ohios-issue-1-measure)*. Additionally, evangelists are pushing anti-LGBT issues that conflict with voters more comfortable with inclusive education and work environments. And the authoritarianism conflicts with voters who value participative democracy *(including many former Republicans)*. So, traditional Dems, "issue voters", and "orphaned Republicans", along with the rest of the growing secular world, are colliding with Republicans somewhere in the highly-charged middle - as Steve Schmidt said, "Politics isn’t logical. It’s emotional. Always remember that." https://steveschmidt.substack.com/p/the-fever-will-break *(BTW, "the fever will break" is an appealing story for those who yearn for normality)*.


Within this polarized, emotional, confounded environment, campaigns in competitive districts will need to navigate their position within the voter spectrum - where the persuadable universe will be squeezed, and vote goals will be tight *(when you see Gavin Newsom and Pete Buttigieg on Fox News, you know that they believe that these elections will be won on the margins)*. So, the game becomes supporting GOTV with the party base **AND** managing votes in the margins *(an area where campaign field volunteers are not super comfortable - something to be aware of)*. Where to start?


The following introduces a few planning ideas, framed as a hypothetical modeling scenario that shows how to quantify strategy *(we'll do a **walkthrough of 3 models**)*, which adds structure and focuses on action.


NOTE: For non-technical readers, my hope is that you can read the text, review the visuals, and get the gist - please let me know if that doesn't work:

First, we'll load the program libraries *(nothing to see here unless you're running code - and btw, I'm not really a Python programmer, although I wrote this in Python)*

In [None]:
# ---------- Load Libraries ---------- #

# Install the packages this notebook needs
!pip install nest-asyncio
!pip install pystan
!pip install corner
!pip install geopy
!pip install pygris

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

# nest_asyncio lets PyStan run inside the notebook's existing event loop
import nest_asyncio
nest_asyncio.apply()

import patsy
from sklearn.linear_model import LogisticRegression
import stan

import plotly.express as px
import plotly.graph_objects as go

from geopy.geocoders import Nominatim
from pygris import core_based_statistical_areas
from pygris import tracts

import geopandas as gpd
import folium

from google.colab import files


A synthetic dataset is stored on my GitHub site. It's a very small sample, reflective of a fact table in a warehouse that consolidates data from the party, census, secretary of state, canvassing feedback, and campaign surveys.

In [None]:
# ---------- Get Data from Github ---------- #

url = 'https://raw.githubusercontent.com/ellenwterry/PoliticalAnalysis/main/BaseVote.csv'
VoteBase = pd.read_csv(url)

We begin by prepping the data for analysis. The dependent variable is "Support24" *(source: survey / field data)*, and the independent variables are:

> LastName, FirstName, Address, Sex, Age, Latitude / Longitude *(geocoded)*, Education *(can be party data, surveyed or inferred from census/PUMS)*, HHIncome, ReligiousAffil, and TopIssue *(all from field/phone canvassing and/or surveyed)*, and RRPetition *(a planned survey to be added later in the scenario)*

In [None]:
# ---------- Clean up data ---------- #

from sklearn import preprocessing
le = preprocessing.LabelEncoder()

le.fit(VoteBase['Sex'])

codes = {'NR':0, 'M': 1, 'F': 2}
VoteBase['Sex'] = VoteBase['Sex'].map(codes)

VoteBase['Age']=VoteBase.Age.astype('int32')

#VoteBase['LastPrimary'] = le.transform(VoteBase['LastPrimary'])
codes = {'NR':0, 'R': 1, 'D':2}
VoteBase['LastPrimary'] = VoteBase['LastPrimary'].map(codes)

#VoteBase['Education'] = le.transform(VoteBase['Education'])
codes = {'NR':0, 'HS': 1, 'Some College':2, 'Bachelor':3, 'Masters':4, 'Doctorate':5}
VoteBase['Education'] = VoteBase['Education'].map(codes)

#VoteBase['HHIncome'] = le.transform(VoteBase['HHIncome'])
codes = {'NR':0, 'Under 50k': 1, '50k-100k':2, '100k-200k':3, '200k-300k':4, '300k-500k':5, 'Over 500k':6}
VoteBase['HHIncome'] = VoteBase['HHIncome'].map(codes)

#VoteBase['ReligiousAffil'] = le.transform(VoteBase['ReligiousAffil'])
codes = {'NR':0,'Protestant': 1, 'Catholic':2, 'Other':3, 'None':4}
VoteBase['ReligiousAffil'] = VoteBase['ReligiousAffil'].map(codes)

#VoteBase['Support24'] = le.transform(VoteBase['Support24'])
codes = {'R':0, 'D': 1}
VoteBase['Support24'] = VoteBase['Support24'].map(codes)
# NOTE: NAs were excluded from sample so that algorithms could score using logistic scale - 2nd pass will use imputed values

#VoteBase['TopIssue'] = le.transform(VoteBase['TopIssue'])
codes = {'NR':0, 'RFree':1, 'Crime':2, 'Parents':3, 'Economy':4, 'Womens':5, 'Education':6, 'Democracy':7}
VoteBase['TopIssue'] = VoteBase['TopIssue'].map(codes)

# This is for the second data source (later)
codes = {'NR':0, 'Signed':1}
VoteBase['RRPetition'] = VoteBase['RRPetition'].map(codes)
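
One thing worth checking after all these `.map(codes)` calls: any value that isn't a key in `codes` silently becomes `NaN`. The sketch below uses a made-up three-row frame (`demo`) and a hypothetical helper `report_unmapped` (neither is part of the notebook) to count what a mapping would miss before it's applied.

```python
import pandas as pd

def report_unmapped(series, codes):
    """Count values in `series` that have no key in `codes` (hypothetical helper)."""
    missing = series[~series.isin(list(codes.keys()))]
    return missing.value_counts(dropna=False).to_dict()

# Toy data standing in for a VoteBase column (not the real file)
demo = pd.DataFrame({'Sex': ['M', 'F', 'X']})
codes = {'NR': 0, 'M': 1, 'F': 2}

unmapped = report_unmapped(demo['Sex'], codes)   # values the map would turn into NaN
demo['SexCode'] = demo['Sex'].map(codes)
print(unmapped)
print(demo['SexCode'].isna().sum())
```

If the report comes back non-empty, it's usually a spelling or spacing difference in the source file, and it's easier to fix the dictionary than to chase `NaN`s in the model later.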

We load the data into the matrix / array formats that the models like *(nothing to see here)*

In [None]:
# ---------- Create Matrices for Modeling ---------- #

yDf = VoteBase['Support24']
Xmatrix = patsy.dmatrix('Age + Sex + Education + HHIncome+ ReligiousAffil + LastPrimary + TopIssue', VoteBase)
rows = Xmatrix.shape[0]
columns = Xmatrix.shape[1]
#columns, rows
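
A side note on the formula: the integer codes for ReligiousAffil and TopIssue enter the model as plain numbers, so each step up in the code shifts the log-odds by the same amount. If that's not what you want, patsy's `C(TopIssue)` marks a column as categorical inside the formula. The sketch below shows the same idea with pandas' `get_dummies` on toy data (my substitution, to keep the example self-contained; the column values are invented).

```python
import pandas as pd

toy = pd.DataFrame({'Age': [34, 61, 45, 29],
                    'TopIssue': [1, 5, 7, 5]})

# One indicator column per issue; drop_first leaves the first level as the baseline
issue_dummies = pd.get_dummies(toy['TopIssue'], prefix='TopIssue', drop_first=True)
X_toy = pd.concat([toy[['Age']], issue_dummies], axis=1)
print(X_toy.columns.tolist())
```

The trade-off is more parameters to estimate (one per issue instead of one overall), which matters with a small sample.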


# Model 1 *(Uninformed Priors)*
Now we're ready to build the first model. This is a Bayesian logistic regression model *(common in political, sports, economics/business, insurance, marketing and banking - at least in my experience)*.

We're not really trying to predict votes or vote goals here - we're using the '24 support' dependent variable to tune the independent variable associations, so that we can understand their effects on *(or causes of)* the outcome, and let that guide us in tuning our approach to voters along these dimensions.

This model code should be understandable to non-technical readers:

The first section of the model passes in the data, and the next section transforms parameters into a logistic equation and prepares it for iteration through observations *(it's really doing a lot of work for the model section)*

The model section sets up the distributions *(Bernoulli here, to reflect the logistic equation)* to produce a target posterior distribution. At a high level, the Bayesian formula is:

> ***prior x data = posterior***

We multiply a prior distribution *(it's theoretical, using only distribution parameters)* by the data *(with an undefined distribution)* to get a posterior distribution. We won't know what the posterior distribution looks like until we sample it to define the parameters *(that's what the Stan program is doing - sampling is like a random walk)*
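
Written out for this model, the "prior x data = posterior" recipe looks roughly like the following. The notation is mine: $\mu_k$ and $\sigma_k$ stand for the prior mean and standard deviation of each coefficient (`p_b` and `p_sb` in the Stan code), and $x_n$ is one voter's row of the design matrix.

$$
p(\beta \mid y, X) \;\propto\; \underbrace{\prod_{k=1}^{K} \mathcal{N}\left(\beta_k \mid \mu_k, \sigma_k\right)}_{\text{prior}} \;\times\; \underbrace{\prod_{n=1}^{N} \mathrm{Bernoulli}\left(y_n \mid \mathrm{logit}^{-1}(x_n \beta)\right)}_{\text{data}}
$$

The $\propto$ is why we need a sampler: the right-hand side gives the shape of the posterior, not a ready-made distribution with known parameters.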

Why do we spend so much time on distributions? Because, if we don't have a distribution, we don't have probability or credibility. If we don't have probability or credibility, we're making decisions on intuition *(not necessarily a bad thing - but I'd rather rely on a quantitative analysis if I have it)*

In [None]:
# ---------- Load Model ---------- #

stanMod = """
data {
  int N_train;
  int K;
  int y_train[N_train];
  matrix[N_train, K] x_train;
  real p_b[K];
  real p_sb[K];
}
parameters {
  vector[K] beta;
}
transformed parameters {
  vector[N_train] y_hat;
  for(n in 1:N_train)
    y_hat[n] = x_train[n]*beta;
}
model {
  target += normal_lpdf(beta | p_b, p_sb);
  target += bernoulli_lpmf(y_train | inv_logit(y_hat));
}
"""

This is where we feed the data into the model.

Notice that I'm not defining informative priors here - we call that "uninformed priors", and it's a common starting point if you have no experience with the data. I'm giving it an unbiased starting point: in the code *priors = np.repeat(0, 8)* I set all the prior means to 0, and in the code *priorsSigma = np.repeat(2, 8)* I set the standard deviation of all the parameters to 2, which is pretty wide open for a parameter *(parameters have distributions too - distributions of distributions)*. So, we'll wait and see what the data tells us.

In [None]:
# ---------- Create Uninformed Priors ---------- #

priors = np.repeat(0, 8)
priorsSigma = np.repeat(2, 8)

mData = {
         "N_train": rows,
         "K": np.shape(Xmatrix)[1],
         "y_train": np.array(yDf),
         "x_train": Xmatrix,
          "p_b": priors,
          "p_sb": priorsSigma,
         }
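
To get a feel for how "wide open" a Normal(0, 2) prior is, here's a small sketch on the probability scale. It looks only at a single coefficient acting alone on a one-unit predictor (an assumption for illustration; `inv_logit` is a local helper, not part of the notebook).

```python
import numpy as np

def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + np.exp(-x))

prior_mean, prior_sd = 0.0, 2.0

# Roughly 95% of a normal prior lies within 2 standard deviations of its mean
low, high = prior_mean - 2 * prior_sd, prior_mean + 2 * prior_sd
p_low, p_high = inv_logit(low), inv_logit(high)
print(round(p_low, 3), round(p_high, 3))   # about 0.018 and 0.982
```

So under this prior, a one-unit change in a predictor could plausibly move a 50/50 voter almost anywhere between 2% and 98%, which is about as noncommittal as a logistic model gets.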

Compile the model *(the Stan program is running in C++ under the hood)*

In [None]:
# Compile the Model
postr = stan.build(stanMod, data = mData, random_seed = 1)
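
A version note, in case the compile step complains: newer Stan releases dropped the `int y_train[N_train];` style of array declaration in favor of `array[N_train] int y_train;`. Below is a sketch of the same model in the newer syntax, with the loop replaced by a matrix product and `bernoulli_logit_lpmf` doing the inverse-logit internally. The name `stanModNew` is hypothetical, and this has not been run against this notebook's PyStan version, so treat it as a starting point.

```python
# Sketch: same model as stanMod, written with the newer Stan array syntax.
# stanModNew is a hypothetical name; swap it in for stanMod in stan.build to try it.
stanModNew = """
data {
  int<lower=0> N_train;
  int<lower=0> K;
  array[N_train] int<lower=0, upper=1> y_train;
  matrix[N_train, K] x_train;
  vector[K] p_b;              // prior means
  vector<lower=0>[K] p_sb;    // prior standard deviations
}
parameters {
  vector[K] beta;
}
model {
  target += normal_lpdf(beta | p_b, p_sb);
  target += bernoulli_logit_lpmf(y_train | x_train * beta);
}
"""
```

This version leaves out the `y_hat` transformed parameter, so the output frame would hold only the `beta.*` columns (plus sampler diagnostics), which is all the later cells use.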

...and run the sampler *(again, we don't know what the posterior distribution looks like yet - this is where we find out)*.


In [None]:
# Multiply Prior and Data - Sample Posterior
fit = postr.sample(num_chains=4, num_samples=1000)

OK, now we can pull the posterior parameters out. That's what I'll be working with from now on to "tweak" the modeling process. Also, notice the statements:


def stable_sigmoid(x):
  return np.where(x >= 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))


stanProb = stable_sigmoid((np.dot(priors,Xmatrix.transpose())).transpose())

This is what creates our probabilities - we use the parameters from the posterior and run the data through a logistic equation with those parameters. These statements are just a Python version of the logistic equation:

$$P(Y=1)=\frac{\exp\left(\beta_0+\sum_n \beta_n X_n\right)}{1+\exp\left(\beta_0+\sum_n \beta_n X_n\right)}$$

There are so many advantages to generating predictions and probabilities outside of analysis models *(e.g., in high transaction, dynamic environments, analysts will often run models in parallel with production models and pass parameters in real time to keep models relevant)*. There are also security and transaction management issues - which is why many of my clients wouldn't allow R or Python in production. Anyway, that's a whole book.
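
One caveat on `stable_sigmoid`: `np.where` evaluates both branches for every element before choosing, so `np.exp` can still overflow (and warn) on the branch that gets thrown away, even though the returned values are fine. A variant that only ever exponentiates a non-positive number avoids that. This is a sketch; the name `sigmoid_no_overflow` is hypothetical, and `scipy.special.expit` is a ready-made alternative.

```python
import numpy as np

def sigmoid_no_overflow(x):
    """Logistic function that never calls exp on a positive number."""
    x = np.asarray(x, dtype=float)
    z = np.exp(-np.abs(x))            # always in (0, 1], so it cannot overflow
    # For x >= 0: 1 / (1 + e^-x).  For x < 0: e^x / (1 + e^x).
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

extreme = np.array([-1000.0, 0.0, 1000.0])
print(sigmoid_no_overflow(extreme))
```

With the coefficients in this notebook the linear predictor stays small, so either version gives the same probabilities; the difference only shows up with extreme inputs.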

Now that we have the probabilities of voting for the D candidate, we store those in a new "StanProb" column. We also store a theoretical vote *(discrete variable)* based on a 50% threshold - against my better judgment. That's one of my pet peeves: is there really a difference between a voter with a 49% overall likelihood of voting D vs. one with a 51%? It depends on what's driving that probability - maybe that 49% voter is a former R voter who has been pushed up by issue effects. So if we can only talk to one, which one? Probabilities tell us more than discrete assignments, and parameter values and variance tell us even more. Not to say that discrete assignments can't be useful - just know why you're doing it.

In [None]:
# Get Parameters
df = fit.to_frame()
Params = df.describe().T

# Posterior means and standard deviations for beta.1 ... beta.8
betaCols = ['beta.' + str(i) for i in range(1, 9)]
priors = Params.loc[betaCols, 'mean'].to_numpy()
priorsSigma = Params.loc[betaCols, 'std'].to_numpy()

def stable_sigmoid(x):
  # Using np.where to avoid numerical overflow or underflow.
  return np.where(x >= 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))

# Create Inferred Votes
stanProb = stable_sigmoid((np.dot(priors,Xmatrix.transpose())).transpose())


def updateInferredVote(x):
  if x >= .5:
    return 1
  else:
    return 0

VoteBase['StanProb'] = stanProb
VoteBase['StanVote'] = VoteBase['StanProb'].apply(updateInferredVote)
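
The cell above plugs the posterior *means* into the logistic equation. Since the sampler returns thousands of draws, another option is to push every draw through the equation and summarize per voter, which gives an uncertainty band around each probability. A minimal sketch with simulated stand-ins (`beta_draws` and `X_demo` are invented here; in the notebook the draws would come from the `beta.*` columns of `fit.to_frame()`):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins: 4000 posterior draws of 3 coefficients, and 5 voters' design rows
beta_draws = rng.normal(loc=[-1.0, 0.5, 0.2], scale=[0.3, 0.1, 0.05], size=(4000, 3))
X_demo = np.column_stack([np.ones(5), rng.integers(0, 3, 5), rng.integers(0, 7, 5)])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One probability per (draw, voter): shape (4000, 5)
prob_draws = sigmoid(beta_draws @ X_demo.T)

prob_mean = prob_draws.mean(axis=0)                            # point estimate per voter
prob_lo, prob_hi = np.percentile(prob_draws, [5, 95], axis=0)  # 90% interval per voter
```

Two voters with the same point estimate can have very different interval widths, which is one more way to answer the "49% vs. 51%" question.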


...and here are the parameters of Model 1's posterior distribution. First, keep in mind the order in the matrix: the Intercept *(garbage collection - i.e., what's left over)*, and 'Age + Sex + Education + HHIncome+ ReligiousAffil + LastPrimary + TopIssue'. So, based on the raw data, the model thinks that Education, Household Income and Religious Affiliation are the most important independents. That's not a bad suggestion, but there are a couple of problems with that:

>1. Education and HHIncome are really too correlated *("no-no" in regression)* and should be a composite variable. But again, I'm not so much predicting outcomes as trying to understand associations, so I'm leaving them both in to watch.

>2. We're interested in exploring issue effects here - starting with women's healthcare *(including reproductive freedom)*. The only variable addressing issues now is TopIssue, which is not great because the "Womens" selection is made by the volunteer, and there's only one top issue - so it's not an ideal driver. We also have "Sex" and we can suppose that women are more likely to support women's healthcare *(recognizing that evangelists will be opposed)*. We could estimate based on gender and religious affiliation, but their position is also influenced by Education, Income, LastPrimary, etc. So it's better to estimate within the model using the 'Sex' parameter, and wait for the survey on Reproductive Rights to be completed and see how that works out.

So, we take the model's parameters, and call them "priors" because we're just going to turn around and use them again *(with some adjustment)* for the next model:

In [None]:
priors

array([-4.17153888, -0.00773496,  0.11916086,  0.08832741,  0.78382778,
        0.5216264 ,  0.28251111,  0.20040936])

Before we get too far, and now that we're starting to dig into 'ReligiousAffil' and 'TopIssue', let's look at how those played out with probability to vote D. First, ReligiousAffil:

In [None]:
# Visualize Impact of Religious Affiliation
fig = px.box(VoteBase, x='ReligiousAffil', y='StanProb', points="all")
fig.update_layout(
    width=1000,
    height=600,
    plot_bgcolor = "white",
    xaxis = dict(
        tickmode = 'array',
        tickvals = [0, 1, 2, 3, 4],
        ticktext = ['NR', 'Protestant', 'Catholic', 'Other', 'None']
    )
)
fig.show()

And TopIssue:

In [None]:
# Visualize Impact of Top Issue
# TODO: update for added dimensions

fig = px.box(VoteBase, x='TopIssue', y='StanProb', points="all")
fig.update_layout(
    width=1000,
    height=600,
    plot_bgcolor = "white",
    xaxis = dict(
        tickmode = 'array',
        tickvals = [0, 1, 2, 3, 4, 5, 6, 7],
        ticktext = ['NR', 'RFree', 'Crime', 'Parents', 'Economy', 'Womens', 'Education', 'Democracy']
    )

)
fig.show()



# Model 2 *(Adjusting Priors)*

Now that we have a good starting point for quantifying parameters, let's use our admittedly weak approach to estimating the women's healthcare effect.

The campaign has prepared a survey to ask voters if they would sign a petition to tell leaders that they support reproductive healthcare choices. We'll add that data when it's done, but for now, we have block turf and phonebanks to assign. And gender is a bit of an exogenous factor now.

So let's just emphasize gender - which is called 'Sex' in the data *(I always liked that variable descriptor - I'm waiting for choices like "Often" and "Seldom")*

For now, we'll estimate the reproductive healthcare effect by increasing the 'Sex' parameter value from 0.12 to 0.4. Now the prior is centered at 0.4, and the posterior will wander back toward 0.12 if we don't restrict it, so we squeeze the sigma *(the standard deviation ended up at 0.13 after sampling - we'll restrict it to .02)*. Note that the code below sets sigma to .02 for all eight parameters, which also holds the other parameters close to their Model 1 values. That should start increasing probabilities in the voters we want to talk with - start pushing them into groups designated for contact *(remember, we're not trying to predict votes - which is a pointless exercise here - we're trying to decide where to allocate resources for our best chance of getting a vote)*

*Note: in decision-making, you're often working backwards from the predictions to assess the effects of variables, which often leads to assessment of causality.*






In [None]:
Ipriors = priors.copy()
Ipriors[2] = 0.4

IpriorsSigma = np.repeat(.02, 8)
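
Why does squeezing sigma keep the parameter near 0.4? As a rough back-of-the-envelope check (a normal approximation that ignores correlations between parameters, so an assumption rather than what Stan actually computes), the posterior mean is a precision-weighted average of the prior mean and what the data alone suggested in Model 1:

```python
import numpy as np

# 'Sex' coefficient: the Model 1 posterior (wide prior) stands in for the data's estimate
data_mean, data_sd = 0.119, 0.128
prior_mean, prior_sd = 0.4, 0.02      # the squeezed prior used for Model 2

# Precision = 1 / variance; a tighter distribution gets more weight
w_prior = 1 / prior_sd**2
w_data = 1 / data_sd**2

post_mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
post_sd = np.sqrt(1 / (w_prior + w_data))
print(round(post_mean, 3), round(post_sd, 3))
```

The tight prior carries roughly 40 times the weight of the data here, so the estimate barely moves off 0.4. Widen `prior_sd` and the weighted average slides back toward the data.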

So, now we reset the priors...

In [None]:
mData = {
         "N_train": rows,
         "K": np.shape(Xmatrix)[1],
         "y_train": np.array(yDf),
         "x_train": Xmatrix,
          "p_b": Ipriors,
          "p_sb": IpriorsSigma,
         }


...and run the model:

In [None]:
# Compile the Model
postr = stan.build(stanMod, data = mData, random_seed = 1)

In [None]:
# Multiply Prior and Data - Sample Posterior
fit = postr.sample(num_chains=4, num_samples=1000)

Get the parameters and compute probabilities again:

In [None]:
# Get Parameters
df = fit.to_frame()
Params = df.describe().T

# Posterior means and standard deviations for beta.1 ... beta.8
betaCols = ['beta.' + str(i) for i in range(1, 9)]
Ipriors = Params.loc[betaCols, 'mean'].to_numpy()
IpriorsSigma = Params.loc[betaCols, 'std'].to_numpy()

def stable_sigmoid(x):
  # Using np.where to avoid numerical overflow or underflow.
  return np.where(x >= 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))

# Create Inferred Votes
IstanProb = stable_sigmoid((np.dot(Ipriors,Xmatrix.transpose())).transpose())


def updateInferredVote(x):
  if x >= .5:
    return 1
  else:
    return 0

VoteBase['IStanProb'] = IstanProb
VoteBase['IStanVote'] = VoteBase['IStanProb'].apply(updateInferredVote)


OK, now the phone bank has finished the survey, so we'll add that data to the model:

In [None]:
Xmatrix = patsy.dmatrix('Age + Sex + Education + HHIncome+ ReligiousAffil + LastPrimary + TopIssue + RRPetition', VoteBase)

rows = Xmatrix.shape[0]
columns = Xmatrix.shape[1]
columns, rows

(9, 2498)

# Model 3
We're going to push the effect of the petition data too - we've just seen too much anecdotal evidence. Remember, a Bayesian model will give us quantitative feedback on our credibility. Update the priors again, and run the model:

In [None]:
priors2 = np.append(Ipriors[0:8], .4)
priorsSigma2 = np.append(IpriorsSigma[0:8], .02)

In [None]:
mData = {
         "N_train": rows,
         "K": np.shape(Xmatrix)[1],
         "y_train": np.array(yDf),
         "x_train": Xmatrix,
          "p_b": priors2,
          "p_sb": priorsSigma2,
         }

In [None]:
postr = stan.build(stanMod, data = mData, random_seed = 1)

In [None]:
fit = postr.sample(num_chains=4, num_samples=1000)

Pull the parameters and compute probabilities again:

In [None]:
df = fit.to_frame()
Params = df.describe().T

# Posterior means and standard deviations for beta.1 ... beta.9
betaCols = ['beta.' + str(i) for i in range(1, 10)]
priors2 = Params.loc[betaCols, 'mean'].to_numpy()
priorsSigma2 = Params.loc[betaCols, 'std'].to_numpy()

def stable_sigmoid(x):
  # Using np.where to avoid numerical overflow or underflow.
  return np.where(x >= 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))

# Create Inferred Votes
stanProb2 = stable_sigmoid((np.dot(priors2,Xmatrix.transpose())).transpose())

def updateInferredVote(x):
  if x >= .5:
    return 1
  else:
    return 0

VoteBase['StanProb2'] = stanProb2
VoteBase['StanVote2'] = VoteBase['StanProb2'].apply(updateInferredVote)

And here's the migration of the parameters from the first to the third model.

In [None]:
priors = np.append(priors, np.nan)
Ipriors = np.append(Ipriors, np.nan)

priorsSigma = np.append(priorsSigma, np.nan)
IpriorsSigma = np.append(IpriorsSigma, np.nan)

priorsDF = pd.DataFrame({"priors": priors, "priorsSigma": priorsSigma, "Ipriors": Ipriors, "IpriorsSigma": IpriorsSigma, "priors2": priors2, "priorsSigma2": priorsSigma2})

priorsDF


| | priors | priorsSigma | Ipriors | IpriorsSigma | priors2 | priorsSigma2 |
|---|---|---|---|---|---|---|
| 0 | -4.171539 | 0.317584 | -4.175476 | 0.02025 | -4.179524 | 0.019949 |
| 1 | -0.007735 | 0.003466 | -0.013446 | 0.001616 | -0.013248 | 0.001153 |
| 2 | 0.119161 | 0.12841 | 0.387838 | 0.019827 | 0.375542 | 0.0191 |
| 3 | 0.088327 | 0.057676 | 0.083122 | 0.018352 | 0.079522 | 0.016779 |
| 4 | 0.783828 | 0.048032 | 0.775618 | 0.016866 | 0.772228 | 0.014355 |
| 5 | 0.521626 | 0.064354 | 0.516917 | 0.018968 | 0.513871 | 0.017828 |
| 6 | 0.282511 | 0.106926 | 0.282396 | 0.019421 | 0.28271 | 0.019212 |
| 7 | 0.200409 | 0.035263 | 0.20167 | 0.01639 | 0.201101 | 0.014491 |
| 8 | NaN | NaN | NaN | NaN | 0.406226 | 0.020187 |


Another way to look at this is to view the shift in probabilities from Model 1 to Model 3. These are voters you might not talk to initially, but who emerge as promising after the women's health effect.

In [None]:
fig = go.Figure()
fig.add_trace(go.Histogram(x=VoteBase['StanProb'], marker_color='#DEDCDC', opacity=0.05, name = "Uninformed Priors"))
fig.add_trace(go.Histogram(x=VoteBase['IStanProb'], marker_color='#94CDD7', opacity=0.1, name = "Balance Priors"))
fig.add_trace(go.Histogram(x=VoteBase['StanProb2'], marker_color='#378796', opacity=1, name = "Petition Data"))
fig.update_layout(barmode='overlay')
fig.update_traces(opacity=0.9, nbinsx=100)
fig.update_layout(
    autosize=False,
    width=800,
    height=400,
    plot_bgcolor = "white",
)
fig.show()

Now, I'm binning these into groups that we can assign to blockwalking turfs.

In [None]:
bins = [0, .4, .5, .6, .7, .8, .9, 1]
labels = ['None', 'Grp 1', 'Grp 2', 'Grp 3', 'Grp 4', 'Grp 5', 'Grp 6']
VoteBase['InitialGroup'] = pd.cut(x = VoteBase['StanProb'], bins = bins, labels = labels, include_lowest = False)
VoteBase['Group'] = pd.cut(x = VoteBase['StanProb2'], bins = bins, labels = labels, include_lowest = False)
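
A detail on the bin edges: `pd.cut` intervals are closed on the right by default, so a probability of exactly 0.4 lands in 'None' and Group 1 is really (0.4, 0.5]. With `include_lowest = False`, a probability of exactly 0 falls outside every bin and comes back as `NaN`. A quick check on made-up values:

```python
import pandas as pd

bins = [0, .4, .5, .6, .7, .8, .9, 1]
labels = ['None', 'Grp 1', 'Grp 2', 'Grp 3', 'Grp 4', 'Grp 5', 'Grp 6']

demo_probs = pd.Series([0.0, 0.4, 0.41, 1.0])   # invented probabilities
demo_groups = pd.cut(x=demo_probs, bins=bins, labels=labels, include_lowest=False)
print(demo_groups.tolist())   # [nan, 'None', 'Grp 1', 'Grp 6']
```

Model probabilities are almost never exactly on an edge, so this rarely changes a turf list, but it's worth knowing which side of the line a boundary case falls on.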

In [None]:
VoteBase.to_csv('VoteBaseCompare.csv')
files.download("VoteBaseCompare.csv")
priorsDF.to_csv("PriorsDF.csv")
files.download("PriorsDF.csv")


Now I can drill down on the groups on the same histogram. Group 1 starts at 40%, and these are good prospective votes *(they wouldn't show up if we were using a categorical model)*

In [None]:
Group1 = VoteBase.loc[(VoteBase['Group'] == 'Grp 1')]
Group2 = VoteBase.loc[(VoteBase['Group'] == 'Grp 2')]
Group3 = VoteBase.loc[(VoteBase['Group'] == 'Grp 3')]
Group4 = VoteBase.loc[(VoteBase['Group'] == 'Grp 4')]
Group5 = VoteBase.loc[(VoteBase['Group'] == 'Grp 5')]
Group6 = VoteBase.loc[(VoteBase['Group'] == 'Grp 6')]

Group1 = Group1.reset_index()
Group2 = Group2.reset_index()
Group3 = Group3.reset_index()
Group4 = Group4.reset_index()
Group5 = Group5.reset_index()
Group6 = Group6.reset_index()

In [None]:
fig = go.Figure()
fig.add_trace(go.Histogram(x=Group1['StanProb2'], marker_color='#F47811', opacity=0.05, name = "Group 1"))
fig.add_trace(go.Histogram(x=Group2['StanProb2'], marker_color='#33F411', opacity=0.1, name = "Group 2"))
fig.add_trace(go.Histogram(x=Group3['StanProb2'], marker_color='#1159F4', opacity=1, name = "Group 3"))
fig.add_trace(go.Histogram(x=Group4['StanProb2'], marker_color='#A811F4', opacity=1, name = "Group 4"))
fig.add_trace(go.Histogram(x=Group5['StanProb2'], marker_color='#F411BE', opacity=1, name = "Group 5"))
fig.add_trace(go.Histogram(x=Group6['StanProb2'], marker_color='#F4114A', opacity=1, name = "Group 6"))
fig.update_layout(barmode='overlay')
fig.update_traces(opacity=0.9, nbinsx=10)
fig.update_layout(
    autosize=False,
    width=600,
    height=400,
    plot_bgcolor = "white",
)
fig.show()

And add the groups to the turfs. Let's go talk to some voters!!

In [None]:


m = folium.Map(location=[41.07, -73.6], zoom_start=12)

# One marker color per group
groupColors = [(Group1, "orange"), (Group2, "green"), (Group3, "blue"),
               (Group4, "purple"), (Group5, "pink"), (Group6, "red")]

for grp, color in groupColors:
  for lat, lon in zip(grp['Latitude'], grp['Longitude']):
    folium.CircleMarker(
      [lat, lon], radius = 1, weight = 2, color = color).add_to(m)
m

In [None]:
# how many voters moved into focus groups, or moved up a group within focus groups?

# rate groups by prob:
codes = {'None':7, 'Grp 1':6, 'Grp 2': 5, 'Grp 3':4, 'Grp 4':3, 'Grp 5':2, 'Grp 6':1}
VoteBase['IGrpNo'] = VoteBase['InitialGroup'].map(codes)
VoteBase['GrpNo'] = VoteBase['Group'].map(codes)
VoteBase['Focus'] = np.where((VoteBase['IGrpNo']  > VoteBase['GrpNo']), 1, 0)
# count upgrades to focus
VoteBase['Focus'].sum()
# 88 is a good win margin for a precinct with 2,500 voters

88
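
One thing to verify in the cell above: `pd.cut` returns an ordered categorical, and mapping a categorical through a dict can keep that categorical dtype, in which case `>` may compare by category order rather than by the mapped numbers. A sketch that sidesteps the question by ranking groups with plain integers (`.cat.codes` gives 0 for 'None' up to 6 for 'Grp 6', and -1 for missing); the toy probabilities and the name `count_upgrades` are invented:

```python
import pandas as pd

bins = [0, .4, .5, .6, .7, .8, .9, 1]
labels = ['None', 'Grp 1', 'Grp 2', 'Grp 3', 'Grp 4', 'Grp 5', 'Grp 6']

def count_upgrades(before, after):
    """Count voters whose probability bin is higher in `after` than in `before`."""
    rank_before = pd.cut(before, bins=bins, labels=labels).cat.codes
    rank_after = pd.cut(after, bins=bins, labels=labels).cat.codes
    return int((rank_after > rank_before).sum())

toy = pd.DataFrame({'StanProb':  [0.35, 0.45, 0.65, 0.85],
                    'StanProb2': [0.45, 0.47, 0.55, 0.95]})
print(count_upgrades(toy['StanProb'], toy['StanProb2']))
```

In the toy data, the first and last voters move up a bin, the second stays put, and the third moves down. Running the same function on `VoteBase['StanProb']` and `VoteBase['StanProb2']` is a quick way to confirm that the count above really is upgrades.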