# 2020 Portland Town Council Analysis
Looking at how well different voter groups were represented by the results.

## Setup

In [38]:
import pandas as pd
import numpy as np
import sys

from functools import reduce
from datetime import datetime
from pathlib import Path
from itertools import combinations

sys.path.append('../src')
import helpers

%load_ext autoreload
%autoreload



## Load Data
Excel file with top 4 choices for each ballot

In [4]:
# Forward slashes work with pathlib on Windows, macOS and Linux
path = Path('../data/2020 Portland Charter Commission Election Analysis/Top-4.xlsx')

data = pd.read_excel(path, engine='openpyxl', header=None)
data.head()

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | Bailey, William M. | Bailey, William M. | Bailey, William M. | Bailey, William M. |
| 1 | Bailey, William M. | Bailey, William M. | Bailey, William M. | Bailey, William M. |
| 2 | Bailey, William M. | Bailey, William M. | Bailey, William M. | Bailey, William M. |
| 3 | Bailey, William M. | Bailey, William M. | Bailey, William M. | Bailey, William M. |
| 4 | Bailey, William M. | Bailey, William M. | Bailey, William M. | undervote |


In [65]:
def to_percent(num):
  return f"{num:.1%}"

## Let's crunch the data a little to make it easier to analyze
Which winners did people vote for?

How many of the winners did each ballot vote for, i.e. its "wins"?

We'll also set up some filters to use later.

In [66]:
# Setup constants for each candidate & the election winners
all_candidates = BAILEY, BUXTON, CHANN, CONDREY, DIMILLO, EMERSON, GRANT, HOUSEAL, ROVELTO, SHEIKH, WASHBURN = [
  'Bailey, William M.', 'Buxton, Catherine A.', 'Chann, Marpheen S.',
  'Condrey, Lawson T.', 'DiMillo, Steven A.', 'Emerson, Anthony M.',
  'Grant, Benjamin K.', 'Houseal, Ian P.', 'Rovelto, Hope R.',
  'Sheikh-Yousef, Nasreen A.', 'Washburn, Patricia J.']
winners = [CHANN, SHEIKH, BUXTON, WASHBURN]
non_winners = list(set(all_candidates) - set(winners))

In [67]:
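# Find which candidates each ballot voted for & count the winners for each row/voter
rank_cols = [0, 1, 2, 3]  # the ballot's top-4 ranked choices
for c in all_candidates:
  # True if candidate c appears anywhere in the ballot's top 4
  data[c] = data[rank_cols].eq(c).any(axis=1)

data['wins'] = data[winners].sum(axis=1)

data.head()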

|   | 0 | 1 | 2 | 3 | Chann, Marpheen S. | Sheikh-Yousef, Nasreen A. | Buxton, Catherine A. | Washburn, Patricia J. | wins | Bailey, William M. | Condrey, Lawson T. | DiMillo, Steven A. | Emerson, Anthony M. | Grant, Benjamin K. | Houseal, Ian P. | Rovelto, Hope R. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bailey, William M. | Bailey, William M. | Bailey, William M. | Bailey, William M. | False | False | False | False | 0 | True | False | False | False | False | False | False |
| 1 | Bailey, William M. | Bailey, William M. | Bailey, William M. | Bailey, William M. | False | False | False | False | 0 | True | False | False | False | False | False | False |
| 2 | Bailey, William M. | Bailey, William M. | Bailey, William M. | Bailey, William M. | False | False | False | False | 0 | True | False | False | False | False | False | False |
| 3 | Bailey, William M. | Bailey, William M. | Bailey, William M. | Bailey, William M. | False | False | False | False | 0 | True | False | False | False | False | False | False |
| 4 | Bailey, William M. | Bailey, William M. | Bailey, William M. | undervote | False | False | False | False | 0 | True | False | False | False | False | False | False |
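The cell above turns each ballot's ranked choices into one True/False column per candidate plus a `wins` count. A minimal vectorized sketch of the same idea on a made-up three-ballot table (the candidate names `'A'`, `'B'`, `'C'` and the `winners` list here are hypothetical placeholders, not election data):

```python
import pandas as pd

# Hypothetical ballots: one row per ballot, columns 0-3 are the top 4 ranked choices
ballots = pd.DataFrame([
    ['A', 'B', 'undervote', 'undervote'],
    ['C', 'C', 'C', 'C'],
    ['B', 'A', 'C', 'undervote'],
])
rank_cols = [0, 1, 2, 3]
candidates = ['A', 'B', 'C']
winners = ['A', 'B']  # hypothetical winners

# One boolean column per candidate: did the candidate appear anywhere in the top 4?
for c in candidates:
    ballots[c] = ballots[rank_cols].eq(c).any(axis=1)

# Number of winners each ballot included (True counts as 1)
ballots['wins'] = ballots[winners].sum(axis=1)
print(ballots[candidates + ['wins']])
```

Restricting the comparison to the rank columns keeps the string test from touching the boolean columns added on earlier loop iterations.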


## How well were people represented?
How many people got 0-4 "wins"?



In [68]:
(data.groupby('wins')[0].count() / len(data)).apply(to_percent)

wins
0    15.3%
1    25.3%
2    22.1%
3    32.2%
4     5.1%
Name: 0, dtype: object

* Unfortunately, 15% of people got none of their top 4 choices.
* 59% of people got 2 or more of their top 4 choices.
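Both the distribution and the "2 or more" share can be read straight off a `wins` column with `value_counts(normalize=True)`. A sketch on ten made-up ballots (the values are hypothetical, chosen only to illustrate the calls):

```python
import pandas as pd

# Hypothetical "wins" counts for ten ballots
wins = pd.Series([0, 1, 1, 2, 2, 2, 3, 3, 3, 4], name='wins')

# Share of ballots with each number of wins, ordered 0..4
shares = wins.value_counts(normalize=True).sort_index()

# Share of ballots that got at least 2 of their top-4 choices elected
at_least_two = (wins >= 2).mean()

print(shares.map(lambda x: f"{x:.1%}"))
print(f"{at_least_two:.1%}")
```

On the real output above, the same "2 or more" figure is 22.1% + 32.2% + 5.1% = 59.4%.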

It's certainly great to get as much representation as possible, but we're leaving a lot of people behind. Could a different set of candidates represent Portland more completely?

Let's start by identifying which winners tended to be voted in together.

In [69]:
win_patterns = data.groupby(['wins'] + winners).size().to_frame(name='Votes')
win_patterns['%'] = (win_patterns['Votes'] / len(data)).apply(to_percent)

def color_bool(val):
  # Color True cells (winner was on the ballot) green; leave others unstyled
  return 'color: green' if val else ''

# Styler.applymap is deprecated in pandas >= 2.1 in favor of Styler.map
win_patterns.reset_index().style.applymap(color_bool, subset=pd.IndexSlice[:, winners])

|   | wins | Chann, Marpheen S. | Sheikh-Yousef, Nasreen A. | Buxton, Catherine A. | Washburn, Patricia J. | Votes | % |
|---|---|---|---|---|---|---|---|
| 0 | 0 | False | False | False | False | 1346 | 15.3% |
| 1 | 1 | False | False | False | True | 118 | 1.3% |
| 2 | 1 | False | False | True | False | 250 | 2.8% |
| 3 | 1 | False | True | False | False | 188 | 2.1% |
| 4 | 1 | True | False | False | False | 1668 | 19.0% |
| 5 | 2 | False | False | True | True | 143 | 1.6% |
| 6 | 2 | False | True | False | True | 144 | 1.6% |
| 7 | 2 | False | True | True | False | 477 | 5.4% |
| 8 | 2 | True | False | False | True | 150 | 1.7% |
| 9 | 2 | True | False | True | False | 536 | 6.1% |


This is essentially a 4-set Venn diagram of how people voted for the winners. The four True/False columns show which winners were on a ballot.

We again see that 15.3% of voters got 0/4 choices and 5.1% got 4/4 choices (rows 0 and 15, respectively).

Looking at the single-winner ballots (rows 1-4), we see the voters represented by just 1 of the 4 winners. This shows how many people would lose all their representation if we removed any one of the commissioners-elect:

 * 19.0% of voters were represented solely by Marpheen Chann.
 * Fewer than 3% of voters were represented solely by any one of Sheikh-Yousef, Buxton, or Washburn (who campaigned together).
 * Only 1.3% of voters were represented solely by Patricia Washburn.
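The "lose all representation" share for each winner is simply the fraction of ballots whose only elected choice was that winner. A sketch on a hypothetical five-ballot table of winner flags (placeholder names `'A'`..`'D'`; in the notebook the flags would be the `winners` columns of `data`):

```python
import pandas as pd

winners = ['A', 'B', 'C', 'D']  # hypothetical winners
# Hypothetical per-ballot flags: did the ballot include each winner?
flags = pd.DataFrame({
    'A': [True, True, False, False, True],
    'B': [False, True, False, False, False],
    'C': [False, False, True, False, False],
    'D': [False, False, False, False, True],
})
wins = flags[winners].sum(axis=1)

# For each winner, share of ballots whose ONLY elected choice was that winner;
# these voters would lose all representation if that winner were removed
sole_share = {w: ((wins == 1) & flags[w]).mean() for w in winners}
print(sole_share)
```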

The rest of the data shows a lot more overlap between the three coalition candidates:

 * 19.5% of voters voted for all 3.
 * Removing both Sheikh-Yousef and Washburn would cost only 5% of voters all of their representation (rows 1, 3 and 6: 1.3% and 2.1% for each alone, 1.6% for the two together).
 * In other words, about 80% of voters are represented by Marpheen Chann and Catherine Buxton alone (the actual 85% minus those 5%). Let's double-check that.
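The 80% estimate is plain subtraction using the percentages from the win-pattern table: everyone with at least one winner, minus the ballots whose only winners were Sheikh-Yousef and/or Washburn.

```python
# Figures from the win-pattern table above, in percent of all ballots
represented_by_any_winner = 100 - 15.3   # everyone except the 0-win ballots (row 0)
only_washburn = 1.3                      # row 1
only_sheikh = 2.1                        # row 3
sheikh_and_washburn_only = 1.6           # row 6

lost = only_washburn + only_sheikh + sheikh_and_washburn_only
remaining = represented_by_any_winner - lost
print(f"lost {lost:.1f}%, remaining {remaining:.1f}%")
```

The 79.7% estimate differs from a direct computation by at most a tenth of a point, most likely because the table percentages are rounded to one decimal.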

In [70]:
to_percent(sum(data[CHANN] | data[BUXTON]) / len(data))

'79.6%'

## Can We Do Better?

Is there a set of candidates that would represent more than the 85% of voters represented by the actual winners?
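One way to frame the question: try every possible set of winners and keep the set whose members appear on the most ballots. A minimal brute-force sketch on a hypothetical six-ballot, five-candidate table (placeholder names `'A'`..`'E'`; `coverage` is an illustrative helper, not part of the notebook, and `flags` stands in for something like `data[all_candidates]`):

```python
from itertools import combinations

import pandas as pd

# Hypothetical per-ballot flags: True if the candidate was in the ballot's top 4
flags = pd.DataFrame({
    'A': [True,  False, False, True,  False, False],
    'B': [False, True,  False, False, False, False],
    'C': [False, False, True,  False, False, True],
    'D': [False, False, False, False, True,  False],
    'E': [True,  True,  False, False, True,  False],
})

def coverage(council):
    """Share of ballots that included at least one member of `council`."""
    return flags[list(council)].any(axis=1).mean()

seats = 2  # the real election filled 4 seats; 2 keeps this toy example small
best = max(combinations(flags.columns, seats), key=coverage)
print(best, coverage(best))
```

With 11 candidates and 4 seats there are only C(11, 4) = 330 possible sets, so exhaustive search is cheap here.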

### Replacing Washburn
For starters, let's remove Washburn, the least individually popular winner, and see how the voters left unrepresented (now ~17%) voted.

In [79]:
only_washburn = (data['wins'] == 1) & data[WASHBURN]  # & binds tighter than ==, so parenthesize the comparison

votes_without_washburn = data[(data['wins'] == 0) | only_washburn][non_winners + [WASHBURN]].sum()
(votes_without_washburn / len(data)).sort_values(ascending=False).apply(to_percent)

DiMillo, Steven A.       13.7%
Bailey, William M.        7.5%
Grant, Benjamin K.        5.2%
Houseal, Ian P.           4.4%
Condrey, Lawson T.        2.8%
Emerson, Anthony M.       1.7%
Washburn, Patricia J.     1.3%
Rovelto, Hope R.          0.3%
dtype: object
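Boolean masks like `only_washburn` combine a comparison with `&`. Because Python's `&` has higher precedence than `==`, writing `data['wins'] == 1 & data[WASHBURN]` without parentheses is parsed as `data['wins'] == (1 & data[WASHBURN])`, which is not the intended filter. A sketch of the parenthesized form on made-up values (`wins` and `voted_washburn` are hypothetical stand-ins for the notebook's columns):

```python
import pandas as pd

# Hypothetical per-ballot data
wins = pd.Series([0, 1, 1, 2])
voted_washburn = pd.Series([False, True, False, True])

# Parenthesize each comparison before combining masks with &
only_washburn = (wins == 1) & voted_washburn
print(only_washburn.tolist())  # [False, True, False, False]
```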

As it happens, Washburn was the second least popular among these voters. Switching Washburn out for DiMillo gives us +12.4% representation (13.7% - 1.3%), bringing us to a whopping 97%!!
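The swap arithmetic works because the 13.7% counts DiMillo voters only among ballots left without any winner once Washburn is removed, so every one of them is newly represented. Using the percentages from the cells above:

```python
# Percentages taken from the cells above (share of all ballots)
represented_now = 100 - 15.3        # ballots with at least one winner
lost_without_washburn = 1.3         # ballots whose only winner was Washburn
dimillo_among_unrepresented = 13.7  # DiMillo voters among ballots with no winner left

represented_after_swap = represented_now - lost_without_washburn + dimillo_among_unrepresented
print(f"{represented_after_swap:.1f}%")  # 97.1%
```

This matches the 97.1% computed directly from the ballot data.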

In [74]:
to_percent(np.sum(data[CHANN] | data[SHEIKH] | data[BUXTON] | data[DIMILLO]) / len(data))

'97.1%'

### Replacing Washburn AND Sheikh-Yousef

Let's see if we can do any better by replacing the pair least critical to our representation percentage, Sheikh-Yousef and Washburn, with every possible pair of other candidates.

In [85]:
other_candidates = non_winners + [SHEIKH, WASHBURN]

def rep_voters(c1, c2):
  # Number of ballots that include Chann, Buxton, or either replacement
  return int(data[[CHANN, BUXTON, c1, c2]].any(axis=1).sum())
hypothetical_replacement_rep = [[c1, c2, rep_voters(c1, c2)] for c1, c2 in combinations(other_candidates, 2)]

hypothetical_voting = pd.DataFrame(hypothetical_replacement_rep, columns=['Replacement 1', 'Replacement 2', 'Votes'])
hypothetical_voting = hypothetical_voting.sort_values('Votes', ascending=False)
hypothetical_voting['%'] = (hypothetical_voting['Votes'] / len(data)).apply(to_percent)
hypothetical_voting

|   | Replacement 1 | Replacement 2 | Votes | % |
|---|---|---|---|---|
| 0 | Houseal, Ian P. | DiMillo, Steven A. | 8382 | 95.4% |
| 2 | Houseal, Ian P. | Grant, Benjamin K. | 7822 | 89.0% |
| 1 | Houseal, Ian P. | Emerson, Anthony M. | 7633 | 86.8% |


 * 97.2%: DiMillo & Grant give our best representation, though only negligibly better than the next pair.
 * 97.1%: DiMillo & Sheikh-Yousef (i.e. the Washburn -> DiMillo replacement we looked at earlier).
 * DiMillo is critical to expanding representation, appearing in each of the top 8 pairs.
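Whether one candidate appears in every top pair can be checked by counting appearances in the first N rows of the sorted pair table. A sketch on a hypothetical four-row table with placeholder names and made-up vote counts (in the notebook, `hypothetical_voting` with `top_n = 8` would take its place):

```python
from collections import Counter

import pandas as pd

# Hypothetical pair table shaped like `hypothetical_voting`, already sorted by Votes
# (placeholder names and made-up vote counts, not election results)
pairs = pd.DataFrame({
    'Replacement 1': ['A', 'C', 'A', 'E'],
    'Replacement 2': ['B', 'A', 'D', 'F'],
    'Votes': [90, 85, 80, 70],
})

top_n = 3
top = pairs.head(top_n)

# Count how many of the top-N pairs each candidate appears in
appearances = Counter(top['Replacement 1']) + Counter(top['Replacement 2'])
# A candidate can appear at most once per pair, so a count of top_n means "in every pair"
in_every_pair = [c for c, n in appearances.items() if n == top_n]
print(in_every_pair)  # ['A']
```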

# Conclusions