# Implementing a Voting System

## Background Theory
Here I summarize the formalism used and developed in Wayland2019. The construction of a generalized voting system, consisting of a set of voters, a set of candidates, and a voting rule that computes the outcome of an election, is standard in social choice theory.
* Let $V = \{1,...,n\}$ be a nonempty set of $n$ voters.
* Let $C = \{c_1,...,c_m\}$ be the set of $m$ candidates.
* We assume that each voter ranks every candidate on their ballot, producing a linear order on $C$.
* Let $\mathcal{L}(C)$ denote the set of all linear orders on $C$. The collection of ballots, referred to as a profile $\mathbb{P}$, is then an $n$-tuple $(P_1,...,P_n)\in\mathcal{L}(C)^n$, one ballot per voter.
* For $i\in V$, let $P_i\in \mathcal{L}(C)$ denote the truthful preferences of the $i^{th}$ voter.
* A voting rule $f$ is then nothing more than a function $f:\mathcal{L}(C)^n\to C$ that picks out a single winner.

### Voting Rules
There are a number of different ways to determine the outcome of an election. This investigation will only consider positional scoring rules. These assign a numerical score to each candidate based on their rank in each $P_i$; the candidate with the maximum score is selected as the winner. 
* Define a scoring vector to be $s = \langle s_1,s_2,...,s_m \rangle$, where $s_j\geq s_{j+1}$ for $j=1,...,m-1$.
* For $x\in C$, let $score(P_i,x) = s_r$, where $r$ is the position of $x$ in the linear order $P_i$. This picks out the entry of the scoring vector associated with the rank that voter $i$ gives to $x$.
* The total score for $x\in C$ is then $total(\mathbb{P},x) = \sum_{i=1}^n score(P_i,x)$.
* This allows for a concrete definition of a voting rule $f$ on a given profile $\mathbb{P}$ and scoring vector $s$, assuming the maximizer is unique:
    $$ f(\mathbb{P},s) = \underset{x\in C}{\arg\max}\; total(\mathbb{P},x)$$
##### This formalism allows you to fully generate a positional scoring rule given its scoring vector. Here are the specifications of a few common voting rules:
* Plurality: the rule given by $\langle 1,0,...,0 \rangle$.
* k-approval: the rule given by $\langle 1,...,1,0,...,0 \rangle$, with $k$ ones followed by $m-k$ zeros.
* Borda: the rule given by $\langle m-1,m-2,...,0 \rangle$.
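
As a small worked example (the three-voter profile here is made up purely for illustration), take $C=\{a,b,c\}$ with ballots $P_1: a\succ b\succ c$, $P_2: b\succ a\succ c$ and $P_3: a\succ c\succ b$. Under Borda the scoring vector is $s=\langle 2,1,0\rangle$, so

$$ total(\mathbb{P},a) = 2+1+2 = 5,\quad total(\mathbb{P},b) = 1+2+0 = 3,\quad total(\mathbb{P},c) = 0+0+1 = 1 $$

and $a$ wins. Under plurality, $s=\langle 1,0,0\rangle$, the totals are $2$, $1$ and $0$ respectively, so $a$ wins again.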

## Implementation

In [41]:
# Imports
library(dplyr)
library(MVN)
library(Hmisc)
library(ggplot2)
library(tidyverse)
devtools::install_github('MEDSL/elections')
library(elections)
library(data.table)

Skipping install of 'elections' from a github remote, the SHA1 (53297d9f) has not changed since last install.
  Use `force = TRUE` to force installation



In [2]:
# Set up an example election
C <- c("Donald Trump","Joe Biden","Kanye West")
num_voters <- 30
voting_rule <- "Plurality"


In [100]:
# Generate random ballots for the voters to construct a voting profile 

construct_profile <- function(num_voters,candidates) {    
    #' Return num_voters random permutations of the candidates as a matrix (one ballot per row)
    profile <- matrix(NA_character_, nrow = num_voters, ncol = length(candidates))
    for (i in seq_len(num_voters)) {
        # Use the candidates argument rather than the global C
        ballot <- sample(candidates, length(candidates), replace=FALSE)
        profile[i,] <- ballot
    }
    return(profile)
}



#Create a scoring vector for a particular profile and voting method

generate_s_vector <- function(profile,voting_rule) {
    
    # Number of candidates = number of positions on a ballot
    m <- dim(profile)[2]
    
    if (voting_rule == "Plurality"){
        # <1,0,...,0>
        s_vector <- c(1, rep(0, m - 1))
    }
    else if (voting_rule == "Borda"){
        # <m-1,m-2,...,0>
        s_vector <- rev(0:(m-1))
    }
    else {
        # A bare `return` does not exit the function, so fail explicitly instead
        stop("No functionality for this type of voting method yet")
    }
    return(s_vector)
}
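
The k-approval rule from the theory section is not covered by `generate_s_vector`. A minimal sketch of how its scoring vector could be built is below; `k_approval_vector` is a hypothetical helper name, not part of the original notebook.

```r
# Hypothetical helper: scoring vector for k-approval with m candidates,
# i.e. k ones followed by m - k zeros.
k_approval_vector <- function(m, k) {
    stopifnot(k >= 1, k <= m)
    c(rep(1, k), rep(0, m - k))
}

k_approval_vector(5, 2)
# [1] 1 1 0 0 0
```

With `k = 1` this reduces to the plurality vector.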

In [34]:
# Compute the total score for each candidate
total <- function(profile, s_vector) {
    
    # Number of candidates = number of positions on a ballot
    m <- ncol(profile)
    
    if (!is.vector(s_vector) || !is.numeric(s_vector)) {
        stop("Scoring Vector is not a numeric vector")
    }
    
    if (m != length(s_vector)) {
        stop("Dimensions of score vector and profile do not match")
    }
    
    # Every entry in column j of the profile earns s_vector[j];
    # summing by candidate name gives total(P, x) for each x.
    # Candidates come out in alphabetical order.
    scores <- tapply(s_vector[col(profile)], as.vector(profile), sum)

    totals <- data.frame(names(scores), as.numeric(scores), stringsAsFactors = FALSE)
    return(totals)
}

# Determine the winner of the election

f <- function(profile,voting_rule) {
   
    s_vector <- generate_s_vector(profile,voting_rule)
    totals <- as.data.frame(total(profile,s_vector))
    colnames(totals) <- c("Candidate",paste(voting_rule,"Score"))
    print(totals)
    winner <- totals[which.max(totals[,2]),]$Candidate
    print(paste("The winner of the election is",winner))
    return(winner)
    
}
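
Note that `which.max` returns only the first index of the maximum, so `f` silently breaks ties in favour of the candidate listed first. A minimal sketch of one way to surface ties instead is below; `winners` is a hypothetical helper, and the scores in the usage line are the ones from the Borda run in this notebook.

```r
# Hypothetical helper: return every candidate that achieves the maximum score
winners <- function(candidates, scores) {
    candidates[scores == max(scores)]
}

winners(c("Donald Trump", "Joe Biden", "Kanye West"), c(29, 32, 29))
# [1] "Joe Biden"
```

If the result has length greater than one, a tie-breaking convention is needed before a single winner can be reported.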

In [101]:
# Test with randomly generated profile
P <- construct_profile(num_voters,C)
winner <- f(profile = P,voting_rule = "Borda")


     Candidate Borda Score
1 Donald Trump          29
2    Joe Biden          32
3   Kanye West          29
[1] "The winner of the election is Joe Biden"


## Election Data
Data is pulled from https://electionlab.mit.edu/ using https://github.com/MEDSL/elections.git

* One of the upshots of the above formalism is that elections (as networks) are composable: a large election is simply a composition of smaller ones, each of which can be decided using the formalism above.
* Take the 2016 presidential election as an example:
    * We should be able to determine the electoral college votes in each state using a weighted version of plurality. Each state's scoring vector will look like $\langle \text{electoral votes},0,...,0\rangle$, according to the number of electoral college votes designated to the given state.
    * The winner of each state is determined using strict plurality.
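
One way to write this composition down, as a sketch: the symbols $S$ (the set of states), $e_\sigma$ (the electoral votes of state $\sigma$), $\mathbb{P}_\sigma$ (the profile of state $\sigma$) and $s^{plur}=\langle 1,0,...,0\rangle$ are notation assumed here, not taken from Wayland2019. Each state is decided by plurality and hands its full weight to its winner:

$$ EV(x) = \sum_{\sigma\in S} e_\sigma\cdot \mathbb{1}\left[f(\mathbb{P}_\sigma,s^{plur}) = x\right], \qquad \text{winner} = \underset{x\in C}{\arg\max}\; EV(x) $$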

In [102]:
#Loading data from 2016 Presidential election
data(presidential_precincts_2016)
data <- presidential_precincts_2016 %>%
  select(state, candidate, office, votes)

### First, figure out how to manipulate and extract the relevant data

In [269]:
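#Get the electoral votes (the scoring vector weight) for each state
electoral_votes <- read.csv(file = 'electoral_votes_per_state.csv')[,1:2]
# One placeholder per row, rather than a hardcoded 51
electoral_votes$winner <- rep("Undecided", nrow(electoral_votes))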
colnames(electoral_votes) <- c("State","Electoral_Votes","Winner")
head(electoral_votes)

State,Electoral_Votes,Winner
<chr>,<int>,<chr>
Alabama,9,Undecided
Alaska,3,Undecided
Arizona,11,Undecided
Arkansas,6,Undecided
California,55,Undecided
Colorado,9,Undecided


In [264]:
# Sorted state and candidate names
states <- sort(unique(data$state))
candidates <- sort(unique(data$candidate))
# Check that the states line up row by row with the electoral vote table
all(states == electoral_votes$State)

In [271]:
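#Populate the table of results
dt <- data.table(data)

# Total votes per candidate within each state
count <- dt[,list(total_votes = sum(votes)), by = c("candidate","state")]
count <- count[order(state)]

# The plurality winner of each state takes all of its electoral votes.
# This relies on states being in the same order as the rows of electoral_votes.
for (i in seq_along(states)) {
    state_count <- as.data.frame(count[count$state==states[i]])
    winner <- state_count[which.max(state_count$total_votes),]$candidate
    electoral_votes$Winner[i] <- as.character(winner)
}
electoral_votes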

State,Electoral_Votes,Winner
<chr>,<int>,<chr>
Alabama,9,Donald Trump
Alaska,3,Donald Trump
Arizona,11,Donald Trump
Arkansas,6,Donald Trump
California,55,Hillary Clinton
Colorado,9,Hillary Clinton
Connecticut,7,Hillary Clinton
Delaware,3,Hillary Clinton
District of Columbia,3,Hillary Clinton
Florida,29,Donald Trump


In [273]:
final_count <- as.data.table(electoral_votes)[,list(total_votes = sum(Electoral_Votes)), by = "Winner"]
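final_count # Differs by 1 from the NY Times count, most likely because Maine splits its electoral votes by congressional district (Trump won 1 of its 4 in 2016), while this tally is winner-take-all in every state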

Winner,total_votes
<chr>,<int>
Donald Trump,305
Hillary Clinton,233


### Now to fold this into the formalism
* A new scoring vector is needed to take into account the different weights of the states.
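
A minimal sketch of what this could look like in base R is below. The state names, candidates, vote counts and weights are toy values invented for illustration, not real results, and the variable names are assumptions rather than part of the notebook above.

```r
# Toy vote counts (hypothetical numbers, not real election data)
counts <- data.frame(
    state     = c("A", "A", "B", "B", "C", "C"),
    candidate = c("X", "Y", "X", "Y", "X", "Y"),
    votes     = c(10, 5, 3, 8, 7, 6),
    stringsAsFactors = FALSE
)

# Hypothetical electoral votes per state
weights <- c(A = 4, B = 10, C = 3)

# Plurality winner within each state
state_winner <- sapply(split(counts, counts$state),
                       function(d) d$candidate[which.max(d$votes)])

# Each state's scoring vector is <weight, 0, ..., 0>:
# its whole weight goes to its plurality winner
ev_totals <- tapply(weights[names(state_winner)], state_winner, sum)

# Overall winner is the candidate with the most weighted votes
overall_winner <- names(ev_totals)[which.max(ev_totals)]
```

In this toy profile X carries two of the three states and has more raw votes (20 against 19), but Y wins the weighted count 10 to 7, which shows why the state weights have to enter the scoring vector explicitly.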