# Implementing a Voting System

## Background Theory
Here I summarize the formalism used and developed in Wayland2019. The construction of a generalized voting system, consisting of a set of voters, a set of candidates, and a voting rule that computes the outcome of an election, is standard in social choice theory.
* Let $V$ be a nonempty set of $n$ voters $\{1,...,n\}$.
* Let $C$ be the set of $m$ candidates $\{c_1,...,c_m\}$.
* We assume that each voter ranks all of the candidates on their ballot, producing a linear order.
* Let $\mathcal{L}(C)$ denote the set of all linear orders on $C$. The collection of ballots, referred to as a profile $\mathbb{P}$, is then an $n$-tuple of linear orders, i.e. $\mathbb{P}\in\mathcal{L}(C)^n$.
* For $i\in V$, let $P_i\in \mathcal{L}(C)$ denote the truthful preferences of the $i^{th}$ voter.
* A voting rule $f$ is then nothing more than a function $f:\mathcal{L}(C)^n\to C$ that picks out a single winner.
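
As a concrete illustration (a hypothetical three-candidate election, not taken from Wayland2019), let $C=\{a,b,c\}$ and write $x\succ y$ for "$x$ is ranked above $y$". Then $m=3$ and $\mathcal{L}(C)$ contains $3!=6$ linear orders:

$$ \mathcal{L}(C)=\{\,a\succ b\succ c,\; a\succ c\succ b,\; b\succ a\succ c,\; b\succ c\succ a,\; c\succ a\succ b,\; c\succ b\succ a\,\} $$

With $n=3$ voters, one possible collection of ballots is

$$ P_1: a\succ b\succ c,\qquad P_2: b\succ c\succ a,\qquad P_3: c\succ b\succ a, $$

and a voting rule must select exactly one of $a$, $b$, $c$ as the winner for this profile.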

### Voting Rules
There are a number of different ways to determine the outcome of an election. This investigation will only consider positional scoring rules. These assign a numerical score to each candidate based on their rank in each $P_i$; the candidate with the maximum score is selected as the winner. 
* Define a scoring vector to be $s = \langle s_1,s_2,...,s_m \rangle$, where $s_j\geq s_{j+1}$ for $j=1,...,m-1$.
* For $x\in C$, let $score(P_i,x) = s_r$, where $r$ is the rank of $x$ in the linear order $P_i$. This picks out the entry of the scoring vector associated with the $r^{th}$ element of $P_i$.
* The total score for $x\in C$ is then $total(\mathbb{P},x) = \sum_{i=1}^n score(P_i,x)$.
* This allows for a concrete definition of a voting rule $f$ on a given profile $\mathbb{P}$ and scoring vector $s$:
    $$ f(\mathbb{P},s) = \underset{x\in C}{\arg\max}\; total(\mathbb{P},x)$$
##### This formalism allows you to fully generate a positional scoring rule given its scoring vector. Here are the scoring vectors of a few common voting rules:
* Plurality: the rule given by $\langle 1,0,...,0 \rangle$.
* k-approval: the rule given by $\langle 1,...,1,0,...,0 \rangle$, with $k$ ones followed by $m-k$ zeros.
* Borda: the rule given by $\langle m-1,m-2,...,0 \rangle$.
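
The definitions above can be sketched in base R. This is a minimal illustration under stated assumptions, not the implementation from Wayland2019: the seven-ballot profile and the helper names `total_scores` and `winner` are hypothetical, and ties are broken by taking the first candidate in alphabetical order.

```r
# Hypothetical toy profile: one ballot per row, most-preferred candidate first.
profile <- rbind(
    c("a", "b", "c"),
    c("a", "b", "c"),
    c("a", "b", "c"),
    c("b", "c", "a"),
    c("b", "c", "a"),
    c("c", "b", "a"),
    c("c", "b", "a")
)

# total(P, x) for every candidate x under the scoring vector s.
total_scores <- function(profile, s) {
    stopifnot(ncol(profile) == length(s))
    # scores[i, r] is s_r, the score voter i gives to their r-th choice
    scores <- matrix(s, nrow = nrow(profile), ncol = ncol(profile), byrow = TRUE)
    tapply(as.vector(scores), as.vector(profile), sum)
}

# f(P, s): the candidate with the maximum total score.
winner <- function(profile, s) {
    totals <- total_scores(profile, s)
    names(totals)[which.max(totals)]
}

m <- ncol(profile)
plurality    <- c(1, rep(0, m - 1))     # <1, 0, 0>
two_approval <- c(1, 1, rep(0, m - 2))  # <1, 1, 0>
borda        <- (m - 1):0               # <2, 1, 0>

total_scores(profile, borda)   # a = 6, b = 9, c = 6
winner(profile, plurality)     # "a"
winner(profile, two_approval)  # "b"
winner(profile, borda)         # "b"
```

On this profile plurality elects `a` while 2-approval and Borda elect `b`, which shows how the choice of scoring vector alone can change the outcome.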

## Implementation

In [21]:
# Imports
library(dplyr)
library(MVN)
library(Hmisc)
library(ggplot2)
library(tidyverse)

In [47]:
# Set up an example election
# Three candidates, matching the outputs shown in the cells below
C <- c("Donald Trump","Joe Biden","Kanye west")
num_voters <- 30
voting_rule <- "Plurality"


In [48]:
# Generate random ballots for the voters to construct a voting profile 

construct_profile <- function(num_voters, candidates) {
    #' Return num_voters random permutations of the set of candidates as a matrix,
    #' one ballot per row with the most-preferred candidate first
    profile <- matrix(NA_character_, nrow = num_voters, ncol = length(candidates))
    for (i in seq_len(num_voters)) {
        # Sample from the function argument rather than the global C
        ballot <- sample(candidates, length(candidates), replace = FALSE)
        profile[i,] <- ballot
    }
    return(profile)
}



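# Create a scoring vector for a particular profile and voting method

generate_s_vector <- function(profile, voting_method, k = 1) {
    # Assumed completion: the original cell stops at the signature.
    # The voting_method and k (for k-approval) arguments are additions.
    m <- ncol(profile)
    switch(voting_method,
        "Plurality"  = c(1, rep(0, m - 1)),
        "k-approval" = c(rep(1, k), rep(0, m - k)),
        "Borda"      = (m - 1):0,
        stop("Unknown voting method: ", voting_method))
}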

In [5]:
# Compute the total score for each candidate
total <- function(profile, s_vector) {

    if (!is.vector(s_vector) || !is.numeric(s_vector)) {
        stop("Scoring vector is not a numeric vector")
    }

    if (ncol(profile) != length(s_vector)) {
        stop("Dimensions of score vector and profile do not match")
    }

    df <- as.data.frame(profile, stringsAsFactors = FALSE)
    # Name the columns by rank so that repeated scores (e.g. plurality) do not give duplicate names
    colnames(df) <- seq_along(s_vector)
    # Count how often each candidate (value) appears at each rank (key)
    freq <- gather(df) %>% group_by(key, value) %>% tally()
    freq$total <- s_vector[as.integer(freq$key)] * freq$n

    # Sum the weighted counts per candidate
    totals <- freq %>% group_by(value) %>% summarise(total = sum(total))

    return(as.data.frame(totals))
}

# Determine the winner of the election
f <- function(profile, voting_method) {

    s_vector <- generate_s_vector(profile, voting_method)
    totals <- total(profile, s_vector)
    print(totals)

    # The winner is the candidate with the maximum total score (the first one listed on ties)
    return(totals$value[which.max(totals$total)])
}

In [76]:
P <- construct_profile(num_voters,C)
df <- as.data.frame(P)
colnames(df) <- c(2,1,0)
df

| 2 | 1 | 0 |
|---|---|---|
| `<chr>` | `<chr>` | `<chr>` |
| Joe Biden | Donald Trump | Kanye west |
| Kanye west | Joe Biden | Donald Trump |
| Donald Trump | Joe Biden | Kanye west |
| Kanye west | Donald Trump | Joe Biden |
| Donald Trump | Kanye west | Joe Biden |
| Joe Biden | Kanye west | Donald Trump |
| Joe Biden | Kanye west | Donald Trump |
| Donald Trump | Kanye west | Joe Biden |
| Joe Biden | Donald Trump | Kanye west |
| Donald Trump | Kanye west | Joe Biden |


In [130]:
freq = (gather(df) %>% group_by(key,value) %>% tally)

In [131]:
freq$total <- as.numeric(freq$key)*freq$n
freq

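| key | value | n | total |
|---|---|---|---|
| `<chr>` | `<chr>` | `<int>` | `<dbl>` |
| 0 | Donald Trump | 10 | 0 |
| 0 | Joe Biden | 11 | 0 |
| 0 | Kanye west | 9 | 0 |
| 1 | Donald Trump | 6 | 6 |
| 1 | Joe Biden | 12 | 12 |
| 1 | Kanye west | 12 | 12 |
| 2 | Donald Trump | 14 | 28 |
| 2 | Joe Biden | 7 | 14 |
| 2 | Kanye west | 9 | 18 |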


In [118]:
# Check the strided index: every 3rd position up to the number of rows in freq
m <- 1:50
n <- m[seq(3, dim(freq)[1], 3)]
n

In [133]:
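# Sum each candidate's weighted counts across ranks.
# This relies on freq being sorted by key, then value, with every candidate present at every rank.
totals <- matrix(nrow = dim(P)[2], ncol = 2)
for (i in 1:dim(P)[2]) {
    idx <- seq(i, dim(freq)[1], dim(P)[2])
    totals[i,] <- c(unique(freq$value[idx]), sum(freq$total[idx]))
}
# The winner is the row with the largest total: totals[which.max(as.numeric(totals[,2])), 1]
totals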

| Candidate | Total score |
|---|---|
| Donald Trump | 34 |
| Joe Biden | 26 |
| Kanye west | 30 |
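
The per-candidate totals can also be computed without depending on the row order of `freq`. The sketch below is one possible alternative rather than the notebook's own method: it re-enters the counts from the frequency table above by hand, sums them with base R's `tapply`, and picks the winner with `which.max`. The variable names are illustrative.

```r
# Counts copied from the frequency table above: score (key), candidate (value), count (n)
freq <- data.frame(
    key   = rep(c(0, 1, 2), each = 3),
    value = rep(c("Donald Trump", "Joe Biden", "Kanye west"), times = 3),
    n     = c(10, 11, 9, 6, 12, 12, 14, 7, 9)
)
freq$total <- freq$key * freq$n

# Sum the weighted counts per candidate, keyed by name instead of by position
totals <- tapply(freq$total, freq$value, sum)
totals  # Donald Trump = 34, Joe Biden = 26, Kanye west = 30

# Candidate with the maximum total score (the first one listed on ties)
winner <- names(totals)[which.max(totals)]
winner  # "Donald Trump"
```

The totals agree with the table above, and the winner under the $\langle 2,1,0 \rangle$ Borda vector is the candidate with 34 points.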


## Live Data for 2020

In [None]:
# Use Python integration to grab live election data and compute the outcome using the formalism above.