# Parsing the Cast Vote Record of the Alaska Special Election 09/2022
The Alaska Division of Elections released the full cast vote record (CVR) on 09/08/2022.
This notebook contains the steps we took to extract the rankings for the only instant-runoff (IRV) race:
`Special Election for House Representative`
Please note that we cannot guarantee the correctness of these processing steps.

In [134]:
import json
import numpy as np
import pandas as pd
import os

### Download and unzip the JSON files into a local directory, then specify your path below
Results page: https://www.elections.alaska.gov/election-results/e/?id=22sspg

CVR archive to download: https://www.elections.alaska.gov/results/22SSPG/CVR_Export_20220908084311.zip

In [133]:
datadir = "../../data/AlaskaSpecial22/"  # adjust 

In [135]:
with open(os.path.join(datadir, 'CvrExport.json'), 'rt') as f:
    data = json.load(f)

In [136]:
with open(os.path.join(datadir, "ContestManifest.json"), 'rt') as f:
    contest_mani = json.load(f)

In [137]:
with open(os.path.join(datadir, "CandidateManifest.json")) as f:
    cand_mani = json.load(f)

The only IRV contest is the last entry of the contest manifest, with `Id == 69`:

In [156]:
contest_mani['List'][-1]

{'Description': 'U.S. Representative (Special General)',
 'Id': 69,
 'ExternalId': '',
 'DistrictId': 67,
 'VoteFor': 1,
 'NumOfRanks': 4}
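Instead of relying on the position of the entry in the list, the ranked contest can also be looked up through its `NumOfRanks` field. The following is a minimal sketch, assuming that non-ranked contests carry a `NumOfRanks` of 0 or 1 (not verified here, since only the entry for contest 69 is shown above). The first entry of `example_manifest` and the helper `ranked_contests` are made up for illustration.

```python
# Toy manifest: the second entry is the one shown above; the first is a
# hypothetical non-ranked contest added only for illustration.
example_manifest = {
    'List': [
        {'Description': 'Hypothetical Non-Ranked Contest', 'Id': 1,
         'ExternalId': '', 'DistrictId': 67, 'VoteFor': 1, 'NumOfRanks': 0},
        {'Description': 'U.S. Representative (Special General)', 'Id': 69,
         'ExternalId': '', 'DistrictId': 67, 'VoteFor': 1, 'NumOfRanks': 4},
    ]
}

def ranked_contests(manifest):
    """Return all contests that allow more than one rank."""
    return [c for c in manifest['List'] if c.get('NumOfRanks', 0) > 1]

ranked = ranked_contests(example_manifest)
ranked_ids = [c['Id'] for c in ranked]
print(ranked_ids)
```

On the real data, the same call would be `ranked_contests(contest_mani)`.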

### Get a candidateID-to-candidateName translation table 
We also replace commas in the candidate names with `_` so that they do not collide with the CSV separator downstream.

In [138]:
# Create a candidateID to name index
id2cand = {}
for i in cand_mani['List']:
    id2cand[i['Id']] = i['Description'].replace(',','_')

In [121]:
data.keys()

dict_keys(['Version', 'ElectionId', 'Sessions'])

Each session corresponds to one ballot, which may contain vote marks for multiple races.

In [122]:
len(data['Sessions'])

192289

In [123]:
sessions = data['Sessions']

Keep only contest 69, i.e., the special election for House representative (the only ranked-choice contest).

In [124]:
rcv_ballots = []
special_id = 69   # this is from ContestManifest special election
for s in sessions:
    aux = [i for i in s['Original']['Cards'][0]['Contests'] if i['Id']==special_id]
    if len(aux) > 0:
        rcv_ballots.append(aux[0])
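The loop above only looks at the first card of each session (`Cards[0]`). If a session can hold more than one card, the contest might sit on a later card and would be missed. The sketch below scans every card; whether multi-card sessions actually occur in this export is not verified here. The helper `contests_for`, the toy sessions, and the contest Id 12 are hypothetical.

```python
def contests_for(session, contest_id):
    """Collect every record of `contest_id` found on any card of one session."""
    found = []
    for card in session['Original']['Cards']:
        found.extend(c for c in card['Contests'] if c['Id'] == contest_id)
    return found

# Toy sessions: contest 69 on the first card, on the second card, and absent.
toy_sessions = [
    {'Original': {'Cards': [{'Contests': [{'Id': 69, 'Marks': []}]}]}},
    {'Original': {'Cards': [{'Contests': [{'Id': 12, 'Marks': []}]},
                            {'Contests': [{'Id': 69, 'Marks': []}]}]}},
    {'Original': {'Cards': [{'Contests': [{'Id': 12, 'Marks': []}]}]}},
]

toy_ballots = [c for s in toy_sessions for c in contests_for(s, 69)]
print(len(toy_ballots))
```

Comparing the number of ballots found this way with `len(rcv_ballots)` would show whether restricting to the first card loses anything.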

Process a single list of marks, i.e., the ranking of one voter.
Overvotes and undervotes are handled as well.
Overvote: the voter marked multiple candidates at the same rank position.
Undervote: a gap in the rank sequence, or a contest without any marks. Trailing gaps are not counted, since those are valid partial rankings.

In [139]:
def process_marks(m, id2cand, max_ranks=4):
    """Turn the marks of one ballot into a list of names ordered by rank.

    `max_ranks` is currently unused; trailing unranked positions are dropped.
    """
    if len(m) < 1:
        return ['undervote']  # no marks at all for this contest
    m = sorted(m, key=lambda x: x['Rank'])
    rnk = {}
    # assign ranks and check overvotes
    for mm in m:
        rank = mm['Rank']
        if rank in rnk:  # overvote: this rank position was already marked
            rnk[rank] = 'overvote'
        else:  # valid rank (first occurrence)
            rnk[rank] = id2cand[mm['CandidateId']]
    # check undervotes: fill the gaps below the highest marked rank
    highest_rank = max(rnk)
    for i in range(1, highest_rank):
        if i not in rnk:
            rnk[i] = 'undervote'
    return [rnk[i] for i in sorted(rnk)]
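To make the overvote and undervote rules concrete, here is a small worked example on made-up marks. `rank_marks` is a condensed restatement of the rules described above, written for illustration only, and it assumes that ranks start at 1. The candidate IDs and names in `toy_id2cand` are hypothetical.

```python
def rank_marks(marks, id2cand):
    """Condensed restatement of the ranking rules, for illustration only."""
    if not marks:
        return ['undervote']
    by_rank = {}
    for mark in marks:
        rank = mark['Rank']
        # A second mark at the same rank turns that position into an overvote.
        by_rank[rank] = 'overvote' if rank in by_rank else id2cand[mark['CandidateId']]
    # List every position up to the highest marked rank; gaps become undervotes.
    return [by_rank.get(r, 'undervote') for r in range(1, max(by_rank) + 1)]

toy_id2cand = {1: 'A', 2: 'B', 3: 'C'}  # hypothetical IDs and names

clean = rank_marks([{'CandidateId': 1, 'Rank': 1},
                    {'CandidateId': 2, 'Rank': 2}], toy_id2cand)
overvoted = rank_marks([{'CandidateId': 1, 'Rank': 1},
                        {'CandidateId': 2, 'Rank': 1},
                        {'CandidateId': 3, 'Rank': 2}], toy_id2cand)
skipped = rank_marks([{'CandidateId': 1, 'Rank': 1},
                      {'CandidateId': 3, 'Rank': 3}], toy_id2cand)
blank = rank_marks([], toy_id2cand)

print(clean)      # ['A', 'B']
print(overvoted)  # ['overvote', 'C']
print(skipped)    # ['A', 'undervote', 'C']
print(blank)      # ['undervote']
```

Note that a ranking such as `['A', 'B']` stops at rank 2: the unused ranks 3 and 4 are trailing gaps and are not reported as undervotes.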

Use the above routine to process all rankings and store in `csvrows`

In [140]:
# Convert everything to list of lists
csvrows = [process_marks(k['Marks'], id2cand) for k in rcv_ballots]

In [151]:
csvrows[:10]

[['Peltola_ Mary S.', 'Begich_ Nick'],
 ['Peltola_ Mary S.'],
 ['Peltola_ Mary S.'],
 ['Peltola_ Mary S.'],
 ['Peltola_ Mary S.', 'Begich_ Nick', 'Palin_ Sarah'],
 ['Peltola_ Mary S.'],
 ['Palin_ Sarah'],
 ['Peltola_ Mary S.', 'Begich_ Nick'],
 ['Peltola_ Mary S.', 'Begich_ Nick', 'Palin_ Sarah'],
 ['Begich_ Nick', 'Palin_ Sarah', 'Peltola_ Mary S.']]

Sanity check: inspect the contests that the CVR itself flags as overvotes or undervotes.

In [142]:
ov = [i for i in rcv_ballots if i['Overvotes']>0]
uv = [i for i in rcv_ballots if i['Undervotes']>0]
ouv = [i for i in rcv_ballots if i['Undervotes']>0 and i['Overvotes']>0 ]

In [150]:
uv[:3]

[{'Id': 69,
  'ManifestationId': 64,
  'Undervotes': 1,
  'Overvotes': 0,
  'OutstackConditionIds': [13],
  'Marks': []},
 {'Id': 69,
  'ManifestationId': 64,
  'Undervotes': 1,
  'Overvotes': 0,
  'OutstackConditionIds': [13],
  'Marks': []},
 {'Id': 69,
  'ManifestationId': 64,
  'Undervotes': 1,
  'Overvotes': 0,
  'OutstackConditionIds': [13],
  'Marks': []}]
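The flagged undervotes shown above have an empty `Marks` list, which the processing routine maps to `['undervote']`. One way to cross-check the processed rankings against these flags is to tally the rows by category. This is only a sketch on made-up rows; `classify` and its category labels are hypothetical, and whether the tallies should match `len(ov)` and `len(uv)` exactly depends on how the export defines its `Overvotes` and `Undervotes` fields, which is not verified here.

```python
from collections import Counter

def classify(row):
    """Assign one processed ranking to a coarse category."""
    if row == ['undervote']:
        return 'blank'
    if 'overvote' in row:
        return 'has overvote'
    if 'undervote' in row:
        return 'has skipped rank'
    return 'clean'

# Made-up rows in the same shape as the processed rankings.
toy_rows = [
    ['Peltola_ Mary S.', 'Begich_ Nick'],
    ['undervote'],
    ['overvote', 'Palin_ Sarah'],
    ['Begich_ Nick', 'undervote', 'Palin_ Sarah'],
    ['Palin_ Sarah'],
]

summary = Counter(classify(r) for r in toy_rows)
print(summary)
```

On the real data, `Counter(classify(r) for r in csvrows)` gives the same breakdown for all ballots.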

In [148]:
csvrows[:10]

[['Peltola_ Mary S.', 'Begich_ Nick'],
 ['Peltola_ Mary S.'],
 ['Peltola_ Mary S.'],
 ['Peltola_ Mary S.'],
 ['Peltola_ Mary S.', 'Begich_ Nick', 'Palin_ Sarah'],
 ['Peltola_ Mary S.'],
 ['Palin_ Sarah'],
 ['Peltola_ Mary S.', 'Begich_ Nick'],
 ['Peltola_ Mary S.', 'Begich_ Nick', 'Palin_ Sarah'],
 ['Begich_ Nick', 'Palin_ Sarah', 'Peltola_ Mary S.']]

### Save as CSV file of your choosing

In [130]:
pd.DataFrame(csvrows).to_csv('tmp.csv')
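With its default settings, `to_csv` also writes an index column and a header row, and the `DataFrame` pads shorter rankings with empty cells. If the downstream tool expects one ragged ranking per line, the standard `csv` module is an alternative. This is a sketch on made-up rows, and the file name `rankings.csv` is hypothetical.

```python
import csv
import os
import tempfile

# Made-up rows in the same shape as the processed rankings.
toy_rows = [['Peltola_ Mary S.', 'Begich_ Nick'], ['Palin_ Sarah'], ['undervote']]

# Write into a temporary directory so the example leaves no stray files behind.
out_path = os.path.join(tempfile.mkdtemp(), 'rankings.csv')
with open(out_path, 'w', newline='') as f:
    csv.writer(f).writerows(toy_rows)  # one ranking per line, no padding

# Read the file back to confirm that the rankings survive the round trip.
with open(out_path, newline='') as f:
    read_back = list(csv.reader(f))
print(read_back == toy_rows)
```

Because the commas in the candidate names were replaced earlier, the names need no quoting in the output file.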