# Parsing the Cast-Vote Record of Alaska Special Election 09/2022
The Alaska Election Office released the full Cast Vote Record (CVR) on 09/08/2022.
This notebook contains the steps we took to extract rankings for the following IRV race:
`Special Election for House Representative`
Please note that we cannot guarantee the correctness of these processing steps.

In [1]:
import json
import numpy as np
import pandas as pd
import os

### Download the CVR export and unzip the JSON files into a local directory. Specify your path below:
Results page: https://www.elections.alaska.gov/election-results/e/?id=22sspg  (direct download: https://www.elections.alaska.gov/results/22SSPG/CVR_Export_20220908084311.zip)

In [2]:
datadir = "../../ranked-voting/data/AlaskaSpecial22/"  # adjust 
exclude_ambiguous_ballots = True  # if True, marks with `IsAmbiguous==True` are dropped before ranking (passed as `handle_ambiguous` below)

In [3]:
with open(os.path.join(datadir, 'CvrExport.json'), 'rt') as f:
    data = json.load(f)

In [4]:
with open(os.path.join(datadir, "ContestManifest.json"), 'rt') as f:
    contest_mani = json.load(f)

In [5]:
with open(os.path.join(datadir, "CandidateManifest.json")) as f:
    cand_mani = json.load(f)

This is the only IRV election: `Id==69`

In [6]:
contest_mani['List'][-1]

{'Description': 'U.S. Representative (Special General)',
 'Id': 69,
 'ExternalId': '',
 'DistrictId': 67,
 'VoteFor': 1,
 'NumOfRanks': 4}
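
Rather than relying on the contest being the last entry in the list, ranked contests could also be located through their `NumOfRanks` field. The following is a minimal sketch on a toy manifest: the first entry is invented for illustration, the helper name `ranked_contests` is hypothetical, and the assumption that non-ranked contests carry a `NumOfRanks` of 0 or 1 is ours and has not been checked against the full file.

```python
# Toy stand-in for ContestManifest.json; only the second entry mirrors the real file.
toy_manifest = {
    "List": [
        {"Description": "Hypothetical Non-Ranked Contest", "Id": 1, "VoteFor": 1, "NumOfRanks": 0},
        {"Description": "U.S. Representative (Special General)", "Id": 69, "VoteFor": 1, "NumOfRanks": 4},
    ]
}

def ranked_contests(manifest):
    """Return {contest Id: Description} for contests that allow more than one rank."""
    return {
        c["Id"]: c["Description"]
        for c in manifest["List"]
        if c.get("NumOfRanks", 0) > 1
    }

ranked = ranked_contests(toy_manifest)
print(ranked)
```

On the real data this would be called as `ranked_contests(contest_mani)`.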

### Get a candidateID-to-candidateName translation table 
We also replace commas with '\_' to avoid separator collisions downstream

In [7]:
# Create a candidateID to name index
id2cand = {}
for i in cand_mani['List']:
    id2cand[i['Id']] = i['Description'].replace(',','_')

In [8]:
data.keys()

dict_keys(['Version', 'ElectionId', 'Sessions'])

Each session corresponds to a ballot (that may contain vote marks for multiple races) 

In [9]:
len(data['Sessions'])

192289

In [10]:
sessions = data['Sessions']

Grab only race 69, i.e., the special election for House representative (the only RCV election) 

In [11]:
rcv_ballots = []
special_id = 69   # this is from ContestManifest special election
for s in sessions:
    aux = [i for i in s['Original']['Cards'][0]['Contests'] if i['Id']==special_id]
    if len(aux) > 0:
        rcv_ballots.append(aux[0])
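
The loop above reads only the first card of each session. If a session could carry the contest on a later card (an assumption we have not verified for this export), a variant that scans every card might look like the sketch below. The toy sessions and the helper name `find_contest` are hypothetical.

```python
def find_contest(session, contest_id):
    """Return the first contest dict with the given Id across all cards, or None."""
    for card in session["Original"]["Cards"]:
        for contest in card["Contests"]:
            if contest["Id"] == contest_id:
                return contest
    return None

# Toy sessions: contest 69 on the first card, on the second card, and absent.
toy_sessions = [
    {"Original": {"Cards": [{"Contests": [{"Id": 69, "Marks": []}]}]}},
    {"Original": {"Cards": [{"Contests": [{"Id": 1, "Marks": []}]},
                            {"Contests": [{"Id": 69, "Marks": []}]}]}},
    {"Original": {"Cards": [{"Contests": [{"Id": 1, "Marks": []}]}]}},
]

found = [c for c in (find_contest(s, 69) for s in toy_sessions) if c is not None]
print(len(found))
```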

In [12]:
rcv_ballots[0]

{'Id': 69,
 'ManifestationId': 64,
 'Undervotes': 0,
 'Overvotes': 0,
 'OutstackConditionIds': [14],
 'Marks': [{'CandidateId': 218,
   'ManifestationId': 953,
   'PartyId': 6,
   'Rank': 1,
   'MarkDensity': 100,
   'IsAmbiguous': False,
   'IsVote': True,
   'OutstackConditionIds': []},
  {'CandidateId': 215,
   'ManifestationId': 955,
   'PartyId': 14,
   'Rank': 2,
   'MarkDensity': 100,
   'IsAmbiguous': False,
   'IsVote': False,
   'OutstackConditionIds': []}]}

Process a single list of "marks", i.e., one voter's ranking.
Overvotes and undervotes are handled as well.
Overvote: the voter ranked multiple candidates at the same rank position.
Undervote: a gap in the rank sequence (trailing gaps are excluded; those are valid partial rankings). A ballot with no marks at all is recorded as a single `undervote`.
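
To make these definitions concrete, here is a small standalone sketch that applies the same rules to toy `(rank, candidate)` pairs. The function name `toy_rank_sequence` and the single-letter candidates are hypothetical; the notebook's own routine works on the CVR mark dictionaries instead.

```python
def toy_rank_sequence(marks):
    """Map (rank, candidate) pairs to a rank-ordered list with overvote/undervote labels."""
    if not marks:
        return ["undervote"]  # blank ballot
    by_rank = {}
    for rank, cand in marks:
        # a second mark at the same rank turns that position into an overvote
        by_rank[rank] = "overvote" if rank in by_rank else cand
    highest = max(by_rank)
    # gaps below the highest marked rank become undervotes; trailing gaps are dropped
    return [by_rank.get(r, "undervote") for r in range(1, highest + 1)]

print(toy_rank_sequence([(1, "A"), (2, "B")]))            # plain partial ranking
print(toy_rank_sequence([(1, "A"), (1, "B"), (2, "C")]))  # overvote at rank 1
print(toy_rank_sequence([(1, "A"), (3, "B")]))            # gap at rank 2
print(toy_rank_sequence([]))                              # no marks
```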

In [21]:
def ballot_is_ambiguous(m):
    return False if len(m) < 1 else any([i['IsAmbiguous'] for i in m])

In [30]:
# TODO: optionally handle ambiguous marks
def process_marks(m, id2cand, max_ranks=4, handle_ambiguous=False):
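    """
        Convert one ballot's marks into a rank-ordered list of candidate names.
        Returns (list of candidates, number of ambiguous marks removed/ignored).
        Note: `max_ranks` is currently unused.
    """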
    nignored = 0
    if handle_ambiguous:
        # delete all marks that are labeled ambiguous
        newm = [i for i in m if not i['IsAmbiguous']]
        nignored += len(m) - len(newm)
        m = newm
    if len(m) < 1:
        return ['undervote'], nignored        
    m = sorted(m, key=lambda x: x['Rank'])
    rnk = {}
    # assign ranks and check overvotes
    for mm in m:
        rank = mm['Rank']
        if rank in rnk:  # Overvote
            rnk[rank] = 'overvote'
        else:  # valid rank (first occurrence)
            rnk[rank] = id2cand[mm['CandidateId']]
    # check undervotes            
    highest_rank = max(rnk)
    for i in range(1, highest_rank):
        rnk[i] = 'undervote' if i not in rnk else rnk[i]
    return [rnk[i] for i in sorted(rnk.keys())], nignored

First count the ambiguous marks per ballot, then use the above routine to process all rankings and store them in `csvrows`

In [26]:
def num_ambiguous_marks(m):
    return 0 if len(m) < 1 else np.sum([i['IsAmbiguous'] for i in m])

In [27]:
from collections import Counter
cnt = []
tot_ambiguous = 0
for b in rcv_ballots:
    c = num_ambiguous_marks(b['Marks'])
    cnt.append(c)
    tot_ambiguous += c
print(f"Ambiguous-marks stats: {Counter(cnt)}")
print(f"Total ambiguous = {tot_ambiguous}")

Ambiguous-marks stats: Counter({0: 192116, 1: 143, 2: 9, 3: 7, 4: 4, 8: 3, 12: 2, 7: 2, 10: 1, 6: 1, 5: 1})
Total ambiguous = 281


In [31]:
# Convert everything to list of lists
csvrows = []
nignored = 0
for k in rcv_ballots:
    r, n = process_marks(k['Marks'], id2cand, handle_ambiguous=exclude_ambiguous_ballots)                    
    csvrows.append(r)
    nignored += n
print(f'INFO: Num ignored marks = {nignored}')

INFO: Num ignored marks = 281


In [32]:
## Alternative: exclude any ballot that has an ambiguous mark
# # Convert everything to list of lists
# csvrows = []
# ambiguous = 0
# for b in rcv_ballots:
#     if exclude_ambiguous_ballots and ballot_is_ambiguous(b['Marks']):
#         ambiguous += 1
#         continue
#     r, _ = process_marks(b['Marks'], id2cand)  # process_marks returns (ranking, nignored)
#     csvrows.append(r)
# if ambiguous > 0:
#     print(f"INFO: Excluded {ambiguous} ambiguous ballots.")

In [36]:
csvrows[:10]

[['Peltola_ Mary S.', 'Begich_ Nick'],
 ['Peltola_ Mary S.'],
 ['Peltola_ Mary S.'],
 ['Peltola_ Mary S.'],
 ['Peltola_ Mary S.', 'Begich_ Nick', 'Palin_ Sarah'],
 ['Peltola_ Mary S.'],
 ['Palin_ Sarah'],
 ['Peltola_ Mary S.', 'Begich_ Nick'],
 ['Peltola_ Mary S.', 'Begich_ Nick', 'Palin_ Sarah'],
 ['Begich_ Nick', 'Palin_ Sarah', 'Peltola_ Mary S.']]

Sanity check: inspect the ballots that the CVR itself flags with overvotes or undervotes

In [37]:
ov = [i for i in rcv_ballots if i['Overvotes']>0]
uv = [i for i in rcv_ballots if i['Undervotes']>0]
ouv = [i for i in rcv_ballots if i['Undervotes']>0 and i['Overvotes']>0 ]

In [38]:
uv[:3]

[{'Id': 69,
  'ManifestationId': 64,
  'Undervotes': 1,
  'Overvotes': 0,
  'OutstackConditionIds': [13],
  'Marks': []},
 {'Id': 69,
  'ManifestationId': 64,
  'Undervotes': 1,
  'Overvotes': 0,
  'OutstackConditionIds': [13],
  'Marks': []},
 {'Id': 69,
  'ManifestationId': 64,
  'Undervotes': 1,
  'Overvotes': 0,
  'OutstackConditionIds': [13],
  'Marks': []}]
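
One further check that could be run on the processed rows is a first-choice tally, which can then be compared by eye against officially reported first-round counts. The sketch below uses made-up toy rows; in the notebook the input would be `csvrows`.

```python
from collections import Counter

# Toy rows in the same shape as csvrows (one list of labels per ballot).
toy_rows = [
    ["Peltola_ Mary S.", "Begich_ Nick"],
    ["Palin_ Sarah"],
    ["undervote"],
    ["overvote", "Begich_ Nick"],
    ["Peltola_ Mary S."],
]

# Count the label found at rank 1 on every ballot.
first_choices = Counter(row[0] for row in toy_rows if row)
print(first_choices.most_common())
```

Rows whose first entry is `undervote` or `overvote` show up as their own categories, so they are easy to separate from candidate totals.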

### Save to a CSV file of your choosing

In [39]:
pd.DataFrame(csvrows).to_csv('tmp.csv')
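
With its default arguments, `to_csv` also writes the DataFrame index and integer column headers. If a plain file with one ranking per row is preferred, a sketch using the standard-library `csv` module could look like the following. The column names `rank1` to `rank4` are our choice, the rows are toy data, and the output goes to an in-memory buffer so the example has no side effects.

```python
import csv
import io

toy_rows = [["Peltola_ Mary S.", "Begich_ Nick"], ["Palin_ Sarah"]]
max_ranks = 4  # matches NumOfRanks in the contest manifest

buf = io.StringIO()  # swap for open("tmp.csv", "w", newline="") to write to disk
writer = csv.writer(buf)
writer.writerow([f"rank{i}" for i in range(1, max_ranks + 1)])
for row in toy_rows:
    # pad short rankings with empty cells so every row has max_ranks columns
    writer.writerow(row + [""] * (max_ranks - len(row)))

csv_text = buf.getvalue()
print(csv_text)
```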