<a href="https://colab.research.google.com/github/mggg/Training_Materials_25/blob/main/notebooks/practitioners/Thursday/load_clean_run.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cast Vote Records and Elections

In this tutorial notebook, we show how VoteKit can be used to load the cast vote record, clean the ballots, and then run elections.



## Cast Vote Record (CVR)

A cast vote record is the collection of ballots used in an election. While previously we had been working with generated ballots, all of the examples below will use real-world data.

## Scottish Profiles

Scottish elections give us a great source of real-world ranked data, because STV is used for local government elections. Thanks to David McCune of William Jewell College, we have a fantastic [repository](https://github.com/mggg/scot-elex) of shiny, clean ranking data from over 1000 elections. Each election features 3-14 candidates, and every candidate carries a party label.

Go to the [repository](https://github.com/mggg/scot-elex), choose a locality, and download the csv file to your working directory (the same folder as your code). You will need to edit the code below to reflect your file name.

In [1]:
from votekit.cvr_loaders import load_scottish

# the load_scottish function returns a tuple of information:
# the first element is the profile itself, the second is the number of seats in the election,
# the third is a list of candidates, the fourth a dictionary mapping candidates to parties,
# and the fifth the ward name
scottish_profile, num_seats, cand_list, cand_to_party, ward = load_scottish("../../../data/west_dunbartonshire_2017_ward2.csv") 

Let's quickly look at each of the returned variables.

In [2]:
print(f"This election took place in {ward}.")
print(f"The number of seats up for election was {num_seats}.")
print(f"The number of candidates was {len(cand_list)}.")

This election took place in Leven.
The number of seats up for election was 4.
The number of candidates was 8.


In [3]:
from votekit.pref_profile import profile_df_head
print(scottish_profile)
print()
print("The top 10 ballots by weight are")
print(profile_df_head(scottish_profile, 10).to_string())


Profile contains rankings: True
Maximum ranking length: 8
Profile contains scores: False
Candidates: ('Jim Bollan', 'Ian Dickson', 'George Drummond', 'Caroline Mcallister', 'Michelle Marie Mcginty', 'John Kelly Millar', 'Peter Parlane', 'Sean Quinn')
Candidates who received votes: ('John Kelly Millar', 'Sean Quinn', 'George Drummond', 'Peter Parlane', 'Caroline Mcallister', 'Ian Dickson', 'Jim Bollan', 'Michelle Marie Mcginty')
Total number of Ballot objects: 1283
Total weight of Ballot objects: 5893


The top 10 ballots by weight are
                             Ranking_1                 Ranking_2              Ranking_3 Ranking_4 Ranking_5 Ranking_6 Ranking_7 Ranking_8 Weight Voter Set
Ballot Index                                                                                                                                              
314                      (Ian Dickson)     (Caroline Mcallister)                    (~)       (~)       (~)       (~)       (~)       (~)    342     

In Scottish elections, voters can rank up to the number of candidates. The most common vote in Scottish elections tends to be a ballot of length `num_seats`, followed by bullet votes (votes for one candidate).

One useful feature of this repository is that each candidate is labeled with the party they ran under.

In [4]:
for cand, party in cand_to_party.items():
    print(f"{cand} ran under the following party: {party}\n")

Jim Bollan ran under the following party: West Dunbartonshire Community (WDuns)

Ian Dickson ran under the following party: Scottish National Party (SNP)

George Drummond ran under the following party: Liberal Democrat (LD)

Caroline Mcallister ran under the following party: Scottish National Party (SNP)

Michelle Marie Mcginty ran under the following party: Labour (Lab)

John Kelly Millar ran under the following party: Labour (Lab)

Peter Parlane ran under the following party: Conservative and Unionist Party (Con)

Sean Quinn ran under the following party: Green (Gr)



Scottish elections use the STV mechanism, so let's quickly see who the winner set is.

In [5]:
from votekit.elections import STV

e = STV(scottish_profile, m=num_seats)

print(e.get_elected())

(frozenset({'Ian Dickson'}), frozenset({'Jim Bollan'}), frozenset({'John Kelly Millar'}), frozenset({'Caroline Mcallister'}))




We read this tuple as a ranking: the first entry is the candidate elected first, the second entry is the candidate elected second, and so on. If you chose a different ward, your tuple will have the same shape but different candidate names.
Here, Ian Dickson was elected first, then Jim Bollan, then John Kelly Millar, then Caroline Mcallister.
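
Because the candidates carry party labels, the winner tuple can be summarized by party. The sketch below does not call VoteKit; it copies the winner tuple and the party dictionary from the outputs above and tallies seats per party in plain Python. Sorting within each `frozenset` is an assumption made only so that simultaneous winners would print in a stable order.

```python
from collections import Counter

# Winner tuple and party labels copied from the cell outputs above.
elected = (
    frozenset({"Ian Dickson"}),
    frozenset({"Jim Bollan"}),
    frozenset({"John Kelly Millar"}),
    frozenset({"Caroline Mcallister"}),
)
cand_to_party = {
    "Jim Bollan": "West Dunbartonshire Community (WDuns)",
    "Ian Dickson": "Scottish National Party (SNP)",
    "George Drummond": "Liberal Democrat (LD)",
    "Caroline Mcallister": "Scottish National Party (SNP)",
    "Michelle Marie Mcginty": "Labour (Lab)",
    "John Kelly Millar": "Labour (Lab)",
    "Peter Parlane": "Conservative and Unionist Party (Con)",
    "Sean Quinn": "Green (Gr)",
}

# Flatten the tuple of frozensets into a list, preserving order of election.
winners = [cand for cand_set in elected for cand in sorted(cand_set)]

# Count how many seats each party won.
seats_by_party = Counter(cand_to_party[cand] for cand in winners)

for party, seats in seats_by_party.most_common():
    print(f"{party}: {seats}")
```

In a live session you could drop the hard-coded values and use `e.get_elected()` and `cand_to_party` directly.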

## Minnesota 2013


Another possible data source is real-world elections that return their cast vote records (CVRs) as csv files. To be readable by VoteKit, the csv file must have a row for each voter, and must have one column per ranking position.

The Minnesota 2013 Mayoral race, which used IRV, did just that. Let's load the csv file into VoteKit. You can find the file [here](https://github.com/mggg/Training_Materials_25/blob/main/data/mn_2013_cast_vote_record.csv). Download it and put it into your working directory.

Voters were allowed to rank three candidates.



In [6]:
from votekit.cvr_loaders import load_csv

# the first 3 columns of the csv hold the ranking information,
# in order from 1st place to 3rd place
mn_profile = load_csv("../../../data/mn_2013_cast_vote_record.csv", rank_cols=[0, 1, 2])
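
To make the expected csv layout concrete (one row per voter, one column per ranking position), here is a minimal sketch that reads a tiny, made-up CVR with the standard library and groups identical ballots. The file contents, the header names, and the candidates `A` and `B` are all hypothetical, and this is not how `load_csv` is implemented; it only illustrates the shape of the data.

```python
import csv
import io
from collections import Counter

# Hypothetical four-voter CVR: one row per voter, one column per ranking position.
toy_csv = """rank1,rank2,rank3
A,B,undervote
A,B,undervote
B,A,undervote
B,undervote,undervote
"""

reader = csv.reader(io.StringIO(toy_csv))
header = next(reader)  # skip the header row

# Column indices holding the rankings, 1st place through 3rd place.
rank_cols = [0, 1, 2]

# Group identical ballots, so each distinct ranking gets a weight.
ballot_counts = Counter(tuple(row[i] for i in rank_cols) for row in reader)

for ranking, weight in ballot_counts.most_common():
    print(ranking, weight)
```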

Let's look at the candidates for the race.

In [7]:
for candidate in mn_profile.candidates:
    print(candidate)

ABDUL M RAHAMAN "THE ROCK"
DAN COHEN
JAMES EVERETT
MARK V ANDERSON
TROY BENJEGERDES
undervote
ALICIA K. BENNETT
BETSY HODGES
MARK ANDREW
MIKE GOULD
BILL KAHN
BOB FINE
CAM WINTON
DON SAMUELS
JACKIE CHERRYHOMES
JEFFREY ALAN WAGNER
JOHN LESLIE HARTWIG
KURTIS W. HANNA
JOSHUA REA
MERRILL ANDERSON
NEAL BAXTER
STEPHANIE WOODRUFF
UWI
BOB "AGAIN" CARNEY JR
TONY LANE
CAPTAIN JACK SPARROW
GREGG A. IVERSON
JAMES "JIMMY" L. STROUD, JR.
JAYMIE KELLY
CYD GORMAN
EDMUND BERNARD BRUYERE
DOUG MANN
CHRISTOPHER ROBIN ZIMMERMAN
RAHN V. WORKCUFF
JOHN CHARLES WILSON
OLE SAVIOR
overvote
CHRISTOPHER CLARK


Whoa, that’s a little funky! There are candidates called ‘undervote’, ‘overvote’, and ‘UWI’. This cast vote record was already cleaned by the City of Minneapolis, which chose to encode the ballots this way. ‘undervote’ indicates that the voter left a position unfilled, for example by listing no candidate in second place. ‘overvote’ arises when a voter puts two candidates in one position, for example by putting Hodges and Samuels both in first place. Unfortunately, storing the profile this way means we have lost any knowledge of the voter's intent (which was probably to indicate equal preference). ‘UWI’ stands for unregistered write-in.

This reminds us that it is really important to think carefully about how we want to handle cleaning ballots, as some storage methods are efficient but lossy. For now, let’s assume that we want to further condense the ballots, discarding ‘undervote’, ‘overvote’, and ‘UWI’ as candidates. We will then move up lower ranked candidates to replace the removed non-candidates. The `remove_and_condense` function does this for us.

In [8]:
from votekit.cleaning import remove_and_condense

remove_cand_mn_profile = remove_and_condense(["overvote", "undervote", "UWI"], mn_profile)

Let's see that the three "candidates" have been removed.

In [9]:
print("The following candidates appear in the uncleaned profile but have been removed.")
print(set(mn_profile.candidates)-set(remove_cand_mn_profile.candidates))

The following candidates appear in the uncleaned profile but have been removed.
{'undervote', 'overvote', 'UWI'}
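
To see what "remove and condense" means for a single ballot, here is a minimal pure-Python sketch. The helper `condense` and the example ballots are hypothetical; VoteKit's `remove_and_condense` operates on a whole profile, and its handling of edge cases (such as ties or ballots that end up empty) may differ from this illustration.

```python
def condense(ranking, to_remove):
    """Drop unwanted entries from a ranking; later entries move up to fill the gaps."""
    return tuple(cand for cand in ranking if cand not in to_remove)


non_candidates = {"overvote", "undervote", "UWI"}

# Hypothetical ballots using names from the candidate list above.
ballot_1 = ("undervote", "BETSY HODGES", "overvote")
ballot_2 = ("DON SAMUELS", "undervote", "MARK ANDREW")

print(condense(ballot_1, non_candidates))  # Hodges moves up to first place
print(condense(ballot_2, non_candidates))  # Andrew moves up to second place
```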


Now all of the ballots are properly formatted to run an IRV election.

In [10]:
from votekit.elections import IRV

# run IRV on the cleaned profile, not the raw one
e = IRV(remove_cand_mn_profile)

print(e.get_elected())

(frozenset({'BETSY HODGES'}),)




## Pre-saved PreferenceProfiles: Portland

VoteKit allows you to save PreferenceProfiles to what are called "pickle" files. Pickle (`.pkl`) files save Python variables so you can access them after closing a Python session. The Data and Democracy Lab has cleaned and prepared the CVR from the 2024 Portland, OR City Council election, district 1. You can download the file [here](https://github.com/mggg/Training_Materials_25/blob/main/data/Portland_D1_cleaned_votekit_pref_profile.pkl). Then place it in your working directory.

In [11]:
from votekit.pref_profile import PreferenceProfile

# change this file name to reflect where the file is on your computer.
profile = PreferenceProfile.from_pickle("../../../data/Portland_D1_cleaned_votekit_pref_profile.pkl") 
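
If pickle files are new to you, the sketch below shows the general idea with Python's standard `pickle` module: a variable is written to disk and read back unchanged. The dictionary `toy_profile` is a hypothetical stand-in, not a real `PreferenceProfile`, and the temporary directory is used only so the example cleans up after itself.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a profile: ranking tuple -> ballot weight.
toy_profile = {("A", "B", "C"): 3, ("B", "A"): 2}

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = Path(tmp_dir) / "toy_profile.pkl"

    # Save the variable to a pickle file.
    with open(pkl_path, "wb") as f:
        pickle.dump(toy_profile, f)

    # Load it back, as you would in a later Python session.
    with open(pkl_path, "rb") as f:
        restored = pickle.load(f)

print(restored == toy_profile)
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.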

In [12]:
# 3 seat election
election = STV(profile, m=3)



Do we have the correct candidates? Do we have the same vote totals? Do we get the same STV winner set? The Election object, called `election` here, has lots of built-in methods that allow us to check these stats.

In district 1, Avalos, Dunphy, and Smith were elected. The winners, first place vote distribution, and lots of other stats we can double check, are given [here](https://www.portland.gov/sites/default/files/2024/Portland-District-1-Certified-Abstract-Nov-2024.pdf).

In [13]:
print("Winners in order of election")
for i, cand_set in enumerate(election.get_elected(), start=1):
    # this extra loop is necessary b/c it's possible two or more candidates are elected simultaneously
    for cand in cand_set:
        print(i, cand)

Winners in order of election
1 Candace Avalos
2 Loretta Smith
3 Jamie Dunphy


In [14]:
# threshold
print(f"Election Threshold: {election.threshold:,}")

Election Threshold: 10,718


In [15]:
from votekit.utils import first_place_votes

fpv_dict = first_place_votes(profile)
cands_sorted_by_fpv = sorted(
    fpv_dict.items(),      # (name, fpv) pairs
    key=lambda x: x[1],    # sort by the second element of each pair, which is fpv
    reverse=True,          # decreasing order
)

print("Candidates in decreasing order of first-place votes.\n")
for cand, fpv in cands_sorted_by_fpv:
    print(cand, fpv)

Candidates in decreasing order of first-place votes.

Candace Avalos 8297
Loretta Smith 5586
Jamie Dunphy 5064
Noah Ernst 4052
Terrence Hayes 3975
Steph Routh 3894
Timur Ender 3550
Doug Clove 1698
Peggy Sue Owens 1266
David Linn 1111
Joe Allen 978
Michael (Mike) Sands 952
Deian Salazar 720
Cayle Tern 711
Thomas Shervey 385
Joe Furi 355
Uncertified Write In 277
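
As a sanity check, the threshold printed earlier can be recomputed by hand from these first-place totals. The sketch below assumes the election uses the Droop quota, floor(total votes / (seats + 1)) + 1, and that every ballot in the cleaned profile lists a first choice, so the first-place votes sum to the total number of ballots. The numbers are copied from the output above.

```python
# First-place vote totals copied from the output above.
fpv = {
    "Candace Avalos": 8297,
    "Loretta Smith": 5586,
    "Jamie Dunphy": 5064,
    "Noah Ernst": 4052,
    "Terrence Hayes": 3975,
    "Steph Routh": 3894,
    "Timur Ender": 3550,
    "Doug Clove": 1698,
    "Peggy Sue Owens": 1266,
    "David Linn": 1111,
    "Joe Allen": 978,
    "Michael (Mike) Sands": 952,
    "Deian Salazar": 720,
    "Cayle Tern": 711,
    "Thomas Shervey": 385,
    "Joe Furi": 355,
    "Uncertified Write In": 277,
}
num_seats = 3

total_votes = sum(fpv.values())

# Droop quota (assumed): the smallest whole number of votes that at most
# num_seats candidates can reach at the same time.
droop_quota = total_votes // (num_seats + 1) + 1

print(f"Total first-place votes: {total_votes:,}")
print(f"Droop quota for {num_seats} seats: {droop_quota:,}")
```

The recomputed quota agrees with the `Election Threshold: 10,718` reported by `election.threshold`.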


In [21]:
print("The final ordering of the candidates is")
for i, cand_set in enumerate(election.get_ranking()):
    for cand in cand_set:
        print(i+1, cand)

The final ordering of the candidates is
1 Candace Avalos
2 Loretta Smith
3 Jamie Dunphy
4 Terrence Hayes
5 Noah Ernst
6 Steph Routh
7 Timur Ender
8 Doug Clove
9 Peggy Sue Owens
10 David Linn
11 Joe Allen
12 Michael (Mike) Sands
13 Deian Salazar
14 Cayle Tern
15 Thomas Shervey
16 Joe Furi
17 Uncertified Write In


Take a moment to verify these against the [official record](https://www.portland.gov/sites/default/files/2024/Portland-District-1-Certified-Abstract-Nov-2024.pdf).

## Comparing to other systems

VoteKit makes it very easy to try running the same profile through different election methods. This allows us to answer questions like "If Portland had used a Plurality election, who would have won?"

In [18]:
from votekit.elections.election_types.ranking import CondoBorda, Plurality, Borda

profile = PreferenceProfile.from_pickle("../../../data/Portland_D1_cleaned_votekit_pref_profile.pkl") 

alt_elections = {"Condorcet": CondoBorda(profile, m=3),
                 "Borda": Borda(profile, m=3),
                 "Plurality": Plurality(profile, m=3),
                 "STV": STV(profile, m=3)
                 }

for e_name, e in alt_elections.items():
    print(e_name)
    e_winners = [c for s in e.get_elected() for c in s]
    print("In order of election, the winners are")

    for i, winner in enumerate(e_winners):
        print(f"{i}) {winner}")
    print()



Condorcet
In order of election, the winners are
0) Candace Avalos
1) Steph Routh
2) Jamie Dunphy

Borda
In order of election, the winners are
0) Candace Avalos
1) Steph Routh
2) Loretta Smith

Plurality
In order of election, the winners are
0) Candace Avalos
1) Loretta Smith
2) Jamie Dunphy

STV
In order of election, the winners are
0) Candace Avalos
1) Loretta Smith
2) Jamie Dunphy
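
To see why two methods can disagree on the same ballots, here is a small hand computation on a made-up profile. The candidates `A`, `B`, `C` and the ballot weights are hypothetical, every ballot is a full ranking, and the Borda scores use the common convention of 3, 2, 1 points for 1st, 2nd, 3rd place. VoteKit's `Borda` and `Plurality` classes may treat short ballots and ties differently, so read this as a sketch of the idea rather than a description of the library.

```python
from collections import Counter

# Hypothetical profile: full ranking -> number of voters who cast it.
toy_profile = {
    ("A", "B", "C"): 4,
    ("B", "C", "A"): 3,
    ("C", "B", "A"): 2,
}
num_cands = 3

plurality_scores = Counter()
borda_scores = Counter()

for ranking, weight in toy_profile.items():
    # Plurality only looks at the first-place candidate.
    plurality_scores[ranking[0]] += weight
    # Borda gives num_cands points for 1st place, down to 1 point for last.
    for position, cand in enumerate(ranking):
        borda_scores[cand] += weight * (num_cands - position)

plurality_winner = max(plurality_scores, key=plurality_scores.get)
borda_winner = max(borda_scores, key=borda_scores.get)

print("Plurality scores:", dict(plurality_scores), "winner:", plurality_winner)
print("Borda scores:", dict(borda_scores), "winner:", borda_winner)
```

Here `A` has the most first-place votes, but `B` is ranked first or second on every ballot and so collects the most Borda points.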





Go to the [list of ranking elections that VoteKit supports](https://votekit.readthedocs.io/en/latest/social_choice_docs/scr/#ranking-based), and try importing one and applying it to the Portland profile.

In [None]:
from votekit.elections.election_types.ranking import ??????

election = ??????(profile, m=)

Try changing the number of seats `m`, either on the election type you just imported, or on some of the elections we previously used. Who wins then?

In [None]:
# your code here