<a href="https://colab.research.google.com/github/mggg/Training_Materials/blob/main/notebooks/joint/Joint_1_Viz.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install votekit

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from urllib.request import urlopen
from votekit import PreferenceProfile
import pickle
from votekit.plots import profile_mentions_plot, multi_bar_plot, profile_ballot_lengths_plot
from votekit.elections import STV
from votekit.utils import first_place_votes, mentions
from votekit.graphs import PairwiseComparisonGraph
from votekit.matrices import matrix_heatmap, boost_matrix, candidate_distance_matrix, comentions_matrix
from io import BytesIO
import geopandas as gpd

# Plotting in VoteKit

We have added many new plotting functions to VoteKit that should make it much
easier for you to generate useful figures for reports and internal analysis.

You will notice that this notebook contains very little narrative. It is designed
for experimentation: we give you the basic functions for making different plots,
and then ask you to play around with the formatting and parameters to see what
is possible.

### First, we are going to load our data really quick

In [None]:
with urlopen('https://github.com/mggg/Training_Materials_25/raw/refs/heads/main/data/Portland_D1_cleaned_votekit_pref_profile.pkl') as response:
    profile = pickle.load(response)

In [None]:
profile

In [None]:
with urlopen('https://github.com/mggg/Training_Materials_25/raw/refs/heads/main/data/visualization/Portland_election_for_visualization.pkl') as response:
    election = pickle.load(response)

In [None]:
election

# Histograms

In [None]:
profile_mentions_plot(
    profile,
    threshold_values=election.threshold,
    threshold_kwds={
        "label": f"Threshold: {election.threshold:,}",
        "color":"black",
        "linestyle": "--"
    },
    show_profile_legend=True
)

### Try it yourself!

Your task is to do each of the following (preferably in order, making only one modification per task):

1. Edit the axis names
2. Try normalizing the data
3. Change the color of the bars
4. Change the width of the bars
5. Change the order of the bars
6. Try relabeling the candidates with integers
7. Change the legend font size
8. Play around with the legend location

Parameters for `profile_mentions_plot`:

- `profile` (PreferenceProfile): Profile to plot statistics for.
- `profile_label` (str, optional): Label for profile. Defaults to "Profile".
- `mentions_kwds` (dict[str, Any], optional): Keyword arguments to pass to
    ``mentions``. Defaults to None, in which case default values for ``mentions``
    are used.
- `normalize` (bool, optional): Whether or not to normalize data. Defaults to False.
- `profile_color` (str, optional): Color to plot. Defaults to the first color from
    ``COLOR_LIST`` from ``utils`` module.
- `bar_width` (float, optional): Width of bars. Defaults to None which computes the bar width
    as 0.7 divided by the number of data sets. Must be in the interval `(0,1]`.
- `candidate_ordering` (list[str], optional): Ordering of x-labels. Defaults to decreasing
    order of mentions.
- `x_axis_name` (str, optional): Name of x-axis. Defaults to None, which does not plot a name.
- `y_axis_name` (str, optional): Name of y-axis. Defaults to None, which does not plot a name.
- `title` (str, optional): Title for the figure. Defaults to None, which does not plot a title.
- `show_profile_legend` (bool, optional): Whether or not to plot the profile legend.
    Defaults to False. Is automatically shown if any threshold lines have the keyword
    "label" passed through ``threshold_kwds``.
- `candidate_legend` (dict[str, str], optional): Dictionary mapping candidates
    to alternate label. Defaults to None. If provided, generates a second legend.
- `relabel_candidates_with_int` (bool, optional): Relabel the candidates with integer labels.
    Defaults to False. If ``candidate_legend`` is passed, those labels supersede.
- `threshold_values` (Union[list[float], float], optional): List of values to plot horizontal
    lines at. Can be provided as a list or a single float.
- `threshold_kwds` (Union[list[dict], dict], optional): List of plotting
    keywords for the horizontal lines. Can be a list or single dictionary. These will be
    passed to plt.axhline(). Common keywords include "linestyle", "linewidth", and "label".
    If "label" is passed, automatically plots the data set legend with the labels.
- `legend_font_size` (float, optional): The font size to use for the legend. Defaults to 10.0
    + the number of categories.
- `legend_loc` (str, optional): The location parameter to pass to ``Axes.legend(loc=)``.
    Defaults to "center left".
- `legend_bbox_to_anchor` (Tuple[float, float], optional): The bounding box to anchor
    the legend to. Defaults to (1, 0.5).
- `ax` (Axes, optional): A matplotlib axes object to plot the figure on. Defaults to None, in
    which case the function creates and returns a new axes. The figure height is 6 inches
    and the figure width is 3 inches times the number of categories.
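
To see what `threshold_values` and `threshold_kwds` amount to, here is a plain-matplotlib sketch that draws a bar chart and a labeled horizontal line with `Axes.axhline`. It does not call VoteKit, and the candidate names, counts, and threshold are made up for illustration.

```python
import matplotlib.pyplot as plt

# Made-up mention counts and threshold, for illustration only
toy_mentions = {"A": 120, "B": 95, "C": 40}
toy_threshold = 80

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(list(toy_mentions.keys()), list(toy_mentions.values()), width=0.7)

# The same kind of dictionary that is passed as `threshold_kwds`
threshold_kwds = {
    "label": f"Threshold: {toy_threshold:,}",
    "color": "black",
    "linestyle": "--",
}
threshold_line = ax.axhline(toy_threshold, **threshold_kwds)

# Place the legend outside the axes, to the right
ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))
```

Because the dictionary contains a `"label"` key, the line shows up in the legend, which mirrors the behavior described for `threshold_kwds` above.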




In [None]:
# Task 1: Edit the axis names
profile_mentions_plot(
    profile,
    threshold_values=election.threshold,
    threshold_kwds={
        "label": f"Threshold: {election.threshold:,}",
        "color":"black",
        "linestyle": "--"
    },
    show_profile_legend=True
)

In [None]:
# Task 2: Normalize the data
profile_mentions_plot(
    profile,
    threshold_values=election.threshold,
    threshold_kwds={
        "label": f"Threshold: {election.threshold:,}",
        "color":"black",
        "linestyle": "--"
    },
    show_profile_legend=True
)

In [None]:
# Task 3: Change the color of the bars
profile_mentions_plot(
    profile,
    threshold_values=election.threshold,
    threshold_kwds={
        "label": f"Threshold: {election.threshold:,}",
        "color":"black",
        "linestyle": "--"
    },
    show_profile_legend=True
)

In [None]:
# Task 4: Change the width of the bars
profile_mentions_plot(
    profile,
    threshold_values=election.threshold,
    threshold_kwds={
        "label": f"Threshold: {election.threshold:,}",
        "color":"black",
        "linestyle": "--"
    },
    show_profile_legend=True
)

In [None]:
# Task 5: Change the order of the bars
profile_mentions_plot(
    profile,
    threshold_values=election.threshold,
    threshold_kwds={
        "label": f"Threshold: {election.threshold:,}",
        "color":"black",
        "linestyle": "--"
    },
    show_profile_legend=True
)

In [None]:
# Task 6: Relabel the candidates with integer labels
profile_mentions_plot(
    profile,
    threshold_values=election.threshold,
    threshold_kwds={
        "label": f"Threshold: {election.threshold:,}",
        "color":"black",
        "linestyle": "--"
    },
    show_profile_legend=True
)

In [None]:
# Task 7: Change the legend font size
profile_mentions_plot(
    profile,
    threshold_values=election.threshold,
    threshold_kwds={
        "label": f"Threshold: {election.threshold:,}",
        "color":"black",
        "linestyle": "--"
    },
    show_profile_legend=True
)

In [None]:
# Task 8: Play around with the legend location
profile_mentions_plot(
    profile,
    threshold_values=election.threshold,
    threshold_kwds={
        "label": f"Threshold: {election.threshold:,}",
        "color":"black",
        "linestyle": "--"
    },
    show_profile_legend=True
)

## Some quick info on Matplotlib

Below is a very small bit of code that shows the standard way that we plot a figure using
matplotlib.

In [None]:
fig, ax = plt.subplots(figsize = (8,8))             # Create a figure containing a single Axes.
ax.plot([1, 2, 3, 4], [1, 4, 2, 3])  # Plot some data on the Axes.
plt.show()                           # Show the figure.

We will be interested in editing the Axes object `ax` next to make our plots look nicer.
Here is an image that tells you many of the things that you can change in the axes object:

<img src="https://github.com/mggg/Training_Materials_25/raw/refs/heads/main/data/visualization/parts_of_a_figure.webp"
     alt="parts of a figure"
     width="40%"/>
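
As a small illustration of a few of the parts named in the image, the sketch below edits the title, axis labels, spines, tick rotation, and grid of an Axes object. The data and label text are made up; only standard matplotlib `Axes` methods are used.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1, 2, 3, 4], [1, 4, 2, 3], label="toy data")  # made-up data

ax.set_title("A labeled figure")        # title above the axes
ax.set_xlabel("x value")                # x-axis label
ax.set_ylabel("y value")                # y-axis label
ax.spines["top"].set_visible(False)     # hide the top border line
ax.spines["right"].set_visible(False)   # hide the right border line
ax.tick_params(axis="x", rotation=45)   # rotate the x tick labels
ax.grid(True, linestyle=":")            # dotted grid lines
ax.legend()                             # legend built from the `label` above
```

The same calls work on the Axes returned by the VoteKit plotting functions, since those return ordinary matplotlib Axes objects.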


### Try it yourself!

In [None]:
# Task 1: Use `fig, ax = plt.subplots(figsize=(<pick a size>))` to create a figure and axes object. Make sure to play around with different figure sizes to see how they affect the plot.

fig_size = (0,0) # Edit this line
fig, ax = plt.subplots(figsize=fig_size)

profile_mentions_plot(
    profile,
    threshold_values=election.threshold,
    threshold_kwds={
        "label": f"Threshold: {election.threshold:,}",
        "color":"black",
        "linestyle": "--"
    },
    show_profile_legend=True,
    ax = ax # pass the axes object to the function
)

In [None]:
# Task 2: Use `ax.spines[<side>].set_visible(False)` to hide some of the spines of the plot.

ax = profile_mentions_plot(
    profile,
    threshold_values=election.threshold,
    threshold_kwds={
        "label": f"Threshold: {election.threshold:,}",
        "color":"black",
        "linestyle": "--"
    },
    show_profile_legend=True,
)

# Uncomment the line below and select a side to hide
# ax.spines["SELECT A SIDE HERE"].set_visible(False)

In [None]:
# Task 3: Use `ax.tick_params(axis='x', rotation=<pick a rotation>)` to rotate the x-axis ticks and make them easier to read.

ax = profile_mentions_plot(
    profile,
    threshold_values=election.threshold,
    threshold_kwds={
        "label": f"Threshold: {election.threshold:,}",
        "color":"black",
        "linestyle": "--"
    },
    show_profile_legend=True,
)


ax.tick_params(axis="x", rotation=0) 

In [None]:
# Task 4: Use `ax.set_yscale(<scale>)` to change the scale of the y-axis. Docs -> https://matplotlib.org/stable/users/explain/quick_start.html#scales

ax = profile_mentions_plot(
    profile,
    threshold_values=election.threshold,
    threshold_kwds={
        "label": f"Threshold: {election.threshold:,}",
        "color":"black",
        "linestyle": "--"
    },
    show_profile_legend=True,
)

In [None]:
# Task 5: Use `fig, ax = plt.subplots(figsize=(<pick a size>), nrows=2)` and plot two different histograms on the same figure. Docs ->  https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html#matplotlib.pyplot.subplots

ax = profile_mentions_plot(
    profile,
    threshold_values=election.threshold,
    threshold_kwds={
        "label": f"Threshold: {election.threshold:,}",
        "color":"black",
        "linestyle": "--"
    },
    show_profile_legend=True,
)

### Filtering to viable candidates
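
The cells below keep only the candidates whose mention count meets the election threshold and then sort them by decreasing mentions. Here is the same idea as a minimal sketch on a toy dictionary; the names, counts, and threshold are made up and stand in for `mentions_dict` and `election.threshold`.

```python
# Made-up mention counts and threshold, for illustration only
toy_mentions = {"A": 120, "B": 95, "C": 40, "D": 80}
toy_threshold = 80

# Keep candidates at or above the threshold, then sort by decreasing mentions
toy_viable = sorted(
    (cand for cand, count in toy_mentions.items() if count >= toy_threshold),
    key=lambda cand: toy_mentions[cand],
    reverse=True,
)
print(toy_viable)
```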

In [None]:
mentions_dict = mentions(profile)
viable_cands = [c for c, num_mentions in mentions_dict.items() if num_mentions >= election.threshold]

viable_cands = sorted(viable_cands, reverse=True, key = lambda x: mentions_dict[x])
print("Viable candidates in decreasing order of mentions")

for i, cand in enumerate(viable_cands):
    print(i+1, cand)

In [None]:
profile.df

In [None]:
first_place_votes(profile)

In [None]:
viable_cands_mentions = {cand: num_mentions for cand, num_mentions in mentions_dict.items() if cand in viable_cands}
viable_cands_fpv = {cand: fpv for cand, fpv in first_place_votes(profile).items() if cand in viable_cands}

ax = multi_bar_plot(
    data={"Mentions": viable_cands_mentions, "FPV": viable_cands_fpv},
    threshold_values=election.threshold,
    threshold_kwds={
        "label": f"Threshold: {election.threshold:,}",
        "color":"black",
        "linestyle": "--"
    },
)

In [None]:
ax = profile_ballot_lengths_plot(profile, title="Ballot Lengths in D1", normalize=True, y_axis_name="Percentage", x_axis_name="Length")

# change the tick labels to percentages
yticks = ax.get_yticks()
ax.set_yticks(yticks)  # fix the tick positions before relabeling them
ax.set_yticklabels([f"{y:.0%}" for y in yticks])  # format the numeric tick values directly

plt.show()

# Bubble Plots

These plots are relatively new, and have not quite made it into VoteKit at this point, but
we are providing the code for you to use as a reference.
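
The bubble sizes in the function below come from a normalized histogram over unit-width bins centered on the integers. The following sketch isolates that step with made-up counts so you can see what the numbers look like before they are scaled by `size` and passed to `Axes.scatter`.

```python
import numpy as np

# Made-up numbers of winners per trial, for illustration only
toy_counts = [2, 3, 3, 3, 4, 2, 3, 5]
x_max = max(toy_counts)

# Unit-width bins centered on the integers 0, 1, ..., x_max
bins = np.arange(-0.5, x_max + 1.5, 1)

# With unit-width bins, density=True gives the fraction of trials at each integer
fractions, _ = np.histogram(toy_counts, bins=bins, density=True)
print(fractions)
```

Since the `s` argument of `Axes.scatter` is a marker area (in points squared), multiplying these fractions by `size` makes each bubble's area proportional to how often that outcome occurred.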

In [None]:
# This was the original code used to generate the election results
# we will be plotting using the bubble plot function. Feel free to
# ignore this cell.


# from votekit.elections import STV
# from votekit.ballot_generator import slate_PlackettLuce
# from tqdm.notebook import tqdm

# num_cands = {"P": 5, "C": 6}
# num_voters = 2500
# num_seats = 5
# num_trials = 100

# # Fixed parameters
# slate_to_candidates = {"P": [f"P_{i}" for i in range(num_cands["P"])], # creates the list ["P_0", ..., "P_4"]
#                         "C": [f"C_{i}" for i in range(num_cands["C"])],} # creates the list ["C_0", ..., "C_5"]

# cohesion_parameters = {
#     "P": {"P":.7, "C":.3},
#     "C": {"P":.4, "C":.6},
#     }

# alphas = {
#     "P": {"P":0.1, "C":0.1},
#     "C": {"P":0.1, "C":0.1},
#     }

# # Varying parameter
# turnouts = {
#     "high_progressive_turnout"  : {"P": .7, "C": .3},
#     "low_progressive_turnout"  : {"P": .5, "C": .5},
#     }


# # used to store the results
# num_prog_winners_by_turnout = {
#     "high_progressive_turnout": [],
#     "low_progressive_turnout": [],
#     }

# for turnout_label, bloc_voter_prop in turnouts.items():

#     print(turnout_label)

#     for _ in tqdm(range(num_trials)): # tqdm creates a progress bar
#         pl = slate_PlackettLuce.from_params(slate_to_candidates=slate_to_candidates,
#                 bloc_voter_prop=bloc_voter_prop,
#                 cohesion_parameters=cohesion_parameters,
#                 alphas=alphas)

#         profile = pl.generate_profile(num_voters)

#         e = STV(profile, m= num_seats)

#         winners = e.get_elected()

#         # compute the number of progressive winners
#         num_prog_winners = len([c for cand_set in winners for c in cand_set if "P_" in c])

#         # add the number of progressive winners to the end of a list
#         num_prog_winners_by_turnout[turnout_label].append(num_prog_winners)

In [None]:
with urlopen('https://github.com/mggg/Training_Materials_25/raw/refs/heads/main/data/visualization/num_prog_winners_by_turnout.pkl') as response:
    num_prog_winners_by_turnout = pickle.load(response)

In [None]:
from typing import Optional
import numpy as np
from matplotlib.axes import Axes
import matplotlib.pyplot as plt

def bubble_plot_integer(
    data: list[list[int]],
    colors: list[str],
    ax: Optional[Axes] = None,
    marker: str =".",
    size: int = 1000
):
    # create figure

    if ax is None:
        fig, ax = plt.subplots()

    x_max = int(max(max(vector) for vector in data))
    bin_min = 0
    bin_max = x_max
    bins = np.arange(bin_min-.5, bin_max+1.5, 1)

    for j, vector in enumerate(data):
        x = [i for i in range(x_max+1)] # x=0,...,x_max
        y = [j+1]*len(x) # put each vector at a different height

        bin_heights, _ = np.histogram(vector, bins = bins,density=True)
        circle_areas = [size*bin_heights[i] for i in range(x_max+1)]
        ax.scatter(x, y, s=circle_areas, alpha=1, color = colors[j], label = None, edgecolors='black', marker=marker)

    return ax

In [None]:
num_prog_winners_by_turnout.keys()

In [None]:
bubble_plot_integer(
    data = [num_prog_winners_by_turnout["high_progressive_turnout"], num_prog_winners_by_turnout["low_progressive_turnout"]],
    colors = ["#1f77b4", "#ff7f0e"],
    size=3000
)

In [None]:
x_max = 5
ax = bubble_plot_integer(
    data = [num_prog_winners_by_turnout["high_progressive_turnout"], num_prog_winners_by_turnout["low_progressive_turnout"]],
    colors = ["#1f77b4", "#ff7f0e"],
    size=3000
)

ax.set_xticks([i for i in range(x_max+1)])
ax.set_yticks([i+1 for i in range(2)], num_prog_winners_by_turnout.keys())
ax.axvline(x=3, color="pink", zorder = -1, label="Expected winners")
ax.set_xlim((0, 5))
ax.set_ylim((0.5,2.5))


# A trick for adding a legend to the plot
for color, label in zip(
    ["#1f77b4", "#ff7f0e"],
    num_prog_winners_by_turnout.keys()
):
    ax.scatter(
        [], [],                 # no data
        c=color,
        s=10,
        label=label
    )

# finally draw the legend
ax.legend(
    loc="center left",
    scatterpoints=1,
    frameon=True,
    bbox_to_anchor=(1, 0.5),
)

# Choropleths

A _choropleth_ is a map in which each geographic unit (for example, a precinct or a county)
is colored according to an aggregate statistic for that unit. It is essentially a heatmap for
geographic data, and it can be really useful for visualizing things like population
density.

In [None]:
with urlopen('https://github.com/mggg/Training_Materials_25/raw/refs/heads/main/data/visualization/nc_viz_data.parquet') as response:
    data = response.read()
    buf = BytesIO(data)

# now GeoPandas/pyarrow can read it
gdf = gpd.read_parquet(buf)
gdf

In [None]:
fig, ax = plt.subplots(figsize=(20, 10))
gdf.plot(ax=ax, column='total_pop_20', legend=True)

In [None]:
# DPI (dots per inch) controls the resolution of the figure
fig, ax = plt.subplots(figsize=(20, 10), dpi=300)


# Note: You build layers from the background to the foreground in matplotlib.
# so the first layer added to an ax object will be the background layer and subsequent
# layers will be drawn on top of it.


# Add the lines to the background layer
gdf.plot(
    ax=ax,
    edgecolor='black',
    facecolor='none',
)

lower, upper = 0, 10_000
gdf.plot(
    column='total_pop_20',
    cmap='Purples', # Change the colormap to 'Purples' for a purple gradient
    legend=True, # Add the color bar legend
    vmin=lower, # Adjust the lower limit of the color scale
    vmax=upper,  # Adjust the upper limit of the color scale
    ax=ax,
    legend_kwds={
        "shrink": 0.5 # Adjust the size of the legend
    },
    alpha=0.9 # Adjust the transparency of the top layer
)

ax.spines[:].set_visible(False)
ax.set_xticks([])
ax.set_yticks([])

plt.show()

### Try it yourself!

Try plotting two choropleths on top of each other. One should show `wvap_20` as a percent of the
population of each VTD (you will need to make a column for this), and the other should show
Republican turnout in the 2020 presidential election as a percent of the total turnout by
precinct (this will require another new column).
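
As a starting point, here is a sketch of how a percent column can be built with pandas. It uses a toy DataFrame with made-up values in place of `gdf`, and the new column name `wvap_pct_20` is a hypothetical choice, not something defined in the data.

```python
import pandas as pd

# Toy stand-in for the GeoDataFrame; the values are made up
toy = pd.DataFrame({
    "total_pop_20": [1000, 2500, 0],
    "wvap_20": [400, 1500, 0],
})

# Mask zero populations so empty units give NaN instead of dividing by zero
denominator = toy["total_pop_20"].where(toy["total_pop_20"] > 0)

# `wvap_pct_20` is a hypothetical name for the new share column
toy["wvap_pct_20"] = toy["wvap_20"] / denominator
print(toy)
```

The same assignment works on a GeoDataFrame, and the new column can then be passed to `gdf.plot` through the `column` argument.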

# Candidate Similarity

VoteKit also has some plotting functions to help users understand the relationship between
candidates and their support in ranked elections. Currently we provide plotting functions for
three different types of matrix:
- boost
- candidate distance
- comentions

## Boost Matrix

The (i,j) entry of the boost matrix shows P(mention i | mention j) - P(mention i). Thus,
the (i,j) entry shows the boost given to candidate i by candidate j: a positive entry means
that ballots mentioning j are more likely than average to mention i.
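
To make the formula concrete, here is a hand computation on a few made-up ballots. This is a sketch of the definition above, not VoteKit's implementation: it assumes every ballot has weight 1 and leaves the diagonal undefined.

```python
import numpy as np

# Made-up ballots; each is the set of candidates mentioned on it (weight 1 each)
toy_ballots = [{"A", "B"}, {"A", "B"}, {"A"}, {"B", "C"}, {"C"}]
cands = ["A", "B", "C"]
n = len(toy_ballots)

# P(mention i): share of ballots that mention candidate i
p_mention = {c: sum(c in b for b in toy_ballots) / n for c in cands}

boost = np.full((len(cands), len(cands)), np.nan)
for r, i in enumerate(cands):
    for c, j in enumerate(cands):
        if i == j:
            continue  # diagonal is left as NaN in this sketch
        ballots_with_j = [b for b in toy_ballots if j in b]
        # P(mention i | mention j): share of j's ballots that also mention i
        p_i_given_j = sum(i in b for b in ballots_with_j) / len(ballots_with_j)
        boost[r, c] = p_i_given_j - p_mention[i]

print(boost.round(3))
```

In this toy data, A appears on 2 of the 3 ballots that mention B but on 3 of 5 ballots overall, so B gives A a small positive boost, while A never appears with C and gets a negative boost from C.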

In [None]:
all_cands_sorted_by_mentions = sorted(profile.candidates, reverse=True, key = lambda x: mentions_dict[x])

# computes the matrix
# stored as `boost_mat` so that the imported `boost_matrix` function is not overwritten
boost_mat = boost_matrix(profile, candidates = all_cands_sorted_by_mentions)

In [None]:
all_last_names = [name.split(" ")[-1] if "Write In" not in name else "UWI" for name in all_cands_sorted_by_mentions]

# plots the matrix
ax  = matrix_heatmap(
    boost_mat,
    row_labels=all_last_names,
    column_labels=all_last_names,
    row_label_rotation = 0,
    column_label_rotation = 90
)

In [None]:
# Adjust the figure and font size
fig, ax = plt.subplots(figsize=(12,12))

ax  = matrix_heatmap(
    boost_mat,
    row_labels=all_last_names,
    column_labels=all_last_names,
    row_label_rotation = 0,
    column_label_rotation = 90,
    ax = ax,
    cell_font_size = 12
)

## Candidate Distance Matrix

The (i,j) entry of the candidate distance matrix shows the average distance between
candidates i and j when i >= j on the same ballot.
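
The sketch below computes such a matrix by hand on made-up ballots. It rests on two assumptions that are not confirmed by the text above: that "i >= j" means i is ranked at or above j, and that the distance is the difference in ranking positions. It is not VoteKit's implementation and ignores ballot weights.

```python
import numpy as np

# Made-up ranked ballots, most preferred candidate first
toy_rankings = [["A", "B", "C"], ["B", "A"], ["A", "C"], ["C", "A", "B"]]
cands = ["A", "B", "C"]
index = {c: k for k, c in enumerate(cands)}

total = np.zeros((3, 3))  # summed distances
count = np.zeros((3, 3))  # number of ballots where i is at or above j

for ranking in toy_rankings:
    position = {c: p for p, c in enumerate(ranking)}
    for i in ranking:
        for j in ranking:
            # Assumption: "i >= j" means i is ranked at or above j
            if position[i] <= position[j]:
                total[index[i], index[j]] += position[j] - position[i]
                count[index[i], index[j]] += 1

# Average where defined; NaN where i never appears at or above j
avg_distance = np.divide(
    total, count, out=np.full_like(total, np.nan), where=count > 0
)
print(avg_distance)
```

Under these assumptions the diagonal is always 0, and the matrix need not be symmetric: here C sits two positions above B on the one ballot where it leads B, while B sits one position above C on the one ballot where B leads.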

In [None]:
cand_dist_matrix  = candidate_distance_matrix(profile, candidates = viable_cands)

last_names_viable = [name.split(" ")[-1] for name in viable_cands]

ax  = matrix_heatmap(
    cand_dist_matrix,
    row_labels=last_names_viable,
    column_labels=last_names_viable,
    row_label_rotation = 0,
    column_label_rotation = 90,
)

## Comentions Matrix

The (i,j) entry of the comentions matrix shows the number of times candidates i and j were
mentioned on the same ballot with i >= j, where the comparison refers to the candidates'
relative positions on the ballot. There is an option to symmetrize the matrix, which makes
the (i,j) entry the number of times that i and j were mentioned on the same ballot
(irrespective of position).
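
Here is a hand-rolled sketch of both versions on made-up ballots. It assumes that "i >= j" means i is ranked at or above j and that every ballot has weight 1; it is meant to illustrate the definition, not to reproduce VoteKit's implementation.

```python
import numpy as np

# Made-up ranked ballots, most preferred candidate first
toy_rankings = [["A", "B", "C"], ["B", "A"], ["A", "C"], ["C", "A", "B"]]
cands = ["A", "B", "C"]
index = {c: k for k, c in enumerate(cands)}

asym = np.zeros((3, 3), dtype=int)  # counts only ballots with i at or above j
sym = np.zeros((3, 3), dtype=int)   # counts every ballot mentioning both

for ranking in toy_rankings:
    position = {c: p for p, c in enumerate(ranking)}
    for i in ranking:
        for j in ranking:
            sym[index[i], index[j]] += 1
            # Assumption: "i >= j" means i is ranked at or above j
            if position[i] <= position[j]:
                asym[index[i], index[j]] += 1

print(asym)
print(sym)
```

In this toy data A and B share three ballots, so the symmetric entry is 3, while the asymmetric entries split that into 2 (A above B) and 1 (B above A). The diagonal in both versions is simply each candidate's number of mentions.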

In [None]:
comentions_mat_asym  = comentions_matrix(profile, candidates = viable_cands)
ax  = matrix_heatmap(
    comentions_mat_asym,
    row_labels=last_names_viable,
    column_labels=last_names_viable,
    row_label_rotation = 0,
    column_label_rotation = 90,
    n_decimals_to_display=0
)

plt.title("Asymmetric Comentions")
plt.show()

In [None]:
comentions_mat_sym  = comentions_matrix(profile, candidates = viable_cands, symmetric=True)
ax  = matrix_heatmap(
    comentions_mat_sym,
    row_labels=last_names_viable,
    column_labels=last_names_viable,
    row_label_rotation = 0,
    column_label_rotation = 90,
    n_decimals_to_display=0
)

plt.title("Symmetric Comentions")
plt.show()

### Try it yourself!

Play around with all of the parameters for the `matrix_heatmap` function to see what you can do!
You might find the following list of colormaps interesting:

[https://matplotlib.org/stable/users/explain/colors/colormaps.html](https://matplotlib.org/stable/users/explain/colors/colormaps.html)

Parameters for `matrix_heatmap`:

- `matrix` (np.ndarray): A 2D numpy array containing the data to be plotted.
- `ax` (matplotlib.axes.Axes, optional): The matplotlib axis to plot on. Defaults to None,
    in which case an axis is created.
- `show_cell_values` (bool): Whether to show the values of the cells in the heatmap. These
    values are shown in the center of each cell and are dynamically formatted to be
    human-readable.  Defaults to True.
- `n_decimals_to_display` (int): The number of decimal places to display for the values
    in the heatmap.  Defaults to 2.
- `row_labels` (Optional[List[str]]): A list of strings containing the labels for the rows
    of the heatmap. Defaults to None.
- `row_label_rotation` (Optional[float]): The rotation to apply to the row labels.
    Defaults to None.
- `row_legend` (Optional[Dict[str, str]]): A dictionary mapping row labels to legend
    descriptions. Defaults to None.
- `column_labels` (Optional[List[str]]): A list of strings containing the labels for the
    columns of the heatmap. Defaults to None.
- `column_label_rotation` (Optional[float]): The rotation to apply to the column labels.
    Defaults to None.
- `column_legend` (Optional[Dict[str, str]]): A dictionary mapping column labels to legend
    descriptions. Defaults to None.
- `cell_color_map` (Optional[Union[str, matplotlib.colors.Colormap]]): The color map to use
    for the heatmap. Defaults to `PRGn` if the matrix contains negative values and
    `Greens` otherwise.
- `cell_font_size` (Optional[int]): The font size to use for the cell values. Defaults to
    None, which will then use dynamic font size based on the number of cells and the
    figure size.
- `cell_spacing` (float): The spacing between the cells in the heatmap. Defaults to 0.5.
- `cell_divider_color` (str): The color to use for the cell dividers for spacing cells.
    Defaults to "white".
- `show_colorbar` (bool): Whether to show the colorbar for the heatmap. Defaults to False.
- `legend_font_size` (float): The font size to use for the legend. Defaults to 10.0.
- `legend_location` (str): The location to place the legend. Defaults to "center left".
- `legend_bbox_to_anchor` (Tuple[float, float]): The bounding box to anchor the legend to.
    Defaults to (1.03, 0.5).

# Pairwise Comparison Graph

For this, we just want to show you how to plot the pairwise comparison graph
since it can sometimes be enlightening when studying ranked systems.
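
Before using the VoteKit class, it may help to see the idea on a toy example. The sketch below builds the head-to-head edges and looks for a Condorcet winner by hand. It does not use `PairwiseComparisonGraph`; the ballots are made up, every ballot ranks all candidates, and each ballot has weight 1.

```python
from itertools import combinations

# Made-up complete rankings, most preferred candidate first
toy_rankings = [
    ["A", "B", "C"],
    ["A", "C", "B"],
    ["B", "A", "C"],
    ["C", "A", "B"],
    ["B", "C", "A"],
]
cands = ["A", "B", "C"]


def prefers(ranking, x, y):
    """Return True if the ballot ranks x above y (both must be ranked)."""
    return ranking.index(x) < ranking.index(y)


# edges[(x, y)] = margin by which x beats y head to head
edges = {}
for x, y in combinations(cands, 2):
    x_wins = sum(prefers(r, x, y) for r in toy_rankings)
    y_wins = len(toy_rankings) - x_wins
    if x_wins > y_wins:
        edges[(x, y)] = x_wins - y_wins
    elif y_wins > x_wins:
        edges[(y, x)] = y_wins - x_wins

# A Condorcet winner beats every other candidate head to head
condorcet = [c for c in cands if all((c, o) in edges for o in cands if o != c)]
print(edges)
print(condorcet)
```

Each directed edge points from the head-to-head winner to the loser, which is the kind of picture the pairwise comparison graph draws for the real profile.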

In [None]:
pwcg = PairwiseComparisonGraph(profile)
dominating_tiers = pwcg.get_dominating_tiers()

print("The dominating tiers are: ")
for tier in dominating_tiers:
    print(tier)

if pwcg.has_condorcet_winner():
    print(f"\nThe Condorcet candidate is: {next(iter(dominating_tiers[0]))}")
else:
    print(f"\nThere is no unique Condorcet winner. The top tier is {dominating_tiers[0]}")

In [None]:
pwcg.draw()

Okay, so maybe the full graph is a lot. Instead, let's just look at the viable
candidates. Here we can see that Candace Avalos is also the Condorcet winner.

In [None]:
pwcg.draw(candidate_list = viable_cands)