Merge pull request #121 from cdonnay/v2.0.0

V2.0.0
mggg · Mar 1, 2024 · a1c2526
2 parents 0606b97 + c393d45

Showing 31 changed files with 2,403 additions and 958 deletions.
diff --git a/.gitignore b/.gitignore
@@ -7,3 +7,4 @@ dist/
 .ipynb_checkpoints
 .idea
 extra_data/
+.venv
diff --git a/docs/SCR_ballot_generators.md b/docs/SCR_ballot_generators.md
@@ -16,30 +16,23 @@ The Impartial Anonymous Culture model has $\alpha = 1$. This means that the poin
 
 ## Candidate Simplex Models
 
-### Plackett-Luce
+### Name-Plackett-Luce
+The name-Plackett-Luce model (n-PL) samples ranked ballots as follows. Assume there are $n$ blocs of voters. Within a bloc, say bloc $A$, voters have $n$ preference intervals, one for each slate of candidates. A bloc also has a fixed $n$-tuple of cohesion parameters $\pi_A = (\pi_{AA}, \pi_{AB},\dots)$; we require that $\sum_B \pi_{AB}=1$. To generate a ballot for a voter in bloc $A$, each preference interval $I_B$ is rescaled by the corresponding cohesion parameter $\pi_{AB}$, and then concatenated to create one preference interval. 
+Voters then sample without replacement from the combined preference interval.
 
-The Plackett-Luce model (PL) samples ranked ballots as follows. Given a bloc's preference interval, it samples candidates without replacement from the interval. That means when a candidate is selected, their portion of the interval is removed, and the interval is normalized to be length 1 again. 
+### Name-Bradley-Terry
+The name-Bradley-Terry model (n-BT) samples ranked ballots as follows. Assume there are $n$ blocs of voters. Within a bloc, say bloc $A$, voters have $n$ preference intervals, one for each slate of candidates. A bloc also has a fixed $n$-tuple of cohesion parameters $\pi_A = (\pi_{AA}, \pi_{AB},\dots)$; we require that $\sum_B \pi_{AB}=1$. To generate a ballot for a voter in bloc $A$, each preference interval $I_B$ is rescaled by the corresponding cohesion parameter $\pi_{AB}$, and then concatenated to create one preference interval. 
+Voters then sample full ballots with probability proportional to the product of the pairwise comparison probabilities of the candidates. That is, the probability that the ballot $C_1>C_2>C_3$ is sampled is proportional to $P(C_1>C_2)P(C_2>C_3)P(C_1>C_3)$, where these pairwise probabilities are given by $P(C_1>C_2) = C_1/(C_1+C_2)$.
+Here $C_i$ denotes both the candidate and the length of $C_i$'s share of the combined preference interval.
 
-- The PL model generates full ballots, with the caveat that any candidates with 0 support are listed as ties at the end of the ballot.
+### Name-Cumulative
+The name-Cumulative model (n-C) samples ranked ballots as follows. Assume there are $n$ blocs of voters. Within a bloc, say bloc $A$, voters have $n$ preference intervals, one for each slate of candidates. A bloc also has a fixed $n$-tuple of cohesion parameters $\pi_A = (\pi_{AA}, \pi_{AB},\dots)$; we require that $\sum_B \pi_{AB}=1$. To generate a ballot for a voter in bloc $A$, each preference interval $I_B$ is rescaled by the corresponding cohesion parameter $\pi_{AB}$, and then concatenated to create one preference interval. Voters then sample with replacement from the combined interval, once for each position on a ballot of the desired length, so a candidate may appear more than once on the same ballot.
 
-- It can be initialized directly from a set of preference intervals (one for each bloc), or by using [from_params](api.md#ballot-generators). This method uses cohesion and Dirichlet parameters.
+### Slate-Plackett-Luce
+The slate-Plackett-Luce model (s-PL) samples ranked ballots as follows. Assume there are $n$ blocs of voters. Within a bloc, say bloc $A$, voters have $n$ preference intervals, one for each slate of candidates. A bloc also has a fixed $n$-tuple of cohesion parameters $\pi_A = (\pi_{AA}, \pi_{AB},\dots)$; we require that $\sum_B \pi_{AB}=1$. In the slate models, the cohesion parameters play a different role than in the name models above. For s-PL, $\pi_{AB}$ is the probability that a given position on the ballot is filled by a $B$ candidate. If all of the $B$ candidates have already been placed, we remove $\pi_{AB}$ and renormalize the remaining cohesion parameters. Once we have a ranking of the slates on the ballot, we fill in the candidate ordering by sampling without replacement from each slate's own preference interval (we do not concatenate them!).
 
-- The PL model can handle arbitrarily many blocs.
-
-- The PL model also requires information about what proportion of voters belong to each bloc.
-
-### Bradley-Terry
-
-The Bradley-Terry model (BT) samples ranked ballots as follows. Given a preference interval, the probability of sampling the ballot $A>B>C$ is equal to the product of the probabilities $P(A>B)P(B>C)P(A>C)$. One of these probabilities can be computed as $P(A>B) = A/(A+B)$, where we let $A$ denote both the candidate and the length of its interval.
-
-
-- The BT model generates full ballots, with the caveat that any candidates with 0 support are listed as ties at the end of the ballot.
-
-- It can be initialized directly from a set of preference intervals (one for each bloc), or by using [from_params](api.md#ballot-generators). This method uses cohesion and Dirichlet parameters.
-
-- The BT model can handle arbitrarily many blocs.
-
-- The BT model also requires information about what proportion of voters belong to each bloc.
+### Slate-Bradley-Terry
+The slate-Bradley-Terry model (s-BT) samples ranked ballots as follows. We assume there are 2 blocs of voters. Within a bloc, say bloc $A$, voters have 2 preference intervals, one for each slate of candidates. Bloc $A$ also has a single cohesion parameter $\pi_A$, giving the cohesion tuple $(\pi_A, 1-\pi_A)$. As in s-PL, the cohesion parameters play a different role than in the name models above. For s-BT, we again start by filling out a ballot with slate labels only. The probability that we sample the ballot $A>A>B$ is proportional to $\pi_A^2$: just like name-Bradley-Terry, we are computing pairwise comparisons, and in $A>A>B$, slate $A$ beats slate $B$ twice. As another example, the probability of $A>B>A$ is proportional to $\pi_A(1-\pi_A)$, since $A$ beats $B$ once and $B$ beats $A$ once. Once we have a ranking of the slates on the ballot, we fill in the candidate ordering by sampling without replacement from each slate's own preference interval (we do not concatenate them!).
 
 ### Alternating-Crossover
 
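A minimal plain-Python sketch of the two name models that sample directly from the combined interval (n-PL and n-C) described above. It assumes each slate's preference interval is a dict from candidate to support and that a bloc's cohesion parameters are a dict from slate to $\pi_{AB}$; the function names are hypothetical and are not VoteKit's `name_PlackettLuce` or `name_Cumulative` API. Unlike the tie-at-the-end convention mentioned for the old PL model, this sketch simply omits zero-support candidates.

```python
import random

def combine_intervals(intervals, cohesion):
    """Scale each slate's interval I_B by pi_{AB} and concatenate them.

    intervals: slate -> {candidate: support}, each summing to 1
    cohesion:  slate -> pi_{AB} for one bloc A, summing to 1
    """
    combined = {}
    for slate, interval in intervals.items():
        for cand, support in interval.items():
            combined[cand] = cohesion[slate] * support
    return combined

def name_pl_ballot(combined, rng):
    """n-PL: draw candidates without replacement from the combined interval."""
    remaining = dict(combined)
    ballot = []
    while remaining:
        cands = list(remaining)
        weights = [remaining[c] for c in cands]
        if sum(weights) == 0:
            break  # only zero-support candidates remain; this sketch omits them
        # random.choices uses relative weights, so deleting the chosen
        # candidate below renormalizes the rest of the interval implicitly.
        pick = rng.choices(cands, weights=weights, k=1)[0]
        ballot.append(pick)
        del remaining[pick]
    return ballot

def name_cumulative_ballot(combined, length, rng):
    """n-C: draw `length` candidates with replacement from the combined interval."""
    cands = list(combined)
    return rng.choices(cands, weights=[combined[c] for c in cands], k=length)

rng = random.Random(0)
intervals = {"A": {"a1": 0.7, "a2": 0.3}, "B": {"b1": 0.5, "b2": 0.5}}
combined = combine_intervals(intervals, {"A": 0.8, "B": 0.2})
# combined is approximately {"a1": 0.56, "a2": 0.24, "b1": 0.1, "b2": 0.1}
print(name_pl_ballot(combined, rng))
print(name_cumulative_ballot(combined, 3, rng))
```

Relying on `random.choices` with relative weights avoids renormalizing the interval by hand after each draw.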

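The two Bradley-Terry variants above both weight an ordering by a product of pairwise comparisons. The sketch below computes those distributions exactly by brute force over all orderings, so it is only practical for a handful of candidates. It assumes positive support for every candidate, and for s-BT it assumes the normalization runs over all orderings of a fixed multiset of slate labels (for example two $A$ labels and one $B$ label); the function names are hypothetical, not VoteKit's `name_BradleyTerry` or `slate_BradleyTerry`.

```python
from itertools import combinations, permutations

def name_bt_distribution(combined):
    """Exact n-BT distribution over full rankings (brute force over n! orderings).

    The weight of a ranking is the product over every pair (x above y)
    of P(x > y) = x / (x + y); weights are then normalized to sum to 1.
    Assumes every candidate has positive support.
    """
    weights = {}
    for ranking in permutations(combined):
        w = 1.0
        for x, y in combinations(ranking, 2):  # x is ranked above y
            w *= combined[x] / (combined[x] + combined[y])
        weights[ranking] = w
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}

def slate_bt_distribution(n_a, n_b, pi_a):
    """Two-bloc s-BT: distribution over slate-label orderings for an A voter.

    Each time an A label precedes a B label the weight gains a factor pi_a;
    each time a B label precedes an A label it gains a factor 1 - pi_a.
    """
    labels = ["A"] * n_a + ["B"] * n_b
    weights = {}
    for order in set(permutations(labels)):
        w = 1.0
        for x, y in combinations(order, 2):
            if x == "A" and y == "B":
                w *= pi_a
            elif x == "B" and y == "A":
                w *= 1 - pi_a
        weights[order] = w
    total = sum(weights.values())
    return {o: w / total for o, w in weights.items()}

print(name_bt_distribution({"a": 0.5, "b": 0.3, "c": 0.2}))
# Unnormalized weights: AAB -> pi^2, ABA -> pi(1 - pi), BAA -> (1 - pi)^2
print(slate_bt_distribution(2, 1, 0.8))
```

A real sampler would avoid enumerating $n!$ orderings; the brute force here only makes the "proportional to" statements concrete.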
diff --git a/docs/SCR_simplex.md b/docs/SCR_simplex.md
@@ -44,12 +44,11 @@ The value $\alpha$ is never allowed to be 0 or $\infty$, so VoteKit uses an arbi
 
 ### Cohesion Parameters
 
-When there are multiple blocs, or types, of voters, we utilize cohesion parameters to measure how much voters prefer candidates from their own bloc versus the opposing blocs. Suppose there are two blocs of voters, $X,Y$. We assume that voters from the $X$ bloc have some underlying [preference interval](SCR_preference_intervals.md) $I_{XX}$ for candidates within their bloc, and a different underlying preference interval $I_{XY}$ for the candidates in the opposing bloc . We then assume that voters in $X$ prefer $X$ candidates with proportion $\pi_X$.
-
-In order to construct one preference interval for $X$ voters, we take $I_{XX}$ and scale it by $\pi_X$, then we take $I_{XY}$ and scale it by $1-\pi_X$, and finally we concatenate the two. As a concrete example, if $\pi_X = .75$, this means that 3/4 of the preference interval for $X$ voters is taken up by candidates from the $X$ bloc, and the other 1/4 by $Y$ candidates.
+When there are multiple blocs, or types, of voters, we use cohesion parameters to measure how much voters prefer candidates from their own bloc over candidates from the opposing blocs. In our name models, like `name_PlackettLuce` or `name_BradleyTerry`, the cohesion parameters operate as follows. Suppose there are two blocs of voters, $X,Y$. We assume that voters from the $X$ bloc have some underlying [preference interval](SCR_preference_intervals.md) $I_{XX}$ for candidates within their bloc, and a different underlying preference interval $I_{XY}$ for the candidates in the opposing bloc. We then assume that voters in $X$ prefer $X$ candidates with proportion $\pi_X$. To construct one preference interval for $X$ voters, we scale $I_{XX}$ by $\pi_X$, scale $I_{XY}$ by $1-\pi_X$, and concatenate the two. As a concrete example, if $\pi_X = .75$, then 3/4 of the preference interval for $X$ voters is taken up by candidates from the $X$ bloc, and the other 1/4 by $Y$ candidates.
 
 ![](assets/cohesion_parameters.png)
 
+In our slate models, like `slate_PlackettLuce`, the cohesion parameters determine the probability of sampling a particular slate at each position on the ballot; how exactly this is done depends on the model. Candidate names are then filled in by sampling without replacement from each slate's own preference interval.
 ### Combining Dirichlet and Cohesion
 
 When there are multiple blocs of voters, we need more than one $\alpha$ value for the Dirichlet distribution. Suppose there are two blocs of voters, $X,Y$. Then we need four values, $\alpha_{XX}, \alpha_{XY}, \alpha_{YX}, \alpha_{YY}$. The value $\alpha_{XX}$ determines what kind of preferences $X$ voters will have for $X$ candidates. The value $\alpha_{XY}$ determines what kind of preferences $X$ voters have for $Y$ candidates. We sample preference intervals from the candidate simplex using these $\alpha$ values, and then use cohesion parameters to combine them into a single interval, one for each bloc. This is how [from_params](api.md#ballot-generators) initializes different ballot generator models.
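The Dirichlet-plus-cohesion construction above can be sketched as follows, assuming a symmetric Dirichlet per (bloc, slate) pair and drawing it with the standard trick of normalizing independent Gamma($\alpha$, 1) samples. This is an illustration under those assumptions, not VoteKit's `from_params`; `sample_dirichlet` and `bloc_interval` are hypothetical names.

```python
import random

def sample_dirichlet(candidates, alpha, rng):
    """Draw an interval from the symmetric Dirichlet(alpha) distribution
    by normalizing independent Gamma(alpha, 1) draws (alpha must be > 0)."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in candidates]
    total = sum(draws)
    return {c: d / total for c, d in zip(candidates, draws)}

def bloc_interval(slates, alphas, cohesion, rng):
    """Build one bloc's combined interval.

    slates:   slate -> list of candidates on that slate
    alphas:   slate -> alpha for this bloc's view of that slate (e.g. alpha_XX, alpha_XY)
    cohesion: slate -> cohesion weight (e.g. pi_X and 1 - pi_X)
    """
    combined = {}
    for slate, candidates in slates.items():
        interval = sample_dirichlet(candidates, alphas[slate], rng)
        for cand, support in interval.items():
            combined[cand] = cohesion[slate] * support
    return combined

rng = random.Random(1)
slates = {"X": ["x1", "x2"], "Y": ["y1", "y2", "y3"]}
# Interval for X voters with alpha_XX = 1, alpha_XY = 2 and pi_X = 0.75.
interval_X = bloc_interval(slates, {"X": 1.0, "Y": 2.0}, {"X": 0.75, "Y": 0.25}, rng)
print(interval_X)
```

Whatever the sampled shape, the $X$ candidates always share exactly $\pi_X$ of the combined interval, which is the point of the cohesion step.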

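For the slate models, the cohesion parameters pick slate labels first and names come second. The sketch below follows the s-PL description: each position gets slate $B$ with weight $\pi_{AB}$, exhausted slates are dropped and the rest renormalized, then each slate's names are ordered by sampling without replacement from that slate's own interval. The uniform fallback when every remaining weight is 0 is a hypothetical choice of this sketch, and none of these helpers are VoteKit's `slate_PlackettLuce` API.

```python
import random

def _pick(options, weights, rng):
    # Hypothetical tie-break: if every remaining weight is 0, pick uniformly.
    if sum(weights) == 0:
        weights = [1.0] * len(options)
    return rng.choices(options, weights=weights, k=1)[0]

def _order_without_replacement(interval, rng):
    """Order one slate's candidates by sampling without replacement."""
    remaining = dict(interval)
    order = []
    while remaining:
        cands = list(remaining)
        pick = _pick(cands, [remaining[c] for c in cands], rng)
        order.append(pick)
        del remaining[pick]
    return order

def slate_pl_ballot(intervals, cohesion, rng):
    """s-PL ballot for one voter.

    intervals: slate -> {candidate: support}
    cohesion:  slate -> pi_{AB} for the voter's bloc A
    """
    # Step 1: rank slate labels. Each position gets slate B with weight pi_{AB};
    # exhausted slates are dropped, and random.choices renormalizes the rest.
    left = {s: len(iv) for s, iv in intervals.items()}
    labels = []
    for _ in range(sum(left.values())):
        open_slates = [s for s in left if left[s] > 0]
        s = _pick(open_slates, [cohesion[t] for t in open_slates], rng)
        labels.append(s)
        left[s] -= 1
    # Step 2: fill in names from each slate's own interval (no concatenation).
    orders = {s: _order_without_replacement(iv, rng) for s, iv in intervals.items()}
    return [orders[s].pop(0) for s in labels]

rng = random.Random(2)
intervals = {"A": {"a1": 0.6, "a2": 0.4}, "B": {"b1": 0.9, "b2": 0.1}}
print(slate_pl_ballot(intervals, {"A": 0.7, "B": 0.3}, rng))
```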
diff --git a/docs/api.md b/docs/api.md
@@ -10,6 +10,10 @@ hide:
     rendering:
     heading_level: 4
 
+### ::: votekit.pref_interval
+    rendering:
+    heading_level: 4
+
 ### ::: votekit.pref_profile
     rendering:
       heading_level: 4
@@ -39,13 +43,16 @@ hide:
         members:
             - BallotGenerator
             - BallotSimplex
-            - PlackettLuce
-            - BradleyTerry
+            - slate_PlackettLuce
+            - name_PlackettLuce
+            - slate_BradleyTerry
+            - name_BradleyTerry
             - AlternatingCrossover
             - CambridgeSampler
             - OneDimSpatial
             - ImpartialCulture
             - ImpartialAnonymousCulture
+            - name_Cumulative
 
 ## Elections
 ### ::: votekit.elections.election_types

diff --git a/src/votekit/__init__.py b/src/votekit/__init__.py
@@ -1,14 +1,19 @@
 from .ballot_generator import (  # noqa
-    PlackettLuce,
-    BradleyTerry,
+    name_PlackettLuce,
+    name_BradleyTerry,
     BallotSimplex,
     ImpartialCulture,
     ImpartialAnonymousCulture,
     CambridgeSampler,
     AlternatingCrossover,
+    name_Cumulative,
+    slate_BradleyTerry,
+    slate_PlackettLuce,
 )
+from .pref_interval import PreferenceInterval
 from .ballot import Ballot  # noqa
 from .pref_profile import PreferenceProfile  # noqa
+from .pref_interval import PreferenceInterval  # noqa
 from .cleaning import (  # noqa
     remove_empty_ballots,
     deduplicate_profiles,

diff --git a/src/votekit/ballot.py b/src/votekit/ballot.py
@@ -12,10 +12,11 @@ class Ballot:
 
     **Attributes**
     `ranking`
-    :   list of candidate ranking. Entry i of the list is a set of candidates ranked in position i.
+    :   tuple of candidate ranking. Entry $i$ of the tuple is a frozenset of candidates ranked
+        in position $i$.
 
     `weight`
-    :   weight assigned to a given a ballot. Defaults to 1.
+    :   (Fraction) weight assigned to a given ballot. Defaults to 1.
 
     `voter_set`
     :   optional set of voters who cast a given ballot.
@@ -24,7 +25,7 @@ class Ballot:
     :   optional ballot id.
     """
 
-    ranking: list[set] = field(default_factory=list)
+    ranking: tuple[frozenset, ...] = field(default_factory=tuple)
     weight: Fraction = Fraction(1, 1)
     voter_set: Optional[set[str]] = None
     id: Optional[str] = None
@@ -62,7 +63,7 @@ def __eq__(self, other):
         return True
 
     def __hash__(self):
-        return hash(str(self.ranking))
+        return hash(self.ranking)
 
     def __str__(self):
         weight_str = f"Weight: {self.weight}\n"
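The switch from `list[set]` to `tuple[frozenset, ...]` lets `__hash__` hash the ranking directly instead of hashing `str(self.ranking)`, which depended on how each set happened to print. A minimal sketch of the effect, using a hypothetical `MiniBallot` dataclass that mirrors only the two changed pieces (it is not VoteKit's `Ballot`):

```python
from collections import Counter
from dataclasses import dataclass, field
from fractions import Fraction

@dataclass
class MiniBallot:
    """Hypothetical stand-in mirroring the new field types of Ballot."""
    ranking: tuple[frozenset, ...] = field(default_factory=tuple)
    weight: Fraction = Fraction(1, 1)

    def __hash__(self):
        # Hash the ranking itself: a tuple of frozensets is hashable,
        # and frozenset equality (and hash) ignores element order.
        return hash(self.ranking)

b1 = MiniBallot(ranking=(frozenset({"A"}), frozenset({"B", "C"})))
b2 = MiniBallot(ranking=(frozenset({"A"}), frozenset({"C", "B"})))
print(hash(b1) == hash(b2), b1 == b2)
print(Counter([b1, b2]))

# The old field type, list[set], cannot be hashed directly.
old_style_hashable = True
try:
    hash([{"A"}, {"B", "C"}])
except TypeError:
    old_style_hashable = False
print(old_style_hashable)
```

Because `@dataclass` keeps an explicitly defined `__hash__`, equal ballots now hash equally and can be used as dict or `Counter` keys.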