# B-Call: Integrating Ideological Position and Political Cohesion in Legislative Voting Models


This code provides the `BCall` class, which allows you to replicate the results from the BCall paper and apply the method to other legislative data. The class takes two mandatory parameters: (1) `rollcall`, a DataFrame containing the numerical voting records of legislators, and (2) `clustering`, a DataFrame with a single column that assigns each legislator to one of two clusters. It also takes `pivot`, a legislator whose cluster the model places on the right side, and `threshold`, a minimum voting participation rate (default `0.1`) that legislators must exceed to be kept. Although `pivot` has a default value in the signature, `BCall` raises a `ValueError` unless it names a legislator in the index of `rollcall`, so in practice it must be supplied.

The code also includes the `Clustering` class, which implements the algorithm described in the appendix of the paper for splitting legislators into two groups. `BCall` works with any two-way division of legislators, so using `Clustering` is not required: other models, such as Agglomerative Clustering, K-Means, or Gaussian Mixtures, can be used instead. User-defined divisions, such as by party or coalition, are also valid as long as they satisfy the requirements of the `clustering` parameter of `BCall`: a DataFrame with a single column containing exactly two distinct values, one per cluster.
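
As a minimal sketch of such a user-defined division, the snippet below builds a single-column DataFrame from party labels and checks the two conditions on the column. The legislator names and the `party` values are toy data invented for illustration, not data from the paper.

```python
import pandas as pd

# Toy party labels (hypothetical data, for illustration only).
party = pd.DataFrame(
    {"party": ["DEM", "REP", "DEM", "REP"]},
    index=pd.Index(["A", "B", "C", "D"], name="legislator"),
)

# Conditions BCall places on the column layout of `clustering`:
# exactly one column, and exactly two distinct values in it.
has_one_column = party.shape[1] == 1
has_two_groups = party["party"].nunique() == 2
print(has_one_column, has_two_groups)
```

`BCall` additionally requires that this DataFrame has the same number of rows as `rollcall`.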

This document is divided into two parts. The first part uses voting data from the 2021 U.S. House of Representatives to calculate `BCall` and compare it with the results from Nominate, computing the Pearson and Spearman correlations reported in the paper for that legislative session. The second part combines the House voting data for 2021 and 2022, clusters the legislators using the `Clustering` algorithm, and then calculates `BCall`. It also reproduces the chart shown in the paper for the 2021-2022 House.

In [1]:
from collections import defaultdict
import altair as alt
import pandas as pd
import numpy as np

## 4. Comparison with other models
This part uses voting data from the 2021 U.S. House of Representatives to calculate `BCall` and compare it with the results from Nominate. Pearson and Spearman correlations, as reported in the paper, are calculated for that legislative session. 

In [2]:
class BCall:
    def __init__(self, rollcall: pd.DataFrame, clustering: pd.DataFrame, 
                 pivot: str = "", threshold: float = 0.1):
        """
        Initialize the BCall object with roll call data, a two-cluster assignment, and optional settings.
        
        Parameters:
        - rollcall (pd.DataFrame): DataFrame containing the roll call data.
        - clustering (pd.DataFrame): DataFrame containing clustering information.
        - pivot (str): Legislator whose cluster is placed on the right side; must be in rollcall's index.
        - threshold (float): Participation rate that a legislator must exceed to be kept.
        """
        self.rollcall = self._validate_dataframe(rollcall, 'rollcall')
        self.clustering = self._validate_dataframe(clustering, 'clustering')

        self.pivot = pivot
        self.threshold = threshold

        self._validate_inputs()
        self.stats = self._calculate()

    def _validate_dataframe(self, df: pd.DataFrame, name: str) -> pd.DataFrame:
        """
        Validate that the input is a DataFrame-like object with a 'shape' attribute.

        Parameters:
        - df (pd.DataFrame): The DataFrame to validate.
        - name (str): The name of the DataFrame (for error messages).

        Returns:
        - pd.DataFrame: The validated DataFrame.
        """
        if not hasattr(df, 'shape'):
            raise ValueError(f"{name} must be a DataFrame or similar object with a 'shape' attribute.")
        return df

    def _validate_inputs(self):
        """Validate the inputs provided to the BCall object."""
        if self.clustering.shape[0] != self.rollcall.shape[0]:
            raise ValueError("Length of clustering must match the number of rows in rollcall.")
        
        if self.clustering.shape[1] != 1:
            raise ValueError("clustering must only have one column.")
        
        self.clustering = self.clustering.squeeze()
        
        if self.clustering.nunique() != 2:
            raise ValueError("clustering must have only two unique values.")
        
        if self.pivot not in self.rollcall.index:
            raise ValueError("pivot must be an element of the rollcall's index.")

    def _calculate(self) -> pd.DataFrame:
        """Calculate the statistics for the rollcall data based on clustering."""
        print(f'Rollcall dataframe contains {self.rollcall.shape[0]} legislators and {self.rollcall.shape[1]} votes.')

        participation = 1 - self.rollcall.isna().sum(axis=1) / self.rollcall.shape[1]
        self.rollcall = self.rollcall[participation > self.threshold]
        self.clustering = self.clustering.loc[self.rollcall.index]

        print(f'{self.rollcall.shape[0]} legislators meet the participation threshold.')

        if self.pivot not in self.clustering.index:
            raise ValueError("Choose another pivot as the previous one does not meet the participation threshold.")

        pivot_cluster = self.clustering.loc[self.pivot]
        left_legislators = self.clustering[self.clustering != pivot_cluster].index
        right_legislators = self.clustering[self.clustering == pivot_cluster].index

        if left_legislators.empty:
            raise ValueError("There must be at least one legislator in the left cluster.")
        
        left_mean = self.rollcall.loc[left_legislators].mean(axis=0)
        right_mean = self.rollcall.loc[right_legislators].mean(axis=0)

        overall_mean = self.rollcall.mean(axis=0)
        overall_std = self.rollcall.std(axis=0)

        standardization = (self.rollcall - overall_mean) / overall_std * (2 * (left_mean < right_mean).astype(int) - 1)

        stats = pd.DataFrame({
            'd1': standardization.mean(axis=1),
            'd2': standardization.std(axis=1)
        })

        return stats

    def plot(self) -> alt.Chart:
        """
        Plot the d1 and d2 statistics, color-coded by cluster.
        
        Returns:
        - alt.Chart: An Altair chart object.
        """
        df = self.stats.copy()
        df['cluster'] = self.clustering
        df.reset_index(inplace=True)
        df.columns = ['legislators', 'd1', 'd2', 'cluster']

        plot = alt.Chart(df).mark_circle().encode(
            x='d1',
            y='d2',
            tooltip='legislators',
            color='cluster'
        ).properties(
            width=400,
            height=400
        )

        return plot
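
To make the computation in `_calculate` concrete, here is a small self-contained sketch on a toy roll call with four hypothetical legislators and three votes. It mirrors the steps of the method (per-vote standardization, a sign flip so that the pivot's cluster points to the positive side, then the row-wise mean and standard deviation). The data and names are invented for illustration.

```python
import pandas as pd

# Toy roll call: rows are legislators, columns are votes (1 = yea, -1 = nay).
rollcall = pd.DataFrame(
    {"v1": [1, 1, -1, -1], "v2": [-1, -1, 1, 1], "v3": [1, -1, -1, -1]},
    index=["L1", "L2", "R1", "R2"],
    dtype=float,
)
# Hypothetical two-way division; True marks the pivot's (right) cluster.
is_right = pd.Series([False, False, True, True], index=rollcall.index)

left_mean = rollcall[~is_right].mean(axis=0)
right_mean = rollcall[is_right].mean(axis=0)

# +1 when the right cluster's mean vote is higher than the left's, -1 otherwise.
direction = 2 * (left_mean < right_mean).astype(int) - 1

# Standardize each vote and orient it so that higher means "more right".
z = (rollcall - rollcall.mean(axis=0)) / rollcall.std(axis=0) * direction

stats = pd.DataFrame({"d1": z.mean(axis=1), "d2": z.std(axis=1)})
print(stats.round(3))
```

In this toy case the two left legislators end up with negative `d1` and the two right legislators with positive `d1`.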


In [3]:
rollcall = pd.read_csv('data/USA-House-2021-rollcall.csv', index_col=0)
rollcall.head()

legislator,1,2,3,4,5,6,7,8,9,10,...,439,440,441,442,443,444,445,446,447,448
"ADAMS, Alma",1.0,1.0,1.0,1.0,1.0,-1.0,1.0,1.0,-1.0,-1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
"ADERHOLT, Robert",-1.0,1.0,-1.0,-1.0,-1.0,1.0,-1.0,1.0,1.0,1.0,...,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0
"AGUILAR, Peter Rey",1.0,1.0,1.0,1.0,1.0,-1.0,1.0,1.0,-1.0,-1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
"ALLEN, Rick W.",-1.0,1.0,-1.0,-1.0,-1.0,1.0,-1.0,1.0,1.0,1.0,...,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0
"ALLRED, Colin",1.0,1.0,1.0,1.0,1.0,-1.0,1.0,1.0,-1.0,-1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [4]:
# This DataFrame is not used below. It is provided for users who want to run BCall with the
# legislators' actual political parties as the two clusters, instead of using the clustering algorithm.
legislators = pd.read_csv('data/legislators.csv', index_col=0)
legislators

legislator,party
"ADAMS, Alma",DEM
"ADERHOLT, Robert",REP
"AGUILAR, Peter Rey",DEM
"ALLEN, Rick W.",REP
"ALLRED, Colin",DEM
...,...
"WRIGHT, Ron",REP
"YAKYM, Rudy, III",REP
"YARMUTH, John",DEM
"YOUNG, Donald Edwin",REP


In [5]:
nominate = pd.read_csv('data/USA-House-2021-nominate.csv', index_col=0).dropna()
nominate

legislators,x1,x2
"ADAMS, Alma",-0.745453,0.159787
"ADERHOLT, Robert",0.605108,0.136112
"AGUILAR, Peter Rey",-0.699631,0.119065
"ALLEN, Rick W.",0.691193,0.205900
"ALLRED, Colin",-0.622715,0.178018
...,...,...
"WITTMAN, Robert J.",0.554464,0.448706
"WOMACK, Steve",0.493297,0.755664
"YARMUTH, John",-0.797804,-0.138937
"YOUNG, Donald Edwin",0.414111,0.239458


In [6]:
# Split legislators into two clusters by the sign of Nominate's first dimension.
clustering = pd.DataFrame(nominate['x1'] > 0)
# Keep only the legislators that have a Nominate score.
rollcall = rollcall.loc[clustering.index]

b_call = BCall(rollcall, clustering, pivot='YOUNG, Donald Edwin', threshold=0)

Rollcall dataframe contains 439 legislators and 448 votes.
439 legislators meet the participation threshold.


In [7]:
b_call.stats

legislators,d1,d2
"ADAMS, Alma",-0.667770,0.413026
"ADERHOLT, Robert",0.698168,0.654742
"AGUILAR, Peter Rey",-0.661398,0.416099
"ALLEN, Rick W.",0.859528,0.815093
"ALLRED, Colin",-0.643222,0.435236
...,...,...
"WITTMAN, Robert J.",0.621259,0.626555
"WOMACK, Steve",0.544708,0.652289
"YARMUTH, John",-0.698832,0.418940
"YOUNG, Donald Edwin",0.435018,0.745566


In [8]:
b_call.clustering

legislators
ADAMS, Alma            False
ADERHOLT, Robert        True
AGUILAR, Peter Rey     False
ALLEN, Rick W.          True
ALLRED, Colin          False
                       ...  
WITTMAN, Robert J.      True
WOMACK, Steve           True
YARMUTH, John          False
YOUNG, Donald Edwin     True
ZELDIN, Lee M           True
Name: x1, Length: 439, dtype: bool

In [9]:
b_call.plot()

The cells below compute the Pearson and Spearman correlations between the first dimension of BCall (`d1`) and the first dimension of Nominate (`x1`), and then plot one against the other.

In [10]:
comparator = pd.DataFrame({
    'b-call' : b_call.stats['d1'],
    'nominate' : nominate['x1']
})

comparator.corr().round(3)

,b-call,nominate
b-call,1.0,0.988
nominate,0.988,1.0


In [11]:
comparator.corr(method="spearman").round(3)

,b-call,nominate
b-call,1.0,0.985
nominate,0.985,1.0
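
The two coefficients answer slightly different questions: Pearson measures linear association, while Spearman measures agreement in ordering. The sketch below illustrates the difference on toy scores (hypothetical values, unrelated to the House data) by computing Spearman as the Pearson correlation of the ranks.

```python
import pandas as pd

# Toy scores (hypothetical): y grows monotonically with x, but not linearly.
scores = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [1.0, 4.0, 9.0, 16.0, 100.0],
})

# Pearson: strength of the linear relationship.
pearson = scores["x"].corr(scores["y"])

# Spearman: Pearson correlation applied to the ranks of each column.
ranks = scores.rank()
spearman = ranks["x"].corr(ranks["y"])

print(round(pearson, 3), round(spearman, 3))
```

Because the ordering of `x` and `y` is identical, the rank-based coefficient is 1 even though the linear one is lower.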


In [12]:
alt.Chart(comparator.reset_index()).mark_circle().encode(
    x = 'b-call',
    y = 'nominate',
    tooltip='legislators'
).properties(
    width=400,
    height=400
)

## 5. Empirical evaluation of B-Call and advantages over other models
This part combines the voting data from the House of Representatives for 2021 and 2022, clusters the legislators using the `Clustering` algorithm, and then calculates `BCall`. It also reproduces the chart shown in the paper for the 2021-2022 House.

In [13]:
class Clustering:
    def __init__(self, rollcalls: pd.DataFrame, N: int = 1, pivot: str = None):
        """
        Initialize the Clustering object.

        Parameters:
        - rollcalls (pd.DataFrame): DataFrame containing roll call data.
        - N (int): The distance metric (1 for Manhattan, 2 for Euclidean).
        - pivot (str, optional): The legislator to use as the pivot for determining left and right clusters.
        """
        self.N = N
        self.X = rollcalls

        if pivot is None:
            pivot = rollcalls.index[0]

        self.clusters = defaultdict(list)
        self._initialize_clusters(pivot)
        self._iterate_clustering()

        self._assign_clusters(pivot)
        self._create_clustering_dataframe()
        
        self.reclassify_legislators()

    def _initialize_clusters(self, pivot: str):
        """Initialize the two clusters with the farthest legislators."""
        distances = self.get_distances(self.X, self.X)
        idx0, idx1 = np.unravel_index(np.argmax(distances), distances.shape)
        leg0, leg1 = self.X.index[idx0], self.X.index[idx1]

        self.clusters[0].append(leg0)
        self.clusters[1].append(leg1)

    def _iterate_clustering(self):
        """Iteratively assign legislators to the nearest cluster."""
        for _ in range(len(self.X) - 2):
            self.iteration()

    def _assign_clusters(self, pivot: str):
        """Assign left and right clusters based on the pivot legislator."""
        if pivot in self.clusters[0]:
            self.is_right = 0
            self.clusters['right'] = self.clusters[0]
            self.clusters['left'] = self.clusters[1]
        elif pivot in self.clusters[1]:
            self.is_right = 1
            self.clusters['left'] = self.clusters[0]
            self.clusters['right'] = self.clusters[1]
        else:
            raise ValueError("Pivot legislator not found in either cluster.")

    def _create_clustering_dataframe(self):
        """Create a DataFrame to store clustering results."""
        clustering_data = [(x, 'left') for x in self.clusters['left']] + [(x, 'right') for x in self.clusters['right']]
        self.clustering = pd.DataFrame(clustering_data, columns=['legislators', 'cluster'])
        self.clustering.set_index('legislators', inplace=True)
        self.clustering = self.clustering.loc[self.X.index]

    def iteration(self):
        """Assign the next legislator to the nearest cluster."""
        remaining_legislators = self.X.drop(self.clusters[0] + self.clusters[1])
        centroid0 = self.X.loc[self.clusters[0]].mean(axis=0)
        centroid1 = self.X.loc[self.clusters[1]].mean(axis=0)

        centroids = np.vstack((centroid0, centroid1))
        distances = self.get_distances(remaining_legislators, centroids)
        idx0, idx1 = np.unravel_index(np.argmin(distances), distances.shape)
        self.clusters[idx1].append(remaining_legislators.index[idx0])

    def get_distances(self, tensor1: pd.DataFrame, tensor2: pd.DataFrame) -> np.ndarray:
        """
        Calculate the distances between two sets of vectors.

        Parameters:
        - tensor1 (pd.DataFrame): The first set of vectors.
        - tensor2 (pd.DataFrame): The second set of vectors.

        Returns:
        - np.ndarray: A matrix of distances between the vectors in tensor1 and tensor2.
        """
        tensor1, tensor2 = np.array(tensor1), np.array(tensor2)

        qty1, qty2 = (~np.isnan(tensor1)).astype(int), (~np.isnan(tensor2)).astype(int)
        norm = np.dot(qty1, qty2.T)

        diff = tensor1[:, None] - tensor2

        if self.N == 1:
            distances = np.nansum(np.abs(diff), axis=2)
            norm *= 2
        elif self.N == 2:
            distances = np.sqrt(np.nansum(diff**2, axis=2))
            norm = 2 * np.sqrt(norm)
        else:
            raise ValueError("Unsupported distance metric. Use N=1 for Manhattan or N=2 for Euclidean.")
        
        with np.errstate(divide='ignore', invalid='ignore'):
            result = np.divide(distances, norm)
            result[~np.isfinite(result)] = 0

        return result
    
    def reclassify_legislators(self):
        """
        Reclassify legislators who are closer to the centroid of the opposite cluster.

        This method iteratively adjusts the clusters until all legislators are correctly assigned
        to the cluster with the smallest distance to its centroid.
        """
        
        adjustments_needed = True
        
        while adjustments_needed:
            
            centroid0 = self.X.loc[self.clusters['left']].mean(axis=0)
            centroid1 = self.X.loc[self.clusters['right']].mean(axis=0)

            centroids = np.vstack((centroid0, centroid1))
            distances = self.get_distances(self.X, centroids)

            distances = pd.DataFrame(distances, index=self.X.index)
            distances.loc[self.clusters['left'], 'cluster'] = 'left'
            distances.loc[self.clusters['right'], 'cluster'] = 'right'

            distances['distance'] = (distances[0] < distances[1]).replace({True : 'left', False : 'right'})

            wrong_classified = distances[distances['distance'] != distances['cluster']]
            
            if wrong_classified.shape[0] == 0:
                adjustments_needed = False
                
            else:
                wrong_left = wrong_classified[wrong_classified['cluster'] == 'left']
                wrong_right = wrong_classified[wrong_classified['cluster'] == 'right']
                
                for leg in wrong_left.index:
                    self.clusters['left'].remove(leg)
                    self.clusters['right'].append(leg)
                    
                for leg in wrong_right.index:
                    self.clusters['right'].remove(leg)
                    self.clusters['left'].append(leg)
        
        self._create_clustering_dataframe()
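
The normalization in `get_distances` is easiest to see on a single pair of legislators. The sketch below reproduces the `N=1` (Manhattan) case for two toy voting vectors with missing votes; the vectors are hypothetical. Only votes cast by both legislators contribute, and the total is divided by twice the number of shared votes, since 2 is the largest possible gap on one vote coded as 1 or -1.

```python
import numpy as np

# Toy voting records (hypothetical); np.nan marks a missing vote.
a = np.array([1.0, -1.0, np.nan, 1.0])
b = np.array([1.0, 1.0, 1.0, np.nan])

# Votes that both legislators cast.
shared = ~np.isnan(a) & ~np.isnan(b)

# Sum of absolute differences; differences involving NaN are skipped.
total_gap = np.nansum(np.abs(a - b))

# Divide by the maximum possible gap over the shared votes.
distance = total_gap / (2 * shared.sum())
print(distance)
```

Here the legislators share two votes and disagree on one of them, so the normalized distance is 0.5.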


In [14]:
rollcall_2021 = pd.read_csv('data/USA-House-2021-rollcall.csv', index_col=0)
rollcall_2022 = pd.read_csv('data/USA-House-2022-rollcall.csv', index_col=0)
rollcall = pd.concat([rollcall_2021, rollcall_2022], axis=1)
rollcall

legislator,1,2,3,4,5,6,7,8,9,10,...,987,988,989,990,991,992,993,994,995,996
"ADAMS, Alma",1.0,1.0,1.0,1.0,1.0,-1.0,1.0,1.0,-1.0,-1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,,1.0
"ADERHOLT, Robert",-1.0,1.0,-1.0,-1.0,-1.0,1.0,-1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,-1.0,1.0,-1.0
"AGUILAR, Peter Rey",1.0,1.0,1.0,1.0,1.0,-1.0,1.0,1.0,-1.0,-1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,-1.0,1.0
"ALLEN, Rick W.",-1.0,1.0,-1.0,-1.0,-1.0,1.0,-1.0,1.0,1.0,1.0,...,1.0,-1.0,-1.0,-1.0,1.0,1.0,1.0,-1.0,1.0,-1.0
"ALLRED, Colin",1.0,1.0,1.0,1.0,1.0,-1.0,1.0,1.0,-1.0,-1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,-1.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"WRIGHT, Ron",-1.0,,-1.0,-1.0,-1.0,1.0,-1.0,1.0,1.0,1.0,...,,,,,,,,,,
"YAKYM, Rudy, III",,,,,,,,,,,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,-1.0,1.0,-1.0
"YARMUTH, John",1.0,,1.0,1.0,1.0,-1.0,1.0,1.0,-1.0,-1.0,...,1.0,,,,,,,1.0,-1.0,1.0
"YOUNG, Donald Edwin",-1.0,1.0,,,-1.0,1.0,-1.0,1.0,-1.0,-1.0,...,,,,,,,,,,


In [15]:
clustering = Clustering(rollcall, pivot='YOUNG, Donald Edwin')
clustering.clustering

legislator,cluster
"ADAMS, Alma",left
"ADERHOLT, Robert",right
"AGUILAR, Peter Rey",left
"ALLEN, Rick W.",right
"ALLRED, Colin",left
...,...
"WRIGHT, Ron",right
"YAKYM, Rudy, III",right
"YARMUTH, John",left
"YOUNG, Donald Edwin",right


In [16]:
b_call = BCall(rollcall, clustering.clustering, pivot='YOUNG, Donald Edwin', threshold=0.1)
b_call.plot()

Rollcall dataframe contains 457 legislators and 996 votes.
443 legislators meet the participation threshold.
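
The drop from 457 to 443 legislators comes from the participation filter in `_calculate`. A minimal sketch of that filter on toy data (hypothetical names and votes) follows: it computes each legislator's share of non-missing votes and keeps those strictly above the threshold.

```python
import numpy as np
import pandas as pd

# Toy roll call (hypothetical): np.nan marks a vote the legislator did not cast.
rollcall = pd.DataFrame(
    [
        [1, -1, 1, 1],
        [np.nan, np.nan, np.nan, np.nan],
        [1, np.nan, np.nan, np.nan],
    ],
    index=["full", "absent", "partial"],
)
threshold = 0.1

# Share of votes in which each legislator took part.
participation = 1 - rollcall.isna().sum(axis=1) / rollcall.shape[1]

# Keep legislators whose participation is strictly above the threshold.
kept = rollcall[participation > threshold]

print(participation.to_dict())
print(list(kept.index))
```

With `threshold=0`, as in the first part of this document, only legislators with no recorded votes at all would be removed.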
