# Basic Analysis
#### It's a little more than basic now, but *c'est la vie*
Now that we've cleaned up our data and have only the features we care about, we can run some basic statistical analysis to see if we can find any obvious patterns or interesting insights.

<sub>To see how cleaning happened, check out [data_cleaning.ipynb](https://github.com/silas-wunder/LegiScan/blob/master/data_cleaning.ipynb)</sub>

In [1]:
import pandas as pd
import numpy as np
import lzma, ast, gc
from scipy.sparse import lil_matrix, csr_matrix, save_npz, load_npz
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
with lzma.open("./cleaned_input/bills.pkl.xz", 'r') as f:
    bills = pd.read_pickle(f)
with lzma.open("./cleaned_input/people.pkl.xz", 'r') as f:
    people = pd.read_pickle(f)
with lzma.open("./cleaned_input/votes.pkl.xz", 'r') as f:
    votes = pd.read_pickle(f)

Let's take a quick look at our people dataframe, there are some interesting things going on that might be interesting to point out.

In [3]:
people

Unnamed: 0_level_0,Name,Party,Role,State,District
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
6033,Carl Gatto,R,Rep,AK,HD-013
6034,Robert Lynn,R,Rep,AK,HD-026
6035,Max Gruenberg,D,Rep,AK,HD-016
6036,Nancy Dahlstrom,R,Rep,AK,HD-018
6037,Wes Keller,R,Rep,AK,HD-010
...,...,...,...,...,...
8675,Cale Case,R,Sen,WY,SD-025
8679,Dan Dockstader,R,Sen,WY,SD-016
8711,Dan Zwonitzer,R,Rep,WY,HD-043
8713,Bob Nicholas,R,Rep,WY,HD-008


Woah, 177,598 people have served in elected legislative positions since 2008? That seems wrong, I suspect there's probably a fair number of duplicates in there. Let's look at the dataframe with the duplicates removed.

In [4]:
people.loc[~people.duplicated()]

Unnamed: 0_level_0,Name,Party,Role,State,District
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
6033,Carl Gatto,R,Rep,AK,HD-013
6034,Robert Lynn,R,Rep,AK,HD-026
6035,Max Gruenberg,D,Rep,AK,HD-016
6036,Nancy Dahlstrom,R,Rep,AK,HD-018
6037,Wes Keller,R,Rep,AK,HD-010
...,...,...,...,...,...
24296,Forrest Chadwick,R,Rep,WY,HD-062
24307,Joshua Larson,R,Rep,WY,HD-017
24311,Stacy Jones,R,Sen,WY,SD-013
24375,Liz Storer,D,Rep,WY,HD-023


Much better, 21,761 is far more reasonable. It's important to note how we have removed duplicates, as we only removed rows that were exactly the same, representing people who served in the same position in multiple years. Some people have served in different positions or different districts, so it's important to keep those "duplicates," even though the indexes are the same because we do get some interesting information from that.

In [5]:
people = people.loc[~people.duplicated()]

Now we want to combine this with our people data so we can see how each person voted simply by looking at their information, rather than scraping through all votes. To do this, we'll collect all the votes of each person into a new dataframe then merge it with the current `people` dataframe.

In [6]:
try:
    with lzma.open("./cleaned_input/people_votes.pkl.xz", 'r') as f:
        people = pd.read_pickle(f)
except FileNotFoundError:
    people_votes = {}
    for roll_call in votes.index:
        vote = ast.literal_eval(votes.loc[roll_call]["Votes"])
        bill = votes.loc[roll_call]["Bill ID"]
        for person, v in vote:
            if person in people_votes.keys():
                people_votes[person].append((bill, roll_call, v))
            else:
                people_votes[person] = [(bill, roll_call, v)]
    for p in people_votes.keys():
        people_votes[p] = f"{people_votes[p]}"
    people_votes_df = pd.DataFrame.from_records(people_votes, index=["Votes"]).T
    people = people.join(people_votes_df)
    people.to_pickle("./cleaned_input/people_votes.pkl.xz")

Now that that's done, let's create a massive matrix of all people and all votes to make comparisons easy. This will take up a massive amount of space (18,000 x 1,292,603), but it will be insanely sparse, so we'll take advantage of that and use sparse matrix representation offered by SciPy.

In [7]:
try:
    smaller_votes = load_npz("./similarity_matrix_votes.npz")
except FileNotFoundError:
    people_votes_matrix = lil_matrix((np.max(people.index) + 1, np.max(votes.index) + 1), dtype="uint8")
    for person in people.index:
        if type(people.loc[person]["Votes"]) != str:
            if type(people.loc[person]["Votes"]) == float:
                continue
            if type(people.loc[person]["Votes"].iloc[0]) == float:
                continue
            person_votes = ast.literal_eval(people.loc[person]["Votes"].iloc[0])
        else:
            person_votes = ast.literal_eval(people.loc[person]["Votes"])
        for vote in person_votes:
            vote_id = vote[1]
            vote_actual = vote[2]
            people_votes_matrix[person, vote_id] = 1 if vote_actual == "Yea" else -1 if vote_actual == "Nay" else 0
    smaller_votes = csr_matrix(people_votes_matrix)
    save_npz("./similarity_matrix_votes.npz", smaller_votes)
    del people_votes_matrix
    gc.collect()

Alright, now let's run cosine similarity on this to determine similarity between representatives.

In [8]:
votes_similarities = cosine_similarity(smaller_votes)

Let's make sure that this is doing what we want it to do by checking to see who is similar to Lauren Boebert (I suspect MTG will be quite similar, as well as some other QAnon wackos).

In [9]:
boebert = people.loc[people["Name"] == "Lauren Boebert"].index[0]
people.loc[votes_similarities[boebert].argsort()[::-1][:10]]

Unnamed: 0,Name,Party,Role,State,District,Votes
21927,Lauren Boebert,R,Rep,US,HD-CO-3,"[(1394633, 1014585, 'Yea'), (1460770, 1015299,..."
21935,Marjorie Greene,R,Rep,US,HD-GA-14,"[(1394633, 1014585, 'Yea'), (1460770, 1015299,..."
21975,Robert Good,R,Rep,US,HD-VA-5,"[(1394633, 1014585, 'Yea'), (1460770, 1015299,..."
21952,Matt Rosendale,R,Rep,US,HD-MT,"[(1394633, 1014585, 'Yea'), (1460770, 1015299,..."
21934,Andrew Clyde,R,Rep,US,HD-GA-9,"[(1394633, 1014585, 'Yea'), (1460770, 1015299,..."
21941,Mary Miller,R,Rep,US,HD-IL-15,"[(1394633, 1014585, 'Yea'), (1460770, 1015299,..."
21969,Ronny Jackson,R,Rep,US,HD-TX-13,"[(1394633, 1014585, 'Yea'), (1460770, 1015299,..."
21970,Troy Nehls,R,Rep,US,HD-TX-22,"[(1394633, 1014585, 'Yea'), (1460770, 1015299,..."
21929,Byron Donalds,R,Rep,US,HD-FL-19,"[(1394633, 1014585, 'Yea'), (1460770, 1015299,..."
21928,Kat Cammack,R,Rep,US,HD-FL-3,"[(1394633, 1014585, 'Yea'), (1460770, 1015299,..."


Yeah, that all checks out. These people are very similar to Boebert, so it makes sense that they would all vote similarly. Looks like cosine similarity does exactly what we want it to do. 

For sanity's sake, let's also check out Manchin to make sure Boebert's list isn't just a fluke.

In [10]:
manchin = people.loc[people["Name"] == "Joe Manchin"].index[0]
people.loc[votes_similarities[manchin].argsort()[::-1][:10]]

Unnamed: 0,Name,Party,Role,State,District,Votes
9625,Joe Manchin,D,Sen,US,SD-WV,"[(198943, 1066553, 'Yea'), (198943, 1066554, '..."
9624,Christopher Coons,D,Sen,US,SD-DE,"[(198943, 1066553, 'Yea'), (198943, 1066554, '..."
11199,Richard Blumenthal,D,Sen,US,SD-CT,"[(339448, 216028, 'Yea'), (433171, 216031, 'Ye..."
9403,Robert Casey,D,Sen,US,SD-PA,"[(1505019, 1065977, 'Nay'), (1505019, 1065978,..."
9471,Jon Tester,D,Sen,US,SD-MT,"[(1505019, 1065977, 'Yea'), (1505019, 1065978,..."
9473,Mark Warner,D,Sen,US,SD-VA,"[(1505019, 1065977, 'Nay'), (1505019, 1065978,..."
9463,Michael Bennet,D,Sen,US,SD-CO,"[(1505019, 1065977, 'Yea'), (1505019, 1065978,..."
9408,Amy Klobuchar,D,Sen,US,SD-MN,"[(1505019, 1065977, 'Nay'), (1505019, 1065978,..."
9425,Thomas Carper,D,Sen,US,SD-DE,"[(1505019, 1065977, 'Nay'), (1505019, 1065978,..."
9422,Jeanne Shaheen,D,Sen,US,SD-NH,"[(1505019, 1065977, 'Nay'), (1505019, 1065978,..."


Yup, that checks out. As an aside, I see a lot of familiar faces in that list, which doesn't exactly bode well.

### Clustering

Now let's add some similarity metrics to the bills so we can further compare representatives across states and legislative sessions.

In [11]:
with lzma.open("./cleaned_input/bills3.pkl.xz", 'r') as f:
    clustered_bills = pd.read_pickle(f)

In [12]:
with open("./cleaned_input/legiscam2.csv", 'r') as f:
    cluster_names = pd.read_csv(f)

In [13]:
clustered_bills

Unnamed: 0_level_0,Name,Number,State,Description,Status,Text,Progress,Sponsors,Votes,sentiment,Description2,topic
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
1753037,Oceanside Academy volleyball champs,H4241,SC,recognize honor oceanside collegiate academy v...,Passed,,"[('2023-04-04', 1), ('2023-04-04', 4)]","[24305, 21741, 24124, 2202, 2205, 18085, 20147...",[],1,recognize honor oceanside collegiate academy v...,395
1604648,An Act Concerning Transit-oriented Development.,HB05429,CT,allow right development housing minimum overal...,Intro,https://legiscan.com/CT/text/HB05429/id/2542407,"[('2022-03-09', 1), ('2022-03-09', 9)]","[5759, 18265, 15086, 18262, 18281]",[],0,allow right development housing minimum overal...,2
1538680,Relating To Concessions.,HB1094,HI,provides department transportation flexibility...,Intro,https://legiscan.com/HI/text/HB1094/id/2455882,"[('2021-01-27', 1), ('2021-02-01', 9)]",[8342],[],0,provides department transportation flexibility...,-1
1698444,Bonds; authorize issuance to assist Petal Exce...,HB1557,MS,act authorize issuance state general obligatio...,Failed,https://legiscan.com/MS/text/HB1557/id/2686040,"[('2023-02-07', 1), ('2023-02-07', 9), ('2023-...",[6527],[],0,act authorize issuance state general obligatio...,147
1680034,"AN ACT to amend Tennessee Code Annotated, Titl...",SB0503,TN,"introduced, extends october november, time wit...",Intro,https://legiscan.com/TN/text/SB0503/id/2673085,"[('2023-01-25', 1)]",[21356],[],0,"introduced, extends october november, time wit...",130
...,...,...,...,...,...,...,...,...,...,...,...,...
1660327,Mississippi Rural Physicians Scholarship Resid...,SB2315,MS,"act amend section 37-144-3, mississippi code 1...",Failed,https://legiscan.com/MS/text/SB2315/id/2642210,"[('2023-01-16', 1), ('2023-01-16', 9), ('2023-...",[21460],[],0,"act amend section 37-144-3, mississippi code 1...",389
1729520,Minneapolis Minnesota Veterans Home historic b...,SF2731,MN,minneapolis minnesota veterans home historic b...,Intro,https://legiscan.com/MN/text/SF2731/id/2731308,"[('2023-03-08', 1), ('2023-03-08', 9)]",[6301],[],0,minneapolis minnesota veterans home historic b...,-1
1708728,INC TX-LONG TERM CARE,HB2598,IL,amends illinois income tax act. creates income...,Intro,https://legiscan.com/IL/text/HB2598/id/2700320,"[('2023-02-15', 1), ('2023-02-15', 9)]","[22579, 20901]",[],0,amends illinois income tax act. creates income...,166
1556975,Prohibits toll-free passage for SJTA employees...,A1305,NJ,"prohibits toll-free passage sjta employees, of...",Intro,https://legiscan.com/NJ/text/A1305/id/2476422,"[('2022-01-11', 1), ('2022-01-11', 9)]",[3130],[],0,"prohibits toll-free passage sjta employees, of...",-1


It's very important to note that this is just a random sample of 20,000 bills from the past year, as running on a larger corpus was too intensive for the resources available to us. This may skew the results as we are missing quite a lot of data (1.8 million -> 20k, only 1/90 of the data).

In [14]:
cluster_names.set_index("Topic", inplace=True)
cluster_names.drop(["Unnamed: 0", "Count"], axis=1, inplace=True)

Let's check out these names, to see what kind of groups Berttopic came up with.

In [15]:
cluster_names

Unnamed: 0_level_0,Name
Topic,Unnamed: 1_level_1
-1,-1_amends_act_title
0,0_school_education_students
1,1_care_health_nursing
2,2_tax_property_income
3,3_election_elections_voter
...,...
406,406_elections_partisan_modifies
407,407_allergy_food_allergens
408,408_kidney_disease_optional
409,409_gaming_casino_casinos


With our new shiny similarity metrics, we don't want to compare every single vote to every single other vote, primarily because that matrix would be MASSIVE (a dense 20,000 x 1,200,000 with each entry being a float, about 390,000,000gb). Instead, we'll average each representative's votes on each bill, then average all votes from all bills in each category to get a general "temperature" of how each person is voting on each topic. This is fairly rudamentary, and could definitely be improved in the future, but it works for now.

First, we'll convert the indexing of the votes dataframe to better see which votes pertain to each bill. This will allow us to go bill by bill, rather than vote by vote, making temperature calculations a bit easier.

In [16]:
votes = votes.reset_index().set_index(["Bill ID", "ID"])
votes

Unnamed: 0_level_0,Unnamed: 1_level_0,Description,Passed,Votes
Bill ID,ID,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
454312,306479,Senate: <pre> SR 1 Final Passage,True,"[(6044, 'Yea'), (6061, 'Yea'), (6064, 'Yea'), ..."
472178,306480,Senate: CSHB 84(FIN)(efd am S) Third Reading -...,True,"[(6044, 'Yea'), (6061, 'Yea'), (6064, 'Yea'), ..."
472178,306481,Senate: CSHB 84(FIN)(efd am S) Third Reading -...,True,"[(6044, 'Yea'), (6061, 'Yea'), (6064, 'Yea'), ..."
472178,306482,House: Concur,True,"[(6034, 'Yea'), (6035, 'Yea'), (6037, 'Yea'), ..."
545632,306483,House: Special Order of Business,True,"[(6034, 'Yea'), (6035, 'Yea'), (6037, 'Yea'), ..."
...,...,...,...,...
1673024,1268431,Line Item Veto Override 27-3-1-0-0,True,"[(8641, 'Yea'), (8663, 'Yea'), (8675, 'Yea'), ..."
1673024,1268432,Line Item Veto Override 29-1-1-0-0,True,"[(8641, 'Yea'), (8663, 'Yea'), (8675, 'Yea'), ..."
1673024,1268433,Line Item Veto Override 27-3-1-0-0,True,"[(8641, 'Yea'), (8663, 'Yea'), (8675, 'Yea'), ..."
1673024,1268434,Line Item Veto Override 23-7-1-0-0,True,"[(8641, 'Nay'), (8663, 'Nay'), (8675, 'Yea'), ..."


In [17]:
people_votes_temperatures = {}
for bill_id, vote_id in votes.index:
    # turn the string representation of the vote list into an actual list
    vote_list = ast.literal_eval(votes.loc[bill_id, vote_id]["Votes"])
    for vote in vote_list:
        # if we don't have this person yet, add them
        if vote[0] not in people_votes_temperatures:
            people_votes_temperatures[vote[0]] = {bill_id: [-1 if vote[1] == "Nay" else 1 if vote[1] == "Yea" else 0]}
        # if we have this person but don't have this bill for them, add it
        elif bill_id not in people_votes_temperatures[vote[0]]:
            people_votes_temperatures[vote[0]][bill_id] = [-1 if vote[1] == "Nay" else 1 if vote[1] == "Yea" else 0]
        # otherwise, add this vote to the end of the list
        else:
            people_votes_temperatures[vote[0]][bill_id].append(-1 if vote[1] == "Nay" else 1 if vote[1] == "Yea" else 0)
# average temperatures per bill
for person in people_votes_temperatures:
    for bill in people_votes_temperatures[person]:
        people_votes_temperatures[person][bill] = np.mean(people_votes_temperatures[person][bill])

Now we'll do something similar, but this time we'll incorporate the topics of each bill and give each person a temperature on a topic, rather than specific bills.

In [18]:
people_topics_temperature = {}
for person in people_votes_temperatures:
    for bill in people_votes_temperatures[person]:
        # if we don't have information for this bill, we'll skip it
        if bill not in clustered_bills.index:
            continue
        # grab topic and sentiment of this bill
        bill_topic = clustered_bills.loc[bill]["topic"]
        bill_sentiment = clustered_bills.loc[bill]["sentiment"]
        # if the sentiment is neutral, count as positive to not zero temperatures
        if bill_sentiment == 0:
            bill_sentiment = 1
        # if we don't have this person yet, add them
        # bill temperatures are multiplied by sentiment to normalize values
        if person not in people_topics_temperature:
            people_topics_temperature[person] = {bill_topic: [people_votes_temperatures[person][bill]] * bill_sentiment}
        # if we have this person but don't have this topic for them, add it
        elif bill_topic not in people_topics_temperature[person]:
            people_topics_temperature[person][bill_topic] = [people_votes_temperatures[person][bill] * bill_sentiment]
        # otherwise, add the temperature to the list
        else:
            people_topics_temperature[person][bill_topic].append(people_votes_temperatures[person][bill] * bill_sentiment)
# compute the mean temperature for each topic
for person in people_topics_temperature:
    for topic in people_topics_temperature[person]:
        people_topics_temperature[person][topic] = np.mean(people_topics_temperature[person][topic])

  return _methods._mean(a, axis=axis, dtype=dtype,
  ret = ret.dtype.type(ret / rcount)


Cool, now lets turn that into a big matrix so we can do the same kind of similarity measurements on it.

In [19]:
people_topics_matrix = lil_matrix((np.max(list(people_topics_temperature.keys())) + 1, np.max(cluster_names.index) + 1))
for person in people_topics_temperature:
    for topic in people_topics_temperature[person]:
        people_topics_matrix[person, topic] = people_topics_temperature[person][topic]
smaller_topics = csr_matrix(people_topics_matrix)
# remove NaN
smaller_topics.data = np.nan_to_num(smaller_topics.data)
save_npz("./similarity_matrix_topics.npz", smaller_topics)

In [20]:
topic_similarites = cosine_similarity(smaller_topics)

Now let's check to see if it did what we expected. Again, we'll look at Lauren Boebert to see who comes out as similar.

In [21]:
people.loc[topic_similarites[boebert].argsort()[::-1][:10]]

Unnamed: 0,Name,Party,Role,State,District,Votes
16447,Rick Allen,R,Rep,US,HD-GA-12,"[(668287, 383926, 'Yea'), (668287, 383927, 'Ye..."
21944,Victoria Spartz,R,Rep,US,HD-IN-5,"[(1394633, 1014585, 'Yea'), (1460770, 1015299,..."
24055,Andrew Ogles,R,Rep,US,HD-TN-5,"[(1651107, 1231303, 'Nay'), (1651107, 1231304,..."
24053,Zachary Nunn,R,Rep,US,HD-IA-3,"[(1651107, 1231303, 'Nay'), (1651107, 1231304,..."
15243,Ann Wagner,R,Rep,US,HD-MO-2,"[(440461, 218634, 'Yea'), (440461, 218635, 'Na..."
24050,Nathaniel Moran,R,Rep,US,HD-TX-1,"[(1651107, 1231303, 'Nay'), (1651107, 1231304,..."
24048,Max Miller,R,Rep,US,HD-OH-7,"[(1651107, 1231303, 'Nay'), (1651107, 1231304,..."
24047,Marcus Molinaro,R,Rep,US,HD-NY-19,"[(1651107, 1231303, 'Nay'), (1651107, 1231304,..."
21945,Jacob Laturner,R,Rep,US,HD-KS-2,"[(1394633, 1014585, 'Yea'), (1460770, 1015299,..."
24060,George Santos,R,Rep,US,HD-NY-3,"[(1651107, 1231303, 'Nay'), (1651107, 1231304,..."


Interesting, this list is different from the earlier similarity list, even though it's still only people within the same legislative body. This is almost certainly a result of the reduced data size, and would be far more accurate with a larger corpus available to us.

Once more, let's check on Manchin and see if his list is a little more accurate.

In [22]:
people.loc[topic_similarites[manchin].argsort()[::-1][:10]]

Unnamed: 0,Name,Party,Role,State,District,Votes
14888,Kevin Cramer,R,Rep,US,SD-ND,"[(440461, 218634, 'Yea'), (440461, 218635, 'Ye..."
14888,Kevin Cramer,R,Sen,US,SD-ND,"[(440461, 218634, 'Yea'), (440461, 218635, 'Ye..."
24473,Monica Robb Blasdel,R,Rep,OH,HD-079,"[(1659645, 1231941, 'Yea'), (1659638, 1231942,..."
24459,Thaddeus Claggett,R,Rep,OH,HD-068,"[(1659645, 1231941, 'Yea'), (1659638, 1231942,..."
24460,Richard Dell'Aquila,D,Rep,OH,HD-015,"[(1659645, 1231941, 'Yea'), (1659638, 1231942,..."
24461,Steve Demetriou,R,Rep,OH,HD-035,"[(1659645, 1231941, 'Yea'), (1659638, 1231942,..."
24462,Dave Dobos,R,Rep,OH,HD-010,"[(1659645, 1231941, 'Yea'), (1659638, 1231942,..."
24463,Elliot Forhan,D,Rep,OH,HD-021,"[(1659645, 1231941, 'Yea'), (1659638, 1231942,..."
24464,Michele Grim,D,Rep,OH,HD-043,"[(1659645, 1231941, 'Yea'), (1659638, 1231942,..."
24465,Dani Isaacsohn,D,Rep,OH,HD-024,"[(1659645, 1231941, 'Yea'), (1659638, 1231942,..."


Just like Boebert's list, it's a little different but the same general theme prevails.

## Fort Collins Representatives

Now let's look at some of our actual representatives here in the Fort Collins area. These representatives are, at the federal level, Michael Bennet, John Hickenlooper, Joe Neguse, and, at the state level, Andrew Boesenecker and Joann Ginal.

In [23]:
bennet = people.loc[people["Name"] == "Michael Bennet"].index[0]
hickenlooper = people.loc[people["Name"] == "John Hickenlooper"].index[0]
neguse = people.loc[people["Name"] == "Joseph Neguse"].index[0]
boesenecker = people.loc[people["Name"] == "Andrew Boesenecker"].index[0]
ginal = people.loc[people["Name"] == "Joann Ginal"].index[0]

Let's make sure we're only looking at similarities for people who are somewhat represented in the random sample of data we looked at.

In [24]:
print(people_topics_temperature[bennet], people_topics_temperature[hickenlooper], people_topics_temperature[neguse], people_topics_temperature[boesenecker], people_topics_temperature[ginal])

{-1: 1.0} {-1: 1.0} {-1: -0.6666666666666666, 58: -1.0, 0: -1.0} {3: 1.0, -1: 0.9595959595959597, 78: 1.0, 72: 1.0, 6: 1.0, 69: 1.0, 305: -1.0, 7: 1.0, 10: 0.7777777777777777, 0: 1.0, 88: -0.5, 1: 0.8888888888888888, 20: 1.0, 190: 1.0, 2: 1.0, 5: 1.0, 32: 1.0, 24: 1.0, 9: 1.0, 244: 1.0, 157: 1.0, 15: 1.0, 182: 1.0, 127: 1.0, 62: 1.0, 54: 1.0, 19: -1.0, 257: 1.0, 202: 1.0, 266: 1.0, 193: 1.0, 4: 1.0, 176: 1.0, 47: 1.0, 48: 1.0, 66: 1.0} {-1: 0.9017094017094016, 1: 0.9027777777777778, 78: 1.0, 69: 1.0, 2: 0.7037037037037037, 72: 1.0, 127: 1.0, 32: 1.0, 6: 1.0, 5: 1.0, 7: 0.7333333333333334, 3: -0.3333333333333333, 20: 1.0, 190: 1.0, 0: 0.8888888888888888, 54: 1.0, 9: 1.0, 62: 1.0, 24: 1.0, 157: 1.0, 19: -1.0, 88: -1.0, 244: 1.0, 182: 1.0, 15: 1.0, 29: 1.0, 193: 1.0, 202: 1.0, 4: 1.0, 28: 1.0, 325: 1.0, 160: 1.0, 90: 1.0, 66: 1.0, 47: 1.0, 214: 1.0}


Unfortunately, Bennet and Hickenlooper are not well represented in this sample, so we can't really do them justice when it comes to looking at their behaviors with regard to topics. The other three are actually represented, so let's look at their similarities.

In [25]:
people.loc[topic_similarites[neguse].argsort()[::-1][:5]]

Unnamed: 0,Name,Party,Role,State,District,Votes
24030,Jonathan Jackson,D,Rep,US,HD-IL-1,"[(1651107, 1231303, 'Yea'), (1651107, 1231304,..."
20059,Elizabeth Fletcher,D,Rep,US,HD-TX-7,"[(1137447, 781692, 'Nay'), (1137447, 781693, '..."
20056,Madeleine Dean,D,Rep,US,HD-PA-4,"[(1137447, 781692, 'Nay'), (1137447, 781693, '..."
20054,Sharice Davids,D,Rep,US,HD-KS-3,"[(1137447, 781692, 'Nay'), (1137447, 781693, '..."
22929,Troy Carter,D,Rep,US,HD-LA-2,"[(1475945, 1077132, 'Yea'), (1508222, 1077133,..."


In [26]:
people.loc[topic_similarites[boesenecker].argsort()[::-1][:5]]

Unnamed: 0,Name,Party,Role,State,District,Votes
22921,Andrew Boesenecker,D,Rep,CO,HD-053,"[(1471462, 1064440, 'Yea'), (1452991, 1064494,..."
21549,Steven Woodrow,D,Rep,CO,HD-006,"[(1384036, 982511, 'Yea'), (1384048, 982516, '..."
23046,Mandy Lindsay,D,Rep,CO,HD-042,"[(1559720, 1124791, 'Yea'), (1559720, 1124792,..."
20739,Meg Froelich,D,Rep,CO,HD-003,"[(1148964, 782425, 'Yea'), (1137913, 782787, '..."
22225,Iman Jodeh,D,Rep,CO,HD-041,"[(1452252, 1006615, 'Yea'), (1452563, 1006621,..."


In [27]:
people.loc[topic_similarites[ginal].argsort()[::-1][:5]]

Unnamed: 0,Name,Party,Role,State,District,Votes
14268,Joann Ginal,D,Rep,CO,SD-014,"[(447615, 219824, 'Yea'), (447260, 221064, 'Ye..."
14268,Joann Ginal,D,Sen,CO,SD-014,"[(447615, 219824, 'Yea'), (447260, 221064, 'Ye..."
11607,Rhonda Fields,D,Rep,CO,SD-029,"[(235346, 31766, 'Yea'), (237125, 31904, 'Abse..."
11607,Rhonda Fields,D,Sen,CO,SD-029,"[(235346, 31766, 'Yea'), (237125, 31904, 'Abse..."
18719,Janet Buckner,D,Rep,CO,HD-040,"[(1027471, 677470, 'Yea'), (908827, 574901, 'Y..."
18719,Janet Buckner,D,Sen,CO,SD-028,"[(1027471, 677470, 'Yea'), (908827, 574901, 'Y..."
18719,Janet Buckner,D,Rep,CO,SD-028,"[(1027471, 677470, 'Yea'), (908827, 574901, 'Y..."
20136,Kyle Mullica,D,Rep,CO,HD-034,"[(1137962, 781702, 'Yea'), (1137764, 781703, '..."
20136,Kyle Mullica,D,Sen,CO,SD-024,"[(1137962, 781702, 'Yea'), (1137764, 781703, '..."
14277,Dominick Moreno,D,Rep,CO,SD-021,"[(447382, 218381, 'Yea'), (447366, 218382, 'Ye..."


While it's a little less informative, let's also look at the vote similarities for each of these people to get a measure of which of their colleagues they are most similar to.

In [28]:
people.loc[votes_similarities[bennet].argsort()[::-1][:5]]

Unnamed: 0,Name,Party,Role,State,District,Votes
9463,Michael Bennet,D,Sen,US,SD-CO,"[(1505019, 1065977, 'Yea'), (1505019, 1065978,..."
9422,Jeanne Shaheen,D,Sen,US,SD-NH,"[(1505019, 1065977, 'Nay'), (1505019, 1065978,..."
9408,Amy Klobuchar,D,Sen,US,SD-MN,"[(1505019, 1065977, 'Nay'), (1505019, 1065978,..."
9415,Debbie Stabenow,D,Sen,US,SD-MI,"[(1505019, 1065977, 'Nay'), (1505019, 1065978,..."
9431,Patty Murray,D,Sen,US,SD-WA,"[(1505019, 1065977, 'Nay'), (1505019, 1065978,..."


In [29]:
people.loc[votes_similarities[hickenlooper].argsort()[::-1][:5]]

Unnamed: 0,Name,Party,Role,State,District,Votes
22004,John Hickenlooper,D,Sen,US,SD-CO,"[(1444468, 1002349, 'Yea'), (1401017, 1005547,..."
22486,Alex Padilla,D,Sen,US,SD-CA,"[(1444468, 1002349, 'Yea'), (1401017, 1005547,..."
22485,Jon Ossoff,D,Sen,US,SD-GA,"[(1444468, 1002349, 'Yea'), (1401017, 1005547,..."
22487,Raphael Warnock,D,Sen,US,SD-GA,"[(1444468, 1002349, 'Yea'), (1401017, 1005547,..."
21702,Mark Kelly,D,Sen,US,SD-AZ,"[(1223763, 982837, 'Nay'), (1223763, 982838, '..."


In [30]:
people.loc[votes_similarities[neguse].argsort()[::-1][:5]]

Unnamed: 0,Name,Party,Role,State,District,Votes
20088,Joseph Neguse,D,Rep,US,HD-CO-2,"[(1137447, 781692, 'Nay'), (1137447, 781693, '..."
20116,Lori Trahan,D,Rep,US,HD-MA-3,"[(1137447, 781692, 'Nay'), (1137447, 781693, '..."
20056,Madeleine Dean,D,Rep,US,HD-PA-4,"[(1137447, 781692, 'Nay'), (1137447, 781693, '..."
20069,Jahana Hayes,D,Rep,US,HD-CT-5,"[(1137447, 781692, 'Nay'), (1137447, 781693, '..."
20081,Mike Levin,D,Rep,US,HD-CA-49,"[(1137447, 781692, 'Nay'), (1137447, 781693, '..."


In [31]:
people.loc[votes_similarities[boesenecker].argsort()[::-1][:5]]

Unnamed: 0,Name,Party,Role,State,District,Votes
22921,Andrew Boesenecker,D,Rep,CO,HD-053,"[(1471462, 1064440, 'Yea'), (1452991, 1064494,..."
22229,David Ortiz,D,Rep,CO,HD-038,"[(1452252, 1006615, 'Yea'), (1452563, 1006621,..."
22231,Naquetta Ricks,D,Rep,CO,HD-040,"[(1452252, 1006615, 'Yea'), (1452563, 1006621,..."
22221,Tracey Bernett,D,Rep,CO,HD-012,"[(1452252, 1006615, 'Yea'), (1452563, 1006621,..."
22223,Lindsey Daugherty,D,Rep,CO,HD-029,"[(1452252, 1006615, 'Yea'), (1452563, 1006621,..."


In [32]:
people.loc[votes_similarities[ginal].argsort()[::-1][:5]]

Unnamed: 0,Name,Party,Role,State,District,Votes
14268,Joann Ginal,D,Rep,CO,SD-014,"[(447615, 219824, 'Yea'), (447260, 221064, 'Ye..."
14268,Joann Ginal,D,Sen,CO,SD-014,"[(447615, 219824, 'Yea'), (447260, 221064, 'Ye..."
16583,Jessie Danielson,D,Rep,CO,SD-020,"[(668910, 384725, 'Yea'), (668985, 384816, 'NV..."
16583,Jessie Danielson,D,Sen,CO,SD-020,"[(668910, 384725, 'Yea'), (668985, 384816, 'NV..."
16581,Faith Winter,D,Rep,CO,SD-024,"[(668882, 384606, 'Yea'), (669102, 384660, 'Ye..."
16581,Faith Winter,D,Sen,CO,SD-024,"[(668882, 384606, 'Yea'), (669102, 384660, 'Ye..."
19048,Jeff Bridges,D,Rep,CO,SD-026,"[(1027471, 677470, 'Yea'), (908827, 574901, 'Y..."
19048,Jeff Bridges,D,Sen,CO,SD-026,"[(1027471, 677470, 'Yea'), (908827, 574901, 'Y..."
14279,Brittany Pettersen,D,Rep,CO,SD-022,"[(447388, 218391, 'Yea'), (447428, 218406, 'Ye..."
14279,Brittany Pettersen,D,Sen,CO,SD-022,"[(447388, 218391, 'Yea'), (447428, 218406, 'Ye..."


Duplicate people come from redistricting, i.e. the same person represents the same general population but that district has changed slightly and has been assigned a new number.