# Basic Analysis
#### It's a little more than basic now, but *c'est la vie*
Now that we've cleaned up our data and have only the features we care about, we can run some basic statistical analysis to see if we can find any obvious patterns or interesting insights.

<sub>To see how cleaning happened, check out [data_cleaning.ipynb](https://github.com/silas-wunder/LegiScan/blob/master/data_cleaning.ipynb)</sub>

In [13]:
import pandas as pd
import numpy as np
import lzma, ast, gc
from scipy.sparse import lil_matrix, csr_matrix, save_npz, load_npz
from sklearn.metrics.pairwise import cosine_similarity

In [14]:
with lzma.open("./cleaned_input/bills.pkl.xz", 'r') as f:
    bills = pd.read_pickle(f)
with lzma.open("./cleaned_input/people.pkl.xz", 'r') as f:
    people = pd.read_pickle(f)
with lzma.open("./cleaned_input/votes.pkl.xz", 'r') as f:
    votes = pd.read_pickle(f)

Let's take a quick look at our people dataframe; there are a few things going on that are worth pointing out.

In [15]:
people

| ID | Name | Party | Role | State | District |
|---|---|---|---|---|---|
| 6033 | Carl Gatto | R | Rep | AK | HD-013 |
| 6034 | Robert Lynn | R | Rep | AK | HD-026 |
| 6035 | Max Gruenberg | D | Rep | AK | HD-016 |
| 6036 | Nancy Dahlstrom | R | Rep | AK | HD-018 |
| 6037 | Wes Keller | R | Rep | AK | HD-010 |
| ... | ... | ... | ... | ... | ... |
| 8675 | Cale Case | R | Sen | WY | SD-025 |
| 8679 | Dan Dockstader | R | Sen | WY | SD-016 |
| 8711 | Dan Zwonitzer | R | Rep | WY | HD-043 |
| 8713 | Bob Nicholas | R | Rep | WY | HD-007 |


Whoa, 177,598 people have served in elected legislative positions since 2008? That seems wrong; I suspect there are a fair number of duplicates in there. Let's look at the dataframe with the duplicates removed.

In [16]:
people.loc[~people.duplicated()]

| ID | Name | Party | Role | State | District |
|---|---|---|---|---|---|
| 6033 | Carl Gatto | R | Rep | AK | HD-013 |
| 6034 | Robert Lynn | R | Rep | AK | HD-026 |
| 6035 | Max Gruenberg | D | Rep | AK | HD-016 |
| 6036 | Nancy Dahlstrom | R | Rep | AK | HD-018 |
| 6037 | Wes Keller | R | Rep | AK | HD-010 |
| ... | ... | ... | ... | ... | ... |
| 24307 | Joshua Larson | R | Rep | WY | HD-017 |
| 24311 | Stacy Jones | R | Sen | WY | SD-013 |
| 24375 | Liz Storer | D | Rep | WY | HD-023 |
| 24753 | Dalton Banks | R | Rep | WY | HD-026 |


Much better: 21,761 is far more reasonable. It's important to note how we removed duplicates: we only dropped rows that were exactly the same, which represent people who served in the same position across multiple years. Some people have served in different positions or different districts, so we keep those "duplicates" even though their IDs are the same, because they carry some interesting information.
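To make the distinction concrete, here's a tiny hypothetical example (made-up rows, not from the dataset). One thing worth knowing: `DataFrame.duplicated()` only compares column values and ignores the index, so two *different* IDs with identical columns would also collapse into one; pulling the ID into the columns with `reset_index()` avoids that.

```python
import pandas as pd

# Made-up stand-in for the people dataframe: the same ID appears three times.
# Rows 1 and 2 are exact copies (same seat in two sessions);
# row 3 is the same person after a district change.
toy = pd.DataFrame(
    {
        "Name": ["Jane Doe", "Jane Doe", "Jane Doe"],
        "Party": ["D", "D", "D"],
        "Role": ["Rep", "Rep", "Rep"],
        "State": ["CO", "CO", "CO"],
        "District": ["HD-030", "HD-030", "HD-032"],
    },
    index=pd.Index([101, 101, 101], name="ID"),
)

# duplicated() looks at column values only, so just the exact repeat is dropped
# and the district change survives.
by_row = toy.loc[~toy.duplicated()]

# Deduplicating on the index instead would also throw away the district change.
by_index = toy.loc[~toy.index.duplicated()]

# Including the ID in the comparison guards against two different people
# who happen to share every column value.
with_id = toy.loc[~toy.reset_index().duplicated().to_numpy()]

print(len(by_row), len(by_index), len(with_id))  # 2 1 2
```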

In [17]:
people = people.loc[~people.duplicated()]

Now we want to combine our votes data with our people data so we can see how each person voted simply by looking at their information, rather than scanning through every roll call. To do this, we'll collect all the votes of each person into a new dataframe, then join it with the current `people` dataframe.
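For comparison, here's a sketch of the same per-person collection done with pandas' `explode` instead of nested loops. It assumes a miniature, made-up `votes` frame with the same shape as ours (index `ID`, a `Bill ID` column, and `Votes` stored as a string of `(person, vote)` tuples); the intermediate long table is an alternative, not something the notebook builds.

```python
import ast
import pandas as pd

# Hypothetical miniature of the votes dataframe: one row per roll call.
votes = pd.DataFrame(
    {
        "Bill ID": [10, 10, 20],
        "Votes": [
            "[(1, 'Yea'), (2, 'Nay')]",
            "[(1, 'Yea'), (2, 'Yea')]",
            "[(1, 'NV'), (3, 'Nay')]",
        ],
    },
    index=pd.Index([100, 101, 102], name="ID"),
)

# Parse each string once, then explode so every (roll call, person) pair gets its own row.
parsed = votes.assign(Votes=votes["Votes"].map(ast.literal_eval))
long = parsed.explode("Votes").reset_index()
long["Person"] = long["Votes"].map(lambda pv: pv[0])
long["Vote"] = long["Votes"].map(lambda pv: pv[1])
long = long.drop(columns="Votes")

# Gather each person's (bill, roll call, vote) tuples, mirroring the dict built in the loop.
per_person = {
    person: list(zip(group["Bill ID"], group["ID"], group["Vote"]))
    for person, group in long.groupby("Person")
}
```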

In [18]:
try:
    # NOTE: delete old people_votes if using new data
    with lzma.open("./cleaned_input/people_votes.pkl.xz", 'r') as f:
        people = pd.read_pickle(f)
except FileNotFoundError:
    # map each person ID to a list of (bill ID, roll call ID, vote) tuples
    people_votes = {}
    # tolist() yields plain Python ints, so the string form below stays literal_eval-friendly
    for roll_call, bill, vote_str in zip(votes.index.tolist(), votes["Bill ID"].tolist(), votes["Votes"].tolist()):
        for person, v in ast.literal_eval(vote_str):
            people_votes.setdefault(person, []).append((bill, roll_call, v))
    # store each person's vote list as a string, then join it on the person ID index
    people_votes_s = pd.Series({p: f"{pv}" for p, pv in people_votes.items()}, name="Votes")
    people = people.join(people_votes_s)
    # pandas infers xz compression from the file extension
    people.to_pickle("./cleaned_input/people_votes.pkl.xz")

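Now that that's done, let's create a massive matrix of all people and all votes to make comparisons easy. Stored densely, this matrix (18,000 x 1,292,603) would take up an enormous amount of space, but it will be extremely sparse, since each person only votes on a tiny fraction of all roll calls. We'll take advantage of that and use the sparse matrix representations offered by SciPy: we fill it in as a `lil_matrix`, which is cheap to update entry by entry, then convert it to CSR for the actual math.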
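A quick back-of-the-envelope check of why sparse storage matters, assuming one byte per entry (as with an `int8` matrix). CSR storage scales with the number of nonzeros (the `data` and `indices` arrays, plus one `indptr` entry per row) rather than with the shape; the small matrix below is made-up data at 0.1% density.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Dense cost of the shape quoted above, at one byte per entry.
rows, cols = 18_000, 1_292_603
dense_bytes = rows * cols * np.dtype("int8").itemsize
print(f"dense: {dense_bytes / 1e9:.1f} GB")  # dense: 23.3 GB

# Made-up sparse matrix: 1,000 rows x 100,000 columns, 100 nonzeros per row (0.1% density).
n_rows, n_cols, per_row = 1_000, 100_000, 100
row_idx = np.repeat(np.arange(n_rows), per_row)
col_idx = np.tile(np.arange(per_row), n_rows) * (n_cols // per_row)  # spread columns out
vals = np.ones(n_rows * per_row, dtype="int8")
m = csr_matrix((vals, (row_idx, col_idx)), shape=(n_rows, n_cols))

# Bytes actually held by the CSR arrays vs. the dense equivalent of the same shape.
csr_bytes = m.data.nbytes + m.indices.nbytes + m.indptr.nbytes
print(f"sparse: {csr_bytes:,} bytes vs dense: {n_rows * n_cols:,} bytes")
```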

In [19]:
try:
    # NOTE: delete old similarity matrix if using new data
    smaller_votes = load_npz("./similarity_matrix_votes.npz")
except FileNotFoundError:
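    # int8 rather than uint8: Nay is stored as -1, which an unsigned type can't hold
    # (a cached .npz built with uint8 should be deleted and rebuilt)
    people_votes_matrix = lil_matrix((np.max(people.index) + 1, np.max(votes.index) + 1), dtype="int8")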
    for person in people.index:
        if type(people.loc[person]["Votes"]) != str:
            if type(people.loc[person]["Votes"]) == float:
                continue
            if type(people.loc[person]["Votes"].iloc[0]) == float:
                continue
            person_votes = ast.literal_eval(people.loc[person]["Votes"].iloc[0])
        else:
            person_votes = ast.literal_eval(people.loc[person]["Votes"])
        for vote in person_votes:
            vote_id = vote[1]
            vote_actual = vote[2]
            people_votes_matrix[person, vote_id] = 1 if vote_actual == "Yea" else -1 if vote_actual == "Nay" else 0
    smaller_votes = csr_matrix(people_votes_matrix)
    save_npz("./similarity_matrix_votes.npz", smaller_votes)
    del people_votes_matrix
    gc.collect()

Alright, now let's run cosine similarity on this to determine similarity between representatives.
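As a refresher on what the number means: cosine similarity is the dot product of two vote vectors divided by the product of their lengths, so it runs from -1 (always opposite) to 1 (always the same). The toy vectors below are hypothetical; they use the same Yea = 1, Nay = -1, other = 0 encoding, which is what lets disagreements pull the score negative. A vote missed by one person contributes nothing to the dot product but still counts toward the other person's length.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Three hypothetical legislators on four roll calls: Yea = 1, Nay = -1, absent/NV = 0.
toy_votes = csr_matrix(np.array([
    [1, 1, -1, 0],   # A
    [1, 1, -1, 1],   # B: agrees with A everywhere A voted, plus one extra vote
    [-1, -1, 1, 0],  # C: opposite of A on every vote
], dtype="int8"))

# Dense (3, 3) array of pairwise similarities; sklearn casts the ints to float.
sims = cosine_similarity(toy_votes)
print(np.round(sims, 3))
```

A and B come out at about 0.866 rather than 1.0, because B's extra vote adds to B's length without adding to the dot product.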

In [20]:
votes_similarities = cosine_similarity(smaller_votes)

Let's make sure that this is doing what we want it to do by checking to see who is similar to Lauren Boebert (I suspect MTG will be quite similar, as well as some other QAnon wackos).
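One caveat with `argsort()[::-1][1:6]`: it assumes the person being queried lands at position 0. If someone else has an identical vote vector (similarity exactly 1.0), the tie can be broken either way. A small hypothetical helper that drops the query row explicitly avoids that.

```python
import numpy as np

def top_k_similar(similarities, idx, k=5):
    """Row indices of the k rows most similar to row `idx`, excluding `idx` itself.

    Hypothetical helper: `similarities` is assumed to be a square dense array,
    such as the output of sklearn's cosine_similarity.
    """
    order = np.argsort(similarities[idx])[::-1]  # most similar first
    order = order[order != idx]                  # drop the query row explicitly
    return order[:k]

# Row 0 ties perfectly with row 2, so the sort alone doesn't guarantee row 0 comes first.
sims = np.array([
    [1.0, 0.2, 1.0, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [1.0, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])
print(top_k_similar(sims, 0, k=2))
```

With the notebook's objects, `people.loc[top_k_similar(votes_similarities, boebert)]` would stand in for the slicing expression.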

In [21]:
boebert = people.loc[people["Name"] == "Lauren Boebert"].index[0]
people.loc[votes_similarities[boebert].argsort()[::-1][1:6]]

| ID | Name | Party | Role | State | District | Votes |
|---|---|---|---|---|---|---|
| 21935 | Marjorie Greene | R | Rep | US | HD-GA-14 | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |
| 21975 | Robert Good | R | Rep | US | HD-VA-5 | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |
| 21952 | Matt Rosendale | R | Rep | US | HD-MT | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |
| 21952 | Matt Rosendale | R | Rep | US | HD-MT-2 | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |
| 21934 | Andrew Clyde | R | Rep | US | HD-GA-9 | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |
| 21941 | Mary Miller | R | Rep | US | HD-IL-15 | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |


Yeah, that all checks out. These people are very similar to Boebert, so it makes sense that they would all vote similarly. Looks like cosine similarity does exactly what we want it to do. 

For sanity's sake, let's also check out Manchin to make sure Boebert's list isn't just a fluke. As a centrist by US standards, he should be fairly similar to other center-leaning Democrats.

In [22]:
manchin = people.loc[people["Name"] == "Joe Manchin"].index[0]
people.loc[votes_similarities[manchin].argsort()[::-1][1:6]]

| ID | Name | Party | Role | State | District | Votes |
|---|---|---|---|---|---|---|
| 9624 | Christopher Coons | D | Sen | US | SD-DE | `[(198943, 1066553, 'Yea'), (198943, 1066554, '...` |
| 11199 | Richard Blumenthal | D | Sen | US | SD-CT | `[(339448, 216028, 'Yea'), (433171, 216031, 'Ye...` |
| 9471 | Jon Tester | D | Sen | US | SD-MT | `[(1505019, 1065977, 'Yea'), (1505019, 1065978,...` |
| 9403 | Robert Casey | D | Sen | US | SD-PA | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |
| 9473 | Mark Warner | D | Sen | US | SD-VA | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |


Interesting. I expected to see more Republicans in the list, given Manchin's generally centrist views, but it's possible that Republicans rarely cross over toward Democrats (only the other way around), so even a centrist Democrat's closest matches are still other Democrats.

### Clustering

Now let's look at the topics of each bill and group the bills by topic. With this grouping, we don't want to compare every single vote to every single other vote, primarily because that matrix would be MASSIVE (stored densely as 64-bit floats, a 20,000 x 1,200,000 matrix is 24 billion entries x 8 bytes, about 192 GB). Instead, we'll average each representative's votes on each bill, then average those per-bill values across all bills in each topic to get a general "temperature" of how each person is voting on each topic. This is fairly rudimentary, and could definitely be improved in the future, but it works for now.
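The first stage of that averaging (one temperature per person per bill) can be sketched with a `groupby`, assuming the votes are first reshaped into a long table with one row per person per roll call. The column names and data below are hypothetical; the encoding matches the notebook's (Yea = 1, Nay = -1, anything else = 0).

```python
import pandas as pd

# Hypothetical long-format votes: one row per (person, roll call), tagged with its bill.
long = pd.DataFrame({
    "Person": [1, 1, 1, 2, 2],
    "Bill ID": [10, 10, 20, 10, 20],
    "Vote": ["Yea", "Nay", "Yea", "Yea", "NV"],
})

# Yea = 1, Nay = -1, anything else (absent, NV, ...) = 0.
long["Value"] = long["Vote"].map({"Yea": 1, "Nay": -1}).fillna(0)

# Average each person's roll calls on the same bill into a single bill "temperature".
bill_temp = long.groupby(["Person", "Bill ID"])["Value"].mean()
print(bill_temp)
```

Person 1 voted Yea then Nay on bill 10, so that bill's temperature comes out to 0.0.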

First, we'll convert the indexing of the votes dataframe to better see which votes pertain to each bill. This will allow us to go bill by bill, rather than vote by vote, making temperature calculations a bit easier.
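Here's what that re-indexing buys us, on a made-up four-row slice that borrows a few bill and roll-call IDs from the output below: once `Bill ID` is the outer index level, `.loc[bill_id]` pulls every roll call on that bill at once.

```python
import pandas as pd

# Hypothetical slice of the votes dataframe, indexed by roll-call ID as before.
votes = pd.DataFrame(
    {"Bill ID": [454312, 472178, 472178, 472178], "Passed": [True, True, True, True]},
    index=pd.Index([306479, 306480, 306481, 306482], name="ID"),
)

# Move "ID" back into a column, then index by (Bill ID, ID).
by_bill = votes.reset_index().set_index(["Bill ID", "ID"])

# Selecting on the outer level returns every roll call for one bill, indexed by roll-call ID.
roll_calls = by_bill.loc[472178]
print(list(roll_calls.index))  # [306480, 306481, 306482]
```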

In [23]:
votes = votes.reset_index().set_index(["Bill ID", "ID"])
votes

| Bill ID | ID | Description | Passed | Votes |
|---|---|---|---|---|
| 454312 | 306479 | `Senate: <pre> SR 1 Final Passage` | True | `[(6044, 'Yea'), (6061, 'Yea'), (6064, 'Yea'), ...` |
| 472178 | 306480 | Senate: CSHB 84(FIN)(efd am S) Third Reading -... | True | `[(6044, 'Yea'), (6061, 'Yea'), (6064, 'Yea'), ...` |
| 472178 | 306481 | Senate: CSHB 84(FIN)(efd am S) Third Reading -... | True | `[(6044, 'Yea'), (6061, 'Yea'), (6064, 'Yea'), ...` |
| 472178 | 306482 | House: Concur | True | `[(6034, 'Yea'), (6035, 'Yea'), (6037, 'Yea'), ...` |
| 545632 | 306483 | House: Special Order of Business | True | `[(6034, 'Yea'), (6035, 'Yea'), (6037, 'Yea'), ...` |
| ... | ... | ... | ... | ... |
| 1673024 | 1268431 | Line Item Veto Override 27-3-1-0-0 | True | `[(8641, 'Yea'), (8663, 'Yea'), (8675, 'Yea'), ...` |
| 1673024 | 1268432 | Line Item Veto Override 29-1-1-0-0 | True | `[(8641, 'Yea'), (8663, 'Yea'), (8675, 'Yea'), ...` |
| 1673024 | 1268433 | Line Item Veto Override 27-3-1-0-0 | True | `[(8641, 'Yea'), (8663, 'Yea'), (8675, 'Yea'), ...` |
| 1673024 | 1268434 | Line Item Veto Override 23-7-1-0-0 | True | `[(8641, 'Nay'), (8663, 'Nay'), (8675, 'Yea'), ...` |


In [24]:
def vote_value(v):
    # Yea = 1, Nay = -1, anything else (absent, NV, ...) = 0
    return 1 if v == "Yea" else -1 if v == "Nay" else 0

people_votes_temperatures = {}
for (bill_id, _vote_id), vote_str in votes["Votes"].items():
    # turn the string representation of the vote list into an actual list
    vote_list = ast.literal_eval(vote_str)
    for person, v in vote_list:
        # create the person and bill entries the first time we see them, then add this vote
        people_votes_temperatures.setdefault(person, {}).setdefault(bill_id, []).append(vote_value(v))
# average temperatures per bill
for person in people_votes_temperatures:
    for bill in people_votes_temperatures[person]:
        people_votes_temperatures[person][bill] = np.mean(people_votes_temperatures[person][bill])

Now we'll do something similar, but this time we'll incorporate the topics of each bill and give each person a temperature on a topic, rather than specific bills. We'll also save every topic to a `set` so we can easily see how many unique topics the bills cover.
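The topic stage is the same idea one level up. Here's a sketch using `explode` on hypothetical data: each bill's topic list is expanded to one row per topic, empty lists are replaced with `["Unknown"]` first (exploding an empty list would give a missing value instead), and the bill temperatures are averaged per person and topic.

```python
import pandas as pd

# Hypothetical stage-one output: one temperature per (person, bill).
bill_temp = pd.DataFrame({
    "Person": [1, 1, 2],
    "Bill ID": [10, 20, 10],
    "Temperature": [0.0, 1.0, 1.0],
})
# Hypothetical bill topics; bill 20 has no topics listed.
bill_topics = pd.Series({10: ["Abortion", "Health"], 20: []}, name="Topics")

# Empty topic lists become ["Unknown"], matching the notebook's rule.
bill_topics = bill_topics.map(lambda t: t if len(t) > 0 else ["Unknown"])

# Attach topics, give each (person, bill, topic) its own row, then average per (person, topic).
per_topic = (
    bill_temp.join(bill_topics, on="Bill ID")
    .explode("Topics")
    .groupby(["Person", "Topics"])["Temperature"]
    .mean()
)
topics = sorted(per_topic.index.get_level_values("Topics").unique())
```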

In [25]:
people_topics_temperature = {}
topics = set()
for person in people_votes_temperatures:
    for bill in people_votes_temperatures[person]:
        # grab topic list of this bill
        bill_topics = ast.literal_eval(bills.loc[bill]["Topics"])
        # if the topics list is empty, the topics are unknown
        if len(bill_topics) == 0:
            bill_topics = ["Unknown"]
        # for every topic in the list, we'll add it to the dictionary
        for bill_topic in bill_topics:
            # add the topic to the topics set
            topics.add(bill_topic)
            # if we don't have this person yet, add them
            if person not in people_topics_temperature:
                people_topics_temperature[person] = {bill_topic: [people_votes_temperatures[person][bill]]}
            # if we have this person but don't have this topic for them, add it
            elif bill_topic not in people_topics_temperature[person]:
                people_topics_temperature[person][bill_topic] = [people_votes_temperatures[person][bill]]
            # otherwise, add the temperature to the list
            else:
                people_topics_temperature[person][bill_topic].append(people_votes_temperatures[person][bill])
# compute the mean temperature for each topic
for person in people_topics_temperature:
    for topic in people_topics_temperature[person]:
        people_topics_temperature[person][topic] = np.mean(people_topics_temperature[person][topic])

Before anything else, let's convert the set to a list and alphabetize it so we can use numerical indexing.

In [26]:
topics = list(topics)
topics.sort()
topics

['',
 '"Certificate of birth resulting in stillbirth"',
 '"Hang on Sloopy"-official state rock song',
 '"Hawaii Made"',
 '"Hawaii Made" Program',
 '"Made in Hawaii"',
 '"Redboxing"',
 '"Stir Crazy in Williamsburg"',
 '"Stoppers"',
 '#TEXAS',
 '#TEXASTODO',
 '#TXLEGE',
 '(1) 4-1-001:013',
 '(h)',
 '100th anniversary of the American Legion',
 '100th anniversary of the Army Warrant Officer Corp',
 '100th anniversary of the creation of the National',
 '1847 COLT WALKER',
 '1998 Makaha Beach Park Master Plan',
 '2012 Summer Paralympic Games',
 '2017 Statutory Construction Bill',
 '2018',
 '2018 Winter Olympics',
 '2018-2019',
 '2020-2021',
 '2020-2021 (h)',
 '2020-2021 School Year',
 '2021',
 '2021 Compact Trust Fund',
 '2021 Gaming Compact Amendment',
 '2021-2022',
 '2022',
 '2022 (h)',
 '2022 Amendments',
 '2022 Election Cycle',
 '2022 Taxable Year',
 '2022-2023',
 '2023',
 '2023-2024',
 '2030 Agenda for Sustainable Development',
 '2030 Development Agenda',
 '2050 (h)',
 '20th Avenue',
 ...

Cool, now let's turn that into a big matrix so we can do the same kind of similarity measurements on it.
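Another route to the same kind of matrix, sketched on a hypothetical two-person dict: collect `(row, column, value)` triplets and hand them to `scipy.sparse.coo_matrix` in one go, using a dict from topic to column number instead of repeated `list.index()` lookups. This is an alternative to the cell below, not what it does.

```python
from scipy.sparse import coo_matrix

# Hypothetical nested dict shaped like people_topics_temperature: {person_id: {topic: temperature}}.
temps = {
    3: {"Abortion": -0.5, "Health": 1.0},
    7: {"Health": 0.25},
}
topics = sorted({t for per_person in temps.values() for t in per_person})
topic_index = {t: i for i, t in enumerate(topics)}  # O(1) column lookup

# Collect (row, column, value) triplets; rows are person IDs, as in the notebook.
rows, cols, vals = [], [], []
for person, per_person in temps.items():
    for topic, temp in per_person.items():
        rows.append(person)
        cols.append(topic_index[topic])
        vals.append(temp)

# Build once in COO form, then convert to CSR for fast row operations.
matrix = coo_matrix((vals, (rows, cols)), shape=(max(temps) + 1, len(topics))).tocsr()
```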

In [27]:
try:
    # NOTE: delete old similarity matrix if using new data
    smaller_topics = load_npz("./similarity_matrix_topics.npz")
except FileNotFoundError:
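    people_topics_matrix = lil_matrix((np.max(list(people_topics_temperature.keys())) + 1, len(topics) + 1))
    # map each topic to its column once, instead of an O(n) topics.index() lookup per entry
    topic_index = {topic: i for i, topic in enumerate(topics)}
    for person in people_topics_temperature:
        for topic in people_topics_temperature[person]:
            people_topics_matrix[person, topic_index[topic]] = people_topics_temperature[person][topic]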
    smaller_topics = csr_matrix(people_topics_matrix)
    # remove NaN
    smaller_topics.data = np.nan_to_num(smaller_topics.data)
    save_npz("./similarity_matrix_topics.npz", smaller_topics)

In [28]:
topic_similarites = cosine_similarity(smaller_topics)

Now let's check to see if it did what we expected. Again, we'll look at Lauren Boebert to see who comes out as similar.

In [29]:
people.loc[topic_similarites[boebert].argsort()[::-1][1:11]]

| ID | Name | Party | Role | State | District | Votes |
|---|---|---|---|---|---|---|
| 21952 | Matt Rosendale | R | Rep | US | HD-MT | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |
| 21952 | Matt Rosendale | R | Rep | US | HD-MT-2 | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |
| 21941 | Mary Miller | R | Rep | US | HD-IL-15 | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |
| 21935 | Marjorie Greene | R | Rep | US | HD-GA-14 | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |
| 21934 | Andrew Clyde | R | Rep | US | HD-GA-9 | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |
| 21975 | Robert Good | R | Rep | US | HD-VA-5 | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |
| 21969 | Ronny Jackson | R | Rep | US | HD-TX-13 | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |
| 21967 | Patrick Fallon | R | Rep | US | HD-TX-4 | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |
| 21929 | Byron Donalds | R | Rep | US | HD-FL-19 | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |
| 21956 | Yvette Herrell | R | Rep | US | HD-NM-2 | `[(1394633, 1014585, 'Yea'), (1460770, 1015299,...` |


Interesting, this list is a little different from the earlier similarity list, but it's still only people within the same legislative body.
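To put a number on "a little different", one option is the Jaccard overlap of two neighbor lists: the size of their intersection divided by the size of their union. The helper below is a hypothetical addition, not part of the notebook.

```python
def overlap(a, b):
    """Jaccard overlap between two collections of IDs: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    # Two empty lists are treated as identical.
    return len(a & b) / len(a | b) if a | b else 1.0

# With the notebook's objects, something like:
# overlap(votes_similarities[boebert].argsort()[::-1][1:6],
#         topic_similarites[boebert].argsort()[::-1][1:6])
print(overlap([1, 2, 3], [2, 3, 4]))
```

A value of 1.0 means the same people in a different order; values near 0 mean the two measures pick mostly different neighbors.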

Once more, let's check on Manchin and see if his list is a little more accurate.

In [30]:
people.loc[topic_similarites[manchin].argsort()[::-1][1:11]]

| ID | Name | Party | Role | State | District | Votes |
|---|---|---|---|---|---|---|
| 9403 | Robert Casey | D | Sen | US | SD-PA | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |
| 11199 | Richard Blumenthal | D | Sen | US | SD-CT | `[(339448, 216028, 'Yea'), (433171, 216031, 'Ye...` |
| 9402 | Sherrod Brown | D | Sen | US | SD-OH | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |
| 9471 | Jon Tester | D | Sen | US | SD-MT | `[(1505019, 1065977, 'Yea'), (1505019, 1065978,...` |
| 9415 | Debbie Stabenow | D | Sen | US | SD-MI | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |
| 9414 | Charles Schumer | D | Sen | US | SD-NY | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |
| 9408 | Amy Klobuchar | D | Sen | US | SD-MN | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |
| 9443 | Sheldon Whitehouse | D | Sen | US | SD-RI | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |
| 9460 | Jack Reed | D | Sen | US | SD-RI | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |
| 15348 | Angus King | I | Sen | US | SD-ME | `[(473811, 221934, 'Yea'), (473815, 221935, 'Ye...` |


Just like Boebert's list, it's a little different but the same general theme prevails.

I suspect this has something to do with that large list of topics obfuscating things, so that topics which should line up (the same subject under slightly different labels) are actually treated as separate. Let's take a look at that list once more.

In [31]:
topics[:25]

['',
 '"Certificate of birth resulting in stillbirth"',
 '"Hang on Sloopy"-official state rock song',
 '"Hawaii Made"',
 '"Hawaii Made" Program',
 '"Made in Hawaii"',
 '"Redboxing"',
 '"Stir Crazy in Williamsburg"',
 '"Stoppers"',
 '#TEXAS',
 '#TEXASTODO',
 '#TXLEGE',
 '(1) 4-1-001:013',
 '(h)',
 '100th anniversary of the American Legion',
 '100th anniversary of the Army Warrant Officer Corp',
 '100th anniversary of the creation of the National',
 '1847 COLT WALKER',
 '1998 Makaha Beach Park Master Plan',
 '2012 Summer Paralympic Games',
 '2017 Statutory Construction Bill',
 '2018',
 '2018 Winter Olympics',
 '2018-2019',
 '2020-2021']

A lot of these seem very similar or highly specialized (see: '"Hang on Sloopy"-official state rock song'), so this list could certainly use some trimming and/or combining. Let's take a look at anything involving abortion to make sure.

In [32]:
[x for x in topics if "abortion" in x.lower()]

['ABORTION',
 'ABORTION COMPLICATIONS',
 'ABORTION COMPLICATIONS REPORTING ACT',
 'Abortion',
 'Abortion Care',
 'Abortion Data',
 'Abortion Data Reporting Act',
 'Abortion-inducing Drugs',
 'Abortion-judicial consent for minor-hearing procedure/burden of proof',
 'Abortion-pregnant minor-judicial consent',
 'Abortions',
 'Abortions For Minors',
 'Aspiration Abortions',
 'Born-alive Abortion Survivors Protection Act',
 'Crimes, abortion',
 'Crimes: abortion',
 'Health, abortion',
 'Health: abortion',
 'PUBLIC FUNDS FOR ABORTION',
 'Post-viability abortion criminal laws',
 'Post-viability abortions criminal law',
 'Prohibit abortion-unborn human with detectable fetal heartbeat',
 'Qualified health plans-prohibit from covering certain abortions',
 'Unborn child-heartbeat detectable-no abortion/promote/support adoption',
 'abortion']

As expected, `ABORTION`, `Abortion`, and `abortion` count as three different categories (four, if you include `Abortions`), despite all pertaining to the exact same topic. Some of the other topics in this list could likely be condensed as well; for example, `Health: abortion` and `Abortion Care` could both be folded into `abortion`. Of course, doing all of this by hand would take far too long, so we'll use some more advanced machine learning methods to cluster the topics together. Since we've strayed a bit from "basic" analysis, we'll start a new notebook to take care of that.
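Before reaching for anything fancy, a cheap text-normalization pass already merges a chunk of these. The sketch below is a hypothetical baseline, not the clustering approach: it lowercases, replaces punctuation with spaces, and strips a trailing plural `s` from longer words. That last rule is crude and will mangle words like `Texas`, so treat it as a starting point only.

```python
import re
from collections import defaultdict

def normalize_topic(topic):
    """Crude, hypothetical normalization for merging near-duplicate topic labels."""
    # lowercase and turn punctuation (":", ",", "-", ...) into spaces
    t = re.sub(r"[^\w\s]", " ", topic.lower())
    # naive plural stripping on words longer than three letters, skipping "-ss" endings
    words = [w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w for w in t.split()]
    return " ".join(words)

# A few labels taken from the abortion list above.
topics_sample = [
    "ABORTION", "Abortion", "abortion", "Abortions",
    "Health: abortion", "Health, abortion",
    "Post-viability abortion criminal laws", "Post-viability abortions criminal law",
]

# Group the original labels under their normalized form.
groups = defaultdict(list)
for t in topics_sample:
    groups[normalize_topic(t)].append(t)
print(dict(groups))
```

Eight labels collapse to three groups here; merging `health abortion` into `abortion` would still need something semantic, which is where clustering comes in.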

## Fort Collins Representatives

Real quick, before leaving this notebook, let's look at some of our actual representatives here in the Fort Collins area. As of the time of writing, these are Michael Bennet, John Hickenlooper, and Joe Neguse at the federal level, and Andrew Boesenecker and Joann Ginal at the state level.

In [33]:
bennet = people.loc[people["Name"] == "Michael Bennet"].index[0]
hickenlooper = people.loc[people["Name"] == "John Hickenlooper"].index[0]
neguse = people.loc[people["Name"] == "Joseph Neguse"].index[0]
boesenecker = people.loc[people["Name"] == "Andrew Boesenecker"].index[0]
ginal = people.loc[people["Name"] == "Joann Ginal"].index[0]

In [34]:
people.loc[topic_similarites[bennet].argsort()[::-1][1:6]]

| ID | Name | Party | Role | State | District | Votes |
|---|---|---|---|---|---|---|
| 9422 | Jeanne Shaheen | D | Sen | US | SD-NH | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |
| 9624 | Christopher Coons | D | Sen | US | SD-DE | `[(198943, 1066553, 'Yea'), (198943, 1066554, '...` |
| 9472 | Tom Udall | D | Sen | US | SD-NM | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |
| 9473 | Mark Warner | D | Sen | US | SD-VA | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |
| 9412 | Claire McCaskill | D | Sen | US | SD-MO | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |


In [35]:
people.loc[topic_similarites[hickenlooper].argsort()[::-1][1:6]]

| ID | Name | Party | Role | State | District | Votes |
|---|---|---|---|---|---|---|
| 22487 | Raphael Warnock | D | Sen | US | SD-GA | `[(1444468, 1002349, 'Yea'), (1401017, 1005547,...` |
| 22485 | Jon Ossoff | D | Sen | US | SD-GA | `[(1444468, 1002349, 'Yea'), (1401017, 1005547,...` |
| 22486 | Alex Padilla | D | Sen | US | SD-CA | `[(1444468, 1002349, 'Yea'), (1401017, 1005547,...` |
| 24549 | John Fetterman | D | Sen | US | SD-PA | `[(1681770, 1236988, 'Yea'), (1691546, 1240713,...` |
| 24062 | Keith Self | R | Rep | US | HD-TX-3 | `[(1651107, 1231303, 'Nay'), (1651107, 1231304,...` |


In [36]:
people.loc[topic_similarites[neguse].argsort()[::-1][1:6]]

| ID | Name | Party | Role | State | District | Votes |
|---|---|---|---|---|---|---|
| 20116 | Lori Trahan | D | Rep | US | HD-MA-3 | `[(1137447, 781692, 'Nay'), (1137447, 781693, '...` |
| 20062 | Sylvia Garcia | D | Rep | US | HD-TX-29 | `[(1137447, 781692, 'Nay'), (1137447, 781693, '...` |
| 20107 | Greg Stanton | D | Rep | US | HD-AZ-9 | `[(1137447, 781692, 'Nay'), (1137447, 781693, '...` |
| 20107 | Greg Stanton | D | Rep | US | HD-AZ-4 | `[(1137447, 781692, 'Nay'), (1137447, 781693, '...` |
| 20111 | Haley Stevens | D | Rep | US | HD-MI-11 | `[(1137447, 781692, 'Nay'), (1137447, 781693, '...` |
| 20056 | Madeleine Dean | D | Rep | US | HD-PA-4 | `[(1137447, 781692, 'Nay'), (1137447, 781693, '...` |


In [37]:
people.loc[topic_similarites[boesenecker].argsort()[::-1][1:6]]

| ID | Name | Party | Role | State | District | Votes |
|---|---|---|---|---|---|---|
| 22221 | Tracey Bernett | D | Rep | CO | HD-012 | `[(1452252, 1006615, 'Yea'), (1452563, 1006621,...` |
| 16582 | Alec Garnett | D | Rep | CO | HD-002 | `[(668882, 384606, 'Yea'), (669102, 384660, 'Ye...` |
| 18711 | Barbara McLachlan | D | Rep | CO | HD-059 | `[(1027471, 677470, 'Yea'), (908827, 574901, 'Y...` |
| 18712 | Chris Kennedy | D | Rep | CO | HD-023 | `[(1027471, 677470, 'Yea'), (1027471, 677471, '...` |
| 18712 | Chris Kennedy | D | Rep | CO | HD-030 | `[(1027471, 677470, 'Yea'), (1027471, 677471, '...` |
| 18713 | Dafna Michaelson Jenet | D | Rep | CO | HD-030 | `[(1027471, 677470, 'Yea'), (1027471, 677471, '...` |
| 18713 | Dafna Michaelson Jenet | D | Rep | CO | HD-032 | `[(1027471, 677470, 'Yea'), (1027471, 677471, '...` |


In [38]:
people.loc[topic_similarites[ginal].argsort()[::-1][1:6]]

| ID | Name | Party | Role | State | District | Votes |
|---|---|---|---|---|---|---|
| 18939 | James Coleman | D | Rep | CO | HD-007 | `[(1027471, 677470, 'Yea'), (908852, 574910, 'Y...` |
| 18939 | James Coleman | D | Sen | CO | SD-033 | `[(1027471, 677470, 'Yea'), (908852, 574910, 'Y...` |
| 18939 | James Coleman | D | Rep | CO | SD-033 | `[(1027471, 677470, 'Yea'), (908852, 574910, 'Y...` |
| 20031 | Robert Rodriguez | D | Sen | CO | SD-032 | `[(1137766, 781704, 'Yea'), (1137801, 781705, '...` |
| 10780 | Pete Lee | D | Rep | CO | SD-011 | `[(237121, 31712, 'Yea'), (235346, 31766, 'Abse...` |
| 10780 | Pete Lee | D | Sen | CO | SD-011 | `[(237121, 31712, 'Yea'), (235346, 31766, 'Abse...` |
| 19205 | Chris Hansen | D | Rep | CO | SD-031 | `[(1027471, 677470, 'Yea'), (908852, 574910, 'Y...` |
| 19205 | Chris Hansen | D | Sen | CO | SD-031 | `[(1027471, 677470, 'Yea'), (908852, 574910, 'Y...` |
| 22458 | Chris Kolker | D | Sen | CO | SD-027 | `[(1452465, 1007421, 'Yea'), (1452571, 1007428,...` |
| 22458 | Chris Kolker | D | Sen | CO | SD-016 | `[(1452465, 1007421, 'Yea'), (1452571, 1007428,...` |


While it's a little less informative, let's also look at the vote similarities for each of these people to get a measure of which of their colleagues they are most similar to.

In [39]:
people.loc[votes_similarities[bennet].argsort()[::-1][1:6]]

| ID | Name | Party | Role | State | District | Votes |
|---|---|---|---|---|---|---|
| 9422 | Jeanne Shaheen | D | Sen | US | SD-NH | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |
| 9408 | Amy Klobuchar | D | Sen | US | SD-MN | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |
| 9415 | Debbie Stabenow | D | Sen | US | SD-MI | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |
| 9431 | Patty Murray | D | Sen | US | SD-WA | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |
| 9427 | Maria Cantwell | D | Sen | US | SD-WA | `[(1505019, 1065977, 'Nay'), (1505019, 1065978,...` |


In [40]:
people.loc[votes_similarities[hickenlooper].argsort()[::-1][1:6]]

| ID | Name | Party | Role | State | District | Votes |
|---|---|---|---|---|---|---|
| 22486 | Alex Padilla | D | Sen | US | SD-CA | `[(1444468, 1002349, 'Yea'), (1401017, 1005547,...` |
| 22485 | Jon Ossoff | D | Sen | US | SD-GA | `[(1444468, 1002349, 'Yea'), (1401017, 1005547,...` |
| 22487 | Raphael Warnock | D | Sen | US | SD-GA | `[(1444468, 1002349, 'Yea'), (1401017, 1005547,...` |
| 21702 | Mark Kelly | D | Sen | US | SD-AZ | `[(1223763, 982837, 'Nay'), (1223763, 982838, '...` |
| 19499 | Tina Smith | D | Sen | US | SD-MN | `[(911384, 685131, 'Nay'), (911384, 685470, 'Na...` |


In [41]:
people.loc[votes_similarities[neguse].argsort()[::-1][1:6]]

| ID | Name | Party | Role | State | District | Votes |
|---|---|---|---|---|---|---|
| 20116 | Lori Trahan | D | Rep | US | HD-MA-3 | `[(1137447, 781692, 'Nay'), (1137447, 781693, '...` |
| 20056 | Madeleine Dean | D | Rep | US | HD-PA-4 | `[(1137447, 781692, 'Nay'), (1137447, 781693, '...` |
| 20069 | Jahana Hayes | D | Rep | US | HD-CT-5 | `[(1137447, 781692, 'Nay'), (1137447, 781693, '...` |
| 20081 | Mike Levin | D | Rep | US | HD-CA-49 | `[(1137447, 781692, 'Nay'), (1137447, 781693, '...` |
| 19694 | Mary Gay Scanlon | D | Rep | US | HD-PA-5 | `[(1130094, 778819, 'Nay'), (1118370, 778820, '...` |


In [42]:
people.loc[votes_similarities[boesenecker].argsort()[::-1][1:6]]

| ID | Name | Party | Role | State | District | Votes |
|---|---|---|---|---|---|---|
| 22229 | David Ortiz | D | Rep | CO | HD-038 | `[(1452252, 1006615, 'Yea'), (1452563, 1006621,...` |
| 22231 | Naquetta Ricks | D | Rep | CO | HD-040 | `[(1452252, 1006615, 'Yea'), (1452563, 1006621,...` |
| 22221 | Tracey Bernett | D | Rep | CO | HD-012 | `[(1452252, 1006615, 'Yea'), (1452563, 1006621,...` |
| 22223 | Lindsey Daugherty | D | Rep | CO | HD-029 | `[(1452252, 1006615, 'Yea'), (1452563, 1006621,...` |
| 22223 | Lindsey Daugherty | D | Rep | CO | HD-024 | `[(1452252, 1006615, 'Yea'), (1452563, 1006621,...` |
| 22219 | Judith Amabile | D | Rep | CO | HD-013 | `[(1452252, 1006615, 'Yea'), (1452563, 1006621,...` |
| 22219 | Judith Amabile | D | Rep | CO | HD-049 | `[(1452252, 1006615, 'Yea'), (1452563, 1006621,...` |


In [43]:
people.loc[votes_similarities[ginal].argsort()[::-1][1:6]]

| ID | Name | Party | Role | State | District | Votes |
|---|---|---|---|---|---|---|
| 16583 | Jessie Danielson | D | Rep | CO | SD-020 | `[(668910, 384725, 'Yea'), (668985, 384816, 'NV...` |
| 16583 | Jessie Danielson | D | Sen | CO | SD-020 | `[(668910, 384725, 'Yea'), (668985, 384816, 'NV...` |
| 16583 | Jessie Danielson | D | Sen | CO | SD-022 | `[(668910, 384725, 'Yea'), (668985, 384816, 'NV...` |
| 16581 | Faith Winter | D | Rep | CO | SD-024 | `[(668882, 384606, 'Yea'), (669102, 384660, 'Ye...` |
| 16581 | Faith Winter | D | Sen | CO | SD-024 | `[(668882, 384606, 'Yea'), (669102, 384660, 'Ye...` |
| 19048 | Jeff Bridges | D | Rep | CO | SD-026 | `[(1027471, 677470, 'Yea'), (908827, 574901, 'Y...` |
| 19048 | Jeff Bridges | D | Sen | CO | SD-026 | `[(1027471, 677470, 'Yea'), (908827, 574901, 'Y...` |
| 14279 | Brittany Pettersen | D | Rep | CO | SD-022 | `[(447388, 218391, 'Yea'), (447428, 218406, 'Ye...` |
| 14279 | Brittany Pettersen | D | Sen | CO | SD-022 | `[(447388, 218391, 'Yea'), (447428, 218406, 'Ye...` |
| 10780 | Pete Lee | D | Rep | CO | SD-011 | `[(237121, 31712, 'Yea'), (235346, 31766, 'Abse...` |


Duplicate entries for the same person come from redistricting (the same person represents roughly the same population, but the district has been redrawn and assigned a new number) or from a change of position, such as moving from the state House to the state Senate.
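To check which duplicates come from a district change and which from a change of chamber, you can count distinct districts and roles per ID. The rows below are hypothetical, and a district change alone only flags a *candidate* for redistricting (a person could also have moved).

```python
import pandas as pd

# Hypothetical rows shaped like the deduplicated people dataframe (index = person ID).
people = pd.DataFrame(
    {
        "Name": ["A", "A", "B", "C", "C"],
        "Role": ["Rep", "Rep", "Sen", "Rep", "Sen"],
        "District": ["HD-030", "HD-032", "SD-011", "HD-007", "SD-033"],
    },
    index=pd.Index([1, 1, 2, 3, 3], name="ID"),
)

# Number of distinct districts and roles each person has held.
summary = people.groupby(level="ID").agg(
    districts=("District", "nunique"),
    roles=("Role", "nunique"),
)

# Same role, several districts: candidates for redistricting.
moved_district_only = summary.index[(summary["districts"] > 1) & (summary["roles"] == 1)]
# More than one role: changed chamber.
changed_role = summary.index[summary["roles"] > 1]
print(list(moved_district_only), list(changed_role))
```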