# Linear Algebra Lab: Voting Records

In this lab, we will represent a US senator’s voting record as a vector over R, and will use
dot-products to compare voting records. For this lab, we will just use a list to represent a
vector.

### Load data

In [121]:
# Load the file; each line holds one senator's record
with open('US_Senate_voting_data_109.txt') as file:
    mylist = list(file)
mylist[0]

'Akaka D HI -1 -1 1 1 1 -1 -1 1 1 1 1 1 1 1 -1 1 1 1 -1 1 1 1 1 1 -1 1 -1 -1 1 1 1 1 1 1 0 0 1 -1 -1 1 -1 1 -1 1 1 -1\n'

In [122]:
# Split each line and store the votes in a dictionary keyed by last name
import numpy as np

def create_voting_dict(strlist):
    votes = {}
    for line in strlist:
        split = line.split(" ")
        votes[split[0]] = [int(x) for x in split[3:]]
    return votes

votes = create_voting_dict(mylist)
print(votes.keys())

dict_keys(['Akaka', 'Alexander', 'Allard', 'Allen', 'Baucus', 'Bayh', 'Bennett', 'Biden', 'Bingaman', 'Bond', 'Boxer', 'Brownback', 'Bunning', 'Burns', 'Burr', 'Byrd', 'Cantwell', 'Carper', 'Chafee', 'Chambliss', 'Clinton', 'Coburn', 'Cochran', 'Coleman', 'Collins', 'Conrad', 'Cornyn', 'Craig', 'Crapo', 'Dayton', 'DeMint', 'DeWine', 'Dodd', 'Dole', 'Domenici', 'Dorgan', 'Durbin', 'Ensign', 'Enzi', 'Feingold', 'Feinstein', 'Frist', 'Graham', 'Grassley', 'Gregg', 'Hagel', 'Harkin', 'Hatch', 'Hutchison', 'Inhofe', 'Inouye', 'Isakson', 'Jeffords', 'Johnson', 'Kennedy', 'Kerry', 'Kohl', 'Kyl', 'Landrieu', 'Lautenberg', 'Leahy', 'Levin', 'Lieberman', 'Lincoln', 'Lott', 'Lugar', 'Martinez', 'McCain', 'McConnell', 'Mikulski', 'Murkowski', 'Murray', 'Nelson1', 'Nelson2', 'Obama', 'Pryor', 'Reed', 'Reid', 'Roberts', 'Rockefeller', 'Salazar', 'Santorum', 'Sarbanes', 'Schumer', 'Sessions', 'Shelby', 'Smith', 'Snowe', 'Specter', 'Stabenow', 'Stevens', 'Sununu', 'Talent', 'Thomas', 'Thune', 'Vitter'
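To make the parsing step concrete, here is a minimal sketch of how one record breaks into its parts. The line below is a shortened, hypothetical record (the real ones carry 46 votes), and the names `party`, `state`, and `raw_votes` are illustrative only.

```python
# A shortened, hypothetical record in the same layout as the data file
line = 'Akaka D HI -1 -1 1 1\n'

# Star-unpacking: the first three fields are fixed, the rest are the votes
name, party, state, *raw_votes = line.split()

# Convert the vote strings to integers
record = [int(x) for x in raw_votes]

print(name, party, state, record)
```

Calling `split()` with no argument also discards the trailing newline, so the last vote needs no special handling.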

### Policy Comparison

***Task 2.12.2: Write a procedure policy_compare(sen_a, sen_b, voting_dict) that, given two
names of senators and a dictionary mapping senator names to lists representing voting records,
returns the dot-product representing the degree of similarity between the two senators’ voting
policies.***

In [123]:
def policy_compare(sen_a, sen_b, voting_dict):
    return sum([a*b for (a, b) in zip(voting_dict[sen_a], voting_dict[sen_b])])

policy_compare('Akaka', 'McConnell', votes)

14
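One way to read a score like this, assuming the usual encoding in which 1 and -1 are opposite votes and 0 is an abstention: each matching vote adds 1 to the dot-product and each opposing vote subtracts 1. The sketch below checks that on two short, made-up records; `agreement_counts` is a hypothetical helper, not part of the lab.

```python
def agreement_counts(u, v):
    """Return (agreements, disagreements) for two vote lists over {-1, 0, 1}."""
    agree = sum(1 for a, b in zip(u, v) if a * b == 1)
    disagree = sum(1 for a, b in zip(u, v) if a * b == -1)
    return agree, disagree

# Two made-up five-vote records
u = [1, 1, -1, 0, -1]
v = [1, -1, -1, 1, 1]

agree, disagree = agreement_counts(u, v)
dot = sum(a * b for a, b in zip(u, v))

# The dot-product equals agreements minus disagreements
print(agree, disagree, dot)
```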

***Task 2.12.3: Write a procedure most_similar(sen, voting_dict) that, given the name
of a senator and a dictionary mapping senator names to lists representing voting records,
returns the name of the senator whose political mindset is most like the input senator
(excluding, of course, the input senator him/herself).***

In [124]:
def most_similar(sen, voting_dict):
    diff = set([sen])
    comps = list(set(voting_dict.keys()).difference(diff))
    scores = [policy_compare(sen, comp, voting_dict) for comp in comps]
    max_index = scores.index(max(scores))
    return comps[max_index]

most_similar('McConnell',votes)

'Domenici'
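A possible alternative, shown here only as a sketch: `max` with a `key` function finds the best match in one pass without building a separate score list. The names `dot` and `most_similar_by_key` are hypothetical, and the small dictionary is made up for the example.

```python
def dot(u, v):
    """Dot-product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def most_similar_by_key(sen, voting_dict):
    """Name of the senator (other than sen) with the largest dot-product."""
    others = (name for name in voting_dict if name != sen)
    return max(others, key=lambda name: dot(voting_dict[sen], voting_dict[name]))

vd = {'Klein': [1, 1, 1], 'Fox-Epstein': [1, -1, 0], 'Ravella': [-1, 0, 0]}
print(most_similar_by_key('Klein', vd))
```

Because this version iterates over the dictionary rather than a set, ties are broken by insertion order, which stays the same from run to run.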

***Task 2.12.4: Write a very similar procedure least_similar(sen, voting_dict) that
returns the name of the senator whose voting record agrees the least with the senator whose
name is sen.***


In [125]:
def least_similar(sen, voting_dict):
    diff = set([sen])
    comps = list(set(voting_dict.keys()).difference(diff))
    scores = [policy_compare(sen, comp, voting_dict) for comp in comps]
    min_index = scores.index(min(scores))
    return comps[min_index]

least_similar('McConnell',votes)

'Feingold'

***Task 2.12.5: Use these procedures to figure out which senator is most like Rhode Island
legend Lincoln Chafee. Then use these procedures to see who disagrees most with Pennsylvania’s
Rick Santorum. Give their names.***

In [126]:
most_similar('Chafee', votes)

'Jeffords'

In [127]:
least_similar('Santorum', votes)

'Feingold'

***Task 2.12.6: How similar are the voting records of the two senators from your favorite
state?***

In [128]:
print(most_similar('Biden', votes))
print(least_similar('Biden', votes))

Sarbanes
Sununu


### Not Your Average Democrat

***Task 2.12.7: Write a procedure find_average_similarity(sen, sen_set, voting_dict)
that, given the name sen of a senator, compares that senator’s voting record to the voting
records of all senators whose names are in sen_set, computing a dot-product for each, and
then returns the average dot-product.***

***Use your procedure to compute which senator has the greatest average similarity with
the set of Democrats (you can extract this set from the input file).***

In [129]:
def find_average_similarity(sen, sen_set, voting_dict):
    sens = sen_set.difference(set([sen]))
    l=[policy_compare(sen, sen_comp, voting_dict) for sen_comp in sens]
    return sum(l)/len(l)
    
sen_set = {'Feingold', 'Feinstein', 'Frist', 'Graham', 'Grassley', 'Gregg'}
find_average_similarity('Akaka', sen_set, votes)

18.333333333333332

In [130]:
#extract dems
dems = set([row.split(" ")[0] for row in mylist if row.split(" ")[1]=='D'])

# find the senator most similar to the average Democrat
all_sens = list(votes.keys())
most_average_dem = all_sens[np.argmax([find_average_similarity(sen, dems, votes) for sen in all_sens])]

most_average_dem

'Biden'

***Task 2.12.8: Write a procedure find_average_record(sen_set, voting_dict) that, given a set of names of senators, finds the average voting record. That is:***
- perform vector addition on the lists representing their voting records, and then divide the sum by the number of vectors. The result should be a vector.
- Use this procedure to compute the average voting record for the set of Democrats, and assign the result to the variable average_Democrat_record.
- Next find which senator’s voting record is most similar to the average Democrat voting record. Did you get the same result as in Task 2.12.7? Can you explain?

<font color=red>In the last task, you had to compare each senator’s record to the voting record of each
Democrat senator. If you were doing the same computation with, say, the movie preferences
of all Netflix subscribers, it would take far too long to be practical.
Next we see that there is a computational shortcut, ***based on an algebraic property of
the dot-product: the distributive property:***
(v1 + v2) · x = v1 · x + v2 · x</font>
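The distributive property can be checked numerically. The sketch below uses three made-up records and a made-up vector `x` (all hypothetical) and compares the average of the individual dot-products with a single dot-product against the average record.

```python
def dot(u, v):
    """Dot-product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

# Three made-up voting records and one record to compare against
records = [[1, -1, 1], [1, 1, 0], [-1, 1, 1]]
x = [1, 1, -1]

# Slow way: one dot-product per record, then average the results
avg_of_dots = sum(dot(v, x) for v in records) / len(records)

# Shortcut: average the records once, then take a single dot-product
avg_record = [sum(col) / len(records) for col in zip(*records)]
dot_with_avg = dot(avg_record, x)

print(avg_of_dots, dot_with_avg)
```

One caveat when comparing the two tasks in this notebook: `find_average_similarity` removes the senator from `sen_set` before averaging, so for a senator who is in the set the two quantities are close but not guaranteed to be identical.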



<font color=red size=6> The single star * unpacks the sequence/collection into positional arguments, so you can do this:</font>

In [131]:
def find_average_record(sen_set, voting_dict):
    l = [voting_dict[sen] for sen in sen_set]
    z = list(zip(*l))
    return [sum(a)/len(a) for a in z]

democrat_record = find_average_record(dems, votes)
democrat_record


[-0.16279069767441862,
 -0.23255813953488372,
 1.0,
 0.8372093023255814,
 0.9767441860465116,
 -0.13953488372093023,
 -0.9534883720930233,
 0.813953488372093,
 0.9767441860465116,
 0.9767441860465116,
 0.9069767441860465,
 0.7674418604651163,
 0.6744186046511628,
 0.9767441860465116,
 -0.5116279069767442,
 0.9302325581395349,
 0.9534883720930233,
 0.9767441860465116,
 -0.3953488372093023,
 0.9767441860465116,
 1.0,
 1.0,
 1.0,
 0.9534883720930233,
 -0.4883720930232558,
 1.0,
 -0.32558139534883723,
 -0.06976744186046512,
 0.9767441860465116,
 0.8604651162790697,
 0.9767441860465116,
 0.9767441860465116,
 1.0,
 1.0,
 0.9767441860465116,
 -0.3488372093023256,
 0.9767441860465116,
 -0.4883720930232558,
 0.23255813953488372,
 0.8837209302325582,
 0.4418604651162791,
 0.9069767441860465,
 -0.9069767441860465,
 1.0,
 0.9069767441860465,
 -0.3023255813953488]
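To see what `zip(*l)` does in the procedure above, here is a small sketch on two made-up rows: unpacking the list of rows into `zip` regroups the values by position, which is a transpose.

```python
rows = [[1, 2, 3], [4, 5, 6]]

# zip(*rows) is the same call as zip(rows[0], rows[1])
columns = list(zip(*rows))
print(columns)   # [(1, 4), (2, 5), (3, 6)]

# Averaging each column gives the component-wise mean of the rows
averages = [sum(col) / len(col) for col in columns]
print(averages)  # [2.5, 3.5, 4.5]
```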

In [132]:
votes_2 = votes.copy()
votes_2['average_dem'] = democrat_record
most_similar('average_dem', votes_2)

'Biden'

##### Wooohooo it worked! Both functions are giving me Biden.

### Bitter Rivals

***Task 2.12.9: Write a procedure bitter_rivals(voting_dict) to find which two senators
disagree the most.
This task again requires comparing each pair of voting records. Can this be done faster than
the obvious way? There is a slightly more efficient algorithm, using fast matrix multiplication.
We will study matrix multiplication later, although we won’t cover the theoretically fast
algorithms.***
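For reference, the "obvious way" can be sketched as a loop over every unordered pair, taking the pair with the smallest dot-product. This is an illustration under assumed names (`dot`, `bitter_rivals_pairwise`), not the approach used in the cells that follow.

```python
from itertools import combinations

def dot(u, v):
    """Dot-product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def bitter_rivals_pairwise(voting_dict):
    """Return the pair of names whose records have the smallest dot-product."""
    return min(
        combinations(voting_dict, 2),  # every unordered pair of names
        key=lambda pair: dot(voting_dict[pair[0]], voting_dict[pair[1]]),
    )

vd = {'Klein': [-1, 0, 1], 'Fox-Epstein': [-1, -1, -1], 'Ravella': [0, 0, 1],
      'Oyakawa': [1, 1, 1], 'Loery': [1, 1, 0]}
print(bitter_rivals_pairwise(vd))
```

With n senators this computes n(n-1)/2 dot-products, which is fine for 99 senators but grows quadratically.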

In [133]:
# build the comparison the slow way first (plain matrix multiplication)
import pandas as pd
votes_df = pd.DataFrame(votes)
votes_mat = votes_df.to_numpy()  # DataFrame.as_matrix() was removed in pandas 1.0
sen_names = list(votes.keys())

print(votes_mat.T.shape)
print(votes_mat.shape)
print(len(sen_names))

(99, 46)
(46, 99)
99


<font color = red> Note: when multiplying matrices, put the axis you are interested in along the rows of the first matrix. Here we want the dot-product of each senator's voting record with every other senator's record, so we first transpose the matrix so that each row represents a single senator. Multiplying that transposed matrix by the matrix in its original format (one column per senator) gives a square matrix whose column j holds the dot-products of every senator with senator j.</font>
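A small sketch of the same product on a made-up matrix with three hypothetical senators (columns) and four votes (rows). Entry `[i, j]` of the result is the dot-product of senator i's column with senator j's column.

```python
import numpy as np

# Made-up data: one column per senator, one row per vote
mat = np.array([[ 1,  1, -1],
                [ 1, -1, -1],
                [-1,  1,  1],
                [ 0,  1,  1]])

# (3 x 4) times (4 x 3) gives a 3 x 3 matrix of pairwise dot-products
gram = mat.T.dot(mat)
print(gram)

# Spot check against a direct dot-product of columns 0 and 2
print(gram[0, 2], int(np.dot(mat[:, 0], mat[:, 2])))
```

The result is symmetric, so row j and column j carry the same numbers, and the diagonal holds each senator's dot-product with themselves.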

In [134]:
comp_all = votes_mat.T.dot(votes_mat)
comp_all

array([[44, 11, 10, ..., 18, 16, 34],
       [11, 45, 43, ..., 35, 41, 17],
       [10, 43, 46, ..., 38, 40, 16],
       ..., 
       [18, 35, 38, ..., 46, 40, 24],
       [16, 41, 40, ..., 40, 46, 22],
       [34, 17, 16, ..., 24, 22, 46]], dtype=int64)

In [135]:
np.fill_diagonal(comp_all,0)
comp_all

array([[ 0, 11, 10, ..., 18, 16, 34],
       [11,  0, 43, ..., 35, 41, 17],
       [10, 43,  0, ..., 38, 40, 16],
       ..., 
       [18, 35, 38, ...,  0, 40, 24],
       [16, 41, 40, ..., 40,  0, 22],
       [34, 17, 16, ..., 24, 22,  0]], dtype=int64)

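OK, let's see if this makes sense. Use the policy_compare function to compare Akaka, the first senator in the list, to every senator. **This should equal the first COLUMN of the new comparison matrix**, except for the first entry: that is Akaka compared with himself, which we just zeroed out on the diagonal.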

In [136]:
Akaka_comp = [policy_compare('Akaka', sen, votes) for sen in votes.keys()]

print(Akaka_comp[0:20])
print(list(comp_all[0:20,0]))

[44, 11, 10, 12, 28, 31, 15, 38, 35, 12, 39, 12, 8, 16, 16, 31, 30, 22, 30, 10]
[0, 11, 10, 12, 28, 31, 15, 38, 35, 12, 39, 12, 8, 16, 16, 31, 30, 22, 30, 10]


#### OMG IT WORKED!!! Now let's test it out!

In [137]:
#first test it out
comp_min = comp_all.min() #find min value
comp_min

-3

In [138]:
list(zip(*np.where(comp_all == comp_min)))[0] # find min value coordinates

(39, 49)

In [139]:
print(sen_names[39], sen_names[49]) #get the senator names!

Feingold Inhofe


In [140]:
policy_compare('Feingold', 'Inhofe', votes) #test out the comparison to make sure it works!!!

-3

#### Neat! It works, let's write the procedure now!

In [141]:
def bitter_rivals(voting_dict):
    df = pd.DataFrame(voting_dict)
    mat = df.to_numpy()  # DataFrame.as_matrix() was removed in pandas 1.0
    sen_names = list(voting_dict.keys())
    comps_mat = mat.T.dot(mat)
    comp_min = comps_mat.min()
    min_index = list(zip(*np.where(comps_mat == comp_min)))[0]
    return sen_names[min_index[0]], sen_names[min_index[1]]

bitter_rivals(votes)

('Feingold', 'Inhofe')
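As an alternative to `list(zip(*np.where(...)))[0]`, the position of the minimum can be read off with `np.argmin` and `np.unravel_index`. The matrix below is made up for the sketch.

```python
import numpy as np

# Made-up symmetric matrix of pairwise scores
scores = np.array([[ 3, -1, -3],
                   [-1,  4,  2],
                   [-3,  2,  4]])

# argmin gives a flat index; unravel_index turns it back into (row, column)
i, j = np.unravel_index(np.argmin(scores), scores.shape)
print(i, j)
```

When the minimum occurs more than once, `np.argmin` reports the first occurrence in row-major order, the same entry the `np.where` version picks.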

In [142]:
def best_friends(voting_dict):
    df = pd.DataFrame(voting_dict)
    mat = df.to_numpy()  # DataFrame.as_matrix() was removed in pandas 1.0
    sen_names = list(voting_dict.keys())
    comps_mat = mat.T.dot(mat)
    np.fill_diagonal(comps_mat, 0)  # IMPORTANT! Zero the diagonal, otherwise max pairs a senator with themselves
    comp_max = comps_mat.max()
    max_index = list(zip(*np.where(comps_mat == comp_max)))[0]
    return sen_names[max_index[0]], sen_names[max_index[1]]

best_friends(votes)

('Allard', 'Chambliss')
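Zeroing the diagonal works here because the closest pair has a positive score. If every off-diagonal score were negative, the zeros on the diagonal would win the `max`. A hedged sketch of one way around that, on a made-up matrix, is to fill the diagonal with a value below every entry.

```python
import numpy as np

# Made-up scores in which every off-diagonal entry is negative
scores = np.array([[ 2, -1, -2],
                   [-1,  2, -1],
                   [-2, -1,  2]])

masked = scores.copy()
# Anything smaller than the current minimum can never be the maximum
np.fill_diagonal(masked, masked.min() - 1)

i, j = np.unravel_index(np.argmax(masked), masked.shape)
print(i, j, masked[i, j])
```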

# Submission Stencil

In [143]:
# version code 80e56511a793+
# Please fill out this stencil and submit using the provided submission script.

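# Be sure that the file voting_record_dump109.txt is in the matrix/ directory.

# Module-level imports: several procedures below use np and pd
import numpy as np
import pandas as pd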

## 1: (Task 2.12.1) Create Voting Dict
def create_voting_dict(strlist):
    """
    Input: a list of strings.  Each string represents the voting record of a senator.
           The string consists of 
              - the senator's last name, 
              - a letter indicating the senator's party,
              - a couple of letters indicating the senator's home state, and
              - a sequence of numbers (0's, 1's, and negative 1's) indicating the senator's
                votes on bills
              all separated by spaces.
    Output: A dictionary that maps the last name of a senator
            to a list of numbers representing the senator's voting record.
    Example: 
        >>> vd = create_voting_dict(['Kennedy D MA -1 -1 1 1', 'Snowe R ME 1 1 1 1'])
        >>> vd == {'Snowe': [1, 1, 1, 1], 'Kennedy': [-1, -1, 1, 1]}
        True

    You can use the .split() method to split each string in the
    strlist into a list; the first element of the list will be the senator's
    name, the second will be his/her party affiliation (R or D), the
    third will be his/her home state, and the remaining elements of
    the list will be that senator's voting record on a collection of bills.

    You can use the built-in procedure int() to convert a string
    representation of an integer (e.g. '1') to the actual integer
    (e.g. 1).

    The lists for each senator should preserve the order listed in voting data.
    In case you're feeling clever, this can be done in one line.
    """
    votes = {}
    for line in strlist:
        split = line.split(" ")
        # Store plain lists so the result matches the documented output type
        votes[split[0]] = [int(x) for x in split[3:]]
    return votes



## 2: (Task 2.12.2) Policy Compare
def policy_compare(sen_a, sen_b, voting_dict):
    """
    Input: last names of sen_a and sen_b, and a voting dictionary mapping senator
           names to lists representing their voting records.
    Output: the dot-product (as a number) representing the degree of similarity
            between two senators' voting policies
    Example:
        >>> voting_dict = {'Fox-Epstein':[-1,-1,-1,1],'Ravella':[1,1,1,1]}
        >>> policy_compare('Fox-Epstein','Ravella', voting_dict)
        -2

    The code should correctly compute dot-product even if the numbers are not all in {0,1,-1}.
        >>> policy_compare('A', 'B', {'A':[100,10,1], 'B':[2,5,3]})
        253

    You should definitely try to write this in one line.
    """
    # zip pairs the entries, so this works for lists as well as NumPy arrays
    return sum(a * b for a, b in zip(voting_dict[sen_a], voting_dict[sen_b]))



## 3: (Task 2.12.3) Most Similar
def most_similar(sen, voting_dict):
    """
    Input: the last name of a senator, and a dictionary mapping senator names
           to lists representing their voting records.
    Output: the last name of the senator whose political mindset is most
            like the input senator (excluding, of course, the input senator
            him/herself). Resolve ties arbitrarily.
    Example:
        >>> vd = {'Klein': [1,1,1], 'Fox-Epstein': [1,-1,0], 'Ravella': [-1,0,0]}
        >>> most_similar('Klein', vd)
        'Fox-Epstein'
        >>> vd == {'Klein': [1,1,1], 'Fox-Epstein': [1,-1,0], 'Ravella': [-1,0,0]}
        True
        >>> vd = {'a': [1,1,1,0], 'b': [1,-1,0,0], 'c': [-1,0,0,0], 'd': [-1,0,0,1], 'e': [1, 0, 0,0]}
        >>> most_similar('c', vd)
        'd'

    Note that you can (and are encouraged to) re-use your policy_compare procedure.
    """
    
    diff = set([sen])
    comps = list(set(voting_dict.keys()).difference(diff))
    scores = [policy_compare(sen, comp, voting_dict) for comp in comps]
    return comps[np.argmax(scores)]



## 4: (Task 2.12.4) Least Similar
def least_similar(sen, voting_dict):
    """
    Input: the last name of a senator, and a dictionary mapping senator names
           to lists representing their voting records.
    Output: the last name of the senator whose political mindset is least like the input
            senator.
    Example:
        >>> vd = {'a': [1,1,1], 'b': [1,-1,0], 'c': [-1,0,0]}
        >>> least_similar('a', vd)
        'c'
        >>> vd == {'a': [1,1,1], 'b': [1,-1,0], 'c': [-1,0,0]}
        True
        >>> vd = {'a': [-1,0,0], 'b': [1,0,0], 'c': [-1,1,0], 'd': [-1,1,1]}
        >>> least_similar('c', vd)
        'b'
    """
    diff = set([sen])
    comps = list(set(voting_dict.keys()).difference(diff))
    scores = [policy_compare(sen, comp, voting_dict) for comp in comps]
    return comps[np.argmin(scores)]



## 5: (Task 2.12.5) Chafee, Santorum
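most_like_chafee    = 'Jeffords'  # result of most_similar('Chafee', votes) above
least_like_santorum = 'Feingold'  # result of least_similar('Santorum', votes) above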



## 6: (Task 2.12.7) Most Average Democrat
def find_average_similarity(sen, sen_set, voting_dict):
    """
    Input: the name of a senator, a set of senator names, and a voting dictionary.
    Output: the average dot-product between sen and those in sen_set.
    Example:
        >>> vd = {'Klein':[1,1,1], 'Fox-Epstein':[1,-1,0], 'Ravella':[-1,0,0], 'Oyakawa':[-1,-1,-1], 'Loery':[0,1,1]}
        >>> sens = {'Fox-Epstein','Ravella','Oyakawa','Loery'}
        >>> find_average_similarity('Klein', sens, vd)
        -0.5
        >>> sens == {'Fox-Epstein','Ravella', 'Oyakawa', 'Loery'}
        True
        >>> vd == {'Klein':[1,1,1], 'Fox-Epstein':[1,-1,0], 'Ravella':[-1,0,0], 'Oyakawa':[-1,-1,-1], 'Loery':[0,1,1]}
        True
    """
    sens = sen_set.difference(set([sen]))
    return np.average([policy_compare(sen, sen_comp, voting_dict) for sen_comp in sens])


most_average_Democrat = 'Biden' # give the last name (or code that computes the last name)



## 7: (Task 2.12.8) Average Record
def find_average_record(sen_set, voting_dict):
    """
    Input: a set of last names, a voting dictionary
    Output: a vector containing the average components of the voting records
            of the senators in the input set
    Example: 
        >>> voting_dict = {'Klein': [-1,0,1], 'Fox-Epstein': [-1,-1,-1], 'Ravella': [0,0,1]}
        >>> senators = {'Fox-Epstein','Ravella'}
        >>> find_average_record(senators, voting_dict)
        [-0.5, -0.5, 0.0]
        >>> voting_dict == {'Klein': [-1,0,1], 'Fox-Epstein': [-1,-1,-1], 'Ravella': [0,0,1]}
        True
        >>> senators
        {'Fox-Epstein','Ravella'}
        >>> d = {'c': [-1,-1,0], 'b': [0,1,1], 'a': [0,1,1], 'e': [-1,-1,1], 'd': [-1,1,1]}
        >>> find_average_record({'a','c','e'}, d)
        [-0.6666666666666666, -0.3333333333333333, 0.6666666666666666]
        >>> find_average_record({'a','c','e','b'}, d)
        [-0.5, 0.0, 0.75]
        >>> find_average_record({'a'}, d)
        [0.0, 1.0, 1.0]
    """
    vec_sum = np.zeros(len(next(iter(voting_dict.values()))))
    for sen in sen_set:
        vec_sum += voting_dict[sen]
    avg_votes = vec_sum / len(sen_set)
    return avg_votes.tolist()  # the docstring examples expect a plain list

average_Democrat_record = [-0.1627907 , -0.23255814,  1.        ,  0.8372093 ,  0.97674419,
       -0.13953488, -0.95348837,  0.81395349,  0.97674419,  0.97674419,
        0.90697674,  0.76744186,  0.6744186 ,  0.97674419, -0.51162791,
        0.93023256,  0.95348837,  0.97674419, -0.39534884,  0.97674419,
        1.        ,  1.        ,  1.        ,  0.95348837, -0.48837209,
        1.        , -0.3255814 , -0.06976744,  0.97674419,  0.86046512,
        0.97674419,  0.97674419,  1.        ,  1.        ,  0.97674419,
       -0.34883721,  0.97674419, -0.48837209,  0.23255814,  0.88372093,
        0.44186047,  0.90697674, -0.90697674,  1.        ,  0.90697674,
       -0.30232558]



## 8: (Task 2.12.9) Bitter Rivals
def bitter_rivals(voting_dict):
    """
    Input: a dictionary mapping senator names to lists representing
           their voting records
    Output: a tuple containing the two senators who most strongly
            disagree with one another.
    Example: 
        >>> voting_dict = {'Klein':[-1,0,1], 'Fox-Epstein':[-1,-1,-1], 'Ravella':[0,0,1], 'Oyakawa':[1,1,1], 'Loery':[1,1,0]}
        >>> br = bitter_rivals(voting_dict)
        >>> br == ('Fox-Epstein', 'Oyakawa') or br == ('Oyakawa', 'Fox-Epstein')
        True
    """
    df = pd.DataFrame(voting_dict)
    mat = df.to_numpy()  # DataFrame.as_matrix() was removed in pandas 1.0
    sen_names = list(voting_dict.keys())
    comps_mat = mat.T.dot(mat)
    comp_min = comps_mat.min()
    min_index = list(zip(*np.where(comps_mat == comp_min)))[0]
    return (sen_names[min_index[0]], sen_names[min_index[1]])

