In [3]:
%load_ext autoreload
%autoreload 2

In [4]:
from ipython_memwatcher import MemWatcher
mw = MemWatcher()
mw.start_watching_memory()

In [4] used 0.000 MiB RAM in 0.005s, peaked 0.000 MiB above current, total RAM usage 43.074 MiB


# Ideal Point Estimation

Here we perform ideal point estimation for legislators in the 113th Congress.

## Load Data
### Legislators
First we load all legislators from GovTrack. This covers every legislator on record, not only those who served in the 113th Congress.

In [5]:
import ideal_point.raw_data

In [5] used 32.363 MiB RAM in 1.060s, peaked 0.000 MiB above current, total RAM usage 75.438 MiB


In [None]:
legislator_df = ideal_point.raw_data.legislators()
legislator_df.head()

Unnamed: 0,last_name,first_name,birthday,gender,type,state,district,party,url,address,...,thomas_id,opensecrets_id,lis_id,cspan_id,govtrack_id,votesmart_id,ballotpedia_id,washington_post_id,icpsr_id,wikipedia_id
0,Brown,Sherrod,1952-11-09,M,sen,OH,,Democrat,https://www.brown.senate.gov,713 Hart Senate Office Building Washington DC ...,...,136.0,N00003535,S307,5051.0,400050,27018.0,Sherrod Brown,,29389.0,Sherrod Brown
1,Cantwell,Maria,1958-10-13,F,sen,WA,,Democrat,https://www.cantwell.senate.gov,511 Hart Senate Office Building Washington DC ...,...,172.0,N00007836,S275,26137.0,300018,27122.0,Maria Cantwell,,39310.0,Maria Cantwell
2,Cardin,Benjamin,1943-10-05,M,sen,MD,,Democrat,https://www.cardin.senate.gov,509 Hart Senate Office Building Washington DC ...,...,174.0,N00001955,S308,4004.0,400064,26888.0,Ben Cardin,,15408.0,Ben Cardin
3,Carper,Thomas,1947-01-23,M,sen,DE,,Democrat,http://www.carper.senate.gov,513 Hart Senate Office Building Washington DC ...,...,179.0,N00012508,S277,663.0,300019,22421.0,Tom Carper,,15015.0,Tom Carper
4,Casey,Robert,1960-04-13,M,sen,PA,,Democrat,https://www.casey.senate.gov,393 Russell Senate Office Building Washington ...,...,1828.0,N00027503,S309,47036.0,412246,2541.0,"Bob Casey, Jr.",,40703.0,Bob Casey Jr.


In [6] used 10.203 MiB RAM in 0.259s, peaked 0.000 MiB above current, total RAM usage 85.641 MiB


### Votes

Next we can load in all the votes. We get two dataframes from this, `vote_df` and `position_df`.

Each row of `vote_df` corresponds to one roll call vote (like on the passage of a bill).

Each row of `position_df` corresponds to one legislator's position on one vote.

In [None]:
vote_df, position_df = ideal_point.raw_data.votes(legislator_df)

In [None]:
vote_df.head()

In [None]:
position_df.head()

## Transform Data

Next we have to transform our data to a format we can train our model on.

Our observed data is essentially `position_df`, but instead of categorical `position` values we need
1s and 0s. Also, since we aren't using all of the legislators, we need to transform
the `legislator_index` into a relative index. We call this transformed dataframe `model_position_df`.
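
As a rough illustration of this transformation (not the actual body of `transform_data`), the sketch below encodes positions as 1/0 and replaces absolute legislator indices with dense relative ones. The toy dataframe, the `vote_index` column, and the `"Yea"`/`"Nay"` labels are assumptions made for the example; the real data may use different names and values.

```python
import pandas as pd

# Hypothetical stand-in for position_df; real column names and labels may differ.
toy_position_df = pd.DataFrame({
    "legislator_index": [400050, 300018, 400050, 412246],
    "vote_index": [10, 10, 11, 11],
    "position": ["Yea", "Nay", "Nay", "Yea"],
})

# Keep only yes/no positions and encode them as 1 (yes) and 0 (no).
encoded = toy_position_df[toy_position_df["position"].isin(["Yea", "Nay"])].copy()
encoded["position"] = (encoded["position"] == "Yea").astype(int)

# Replace absolute indices with dense codes 0..n-1, keeping a lookup
# from relative index back to the original index.
leg_codes, leg_uniques = pd.factorize(encoded["legislator_index"])
encoded["legislator_index"] = leg_codes
toy_legislator_lookup = pd.Series(leg_uniques)
```

Here `toy_legislator_lookup` plays the same role as a relative-to-original index series: position `i` holds the original index of the legislator with relative index `i`.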

In [None]:
import ideal_point.ideal_point

In [None]:
model_position_df, model_legislator_index, model_vote_index = ideal_point.ideal_point.transform_data(position_df, vote_df, legislator_df)

In [None]:
model_position_df.head()

In [None]:
model_position_df.info()

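The two series `model_legislator_index` and `model_vote_index` map the relative indices used in `model_position_df` back to the full dataframes: each series is keyed by relative index, and its value is the corresponding index in the original dataframe.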

## Create Model

Now we can create our model, conditioned on the votes we have observed. The notation is based
on ["Comparing NOMINATE and IDEAL: Points of Difference and Monte Carlo Tests"](http://scholar-qa.princeton.edu/sites/default/files/jameslo/files/lsq_nomvsideal.pdf).

In [None]:
model = ideal_point.ideal_point.create_model(model_position_df)
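
To give a feel for the kind of likelihood an ideal point model uses, here is a minimal NumPy sketch of a two-parameter item-response form with a logistic link. This is an assumption for illustration only: `create_model` may use a different link, parameterization, or priors, and the function and argument names below are hypothetical.

```python
import numpy as np

def vote_probability(ideal_point, discrimination, bias):
    """Probability of a yes vote under a logistic two-parameter sketch."""
    return 1.0 / (1.0 + np.exp(-(discrimination * ideal_point + bias)))

def log_likelihood(positions, leg_idx, vote_idx,
                   ideal_points, discriminations, biases):
    """Bernoulli log-likelihood of observed 1/0 positions.

    leg_idx and vote_idx are the relative indices of the legislator
    and the vote for each observed position.
    """
    p = vote_probability(ideal_points[leg_idx],
                         discriminations[vote_idx],
                         biases[vote_idx])
    return np.sum(positions * np.log(p) + (1 - positions) * np.log(1 - p))
```

In this sketch a larger `bias` raises the probability of a yes vote for every legislator, while `discrimination` controls how strongly the vote separates legislators along the ideology axis.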

## Train Model

Now we can run variational inference to compute estimated parameters for the model.

In [None]:
#advi_params = ideal_point.ideal_point.advi_params(model)
#ideal_point.ideal_point.save_advi_params(advi_params)

Or we can load the parameters from disk if we have already computed them (fitting takes about an hour and a half on my computer).

In [None]:
advi_params = ideal_point.ideal_point.load_advi_params()
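
The compute-once, load-afterwards pattern above can be sketched generically as follows. This is not how `save_advi_params` and `load_advi_params` are implemented (their storage format is not shown here); `load_or_compute`, `expensive_fit`, and the pickle file name are hypothetical.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Return the cached result at `path`, computing and saving it if missing."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result

calls = []

def expensive_fit():
    # Stand-in for a long-running inference step.
    calls.append(1)
    return {"means": [0.1, -0.2]}

cache_path = os.path.join(tempfile.mkdtemp(), "advi_params.pkl")
first = load_or_compute(cache_path, expensive_fit)   # computes and saves
second = load_or_compute(cache_path, expensive_fit)  # loads from disk
```

The second call never invokes `expensive_fit`, which is the point when a fit takes over an hour.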

## Integrate Data

Now we can integrate the parameters we learned back into our `vote_df` and `legislator_df`. We add an `ideology` column to both of them and filter out rows without ideal points. We also add a `bias` column to the votes (a greater bias means any legislator is more likely to vote yes).

In [None]:
legislators_pt_df = ideal_point.ideal_point.leg_add_ideology(legislator_df, model_legislator_index, advi_params)
vote_pt_df = ideal_point.ideal_point.vote_add_ideology_and_bias(vote_df, model_vote_index, advi_params)
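
A minimal sketch of what this integration step amounts to, assuming the index series maps each relative model index to a row label in the full dataframe. The toy data and names are hypothetical, and the real helpers may work differently.

```python
import pandas as pd

# Hypothetical full dataframe; only two of the three legislators were modeled.
toy_legislator_df = pd.DataFrame(
    {"last_name": ["Brown", "Cantwell", "Cardin"]}, index=[0, 1, 2]
)

# Relative model index -> row label in the full dataframe.
toy_model_legislator_index = pd.Series([2, 0])
# Estimated ideal points, ordered by relative model index.
toy_ideal_points = [0.7, -1.3]

# Re-key the estimates by the original row labels, then attach them.
ideology = pd.Series(
    toy_ideal_points, index=toy_model_legislator_index.values, name="ideology"
)
toy_pt_df = toy_legislator_df.join(ideology).dropna(subset=["ideology"])
```

Rows that were never part of the model get `NaN` from the join and are dropped, which mirrors filtering out rows without ideal points.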

### Visualize Points

We can do a quick gut check of our legislator ideal points to make sure they separate Democrats and Republicans.

In [None]:
import altair as alt

In [None]:
# One tick per legislator: ideology on the x-axis, one row per party.
alt.Chart(legislators_pt_df).mark_tick().encode(
    x='ideology:Q',
    y='party:O',
)

In [None]:
legislators_pt_df.sort_values(by=["ideology"])
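
The same gut check can be done numerically by summarizing ideology per party. The sketch below uses made-up values, not model output. Because the sign of the ideology axis is arbitrary in this kind of model, it checks for separation in either direction.

```python
import pandas as pd

# Hypothetical ideal points, for illustration only.
toy_pt_df = pd.DataFrame({
    "party": ["Democrat", "Democrat", "Republican", "Republican"],
    "ideology": [-1.2, -0.8, 0.9, 1.1],
})

# Mean and range of ideology within each party.
summary = toy_pt_df.groupby("party")["ideology"].agg(["mean", "min", "max"])

# The parties are fully separated if their ideology ranges do not overlap.
dem, rep = summary.loc["Democrat"], summary.loc["Republican"]
separated = bool(dem["max"] < rep["min"] or rep["max"] < dem["min"])
```

On real data some overlap between the parties would not be surprising, so the per-party means are usually the more informative number.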

### Validation

Some of the most conservative members in our model include Mike Pompeo, who served on the House Select Committee on Benghazi, and Randy Weber, who drew fire for a tweet declaring Barack Obama a "socialist dictator."

Some of the most liberal members include Jim McGovern, who represents the Pioneer Valley, and Jerrold Nadler, who represents Manhattan's Upper West Side. The most liberal legislator, Jan Schakowsky, is a longtime critic of the Iraq War.

The House votes to repeal the Patient Protection and Affordable Care Act, on which Democrats voted as a bloc, are close to the Democratic ideology.

In [None]:
r = vote_pt_df[vote_pt_df["question"].str.contains("To repeal the Patient Protection ")]
r

For now, this is to get the bills that will be linked with the duplication data.

In [None]:
vote_pt_df.sort_values("ideology", ascending=False)

## Text reuse

In [None]:
import pandas as pd
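# DataFrame.from_csv was removed in pandas 1.0; read_csv with these
# arguments reproduces its defaults.
reuse_df = pd.read_csv("pairs_enhanced.txt", index_col=0, parse_dates=True)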
reuse_df["congress_b"] = reuse_df["congress_b"].astype(float)
reuse_df["congress_a"] = reuse_df["congress_a"].astype(float)
reuse_df["bill_no_a"] = reuse_df["bill_no_a"].astype(float)
reuse_df["bill_no_b"] = reuse_df["bill_no_b"].astype(float)

In [None]:
# Placeholder join on bill kind only; the intended joins on both bill number
# and bill kind are sketched in the comments below (left_on/right_on take flat lists).
pd.merge(reuse_df, vote_pt_df, how='inner', left_on='bill_kind_a', right_on='bill_type')
# result_a = pd.merge(reuse_df, vote_pt_df, how='inner', left_on=['bill_no_a', 'bill_kind_a'], right_on=['bill_number', 'bill_type'])
# result_b = pd.merge(reuse_df, vote_pt_df, how='inner', left_on=['bill_no_b', 'bill_kind_b'], right_on=['bill_number', 'bill_type'])

#print(result_a.head()[["bill_no_a", "bill_number", "ideology"]])
#print(result_b.head()[["bill_no_b", "bill_number", "ideology"]])
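
As a small sketch of the two-key join the commented-out lines are aiming for, the example below merges on both bill number and bill kind. The toy rows and the `score` column are hypothetical; the key column names are taken from the cells above.

```python
import pandas as pd

# Hypothetical text-reuse pairs (only the "a" side is shown).
toy_reuse_df = pd.DataFrame({
    "bill_no_a": [45.0, 3.0],
    "bill_kind_a": ["hr", "s"],
    "score": [0.9, 0.4],
})

# Hypothetical votes with estimated ideology.
toy_vote_pt_df = pd.DataFrame({
    "bill_number": [45.0, 45.0, 3.0],
    "bill_type": ["hr", "s", "hr"],
    "ideology": [1.5, -0.2, 0.3],
})

# Both keys must match, so left_on/right_on are flat lists of equal length.
result_a = pd.merge(
    toy_reuse_df,
    toy_vote_pt_df,
    how="inner",
    left_on=["bill_no_a", "bill_kind_a"],
    right_on=["bill_number", "bill_type"],
)
```

Joining on bill kind alone would pair every `hr` bill with every `hr` vote. Adding the bill number restricts each pair to votes on the same bill.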