# Overview

This notebook assumes you have already collected and scored MOC tweets. It creates a dataset for use in R to analyze the patterns of polarization over time. You will do some parsing on an AWS server and some locally before ultimately making a CSV file that you can open and analyze in R.

LH Note: on my computer, Git stuff and data live in different places, so you'll see notes about moving files or changing directories. I haven't figured out a good way to keep files in both places or to mirror or sync or whatever. So, for now, paths are hard-coded or there's a note about where to find a file.

# Get Data

2016 election data is on an AWS server under ```/data/purpletag```. 

To login: 

```ssh -i ~/.ssh/carolgrrr.pem ubuntu@purpletag.casmlab.org```

The data is large (> 4GB), so best to run Juypter notebooks to parse on the server. Then CSV files can be used locally.

You can run a notebook on the server and use your local browser with these two commands:

* ```ssh -L 8080:localhost:8888 -i ~/.ssh/carolgrrr.pem ubuntu@purpletag.casmlab.org```
* ```nohup jupyter notebook --no-browser > log.txt 2>&1 &```

Then access ```http://localhost:8080``` in your browser.

## On server: Parsing from ```scores``` files to CSV

This section assumes you have already run purpletag's ```collect``` and ```score``` functions and gotten the Twitter data that you want in JSON format and parsed that data into score files.

In [15]:
# Based on https://stackoverflow.com/questions/26415906/read-multiple-txt-files-into-pandas-dataframe-with-filename-as-column-header
import pandas as pd
import os
import glob

def build_file_list(directory, extension):
    '''
    args: 
        directory - full path to where the files are
            ex: /data/purpletag/scores
        extension - tells us which files to include in the list
            ex: *.l.moc.scores # using 1-day purpletag MOC scores
    '''
    
    # Step 1: get a list of all score files in target directory
    fileList = []
    os.chdir( directory )

    # Step 2: Build up list of files:
    for files in glob.glob(extension): 
        fileName, fileExtension = os.path.splitext(files)
        fileList.append(files) #filename with extension
        
    return fileList

def build_df(fileList, outfile, score_type):
    '''
    args:
        fileList - list of files to include, usually output from build_file_list
        outfile - full path to where to put the df
            ex: /data/purpletag/mocs_by_date.pkl
    '''
    # Step 3: Build up DataFrame:
    # Based on https://stackoverflow.com/questions/35717706/python-how-to-turn-a-dictionary-of-dataframes-into-one-big-dataframe-with-colum
    d = {} # dictionary to hold multiple dfs

    for filename in fileList:
        df1 = pd.read_csv(filename, header=None, sep=' ', index_col=0)
        if score_type == 'moc': # moc score files
            d[filename[:-13]] = df1
        else: # tag score files
            d[filename[:-9]] = df1

    df = pd.concat(d, axis=1)
    df.columns = df.columns.droplevel(-1) 

    df.to_pickle(outfile)

In [16]:
fileList = build_file_list('/data/purpletag/scores', '*.1.moc.scores')
build_df(fileList, '/data/purpletag/mocs_by_date_test.pkl', 'moc')

Move the file from the AWS server to local if you want to work locally. For example, to move the file ```mocs_by_date.pkl``` from the server to my local repo, I use:

```scp -i ~/.ssh/carolgrrr.pem ubuntu@purpletag.casmlab.org:/data/purpletag/mocs_by_date.pkl ~/Documents/git/casmlab/purpletag/files/```

## Locally: Prepping for stats

We now have a pickled dataframe of the form handleXdate. We need to keep data only from Labor Day to Election Day and get weekly averages.

In [50]:
import pandas as pd

df = pd.read_pickle('/data/purpletag/mocs_by_date_test.pkl')
df.head()

Unnamed: 0,2015-11-10,2015-11-11,2015-11-12,2015-11-13,2015-11-14,2015-11-15,2015-11-16,2015-11-17,2015-11-18,2015-11-19,...,2016-10-30,2016-10-31,2016-11-01,2016-11-02,2016-11-03,2016-11-04,2016-11-05,2016-11-06,2016-11-07,2016-11-08
austinscottga08,,2.40933,,,,,,,,,...,,,,,,,,,,
benniegthompson,,,,,,,,3.85585,,,...,,,,,,,,,,
bettymccollum04,,,,,,,,,,,...,,,-90.838,-61.3435,-40.482,,-33.4513,,,
billpascrell,,,-1.16875,-0.916501,-0.972477,,,,,,...,,,,-17.9542,,,,,,
boblatta,,,0.100723,,,1.70286,,1.56897,,-0.031978,...,,,,,,,,,,


In [51]:
def weekly_avg(df):
    '''
    Given a df from build_df, keep just the weeks we are interested in.
    '''
    week1_dates = ['2016-09-06','2016-09-07','2016-09-08','2016-09-09','2016-09-10','2016-09-11','2016-09-12']
    week2_dates = ['2016-09-13','2016-09-14','2016-09-15','2016-09-16','2016-09-17','2016-09-18','2016-09-19']
    week3_dates = ['2016-09-20','2016-09-21','2016-09-22','2016-09-23','2016-09-24','2016-09-25','2016-09-26']
    week4_dates = ['2016-09-27','2016-09-28','2016-09-29','2016-09-30','2016-10-01','2016-10-02','2016-10-03']
    week5_dates = ['2016-10-04','2016-10-05','2016-10-06','2016-10-07','2016-10-08','2016-10-09','2016-10-10']
    week6_dates = ['2016-10-11','2016-10-12','2016-10-13','2016-10-14','2016-10-15','2016-10-16','2016-10-17']
    week7_dates = ['2016-10-18','2016-10-19','2016-10-20','2016-10-21','2016-10-22','2016-10-23','2016-10-24']
    week8_dates = ['2016-10-25','2016-10-26','2016-10-27','2016-10-28','2016-10-29','2016-10-30','2016-10-31']
    week9_dates = ['2016-11-01','2016-11-02','2016-11-03','2016-11-04','2016-11-05','2016-11-06','2016-11-07']

    df['week1'] = df[week1_dates].mean(axis=1)
    df['week2'] = df[week2_dates].mean(axis=1)
    df['week3'] = df[week3_dates].mean(axis=1)
    df['week4'] = df[week4_dates].mean(axis=1)
    df['week5'] = df[week5_dates].mean(axis=1)
    df['week6'] = df[week6_dates].mean(axis=1)
    df['week7'] = df[week7_dates].mean(axis=1)
    df['week8'] = df[week8_dates].mean(axis=1)
    df['week9'] = df[week9_dates].mean(axis=1)
    
    return df

df = weekly_avg(df)
df.head()

Unnamed: 0,2015-11-10,2015-11-11,2015-11-12,2015-11-13,2015-11-14,2015-11-15,2015-11-16,2015-11-17,2015-11-18,2015-11-19,...,2016-11-08,week1,week2,week3,week4,week5,week6,week7,week8,week9
austinscottga08,,2.40933,,,,,,,,,...,,,4.932035,,1.06429,3.6757,6.0426,1.56161,,
benniegthompson,,,,,,,,3.85585,,,...,,,-1.21138,,,,-1.82301,,,
bettymccollum04,,,,,,,,,,,...,,-56.669016,-119.88476,-67.1729,-59.540265,-27.995798,-33.955595,-4.879673,-33.080633,-56.5287
billpascrell,,,-1.16875,-0.916501,-0.972477,,,,,,...,,-2.87143,-2.62381,-2.353348,-2.337897,,-1.02655,,-1.33333,-17.9542
boblatta,,,0.100723,,,1.70286,,1.56897,,-0.031978,...,,1.37997,9.673527,,,,0.974138,,,


In [52]:
weekly_df = df[['week1','week2','week3','week4','week5','week6','week7','week8','week9']]
weekly_df

Unnamed: 0,week1,week2,week3,week4,week5,week6,week7,week8,week9
austinscottga08,,4.932035,,1.064290,3.675700,6.042600,1.561610,,
benniegthompson,,-1.211380,,,,-1.823010,,,
bettymccollum04,-56.669016,-119.884760,-67.172900,-59.540265,-27.995798,-33.955595,-4.879673,-33.080633,-56.528700
billpascrell,-2.871430,-2.623810,-2.353348,-2.337897,,-1.026550,,-1.333330,-17.954200
boblatta,1.379970,9.673527,,,,0.974138,,,
bradsherman,,-1.941180,,-5.381950,,,,,-0.748092
call_me_dutch,-1.312583,-4.765343,-3.129013,-1.573270,-2.967365,-2.715476,-20.466664,-1.576630,-6.732820
candicemiller,1.678195,,0.029657,-0.969076,,1.086960,,,
cathymcmorris,7.926215,3.141282,8.873570,13.830632,1.000000,7.823813,3.747878,11.442992,10.554400
cbrangel,-39.434357,-61.711175,-41.164824,-37.333904,-19.490345,-32.642642,-8.514975,-4.536950,-18.458800


In [53]:
weekly_df.sort_values(by = 'week2', ascending = True)

Unnamed: 0,week1,week2,week3,week4,week5,week6,week7,week8,week9
replawrence,-28.383236,-308.088000,-18.402877,-32.248687,0.698665,-43.500575,,,-0.811634
repdennyheck,,-300.145000,-10.527903,-17.045025,,-18.147370,-16.686711,-40.557700,-62.144650
repbobbyrush,,-270.786000,-30.679700,-1.161955,,,,-8.652900,-1.898720
nitalowey,-1.298840,-215.463000,-39.049700,-6.699170,,-11.493604,-1.084320,-3.043239,-18.510867
repcleaver,-6.302088,-194.110000,-8.202654,-32.020135,-0.119293,0.375556,-18.248317,-3.684110,-9.614692
repricklarsen,-1.214050,-182.703000,-17.130765,-2.510251,-18.431150,-4.880172,-1.862667,-2.021230,-4.488790
frankpallone,-8.588512,-151.124760,-25.066625,-23.365505,-9.540670,-19.151520,-15.818077,-3.291823,-6.997820
louiseslaughter,-21.443100,-148.318000,,-0.011527,-11.180330,-0.911504,,-1.255100,-1.822600
usrepmikedoyle,,-142.917000,-1.560995,,,,-1.010750,,-7.221050
rephuffman,-39.934045,-137.699665,-17.295775,-74.548650,-11.781800,,-30.461250,-30.347825,-32.612780


In [164]:
import pandas as pd
import yaml

# get the data from Govtrack
with open('/Users/libbyh/Dropbox/CASM/SMCE/Shared Social Media and Civic Engagement/Data/purpletag/legislators-social-media.yaml', 'r') as f:
    df_social = pd.io.json.json_normalize(yaml.load(f))

with open('/Users/libbyh/Dropbox/CASM/SMCE/Shared Social Media and Civic Engagement/Data/purpletag/legislators-current.yaml', 'r') as f:
    df_current = pd.io.json.json_normalize(yaml.load(f))

print(len(weekly_df))
# merge everything into one data frame with one row per MOC
df_meta = pd.merge(df_current, df_social, on="id.govtrack")
df_meta["handle"] = df_meta["social.twitter"].str.lower()
weekly_df["handle"] = weekly_df.index.str.lower()

print(len(df_meta))

df_merged = pd.merge(df_meta, weekly_df, left_on="handle", right_index=True)

print(len(df_merged))

#cols_to_keep = ['id.govtrack','social.twitter','name.official_full','bio.gender','terms','week1','week2','week3','week4','week5','week6','week7','week8','week9']

df_merged = df_merged[['id.govtrack','social.twitter','name.official_full','bio.gender','terms','week1','week2','week3','week4','week5','week6','week7','week8','week9']]


511
529
444


Not sure why we have only 444 matches, but it's better than 12.

In [165]:
df1 = pd.concat([df_merged.drop(['terms'], axis=1), df_merged['terms'].apply(pd.Series)], axis=1)
df2 = pd.concat([df1.drop([0], axis=1), df1[0].apply(pd.Series)], axis=1)

keep_df = df2[['id.govtrack','social.twitter','name.official_full','bio.gender','type','party','week1','week2','week3','week4','week5','week6','week7','week8','week9']]
keep_df

Unnamed: 0,id.govtrack,social.twitter,name.official_full,bio.gender,type,party,week1,week2,week3,week4,week5,week6,week7,week8,week9
0,400050,SenSherrodBrown,Sherrod Brown,M,rep,Democrat,-0.222430,-1.141480,-2.894056,-28.188720,-0.587860,1.175650,-19.819673,-2.136967,-0.367748
1,300018,SenatorCantwell,Maria Cantwell,F,rep,Democrat,-118.229375,-12.501053,-5.904561,-39.350840,-11.031340,-20.399800,-33.510324,-18.325139,-28.551134
2,400064,SenatorCardin,Benjamin L. Cardin,M,rep,Democrat,-25.636298,-8.955374,-10.335548,-28.631307,0.866007,-25.627037,-16.434565,-21.518943,-17.753120
3,300019,SenatorCarper,Thomas R. Carper,M,rep,Democrat,-155.209127,-0.681740,-8.735320,-12.539793,-14.515218,-0.307598,-6.753922,-1.830320,-6.501631
4,412246,SenBobCasey,"Robert P. Casey, Jr.",M,sen,Democrat,-36.255503,-6.993170,-1.272941,-28.430050,-0.082187,-5.102067,-16.612726,-12.302640,-77.053400
5,412248,SenBobCorker,Bob Corker,M,sen,Republican,1.884995,8.134470,3.616570,7.727762,,1.819670,,1.748260,
6,300043,SenFeinstein,Dianne Feinstein,F,sen,Democrat,-63.976412,-759.904700,-38.750658,-55.663120,-7.274102,-6.091820,-29.935897,,-161.321474
7,300052,SenOrrinHatch,Orrin G. Hatch,M,sen,Republican,78.846271,155.737157,147.056983,155.817800,20.409763,66.542193,16.915500,52.884700,2.214622
9,412243,McCaskillOffice,Claire McCaskill,F,sen,Democrat,-4.946375,-728.367104,-52.026723,-3.490048,-0.763953,-10.002037,-0.857143,-2.249950,-0.748092
10,400272,SenatorMenendez,Robert Menendez,M,rep,Democrat,-36.628585,-9.758632,-0.829845,-29.610837,-2.319673,-19.566032,-16.767273,-6.268110,-242.438693


In [170]:
# melt it so each row is a person x week
df_long = pd.melt(keep_df, id_vars=['id.govtrack','social.twitter','name.official_full','bio.gender','party','type'],
                value_vars=['week1','week2','week3','week4','week5','week6','week7','week8','week9'],
                var_name='week', value_name='avg_score')
df_long['week'] = df_long['week'].str[-1:]

df_long.rename(columns = {'type':'chamber', 'social.twitter': 'handle', 'name.official_full': 'name', 'bio.gender': 'gender'}, inplace = True)

df_long.head()

Unnamed: 0,id.govtrack,handle,name,gender,party,chamber,week,avg_score
0,400050,SenSherrodBrown,Sherrod Brown,M,Democrat,rep,1,-0.22243
1,300018,SenatorCantwell,Maria Cantwell,F,Democrat,rep,1,-118.229375
2,400064,SenatorCardin,Benjamin L. Cardin,M,Democrat,rep,1,-25.636298
3,300019,SenatorCarper,Thomas R. Carper,M,Democrat,rep,1,-155.209127
4,412246,SenBobCasey,"Robert P. Casey, Jr.",M,Democrat,sen,1,-36.255503


In [175]:
# make sure we have just two parties
df_long.party.unique()

array(['Democrat', 'Republican'], dtype=object)

In [176]:
# get an absolute value of the polar score
df_long['abs'] = df_long['avg_score'].abs()

In [177]:
df_long.to_csv('data-files/weekly_averages_long.csv')

# Now move to R for analysis

Run ```~/Documents/git/casmlab/purpletag/2016_election.R```

That R script sends its output to ```2016_election_results.txt```

In [180]:
results = open('data-files/2016_election_results.txt', 'r')
print(results.read())


> # for pretty regression tables
> # http://stackoverflow.com/questions/30195718/stargazer-save-to-file-dont-show-in-console
> mod_stargazer <- functi .... [TRUNCATED] 

> df <- read.csv('weekly_averages_long.csv', header = TRUE, sep = ",", quote = "\"",
+                dec = ".", fill = TRUE, comment.char = "")

> summary(df)
       X           id.govtrack                 handle                 name      gender  
 Min.   :   0.0   Min.   :300002   AustinScottGA08:   9   Adam B. Schiff:   9   F: 792  
 1st Qu.: 998.8   1st Qu.:400326   BennieGThompson:   9   Adam Kinzinger:   9   M:3204  
 Median :1997.5   Median :412292   BettyMcCollum04:   9   Adam Smith    :   9           
 Mean   :1997.5   Mean   :401868   BillPascrell   :   9   Adrian Smith  :   9           
 3rd Qu.:2996.2   3rd Qu.:412533   BobLatta       :   9   Al Franken    :   9           
 Max.   :3995.0   Max.   :412674   BradSherman    :   9   Al Green      :   9           
                                   (Other)    

Based on the outlier-excluded linear mixed-effects models, it makes sense to remove RepThompson. The pattern stays the same even with RepThompson in the set though: negative effect for republican and week, positive effect for their interaction. ```lmm5``` is the model-of-best-fit. 

## Changing the way we score hashtags

What if we score tags for the 63-day period and then score MOCS?

Run the following (on the server) to get new scores:

* purpletag parse -t 63 -d 200
* purpletag score
* purpletag score --counts --score-mocs

That first command took a week because the code starts with today and works backwards 200 days, one day at a time. Each day takes over an hour. See Issue #18 about options for changing this behavior.

With the new tag measures, can start the process over. Start at "On Server: Parsing scores to CSV" with a new file name.

# Getting Tag Data for Paper

We need to know more about the tags people were using to make sense of the regression results. So, let's get some tag data.'

In [17]:
score_files = build_file_list('/data/purpletag/scores', '*.1.scores')
build_df(score_files, '/data/purpletag/scores_by_date.pkl', 'tag')

In [31]:
import pandas as pd

df = pd.read_pickle('/data/purpletag/scores_by_date.pkl')
df.head()

Unnamed: 0,2015-11-10,2015-11-11,2015-11-12,2015-11-13,2015-11-14,2015-11-15,2015-11-16,2015-11-17,2015-11-18,2015-11-19,...,2016-10-30,2016-10-31,2016-11-01,2016-11-02,2016-11-03,2016-11-04,2016-11-05,2016-11-06,2016-11-07,2016-11-08
02byyou,,,,,,,,,,,...,,,,,,,,,,
0h03,,,,,,,,,-1.09155,,...,,,,,,,,,,
10000minutes,,,,,,,,,,,...,,,,,,,,,,
1000blackgirlbooks,,,,,,,,,,,...,,,,,,,,,,
1000culverts,,,,,,,,,,,...,,,,,,,,,,


In [36]:
df = weekly_avg(df)
df.head()

Unnamed: 0,2015-11-10,2015-11-11,2015-11-12,2015-11-13,2015-11-14,2015-11-15,2015-11-16,2015-11-17,2015-11-18,2015-11-19,...,2016-11-08,week1,week2,week3,week4,week5,week6,week7,week8,week9
02byyou,,,,,,,,,,,...,,,,,,,,,,
0h03,,,,,,,,,-1.09155,,...,,,,,,,,,,
10000minutes,,,,,,,,,,,...,,,,,,,,,,
1000blackgirlbooks,,,,,,,,,,,...,,,,,,,,,,
1000culverts,,,,,,,,,,,...,,,,,,,,,,


In [42]:
df_tags_weeks = df[['week1','week2','week3','week4','week5','week6','week7','week8','week9']]
df_tags_weeks = df_tags_weeks.dropna(how='all')
df_tags_weeks.head()
len(df_tags_weeks.index) # number of hashtags in our df

8788

What was happening in week 2 that made Democrats so polarized that week?

In [48]:
week2 = df_tags_weeks.sort_values(by = 'week2', ascending = True)
week2.head(10)

Unnamed: 0,week1,week2,week3,week4,week5,week6,week7,week8,week9
closetheloophole,,-59.2059,,,,,,,
gunvote,,-37.42143,-1.04795,,,,,,
doyourjob,-38.184,-35.275373,-19.99351,-9.581359,-8.85495,-1.294718,-0.857143,-1.294215,
noflynobuy,-2.327613,-31.11748,-0.895503,-2.447173,-1.0,-0.639344,-0.848739,-1.15044,-0.862385
zika,-10.867272,-17.640425,-13.09881,0.378861,0.409471,0.770685,-0.523072,-0.240739,
gunviolence,-12.015845,-16.567943,-1.425833,-1.305606,-5.510243,-8.491866,-4.862071,-1.556957,-1.096452
vawa,,-9.79423,-0.876106,,-1.043135,-1.0991,,-1.2551,
vawa22,,-6.92,,,,,,,
citizenshipday,,-6.87434,,,,,,,
flint,-7.977703,-6.397713,-17.578173,-10.841666,,-0.92,-1.71429,,


In [70]:
week2 = df_tags_weeks.sort_values(by = 'week2', ascending = False)
week2.head(10)

Unnamed: 0,week1,week2,week3,week4,week5,week6,week7,week8,week9
gitmo,0.92586,10.028098,2.044487,1.461175,1.18182,,,0.869231,
afbday,,9.92286,1.28721,,,,,,
nationalpeanutday,,7.80347,,,,,,,
betterway,11.940174,7.387886,7.833811,7.556914,7.35754,7.865916,8.01733,13.162771,14.429357
obamacare,4.67052,6.241857,2.69893,5.81081,5.181158,5.950158,4.944576,14.13046,11.389783
vaaccountability,,6.17962,,,1.03478,,,,1.33673
powmiarecognitionday,,6.11426,1.14141,,,,,,
constitutionday2016,,5.199355,0.199447,,,,,,
choiceact,0.852349,5.028185,1.275,1.06429,,,2.224915,,
missamerica,0.967105,4.64282,,,,,,,


In [71]:
week9_dems = df_tags_weeks.sort_values(by = 'week9', ascending = True)
week9_dems.head(10)

Unnamed: 0,week1,week2,week3,week4,week5,week6,week7,week8,week9
flashbackfriday,1.7047,-2.42276,-2.05217,-0.004867,-0.846154,-0.911504,,2.25,-33.4513
wagegap,-1.17323,-1.645199,-1.02609,,-0.654545,,-0.857143,,-32.916
strongeramerica,,-1.345185,,,-1.0,-5.721689,-0.949151,-28.0,-32.74485
latinaequalpay,,,,,,,,,-19.20651
studentloans,,-1.120145,,-1.24752,,-16.377675,,,-18.123395
equalpay,-1.0035,-1.375968,-3.175365,-1.053165,-1.414139,-0.969027,-0.933947,-1.540167,-10.697953
protectpell,,,,,,,,,-7.085778
getcovered,,,,-1.07639,,,,,-6.854688
trabajadoras,,,,,,,,,-6.73282
parisagreement,-2.928,,-1.60954,,-4.748297,1.007904,-0.944954,,-6.132178


In [72]:
week9_reps = df_tags_weeks.sort_values(by = 'week9', ascending = False)
week9_reps.head(10)

Unnamed: 0,week1,week2,week3,week4,week5,week6,week7,week8,week9
betterway,11.940174,7.387886,7.833811,7.556914,7.35754,7.865916,8.01733,13.162771,14.429357
obamacare,4.67052,6.241857,2.69893,5.81081,5.181158,5.950158,4.944576,14.13046,11.389783
tbt,1.753045,0.727676,-0.916108,-0.367255,0.955896,-0.55058,1.770165,-0.313871,3.27988
obamacarehorror,,,,,,,,,3.14259
wikileaks,,,,,,,1.05825,0.869231,2.67347
curesnow,2.131257,1.179782,1.091992,1.875733,2.307863,1.08696,2.088168,1.375054,2.568982
mobilebanking,,,,,,,,,2.31915
china,-0.976,0.876149,0.974576,,1.008405,1.04167,0.989362,1.5,2.27723
northdakota,-1.17323,,,,,,,0.75,2.27723
obamacarefail,,,,,,,,1.340804,2.27723


# Go to the JSON for examples

In [64]:
import json
from datetime import datetime

# print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))

# search JSON for high score tweeters
with open('/data/purpletag/jsons/1478545228.json','r') as f:
    for line in f:
        data = json.loads(line)
        if (data['user']['screen_name'].lower() == 'replawrence' 
            or data['user']['screen_name'].lower() == 'repdennyheck'): # most polarized Dems during Week 2
            print(data['user']['screen_name'])
            print(data['created_at'])
            print(data['text'])
                        
# print(datetime.now().strftime('%Y-%m-%d %H:%M:%S')) # takes about 2 min to search the whole thing

2017-06-07 02:43:08
RepDennyHeck
Fri Nov 04 14:03:09 +0000 2016
RT @RepSwalwell: #FlashbackFriday to 1971, when 18 y/os got the right to vote. Young people have shaped the nation since. Let's keep that g…
RepDennyHeck
Thu Nov 03 20:17:57 +0000 2016
RT @repjohnlewis: I’ve marched, protested, been beaten and arrested--all for the right to vote. Friends of mine gave their lives. Honor the…
RepDennyHeck
Wed Nov 02 14:11:08 +0000 2016
RT @WAUTC: "Thank you @RepDennyHeck and @CristinGoodwin for a great discussion on cyber security at Camp Murray today." - UTC Chairman Dann…
RepDennyHeck
Mon Oct 31 16:37:20 +0000 2016
RT @thenewstribune: Today is last day to register to vote for the Nov. 8 election: https://t.co/gYuw1eVzvX #waelex
RepDennyHeck
Mon Oct 31 14:40:22 +0000 2016
RT @RepDianaDeGette: What's scarier than #Halloween? Big #studentloans. @HouseDemocrats want to let you refinance. #StrongerAmerica https:/…
RepDennyHeck
Thu Oct 27 18:45:01 +0000 2016
Veterans Day program to honor decorat

In [69]:
matches = list()

# search JSON for high score tags
with open('/data/purpletag/jsons/1478545228.json','r') as f:
    for line in f:
        data = json.loads(line)
        tags = data['entities']['hashtags']
        for tag in tags:
            if tag['text'].lower() == 'gunvote':
                matches.append(data['id'])
                print(data['user']['screen_name'])
                print(data['created_at'])
                print(data['text'])

print(len(matches))

RepWilson
Wed Sep 14 17:30:19 +0000 2016
American people want Congress to #DoYourJob. End gun violence in our community. Give us a vote. #NoFlyNoBuy  #CloseTheLoophole #GunVote
RepWilson
Wed Sep 14 17:10:09 +0000 2016
Enough moments of silence.  We need moments of action.  @SpeakerRyan: give us a vote #NoFlyNoBuy  #CloseTheLoophole #GunVote
RepWilson
Wed Sep 14 16:58:07 +0000 2016
Retweet if you stand with @HouseGVP call for @SpeakerRyan to allow a vote to close background check loophole #GunVote
RepWilson
Wed Sep 14 16:53:06 +0000 2016
@PewResearch: 85% of people support closing gun show loophole. @SpeakerRyan give us a vote #CloseTheLoophole #GunVote
RepWilson
Wed Sep 14 16:45:07 +0000 2016
Thousands killed w/ guns over reckless Republican recess. When will @SpeakerRyan give us a vote? #NoFlyNoBuy #CloseTheLoophole #GunVote
RepWilson
Wed Sep 14 16:32:05 +0000 2016
272 mass shootings this year. When will @SpeakerRyan give us a vote? #NoFlyNoBuy  #CloseTheLoophole #GunVote
NydiaVelazqu