# A Song of Vice and Higher: Characterizing Presidential Nominees through Game of Thrones

## Prepare to access Reddit's API 

We got the hang of using Reddit's API by following [Shropshire's article](https://towardsdatascience.com/exploring-reddits-ask-me-anything-using-the-praw-api-wrapper-129cf64c5d65).

1. Install or update PRAW from your terminal.

2. Create a Reddit account, or log in to an existing one, to begin authenticating via OAuth.

### Import Necessary Libraries

In [21]:
import os             # file system stuff
import json           # digest json
import praw           # reddit API
import pandas as pd   # Dataframes
import pymongo        # MongoDB
import helper     # Custom helper functions

### Load the API keys

3. Create your first authorized Reddit instance.

In [22]:
# Define path to secret
secret_path = os.path.join(os.environ['HOME'], 'mia/.secret', 'reddit_api.json')

In [23]:
keys = helper.get_keys(secret_path)
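
`helper` is a custom module that is not shown in this notebook. A minimal sketch of what `get_keys` might look like, assuming the secret file is a flat JSON object holding the `client_id`, `api_key`, `username`, and `password` fields read in the next cell; the real helper may differ:

```python
import json

def get_keys(path):
    """Read API credentials from a JSON file and return them as a dict.

    Hypothetical stand-in for helper.get_keys, which is not shown here.
    """
    with open(path) as f:
        return json.load(f)
```

Keeping the credentials in a file outside the repository, as `secret_path` does, keeps them out of version control.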

In [24]:
reddit = praw.Reddit(
    client_id=keys['client_id'],
    client_secret=keys['api_key'],
    username=keys['username'],
    password=keys['password'],
    user_agent='reddit_research accessAPI:v0.0.1 (by /u/FlatDubs)',
)

4. Obtain a Subreddit Instance from your Reddit Instance. Ours will come from two different subreddits.

In [25]:
politics = reddit.subreddit('politics')

In [26]:
got = reddit.subreddit('gameofthrones')

5. Obtain submission instances from your Subreddit instance and compile the submission stats into lists.
6. Create a Pandas dataframe from the submission stats.

In [27]:
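# step 5: obtain submissions through search
# Names we eventually want to cover: 'bran', 'brandon stark', 'jon snow', 'jon',
# 'khaleesi', 'dany', 'daenerys', 'danyris'
# (open question: will reddit authors be included in results?)
# Chaining them with Python's `or` evaluates to the first truthy operand,
# so only 'bran' is actually searched; we pass it explicitly here.
got_search = got.search('bran',
                        sort='comments',
                        limit=5)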

#step 5 compile submission into list
title = [] 
num_comments = []
upvote_ratio = []
sub_id = []
i=0

for submission in got_search:
    i+=1
    title.append(submission.title)
    num_comments.append(submission.num_comments)
    upvote_ratio.append(submission.upvote_ratio)
    sub_id.append(submission.id)
    if i%100 == 0:
        print(f'{i} submissions completed')

#step 6 create dataframe
df_got = pd.DataFrame(
    {'title': title,
     'num_comments': num_comments,
     'upvote_ratio': upvote_ratio,
     'id':sub_id
    })
df_got

| | title | num_comments | upvote_ratio | id |
|---|---|---|---|---|
| 0 | [S7E5] Post-Premiere Discussion - S7E5 'Eastwa... | 26053 | 0.98 | 6tjeos |
| 1 | [S6E5] Post-Premiere Discussion - S6E5 'The Door' | 17604 | 0.97 | 4klpws |
| 2 | [S6E3] Post-Premiere Discussion - S6E3 'Oathbr... | 11830 | 0.97 | 4ihick |
| 3 | [S6E2] Post-Premiere Discussion - S6E2 'Home' | 11359 | 0.98 | 4hdflw |
| 4 | [S7E5] Live Premiere Discussion - S7E5 'Eastwa... | 9374 | 0.97 | 6tj3lx |
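
Python's `or` returns its first truthy operand, so an expression such as `'bran' or 'jon snow'` evaluates to `'bran'`, and several names cannot be combined that way. One option is to join them into a single query string. A minimal sketch, assuming Reddit's search honours the `OR` operator and double-quoted phrases; `build_or_query` is a hypothetical helper, not part of PRAW:

```python
def build_or_query(terms):
    """Join search terms into one query string separated by OR.

    Multi-word terms are wrapped in double quotes so they read as phrases.
    """
    quoted = [f'"{term}"' if ' ' in term else term for term in terms]
    return ' OR '.join(quoted)

got_terms = ['bran', 'brandon stark', 'jon snow', 'khaleesi', 'daenerys']
query = build_or_query(got_terms)
print(query)  # bran OR "brandon stark" OR "jon snow" OR khaleesi OR daenerys
```

The resulting string could then be passed as `got.search(query, sort='comments', limit=5)`.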


In [28]:
#do same for politics 
dem_search = politics.search('kamala', 
                              sort='comments',
                             limit=5)

title = [] 
num_comments = []
upvote_ratio = []
sub_id = []
i=0

for submission in dem_search:
    i+=1
    title.append(submission.title)
    num_comments.append(submission.num_comments)
    upvote_ratio.append(submission.upvote_ratio)
    sub_id.append(submission.id) 
    if i%100 == 0:
        print(f'{i} submissions completed')

df_dem = pd.DataFrame(
    {'title': title,
     'num_comments': num_comments,
     'upvote_ratio': upvote_ratio,
     'id':sub_id
    })
df_dem

| | title | num_comments | upvote_ratio | id |
|---|---|---|---|---|
| 0 | Megathread: AG Willam Barr releases his top li... | 45574 | 0.88 | b50gkr |
| 1 | Megathread: President Trump delivers remarks o... | 32332 | 0.82 | 6tx8h7 |
| 2 | Megathread: Likely Explosive Devices Addressed... | 21359 | 0.9 | 9rlm9p |
| 3 | Megathread: President Trump announces a deal t... | 12928 | 0.88 | ajsubi |
| 4 | [Megathread] President Trump’s Address on Bord... | 9081 | 0.91 | ae2e7b |
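
The two cells above repeat the same loop for each subreddit. A small helper could collect the per-submission fields in one place. A minimal sketch; `submission_rows` is a hypothetical name, and it assumes each item exposes the `title`, `num_comments`, `upvote_ratio`, and `id` attributes used above:

```python
def submission_rows(submissions):
    """Return one dict per submission with the four fields collected above."""
    rows = []
    for submission in submissions:
        rows.append({
            'title': submission.title,
            'num_comments': submission.num_comments,
            'upvote_ratio': submission.upvote_ratio,
            'id': submission.id,
        })
    return rows
```

With this, `pd.DataFrame(submission_rows(got_search))` and `pd.DataFrame(submission_rows(dem_search))` would build frames with the same four columns.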


## Retrieve Comments

In [9]:
submission = reddit.submission(id=df_dem['id'][0])

In [29]:
# Instantiate list to hold comments
test_comments = []
comments_dicts = []

submission.comments.replace_more(limit=5)
for comment in submission.comments.list()[:100]:
#     print(comment.body)
    # List of comments, as strings
    test_comments.append(comment.body)

    # List of comments (dicts)
    comments_dicts.append({
        'comment': comment.body
    })
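
Some comments come back with placeholder text instead of a real body, which gives a sentiment scorer nothing to work with. A minimal sketch of skipping them while collecting; `collect_bodies` is a hypothetical helper, it assumes each comment object has a `body` attribute, and the set of placeholder strings is an assumption:

```python
PLACEHOLDER_BODIES = {'[deleted]', '[removed]'}  # assumed placeholder texts

def collect_bodies(comments, limit=100):
    """Return up to `limit` comment bodies, skipping placeholder text."""
    bodies = []
    for comment in comments:
        if len(bodies) >= limit:
            break
        if comment.body.strip() in PLACEHOLDER_BODIES:
            continue
        bodies.append(comment.body)
    return bodies
```

It could be called as `collect_bodies(submission.comments.list())`. Unlike slicing with `[:100]`, this keeps reading until it has 100 usable comments or runs out.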
    

In [67]:
# Check 
test_comments[:10]

['Lawrence O\'Donnell just reporting that the Trump 2020 campaign committee sent an email to his supporters asking them to donate to their "Official Secure the Border Fund," and that those that donated would have their name sent to Trump. Immediately after the speech, a second email went out again asking for the donations. \n\nProblem: the fine print says the money goes to his reelection campaign. \n\nHe held this press conference to scoop up campaign money.',
 'I saw a comment that USA Today Politics ([@usatodayDC](https://twitter.com/usatodayDC)) was live fact checking the address via twitter, so checked it out right at the start. Here are their tweets (including some retweets) from during and after the address. Going to work on the formatting and may continue to update it as the night goes on.\n\nQuotes are the text of the tweets. Citations are links (**[also compiled in a child comment to this post due to character limit](https://www.reddit.com/r/politics/comments/ae2e7b/megathread

In [152]:
# Put them in a dataframe, as POC
pol_df = pd.DataFrame(test_comments, columns=['comment'])

len(pol_df)

100

In [31]:
# #test to see how we'll search strings for later when we use vader
# pol_df['comment'].str.contains('joyann')
# #case sensitive
# #should write function that attributes comment to person 
# #forward slashes in links seem to operate as spaces 
# #make all comments all lowercase to simplify attributing phase

## How about some Vader Sentiment Action?

In [14]:
#pip install --upgrade vaderSentiment

In [32]:
# from vaderSentiment import vaderSentiment

# analyzer = vaderSentiment.SentimentIntensityAnalyzer()

# for comment in test_comments:
#     print(comment)
#     print(analyzer.polarity_scores(comment))
# #https://github.com/cjhutto/vaderSentiment#about-the-scoring 

In [168]:
dems_dict = {'harris': ['senator harris', 'k. harris', 'kamala'],
             'biden': ['biden'],
             'test': ['formatting', 'format']}

In [96]:
len(pol_df)

100

In [97]:
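for key, value in dems_dict.items():
    # `value` is a list of aliases; str.contains takes one string, so test each alias
    if any(pol_df['comment'].str.contains(alias, case=False, regex=False).any() for alias in value):
        print(key)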

first step
test
[these two lines repeat; remaining output truncated]

In [153]:
def tokenize_comment(comment):
    list_comment=comment.split(' ')
    return list_comment    

In [156]:
for comment in pol_df['comment']:
    comment = comment.split(' ')  # rebinds the loop variable only; pol_df is unchanged

In [157]:
pol_df.head()

| | comment |
|---|---|
| 0 | Lawrence O'Donnell just reporting that the Tru... |
| 1 | I saw a comment that USA Today Politics ([@usa... |
| 2 | [deleted] |
| 3 | Holy shit anyone watching MSNBC? This asshole... |
| 4 | Republican on CNN saying how relieved he is th... |
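
Splitting on a single space leaves punctuation attached to words and keeps each URL as one long token. A minimal sketch of a regex-based alternative that lowercases the text and keeps only runs of letters, digits, and apostrophes; `tokenize_words` is a hypothetical name, and this is only one of many reasonable tokenization choices:

```python
import re

def tokenize_words(comment):
    """Lowercase a comment and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", comment.lower())

print(tokenize_words("Lawrence O'Donnell just reporting"))
# ['lawrence', "o'donnell", 'just', 'reporting']
```

Lowercasing here also simplifies the later step of matching names against comments.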


## Work with Kena

In [213]:
pol_df['character']=''

In [215]:
def attribute_comment(df):
    """Put character's name in the character column."""
    for index, row in df.iloc[:4].iterrows():  # loop through the first four comments
        for key, value in dems_dict.items():   # loop through the alias dictionary
            for item in value:
                if item in row['comment']:     # case-sensitive substring match
                    # .loc avoids the chained assignment df['character'][index]
                    df.loc[index, 'character'] = key

In [216]:
attribute_comment(pol_df)

# Run only to reset the column before re-attributing:
# pol_df.drop(['character'], axis=1, inplace=True)

In [217]:
pol_df

| | comment | character |
|---|---|---|
| 0 | Lawrence O'Donnell just reporting that the Tru... | |
| 1 | I saw a comment that USA Today Politics ([@usa... | test |
| 2 | [deleted] | |
| 3 | Holy shit anyone watching MSNBC? This asshole... | |
| 4 | Republican on CNN saying how relieved he is th... | |
| 5 | If Mexico is going to pay for the wall, why do... | |
| 6 | President holding federal workers as hostages ... | |
| 7 | Lawrence ODonnell just pointed out that trump ... | |
| 8 | Chuck Schumer's opening statement pretty much ... | |
| 9 | Let's pretend everything he said was true, and... | |
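
The `in` test inside `attribute_comment` is case-sensitive, so a comment containing `Kamala` would not match the alias `kamala`. A minimal sketch of a case-insensitive variant that works on one comment at a time; `match_candidates` is a hypothetical helper, and `aliases` stands in for `dems_dict`:

```python
def match_candidates(comment, aliases):
    """Return the keys whose aliases occur in the comment, ignoring case."""
    text = comment.lower()
    return [key for key, names in aliases.items()
            if any(name.lower() in text for name in names)]

aliases = {'harris': ['senator harris', 'k. harris', 'kamala'],
           'biden': ['biden']}
print(match_candidates('Kamala Harris and Joe Biden spoke', aliases))
# ['harris', 'biden']
```

Returning a list rather than a single name leaves room for comments that mention more than one person. Applied to the frame, `pol_df['comment'].apply(lambda c: match_candidates(c, dems_dict))` would give one list per comment.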


In [161]:
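def attribute_comment(column_name, df):
    """Put character's name in the character column."""
    for row in df:  # note: iterating a DataFrame yields column names, not rows
        for key, value in dems_dict.items():  # loop through dictionary
            # `value` is a list of aliases, but str.contains expects a single
            # string pattern, so this call raises the TypeError shown below
            if column_name.str.contains(value, case=False).any():
                print(key)
            else:
                print('fail')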

In [162]:
attribute_comment(pol_df['comment'], pol_df)

TypeError: unhashable type: 'list'

In [None]:
def attribute_comment(column_name, df):
    """Print each key whose aliases appear in the comments."""
    for key, value in dems_dict.items():
        # Test each alias separately; str.contains takes a string, not a list
        if any(column_name.str.contains(item, case=False, regex=False).any() for item in value):
            print(key)
        else:
            print('fail')

In [123]:
string_example = 'hi werlindo!'
list_example = {'mia':'mia', 'werlindo':'werlindo', 'k':'kamala'}  # a dict, despite the name
if 'mia' in list_example:  # `in` on a dict tests its keys
    print('found')
    print(list_example.keys())
else:
    print('not found')

found
dict_keys(['mia', 'werlindo', 'k'])
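
Note that `in` on a dictionary checks its keys, not its values and not any string. A minimal sketch contrasting the membership tests, using made-up values:

```python
text = 'hi werlindo!'
names = {'k': 'kamala'}

print('k' in names)                # True: 'k' is a key
print('kamala' in names)           # False: 'kamala' is a value, not a key
print('kamala' in names.values())  # True: membership among the values
print('werlindo' in text)          # True: substring test on a string
```

For attributing comments, the substring test on the comment text is the one that matters; the dictionary only supplies the aliases to look for.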
