# Data Acquisition from Reddit

Go to <a href=#bookmark>bookmark</a>

### 2019-06-08 - Goal - Develop End-to-End Data Flow, at least at small scale.

![](https://images.unsplash.com/photo-1515255384510-23e8b6a6ca3c?ixlib=rb-1.2.1&auto=format&fit=crop&w=1489&q=80)

---

## Libraries

In [1]:
# Install libs on this computer:
# !pip install praw
# !pip install pymongo
# !pip install psycopg2

In [2]:
import os             # file system stuff
import json           # digest json
import praw           # reddit API
import pandas as pd   # Dataframes
import pymongo        # MongoDB
import numpy as np    # math and arrays

import time           # To time stuff

#DATA STORAGE
from sqlalchemy import create_engine # SQL helper
import psycopg2 as psql #PostgreSQL DBs

In [3]:
import helper     # Custom helper functions

In [4]:
import reddit_data # Custom reddit scraper

---

## 1A Load Reddit keys

Step 3: Create your first Authorized Reddit Instance

In [5]:
# Define path to secret

secret_path = os.path.join(os.environ['HOME'], '.secret', 'reddit.json')
#secret_path = os.path.join(os.environ['HOME'], 'mia/.secret', 'reddit_api.json')

secret_path

'/Users/werlindo/.secret/reddit.json'

#### Save submissions to DB

In [6]:
# Define path to secret

secret_path_aws = os.path.join(os.environ['HOME'], '.secret', 
                           'aws_ps_flatiron.json')
secret_path_aws

'/Users/werlindo/.secret/aws_ps_flatiron.json'

## 1B Load AWS-PostgreSQL DB keys

#### Load keys

In [7]:
aws_keys = helper.get_keys(secret_path_aws)
user = aws_keys['user']
ps = aws_keys['password']
host = aws_keys['host']
db = aws_keys['db_name']
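
`helper.get_keys` is a custom function whose source is not shown in this notebook. A minimal sketch of what such a loader might look like, assuming the secret file is a flat JSON object. The function body and the demo key values below are assumptions, not the actual helper.

```python
import json
import os
import tempfile


def get_keys(path):
    """Read a JSON credentials file and return its contents as a dict.

    Hypothetical stand-in for helper.get_keys; the real helper may differ.
    """
    with open(path) as f:
        return json.load(f)


# Demo with a throwaway file so no real secret is needed.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_keys.json")
    with open(demo_path, "w") as f:
        json.dump({"user": "demo_user", "password": "not-a-real-password"}, f)
    demo_keys = get_keys(demo_path)

print(sorted(demo_keys))
```

Keeping the file under `~/.secret` and outside the repo, as done above, keeps the credentials out of version control.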

In [8]:
aws_ps_engine = ('postgresql://' + user + ':' + ps + '@' + host + '/' + db)

### Use SQLAlchemy to create PSQL engine

In [9]:
# dialect+driver://username:password@host:port/database
sql_alch_engine = create_engine(aws_ps_engine)
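
Plain string concatenation works as long as the password contains no characters that are reserved in URLs (such as `@` or `/`). A hedged sketch of a safer way to build the same URL, assuming only the standard library; `build_postgres_url` and the demo values are hypothetical names for illustration.

```python
from urllib.parse import quote_plus


def build_postgres_url(user, password, host, db_name, port=5432):
    """Build a dialect://user:password@host:port/database URL string."""
    # Escape user and password so reserved characters do not break URL parsing.
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db_name}"
    )


demo_url = build_postgres_url("demo_user", "p@ss/word", "db.example.com", "demo_db")
print(demo_url)
# postgresql://demo_user:p%40ss%2Fword@db.example.com:5432/demo_db
```

The resulting string can be passed to `create_engine` in the same way as `aws_ps_engine`.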

## 2 Load keys, Create Reddit Instance

In [10]:
keys = helper.get_keys(secret_path)

In [11]:
reddit = praw.Reddit(client_id=keys['client_id'] 
                     ,client_secret=keys['api_key']
                     ,username=keys['username']
                     ,password=keys['password']
                     ,user_agent='reddit_research accessAPI:v0.0.1 (by /u/FlatDubs)')

---

## 3 Initialize parameters for this submissions pull

https://ballotpedia.org/Presidential_candidates,_2020

https://en.wikipedia.org/wiki/List_of_Game_of_Thrones_characters

In [12]:
subreddit_nm = 'gameofthrones'

# query = """
#         "qyburn" OR "yara"
        
#         """

query = "harry strickland"

results_lim = 1000

nm_subs_tbl = 'got_subs'
nm_comms_tbl = 'got_comms'

### Create lists of multiple searches so they can be run in a loop.

In [39]:
queries = [
# """ "qyburn" OR "yara" OR "shaggydog" OR "greywind" OR "summer" """
# ,""" "petyr" OR "greyworm" OR "olenna" OR "pycelle" OR "ghost"  """
# ,""" "lady" OR "nymeria"  """   
# ,
#     """ "beto" OR "bernie" OR "julian" OR "tulsi" OR "cory" OR "elizabeth" OR "kirsten" """    
#     """ "gray worm" OR "grayworm" OR "tormund" OR "giantsbane" OR "night king" """
#    """  " nk " OR "lyanna" OR "oberyn" OR "red viper" """
    """ "deblasio" OR "blasio" """
    ,
    """ "nightking" OR "tirion" OR "brianne" OR "euron" OR "cercei" OR "jaime" """
    ,
    """ "varis" OR "joff" OR "danny" OR "danaerys" OR "greyworm" OR "daenarys" """
]

In [40]:
subreds = [
            'politics'
#            ,'gameofthrones'
#            ,'gameofthrones'
#            ,
           ,'gameofthrones'
           ,'gameofthrones'
    
          ]

In [41]:
sub_tbls = [
            'supp_dem_subs'
#             ,'supp_got_subs'
           ,'supp_got_subs'
#            ,
            ,'supp_got_subs'
]

In [42]:
comm_tbls = [
                'supp_dem_comms'
               ,'supp_got_comms'
#            ,'supp_got_comms'
#            ,
               ,'supp_got_comms'
]

In [43]:
search_set = zip(subreds, queries, sub_tbls, comm_tbls)

In [44]:
search_set = list(search_set)

In [45]:
search_set

[('politics', ' "deblasio" OR "blasio" ', 'supp_dem_subs', 'supp_dem_comms'),
 ('gameofthrones',
  ' "nightking" OR "tirion" OR "brianne" OR "euron" OR "cercei" OR "jaime" ',
  'supp_got_subs',
  'supp_got_comms'),
 ('gameofthrones',
  ' "varis" OR "joff" OR "danny" OR "danaerys" OR "greyworm" OR "daenarys" ',
  'supp_got_subs',
  'supp_got_comms')]
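
One caveat with this approach: `zip` silently stops at the shortest list, so commenting out an entry in one list but not the others drops or misaligns searches without warning. A small sketch of a guard, assuming the same four parallel lists; `SearchSpec` and `build_search_set` are hypothetical names, and the demo values are shortened.

```python
from collections import namedtuple

# Named fields read more clearly than search[0], search[1], ...
SearchSpec = namedtuple(
    "SearchSpec", ["subreddit", "query", "subs_table", "comms_table"]
)


def build_search_set(subreds, queries, sub_tbls, comm_tbls):
    """Zip the four parallel lists, failing loudly if their lengths differ."""
    lengths = [len(subreds), len(queries), len(sub_tbls), len(comm_tbls)]
    if len(set(lengths)) != 1:
        raise ValueError(f"Input lists differ in length: {lengths}")
    return [SearchSpec(*row) for row in zip(subreds, queries, sub_tbls, comm_tbls)]


demo_set = build_search_set(
    ["politics", "gameofthrones"],
    [' "deblasio" OR "blasio" ', ' "euron" OR "jaime" '],
    ["supp_dem_subs", "supp_got_subs"],
    ["supp_dem_comms", "supp_got_comms"],
)
for spec in demo_set:
    print(spec.subreddit, "->", spec.subs_table, spec.comms_table)
```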

---

## 4 Get subreddit submissions and their comments

The following **for** loop goes through the search set and obtains the related submissions.

In [46]:
for search in search_set:
    reddit_data.get_subred_subs_coms(praw_reddit=reddit
                        ,sql_alch_engine=sql_alch_engine
                        ,subreddit_nm=search[0]
                        ,query=search[1]
                        ,results_lim=results_lim
                        ,nm_subs_tbl=search[2]
                        ,nm_comms_tbl=search[3]
                        ,)

Starting: Thu Jun 13 13:21:45 2019

Searching on these terms:

 "deblasio" OR "blasio" 


Retrieved submissions.


Retrieved comments.
Writing to supp_dem_subs


Writing to supp_dem_comms



Finished: Thu Jun 13 13:23:44 2019


It took 1.97 minutes to complete.


There were 55 submissions added.


There were 848 comments added.


Starting: Thu Jun 13 13:23:44 2019

Searching on these terms:

 "nightking" OR "tirion" OR "brianne" OR "euron" OR "cercei" OR "jaime" 


100 submissions completed
200 submissions completed
Retrieved submissions.


Retrieved comments.
Writing to supp_got_subs


Writing to supp_got_comms



Finished: Thu Jun 13 13:48:30 2019


It took 24.77 minutes to complete.


There were 245 submissions added.


There were 27,636 comments added.


Starting: Thu Jun 13 13:48:30 2019

Searching on these terms:

 "varis" OR "joff" OR "danny" OR "danaerys" OR "greyworm" OR "daenarys" 


100 submissions completed
200 submissions completed
Retrieved submissions.


Retrieved commen
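
The source of `reddit_data.get_subred_subs_coms` is not included in this notebook. A rough sketch of what one search pass might look like, assuming PRAW's `subreddit(...).search(...)`, `comments.replace_more(...)`, and `comments.list()` calls. The function name and the returned fields are assumptions, and the database write is left out.

```python
def fetch_submissions_and_comments(reddit, subreddit_nm, query, results_lim):
    """Collect matching submissions and all of their comments as plain dicts.

    `reddit` is expected to be an authorized praw.Reddit instance.
    Hypothetical sketch; the real scraper may collect different fields.
    """
    subs, comms = [], []
    for submission in reddit.subreddit(subreddit_nm).search(query, limit=results_lim):
        subs.append({"id": submission.id, "title": submission.title})
        # Expand every "load more comments" stub; this triggers extra
        # API requests, so large threads take noticeably longer.
        submission.comments.replace_more(limit=None)
        for comment in submission.comments.list():
            comms.append(
                {
                    "id": comment.id,
                    "submission_id": submission.id,
                    "body": comment.body,
                }
            )
    return subs, comms
```

The two lists could then be turned into DataFrames and appended to the target tables, which would match the "Writing to ..." messages in the output.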

---

That should be all there is to getting data now. 
- Note that results are only appended to the tables, so duplicates will eventually need to be removed on either the SQL or the Python side.
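
A minimal sketch of the Python-side option, assuming each row carries a unique identifier in a column named `id` (the column name is an assumption; the table schema is not shown here). It keeps the first occurrence of each identifier.

```python
def drop_duplicate_rows(rows, key="id"):
    """Return rows with repeated `key` values removed, keeping the first seen."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


demo_rows = [
    {"id": "abc", "body": "first pull"},
    {"id": "def", "body": "another comment"},
    {"id": "abc", "body": "same comment, pulled again by an overlapping query"},
]
deduped = drop_duplicate_rows(demo_rows)
print([row["id"] for row in deduped])
# ['abc', 'def']
```

With the data already in a DataFrame, `df.drop_duplicates(subset='id')` does the same job.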

---

# Prior Runs

For simple note-taking, I took the printed output above and saved it in the section below as markdown.

## Game of Thrones

persons = """
"doran" OR "davos"
"""

persons = """
            "bran" OR 'brandon stark' OR 'jon snow' OR 'jon' 
                         OR 'khaleesi' OR 'dany' OR 'daenerys' OR 'danyris'
          """
          
It took 14.21 minutes to complete.
There were 249 submissions added.
There were 11272 comments added.

persons = """
            "cersei" OR 'tyrion' OR 'sansa' OR 'arya' 
                        OR 'stannis' OR 'varys' OR 'jamie' OR 'brienne'
"""

It took 92.47 minutes to complete.  
There were 246 submissions added.  
There were 65,896 comments added.

persons = """
            "samwell" OR "jorah" OR "theon" OR "hound" OR "littlefinger" 
          """

It took 30.70 minutes to complete.  
There were 246 submissions added.  
There were 30,374 comments added.  

persons = """
            "joffrey" OR "sandor" OR "mountain" OR "gregor" OR "baelish" 
          """  
          
It took 24.55 minutes to complete.  
There were 249 submissions added.  
There were 23,146 comments added.

persons = """
            "robb" OR "drogo" OR "melisandre" OR "bronn" OR "gilly" OR
            "ramsey" OR "missandei" OR "gendry" OR "grey worm"
          """
          
It took 23.15 minutes to complete.  
There were 249 submissions added.  
There were 28,920 comments added.  

persons = """
            "ned" OR "eddard" OR "catelyn" OR "bronn" OR "torumund" OR
            "robert" OR "tommen" OR "viserys" OR "margaery"
          """
          
It took 21.79 minutes to complete.  
There were 250 submissions added.  
There were 23,731 comments added.    

---

query = """
        "lyanna mormont"  OR "jaqen" OR "hodor" OR "ygritte" OR "mance" OR "hodor" OR "ramsay"
        """
        
Starting: Wed Jun 12 06:00:05 2019

100 submissions completed
200 submissions completed
Retrieved submissions.
Retrieved comments.
Writing to got_subs
Writing to got_comms

Finished: Wed Jun 12 06:21:19 2019

It took 21.23 minutes to complete.
There were 248 submissions added.
There were 8,343 comments added.

---

query = """
        "oberyn" OR "viper" OR "tormund" OR "tywin" OR "night king"
        """
        
Starting: Wed Jun 12 06:00:43 2019

100 submissions completed
200 submissions completed
Retrieved submissions.
Retrieved comments.
Writing to got_subs
Writing to got_comms

Finished: Wed Jun 12 06:34:19 2019

It took 33.61 minutes to complete.
There were 249 submissions added.
There were 24,784 comments added.


## Democratic Candidates

persons = """
            "kamala" OR "senator harris" OR "K. Harris" OR "biden" OR 
            "mayor pete" OR "buttigidg" OR "buttigieg" OR "bootijedge"
        """
        
It took 80.46 minutes to complete.  
There were 249 submissions added.  
There were 61,298 comments added.

persons = """
            "gillibrand" OR "hickenlooper" OR "klobuchar" OR "warren" OR
            "booker" OR "inslee" OR "castro" OR "gabbard" OR "sanders" 
        """
It took 122.33 minutes to complete.  
There were 250 submissions added.  
There were 95,034 comments added.  

persons = """
            "de blasio" OR "bullock" OR "gravel" OR "messam"  
        """  
        
It took 7.84 minutes to complete.  
There were 91 submissions added.  
There were 2,486 comments added.

persons = """
            "o'rourke"  
        """
        
It took 17.92 minutes to complete.  
There were 87 submissions added.  
There were 7,150 comments added.

persons = """
            "bennet" OR "delaney" OR "moulton" OR "swalwell" OR "williamson"
            OR "yang"
        """  
        
It took 18.63 minutes to complete.  
There were 97 submissions added.  
There were 6,907 comments added.

-------

query = """
        "amy klobuchar" OR "wayne messam" OR "seth moulton" OR 
        "beto o'rourke" OR "tim ryan"
        """

Starting: Mon Jun 10 07:27:24 2019

Retrieved submissions.
Retrieved comments.
Writing to dems_subs
Writing to dems_comms

Finished: Mon Jun 10 07:34:19 2019

It took 6.92 minutes to complete.
There were 73 submissions added.
There were 5,528 comments added.

---

query = """
        "bernie sanders" OR "eric swalwell" OR "elizabeth warren" OR "marianne williamson" OR "andrew yang"
        """
        
Starting: Tue Jun 11 06:21:48 2019

100 submissions completed
200 submissions completed
Retrieved submissions.
Retrieved comments.
Writing to dems_subs
Writing to dems_comms

Finished: Tue Jun 11 07:12:29 2019

It took 50.68 minutes to complete.
There were 250 submissions added.
There were 61,467 comments added.

---

query ="""
         "michael bennet" OR "joe biden" OR "bill de blasio" OR "cory booker" OR "steve bullock"
        """

Starting: Tue Jun 11 20:59:57 2019

100 submissions completed
200 submissions completed
Retrieved submissions.
Retrieved comments.
Writing to dems_subs
Writing to dems_comms

Finished: Tue Jun 11 21:23:51 2019

It took 23.90 minutes to complete.
There were 248 submissions added.
There were 27,128 comments added.

---
query = """
        "pete buttigieg" OR "julián castro" OR "john delaney" OR "tulsi gabbard" OR "kirsten gillibrand"
        """

Starting: Tue Jun 11 21:25:58 2019

100 submissions completed
Retrieved submissions.
Retrieved comments.
Writing to dems_subs
Writing to dems_comms

Finished: Tue Jun 11 21:36:36 2019

It took 10.63 minutes to complete.
There were 125 submissions added.
There were 9,677 comments added.

---
query = """
        "mike gravel" OR "kamala harris" OR "john hickenlooper" OR "jay inslee"
        """

Starting: Tue Jun 11 22:46:07 2019

100 submissions completed
Retrieved submissions.
Retrieved comments.
Writing to dems_subs
Writing to dems_comms

Finished: Tue Jun 11 22:53:42 2019

It took 7.58 minutes to complete.
There were 126 submissions added.
There were 6,073 comments added.

### Combo searches


Starting: Thu Jun 13 05:58:54 2019

Searching on these terms:
 "beto" OR "bernie" OR "julian" OR "tulsi" OR "cory" OR "elizabeth" OR "kirsten" 
100 submissions completed
200 submissions completed
Retrieved submissions.
Retrieved comments.
Writing to supp_dem_subs
Writing to supp_dem_comms

Finished: Thu Jun 13 06:56:47 2019

It took 57.88 minutes to complete.
There were 249 submissions added.
There were 69,317 comments added.

---
Starting: Thu Jun 13 11:01:05 2019

Searching on these terms:

 "gray worm" OR "grayworm" OR "tormund" OR "giantsbane" OR "night king" 


100 submissions completed
200 submissions completed
Retrieved submissions.


Retrieved comments.
Writing to supp_got_subs


Writing to supp_got_comms



Finished: Thu Jun 13 11:21:04 2019

---

Starting: Thu Jun 13 11:23:40 2019

Searching on these terms:

  " nk " OR "lyanna" OR "oberyn" OR "red viper" 


100 submissions completed
200 submissions completed
Retrieved submissions.


Retrieved comments.
Writing to supp_got_subs


Writing to supp_got_comms



Finished: Thu Jun 13 11:37:07 2019


It took 13.44 minutes to complete.


There were 248 submissions added.


There were 11,653 comments added.

---
Starting: Thu Jun 13 13:21:45 2019

Searching on these terms:

 "deblasio" OR "blasio" 


Retrieved submissions.


Retrieved comments.
Writing to supp_dem_subs


Writing to supp_dem_comms



Finished: Thu Jun 13 13:23:44 2019


It took 1.97 minutes to complete.


There were 55 submissions added.


There were 848 comments added.


Starting: Thu Jun 13 13:23:44 2019

Searching on these terms:

 "nightking" OR "tirion" OR "brianne" OR "euron" OR "cercei" OR "jaime" 


100 submissions completed
200 submissions completed
Retrieved submissions.


Retrieved comments.
Writing to supp_got_subs


Writing to supp_got_comms



Finished: Thu Jun 13 13:48:30 2019


It took 24.77 minutes to complete.


There were 245 submissions added.


There were 27,636 comments added.


Starting: Thu Jun 13 13:48:30 2019

Searching on these terms:

 "varis" OR "joff" OR "danny" OR "danaerys" OR "greyworm" OR "daenarys" 


100 submissions completed
200 submissions completed
Retrieved submissions.


Retrieved comments.
Writing to supp_got_subs


Writing to supp_got_comms



Finished: Thu Jun 13 14:05:46 2019


It took 17.28 minutes to complete.


There were 248 submissions added.


There were 13,301 comments added.


## Checking the obtained data <a name='bookmark' />

### f. Check that the table was created, or can be appended.

In [47]:
# Setup PSQL connection
conn = psql.connect(
    database=db,
    user=user,
    password=ps,
    host=host,
    port='5432'
)

In [48]:
# QUERY TO GET LIST OF TABLES
query = """
    SELECT * FROM pg_catalog.pg_tables
    WHERE schemaname = 'public';
"""

In [49]:
# Instantiate cursor
cur = conn.cursor()

In [1]:
### Set up query
query = """
    SELECT count(*) ct FROM got_comms;
"""

In [50]:
# Execute the query
cur.execute(query)

_Run `rollback` if an error occurs on query execution._

In [None]:
# conn.rollback()
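
The execute, rollback, and fetch steps can also be bundled so that a failed query never leaves the connection stuck. A sketch under stated assumptions: `run_query` is a hypothetical helper, and `sqlite3` stands in for `psycopg2` here only so the example runs without a PostgreSQL server. Both follow the Python DB-API, so the same pattern should carry over.

```python
import sqlite3


def run_query(conn, sql):
    """Run a SELECT; roll back on failure so the connection stays usable."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        rows = cur.fetchall()
        # The first item of each description entry is the column name.
        columns = [col[0] for col in cur.description]
        conn.commit()
        return columns, rows
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()


# Demo against an in-memory database with a stand-in table.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE got_comms (id TEXT)")
demo_conn.executemany("INSERT INTO got_comms VALUES (?)", [("c1",), ("c2",)])
demo_conn.commit()

try:
    run_query(demo_conn, "SELECT * FROM missing_table")
except sqlite3.Error as exc:
    print("query failed and was rolled back:", exc)

columns, rows = run_query(demo_conn, "SELECT count(*) AS ct FROM got_comms")
print(columns, rows)
demo_conn.close()
```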

In [51]:
# Check results
df_clone = pd.DataFrame(cur.fetchall())
df_clone.columns = [col.name for col in cur.description]

In [52]:
conn.commit()

In [53]:
df_clone

| | schemaname | tablename | tableowner | tablespace | hasindexes | hasrules | hastriggers | rowsecurity |
|---|---|---|---|---|---|---|---|---|
| 0 | public | dems_subs | dsaf | | True | False | False | False |
| 1 | public | dems_comms | dsaf | | True | False | False | False |
| 2 | public | test_table | dsaf | | True | False | False | False |
| 3 | public | comms_sntmnt | dsaf | | True | False | False | False |
| 4 | public | comms_sntmnt_2 | dsaf | | True | False | False | False |
| 5 | public | supp_got_subs | dsaf | | True | False | False | False |
| 6 | public | supp_got_comms | dsaf | | True | False | False | False |
| 7 | public | pitches_test | dsaf | | True | False | False | False |
| 8 | public | got_subs | dsaf | | True | False | False | False |
| 9 | public | got_comms | dsaf | | True | False | False | False |


Looks good! Close the connection.

In [54]:
conn.close()