In [1]:
import praw
import pandas as pd
from datetime import datetime 
import configparser
import os
import time

Read the Reddit API credentials from the config file

In [2]:
parser = configparser.ConfigParser()
parser.read('reddit_config.cfg')
clid = parser.get('my_api','client_id')
clsec = parser.get('my_api','client_secret')
usag = parser.get('my_api','user_agent')
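
The lookups above imply that 'reddit_config.cfg' has a [my_api] section with three keys. A minimal sketch of that layout, parsed from an in-memory string so it runs without the file. The values are placeholders and the demo_* names are made up for this illustration:

```python
import configparser

# Placeholder values only; substitute the credentials of your own Reddit app.
example_cfg = """
[my_api]
client_id = YOUR_CLIENT_ID
client_secret = YOUR_CLIENT_SECRET
user_agent = my-script/0.1 by u/your_username
"""

demo_parser = configparser.ConfigParser()
demo_parser.read_string(example_cfg)

# Same section and key names as the notebook cell, read from the string above.
demo_clid = demo_parser.get('my_api', 'client_id')
demo_clsec = demo_parser.get('my_api', 'client_secret')
demo_usag = demo_parser.get('my_api', 'user_agent')
print(demo_clid, demo_usag)
```

Keeping the real file out of version control avoids publishing the client secret along with the notebook.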

Establish connection to Reddit API with PRAW

In [3]:
r = praw.Reddit(client_id=clid, client_secret=clsec, user_agent=usag)

Goal: find the Daily Discussion threads for 4 days on which the fear and greed index was at a high or low extreme, per https://coingecko.live/en/fear-and-greed-index

Columns: date, index score, submission ID, approximate number of comments

Greedy
* 10/21/2021, 84, qcekzf, 14000
* 08/23/2021, 79, p9o1td, 59000

Fear
* 12/06/2021, 16, r9t7ks, 17000
* 01/08/2022, 10, rylwuh, 12700
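
The list above can also be kept as data, so that each comment can later be tagged as coming from a greed or fear day. A minimal sketch, assuming the names threads, mood_by_id and thread_ids (none of which appear elsewhere in this notebook); the values are copied from the list:

```python
# One record per thread; n_comments is the approximate figure from the list above.
threads = [
    {"date": "2021-10-21", "score": 84, "id": "qcekzf", "n_comments": 14000, "mood": "greed"},
    {"date": "2021-08-23", "score": 79, "id": "p9o1td", "n_comments": 59000, "mood": "greed"},
    {"date": "2021-12-06", "score": 16, "id": "r9t7ks", "n_comments": 17000, "mood": "fear"},
    {"date": "2022-01-08", "score": 10, "id": "rylwuh", "n_comments": 12700, "mood": "fear"},
]

# Lookup from submission ID to mood label, and the plain list of IDs.
mood_by_id = {t["id"]: t["mood"] for t in threads}
thread_ids = [t["id"] for t in threads]
print(thread_ids)
```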

In [4]:
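# Submission IDs of the four Daily Discussion threads listed above
sublist = ['qcekzf','p9o1td','r9t7ks','rylwuh']
# sublist = ['rylwuh']  # single thread, for quick tests
lim = 100  # max number of MoreComments placeholders replace_more() expands per submission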

Collect the comments for each submission in the list 'sublist' and save them into a dataframe for later analysis.

This cell takes a long time to run because of PRAW's 'replace_more' function, which makes one API request for each "load more comments" placeholder it expands. For testing, reduce the 'lim' setting above: it gives fewer comments but makes fewer requests through 'replace_more'.
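
For checking the row-building logic without waiting on the API, the same fields can be pulled from stand-in objects. A minimal sketch: comments_to_rows and fake_comments are hypothetical names for this illustration, and the stand-ins only mimic the two attributes (body, created_utc) that the notebook reads from PRAW comments:

```python
from datetime import datetime
from types import SimpleNamespace

def comments_to_rows(comments, sub):
    """Turn comment-like objects (with .body and .created_utc) into dict rows."""
    rows = []
    for c in comments:
        rows.append({
            "comment": c.body,
            "dt": str(datetime.fromtimestamp(c.created_utc)),
            "sub": sub,
        })
    return rows

# Stand-ins for PRAW comment objects; the text and timestamps are invented.
fake_comments = [
    SimpleNamespace(body="to the moon", created_utc=1634774400.0),
    SimpleNamespace(body="selling everything", created_utc=1634778000.0),
]

rows = comments_to_rows(fake_comments, "qcekzf")
print(len(rows), rows[0]["comment"])
```

A list of such rows can be passed straight to pd.DataFrame to get the 'comment', 'dt' and 'sub' columns.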

In [5]:
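frames = []  # one dataframe per submission

for sub in sublist:
    tic = time.perf_counter()
    comment_list = []
    dt_list = []

    submission = r.submission(id=sub)
    # Expand up to 'lim' MoreComments placeholders; any left over are removed from the tree
    submission.comments.replace_more(limit=lim)
    for comment in submission.comments.list():
        comment_list.append(comment.body)
        # created_utc is epoch seconds; fromtimestamp() renders it in the local time zone
        dt_list.append(str(datetime.fromtimestamp(comment.created_utc)))

    df = pd.DataFrame(list(zip(comment_list, dt_list)), columns=['comment','dt'])
    df['sub'] = sub
    toc = time.perf_counter()
    print(f"Extract for sub {sub} took {toc - tic:0.1f} seconds")

    frames.append(df)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df_collect = pd.concat(frames)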

Extract for sub qcekzf took 102.3 seconds
Extract for sub p9o1td took 519.2 seconds
Extract for sub r9t7ks took 109.8 seconds
Extract for sub rylwuh took 88.6 seconds
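
Reddit's created_utc field is Unix epoch seconds in UTC. If the 'dt' strings need to be the same regardless of the machine's time zone, the conversion can be pinned to UTC. A minimal sketch; utc_string is a hypothetical helper, not part of the notebook:

```python
from datetime import datetime, timezone

def utc_string(epoch_seconds):
    """Format Unix epoch seconds as a UTC 'YYYY-MM-DD HH:MM:SS' string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")

# 1634774400 is midnight UTC at the start of 21 October 2021.
example = utc_string(1634774400.0)
print(example)  # 2021-10-21 00:00:00
```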


Save the dataframe to a pickle file for later use.

In [6]:
os.makedirs('data', exist_ok=True)  # to_pickle fails if the target directory is missing
df_collect.to_pickle('data/reddit_extract.pkl')

In [7]:
df_collect['sub'].value_counts()

p9o1td    10375
r9t7ks     7819
qcekzf     7605
rylwuh     6245
Name: sub, dtype: int64
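
These counts are well below the comment totals listed for each thread earlier, which is expected with a capped 'replace_more'. A rough sketch of the fraction retrieved per thread, assuming the earlier totals are approximate; the names retrieved, listed and coverage are introduced here only for illustration:

```python
# Counts from the value_counts() output above.
retrieved = {"p9o1td": 10375, "r9t7ks": 7819, "qcekzf": 7605, "rylwuh": 6245}
# Approximate totals from the thread list near the top of the notebook.
listed = {"p9o1td": 59000, "r9t7ks": 17000, "qcekzf": 14000, "rylwuh": 12700}

# Fraction of each thread's listed comments that ended up in the dataframe.
coverage = {sub: retrieved[sub] / listed[sub] for sub in retrieved}
for sub, frac in coverage.items():
    print(f"{sub}: {frac:.1%} of listed comments retrieved")
```

The largest thread has the lowest coverage, so comparisons across threads are based on unequal samples.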