# **Generic Reddit Scraper** 

We are going to scrape reddit today. Since we are trying to analyze the sentiments for finanicial data, we are scraping a specific subreddit in this example, you can use this for any other subrredit too. 

We will be using the Praw Library to parse Reddit. Please refer the documentation for more details: https://praw.readthedocs.io/en/latest/code_overview/ Alternatively, please have your own secrets.

You can find the instructions to create a secret on [Github Repo](https://github.com/JosephLai241/Universal-Reddit-Scraper) in 
**How to get Reddit API Credentials.** 
Lets start the fun!


In [161]:
!pip install praw



Necessary Imports

In [0]:
import argparse
import sys
import praw
from prawcore import NotFound, PrawcoreException
import csv
import json
import datetime as dt
import pandas as pd

Initialize your reddit secret developer keys here

In [0]:
c_id = ""               # Personal Use Script (14 char)
c_secret = ""           # Secret key (27 char)
u_a = ""               # App name
usrnm = ""      # Reddit username
passwd = ""     # Reddit login password

Lets login to reddit using the secret token key:

In [0]:
reddit = praw.Reddit(client_id = c_id, 
                         client_secret = c_secret, 
                         user_agent = u_a, 
                         username = usrnm, 
                         password = passwd)

We will be scraping the wallstreetbets for financial posts and comments, you can simply replace any subreddit here and the code should work for you.

In [0]:
yoursubbreddit='wallstreetbets'
subbreddit=reddit.subreddit('wallstreetbets')

So the way it works with reddit is, There are multiple subrredits, consider like forums, for us its r/wallstreetbets. Inside each subrredit, there are posts made by users and each post has nested comments.

In summary the structure:

Reddit -> Subreddits (r/wallstreetbets) -> posts -> comments -> comments/MutiComments

Okay, so now we will fetch some top posts in the subreddit and parse all the nested comments and populate our dataframe (commentData)

In [0]:
commentsColumns=[
  'total_awards_received ',
 'approved_at_utc ',
 'author_flair_template_id ',
 'likes ',
 'user_reports ',
 'saved ',
 'id ',
 'banned_at_utc ',
 'mod_reason_title ',
 'gilded ',
 'archived ',
 'no_follow ',
 'author ',
 'score ',
 'author_fullname ',
 'report_reasons ',
 'approved_by ',
 'all_awardings ',
 'subreddit_id ',
 'body ',
 'edited ',
 'author_flair_css_class ',
 'is_submitter ',
 'downs ',
 'author_flair_richtext ',
 'subreddit ',
 'author_flair_text_color ',
 'score_hidden ',
 'permalink ',
 'num_reports ',
 'locked ',
 'name ',
 'created ',
 'author_flair_text ',
 'collapsed ',
 'created_utc ',
 'subreddit_name_prefixed ',
 'controversiality ']

##removing spaces 
commentColumns= [c.strip() for c in commentsColumns]

In [172]:
print(commentColumns)

['total_awards_received', 'approved_at_utc', 'author_flair_template_id', 'likes', 'user_reports', 'saved', 'id', 'banned_at_utc', 'mod_reason_title', 'gilded', 'archived', 'no_follow', 'author', 'score', 'author_fullname', 'report_reasons', 'approved_by', 'all_awardings', 'subreddit_id', 'body', 'edited', 'author_flair_css_class', 'is_submitter', 'downs', 'author_flair_richtext', 'subreddit', 'author_flair_text_color', 'score_hidden', 'permalink', 'num_reports', 'locked', 'name', 'created', 'author_flair_text', 'collapsed', 'created_utc', 'subreddit_name_prefixed', 'controversiality']


Once we have the subrredit object we can get hold of the top/hot posts and limit by a limit parameters across time horizon of day/week/monthly/yearly

After that, we fetch all the comments and populate the dataframe with all columns from the commment Object to dataframe. 

In [167]:
topPosts=subbreddit.top(limit=1,time_filter='day')
totalPosts=0
totalComments=0
badData=0
commentsData=pd.DataFrame(columns=commentColumns)
for post in topPosts:
  totalPosts=totalPosts+1
  print("POST TITTLE")
  print(post.title)
  post.comments.replace_more(limit=None)
  title=post.title
  for comment in post.comments.list():
    try:
      if(type(comment) == praw.models.reddit.comment.Comment):
          totalComments=totalComments+1
          tempList=[getattr(comment,j.strip()) for j in commentColumns] +[title]
          commentsData=commentsData.append(pd.DataFrame([tempList], columns = commentColumns+ ['Title']))
    except:
      badData=badData+1
      
        
print("Total Posts parsed", totalPosts)
print("TotalComments Parse across Posts", totalComments)
print("Deserialization Failed for some Object. Dirty Data etc", badData)     

POST TITTLE
German state finance minister committed suicide.


of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.


  sort=sort,


Total Posts parsed 1
TotalComments Parse across Posts 1423
Deserialization Failed for some Object. Dirty Data etc 16


In [0]:
commentsData.to_csv('DailyTopPostsComments.csv')

In [169]:
commentsData

Unnamed: 0,Title,all_awardings,approved_at_utc,approved_by,archived,author,author_flair_css_class,author_flair_richtext,author_flair_template_id,author_flair_text,author_flair_text_color,author_fullname,banned_at_utc,body,collapsed,controversiality,created,created_utc,downs,edited,gilded,id,is_submitter,likes,locked,mod_reason_title,name,no_follow,num_reports,permalink,report_reasons,saved,score,score_hidden,subreddit,subreddit_id,subreddit_name_prefixed,total_awards_received,user_reports
0,German state finance minister committed suicide.,[],,,False,Devianted90,,[],,,,t2_4nmq04fq,,He was a STATE finance minister. The state of ...,False,0,1.585521e+09,1.585492e+09,0,False,0,fltzog6,False,,False,,t1_fltzog6,False,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,3166,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
0,German state finance minister committed suicide.,[],,,False,changechange1,,[],,,,t2_14c9i2j,,Tragic. Left behind a wife and 2 kids. Rip bro x,False,0,1.585513e+09,1.585484e+09,0,False,0,fltqspo,False,,False,,t1_fltqspo,False,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,4573,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
0,German state finance minister committed suicide.,[],,,False,fufm,,[],,,,t2_xmdqk,,Jesus...hope this isn’t the start of a trend,False,0,1.585513e+09,1.585484e+09,0,False,0,fltr3vf,False,,False,,t1_fltr3vf,False,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,2691,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
0,German state finance minister committed suicide.,[],,,False,TheSocDoc,,[],,,,t2_1493wl,,Is it really that bad? Holy fuck,False,0,1.585512e+09,1.585484e+09,0,False,0,fltqdgz,False,,False,,t1_fltqdgz,False,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,1793,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
0,German state finance minister committed suicide.,[],,,False,HolidayPotential8,,[],,,,t2_4mhh1mzw,,Sad. I bet some of us feel this way when we lo...,False,0,1.585513e+09,1.585484e+09,0,False,0,fltr1dh,False,,False,,t1_fltr1dh,False,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,544,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
0,German state finance minister committed suicide.,[],,,False,armandhammr,,"[{'e': 'text', 't': 'weak and alone'}]",,weak and alone,dark,t2_5dwv4mr,,"Cool story bro, me dumb. Me not know what anyt...",False,0,1.585541e+09,1.585512e+09,0,False,0,fluxm8c,False,,False,,t1_fluxm8c,True,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,1,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
0,German state finance minister committed suicide.,[],,,False,bonemaster5000,,[],,,,t2_4w9w5gkz,,Says the guy who has literally not given a sin...,False,0,1.585536e+09,1.585507e+09,0,False,0,flup3x0,False,,False,,t1_flup3x0,True,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,1,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
0,German state finance minister committed suicide.,[],,,False,BoldIntrepid,,[],,,,t2_bzd8w,,F,False,0,1.585524e+09,1.585496e+09,0,False,0,flu5gk3,False,,False,,t1_flu5gk3,True,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,2,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
0,German state finance minister committed suicide.,[],,,False,SwiFT808-,,[],,,,t2_2u3s8god,,I cited two sources one being literal governme...,False,0,1.585536e+09,1.585508e+09,0,False,0,flupz3o,False,,False,,t1_flupz3o,True,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,1,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]


Thats it, now we have our reddit comments data in the dataframe format.

# Generic Parser, Plug and Play Code:

In [0]:
class RedditScrapper():
    #c_id # Personal Use Script (14 char)
    #c_secret = ""           # Secret key (27 char)
    #u_a = ""               # App name
    #usrnm = ""      # Reddit username
    #passwd = ""     # Reddit login password
  def __init__(self,c_id,c_secret,u_a,usrnm,passwd):
    self.reddit = praw.Reddit(client_id = c_id, 
                         client_secret = c_secret, 
                         user_agent = u_a, 
                         username = usrnm, 
                         password = passwd)
    self.commentColumns=['total_awards_received', 'approved_at_utc', 'author_flair_template_id', 'likes', 
                         'user_reports', 'saved', 'id', 'banned_at_utc', 'mod_reason_title', 'gilded', 'archived', 
                         'no_follow', 'author', 'score', 'author_fullname', 'report_reasons', 'approved_by', 
                         'all_awardings', 'subreddit_id', 'body', 'edited', 'author_flair_css_class', 
                         'is_submitter', 'downs', 'author_flair_richtext', 'subreddit', 
                         'author_flair_text_color', 'score_hidden', 'permalink', 'num_reports', 
                         'locked', 'name', 'created', 'author_flair_text', 'collapsed', 'created_utc', 
                         'subreddit_name_prefixed', 'controversiality']
  
  """ Subrredit Name: The Name of subreddit to parse
      Topk: TopK posts in the subrredit to parse
      timeHorizon: topK in what timehorizon, can be month,day,year

      Returns you the flattern nested comments from the topK Posts in
      a dataframe for further processing
  """
  def getTopPostsForSubReddit(self,subredditName,topK,timeHorizon):
    subbreddit=reddit.subreddit(subredditName)
    topPosts=subbreddit.top(limit=topK,time_filter=timeHorizon)
    totalPosts=0
    totalComments=0
    badData=0
    commentsData=pd.DataFrame(columns=commentColumns)
    for post in topPosts:
      totalPosts=totalPosts+1
      print("POST TITTLE")
      print(post.title)
      post.comments.replace_more(limit=None)
      title=post.title
      for comment in post.comments.list():
        try:
          if(type(comment) == praw.models.reddit.comment.Comment):
              totalComments=totalComments+1
              tempList=[getattr(comment,j.strip()) for j in commentColumns] +[title]
              commentsData=commentsData.append(pd.DataFrame([tempList], columns = commentColumns+ ['Title']))
        except:
          badData=badData+1
          
            
    print("Total Posts parsed", totalPosts)
    print("TotalComments Parse across Posts", totalComments)
    print("Deserialization Failed for some Object. Dirty Data etc", badData) 
    return commentsData  


In [0]:
c_id = ""               # Personal Use Script (14 char)
c_secret = ""           # Secret key (27 char)
u_a = ""               # App name
usrnm = ""      # Reddit username
passwd = ""     # Reddit login password

scraper=RedditScrapper(c_id,c_secret,u_a,usrnm,passwd)

In [187]:
data=scraper.getTopPostsForSubReddit('wallstreetbets',1,'day')

POST TITTLE
German state finance minister committed suicide.


of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.


  sort=sort,


Total Posts parsed 1
TotalComments Parse across Posts 1424
Deserialization Failed for some Object. Dirty Data etc 16


In [192]:
commentsData

Unnamed: 0,Title,all_awardings,approved_at_utc,approved_by,archived,author,author_flair_css_class,author_flair_richtext,author_flair_template_id,author_flair_text,author_flair_text_color,author_fullname,banned_at_utc,body,collapsed,controversiality,created,created_utc,downs,edited,gilded,id,is_submitter,likes,locked,mod_reason_title,name,no_follow,num_reports,permalink,report_reasons,saved,score,score_hidden,subreddit,subreddit_id,subreddit_name_prefixed,total_awards_received,user_reports
0,German state finance minister committed suicide.,[],,,False,Devianted90,,[],,,,t2_4nmq04fq,,He was a STATE finance minister. The state of ...,False,0,1.585521e+09,1.585492e+09,0,False,0,fltzog6,False,,False,,t1_fltzog6,False,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,3166,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
0,German state finance minister committed suicide.,[],,,False,changechange1,,[],,,,t2_14c9i2j,,Tragic. Left behind a wife and 2 kids. Rip bro x,False,0,1.585513e+09,1.585484e+09,0,False,0,fltqspo,False,,False,,t1_fltqspo,False,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,4573,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
0,German state finance minister committed suicide.,[],,,False,fufm,,[],,,,t2_xmdqk,,Jesus...hope this isn’t the start of a trend,False,0,1.585513e+09,1.585484e+09,0,False,0,fltr3vf,False,,False,,t1_fltr3vf,False,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,2691,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
0,German state finance minister committed suicide.,[],,,False,TheSocDoc,,[],,,,t2_1493wl,,Is it really that bad? Holy fuck,False,0,1.585512e+09,1.585484e+09,0,False,0,fltqdgz,False,,False,,t1_fltqdgz,False,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,1793,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
0,German state finance minister committed suicide.,[],,,False,HolidayPotential8,,[],,,,t2_4mhh1mzw,,Sad. I bet some of us feel this way when we lo...,False,0,1.585513e+09,1.585484e+09,0,False,0,fltr1dh,False,,False,,t1_fltr1dh,False,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,544,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
0,German state finance minister committed suicide.,[],,,False,armandhammr,,"[{'e': 'text', 't': 'weak and alone'}]",,weak and alone,dark,t2_5dwv4mr,,"Cool story bro, me dumb. Me not know what anyt...",False,0,1.585541e+09,1.585512e+09,0,False,0,fluxm8c,False,,False,,t1_fluxm8c,True,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,1,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
0,German state finance minister committed suicide.,[],,,False,bonemaster5000,,[],,,,t2_4w9w5gkz,,Says the guy who has literally not given a sin...,False,0,1.585536e+09,1.585507e+09,0,False,0,flup3x0,False,,False,,t1_flup3x0,True,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,1,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
0,German state finance minister committed suicide.,[],,,False,BoldIntrepid,,[],,,,t2_bzd8w,,F,False,0,1.585524e+09,1.585496e+09,0,False,0,flu5gk3,False,,False,,t1_flu5gk3,True,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,2,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
0,German state finance minister committed suicide.,[],,,False,SwiFT808-,,[],,,,t2_2u3s8god,,I cited two sources one being literal governme...,False,0,1.585536e+09,1.585508e+09,0,False,0,flupz3o,False,,False,,t1_flupz3o,True,,/r/wallstreetbets/comments/fr4wd7/german_state...,,False,1,False,wallstreetbets,t5_2th52,r/wallstreetbets,0,[]
