## Scraping Reddit using PRAW
___

### Imports

In [1]:
import json
import time
from time import ctime
from datetime import datetime

import pandas as pd
import praw

In [2]:
# Read the Reddit API credentials; the context manager closes the file
with open('../assets/creds.json', 'r') as creds_file:
    reddit_creds = json.load(creds_file)

In [3]:
reddit_creds['id']

'hRkuKcI_IL0iUw'
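The cells in this notebook read the keys `id`, `secret`, `user`, and `pass` from `creds.json`. A minimal sketch of a loader that fails early when one of them is missing might look like the following; `load_creds`, `REQUIRED_KEYS`, and the placeholder values are hypothetical and not part of the original notebook.

```python
import json
import os
import tempfile

# Keys the notebook reads from the credentials file
REQUIRED_KEYS = ("id", "secret", "user", "pass")


def load_creds(path):
    """Load credentials from a JSON file and check that the expected keys exist."""
    with open(path, "r") as creds_file:
        creds = json.load(creds_file)
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError(f"missing credential keys: {missing}")
    return creds


# Demo with placeholder values written to a temporary file
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "creds.json")
    with open(demo_path, "w") as demo_file:
        json.dump({"id": "x", "secret": "y", "user": "u", "pass": "p"}, demo_file)
    demo_creds = load_creds(demo_path)

print(sorted(demo_creds))
```

Checking the keys up front means a malformed file raises an error at load time instead of in the middle of a scrape.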

Use the Reddit credentials to instantiate PRAW's `Reddit` class. Because a username and password are supplied, the instance is not read-only:

In [4]:
reddit = praw.Reddit(
    client_id     = reddit_creds['id'],
    client_secret = reddit_creds['secret'],
    username      = reddit_creds['user'],
    password      = reddit_creds['pass'],
    user_agent    = 'dmay'
)


reddit.read_only

False
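The `user_agent` passed above is a short string. Reddit's API guidelines suggest a more descriptive format, `<platform>:<app ID>:<version string> (by /u/<reddit username>)`. A small sketch of building such a string follows; the helper name and every argument value are hypothetical placeholders.

```python
def build_user_agent(platform, app_id, version, username):
    """Format a user agent in the style Reddit's API guidelines recommend."""
    return f"{platform}:{app_id}:{version} (by /u/{username})"


# Placeholder values; substitute your own app name and Reddit username
user_agent = build_user_agent("python", "subreddit-scraper", "v0.1", "example_user")
print(user_agent)
```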

In [5]:
str(ctime(time.time())).replace(':',".")

'Mon Oct 14 20.03.10 2019'
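The expression above produces a timestamp that is safe to use in filenames by replacing the colons. An alternative sketch, not used in the original notebook, formats the time with `strftime` so the resulting filenames also sort chronologically; the helper name and the format string are illustrative choices.

```python
from datetime import datetime


def filename_stamp(moment=None):
    """Return a filename-safe, sortable timestamp such as 2019-10-14_20.03.10."""
    if moment is None:
        moment = datetime.now()
    return moment.strftime("%Y-%m-%d_%H.%M.%S")


# A fixed datetime keeps the demo reproducible
print(filename_stamp(datetime(2019, 10, 14, 20, 3, 10)))
```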

Create a helper that returns the number of minutes elapsed since a post's UTC creation timestamp.

In [15]:
def time_since(utc_stamp):
    """Return the minutes elapsed between utc_stamp (epoch seconds) and now."""
    return (time.time() - utc_stamp) / 60
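Since `time_since` divides a difference in seconds by 60, its result is in minutes. A sketch of a variant that accepts an explicit reference time makes this easy to check; the `now` parameter and the function name are additions for illustration only.

```python
import time


def time_since_minutes(utc_stamp, now=None):
    """Return minutes between utc_stamp and now (both in epoch seconds)."""
    if now is None:
        now = time.time()
    return (now - utc_stamp) / 60


# 900 seconds apart, so the result is 15 minutes
print(time_since_minutes(1_000_000_000, now=1_000_000_900))
```

For scale, the largest `elapsed_time` in the fantasyfootball output further down, about 8068.8 minutes, corresponds to roughly 5.6 days.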

### Extracting information I am interested in & Creating DataFrame

#### Define function to extract Subreddit Data & Export

In [16]:
def subreddit_extractor(name):
    start = datetime.now()
    # Instantiate PRAW's Reddit class with the stored credentials
    reddit = praw.Reddit(
        client_id     = reddit_creds['id'],
        client_secret = reddit_creds['secret'],
        username      = reddit_creds['user'],
        password      = reddit_creds['pass'],
        user_agent    = 'dmay'
    )

    # Fetch the newest posts. Reddit listings generally stop at roughly
    # 1,000 items, so fewer than the requested 1500 posts come back.
    subred = reddit.subreddit(name)
    new_subred = list(subred.new(limit=1500))
    
    # Collect per-post metadata into one list per field
    titles = [post.title for post in new_subred]
    print(f'titles complete. Time elapsed: {datetime.now() - start}')
    
    bodies = [post.selftext for post in new_subred]
    print(f'bodies complete. Time elapsed: {datetime.now() - start}')
    
    # creating a delay between comprehensions
    time.sleep(5)
    
    num_coms = [post.num_comments for post in new_subred]
    print(f'comments complete. Time elapsed: {datetime.now() - start}')
    
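    # This step dominates the runtime in the logs below (over 20 minutes),
    # most likely because upvote_ratio is not part of the listing data, so
    # PRAW makes a separate request per post to fetch it.
    upvote_ratio = [post.upvote_ratio for post in new_subred]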
    print(f'upvote complete. Time elapsed: {datetime.now() - start}')
    
    # creating a delay between comprehensions
    time.sleep(5)
    
    urls = [post.url for post in new_subred]
    print(f'urls complete. Time elapsed: {datetime.now() - start}')
    
    time_col = [time_since(post.created_utc) for post in new_subred]
    print(f'time complete. Time elapsed: {datetime.now() - start}')
    
    # creating a delay between comprehensions
    time.sleep(5)
    
    sub = [post.subreddit_name_prefixed for post in new_subred]
    print(f'name complete. Time elapsed: {datetime.now() - start}')
    
    # Assemble the lists into a dictionary keyed by column name
    df_dict = {
        'title': titles,
        'body': bodies,
        'num_comments': num_coms,
        'upvote_ratio': upvote_ratio,
        'url': urls,
        'elapsed_time': time_col,
        'subreddit': sub
    }

    df = pd.DataFrame(df_dict)
    
    # Timestamp the filename; colons are swapped for periods so the name
    # is valid on filesystems that disallow ':'
    date_time = ctime().replace(':', '.')
    csv_path = "../data/scrapes/" + name + "__" + date_time + ".csv"
    df.to_csv(csv_path, index=False)
    
    stop = datetime.now()
    
    print(f'The number of posts returned for subreddit {name} is: {len(new_subred)}')
    print(f'The total time passed during scrape (hh:mm:ss:ms) is: {stop - start}')
    return df
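The function above builds one list per field with separate comprehensions. An equivalent single pass over the posts can be sketched as below, using stand-in objects so it runs without network access or credentials; `post_to_record`, `fake_posts`, and the sample values are hypothetical, and real PRAW submissions would take the place of the `SimpleNamespace` objects.

```python
import time
from types import SimpleNamespace


def post_to_record(post, now):
    """Map one post-like object to a row with the notebook's column names."""
    return {
        'title': post.title,
        'body': post.selftext,
        'num_comments': post.num_comments,
        'upvote_ratio': post.upvote_ratio,
        'url': post.url,
        'elapsed_time': (now - post.created_utc) / 60,  # minutes since posting
        'subreddit': post.subreddit_name_prefixed,
    }


now = time.time()

# Stand-in for a PRAW submission, created 600 seconds (10 minutes) ago
fake_posts = [
    SimpleNamespace(
        title='Example post',
        selftext='',
        num_comments=3,
        upvote_ratio=0.9,
        url='https://example.com',
        created_utc=now - 600,
        subreddit_name_prefixed='r/example',
    )
]

records = [post_to_record(post, now) for post in fake_posts]
print(records[0])
```

Passing `records` to `pd.DataFrame` would give the same column layout as `df_dict`, with each post's fields kept together in one place.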

### Fantasy Football Export

In [24]:
subreddit_extractor('fantasyfootball')

titles complete. Time elapsed: 0:00:25.121519
bodies complete. Time elapsed: 0:00:25.121910
comments complete. Time elapsed: 0:00:30.127634
upvote complete. Time elapsed: 0:23:57.745545
urls complete. Time elapsed: 0:24:02.746188
time complete. Time elapsed: 0:24:02.746986
name complete. Time elapsed: 0:24:07.750943
The number of posts returned for subreddit fantasyfootball is: 997
The total time passed during scrape (hh:mm:ss:ms) is: 0:24:07.806464
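The timestamps in the log above are cumulative, so subtracting consecutive values shows where the time goes. The sketch below hard-codes the values from this run; the helper name `to_seconds` is hypothetical.

```python
def to_seconds(elapsed):
    """Convert an 'h:mm:ss.ffffff' string into seconds."""
    hours, minutes, seconds = elapsed.split(':')
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Cumulative elapsed times copied from the fantasyfootball log
log = [
    ('titles', '0:00:25.121519'),
    ('bodies', '0:00:25.121910'),
    ('comments', '0:00:30.127634'),
    ('upvote', '0:23:57.745545'),
    ('urls', '0:24:02.746188'),
    ('time', '0:24:02.746986'),
    ('name', '0:24:07.750943'),
]

# Duration of each step = its cumulative time minus the previous one
durations = {}
previous = 0.0
for step, elapsed in log:
    current = to_seconds(elapsed)
    durations[step] = current - previous
    previous = current

posts_returned = 997
for step, seconds in durations.items():
    print(f'{step:<9}{seconds:>10.2f} s')
print(f"upvote step per post: {durations['upvote'] / posts_returned:.2f} s")
```

In this run the upvote step accounts for about 1,408 of the roughly 1,448 total seconds, and the three 5-second sleeps account for another 15.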


index,title,body,num_comments,upvote_ratio,url,elapsed_time,subreddit
0,"Week 6 Overperformers: Robby Anderson, Hunter ...",,36,0.78,https://www.thefantasyfootballadvice.com/artic...,113.369934,r/fantasyfootball
1,Fantasy Football Week 7: Starts & Sits,,8,0.88,https://www.youtube.com/watch?v=bBNYQpk4ziQ,129.536600,r/fantasyfootball
2,2019 Accuracy Challenge Week 7,#####Accuracy Challenge Week 7\n\n\n######How ...,0,0.67,https://www.reddit.com/r/fantasyfootball/comme...,137.136600,r/fantasyfootball
3,"Based on the first 6 weeks, what does the firs...",Says me:\n\n1. CMC\n2. Dalvin\n3. Saquon\n4. C...,112,0.52,https://www.reddit.com/r/fantasyfootball/comme...,235.953267,r/fantasyfootball
4,Joe Mixon has run 20 pass routes over the last...,,52,0.88,https://twitter.com/GrahamBarfield/status/1184...,251.553267,r/fantasyfootball
...,...,...,...,...,...,...,...
992,With a bunch of players in concussion protocol...,https://www.sbnation.com/nfl/2016/9/18/1294092...,10,0.90,https://www.reddit.com/r/fantasyfootball/comme...,8061.086612,r/fantasyfootball
993,Jaguars CB Jalen Ramsey is back on the practic...,,10,0.92,https://twitter.com/espndirocco/status/1182686...,8063.469945,r/fantasyfootball
994,Did anyone else notice Tom Brady refusing to l...,I appreciated it as a Brady owner. He snuck ba...,502,0.94,https://www.reddit.com/r/fantasyfootball/comme...,8066.536612,r/fantasyfootball
995,Vernon Davis isn’t taking any reps. He’s gonna...,,56,0.88,https://twitter.com/craighoffman/status/118269...,8068.803278,r/fantasyfootball


### NFL Export

In [25]:
subreddit_extractor('nfl')

titles complete. Time elapsed: 0:00:16.287696
bodies complete. Time elapsed: 0:00:16.288039
comments complete. Time elapsed: 0:00:21.289539
upvote complete. Time elapsed: 0:21:04.836434
urls complete. Time elapsed: 0:21:09.842139
time complete. Time elapsed: 0:21:09.842913
name complete. Time elapsed: 0:21:14.848112
The number of posts returned for subreddit nfl is: 996
The total time passed during scrape (hh:mm:ss:ms) is: 0:21:14.867078


index,title,body,num_comments,upvote_ratio,url,elapsed_time,subreddit
0,Scott Van Pelt just shared the best story I’ve...,He said he recently attended a full serviced d...,1,0.60,https://www.reddit.com/r/nfl/comments/dj2lyk/s...,32.077082,r/nfl
1,What would it take to get Hunter Henry,The chargers have had a disappointing start af...,21,0.17,https://www.reddit.com/r/nfl/comments/dj290x/w...,72.110416,r/nfl
2,Who announced MNF this week?,,5,0.14,https://www.reddit.com/r/nfl/comments/dj24ss/w...,85.277082,r/nfl
3,[Stroud] I’m told this won’t happen (an OJ How...,,12,1.00,https://twitter.com/NFLSTROUD/status/118463969...,92.743749,r/nfl
4,Where is Josh Rosen going next?,Josh Rosen is a great talent who has been abus...,26,0.36,https://www.reddit.com/r/nfl/comments/dj203i/w...,99.860416,r/nfl
...,...,...,...,...,...,...,...
991,[Cimini] LT Kelvin Beachum is out with an inju...,,3,1.00,https://twitter.com/RichCimini/status/11835082...,4862.060427,r/nfl
992,[Joe Fann] Russell Wilson confirmed that he le...,,522,0.97,https://twitter.com/Joe_Fann/status/1183497240...,4869.127094,r/nfl
993,After throwing for 116 yards on the opening dr...,That's an average of 17.4 passing yards per drive,44,0.97,https://www.reddit.com/r/nfl/comments/dhhehl/a...,4873.493760,r/nfl
994,What is the earliest in a half a team has burn...,,16,0.78,https://www.reddit.com/r/nfl/comments/dhhczc/w...,4877.227094,r/nfl
