## Scenario:

With so many subreddits available, it's hard to make sure your ads reach the right audience. As a budding RPG developer, you may not have the manpower to check whether each ad lands in a subreddit about video game RPGs or about tabletop RPGs, but you still want to save face and resources by advertising only to the correct market. What can you do?

Enter the **Brakefield Enterprises RPG Classifier&trade;**! A friendly, helpful model, trained on posts and comments from the [tabletop](https://www.reddit.com/r/rpg/) and [video game](https://www.reddit.com/r/rpg_gamers/) RPG subreddits, that will tell you whether the posts you're looking at more closely resemble flashy video game RPGs or good ol' pen-and-paper role-playing games!

Try the **Brakefield Enterprises RPG Classifier&trade;** today! Don't *roll the dice* with your valuable advertising resources another minute!

In [1]:
import requests
import time
import pandas as pd
import json

## First-Class Data

To bring such a fine model to market, the Brakefield Enterprises team had to start with great data from our sources at [Reddit](https://www.reddit.com/), using their API and pulling close to one *thousand* posts!

In [2]:
# establish a list for posts, and our initial parameters
posts = []
after = None
headers = {'User-agent':'BBLab03'}

# iterating any more than this is a waste.
for i in range(50):
    # our API scrape uses Reddit's 'after' parameter
    if after is None:
        params = {}
    else:
        params = {'after':after}
    # starting with our pen and paper rpg subreddit
    url = 'https://www.reddit.com/r/rpg/new.json'
    res = requests.get(url, params = params, headers=headers)
    # A quick check that everything is coming through alright
    if res.status_code == 200:
        the_json = res.json()
        # add to posts
        posts.extend(the_json['data']['children'])
        after = the_json['data']['after']
    # Print any error codes that come up
    else:
        print(res.status_code)
        break
    # Print something to make sure progress happens while the program runs 
    if i % 10 == 0:
        headers['User-agent'] = 'BBLab03-'+str(i)
        print(str((i)*25)+" posts so far.")
    # make sure not to overload Reddit with requests!
    time.sleep(1)
# see how many posts we collected
len(posts)

0 posts so far.
250 posts so far.
500 posts so far.
750 posts so far.
1000 posts so far.


1243
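A note on that count: 1243 posts came back, yet only 993 survive the duplicate check in the next cell. One plausible explanation (an assumption, not something the output confirms directly) is that Reddit returns `after` as null once a listing is exhausted, at which point the loop sends empty `params` and starts again from the newest post. The numbers fit: 993 + 10 × 25 = 1243. A minimal sketch of a loop that stops at that point is below; `fetch_listing`, `make_reddit_fetcher`, and the `fetch_page` callable are hypothetical names introduced only for illustration.

```python
import time


def fetch_listing(fetch_page, max_pages=50, delay=1.0):
    """Collect posts from a paginated listing.

    fetch_page is a hypothetical callable: it receives the current
    `after` token (None for the first page) and returns the decoded
    JSON for one page of the listing.
    """
    posts = []
    after = None
    for _ in range(max_pages):
        page = fetch_page(after)
        posts.extend(page['data']['children'])
        after = page['data']['after']
        # a missing `after` token means the listing has run out, so stop
        # instead of starting over from the first page
        if after is None:
            break
        # stay polite between requests
        time.sleep(delay)
    return posts


def make_reddit_fetcher(url, headers):
    """Build a fetch_page callable around requests.get (illustrative)."""
    import requests  # imported lazily so the sketch loads without it

    def fetch_page(after):
        params = {'after': after} if after else {}
        res = requests.get(url, params=params, headers=headers)
        res.raise_for_status()
        return res.json()

    return fetch_page
```

With this shape, `fetch_listing(make_reddit_fetcher(url, headers))` would take the place of the loop above, and the second subreddit only needs a different `url`.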

In [3]:
# I want to find post names, subreddit, title and body text for each post to set to a 
# dataframe of all the data. Since many posts are just titles with no body text, I will
# combine the two features into a single 'text' column. I'm also keeping each post's URL

# establish our empty list
list_of_lists = []
# iterate over the posts
for i in range(len(posts)):
    # fill in our desired fields
    sub = posts[i]['data']['subreddit']
    name = posts[i]['data']['name']
    title = posts[i]['data']['title']
    body = posts[i]['data']['selftext']
    suffix = posts[i]['data']['permalink']
    url = 'https://www.reddit.com'+ str(suffix)
    text = title + body
    row = [sub,name,text,url,None]
    # Here's where I catch duplicates, which come up when you try to get more than 1000
    # posts through Reddit's API
    if row not in list_of_lists:
        list_of_lists.append(row)

# Here I put my list into an easy-to-use DataFrame!    
dfpp = pd.DataFrame(data=list_of_lists,columns=['sub','name','text','url','comments'])

dfpp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 993 entries, 0 to 992
Data columns (total 5 columns):
sub         993 non-null object
name        993 non-null object
text        993 non-null object
url         993 non-null object
comments    0 non-null object
dtypes: object(5)
memory usage: 38.9+ KB
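The `row not in list_of_lists` check compares every new row against every row already kept, which is fine at this scale but slows down as the list grows. A minimal sketch of the same de-duplication keyed on each post's `name` follows; `posts_to_rows` is a hypothetical helper, and the space it puts between title and body is a deliberate change from the cell above, which concatenates them directly.

```python
def posts_to_rows(posts):
    """Turn raw listing children into row dicts, one per unique post name."""
    seen = set()
    rows = []
    for post in posts:
        data = post['data']
        # `name` (e.g. 't3_a7pwer') identifies a post, so a set lookup
        # is enough to catch repeats
        if data['name'] in seen:
            continue
        seen.add(data['name'])
        rows.append({
            'sub': data['subreddit'],
            'name': data['name'],
            # assumption: a separating space keeps the last word of the
            # title from fusing with the first word of the body
            'text': data['title'] + ' ' + data['selftext'],
            'url': 'https://www.reddit.com' + data['permalink'],
            'comments': None,
        })
    return rows
```

The result can be handed straight to pandas with `pd.DataFrame(posts_to_rows(posts))`, which produces the same five columns.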


In [4]:
dfpp.head()

| | sub | name | text | url | comments |
|---|---|---|---|---|---|
| 0 | rpg | t3_a7pwer | What are your favorite pre-made campaigns?I do... | https://www.reddit.com/r/rpg/comments/a7pwer/w... | |
| 1 | rpg | t3_a7pprr | 50 Fantasy RPG Quest Ideas | https://www.reddit.com/r/rpg/comments/a7pprr/5... | |
| 2 | rpg | t3_a7pdaz | What system should I use for a fantasy army vs... | https://www.reddit.com/r/rpg/comments/a7pdaz/w... | |
| 3 | rpg | t3_a7pd5z | Physical Purchases 2018I just did an inventory... | https://www.reddit.com/r/rpg/comments/a7pd5z/p... | |
| 4 | rpg | t3_a7opa2 | Roleplaying Intelligent Creatures in D&amp;D 5... | https://www.reddit.com/r/rpg/comments/a7opa2/r... | |


### Now, that's a good-looking data frame!

"But wait," you may be thinking, "mightn't the Brakefield Enterprises RPG Classifier&trade; do better with EVEN MORE data?"

Well, my savvy friend, you'd be correct! That's why we at Brakefield Enterprises are also incorporating the text from the comments of every single post! That's as much as **50 times more text data**, without using sneaky third-party software workarounds!

In [6]:
# iterate over the dataframe
for row in range(len(dfpp)):
    # finish formatting the url address
    url = str(dfpp['url'][row]+'.json')
    res = requests.get(url,headers=headers)
    the_json = res.json()
    # empty list for depositing our comments
    comment_list = []
    # A quick check to skip over comment-less posts
    if the_json[1]['data']['children']:
        for comment in range(len(the_json[1]['data']['children'])):
            # the Reddit API doesn't give body text for more than 50 comments on a single
            # post
            if comment <= 50:
                try:
                    comment_list.append(the_json[1]['data']['children'][comment]['data']['body'])
                except KeyError:
                    print('We got some invalid comments!')
                    print("row: ",row,'; comment: ',comment)
                    break
        # store the comments once per post; .at sets the cell directly and avoids
        # pandas' chained-assignment pitfall
        if comment_list:
            dfpp.at[row, 'comments'] = comment_list
    # print to ensure the program runs, delay it enough that it doesn't clog up the series
    # of tubes
    if row % 10 == 0:
        print(str(row)+" rows down!")
    time.sleep(.6)

0 rows down!
10 rows down!
20 rows down!
30 rows down!
40 rows down!
50 rows down!
60 rows down!
70 rows down!
80 rows down!
90 rows down!
100 rows down!
110 rows down!
120 rows down!
130 rows down!
140 rows down!
150 rows down!
160 rows down!
170 rows down!
180 rows down!
190 rows down!
200 rows down!
210 rows down!
220 rows down!
230 rows down!
240 rows down!
250 rows down!
260 rows down!
270 rows down!
280 rows down!
290 rows down!
300 rows down!
310 rows down!
320 rows down!
330 rows down!
340 rows down!
350 rows down!
360 rows down!
370 rows down!
380 rows down!
390 rows down!
400 rows down!
410 rows down!
420 rows down!
430 rows down!
440 rows down!
450 rows down!
460 rows down!
470 rows down!
480 rows down!
490 rows down!
500 rows down!
510 rows down!
520 rows down!
530 rows down!
540 rows down!
550 rows down!
560 rows down!
570 rows down!
580 rows down!
590 rows down!
600 rows down!
610 rows down!
We got some invalid comments!
row:  614 ; comment:  21
620 rows down!
630 rows do
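The "invalid comments" messages are most likely the placeholder entry Reddit appends when a thread has more replies to load: in the listing JSON, real comments carry `kind` equal to `t1`, while that placeholder has `kind` equal to `more` and no `body` field. A minimal sketch that filters on `kind` instead of catching the `KeyError` is below; `top_level_comments` is a hypothetical helper. Its `limit` of 50 follows the code comment above, whereas the loop's `comment <= 50` test admits up to 51 entries.

```python
def top_level_comments(post_json, limit=50):
    """Return the body text of up to `limit` top-level comments.

    post_json is the decoded response for a post's .json URL: a two-item
    list whose second element is the comment listing.
    """
    children = post_json[1]['data']['children']
    # keep real comments ('t1') and skip the 'more' placeholder,
    # which carries no body text
    bodies = [child['data']['body']
              for child in children
              if child.get('kind') == 't1']
    return bodies[:limit]
```

Unlike the loop above, which stops at the first entry without a body, this skips the placeholder and keeps going, so the two could differ if a placeholder ever appeared before the end of the list.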

In [7]:
dfpp.head()

| | sub | name | text | url | comments |
|---|---|---|---|---|---|
| 0 | rpg | t3_a7pwer | What are your favorite pre-made campaigns?I do... | https://www.reddit.com/r/rpg/comments/a7pwer/w... | [Operation Morpheus for Aftermath!. It is the... |
| 1 | rpg | t3_a7pprr | 50 Fantasy RPG Quest Ideas | https://www.reddit.com/r/rpg/comments/a7pprr/5... | |
| 2 | rpg | t3_a7pdaz | What system should I use for a fantasy army vs... | https://www.reddit.com/r/rpg/comments/a7pdaz/w... | [Gurps. Also check out the novel The Doomfarer... |
| 3 | rpg | t3_a7pd5z | Physical Purchases 2018I just did an inventory... | https://www.reddit.com/r/rpg/comments/a7pd5z/p... | [It's only a problem if you don't play them al... |
| 4 | rpg | t3_a7opa2 | Roleplaying Intelligent Creatures in D&amp;D 5... | https://www.reddit.com/r/rpg/comments/a7opa2/r... | |


## Now that's an even *better*-looking Data Frame!

Let's go ahead and make one for our video game RPG data!

In [8]:
# All steps as above:

# get the posts:
posts = []
after = None
headers = {'User-agent':'BBLab03'}

for i in range(60):
    if after is None:
        params = {}
    else:
        params = {'after':after}
    url = 'https://www.reddit.com/r/rpg_gamers/new/.json'
    res = requests.get(url, params = params, headers=headers)
    if res.status_code == 200:
        the_json = res.json()
        posts.extend(the_json['data']['children'])
        after = the_json['data']['after']
    else:
        print(res.status_code)
        break
    if i % 10 == 0:
        headers['User-agent'] = 'BBLab03-'+str(i)
        print(str((i)*25)+" posts so far.")
    time.sleep(1)
    
# make the dataframe:
list_of_lists = []
for i in range(len(posts)):
    sub = posts[i]['data']['subreddit']
    name = posts[i]['data']['name']
    title = posts[i]['data']['title']
    body = posts[i]['data']['selftext']
    suffix = posts[i]['data']['permalink']
    url = 'https://www.reddit.com'+ str(suffix)
    text = title + body
    row = [sub,name,text,url,None]
    if row not in list_of_lists:
        list_of_lists.append(row)

# Here I put my list into an easy-to-use DataFrame!    
dfvg = pd.DataFrame(data=list_of_lists,columns=['sub','name','text','url','comments'])

# get the comments:
for row in range(len(dfvg)):
    url = str(dfvg['url'][row]+'.json')
    res = requests.get(url,headers=headers)
    the_json = res.json()
    comment_list = []
    if the_json[1]['data']['children']:
        for comment in range(len(the_json[1]['data']['children'])):
            if comment <= 50:
                try:
                    comment_list.append(the_json[1]['data']['children'][comment]['data']['body'])
                except KeyError:
                    print('We got some invalid comments!')
                    print("row: ",row,'; comment: ',comment)
                    break
        # store the comments once per post; .at avoids chained assignment
        if comment_list:
            dfvg.at[row, 'comments'] = comment_list
    if row % 10 == 0:
        print(str(row)+" rows down!")
    time.sleep(.6)
    
dfvg.head()

0 posts so far.
250 posts so far.
500 posts so far.
750 posts so far.
1000 posts so far.
1250 posts so far.
0 rows down!
10 rows down!
20 rows down!
30 rows down!
40 rows down!
50 rows down!
60 rows down!
70 rows down!
80 rows down!
90 rows down!
100 rows down!
110 rows down!
120 rows down!
130 rows down!
140 rows down!
150 rows down!
160 rows down!
170 rows down!
180 rows down!
190 rows down!
200 rows down!
210 rows down!
220 rows down!
230 rows down!
240 rows down!
250 rows down!
260 rows down!
270 rows down!
280 rows down!
290 rows down!
300 rows down!
310 rows down!
320 rows down!
330 rows down!
340 rows down!
350 rows down!
360 rows down!
370 rows down!
380 rows down!
390 rows down!
400 rows down!
410 rows down!
420 rows down!
430 rows down!
440 rows down!
450 rows down!
460 rows down!
470 rows down!
480 rows down!
We got some invalid comments!
row:  482 ; comment:  40
490 rows down!
500 rows down!
510 rows down!
520 rows down!
530 rows down!
540 rows down!
550 rows down!
560 rows

| | sub | name | text | url | comments |
|---|---|---|---|---|---|
| 0 | rpg_gamers | t3_a7q3sm | Fallout Inspired RPG Atom Released | https://www.reddit.com/r/rpg_gamers/comments/a... | |
| 1 | rpg_gamers | t3_a7olpq | Check out new game in the making ! | https://www.reddit.com/r/rpg_gamers/comments/a... | |
| 2 | rpg_gamers | t3_a7o0qt | People who've played Ni No Kuni 2.https://yout... | https://www.reddit.com/r/rpg_gamers/comments/a... | |
| 3 | rpg_gamers | t3_a7mq9q | Open-World hardcore RPG 'Outward’ Trailer | https://www.reddit.com/r/rpg_gamers/comments/a... | [Looks good, but I do hope there is a story to... |
| 4 | rpg_gamers | t3_a7mmjs | The Philosophy of Planescape: Torment | https://www.reddit.com/r/rpg_gamers/comments/a... | [Nice channel, thanks for sharing. ] |


#### Now, let's take one last look at our data before they get shipped:

In [9]:
dfpp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 993 entries, 0 to 992
Data columns (total 5 columns):
sub         993 non-null object
name        993 non-null object
text        993 non-null object
url         993 non-null object
comments    857 non-null object
dtypes: object(5)
memory usage: 38.9+ KB


In [12]:
dfpp.head()

| | sub | name | text | url | comments |
|---|---|---|---|---|---|
| 0 | rpg | t3_a7pwer | What are your favorite pre-made campaigns?I do... | https://www.reddit.com/r/rpg/comments/a7pwer/w... | [Operation Morpheus for Aftermath!. It is the... |
| 1 | rpg | t3_a7pprr | 50 Fantasy RPG Quest Ideas | https://www.reddit.com/r/rpg/comments/a7pprr/5... | |
| 2 | rpg | t3_a7pdaz | What system should I use for a fantasy army vs... | https://www.reddit.com/r/rpg/comments/a7pdaz/w... | [Gurps. Also check out the novel The Doomfarer... |
| 3 | rpg | t3_a7pd5z | Physical Purchases 2018I just did an inventory... | https://www.reddit.com/r/rpg/comments/a7pd5z/p... | [It's only a problem if you don't play them al... |
| 4 | rpg | t3_a7opa2 | Roleplaying Intelligent Creatures in D&amp;D 5... | https://www.reddit.com/r/rpg/comments/a7opa2/r... | |


In [10]:
dfvg.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 5 columns):
sub         1000 non-null object
name        1000 non-null object
text        1000 non-null object
url         1000 non-null object
comments    890 non-null object
dtypes: object(5)
memory usage: 39.1+ KB


In [13]:
dfvg.head()

| | sub | name | text | url | comments |
|---|---|---|---|---|---|
| 0 | rpg_gamers | t3_a7q3sm | Fallout Inspired RPG Atom Released | https://www.reddit.com/r/rpg_gamers/comments/a... | |
| 1 | rpg_gamers | t3_a7olpq | Check out new game in the making ! | https://www.reddit.com/r/rpg_gamers/comments/a... | |
| 2 | rpg_gamers | t3_a7o0qt | People who've played Ni No Kuni 2.https://yout... | https://www.reddit.com/r/rpg_gamers/comments/a... | |
| 3 | rpg_gamers | t3_a7mq9q | Open-World hardcore RPG 'Outward’ Trailer | https://www.reddit.com/r/rpg_gamers/comments/a... | [Looks good, but I do hope there is a story to... |
| 4 | rpg_gamers | t3_a7mmjs | The Philosophy of Planescape: Torment | https://www.reddit.com/r/rpg_gamers/comments/a... | [Nice channel, thanks for sharing. ] |


## Grab Your Hard Hats!

With a strong foundation of solid Reddit post and comment text, we can start construction on the **Brakefield Enterprises RPG Classifier&trade;**!

But time is money, so rather than run those 15+ minute rate-limited loops every time we get to work, we'll instead package our grade-A DataFrames into easy-to-use CSV files!

In [11]:
# index=False: otherwise the index is saved as a column and re-importing the csv
# files adds another index
dfpp.to_csv('./data/pen-and-paper.csv',index=False)
dfvg.to_csv('./data/video-game.csv',index=False)
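One thing to keep in mind when unpacking those files later: a CSV cell can only hold text, so each list in the `comments` column is written as its string representation, and posts without comments become empty cells (which `pd.read_csv` reads back as NaN). A minimal sketch for turning those strings back into lists follows; `parse_comments` is a hypothetical helper, and `ast.literal_eval` is used because it only evaluates Python literals.

```python
import ast


def parse_comments(cell):
    """Convert one re-imported `comments` cell back into a list of strings."""
    # empty cells come back as NaN (a float) or an empty string
    if not isinstance(cell, str) or not cell.strip():
        return []
    # the cell holds the repr of a list, e.g. "['first', 'second']"
    return ast.literal_eval(cell)
```

After `pd.read_csv('./data/pen-and-paper.csv')`, applying it with `df['comments'] = df['comments'].apply(parse_comments)` would restore list-valued cells. Note too that `to_csv` expects the `./data/` directory to exist already.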