# Getting the Data
- [Create Reddit Instance](#Creating-a-read-only-Reddit-Instance-with-PRAW)
- [Scraping r/WritingPrompts](#Scraping-r/WritingPrompts-with-PRAW)
- [Scraping r/ShowerThoughts](#Scraping-r/ShowerThoughts-with-PRAW)
- [Create a Combined Dataframe](#Combining-the-two-dataframes-into-a-single-dataframe)

In [31]:
import requests
import time
import pandas as pd

## Creating a read-only Reddit Instance with PRAW

In [1]:
import praw

In [4]:
reddit = praw.Reddit(client_id='5pWFYyztFmLjMA',
                     client_secret='KSgTw2thW4D1p74t2pOYIKYa5AQ',
                     user_agent='bill')
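Hard-coding the client ID and secret in a notebook exposes them to anyone who can see the file. Below is a minimal sketch of one alternative, assuming the credentials are kept in environment variables. The variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, and `REDDIT_USER_AGENT`, and the helper `load_reddit_credentials`, are hypothetical and not part of the original notebook.

```python
import os


def load_reddit_credentials(env=None):
    """Collect PRAW keyword arguments from environment variables.

    The variable names below are assumptions made for illustration.
    """
    env = os.environ if env is None else env
    names = {
        'client_id': 'REDDIT_CLIENT_ID',
        'client_secret': 'REDDIT_CLIENT_SECRET',
        'user_agent': 'REDDIT_USER_AGENT',
    }
    # Fail early with a clear message if any variable is unset or empty.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError('missing environment variables: ' + ', '.join(missing))
    return {key: env[var] for key, var in names.items()}


# Intended use in the notebook (needs praw installed and the variables set):
# reddit = praw.Reddit(**load_reddit_credentials())
```

Passing a plain dictionary as `env` makes the helper easy to check without touching the real environment.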

## Scraping r/WritingPrompts with PRAW

In [30]:
# creating a list of dictionaries for submissions to r/WritingPrompts,
# one per submission, with title and subreddit keys
wp_entries = []
for submission in reddit.subreddit('writingprompts').new(limit=1000):
    wp_entries.append({
        'title': submission.title,
        # store the subreddit name as a plain string, not the PRAW Subreddit object
        'subreddit': submission.subreddit.display_name,
    })
len(wp_entries)

998

In [37]:
# saving the list of dictionaries to a pandas dataframe
# and displaying the first 5 rows
wp_df = pd.DataFrame(wp_entries)
wp_df.head()

| | subreddit | title |
|---|---|---|
| 0 | WritingPrompts | [WP] It's been over 800 days since you landed ... |
| 1 | WritingPrompts | [WP] Humans are the only species known to have... |
| 2 | WritingPrompts | [WP] He has been blind all his life. Now, he i... |
| 3 | WritingPrompts | [WP] You’re dying...and dying. And then you di... |
| 4 | WritingPrompts | [WP] Humanity has found a way to circumvent th... |


In [40]:
# removing '[WP]' from the titles of the entries,
# since it appears in almost every one of them
# and leaving such an obvious token in seems contrary to the spirit of the project;
# also trimming the whitespace left behind by the removal
wp_df['title'] = wp_df['title'].str.replace('[WP]', '', regex=False).str.strip()
wp_df.head()

| | subreddit | title |
|---|---|---|
| 0 | WritingPrompts | It's been over 800 days since you landed on P... |
| 1 | WritingPrompts | Humans are the only species known to have dom... |
| 2 | WritingPrompts | He has been blind all his life. Now, he is th... |
| 3 | WritingPrompts | You’re dying...and dying. And then you die. B... |
| 4 | WritingPrompts | Humanity has found a way to circumvent the ne... |
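The replacement above targets the literal `[WP]` string wherever it appears in a title. If other bracketed tags turn up at the start of titles, a regular expression is one option. This is a minimal sketch using only the standard library. The helper `strip_leading_tag` and the `[XX]` tag are hypothetical, introduced purely for illustration.

```python
import re

# Matches one bracketed tag at the start of a title, plus surrounding whitespace.
LEADING_TAG = re.compile(r'^\s*\[[^\]]*\]\s*')


def strip_leading_tag(title):
    """Remove a single leading '[...]' tag, if present, and trim whitespace."""
    return LEADING_TAG.sub('', title).strip()


sample_titles = [
    "[WP] It's been over 800 days",  # tag seen in the data above
    "[XX] A made-up tag",            # hypothetical tag, for illustration only
    "No tag at all",                 # titles without a tag pass through unchanged
]
cleaned = [strip_leading_tag(t) for t in sample_titles]
```

On the dataframe, the same helper could be applied with `wp_df['title'].map(strip_leading_tag)`.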


In [55]:
# saving these 998 rows to a CSV file
wp_df.to_csv('./datasets/wp_df.csv', index=False)

## Scraping r/ShowerThoughts with PRAW

In [41]:
# creating a list of dictionaries for submissions to r/ShowerThoughts,
# one per submission, with title and subreddit keys
st_entries = []
for submission in reddit.subreddit('showerthoughts').new(limit=1000):
    st_entries.append({
        'title': submission.title,
        # store the subreddit name as a plain string, not the PRAW Subreddit object
        'subreddit': submission.subreddit.display_name,
    })
len(st_entries)

998
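The two scraping cells differ only in the subreddit name, so the loop could be factored into one function. The sketch below assumes a hypothetical helper `collect_entries`, which is not part of the original notebook. It relies on `str()` of a submission's `subreddit` attribute giving the subreddit's name, which matches what the dataframes display. Stand-in objects replace real PRAW submissions so the example runs offline.

```python
from types import SimpleNamespace


def collect_entries(submissions):
    """Build one {'title', 'subreddit'} dict per submission-like object."""
    return [
        {'title': s.title, 'subreddit': str(s.subreddit)}
        for s in submissions
    ]


# Stand-in objects so the sketch runs without a Reddit connection.
fake_submissions = [
    SimpleNamespace(title='first title', subreddit='Showerthoughts'),
    SimpleNamespace(title='second title', subreddit='Showerthoughts'),
]
entries = collect_entries(fake_submissions)
```

With a live connection, the call would look like `collect_entries(reddit.subreddit('showerthoughts').new(limit=1000))`.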

In [44]:
# saving the list of dictionaries to a pandas dataframe
# and displaying the first 5 rows
st_df = pd.DataFrame(st_entries)
st_df.head()

| | subreddit | title |
|---|---|---|
| 0 | Showerthoughts | There is no reason for the alphabet to be in t... |
| 1 | Showerthoughts | The Quesadilla is just the cousin of the Grill... |
| 2 | Showerthoughts | by touching a door you’re saying that perhaps ... |
| 3 | Showerthoughts | James Bond surely won't get lucky every time |
| 4 | Showerthoughts | A random day in the year is actually you’re pl... |


In [56]:
# saving these 998 rows to a CSV file
st_df.to_csv('./datasets/st_df.csv', index=False)

## Combining the two dataframes into a single dataframe

In [51]:
# combine the dataframes with concat
combined_df = pd.concat([wp_df, st_df], ignore_index=True)

In [52]:
# check out the head
combined_df.head()

| | subreddit | title |
|---|---|---|
| 0 | WritingPrompts | It's been over 800 days since you landed on P... |
| 1 | WritingPrompts | Humans are the only species known to have dom... |
| 2 | WritingPrompts | He has been blind all his life. Now, he is th... |
| 3 | WritingPrompts | You’re dying...and dying. And then you die. B... |
| 4 | WritingPrompts | Humanity has found a way to circumvent the ne... |


In [50]:
# check out the tail
combined_df.tail()

| | subreddit | title |
|---|---|---|
| 1991 | Showerthoughts | The fact that we have collectively decided to ... |
| 1992 | Showerthoughts | "Leaving the sinking ship" fit metaphoricly pe... |
| 1993 | Showerthoughts | If you could lift objects with your mind, you ... |
| 1994 | Showerthoughts | In the one episode of Phineas and Ferb where F... |
| 1995 | Showerthoughts | If Final Fantasy ever reaches the 30th main ga... |


In [53]:
# check that we have an equal number of observations
# from each subreddit
combined_df.subreddit.value_counts()

Showerthoughts    998
WritingPrompts    998
Name: subreddit, dtype: int64
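The balance check above is done by eye. It could also be written as an assertion, so that a later re-scrape fails loudly if the two classes drift apart. This is a rough sketch using only the standard library. The helper `is_balanced` is hypothetical, and the sample labels simply reproduce the counts shown above.

```python
from collections import Counter


def is_balanced(labels):
    """Return True when every distinct label occurs the same number of times."""
    counts = Counter(labels)
    return len(set(counts.values())) <= 1


# Labels rebuilt from the counts reported above: 998 per subreddit.
labels = ['Showerthoughts'] * 998 + ['WritingPrompts'] * 998
balanced = is_balanced(labels)
```

In the notebook, the same check could be run directly on the column with `assert is_balanced(combined_df['subreddit'])`.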

In [57]:
# saving the combined dataframe to csv
combined_df.to_csv('./datasets/combined_df.csv', index=False)