# Getting the Data
- [Create Reddit Instance](#Creating-a-read-only-Reddit-Instance-with-PRAW)
- [Scraping r/WritingPrompts](#Scraping-r/WritingPrompts-with-PRAW)
- [Scraping r/ShowerThoughts](#Scraping-r/ShowerThoughts-with-PRAW)
- [Create a Combined Dataframe](#Combining-the-two-dataframes-into-a-single-dataframe)

## Creating a read only Reddit Instance with PRAW

In [2]:
import pandas as pd
import praw

In [3]:
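# credentials come from an app registered at https://www.reddit.com/prefs/apps
# without a username and password, PRAW creates a read-only instance
reddit = praw.Reddit(client_id='ENTER CLIENT ID HERE',
                     client_secret='ENTER CLIENT SECRET HERE',
                     user_agent='ENTER A DESCRIPTIVE USER AGENT HERE')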
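Typing credentials straight into a notebook makes them easy to leak when the notebook is shared. A minimal sketch, assuming the credentials are kept in environment variables with the hypothetical names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET` and `REDDIT_USER_AGENT`; the helper `load_reddit_credentials` is also hypothetical, not part of PRAW.

```python
import os
from typing import Mapping


def load_reddit_credentials(env: Mapping[str, str] = os.environ) -> dict:
    """Collect praw.Reddit keyword arguments from environment variables.

    The variable names are hypothetical conventions for this notebook,
    not names that PRAW itself requires.
    """
    names = {
        "client_id": "REDDIT_CLIENT_ID",
        "client_secret": "REDDIT_CLIENT_SECRET",
        "user_agent": "REDDIT_USER_AGENT",
    }
    # fail early with a readable message instead of a later auth error
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {kwarg: env[var] for kwarg, var in names.items()}


# Usage (needs praw installed and real credentials set):
# reddit = praw.Reddit(**load_reddit_credentials())
```

Because no username or password is passed, the resulting instance is read-only, which can be confirmed with `reddit.read_only`.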

## Scraping r/WritingPrompts with PRAW

In [4]:
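# creating a list of dictionaries for submissions to r/WritingPrompts,
# one dictionary per post with 'title' and 'subreddit' keys;
# Reddit listings stop at roughly 1,000 posts even with limit=None
wp_entries = []
for submission in reddit.subreddit('writingprompts').new(limit=None):
    temp = {}
    temp['title'] = submission.title
    # display_name gives the subreddit name as a plain string
    temp['subreddit'] = submission.subreddit.display_name
    wp_entries.append(temp)
len(wp_entries)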

994
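The loop only reads two attributes per submission, so the extraction step can be separated from the network call and checked offline. A minimal sketch with a hypothetical helper `submissions_to_records`, assuming each item exposes `.title` and `.subreddit.display_name` as PRAW submissions do; the `SimpleNamespace` objects are stand-ins, not real Reddit data.

```python
from types import SimpleNamespace
from typing import Iterable


def submissions_to_records(submissions: Iterable) -> list:
    """Turn submission-like objects into a list of {'title', 'subreddit'} dicts."""
    return [
        {"title": s.title, "subreddit": s.subreddit.display_name}
        for s in submissions
    ]


# offline stand-ins for PRAW submissions, for illustration only
sub = SimpleNamespace(display_name="WritingPrompts")
fake = [
    SimpleNamespace(title="[WP] A prompt", subreddit=sub),
    SimpleNamespace(title="Another prompt", subreddit=sub),
]
records = submissions_to_records(fake)
```

In the notebook this would be called as `submissions_to_records(reddit.subreddit('writingprompts').new(limit=None))`.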

In [5]:
# saving the list of dictionaries to a pandas dataframe
# and displaying the first 5 rows
wp_df = pd.DataFrame(wp_entries)
wp_df.head()

|   | subreddit | title |
|---|-----------|-------|
| 0 | WritingPrompts | [WP] You are a C average student in high schoo... |
| 1 | WritingPrompts | [PI]A solar flare has mutated human blood to h... |
| 2 | WritingPrompts | [WP] You are responsible for looking after a r... |
| 3 | WritingPrompts | [WP] Doctors have found I way to measure happi... |
| 4 | WritingPrompts | [WP] You are a noble in a powerful kingdom. Th... |
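Passing a list of dictionaries to `pd.DataFrame` gives one row per dictionary and one column per key. Older pandas versions sorted those keys alphabetically, which would explain why `subreddit` is displayed before `title` even though `title` is added first. A small sketch, using made-up records in the same shape as `wp_entries`, that pins the column order explicitly with `columns=`:

```python
import pandas as pd

# hypothetical records shaped like wp_entries
records = [
    {"title": "A prompt", "subreddit": "WritingPrompts"},
    {"title": "Another prompt", "subreddit": "WritingPrompts"},
]

# columns= fixes the order instead of relying on how the installed
# pandas version orders dictionary keys
df = pd.DataFrame(records, columns=["subreddit", "title"])
print(df.head())
```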


In [6]:
# removing '[WP]' from the titles of the entries:
# it appears in almost every title, so leaving it in would let a model
# identify the subreddit from the tag alone, contrary to the spirit of the project
wp_df['title'] = [i.replace('[WP]', '') for i in wp_df['title']]
wp_df.head()

|   | subreddit | title |
|---|-----------|-------|
| 0 | WritingPrompts | You are a C average student in high school, b... |
| 1 | WritingPrompts | [PI]A solar flare has mutated human blood to h... |
| 2 | WritingPrompts | You are responsible for looking after a retir... |
| 3 | WritingPrompts | Doctors have found I way to measure happiness... |
| 4 | WritingPrompts | You are a noble in a powerful kingdom. The ki... |
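The list comprehension removes `[WP]` correctly. A vectorized alternative is `Series.str.replace`, but it needs `regex=False` here: read as a regular expression, `[WP]` is a character class matching any single `W` or `P`, and pandas versions before 2.0 treat the pattern as a regex by default. The output also shows that other tags such as `[PI]` survive and a leading space is left behind. A sketch of both variants; stripping every leading bracketed tag is an assumption beyond what this notebook does.

```python
import pandas as pd

titles = pd.Series([
    "[WP] You are a C average student",
    "[PI]A solar flare has mutated human blood",
    "No tag here",
])

# literal removal of '[WP]' only; strip() drops the space left behind
wp_only = titles.str.replace("[WP]", "", regex=False).str.strip()

# assumption: drop any leading bracketed tag such as [WP] or [PI];
# the pattern matches optional whitespace, a bracketed run of letters,
# and any whitespace that follows it
any_tag = titles.str.replace(r"^\s*\[[A-Za-z]+\]\s*", "", regex=True)
```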


In [7]:
# saving these 994 rows to a csv
wp_df.to_csv('./datasets/wp_df.csv', index=False)
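`to_csv` fails if the `./datasets` folder does not already exist. A minimal sketch of a hypothetical `save_csv` helper that creates the parent folder first with `pathlib`; the demo writes into a temporary directory rather than the notebook's own `./datasets`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_csv(df: pd.DataFrame, path) -> Path:
    """Write df to CSV without the index, creating parent folders first."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return path


# demo in a throwaway directory; in the notebook the path would be
# './datasets/wp_df.csv'
tmp = tempfile.mkdtemp()
demo = pd.DataFrame({"subreddit": ["WritingPrompts"], "title": ["A prompt"]})
out = save_csv(demo, Path(tmp) / "datasets" / "wp_df.csv")
```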

## Scraping r/ShowerThoughts with PRAW

In [8]:
# creating a list of dictionaries for submissions to r/ShowerThoughts,
# one dictionary per post with 'title' and 'subreddit' keys;
# as above, the listing stops at roughly 1,000 posts
st_entries = []
for submission in reddit.subreddit('showerthoughts').new(limit=None):
    temp = {}
    temp['title'] = submission.title
    # display_name gives the subreddit name as a plain string
    temp['subreddit'] = submission.subreddit.display_name
    st_entries.append(temp)
len(st_entries)

998
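This cell repeats the r/WritingPrompts loop with only the subreddit name changed. A sketch of folding both into one loop, using a hypothetical `scrape_many` function that takes a `fetch_new` callable; in the notebook that callable would be `lambda name: reddit.subreddit(name).new(limit=None)`, while the demo below uses an offline stand-in.

```python
from types import SimpleNamespace
from typing import Callable, Dict, Iterable, List

import pandas as pd


def scrape_many(names: Iterable[str],
                fetch_new: Callable[[str], Iterable]) -> Dict[str, pd.DataFrame]:
    """Build one subreddit/title DataFrame per subreddit name."""
    frames = {}
    for name in names:
        rows = [
            {"title": s.title, "subreddit": s.subreddit.display_name}
            for s in fetch_new(name)
        ]
        frames[name] = pd.DataFrame(rows, columns=["subreddit", "title"])
    return frames


def fake_fetch(name: str) -> List[SimpleNamespace]:
    """Offline stand-in for reddit.subreddit(name).new(limit=None)."""
    sub = SimpleNamespace(display_name=name)
    return [SimpleNamespace(title=f"{name} post {i}", subreddit=sub) for i in range(3)]


frames = scrape_many(["WritingPrompts", "Showerthoughts"], fake_fetch)
```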

In [9]:
# saving the list of dictionaries to a pandas dataframe
# and displaying the first 5 rows
st_df = pd.DataFrame(st_entries)
st_df.head()

|   | subreddit | title |
|---|-----------|-------|
| 0 | Showerthoughts | Hocking a loogie is like picking your nose and... |
| 1 | Showerthoughts | Literally any song is a road trip song. |
| 2 | Showerthoughts | Comic books in the Harry Potter universe are p... |
| 3 | Showerthoughts | If we’re all living in a simulation Blind and/... |
| 4 | Showerthoughts | If you want a girl to have sex with you , you ... |


In [10]:
# saving these 998 rows to a csv
st_df.to_csv('./datasets/st_df.csv', index=False)

## Combining the two dataframes into a single dataframe

In [11]:
# stack the two dataframes row-wise with concat;
# ignore_index=True renumbers the rows from 0
combined_df = pd.concat([wp_df, st_df], ignore_index=True)
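Each source dataframe has its own index starting at 0, so a plain `concat` would repeat labels. With `ignore_index=True` the rows are renumbered consecutively, which is why the combined tail ends at 1991 (994 + 998 = 1992 rows, labelled 0 to 1991). A small sketch on made-up frames:

```python
import pandas as pd

a = pd.DataFrame({"subreddit": ["WritingPrompts"] * 2, "title": ["p1", "p2"]})
b = pd.DataFrame({"subreddit": ["Showerthoughts"] * 3, "title": ["t1", "t2", "t3"]})

kept = pd.concat([a, b])                            # index labels 0, 1, 0, 1, 2
renumbered = pd.concat([a, b], ignore_index=True)   # index labels 0, 1, 2, 3, 4
```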

In [12]:
# check out the head
combined_df.head()

|   | subreddit | title |
|---|-----------|-------|
| 0 | WritingPrompts | You are a C average student in high school, b... |
| 1 | WritingPrompts | [PI]A solar flare has mutated human blood to h... |
| 2 | WritingPrompts | You are responsible for looking after a retir... |
| 3 | WritingPrompts | Doctors have found I way to measure happiness... |
| 4 | WritingPrompts | You are a noble in a powerful kingdom. The ki... |


In [13]:
# check out the tail
combined_df.tail()

|      | subreddit | title |
|------|-----------|-------|
| 1987 | Showerthoughts | It's hard to make something unique without eit... |
| 1988 | Showerthoughts | The odds of flipping a cling probably aren't e... |
| 1989 | Showerthoughts | Clothes in China probably have “Made Around th... |
| 1990 | Showerthoughts | You could theoretically make diamonds from tre... |
| 1991 | Showerthoughts | A whole new universe unlocks after you figure ... |


In [14]:
# check how many observations come from each subreddit;
# the classes are close (998 vs. 994) but not exactly equal
combined_df.subreddit.value_counts()

Showerthoughts    998
WritingPrompts    994
Name: subreddit, dtype: int64
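With 998 and 994 rows the two classes are nearly balanced, so a majority-class baseline sits just above 50%. If exactly equal classes were wanted, one option (not something this notebook does) is to downsample every subreddit to the size of the smallest. A sketch on made-up data; `GroupBy.sample` needs pandas 1.1 or later, and `random_state` makes the draw repeatable.

```python
import pandas as pd

df = pd.DataFrame({
    "subreddit": ["Showerthoughts"] * 5 + ["WritingPrompts"] * 3,
    "title": [f"t{i}" for i in range(8)],
})

# share of each class; values near 0.5 mean the classes are close
shares = df["subreddit"].value_counts(normalize=True)

# optional: downsample every subreddit to the smallest class size
n_min = df["subreddit"].value_counts().min()
balanced = (
    df.groupby("subreddit")
      .sample(n=n_min, random_state=42)
      .reset_index(drop=True)
)
```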

In [15]:
# saving the combined dataframe to csv
combined_df.to_csv('./datasets/combined_df.csv', index=False)
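Saving with `index=False` matters when the files are read back. With the default `index=True`, `to_csv` writes the row labels as an unnamed first column, and `pd.read_csv` then loads it as an extra column called `Unnamed: 0`. A small round-trip sketch using in-memory CSV text:

```python
import io

import pandas as pd

df = pd.DataFrame({"subreddit": ["WritingPrompts", "Showerthoughts"],
                   "title": ["A prompt", "A thought"]})

# default to_csv keeps the index, which comes back as an extra column
with_index = pd.read_csv(io.StringIO(df.to_csv()))

# index=False, as used in this notebook, round-trips cleanly
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
```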