# Reddit (PRAW) API

The dataset used in my project was downloaded from [Kaggle](https://www.kaggle.com/kaggle/random-acts-of-pizza). The submissions in that dataset range from December 8, 2010 through September 29, 2013.

To extend it, I used the Pushshift API to collect the submission IDs from January 1, 2014 through August 13, 2020, and then used PRAW to pull the most recent link flair text for each submission.

In [57]:
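import praw
import json
import pandas as pd  # needed below for pd.read_csv
import credentials  # local module holding client_id, client_secret and user_agent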

In [61]:
reddit = praw.Reddit(
    client_id = credentials.client_id,
    client_secret = credentials.client_secret,
    user_agent = credentials.user_agent,
)

In [62]:
subred = reddit.subreddit('Random_Acts_Of_Pizza')

In [63]:
type(subred)

praw.models.reddit.subreddit.Subreddit
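
The `credentials` module imported above is a local file that is not shown in this notebook. A minimal sketch of what it might contain, assuming the three values are read from environment variables; the function name `load_reddit_credentials` and the `REDDIT_*` variable names are hypothetical and not part of PRAW:

```python
import os


def load_reddit_credentials(env=None):
    """Return the three keyword arguments used to build the PRAW client."""
    # Hypothetical variable names; use whatever names you actually export.
    env = os.environ if env is None else env
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }


# Example with an explicit mapping instead of the real environment.
example = load_reddit_credentials({
    "REDDIT_CLIENT_ID": "my-client-id",
    "REDDIT_CLIENT_SECRET": "my-client-secret",
    "REDDIT_USER_AGENT": "pizza-flair-fetcher",
})
print(sorted(example))
```

The returned dictionary could then be unpacked into the client, e.g. `praw.Reddit(**load_reddit_credentials())`, which keeps the secret out of the notebook. A missing variable raises a `KeyError` rather than silently producing an empty credential.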

I loaded a total of 58,541 entries that had been extracted from the Pushshift API.

In [316]:
reddit_df = pd.read_csv('reddit_testing.csv').reset_index()

In [317]:
reddit_df.head()

Unnamed: 0,index,Post ID,Title,Publish Date,Flair
0,1u4mqs,Im Hungry on nye [Request],2013-12-31 19:25:50,"I'm a lonely loner tonight, with my kittens . ...",
1,1u4nh5,"[request] Just another alone on NYE, sick of b...",2013-12-31 19:35:31,So my sob story is about the same as any other...,
2,1u4nm8,"[request] Today is also my birthday, spending ...",2013-12-31 19:37:32,"I turned 21 today, and my mom left my dad on b...",
3,1u4ow8,[Request] Alone and sick on NYE.. could use a ...,2013-12-31 19:54:26,Feeling a little bummed out that I'm bringing ...,
4,1u4oyt,"[REQUEST] home, alone, broke, and hungry",2013-12-31 19:55:18,"Broke, $1 to my name, wishing I had a job so b...",


In [318]:
reddit_df.columns = ['submission_id', 'title', 'unix_timestamp_of_request_utc', 'request_text', 'received_request']

In [319]:
reddit_df.head()

Unnamed: 0,submission_id,title,unix_timestamp_of_request_utc,request_text,received_request
0,1u4mqs,Im Hungry on nye [Request],2013-12-31 19:25:50,"I'm a lonely loner tonight, with my kittens . ...",
1,1u4nh5,"[request] Just another alone on NYE, sick of b...",2013-12-31 19:35:31,So my sob story is about the same as any other...,
2,1u4nm8,"[request] Today is also my birthday, spending ...",2013-12-31 19:37:32,"I turned 21 today, and my mom left my dad on b...",
3,1u4ow8,[Request] Alone and sick on NYE.. could use a ...,2013-12-31 19:54:26,Feeling a little bummed out that I'm bringing ...,
4,1u4oyt,"[REQUEST] home, alone, broke, and hungry",2013-12-31 19:55:18,"Broke, $1 to my name, wishing I had a job so b...",


In [320]:
submission_id_list = reddit_df['submission_id']

In [338]:
reddit_df['received_request'].value_counts()

Fulfilled           1125
No Longer Needed     575
In Progress          112
Closed                 3
Request                2
Fulfilled              2
Expired                1
no longer needed       1
Thanks                 1
Contest                1
fulfilled              1
No longer needed       1
Name: received_request, dtype: int64

Here, I dropped all rows with an empty request body, since there is no text in them to apply NLP to.

In [351]:
# Drop the rows whose request body is missing.
reddit_df.dropna(subset = ['request_text'], inplace = True)

In [352]:
reddit_df

Unnamed: 0,submission_id,title,unix_timestamp_of_request_utc,request_text,received_request
0,1u4mqs,Im Hungry on nye [Request],2013-12-31 19:25:50,"I'm a lonely loner tonight, with my kittens . ...",
1,1u4nh5,"[request] Just another alone on NYE, sick of b...",2013-12-31 19:35:31,So my sob story is about the same as any other...,
2,1u4nm8,"[request] Today is also my birthday, spending ...",2013-12-31 19:37:32,"I turned 21 today, and my mom left my dad on b...",
3,1u4ow8,[Request] Alone and sick on NYE.. could use a ...,2013-12-31 19:54:26,Feeling a little bummed out that I'm bringing ...,
4,1u4oyt,"[REQUEST] home, alone, broke, and hungry",2013-12-31 19:55:18,"Broke, $1 to my name, wishing I had a job so b...",
...,...,...,...,...,...
58536,i959n8,"[Request] it’s been a long two weeks, my accou...",2020-08-13 14:24:11,"Hi guys, a couple things came up and my accoun...",
58537,i95d92,[request] I got invited to a friend and want t...,2020-08-13 14:29:38,I'm in the Netherlands and will pay it forward...,
58538,i98cwp,[request] Ohioan no money or food until the 17...,2020-08-13 17:09:00,[removed],
58539,i99qcp,"[REQUEST] Don’t get paid till the 14th, and my...",2020-08-13 18:24:41,$5 in my account,


With 52,748 entries left and one API request per submission, I had to fetch the flairs in three parts.
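
Slicing by hand after each interruption works, but the loop can also be made resumable. A minimal sketch, assuming a hypothetical helper `fetch_flairs` that is not part of PRAW: it takes any callable that maps a submission ID to its flair and continues from however many results have already been collected. The `flaky_lookup` function only simulates a dropped connection for the demonstration.

```python
def fetch_flairs(submission_ids, fetch_one, results=None):
    """Append one flair per ID to `results`, skipping IDs already fetched."""
    results = [] if results is None else results
    ids = list(submission_ids)
    # len(results) is the position of the first ID that still needs a lookup.
    for sid in ids[len(results):]:
        results.append(fetch_one(sid))
    return results


# Simulated lookup that fails once on the third ID.
seen = []

def flaky_lookup(sid):
    seen.append(sid)
    if sid == "c" and seen.count("c") == 1:
        raise ConnectionError("simulated dropout")
    return sid.upper()


collected = []
try:
    fetch_flairs(["a", "b", "c", "d"], flaky_lookup, collected)
except ConnectionError:
    pass  # `collected` still holds the flairs fetched before the failure

# Calling again with the same list resumes at "c" instead of starting over.
fetch_flairs(["a", "b", "c", "d"], flaky_lookup, collected)
print(collected)
```

In this notebook the callable would be something like `lambda sid: reddit.submission(id=sid).link_flair_text`. Because the results list is filled in place, the flairs collected before an exception are kept.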

In [353]:
submission_id_list = reddit_df['submission_id']

In [None]:
received_request_true = []
for each in submission_id_list:
    submission = reddit.submission(id = each)
    received_request_true.append(submission.link_flair_text)

In [357]:
len(received_request_true)

27482

In [368]:
received_request_true.count('Fulfilled')

678

In [370]:
# Rows 27482 up to (but not including) 40332; the third slice starts at row 40332.
second_slice_pizza = reddit_df.iloc[27482:40332]

In [372]:
submission_id_list2 = second_slice_pizza['submission_id']

In [None]:
received_request_true2 = []
for each in submission_id_list2:
    submission = reddit.submission(id = each)
    received_request_true2.append(submission.link_flair_text)

In [399]:
third_slice_pizza = reddit_df.iloc[40332:]

In [400]:
submission_id_list3 = third_slice_pizza['submission_id']

In [401]:
received_request_true3 = []
for each in submission_id_list3:
    submission = reddit.submission(id = each)
    received_request_true3.append(submission.link_flair_text)

In [402]:
len(received_request_true3)

12416

In [403]:
total_joined_list = received_request_true + received_request_true2 + received_request_true3

In [404]:
len(total_joined_list)

52748
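
The concatenated list only lines up with the dataframe if the three parts cover every row exactly once. A quick arithmetic check using the lengths reported above (the first list has 27,482 entries, and the third slice starts at row 40,332 and has 12,416 entries); the `boundaries` list is only for illustration:

```python
# Row positions where each part starts, followed by the total row count.
boundaries = [0, 27482, 40332, 52748]

# The size of each part is the gap between consecutive boundaries.
sizes = [end - start for start, end in zip(boundaries, boundaries[1:])]

print(sizes)       # [27482, 12850, 12416]
print(sum(sizes))  # 52748
```

So the second list has to contribute 12,850 flairs, i.e. rows 27,482 up to but not including 40,332, for the total to match the 52,748 rows in `reddit_df`.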

In [405]:
reddit_df['received_request_api'] = total_joined_list

In [408]:
reddit_df['received_request_api'].value_counts(dropna = False)

NaN                 47398
Fulfilled            3750
No Longer Needed     1400
In Progress           174
Closed                 15
Expired                 2
Request                 2
Fulfilled               2
no longer needed        1
Thanks                  1
Contest                 1
fulfilled               1
No longer needed        1
Name: received_request_api, dtype: int64

After combining all the fetched flair texts into a single dataframe, I cleaned it up by removing every request whose body was '[removed]' or '[deleted]'.

In [426]:
# Drop requests whose body was removed or deleted.
clean_reddit_df = reddit_df[~reddit_df['request_text'].isin(['[removed]', '[deleted]'])].copy()

In [430]:
clean_reddit_df['received_request_api'] = clean_reddit_df['received_request_api'].str.lower().str.strip()

In [436]:
clean_reddit_df['received_request_api'].value_counts(dropna = False)

NaN                 29311
fulfilled            3673
no longer needed     1137
in progress           168
closed                 14
expired                 2
thanks                  1
request                 1
contest                 1
Name: received_request_api, dtype: int64
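
The lowercase-and-strip step is what merges labels that were counted separately before, such as the two 'Fulfilled' rows, which presumably differ only by surrounding whitespace. A small illustration on made-up values (the `raw` series is not taken from the dataset):

```python
import pandas as pd

# Made-up flair values with inconsistent casing, stray whitespace and a missing entry.
raw = pd.Series(
    ["Fulfilled", "Fulfilled ", "fulfilled", "No Longer Needed", "no longer needed", None]
)

# Same normalization as in the notebook: lowercase, then strip whitespace.
normalized = raw.str.lower().str.strip()

# dropna=False keeps the missing flair in the counts.
counts = normalized.value_counts(dropna=False)
print(counts)
```

Missing flairs stay missing after the string operations, which matters because the next step treats a missing flair as its own category.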

I kept only the requests whose flair was either 'fulfilled' or NaN, treating 'fulfilled' as the user having received pizza and NaN as the user not having received one.

In [1]:
final_clean = clean_reddit_df[(clean_reddit_df['received_request_api'] == 'fulfilled') | (clean_reddit_df['received_request_api'].isna())]


I combined the title and the request text into a single column so that NLP can be applied to both at once.

In [445]:
# Combine the title and the body into one text column. Working on an explicit
# copy avoids pandas' SettingWithCopyWarning.
final_clean = final_clean.copy()
final_clean['request_text'] = final_clean['title'] + ' ' + final_clean['request_text']

In [448]:
# Keep only the columns I need.
final_clean = final_clean[['request_text', 'received_request_api', 'unix_timestamp_of_request_utc']]

In [450]:
# Assign the result instead of renaming in place, which avoids SettingWithCopyWarning.
final_clean = final_clean.rename(columns = {'received_request_api': 'received_pizza'})


In [460]:
# Convert the 'received_pizza' values to match the Kaggle dataset: a missing flair (NaN) becomes 'False' and 'fulfilled' becomes 'True'.

final_clean = final_clean.fillna('False')
final_clean['received_pizza'] = final_clean['received_pizza'].str.replace('fulfilled', 'True')

In [529]:
# Let's go ahead and also drop all the duplicates.

final_clean = final_clean.drop_duplicates()

In [530]:
final_clean['received_pizza'].value_counts()

False    29307
True      3673
Name: received_pizza, dtype: int64
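
A compact sketch of the same labeling rule on a toy frame (the rows are made up): keep the requests whose flair is 'fulfilled' or missing, then derive the label from the flair. Unlike the cells above, this variant stores real booleans rather than the strings 'True' and 'False', which is an assumption about what downstream code prefers.

```python
import pandas as pd

# Made-up rows covering the cases that matter: fulfilled, missing, and other flairs.
toy = pd.DataFrame({
    "request_text": ["a", "b", "c", "d"],
    "received_request_api": ["fulfilled", None, "in progress", "closed"],
})

flair = toy["received_request_api"]

# Keep only rows that are fulfilled or have no flair at all.
keep = flair.eq("fulfilled") | flair.isna()
labeled = toy.loc[keep].copy()

# True where the flair is 'fulfilled', False where it is missing.
labeled["received_pizza"] = labeled["received_request_api"].eq("fulfilled")
print(labeled[["request_text", "received_pizza"]])
```

Rows with any other flair, such as 'in progress' or 'closed', are dropped because they are neither a clear success nor a clear failure under this rule.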

In [538]:
#final_clean.to_csv('final_clean2.csv', index = False)

In [543]:
test_csv = pd.read_csv('final_clean2.csv')

# Kaggle Testing Dataset 

The Kaggle testing dataset had no flairs, so I used its submission IDs (the `submissionids` column) to pull the flairs with PRAW.

In [10]:
import pandas as pd

In [46]:
kaggle_testing = pd.read_csv('kaggle_pizza_test.csv')

In [47]:
kaggle_testing

Unnamed: 0,request_text,submissionids,unix_timestamp_of_request_utc
0,[request] pregger gf 95 degree house and no fo...,i8iy4,2011-06-24 23:56:59
1,"[Request] Lost my job day after labour day, st...",1mfqi0,2013-09-15 15:45:23
2,[Request] Just moved to a new state(Waltham MA...,1jdgdj,2013-07-30 20:38:02
3,(Request) pizza for my kids please? Hi Reddit....,lclka,2011-10-14 22:53:41
4,"[Request] Two girls in between paychecks, we'v...",t2qt4,2012-05-02 03:52:38
...,...,...,...
1626,"[Request] MONTERREY, I'm hungry :( I hope we M...",iwbsf,2011-07-21 23:57:17
1627,[Request] Durham NH broke call center employee...,1foy3z,2013-06-05 01:35:40
1628,[Request] California USA Last of my money wen...,11wza2,2012-10-23 00:52:29
1629,requesting some pizza for tonight!! Hey everyo...,iihye,2011-07-06 22:33:02


In [48]:
submission_kaggle = kaggle_testing['submissionids']

In [49]:
received_request_kaggle = []
for each in submission_kaggle:
    submission = reddit.submission(id = each)
    received_request_kaggle.append(submission.link_flair_text)
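
The loop above issues one request per submission, because PRAW fetches a submission lazily the first time an attribute such as `link_flair_text` is read. A possible alternative, sketched under the assumption that the installed PRAW version provides `Reddit.info(fullnames=...)` for batched lookups by fullname (`t3_` is Reddit's prefix for submissions); the helper names `to_fullnames` and `flairs_by_id` are hypothetical:

```python
def to_fullnames(submission_ids):
    """Turn bare submission IDs such as 'i8iy4' into fullnames such as 't3_i8iy4'."""
    return [f"t3_{sid}" for sid in submission_ids]


def flairs_by_id(reddit, submission_ids):
    """Return {submission_id: link_flair_text} for every submission the API returns."""
    fullnames = to_fullnames(submission_ids)
    # Assumption: reddit.info yields submission objects for the given fullnames.
    # IDs that are missing from the response are simply absent from the dict.
    return {s.id: s.link_flair_text for s in reddit.info(fullnames=fullnames)}


print(to_fullnames(["i8iy4", "1mfqi0"]))
```

With the notebook's objects this would be called as `flairs_by_id(reddit, kaggle_testing['submissionids'])`. Mapping by ID rather than by list position keeps each flair attached to the right row even if some submissions are not returned.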

In [50]:
kaggle_testing['received_pizza'] = received_request_kaggle

In [51]:
kaggle_testing

Unnamed: 0,request_text,submissionids,unix_timestamp_of_request_utc,received_pizza
0,[request] pregger gf 95 degree house and no fo...,i8iy4,2011-06-24 23:56:59,
1,"[Request] Lost my job day after labour day, st...",1mfqi0,2013-09-15 15:45:23,
2,[Request] Just moved to a new state(Waltham MA...,1jdgdj,2013-07-30 20:38:02,
3,(Request) pizza for my kids please? Hi Reddit....,lclka,2011-10-14 22:53:41,
4,"[Request] Two girls in between paychecks, we'v...",t2qt4,2012-05-02 03:52:38,
...,...,...,...,...
1626,"[Request] MONTERREY, I'm hungry :( I hope we M...",iwbsf,2011-07-21 23:57:17,
1627,[Request] Durham NH broke call center employee...,1foy3z,2013-06-05 01:35:40,
1628,[Request] California USA Last of my money wen...,11wza2,2012-10-23 00:52:29,
1629,requesting some pizza for tonight!! Hey everyo...,iihye,2011-07-06 22:33:02,


In [53]:
kaggle_testing = kaggle_testing.fillna('False')

In [54]:
kaggle_testing

Unnamed: 0,request_text,submissionids,unix_timestamp_of_request_utc,received_pizza
0,[request] pregger gf 95 degree house and no fo...,i8iy4,2011-06-24 23:56:59,False
1,"[Request] Lost my job day after labour day, st...",1mfqi0,2013-09-15 15:45:23,False
2,[Request] Just moved to a new state(Waltham MA...,1jdgdj,2013-07-30 20:38:02,False
3,(Request) pizza for my kids please? Hi Reddit....,lclka,2011-10-14 22:53:41,False
4,"[Request] Two girls in between paychecks, we'v...",t2qt4,2012-05-02 03:52:38,False
...,...,...,...,...
1626,"[Request] MONTERREY, I'm hungry :( I hope we M...",iwbsf,2011-07-21 23:57:17,False
1627,[Request] Durham NH broke call center employee...,1foy3z,2013-06-05 01:35:40,False
1628,[Request] California USA Last of my money wen...,11wza2,2012-10-23 00:52:29,False
1629,requesting some pizza for tonight!! Hey everyo...,iihye,2011-07-06 22:33:02,False


In [65]:
kaggle_testing['received_pizza'].value_counts()

False    1631
Name: received_pizza, dtype: int64

In [55]:
kaggle_testing.to_csv('kaggle_pizza_test_final.csv', index = False)

None of the 1,631 requests in the testing dataset had a link flair, so all of them were labeled as not having received pizza.