# Using the Pushshift API

In this notebook, we'll follow the instructions from [this video](https://www.youtube.com/watch?v=AcrjEWsMi_E&feature=youtu.be) on how to use the [Pushshift API](https://github.com/pushshift/api) to get Reddit posts. The hope is that by better understanding the API, we can wrap most of its functionality in a function that grabs an arbitrary number of posts from the two subreddits of our choosing.

**NOTE:** The content of this notebook is unrelated to the rest of the project and can probably be ignored by most.

## Importing libraries

In [1]:
import pandas as pd
import requests

## Trying it out

Following Riley's example, we'll be grabbing data from the [/r/boardgames](https://www.reddit.com/r/boardgames) subreddit.

In [2]:
url = 'https://api.pushshift.io/reddit/search/submission'

In [3]:
params = {
    'subreddit': 'boardgames',
    'size': 500,
}
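
To see roughly what this request looks like as a URL, the parameters can be encoded by hand. This is a small sketch using the standard library's `urlencode` rather than `requests` itself; `requests` builds an equivalent query string when it is given `params`, though its encoding of unusual characters may differ.

```python
from urllib.parse import urlencode

url = 'https://api.pushshift.io/reddit/search/submission'
params = {
    'subreddit': 'boardgames',
    'size': 500,
}

# Append the encoded parameters to the endpoint as a query string
full_url = f'{url}?{urlencode(params)}'
print(full_url)
```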

In [4]:
res = requests.get(url, params=params)

In [5]:
res.status_code

200

In [6]:
data = res.json()

In [7]:
posts = data['data']
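
The request-and-parse cells above could be folded into one helper that fails loudly on a bad response instead of relying on a visual check of `status_code`. This is only a sketch: `extract_posts` is a hypothetical name, and it assumes the posts always sit under the `data` key, as they do in the cell above.

```python
def extract_posts(res):
    """Return the list of posts from a Pushshift response.

    `res` is expected to behave like a `requests.Response`.
    """
    # Raises an HTTPError for 4xx/5xx responses; does nothing on a 200
    res.raise_for_status()
    # Fall back to an empty list if the 'data' key is missing (assumption)
    return res.json().get('data', [])

# Hypothetical usage with the request from this notebook:
# posts = extract_posts(requests.get(url, params=params))
```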

In [8]:
boardgames = pd.DataFrame(posts)

In [9]:
boardgames[['subreddit', 'title', 'selftext']]

|  | subreddit | title | selftext |
| --- | --- | --- | --- |
| 0 | boardgames | Has anyone heard back from Asmodee about fixin... | The last update broke the app, the touch is ho... |
| 1 | boardgames | Dude, what is UP with Battle Wizards Murdershr... | I got this pack after loving playing the first... |
| 2 | boardgames | Challenge accepted (OC) |  |
| 3 | boardgames | Which Vast game do people prefer? | Crystal caverns vs mysterious manor |
| 4 | boardgames | Mechanics never seen before ? Maztec May 1st o... |  |
| ... | ... | ... | ... |
| 495 | boardgames | Tremer no Adobe Premiere |  |
| 496 | boardgames | Refreshing to play a game with basic components? | I'm super excited because after a month or two... |
| 497 | boardgames | Lord of the Rings Battlefields Expansion | Hoping someone can point in the right directio... |
| 498 | boardgames | Codenames in Excel - feel free to download and... | Someone in /r/excel recommended I post over he... |


## Getting more posts

The Pushshift API returns at most 500 posts per request. So if we want more data for our NLP model, we'll have to be a little clever about how we use it. The `before` parameter restricts results to posts created before a given time, so we can request the 500 posts that precede the oldest post we already have. How? Here's what we know:

- The elements of the `posts` list are dictionaries, each with the key `created_utc`
- The value of this key is a Unix timestamp (seconds since the epoch, in Coordinated Universal Time) marking when the post was created
- The minimum timestamp therefore belongs to our oldest post

In [10]:
# Grab the timestamp of the oldest post (the smallest created_utc value)
oldest_post = min(post['created_utc'] for post in posts)
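
As a sanity check, a `created_utc` value can be converted into a readable date before it is passed to `before`. The sketch below uses two made-up posts rather than real API output; only the `created_utc` key mirrors the actual data.

```python
from datetime import datetime, timezone

# Toy stand-ins for the dictionaries in `posts`; only `created_utc` matters here
sample_posts = [
    {'title': 'newer post', 'created_utc': 1588377600},
    {'title': 'older post', 'created_utc': 1588291200},
]

# The post with the smallest timestamp is the oldest one
oldest = min(sample_posts, key=lambda post: post['created_utc'])

# Interpret the epoch seconds as a timezone-aware UTC datetime
oldest_time = datetime.fromtimestamp(oldest['created_utc'], tz=timezone.utc)
print(oldest['title'], oldest_time.isoformat())
```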

In [11]:
params = {
    'subreddit': 'boardgames',
    'size': 500,
    'before': oldest_post
}

In [12]:
res = requests.get(url, params=params)

In [13]:
res.status_code

200

In [14]:
data = res.json()

In [15]:
posts = data['data']

In [16]:
more_boardgames = pd.DataFrame(posts)

In [17]:
more_boardgames[['subreddit', 'title', 'selftext']]

|  | subreddit | title | selftext |
| --- | --- | --- | --- |
| 0 | boardgames | Does this detachment work? | Once one of the nurgle deamons die can i just ... |
| 1 | boardgames | Online Cards Against Humanity with webcam and ... |  |
| 2 | boardgames | Family Game Night During Lockdown. | Board games are bringing the family back toget... |
| 3 | boardgames | Long distance “buzzer” app/solutions? | My extended family are trying to play “jeopard... |
| 4 | boardgames | Problem visiting Rio Grande's website | For some reason it is redirected to some excel... |
| ... | ... | ... | ... |
| 495 | boardgames | Pandemic: In the Lab, Lab Challenge. Can we pl... | I just excitingly got Pandemic In The Lab expa... |
| 496 | boardgames | Splendor App: Trading Posts Release Date? | [deleted] |
| 497 | boardgames | Rule Bending ? | Horseopoly - Rules state if a player lands on ... |
| 498 | boardgames | Rule Bending ? | [deleted] |


## Concatenating the dataframes

We now have two separate */r/boardgames* dataframes with 500 posts each. To get a single, tidy dataframe to work with, we concatenate them, passing `ignore_index=True` so the rows are renumbered from 0 instead of repeating each dataframe's own index.

In [18]:
combined = pd.concat([boardgames, more_boardgames], ignore_index=True)
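
When stitching pages together, it can be worth confirming that no post was fetched twice. The sketch below uses toy dataframes and assumes each post carries a unique `id` column, which does not appear in the output shown in this notebook.

```python
import pandas as pd

# Toy stand-ins for two pages of results; the post with id 'b2' appears in both
page_one = pd.DataFrame({'id': ['a1', 'b2'], 'title': ['first', 'second']})
page_two = pd.DataFrame({'id': ['b2', 'c3'], 'title': ['second', 'third']})

stacked = pd.concat([page_one, page_two], ignore_index=True)

# Keep the first occurrence of each id, then renumber the rows from 0
deduped = stacked.drop_duplicates(subset='id').reset_index(drop=True)
print(len(stacked), len(deduped))
```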

In [19]:
combined[['subreddit', 'title', 'selftext']]

|  | subreddit | title | selftext |
| --- | --- | --- | --- |
| 0 | boardgames | Has anyone heard back from Asmodee about fixin... | The last update broke the app, the touch is ho... |
| 1 | boardgames | Dude, what is UP with Battle Wizards Murdershr... | I got this pack after loving playing the first... |
| 2 | boardgames | Challenge accepted (OC) |  |
| 3 | boardgames | Which Vast game do people prefer? | Crystal caverns vs mysterious manor |
| 4 | boardgames | Mechanics never seen before ? Maztec May 1st o... |  |
| ... | ... | ... | ... |
| 995 | boardgames | Pandemic: In the Lab, Lab Challenge. Can we pl... | I just excitingly got Pandemic In The Lab expa... |
| 996 | boardgames | Splendor App: Trading Posts Release Date? | [deleted] |
| 997 | boardgames | Rule Bending ? | Horseopoly - Rules state if a player lands on ... |
| 998 | boardgames | Rule Bending ? | [deleted] |


## Building a function

Now that we have some working knowledge of the Pushshift API, we can define a function that automates the process of grabbing posts and turning them into a dataframe. The results can be seen in `pushshift_functions.py`.
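
One way such a function might be structured is sketched below. This is not the contents of `pushshift_functions.py`; `collect_posts` and `fetch_page` are hypothetical names. The loop repeats the `before` step from earlier, and the HTTP call is passed in as a callable so the paging logic can be tried without network access.

```python
def collect_posts(fetch_page, n_pages):
    """Gather up to `n_pages` pages of posts, walking backwards in time.

    `fetch_page(before)` should return a list of post dictionaries created
    before the timestamp `before`, or the newest posts when `before` is None.
    """
    all_posts = []
    before = None
    for _ in range(n_pages):
        page = fetch_page(before)
        if not page:
            # No older posts are available, so stop early
            break
        all_posts.extend(page)
        # The next request should start before the oldest post seen so far
        before = min(post['created_utc'] for post in page)
    return all_posts

# Hypothetical wiring to the request used in this notebook:
# def fetch_page(before):
#     params = {'subreddit': 'boardgames', 'size': 500}
#     if before is not None:
#         params['before'] = before
#     return requests.get(url, params=params).json()['data']
#
# df = pd.DataFrame(collect_posts(fetch_page, n_pages=4))
```

Keeping the request behind `fetch_page` also leaves room to add delays or retries between calls without touching the loop.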