<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1">Introduction</a></span></li><li><span><a href="#Pushshift-API-Example" data-toc-modified-id="Pushshift-API-Example-2">Pushshift API Example</a></span><ul class="toc-item"><li><span><a href="#Setup" data-toc-modified-id="Setup-2.1">Setup</a></span></li><li><span><a href="#Making-our-first-API-call" data-toc-modified-id="Making-our-first-API-call-2.2">Making our first API call</a></span><ul class="toc-item"><li><span><a href="#Try-1" data-toc-modified-id="Try-1-2.2.1">Try 1</a></span></li><li><span><a href="#Try-2" data-toc-modified-id="Try-2-2.2.2">Try 2</a></span></li></ul></li><li><span><a href="#Processing-JSON" data-toc-modified-id="Processing-JSON-2.3">Processing JSON</a></span></li><li><span><a href="#Exploring-our-data" data-toc-modified-id="Exploring-our-data-2.4">Exploring our data</a></span></li><li><span><a href="#Let's-make-a-dataframe-out-of-a-post" data-toc-modified-id="Let's-make-a-dataframe-out-of-a-post-2.5">Let's make a dataframe out of a post</a></span></li></ul></li><li><span><a href="#Further-Resources" data-toc-modified-id="Further-Resources-3">Further Resources</a></span></li></ul></div>

## Introduction

## Pushshift API Example

This document walks through an example of a successful call to the Pushshift API.

The documentation found at [https://github.com/pushshift/api](https://github.com/pushshift/api) is a good reference and was used to make this guide.

Another helpful walkthrough is found here: https://www.osrsbox.com/blog/2019/03/18/watercooler-scraping-an-entire-subreddit-2007scape/. I do not know why a Runescape blog has such a great walkthrough, but we'll take it.

### Setup

In [34]:
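import pandas as pd   #library for working with tabular data (dataframes)
import requests  #library that makes web requests to outside pages
import time    #time-related functions, such as pausing between requests
import datetime   #tools for working with dates and times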

### Making our first API call

#### Try 1

First we need to tell the computer where to go to get the data.

In [19]:
#list the url where we want to request data.
reddit_url = 'https://api.pushshift.io/reddit/search/submisssion/?subreddit=PrideandPrejudice&size=100'

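Hold up, that's ugly as sin. What are all the parts of this?
- https://api.pushshift.io : the base address of the Pushshift API server
- /reddit/ : we want Reddit data
- /search/ : we want to search that data
- /submission/ : we are searching submissions (posts) rather than comments
- ?subreddit=PrideandPrejudice : the `?` starts the query parameters; this one limits results to the PrideandPrejudice subreddit
- &size=100 : the `&` adds another parameter; this one asks for 100 results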
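To see these pieces from inside Python, here is a small sketch using the standard-library `urllib.parse` module (not something the original notebook uses). It splits a URL of this shape into its parts and then rebuilds the query string from a dictionary. The variable names `parts`, `params`, and `rebuilt` are just for illustration.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = 'https://api.pushshift.io/reddit/search/submission/?subreddit=PrideandPrejudice&size=100'

# split the URL into its components
parts = urlsplit(url)
print(parts.scheme)   # the protocol
print(parts.netloc)   # the server we are talking to
print(parts.path)     # which part of the API we want

# turn the query string into a dictionary (each value comes back as a list of strings)
params = parse_qs(parts.query)
print(params)

# going the other way: build the query string from a dictionary
query = urlencode({'subreddit': 'PrideandPrejudice', 'size': 100})
rebuilt = f'{parts.scheme}://{parts.netloc}{parts.path}?{query}'
print(rebuilt == url)
```

Building the query from a dictionary makes it harder to misplace a `?` or `&` when more parameters are added.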

Then we make the request using the requests library:

In [24]:
r = requests.get(reddit_url)   #this is the actual request; r is a conventional name for the raw, unprocessed response
print(r.text)  #the .text attribute holds the full text of the response

''

Hmmm, that doesn't look right... let's check whether our request connected properly.

In [22]:
#raise an error if the request came back with a bad status code
r.raise_for_status()

HTTPError: 404 Client Error: Not Found for url: https://api.pushshift.io/reddit/search/submisssion/?subreddit=PrideandPrejudice&size=100

Common Errors with API Requests:
- 401 : Unauthorized
- 403 : Forbidden
- 404 : Page does not exist
- 429 : Too many requests

200 indicates a success!

See more here: https://blog.runscope.com/posts/how-to-debug-common-api-errors and https://realpython.com/python-requests/#status-codes.
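As a quick offline illustration, the standard library's `http.HTTPStatus` already knows the official name of each code, so you do not have to memorize them. The helper `describe_status` below is a hypothetical name made up for this sketch. With `requests`, the number to pass in would come from `r.status_code`.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short label such as '404 Not Found (error)' for an HTTP status code."""
    status = HTTPStatus(code)
    # codes in the 200s mean success; the 400s and 500s listed above are errors
    kind = 'success' if 200 <= code < 300 else 'error'
    return f'{code} {status.phrase} ({kind})'

for code in [200, 401, 403, 404, 429]:
    print(describe_status(code))
```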

#### Try 2

We found our error (`submisssion` was spelled with three s's), so let's try that same API call with the URL fixed.

In [26]:
#list the url where we want to request data.
reddit_url = 'https://api.pushshift.io/reddit/search/submission/?subreddit=PrideandPrejudice&size=100'

What if we followed this link in Chrome instead of making the request?

In [31]:
#trying with repaired URL
r = requests.get(reddit_url)   #this is the actual request; r is a conventional name for the raw, unprocessed response

#what is r exactly?
print(type(r))
#special type known as a Response

r.text #the .text attribute holds the full text of the response

<class 'requests.models.Response'>


'{\n    "data": [\n        {\n            "all_awardings": [],\n            "allow_live_comments": false,\n            "author": "ThinChange8",\n            "author_flair_css_class": null,\n            "author_flair_richtext": [],\n            "author_flair_text": null,\n            "author_flair_type": "text",\n            "author_fullname": "t2_66gntw47",\n            "author_patreon_flair": false,\n            "author_premium": false,\n            "awarders": [],\n            "can_mod_post": false,\n            "contest_mode": false,\n            "created_utc": 1586702420,\n            "domain": "freshdice.com",\n            "full_link": "https://www.reddit.com/r/PrideandPrejudice/comments/fzxu5l/if_youre_looking_for_a_deposit_bonus_check_out/",\n            "gildings": {},\n            "id": "fzxu5l",\n            "is_crosspostable": false,\n            "is_meta": false,\n            "is_original_content": false,\n            "is_reddit_media_domain": false,\n            "is_robot_

Ok now THAT is ugly... How might we make this cleaner?

In [36]:
data = r.json()  #the .json() method parses the JSON text into Python objects
data

Woah lots of stuff! Does this look familiar to anyone...?

### Processing JSON 

In [39]:
#let's explore this a bit...what type is this new json object?
type(data)

dict

![btfgif](https://i.kym-cdn.com/photos/images/newsfeed/001/480/980/897.gif)

We know how to work with dictionaries!
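To make the JSON-to-dictionary step concrete without hitting the API, here is a minimal sketch using the standard-library `json` module on a hand-written string. The string `sample_text` is an assumption: it copies a few fields from the response above and leaves out the rest.

```python
import json

# hand-written stand-in for r.text (same overall shape, far fewer fields)
sample_text = '{"data": [{"author": "ThinChange8", "id": "fzxu5l", "is_video": false, "author_flair_text": null}]}'

# parse the JSON text into Python objects
sample = json.loads(sample_text)
print(type(sample))
print(sample.keys())

# the value under 'data' is a list of dictionaries, one per post
first = sample['data'][0]
print(first['author'])

# JSON false and null become Python False and None
print(first['is_video'], first['author_flair_text'])
```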

Let's use what we know about dictionaries to explore further

In [41]:
#look at keys
data.keys()

dict_keys(['data'])

Oddly only one key: 'data'...what is its corresponding value?

In [44]:
data['data']

[{'all_awardings': [],
  'allow_live_comments': False,
  'author': 'ThinChange8',
  'author_flair_css_class': None,
  'author_flair_richtext': [],
  'author_flair_text': None,
  'author_flair_type': 'text',
  'author_fullname': 't2_66gntw47',
  'author_patreon_flair': False,
  'author_premium': False,
  'awarders': [],
  'can_mod_post': False,
  'contest_mode': False,
  'created_utc': 1586702420,
  'domain': 'freshdice.com',
  'full_link': 'https://www.reddit.com/r/PrideandPrejudice/comments/fzxu5l/if_youre_looking_for_a_deposit_bonus_check_out/',
  'gildings': {},
  'id': 'fzxu5l',
  'is_crosspostable': False,
  'is_meta': False,
  'is_original_content': False,
  'is_reddit_media_domain': False,
  'is_robot_indexable': False,
  'is_self': False,
  'is_video': False,
  'link_flair_background_color': '',
  'link_flair_richtext': [],
  'link_flair_text_color': 'dark',
  'link_flair_type': 'text',
  'locked': False,
  'media_only': False,
  'no_follow': True,
  'num_comments': 0,
  'num_c

Now we have it as a list...? What is each item of the list? What should the length of the list be?

In [50]:
#confirm list length:
print(len(data['data']))
#why is it shorter??

#let's make this list a variable so we can explore each item further
posts = data['data']

### Exploring our data

In [54]:
#show us the earliest post from the subreddit (the last item in our list)
posts[-1]

{'author': 'shawbin',
 'author_created_utc': 1236097150,
 'author_flair_css_class': None,
 'author_flair_text': None,
 'author_fullname': 't2_3edv2',
 'created_utc': 1374073236,
 'domain': 'self.PrideandPrejudice',
 'full_link': 'https://www.reddit.com/r/PrideandPrejudice/comments/1ihm1u/i_got_charged_with_administering_a_pride/',
 'gilded': 0,
 'id': '1ihm1u',
 'is_self': True,
 'media_embed': {},
 'mod_reports': [],
 'num_comments': 0,
 'over_18': False,
 'permalink': '/r/PrideandPrejudice/comments/1ihm1u/i_got_charged_with_administering_a_pride/',
 'retrieved_on': 1412043459,
 'score': 1,
 'secure_media_embed': {},
 'selftext': 'I\'m looking for good questions for two friends that have each read the book 8-10 times (So no "Who did Mr. Darcy marry?") and also questions that would be funny, cheeky, and "inside-baseball" to someone that familiar with the book. ',
 'stickied': False,
 'subreddit': 'PrideandPrejudice',
 'subreddit_id': 't5_2wfc1',
 'thumbnail': 'default',
 'title': "I go
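The `created_utc` value in the post above is a Unix timestamp, meaning seconds since January 1, 1970 (UTC). The notebook imports `datetime` but does not use it yet, so here is a minimal sketch of one way to turn that number into a readable date. The variable names are just for illustration.

```python
import datetime

created_utc = 1374073236  # value copied from the post shown above

# convert seconds-since-1970 into a timezone-aware datetime in UTC
created = datetime.datetime.fromtimestamp(created_utc, tz=datetime.timezone.utc)
print(created.isoformat())

# format it in a friendlier way
print(created.strftime('%B %d, %Y'))
```

Passing `tz=datetime.timezone.utc` keeps the result from depending on the time zone of the computer running the code.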

### Let's make a dataframe out of a post

In [55]:
post = posts[-1]

## Further Resources

In [61]:
post

{'author': 'shawbin',
 'author_created_utc': 1236097150,
 'author_flair_css_class': None,
 'author_flair_text': None,
 'author_fullname': 't2_3edv2',
 'created_utc': 1374073236,
 'domain': 'self.PrideandPrejudice',
 'full_link': 'https://www.reddit.com/r/PrideandPrejudice/comments/1ihm1u/i_got_charged_with_administering_a_pride/',
 'gilded': 0,
 'id': '1ihm1u',
 'is_self': True,
 'media_embed': {},
 'mod_reports': [],
 'num_comments': 0,
 'over_18': False,
 'permalink': '/r/PrideandPrejudice/comments/1ihm1u/i_got_charged_with_administering_a_pride/',
 'retrieved_on': 1412043459,
 'score': 1,
 'secure_media_embed': {},
 'selftext': 'I\'m looking for good questions for two friends that have each read the book 8-10 times (So no "Who did Mr. Darcy marry?") and also questions that would be funny, cheeky, and "inside-baseball" to someone that familiar with the book. ',
 'stickied': False,
 'subreddit': 'PrideandPrejudice',
 'subreddit_id': 't5_2wfc1',
 'thumbnail': 'default',
 'title': "I go

In [70]:
for_df = {'author': post['author'], 'title': post['title']}

# The End

In [71]:
for_df

{'author': 'shawbin',
 'title': "I got charged with administering a Pride &amp; Prejudice quiz to two superfans of the book and movies, but I haven't read it! Suggestions for good challenging or insider-funny questions?"}

![collins-gif](https://media1.tenor.com/images/e25c4f2744fa7c5d96bcd2c5ca4c1435/tenor.gif?itemid=5782673)

In [75]:
pd.DataFrame(data=for_df, index=[0])

| | author | title |
|---|---|---|
| 0 | shawbin | I got charged with administering a Pride &amp;... |
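The same idea can be stretched from one post to many. The sketch below is an assumption about how one might do it, not part of the original notebook: `post_to_row` is a hypothetical helper, and `sample_posts` is a tiny hand-made list (one shortened real title, one made-up post) standing in for `posts`. It uses `.get()` because, as the outputs above show, older and newer posts do not carry the same keys, and `html.unescape` to turn `&amp;` back into `&`.

```python
import html

def post_to_row(post, fields=('author', 'title')):
    """Keep only the chosen fields; missing keys become None; HTML entities in strings are decoded."""
    row = {}
    for field in fields:
        value = post.get(field)  # .get() returns None instead of raising KeyError
        if isinstance(value, str):
            value = html.unescape(value)  # e.g. '&amp;' becomes '&'
        row[field] = value
    return row

# hand-made stand-ins for the real posts list
sample_posts = [
    {'author': 'shawbin', 'title': 'I got charged with administering a Pride &amp; Prejudice quiz', 'score': 1},
    {'author': 'example_user', 'score': 3},  # made-up post with no 'title' key
]

rows = [post_to_row(p) for p in sample_posts]
print(rows)
```

With the real data, passing a list like `[post_to_row(p) for p in posts]` to `pd.DataFrame` should give one row per post, since pandas accepts a list of dictionaries.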
