#Setup
Reddit requires API keys to submit API requests. The keys are free. The instructions below explain how to get access to the Reddit API.

Create a Reddit user account and a developer account using the link below. When creating the developer account, select the 'script' option for personal use:

[reddit.com/prefs/apps](https://reddit.com/prefs/apps)


Creating the developer account gives you a client ID (listed under 'personal use script') and a secret token. Copy both and save them somewhere safe.
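
One way to keep the two values out of the notebook itself is to read them from environment variables. The sketch below assumes the hypothetical variable names `REDDIT_CLIENT_ID` and `REDDIT_SECRET`; they are not defined by Reddit, so pick whatever names suit your setup.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, secret) read from a mapping of environment variables."""
    # REDDIT_CLIENT_ID and REDDIT_SECRET are hypothetical names chosen for this sketch.
    client_id = env.get('REDDIT_CLIENT_ID', '')
    secret = env.get('REDDIT_SECRET', '')
    if not client_id or not secret:
        raise ValueError('Set REDDIT_CLIENT_ID and REDDIT_SECRET first')
    return client_id, secret
```

With both variables set, `access, secret = load_credentials()` fills in the two values used in the sample code.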

Once you have a Reddit user account and a developer account, request API access by filling out the form below. You will have access once the form is completed. My access was immediate, but times can differ.

[https://docs.google.com/forms/d/e/1FAIpQLSezNdDNK1-P8mspSbmtC2r86Ee9ZRbC66u929cG2GX0T9UMyw/viewform](https://docs.google.com/forms/d/e/1FAIpQLSezNdDNK1-P8mspSbmtC2r86Ee9ZRbC66u929cG2GX0T9UMyw/viewform)


List of API fields available to pull from:
[Reddit fields](https://www.reddit.com/dev/api/#fullnames)

What the API data looks like (this is the data you will be pulling from):
[Reddit API data](https://www.reddit.com/r/python/top.json?limit=100&t=year)

Below I provide sample code, but not my credentials. Enter your own where indicated.


#Sample Code


In [None]:
#Import libraries
import requests

#input credentials for access and secret. 
access = ''
secret = ''

In [None]:
# access is the client ID shown under 'personal use script'; secret is the secret token
auth = requests.auth.HTTPBasicAuth(access, secret)

# here we pass our login method (password), username, and password
data = {'grant_type': 'password',
        'username': '',
        'password': ''}

# setup our header info, which gives reddit a brief description of our app
headers = {'User-Agent': 'MyBot/0.0.1'}

# send our request for an OAuth token
res = requests.post('https://www.reddit.com/api/v1/access_token',
                    auth=auth, data=data, headers=headers)

# convert response to JSON and pull access_token value
TOKEN = res.json()['access_token']

# add authorization to our headers dictionary
headers = {**headers, **{'Authorization': f"bearer {TOKEN}"}}

# while the token is valid (~2 hours) we just add headers=headers to our requests
requests.get('https://oauth.reddit.com/api/v1/me', headers=headers)

<Response [200]>

The code sends my client ID, secret token, and login credentials to Reddit, which returns a bearer token. The bearer token is then stored in the headers and sent along with every request.

A result of `<Response [200]>` means success.
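
If the credentials are wrong, the token response will not contain an `access_token` key and the lookup fails with a bare `KeyError`. A minimal sketch of a more explicit check is shown here; `build_auth_headers` is a hypothetical helper, not part of `requests` or the Reddit API.

```python
def build_auth_headers(token_response, user_agent='MyBot/0.0.1'):
    """Turn the parsed JSON body of the token request into request headers."""
    token = token_response.get('access_token')
    if token is None:
        # Show the whole body so the reason for the failure is visible.
        raise RuntimeError(f'No access_token in response: {token_response}')
    return {'User-Agent': user_agent, 'Authorization': f'bearer {token}'}
```

It would be called as `headers = build_auth_headers(res.json())` right after the POST request.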

In [None]:
#Send a request for the hot (popular) posts in r/python.
res = requests.get("https://oauth.reddit.com/r/python/hot",
                   headers=headers)

print(res.json())  # let's see what we get. Similar to the webpage URL from the beginning. 

{'kind': 'Listing', 'data': {'after': 't3_10a1nf9', 'dist': 27, 'modhash': None, 'geo_filter': None, 'children': [{'kind': 't3', 'data': {'approved_at_utc': None, 'subreddit': 'Python', 'selftext': "Tell /r/python what you're working on this week! You can be bragging, grousing, sharing your passion, or explaining your pain. Talk about your current project or your pet project; whatever you want to share.", 'author_fullname': 't2_145f96', 'saved': False, 'mod_reason_title': None, 'gilded': 0, 'clicked': False, 'title': "Sunday Daily Thread: What's everyone working on this week?", 'link_flair_richtext': [{'e': 'text', 't': 'Daily Thread'}], 'subreddit_name_prefixed': 'r/Python', 'hidden': False, 'pwls': 6, 'link_flair_css_class': 'daily-thread', 'downs': 0, 'thumbnail_height': None, 'top_awarded_type': None, 'hide_score': False, 'name': 't3_1063vg5', 'quarantine': False, 'link_flair_text_color': 'light', 'upvote_ratio': 0.91, 'author_flair_background_color': '#7289da', 'subreddit_type': '

#Structure 
The top-level `data` key contains all the information, and `children` is the layer below it. `children` holds one entry for each post pulled (15 min, 100 max). Each entry has its own `data` key, one layer deeper, which contains the desired metadata. The first and second `data` keys are different objects.
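
To make the nesting concrete, here is a small sketch that walks a hand-trimmed copy of the response. The `sample` dictionary keeps only a few keys from the real output, and `extract_field` is a hypothetical helper name.

```python
# Hand-trimmed sample with the same nesting as the API response.
sample = {
    'kind': 'Listing',
    'data': {  # first "data": the listing itself
        'after': 't3_10a1nf9',
        'children': [
            {'kind': 't3',
             'data': {'title': 'Why Polars uses less memory than Pandas',
                      'name': 't3_10a2tjg'}},  # second "data": one post
            {'kind': 't3',
             'data': {'title': 'Train a language model from scratch',
                      'name': 't3_10a72bh'}},
        ],
    },
}


def extract_field(listing, field):
    """Collect one field from every post in a listing (None if missing)."""
    return [child['data'].get(field) for child in listing['data']['children']]


titles = extract_field(sample, 'title')
print(titles)
```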

In [None]:
#Each post in the object above has fields, such as the title of the post.
#The fields we can pull are listed in the Reddit documentation linked below
#https://www.reddit.com/dev/api/


for post in res.json()['data']['children']:
    print(post['data']['title'])

Sunday Daily Thread: What's everyone working on this week?
Thursday Daily Thread: Python Careers, Courses, and Furthering Education!
Why Polars uses less memory than Pandas
Is it still relevant today to take a Python and Pandas course that was created two years ago?
Train a language model from scratch
Build a ChatGPT-like SMS Chatbot with OpenAI and Python
Python Folium: Create Web Maps From Your Data – Real Python
atomfeed.py - A single file atom feed library with no dependencies
Hey pythonistas, friendly reminder that Python 3.7 is EOL in June this year.
The first Cardano Smart contract written in eopsin/Python compiled and deployed on preprod testnet.
Mockitup - Convenient DSL around `unittest.mock`
Solving Django race conditions with select_for_update and optimistic updates
Electron and Django
Easy-to-Use Python Library to Access BLS Data
ML-Powered Search with Doug Turnbull (Shopify)
Do you want to easily fetch weather data in Python for your Data Science projects?
einspect - Muta

#Pulling More Data
In the following request we will pull much more data: title, text, upvote and downvote counts, score, etc.

This request pulls the trending posts in the r/python subreddit. Look at the URL: 'hot' refers to trending.

In [None]:
#Now place into a dataframe.
#Pull title, content, reactions
# make a request for the trending posts in /r/Python. 'hot' means trending.
import pandas as pd
res = requests.get("https://oauth.reddit.com/r/python/hot",
                   headers=headers)

rows = []  # one dictionary per post

# loop through each post retrieved from GET request
for post in res.json()['data']['children']:
    # collect relevant data (DataFrame.append was removed in pandas 2.0)
    rows.append({
        'subreddit': post['data']['subreddit'],
        'title': post['data']['title'],
        'selftext': post['data']['selftext'],
        'upvote_ratio': post['data']['upvote_ratio'],
        'ups': post['data']['ups'],
        'downs': post['data']['downs'],
        'score': post['data']['score'],
        'name': post['data']['name']
    })

df = pd.DataFrame(rows)  # build the dataframe in one step
    

In [None]:
display(df)

Unnamed: 0,subreddit,title,selftext,upvote_ratio,ups,downs,score,name
0,Python,Sunday Daily Thread: What's everyone working o...,Tell /r/python what you're working on this wee...,0.92,10.0,0.0,10.0,t3_1063vg5
1,Python,"Thursday Daily Thread: Python Careers, Courses...",Discussion of using Python in a professional e...,1.0,1.0,0.0,1.0,t3_109k9to
2,Python,Why Polars uses less memory than Pandas,,0.94,109.0,0.0,109.0,t3_10a2tjg
3,Python,Is it still relevant today to take a Python an...,I'm learning Python nowadays and I'm taking th...,0.88,81.0,0.0,81.0,t3_10a1zir
4,Python,Train a language model from scratch,,0.76,6.0,0.0,6.0,t3_10a72bh
5,Python,Build a ChatGPT-like SMS Chatbot with OpenAI a...,,0.64,3.0,0.0,3.0,t3_10aasev
6,Python,Python Folium: Create Web Maps From Your Data ...,,0.88,12.0,0.0,12.0,t3_10a0t7w
7,Python,atomfeed.py - A single file atom feed library ...,I finally got around implementing an atom feed...,1.0,9.0,0.0,9.0,t3_10a171p
8,Python,"Hey pythonistas, friendly reminder that Python...",,0.95,469.0,0.0,469.0,t3_1096yh8
9,Python,The first Cardano Smart contract written in eo...,,0.57,1.0,0.0,1.0,t3_10a9qha


The next request still targets r/python, but this time it pulls the most recent posts rather than the hottest.
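
Since the two requests differ only in the last part of the URL, the URL can be built from its parts. The sketch below is an illustration: `listing_url` is a hypothetical helper, and the `limit` query parameter is the same one that appears in the example data URL near the top of this notebook.

```python
from urllib.parse import urlencode

BASE_URL = 'https://oauth.reddit.com'


def listing_url(subreddit, sort='hot', limit=None):
    """Build the URL for a subreddit listing such as hot, new, or top."""
    if sort not in ('hot', 'new', 'top'):
        raise ValueError(f'Unsupported sort: {sort}')
    url = f'{BASE_URL}/r/{subreddit}/{sort}'
    if limit is not None:
        # Optional cap on how many posts the request returns.
        url += '?' + urlencode({'limit': limit})
    return url
```

For example, `requests.get(listing_url('python', 'new'), headers=headers)` sends the same request as the cell that follows.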

In [None]:
# make a request for the most recent posts in /r/Python. 'new' instead of 'hot' this time
res = requests.get("https://oauth.reddit.com/r/python/new",
                   headers=headers)

rows = []  # one dictionary per post
#In the article below, the gray box in the section 'Pulling all the most recent posts in a subreddit and creating a local database' shows the fields available to pull.
#https://brentgaisford.medium.com/how-to-use-python-and-the-reddit-api-to-build-a-local-database-of-reddit-posts-and-comments-ca9f3843bfc2

# loop through each post retrieved from GET request
for post in res.json()['data']['children']:
    # collect relevant data (DataFrame.append was removed in pandas 2.0)
    rows.append({
        'subreddit': post['data']['subreddit'],
        'title': post['data']['title'],
        'selftext': post['data']['selftext'],
        'upvote_ratio': post['data']['upvote_ratio'],
        'ups': post['data']['ups'],
        'downs': post['data']['downs'],
        'score': post['data']['score'],
        'author': post['data']['author'],
        'comments_count': post['data']['num_comments']
    })

df = pd.DataFrame(rows)  # build the dataframe in one step

In [None]:
display(df)

Unnamed: 0,subreddit,title,selftext,upvote_ratio,ups,downs,score,author,comments_count
0,Python,resources for learning 🐍,"hi!!! as the title says, can you guys share wi...",1.0,1.0,0.0,1.0,matalora2001,0.0
1,Python,Easy-to-Use Python Library to Access BLS Data,"Hi Python Enthusiasts,\n\nI've created a simpl...",1.0,1.0,0.0,1.0,ryan_s007,0.0
2,Python,ML-Powered Search with Doug Turnbull (Shopify),"Hey all,\n\nI thought I’d just drop a quick no...",1.0,1.0,0.0,1.0,lorenzo_1999,0.0
3,Python,Build a ChatGPT-like SMS Chatbot with OpenAI a...,,0.71,3.0,0.0,3.0,lizziepika,1.0
4,Python,Electron and Django,,0.67,1.0,0.0,1.0,will_r3ddit_4_food,0.0
5,Python,The first Cardano Smart contract written in eo...,,0.67,2.0,0.0,2.0,dominatingslash,0.0
6,Python,coppercube game engine visual scripting system...,coppercube is a non coding game engine. howeve...,0.2,0.0,0.0,0.0,mhjhacker1,3.0
7,Python,Train a language model from scratch,,0.8,6.0,0.0,6.0,davidmezzetti,0.0
8,Python,Do you want to easily fetch weather data in Py...,Have a look at this medium article: \n\n[https...,0.6,1.0,0.0,1.0,Nice-Tomorrow2926,0.0
9,Python,The experience I gained as a developer by Buil...,Read Post: [https://dev.to/fahad\_islam/my-exp...,1.0,1.0,0.0,1.0,RFGadgetsTech,1.0


#Another way to use the Reddit API: PRAW
PRAW is a Python package for scraping Reddit post data. It gives the scraper more power to filter requests.

In [None]:
!pip install praw
import praw


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting praw
  Downloading praw-7.6.1-py3-none-any.whl (188 kB)
Collecting update-checker>=0.18
  Downloading update_checker-0.18.0-py3-none-any.whl (7.0 kB)
Collecting prawcore<3,>=2.1
  Downloading prawcore-2.3.0-py3-none-any.whl (16 kB)
Collecting websocket-client>=0.54.0
  Downloading websocket_client-1.4.2-py3-none-any.whl (55 kB)
Installing collected packages: websocket-client, update-checker, prawcore, praw
Successfully installed praw-7.6.1 prawcore-2.3.0 update-checker-0.18.0 websocket-client-1.4.2


In [None]:
#Input your own credentials. client_id is the ID under 'personal use script'. 'user_agent' is the name of the developer account.
reddit = praw.Reddit(client_id=access,
                     client_secret=secret, password='',
                     user_agent='red_test1', username='')

In [None]:
#Create empty lists. This is where data will be stored
author_list = []

Hot Reddit posts in the 'worldnews' subreddit

In [None]:
subreddit = reddit.subreddit('worldnews')
hot_post = subreddit.hot(limit = 10)
for sub in hot_post:
  author_list.append(sub.author)

print(author_list)

In [None]:
#Storing more information. 
author_list = []
id_list = []
num_comments_list = []
score_list = []
title_list = []
upvote_ratio_list = []
topic = []

In [None]:
#Searching more than 1 topic. 
subreddit_list=  ['worldnews',
                  'announcements',
                  'funny',
                  'gaming',
                  'science',
                  'movies'
                 ]

In [None]:
#Store into dataframe. populate lists
for subred in subreddit_list:
    
  subreddit = reddit.subreddit(subred)
  hot_post = subreddit.hot(limit = 10)
  for sub in hot_post:
    author_list.append(sub.author)
    id_list.append(sub.id)
    num_comments_list.append(sub.num_comments)
    score_list.append(sub.score)
    title_list.append(sub.title)
    upvote_ratio_list.append(sub.upvote_ratio)
    topic.append(subred)
    


In [None]:
#Store in dataframe. create dataframe from lists
df = pd.DataFrame({'ID':id_list, 
                   'Author':author_list, 
                   'Title':title_list,
                   'Count_of_Comments':num_comments_list,
                   'Upvote_Count':score_list,
                   'Upvote_Ratio':upvote_ratio_list,
                   'topic': topic
                  })
display(df)

Unnamed: 0,ID,Author,Title,Count_of_Comments,Upvote_Count,Upvote_Ratio,topic
0,109pphj,WorldNewsMods,/r/WorldNews Live Thread: Russian Invasion of ...,1086,1277,0.96,worldnews
1,10a5wun,Keffpie,Huge deposits of rare earth elements discovere...,785,11016,0.98,worldnews
2,10a7nj0,JustMyOpinionz,Exxon accurately predicted global warming from...,77,1500,0.96,worldnews
3,109vsfz,SteO153,International blunder as Swiss firm gives Taiw...,760,9520,0.96,worldnews
4,10a2xbk,WinterPlanet,Lula says he suspects pro-Bolsonaro staff help...,61,2300,0.96,worldnews
5,10a1rx8,PatientBuilder499,Erdogan calls Taliban ban on women's education...,149,2181,0.96,worldnews
6,109xzsv,BusbyBusby,"Scale of alleged torture, detentions by Russia...",101,3187,0.97,worldnews
7,109x3hw,Narvi_-,Execution Of 19-Year-Old Iranian ‘Stayed” Afte...,44,2052,0.97,worldnews
8,109zwvp,misana123,Ten oligarchs who used ‘golden visa’ route to ...,40,1272,0.96,worldnews
9,10a1hfy,Bananaramas,US Navy veteran released from Russian custody,40,913,0.95,worldnews


Perform the same steps with top posts, then with new posts.
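
Because the hot, top, and new cells differ only in which listing method is called, the repeated code could be folded into one function. This is a sketch under that assumption; `collect_posts` is a hypothetical helper, and it relies only on the `subreddit.hot`, `subreddit.top`, and `subreddit.new` calls already used in this notebook.

```python
def collect_posts(reddit, subreddit_list, sort='hot', limit=10):
    """Return one dictionary per post for every subreddit in the list."""
    rows = []
    for subred in subreddit_list:
        subreddit = reddit.subreddit(subred)
        # getattr picks subreddit.hot, subreddit.top, or subreddit.new by name.
        for sub in getattr(subreddit, sort)(limit=limit):
            rows.append({
                'ID': sub.id,
                'Author': sub.author,
                'Title': sub.title,
                'Count_of_Comments': sub.num_comments,
                'Upvote_Count': sub.score,
                'Upvote_Ratio': sub.upvote_ratio,
                'topic': subred,
            })
    return rows
```

A dataframe for the top posts would then be `pd.DataFrame(collect_posts(reddit, subreddit_list, sort='top'))`.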

In [None]:
author_list = []
id_list = []
num_comments_list = []
score_list = []
title_list = []
upvote_ratio_list = []
topic = []

In [None]:
for subred in subreddit_list:
    
  subreddit = reddit.subreddit(subred)
  top_post = subreddit.top(limit = 10)
  for sub in top_post:
    author_list.append(sub.author)
    id_list.append(sub.id)
    num_comments_list.append(sub.num_comments)
    score_list.append(sub.score)
    title_list.append(sub.title)
    upvote_ratio_list.append(sub.upvote_ratio)
    topic.append(subred)

In [None]:
#Store in dataframe. create dataframe from lists
df = pd.DataFrame({'ID':id_list, 
                   'Author':author_list, 
                   'Title':title_list,
                   'Count_of_Comments':num_comments_list,
                   'Upvote_Count':score_list,
                   'Upvote_Ratio':upvote_ratio_list,
                   'topic': topic
                  })
display(df)

Unnamed: 0,ID,Author,Title,Count_of_Comments,Upvote_Count,Upvote_Ratio,topic
0,k4qide,stem12345679,An anti-gay Hungarian politician has resigned ...,8437,204538,0.93,worldnews
1,eclwg9,MachoNachoTaco,Trump Impeached for Abuse of Power,20127,202895,0.88,worldnews
2,t3pgaz,bichonista,Vladimir Putin's black belt revoked by interna...,6958,200152,0.89,worldnews
3,901p5f,DoremusJessup,"Two weeks before his inauguration, Donald J. T...",18086,189354,0.84,worldnews
4,x96k3v,pipsdontsqueak,"Queen Elizabeth II has died, Buckingham Palace...",16742,188842,0.79,worldnews
5,t0b6fb,geiwne,More than 150 senior Russian officials sign op...,7735,178008,0.93,worldnews
6,t1o8wq,CyberArtillery,"Rejecting US evacuation offer, Zelensky says I...",8327,171625,0.94,worldnews
7,fi91qc,Eurynom0s,Mexico is considering closing its border to st...,9047,168672,0.93,worldnews
8,t1f287,o-Themis-o,Anonymous leaks database of the Russian Minist...,6447,165232,0.93,worldnews
9,4d75i7,mister_geaux,2.6 terabyte leak of Panamanian shell company ...,12066,154759,0.95,worldnews


In [None]:
author_list = []
id_list = []
num_comments_list = []
score_list = []
title_list = []
upvote_ratio_list = []
topic = []

In [None]:
for subred in subreddit_list:
    
  subreddit = reddit.subreddit(subred)
  new_post = subreddit.new(limit = 10)
  for sub in new_post:
    author_list.append(sub.author)
    id_list.append(sub.id)
    num_comments_list.append(sub.num_comments)
    score_list.append(sub.score)
    title_list.append(sub.title)
    upvote_ratio_list.append(sub.upvote_ratio)
    topic.append(subred)

In [None]:
#Store in dataframe. create dataframe from lists
df = pd.DataFrame({'ID':id_list, 
                   'Author':author_list, 
                   'Title':title_list,
                   'Count_of_Comments':num_comments_list,
                   'Upvote_Count':score_list,
                   'Upvote_Ratio':upvote_ratio_list,
                   'topic': topic
                  })
display(df)

Unnamed: 0,ID,Author,Title,Count_of_Comments,Upvote_Count,Upvote_Ratio,topic
0,10ad1ul,Tartan_Samurai,Fried or scrambled? Prehistoric ostrich eggs f...,2,4,0.83,worldnews
1,10ad0we,Tartan_Samurai,Brazil Congress: Bolsonaro supporters inside p...,0,5,1.0,worldnews
2,10acysa,Pure_Candidate_3831,"At 113, one of Canada's oldest people has died...",2,3,0.71,worldnews
3,10ac5k3,DoremusJessup,Israel's top judge lashed out Thursday at the ...,3,21,0.82,worldnews
4,10ac0jr,greatdevonhope,Boris Johnson given £1m donation by former Bre...,1,32,0.91,worldnews
5,10abzd6,vichistor,The Humanitarian Crisis in Nagorno-Karabakh Is...,2,20,0.81,worldnews
6,10abpbm,Lionel54321,Haiti left with no elected government official...,28,63,0.92,worldnews
7,10ab7z3,--_--______---_---,Archaeologists believe they found the temple o...,5,49,0.88,worldnews
8,10ab1ey,goprinterm,Russia releases U.S. Navy veteran into Poland,4,24,0.86,worldnews
9,10aam6j,N2929,France fines TikTok $5.4 mln for online tracki...,2,42,0.9,worldnews


#Helpful Information
[Setup and sample Python code.](https://towardsdatascience.com/how-to-use-the-reddit-api-in-python-5e05ddfd1e5c)

Praw tutorial and code
[Praw tutorial](https://medium.com/analytics-vidhya/praw-a-python-package-to-scrape-reddit-post-data-b759a339ed9a)