#Setup
Reddit requires keys to submit API requests. Keys are free to get. Instructions included below on how you can access reddit API.

Create a reddit user and developer account using the link below. When creating developer account select 'script' option for personal use:

[reddit.com/prefs/apps](https://reddit.com/prefs/apps)


Creating developer account will provide you with access token and secret token. Copy these and save somewhere safe. 

Once you have reddit user and developer account, request access to use by filling out form. Upon completing form you will have access. My access was immediate, but times can differ. 

[https://docs.google.com/forms/d/e/1FAIpQLSezNdDNK1-P8mspSbmtC2r86Ee9ZRbC66u929cG2GX0T9UMyw/viewform](https://docs.google.com/forms/d/e/1FAIpQLSezNdDNK1-P8mspSbmtC2r86Ee9ZRbC66u929cG2GX0T9UMyw/viewform)


List of API fields available to pull from.
[Reddit fields](https://www.reddit.com/dev/api/#fullnames)

What the API data looks like. The data you are pulling from. 
[Reddit API data](https://www.reddit.com/r/python/top.json?limit=100&t=year)

Below I will provide sample code but not my credentials. You may input yours in when prompted. 


#Sample Code


In [9]:
#Import libraries
import requests

#input credentials for access and secret. 
access = 'K-GqWsrNzjiEs9EsuCzmkg'
secret = 'IKGSFyS7-H3yMcG6rNkyQAv2kfX1MQ'

In [10]:
# note that CLIENT_ID refers to 'personal use script' and SECRET_TOKEN to 'token'
auth = requests.auth.HTTPBasicAuth('K-GqWsrNzjiEs9EsuCzmkg', 'IKGSFyS7-H3yMcG6rNkyQAv2kfX1MQ')

# here we pass our login method (password), username, and password
data = {'grant_type': 'password',
        'username': 'jgrips9',
        'password': 'InTr8O_Smippl$'}

# setup our header info, which gives reddit a brief description of our app
headers = {'User-Agent': 'MyBot/0.0.1'}

# send our request for an OAuth token
res = requests.post('https://www.reddit.com/api/v1/access_token',
                    auth=auth, data=data, headers=headers)

# convert response to JSON and pull access_token value
TOKEN = res.json()['access_token']

# add authorization to our headers dictionary
headers = {**headers, **{'Authorization': f"bearer {TOKEN}"}}

# while the token is valid (~2 hours) we just add headers=headers to our requests
requests.get('https://oauth.reddit.com/api/v1/me', headers=headers)

<Response [200]>

The program accepts my tokens, login credentials and stores within the bearer token. Bearer token is then stored into headers and added to URL when submitting requests. 

Result of 'Response 200' is success. 

In [11]:
#Pull a request. Trending posts related to python. 
res = requests.get("https://oauth.reddit.com/r/python/hot",
                   headers=headers)

print(res.json())  # let's see what we get. Similar to the webpage URL from the beginning. 

{'kind': 'Listing', 'data': {'after': 't3_10bvk6z', 'dist': 27, 'modhash': None, 'geo_filter': None, 'children': [{'kind': 't3', 'data': {'approved_at_utc': None, 'subreddit': 'Python', 'selftext': "Tell /r/python what you're working on this week! You can be bragging, grousing, sharing your passion, or explaining your pain. Talk about your current project or your pet project; whatever you want to share.", 'author_fullname': 't2_145f96', 'saved': False, 'mod_reason_title': None, 'gilded': 0, 'clicked': False, 'title': "Sunday Daily Thread: What's everyone working on this week?", 'link_flair_richtext': [{'e': 'text', 't': 'Daily Thread'}], 'subreddit_name_prefixed': 'r/Python', 'hidden': False, 'pwls': 6, 'link_flair_css_class': 'daily-thread', 'downs': 0, 'thumbnail_height': None, 'top_awarded_type': None, 'hide_score': False, 'name': 't3_10c4t91', 'quarantine': False, 'link_flair_text_color': 'light', 'upvote_ratio': 1.0, 'author_flair_background_color': '#7289da', 'subreddit_type': 'p

#Structure 
Data attribute contains all information, children is layer below that. Containing each individual entry for number of requests pulled, 15 min 100 max. Data again is another layer deep from children containing desired metadata. First and second data are different. 

In [12]:
#The above object has attributes. Like title of post.
#Attributes/Fields that we can pull from are included in link below from reddit documentation
#https://www.reddit.com/dev/api/


for post in res.json()['data']['children']:
    print(post['data']['title'])

Sunday Daily Thread: What's everyone working on this week?
Saturday Daily Thread: Resource Request and Sharing! Daily Thread
Discord Bot pretending to be human using Chat GPT
Supply Chain Attack Using Identical PyPI Packages, “colorslib”, “httpslib”, and “libhttps” | FortiGuard Labs
What are people using to organize virtual environments these days?
Activities to keep in touch with python programming
Introducing my-package: A powerful and easy-to-use json database tool
Project - Computational MAth EXamples on github
Analyzing Wireshark Package with Python
Total beginner - Could Python be interesting for sales?
Yellowpage scraper powered by Python
I've updated the README of Panel. Let me know what you think. Thanks.
P2PD: Improving async networking in Python
How to improve Python packaging, or why fourteen tools are at least twelve too many
How to Create a Face Recognition Using Python
Check out AWESOME PANEL SHARING - An easy to use sharing service for Panel data apps
(Maybe) Everything

#Pulling More Data
In the following request we will pull much more data. title, text, ups, downs count, score, etc. 

This search will be on the trending searches based on a keyword. Searching the trending searches for 'python'. look at URL. 'hot' refers to trending. 

In [13]:
#Now place into a dataframe.
#Pull title, date-time, content, reactions
# make a request for the trending posts in /r/Python. hot keyword means treanding. 
import pandas as pd
res = requests.get("https://oauth.reddit.com/r/python/hot",
                   headers=headers)

df = pd.DataFrame()  # initialize dataframe

# loop through each post retrieved from GET request
for post in res.json()['data']['children']:
    # append relevant data to dataframe
    df = df.append({
        'subreddit': post['data']['subreddit'],
        'title': post['data']['title'],
        'selftext': post['data']['selftext'],
        'upvote_ratio': post['data']['upvote_ratio'],
        'ups': post['data']['ups'],
        'downs': post['data']['downs'],
        'score': post['data']['score'],
        'name': post['data']['name']
    }, ignore_index=True)
    

In [14]:
display(df)

Unnamed: 0,subreddit,title,selftext,upvote_ratio,ups,downs,score,name
0,Python,Sunday Daily Thread: What's everyone working o...,Tell /r/python what you're working on this wee...,1.0,3.0,0.0,3.0,t3_10c4t91
1,Python,Saturday Daily Thread: Resource Request and Sh...,Found a neat resource related to Python over t...,0.67,2.0,0.0,2.0,t3_10b9r0f
2,Python,Discord Bot pretending to be human using Chat GPT,Python script for a **Discord bot** that uses ...,0.76,62.0,0.0,62.0,t3_10cjm62
3,Python,Supply Chain Attack Using Identical PyPI Packa...,,0.84,8.0,0.0,8.0,t3_10cm2yo
4,Python,What are people using to organize virtual envi...,Thinking multiple Python versions and packages...,0.95,258.0,0.0,258.0,t3_10bxkjp
5,Python,Activities to keep in touch with python progra...,"Hello Everyone, I am a working professional(ch...",0.81,9.0,0.0,9.0,t3_10chz9y
6,Python,Introducing my-package: A powerful and easy-to...,"Hey everyone,\n\nI want to share with you a n...",1.0,4.0,0.0,4.0,t3_10cmb94
7,Python,"How to improve Python packaging, or why fourte...",,1.0,3.0,0.0,3.0,t3_10cnx5i
8,Python,Project - Computational MAth EXamples on github,A couple of years ago I've submitted a project...,0.81,3.0,0.0,3.0,t3_10cjzdz
9,Python,Analyzing Wireshark Package with Python,,0.8,3.0,0.0,3.0,t3_10cmque


Next request will still include the keyword Python. But this time will search for the most recent results rather than the hottest. 

In [15]:
# make a request for the most recent posts in /r/Python. instead of hot this time new
res = requests.get("https://oauth.reddit.com/r/python/new",
                   headers=headers)

df = pd.DataFrame()  # initialize dataframe
#URL below section 'Pulling all the most recent posts in a subreddit and creating a local database' gray box shows fields allowed to pull. 
#https://brentgaisford.medium.com/how-to-use-python-and-the-reddit-api-to-build-a-local-database-of-reddit-posts-and-comments-ca9f3843bfc2

# loop through each post retrieved from GET request
for post in res.json()['data']['children']:
    # append relevant data to dataframe
    df = df.append({
        'subreddit': post['data']['subreddit'],
        'title': post['data']['title'],
        'selftext': post['data']['selftext'],
        'upvote_ratio': post['data']['upvote_ratio'],
        'ups': post['data']['ups'],
        'downs': post['data']['downs'],
        'score': post['data']['score'],
        'author': post['data']['author'],
        'comments_count': post['data']['num_comments']
    }, ignore_index=True)

In [16]:
display(df)

Unnamed: 0,subreddit,title,selftext,upvote_ratio,ups,downs,score,author,comments_count
0,Python,"How to improve Python packaging, or why fourte...",,1.0,3.0,0.0,3.0,Kwpolska,0.0
1,Python,Analyzing Wireshark Package with Python,,0.8,3.0,0.0,3.0,ismailtasdelen,0.0
2,Python,Introducing my-package: A powerful and easy-to...,"Hey everyone,\n\nI want to share with you a n...",1.0,6.0,0.0,6.0,Eight_Oh_Four,1.0
3,Python,Supply Chain Attack Using Identical PyPI Packa...,,0.84,8.0,0.0,8.0,dlorenc,5.0
4,Python,How to get user IP location using python-,,0.57,1.0,0.0,1.0,eren_rndm,1.0
5,Python,Project - Computational MAth EXamples on github,A couple of years ago I've submitted a project...,0.81,3.0,0.0,3.0,Andrei_Keino,0.0
6,Python,Discord Bot pretending to be human using Chat GPT,Python script for a **Discord bot** that uses ...,0.76,62.0,0.0,62.0,Karki2002,22.0
7,Python,Activities to keep in touch with python progra...,"Hello Everyone, I am a working professional(ch...",0.86,10.0,0.0,10.0,Silly_You9597,6.0
8,Python,Internal Audit Department Software,Hello guys! \n\nOur director suggested that we...,0.5,0.0,0.0,0.0,AzizAlharbi,3.0
9,Python,P2PD: Improving async networking in Python,,0.8,3.0,0.0,3.0,pmz,0.0


#Another Reddit API feature within Reddit. Praw
a python package to scrape Reddit Post data. This package provides the scraper with more power to filter requests

In [18]:
!pip install praw
import praw


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting praw
  Downloading praw-7.6.1-py3-none-any.whl (188 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m188.8/188.8 KB[0m [31m4.7 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting prawcore<3,>=2.1
  Downloading prawcore-2.3.0-py3-none-any.whl (16 kB)
Collecting websocket-client>=0.54.0
  Downloading websocket_client-1.4.2-py3-none-any.whl (55 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m55.3/55.3 KB[0m [31m6.6 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting update-checker>=0.18
  Downloading update_checker-0.18.0-py3-none-any.whl (7.0 kB)
Installing collected packages: websocket-client, update-checker, prawcore, praw
Successfully installed praw-7.6.1 prawcore-2.3.0 update-checker-0.18.0 websocket-client-1.4.2


In [19]:
#Input credentials. client_id is access token. 'user_agent' is name of developer account. 
reddit = praw.Reddit(client_id='K-GqWsrNzjiEs9EsuCzmkg',
                     client_secret='IKGSFyS7-H3yMcG6rNkyQAv2kfX1MQ', password='InTr8O_Smippl$',
                     user_agent='red_test1', username='jgrips9')

In [20]:
#Create empty lists. This is where data will be stored
author_list = []

Top reddit posts for searchword 'worldnews'

In [21]:
subreddit = reddit.subreddit('worldnews')
hot_post = subreddit.hot(limit = 10)
for sub in hot_post:
  author_list.append(sub.author)

print(author_list)

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.



[Redditor(name='WorldNewsMods'), Redditor(name='AgentBlue62'), Redditor(name='Status-Gas-7347'), Redditor(name='hieronymusanonymous'), Redditor(name='hieronymusanonymous'), Redditor(name='HarakenQQ'), Redditor(name='HarakenQQ'), Redditor(name='magnus123'), Redditor(name='1survivor'), Redditor(name='HarakenQQ')]


In [22]:
#Storing more information. 
author_list = []
id_list = []
num_comments_list = []
score_list = []
title_list = []
upvote_ratio_list = []
topic = []

In [23]:
#Searching more than 1 topic. 
subreddit_list=  ['worldnews',
                  'announcements',
                  'funny',
                  'gaming',
                  'science',
                  'movies'
                 ]

In [24]:
#Store into dataframe. populate lists
for subred in subreddit_list:
    
  subreddit = reddit.subreddit(subred)
  hot_post = subreddit.hot(limit = 10)
  for sub in hot_post:
    author_list.append(sub.author)
    id_list.append(sub.id)
    num_comments_list.append(sub.num_comments)
    score_list.append(sub.score)
    title_list.append(sub.title)
    upvote_ratio_list.append(sub.upvote_ratio)
    topic.append(subred)
    


It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/l

In [25]:
#Store in dataframe. create dataframe from lists
df = pd.DataFrame({'ID':id_list, 
                   'Author':author_list, 
                   'Title':title_list,
                   'Count_of_Comments':num_comments_list,
                   'Upvote_Count':score_list,
                   'Upvote_Ratio':upvote_ratio_list,
                   'topic': topic
                  })
display(df)

Unnamed: 0,ID,Author,Title,Count_of_Comments,Upvote_Count,Upvote_Ratio,topic
0,10c9ze6,WorldNewsMods,/r/WorldNews Live Thread: Russian Invasion of ...,715,1033,0.96,worldnews
1,10cgkv6,AgentBlue62,Moscow is allegedly preparing to deport some 1...,1526,25564,0.94,worldnews
2,10ci10a,Status-Gas-7347,All of the bases in DNA and RNA have now been ...,399,3493,0.97,worldnews
3,10chpz6,hieronymusanonymous,Russia Sets Ultimatum to Formally Pull a Third...,395,2686,0.96,worldnews
4,10chf97,hieronymusanonymous,Alexei Navalny: Jailed Putin critic needs 'urg...,28,926,0.96,worldnews
5,10ckjaq,HarakenQQ,Russian soldier blows up grenade at ammunition...,36,465,0.98,worldnews
6,10cjolt,HarakenQQ,World community must strongly condemn use of K...,43,509,0.96,worldnews
7,10cj8tx,magnus123,Afghanistan: Former female lawmaker shot dead ...,32,500,0.96,worldnews
8,10cex84,1survivor,"Women's Rights Not Priority, Says Taliban Spok...",136,1057,0.95,worldnews
9,10cjo2y,HarakenQQ,Finnish President is concerned that Putin will...,87,448,0.95,worldnews


Perform the same action with top posts, new posts. 

In [26]:
author_list = []
id_list = []
num_comments_list = []
score_list = []
title_list = []
upvote_ratio_list = []
topic = []

In [27]:
for subred in subreddit_list:
    
  subreddit = reddit.subreddit(subred)
  top_post = subreddit.top(limit = 10)
  for sub in top_post:
    author_list.append(sub.author)
    id_list.append(sub.id)
    num_comments_list.append(sub.num_comments)
    score_list.append(sub.score)
    title_list.append(sub.title)
    upvote_ratio_list.append(sub.upvote_ratio)
    topic.append(subred)

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/l

In [28]:
#Store in dataframe. create dataframe from lists
df = pd.DataFrame({'ID':id_list, 
                   'Author':author_list, 
                   'Title':title_list,
                   'Count_of_Comments':num_comments_list,
                   'Upvote_Count':score_list,
                   'Upvote_Ratio':upvote_ratio_list,
                   'topic': topic
                  })
display(df)

Unnamed: 0,ID,Author,Title,Count_of_Comments,Upvote_Count,Upvote_Ratio,topic
0,k4qide,stem12345679,An anti-gay Hungarian politician has resigned ...,8437,204547,0.93,worldnews
1,eclwg9,MachoNachoTaco,Trump Impeached for Abuse of Power,20127,202902,0.88,worldnews
2,t3pgaz,bichonista,Vladimir Putin's black belt revoked by interna...,6958,200156,0.89,worldnews
3,901p5f,DoremusJessup,"Two weeks before his inauguration, Donald J. T...",18086,189358,0.84,worldnews
4,x96k3v,pipsdontsqueak,"Queen Elizabeth II has died, Buckingham Palace...",16741,188844,0.79,worldnews
5,t0b6fb,geiwne,More than 150 senior Russian officials sign op...,7735,178005,0.93,worldnews
6,t1o8wq,CyberArtillery,"Rejecting US evacuation offer, Zelensky says I...",8326,171617,0.94,worldnews
7,fi91qc,Eurynom0s,Mexico is considering closing its border to st...,9047,168668,0.93,worldnews
8,t1f287,o-Themis-o,Anonymous leaks database of the Russian Minist...,6447,165229,0.93,worldnews
9,4d75i7,mister_geaux,2.6 terabyte leak of Panamanian shell company ...,12066,154760,0.95,worldnews


In [29]:
author_list = []
id_list = []
num_comments_list = []
score_list = []
title_list = []
upvote_ratio_list = []
topic = []

In [30]:
for subred in subreddit_list:
    
  subreddit = reddit.subreddit(subred)
  new_post = subreddit.new(limit = 10)
  for sub in new_post:
    author_list.append(sub.author)
    id_list.append(sub.id)
    num_comments_list.append(sub.num_comments)
    score_list.append(sub.score)
    title_list.append(sub.title)
    upvote_ratio_list.append(sub.upvote_ratio)
    topic.append(subred)

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/l

In [31]:
#Store in dataframe. create dataframe from lists
df = pd.DataFrame({'ID':id_list, 
                   'Author':author_list, 
                   'Title':title_list,
                   'Count_of_Comments':num_comments_list,
                   'Upvote_Count':score_list,
                   'Upvote_Ratio':upvote_ratio_list,
                   'topic': topic
                  })
display(df)

Unnamed: 0,ID,Author,Title,Count_of_Comments,Upvote_Count,Upvote_Ratio,topic
0,10cpiu1,HarakenQQ,Kherson: Russians shell Red Cross premises and...,1,5,0.86,worldnews
1,10cpeeu,-Ima-Phat-Cookie-Ho-,Moscow is allegedly preparing to deport some 1...,0,7,0.89,worldnews
2,10cojd6,northernmonk,Nepal plane crash with 72 onboard leaves at le...,6,21,0.96,worldnews
3,10cocbp,DoremusJessup,Parisians will be invited to vote on whether t...,2,7,0.77,worldnews
4,10co6zr,_Foy,Rights group releases scathing report on Canad...,3,27,0.86,worldnews
5,10co181,DoremusJessup,Climate activists on Sunday accused police of ...,5,65,0.92,worldnews
6,10cn9ar,OkRoll3915,Temporary morgues opened across UK to deal wit...,40,80,0.86,worldnews
7,10cme1e,Sporeboss,"Iran to get Russian Sukhoi Su-35 fighter jets,...",18,39,0.92,worldnews
8,10cm92h,Sporeboss,"Catholic priest burned to death, another shot ...",2,36,0.85,worldnews
9,10cm2yk,Gopu_17,Ukraine war: Chances of more survivors from Dn...,1,43,0.92,worldnews


#Helpful Information
[Setup and sample Python code.](https://towardsdatascience.com/how-to-use-the-reddit-api-in-python-5e05ddfd1e5c)

Praw tutorial and code
[Praw tutorial](https://medium.com/analytics-vidhya/praw-a-python-package-to-scrape-reddit-post-data-b759a339ed9a)