# Use Reddit API With Python and Pushshift.io
This post shows how to query Reddit data from Python through the Pushshift.io API.

@author: Jean-Christophe Chouinard: Technical SEO / Data Scientist > [LinkedIn](https://www.linkedin.com/in/jeanchristophechouinard/) > [@ChouinardJC](https://twitter.com/ChouinardJC) > Blog > [jcchouinard.com](https://www.jcchouinard.com/)
View the post > [How to use Reddit API With Python](https://www.jcchouinard.com/how-to-use-reddit-api-with-python/) 


This work must be attributed to [Duarte O.Carmo](https://duarteocarmo.com/).

In this post, I will show you how to extract data from the Reddit API to find out which subreddits have the most activity for your search term.


## How to Extract Data From Reddit Using Pushshift.io?


## Install Dependent Libraries

In [None]:
# Install packages if you don't have them already installed in the current Jupyter kernel
import sys
!{sys.executable} -m pip install plotly==4.4.1
!{sys.executable} -m pip install pandas
!{sys.executable} -m pip install requests

## Load Libraries

In [None]:
import pandas as pd
import requests

## Import the JSON Data With Requests

In [None]:
query = "python"
url = f"https://api.pushshift.io/reddit/search/comment/?q={query}"
response = requests.get(url)      # Send the GET request
json_response = response.json()   # Parse the JSON body into a Python dict
json_response
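
The f-string URL works for a one-word query such as `python`, but a query that contains spaces or characters like `&` needs URL-encoding first. A minimal sketch using only the standard library; `build_search_url` is a hypothetical helper name, not part of Pushshift or requests:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.pushshift.io/reddit/search"


def build_search_url(data_type, **params):
    """Return a search URL with URL-encoded query parameters.

    Hypothetical helper for illustration only.
    """
    return f"{BASE_URL}/{data_type}/?{urlencode(params)}"


# Spaces become "+" and "&" becomes "%26", so the query stays in one piece
print(build_search_url("comment", q="machine learning"))
print(build_search_url("comment", q="R&D", size=10))
```

Passing a `params` dictionary to `requests.get` performs an equivalent encoding, so the helper is mostly useful for inspecting the URL before sending it.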

## Set-up Your Parameters

In [None]:
data_type="comment"     # Search comments; use "submission" to search posts instead
query="python"          # Add your query
duration="30d"          # Select the timeframe. Epoch value or integer + "s", "m", "h" or "d" (second, minute, hour, day)
size=1000               # Maximum of 1000 comments
sort_type="score"       # Sort by score (Accepted: "score", "num_comments", "created_utc")
sort="desc"             # Sort descending
aggs="subreddit"        # Field to aggregate on: "author", "link_id", "created_utc" or "subreddit"
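
The `duration` parameter accepts either an epoch value or an integer followed by `s`, `m`, `h` or `d`. To make the relationship between the two forms concrete, here is a sketch that converts the shorthand into an epoch timestamp. `duration_to_epoch` is a hypothetical helper, and reading `"30d"` as "30 days before now" is an assumption based on how this notebook uses the value, not on a checked specification.

```python
import re
import time

# Seconds per unit for the shorthand suffixes listed in the parameter comment
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_to_epoch(duration, now=None):
    """Convert a shorthand like "30d" to an epoch timestamp that far in the past.

    Hypothetical helper; `now` can be injected to make the result reproducible.
    """
    match = re.fullmatch(r"(\d+)([smhd])", duration)
    if match is None:
        raise ValueError(f"Unsupported duration: {duration!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if now is None:
        now = int(time.time())
    return now - amount * UNIT_SECONDS[unit]


# With a fixed "now", 30 days back is 30 * 86400 seconds earlier
print(duration_to_epoch("30d", now=1_600_000_000))
```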

## Make a Function to Call The API

In [None]:
def get_pushshift_data(data_type, **kwargs):
    """
    Gets data from the Pushshift API.

    data_type can be 'comment' or 'submission'.
    The remaining keyword arguments are sent as query parameters.

    Read more: https://github.com/pushshift/api
    """

    base_url = f"https://api.pushshift.io/reddit/search/{data_type}/"
    response = requests.get(base_url, params=kwargs, timeout=30)
    response.raise_for_status()  # Raise on HTTP errors instead of parsing an error page
    return response.json()

In [None]:
get_pushshift_data(data_type=data_type,     
                   q=query,                 
                   after=duration,          
                   size=size,               
                   sort_type=sort_type,
                   sort=sort)             

In [None]:
data = get_pushshift_data(data_type=data_type,
                          q=query,
                          after=duration,
                          size=size,
                          aggs=aggs)

In [None]:
data = data.get("aggs").get(aggs)

In [None]:
df = pd.DataFrame.from_records(data)[0:10]
df
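
The aggregation step yields one record per subreddit, which the chart code reads through the `key` and `doc_count` columns. The sketch below reproduces the "top N" selection in plain Python on made-up sample records (the subreddit names and counts are invented for illustration). It sorts explicitly rather than assuming the API returns the records already ordered.

```python
def top_buckets(buckets, n=10):
    """Return the n buckets with the highest doc_count, largest first."""
    return sorted(buckets, key=lambda b: b["doc_count"], reverse=True)[:n]


# Invented sample data shaped like the records used in the notebook
sample = [
    {"key": "learnpython", "doc_count": 120},
    {"key": "Python", "doc_count": 300},
    {"key": "programming", "doc_count": 75},
]

for bucket in top_buckets(sample, n=2):
    print(bucket["key"], bucket["doc_count"])
```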

In [None]:
import plotly.express as px

px.bar(df,              # our dataframe
       x="key",         # x will be the 'key' column of the dataframe
       y="doc_count",   # y will be the 'doc_count' column of the dataframe
       title=f'Subreddits with most activity - comments with "{query}" in the last "{duration}"',
       labels={"doc_count": "# comments","key": "Subreddits"}, # the axis names
       color_discrete_sequence=["blueviolet"], # the colors used
       height=500,
       width=800)

In [None]:
# Call the API
data = get_pushshift_data(data_type=data_type,
                          q=query,
                          after="7d",
                          size=10,
                          sort_type=sort_type,
                          sort=sort).get("data")

# Select the columns you care about
df = pd.DataFrame.from_records(data)[["author", "subreddit", "score", "body", "permalink"]]

# Keep the first 400 characters
df['body'] = df['body'].str[0:400] + "..."

# Prepend the Reddit domain to every permalink so that we have a full link to the comment
df['permalink'] = "https://reddit.com" + df['permalink'].astype(str)


# Create a function that makes the link clickable, then style the last column
def make_clickable(val):
    """Makes a pandas column clickable by wrapping it in some HTML."""
    return '<a href="{}">Link</a>'.format(val)


df.style.format({'permalink': make_clickable})
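
Two details of the cell above are worth a second look: `+ "..."` adds an ellipsis even to comments shorter than 400 characters, and the URL is placed into HTML without escaping. A minimal sketch of stricter variants, using the hypothetical helper names `truncate` and `make_link`:

```python
from html import escape


def truncate(text, limit=400):
    """Shorten text to `limit` characters, adding "..." only when something was cut."""
    return text if len(text) <= limit else text[:limit] + "..."


def make_link(url, label="Link"):
    """Wrap a URL in an anchor tag, escaping characters that are special in HTML."""
    return f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'


print(truncate("short comment"))
print(make_link("https://reddit.com/r/Python/comments/abc?x=1&y=2"))
```

These could be applied with `df['body'].map(truncate)` and `df.style.format({'permalink': make_link})`.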