<h1><center>My Reddit monitoring dashboard for Python </center></h1>


<center><i>A simple dashboard to monitor keywords in Reddit, made with <a href="https://github.com/voila-dashboards/voila">Voila</a>, <a href="https://pandas.pydata.org/">Pandas</a>, <a href="https://plot.ly/python/plotly-express/">Plotly Express</a> and <a href="https://textblob.readthedocs.io/en/dev/">TextBlob</a>.</i></center>

<center><i><a href="https://github.com/NaquibAlam/voila_heroku_demo_2">Source code on github</a></i></center>

In [64]:
# import libraries

import requests
import pandas
import textblob
import plotly.express as px
import nltk
nltk.download('punkt')
pandas.set_option('display.max_colwidth', -1) # don't cut my pandas dataframes

[nltk_data] Downloading package punkt to /Users/m0a04ut/nltk_data...
[nltk_data]   Package punkt is already up-to-date!

Passing a negative integer is deprecated in version 1.0 and will not be supported in future version. Instead, use None to not limit the column width.



In [75]:
# define variables

COMMENT_COLOR         = "blueviolet"
SUBMISSION_COLOR      = "darkorange"
TEXT_PREVIEW_SIZE     = 240
TERM_OF_INTEREST      = "covid"
SUBREDDIT_OF_INTEREST = "covid"
TIMEFRAME             = "48h" # see more options in the pushshift api docs: https://github.com/pushshift/api
SIZE                  = 500 #no of records to return

In [76]:
# a couple of helper functions

def get_reddit_data(data_type, **kwargs):
    """
    Gets data from the pushshift api.
    
    Data type can be 'comment' or 'submission'
    The rest of the args as interpreted as payload.
    
    Read more: https://github.com/pushshift/api
    """
    
    base_url = f"https://api.pushshift.io/reddit/search/{data_type}/"
    payload = kwargs
    
    request = requests.get(base_url, params=payload)
    
    return request.json()


def make_clickable(val):
    """
    Makes a pandas column clickable.
    """
    
    return '<a href="{}">Link</a>'.format(val)

# Figure Index

- [Comment activity](#1)
- [Submission activity](#2)
- [Most upvoted comments](#3) 
- [Most commented submissions](#4) 
- [/r/covid comment sentiment timeline](#5)

## Comment activity <a class="anchor" id="1"></a>

In [77]:
data = get_reddit_data(data_type="comment", q=TERM_OF_INTEREST, after=TIMEFRAME, size= SIZE, aggs="subreddit").get("data")

df = pandas.DataFrame(data)["subreddit"].value_counts()[0:10]
x = df.keys()
y = df.values

px.bar(df,
       x=x,
       y=y,
       title=f"Subreddits with most comments having the term '{TERM_OF_INTEREST}' in the last {TIMEFRAME}",
       labels={"x": "Subreddits", "y": "Number of comments"},
       color_discrete_sequence=[COMMENT_COLOR],
       height=500,
       width=800)

## Submission activity <a class="anchor" id="2"></a>

In [79]:
data = get_reddit_data(data_type="submission", q=TERM_OF_INTEREST, after=TIMEFRAME, size=1000, aggs="subreddit").get("data")

df = pandas.DataFrame(data)["subreddit"].value_counts()[0:10]
x = df.keys()
y = df.values


px.bar(df,
       x=x,
       y=y,
       title=f"Subreddits with most submissions having the term '{TERM_OF_INTEREST}' in the last {TIMEFRAME}",
       labels={"x": "Subreddits", "y": "Number of submissions"},
       color_discrete_sequence=[COMMENT_COLOR],
       height=500,
       width=800)

## Most upvoted comments <a class="anchor" id="3"></a>

In [69]:
data = get_reddit_data(data_type="comment", q=TERM_OF_INTEREST, after=TIMEFRAME, size=10, sort_type="score", sort="desc").get("data")
# to see what columns interest you, simply list(df)
df = pandas.DataFrame(data)[["author", "subreddit", "score", "body", "permalink"]]

# we only keep the first X characters of the body 
df.body = df.body.str[0:100] + "..."

# we append the string to all the permalink entries
df.permalink = "https://reddit.com" + df.permalink.astype(str)

# print 
print(f"\nTop 10 most upvoted comments having the term '{TERM_OF_INTEREST}' in the last {TIMEFRAME}\n")

# style the last column to be clickable and print
df.style.format({'permalink': make_clickable})


Top 10 most upvoted comments having the term 'covid' in the last 48h



Unnamed: 0,author,subreddit,score,body,permalink
0,Gold-Giant,PublicFreakout,131,Everybody knows Covid can’t get you while you’re eating cookies....,Link
1,NomDrop,insanepeoplefacebook,107,I like how the true Covid cures were hidden from the public by one political party (who wasn’t even ...,Link
2,oojamaflip123,LiverpoolFC,90,The only thing that makes sense at all is Covid. He's gone from the best winger in the World to not ...,Link
3,CT_x,soccer,79,He's been fairly shite for months now. Every week that passes I am wholly convinced he has a case o...,Link
4,Profession-Unable,WhitePeopleTwitter,77,I’m pretty sure it also makes your symptoms much less severe if you do manage to contract Covid....,Link
5,GobtheCyberPunk,baseball,74,That 90% number is also misleading because thats 90% chance of being *infected at all* if exposed to...,Link
6,fangus,soccer,70,At least not as far as COVID goes...,Link
7,centaurius_,NYYankees,69,#COLE TRAIN: RUNNING EXPRESS #BRUCE: LOOSE #AARON: JUDGE AND JURY #GREEN: ALPHA #LUETGE: CLOSING...,Link
8,ceddzz3000,LivestreamFail,60,theres racists in every country but yeah (sry for the political take or whatever) trump just yelling...,Link
9,TopOfTheKey,baseball,56,Soto doesn't have COVID. League is fucked....,Link


## Most commented submissions <a class="anchor" id="4"></a>

In [70]:
data = get_reddit_data(data_type="submission", q=TERM_OF_INTEREST, after=TIMEFRAME, size=10, sort_type="num_comments", sort="desc").get("data")


# to see what columns interest you, simply list(df)
df = pandas.DataFrame.from_records(data)[["author", "subreddit", "num_comments", "title", "permalink"]]

# we only keep the first X characters of the body 
df.title = df.title[0:100] + "..."

# we append the string to all the permalink entries
df.permalink = "https://reddit.com" + df.permalink.astype(str)

# print 
print(f"\nTop 10 most commented submissions having the term '{TERM_OF_INTEREST}' in the last {TIMEFRAME}\n")

# style the last column to be clickable and print
df.style.format({'permalink': make_clickable})



Top 10 most commented submissions having the term 'covid' in the last 48h



Unnamed: 0,author,subreddit,num_comments,title,permalink
0,NewYorkMetsBot,NewYorkMets,4376,"GAME THREAD: Mets (0-1) @ Phillies (4-0) - Tue, Apr 06 @ 07:05 PM EDT...",Link
1,jcceagle,dataisbeautiful,2263,[OC] Are Covid-19 vaccinations working?...,Link
2,PhilsBot,phillies,1860,"Game Thread: Mets (0-1) @ Phillies (4-0) - Tue, Apr 06 @ 07:05 PM EDT...",Link
3,ukpolbot,ukpolitics,1815,Daily Megathread - 06/04/2021...,Link
4,throwaway5272,politics,1726,Biden set to announce he's moving deadline for all US adults to be eligible for Covid vaccine to April 19...,Link
5,nogoyolo,AmItheAsshole,1541,AITA for following through and not going to Easter because I'm tired of EVERY family thing being about the kids?...,Link
6,AutoModerator,Coronavirus,1236,"Daily Discussion Thread | April 06, 2021...",Link
7,ukpolbot,ukpolitics,1182,Daily Megathread - 07/04/2021...,Link
8,Vulphere,indonesia,1131,07 April 2021- Daily Chat Thread...,Link
9,thaonguyenvan,leagueoflegends,1067,Former GAM Esports top laner Zeros banned permanently after joking about Covid-19 on stream...,Link


## /r/covid comment sentiment timeline <a class="anchor" id="5"></a>

In [71]:
data = get_reddit_data(data_type="comment", after=TIMEFRAME, size=1000, sort_type="score", sort="desc", subreddit=SUBREDDIT_OF_INTEREST).get("data")
df = pandas.DataFrame.from_records(data)[["author", "body", "created_utc", "score", "permalink"]]

df["sentiment_polarity"] = df.apply(lambda row: textblob.TextBlob(row["body"]).sentiment.polarity, axis=1)
df["sentiment_subjectivity"] = df.apply(lambda row: textblob.TextBlob(row["body"]).sentiment.subjectivity, axis=1)
df["sentiment"] = df.apply(lambda row: "positive" if row["sentiment_polarity"] >= 0 else "negative", axis=1)

df["preview"] = df["body"].str[0:50]

df["date"] = pandas.to_datetime(df['created_utc'],unit='s')

px.scatter(df, x="date", 
               y="sentiment_polarity",
               hover_data=["author", "permalink", "preview"],
               color_discrete_map={"positive": "lightseagreen", "negative": "indianred"},
               color="sentiment",
               size_max=10,
               labels={"sentiment_polarity": "Comment positivity", "date": "Date, comment was posted on"},
               title=f"Comment sentiment in /r/{SUBREDDIT_OF_INTEREST} for the past {TIMEFRAME}",
          )
