# Sentiment Analysis
The goal of this notebook is to scrape an API endpoint for regenerative agriculture data and present it in an interactive plot that can be deployed in a dashboard and shared via a link.

## Table of Contents
1. Get data from Reddit for regenerative agriculture (or any other keyword)
2. Analyze data with plotly
3. Top comments over the past week
4. Sentiment over time

## Get data from Reddit for regenerative agriculture (or any other keyword)


In [95]:
# load packages
import requests

In [39]:
# create a function that queries the Pushshift API
def get_pushshift_data(data_type, **kwargs):
    """
    Get data from the Pushshift API.

    data_type can be 'comment' or 'submission'.
    All other keyword arguments are sent as query parameters.
    Read more: https://github.com/pushshift/api
    """
    base_url = f"https://api.pushshift.io/reddit/search/{data_type}/"
    # requests.get returns a response object; its body is parsed as JSON below
    response = requests.get(base_url, params=kwargs)
    return response.json()

In [92]:
# example usage
get_pushshift_data(data_type="comment",           # give me comments
                   q="python",                    # that mention 'python'
                   after="1y",                    # in the last year
                   size=1000,                     # maximum 1000 comments
                   sort_type="score",             # sort them by score
                   sort="desc",                   # sort descending
                   aggs="subreddit")              # groups result by subreddit
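To see how the keyword arguments in the call above end up in the request, the sketch below builds the query URL without contacting the API. `build_pushshift_url` is a hypothetical helper that is not part of this notebook, and the standard library's `urlencode` is used here as a stand-in for the encoding that `requests` applies to `params`.

```python
from urllib.parse import urlencode


def build_pushshift_url(data_type, **kwargs):
    """Hypothetical helper: return the URL a GET request with these parameters would target."""
    base_url = f"https://api.pushshift.io/reddit/search/{data_type}/"
    # urlencode turns the keyword arguments into "key=value" pairs joined by "&"
    return f"{base_url}?{urlencode(kwargs)}"


url = build_pushshift_url("comment",
                          q="python",
                          after="1y",
                          size=1000,
                          sort_type="score",
                          sort="desc")
print(url)
```

Printing the URL before sending a request is a cheap way to check that a parameter name or value is spelled as intended.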

## Analyze data with plotly

In [96]:
# load packages for this step
import pandas as pd
import plotly.express as px

In [91]:
data = get_pushshift_data(data_type="comment",
                          q="python",
                          after="48h",
                          size=1000,
                          aggs="subreddit")


JSONDecodeError: Expecting value: line 1 column 1 (char 0)
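The message `Expecting value: line 1 column 1 (char 0)` means the response body did not begin with valid JSON, which is what happens when the body is empty or is an HTML error page, for example. The sketch below shows one way to parse a body defensively. `parse_json_or_none` is a hypothetical helper, and it assumes the raw text of the response is available (for a `requests` response, that would be `response.text`).

```python
import json


def parse_json_or_none(text):
    """Hypothetical helper: parse a response body, returning None when it is not JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # an empty body or an HTML page ends up here instead of raising
        return None


empty_result = parse_json_or_none("")
html_result = parse_json_or_none("<html>Service unavailable</html>")
json_result = parse_json_or_none('{"data": []}')
```

Returning `None` lets the calling cell decide whether to retry, skip the plot, or show a message, rather than stopping on the exception.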

In [89]:
type(data.get("aggs"))

NoneType

In [71]:
# dig through the json nesting and then return as pandas df
data = data.get("aggs").get("subreddit")
df = pd.DataFrame.from_records(data)[0:10]

AttributeError: 'NoneType' object has no attribute 'get'
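The `AttributeError` above comes from chaining `.get("subreddit")` onto a missing `aggs` key. A minimal sketch of a lookup that tolerates the missing key follows. `get_subreddit_aggs` is a hypothetical helper, and the sample payloads are made up for illustration; the `key` and `doc_count` field names are taken from the plotting cell in this notebook.

```python
def get_subreddit_aggs(payload):
    """Hypothetical helper: return the subreddit aggregation list, or [] when absent."""
    # "or {}" covers both a missing key and an explicit None value
    aggs = (payload or {}).get("aggs") or {}
    return aggs.get("subreddit") or []


# made-up payloads: one without aggregations, one with
without_aggs = {"data": []}
with_aggs = {"aggs": {"subreddit": [{"key": "learnpython", "doc_count": 3}]}}

rows_missing = get_subreddit_aggs(without_aggs)
rows_present = get_subreddit_aggs(with_aggs)
```

An empty list can be passed to `pd.DataFrame.from_records` without raising, so the missing aggregation shows up as an empty table rather than a traceback.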

In [72]:
# view table
df.head()

Unnamed: 0,author,subreddit,score,body,permalink
0,R_radical,WTF,222,I've never had issues with one. Very rarely do...,https://reddit.com/r/WTF/comments/knf08f/golia...
1,ballasted_orchestra,interestingasfuck,74,I'm so tired of seeing all of these extremely ...,https://reddit.com/r/interestingasfuck/comment...
2,William_Harzia,anime_titties,72,So? The benefit of organic meat is supposed to...,https://reddit.com/r/anime_titties/comments/kn...
3,yumu22,TrueCrimePodcasts,62,"Speaking from what I’ve witnessed, many people...",https://reddit.com/r/TrueCrimePodcasts/comment...
4,Trollercoaster101,spaceporn,50,It’s amazing how Jean Pierre Luminet obtained ...,https://reddit.com/r/spaceporn/comments/kqhddp...


In [73]:
# make plot with plotly
px.bar(df,              # our dataframe
       x="key",         # x will be the 'key' column of the dataframe
       y="doc_count",   # y will be the 'doc_count' column of the dataframe
       title="Subreddits with 'Regenerative Agriculture' activity over past year",
       labels={"doc_count": "# comments", "key": "Subreddits"}, # the axis names
       color_discrete_sequence=["blueviolet"], # the colors used
       height=500,
       width=800)

ValueError: Value of 'x' is not the name of a column in 'data_frame'. Expected one of ['author', 'subreddit', 'score', 'body', 'permalink'] but received: key
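The `ValueError` shows that `df` holds comment columns (`author`, `subreddit`, ...) rather than the `key` and `doc_count` columns the bar chart expects. When the aggregation is not available from the API, the same counts can be derived from the comment records themselves. The sketch below assumes a list of comment dictionaries with a `subreddit` field; the sample records and the `count_by_subreddit` helper are hypothetical.

```python
from collections import Counter


def count_by_subreddit(comments, top_n=10):
    """Hypothetical helper: count comments per subreddit, largest first."""
    counts = Counter(comment["subreddit"] for comment in comments)
    # reuse the 'key' / 'doc_count' names that the bar chart cell expects
    return [{"key": name, "doc_count": n} for name, n in counts.most_common(top_n)]


# made-up sample records for illustration
sample_comments = [
    {"subreddit": "AskReddit"},
    {"subreddit": "houseplants"},
    {"subreddit": "AskReddit"},
    {"subreddit": "AskReddit"},
    {"subreddit": "houseplants"},
    {"subreddit": "WTF"},
]

records = count_by_subreddit(sample_comments)
```

The resulting list of records can be handed to `pd.DataFrame.from_records`, which yields a frame with the `key` and `doc_count` columns.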

## Top comments over the past week

In [68]:
# get top comment data with function
data = get_pushshift_data(data_type="comment", 
                          q="organic", 
                          after="7d", 
                          size=10, 
                          sort_type="score", 
                          sort="desc").get("data")

# put columns of interest in df
df = pd.DataFrame.from_records(data)[["author", "subreddit", "score", "body", "permalink"]]

# truncate the comment body to 400 characters and mark the cut
df['body'] = df['body'].str[0:400] + "..."

# prepend the Reddit domain to each permalink so that it links to the comment
df['permalink'] = "https://reddit.com" + df['permalink'].astype(str)

# function for making clickable links in the df table
def make_clickable(val):
    return '<a href="{}">Link</a>'.format(val)

# style the permalink column to be clickable and display the table
df.style.format({'permalink': make_clickable})

Unnamed: 0,author,subreddit,score,body,permalink
0,R_radical,WTF,222,"I've never had issues with one. Very rarely does sea life of any kind mess with divers unless people are feeding things in that area (which has caused me *serious* issues). Usually the sound of the regs and the bubbles keep just about everything at bay. Please, do not feed the fish. The only time it is acceptable to feed the fish is if you need to evacuate your gut. No food should go overboard ev...",Link
1,ballasted_orchestra,interestingasfuck,74,"I'm so tired of seeing all of these extremely heavily edited nature photos presented as something natural. I don't want to tell anyone what they can and can't do with their art, but it's often presented as something organic. And I understand that a lot of this time this is the OPs fault and not the photographer. I just think it's lying. I'm also tired of people not crediting the photographers. Yo...",Link
2,William_Harzia,anime_titties,72,So? The benefit of organic meat is supposed to be the lack of chemicals and hormones used in the production. Who thought beef produced in a less efficient manner would contribute less to greenhouse gases? This sounds like an industry smear. Honestly OP why did you even post this?...,Link
3,yumu22,TrueCrimePodcasts,62,"Speaking from what I’ve witnessed, many people’s opinion went way down after the whole plagiarism scandal. As well, the interaction seems a bit fake/ non organic. For example, ways in which Brit responds. It still remains a top podcast, besides all of this, though....",Link
4,Trollercoaster101,spaceporn,50,"It’s amazing how Jean Pierre Luminet obtained such an accurate black-hole simulation in 1978. He did all the calculations through a punch card computer and then drew the results dot by dot with india ink. He has an [amazing blog](https://blogs.futura-sciences.com/e-luminet/2018/03/07/45-years-black-hole-imaging-1-early-work-1972-1988/) were the process is described in detail, here is an excerpt...",Link
5,Redacted_G1iTcH,AskReddit,48,"A tree branch. Diamond and most precious metals are not as rare as we think in the galaxy, but something organic that came from a life form is far rarer. Especially from earth, where the life is allegedly unique to other life forms in the universe...",Link
6,HORAMAN76,Cringetopia,38,I only eat organic salmon...,Link
7,j_slosh,houseplants,36,"More or less, the plant eats it as it breaks down. Dirt is organic matter that’s breaks down over time, mostly it’s poop from worms, insects, and also bacteria and fungi. The roots use it up as your watering breaks it down. Plus some is flushed out of drainage holes...",Link
8,cayshek,AliandJohnJames,32,“qUaLiTy TiMe” after we all called her out for saying CPS said Emmy was fine because she was eating organic eggs and custom clothes instead of talking about playing with her/reading to her eat....,Link
9,Rambo7112,AskReddit,31,"Acetic acid does not replace insulin, and natural does not mean good. Being a chem major is really fun when people try to convince you of stuff by using the words, ""chemicals"", ""organic"" and ""natural."" If anything, ""organic"" scares me away from the product....",Link
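The link formatter used for the table above inserts the URL into HTML as-is. If a URL could ever contain characters such as `&` or `"`, escaping it first keeps the generated markup valid. The variant below is a sketch, not the notebook's own function: `make_clickable_escaped` and its `label` parameter are hypothetical names.

```python
from html import escape


def make_clickable_escaped(url, label="Link"):
    """Hypothetical variant: build an HTML anchor with the URL and label escaped."""
    # quote=True also escapes double and single quotes inside the attribute value
    return f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'


plain = make_clickable_escaped("https://reddit.com/r/WTF/comments/abc/")
with_query = make_clickable_escaped("https://reddit.com/search?q=organic&sort=top")
```

It can be passed to `df.style.format` in the same way as the original formatter, since it also takes the cell value as its first argument.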


## Sentiment over time

In [97]:
# load packages 
import textblob

In [98]:
# get the data of interest with function
data = get_pushshift_data(data_type="comment",
                          after="2d",
                          size=1000,
                          sort_type="score",
                          sort="desc",
                          subreddit="python").get("data")

# define columns of interest
columns_of_interest = ["author", "body", "created_utc", "score", "permalink"]

# transform the response into a dataframe with relevant columns
df = pd.DataFrame.from_records(data)[columns_of_interest]

In [99]:
# create a column with sentiment polarity
df["sentiment_polarity"] = df.apply(lambda row: textblob.TextBlob(row["body"]).sentiment.polarity, axis=1)

# create a column with sentiment subjectivity
df["sentiment_subjectivity"] = df.apply(lambda row: textblob.TextBlob(row["body"]).sentiment.subjectivity, axis=1)

# create a column with 'positive' or 'negative' depending on sentiment_polarity
# (a polarity of exactly 0 counts as 'positive')
df["sentiment"] = df.apply(lambda row: "positive" if row["sentiment_polarity"] >= 0 else "negative", axis=1)

# create a column with a text preview that shows the first 50 characters
df["preview"] = df["body"].str[0:50]

# take the created_utc field (Unix time in seconds) and transform it into a datetime column
df["date"] = pd.to_datetime(df['created_utc'],unit='s')
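The derived columns in this cell can be checked on a single record without pandas or textblob. The sketch below applies the same three rules (label, 50-character preview, Unix seconds to a date) to one comment. `derive_fields` is a hypothetical helper, the polarity value is passed in rather than computed, and the sample values are made up. Unlike `pd.to_datetime(..., unit='s')`, which returns timestamps without timezone information, this version attaches UTC explicitly.

```python
from datetime import datetime, timezone


def derive_fields(body, created_utc, polarity):
    """Hypothetical helper: mirror the label, preview, and date rules for one comment."""
    return {
        # same rule as the notebook: zero or above is labelled 'positive'
        "sentiment": "positive" if polarity >= 0 else "negative",
        # first 50 characters of the comment body
        "preview": body[:50],
        # created_utc is a Unix timestamp in seconds
        "date": datetime.fromtimestamp(created_utc, tz=timezone.utc),
    }


# made-up sample values for illustration
neutral = derive_fields("x" * 80, 1609459200, 0.0)
negative = derive_fields("short comment", 1609459200, -0.4)
```

The neutral case makes the labelling choice visible: a comment with polarity `0.0` lands in the 'positive' group, which is worth keeping in mind when reading the scatter plot.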

In [100]:
# make visual with plotly
px.scatter(df, x="date", # date on the x axis
               y="sentiment_polarity", # sentiment on the y axis
               hover_data=["author", "permalink", "preview"], # data to show on hover
               color_discrete_sequence=["lightseagreen", "indianred"], # colors to use
               color="sentiment", # what should the color depend on?
               size="score", # the more votes, the bigger the circle
               size_max=10, # not too big
               labels={"sentiment_polarity": "Comment positivity", "date": "Date comment was posted"}, # axis names
               title="Comment sentiment in /r/python for the past 48h", # title of figure
          )
