# Sentiment Analysis
The goal of this notebook is to query the Pushshift API for Reddit comments about regenerative agriculture and present the results as interactive plots.

## Table of Contents
1. Get data from Reddit regarding regenerative agriculture
2. Analyze data with plotly
3. Top Comments over the past week
4. Sentiment over time
5. Summary

## Get data from Reddit for the regenerative agriculture (or any other) keyword


In [1]:
# load packages
import requests

In [2]:
# create function to get info 
def get_pushshift_data(data_type, **kwargs):
    """
    Gets data from the pushshift api.
    data_type can be 'comment' or 'submission'
    other args are interpreted as payload.
    Read more: https://github.com/pushshift/api
    """
    base_url = f"https://api.pushshift.io/reddit/search/{data_type}/"
    payload = kwargs
    # requests.get returns a response object, so name it accordingly
    response = requests.get(base_url, params=payload)
    # fail loudly on HTTP errors instead of parsing an error page as JSON
    response.raise_for_status()
    return response.json()
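
To see how the keyword arguments end up in the request, the sketch below builds the URL without sending anything. `build_pushshift_url` is a hypothetical helper, not part of the notebook or of Pushshift, and it uses the standard library's `urlencode` as an approximation of how `requests` encodes `params`.

```python
from urllib.parse import urlencode


def build_pushshift_url(data_type, **kwargs):
    """Hypothetical helper: return the URL get_pushshift_data would request."""
    base_url = f"https://api.pushshift.io/reddit/search/{data_type}/"
    # every keyword argument becomes one key=value pair in the query string
    return f"{base_url}?{urlencode(kwargs)}"


url = build_pushshift_url("comment", q="regenerative agriculture", after="7d", size=10)
print(url)
```

Printing the URL this way is a quick check that a parameter such as `sort_type` or `after` is spelled as intended before spending a real request on it.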

In [92]:
# usage example
get_pushshift_data(data_type="comment",           # give me comments
                   q="organic",                   # that mention 'organic'
                   after="1y",                    # in the last year
                   size=1000,                     # maximum 1000 comments
                   sort_type="score",             # sort them by score
                   sort="desc",                   # sort descending
                   aggs="subreddit")              # groups result by subreddit
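
With `aggs="subreddit"`, the interesting part of the response is the aggregation rather than the individual comments. The sketch below assumes the response carries an `aggs` key holding, per aggregated field, a list of buckets with `key` and `doc_count` entries; that shape is an assumption based on the Pushshift README and may not match the live API. `sample_response` is made-up data for illustration, not real API output.

```python
# made-up response in the assumed shape; real counts and subreddits will differ
sample_response = {
    "aggs": {
        "subreddit": [
            {"doc_count": 12, "key": "Permaculture"},
            {"doc_count": 30, "key": "worldnews"},
            {"doc_count": 7, "key": "exvegans"},
        ]
    },
    "data": [],
}

# .get with defaults avoids a KeyError when the aggregation is missing
buckets = sample_response.get("aggs", {}).get("subreddit", [])

# sort the buckets by comment count, largest first
top_subreddits = sorted(buckets, key=lambda b: b["doc_count"], reverse=True)

for bucket in top_subreddits:
    print(bucket["key"], bucket["doc_count"])
```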

## Analyze data with plotly

In [3]:
# load packages for this step
import pandas as pd
import plotly.express as px

In [53]:
data = get_pushshift_data(data_type="comment",
                          q="regenerative agriculture",
                          after="1y",
                          size=1000,
                          #aggs="subreddit"
                         ).get("data")
# data

In [54]:
# type(data)

In [55]:
# change list to pandas df
df = pd.DataFrame(data)
# view table
# df.head(10)

In [56]:
# get col names to make sure they are called correctly
# df.columns

In [40]:
# group by subreddit and count times subreddit appears
df['count'] = 1
min_df = df[['subreddit', 'count']]
grouped_df = min_df.groupby(['subreddit']).sum()
grouped_df = grouped_df.sort_values(by=['count'], ascending=False)[0:10]
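
The group-and-count step can be cross-checked without pandas. This is a minimal sketch, assuming `data` is a list of comment dictionaries that each have a `subreddit` field; `sample_data` is invented here so the example runs on its own.

```python
from collections import Counter

# invented stand-in for the list of comment dicts returned by the API
sample_data = [
    {"subreddit": "worldnews"},
    {"subreddit": "Permaculture"},
    {"subreddit": "worldnews"},
    {"subreddit": "exvegans"},
    {"subreddit": "worldnews"},
    {"subreddit": "Permaculture"},
]

# count how often each subreddit appears
subreddit_counts = Counter(comment["subreddit"] for comment in sample_data)

# keep the ten most frequent subreddits, most frequent first
top_ten = subreddit_counts.most_common(10)
print(top_ten)
```

If the counts from this plain-Python version and from `grouped_df` disagree, the likely culprit is rows with a missing `subreddit` value.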

In [57]:
grouped_df = grouped_df.reset_index()
# grouped_df.columns

In [42]:
# make plot with plotly
px.bar(grouped_df,              # df
       x="subreddit",           # x axis info from df
       y="count",               # y axis 'count' column from df
       title="Subreddits with 'Regenerative Agriculture' activity over past year",
       labels={"count": "# comments", "subreddit": "Subreddits"}, # axis names
       color_discrete_sequence=["blueviolet"], # colors used
       height=500,
       width=800)

## Top Comments over the past week

In [43]:
# get top comment data with function
data = get_pushshift_data(data_type="comment", 
                          q="regenerative agriculture", 
                          after="7d", 
                          size=10, 
                          sort_type="score", 
                          sort="desc").get("data")

# put columns of interest in df
df = pd.DataFrame.from_records(data)[["author", "subreddit", "score", "body", "permalink"]]

# limit body of the comment
df['body'] = df['body'].str[0:400] + "..."

# append the string to all the permalink entries so that there's a link to the comment
df['permalink'] = "https://reddit.com" + df['permalink'].astype(str)

# function for making clickable links in df table
def make_clickable(val):
    return '<a href="{}">Link</a>'.format(val)

# style the last column to be clickable and print
df.style.format({'permalink': make_clickable})

Unnamed: 0,author,subreddit,score,body,permalink
0,valski1337,LivestreamFail,26,Except [regenerative agriculture](https://en.wikipedia.org/wiki/Regenerative_agriculture) exists and is getting more popular....,Link
1,Deep-Duck,worldnews,12,Community Supported Agriculture is great for this and I highly recommend people join a CSA if they have access to one near them. With a CSA you pay for your entire seasons worth of vegetables up front to a local farm. Then throughout the year you get a weekly share of the farms harvest. Most (all in my experience) CSA's put a focus on regenerative farming focusing on farming techniques that rene...,Link
2,vocalghost,LivestreamFail,11,"How in the world did you get ""stop farming altogether"" from my comment? I'm not against regen ag at all, I think its amazing, but the fact of the matter is that farming requires land that they have to make fit to the crops that the market wants. They're referring to conservation as in the conserving the nutrients in the soil so you can get better yields. I'm pretty sure the comment above was t...",Link
3,Skatchan,worldnews,9,"> It is, of course, possible to rear a limited number of animals in ways that cause less damage. This report, which focuses on just one environmental concern – climate change – has found that well-managed grazing in some contexts can cause carbon to be sequestered in the soil – and at the very least can provide an economic rationale for keeping the carbon in the ground. It is important to ident...",Link
4,ChampagneFloozy,Enough_Sanders_Spam,7,regenerative agriculture that focuses on carbon capture. Check and see if Indigo Ag is public yet....,Link
5,GentleOmnicide,PublicFreakout,5,"Concrete jungle vegans are the worst offenders of that “in your face guilt” while they support mass agriculture monocultures. Mono cultures do so much harm to the environment and kill a ton of animals so that people can buy their veggies and feel safe in their own reality. I have nothing against people being vegan on their own, and applaud anyone that can grow their own food minimizing risks towar...",Link
6,valski1337,LivestreamFail,4,"I don't get your point. Stop farming altogether? ""*Regenerative agriculture is a conservation and rehabilitation approach to food and farming systems.*"" Conservation is literally in the definition of regenerative agriculture....",Link
7,AnonyJustAName,exvegans,3,It struck me how high pressure and performance oriented his life was. He couldn't just enjoy music he had to play like a jazz great. Had to be an exhausting way to live and the black and white thinking and focus on things none of us have control over can so easily lead to dispare. Had he been involved in something like 4-H or regenerative agriculture might have given a sense of being needed by tan...,Link
8,riselikethemoon,Permaculture,3,"Define some goals first. Are you restoring it to productive farm land? Check out regenerative agriculture. Restoring it to whatever the native ecosystem was? Check out restoration projects in your area. This depends a lot on location, but local universities and conservation organizations are a good place to start looking for resources....",Link
9,64557175,nextfuckinglevel,2,"Dope! This year was my first year making OHN and FAA along with the usuals. Hot tip - check out this LAB recipe. You'll never go back to milk! https://youtu.be/XYyOBSMDA6o Good stuff, my guy, regenerative agriculture is going to save the world. It's our only hope! Thanks for your posts, it's refreshing and inspiring....",Link
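
Two details of the table above are easy to get wrong: the preview appends "..." even when a comment is shorter than the limit, and the link is built from unescaped text. The sketch below is one possible variant, not the notebook's code; `truncate` and `make_clickable_safe` are hypothetical names.

```python
from html import escape


def truncate(text, limit=400):
    """Shorten text to `limit` characters, adding '...' only if something was cut."""
    return text if len(text) <= limit else text[:limit] + "..."


def make_clickable_safe(url):
    """Wrap a URL in an HTML anchor, escaping characters such as quotes."""
    return '<a href="{}">Link</a>'.format(escape(url, quote=True))


print(truncate("short comment"))
print(truncate("x" * 500)[-5:])
print(make_clickable_safe("https://reddit.com/r/Permaculture/"))
```

Under these assumptions, the helpers would slot into the cell above as `df['body'].apply(truncate)` and `df.style.format({'permalink': make_clickable_safe})`.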


## Sentiment over time

In [44]:
# load packages 
import textblob

In [45]:
# get the data of interest with function
data = get_pushshift_data(data_type="comment",
                          after="2d",
                          size=1000,
                          sort_type="score",
                          sort="desc",
                          subreddit="worldnews").get("data")

# define columns of interest
columns_of_interest = ["author", "body", "created_utc", "score", "permalink"]

# transform the response into a dataframe with relevant columns
df = pd.DataFrame.from_records(data)[columns_of_interest]

In [46]:
# create a column with sentiment polarity
df["sentiment_polarity"] = df["body"].apply(lambda text: textblob.TextBlob(text).sentiment.polarity)

# create a column with sentiment subjectivity
df["sentiment_subjectivity"] = df["body"].apply(lambda text: textblob.TextBlob(text).sentiment.subjectivity)

# create a column with 'positive' or 'negative' depending on sentiment_polarity (0 counts as positive)
df["sentiment"] = df["sentiment_polarity"].apply(lambda polarity: "positive" if polarity >= 0 else "negative")

# create a column with a text preview that shows the first 50 characters
df["preview"] = df["body"].str[0:50]

# take the created_utc parameter and transform it into a datetime column
df["date"] = pd.to_datetime(df['created_utc'],unit='s')
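
The labelling and timestamp steps can be checked in isolation. This is a minimal sketch using only the standard library; `label_sentiment` and `utc_datetime` are hypothetical helpers. The first mirrors the `>= 0` rule above, so a polarity of exactly 0 (neutral) is counted as positive.

```python
from datetime import datetime, timezone


def label_sentiment(polarity):
    """Mirror the notebook's rule: polarity >= 0 is 'positive', otherwise 'negative'."""
    return "positive" if polarity >= 0 else "negative"


def utc_datetime(created_utc):
    """Convert a Unix timestamp in seconds (like created_utc) to an aware UTC datetime."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc)


print(label_sentiment(0.35), label_sentiment(0.0), label_sentiment(-0.2))
print(utc_datetime(1609891200).isoformat())
```

Unlike `pd.to_datetime(..., unit='s')`, which returns timezone-naive values, this helper attaches the UTC timezone explicitly.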

In [48]:
# make visual with plotly
px.scatter(df, x="date", # date on the x axis
               y="sentiment_polarity", # sentiment on the y axis
               hover_data=["author", "permalink", "preview"], # data to show on hover
               color_discrete_sequence=["lightseagreen", "indianred"], # colors to use
               color="sentiment", # what should the color depend on?
               size="score", # the more votes, the bigger the circle
               size_max=10, # not too big
               labels={"sentiment_polarity": "Comment positivity", "date": "Date comment was posted"}, # axis names
               title="Comment sentiment in r/worldnews for the past 48h", # title of figure
          )

## Summary

This notebook collected Reddit comments matching the key phrase 'regenerative agriculture' and ran sentiment analysis on recent r/worldnews comments. The same workflow can be reused for other relevant keywords. An auto-updating dashboard would be a useful next step.

### Future Tasks:  

1. Find alternate endpoint APIs
   - Reddit threads tend to drift off topic, so some comments that mention regenerative ag sit under posts that have nothing to do with it.
   - Test that the code above works with alternate endpoints.
2. Create shareable link
   - Packages to consider for this: docker, jupyter dashboard (extension), ipywidgets

## Watermark

In [51]:
# use watermark in a notebook with the following call
%load_ext watermark

# %watermark? #<-- watermark documentation

%watermark -a "H.GRYK" -d -t -v -p sys
%watermark -p pandas
%watermark -p textblob
%watermark -p plotly
%watermark -p requests

The watermark extension is already loaded. To reload it, use:
  %reload_ext watermark
H.GRYK 2021-01-06 16:10:05 

CPython 3.7.7
IPython 7.18.1

sys 3.7.7 (default, May  6 2020, 11:45:54) [MSC v.1916 64 bit (AMD64)]
pandas 1.0.5
textblob 0.15.3
plotly 4.9.0
requests 2.24.0
