# Sentiment Analysis
The goal of this notebook is to scrape an API endpoint for regenerative agriculture data and present it as an interactive plot. The reddit API documentation is available at https://www.reddit.com/dev/api/; the queries below go through the Pushshift API, documented at https://github.com/pushshift/api

## Table of Contents
1. Get data from reddit regarding regenerative agriculture
2. Analyze data with plotly 
3. Top Comments over the past week
4. Sentiment over time
5. Summary

## Get data from reddit for the regenerative agriculture (or any other) keyword


In [49]:
# load packages
import requests

In [50]:
# create function to get info 
def get_pushshift_data(data_type, **kwargs):
    """ 
    Gets data from the pushshift api.
    data_type can be 'comment' or 'submission'
    other args are interpreted as payload.
    Read more: https://github.com/pushshift/api
    """
    base_url = f"https://api.pushshift.io/reddit/search/{data_type}/"
    payload = kwargs
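    response = requests.get(base_url, params=payload)
    # fail early with a clear HTTP error instead of a JSON decoding error
    response.raise_for_status()
    return response.json()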
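
To check which request a given call would produce, the query string can be previewed without contacting the API. This is a minimal sketch using only the standard library. `preview_pushshift_url` is a hypothetical helper, not part of the notebook, and it assumes the payload is encoded the way `urllib.parse.urlencode` does it, which should match how `requests` handles simple string and integer parameters.

```python
from urllib.parse import urlencode

def preview_pushshift_url(data_type, **kwargs):
    """Hypothetical helper: build the URL a query would use, without sending it."""
    base_url = f"https://api.pushshift.io/reddit/search/{data_type}/"
    # urlencode turns the keyword arguments into key=value pairs joined by '&'
    return f"{base_url}?{urlencode(kwargs)}"

url = preview_pushshift_url("comment", q="regenerative agriculture", after="7d", size=10)
print(url)
```

Printing the URL first makes it easier to spot a misspelled parameter before spending a request on it.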

In [51]:
# use example
'''
get_pushshift_data(data_type="comment",           # give me comments
                   q="organic",                   # that mention 'organic'
                   after="1y",                    # in the last year
                   size=1000,                     # maximum 1000 comments
                   sort_type="score",             # sort them by score
                   sort="desc"                    # sort descending
                   )    
'''

## Analyze data with plotly

In [52]:
# load packages for this step
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt

In [53]:
data = get_pushshift_data(data_type="comment",
                          q="regenerative agriculture",
                          after="1y",
                          size=1000,
                          #aggs="subreddit"
                         ).get("data")
# data

In [54]:
# type(data)

In [55]:
# change list to pandas df
df = pd.DataFrame(data)
# view table
df.head(10)

Unnamed: 0,all_awardings,associated_award,author,author_flair_background_color,author_flair_css_class,author_flair_richtext,author_flair_template_id,author_flair_text,author_flair_text_color,author_flair_type,...,parent_id,permalink,retrieved_on,score,send_replies,steward_reports,stickied,subreddit,subreddit_id,total_awards_received
0,[],,Torch_fetish,,,[],,,,text,...,t1_fe2x44j,/r/carnivore/comments/enan3s/according_to_a_ne...,1578864356,1,True,[],False,carnivore,t5_2rvme,0
1,[],,WikiTextBot,,,[],,,,text,...,t1_fe4p61x,/r/vegetarianketo/comments/enes50/i_wish_eatin...,1578870923,1,True,[],False,vegetarianketo,t5_2st2u,0
2,[],,TRexDin0,,,[],,,,text,...,t3_ennc40,/r/collapse/comments/ennc40/convert_half_of_uk...,1578890619,1,True,[],False,collapse,t5_2qhw9,0
3,[],,FXOjafar,,,[],,,,text,...,t1_fe6j2xv,/r/technology/comments/engr5i/golden_rice_appr...,1578903761,0,True,[],False,technology,t5_2qh16,0
4,[],,Helkafen1,,,[],,,,text,...,t1_fe7rzdq,/r/environment/comments/enw1sx/expert_says_aus...,1578924636,1,True,[],False,environment,t5_2qh1n,0
5,[],,nb4revolution,,,[],,,,text,...,t3_eo2lcn,/r/DebateAnarchism/comments/eo2lcn/can_resourc...,1578936515,4,True,[],False,DebateAnarchism,t5_2vkaw,0
6,[],,anicca444,,,[],,,,text,...,t1_fe8p93d,/r/educationalgifs/comments/eo1n6d/how_margari...,1578938733,1,True,[],False,educationalgifs,t5_2w708,0
7,[],,YT_kevfactor,,,[],,,,text,...,t3_eo21cx,/r/Futurology/comments/eo21cx/quorn_the_bigges...,1578949721,1,True,[],False,Futurology,t5_2t7no,0
8,[],,consteppedoutside12,,,[],,,,text,...,t1_feag0pp,/r/DebateAVegan/comments/eo9ufw/vegan_opinion_...,1578969618,1,True,[],False,DebateAVegan,t5_2sa7z,0
9,[],,consteppedoutside12,,,[],,,,text,...,t1_feb6jk9,/r/DebateAVegan/comments/eo9ufw/vegan_opinion_...,1578972871,1,True,[],False,DebateAVegan,t5_2sa7z,0


In [56]:
# get col names to make sure they are called correctly
# df.columns

In [57]:
# group by subreddit and count times subreddit appears
df['count'] = 1
min_df = df[['subreddit', 'count']]
grouped_df = min_df.groupby(['subreddit']).sum()
grouped_df = grouped_df.sort_values(by=['count'], ascending=False)[0:10]
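
The same count can be cross-checked on the raw list of records before it becomes a dataframe. The sketch below uses `collections.Counter` from the standard library. `sample_data` is a hypothetical stand-in for `data`, reduced to the one field that matters here.

```python
from collections import Counter

# Hypothetical records shaped like the items in `data` (only the field used here)
sample_data = [
    {"subreddit": "science"},
    {"subreddit": "environment"},
    {"subreddit": "science"},
    {"subreddit": "collapse"},
]

# Count how often each subreddit appears, then keep the 10 most frequent
subreddit_counts = Counter(record["subreddit"] for record in sample_data)
top_subreddits = subreddit_counts.most_common(10)
print(top_subreddits)
```

If the two approaches disagree on the real data, the dataframe step is the first place to look.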

In [58]:
grouped_df = grouped_df.reset_index()
grouped_df.head()
# grouped_df.columns

Unnamed: 0,subreddit,count
0,science,8
1,environment,7
2,collapse,4
3,DebateAVegan,4
4,Permaculture,4


In [65]:
# make plot for count per subreddit
# title=f"Subreddits with 'Regenerative Agriculture' activity over past year",
# labels={"doc_count": "# comments","key": "Subreddits"}
# colors -- use green

## Top Comments over the past week

In [60]:
# get top comment data with function
data = get_pushshift_data(data_type="comment", 
                          q="regenerative agriculture", 
                          after="7d", 
                          size=10, 
                          sort_type="score", 
                          sort="desc").get("data")

# put columns of interest in df
df = pd.DataFrame.from_records(data)[["author", "subreddit", "score", "body", "permalink"]]

# limit body of the comment
df['body'] = df['body'].str[0:400] + "..."

# append the string to all the permalink entries so that there's a link to the comment
df['permalink'] = "https://reddit.com" + df['permalink'].astype(str)

# function for making clickable links in df table
def make_clickable(val):
    return '<a href="{}">Link</a>'.format(val)

# style the last column to be clickable and print
df.style.format({'permalink': make_clickable})

Unnamed: 0,author,subreddit,score,body,permalink
0,valski1337,LivestreamFail,26,Except [regenerative agriculture](https://en.wikipedia.org/wiki/Regenerative_agriculture) exists and is getting more popular....,Link
1,88questioner,Futurology,22,"My comment is in response to the idea that they aren’t made out of a subsidized product, because they are. Large scale single product farming operations are heavily subsidized, no matter what the product, and processed foods like Beyond Burger falls under this umbrella. People can eat what they want but I find it ironic that folks are touting this particular product because it’s somehow less “subs...",Link
2,Deep-Duck,worldnews,12,Community Supported Agriculture is great for this and I highly recommend people join a CSA if they have access to one near them. With a CSA you pay for your entire seasons worth of vegetables up front to a local farm. Then throughout the year you get a weekly share of the farms harvest. Most (all in my experience) CSA's put a focus on regenerative farming focusing on farming techniques that rene...,Link
3,vocalghost,LivestreamFail,11,"How in the world did you get ""stop farming altogether"" from my comment? I'm not against regen ag at all, I think its amazing, but the fact of the matter is that farming requires land that they have to make fit to the crops that the market wants. They're referring to conservation as in the conserving the nutrients in the soil so you can get better yields. I'm pretty sure the comment above was t...",Link
4,Skatchan,worldnews,9,"> It is, of course, possible to rear a limited number of animals in ways that cause less damage. This report, which focuses on just one environmental concern – climate change – has found that well-managed grazing in some contexts can cause carbon to be sequestered in the soil – and at the very least can provide an economic rationale for keeping the carbon in the ground. It is important to ident...",Link
5,ChampagneFloozy,Enough_Sanders_Spam,7,regenerative agriculture that focuses on carbon capture. Check and see if Indigo Ag is public yet....,Link
6,Better_Cranberry,NewOrleans,6,"I want to give a shoutout to Laughing Buddha Nursery and Farm Store in Metairie! They even have hubs where you can pick up your order in Broadmoor and MidCity. Locally sourced, regenerative agriculture, pasture raised meats, local produce, and specialty items available every week. I cannot recommend them highly enough. Kate co-owns and runs the store and is absolutely amazing. Give them your coin...",Link
7,GentleOmnicide,PublicFreakout,5,"Concrete jungle vegans are the worst offenders of that “in your face guilt” while they support mass agriculture monocultures. Mono cultures do so much harm to the environment and kill a ton of animals so that people can buy their veggies and feel safe in their own reality. I have nothing against people being vegan on their own, and applaud anyone that can grow their own food minimizing risks towar...",Link
8,AnonyJustAName,AntiVegan,5,Watch Sacred Cow re: regenerative agriculture....,Link
9,jures,specializedtools,5,"The action being depicted in this video is leading to massive soil erosion and is a contributing factor to the fact that our soil not only doesn’t produce nearly as nutrient rich food but also cannot function as a carbon sink as effectively as it could. For example, in North America, we used to have some of the best soil on the planet due in part to massive amounts of grazing and grasslands which ...",Link


## Sentiment over time

In [61]:
# load packages 
import textblob

In [62]:
# get the data of interest with function
data = get_pushshift_data(data_type="comment",
                          after="2d",
                          size=1000,
                          sort_type="score",
                          sort="desc",
                          subreddit="worldnews").get("data")

# define columns of interest
columns_of_interest = ["author", "body", "created_utc", "score", "permalink"]

# transform the response into a dataframe with relevant columns
df = pd.DataFrame.from_records(data)[columns_of_interest]

In [63]:
# create a column with sentiment polarity
df["sentiment_polarity"] = df.apply(lambda row: textblob.TextBlob(row["body"]).sentiment.polarity, axis=1)

# create a column with sentiment subjectivity
df["sentiment_subjectivity"] = df.apply(lambda row: textblob.TextBlob(row["body"]).sentiment.subjectivity, axis=1)

# create a column with 'positive' or 'negative' depending on sentiment_polarity
df["sentiment"] = df.apply(lambda row: "positive" if row["sentiment_polarity"] >= 0 else "negative", axis=1)

# create a column with a text preview that shows the first 50 characters
df["preview"] = df["body"].str[0:50]

# take the created_utc parameter and transform it into a datetime column
df["date"] = pd.to_datetime(df['created_utc'],unit='s')
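
Note that the labelling above counts a polarity of exactly 0 as 'positive'. If a separate 'neutral' class is wanted, a small helper along these lines could be used instead. This is only a sketch: `label_sentiment` and its `neutral_band` parameter are hypothetical names, not part of the notebook or of textblob.

```python
def label_sentiment(polarity, neutral_band=0.0):
    """Hypothetical helper: map a polarity score in [-1, 1] to a label."""
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    # everything inside the band (including exactly 0) is neutral
    return "neutral"

labels = [label_sentiment(p) for p in (0.5, 0.0, -0.25)]
print(labels)
```

It could be applied with `df["sentiment"] = df["sentiment_polarity"].apply(label_sentiment)`.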

In [66]:
#make visual with date and sentiment polarity
# scale size of point by score
# color point by sentiment
# labels={"sentiment_polarity": "Comment positivity", "date": "Date comment was posted"}, # axis names
# title=f"Comment sentiment in r/worldnews for the past 48h", # title of figure
          

## Summary

This notebook retrieved reddit comments that mention the key phrase 'regenerative agriculture', counted them by subreddit, and listed the top-scoring comments of the past week. Sentiment analysis was then performed on recent r/worldnews comments. The same workflow could be reused for other relevant keywords. A dashboard that updates automatically would be a useful addition.

### Future Tasks:  

1. Find alternate endpoint APIs
   - Reddit threads tend to drift off topic, so some comments that mention regenerative ag are on posts that have nothing to do with the topic.
   - Test and adapt to ensure similar results given alternate endpoints.
2. Create shareable link
   - Packages to consider for this: docker, jupyter dashboard (extension), ipywidgets

## Watermark

In [51]:
# use watermark in a notebook with the following call
%load_ext watermark

# %watermark? #<-- watermark documentation

%watermark -a "H.GRYK" -d -t -v -p sys
%watermark -p pandas
%watermark -p textblob
%watermark -p plotly
%watermark -p requests

H.GRYK 2021-01-06 16:10:05 

CPython 3.7.7
IPython 7.18.1

sys 3.7.7 (default, May  6 2020, 11:45:54) [MSC v.1916 64 bit (AMD64)]
pandas 1.0.5
textblob 0.15.3
plotly 4.9.0
requests 2.24.0
