# Sentiment Analysis on Glassman Issue
Greg Glassman made an insensitive comment about George Floyd during the BLM movement in June 2020. This notebook runs a sentiment analysis on documents that mention `glassman`, `castro`, or `hq`.

1. Prep Data
2. IBM Tone Analyzer
3. Vader Sentiment Analysis

In [2]:
import praw
import pandas as pd
import numpy as np
import pickle

import reddit as red

### 1. Import and Prep Data

In [24]:
#import df_praw data with glassman columns
picklefile_name = '../reddit_pkl_data/df_praw_full_blm.pkl'
with open(picklefile_name, 'rb') as picklefile: 
    df_praw = pickle.load(picklefile)

In [25]:
df_praw.shape

(65790, 15)

In [28]:
#df_praw.drop(columns='content_categories', inplace=True)

In [26]:
mask = (df_praw.glassman > 0) | (df_praw.castro > 0) | (df_praw.hq > 0)
df_blm = df_praw[mask]
df_blm.shape

(1464, 15)

In [27]:
df_blm.loc[268].full_text

'do you agree with crossfits endorsement of zone over paleo i am a trainer and have been to about   dozen boxes all over the world not just the us and have seen a much bigger backing from the affiliates of paleo  i am a trainer at a box and i push paleo over zone for a few reasons i did zone for a year and while i saw results it was almost crippling socially and mentally counting measuring eating at particular times ect  i never really felt better or saw improvements in the gym like i have with paleo paleo is more than just losing weight its been performance  as a trainer i dont like suggesting zone especially to young women who may or may not have had issues with eating disorders i feel like paleo is much more relaxed than zone whereas zone brings back some neurosis about food  do you think hq is out of touch by backing zone so much most of the top athletes who do zone do a paleo zone with   fats and low carb anyways  '

In [28]:
df_orig.loc[1015].full_text

'Do you agree with CrossFits endorsement of Zone over Paleo? I am a trainer and have been to about 2 dozen boxes all over the world (not just the US) and have seen a much bigger backing from the affiliates of paleo.  I am a trainer at a box and I push Paleo over Zone for a few reasons. I did Zone for a year and while i saw results it was almost crippling socially and mentally. Counting, measuring, eating at particular times ect...  I never really felt better or saw improvements in the gym like i have with paleo. Paleo is more than just losing weight its been performance.  As a trainer i dont like suggesting zone, especially to young women who may or may not have had issues with eating disorders. I feel like paleo is much more relaxed than zone, whereas zone brings back some neurosis about food.  Do you think HQ is out of touch by backing zone so much? Most of the top athletes who do zone do a paleo zone with 5x fats and low carb anyways.  '

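Import the original df, with its original wording, to feed into the sentiment analysis. The pre-processed text has lost casing and punctuation, as the two versions of the same post above show.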

In [31]:
# import original df with original wording
picklefile_name = '../reddit_pkl_data/2008_2021feb19_df.pkl'
with open(picklefile_name, 'rb') as picklefile: 
    df_orig = pickle.load(picklefile)

In [34]:
# clean data
df_orig = red.clean_praw_input(df_orig)
df_orig = red.clean_submissions(df_orig)
df_orig.drop(columns=['created', 'content_categories', 'is_meta'], inplace=True)
df_orig.shape

(66948, 9)

Modified text pre-processing: strip links, slashes, and newlines, but keep casing and punctuation

In [35]:
import re

#remove http/https links that start a line
links = lambda x: re.sub(r'^https?:\/\/.*[\r\n]*', '', x, flags=re.MULTILINE)
#embedded links
links2 = lambda x: re.sub(r'`&.*link rel=.*”.*;', '', x, flags=re.MULTILINE)
#&amp*
amps = lambda x: re.sub(r'&amp;.*;', '', x, flags=re.MULTILINE)
#remove links in () or []
links3 = lambda x: re.sub(r'[\(\[]https?:\/\/.*[\)\]]', '', x, flags=re.MULTILINE)

#remove slashes
slashes = lambda x: re.sub(r'\/', ' ', x, flags=re.MULTILINE)

df_orig['full_text'] = df_orig.full_text.map(links).map(links2).map(amps).map(links3).map(slashes)

In [36]:
# alphanumeric = lambda x: re.sub('\w*\d\w*', ' ', x)
# punc_lower = lambda x: re.sub('[%s]' % re.escape(string.punctuation), '', x.lower())
new_line = lambda x: x.replace('\n', ' ')
# emojis = lambda x: x.encode('ascii', 'ignore').decode('ascii')

df_orig['full_text'] = df_orig.full_text.map(new_line)
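
To see what the cleaning steps above do, here is a minimal sketch that applies the same patterns, in the same order, to a made-up sample post. The `clean` helper and the sample text are illustrative assumptions, not part of the original pipeline, and the embedded-link pattern (`links2`) is left out for brevity.

```python
import re

# Same patterns as the cells above (the embedded-link pattern `links2` is omitted)
links = lambda x: re.sub(r'^https?:\/\/.*[\r\n]*', '', x, flags=re.MULTILINE)
amps = lambda x: re.sub(r'&amp;.*;', '', x, flags=re.MULTILINE)
links3 = lambda x: re.sub(r'[\(\[]https?:\/\/.*[\)\]]', '', x, flags=re.MULTILINE)
slashes = lambda x: re.sub(r'\/', ' ', x, flags=re.MULTILINE)
new_line = lambda x: x.replace('\n', ' ')


def clean(text):
    """Hypothetical helper: apply the cleaning steps in the notebook's order."""
    for step in (links, amps, links3, slashes, new_line):
        text = step(text)
    return text


# Made-up sample post, not taken from the dataset
sample = ("Check this [post](https://example.com/a) and/or this\n"
          "https://example.com/b\n"
          "thanks")

cleaned = clean(sample)
print(cleaned)
```

Because `.*` is greedy, `amps` and `links3` remove everything up to the last `;`, `)`, or `]` on a line, so a line containing several links can also lose the text between them. Worth spot-checking on a few real posts.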

In [15]:
# #import df_orig and df_blm
# picklefile_name = '../reddit_pkl_data/sentiment_df_blm.pkl'
# with open(picklefile_name, 'rb') as picklefile: 
#     df_blm = pickle.load(picklefile)

Filter documents related to the Glassman issue

In [17]:
# Get list of index in blm df
blm_idx_list = list(df_blm['index'])

In [39]:
# Identify original text for blm index
df_filt = df_orig.loc[blm_idx_list]
df_filt.head()

Unnamed: 0,author,id,num_comments,score,url,upvote_ratio,time,full_text,media
1015,Skiingjoo,hbabg,10,3,http://www.reddit.com/r/crossfit/comments/hbab...,,2011-05-14 16:55:55,Do you agree with CrossFits endorsement of Zon...,none
777,kurian,j921n,41,6,http://www.reddit.com/r/crossfit/comments/j921...,,2011-08-04 19:34:29,Crossfit heading in a bad direction. So I've b...,none
667,chiuondis,k40pq,10,4,http://www.reddit.com/r/crossfit/comments/k40p...,,2011-09-04 03:18:16,Does anyone know what's up with Glassman? I kn...,none
523,badmonkey283,l64vn,11,10,http://www.reddit.com/r/crossfit/comments/l64v...,,2011-10-09 18:07:00,"question: Would you use... Hi, I have a quest...",none
504,Breeegz,lcbfb,6,14,http://www.forgingelitesarcasm.com/2011/10/int...,,2011-10-14 19:14:45,Drywall Crossfit Trolls HQ,


In [40]:
df_filt.shape

(1464, 9)

### 2. IBM Tone Analyzer

In [48]:
import json
from ibm_watson import ToneAnalyzerV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

In [49]:
#pip install ibm_watson

In [50]:
import service_creds as creds

In [51]:
authenticator = IAMAuthenticator(creds.watson_tone_service_api_key)
tone_analyzer = ToneAnalyzerV3(authenticator=authenticator, 
                               version='2017-09-21')
                               
tone_analyzer.set_service_url(creds.watson_tone_service_url)

# NOTE: this turns off SSL certificate verification; only keep it if the environment requires it
tone_analyzer.set_disable_ssl_verification(True)

#### Try out single text sample

In [279]:
text = df_filt.full_text.iloc[0]
text

'Do you agree with CrossFits endorsement of Zone over Paleo? I am a trainer and have been to about 2 dozen boxes all over the world (not just the US) and have seen a much bigger backing from the affiliates of paleo.  I am a trainer at a box and I push Paleo over Zone for a few reasons. I did Zone for a year and while i saw results it was almost crippling socially and mentally. Counting, measuring, eating at particular times ect...  I never really felt better or saw improvements in the gym like i have with paleo. Paleo is more than just losing weight its been performance.  As a trainer i dont like suggesting zone, especially to young women who may or may not have had issues with eating disorders. I feel like paleo is much more relaxed than zone, whereas zone brings back some neurosis about food.  Do you think HQ is out of touch by backing zone so much? Most of the top athletes who do zone do a paleo zone with 5x fats and low carb anyways.  '

In [280]:
#pull back as json
tone_analysis = tone_analyzer.tone({'text': text}, content_type='application/json').get_result()
print(json.dumps(tone_analysis, indent=2))

{
  "document_tone": {
    "tones": [
      {
        "score": 0.552201,
        "tone_id": "joy",
        "tone_name": "Joy"
      },
      {
        "score": 0.916397,
        "tone_id": "tentative",
        "tone_name": "Tentative"
      }
    ]
  },
  "sentences_tone": [
    {
      "sentence_id": 0,
      "text": "Do you agree with CrossFits endorsement of Zone over Paleo?",
      "tones": []
    },
    {
      "sentence_id": 1,
      "text": "I am a trainer and have been to about 2 dozen boxes all over the world (not just the US) and have seen a much bigger backing from the affiliates of paleo.",
      "tones": []
    },
    {
      "sentence_id": 2,
      "text": "I am a trainer at a box and I push Paleo over Zone for a few reasons.",
      "tones": [
        {
          "score": 0.538448,
          "tone_id": "analytical",
          "tone_name": "Analytical"
        }
      ]
    },
    {
      "sentence_id": 3,
      "text": "I did Zone for a year and while i saw results it wa

In [284]:
tone_analyzer.tone(text, content_type='text/plain;charset=utf-8').get_result()

{'document_tone': {'tones': [{'score': 0.552201,
    'tone_id': 'joy',
    'tone_name': 'Joy'},
   {'score': 0.916397, 'tone_id': 'tentative', 'tone_name': 'Tentative'}]},
 'sentences_tone': [{'sentence_id': 0,
   'text': 'Do you agree with CrossFits endorsement of Zone over Paleo?',
   'tones': []},
  {'sentence_id': 1,
   'text': 'I am a trainer and have been to about 2 dozen boxes all over the world (not just the US) and have seen a much bigger backing from the affiliates of paleo.',
   'tones': []},
  {'sentence_id': 2,
   'text': 'I am a trainer at a box and I push Paleo over Zone for a few reasons.',
   'tones': [{'score': 0.538448,
     'tone_id': 'analytical',
     'tone_name': 'Analytical'}]},
  {'sentence_id': 3,
   'text': 'I did Zone for a year and while i saw results it was almost crippling socially and mentally.',
   'tones': [{'score': 0.5538,
     'tone_id': 'tentative',
     'tone_name': 'Tentative'}]},
  {'sentence_id': 4,
   'text': 'Counting, measuring, eating at pa

In [281]:
# pull back as dictionary
tones = tone_analyzer.tone(text, content_type='text/plain;charset=utf-8').get_result()
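
The response is a plain nested dictionary, so the document-level tones can be pulled out without any extra library. The sketch below ranks them by score; `document_tones` is a hypothetical helper name, and `example_response` is a trimmed copy of the output shown above rather than a live API call.

```python
def document_tones(response):
    """Return document-level tones as (tone_id, score) pairs, highest score first."""
    tones = response.get('document_tone', {}).get('tones', [])
    return sorted(((t['tone_id'], t['score']) for t in tones),
                  key=lambda pair: pair[1], reverse=True)


# Trimmed copy of the response printed above (sentence-level tones dropped)
example_response = {
    'document_tone': {'tones': [
        {'score': 0.552201, 'tone_id': 'joy', 'tone_name': 'Joy'},
        {'score': 0.916397, 'tone_id': 'tentative', 'tone_name': 'Tentative'},
    ]},
    'sentences_tone': [],
}

ranked = document_tones(example_response)
print(ranked)
```

A document for which the service returns no tones produces an empty list, which is worth handling before building a DataFrame from it.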

In [240]:
test = pd.DataFrame(tones['document_tone']['tones'])
test[['add', 'this']] = pd.DataFrame([['row', 'info']], index=test.index)
test

Unnamed: 0,score,tone_id,tone_name,add,this
0,0.664063,sadness,Sadness,row,info
1,0.883404,analytical,Analytical,row,info


In [207]:
# rename the Reddit vote score so it does not clash with the tone 'score' column
df_filt['vote_score'] = df_filt['score']
df_filt.drop(columns='score', inplace=True)

In [208]:
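# collect one frame per post and concatenate once at the end
# (DataFrame.append was removed in pandas 2.0; pd.json_normalize replaces pd.io.json.json_normalize)
tone_frames = []

for idx, row in df_filt.iterrows():
    tones = tone_analyzer.tone(row['full_text'], content_type='text/plain;charset=utf-8').get_result()
    tones_df = pd.json_normalize(tones['document_tone'], 'tones')
    
    # repeat the post's metadata on every tone row
    tones_df[df_filt.columns.tolist()] = pd.DataFrame([row], index=tones_df.index)
    tone_frames.append(tones_df)

post_tones = pd.concat(tone_frames)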
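
The loop makes one network call per post, so a single transient failure can abort the whole run. One way to soften that is a small retry wrapper around the call. This is a sketch under assumptions: `call_with_retry` and `flaky_tone` are hypothetical names, and the stand-in function only simulates a service that fails twice before succeeding.

```python
import time


def call_with_retry(func, *args, retries=3, wait_seconds=1.0, **kwargs):
    """Call func, retrying on any exception; re-raise after the last attempt."""
    for attempt in range(1, retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == retries:
                raise
            time.sleep(wait_seconds)


# Stand-in for the API call: fails twice, then succeeds
calls = {'n': 0}


def flaky_tone(text):
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('temporary failure')
    return {'document_tone': {'tones': []}}


result = call_with_retry(flaky_tone, 'some text', retries=3, wait_seconds=0)
print(result, calls['n'])
```

In the loop, the call could be wrapped as `call_with_retry(lambda: tone_analyzer.tone(row['full_text'], content_type='text/plain;charset=utf-8').get_result())`. Catching every `Exception` is deliberately broad for the sketch; narrowing it to the SDK's own exception type would be stricter.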

In [285]:
post_tones

Unnamed: 0,score,tone_id,tone_name,author,id,num_comments,url,upvote_ratio,time,full_text,media,vote_score
0,0.552201,joy,Joy,Skiingjoo,hbabg,10,http://www.reddit.com/r/crossfit/comments/hbab...,,2011-05-14 16:55:55,Do you agree with CrossFits endorsement of Zon...,none,3
1,0.916397,tentative,Tentative,Skiingjoo,hbabg,10,http://www.reddit.com/r/crossfit/comments/hbab...,,2011-05-14 16:55:55,Do you agree with CrossFits endorsement of Zon...,none,3
0,0.570424,sadness,Sadness,kurian,j921n,41,http://www.reddit.com/r/crossfit/comments/j921...,,2011-08-04 19:34:29,Crossfit heading in a bad direction. So I've b...,none,6
1,0.509031,analytical,Analytical,kurian,j921n,41,http://www.reddit.com/r/crossfit/comments/j921...,,2011-08-04 19:34:29,Crossfit heading in a bad direction. So I've b...,none,6
0,0.844556,analytical,Analytical,chiuondis,k40pq,10,http://www.reddit.com/r/crossfit/comments/k40p...,,2011-09-04 03:18:16,Does anyone know what's up with Glassman? I kn...,none,4
...,...,...,...,...,...,...,...,...,...,...,...,...
0,0.617335,joy,Joy,angrytongan,laj256,10,https://www.reddit.com/r/crossfit/comments/laj...,1.0,2021-02-02 00:41:33,[wodscrape] Shutting it down [wodscrape.com.au...,none,30
1,0.671669,fear,Fear,angrytongan,laj256,10,https://www.reddit.com/r/crossfit/comments/laj...,1.0,2021-02-02 00:41:33,[wodscrape] Shutting it down [wodscrape.com.au...,none,30
2,0.904259,tentative,Tentative,angrytongan,laj256,10,https://www.reddit.com/r/crossfit/comments/laj...,1.0,2021-02-02 00:41:33,[wodscrape] Shutting it down [wodscrape.com.au...,none,30
0,0.664063,sadness,Sadness,Thegooseontheisland,liez7q,26,https://www.rocketcommunityfitness.com/post/wa...,1.0,2021-02-12 16:47:52,"Wait, Can They Save CrossFit? Updated thoughts...",,1


In [286]:
picklefile_name = 'post_tones_df.pkl'
with open(picklefile_name, 'wb') as picklefile:
    pickle.dump(post_tones, picklefile)

In [244]:
#export data to CSV for viz in tableau
post_tones.to_csv("cf_submission_tones.csv")

Emotional tones rose in June 2020, around the time of the Glassman comment. More data is needed to investigate further.

Future work would include a broader range of keywords and the comment/reply documents.

![image](./images/sentiment_analysis.png)
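
The chart was built in Tableau from the exported CSV. For a quick check inside the notebook, a monthly count of tones can be sketched in plain Python. `tone_counts_by_month` is a hypothetical helper, and the three sample rows copy the `time` and `tone_id` values of the first posts shown in `post_tones`.

```python
from collections import Counter
from datetime import datetime


def tone_counts_by_month(rows):
    """Count (month, tone_id) pairs; rows are dicts with 'time' and 'tone_id'."""
    counts = Counter()
    for row in rows:
        when = row['time']
        if isinstance(when, str):
            # timestamps in the table above use this layout
            when = datetime.strptime(when, '%Y-%m-%d %H:%M:%S')
        counts[(when.strftime('%Y-%m'), row['tone_id'])] += 1
    return counts


# Values copied from the first rows of post_tones above
sample_rows = [
    {'time': '2011-05-14 16:55:55', 'tone_id': 'joy'},
    {'time': '2011-05-14 16:55:55', 'tone_id': 'tentative'},
    {'time': '2011-08-04 19:34:29', 'tone_id': 'sadness'},
]

monthly = tone_counts_by_month(sample_rows)
for (month, tone), n in sorted(monthly.items()):
    print(month, tone, n)
```

On the full data this could be called as `tone_counts_by_month(post_tones.to_dict('records'))`. Counting rows treats every detected tone equally; weighting by the tone score would be another reasonable choice.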

### 3. Vader Sentiment

In [41]:
# pip install vaderSentiment

In [42]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

In [120]:
text_list = df_filt.full_text
text_list

1015     Do you agree with CrossFits endorsement of Zon...
777      Crossfit heading in a bad direction. So I've b...
667      Does anyone know what's up with Glassman? I kn...
523      question: Would you use...  Hi, I have a quest...
504                            Drywall Crossfit Trolls HQ 
                               ...                        
63398                        Dave Castro stirring the pot 
62962    Castro just confirmed only a single 50 35# dum...
62930    Meeting Between Masters Fitness Collective, Ca...
62886    [wodscrape] Shutting it down [wodscrape.com.au...
62740    Wait, Can They Save CrossFit? Updated thoughts...
Name: full_text, Length: 1464, dtype: object

In [43]:
def sentiment_scores(sentence):
    # use the `analyzer` instance created above (it was misspelled `analyser`)
    score = analyzer.polarity_scores(sentence)
    return score

In [None]:
scores_list = []
for text in text_list:
    scores_list.append(sentiment_scores(text))

In [None]:
df_scores = pd.DataFrame(scores_list)

In [125]:
mask = (df_scores.pos > 0.5)
df_scores[mask]

Unnamed: 0,neg,neu,pos,compound
18,0.0,0.435,0.565,0.3818
136,0.0,0.448,0.552,0.5719
215,0.0,0.462,0.538,0.8885
265,0.0,0.37,0.63,0.5267
284,0.0,0.408,0.592,0.4404
872,0.0,0.341,0.659,0.8286
1005,0.0,0.494,0.506,0.6249
1275,0.0,0.455,0.545,0.0516
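
The filter above uses the `pos` proportion, but VADER's `compound` value is the single normalized score, ranging from -1 to 1. A common convention, suggested in the VADER documentation, is to call a text positive at `compound >= 0.05`, negative at `compound <= -0.05`, and neutral otherwise. The sketch below applies that rule; `label_compound` is a hypothetical helper, and the sample values are three `compound` scores from the table above.

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse label using the usual cutoffs."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# compound scores of rows 18, 136 and 1275 in the table above
compounds = [0.3818, 0.5719, 0.0516]
labels = [label_compound(c) for c in compounds]
print(labels)
```

Row 1275 shows why the two measures differ: its `pos` share is 0.545, yet its compound score of 0.0516 only just clears the positive cutoff. On the full frame the labels could be added with `df_scores['compound'].map(label_compound)`. Since `df_scores` gets a fresh 0..n index, setting `df_scores.index = df_filt.index` first would let the scores line up with the posts again, assuming the row order is unchanged.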
