### Sentiment Analysis

Code adapted from lesson 5.03 - Natural Language Processing (Author: Matt Brems)

**Library Imports**

In [86]:
import pandas as pd
import numpy as np
import re
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords

**Read in Data**

In [2]:
df = pd.read_csv('../data/merged/tweets_geom_unclean.csv')

In [3]:
df.head()

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,Event,Stage,Query Date,Query Term,Id,Username,Text,Date,Hashtags,Location,_wkt_geom,id,xcoord,ycoord
0,0,0,ne_bomb_cyclone,before,2019-10-15,power outage,1183894397492043777,meekers999,Let's lose Govenor Gruesome in Cali please. He...,2019-10-14 23:56:34+00:00,,,Point (-77.58788778888725801 38.32415544020682...,17892,-77.587888,38.324155
1,1,1,ne_bomb_cyclone,before,2019-10-15,power outage,1183894362725289984,sharethiscrime,"Last time I checked, he's still a floofy baby ...",2019-10-14 23:56:26+00:00,,,Point (-76.48550640128354416 38.47508461690389...,17891,-76.485506,38.475085
2,2,2,ne_bomb_cyclone,before,2019-10-15,power outage,1183894014573105152,News_1jl4,California’s power outage means problems for e...,2019-10-14 23:55:03+00:00,,,Point (-77.41021456388132549 38.30670835924499...,17894,-77.410215,38.306708
3,3,3,ne_bomb_cyclone,before,2019-10-15,power outage,1183893791415123968,IndeCardio,Newsome is vanguard Globalism in action! What ...,2019-10-14 23:54:10+00:00,,,Point (-77.79985279265937947 38.45514536776436...,17893,-77.799853,38.455145
4,4,4,ne_bomb_cyclone,before,2019-10-15,power outage,1183893732652810240,BaddictsPH,Super Typhoon Faxai hit Chiba prefecture in Se...,2019-10-14 23:53:56+00:00,,,Point (-77.83618103291209422 38.62109214538766...,17888,-77.836181,38.621092


**Define Positive Words and Negative Words**

In [139]:
positive_words = ['delight', 'good', 'great', 'awesome', 'tremendous', 'fabulous', 'amazing', 'stellar']
negative_words = ['garbage', 'sad', 'trash', 'ugly', 'bad', 'disgusting', 'terrible', 'gross']

In [146]:
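# Build function to score the sentiment of each tweet
# Assumes the stopword corpus is installed: nltk.download('stopwords')
tokenizer = RegexpTokenizer(r'\w+')
porter_stemmer = PorterStemmer()
stop_words = set(stopwords.words('english'))  # load once instead of once per token

# Stem the word lists once so they match the stemmed tweet tokens
positive_stems = {porter_stemmer.stem(w) for w in positive_words}
negative_stems = {porter_stemmer.stem(w) for w in negative_words}

def sentiment_tweet(tweet):
    # Lowercase the tweet and split it into word tokens
    tokens = tokenizer.tokenize(tweet.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero on tweets with no word tokens

    # Drop stopwords, then stem the remaining tokens
    stems_only = [porter_stemmer.stem(w) for w in tokens if w not in stop_words]

    pos_count = sum(1 for s in stems_only if s in positive_stems)
    neg_count = sum(1 for s in stems_only if s in negative_stems)

    # Net sentiment as a share of all tokens (stopwords included)
    return round((pos_count - neg_count) / len(tokens), 2)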
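
To make the scoring formula above concrete, here is a minimal standard-library sketch of the same idea: count positive and negative hits, subtract, and divide by the token count. It is a simplification, not the notebook's function. It skips stemming and stopword removal, and the names `lexicon_score`, `POSITIVE`, and `NEGATIVE` are hypothetical, with the word sets being small subsets chosen for illustration.

```python
import re

# Hypothetical mini-lexicons for illustration only
POSITIVE = {"good", "great", "awesome"}
NEGATIVE = {"bad", "sad", "terrible"}

def lexicon_score(text):
    """Return (positive hits - negative hits) / token count, rounded to 2 places."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0  # empty text has no sentiment signal
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return round((pos - neg) / len(tokens), 2)

print(lexicon_score("Great crews, power is back"))  # 1 positive hit in 5 tokens
print(lexicon_score("Terrible outage, so sad"))     # 2 negative hits in 4 tokens
print(lexicon_score(""))                            # no tokens
```

Because the denominator is the full token count, a long tweet with a single sentiment word scores closer to zero than a short tweet with the same word.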


**Test Sample Subset of Tweets**

In [147]:
# Copy the slice so the new column is added to an independent DataFrame
test2 = df[500:505].copy()

In [148]:
test2['SA'] = test2['Text'].apply(sentiment_tweet)


In [149]:
test2.T

Unnamed: 0,500,501,502,503,504
Unnamed: 0,500,501,502,503,504
Unnamed: 0.1,500,501,502,503,504
Event,ne_bomb_cyclone,ne_bomb_cyclone,ne_bomb_cyclone,ne_bomb_cyclone,ne_bomb_cyclone
Stage,before,before,before,before,before
Query Date,2019-10-15,2019-10-15,2019-10-15,2019-10-15,2019-10-15
Query Term,power outage,power outage,power outage,power outage,power outage
Id,1183792259847872512,1183791916061790214,1183791869408489472,1183791864966610944,1183791723400646665
Username,KatyVaux36,colleenve,hubblyguy,MariYUH00,juango_snijack
Text,California’s massive power outages remind us a...,@CityPowerJhb Power outage Halfway Gardens Mid...,@CityPowerJhb power outage in Midrand Halfway ...,Work is about to be hella backed up because of...,"2/5 we don't have food,we don't have medicine ..."
Date,2019-10-14 17:10:43+00:00,2019-10-14 17:09:21+00:00,2019-10-14 17:09:10+00:00,2019-10-14 17:09:09+00:00,2019-10-14 17:08:35+00:00


**Run Tweets Through Functions**

In [150]:
df['Sentiment'] = df['Text'].apply(sentiment_tweet)
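
When a scorer is applied to a whole text column, missing or empty entries can raise errors, since a missing value is typically a float rather than a string. The sketch below shows one possible guard, assuming such rows should be left unscored. `score_or_none` and `toy_scorer` are hypothetical names, and `toy_scorer` is only a stand-in for the real scoring function.

```python
def score_or_none(value, scorer):
    """Apply scorer only to non-empty strings; return None for anything else."""
    if isinstance(value, str) and value.strip():
        return scorer(value)
    return None

def toy_scorer(text):
    """Hypothetical stand-in scorer: share of tokens equal to 'good'."""
    tokens = text.lower().split()
    return round(tokens.count("good") / len(tokens), 2)

# A string, a NaN-like float, an empty string, and None
rows = ["good news power restored", float("nan"), "", None]
results = [score_or_none(row, toy_scorer) for row in rows]
print(results)
```

A wrapper like this could be passed to `apply` in place of the bare scoring function, for example through a `lambda` that fixes the `scorer` argument.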

In [None]:
df['Sentiment']