### Sentiment Analysis

Code adapted from lesson 5.03 - Natural Language Processing (Author: Matt Brems)

**Library Imports**

In [1]:
import pandas as pd
import numpy as np
import regex as re
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords

**Read in Data**

In [2]:
df = pd.read_csv('../data/merged/tweets_geom_unclean.csv')

In [3]:
df.head()

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,Event,Stage,Query Date,Query Term,Id,Username,Text,Date,Hashtags,Location,_wkt_geom,id,xcoord,ycoord
0,0,0,ne_bomb_cyclone,before,2019-10-15,power outage,1183894397492043777,meekers999,Let's lose Govenor Gruesome in Cali please. He...,2019-10-14 23:56:34+00:00,,,Point (-77.58788778888725801 38.32415544020682...,17892,-77.587888,38.324155
1,1,1,ne_bomb_cyclone,before,2019-10-15,power outage,1183894362725289984,sharethiscrime,"Last time I checked, he's still a floofy baby ...",2019-10-14 23:56:26+00:00,,,Point (-76.48550640128354416 38.47508461690389...,17891,-76.485506,38.475085
2,2,2,ne_bomb_cyclone,before,2019-10-15,power outage,1183894014573105152,News_1jl4,California’s power outage means problems for e...,2019-10-14 23:55:03+00:00,,,Point (-77.41021456388132549 38.30670835924499...,17894,-77.410215,38.306708
3,3,3,ne_bomb_cyclone,before,2019-10-15,power outage,1183893791415123968,IndeCardio,Newsome is vanguard Globalism in action! What ...,2019-10-14 23:54:10+00:00,,,Point (-77.79985279265937947 38.45514536776436...,17893,-77.799853,38.455145
4,4,4,ne_bomb_cyclone,before,2019-10-15,power outage,1183893732652810240,BaddictsPH,Super Typhoon Faxai hit Chiba prefecture in Se...,2019-10-14 23:53:56+00:00,,,Point (-77.83618103291209422 38.62109214538766...,17888,-77.836181,38.621092


**Define Stopwords, Positive Words, and Negative Words**

In [4]:
positive_words = ['delight', 'good', 'great', 'awesome', 'tremendous', 'fabulous', 'amazing', 'stellar']
negative_words = ['garbage', 'sad', 'trash', 'ugly', 'bad', 'disgusting', 'terrible', 'gross']

In [5]:
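# Build function to score the sentiment of each tweet
# Stop words and lexicon stems are built once rather than on every call
stop_words = set(stopwords.words('english'))
porter_stemmer = PorterStemmer()
positive_stems = {porter_stemmer.stem(w) for w in positive_words}
negative_stems = {porter_stemmer.stem(w) for w in negative_words}

def sentiment_tweet(tweet):
    # Lowercase the tweet and split it into word tokens
    tokenizer = RegexpTokenizer(r'\w+')
    tokens = tokenizer.tokenize(tweet.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero on tweets with no word tokens

    # Drop stop words first, then stem the remaining tokens
    stems_only = [porter_stemmer.stem(w) for w in tokens if w not in stop_words]

    pos_count = sum(1 for s in stems_only if s in positive_stems)
    neg_count = sum(1 for s in stems_only if s in negative_stems)

    # Net positive count as a share of all tokens, rounded to two decimals
    return round((pos_count - neg_count) / len(tokens), 2)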
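
To make the scoring arithmetic concrete, here is a minimal standalone sketch of the same formula (positive matches minus negative matches, divided by the token count). It is a simplification, not the function above: it skips stemming and stop-word removal, and the names `POSITIVE`, `NEGATIVE`, and `simple_score` are hypothetical, chosen only for this illustration.

```python
import re

# Hypothetical mini-lexicons for illustration; the notebook's lists are longer.
POSITIVE = {"good", "great"}
NEGATIVE = {"sad", "terrible"}

def simple_score(text):
    """Net positive word count divided by the total token count."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0  # nothing to score
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return round((pos - neg) / len(tokens), 2)

example = "What a great day, but the outage was terrible and sad"
# 1 positive match, 2 negative matches, 11 tokens: (1 - 2) / 11
print(simple_score(example))  # -0.09
```

Because the denominator is the full token count, long tweets with a single matching word score close to zero, which is worth keeping in mind when reading the results.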


**Test Sample Subset of Tweets**

In [6]:
test2 = df[500:505].copy()

In [7]:
test2['SA'] = test2['Text'].apply(sentiment_tweet)


In [8]:
test2.T

Unnamed: 0,500,501,502,503,504
Unnamed: 0,500,501,502,503,504
Unnamed: 0.1,500,501,502,503,504
Event,ne_bomb_cyclone,ne_bomb_cyclone,ne_bomb_cyclone,ne_bomb_cyclone,ne_bomb_cyclone
Stage,before,before,before,before,before
Query Date,2019-10-15,2019-10-15,2019-10-15,2019-10-15,2019-10-15
Query Term,power outage,power outage,power outage,power outage,power outage
Id,1183792259847872512,1183791916061790214,1183791869408489472,1183791864966610944,1183791723400646665
Username,KatyVaux36,colleenve,hubblyguy,MariYUH00,juango_snijack
Text,California’s massive power outages remind us a...,@CityPowerJhb Power outage Halfway Gardens Mid...,@CityPowerJhb power outage in Midrand Halfway ...,Work is about to be hella backed up because of...,"2/5 we don't have food,we don't have medicine ..."
Date,2019-10-14 17:10:43+00:00,2019-10-14 17:09:21+00:00,2019-10-14 17:09:10+00:00,2019-10-14 17:09:09+00:00,2019-10-14 17:08:35+00:00


**Run All Tweets Through the Function**

In [12]:
df['Sentiment'] = df['Text'].apply(sentiment_tweet)

In [13]:
df['Sentiment']

0        0.0
1        0.0
2        0.0
3        0.0
4        0.0
        ... 
17995    0.0
17996    0.0
17997    0.0
17998    0.0
17999    0.0
Name: Sentiment, Length: 18000, dtype: float64

In [14]:
df.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,17990,17991,17992,17993,17994,17995,17996,17997,17998,17999
Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,17990,17991,17992,17993,17994,17995,17996,17997,17998,17999
Unnamed: 0.1,0,1,2,3,4,5,6,7,8,9,...,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999
Event,ne_bomb_cyclone,ne_bomb_cyclone,ne_bomb_cyclone,ne_bomb_cyclone,ne_bomb_cyclone,ne_bomb_cyclone,ne_bomb_cyclone,ne_bomb_cyclone,ne_bomb_cyclone,ne_bomb_cyclone,...,july-end,july-end,july-end,july-end,july-end,july-end,july-end,july-end,july-end,july-end
Stage,before,before,before,before,before,before,before,before,before,before,...,before,before,before,before,before,before,before,before,before,before
Query Date,2019-10-15,2019-10-15,2019-10-15,2019-10-15,2019-10-15,2019-10-15,2019-10-15,2019-10-15,2019-10-15,2019-10-15,...,2019-06-30,2019-06-30,2019-06-30,2019-06-30,2019-06-30,2019-06-30,2019-06-30,2019-06-30,2019-06-30,2019-06-30
Query Term,power outage,power outage,power outage,power outage,power outage,power outage,power outage,power outage,power outage,power outage,...,power outage,power outage,power outage,power outage,power outage,power outage,power outage,power outage,power outage,power outage
Id,1183894397492043777,1183894362725289984,1183894014573105152,1183893791415123968,1183893732652810240,1183893505606897665,1183893437872951296,1183893323070803968,1183893313142833154,1183893125305196546,...,1144416345200701440,1144416258269569024,1144415828886065152,1144415203393712128,1144415058815938560,1144415045637402624,1144414002644230144,1144413949745455104,1144413604881375232,1144413206481461248
Username,meekers999,sharethiscrime,News_1jl4,IndeCardio,BaddictsPH,SeattleNewsHeds,StrataNetNZ,MathLisa,imtrash195,TankDestroyer,...,SmileMMP,EricFlaris,Ryanwiz,WaynesboroY,waewhimz,RisingDarkstar,qt_alexandria,SayNoToArsenal,CarGenerator1,XcelEnergyCO
Text,Let's lose Govenor Gruesome in Cali please. He...,"Last time I checked, he's still a floofy baby ...",California’s power outage means problems for e...,Newsome is vanguard Globalism in action! What ...,Super Typhoon Faxai hit Chiba prefecture in Se...,Seattle (WA) Times-Business: California regula...,We are aware of two major power outages affect...,@ConEdison there is a power outage in my block...,@NexpoYT bit creepy but I don’t know for conte...,"N. California PG&E Power Outage Day 2 ""Calpoca...",...,I really need We Energies to get this power ou...,Power outage in the area going on 2 hrs now. I...,The Power Outage #familyportrait #sunroom #the...,"We will OPEN Friday, June 28th as usual with o...",every traffic light out of town wasn't working...,My boys are so trained.... There has been a wi...,The way these power outages set up it makes me...,There's a power outage? Welp.,Buy a CarGenerator for just a fraction of what...,Please direct message us your full service add...
Date,2019-10-14 23:56:34+00:00,2019-10-14 23:56:26+00:00,2019-10-14 23:55:03+00:00,2019-10-14 23:54:10+00:00,2019-10-14 23:53:56+00:00,2019-10-14 23:53:02+00:00,2019-10-14 23:52:45+00:00,2019-10-14 23:52:18+00:00,2019-10-14 23:52:16+00:00,2019-10-14 23:51:31+00:00,...,2019-06-28 01:24:53+00:00,2019-06-28 01:24:32+00:00,2019-06-28 01:22:50+00:00,2019-06-28 01:20:21+00:00,2019-06-28 01:19:46+00:00,2019-06-28 01:19:43+00:00,2019-06-28 01:15:35+00:00,2019-06-28 01:15:22+00:00,2019-06-28 01:14:00+00:00,2019-06-28 01:12:25+00:00


In [16]:
df['Sentiment'].value_counts()

 0.00    17126
 0.02      192
 0.03      173
 0.05       80
-0.02       77
 0.04       75
-0.03       47
 0.06       37
-0.04       31
 0.08       27
 0.07       23
-0.06       19
-0.05       17
-0.07       12
 0.10        9
 0.09        8
 0.17        7
 0.12        7
 0.11        6
-0.08        5
 0.14        4
-0.09        4
 0.20        4
 0.25        4
-0.14        2
 0.33        2
-0.12        1
-0.10        1
Name: Sentiment, dtype: int64
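
In the counts above, 17,126 of the 18,000 tweets (about 95%) score exactly 0.00, so the raw scores are easier to read once collapsed into three classes. The sketch below shows one way this could be done; `label_sentiment` and the sample scores are hypothetical and not part of the original notebook.

```python
from collections import Counter

def label_sentiment(score):
    """Map a numeric score to a coarse sentiment class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical sample of scores in the same range as the output above
sample_scores = [0.0, 0.02, -0.03, 0.0, 0.25]
class_counts = Counter(label_sentiment(s) for s in sample_scores)
print(class_counts)
```

On the full DataFrame, the same helper could presumably be applied with `df['Sentiment'].apply(label_sentiment).value_counts()`.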

In [17]:
# Export to data folder (index=False avoids adding another 'Unnamed: 0' column on the next read):

df.to_csv('../data/merged/tweets_sa_unclean.csv', index=False)