# Creating a Dictionary-based Sentiment Analyzer

In [1]:
import pandas as pd
import nltk
from IPython.display import display
pd.set_option('display.max_columns', None)

### Step 1: Loading the small_corpus.csv file created in the "creating_dataset" milestone

In [2]:
reviews = pd.read_csv("../data/small_corpus.csv")

In [3]:
reviews.head()

Unnamed: 0,overall,verified,reviewTime,reviewerID,asin,reviewerName,reviewText,summary,unixReviewTime,vote,style,image
0,1.0,True,"11 30, 2015",A3AC92K59QLYR8,B00503E8S2,ben,Game freezes over and over its unplayable,it just doesn't work,1448841600,,{'Format:': ' Video Game'},
1,1.0,False,"05 19, 2012",A334LHR8DWARY8,B00178630A,Xenocide,I have no problem with needing to be online to...,The only real way to show Blizzard our feeling...,1337385600,23.0,{'Format:': ' Computer Game'},
2,1.0,True,"10 19, 2014",A28982ODE7ZGVP,B001AWIP7M,Eric Frykberg,NOT GOOD,One Star,1413676800,,{'Format:': ' Video Game'},
3,1.0,True,"09 6, 2015",A19E85RLQCAMI1,B00NASF4MS,Joe,Really not worth the money to buy this game on...,Really not worth the money to buy this game on...,1441497600,2.0,{'Format:': ' Video Game'},
4,1.0,False,"05 28, 2008",AEMQKS13WC4D2,B00140P9BA,Craig,They need to eliminate the Securom. I purchase...,Securom can ruin a great game,1211932800,55.0,{'Format:': ' DVD-ROM'},


### Step 2: Tokenizing the sentences and words of the reviews
Here, we test different word tokenizers on the reviews and then decide which one is better suited to the task.

### Treebank Word Tokenizer

In [4]:
from nltk.tokenize import TreebankWordTokenizer

In [5]:
tb_tokenizer = TreebankWordTokenizer()

In [6]:
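# str() guards against non-string cells, but a missing review (NaN) becomes the literal string "nan".
reviews["rev_text_lower"] = reviews['reviewText'].apply(lambda rev: str(rev).lower())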

In [7]:
reviews[['reviewText','rev_text_lower']].sample(2)

Unnamed: 0,reviewText,rev_text_lower
1413,We bought this game with several others at Chr...,we bought this game with several others at chr...
1734,Old style graphics old style interface. I gues...,old style graphics old style interface. i gues...


In [8]:
reviews["tb_tokens"] = reviews['rev_text_lower'].apply(lambda rev: tb_tokenizer.tokenize(str(rev)))

In [9]:
pd.set_option('display.max_colwidth', None)

In [10]:
reviews[['reviewText','tb_tokens']].sample(3)

Unnamed: 0,reviewText,tb_tokens
2879,"Seems like everyone else does...\nI played the hell out of the first, and I jumped at the chance to grab this one. It includes the first city, and adds two more. And the most notable change from the first is it adds a 'jump' button! You can jump on top of buildings, over cars, everything!\nThis is a very neat addition. Ignore the whiners who complain about this, that, and the other. This is a fun no-brainer game.\nDidn't one reviewer complaian about not being able to pick up and drop off going full speed? What was that all about? Games do have rules, and that is one of the few this game has...","[seems, like, everyone, else, does, ..., i, played, the, hell, out, of, the, first, ,, and, i, jumped, at, the, chance, to, grab, this, one., it, includes, the, first, city, ,, and, adds, two, more., and, the, most, notable, change, from, the, first, is, it, adds, a, 'jump, ', button, !, you, can, jump, on, top, of, buildings, ,, over, cars, ,, everything, !, this, is, a, very, neat, addition., ignore, the, whiners, who, complain, about, this, ,, that, ,, and, the, other., this, is, a, fun, no-brainer, game., did, n't, one, reviewer, complaian, about, not, being, able, to, pick, ...]"
3060,"Essential for getting started with our new Nintendo DS Lite! Carrying case seems durable, zips easily . You cant go wrong with this product you couldn't buy the items individually at the price","[essential, for, getting, started, with, our, new, nintendo, ds, lite, !, carrying, case, seems, durable, ,, zips, easily, ., you, cant, go, wrong, with, this, product, you, could, n't, buy, the, items, individually, at, the, price]"
2479,i mean so far after using it for a bit didn't have any problems. There is a lot of damage to the cover . so far no probs.,"[i, mean, so, far, after, using, it, for, a, bit, did, n't, have, any, problems., there, is, a, lot, of, damage, to, the, cover, ., so, far, no, probs, .]"


### Casual Tokenizer

In [11]:
from nltk.tokenize.casual import casual_tokenize

In [12]:
reviews['casual_tokens'] = reviews['rev_text_lower'].apply(lambda rev: casual_tokenize(str(rev)))

In [13]:
reviews[['reviewText','casual_tokens','tb_tokens']].sample(3)

Unnamed: 0,reviewText,casual_tokens,tb_tokens
766,Characters here have an awful mean demeanor that's not appropriate of good sportsmanship for kids to watch.\n\nVirtual Tennis 3 is easier to play and has video game mode that you can hit a ball against a colorful wall.,"[characters, here, have, an, awful, mean, demeanor, that's, not, appropriate, of, good, sportsmanship, for, kids, to, watch, ., virtual, tennis, 3, is, easier, to, play, and, has, video, game, mode, that, you, can, hit, a, ball, against, a, colorful, wall, .]","[characters, here, have, an, awful, mean, demeanor, that, 's, not, appropriate, of, good, sportsmanship, for, kids, to, watch., virtual, tennis, 3, is, easier, to, play, and, has, video, game, mode, that, you, can, hit, a, ball, against, a, colorful, wall, .]"
1220,"I am writting this review because I love to keep great games and play them years later. The DRM on this product prevents me from doing this. What happens when the game is no longer supported but I haven't used up my 3 activations? Since the DRM wouldn't be able to contact the ""Mothership"", again I am out of luck. No more strolling down memory lane. What happens if the game studio goes out of business, again, the game is no longer usable. What happens if EA goes out of business next week, everyone who is currenly playing this game would be out of luck. Isn't that a nice picture! Then what happens to the DRM still running on your computer. If you are lucky, it doesn't do anything. If you are unlucky, you will be reinstalling your OS and your games again. Oh, btw, you just used up another installation credit for any other games using DRM.\nYou would like to think the gaming industry would have learned something from the music industry fiasco. Treating your paying customers like crooks will only cause your sales to drop. The harder your squeeze, the more customers you will lose.\n\nI have never stolen any games and never will, but this type of treatment won't earn the industy my hard earned money.\n\nTo all gamers out there, please take a stand and not purchase games that support this type of treatment. To those people out there who think I am just a whiner, more power to you. 
Eventually DRM will progress until it finally impacts your fun and then you will understand how your rights have been taken away, one step at a time.","[i, am, writting, this, review, because, i, love, to, keep, great, games, and, play, them, years, later, ., the, drm, on, this, product, prevents, me, from, doing, this, ., what, happens, when, the, game, is, no, longer, supported, but, i, haven't, used, up, my, 3, activations, ?, since, the, drm, wouldn't, be, able, to, contact, the, "", mothership, "", ,, again, i, am, out, of, luck, ., no, more, strolling, down, memory, lane, ., what, happens, if, the, game, studio, goes, out, of, business, ,, again, ,, the, game, is, no, longer, usable, ., what, happens, if, ea, goes, out, ...]","[i, am, writting, this, review, because, i, love, to, keep, great, games, and, play, them, years, later., the, drm, on, this, product, prevents, me, from, doing, this., what, happens, when, the, game, is, no, longer, supported, but, i, have, n't, used, up, my, 3, activations, ?, since, the, drm, would, n't, be, able, to, contact, the, ``, mothership, '', ,, again, i, am, out, of, luck., no, more, strolling, down, memory, lane., what, happens, if, the, game, studio, goes, out, of, business, ,, again, ,, the, game, is, no, longer, usable., what, happens, if, ea, goes, out, of, business, next, ...]"
2962,"The sequel to Baten Kaitos, this game is a great addition for fans of that first game. You have real time card strategy action mixed in with an epic storyline.\n\nIt's fair enough to say that some people will love this game and other people will not feel the pull of the gameplay style. This is a *strategy* game. If you are a shooter fan, wanting to jump immediately into shooting and killing enemies, you will probably be disappointed. You can spend literally 15 minutes walking around talking and gathering before you're even allowed to enter into your first combat. This is a game that requires patience and dedication.\n\nFor people with that time, attention and desire for strategy, they'll get everything they could ask for. Your characters move in real time as you play and attack, their swords flying, their wings moving. Your organize your cards and play them for the maximum effect.\n\nIn the world around you, world-shaping events are taking place, and you meet a number of characters, roam through landscapes, and explore this world. It compares with Final Fantasy and other similar games, but with a card base.\n\nThe sound is reasonably good, with epic, energizing music pulling you into the gameplay. I found some of the voice actors to be uninspiring, but you find that in most games of this type. You're not expecting drama quality acting.\n\nI would recommend people play the first Baten Kaitos before moving on to this one. That way you get a feel for the series and style, and really get a full appreciation for this game.\n\nThis is definitely a game that will keep you occupied for WEEKS. Other games out there can be finished in a day. This isn't one of them. Between the side-quests and other things to do, you'll be busy for a long while. 
Again, depending on your personality, this can be great, or this can be too much.\n\nHighly recommended for epic strategy fans who have the time and attention to dedicate to gaming!","[the, sequel, to, baten, kaitos, ,, this, game, is, a, great, addition, for, fans, of, that, first, game, ., you, have, real, time, card, strategy, action, mixed, in, with, an, epic, storyline, ., it's, fair, enough, to, say, that, some, people, will, love, this, game, and, other, people, will, not, feel, the, pull, of, the, gameplay, style, ., this, is, a, *, strategy, *, game, ., if, you, are, a, shooter, fan, ,, wanting, to, jump, immediately, into, shooting, and, killing, enemies, ,, you, will, probably, be, disappointed, ., you, can, spend, literally, 15, minutes, walking, around, talking, and, gathering, ...]","[the, sequel, to, baten, kaitos, ,, this, game, is, a, great, addition, for, fans, of, that, first, game., you, have, real, time, card, strategy, action, mixed, in, with, an, epic, storyline., it, 's, fair, enough, to, say, that, some, people, will, love, this, game, and, other, people, will, not, feel, the, pull, of, the, gameplay, style., this, is, a, *strategy*, game., if, you, are, a, shooter, fan, ,, wanting, to, jump, immediately, into, shooting, and, killing, enemies, ,, you, will, probably, be, disappointed., you, can, spend, literally, 15, minutes, walking, around, talking, and, gathering, before, you, 're, even, allowed, to, ...]"
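
The samples above point to two systematic differences between the tokenizers. A minimal sketch on a made-up two-sentence string (the `sample` text is an illustration, not taken from the corpus) makes them easier to see: the Treebank tokenizer expects input that is already split into sentences, so it only separates the final period, and it splits contractions, while `casual_tokenize` separates every period and keeps contractions whole.

```python
from nltk.tokenize import TreebankWordTokenizer
from nltk.tokenize.casual import casual_tokenize

# Hypothetical lowercased review text with two sentences and a contraction.
sample = "it didn't work. i like it."

tb_sample = TreebankWordTokenizer().tokenize(sample)
casual_sample = casual_tokenize(sample)

print(tb_sample)      # the first period stays attached to "work."
print(casual_sample)  # "didn't" stays one token; both periods are split off
```

If sentence boundaries matter downstream, one option is to split each review into sentences first (for example with `nltk.sent_tokenize`) before applying the Treebank tokenizer.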


### Removing Punctuation and Stop Words

In [14]:
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/koosha.tahmasebipour/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [15]:
stop_words = nltk.corpus.stopwords.words('english')

In [16]:
stop_words[:10]

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]

In [17]:
len(stop_words)

179

In [18]:
from string import punctuation
print(punctuation)

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~


In [19]:
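stop_word_set = set(stop_words)  # set lookup is faster than scanning the list
# `punctuation` is a string, so `not in` is a substring test that also drops empty tokens.
reviews['tokens_nosw'] = reviews['tb_tokens'].\
    apply(lambda words: [w for w in words if w not in stop_word_set and w not in punctuation])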

In [20]:
reviews[['tb_tokens','tokens_nosw']].sample(3)

Unnamed: 0,tb_tokens,tokens_nosw
2053,"[so, so]",[]
1063,"[i, have, all, the, arenanet, games, ,, due, to, problems, at, arenanet, ,, i, am, unable, to, play, this, game, (, guild, war, 2, ), ,, i, have, the, full, retail, box, version, ..., .it, keeps, tell, me, my, password, is, incorrect, and, it, 's, a, new, install, ..., i, guess, somebody, hacked, the, game, ,, now, nothing, works, right, ..., $, 60, bucks, in, the, trash, ,, this, game, has, no, support]","[arenanet, games, due, problems, arenanet, unable, play, game, guild, war, 2, full, retail, box, version, ..., .it, keeps, tell, password, incorrect, 's, new, install, ..., guess, somebody, hacked, game, nothing, works, right, ..., 60, bucks, trash, game, support]"
4311,"[my, daughter, loves, it, .]","[daughter, loves]"
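
One detail of the filter above: `punctuation` is a string, so `w not in punctuation` is a substring test rather than a per-character check. The sketch below, using a made-up token list, contrasts that behaviour with a stricter variant that drops any token made up entirely of punctuation characters. The helper name `is_punct_only` is an assumption for illustration and is not part of the notebook.

```python
from string import punctuation

# Hypothetical tokens, similar in shape to the Treebank output above.
tokens = ["so", "far", "...", ".", "no", "probs", "()", ""]

# Substring test, as in the cell above: "." and "()" occur inside the
# punctuation string and the empty string is a substring of anything,
# so all three are dropped, but "..." is kept.
substring_filtered = [w for w in tokens if w not in punctuation]

punct_chars = set(punctuation)

def is_punct_only(token):
    """Return True if the token is non-empty and every character is punctuation."""
    return bool(token) and all(ch in punct_chars for ch in token)

# Stricter variant: also removes multi-character runs such as "...".
strict_filtered = [w for w in tokens if w and not is_punct_only(w)]

print(substring_filtered)
print(strict_filtered)
```

Which variant is preferable depends on whether tokens such as `...` are considered informative; the sample output above keeps them.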


### Stemming

In [21]:
from nltk.stem.porter import PorterStemmer

In [22]:
stemmer = PorterStemmer()

In [23]:
reviews['tokens_stemmed'] = reviews['tokens_nosw'].apply(lambda words: [stemmer.stem(w) for w in words])

In [24]:
reviews[['tokens_nosw','tokens_stemmed']].sample(3)

Unnamed: 0,tokens_nosw,tokens_stemmed
934,"[game, good, nothing, positive, say, bout, ..., n't, like, format, players, choosen, developers, character, appearence]","[game, good, noth, posit, say, bout, ..., n't, like, format, player, choosen, develop, charact, appear]"
3487,"[excellent, turtle, beach, audio]","[excel, turtl, beach, audio]"
2643,"[n't, rpg, like, turn, based, squad, combat., 's, fun, many, different, classes, toy, lots, secret, stuff, find, graphics, sharp., set-up, dialouge, kinda, childish, hey, 's, game, boy, advance, game, expect, worth, buy, enjoy]","[n't, rpg, like, turn, base, squad, combat., 's, fun, mani, differ, class, toy, lot, secret, stuff, find, graphic, sharp., set-up, dialoug, kinda, childish, hey, 's, game, boy, advanc, game, expect, worth, buy, enjoy]"
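
Stemming shortens tokens to a common root, which helps when counting words but can work against a dictionary lookup, because a stem such as `excel` or `posit` is no longer the word the reviewer wrote. The sketch below keeps each original token next to its stem so that a later lookup can fall back on the unstemmed form. Pairing tokens this way is an assumption about how the two columns might be used, not something the notebook does.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

# Tokens taken from the sample rows above.
words = ["excellent", "positive", "players", "graphics"]

# Keep (original, stem) pairs so neither form is lost.
pairs = [(w, stemmer.stem(w)) for w in words]

for original, stem in pairs:
    print(f"{original} -> {stem}")
```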


### Sentiment Predictor Baseline Model

In [28]:
nltk.download('sentiwordnet')
from nltk.corpus import sentiwordnet as swn

[nltk_data] Downloading package sentiwordnet to
[nltk_data]     /Users/koosha.tahmasebipour/nltk_data...
[nltk_data]   Unzipping corpora/sentiwordnet.zip.


In [30]:
list(swn.senti_synsets("happy"))

[SentiSynset('happy.a.01'),
 SentiSynset('felicitous.s.02'),
 SentiSynset('glad.s.02'),
 SentiSynset('happy.s.04')]

In [32]:
# Look up two senses each of "joy" and "trouble" in SentiWordNet.
synsets = {
    "Joy1": swn.senti_synset('joy.n.01'),
    "Joy2": swn.senti_synset('joy.n.02'),
    "Trouble1": swn.senti_synset('trouble.n.03'),
    "Trouble2": swn.senti_synset('trouble.n.04'),
}

# Build the table rows: a header followed by one row per sense.
rows = [["List", "Positive score", "Negative Score"]]
for name, synset in synsets.items():
    rows.append([name, f"{synset.pos_score():.3f}", f"{synset.neg_score():.3f}"])

# Pad every column to the width of its longest cell and print the table.
column_widths = [max(len(item) for item in col) for col in zip(*rows)]
for row in rows:
    print(''.join(f' {cell:{width}} ' for cell, width in zip(row, column_widths)))

 List      Positive score  Negative Score 
 Joy1      0.500           0.250          
 Joy2      0.375           0.000          
 Trouble1  0.000           0.625          
 Trouble2  0.000           0.500          
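
The positive and negative scores can be combined into a review-level prediction. The following is a minimal sketch of one such baseline, not the notebook's own implementation: it sums `positive - negative` over the tokens found in a lexicon and reads the sign of the total. To stay self-contained it uses a toy lexicon, `TOY_LEXICON`, filled with the `Joy1` and `Trouble1` scores printed above; the names `score_tokens` and `label` are hypothetical, and in practice the lookup would presumably be backed by `swn.senti_synsets`.

```python
# Toy lexicon: word -> (positive score, negative score).
# Values copied from the Joy1 and Trouble1 rows above; illustration only.
TOY_LEXICON = {
    "joy": (0.5, 0.25),
    "trouble": (0.0, 0.625),
}

def score_tokens(tokens, lexicon=TOY_LEXICON):
    """Sum (positive - negative) over tokens that appear in the lexicon."""
    total = 0.0
    for token in tokens:
        if token in lexicon:
            pos, neg = lexicon[token]
            total += pos - neg
    return total

def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

example = ["game", "joy", "joy", "trouble"]
example_score = score_tokens(example)
print(example_score, label(example_score))
```

Tokens missing from the lexicon contribute nothing, so a review with no matches comes out as neutral. How to treat multiple senses of one word (as with the four `happy` synsets) is a separate choice this sketch leaves open.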
