# Creating a Dictionary-based Sentiment Analyzer

In [1]:
import pandas as pd
import nltk
from IPython.display import display
pd.set_option('display.max_columns', None)

### Step 1: Loading the small_corpus.csv file created in the "creating_dataset" milestone

In [2]:
reviews = pd.read_csv("../data/small_corpus.csv")

In [3]:
reviews.head()

Unnamed: 0,overall,verified,reviewTime,reviewerID,asin,reviewerName,reviewText,summary,unixReviewTime,vote,style,image
0,1.0,True,"11 30, 2015",A3AC92K59QLYR8,B00503E8S2,ben,Game freezes over and over its unplayable,it just doesn't work,1448841600,,{'Format:': ' Video Game'},
1,1.0,False,"05 19, 2012",A334LHR8DWARY8,B00178630A,Xenocide,I have no problem with needing to be online to...,The only real way to show Blizzard our feeling...,1337385600,23.0,{'Format:': ' Computer Game'},
2,1.0,True,"10 19, 2014",A28982ODE7ZGVP,B001AWIP7M,Eric Frykberg,NOT GOOD,One Star,1413676800,,{'Format:': ' Video Game'},
3,1.0,True,"09 6, 2015",A19E85RLQCAMI1,B00NASF4MS,Joe,Really not worth the money to buy this game on...,Really not worth the money to buy this game on...,1441497600,2.0,{'Format:': ' Video Game'},
4,1.0,False,"05 28, 2008",AEMQKS13WC4D2,B00140P9BA,Craig,They need to eliminate the Securom. I purchase...,Securom can ruin a great game,1211932800,55.0,{'Format:': ' DVD-ROM'},


### Step 2: Tokenizing the sentences and words of the reviews
Here, we test two different word tokenizers on the reviews and then decide which one is better suited to the task.
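As a quick orientation before the per-tokenizer cells, the following sketch runs both tokenizers used in this step on a single made-up sentence and lists the tokens on which they disagree. The sample string is an assumption, not a row from the corpus. Neither `TreebankWordTokenizer` nor `casual_tokenize` needs an NLTK data download.

```python
from nltk.tokenize import TreebankWordTokenizer
from nltk.tokenize.casual import casual_tokenize

# Hypothetical sample text, lowercased the same way the notebook lowercases reviews.
sample = "the game froze. it's unplayable."

tb_tokens = TreebankWordTokenizer().tokenize(sample)
casual_tokens = casual_tokenize(sample)

# Tokens produced by one tokenizer but not the other show where they disagree.
only_treebank = [t for t in tb_tokens if t not in casual_tokens]
only_casual = [t for t in casual_tokens if t not in tb_tokens]

print("treebank:", tb_tokens)
print("casual:  ", casual_tokens)
print("only in treebank:", only_treebank)
print("only in casual:  ", only_casual)
```

The differences to look for are how contractions and sentence-internal periods are handled; which behavior is preferable depends on how the sentiment dictionary spells its entries.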

### Treebank Word Tokenizer

In [4]:
from nltk.tokenize import TreebankWordTokenizer

In [5]:
tb_tokenizer = TreebankWordTokenizer()

In [6]:
# Lowercase every review; str() coerces non-string values so .lower() never fails
reviews["rev_text_lower"] = reviews['reviewText'].apply(lambda rev: str(rev).lower())

In [7]:
reviews[['reviewText','rev_text_lower']].sample(2)

Unnamed: 0,reviewText,rev_text_lower
2971,Game still works great with my older instrumen...,game still works great with my older instrumen...
2924,"Classic, I wish I would have been told the ope...","classic, i wish i would have been told the ope..."
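One caveat with `str(rev).lower()`: if `reviewText` contains a missing value, `str()` turns it into the literal string `nan`, which would later be tokenized as if it were a word. Whether `small_corpus.csv` actually has such rows is not shown here, so the following is only a sketch on a hypothetical three-row frame (`demo` is an illustrative name). It fills missing values with an empty string before lowercasing.

```python
import pandas as pd

# Hypothetical mini-frame standing in for the reviews DataFrame.
demo = pd.DataFrame({"reviewText": ["NOT GOOD", float("nan"), "Great Game"]})

# Approach used in the notebook: the missing value becomes the string "nan".
via_str = demo["reviewText"].apply(lambda rev: str(rev).lower())

# Alternative: fill missing values first, then use the vectorized string method.
via_fillna = demo["reviewText"].fillna("").str.lower()

print(via_str.tolist())
print(via_fillna.tolist())
```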


In [8]:
# rev_text_lower already holds strings, so the tokenizer method can be applied directly
reviews["tb_tokens"] = reviews['rev_text_lower'].apply(tb_tokenizer.tokenize)

In [9]:
pd.set_option('display.max_colwidth', None)

In [10]:
reviews[['reviewText','tb_tokens']].sample(3)

Unnamed: 0,reviewText,tb_tokens
485,"Couldnt even put in a full hour,before switching to something else.\nI cant even find the words to describe how bad this is.\nEverythings bad. And was done better 10 to 15 years ago.\nThis is nowhere near the old Baldur's Gates or Champions of Norrath games.","[couldnt, even, put, in, a, full, hour, ,, before, switching, to, something, else., i, cant, even, find, the, words, to, describe, how, bad, this, is., everythings, bad., and, was, done, better, 10, to, 15, years, ago., this, is, nowhere, near, the, old, baldur, 's, gates, or, champions, of, norrath, games, .]"
419,I was really excited to get this game. I popped it in and was disappointed from the start. Do you see how small the people are? It should be called Grand Theft Ants. Good graphics but the people are small and very hard to see. Not worth anything.,"[i, was, really, excited, to, get, this, game., i, popped, it, in, and, was, disappointed, from, the, start., do, you, see, how, small, the, people, are, ?, it, should, be, called, grand, theft, ants., good, graphics, but, the, people, are, small, and, very, hard, to, see., not, worth, anything, .]"
3655,this mouse is great,"[this, mouse, is, great]"
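
In the sample above, periods that end a sentence inside a review stay glued to the preceding word (`else.`, `is.`, `game.`), and only the very last period is split off. This is expected: NLTK documents `TreebankWordTokenizer` as assuming that its input has already been segmented into sentences. A minimal sketch of tokenizing sentence by sentence follows. The regex splitter is a crude stand-in chosen so that the example runs without extra downloads; `nltk.sent_tokenize` would be the more usual choice but needs the Punkt data package. The helper name `tokenize_by_sentence` is hypothetical.

```python
import re
from nltk.tokenize import TreebankWordTokenizer

tb_tokenizer = TreebankWordTokenizer()

def tokenize_by_sentence(text):
    """Split text into rough sentences, then Treebank-tokenize each one."""
    # Naive boundary rule (assumption): '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tokens = []
    for sentence in sentences:
        tokens.extend(tb_tokenizer.tokenize(sentence))
    return tokens

text = "everythings bad. and was done better 10 to 15 years ago."
whole_review_tokens = tb_tokenizer.tokenize(text)   # whole review at once
per_sentence_tokens = tokenize_by_sentence(text)    # one sentence at a time

print(whole_review_tokens)
print(per_sentence_tokens)
```

With per-sentence tokenization every sentence-final period becomes its own token, so `bad.` and `bad` no longer count as different words in later steps.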


### Casual Tokenizer

In [11]:
from nltk.tokenize.casual import casual_tokenize

In [12]:
# rev_text_lower already holds strings, so casual_tokenize can be applied directly
reviews['casual_tokens'] = reviews['rev_text_lower'].apply(casual_tokenize)

In [13]:
reviews[['reviewText','casual_tokens','tb_tokens']].sample(3)

Unnamed: 0,reviewText,casual_tokens,tb_tokens
3348,"No one can resist the urge to keep playing Resistance: Fall of Man once the game is started. Chapter after chapter of exhilarating battles against a plethora of creatures that you use a plethora of weapons to defeat (and they'll use them against you too).\n\nTo really put the plot of the game in context (if you care about the plot), it's well worth the visit to the official Resistance: Fall of Man web site; there's a ton of story there that I'd bet 90% of the people who start the game are unaware of. For me, the story helped me ""get into character"" for the game. If you just want to start wasting some nasty beasts, ""Get to it, soldier!"" Rest assured, the story won't help you beat the enemies or evade the barrage of defenses that will be fired at you by them.\n\nAs a relatively new gamer, I'm glad that I played Call of Duty 3&nbsp;<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Call-of-Duty-3/dp/B000GA73O0/ref=cm_cr_arp_d_rvw_txt?ie=UTF8"">Call of Duty 3</a>&nbsp;before I cracked open Resistance: Fall of Man. It was good FPS training (set in approximately the same time period) for this much more difficult PS3 title.\n\nResistance has an M (mature gamers only) rating which miffs me a bit, but then I'm a bit more liberal when it comes to what I think might warp the little minds of kids these days. I've seen this classified as a Horror title, but I think that Sci-fi Action is much more fitting. Resistance deserved a T (teen) rating. My kids and I get hung up all too often on sunny days playing Resistance's Co-op Mode when we should be outside playing ball. 
And hopefully that alone speaks volumes as to how addictively fun this game is.\n\nCo-op Mode is split screen of course because the game is FPS; it would be nice if some games could give the option of switching to a single screen, third person view for Co-op in order to negate the need for split screen, but that would distinctly change to look, range, feel and some of the weaponry...so I assume that's why it's never offered on Co-op mode.\n\nMulti-player is very cool. Multi-player is not like Co-op where you play the Campaign Mode with a partner. In Multi-player you battle against your friends offline (4 player max/split screen) or online (40 player max/full screen). You get some areas of battle to play in that you encountered in Campaign Mode and some new map scematics.\n\nBut Resistance's real challenge is in single player Campaign Mode.\n\nYour weapons are plentiful. You start with a pretty standard machine gun (M5A2) that includes a grenade launcher that will become your best ""little friend"" in the game. It's very useful on the enemies that you encounter most throughout the game. 
You also get several mean weapons that you pick up from your fallen comrades and defeated beasts including the enemy's main machine gun (The Bullseye...very handy), grenades, a shotgun, a sniper rifle, a radiation blaster (The Auger), a mine thrower, a rapid fire subsonic bolt dispenser (The Hailstorm) and a rocket launcher.\n\nGrenades are key to survival...use them, but you'll want to keep in mind that you can run out real real fast...so use them pretty much exclusively to take down groups of enemies.\n\n(And five other weapons aside from the ones that I mentioned above are unlocked only after beating/completing the game once on medium difficulty or higher.)\n\nIf you missed the countless gamer magazines' synopsises of the creature types and the power of the individual weapons, there are some great gamer help web sites that are worth taking a look at to help you strategize (just click on my profile above and email me if you'd like me to point you in the right direction).\n\nAnd speaking of getting pointed in the right direction, part of the fun of this game is figuring out on your own how to get from chapter to chapter. There are no maps, which at times was frustrating. Let me tell you that if the phrase ""the best route is not always the most obvious"" ever applied to trying to get from here to there it applies in this game on more than one occasion.\n\nYou'll also need to at times navigate a jeep and a tank.\n\nSo what exactly are you fighting? Where exactly did they come from? If you're up to the challenge to find out....You're Sgt. Nathan Hale, and you're the only one with a Resistance to the Fall of Man. 
Prepare for some serious battle.","[no, one, can, resist, the, urge, to, keep, playing, resistance, :, fall, of, man, once, the, game, is, started, ., chapter, after, chapter, of, exhilarating, battles, against, a, plethora, of, creatures, that, you, use, a, plethora, of, weapons, to, defeat, (, and, they'll, use, them, against, you, too, ), ., to, really, put, the, plot, of, the, game, in, context, (, if, you, care, about, the, plot, ), ,, it's, well, worth, the, visit, to, the, official, resistance, :, fall, of, man, web, site, ;, there's, a, ton, of, story, there, that, i'd, bet, 90, %, of, the, people, who, ...]","[no, one, can, resist, the, urge, to, keep, playing, resistance, :, fall, of, man, once, the, game, is, started., chapter, after, chapter, of, exhilarating, battles, against, a, plethora, of, creatures, that, you, use, a, plethora, of, weapons, to, defeat, (, and, they, 'll, use, them, against, you, too, ), ., to, really, put, the, plot, of, the, game, in, context, (, if, you, care, about, the, plot, ), ,, it, 's, well, worth, the, visit, to, the, official, resistance, :, fall, of, man, web, site, ;, there, 's, a, ton, of, story, there, that, i, 'd, bet, 90, %, of, ...]"
1352,"This Certified Refurbished product is tested and certified by the manufacturer or by a third-party refurbisher to look and work like new, with limited to no signs of wear.","[this, certified, refurbished, product, is, tested, and, certified, by, the, manufacturer, or, by, a, third-party, refurbisher, to, look, and, work, like, new, ,, with, limited, to, no, signs, of, wear, .]","[this, certified, refurbished, product, is, tested, and, certified, by, the, manufacturer, or, by, a, third-party, refurbisher, to, look, and, work, like, new, ,, with, limited, to, no, signs, of, wear, .]"
3538,Another fun Lego game. A large open world with a lot of familiar characters and places. Extra content is great and will definitely keep you occupied for a while. If you like Lego games and Marvel characters this is for you.,"[another, fun, lego, game, ., a, large, open, world, with, a, lot, of, familiar, characters, and, places, ., extra, content, is, great, and, will, definitely, keep, you, occupied, for, a, while, ., if, you, like, lego, games, and, marvel, characters, this, is, for, you, .]","[another, fun, lego, game., a, large, open, world, with, a, lot, of, familiar, characters, and, places., extra, content, is, great, and, will, definitely, keep, you, occupied, for, a, while., if, you, like, lego, games, and, marvel, characters, this, is, for, you, .]"
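
The first sampled review above contains HTML residue (an `<a ...>` product link and `&nbsp;` entities). The displayed token lists are truncated before that point, but neither tokenizer is designed to remove markup, so it would presumably end up as noise tokens. A hedged sketch of a cleanup step that could run before lowercasing and tokenizing is shown below. It uses only the standard library; `strip_markup` is a hypothetical helper, and the `raw` string is an abbreviated version of the link in that review.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")

def strip_markup(text):
    """Remove HTML tags, decode entities, and collapse whitespace."""
    # Tags are removed before entities are decoded so that escaped angle
    # brackets in the text (e.g. &lt;) are not mistaken for markup.
    no_tags = TAG_RE.sub(" ", text)
    decoded = html.unescape(no_tags)
    # &nbsp; decodes to a non-breaking space, which \s also matches.
    return re.sub(r"\s+", " ", decoded).strip()

raw = ('I played Call of Duty 3&nbsp;<a class="a-link-normal" '
       'href="/Call-of-Duty-3/dp/B000GA73O0">Call of Duty 3</a>&nbsp;'
       'before Resistance.')
cleaned = strip_markup(raw)
print(cleaned)
```

A regex is enough for the simple inline links seen here; heavily nested or malformed HTML would call for a real parser.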


### Removing Stop Words

In [14]:
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/koosha.tahmasebipour/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [15]:
stop_words = nltk.corpus.stopwords.words('english')

In [16]:
stop_words[:10]

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]

In [17]:
len(stop_words)

179

In [25]:
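# Drop stop words and empty tokens; a set makes each membership test O(1) instead of scanning the list
stop_word_set = set(stop_words)
reviews['tokens_nosw'] = reviews['tb_tokens'].apply(lambda words: [w for w in words if w not in stop_word_set and w != ''])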

In [26]:
reviews[['tb_tokens','tokens_nosw']].sample(3)

Unnamed: 0,tb_tokens,tokens_nosw
1931,"[and, it, will, !, first, one, broke, at, the, flex, point, swivel, (, where, the, earcup, attaches, to, the, band, ), ., second, one, just, broke., this, was, after, about, a, year, of, moderate, use., for, this, price, ,, this, thing, should, hold, together, better, .]","[!, first, one, broke, flex, point, swivel, (, earcup, attaches, band, ), ., second, one, broke., year, moderate, use., price, ,, thing, hold, together, better, .]"
3589,"[excellent, game]","[excellent, game]"
1098,"[it, kills, me, to, do, this, considering, i, 've, purchased, every, single, c, &, c, game, that, 's, been, released, to, date., this, is, a, good, game, ,, nope, a, fantastic, game, locked, in, a, bad, package., there, is, no, way, to, justify, the, drm, that, comes, with, ea, 's, recent, pc, releases., please, note, this, is, not, whining., securom, can, seriously, mess, up, your, system, and, uninstalling, the, game, wo, n't, remove, it, either., what, it, really, comes, down, to, is, taking, a, stand., ask, yourself, this, ,, do, you, really, want, to, pay, $, 50, for, a, game, ...]","[kills, considering, 've, purchased, every, single, c, &, c, game, 's, released, date., good, game, ,, nope, fantastic, game, locked, bad, package., way, justify, drm, comes, ea, 's, recent, pc, releases., please, note, whining., securom, seriously, mess, system, uninstalling, game, wo, n't, remove, either., really, comes, taking, stand., ask, ,, really, want, pay, $, 50, game, makes, jump, hoops, anyone, really, wants, get, free, copy, without, much, hassle, ?]"
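
Note what happened to `this, is, not, whining.` in the last sample row: after filtering, only `whining.` is left of that phrase, because `not` is on NLTK's English stop-word list. For sentiment analysis, dropping a negation can flip the meaning of a phrase. One possible adjustment, sketched below, is to exempt negation words from removal. The short `stop_words` list and the `NEGATIONS` set are illustrative assumptions so that the example runs without the NLTK download; in the notebook, the full list from `nltk.corpus.stopwords.words('english')` would take their place.

```python
# Illustrative subset of a stop-word list (assumption, not the full NLTK list).
stop_words = ["this", "is", "a", "the", "not", "no", "it"]

# Hypothetical set of negation words to keep because they carry sentiment.
NEGATIONS = {"not", "no", "nor", "n't"}

# Sets give constant-time membership tests.
stop_words_all = set(stop_words)
stop_words_keep_neg = stop_words_all - NEGATIONS

def remove_stop_words(tokens, stops):
    """Return tokens that are non-empty and not in the stop-word collection."""
    return [w for w in tokens if w and w not in stops]

tokens = ["this", "is", "not", "whining", "."]
without_negation = remove_stop_words(tokens, stop_words_all)
with_negation = remove_stop_words(tokens, stop_words_keep_neg)

print(without_negation)  # negation dropped
print(with_negation)     # negation kept
```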


### Stemming

In [20]:
from nltk.stem.porter import PorterStemmer

In [21]:
stemmer = PorterStemmer()

In [27]:
reviews['tokens_stemmed'] = reviews['tokens_nosw'].apply(lambda words: [stemmer.stem(w) for w in words])

In [28]:
reviews[['tokens_nosw','tokens_stemmed']].sample(3)

Unnamed: 0,tokens_nosw,tokens_stemmed
2618,"[great, less, expensive, xbox, one, windows, pc, controller., buttons, ,, sticks, bumpers, almost, exactly, size, place, controller, official, microsoft, controller., 's, solidly, built, controller, easy, hold, weight, microsoft, controller., 8, foot, usb, cable, standard, usb, mini, connector, disconnected, ,, lose, cable, need, longer/shorter, one, reason, ,, substitute, another, cheap, usb, cable., use, controller, without, cable, though, (, 's, wireless, ), ., connect, headset, 3.5mm, jack, controller., controller, also, includes, extra, ``, audio, '', button, right, stick, use, control, game, volume, ,, game/chat, balance, ,, mute, microphone., drawback, find, controller, (, wireless, ,, 's, deal, breaker, ), ,, buttons, bit, harder, ...]","[great, less, expens, xbox, one, window, pc, controller., button, ,, stick, bumper, almost, exactli, size, place, control, offici, microsoft, controller., 's, solidli, built, control, easi, hold, weight, microsoft, controller., 8, foot, usb, cabl, standard, usb, mini, connector, disconnect, ,, lose, cabl, need, longer/short, one, reason, ,, substitut, anoth, cheap, usb, cable., use, control, without, cabl, though, (, 's, wireless, ), ., connect, headset, 3.5mm, jack, controller., control, also, includ, extra, ``, audio, '', button, right, stick, use, control, game, volum, ,, game/chat, balanc, ,, mute, microphone., drawback, find, control, (, wireless, ,, 's, deal, breaker, ), ,, button, bit, harder, ...]"
805,"[played, tropico, 3, mac, ,, tried, ps4, disappointing., tutorial, hard, ,, plus, tv, 40, inches, ,, big, enough, game, ton, text, like, one., honestly, ,, wish, bought, computer, version, .]","[play, tropico, 3, mac, ,, tri, ps4, disappointing., tutori, hard, ,, plu, tv, 40, inch, ,, big, enough, game, ton, text, like, one., honestli, ,, wish, bought, comput, version, .]"
669,"[game, ,, nasty, !, everything, sick, gory, !, buy, !, hate, !, hate, !]","[game, ,, nasti, !, everyth, sick, gori, !, buy, !, hate, !, hate, !]"
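
The stemmed sample shows a side effect of the earlier tokenization: `controller` is stemmed to `control`, but `controller.` (with the period still attached) passes through unchanged, so the same word ends up in two different forms. One workaround, sketched here under the assumption that the tokens come from the whole-review Treebank pass, is to trim a trailing period from word tokens before stemming. `strip_trailing_period` is a hypothetical helper, not part of NLTK.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

def strip_trailing_period(token):
    """Trim periods glued to the end of a word; leave pure-punctuation tokens alone."""
    stripped = token.rstrip(".")
    # If nothing is left (e.g. "." or "..."), keep the original token.
    return stripped or token

tokens = ["controller.", "controller", "cable.", "expensive", "."]
stems = [stemmer.stem(strip_trailing_period(t)) for t in tokens]
print(stems)
```

Stems such as `expens` are not dictionary words, so a sentiment lexicon used later would need to be stemmed the same way (or matched against unstemmed tokens) for lookups to succeed.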
