# Creating a Dictionary-based Sentiment Analyzer

In [17]:
import pandas as pd
import nltk
from IPython.display import display
pd.set_option('display.max_columns', None)

### Step 1: Loading the small_corpus.csv file created in the "creating_dataset" milestone

In [4]:
reviews = pd.read_csv("../data/small_corpus.csv")

In [5]:
reviews.head()

,overall,verified,reviewTime,reviewerID,asin,reviewerName,reviewText,summary,unixReviewTime,vote,style,image
0,1.0,True,"11 30, 2015",A3AC92K59QLYR8,B00503E8S2,ben,Game freezes over and over its unplayable,it just doesn't work,1448841600,,{'Format:': ' Video Game'},
1,1.0,False,"05 19, 2012",A334LHR8DWARY8,B00178630A,Xenocide,I have no problem with needing to be online to...,The only real way to show Blizzard our feeling...,1337385600,23.0,{'Format:': ' Computer Game'},
2,1.0,True,"10 19, 2014",A28982ODE7ZGVP,B001AWIP7M,Eric Frykberg,NOT GOOD,One Star,1413676800,,{'Format:': ' Video Game'},
3,1.0,True,"09 6, 2015",A19E85RLQCAMI1,B00NASF4MS,Joe,Really not worth the money to buy this game on...,Really not worth the money to buy this game on...,1441497600,2.0,{'Format:': ' Video Game'},
4,1.0,False,"05 28, 2008",AEMQKS13WC4D2,B00140P9BA,Craig,They need to eliminate the Securom. I purchase...,Securom can ruin a great game,1211932800,55.0,{'Format:': ' DVD-ROM'},


### Step 2: Tokenizing the sentences and words of the reviews
Here, we try different word tokenizers on the reviews and then decide which one is the better fit for this corpus.

### Treebank Word Tokenizer

In [10]:
from nltk.tokenize import TreebankWordTokenizer

In [11]:
tb_tokenizer = TreebankWordTokenizer()

In [47]:
reviews["rev_text_lower"] = reviews['reviewText'].apply(lambda rev: str(rev).lower())

In [48]:
reviews[['reviewText','rev_text_lower']].sample(2)

,reviewText,rev_text_lower
2647,I like this although the 3DS wobbles slightly. Doesn't fit as snugly while docked as I'd like.,i like this although the 3ds wobbles slightly. doesn't fit as snugly while docked as i'd like.
4362,"It's like Ms. Pac Man, but for guys","it's like ms. pac man, but for guys"
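
One caveat with the `str(rev).lower()` call above: `str()` turns a missing value into the literal text `nan`, which would then be tokenized like a real word. Whether `small_corpus.csv` contains empty reviews is not shown here, so the following is only a sketch on a made-up Series (`toy_reviews` is hypothetical) of one possible alternative: filling missing values with an empty string before lowercasing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for reviews['reviewText']; the second entry is missing.
toy_reviews = pd.Series(["NOT GOOD", np.nan, "Very Good"])

# The notebook's approach: str() converts NaN to the string "nan".
lowered_str = toy_reviews.apply(lambda rev: str(rev).lower())

# Alternative (an assumption, not the notebook's method):
# replace missing reviews with an empty string, then lowercase.
lowered_filled = toy_reviews.fillna("").str.lower()

print(lowered_str.tolist())
print(lowered_filled.tolist())
```

With the second variant, an empty review simply yields an empty token list instead of the spurious token `nan`.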


In [49]:
reviews["tb_tokens"] = reviews['rev_text_lower'].apply(lambda rev: tb_tokenizer.tokenize(str(rev)))

In [50]:
pd.set_option('display.max_colwidth', None)

In [51]:
reviews[['reviewText','tb_tokens']].sample(3)

,reviewText,tb_tokens
4103,Oh man what a classic,"[oh, man, what, a, classic]"
3293,My friend was thrilled when he got this! it came in fast and complete!,"[my, friend, was, thrilled, when, he, got, this, !, it, came, in, fast, and, complete, !]"
4459,very good,"[very, good]"


### Casual Tokenizer

In [24]:
from nltk.tokenize.casual import casual_tokenize

In [56]:
reviews['casual_tokens'] = reviews['rev_text_lower'].apply(lambda rev: casual_tokenize(str(rev)))

In [57]:
reviews[['reviewText','casual_tokens','tb_tokens']].sample(3)

,reviewText,casual_tokens,tb_tokens
1328,"The games are NEVER balanced!! You get a bunch of losers on one team and a bunch of noobs on another team and the game expects you to have fun? Really? They use the worst algorithm to set up games I have ever seen. Also, since when was it ok to respawn people behind you constantly? Did the people who create this game even try it out before they released it? This game has ruined call of duty for me. I will never play again.","[the, games, are, never, balanced, !, !, you, get, a, bunch, of, losers, on, one, team, and, a, bunch, of, noobs, on, another, team, and, the, game, expects, you, to, have, fun, ?, really, ?, they, use, the, worst, algorithm, to, set, up, games, i, have, ever, seen, ., also, ,, since, when, was, it, ok, to, respawn, people, behind, you, constantly, ?, did, the, people, who, create, this, game, even, try, it, out, before, they, released, it, ?, this, game, has, ruined, call, of, duty, for, me, ., i, will, never, play, again, .]","[the, games, are, never, balanced, !, !, you, get, a, bunch, of, losers, on, one, team, and, a, bunch, of, noobs, on, another, team, and, the, game, expects, you, to, have, fun, ?, really, ?, they, use, the, worst, algorithm, to, set, up, games, i, have, ever, seen., also, ,, since, when, was, it, ok, to, respawn, people, behind, you, constantly, ?, did, the, people, who, create, this, game, even, try, it, out, before, they, released, it, ?, this, game, has, ruined, call, of, duty, for, me., i, will, never, play, again, .]"
4088,I really enjoyed the plot of this game. Homage to Agatha Christie and very intricate twist that fooled me up until the end. Starts off kind of slow but gets very interesting and involved during the second act. Loved the atmosphere and story.,"[i, really, enjoyed, the, plot, of, this, game, ., homage, to, agatha, christie, and, very, intricate, twist, that, fooled, me, up, until, the, end, ., starts, off, kind, of, slow, but, gets, very, interesting, and, involved, during, the, second, act, ., loved, the, atmosphere, and, story, .]","[i, really, enjoyed, the, plot, of, this, game., homage, to, agatha, christie, and, very, intricate, twist, that, fooled, me, up, until, the, end., starts, off, kind, of, slow, but, gets, very, interesting, and, involved, during, the, second, act., loved, the, atmosphere, and, story, .]"
1674,"By far the greatest disappointment in gaming since Duke Nukem Forever came out. This game was over 11 years in the making and when it finally was released it was filled with bugs, exploits, server issues, and no single player offline mode. You are required to connect to the servers in order to even play a game by yourself, the reasoning behind this is that it prevents people from making illegal copies and/or duping or hacking the game. Fine, but what it also does is limit your ability to play a product you paid for, you cannot play if you have no internet connection, if the servers are down or out, or your connection is not ""stable"" (NZ and Australian players know this fact!). Why not add a single player mode only that cannot then be transferred online for play for those who wish to enjoy the game when the internet is not an option or the servers are out?\nAside from the glaring requirement of being connected in order to play there are the issues with the game play itself. Things like ""rubberbanding"" and latency cause your character to just suddenly die, issues further compounded by insane graphical requirements even at the lowest of settings. Systems that run over 1 GB of video memory still experience framerate stutter and tearing on a game with graphics less intense than something made 5 years ago. This is owed to the MPQ files that are used as a resource to render the world, these objects are stored on the Hard Drive and must be accessed constantly to generate the terrain. This means that they are subject to read time delays, even fast 10,000 RPM disks are going to struggle to keep up. The best solution has been to store these files on a SSD separately, which is a ridiculous fix that should not need to be implemented simply to make the game run smoothly.\nThe game itself is a progression, like Diablo 2 or Dungeon Siege 2, beating certain difficulties unlocks more challenging ones. 
However the story is exactly the same, the enemies simply get harder to kill. This means there is no real end game. For those that have ""seen it all"" and ""done it all"", such as beating the hardest difficulty or mastering the ""Hardcore Mode"" (another failed attempt to challenge players, what good is a 1 death system when things like lag or framerate issues can kill you as surely as any enemy) this means the game has no replay value whatsoever. There is nothing to do but wait for more content to be released. A ladder system or PvP such as was introduced in D2 would alleviate this, but Blizzard prefers to invest time in their cash cow, the Real Money Auction House (RMAH).\nWhich brings me to my final point, the game is custom designed to make good items hard to find so that players will be forced to resort to buying the items they need to proceed. It is in Blizzard's best interests to make good items hard to get, because if everyone can get them on their own no one will sell them in the RMAH and Blizzard will not receive its commissions. The economy of the game is of course broken because the people with all the good items are the botters, hackers, exploiters, and cheaters who took advantage of loopholes and continue to take advantage of the game in order to profit. These people, for the most part, work out places like Korea and China, dedicating untold hours to grinding the game in order to gather their loot which they then sell at premium prices to those too ignorant or desperate to know better than to buy them.\nThis game was fundamentally flawed in the way it was developed, released, and is now currently being managed. Blizzard needs to stop the RMAH and increase the drop rates of quality items across the board. This is a game people play for fun in order to challenge themselves to get better at it, not a money making endeavor that is meant to line the pockets of the company that made it. 
For these reasons I have stopped playing the game, since I received my copy for free when I purchased a yearly subscription to World of Warcraft I am one of the few fortunate souls who was not robbed for 60 dollars for a failure of a product.\nRe-visit in 2015: The game has not become much more fun in these intervening years. Playing infrequently I have had really no expectations that the game would get much better, even with the release of the Expansion Reaper of Souls. I will admit that the loss of the RMAH and inclusion of new ""Seasonal"" play has helped dramatically. However the game itself has not gotten any ""funner"". Patches have added new tweaks and items, but overall the game has remained an incessant grind fest. I own the Console version and find it superior in every way, it is still not an exciting game to play for hours on end on a console...but you feel more rewarded by the drop rates and game mechanics like rolling and kill streak rewards. Add into that an up to 4 player on screen co-op that can be 3 other people in the same room as you (provided you have the controllers and people willing to play) and the console version just shines. 
I like nothing more than playing co-op on the Xbox One version with my neice and nephew who both like the game, but don't find it particularly enjoyable.","[by, far, the, greatest, disappointment, in, gaming, since, duke, nukem, forever, came, out, ., this, game, was, over, 11, years, in, the, making, and, when, it, finally, was, released, it, was, filled, with, bugs, ,, exploits, ,, server, issues, ,, and, no, single, player, offline, mode, ., you, are, required, to, connect, to, the, servers, in, order, to, even, play, a, game, by, yourself, ,, the, reasoning, behind, this, is, that, it, prevents, people, from, making, illegal, copies, and, /, or, duping, or, hacking, the, game, ., fine, ,, but, what, it, also, does, is, limit, your, ability, to, play, ...]","[by, far, the, greatest, disappointment, in, gaming, since, duke, nukem, forever, came, out., this, game, was, over, 11, years, in, the, making, and, when, it, finally, was, released, it, was, filled, with, bugs, ,, exploits, ,, server, issues, ,, and, no, single, player, offline, mode., you, are, required, to, connect, to, the, servers, in, order, to, even, play, a, game, by, yourself, ,, the, reasoning, behind, this, is, that, it, prevents, people, from, making, illegal, copies, and/or, duping, or, hacking, the, game., fine, ,, but, what, it, also, does, is, limit, your, ability, to, play, a, product, you, paid, for, ...]"
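
The sampled rows above show where the two tokenizers differ: the Treebank tokenizer leaves sentence-internal periods attached to the preceding word (`seen.`, `me.`) and keeps `and/or` whole, while the casual tokenizer splits them apart. A minimal sketch on a single made-up sentence (`sample` is hypothetical, not taken from the corpus) makes the contrast easier to inspect, including how each tokenizer treats a contraction:

```python
from nltk.tokenize import TreebankWordTokenizer
from nltk.tokenize.casual import casual_tokenize

# Hypothetical two-sentence review, already lowercased like rev_text_lower.
sample = "the game froze. i can't play it."

# Treebank: only the final period is split off; "can't" becomes "ca" + "n't".
tb_tokens = TreebankWordTokenizer().tokenize(sample)

# Casual: every period is its own token; "can't" stays in one piece.
casual_tokens = casual_tokenize(sample)

print(tb_tokens)
print(casual_tokens)
```

The Treebank tokenizer is designed for text that has already been split into sentences, which is why it only separates the last period. Splitting each review into sentences first would be one way to work around that; this notebook does not do so.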


### Removing StopWords

In [32]:
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/koosha.tahmasebipour/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [35]:
stop_words = nltk.corpus.stopwords.words('english')

In [37]:
stop_words[:10]

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]

In [38]:
len(stop_words)

179

In [54]:
reviews['tokens_nosw'] = reviews['tb_tokens'].apply(lambda words: [w for w in words if w not in stop_words])

In [55]:
reviews[['tb_tokens','tokens_nosw']].sample(3)

,tb_tokens,tokens_nosw
2533,"[i, got, this, game, with, a, gift, card, from, earning, employee, of, the, month., every, free, game, is, a, good, game, .]","[got, game, gift, card, earning, employee, month., every, free, game, good, game, .]"
622,"[this, game, is, more, based, on, the, book, then, any, other, lotr, game., but, the, battles, are, terribly, slow, ,, some, items, do, n't, work, ,, things, that, the, game, promised, are, taken, out, ,, the, graphics, are, awful, ,, and, there, are, some, terrible, glitches, in, the, game, ,, especially, the, moria, glitch, which, made, me, go, out, buy, another, one, (, i, thought, the, first, one, was, a, defect, ), and, go, through, the, same, glitch, again., even, if, you, are, a, huge, lotr, fan, you, should, probably, stay, away, from, this, game, .]","[game, based, book, lotr, game., battles, terribly, slow, ,, items, n't, work, ,, things, game, promised, taken, ,, graphics, awful, ,, terrible, glitches, game, ,, especially, moria, glitch, made, go, buy, another, one, (, thought, first, one, defect, ), go, glitch, again., even, huge, lotr, fan, probably, stay, away, game, .]"
3474,"[now, i, can, plug, my, nintendo, to, the, flat, screen]","[plug, nintendo, flat, screen]"
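
Two details of the filtering step are worth a sketch. First, `stop_words` is a list, so every `w not in stop_words` check scans it from the start; converting it to a set gives the same result with faster lookups. Second, the filter removes exact matches only, which is why tokens such as `month.`, `game.` and `n't` survive in the output above. The example below uses a small hand-written stop-word set (`toy_stop_words`, an assumption standing in for NLTK's English list so the snippet runs without any download) and a made-up token list in the Treebank style:

```python
# Hypothetical subset of stop words; the notebook uses NLTK's full English list.
toy_stop_words = {"i", "this", "for", "and", "it", "do", "not"}

# Made-up tokens shaped like Treebank output: note "it." and "n't".
tokens = ["i", "got", "this", "game", "for", "free", "and", "loved", "it.",
          "do", "n't", "skip", "it", "."]

# Same filter as in the notebook, but membership is checked against a set.
tokens_nosw = [w for w in tokens if w not in toy_stop_words]

print(tokens_nosw)
```

Here `it` is removed but `it.` is kept, because the attached period makes it a different string, and `n't` is kept because only `not` is in the set. In the notebook itself, the equivalent change would be to build `set(stop_words)` once and filter against that.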
