# Creating a Dictionary-based Sentiment Analyzer

In [1]:
import pandas as pd
import nltk
from IPython.display import display
pd.set_option('display.max_columns', None)

### Step 1: Loading in the small_corpus .csv file created in the "creating_dataset" milestone.

In [2]:
reviews = pd.read_csv("small_corpus.csv")

In [3]:
reviews.head()

Unnamed: 0,overall,verified,reviewTime,reviewerID,asin,reviewerName,reviewText,summary,unixReviewTime,vote,style,image
0,1.0,False,"02 27, 2002",AAQMWWN5UDJM5,B00005N9CQ,Zorin,This game [stinks]. Don't buy it. The graphi...,Complete wate of time,1014768000,,,
1,1.0,True,"08 28, 2015",A350SO127HDQSQ,B00O65I2VY,Derrick Black,"Doesn't work when connected to PC, as simple a...",Buyer Beware,1440720000,,,
2,1.0,False,"06 26, 2009",A3ERSDUGJX7BGC,B0015RWPWS,PM,I have tried repeatedly to use this in my bala...,Didn't work for me.,1245974400,,,
3,1.0,False,"12 22, 2010",A1PBW0R798CBKR,B002DC8GT0,24joshua15,I know there are some out there who enjoy this...,Bad Advertisement is the Culprit,1292976000,17.0,,
4,1.0,True,"11 20, 2013",A2VKK5PI79ML1J,B0016NRS8M,Jimmy Hitchcock,This game is completely terrible. There are no...,South peak has done it again...,1384905600,2.0,{'Platform:': ' Xbox 360'},


### Step 2: Tokenizing the sentences and words of the reviews
Here we test two word tokenizers on the reviews and then decide which one is better suited to the task.

### Treebank Word Tokenizer

In [4]:
from nltk.tokenize import TreebankWordTokenizer
from string import punctuation

In [5]:
tb_tokenizer = TreebankWordTokenizer()

In [6]:
# Replace HTML line breaks before stripping punctuation; afterwards "<br />" would no longer match
reviews["rev_text_lower"] = reviews['reviewText'].apply(lambda rev: str(rev)\
                                                        .replace("<br />", " ")\
                                                        .translate(str.maketrans('', '', punctuation))\
                                                        .lower())
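
The order of the cleaning steps matters. The sketch below uses a made-up review string to show that stripping punctuation before replacing `<br />` leaves a stray `br` glued to the previous word, while replacing the tag first gives clean text. The sample text and variable names are illustrative only.

```python
from string import punctuation

# Made-up review text containing an HTML line break (illustrative only)
raw = "Great game.<br />Would buy again!"

# Translation table that deletes every ASCII punctuation character
strip_punct = str.maketrans('', '', punctuation)

# Punctuation first: "<br />" loses its brackets and slash, so replace() finds nothing
punct_first = raw.translate(strip_punct).replace("<br />", " ").lower()

# Tag first: the line break becomes a space before punctuation is stripped
tag_first = raw.replace("<br />", " ").translate(strip_punct).lower()

print(punct_first)  # great gamebr would buy again
print(tag_first)    # great game would buy again
```

Note that deleting punctuation outright also joins words separated only by punctuation (for example `game...The` becomes `gamethe`), which is visible in some of the sampled rows.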

In [7]:
reviews[['reviewText','rev_text_lower']].sample(2)

Unnamed: 0,reviewText,rev_text_lower
1956,This was a dumb game. Did not like it at all.....,this was a dumb game did not like it at allsav...
235,did not meet expectations,did not meet expectations


In [8]:
reviews["tb_tokens"] = reviews['rev_text_lower'].apply(lambda rev: tb_tokenizer.tokenize(str(rev)))

In [9]:
pd.set_option('display.max_colwidth', None)

In [10]:
reviews[['reviewText','tb_tokens']].sample(3)

Unnamed: 0,reviewText,tb_tokens
3310,"There are a lot of reviews here, so I'll just hit a few highs, plus a low or two. This refers to the PS3 version.\n\nI've played a lot of shooter and survival games, but Dead Space adds some great twists that make it fresh and exciting.\n\nFirst, fighting tactics. As others have said, you don't go for the usual head or torso shots, you go for the limbs. This is great fun.\n\nSecond, the graphics. Absolutely gorgeous and faultlessly detailed. No render errors, no load snags, no frame rate problems. None. And no constant reuse of pipes, walls, etc. except where it's appropriate for them to look the same as another spot.\n\nThird, environment and sound. You're kept on your toes by the creepy atmosphere and the sound is fantastic. Distant clangs, soft nearby noises and other sounds constantly keep you wondering if there's a baddie behind the next corner.\n\nFourth, realism. There are many zero G areas where you have to leap from surface to surface (your boots are magnetic). These are challenging but not annoying and add a nice twist. There are also areas where there is no atmosphere and you have to watch your air supply closely.\n\nFifth, perspective. For the first time, your health and your suit's special power meters are on the back of your suit itself, not in a corner of the screen. The over-the-shoulder perspective is nicely judged - you're not so close to yourself that you can't around, but far enough back to get a good, wide view. Your body does block a part of the left half of the screen, but I've never found this a problem.\n\nThe not so good? Mainly two things that are far from a killjoy but could have been better.\n\nOne, the on-screen displays when you view your ""rig"" inventory, the map, and text logs you collect have absolutely puny text. 
They're almost impossible to read on a 34-inch conventional TV using the 3-prong high-quality video (not hi-def).\n\nTwo, you upgrade your weapons and suit powers at ""benches"" using ""nodes"" you collect along the way. Both the nodes and benches are few and far between, so you don't get to upgrade as much as you'd like.\n\nThat's it. Dead Space is simply the best game I've played on the PS3, and possibly the best ever. It's richly detailed in every possible respect, brings fresh ideas to the 3rd-person shooter experience, and has no functional flaws and no real controller-throwing frustrations.\n\nAbsolutely highly recommended!","[there, are, a, lot, of, reviews, here, so, ill, just, hit, a, few, highs, plus, a, low, or, two, this, refers, to, the, ps3, version, ive, played, a, lot, of, shooter, and, survival, games, but, dead, space, adds, some, great, twists, that, make, it, fresh, and, exciting, first, fighting, tactics, as, others, have, said, you, dont, go, for, the, usual, head, or, torso, shots, you, go, for, the, limbs, this, is, great, fun, second, the, graphics, absolutely, gorgeous, and, faultlessly, detailed, no, render, errors, no, load, snags, no, frame, rate, problems, none, and, no, constant, reuse, of, pipes, walls, etc, ...]"
2676,"does exactly what its supposed too, and cheaper then getting a wired controller to use for your pc games that want a 360 controller. (like all of the lego games)","[does, exactly, what, its, supposed, too, and, cheaper, then, getting, a, wired, controller, to, use, for, your, pc, games, that, want, a, 360, controller, like, all, of, the, lego, games]"
193,"This might be a great game, no way to tell. Cant get it to run. Error 12. Did as the website said, NO JOY. Submitted a ticket, 1 to 2 days response time. Forget it. Sending it back. Dont waste your time\nAmazon gave a full refund they are the best:-)","[this, might, be, a, great, game, no, way, to, tell, cant, get, it, to, run, error, 12, did, as, the, website, said, no, joy, submitted, a, ticket, 1, to, 2, days, response, time, forget, it, sending, it, back, dont, waste, your, time, amazon, gave, a, full, refund, they, are, the, best]"


### Casual Tokenizer

In [11]:
from nltk.tokenize.casual import casual_tokenize

In [12]:
reviews['casual_tokens'] = reviews['rev_text_lower'].apply(lambda rev: casual_tokenize(str(rev)))

In [13]:
reviews[['reviewText','casual_tokens','tb_tokens']].sample(3)

Unnamed: 0,reviewText,casual_tokens,tb_tokens
4215,"Fast shipment, once we received our product, we inserted it into our N64 and were playing Mario Cart in no time!","[fast, shipment, once, we, received, our, product, we, inserted, it, into, our, n64, and, were, playing, mario, cart, in, no, time]","[fast, shipment, once, we, received, our, product, we, inserted, it, into, our, n64, and, were, playing, mario, cart, in, no, time]"
2914,I really enjoyed playing this game...The graphics and the music is really cool...there are a lot of different things that you can do on this game...play this game for sure.,"[i, really, enjoyed, playing, this, gamethe, graphics, and, the, music, is, really, coolthere, are, a, lot, of, different, things, that, you, can, do, on, this, gameplay, this, game, for, sure]","[i, really, enjoyed, playing, this, gamethe, graphics, and, the, music, is, really, coolthere, are, a, lot, of, different, things, that, you, can, do, on, this, gameplay, this, game, for, sure]"
4425,OF COURSE I HAVE THIS WHOLE SERIES... I PLAYED MY FIRST COPY OF GEARS 2 SO MUCH THAT THE DISK STOPPED BEING ABLE TO BE READ WHICH IS THE REASON WHY I BOUGHT THIS COPY... AS SOON AS I CAN DRAG MYSELF AWAY FROM PART 3 I WILL DEFINITELY GET BACK INTO PLAYING PART 2,"[of, course, i, have, this, whole, series, i, played, my, first, copy, of, gears, 2, so, much, that, the, disk, stopped, being, able, to, be, read, which, is, the, reason, why, i, bought, this, copy, as, soon, as, i, can, drag, myself, away, from, part, 3, i, will, definitely, get, back, into, playing, part, 2]","[of, course, i, have, this, whole, series, i, played, my, first, copy, of, gears, 2, so, much, that, the, disk, stopped, being, able, to, be, read, which, is, the, reason, why, i, bought, this, copy, as, soon, as, i, can, drag, myself, away, from, part, 3, i, will, definitely, get, back, into, playing, part, 2]"


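Casual tokenization is designed for informal text: it keeps elements such as emoticons, hashtags, and mentions intact, which makes it better suited to social media content. In the samples above the two tokenizers produce identical tokens, because the text was lowercased and stripped of punctuation before tokenizing.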


### Removing StopWords
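Stop-word removal has been disabled because it tends to hurt sentiment analysis: standard stop-word lists include negations such as "no" and "not", which carry sentiment. The original cells are kept below, commented out, for reference.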
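
A minimal sketch of the problem, using a hypothetical five-word review and a tiny hand-written stop-word set (not NLTK's actual list):

```python
# Hypothetical mini stop-word list; real lists (e.g. NLTK's English list) are much longer
stop_words = {"the", "is", "was", "this", "a", "no", "not"}

def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that are not in the stop-word set."""
    return [t for t in tokens if t not in stop_words]

tokens = "this game was not good".split()

# With negations in the list, the negative review reads as positive
print(remove_stop_words(tokens, stop_words))  # ['game', 'good']

# Keeping "no" and "not" preserves the negation
keep_negations = stop_words - {"no", "not"}
print(remove_stop_words(tokens, keep_negations))  # ['game', 'not', 'good']
```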

In [14]:
#nltk.download('stopwords')

In [15]:
#stop_words = nltk.corpus.stopwords.words('english')

In [16]:
#stop_words.remove("no")

In [17]:
#stop_words.remove("not")

In [18]:
#print(stop_words)

In [19]:
#"not" in stop_words

In [20]:
#len(stop_words)

In [21]:
#from string import punctuation
#print(punctuation)

In [22]:
#reviews['tokens_nosw'] = reviews['tb_tokens'].\
#    apply(lambda words: [w for w in words if w not in stop_words and w not in punctuation and w != ""])

In [23]:
#reviews[['tb_tokens','tokens_nosw']].sample(3)

### Stemming

Stemming reduces words to a common stem by stripping suffixes.
This groups different forms of a word (e.g., "playing", "played", "plays") under a common base ("play").
Here, the PorterStemmer is used to stem each token in the reviews,
which can improve generalization in text analysis by treating related words as the same.
Note that a stem is not always a real word: in the output below, "services" becomes "servic" and "does" becomes "doe".


In [24]:
from nltk.stem.porter import PorterStemmer

In [25]:
stemmer = PorterStemmer()

In [26]:
reviews['tokens_stemmed'] = reviews['tb_tokens'].apply(lambda words: [stemmer.stem(w) for w in words])

In [27]:
reviews[['tb_tokens','tokens_stemmed']].sample(3)

Unnamed: 0,tb_tokens,tokens_stemmed
3325,"[does, the, job]","[doe, the, job]"
3182,"[great, fast, services]","[great, fast, servic]"
1747,"[i, initially, had, high, hopes, for, this, game, but, after, playing, through, the, single, player, mode, which, is, tiny, and, playing, the, online, game, for, a, while, this, game, just, isnt, up, to, the, standards, halo, 2, created, although, i, loved, halo, 2, i, grew, weary, of, the, cheaters, and, lack, of, maps, that, it, continues, to, offer, at, least, until, it, is, patched, project, snowblinds, online, component, is, a, mix, between, halo2, and, rainbow, six, 3, black, arrow, the, graphics, are, similar, to, rainbow, six, and, so, is, the, set, up, so, if, you, liked, it, then, you, would, ...]","[i, initi, had, high, hope, for, thi, game, but, after, play, through, the, singl, player, mode, which, is, tini, and, play, the, onlin, game, for, a, while, thi, game, just, isnt, up, to, the, standard, halo, 2, creat, although, i, love, halo, 2, i, grew, weari, of, the, cheater, and, lack, of, map, that, it, continu, to, offer, at, least, until, it, is, patch, project, snowblind, onlin, compon, is, a, mix, between, halo2, and, rainbow, six, 3, black, arrow, the, graphic, are, similar, to, rainbow, six, and, so, is, the, set, up, so, if, you, like, it, then, you, would, ...]"


### Lemmatization

Lemmatization maps each word to its dictionary base form (lemma) using its part of speech (noun, verb, adjective, adverb), which usually gives more accurate results than stemming. The code below tags each token, converts the Penn Treebank tag to a WordNet tag, and then looks up the lemma. Tokens whose tag is not a noun, verb, adjective, or adverb are dropped, so the ‘lemmas’ column holds only content words in their base form.
 

In [28]:
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn
from nltk import sent_tokenize, word_tokenize, pos_tag

In [29]:
def penn_to_wn(tag):
    """Convert a Penn Treebank POS tag to a WordNet POS tag (None if unmapped)."""
    if tag.startswith('J'):
        return wn.ADJ
    elif tag.startswith('N'):
        return wn.NOUN
    elif tag.startswith('R'):
        return wn.ADV
    elif tag.startswith('V'):
        return wn.VERB
    return None
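
To see which tokens survive this mapping, here is a self-contained sketch of the same logic. It uses the one-letter strings that `wn.ADJ`, `wn.NOUN`, `wn.ADV`, and `wn.VERB` stand for, so it runs without NLTK. The function name `penn_to_wn_letter` and the hand-written tags are assumptions for illustration only.

```python
def penn_to_wn_letter(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None if unmapped."""
    # First letter of the Penn tag -> WordNet POS letter
    # ('a', 'n', 'r', 'v' are the values of wn.ADJ, wn.NOUN, wn.ADV, wn.VERB)
    mapping = {'J': 'a', 'N': 'n', 'R': 'r', 'V': 'v'}
    return mapping.get(tag[:1])

# Hand-written (token, Penn tag) pairs, not produced by a tagger
tagged = [("the", "DT"), ("game", "NN"), ("was", "VBD"),
          ("really", "RB"), ("boring", "JJ"), ("for", "IN")]

for token, tag in tagged:
    print(token, tag, penn_to_wn_letter(tag))
```

Determiners and prepositions map to `None`, which is why words such as "the" and "for" do not appear in the lemma lists.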

In [30]:
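lemmatizer = WordNetLemmatizer()
def get_lemas(tokens):
    """Return the lemmas of tokens that are nouns, verbs, adjectives, or adverbs."""
    lemmas = []
    for token in tokens:
        # Each token is tagged on its own, so the tagger sees no sentence context
        pos = penn_to_wn(pos_tag([token])[0][1])
        if pos:
            lemma = lemmatizer.lemmatize(token, pos)
            if lemma:
                lemmas.append(lemma)
        # Tokens with any other part of speech are dropped
    return lemmas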

In [31]:
reviews['lemmas'] = reviews['tb_tokens'].apply(lambda tokens: get_lemas(tokens))

In [32]:
reviews[['reviewText','tokens_stemmed','lemmas']].sample(2)

Unnamed: 0,reviewText,tokens_stemmed,lemmas
411,weast,[weast],[weast]
562,I don't own the game but with all these negative reviews just might as well rate it 1 star,"[i, dont, own, the, game, but, with, all, these, neg, review, just, might, as, well, rate, it, 1, star]","[i, dont, own, game, negative, review, just, well, rate, star]"


### Sentiment Predictor Baseline Model

In [33]:
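def get_sentiment_score(tokens):
    """Sum SentiWordNet (positive - negative) scores over the tokens."""
    score = 0
    tags = pos_tag(tokens)
    for word, tag in tags:
        wn_tag = penn_to_wn(tag)
        if not wn_tag:
            # Skip words that are not nouns, verbs, adjectives, or adverbs
            continue
        synsets = wn.synsets(word, pos=wn_tag)
        if not synsets:
            # Skip words that WordNet does not know
            continue

        # Use the first synset, which WordNet lists as the most common sense
        synset = synsets[0]
        swn_synset = swn.senti_synset(synset.name())

        score += (swn_synset.pos_score() - swn_synset.neg_score())

    return score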
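
The scoring idea can be sketched without SentiWordNet by using a tiny hand-written lexicon. This is an illustration under stated assumptions, not the notebook's model: the function name `toy_sentiment_score` and the lexicon are hypothetical, and only the 0.625 positive score for "perfect" comes from the notebook output.

```python
# Hypothetical mini lexicon: word -> (positive score, negative score).
# Only the 0.625 positive score for "perfect" comes from the notebook output;
# every other number is made up for illustration.
lexicon = {
    "perfect": (0.625, 0.0),
    "good": (0.75, 0.0),
    "boring": (0.0, 0.5),
}

def toy_sentiment_score(tokens, lexicon):
    """Sum (positive - negative) over the tokens found in the lexicon."""
    score = 0.0
    for token in tokens:
        if token not in lexicon:
            continue  # unknown words contribute nothing
        pos, neg = lexicon[token]
        score += pos - neg
    return score

print(toy_sentiment_score(["perfect", "game"], lexicon))        # 0.625
print(toy_sentiment_score(["good", "but", "boring"], lexicon))  # 0.25
print(toy_sentiment_score(["not", "good"], lexicon))            # 0.75
```

The last call shows a known limit of word-by-word scoring: "not good" scores the same as "good", because negation is ignored.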
                    

In [35]:
import nltk
nltk.download('sentiwordnet')


[nltk_data] Downloading package sentiwordnet to
[nltk_data]     C:\Users\wama\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\sentiwordnet.zip.


True

In [36]:
## test
swn.senti_synset(wn.synsets("perfect", wn.ADJ)[0].name()).pos_score()

0.625

In [37]:
reviews['sentiment_score'] = reviews['lemmas'].apply(lambda tokens: get_sentiment_score(tokens))

In [38]:
reviews[['reviewText','lemmas','sentiment_score']].sample(5)

Unnamed: 0,reviewText,lemmas,sentiment_score
1926,"OK, why does Amazon let all the trolls who don't even have the game spam there uninformed opinion here. There are plenty of other formus on the web. This site should be for only for peeps that actually bought the game. With that stated, DO NOT waste your time reading the 1 stars. I didn't try to log on until last night because any moron with half a brain knows that the first day of any major release is going to be hard to connect. I only played for a couple of hours and so far it's a little easy. Playing as a demon hunter/nml. They did dumb it down a little from D2 so it's basically using 2 mouse buttons and the shift button--ranged attacks. The graphics are really good and no lag so far. I looked at the AH but nothing so far (day 1). Good hunting everyone and enjoy the game :}\n\nAlso forgot, they had to use DRM and online only because anyone that played D2 should remember how all of the hacks ruined that game. If you don't like it play something else.\n\nUPDATE 17 May, Played now for about 9 hours total=big let down. I was hoping that the game would get better but it didn't. Also it was getting a little laggy last night. Not worth $60 so wait until the rush is over and it might be worth buying at $20.\n\n13 July UPDATE: OK, ran 2 toons thru inferno and I'm finished. Same grind over and over. I miss the gold old days of the 90's UO. 
Now that was a game!!","[ok, do, amazon, let, troll, dont, even, have, game, spam, there, uninformed, opinion, here, there, be, plenty, other, formus, web, site, be, only, peep, actually, bought, game, state, do, not, waste, time, reading, star, i, didnt, try, log, last, night, moron, half, brain, know, first, day, major, release, be, go, be, hard, connect, i, only, played, couple, hour, so, far, little, easy, play, demon, hunternml, do, dumb, down, little, d2, so, basically, use, mouse, button, shift, buttonranged, attack, graphic, be, really, good, lag, so, far, i, look, ah, nothing, so, far, day, good, hunt, everyone, enjoy, game, also, forgot, have, ...]",0.888
3915,Great price disk mint,"[great, price, disk, mint]",0.0
2351,was okay,"[be, okay]",0.5
2129,"This is a great keyboard. The feel of the keys took a bit of getting used to at first but I really like it now. The software to control colors, macros, etc is very nice and easy to use.\n\nI have played around a bit with the ARX app on my phone. I really like it and hope to see more support for it in the future. Leveraging your phone and touchscreen for the secondary display is a great idea!\n\n6mo update: disappointed at how many keys have lost LEDs. I'd say roughly 1/3 of the key caps have lost at least 1 of the color so you can't get them to match the others. It started happening after a few months and more and more of them continue to fail over time. Not going to replace them 1:1. Sad for such a high end keyboard.","[be, great, keyboard, feel, key, take, bit, get, use, first, i, really, now, software, control, color, macro, etc, be, very, nice, easy, use, i, have, played, bit, arx, app, phone, i, really, hope, see, more, support, future, leverage, phone, touchscreen, secondary, display, be, great, idea, update, disappointed, many, key, have, lose, led, id, say, roughly, key, cap, have, lose, least, color, so, cant, get, match, others, start, happen, few, month, more, more, continue, fail, time, not, go, replace, sad, such, high, end, keyboard]",-0.125
1942,"While the ""game"" is fun, it's no replacement for a real workout. I found the time between exercises painfully slow... and when you exclude all the talking and prep work and A-button mashing, you only get about 10 minutes of work out of a 30 minute ""workout.""\n\nI got the ""My Fitness Coach"" and it's much more intense than the wii fit. The balance board is basically a gimmick.\n\nAfter 6 months of light use on the wii fit, I sold mine on ebay for just about what I paid (wasn't in the mood to price gouge). I was disappointed with it and I STILL don't understand all the hype. It's just not that much fun.","[game, be, fun, replacement, real, workout, i, found, time, exercise, painfully, slow, exclude, talk, prep, work, abutton, mash, only, get, minute, work, minute, workout, i, get, fitness, coach, much, more, intense, wii, fit, balance, board, be, basically, gimmick, month, light, use, wii, fit, i, sell, mine, ebay, just, i, paid, wasnt, mood, price, gouge, i, be, disappointed, i, still, dont, understand, hype, just, not, much, fun]",0.625


In [39]:
reviews[['reviewText','lemmas','sentiment_score']].sample(5)

Unnamed: 0,reviewText,lemmas,sentiment_score
497,"1. No english voice overs! Japanese audio only\n2. Short story mode. 1hr is really being generous.\n3. 25 characters playable only.\n4.75 characters are used as battle items with cheap animations.\n5.feels very lazy, no effort put into the US release. This is a copy paste.\n6. Gets boring fast. You can only play the limited modes for so long.\n7. No super saiyan transformations. SS use a separate character slot.\n8.way over priced. At $30 you get the same game Japanese fans own, the only difference is google translation on somevof the text.\n9.irritating challenges to unlock all assist characters. Will take a patient person forever and make the others toss it across the room.\n10. Nothing thorough about the roster,audio,game modes,experience or product. This is a poor copy and past cash grab for the most gullable fan willing to throw money at anything that reads dragonball z. Simply put this isn't 1999 and there is no excuse for such trash.","[english, voice, over, japanese, audio, only, short, story, mode, be, really, be, generous, character, playable, only, character, be, use, battle, item, cheap, animation, 5feels, very, lazy, effort, put, release, be, copy, paste, get, boring, fast, only, play, limited, mode, so, long, super, saiyan, transformation, s, use, separate, character, slot, price, get, same, game, japanese, fan, own, only, difference, be, google, translation, somevof, text, 9irritating, challenge, unlock, assist, character, take, patient, person, forever, make, others, toss, room, nothing, thorough, rosteraudiogame, modesexperience, product, be, poor, copy, past, cash, grab, most, gullable, fan, willing, throw, money, anything, read, dragonball, z, simply, put, isnt, ...]",2.25
1301,Servers are messed up. Been this way for over a week. Unacceptable.\nPower of the cloud my ass.,"[server, be, mess, up, be, way, week, unacceptable, power, cloud, as]",-0.5
4002,"Came in great condition, just like it said. I did not have any problems. It came in exactly just like it said in the description.","[come, great, condition, just, say, i, do, not, have, problem, come, exactly, just, say, description]",-0.875
1338,i coudnt standing this game. i played like 15 minutes of it.. its just plain boring.\ni have no clue why they changed it to 3rd person too.,"[i, coudnt, stand, game, i, played, minute, just, plain, boring, i, have, clue, change, person, too]",0.375
2153,"Good-Improved graphics/LOTS of weapons/frequent save points/better music/semi-regenerating health bar/gore/finishing techniques\n\nBad-bland environments/glitches and bugs/camera seems to ""fight"" you/low replay value/VERY little unlockable content/enemy AI resorts to cheap tactics\n\nThis games combat is its saving grace, its addictive, fun, and satisfying with so many ways to off the opposition. The finishers can really bring out the masochist in you. I'm currently on master ninja having played the 1st 3 difficulties. The enemy AI is really intelligent that they'll adjust to always find the most effective way to take out Ryu, most of the time thats resorting to cheap tactics such as 6 enemies aggressively attacking you up front(2 of them looking for an unblockable throw while the remaining 4 relentlessly executes combos) and the remaining enemies stay in the back throwing shuriken after shuriken that explode. You will constantly be attacked as soon as you walk through a door without even sighting an enemy, constant cheap shots like this along with the camera looking at everything but the action will be more than enough to frustrate the avg. gamer. There is no theater mode/concept art, or anything of that nature upon completing the game. The unlockables include costumes(1 per difficulty) and these are color variants not even new costumes. 
This game seemed like a rushed product, but even after dying so many times you'll still keep playing because the combat was so well made.","[goodimproved, graphicslots, weaponsfrequent, save, pointsbetter, musicsemiregenerating, health, bargorefinishing, technique, badbland, environmentsglitches, bugscamera, seem, fight, youlow, replay, valuevery, little, unlockable, contentenemy, ai, resort, cheap, tactic, game, combat, be, save, grace, addictive, fun, satisfy, so, many, way, opposition, finisher, really, bring, masochist, im, currently, master, ninja, have, played, difficulty, enemy, ai, be, really, intelligent, theyll, adjust, always, find, most, effective, way, take, ryu, most, time, thats, resort, cheap, tactic, such, enemy, aggressively, attack, up, front2, look, unblockable, throw, remain, relentlessly, executes, combo, remain, enemy, stay, back, throw, shuriken, shuriken, explode, constantly, be, attack, soon, walk, door, even, sight, enemy, constant, cheap, shot, ...]",2.875
