# Movie Review Sentiment Analysis using Lexicon

In [1]:
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer



Data Overview
For this analysis we’ll be using a dataset of 50,000 movie reviews taken from IMDb. The data was compiled by Andrew Maas and can be found here: IMDb Reviews. http://ai.stanford.edu/~amaas/data/sentiment/

The data is split evenly with 25k reviews intended for training and 25k for testing your classifier. Moreover, each set has 12.5k positive and 12.5k negative reviews.

IMDb lets users rate movies on a scale from 1 to 10. To label these reviews the curator of the data labeled anything with ≤ 4 stars as negative and anything with ≥ 7 stars as positive. Reviews with 5 or 6 stars were left out.

In [4]:
review=[]
for line in open("imdb_review/full_train.txt",'r',encoding="utf8"):
    review.append(line.strip())
    

In [5]:
review[:5]

['Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as "Teachers". My 35 years in the teaching profession lead me to believe that Bromwell High\'s satire is much closer to reality than is "Teachers". The scramble to survive financially, the insightful students who can see right through their pathetic teachers\' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I\'m here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn\'t!',
 'Homelessness (or Houselessness as George Carlin stated) has been an issue for years but never a plan to help those on the street that were once considered human who did everything fro

## Clean and Preprocess


In [6]:
import re

replace_no_space = re.compile("[.;:!\'?,\"()\[\]]")
replace_with_space = re.compile("(<br\s*/><br\s*/>)|(\-)|(\/)")


def replace_n_clean(review):
    review = [replace_no_space.sub("",line.lower()) for line in review]
    review = [replace_with_space.sub("",line) for line in review]
    
    return review

review_clean = replace_n_clean(review)


In [7]:
review_clean[:5]

['bromwell high is a cartoon comedy it ran at the same time as some other programs about school life such as teachers my 35 years in the teaching profession lead me to believe that bromwell highs satire is much closer to reality than is teachers the scramble to survive financially the insightful students who can see right through their pathetic teachers pomp the pettiness of the whole situation all remind me of the schools i knew and their students when i saw the episode in which a student repeatedly tried to burn down the school i immediately recalled  at  high a classic line inspector im here to sack one of your teachers student welcome to bromwell high i expect that many adults of my age think that bromwell high is far fetched what a pity that it isnt',
 'homelessness or houselessness as george carlin stated has been an issue for years but never a plan to help those on the street that were once considered human who did everything from going to school work or vote for the matter most

In [8]:
import nltk
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\m0hd7ah1r\AppData\Roaming\nltk_data...


True

In [9]:
data = pd.DataFrame(data=[Review for Review in review_clean], columns=['Reviews'])

display(data.head(10))


Unnamed: 0,Reviews
0,bromwell high is a cartoon comedy it ran at th...
1,homelessness or houselessness as george carlin...
2,brilliant overacting by lesley ann warren best...
3,this is easily the most underrated film inn th...
4,this is not the typical mel brooks film it was...
5,this isnt the comedic robin williams nor is it...
6,yes its an art to successfully make a slow pac...
7,in this critically acclaimed psychological thr...
8,the night listener 2006 **12 robin williams to...
9,you know robin williams god bless him is const...


In [11]:
sid = SentimentIntensityAnalyzer()


listy = []

for index, row in data.iterrows():
  ss = sid.polarity_scores(row["Reviews"])
  listy.append(ss)
  
se = pd.Series(listy)
data['polarity'] = se.values

display(data.head(10))
display(data.tail(10))

Unnamed: 0,Reviews,polarity
0,bromwell high is a cartoon comedy it ran at th...,"{'neg': 0.043, 'neu': 0.917, 'pos': 0.04, 'com..."
1,homelessness or houselessness as george carlin...,"{'neg': 0.113, 'neu': 0.734, 'pos': 0.153, 'co..."
2,brilliant overacting by lesley ann warren best...,"{'neg': 0.078, 'neu': 0.736, 'pos': 0.186, 'co..."
3,this is easily the most underrated film inn th...,"{'neg': 0.02, 'neu': 0.757, 'pos': 0.223, 'com..."
4,this is not the typical mel brooks film it was...,"{'neg': 0.032, 'neu': 0.791, 'pos': 0.177, 'co..."
5,this isnt the comedic robin williams nor is it...,"{'neg': 0.055, 'neu': 0.745, 'pos': 0.199, 'co..."
6,yes its an art to successfully make a slow pac...,"{'neg': 0.0, 'neu': 0.779, 'pos': 0.221, 'comp..."
7,in this critically acclaimed psychological thr...,"{'neg': 0.028, 'neu': 0.814, 'pos': 0.158, 'co..."
8,the night listener 2006 **12 robin williams to...,"{'neg': 0.097, 'neu': 0.768, 'pos': 0.134, 'co..."
9,you know robin williams god bless him is const...,"{'neg': 0.126, 'neu': 0.677, 'pos': 0.196, 'co..."


Unnamed: 0,Reviews,polarity
24990,yeti curse of the snow demon starts aboard a p...,"{'neg': 0.129, 'neu': 0.701, 'pos': 0.17, 'com..."
24991,hmmm a sports team is in a plane crash gets st...,"{'neg': 0.159, 'neu': 0.772, 'pos': 0.07, 'com..."
24992,i saw this piece of garbage on amc last night ...,"{'neg': 0.109, 'neu': 0.891, 'pos': 0.0, 'comp..."
24993,although the production and jerry jamesons dir...,"{'neg': 0.152, 'neu': 0.754, 'pos': 0.094, 'co..."
24994,capt gallagher lemmon and flight attendant eve...,"{'neg': 0.073, 'neu': 0.775, 'pos': 0.152, 'co..."
24995,towards the end of the movie i felt it was too...,"{'neg': 0.091, 'neu': 0.778, 'pos': 0.132, 'co..."
24996,this is the kind of movie that my enemies cont...,"{'neg': 0.149, 'neu': 0.719, 'pos': 0.132, 'co..."
24997,i saw descent last night at the stockholm film...,"{'neg': 0.159, 'neu': 0.684, 'pos': 0.157, 'co..."
24998,some films that you pick up for a pound turn o...,"{'neg': 0.141, 'neu': 0.72, 'pos': 0.14, 'comp..."
24999,this is one of the dumbest films ive ever seen...,"{'neg': 0.177, 'neu': 0.736, 'pos': 0.087, 'co..."


## Conclusion

As we are aware Top 12500 Reviews are Positive and Bottom are Negative this method mostly convey as neutral. However if ignnore the neutral and compare only positive and negative  polarity. this model 