# Sentiment Analyzer

### Sentiment analysis is the process of determining the sentiment of a given piece of text.
#### For example: whether a movie review is positive or negative
#### Or: how people feel about a particular product, band, topic...

#### Used to analyze:
- Marketing campaigns
- Opinion polls
- Social media presence
- Product reviews on e-commerce sites, and so on

# Building a Sentiment Analyzer

#### - Determine the sentiment of a movie review.
#### - Classifier (Naive Bayes, ...).
#### - Unique words as features.
#### - Data as a dictionary.
#### - Divide data into training and testing datasets.
#### - Train the classifier.
#### - Top informative words: which words are being used to denote various reactions.

# What we're building

#### - Classify the reviews as positive or negative.
#### - Find the informative words that indicate positive and negative reviews.

In [4]:
# import libs
from nltk.corpus import movie_reviews
from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy as nltk_accuracy

In [5]:
# Download the corpus
import nltk
nltk.download('movie_reviews')

[nltk_data] Downloading package movie_reviews to
[nltk_data]     C:\Users\B21Yassine\AppData\Roaming\nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!


True

In [6]:
# Extract features from the input list of words:
# each unique word becomes a key mapped to True (word presence, not word count)
def extract_features(words):
    return {word: True for word in words}
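To make the feature format concrete, the snippet below shows what `extract_features` returns for a small made-up word list (not corpus data). Repeated words collapse into a single key, so the classifier only sees whether a word is present, not how often it occurs.

```python
# Equivalent to the helper defined above, repeated so this snippet runs on its own
def extract_features(words):
    return {word: True for word in words}

# Hypothetical toy input; 'great' appears twice but yields a single key
sample_words = ['a', 'great', 'great', 'movie']
sample_features = extract_features(sample_words)
print(sample_features)
```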

In [7]:
# Load the file IDs of the positive and negative reviews from the corpus
fileids_pos = movie_reviews.fileids('pos')
fileids_neg = movie_reviews.fileids('neg')

In [8]:
# Extract features from the reviews and pair each feature dict with its label
features_pos = [(extract_features(movie_reviews.words(fileids=[f])), 'Positive') for f in fileids_pos]
features_neg = [(extract_features(movie_reviews.words(fileids=[f])), 'Negative') for f in fileids_neg]

In [9]:
# Define the train/test split: 80% training, 20% testing
threshold = 0.8
num_pos = int(threshold * len(features_pos))
num_neg = int(threshold * len(features_neg))

In [10]:
# Create the training and testing datasets
features_train = features_pos[:num_pos] + features_neg[:num_neg]
features_test = features_pos[num_pos:] + features_neg[num_neg:]
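The slicing above takes the first 80% of each class in corpus file order. If that order is not random, one option is to shuffle each class with a fixed seed before slicing. The following is only a sketch on placeholder data: `split_per_class`, the seed value, and the toy featuresets are illustrative names that are not part of the original notebook.

```python
import random

def split_per_class(items, threshold=0.8, seed=42):
    """Shuffle a copy of items reproducibly, then split it at the threshold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(threshold * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Placeholder featuresets standing in for features_pos / features_neg
toy_pos = [({'word%d' % i: True}, 'Positive') for i in range(10)]
toy_neg = [({'word%d' % i: True}, 'Negative') for i in range(10)]

train_pos, test_pos = split_per_class(toy_pos)
train_neg, test_neg = split_per_class(toy_neg)
toy_train = train_pos + train_neg
toy_test = test_pos + test_neg
print(len(toy_train), len(toy_test))
```

Splitting each class separately keeps the positive/negative balance the same in both datasets, as the original slicing does.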

In [11]:
# print the number of datapoints used
print('\nNumber of training datapoints : ', len(features_train))
print('\nNumber of testing datapoints : ', len(features_test))


Number of training datapoints :  1600

Number of testing datapoints :  400


In [12]:
# Train the Naive Bayes classifier
classifier = NaiveBayesClassifier.train(features_train)
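The notebook imports `nltk_accuracy` but does not call it in the cells shown. Accuracy is the fraction of test items whose predicted label matches the true label. The sketch below illustrates that computation with a stand-in rule-based classifier so it runs without the corpus; `simple_accuracy`, `toy_classify`, and the toy data are hypothetical. With the trained model, the corresponding call would presumably be `nltk_accuracy(classifier, features_test)`.

```python
def simple_accuracy(classify, labeled_featuresets):
    """Fraction of (features, label) pairs for which classify(features) == label."""
    if not labeled_featuresets:
        return 0.0
    correct = sum(1 for features, label in labeled_featuresets
                  if classify(features) == label)
    return correct / len(labeled_featuresets)

# Hypothetical stand-in for a trained classifier: one hard-coded rule
def toy_classify(features):
    return 'Negative' if 'terrible' in features else 'Positive'

toy_gold = [
    ({'great': True, 'movie': True}, 'Positive'),
    ({'terrible': True, 'plot': True}, 'Negative'),
    ({'boring': True}, 'Negative'),   # the toy rule gets this one wrong
    ({'amazing': True}, 'Positive'),
]
toy_accuracy = simple_accuracy(toy_classify, toy_gold)
print('Accuracy:', toy_accuracy)
```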

In [13]:
# Print the top N most informative words
N = 15
print('\nTop ' + str(N) + ' most informative words:')
for i, item in enumerate(classifier.most_informative_features(N)):
    print(str(i + 1) + '. ' + item[0])


Top 15 most informative words:
1. outstanding
2. insulting
3. vulnerable
4. ludicrous
5. uninvolving
6. avoids
7. astounding
8. fascination
9. symbol
10. animators
11. seagal
12. anna
13. darker
14. affecting
15. idiotic
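The list above shows only the words, not how strongly each one points to a label. Roughly speaking, NLTK ranks a feature by how much more likely it is under one label than under the other, and `classifier.show_most_informative_features(N)` prints those ratios next to the words. The sketch below reproduces the idea on made-up document counts with add-0.5 smoothing. The counts, the smoothing choice, and the function names are assumptions for illustration, not values taken from the corpus or from NLTK's internals.

```python
def presence_prob(docs_with_word, total_docs, smoothing=0.5):
    """Smoothed estimate of P(word present | label); two outcomes: present or absent."""
    return (docs_with_word + smoothing) / (total_docs + 2 * smoothing)

def likelihood_ratio(pos_count, neg_count, total_pos, total_neg):
    """How many times more likely the word is under one label than under the other."""
    p_pos = presence_prob(pos_count, total_pos)
    p_neg = presence_prob(neg_count, total_neg)
    return max(p_pos, p_neg) / min(p_pos, p_neg)

# Made-up document counts per label: word -> (positive docs, negative docs)
toy_counts = {
    'outstanding': (30, 2),
    'idiotic': (1, 20),
    'movie': (700, 690),
}
TOTAL_POS = TOTAL_NEG = 800

# Sort words from most to least informative
ranked = sorted(
    toy_counts,
    key=lambda w: likelihood_ratio(*toy_counts[w], TOTAL_POS, TOTAL_NEG),
    reverse=True,
)
for word in ranked:
    ratio = likelihood_ratio(*toy_counts[word], TOTAL_POS, TOTAL_NEG)
    print(word, round(ratio, 1))
```

A word such as `movie`, which appears about equally often in both classes, ends up with a ratio close to 1 and carries almost no information about the label.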


In [14]:
# Sample reviews to classify
input_reviews = [
    'the costumes in this movie were greate',
    'I think the story was terrible and characters were very weak',
    'People say that the director of the movie is amazing',
    'This is such an idiotic movie. I will not recomment it to anyone'
]

In [15]:
print('\nMovie Reviews Prediction:')
for review in input_reviews:
    print('Review : ', review)

    # Compute the probability of each label for this review
    probabilities = classifier.prob_classify(extract_features(review.split()))

    # Pick the label with the highest probability
    predicted_sentiment = probabilities.max()

    # Print the output
    print('Predicted sentiment : ', predicted_sentiment)
    print('Probability : ', round(probabilities.prob(predicted_sentiment), 2))


Movie Reviews Prediction:
Review :  the costumes in this movie were greate
Predicted sentiment :  Negative
Probability :  0.51
Review :  I think the story was terrible and characters were very weak
Predicted sentiment :  Negative
Probability :  0.8
Review :  People say that the director of the movie is amazing
Predicted sentiment :  Positive
Probability :  0.6
Review :  This is such an idiotic movie. I will not recomment it to anyone
Predicted sentiment :  Negative
Probability :  0.9
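
`review.split()` keeps capitalization and attached punctuation (for example `'This'` and `'movie.'`), while the `movie_reviews` corpus appears to be lowercased with punctuation as separate tokens, so such words may not match any feature seen in training. One possible remedy is to normalize the input before extracting features. The sketch below is an assumption-laden illustration: `simple_tokenize` is a hypothetical helper, and using it would change the probabilities shown above.

```python
import re

def simple_tokenize(text):
    """Lowercase the text and split it into word tokens and punctuation tokens."""
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())

tokens = simple_tokenize('This is such an idiotic movie. I will not recomment it to anyone')
print(tokens)
```

Misspelled words such as `greate` or `recomment` would still be unknown to the classifier; normalization only helps with case and punctuation.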
