# Creating a Module for Sentiment Analysis with NLTK

You only need to run this once. You can always run it again if you want, but you are now ready to create the sentiment analysis module. Here's the file, which we're going to call sentiment_mod.py:

In [4]:
#File: sentiment_mod.py

import nltk
import random
#from nltk.corpus import movie_reviews
from nltk.classify.scikitlearn import SklearnClassifier
import pickle
from sklearn.naive_bayes import MultinomialNB, BernoulliNB
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import SVC, LinearSVC, NuSVC
from nltk.classify import ClassifierI
from statistics import mode
from nltk.tokenize import word_tokenize



class VoteClassifier(ClassifierI):
    def __init__(self, *classifiers):
        self._classifiers = classifiers

    def classify(self, features):
        votes = []
        for c in self._classifiers:
            v = c.classify(features)
            votes.append(v)
        return mode(votes)

    def confidence(self, features):
        votes = []
        for c in self._classifiers:
            v = c.classify(features)
            votes.append(v)

        choice_votes = votes.count(mode(votes))
        conf = choice_votes / len(votes)
        return conf


documents_f = open("documents.pickle", "rb")
documents = pickle.load(documents_f)
documents_f.close()




word_features5k_f = open("wordfeatures.pickle", "rb")
word_features = pickle.load(word_features5k_f)
word_features5k_f.close()


def find_features(document):
    words = word_tokenize(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)

    return features



featuresets_f = open("featuresets.pickle", "rb")
featuresets = pickle.load(featuresets_f)
featuresets_f.close()

random.shuffle(featuresets)
print(len(featuresets))

testing_set = featuresets[10000:]
training_set = featuresets[:10000]



open_file = open("naive.pickle", "rb")
classifier = pickle.load(open_file)
open_file.close()


open_file = open("mnb.pickle", "rb")
MNB_classifier = pickle.load(open_file)
open_file.close()



open_file = open("bnb.pickle", "rb")
BernoulliNB_classifier = pickle.load(open_file)
open_file.close()


open_file = open("logreg.pickle", "rb")
LogisticRegression_classifier = pickle.load(open_file)
open_file.close()


open_file = open("linsvc.pickle", "rb")
LinearSVC_classifier = pickle.load(open_file)
open_file.close()


open_file = open("SGDV.pickle", "rb")
SGDC_classifier = pickle.load(open_file)
open_file.close()




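# Five voters: with two labels and an odd number of votes, there is never a tie.
# SGDC_classifier is loaded above but does not take part in the vote.
voted_classifier = VoteClassifier(
    classifier,
    LinearSVC_classifier,
    MNB_classifier,
    BernoulliNB_classifier,
    LogisticRegression_classifier)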




def sentiment(text):
    # Returns a (label, confidence) tuple for the given text.
    feats = find_features(text)
    return voted_classifier.classify(feats), voted_classifier.confidence(feats)

10664


There's really nothing new here besides the final function, which is quite simple. This function, which we're calling "sentiment," is the crux of what we will be interacting with from here on out. It takes one parameter, text. We first extract the features with the find_features function we created long ago, then use our voted_classifier to return not only the classification, but also the confidence in that classification.
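To make the feature step concrete, here is a minimal sketch of the kind of dictionary find_features hands to the classifiers. The vocabulary below is hypothetical (the real module loads word_features from a pickle file), and a plain whitespace split stands in for word_tokenize, so treat it as an illustration rather than the module's exact behavior.

```python
# Hypothetical vocabulary; the real module loads word_features from a pickle file.
WORD_FEATURES = ["awesome", "great", "junk", "horrible"]


def find_features_sketch(text):
    """Map each vocabulary word to True/False by presence in the text."""
    # Assumption: whitespace split instead of nltk's word_tokenize, so
    # punctuation stays attached to words (e.g. "junk." would not match "junk").
    words = set(text.split())
    return {w: (w in words) for w in WORD_FEATURES}


feats = find_features_sketch("utter junk and a horrible plot")
print(feats)
# {'awesome': False, 'great': False, 'junk': True, 'horrible': True}
```

Every classifier in the vote receives this same dictionary, which is why sentiment only needs to build it once per call.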

With that, we can now use this file, and the sentiment function, as a module. Here's an example script that uses the module:

In [6]:
import sentiment_mod as s

print(s.sentiment("This movie was awesome! The acting was great, plot was wonderful, and there were pythons...so yea!"))
print(s.sentiment("This movie was utter junk. There were absolutely 0 pythons. I don't see what the point was at all. Horrible movie, 0/10"))

10664
('neg', 0.6)
('neg', 1.0)


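As expected, the movie without any pythons was junk: it was labeled 'neg' with a confidence of 1.0, meaning all five classifiers agreed. The movie with pythons, however, was also labeled 'neg' in this run, though only with a confidence of 0.6, meaning three of the five classifiers voted negative. That first result is a misclassification of a clearly positive review, and the lower confidence is a useful signal that the vote was split.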
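The confidence value is just the share of classifiers that voted for the winning label, as computed in VoteClassifier.confidence. A minimal sketch of that arithmetic, using hypothetical vote lists from five classifiers:

```python
from statistics import mode


def vote_confidence(votes):
    """Return the winning label and the fraction of votes it received."""
    winner = mode(votes)
    return winner, votes.count(winner) / len(votes)


# Hypothetical vote lists, one entry per classifier.
split_vote = ["neg", "neg", "neg", "pos", "pos"]
unanimous_vote = ["neg"] * 5

print(vote_confidence(split_vote))      # ('neg', 0.6)
print(vote_confidence(unanimous_vote))  # ('neg', 1.0)
```

With five voters, the only possible confidence values are 0.6, 0.8, and 1.0, so a 0.6 is the weakest majority the ensemble can produce.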

It took me about 5 seconds to import the module, since we pickled the classifiers, compared to the 30ish minutes it took without pickling. Yay for pickling. Your time will vary greatly depending on your processor. If you continue down this path, **you may also want to look into joblib**.
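The save-once, load-many pattern behind that speedup can be sketched with the standard pickle module. The dictionary below is only a stand-in for a trained classifier and the file name is hypothetical. joblib offers similarly named dump and load functions if you go that route.

```python
import os
import pickle
import tempfile

# Stand-in for a trained classifier; any picklable object works the same way.
model_stand_in = {"labels": ["pos", "neg"], "weights": [0.25, 0.75]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_model.pickle")  # hypothetical file name

    # Save once, after the expensive training step.
    with open(path, "wb") as f:
        pickle.dump(model_stand_in, f)

    # Load on every later run; the with-block closes the file automatically.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == model_stand_in)  # True
```

Using with-blocks here replaces the explicit open/close pairs and guarantees the file is closed even if loading raises an error.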

Now that we have this awesome module, and it works easily, what can we do? I propose we take to Twitter to perform live sentiment analysis!

In [7]:
print(s.sentiment("I am absolutely delighted. The lovely acting combined with the excellent camera work make for an outstanding piece of art!"))

('pos', 1.0)


In [8]:
print(s.sentiment("Well I don't know, I guess they could not really convey their intended messages clearly. I was lost for most of the plot."))

('neg', 1.0)
