# Challenge: evaluate your sentiment classifier
It's time to revisit your classifier from the previous assignment. Using the evaluation techniques we've covered here, look at your classifier's performance in more detail. Then go back and iterate by engineering new features, removing poor features, or tuning parameters. Repeat this process until you have five different versions of your classifier. Once you've iterated, answer these questions to compare the performance of each:

- Do any of your classifiers seem to overfit?
- Which seem to perform the best? Why?
- Which features seemed to be most impactful to performance?

Write up your iterations and answers to the above questions in a few pages.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [3]:
df = pd.read_csv('sentiment_analysis_challenge/amazon_cells_labelled.txt', sep='\t', header=None, names=['review','positive'])

In [34]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import metrics
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import confusion_matrix

In [83]:
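# Turn each review into a bag-of-words count vector
vectorizer = CountVectorizer()
data = vectorizer.fit_transform(df.review)
target = df.positive
mnb = MultinomialNB()
# Hold out 30% of the reviews for testing (no random_state, so the split varies between runs)
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=.3)
mnb.fit(X_train, y_train)
y_pred = mnb.predict(X_test)
# Accuracy on the held-out set
(y_pred == y_test).sum()/y_test.shape[0]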

0.83

In [84]:
c_mat = confusion_matrix(y_test, y_pred)
TN = c_mat[0][0]
TP = c_mat[1][1]
FP = c_mat[0][1]
FN = c_mat[1][0]
TOT = y_test.shape[0]
print('So there\'s {} True Negatives, {} True Positives, {} False Positives and {} False Negatives'.format(TN, TP, FP, FN))

So there's 117 True Negatives, 132 True Positives, 33 False Positives and 18 False Negatives


In [85]:
# Sensitivity (recall) = TP / (TP + FN); specificity = TN / (TN + FP)
print('Sensitivity: {}%\nSpecificity: {}%'.format(round(TP/(TP+FN)*100, 2), round(TN/(TN+FP)*100, 2)))

Sensitivity: 88.0%
Specificity: 78.0%
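
The four confusion-matrix counts yield several rates that are easy to mix up: sensitivity and specificity condition on the true class, while precision and negative predictive value condition on the predicted class. A minimal sketch, plugging in the counts printed above (the `rates` helper is a hypothetical name, not part of scikit-learn):

```python
def rates(tn, fp, fn, tp):
    """Return the four common rates derived from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives that were caught
        "specificity": tn / (tn + fp),  # share of actual negatives that were caught
        "precision": tp / (tp + fp),    # share of predicted positives that were right
        "npv": tn / (tn + fn),          # share of predicted negatives that were right
    }

# Counts taken from the confusion matrix printed above
r = rates(tn=117, fp=33, fn=18, tp=132)
for name, value in r.items():
    print(f"{name}: {value:.2%}")
# sensitivity: 88.00%
# specificity: 78.00%
# precision: 80.00%
# npv: 86.67%
```

With these counts, 80% and 86.67% are the precision and negative predictive value, while sensitivity and specificity come out at 88% and 78%.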


In [86]:
cross_val_score(mnb, data, target, cv=10)

array([0.82, 0.85, 0.85, 0.8 , 0.83, 0.8 , 0.78, 0.8 , 0.85, 0.79])
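
Ten fold scores are easier to compare across classifier versions once they are reduced to a mean and a spread. A small sketch using the (rounded) scores printed above; the variable names are illustrative only:

```python
from statistics import mean, stdev

# Fold accuracies copied from the cross_val_score output above
fold_scores = [0.82, 0.85, 0.85, 0.80, 0.83, 0.80, 0.78, 0.80, 0.85, 0.79]

cv_mean = mean(fold_scores)
cv_std = stdev(fold_scores)  # sample standard deviation across folds
cv_range = max(fold_scores) - min(fold_scores)

print(f"mean accuracy: {cv_mean:.3f}")
print(f"std across folds: {cv_std:.3f}")
print(f"range across folds: {cv_range:.2f}")
```

A mean close to the single hold-out accuracy, with a small spread across folds, is one rough sign that the model is not overfitting to a particular split.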

The above shows that multinomial naive Bayes works reasonably well on this data set under 10-fold cross-validation, with fold accuracies between 0.78 and 0.85. By comparison, a baseline that always guesses the majority class would only get about 50% correct. While splitting things into

In [87]:
len(vectorizer.vocabulary_)

1847

Looks like there are 1847 words in this feature list. That is quite a lot! I wonder whether it can predict the Yelp and IMDB reviews better than the IMDB version did.

In [88]:
yelp = pd.read_csv('sentiment_analysis_challenge/yelp_labelled.txt', sep='\t', header=None, names=['review','positive'])
imdb = pd.read_csv('sentiment_analysis_challenge/imdb_labelled.txt', sep='\t', header=None, names=['review','positive'])

In [89]:
y_data = vectorizer.transform(yelp.review)
i_data = vectorizer.transform(imdb.review)
y_target = yelp.positive
i_target = imdb.positive
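
`vectorizer.transform` reuses the vocabulary fitted on the Amazon reviews, so any word that never appeared there is dropped from the Yelp and IMDB vectors. This is worth keeping in mind when scoring the model on other domains. A small sketch with made-up sentences (the toy corpus and the `toy_vectorizer` name are assumptions for illustration, not data from the challenge files):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Fit a vocabulary on a tiny "training" corpus
toy_vectorizer = CountVectorizer()
toy_vectorizer.fit(["great phone", "bad battery"])
vocab = sorted(toy_vectorizer.vocabulary_)

# Transform reviews from a different domain with the same vocabulary
new_counts = toy_vectorizer.transform(["great food", "terrible service"]).toarray()

# Number of known-word occurrences per new review
known_per_review = new_counts.sum(axis=1)

print(vocab)             # ['bad', 'battery', 'great', 'phone']
print(new_counts)        # only "great" is counted; the second row is all zeros
print(known_per_review)  # [1 0]
```

A review made up entirely of unseen words becomes an all-zero vector, so the classifier can only fall back on its class priors for it.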

In [90]:
yelp_pred = mnb.predict(y_data)
imdb_pred = mnb.predict(i_data)
print((yelp_pred == y_target).sum()/y_target.shape[0])
print((imdb_pred == i_target).sum()/i_target.shape[0])

0.724
0.6737967914438503


It does look like it performs a bit better than the IMDB version.

In [96]:
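def get_auc(actual, prediction):
    # Prints the area under the ROC curve. Note: the calls below pass hard 0/1
    # predictions, so each curve has a single interior point.
    fpr, tpr, thresholds = metrics.roc_curve(actual, prediction)
    print(metrics.auc(fpr, tpr))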

In [98]:
get_auc(y_test, y_pred)
get_auc(y_target, yelp_pred)
get_auc(i_target, imdb_pred)

0.83
0.7240000000000001
0.6764663784959779
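
Because `get_auc` receives hard 0/1 predictions, each ROC curve has a single interior point, and the area under it equals the average of sensitivity and specificity. That is why the first value matches the 0.83 accuracy on the balanced test split. The sketch below rebuilds the test labels from the confusion-matrix counts and checks this with a pairwise AUC (the `pairwise_auc` helper is hypothetical, written here only for illustration):

```python
def pairwise_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Rebuild (label, hard prediction) pairs from the confusion matrix above
tn, fp, fn, tp = 117, 33, 18, 132
labels = [0] * tn + [0] * fp + [1] * fn + [1] * tp
preds = [0] * tn + [1] * fp + [0] * fn + [1] * tp

auc_hard = pairwise_auc(labels, preds)
balanced = (tp / (tp + fn) + tn / (tn + fp)) / 2  # mean of sensitivity and specificity

print(round(auc_hard, 4))  # 0.83
print(round(balanced, 4))  # 0.83
```

For a threshold-independent AUC, one would typically pass continuous scores instead of labels, for example `mnb.predict_proba(X_test)[:, 1]` in place of `y_pred`.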
