### **Data Class**

In [64]:
import random

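class Sentiment:
    """String labels for the three sentiment classes."""
    NEGATIVE = "NEGATIVE"
    NEUTRAL = "NEUTRAL"
    POSITIVE = "POSITIVE"

class Review:
    """A review's text, its score, and the sentiment derived from that score."""
    def __init__(self, text, score):
        self.text = text
        self.score = score
        self.sentiment = self.get_sentiment()

    def get_sentiment(self):
        # Scores of 2 or below are negative, exactly 3 is neutral, anything higher is positive.
        if self.score <= 2:
            return Sentiment.NEGATIVE
        elif self.score == 3:
            return Sentiment.NEUTRAL
        else:
            return Sentiment.POSITIVE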
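
To make the score-to-sentiment mapping concrete, here is a minimal sketch that rebuilds the two classes in compact form and checks a few scores. The review texts are made-up placeholders, not entries from the dataset.

```python
class Sentiment:
    NEGATIVE = "NEGATIVE"
    NEUTRAL = "NEUTRAL"
    POSITIVE = "POSITIVE"


class Review:
    def __init__(self, text, score):
        self.text = text
        self.score = score
        self.sentiment = self.get_sentiment()

    def get_sentiment(self):
        # Same thresholds as the notebook's Review class.
        if self.score <= 2:
            return Sentiment.NEGATIVE
        elif self.score == 3:
            return Sentiment.NEUTRAL
        return Sentiment.POSITIVE


# Hypothetical texts, used only to exercise each branch.
examples = [Review("placeholder text", score) for score in (1.0, 3.0, 5.0)]
labels = [r.sentiment for r in examples]
print(labels)
```

Because the sentiment is computed once in `__init__`, changing `score` afterwards would not update `sentiment` unless `get_sentiment()` is called again.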

### **Load Data**

In [65]:
import json

file_name = './data/sentiment/Books_small.json'

reviews = []
with open(file_name) as f:
    # The file holds one JSON object per line, so parse it line by line.
    for line in f:
        review = json.loads(line)
        reviews.append(Review(review['reviewText'], review['overall']))

reviews[0].text

'Da Silva takes the divine by storm with this unique new novel.  She develops a world unlike any others while keeping it firmly in the real world.  This is a very well written and entertaining novel.  I was quite impressed and intrigued by the way that this solid storyline was developed, bringing the readers right into the world of the story.  I was engaged throughout and definitely enjoyed my time spent reading it.I loved the character development in this novel.  Da Silva creates a cast of high school students who actually act like high school students.  I really appreciated the fact that none of them were thrown into situations far beyond their years, nor did they deal with events as if they had decades of life experience under their belts.  It was very refreshing and added to the realism and impact of the novel.  The friendships between the characters in this novel were also truly touching.Overall, this novel was fantastic.  I can&#8217;t wait to read more and to find out what happe
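
The loading cell relies on the file being in JSON Lines form (one object per line). The sketch below shows the same parsing pattern on an in-memory sample, so it runs without the data file. The two records are invented for illustration; only the `reviewText` and `overall` field names come from the notebook.

```python
import io
import json

# Hypothetical stand-in for the contents of the JSON Lines file.
sample = io.StringIO(
    '{"reviewText": "Great read", "overall": 5.0}\n'
    '{"reviewText": "Not for me", "overall": 2.0}\n'
)

# Parse each non-empty line as its own JSON object.
records = [json.loads(line) for line in sample if line.strip()]

# Keep only the two fields the notebook uses.
pairs = [(r["reviewText"], r["overall"]) for r in records]
print(pairs)
```

Skipping blank lines is an extra safeguard added here; the notebook's loop assumes every line is a valid JSON object.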

### **Prep Data**

In [66]:
from sklearn.model_selection import train_test_split

train, test = train_test_split(reviews, test_size=0.33, random_state=42)

X_train = [x.text for x in train]
y_train = [y.sentiment for y in train]

X_test = [x.text for x in test]
y_test = [y.sentiment for y in test]

#### Bag of Words Vectorization

In [67]:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
# Learn the vocabulary from the training text only, then reuse it for the test text.
X_train_vectors = vectorizer.fit_transform(X_train)
X_test_vectors = vectorizer.transform(X_test)
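
To illustrate what the fit/transform split does, here is a simplified pure-Python sketch of bag-of-words counting. It is not scikit-learn's implementation; it only mimics the default behaviour of lowercasing, keeping tokens of two or more word characters, and ordering the vocabulary alphabetically. The function names and sample sentences are hypothetical.

```python
import re
from collections import Counter

# Tokens are runs of two or more word characters.
TOKEN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    return TOKEN.findall(text.lower())


def fit_vocabulary(docs):
    # Map each distinct training token to a column index, in sorted order.
    terms = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {term: i for i, term in enumerate(terms)}


def transform(docs, vocab):
    # Count only tokens that are already in the vocabulary.
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts.get(term, 0) for term in vocab])
    return rows


train_docs = ["I loved this book", "This book was a bad book"]
vocab = fit_vocabulary(train_docs)
train_rows = transform(train_docs, vocab)
test_rows = transform(["loved it, great book"], vocab)
print(list(vocab), train_rows, test_rows)
```

Words that appear only in the test text ("it" and "great" here) get no column, which is why the vocabulary is learned from the training set alone.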

### **Classification**

#### Linear SVM

In [68]:
from sklearn.svm import SVC

svm = SVC(kernel='linear')
svm.fit(X_train_vectors, y_train)

print(X_test[0])
print(svm.predict(X_test_vectors[0]))

Every new Myke Cole book is better than the last, and this is no exception. If you haven't read the Shadow Ops series before start with Control Point, but go ahead and order Fortress Frontier and Breach Zone as well - you're going to want them.
['POSITIVE']


#### Decision Tree

In [69]:
from sklearn.tree import DecisionTreeClassifier

decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train_vectors, y_train)

decision_tree.predict(X_test_vectors[0])

array(['POSITIVE'], dtype='<U8')

#### Naive Bayes

In [70]:
from sklearn.naive_bayes import BernoulliNB, CategoricalNB, GaussianNB, MultinomialNB

# GaussianNB needs a dense array, so the sparse count matrix is converted with .toarray().
gaussian_nb = GaussianNB()
gaussian_nb.fit(X_train_vectors.toarray(), y_train)

gaussian_nb.predict(X_test_vectors[0].toarray())

# BernoulliNB also accepts sparse input; dense input is kept here for consistency.
bernoulli_nb = BernoulliNB()
bernoulli_nb.fit(X_train_vectors.toarray(), y_train)

bernoulli_nb.predict(X_test_vectors[0].toarray())

# CategoricalNB treats each word count as a category index.
categorical_nb = CategoricalNB()
categorical_nb.fit(X_train_vectors.toarray(), y_train)

categorical_nb.predict(X_test_vectors[0].toarray())

# MultinomialNB also accepts sparse input; only this last prediction is displayed below.
multinomial_nb = MultinomialNB()
multinomial_nb.fit(X_train_vectors.toarray(), y_train)

multinomial_nb.predict(X_test_vectors[0].toarray())

array(['POSITIVE'], dtype='<U8')

#### Logistic Regression

In [71]:
from sklearn.linear_model import LogisticRegression

logistic = LogisticRegression(max_iter=1000)
logistic.fit(X_train_vectors, y_train)

logistic.predict(X_test_vectors[0])

array(['POSITIVE'], dtype='<U8')

### **Evaluation**

#### Mean Accuracy

In [72]:
print("SVM Score : ", end="")
print(svm.score(X_test_vectors, y_test))
print("Decision Tree Score : ", end="")
print(decision_tree.score(X_test_vectors, y_test))
print("Gaussian Naive Bayes Score : ", end="")
print(gaussian_nb.score(X_test_vectors.toarray(), y_test))
print("Bernoulli Naive Bayes Score : ", end="")
print(bernoulli_nb.score(X_test_vectors.toarray(), y_test))
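# CategoricalNB scoring is left disabled: it can raise an IndexError when a test-set
# count exceeds every value seen for that feature during training.
#print("Categorical Naive Bayes Score : ", end="")
#print(categorical_nb.score(X_test_vectors.toarray(), y_test))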
print("Multinomial Naive Bayes Score : ", end="")
print(multinomial_nb.score(X_test_vectors.toarray(), y_test))
print("Logistic Regression Score : ", end="")
print(logistic.score(X_test_vectors, y_test))

SVM Score : 0.8242424242424242
Decision Tree Score : 0.7545454545454545
Gaussian Naive Bayes Score : 0.8121212121212121
Bernoulli Naive Bayes Score : 0.8484848484848485
Multinomial Naive Bayes Score : 0.8575757575757575
Logistic Regression Score : 0.8303030303030303
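
Mean accuracy is the fraction of test items whose predicted label matches the true label. The sketch below computes it by hand and compares it with a majority-class baseline, which helps in judging scores like the ones above. The label list is a made-up toy example, not the notebook's test set, and `accuracy` is a hypothetical helper name.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    # Fraction of positions where the prediction equals the true label.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical, deliberately imbalanced labels.
y_true = ["POSITIVE"] * 8 + ["NEUTRAL"] + ["NEGATIVE"]

# Baseline: always predict the most common label.
majority = Counter(y_true).most_common(1)[0][0]
baseline_pred = [majority] * len(y_true)
baseline_acc = accuracy(y_true, baseline_pred)
print(majority, baseline_acc)
```

When one class dominates, a model that ignores the input can still reach a high accuracy, so running `Counter(y_test)` on the real labels is a useful check before comparing classifiers.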


#### F1 Score

In [73]:
from sklearn.metrics import f1_score

# average=None returns one F1 score per class, in the order given by `labels`.
f1_score(y_test, svm.predict(X_test_vectors), average=None,
         labels=(Sentiment.POSITIVE, Sentiment.NEUTRAL, Sentiment.NEGATIVE))

array([0.91319444, 0.21052632, 0.22222222])
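
Per-class F1 is the harmonic mean of precision and recall for one label treated as the "positive" class. As a rough sketch of what `average=None` reports, the code below computes it by hand on a toy example. The labels are invented, `f1_for_label` is a hypothetical helper, and returning 0.0 when there are no true positives is a simplifying assumption.

```python
def f1_for_label(y_true, y_pred, label):
    # Count true positives, false positives and false negatives for one label.
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # assumption: treat the undefined case as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: the single NEUTRAL item is misclassified as POSITIVE.
y_true = ["POSITIVE"] * 4 + ["NEUTRAL", "NEGATIVE"]
y_pred = ["POSITIVE"] * 4 + ["POSITIVE", "NEGATIVE"]

scores = [f1_for_label(y_true, y_pred, lbl)
          for lbl in ("POSITIVE", "NEUTRAL", "NEGATIVE")]
print(scores)
```

A large gap between the POSITIVE score and the other two, as in the output above, is often a sign of class imbalance in the training data, though that is not verified in this notebook.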