In [1]:
import random


class Sentiment:
    NEGATIVE = "NEGATIVE"
    NEUTRAL = "NEUTRAL"
    POSITIVE = "POSITIVE"

class Review:
    def __init__(self, text, score):
        self.text = text
        self.score = score
        self.sentiment = self.get_sentiment()

    def get_sentiment(self):
        # Map the 1-5 star rating onto three sentiment classes.
        if self.score <= 2:
            return Sentiment.NEGATIVE
        elif self.score == 3:
            return Sentiment.NEUTRAL
        else:  # Score of 4 or 5
            return Sentiment.POSITIVE

class ReviewContainer:
    def __init__(self, reviews):
        self.reviews = reviews

    def get_text(self):
        return [x.text for x in self.reviews]

    def get_sentiment(self):
        return [x.sentiment for x in self.reviews]

    def evenly_distribute(self):
        # Keep equal numbers of NEGATIVE and POSITIVE reviews; NEUTRAL reviews are dropped.
        # This assumes there are at least as many positive reviews as negative ones.
        negative = [x for x in self.reviews if x.sentiment == Sentiment.NEGATIVE]
        positive = [x for x in self.reviews if x.sentiment == Sentiment.POSITIVE]
        positive_shrunk = positive[:len(negative)]
        self.reviews = negative + positive_shrunk
        random.shuffle(self.reviews)
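
The `evenly_distribute` method above trims the positive list to the length of the negative list, which only balances the classes when positives outnumber negatives. A minimal sketch of a variant that trims whichever class is larger follows. The helper `balance_binary` and the `(text, label)` tuples are hypothetical stand-ins for illustration, not part of the notebook.

```python
import random

def balance_binary(items, label_of, first="NEGATIVE", second="POSITIVE", seed=0):
    """Return a shuffled list with equal counts of the two labels.

    Hypothetical helper: items with any other label (e.g. NEUTRAL) are dropped,
    as in the notebook's evenly_distribute().
    """
    a = [x for x in items if label_of(x) == first]
    b = [x for x in items if label_of(x) == second]
    n = min(len(a), len(b))  # trim whichever class is larger
    balanced = a[:n] + b[:n]
    random.Random(seed).shuffle(balanced)  # local RNG keeps the shuffle reproducible
    return balanced

# Toy data: (text, label) tuples standing in for Review objects.
toy = [("bad", "NEGATIVE")] * 5 + [("good", "POSITIVE")] * 2 + [("ok", "NEUTRAL")] * 3
balanced = balance_binary(toy, label_of=lambda r: r[1])
labels = [label for _, label in balanced]
print(labels.count("NEGATIVE"), labels.count("POSITIVE"))
```

Using `min` on both list lengths keeps the result balanced even on a split where negatives happen to be the majority.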


In [2]:
import json

file_name = r"D:\program\Documents\books_10000.json"

# The file holds one JSON object per line, so it is parsed line by line.
reviews = []
with open(file_name) as f:
    for line in f:
        review = json.loads(line)
        reviews.append(Review(review['reviewText'], review['overall']))

reviews[5].text

'I hoped for Mia to have some peace in this book, but her story is so real and raw.  Broken World was so touching and emotional because you go from Mia\'s trauma to her trying to cope.  I love the way the story displays how there is no "just bouncing back" from being sexually assaulted.  Mia showed us how those demons come for you every day and how sometimes they best you. I was so in the moment with Broken World and hurt with Mia because she was surrounded by people but so alone and I understood her feelings.  I found myself wishing I could give her some of my courage and strength or even just to be there for her.  Thank you Lizzy for putting a great character\'s voice on a strong subject and making it so that other peoples story may be heard through Mia\'s.'
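
Some reviews in this file carry HTML entities such as `&#34;` and `&#8217;` in their text, which is visible in printed reviews further down. With the default tokenizer these would likely surface as stray numeric tokens. A small sketch of decoding them with the standard library before vectorizing follows; `clean_review_text` is a hypothetical helper, and whether the cleanup changes the scores was not measured here.

```python
import html

def clean_review_text(text):
    """Hypothetical helper: decode HTML entities left in the raw review text."""
    return html.unescape(text)

# Made-up sample containing the two entity codes seen in the dataset.
sample = "I didn&#8217;t love the &#34;twist&#34;."
print(clean_review_text(sample))
```

It could be applied when building each `Review`, for example by passing `clean_review_text(review['reviewText'])` instead of the raw field.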

In [3]:
from sklearn.model_selection import train_test_split

train, test = train_test_split(reviews, test_size=0.33, random_state=42)

train_container = ReviewContainer(train)
test_container = ReviewContainer(test)

In [4]:
train_container.evenly_distribute()
train_x = train_container.get_text()
train_y = train_container.get_sentiment()

test_container.evenly_distribute()
test_x = test_container.get_text()
test_y = test_container.get_sentiment()

print(train_y.count(Sentiment.POSITIVE))
print(train_y.count(Sentiment.NEGATIVE))




436
436


In [5]:
# Bag of words

In [6]:
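from sklearn.feature_extraction.text import TfidfVectorizer

# Example documents a bag-of-words model would turn into word-weight vectors:
# This book is great
# This book was so bad

# Fit the vocabulary on the training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer()
train_x_vectors = vectorizer.fit_transform(train_x)
# Equivalent two-step form:
# vectorizer.fit(train_x)
# train_x_vectors = vectorizer.transform(train_x)

test_x_vectors = vectorizer.transform(test_x)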


print(train_x[0])
print(train_x_vectors[0].toarray())

 

To put things in perspective, after reading two thirds of this book, I noticed a reference to &#34;present day Yugoslavia&#34;. At that point I realized that it was first published in 1982. After over 30 years this book maintains its freshness and relevance on an amazing subject.The level of detail that McCullough brings about the early part of Teddy Roosevelt's life is astonishing. A child that faced illness and developmental problems, went on to achieve so much in his life. And the secret to his success was a mixture of born grit and ambition coupled with unconditional parental love.What a wonderful story. Roosevelt's life is better than fiction.I would highly recommend this book to anybody interested in American history and politics, as well as a parenting book. We can all learn from how the Roosevelts raised their children in the late 1800s.
[[0. 0. 0. ... 0. 0. 0.]]
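
The printed vector is almost entirely zeros because each review uses only a small part of the vocabulary. To make the weighting concrete, here is a pure-Python sketch of TF-IDF on the two example sentences from the cell above. It assumes the smoothed-IDF formula and L2 normalisation that `TfidfVectorizer` documents as its defaults. It is an illustration, not the library's implementation, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep tokens of two or more word characters.
    return re.findall(r"\b\w\w+\b", text.lower())

def tfidf(docs):
    """Return (vocabulary, list of L2-normalised TF-IDF vectors)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        raw = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        vectors.append([w / norm for w in raw])
    return vocab, vectors

docs = ["This book is great", "This book was so bad"]
vocab, vectors = tfidf(docs)
print(vocab)
for vec in vectors:
    print([round(w, 3) for w in vec])
```

Words that appear in both sentences ("this", "book") receive a lower weight than words unique to one sentence ("great", "bad"), which is the property that separates TF-IDF from plain word counts.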


In [7]:
# Classification

In [8]:
# Linear SVM

from sklearn import svm

clf_svm = svm.SVC(kernel='linear')
clf_svm.fit(train_x_vectors, train_y)

test_x[0]

clf_svm.predict(test_x_vectors[0])

array(['NEGATIVE'], dtype='<U8')

In [9]:
# Decision tree

from sklearn.tree import DecisionTreeClassifier

clf_dec = DecisionTreeClassifier()
clf_dec.fit(train_x_vectors, train_y)

clf_dec.predict(test_x_vectors[0])

array(['POSITIVE'], dtype='<U8')

In [10]:
# Naive Bayes

from sklearn.naive_bayes import GaussianNB

# GaussianNB does not accept sparse matrices, so the vectors are converted to dense arrays.
clf_gnb = GaussianNB()
clf_gnb.fit(train_x_vectors.toarray(), train_y)
clf_gnb.predict(test_x_vectors[0].toarray())


array(['NEGATIVE'], dtype='<U8')

In [21]:
print(clf_gnb.score(test_x_vectors.toarray(), test_y))

0.6610576923076923


In [12]:
# Logistic regression

from sklearn.linear_model import LogisticRegression

clf_log = LogisticRegression()
clf_log.fit(train_x_vectors, train_y)

clf_log.predict(test_x_vectors[0])


array(['NEGATIVE'], dtype='<U8')

Evaluation

In [13]:
# Mean accuracy on the test set: SVM, decision tree, logistic regression
print(clf_svm.score(test_x_vectors, test_y))
print(clf_dec.score(test_x_vectors, test_y))
print(clf_log.score(test_x_vectors, test_y))



0.8076923076923077
0.6225961538461539
0.8052884615384616


In [14]:
# F1 scores, one per class in the order given by `labels`
from sklearn.metrics import f1_score

# NEUTRAL reviews were removed by evenly_distribute(), so that label scores 0 and triggers a warning.
f1_score(test_y, clf_svm.predict(test_x_vectors), average=None,
         labels=[Sentiment.POSITIVE, Sentiment.NEUTRAL, Sentiment.NEGATIVE])
# f1_score(test_y, clf_log.predict(test_x_vectors), average=None,
#          labels=[Sentiment.POSITIVE, Sentiment.NEUTRAL, Sentiment.NEGATIVE])

  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))


array([0.80582524, 0.        , 0.80952381])
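
The middle value is 0 because no NEUTRAL review exists in either the true or the predicted labels after balancing. A short sketch of how a per-class F1 score can be computed by hand shows where that zero comes from. The function `f1_per_class` and the toy labels are hypothetical; reporting 0.0 for an undefined score is an assumption made to mirror the output above.

```python
def f1_per_class(y_true, y_pred, labels):
    """Per-class F1 from true-positive, false-positive and false-negative counts."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # With no true or predicted samples the score is undefined; report 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

# Toy labels, not taken from the notebook's data.
y_true = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"]
y_pred = ["POSITIVE", "NEGATIVE", "NEGATIVE", "NEGATIVE"]
scores = f1_per_class(y_true, y_pred, ["POSITIVE", "NEUTRAL", "NEGATIVE"])
print(scores)
```

Dropping `Sentiment.NEUTRAL` from `labels` in the `f1_score` call would avoid both the zero entry and the warning.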

In [15]:
test_set = ['very fun, 5 stars', 'bad book do not buy', 'horrible waste of time']
new_test = vectorizer.transform(test_set)

clf_svm.predict(new_test)

array(['POSITIVE', 'NEGATIVE', 'NEGATIVE'], dtype='<U8')

Tuning our model (GridSearchCV)

In [16]:
from sklearn.model_selection import GridSearchCV

# Try both kernels with each value of the regularization parameter C.
parameters = {'kernel': ('linear', 'rbf'), 'C': (1, 4, 8, 16, 32)}

svc = svm.SVC()
clf = GridSearchCV(svc, parameters, cv=5)
clf.fit(train_x_vectors, train_y)

In [17]:
print(clf.score(test_x_vectors,test_y))

0.8197115384615384
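
To get a sense of how much work the search above does, the grid can be enumerated with the standard library. This sketch only lists the candidate settings. It does not train anything, and the variable names are illustrative.

```python
from itertools import product

parameters = {'kernel': ('linear', 'rbf'), 'C': (1, 4, 8, 16, 32)}
cv = 5

# Every combination of one value per parameter is a candidate model.
names = sorted(parameters)
candidates = [dict(zip(names, values))
              for values in product(*(parameters[n] for n in names))]

print(len(candidates))       # number of candidate settings
print(len(candidates) * cv)  # fits during the search, before any final refit
for candidate in candidates[:3]:
    print(candidate)
```

Each candidate is trained once per fold, so adding values to the grid multiplies the training time. After fitting, the chosen combination can be inspected through the search object's `best_params_` attribute.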


## Saving Model


In [18]:
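import pickle

# Only the classifier is saved here; the fitted vectorizer is also needed to classify new raw text.
with open('sentiment_classifier.pkl', 'wb') as f:
    pickle.dump(clf, f)

In [19]:
# Load the model

with open('sentiment_classifier.pkl', 'rb') as f:
    loaded_clf = pickle.load(f)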

In [20]:
print(test_x[0])
loaded_clf.predict(test_x_vectors[0])

I received a copy of this book in exchange for an honest review.The first half of this book I thought was quite interesting. There was a good cast of characters and a fast-moving plot, but the second half I didn&#8217;t love quite as much. Fast-paced is always good for me, but the ending especially seemed quite rushed. There was also an odd plot twist that didn&#8217;t seem to fit with the overall theme of the story, and things got quite violent and didn&#8217;t make much sense to me. I enjoyed the dynamic between the two sisters (being a sister myself) but I didn&#8217;t think other relationships needed to be highlighted as much as they were. If the focus stayed on Gemma more I think it could have been a lot more interesting and easier to follow.


array(['NEGATIVE'], dtype='<U8')