# Analyze Product Sentiment

Here, I'm going to analyze 'Amazon Baby' products on the basis of the reviews given by consumers.   
I'm going to use a logistic regression model to classify sentiment with two approaches:

1. A sentiment classifier model using all the words in the reviews.
2. A sentiment classifier model using only selected words from the reviews.

At the end, I'll compare the two approaches to analysing sentiment.

In [None]:
! pip install turicreate
import turicreate

from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import HoverTool
output_notebook()

In [None]:
# Reading the data and creating an SFrame of the data
products = turicreate.SFrame.read_csv('../input/amazon-baby-sentiment-analysis/amazon_baby.csv')

# Exploring dataset
products

### Top 5 Most Reviewed Amazon Baby Products

In [None]:
products.groupby('name',operations={'count':turicreate.aggregate.COUNT()}).sort('count', ascending= False).head(5)

### Distribution of Ratings of the most popular Amazon Baby Product

In [None]:
giraffe_reviews = products[products['name']=='Vulli Sophie the Giraffe Teether']
giraffe_reviews['rating'].show()

# Preprocessing data for Sentiment Analysis

* We will build a 'word_count' vector (a dictionary of word frequencies) for each review.
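To make the idea of a 'word_count' vector concrete, here is a minimal sketch in plain Python. It is not how turicreate implements `count_words`; the tokenizer below (lowercase, split on whitespace) is a simplifying assumption, and `count_words_sketch` is a hypothetical helper name.

```python
from collections import Counter

def count_words_sketch(text):
    """Return a {word: count} dictionary for a single review."""
    # Assumption: lowercase the text and split on whitespace only,
    # so punctuation stays attached to the neighbouring word.
    return dict(Counter(text.lower().split()))

review = "Great teether and a great gift"
word_count = count_words_sketch(review)
print(word_count)
```

Each review ends up with its own dictionary, which is the shape of value the 'word_count' column holds.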

In [None]:
products['word_count'] = turicreate.text_analytics.count_words(products['review'])
products.head(5)

Now, I'm creating a subset of words to build a second classifier. ML practitioners often throw out words they consider “unimportant” before training a model, and this can help accuracy. Here, I go much further and throw out all words except a handful that clearly indicate sentiment, listed below. Using so few words in our model will hurt our accuracy, but it helps us interpret what our classifier is doing.
* I'll build one column per selected word using the 'word_count' column

In [None]:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']
# For each selected word, create a column holding how many times
# that word appears in each review (0 if the word is absent)
for word in selected_words:
    products[word] = products['word_count'].apply(lambda counts: counts.get(word, 0))

products.head(5)

## Define what is positive and negative sentiment

Let's look at the rating distribution of Amazon Baby products.

In [None]:
products['rating'].show()

In [None]:
# Ignore all 3-star reviews
products = products[products['rating']!= 3]

#positive sentiment = 4-star or 5-star reviews
products['sentiment'] = products['rating'] >= 4

products.head(5)

## Distribution of Sentiments

In [None]:
products['sentiment'].show()

### Train and Test Split

In [None]:
train_data, test_data = products.random_split(.8, seed=0)  # use 80% of the data for training and the rest for testing

# Building a sentiment classifier using all words as Features

In [None]:
# Classification Model using all words
sentiment_model = turicreate.logistic_classifier.create(train_data,target='sentiment', features=['word_count'], validation_set=test_data)

In [None]:
predictions = sentiment_model.classify(test_data)
print (predictions)

## Evaluation of Sentiment Model using all words as Features
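As a reminder of what an ROC curve plots, the sketch below computes (false positive rate, true positive rate) pairs by hand on made-up data. The labels, probabilities, thresholds, and the `p >= t` decision rule are all assumptions for illustration; this is not a description of turicreate's internals.

```python
def roc_points(labels, probabilities, thresholds):
    """Return (threshold, fpr, tpr) tuples for binary labels in {0, 1}."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # Predict positive when the probability reaches the threshold.
        predicted = [1 if p >= t else 0 for p in probabilities]
        tp = sum(1 for y, yh in zip(labels, predicted) if y == 1 and yh == 1)
        fp = sum(1 for y, yh in zip(labels, predicted) if y == 0 and yh == 1)
        points.append((t, fp / negatives, tp / positives))
    return points

# Made-up labels and predicted probabilities, for illustration only.
toy_labels = [1, 1, 1, 0, 0]
toy_probs = [0.9, 0.8, 0.4, 0.6, 0.1]
points = roc_points(toy_labels, toy_probs, [0.0, 0.5, 1.0])
for threshold, fpr, tpr in points:
    print(threshold, fpr, tpr)
```

Sweeping the threshold from 0 to 1 moves the point from the top-right corner (everything predicted positive) to the bottom-left corner (nothing predicted positive).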

In [None]:
roc = sentiment_model.evaluate(test_data, metric= 'roc_curve')
roc

In [None]:
p = figure(title= 'ROC Curve for all words Sentiment Model', plot_width=600, plot_height=400)

p.line(x= roc['roc_curve']['fpr'], y= roc['roc_curve']['tpr'], line_width=2 , legend_label="ROC Curve Class")
p.line([0, 1], [0, 1], line_dash="dotted", line_color="indigo", line_width=2)
p.add_tools(HoverTool(tooltips=[("False Positive Rate", "@x"), ("True Positive Rate", "@y")])) 
p.xaxis.axis_label = 'False Positive Rate'
p.yaxis.axis_label = 'True Positive Rate'
p.legend.location = 'bottom_right'
show(p)

In [None]:
result = sentiment_model.evaluate(test_data)
print ("Accuracy             : {}".format(result['accuracy']))
print ("Area under ROC Curve : {}".format(result['auc']))
print ("Confusion Matrix     : \n{}".format(result['confusion_matrix']))
print ("F1_score             : {}".format(result['f1_score']))
print ("Precision            : {}".format(result['precision']))
print ("Recall               : {}".format(result['recall']))
print ("Log_loss             : {}".format(result['log_loss']))

### Apply the sentiment classifier to better understand the most popular Amazon Baby Product

In [None]:
productsdata = products.copy()
productsdata['predicted_sentiment'] = sentiment_model.predict(productsdata, output_type = 'probability')
# As identified above, the most popular Amazon Baby product is 'Vulli Sophie the Giraffe Teether'
giraffe_reviews = productsdata[productsdata['name']== 'Vulli Sophie the Giraffe Teether']
giraffe_reviews = giraffe_reviews.sort('predicted_sentiment', ascending=False)

# Most positive review for most popular Amazon Baby Product
print('Most Positive review for Vulli Sophie the Giraffe Teether:\n\n ', giraffe_reviews[0]['review'])
print('\n\n')
# Most negative review for most popular Amazon Baby Product
print('Most Negative review for Vulli Sophie the Giraffe Teether:\n\n ', giraffe_reviews[-1]['review'])

---

# Building a sentiment classifier using Selected Words as Features

In [None]:
# Features to be trained on selected words Model
selected_words_feat = selected_words

In [None]:
# Classification Model using selected words
selected_words_model = turicreate.logistic_classifier.create(train_data,target='sentiment', features= selected_words_feat, validation_set=test_data)

In [None]:
predictions = selected_words_model.classify(test_data)
print (predictions)

## Evaluation of Sentiment Model using selected words as Features
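The metrics printed in this section all derive from the four cells of a confusion matrix. The sketch below shows the standard formulas; the counts passed in are made up for illustration and are not results from this notebook.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # of the predicted positives, how many were right
    recall = tp / (tp + fn)      # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=80, fp=10, tn=5, fn=5)
for name, value in metrics.items():
    print("{:<10}: {:.4f}".format(name, value))
```

With imbalanced classes, as in this toy example, accuracy can look reasonable even when very few negatives are identified, which is why precision and recall are reported alongside it.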

In [None]:
roc_swm = selected_words_model.evaluate(test_data, metric= 'roc_curve')
roc_swm

In [None]:
p = figure(title= 'ROC Curve for selected words Sentiment Model', plot_width=600, plot_height=400)

p.line(x= roc_swm['roc_curve']['fpr'], y= roc_swm['roc_curve']['tpr'], line_width=2 , legend_label="ROC Curve Class")
p.line([0, 1], [0, 1], line_dash="dotted", line_color="indigo", line_width=2)
p.add_tools(HoverTool(tooltips=[("False Positive Rate", "@x"), ("True Positive Rate", "@y")])) 
p.xaxis.axis_label = 'False Positive Rate'
p.yaxis.axis_label = 'True Positive Rate'
p.legend.location = 'bottom_right'
show(p)

In [None]:
result_swm = selected_words_model.evaluate(test_data)
print ("Accuracy             : {}".format(result_swm['accuracy']))
print ("Area under ROC Curve : {}".format(result_swm['auc']))
print ("Confusion Matrix     : \n{}".format(result_swm['confusion_matrix']))
print ("F1_score             : {}".format(result_swm['f1_score']))
print ("Precision            : {}".format(result_swm['precision']))
print ("Recall               : {}".format(result_swm['recall']))
print ("Log_loss             : {}".format(result_swm['log_loss']))

---
---
# Comparing the two Models
This section also answers a few questions about the selected words.

Using the .sum() method on each of the new word columns: out of the selected_words, which one is used most in the dataset? Which one is used least?

In [None]:
for word in selected_words:
    print("\nThe number of times {} appears: {}".format(word, products[word].sum()))

As we can see above, out of the selected_words, the **most used word** in the dataset is <span style="color:blue">'great'</span> and the **least used word** in the dataset is <span style="color:blue">'wow'</span>.

---

### Analysing selected words on the basis of weights learned in selected words classifier Model
Out of the 11 words in selected_words, which one got the most positive weight? Which one got the most negative weight? Do these values make sense?

In [None]:
swm_weights= selected_words_model.coefficients.sort(key_column_names='value', ascending=False)
swm_weights.head(5)

In [None]:
print('Out of the 11 words in selected_words, Most Positive: ', 
      swm_weights[swm_weights['value'] == swm_weights['value'].max()]['name'][0])
print('\n')
print('Out of the 11 words in selected_words, Most Negative: ', 
      swm_weights[swm_weights['value'] == swm_weights['value'].min()]['name'][0])

Out of the 11 words in selected_words,
**Most Positive**: <span style="color:blue">'love'</span> and
**Most Negative**: <span style="color:blue">'horrible'</span> 

These values make sense: 'love' expresses strong approval, while 'horrible' expresses strong disapproval.    

---

### Interpreting the difference in performance between the models: 
To understand which of the two models performs better, I'll now examine the reviews for a particular product.

* I'll investigate a product named ‘Baby Trend Diaper Champ’. (This is a trash can for soiled baby diapers, which keeps the smell contained.)

* As with 'Vulli Sophie the Giraffe Teether', I'll use the sentiment_model to predict the sentiment of each review in diaper_champ_reviews and then sort the results by ‘predicted_sentiment’.

* I'll then report the ‘predicted_sentiment’ and the review text for the most positive and most negative ‘Baby Trend Diaper Champ’ reviews according to the sentiment_model.

* Finally, I'll use the selected_words_model, trained on just the selected_words, to predict the sentiment of the same product's reviews, and compare the values:

In [None]:
# For sentiment_model
diaper_champ_reviews = products[products['name'] == 'Baby Trend Diaper Champ']  # keep only the reviews for 'Baby Trend Diaper Champ'
diaper_champ_reviews['predicted_sentiment'] = sentiment_model.predict(diaper_champ_reviews, output_type = 'probability')
diaper_champ_reviews = diaper_champ_reviews.sort('predicted_sentiment', ascending=False)
diaper_champ_reviews.head(5)

In [None]:
# Predicted Sentiment for the most positive review 
print('Predicted Sentiment for Most Positive review:  ', diaper_champ_reviews[0]['predicted_sentiment'])
# Most positive review for ‘Baby Trend Diaper Champ’
print('Most positive review for ‘Baby Trend Diaper Champ’:\n\n ', diaper_champ_reviews[0]['review'])
print('\n\n')

# Predicted Sentiment for the most negative review
print('Predicted Sentiment for Most Negative review:  ', diaper_champ_reviews[-1]['predicted_sentiment'])
# Most negative review for ‘Baby Trend Diaper Champ’
print('Most negative review for ‘Baby Trend Diaper Champ’:\n\n ', diaper_champ_reviews[-1]['review'])

In [None]:
# For selected_words_model
dcr_swm = products[products['name'] == 'Baby Trend Diaper Champ']  # keep only the reviews for 'Baby Trend Diaper Champ'
dcr_swm['predicted_sentiment'] = selected_words_model.predict(dcr_swm, output_type = 'probability')
dcr_swm = dcr_swm.sort('predicted_sentiment', ascending=False)
dcr_swm.head(5)

In [None]:
# Predicted Sentiment for the most positive review 
print('Predicted Sentiment for Most Positive review:  ', dcr_swm[0]['predicted_sentiment'])
# Most positive review for ‘Baby Trend Diaper Champ’
print('Most positive review for ‘Baby Trend Diaper Champ’:\n\n ', dcr_swm[0]['review'])
print('\n\n')

# Predicted Sentiment for the most negative review
print('Predicted Sentiment for Most Negative review:  ', dcr_swm[-1]['predicted_sentiment'])
# Most negative review for ‘Baby Trend Diaper Champ’
print('Most negative review for ‘Baby Trend Diaper Champ’:\n\n ', dcr_swm[-1]['review'])

In [None]:
dcr_swm[dcr_swm['word_count'] == diaper_champ_reviews['word_count'][0]]

According to the <span style="color:green">sentiment_model</span>, the ‘predicted_sentiment’ for the most positive ‘Baby Trend Diaper Champ’ review is <span style="color:blue">0.9999</span>, whereas the <span style="color:green">selected_words_model</span> gives the same review <span style="color:blue">0.7919</span>.      
In my view, the value from the sentiment_model is much more positive than the value from the selected_words_model because none of the selected words appear in the text of this review, so the selected_words_model has no word-level evidence to draw on.
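The reasoning above follows from the form of logistic regression: when every selected-word count is zero, the score reduces to the intercept alone, so all such reviews receive the same probability. The sketch below illustrates this with hypothetical weights and a hypothetical intercept; they are not the coefficients learned by selected_words_model.

```python
import math

def predict_probability(word_count, weights, intercept):
    """Logistic regression probability for one review's word counts."""
    # Words without a weight contribute nothing to the score.
    score = intercept + sum(w * word_count.get(word, 0) for word, w in weights.items())
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical values for illustration only.
toy_weights = {"love": 1.4, "great": 0.9, "horrible": -2.0}
toy_intercept = 1.3

# A review containing none of the weighted words: only the intercept counts.
no_selected = predict_probability({"diaper": 2, "easy": 1}, toy_weights, toy_intercept)
# A review that uses 'love' twice gets a higher probability.
with_love = predict_probability({"love": 2, "diaper": 1}, toy_weights, toy_intercept)
print(no_selected, with_love)
```

A model that sees every word can still pick up positive evidence from words outside the selected list, which is one way its prediction can end up far more confident.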

---

## Classification Accuracy and the Majority Class Baseline

In [None]:
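def Calculate_y_hat(scores):
    # Convert margins to class labels: a positive margin means positive sentiment (1).
    # The 'sentiment' column holds 0/1 labels, so the negative class must be 0, not -1.
    y_hat = []
    for score in scores:
        if score > 0:
            y_hat.append(1)
        else:
            y_hat.append(0)
    return y_hat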

def get_classification_accuracy(model, data, true_labels):
    # First get the predictions
    scores = model.predict(data, output_type='margin')
    
    # Compute the number of correctly classified examples
    count_correct_classified_samples = 0
    y_hat =  Calculate_y_hat(scores)
    
    for i in range(len(scores)):
        if y_hat[i] == true_labels[i]:
            count_correct_classified_samples+=1

    # Then compute accuracy by dividing num_correct by total number of examples
    accuracy = count_correct_classified_samples/(len(scores))
    
    return accuracy

In [None]:
get_classification_accuracy(sentiment_model, test_data, test_data['sentiment'])

In [None]:
get_classification_accuracy(selected_words_model, test_data, test_data['sentiment'])

### Baseline: Majority class prediction
It is quite common to use the **majority class classifier** as a baseline (or reference) model for comparison with your classifier model. The majority class classifier predicts the majority class for all data points. At the very least, a learned model should comfortably beat the majority class classifier; otherwise, the model is (usually) pointless.

I therefore compare each learned model against this baseline, in addition to comparing the two models with each other.
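A minimal sketch of how the baseline accuracy could be computed is shown below. The helper name `majority_class_accuracy` and the toy labels are assumptions for illustration; in this notebook the inputs would presumably be `list(train_data['sentiment'])` and `list(test_data['sentiment'])`.

```python
from collections import Counter

def majority_class_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    # Pick the label that occurs most often in the training data.
    majority_label = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == majority_label)
    return majority_label, correct / len(test_labels)

# Made-up 0/1 sentiment labels, for illustration only.
toy_train = [1, 1, 1, 1, 0]
toy_test = [1, 1, 1, 0, 0]
majority_label, baseline_accuracy = majority_class_accuracy(toy_train, toy_test)
print(majority_label, baseline_accuracy)
```

The majority label is chosen from the training data and scored on the test data, so the baseline is evaluated under the same conditions as the learned models.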

---

### Comparison of Accuracy
The accuracy score for <span style="color:green">sentiment_model</span> is <span style="color:blue">0.9177</span>, whereas the accuracy score for <span style="color:green">selected_words_model</span> is <span style="color:blue">0.8464</span>. The 'sentiment_model' (the model with all words) is therefore clearly more accurate than the 'selected_words_model' (the model with selected words only).       
Also, the total error count (i.e. the sum of false positives and false negatives) is 2741 for sentiment_model and 5116 for selected_words_model. Hence, sentiment_model makes fewer errors than selected_words_model.
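
Accuracy and error count are linked by accuracy = 1 - errors / N, where N is the number of test examples. As a rough consistency check, the sketch below derives the test-set size implied by each pair of figures quoted above. The quoted accuracies are rounded, so only approximate agreement should be expected; `implied_test_size` is a hypothetical helper.

```python
def implied_test_size(error_count, accuracy):
    """Solve accuracy = 1 - errors / N for N."""
    return error_count / (1.0 - accuracy)

# Figures quoted in the comparison above.
size_all_words = implied_test_size(2741, 0.9177)
size_selected = implied_test_size(5116, 0.8464)
print(round(size_all_words), round(size_selected))
```

Both pairs point to roughly the same test-set size, which is what we would expect given that both models were evaluated on the same test_data.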

---

# Conclusion:
After all these comparisons, we come to the conclusion that the sentiment_model, which uses all words, is a better sentiment classifier than the selected_words_model, which uses only a few selected words. 