#Predicting sentiment from product reviews

#Fire up GraphLab Create

In [2]:
import graphlab

#Read some product review data

Loading reviews for a set of baby products. 

In [64]:
products = graphlab.SFrame('amazon_baby.gl/')

#Let's explore this data together

The data includes the product name, the review text and the rating given in the review.

In [None]:
products.head()

#Build the word count vector for each review

In [65]:
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
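
`count_words` turns each review into a dictionary that maps words to counts. Judging from the `word_count` dictionaries displayed in this notebook (keys such as `'ears,'`, `'existence.'` and `'(literally).so'`), this GraphLab version appears to lowercase the text and split on whitespace, leaving punctuation attached. A minimal plain-Python sketch under that assumption (the helper name `count_words_sketch` is hypothetical, not part of GraphLab):

```python
from collections import Counter


def count_words_sketch(text):
    """Lowercase `text` and count whitespace-separated tokens.

    Assumption: mirrors the whitespace tokenization visible in the
    notebook's word_count output; punctuation stays attached to words.
    """
    return dict(Counter(text.lower().split()))


counts = count_words_sketch("Great product. great price!")
print(counts['great'])     # 2
print(counts['product.'])  # 1
```

Because punctuation is not stripped, `great` and `great!` end up as different keys, which matters later when counting a fixed list of selected words.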

In [None]:
products.head()

In [None]:
graphlab.canvas.set_target('ipynb')

In [None]:
products['name'].show()

#Examining the reviews for the most-reviewed product: 'Vulli Sophie the Giraffe Teether'

In [5]:
giraffe_reviews = products[products['name'] == 'Vulli Sophie the Giraffe Teether']

In [None]:
len(giraffe_reviews)

In [None]:
giraffe_reviews['rating'].show(view='Categorical')

#Build a sentiment classifier

In [None]:
products['rating'].show(view='Categorical')

##Define what's a positive and a negative sentiment

We will ignore all reviews with a rating of 3, since they tend to express a neutral sentiment. Reviews with a rating of 4 or higher are considered positive, and those with a rating of 2 or lower are considered negative.

In [6]:
#ignore all 3* reviews
products = products[products['rating'] != 3]

In [7]:
#positive sentiment = 4* or 5* reviews
products['sentiment'] = products['rating'] >= 4

In [None]:
products.head()

##Let's train the sentiment classifier

In [None]:
products[products['sentiment'] == 0].head()

In [8]:
train_data,test_data = products.random_split(.8, seed=0)

In [9]:
sentiment_model = graphlab.logistic_classifier.create(train_data,
                                                     target='sentiment',
                                                     features=['word_count'],
                                                     validation_set=test_data)

PROGRESS: Logistic regression:
PROGRESS: --------------------------------------------------------
PROGRESS: Number of examples          : 133448
PROGRESS: Number of classes           : 2
PROGRESS: Number of feature columns   : 1
PROGRESS: Number of unpacked features : 219217
PROGRESS: Number of coefficients    : 219218
PROGRESS: Starting L-BFGS
PROGRESS: --------------------------------------------------------
PROGRESS: +-----------+----------+-----------+--------------+-------------------+---------------------+
PROGRESS: | Iteration | Passes   | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
PROGRESS: +-----------+----------+-----------+--------------+-------------------+---------------------+
PROGRESS: | 1         | 5        | 0.000002  | 2.833824     | 0.841481          | 0.839989            |
PROGRESS: | 2         | 9        | 3.000000  | 4.526444     | 0.947425          | 0.894877            |
PROGRESS: | 3         | 10       | 3.000000  | 5.288993     | 0.92

#Evaluate the sentiment model

In [78]:
sentiment_model.evaluate(test_data)  # optionally pass metric='roc_curve'

{'accuracy': 0.916256305548883,
 'auc': 0.9446492867438502,
 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      1       |        0        |  1461 |
 |      0       |        1        |  1328 |
 |      0       |        0        |  4000 |
 |      1       |        1        | 26515 |
 +--------------+-----------------+-------+
 [4 rows x 3 columns],
 'f1_score': 0.9500349343413533,
 'log_loss': 0.2610669843242208,
 'precision': 0.9523039902309378,
 'recall': 0.9477766657134686,
 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 100001
 
 Data:
 +-----------+----------------+----------------+-------+------+
 | threshold |      fpr       |      tpr       |   p   |  n   |
 +-----------+----------------+----------------+-------+------+
 |    0.0    |      1.0       |  
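
The headline metrics can be recomputed by hand from the four confusion-matrix counts above, treating label 1 (positive sentiment) as the positive class. This plain-Python sketch does that; the function name is illustrative, and the counts are copied from the output above.

```python
def metrics_from_confusion(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / float(total)
    precision = tp / float(tp + fp)
    recall = tp / float(tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {'accuracy': accuracy, 'precision': precision,
            'recall': recall, 'f1_score': f1}


# Counts from the sentiment_model confusion matrix:
# target 1 / predicted 1 -> tp, target 0 / predicted 1 -> fp,
# target 1 / predicted 0 -> fn, target 0 / predicted 0 -> tn
m = metrics_from_confusion(tp=26515, fp=1328, fn=1461, tn=4000)
for name in sorted(m):
    print(name, round(m[name], 4))
```

The results agree with the `accuracy`, `precision`, `recall` and `f1_score` values reported by `evaluate`.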

In [51]:
sentiment_model.show(view='Evaluation')
#accuracy 0.916

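
Besides accuracy, `evaluate` reports an `auc` of about 0.945 for this model. AUC can be read as the probability that a randomly chosen positive review receives a higher predicted score than a randomly chosen negative one, with ties counted as half. The sketch below is a brute-force illustration of that definition over all positive/negative pairs, not GraphLab's implementation; the scores in the example are made up.

```python
def pairwise_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    Brute force over all positive/negative pairs; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores, not taken from the notebook:
print(pairwise_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 means the scores rank positives and negatives no better than chance, which is a useful reference point for the selected-words model's AUC of about 0.665.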


#Applying the learned model to understand sentiment for the Giraffe

In [11]:
giraffe_reviews['predicted_sentiment'] = sentiment_model.predict(giraffe_reviews, output_type='probability')

In [None]:
giraffe_reviews.head()

##Sort the reviews based on the predicted sentiment and explore

In [12]:
giraffe_reviews = giraffe_reviews.sort('predicted_sentiment', ascending=False)

In [63]:
giraffe_reviews.head()

name,review,rating,word_count,predicted_sentiment
Vulli Sophie the Giraffe Teether ...,"Sophie, oh Sophie, your time has come. My ...",5.0,"{'giggles': 1L, 'all': 1L, ""violet's"": 2L, ...",1.0
Vulli Sophie the Giraffe Teether ...,I'm not sure why Sophie is such a hit with the ...,4.0,"{'peace': 1L, 'month': 1L, 'bright': 1L, ...",0.999999999703
Vulli Sophie the Giraffe Teether ...,I'll be honest...I bought this toy because all the ...,4.0,"{'all': 2L, 'pops': 1L, 'existence.': 1L, ...",0.999999999392
Vulli Sophie the Giraffe Teether ...,We got this little giraffe as a gift from a ...,5.0,"{'all': 2L, ""don't"": 1L, '(literally).so': 1L, ...",0.99999999919
Vulli Sophie the Giraffe Teether ...,As a mother of 16month old twins; I bought ...,5.0,"{'cute': 1L, 'all': 1L, 'reviews.': 2L, 'just': ...",0.999999998657
Vulli Sophie the Giraffe Teether ...,Sophie the Giraffe is the perfect teething toy. ...,5.0,"{'just': 2L, 'both': 1L, 'month': 1L, 'ears,': ...",0.999999997108
Vulli Sophie the Giraffe Teether ...,Sophie la giraffe is absolutely the best toy ...,5.0,"{'and': 5L, 'the': 1L, 'all': 1L, 'old': 1L, ...",0.999999995589
Vulli Sophie the Giraffe Teether ...,My 5-mos old son took to this immediately. The ...,5.0,"{'just': 1L, 'shape': 2L, 'mutt': 1L, '""dog': 1L, ...",0.999999995573
Vulli Sophie the Giraffe Teether ...,My nephews and my four kids all had Sophie in ...,5.0,"{'and': 4L, 'chew': 1L, 'all': 1L, 'perfect;': ...",0.999999989527
Vulli Sophie the Giraffe Teether ...,Never thought I'd see my son French kissing a ...,5.0,"{'giggles': 1L, 'all': 1L, 'out,': 1L, 'over': ...",0.999999985069


##Most positive reviews for the giraffe

In [None]:
giraffe_reviews[0]['review']

In [None]:
giraffe_reviews[1]['review']

##Most negative reviews for the giraffe

In [None]:
giraffe_reviews[-1]['review']

In [None]:
giraffe_reviews[-2]['review']

In [None]:
# ASSIGNMENT

In [None]:
products.head()

In [None]:
products[10]['word_count']

In [None]:
type(products[0]['word_count'])

In [None]:
products[0]['word_count']['now']

In [16]:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

In [35]:
def awesome_count(word, words_dict):
    """Return how many times `word` appears in a review's word-count dict (0 if absent)."""
    return words_dict.get(word, 0)

In [66]:
# partial() lets us pass the extra `word` argument, since apply() calls its function with a single argument
from functools import partial
for word in selected_words:
    products[word] = products['word_count'].apply(partial(awesome_count,word))
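
`partial(awesome_count, word)` creates a one-argument function with `word` already filled in, which is the shape `apply` expects. It also sidesteps a classic Python pitfall when functions are built in a loop and called later: a `lambda` looks up the loop variable when it is called, not when it is defined. A small plain-Python illustration (the dictionary below is made up):

```python
from functools import partial


def awesome_count(word, words_dict):
    # Count of `word` in a review's word-count dict, 0 if absent.
    return words_dict.get(word, 0)


review_counts = {'love': 2, 'great': 1}
words = ['love', 'great', 'bad']

# partial() freezes each word at creation time.
frozen = [partial(awesome_count, w) for w in words]
print([f(review_counts) for f in frozen])  # [2, 1, 0]

# These lambdas look up `w` only when called, after the loop has finished,
# so every one of them sees the last word ('bad').
late = [lambda d: awesome_count(w, d) for w in words]
print([f(review_counts) for f in late])  # [0, 0, 0]
```

A lambda with a default argument, such as `lambda d, w=w: awesome_count(w, d)`, is the usual alternative to `partial`.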

In [38]:
products.head()

name,review,rating,word_count,sentiment,awesome
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'and': 3L, 'love': 1L, 'it': 2L, 'highly': 1L, ...",1,0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'and': 2L, 'quilt': 1L, 'it': 1L, 'comfortable': ...",1,0
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,"{'ingenious': 1L, 'and': 3L, 'love': 2L, ...",1,0
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,"{'and': 2L, 'parents!!': 1L, 'all': 2L, 'puppe ...",1,0
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0,"{'and': 2L, 'cute': 1L, 'help': 2L, 'doll': 1L, ...",1,0
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0,"{'shop': 1L, 'be': 1L, 'is': 1L, 'it': 1L, ' ...",1,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0,"{'feeding,': 1L, 'and': 2L, 'all': 1L, 'right': ...",1,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0,"{'and': 1L, 'help': 1L, 'give': 1L, 'is': 1L, ...",1,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0,"{'journal.': 1L, 'all': 1L, 'standarad': 1L, ...",1,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",I love this journal and our nanny uses it ...,4.0,"{'all': 1L, 'forget': 1L, 'just': 1L, ""daughter ...",1,0

great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate
0,0,0,1,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,2,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,2,0,0,0,0,0,0


In [39]:
for word in selected_words:
    print (word, products[word].sum())

('awesome', 2002L)
('great', 42420L)
('fantastic', 873L)
('amazing', 1305L)
('love', 40277L)
('horrible', 659L)
('bad', 3197L)
('terrible', 673L)
('awful', 345L)
('wow', 131L)
('hate', 1057L)


In [41]:
train_data_selected, test_data_selected = products.random_split(.8, seed=0)

In [44]:
selected_words_model = graphlab.logistic_classifier.create(
    train_data_selected,
    target='sentiment',
    features=selected_words,
    validation_set=test_data_selected)

PROGRESS: Logistic regression:
PROGRESS: --------------------------------------------------------
PROGRESS: Number of examples          : 133448
PROGRESS: Number of classes           : 2
PROGRESS: Number of feature columns   : 11
PROGRESS: Number of unpacked features : 11
PROGRESS: Number of coefficients    : 12
PROGRESS: Starting Newton Method
PROGRESS: --------------------------------------------------------
PROGRESS: +-----------+----------+--------------+-------------------+---------------------+
PROGRESS: | Iteration | Passes   | Elapsed Time | Training-accuracy | Validation-accuracy |
PROGRESS: +-----------+----------+--------------+-------------------+---------------------+
PROGRESS: | 1         | 2        | 0.227128     | 0.844299          | 0.842842            |
PROGRESS: | 2         | 3        | 0.395281     | 0.844186          | 0.842842            |
PROGRESS: | 3         | 4        | 0.553437     | 0.844276          | 0.843142            |
PROGRESS: | 4         | 5        |

In [77]:
selected_words_model['coefficients'].print_rows(12)

+-------------+-------+-------+------------------+
|     name    | index | class |      value       |
+-------------+-------+-------+------------------+
| (intercept) |  None |   1   |  1.36728315229   |
|   awesome   |  None |   1   |  1.05800888878   |
|    great    |  None |   1   |  0.883937894898  |
|  fantastic  |  None |   1   |  0.891303090304  |
|   amazing   |  None |   1   |  0.892802422508  |
|     love    |  None |   1   |  1.39989834302   |
|   horrible  |  None |   1   |  -1.99651800559  |
|     bad     |  None |   1   | -0.985827369929  |
|   terrible  |  None |   1   |  -2.09049998487  |
|    awful    |  None |   1   |  -1.76469955631  |
|     wow     |  None |   1   | -0.0541450123333 |
|     hate    |  None |   1   |  -1.40916406276  |
+-------------+-------+-------+------------------+
[12 rows x 4 columns]
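
Because this is logistic regression, each coefficient acts on the log-odds of positive sentiment: one extra occurrence of a word adds its coefficient to the log-odds, which multiplies the odds by `exp(coefficient)`. A short sketch converting a few of the coefficients printed above into odds multipliers:

```python
import math

# Coefficients copied from the table above (class 1 = positive sentiment).
coefficients = {
    'love': 1.39989834302,
    'awesome': 1.05800888878,
    'wow': -0.0541450123333,
    'bad': -0.985827369929,
    'terrible': -2.09049998487,
}

# Print words from most negative to most positive effect on the odds.
for word in sorted(coefficients, key=coefficients.get):
    multiplier = math.exp(coefficients[word])
    print('%-9s odds x %.3f' % (word, multiplier))
```

Holding the other counts fixed, each occurrence of "terrible" cuts the odds of a positive prediction to roughly an eighth, while "wow" barely moves them.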



In [47]:
# Sort the coefficients by value, most negative first
selected_words_model['coefficients'].sort('value')

name,index,class,value
terrible,,1,-2.09049998487
horrible,,1,-1.99651800559
awful,,1,-1.76469955631
hate,,1,-1.40916406276
bad,,1,-0.985827369929
wow,,1,-0.0541450123333
great,,1,0.883937894898
fantastic,,1,0.891303090304
amazing,,1,0.892802422508
awesome,,1,1.05800888878


In [49]:
selected_words_model.evaluate(test_data_selected)

{'accuracy': 0.8431419649291376,
 'auc': 0.6648096413721418,
 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      0       |        0        |  234  |
 |      1       |        0        |  130  |
 |      0       |        1        |  5094 |
 |      1       |        1        | 27846 |
 +--------------+-----------------+-------+
 [4 rows x 3 columns],
 'f1_score': 0.914242563530107,
 'log_loss': 0.405474711036565,
 'precision': 0.8453551912568306,
 'recall': 0.9953531598513011,
 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 100001
 
 Data:
 +-----------+-----+-----+-------+------+
 | threshold | fpr | tpr |   p   |  n   |
 +-----------+-----+-----+-------+------+
 |    0.0    | 1.0 | 1.0 | 27976 | 5328 |
 |   1e-05   | 1.0 | 1.0 | 27976 | 5328 |
 |   2e-05   | 1
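
The selected-words model's accuracy of 0.843 looks respectable until it is compared with a trivial baseline. The ROC table shows that the test set holds p = 27976 positive and n = 5328 negative reviews, and the confusion matrix shows the model predicts "positive" for 32940 of the 33304 reviews. A sketch of the majority-class baseline using those counts:

```python
# Test-set class counts from the output above.
positives = 27976
negatives = 5328

# A classifier that always predicts the majority class (positive).
baseline_accuracy = positives / float(positives + negatives)
model_accuracy = 0.8431419649291376  # selected_words_model, from evaluate()

print('majority baseline: %.4f' % baseline_accuracy)
print('selected words:    %.4f' % model_accuracy)
print('improvement:       %.4f' % (model_accuracy - baseline_accuracy))
```

The selected words lift accuracy by only about 0.3 percentage points over always answering "positive", whereas the full `word_count` model's 0.916 is a much larger gain.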

In [67]:
diaper_champ_reviews = products[products['name'] == 'Baby Trend Diaper Champ']

In [68]:
diaper_champ_reviews.head()

name,review,rating,word_count,awesome,great,fantastic
Baby Trend Diaper Champ,Ok - newsflash. Diapers are just smelly. We've ...,4.0,"{'just': 2L, 'less': 1L, '-': 3L, 'smell- ...",0,0,0
Baby Trend Diaper Champ,This is a good product to start and very easy to ...,3.0,"{'and': 3L, 'because': 1L, 'old': 1L, 'use.': ...",0,0,0
Baby Trend Diaper Champ,"My husband and I selected the Diaper ""Champ"" ma ...",1.0,"{'just': 1L, 'less': 1L, 'when': 3L, 'over': 1L, ...",0,0,0
Baby Trend Diaper Champ,Excellent diaper disposal unit. I used it in ...,5.0,"{'control': 1L, 'am': 1L, 'it': 1L, 'used': 1L, ...",0,0,0
Baby Trend Diaper Champ,We love our diaper champ. It is very easy to use ...,5.0,"{'and': 3L, 'over.': 1L, 'all': 1L, 'love': 1L, ...",0,0,0
Baby Trend Diaper Champ,Two girlfriends and two family members put me ...,5.0,"{'just': 1L, 'when': 1L, 'both': 1L, 'results': ...",0,0,0
Baby Trend Diaper Champ,I waited to review this until I saw how it ...,4.0,"{'lysol': 1L, 'all': 1L, 'mom.': 1L, 'busy': 1L, ...",0,0,0
Baby Trend Diaper Champ,I have had a diaper genie for almost 4 years since ...,1.0,"{'all': 1L, 'bags.': 1L, 'just': 1L, ""don't"": 2L, ...",0,0,0
Baby Trend Diaper Champ,I originally put this item on my baby registry ...,5.0,"{'lysol': 1L, 'all': 2L, 'bags.': 1L, 'feedback': ...",0,0,0
Baby Trend Diaper Champ,I am so glad I got the Diaper Champ instead of ...,5.0,"{'and': 2L, 'all': 1L, 'just': 1L, 'is': 2L, ...",0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,1,0,0,0,0,0,0
0,0,1,0,0,0,0,0
0,0,0,1,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0


In [69]:
diaper_champ_reviews['predicted_sentiment'] = sentiment_model.predict(diaper_champ_reviews, output_type='probability')

In [70]:
diaper_champ_reviews = diaper_champ_reviews.sort('predicted_sentiment', ascending=False)

In [71]:
diaper_champ_reviews.head()

name,review,rating,word_count,awesome,great,fantastic
Baby Trend Diaper Champ,Baby Luke can turn a clean diaper to a dirty ...,5.0,"{'all': 1L, 'less': 1L, ""friend's"": 1L, '(whi ...",0,0,0
Baby Trend Diaper Champ,I LOOOVE this diaper pail! Its the easies ...,5.0,"{'just': 1L, 'over': 1L, 'rweek': 1L, 'sooo': 1L, ...",0,0,0
Baby Trend Diaper Champ,We researched all of the different types of di ...,4.0,"{'all': 2L, 'just': 4L, ""don't"": 2L, 'one,': 1L, ...",0,0,0
Baby Trend Diaper Champ,My baby is now 8 months and the can has been ...,5.0,"{""don't"": 1L, 'when': 1L, 'over': 1L, 'soon': 1L, ...",0,2,0
Baby Trend Diaper Champ,"This is absolutely, by far, the best diaper ...",5.0,"{'just': 3L, 'money': 1L, 'not': 2L, 'mechanism': ...",0,0,0
Baby Trend Diaper Champ,Diaper Champ or Diaper Genie? That was my ...,5.0,"{'all': 1L, 'bags.': 1L, 'son,': 1L, '(i': 1L, ...",0,0,0
Baby Trend Diaper Champ,Wow! This is fabulous. It was a toss-up between ...,5.0,"{'and': 4L, '""genie"".': 1L, 'since': 1L, ...",0,0,0
Baby Trend Diaper Champ,I originally put this item on my baby registry ...,5.0,"{'lysol': 1L, 'all': 2L, 'bags.': 1L, 'feedback': ...",0,0,0
Baby Trend Diaper Champ,Two girlfriends and two family members put me ...,5.0,"{'just': 1L, 'when': 1L, 'both': 1L, 'results': ...",0,0,0
Baby Trend Diaper Champ,I am one of those super- critical shoppers who ...,5.0,"{'taller': 1L, 'bags.': 1L, 'just': 1L, ""don't"": ...",0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate,predicted_sentiment
0,0,0,0,0,0,0,0,0.999999937267
0,1,0,0,0,0,0,0,0.999999917406
0,0,0,1,0,0,0,0,0.999999899509
0,0,0,1,0,0,0,0,0.999999836182
0,2,0,0,0,0,0,0,0.999999824745
0,0,0,0,0,0,0,0,0.999999759315
0,0,0,0,0,0,0,0,0.999999692111
0,0,0,0,0,0,0,0,0.999999642488
0,0,1,0,0,0,0,0,0.999999604504
0,1,0,0,0,0,0,0,0.999999486804


In [72]:
diaper_champ_reviews[0:1]

name,review,rating,word_count,awesome,great,fantastic
Baby Trend Diaper Champ,Baby Luke can turn a clean diaper to a dirty ...,5.0,"{'all': 1L, 'less': 1L, ""friend's"": 1L, '(whi ...",0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate,predicted_sentiment
0,0,0,0,0,0,0,0,0.999999937267


In [73]:
selected_words_model.predict(diaper_champ_reviews[0:1], output_type='probability')

dtype: float
Rows: 1
[0.796940851290671]

In [74]:
diaper_champ_reviews[0:1]['review']

dtype: str
Rows: 1
['Baby Luke can turn a clean diaper to a dirty diaper in 3 seconds flat. The diaper champ turns the smelly diaper into "what diaper smell" in less time than that. I hesitated and wondered what I REALLY needed for the nursery. This is one of the best purchases we made. The champ, the baby bjorn, fluerville diaper bag, and graco pack and play bassinet all vie for the best baby purchase.Great product, easy to use, economical, effective, absolutly fabulous.UpdateI knew that I loved the champ, and useing the diaper genie at a friend's house REALLY reinforced that!! There is no comparison, the chanp is easy and smell free, the genie was difficult to use one handed (which is absolutly vital if you have a little one on a changing pad) and there was a deffinite odor eminating from the genieplus we found that the quick tie garbage bags where the ties are integrated into the bag work really well because there isn't any added bulk around the sealing edge of the champ.']
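
The two models disagree sharply on this review: about 0.99999994 for `sentiment_model` versus 0.797 for `selected_words_model`. The row displayed earlier has a zero count for every selected word, so the selected-words model's linear score reduces to its intercept, and the predicted probability is simply the sigmoid of the intercept from the coefficient table. The review does say "Great", but it is glued to the previous sentence ("purchase.Great"); with the whitespace tokenization apparent in this notebook, that becomes a token like `purchase.great` rather than `great`, and "loved" is not "love". A sketch that reproduces the number:

```python
import math


def sigmoid(score):
    """Logistic function: maps a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


intercept = 1.36728315229  # (intercept) row of selected_words_model's coefficients

# Every selected-word count is 0 for this review, so the score is the intercept alone.
probability = sigmoid(intercept)
print(round(probability, 6))  # 0.796941
```

This matches the 0.796940851290671 returned by `selected_words_model.predict`: with none of its eleven words present, the selected-words model can only fall back on its intercept, which encodes the general tilt toward positive reviews, while the full `word_count` model can use every other word in the review.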