# Analyzing Product Review Sentiment Using Logistic Regression

This notebook uses a classifier model to analyze the sentiment of product reviews and to understand the types of errors a classifier makes.

It also explores this application further: it trains a sentiment analysis model on a small set of key polarizing words, inspects the weight learned for each of these words, and compares the results of this simpler classifier with those of a classifier that uses all of the words.

In [1]:
import turicreate

In [2]:
products = turicreate.SFrame('amazon_baby.sframe')

In [3]:
products

| name | review | rating |
| --- | --- | --- |
| Planetwise Flannel Wipes | These flannel wipes are OK, but in my opinion ... | 3.0 |
| Planetwise Wipe Pouch | it came early and was not disappointed. i love ... | 5.0 |
| Annas Dream Full Quilt with 2 Shams ... | Very soft and comfortable and warmer than it ... | 5.0 |
| Stop Pacifier Sucking without tears with ... | This is a product well worth the purchase. I ... | 5.0 |
| Stop Pacifier Sucking without tears with ... | All of my kids have cried non-stop when I tried to ... | 5.0 |
| Stop Pacifier Sucking without tears with ... | When the Binky Fairy came to our house, we didn't ... | 5.0 |
| A Tale of Baby's Days with Peter Rabbit ... | Lovely book, it's bound tightly so you may no ... | 4.0 |
| Baby Tracker&reg; - Daily Childcare Journal, ... | Perfect for new parents. We were able to keep ... | 5.0 |
| Baby Tracker&reg; - Daily Childcare Journal, ... | A friend of mine pinned this product on Pinte ... | 5.0 |
| Baby Tracker&reg; - Daily Childcare Journal, ... | This has been an easy way for my nanny to record ... | 4.0 |


## Data Exploration

In [4]:
products.groupby('name',operations={'count':turicreate.aggregate.COUNT()}).sort('count',ascending=False)

| name | count |
| --- | --- |
| Vulli Sophie the Giraffe Teether ... | 785 |
| Simple Wishes Hands-Free Breastpump Bra, Pink, ... | 562 |
| Infant Optics DXR-5 2.4 GHz Digital Video Baby ... | 561 |
| Baby Einstein Take Along Tunes ... | 547 |
| Cloud b Twilight Constellation Night ... | 520 |
| Fisher-Price Booster Seat, Blue/Green/Gray ... | 489 |
| Fisher-Price Rainforest Jumperoo ... | 450 |
| Graco Nautilus 3-in-1 Car Seat, Matrix ... | 419 |
| Leachco Snoogle Total Body Pillow ... | 388 |
| Regalo Easy Step Walk Thru Gate, White ... | 374 |


In [10]:
product_count = products.groupby('name',operations ={'rating_mean': turicreate.aggregate.MEAN('rating')}).sort('rating_mean',ascending=True)
product_count.head()

| name | rating_mean |
| --- | --- |
| Tree By Kerri Lee Wooden Windup Music Box Love, ... | 1.0 |
| Bugaboo Maxi Cosi Mico Car Seat Adapter ... | 1.0 |
| Waterpals Swim Diapers Xx Large 24 to 30 Months ... | 1.0 |
| Graco Cuddle Cove Playard - Minnie Mouse ... | 1.0 |
| Pawhut Wood Small Freestanding Pet / Dog ... | 1.0 |
| Disney Planes Sky Track Challenge Track Set ... | 1.0 |
| Safety 1st Prism Video Camera Add-On ... | 1.0 |
| Soaker Stopper Diaper Extension - Prevent n ... | 1.0 |
| Baby Jogger Jump Seat Canopy ... | 1.0 |
| Momma Rocking Feeding Bottle, Green ... | 1.0 |


In [11]:
product_count.tail()

| name | rating_mean |
| --- | --- |
| DwellStudio Baby Transportation Hooded ... | 5.0 |
| 2pcs Mommy's Pal:Extra long High Quality ... | 5.0 |
| Kidsline Snug As A Bug 6 Piece Crib Bedding Set ... | 5.0 |
| Mommy's Helper Perfect Feeder ... | 5.0 |
| Carters Bib Set 3 Car OFF TO GRANDMAS SLOPPY eater ... | 5.0 |
| Mod Ladybug 4-Piece Baby Crib Bedding Set ... | 5.0 |
| Cloud B Slumber Scented Puppy - Vanilla ... | 5.0 |
| Oeuf Organic Mattress | 5.0 |
| Cotton Tale Designs 4 Piece Penny Lane Crib ... | 5.0 |
| Bedtime Originals Champ Snoopy Bumper ... | 5.0 |


In [15]:
print("number of distinct products = " + str(len(product_count)))

number of distinct products = 32419


## Building a sentiment classifier

### Build word count vector

### 1. All Features

In [6]:
# keep an untouched copy of the raw reviews for the selected-words model in section 2
feature_selected = products.copy()

In [7]:
products['word_count'] = turicreate.text_analytics.count_words(products['review'])
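`turicreate.text_analytics.count_words` turns each review into a dictionary that maps every word to the number of times it occurs (the `word_count` column below). The following pure-Python sketch illustrates the same bag-of-words idea. It assumes lowercasing and splitting on runs of non-alphanumeric characters, which only approximates Turi Create's own tokenization rules; `bag_of_words` is a hypothetical helper, not part of Turi Create.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on runs of non-alphanumeric characters,
    and count each remaining token (an approximation of count_words)."""
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    # re.split leaves empty strings when the text starts or ends with a delimiter
    return Counter(token for token in tokens if token)


counts = bag_of_words("Clever approach. A clever tool, and a great approach!")
print(counts["clever"], counts["approach"], counts["great"])  # 2 2 1
```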

#### Labeling each review

In [8]:
products = products[products['rating']!= 3]

In [9]:
products['sentiment'] = products['rating'] >= 4

In [10]:
products

| name | review | rating | word_count | sentiment |
| --- | --- | --- | --- | --- |
| Planetwise Wipe Pouch | it came early and was not disappointed. i love ... | 5.0 | {'recommend': 1.0, 'highly': 1.0, ... | 1 |
| Annas Dream Full Quilt with 2 Shams ... | Very soft and comfortable and warmer than it ... | 5.0 | {'quilt': 1.0, 'of': 1.0, 'the': 1.0, 'than': 1.0, ... | 1 |
| Stop Pacifier Sucking without tears with ... | This is a product well worth the purchase. I ... | 5.0 | {'tool': 1.0, 'clever': 1.0, 'approach': 2.0, ... | 1 |
| Stop Pacifier Sucking without tears with ... | All of my kids have cried non-stop when I tried to ... | 5.0 | {'rock': 1.0, 'many': 1.0, 'headaches': 1.0, ... | 1 |
| Stop Pacifier Sucking without tears with ... | When the Binky Fairy came to our house, we didn't ... | 5.0 | {'thumb': 1.0, 'or': 1.0, 'break': 1.0, 'trying': ... | 1 |
| A Tale of Baby's Days with Peter Rabbit ... | Lovely book, it's bound tightly so you may no ... | 4.0 | {'for': 1.0, 'barnes': 1.0, 'at': 1.0, 'is': ... | 1 |
| Baby Tracker&reg; - Daily Childcare Journal, ... | Perfect for new parents. We were able to keep ... | 5.0 | {'right': 1.0, 'because': 1.0, 'questions': 1.0, ... | 1 |
| Baby Tracker&reg; - Daily Childcare Journal, ... | A friend of mine pinned this product on Pinte ... | 5.0 | {'like': 1.0, 'and': 1.0, 'changes': 1.0, 'the': ... | 1 |
| Baby Tracker&reg; - Daily Childcare Journal, ... | This has been an easy way for my nanny to record ... | 4.0 | {'in': 1.0, 'pages': 1.0, 'out': 1.0, 'run': 1.0, ... | 1 |
| Baby Tracker&reg; - Daily Childcare Journal, ... | I love this journal and our nanny uses it ... | 4.0 | {'tracker': 1.0, 'now': 1.0, 'its': 1.0, 'sti ... | 1 |


#### Train our sentiment classifier

In [12]:
train_data1,test_data1 = products.random_split(.8,seed=0)
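`random_split(.8, seed=0)` sends roughly 80% of the rows to the training set and the rest to the test set, and fixing the seed makes the split reproducible. The sketch below shows that property with a simple per-row Bernoulli split; it is a hypothetical illustration (`random_split` here is a plain Python function, not the SFrame method), and Turi Create's internal sampling scheme may differ. Reproducibility matters in this notebook because section 2 splits the same filtered rows with the same seed, and both models' confusion matrices report the same class counts in their test sets (27,976 positive and 5,328 negative reviews).

```python
import random


def random_split(rows, fraction, seed):
    """Hypothetical sketch: send each row to the first part with probability
    `fraction`, using a seeded generator so the result is reproducible."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second


rows = list(range(1000))
train_a, test_a = random_split(rows, 0.8, seed=0)
train_b, test_b = random_split(rows, 0.8, seed=0)
# Same rows + same seed -> identical partition; sizes are only roughly 80/20
print(len(train_a), len(test_a), test_a == test_b)
```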

In [13]:
sentiment_model = turicreate.logistic_classifier.create(train_data1,target='sentiment', features=['word_count'], validation_set=test_data1)

In [14]:
sentiment_model.coefficients.sort('value',ascending=False)

| name | index | class | value | stderr |
| --- | --- | --- | --- | --- |
| word_count | arghhhhhh | 1 | 49.227190347578 | |
| word_count | joovys | 1 | 34.29424821375622 | |
| word_count | screencons | 1 | 29.121712460416557 | |
| word_count | punchers | 1 | 27.52923890802292 | |
| word_count | unpaired | 1 | 27.33385312957874 | |
| word_count | angrily | 1 | 27.33385312957874 | |
| word_count | roboticness | 1 | 27.33385312957874 | |
| word_count | pinkjeep | 1 | 26.72298073818621 | |
| word_count | primobaby | 1 | 25.13082257694997 | |
| word_count | marinate | 1 | 24.59247970497861 | |


In [15]:
products['predicted_sentiment'] = sentiment_model.predict(products, output_type = 'probability')

In [16]:
products

| name | review | rating | word_count | sentiment | predicted_sentiment |
| --- | --- | --- | --- | --- | --- |
| Planetwise Wipe Pouch | it came early and was not disappointed. i love ... | 5.0 | {'recommend': 1.0, 'highly': 1.0, ... | 1 | 0.9997307390047092 |
| Annas Dream Full Quilt with 2 Shams ... | Very soft and comfortable and warmer than it ... | 5.0 | {'quilt': 1.0, 'of': 1.0, 'the': 1.0, 'than': 1.0, ... | 1 | 0.998508336831661 |
| Stop Pacifier Sucking without tears with ... | This is a product well worth the purchase. I ... | 5.0 | {'tool': 1.0, 'clever': 1.0, 'approach': 2.0, ... | 1 | 0.999748904249988 |
| Stop Pacifier Sucking without tears with ... | All of my kids have cried non-stop when I tried to ... | 5.0 | {'rock': 1.0, 'many': 1.0, 'headaches': 1.0, ... | 1 | 0.9999916625399972 |
| Stop Pacifier Sucking without tears with ... | When the Binky Fairy came to our house, we didn't ... | 5.0 | {'thumb': 1.0, 'or': 1.0, 'break': 1.0, 'trying': ... | 1 | 0.9999999514462168 |
| A Tale of Baby's Days with Peter Rabbit ... | Lovely book, it's bound tightly so you may no ... | 4.0 | {'for': 1.0, 'barnes': 1.0, 'at': 1.0, 'is': ... | 1 | 0.9999146735569904 |
| Baby Tracker&reg; - Daily Childcare Journal, ... | Perfect for new parents. We were able to keep ... | 5.0 | {'right': 1.0, 'because': 1.0, 'questions': 1.0, ... | 1 | 0.9999916615904652 |
| Baby Tracker&reg; - Daily Childcare Journal, ... | A friend of mine pinned this product on Pinte ... | 5.0 | {'like': 1.0, 'and': 1.0, 'changes': 1.0, 'the': ... | 1 | 0.9999938843594008 |
| Baby Tracker&reg; - Daily Childcare Journal, ... | This has been an easy way for my nanny to record ... | 4.0 | {'in': 1.0, 'pages': 1.0, 'out': 1.0, 'run': 1.0, ... | 1 | 0.9961247617006416 |
| Baby Tracker&reg; - Daily Childcare Journal, ... | I love this journal and our nanny uses it ... | 4.0 | {'tracker': 1.0, 'now': 1.0, 'its': 1.0, 'sti ... | 1 | 0.9999999920460632 |
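The `predicted_sentiment` column holds the model's probability that a review is positive. Turning that probability into a class label needs a cutoff; the sketch below assumes the common 0.5 threshold and applies it to the ten probabilities shown (`to_class` is a hypothetical helper). All ten rows come out positive, matching their `sentiment` labels. Note that `products` still contains the training rows, so predictions on this full table are more optimistic than predictions on the test set alone.

```python
# Probabilities copied from the predicted_sentiment column above
probabilities = [0.9997307390047092, 0.998508336831661, 0.999748904249988,
                 0.9999916625399972, 0.9999999514462168, 0.9999146735569904,
                 0.9999916615904652, 0.9999938843594008, 0.9961247617006416,
                 0.9999999920460632]


def to_class(probability, threshold=0.5):
    """Assumed rule: predict positive (1) when P(positive) reaches the threshold."""
    return int(probability >= threshold)


predicted = [to_class(p) for p in probabilities]
print(predicted)  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```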


### 2. Selected Features

In [17]:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

In [18]:
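def word_count(selected_words, text):
    # Returns {word: 1 if word occurs in text else 0} for each selected word.
    # `in` is a case-sensitive substring test on the raw review, so this records
    # presence rather than a true count, and 'love' also matches 'lovely'.
    counts = dict()
    for word in selected_words:
        counts[word] = 1 if word in text else 0
    return counts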
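The `word_count` function above tests substring membership, so a review containing "lovely" is counted as containing "love", and a word repeated three times still counts as 1. A token-based alternative is sketched below; `selected_word_counts` is a hypothetical helper, and its lowercasing and regex tokenization are assumptions. Swapping it in would change the features, so the coefficients and metrics reported later in this notebook would no longer apply.

```python
import re
from collections import Counter


def selected_word_counts(selected_words, text):
    """Count whole-word, case-insensitive occurrences of each selected word."""
    tokens = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    # A Counter returns 0 for words that never occur
    return {word: tokens[word] for word in selected_words}


selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love',
                  'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']
review = "Lovely book. Great pictures, great binding. I love it!"
counts = selected_word_counts(selected_words, review)
print(counts["love"], counts["great"], counts["bad"])  # 1 2 0
```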

In [19]:
feature_selected['word_count'] = feature_selected['review'].apply(lambda x:word_count(selected_words,x))

In [20]:
feature_selected

| name | review | rating | word_count |
| --- | --- | --- | --- |
| Planetwise Flannel Wipes | These flannel wipes are OK, but in my opinion ... | 3.0 | {'awesome': 0, 'great': 0, 'fantastic': 0, ... |
| Planetwise Wipe Pouch | it came early and was not disappointed. i love ... | 5.0 | {'awesome': 0, 'great': 0, 'fantastic': 0, ... |
| Annas Dream Full Quilt with 2 Shams ... | Very soft and comfortable and warmer than it ... | 5.0 | {'awesome': 0, 'great': 0, 'fantastic': 0, ... |
| Stop Pacifier Sucking without tears with ... | This is a product well worth the purchase. I ... | 5.0 | {'awesome': 0, 'great': 0, 'fantastic': 0, ... |
| Stop Pacifier Sucking without tears with ... | All of my kids have cried non-stop when I tried to ... | 5.0 | {'awesome': 0, 'great': 1, 'fantastic': 0, ... |
| Stop Pacifier Sucking without tears with ... | When the Binky Fairy came to our house, we didn't ... | 5.0 | {'awesome': 0, 'great': 1, 'fantastic': 0, ... |
| A Tale of Baby's Days with Peter Rabbit ... | Lovely book, it's bound tightly so you may no ... | 4.0 | {'awesome': 0, 'great': 0, 'fantastic': 0, ... |
| Baby Tracker&reg; - Daily Childcare Journal, ... | Perfect for new parents. We were able to keep ... | 5.0 | {'awesome': 0, 'great': 0, 'fantastic': 0, ... |
| Baby Tracker&reg; - Daily Childcare Journal, ... | A friend of mine pinned this product on Pinte ... | 5.0 | {'awesome': 0, 'great': 0, 'fantastic': 1, ... |
| Baby Tracker&reg; - Daily Childcare Journal, ... | This has been an easy way for my nanny to record ... | 4.0 | {'awesome': 0, 'great': 0, 'fantastic': 0, ... |


#### Labeling each review

1. Ignore all 3-star reviews, because a 3-star rating is neutral.

In [22]:
feature_selected = feature_selected[feature_selected['rating']!=3]

2. Treat 4-star and 5-star reviews as positive sentiment; the remaining 1-star and 2-star reviews are negative.

In [23]:
feature_selected['sentiment'] = feature_selected['rating'] >= 4
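Steps 1 and 2 amount to a small mapping from star rating to label. It is sketched below in plain Python for illustration (`label_review` is a hypothetical name): 3-star reviews are dropped, 4-star and 5-star reviews become 1, and 1-star and 2-star reviews become 0, matching the 0/1 `target_label` values in the evaluation output.

```python
def label_review(rating):
    """Map a star rating to a sentiment label.

    Returns None for neutral 3-star reviews (dropped), 1 for 4-5 stars
    (positive) and 0 for 1-2 stars (negative).
    """
    if rating == 3:
        return None
    return int(rating >= 4)


ratings = [3.0, 5.0, 4.0, 1.0, 2.0]
labels = [label_review(r) for r in ratings]
# Keep only the reviews that received a label
kept = [(r, y) for r, y in zip(ratings, labels) if y is not None]
print(kept)  # [(5.0, 1), (4.0, 1), (1.0, 0), (2.0, 0)]
```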

In [24]:
feature_selected

| name | review | rating | word_count | sentiment |
| --- | --- | --- | --- | --- |
| Planetwise Wipe Pouch | it came early and was not disappointed. i love ... | 5.0 | {'awesome': 0, 'great': 0, 'fantastic': 0, ... | 1 |
| Annas Dream Full Quilt with 2 Shams ... | Very soft and comfortable and warmer than it ... | 5.0 | {'awesome': 0, 'great': 0, 'fantastic': 0, ... | 1 |
| Stop Pacifier Sucking without tears with ... | This is a product well worth the purchase. I ... | 5.0 | {'awesome': 0, 'great': 0, 'fantastic': 0, ... | 1 |
| Stop Pacifier Sucking without tears with ... | All of my kids have cried non-stop when I tried to ... | 5.0 | {'awesome': 0, 'great': 1, 'fantastic': 0, ... | 1 |
| Stop Pacifier Sucking without tears with ... | When the Binky Fairy came to our house, we didn't ... | 5.0 | {'awesome': 0, 'great': 1, 'fantastic': 0, ... | 1 |
| A Tale of Baby's Days with Peter Rabbit ... | Lovely book, it's bound tightly so you may no ... | 4.0 | {'awesome': 0, 'great': 0, 'fantastic': 0, ... | 1 |
| Baby Tracker&reg; - Daily Childcare Journal, ... | Perfect for new parents. We were able to keep ... | 5.0 | {'awesome': 0, 'great': 0, 'fantastic': 0, ... | 1 |
| Baby Tracker&reg; - Daily Childcare Journal, ... | A friend of mine pinned this product on Pinte ... | 5.0 | {'awesome': 0, 'great': 0, 'fantastic': 1, ... | 1 |
| Baby Tracker&reg; - Daily Childcare Journal, ... | This has been an easy way for my nanny to record ... | 4.0 | {'awesome': 0, 'great': 0, 'fantastic': 0, ... | 1 |
| Baby Tracker&reg; - Daily Childcare Journal, ... | I love this journal and our nanny uses it ... | 4.0 | {'awesome': 0, 'great': 0, 'fantastic': 0, ... | 1 |


#### Train our sentiment classifier

In [26]:
train_data,test_data = feature_selected.random_split(.8,seed=0)

In [27]:
selected_words_model = turicreate.logistic_classifier.create(train_data,target='sentiment',features=['word_count'],validation_set=test_data)

In [28]:
selected_words_model.coefficients.sort('value',ascending=False)

| name | index | class | value | stderr |
| --- | --- | --- | --- | --- |
| word_count | love | 1 | 1.494022180306391 | 0.0243886979112559 |
| (intercept) | | 1 | 1.2862558461524298 | 0.0091556266151738 |
| word_count | amazing | 1 | 1.1170805713858951 | 0.1031511075539952 |
| word_count | awesome | 1 | 1.0849401779580845 | 0.0898983862510547 |
| word_count | great | 1 | 0.905098468121736 | 0.0226270430054438 |
| word_count | fantastic | 1 | 0.8699319676015065 | 0.1185067610957679 |
| word_count | wow | 1 | -0.7038752364974794 | 0.2623161759918005 |
| word_count | hate | 1 | -0.7421436968561138 | 0.0504496131950641 |
| word_count | bad | 1 | -1.117435606631241 | 0.0406778390172422 |
| word_count | awful | 1 | -1.9674772784818129 | 0.1027347870563895 |


In [43]:
selected_words_model.coefficients.sort('value',ascending=True)

| name | index | class | value | stderr |
| --- | --- | --- | --- | --- |
| word_count | horrible | 1 | -2.23398102657102 | 0.084946679078848 |
| word_count | terrible | 1 | -2.1911443316524357 | 0.0820731794183429 |
| word_count | awful | 1 | -1.9674772784818129 | 0.1027347870563895 |
| word_count | bad | 1 | -1.117435606631241 | 0.0406778390172422 |
| word_count | hate | 1 | -0.7421436968561138 | 0.0504496131950641 |
| word_count | wow | 1 | -0.7038752364974794 | 0.2623161759918005 |
| word_count | fantastic | 1 | 0.8699319676015065 | 0.1185067610957679 |
| word_count | great | 1 | 0.905098468121736 | 0.0226270430054438 |
| word_count | awesome | 1 | 1.0849401779580845 | 0.0898983862510547 |
| word_count | amazing | 1 | 1.1170805713858951 | 0.1031511075539952 |
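These coefficients define a logistic model: the predicted probability of a positive review is the sigmoid of the intercept plus the sum of each selected word's weight times its 0/1 feature value. The sketch below plugs in the values from the two coefficient tables above; it is a hypothetical re-implementation for illustration that assumes the standard logistic link, and `positive_probability` is not a Turi Create function. A review containing none of the eleven words gets a probability of about 0.78 from the intercept alone, which is above 0.5, so such reviews are classified as positive.

```python
import math

# Coefficients copied from the selected_words_model tables above
intercept = 1.2862558461524298
weights = {
    'love': 1.494022180306391, 'amazing': 1.1170805713858951,
    'awesome': 1.0849401779580845, 'great': 0.905098468121736,
    'fantastic': 0.8699319676015065, 'wow': -0.7038752364974794,
    'hate': -0.7421436968561138, 'bad': -1.117435606631241,
    'awful': -1.9674772784818129, 'terrible': -2.1911443316524357,
    'horrible': -2.23398102657102,
}


def positive_probability(features):
    """P(positive) = 1 / (1 + exp(-(intercept + sum of weight * feature value)))."""
    score = intercept + sum(weights[w] * x for w, x in features.items())
    return 1.0 / (1.0 + math.exp(-score))


print(round(positive_probability({}), 3))               # 0.784
print(round(positive_probability({'love': 1}), 3))      # 0.942
print(round(positive_probability({'horrible': 1}), 3))  # 0.279
```

This is consistent with the evaluation below, where the selected-words model labels almost every test review as positive.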


### Evaluating the models

In [29]:
selected_words_model.evaluate(test_data)

{'accuracy': 0.844072784049964,
 'auc': 0.6879909023264043,
 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      1       |        0        |   94  |
 |      0       |        0        |  229  |
 |      0       |        1        |  5099 |
 |      1       |        1        | 27882 |
 +--------------+-----------------+-------+
 [4 rows x 3 columns],
 'f1_score': 0.9148087996456519,
 'log_loss': 0.40091362871150377,
 'precision': 0.8453958339650102,
 'recall': 0.9966399771232485,
 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 1001
 
 Data:
 +-----------+-----+-----+-------+------+
 | threshold | fpr | tpr |   p   |  n   |
 +-----------+-----+-----+-------+------+
 |    0.0    | 1.0 | 1.0 | 27976 | 5328 |
 |   0.001   | 1.0 | 1.0 | 27976 | 5328 |
 |   0.002   | 1
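Every metric above except AUC and log loss follows from the four confusion-matrix counts. The sketch below recomputes them and adds specificity (the share of negative reviews predicted negative), which is not part of the output above: only 229 of 5,328 negative test reviews, about 4%, are caught, so the high recall mostly reflects predicting "positive" almost everywhere.

```python
# Confusion-matrix counts for selected_words_model, copied from the output above
tp, fn = 27882, 94   # target 1 predicted 1 / target 1 predicted 0
fp, tn = 5099, 229   # target 0 predicted 1 / target 0 predicted 0

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)  # share of negative reviews classified as negative

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} "
      f"f1={f1:.4f} specificity={specificity:.4f}")
# accuracy=0.8441 precision=0.8454 recall=0.9966 f1=0.9148 specificity=0.0430
```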

In [30]:
sentiment_model.evaluate(test_data1)

{'accuracy': 0.9176975738650012,
 'auc': 0.9258242975424673,
 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      0       |        1        |  1397 |
 |      1       |        0        |  1344 |
 |      0       |        0        |  3931 |
 |      1       |        1        | 26632 |
 +--------------+-----------------+-------+
 [4 rows x 3 columns],
 'f1_score': 0.951057941255245,
 'log_loss': 0.3304787187232084,
 'precision': 0.9501587641371436,
 'recall': 0.9519588218472976,
 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 1001
 
 Data:
 +-----------+--------------------+--------------------+-------+------+
 | threshold |        fpr         |        tpr         |   p   |  n   |
 +-----------+--------------------+--------------------+-------+------+
 |    0.0  
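Accuracy is easier to judge against a majority-class baseline that always predicts positive. Both test sets contain 27,976 positive and 5,328 negative reviews (the `target_label` totals in either confusion matrix), so this baseline alone scores about 0.840. The sketch below compares both models' reported accuracies with it: the selected-words model beats the baseline by only about 0.004, while the all-words model gains about 0.078.

```python
# Class counts in the test set, taken from the target_label totals of the confusion matrices
positives, negatives = 27976, 5328
total = positives + negatives

# Always predicting "positive" is correct on every positive review and no others
majority_accuracy = positives / total

# Accuracies reported by evaluate() above
reported_accuracy = {
    'selected_words_model': 0.844072784049964,
    'sentiment_model': 0.9176975738650012,
}
for name, acc in reported_accuracy.items():
    print(f"{name}: {acc:.4f} vs. majority baseline {majority_accuracy:.4f} "
          f"(+{acc - majority_accuracy:.4f})")
```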

<h1 align='center'>End</h1>