# Predicting sentiment from product reviews

## Fire up GraphLab Create

In [3]:
import graphlab

# Read some product review data

Loading reviews for a set of baby products. 

In [4]:
products = graphlab.SFrame('amazon_baby.gl/')

[INFO] GraphLab Server Version: 1.6.1


# Let's explore this data together

Data includes the product name, the review text and the rating of the review. 
## Ratings range from 1 to 5 stars

### My test: the product data has 3 columns: name, review, rating


In [5]:
products.head()

name,review,rating
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0


# Build the word count vector for each review

In [6]:
# Create a new column 'word_count' holding a {word: count} dictionary for each review
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
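The `word_count` column maps each word of a review to the number of times it appears. Judging from the output of In [7] below (tokens such as `'parents!!'` and `'journal.'`), words are split on whitespace with punctuation left attached. A minimal pure-Python sketch of that behaviour, assuming whitespace tokenization and lowercasing; `count_words_sketch` is a hypothetical name, not part of GraphLab:

```python
from collections import Counter

def count_words_sketch(text, to_lower=True):
    """Hypothetical stand-in for count_words on a single review string.

    Assumes tokens are separated by whitespace and keep their punctuation.
    """
    if to_lower:
        text = text.lower()
    # Counter tallies each whitespace-separated token
    return dict(Counter(text.split()))

print(count_words_sketch("I love it and I love it!!"))
```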

In [7]:
products.head()

name,review,rating,word_count
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0,"{'and': 5, '6': 1, 'stink': 1, 'because' ..."
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'and': 3, 'love': 1, 'it': 2, 'highly': 1, ..."
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ..."
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,"{'ingenious': 1, 'and': 3, 'love': 2, ..."
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,"{'and': 2, 'parents!!': 1, 'all': 2, 'puppet.': ..."
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0,"{'and': 2, 'this': 2, 'her': 1, 'help': 2, ..."
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0,"{'shop': 1, 'noble': 1, 'is': 1, 'it': 1, 'as': ..."
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0,"{'and': 2, 'all': 1, 'right': 1, 'when': 1, ..."
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0,"{'and': 1, 'help': 1, 'give': 1, 'is': 1, ' ..."
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0,"{'journal.': 1, 'nanny': 1, 'standarad': 1, ..."


In [8]:
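# My test: column_types is a method, so printing it without () shows the bound method
# together with the SFrame summary below; call products.column_types() for just the types
print products.column_types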

<bound method SFrame.column_types of Columns:
	name	str
	review	str
	rating	float
	word_count	dict

Rows: 183531

Data:
+-------------------------------+-------------------------------+--------+
|              name             |             review            | rating |
+-------------------------------+-------------------------------+--------+
|    Planetwise Flannel Wipes   | These flannel wipes are OK... |  3.0   |
|     Planetwise Wipe Pouch     | it came early and was not ... |  5.0   |
| Annas Dream Full Quilt wit... | Very soft and comfortable ... |  5.0   |
| Stop Pacifier Sucking with... | This is a product well wor... |  5.0   |
| Stop Pacifier Sucking with... | All of my kids have cried ... |  5.0   |
| Stop Pacifier Sucking with... | When the Binky Fairy came ... |  5.0   |
| A Tale of Baby's Days with... | Lovely book, it's bound ti... |  4.0   |
| Baby Tracker&reg; - Daily ... | Perfect for new parents. W... |  5.0   |
| Baby Tracker&reg; - Daily ... | A friend of mine pinn

In [9]:
graphlab.canvas.set_target('ipynb')

In [10]:
products['name'].show()

# Examine the reviews for the most-sold product
# Explore 'Vulli Sophie the Giraffe Teether'

In [11]:
giraffe_reviews = products[products['name'] == 'Vulli Sophie the Giraffe Teether']

In [12]:
#len(giraffe_reviews) #785

In [13]:
giraffe_reviews['rating'].show(view='Categorical')

# Build a sentiment classifier

In [14]:
products['rating'].show(view='Categorical')

## STEP 1.- Get the proper data set
### a) Engineering decision: define what counts as positive and negative sentiment
#### Which ratings are negative and which are positive: $f: \{1, 2, 4, 5\} \to \{0, 1\}$

We will ignore all reviews with rating = 3, since they tend to have a neutral sentiment. Reviews with a rating of 4 or higher will be considered positive, while those with a rating of 2 or lower will be considered negative:
- positive if rating = 4 or 5
- negative if rating = 1 or 2
- discarded if rating = 3 (the decision boundary)
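The labeling rule can be written as a small function. This is a plain-Python illustration with a hypothetical helper name; the notebook itself applies the same rule with SFrame operations in In [18] and In [19]:

```python
def rating_to_sentiment(rating):
    """Hypothetical helper: 1 for 4-5 stars, 0 for 1-2 stars, None for 3 stars."""
    if rating >= 4:
        return 1
    if rating <= 2:
        return 0
    return None  # rating == 3: neutral, dropped from the data set

ratings = [5.0, 3.0, 1.0, 4.0, 2.0]
# Keep only ratings that receive a label, paired with that label
labeled = [(r, rating_to_sentiment(r)) for r in ratings if rating_to_sentiment(r) is not None]
print(labeled)
```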

In [15]:
# sentiment_products = products[products['rating'] != 3]  # unused: the same filter is applied to products directly below

In [16]:
#len(sentiment_products) #166752

In [17]:
# sentiment_products.head()

### b) Set up the data set for the sentiment analysis: X = review word counts; y = sentiment ∈ {0, 1}

In [18]:
#ignore all 3* reviews
products = products[products['rating'] != 3]

In [19]:
#positive sentiment = 4* or 5* reviews
products['sentiment'] = products['rating'] >=4

In [20]:
#check dataset
#ds_positive_sentiments = len(products[products['sentiment']==1])
#ds_negative_sentiments = len(products[products['sentiment']==0])
#print "#positive outputs: ", ds_positive_sentiments #positive outputs:  140259
#print "#negative outputs: ", ds_negative_sentiments #negative outputs:  26493
products.head()

name,review,rating,word_count,sentiment
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'and': 3, 'love': 1, 'it': 2, 'highly': 1, ...",1
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ...",1
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,"{'ingenious': 1, 'and': 3, 'love': 2, ...",1
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,"{'and': 2, 'parents!!': 1, 'all': 2, 'puppet.': ...",1
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0,"{'and': 2, 'this': 2, 'her': 1, 'help': 2, ...",1
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0,"{'shop': 1, 'noble': 1, 'is': 1, 'it': 1, 'as': ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0,"{'and': 2, 'all': 1, 'right': 1, 'when': 1, ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0,"{'and': 1, 'help': 1, 'give': 1, 'is': 1, ' ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0,"{'journal.': 1, 'nanny': 1, 'standarad': 1, ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",I love this journal and our nanny uses it ...,4.0,"{'all': 1, 'forget': 1, 'just': 1, 'food': 1, ...",1


## STEP 2.- Train the model
### Training the sentiment classifier

### a) Split the data into training and test sets

In [21]:
train_data, test_data = products.random_split(.8,seed=0)
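The sizes noted below (133448 training and 33304 test rows out of 166752) give a training share of about 80.03%, close to but not exactly 0.8, which is what you would expect if each row is assigned independently at random. A minimal sketch of such a seeded split, under that assumption; `random_split_sketch` is hypothetical and will not reproduce GraphLab's actual assignment:

```python
import random

def random_split_sketch(rows, fraction, seed):
    """Hypothetical helper: put each row in the first part with probability `fraction`."""
    rng = random.Random(seed)  # seeded, so the same seed gives the same split
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

train, test = random_split_sketch(list(range(1000)), 0.8, seed=0)
print("train: %d, test: %d" % (len(train), len(test)))
```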

In [22]:
#print len(train_data) # 133448
#print len(test_data)  # 33304

### b) Train the model
#### b.1) Select the features ($x_i$)
#### b.2) Select the target ($y$)

In [25]:
sentiment_features = ['word_count']
sentiment_target = 'sentiment'
# validation_set=test_data only reports validation accuracy in the progress table below;
# pass validation_set=None to skip it
sentiment_model = graphlab.logistic_classifier.create(train_data,
                                                      target=sentiment_target,
                                                      features=sentiment_features,
                                                      validation_set=test_data)

PROGRESS: Logistic regression:
PROGRESS: --------------------------------------------------------
PROGRESS: Number of examples          : 133448
PROGRESS: Number of classes           : 2
PROGRESS: Number of feature columns   : 1
PROGRESS: Number of unpacked features : 219217
PROGRESS: Number of coefficients    : 219218
PROGRESS: Starting L-BFGS
PROGRESS: --------------------------------------------------------
PROGRESS: +-----------+----------+-----------+--------------+-------------------+---------------------+
PROGRESS: | Iteration | Passes   | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
PROGRESS: +-----------+----------+-----------+--------------+-------------------+---------------------+
PROGRESS: | 1         | 5        | 0.000002  | 2.685423     | 0.841481          | 0.839989            |
PROGRESS: | 2         | 9        | 3.000000  | 5.144905     | 0.947425          | 0.894877            |
PROGRESS: | 3         | 10       | 3.000000  | 6.098983     | 0.92
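The log reports one unpacked feature per distinct word (219217) plus the intercept, giving 219218 coefficients. For one review, a logistic model multiplies each word's count by its weight, adds the intercept, and passes the score through the sigmoid to get P(sentiment = 1). A minimal sketch of that computation over sparse word counts; the weights are made-up toy values, not the learned ones, and `predict_probability` is a hypothetical helper:

```python
import math

def predict_probability(word_counts, coefficients, intercept):
    """Return P(sentiment = 1) for one review under a logistic model.

    word_counts: {word: count} for the review, like one row of word_count.
    coefficients: {word: weight}; words without a weight contribute 0.
    """
    score = intercept
    for word, count in word_counts.items():
        score += coefficients.get(word, 0.0) * count
    return 1.0 / (1.0 + math.exp(-score))  # sigmoid of the linear score

# Toy, hypothetical weights for illustration only
toy_weights = {'love': 1.5, 'great': 1.0, 'terrible': -2.0}
print(predict_probability({'love': 1, 'it': 2}, toy_weights, intercept=0.5))
print(predict_probability({'terrible': 2}, toy_weights, intercept=0.5))
```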

In [26]:
# My test: last rows of the coefficient table
sentiment_model.get('coefficients').tail()

name,index,class,value
word_count,bedside.update,1,2.50312371252
word_count,breathing-,1,1.06676294546
word_count,ugggghhhhh.,1,-1.77504583915
word_count,beep!!!!!,1,-1.77504583915
word_count,"(channels,",1,-1.77504583915
word_count,reiterate:,1,-1.77504583915
word_count,inappropriately,1,-1.77504583915
word_count,work!!!!!!!,1,-1.47869790633
word_count,misusing,1,-1.47869790633
word_count,clicking-beeping,1,-2.70440539012


## STEP 3.- Evaluate the classifier on the test set

In [27]:
sentiment_model_eval_result = sentiment_model.evaluate(test_data)  # evaluate on the held-out test set
print sentiment_model_eval_result

{'confusion_matrix': Columns:
	target_label	int
	predicted_label	int
	count	int

Rows: 4

Data:
+--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|      0       |        1        |  1328 |
|      0       |        0        |  4000 |
|      1       |        1        | 26515 |
|      1       |        0        |  1461 |
+--------------+-----------------+-------+
[4 rows x 3 columns]
, 'accuracy': 0.916256305548883}
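The reported accuracy is the diagonal of the confusion matrix over the total: (4000 + 26515) / 33304. A small sketch that recomputes it from the printed counts, together with precision and recall for the positive class; `binary_metrics` is a hypothetical helper, not a GraphLab function:

```python
def binary_metrics(counts):
    """Accuracy, precision and recall from {(target_label, predicted_label): count}."""
    tp = counts.get((1, 1), 0)
    tn = counts.get((0, 0), 0)
    fp = counts.get((0, 1), 0)
    fn = counts.get((1, 0), 0)
    total = tp + tn + fp + fn
    return {
        'accuracy': (tp + tn) / float(total),
        'precision': tp / float(tp + fp),  # of predicted positives, fraction truly positive
        'recall': tp / float(tp + fn),     # of true positives, fraction predicted positive
    }

# Counts copied from the sentiment_model confusion matrix above
sentiment_counts = {(0, 1): 1328, (0, 0): 4000, (1, 1): 26515, (1, 0): 1461}
print(binary_metrics(sentiment_counts))
```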


In [28]:
sentiment_model.show()

In [29]:
sentiment_model.show(view='Evaluation')

### Evaluate with the 'roc_curve' metric

In [30]:
sentiment_model.evaluate(test_data, metric='roc_curve')  # TPR and FPR for thresholds from 0 to 1

{'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 1001
 
 Data:
 +------------------+----------------+------------------+-------+------+
 |    threshold     |      fpr       |       tpr        |   p   |  n   |
 +------------------+----------------+------------------+-------+------+
 |       0.0        | 0.215601503759 | 0.00649003316336 | 28043 | 5320 |
 | 0.0010000000475  | 0.784398496241 |  0.993509966837  | 28043 | 5320 |
 | 0.00200000009499 | 0.745864661654 |  0.992083585922  | 28043 | 5320 |
 | 0.00300000002608 | 0.724812030075 |  0.991085119281  | 28043 | 5320 |
 | 0.00400000018999 | 0.709962406015 |  0.990407588346  | 28043 | 5320 |
 | 0.00499999988824 | 0.699436090226 |  0.989801376458  | 28043 | 5320 |
 | 0.00600000005215 | 0.69022556391  |  0.989195164569  | 28043 | 5320 |
 | 0.00700000021607 | 0.68007518797  |  0.988731590771  | 28043 | 5320 |
 | 0.00800000037998 | 0.670112781955 |  0.98833933602   | 28043 | 5320 |
 | 0.00899999961257 
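Conceptually, each row of an ROC table is one cut-off: reviews whose predicted probability reaches the threshold are labelled positive, TPR is the fraction of the `p` positive test reviews caught, and FPR is the fraction of the `n` negative ones wrongly flagged. A sketch of the textbook definition for one point, on toy data; treating the comparison as `>=` is an assumption, and this does not attempt to reproduce GraphLab's exact table:

```python
def roc_point(scores, labels, threshold):
    """Return (fpr, tpr) when every score >= threshold is predicted positive."""
    p = sum(1 for y in labels if y == 1)  # number of positive examples
    n = len(labels) - p                   # number of negative examples
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return fp / float(n), tp / float(p)

scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]  # toy predicted probabilities
labels = [1, 1, 0, 1, 0, 0]                    # toy true labels
for t in (0.0, 0.5, 0.9):
    print("threshold %.1f -> fpr, tpr = %r" % (t, roc_point(scores, labels, t)))
```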

In [31]:
sentiment_model.show()

In [32]:
sentiment_model.show(view='Evaluation') # ROC curve (TPR vs FPR) plus accuracy, precision, ... results

## STEP 4.- Predict: apply the classifier to new reviews

### Example.- Applying the learned model to understand sentiment for the giraffe

In [33]:
# My test
giraffe_reviews.head()

name,review,rating,word_count
Vulli Sophie the Giraffe Teether ...,He likes chewing on all the parts especially the ...,5.0,"{'and': 1, 'all': 1, 'because': 1, 'it': 1, ..."
Vulli Sophie the Giraffe Teether ...,My son loves this toy and fits great in the diaper ...,5.0,"{'and': 1, 'right': 1, 'help': 1, 'just': 1, ..."
Vulli Sophie the Giraffe Teether ...,There really should be a large warning on the ...,1.0,"{'and': 2, 'all': 1, 'would': 1, 'latex.': 1, ..."
Vulli Sophie the Giraffe Teether ...,All the moms in my moms' group got Sophie for ...,5.0,"{'and': 2, 'one!': 1, 'all': 1, 'love': 1, ..."
Vulli Sophie the Giraffe Teether ...,I was a little skeptical on whether Sophie was ...,5.0,"{'and': 3, 'all': 1, 'months': 1, 'old': 1, ..."
Vulli Sophie the Giraffe Teether ...,I have been reading about Sophie and was going ...,5.0,"{'and': 6, 'seven': 1, 'already': 1, 'love': 1, ..."
Vulli Sophie the Giraffe Teether ...,My neice loves her sophie and has spent hours ...,5.0,"{'and': 4, 'drooling,': 1, 'love': 1, ..."
Vulli Sophie the Giraffe Teether ...,What a friendly face! And those mesmerizing ...,5.0,"{'and': 3, 'chew': 1, 'be': 1, 'is': 1, ..."
Vulli Sophie the Giraffe Teether ...,We got this just for my son to chew on instea ...,5.0,"{'chew': 2, 'seemed': 1, 'because': 1, 'about.': ..."
Vulli Sophie the Giraffe Teether ...,"My baby seems to like this toy, but I could ...",3.0,"{'and': 2, 'already': 1, 'some': 1, 'it': 3, ..."


In [34]:
# My test
sentiment_model.predict(giraffe_reviews)

dtype: int
Rows: 785
[1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, ... ]

In [35]:
giraffe_reviews['predicted_sentiment'] = sentiment_model.predict(giraffe_reviews, output_type='probability')

In [36]:
giraffe_reviews.head()

name,review,rating,word_count,predicted_sentiment
Vulli Sophie the Giraffe Teether ...,He likes chewing on all the parts especially the ...,5.0,"{'and': 1, 'all': 1, 'because': 1, 'it': 1, ...",0.999513023521
Vulli Sophie the Giraffe Teether ...,My son loves this toy and fits great in the diaper ...,5.0,"{'and': 1, 'right': 1, 'help': 1, 'just': 1, ...",0.999320678306
Vulli Sophie the Giraffe Teether ...,There really should be a large warning on the ...,1.0,"{'and': 2, 'all': 1, 'would': 1, 'latex.': 1, ...",0.013558811687
Vulli Sophie the Giraffe Teether ...,All the moms in my moms' group got Sophie for ...,5.0,"{'and': 2, 'one!': 1, 'all': 1, 'love': 1, ...",0.995769474148
Vulli Sophie the Giraffe Teether ...,I was a little skeptical on whether Sophie was ...,5.0,"{'and': 3, 'all': 1, 'months': 1, 'old': 1, ...",0.662374415673
Vulli Sophie the Giraffe Teether ...,I have been reading about Sophie and was going ...,5.0,"{'and': 6, 'seven': 1, 'already': 1, 'love': 1, ...",0.999997148186
Vulli Sophie the Giraffe Teether ...,My neice loves her sophie and has spent hours ...,5.0,"{'and': 4, 'drooling,': 1, 'love': 1, ...",0.989190989536
Vulli Sophie the Giraffe Teether ...,What a friendly face! And those mesmerizing ...,5.0,"{'and': 3, 'chew': 1, 'be': 1, 'is': 1, ...",0.999563518413
Vulli Sophie the Giraffe Teether ...,We got this just for my son to chew on instea ...,5.0,"{'chew': 2, 'seemed': 1, 'because': 1, 'about.': ...",0.970160542725
Vulli Sophie the Giraffe Teether ...,"My baby seems to like this toy, but I could ...",3.0,"{'and': 2, 'already': 1, 'some': 1, 'it': 3, ...",0.195367644588
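The class labels returned by `predict()` in In [34] line up with a 0.5 cut-off on these probabilities: for the first ten reviews, thresholding at 0.5 reproduces the first ten labels. A quick check, assuming that 0.5 rule (these rows are consistent with it, but the notebook does not state it). Note also that `giraffe_reviews` was extracted before the 3-star reviews were dropped, so it still contains neutral reviews such as the tenth one.

```python
# First ten predicted probabilities from In [36] (giraffe_reviews.head())
probabilities = [0.999513023521, 0.999320678306, 0.013558811687, 0.995769474148,
                 0.662374415673, 0.999997148186, 0.989190989536, 0.999563518413,
                 0.970160542725, 0.195367644588]

# Assumed decision rule: predict the positive class when P(sentiment = 1) >= 0.5
labels = [1 if p >= 0.5 else 0 for p in probabilities]
print(labels)
```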


## Sort the reviews based on the predicted sentiment and explore

In [37]:
giraffe_reviews = giraffe_reviews.sort('predicted_sentiment', ascending=False)

In [38]:
giraffe_reviews.head()

name,review,rating,word_count,predicted_sentiment
Vulli Sophie the Giraffe Teether ...,"Sophie, oh Sophie, your time has come. My ...",5.0,"{'giggles': 1, 'all': 1, ""violet's"": 2, 'bring': ...",1.0
Vulli Sophie the Giraffe Teether ...,I'm not sure why Sophie is such a hit with the ...,4.0,"{'adoring': 1, 'find': 1, 'month': 1, 'bright': 1, ...",0.999999999703
Vulli Sophie the Giraffe Teether ...,I'll be honest...I bought this toy because all the ...,4.0,"{'all': 2, 'discovered': 1, 'existence.': 1, ...",0.999999999392
Vulli Sophie the Giraffe Teether ...,We got this little giraffe as a gift from a ...,5.0,"{'all': 2, ""don't"": 1, '(literally).so': 1, ...",0.99999999919
Vulli Sophie the Giraffe Teether ...,As a mother of 16month old twins; I bought ...,5.0,"{'cute': 1, 'all': 1, 'reviews.': 2, 'just' ...",0.999999998657
Vulli Sophie the Giraffe Teether ...,Sophie the Giraffe is the perfect teething toy. ...,5.0,"{'just': 2, 'both': 1, 'month': 1, 'ears,': 1, ...",0.999999997108
Vulli Sophie the Giraffe Teether ...,Sophie la giraffe is absolutely the best toy ...,5.0,"{'and': 5, 'the': 1, 'all': 1, 'that': 2, ...",0.999999995589
Vulli Sophie the Giraffe Teether ...,My 5-mos old son took to this immediately. The ...,5.0,"{'just': 1, 'shape': 2, 'mutt': 1, '""dog': 1, ...",0.999999995573
Vulli Sophie the Giraffe Teether ...,My nephews and my four kids all had Sophie in ...,5.0,"{'and': 4, 'chew': 1, 'all': 1, 'perfect;': 1, ...",0.999999989527
Vulli Sophie the Giraffe Teether ...,Never thought I'd see my son French kissing a ...,5.0,"{'giggles': 1, 'all': 1, 'out,': 1, 'over': 1, ...",0.999999985069


## Most positive reviews for the giraffe

In [39]:
giraffe_reviews[0]['review']

"Sophie, oh Sophie, your time has come. My granddaughter, Violet is 5 months old and starting to teeth. What joy little Sophie brings to Violet. Sophie is made of a very pliable rubber that is sturdy but not tough. It is quite easy for Violet to twist Sophie into unheard of positions to get Sophie into her mouth. The little nose and hooves fit perfectly into small mouths, and the drooling has purpose. The paint on Sophie is food quality.Sophie was born in 1961 in France. The maker had wondered why there was nothing available for babies and made Sophie from the finest rubber, phthalate-free on St Sophie's Day, thus the name was born. Since that time millions of Sophie's populate the world. She is soft and for babies little hands easy to grasp. Violet especially loves the bumpy head and horns of Sophie. Sophie has a long neck that easy to grasp and twist. She has lovely, sizable spots that attract Violet's attention. Sophie has happy little squeaks that bring squeals of delight from Viol

In [40]:
giraffe_reviews[1]['review']

"I'm not sure why Sophie is such a hit with the little ones, but my 7 month old baby girl is one of her adoring fans.  The rubber is softer and more pleasant to handle, and my daughter has enjoyed chewing on her legs and the nubs on her head even before she started teething.  She also loves the squeak that Sophie makes when you squeeze her.  Not sure what it is but if Sophie is amongst a pile of her other toys, my daughter will more often than not reach for Sophie.  And I have the peace of mind of knowing that only edible and safe paints and materials have been used to make Sophie, as opposed to Bright Starts and other baby toys made in China.  Now that the research is out on phthalates and other toxic substances in baby toys, I think it's more important than ever to find good quality toys that are also safe for our babies to handle and put in their mouths.  Sophie is a must-have for every new mom in my opinion.  Even if your kid is one of the few that can take or leave her, it's worth

## Most negative reviews for the giraffe

In [41]:
giraffe_reviews[-1]['review']

"My son (now 2.5) LOVED his Sophie, and I bought one for every baby shower I've gone to. Now, my daughter (6 months) just today nearly choked on it and I will never give it to her again. Had I not been within hearing range it could have been fatal. The strange sound she was making caught my attention and when I went to her and found the front curved leg shoved well down her throat and her face a purply/blue I panicked. I pulled it out and she vomited all over the carpet before screaming her head off. I can't believe how my opinion of this toy has changed from a must-have to a must-not-use. Please don't disregard any of the choking hazard comments, they are not over exaggerated!"

In [42]:
giraffe_reviews[-2]['review']

"This children's toy is nostalgic and very cute. However, there is a distinct rubber smell and a very odd taste, yes I tried it, that my baby did not enjoy. Also, if it is soiled it is extremely difficult to clean as the rubber is a kind of porus material and does not clean well. The final thing is the squeaking device inside which stopped working after the first couple of days. I returned this item feeling I had overpaid for a toy that was defective and did not meet my expectations. Please do not be swayed by the cute packaging and hype surounding it as I was. One more thing, I was given a full refund from Amazon without any problem."

# Programming assignment: Analyzing product sentiment
### Perform the sentiment analysis using a selected group of words as features, instead of all the words

## Step 1.- Set the features (selected words) 

In [43]:
products.head()

name,review,rating,word_count,sentiment
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'and': 3, 'love': 1, 'it': 2, 'highly': 1, ...",1
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ...",1
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,"{'ingenious': 1, 'and': 3, 'love': 2, ...",1
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,"{'and': 2, 'parents!!': 1, 'all': 2, 'puppet.': ...",1
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0,"{'and': 2, 'this': 2, 'her': 1, 'help': 2, ...",1
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0,"{'shop': 1, 'noble': 1, 'is': 1, 'it': 1, 'as': ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0,"{'and': 2, 'all': 1, 'right': 1, 'when': 1, ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0,"{'and': 1, 'help': 1, 'give': 1, 'is': 1, ' ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0,"{'journal.': 1, 'nanny': 1, 'standarad': 1, ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",I love this journal and our nanny uses it ...,4.0,"{'all': 1, 'forget': 1, 'just': 1, 'food': 1, ...",1


In [44]:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

In [86]:
def word_counter(a_word, a_dict):
    """Return the number of occurrences of a_word in a_dict.

    a_dict: dictionary of (word, number_of_occurrences) pairs, e.g. one row of products['word_count']
    """
    return a_dict.get(a_word, 0)

In [87]:
# Test word_counter on the first review ({'and': 3, 'love': 1, 'it': 2, 'highly': 1, ...})
#print products['word_count'][0]
assert word_counter('awesome', products['word_count'][0]) == 0
assert word_counter('love', products['word_count'][0]) == 1
assert word_counter('great', products['word_count'][0]) == 0

In [89]:
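for a_word in selected_words:
    # Bind the current word as a default argument so each column counts its own word,
    # even if the lambda is called after the loop has moved on
    products[a_word] = products['word_count'].apply(lambda counts, w=a_word: word_counter(w, counts))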
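A Python closure looks up a variable when the function runs, not when it is created. If `apply` ran the lambda only after the loop had finished, every column would count the last word (`'hate'`). The per-column counts in the output of In [90] differ, so that did not happen here, but binding the word explicitly (for example with a default argument) removes the dependence on when the function is evaluated. A pure-Python illustration with hypothetical helper names:

```python
def make_counters_late(words):
    # Each lambda looks up `w` when it is called, after the comprehension has finished
    return [lambda d: d.get(w, 0) for w in words]

def make_counters_bound(words):
    # A default argument captures the current value of `w` when each lambda is created
    return [lambda d, w=w: d.get(w, 0) for w in words]

review = {'love': 2, 'great': 1}
late = [f(review) for f in make_counters_late(['love', 'great'])]
bound = [f(review) for f in make_counters_bound(['love', 'great'])]
print(late)   # both lambdas end up counting 'great'
print(bound)  # each lambda counts its own word
```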

In [90]:
products.head()

name,review,rating,word_count,sentiment,awesome
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'and': 3, 'love': 1, 'it': 2, 'highly': 1, ...",1,0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ...",1,0
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,"{'ingenious': 1, 'and': 3, 'love': 2, ...",1,0
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,"{'and': 2, 'parents!!': 1, 'all': 2, 'puppet.': ...",1,0
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0,"{'and': 2, 'this': 2, 'her': 1, 'help': 2, ...",1,0
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0,"{'shop': 1, 'noble': 1, 'is': 1, 'it': 1, 'as': ...",1,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0,"{'and': 2, 'all': 1, 'right': 1, 'when': 1, ...",1,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0,"{'and': 1, 'help': 1, 'give': 1, 'is': 1, ' ...",1,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0,"{'journal.': 1, 'nanny': 1, 'standarad': 1, ...",1,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",I love this journal and our nanny uses it ...,4.0,"{'all': 1, 'forget': 1, 'just': 1, 'food': 1, ...",1,0

great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate
0,0,0,1,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,2,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,2,0,0,0,0,0,0


## Quiz result 1

In [93]:
result_1_dict={}
for a_word in selected_words:
    result_1_dict[a_word]= sum(products[a_word])

In [94]:
print result_1_dict

# Most and least frequent selected words: max/min over the keys, ranked by count
# (an inverted {count: word} dict would silently drop words with tied counts)
most_used_word = max(result_1_dict, key=result_1_dict.get)
least_used_word = min(result_1_dict, key=result_1_dict.get)

quizz_result_1={}
quizz_result_1['most_used'] = (most_used_word, result_1_dict[most_used_word])
quizz_result_1['least_used'] = (least_used_word, result_1_dict[least_used_word])

{'fantastic': 873, 'love': 40277, 'bad': 3197, 'awesome': 2002, 'great': 42420, 'terrible': 673, 'amazing': 1305, 'horrible': 659, 'awful': 345, 'hate': 1057, 'wow': 131}


In [95]:
print "QUIZZ_RESULT_1: ", quizz_result_1

QUIZZ_RESULT_1:  {'least_used': ('wow', 131), 'most_used': ('great', 42420)}


## Step 2.- Create a new sentiment analysis model using only the selected_words as features

In [96]:
train_data, test_data = products.random_split(.8,seed=0)  # re-split so the new word columns are included; same seed as before

In [97]:
# Features: the 11 selected-word count columns (instead of ['word_count'])
selected_words_model = graphlab.logistic_classifier.create(train_data, 
                                                      target='sentiment',
                                                      features=selected_words,
                                                      validation_set=test_data)

PROGRESS: Logistic regression:
PROGRESS: --------------------------------------------------------
PROGRESS: Number of examples          : 133448
PROGRESS: Number of classes           : 2
PROGRESS: Number of feature columns   : 11
PROGRESS: Number of unpacked features : 11
PROGRESS: Number of coefficients    : 12
PROGRESS: Starting Newton Method
PROGRESS: --------------------------------------------------------
PROGRESS: +-----------+----------+--------------+-------------------+---------------------+
PROGRESS: | Iteration | Passes   | Elapsed Time | Training-accuracy | Validation-accuracy |
PROGRESS: +-----------+----------+--------------+-------------------+---------------------+
PROGRESS: | 1         | 2        | 0.269452     | 0.844299          | 0.842842            |
PROGRESS: | 2         | 3        | 0.420440     | 0.844186          | 0.842842            |
PROGRESS: | 3         | 4        | 0.575440     | 0.844276          | 0.843142            |
PROGRESS: | 4         | 5        |

## Quiz result 2

In [99]:
selected_words_model.get('coefficients')

name,index,class,value
(intercept),,1,1.36728315229
awesome,,1,1.05800888878
great,,1,0.883937894898
fantastic,,1,0.891303090304
amazing,,1,0.892802422508
love,,1,1.39989834302
horrible,,1,-1.99651800559
bad,,1,-0.985827369929
terrible,,1,-2.09049998487
awful,,1,-1.76469955631


In [100]:
result_2_table = selected_words_model.get('coefficients').sort('value', ascending=False)
result_2_table.print_rows(num_rows=12, num_columns=4)

+-------------+-------+-------+------------------+
|     name    | index | class |      value       |
+-------------+-------+-------+------------------+
|     love    |  None |   1   |  1.39989834302   |
| (intercept) |  None |   1   |  1.36728315229   |
|   awesome   |  None |   1   |  1.05800888878   |
|   amazing   |  None |   1   |  0.892802422508  |
|  fantastic  |  None |   1   |  0.891303090304  |
|    great    |  None |   1   |  0.883937894898  |
|     wow     |  None |   1   | -0.0541450123333 |
|     bad     |  None |   1   | -0.985827369929  |
|     hate    |  None |   1   |  -1.40916406276  |
|    awful    |  None |   1   |  -1.76469955631  |
|   horrible  |  None |   1   |  -1.99651800559  |
|   terrible  |  None |   1   |  -2.09049998487  |
+-------------+-------+-------+------------------+
[12 rows x 4 columns]



In [101]:
quizz_result_2={}
quizz_result_2['most_positive_weight'] = (result_2_table[0]['name'], result_2_table[0]['value'])
quizz_result_2['most_negative_weight'] = (result_2_table[-1]['name'], result_2_table[-1]['value'])

In [102]:
print "QUIZZ_RESULT_2: ", quizz_result_2

QUIZZ_RESULT_2:  {'most_positive_weight': ('love', 1.399898343017463), 'most_negative_weight': ('terrible', -2.090499984872607)}
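In a logistic regression the log-odds of a positive review equals the intercept plus the sum of weight × count, so each additional occurrence of a word multiplies the odds of a positive prediction by exp(weight), with the other counts held fixed. Applying that to three of the learned weights from the table above:

```python
import math

# Learned weights copied from the selected_words_model coefficient table above
weights = {'love': 1.39989834302, 'wow': -0.0541450123333, 'terrible': -2.09049998487}

# Odds multiplier per extra occurrence of each word
odds_multiplier = dict((word, math.exp(w)) for word, w in weights.items())
for word in ('love', 'wow', 'terrible'):
    print("%-8s x%.3f" % (word, odds_multiplier[word]))
```

One more "love" roughly quadruples the odds of a positive prediction, "terrible" cuts them to about an eighth, and "wow" barely moves them.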


## Step 3.- Compare the accuracy of the different sentiment analysis models

In [103]:
selected_words_model_eval_result = selected_words_model.evaluate(test_data)
print selected_words_model_eval_result

{'confusion_matrix': Columns:
	target_label	int
	predicted_label	int
	count	int

Rows: 4

Data:
+--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|      0       |        0        |  234  |
|      0       |        1        |  5094 |
|      1       |        1        | 27846 |
|      1       |        0        |  130  |
+--------------+-----------------+-------+
[4 rows x 3 columns]
, 'accuracy': 0.8431419649291376}
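Accuracy hides how the errors are distributed. Per-class recall, computed here from the confusion-matrix counts printed above, shows that the selected-words model labels only about 4% of the negative test reviews (234 of 5328) as negative while catching over 99% of the positive ones. `per_class_recall` is a hypothetical helper:

```python
def per_class_recall(counts):
    """Recall of each true label from {(target_label, predicted_label): count}."""
    recall = {}
    for label in (0, 1):
        total = sum(c for (target, _), c in counts.items() if target == label)
        recall[label] = counts.get((label, label), 0) / float(total)
    return recall

# Counts copied from the selected_words_model confusion matrix above
selected_counts = {(0, 0): 234, (0, 1): 5094, (1, 1): 27846, (1, 0): 130}
recall = per_class_recall(selected_counts)
print("recall for negatives: %.4f, for positives: %.4f" % (recall[0], recall[1]))
```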


In [104]:
selected_words_model_eval_result_roc_curve = selected_words_model.evaluate(test_data, metric='roc_curve')
print selected_words_model_eval_result_roc_curve

{'roc_curve': Columns:
	threshold	float
	fpr	float
	tpr	float
	p	int
	n	int

Rows: 1001

Data:
+------------------+----------------+-------------------+-------+------+
|    threshold     |      fpr       |        tpr        |   p   |  n   |
+------------------+----------------+-------------------+-------+------+
|       0.0        |      0.0       | 3.57334286225e-05 | 27985 | 5367 |
| 0.0010000000475  |      1.0       |   0.999964266571  | 27985 | 5367 |
| 0.00200000009499 | 0.999813676169 |   0.999964266571  | 27985 | 5367 |
| 0.00300000002608 | 0.999813676169 |   0.999964266571  | 27985 | 5367 |
| 0.00400000018999 | 0.999627352338 |   0.999928533143  | 27985 | 5367 |
| 0.00499999988824 | 0.999627352338 |   0.999928533143  | 27985 | 5367 |
| 0.00600000005215 | 0.999441028508 |   0.999892799714  | 27985 | 5367 |
| 0.00700000021607 | 0.999441028508 |   0.999892799714  | 27985 | 5367 |
| 0.00800000037998 | 0.999441028508 |   0.999892799714  | 27985 | 5367 |
| 0.00899999961257 | 0.999441

In [105]:
selected_words_model.show(view='Evaluation') # ROC curve (true positive rate vs false positive rate) plus accuracy, precision, ... results
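Each row of the `roc_curve` table above pairs a threshold with fpr = FP / n and tpr = TP / p. Here p and n are the numbers of positive and negative examples (27985 and 5367 in that table). Below is a minimal plain-Python sketch of that computation. It assumes an example is predicted positive when its score is at least the threshold; GraphLab's exact tie convention is not shown in the notebook. The scores and labels are hypothetical toy data.

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, fpr, tpr) triples; predict positive when score >= threshold."""
    p = sum(1 for y in labels if y == 1)
    n = len(labels) - p
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / float(n), tp / float(p)))
    return points

# Hypothetical toy scores (predicted probabilities) and true labels
scores = [0.95, 0.85, 0.80, 0.60, 0.40, 0.30]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels, [0.0, 0.5, 0.9])
for t, fpr, tpr in pts:
    print("threshold=%.1f fpr=%.3f tpr=%.3f" % (t, fpr, tpr))
```

At threshold 0.0 every example is predicted positive, so both rates are 1.0. This matches the shape of the first rows of the table above.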

## Quiz result 3
#### q3_1) What is the accuracy of the selected_words_model on the test_data? 
#### q3_2) What was the accuracy of the sentiment_model that we learned using all the word counts in the IPython Notebook above from the lectures? 
#### q3_3) What is the accuracy of the majority class classifier on this task?
#### q3_4) How do you compare the different learned models with the baseline approach where we are just predicting the majority class? 

#### q3_1) What is the accuracy of the selected_words_model on the test_data? 

In [106]:
quizz_result_3={}
q3_1= selected_words_model_eval_result['accuracy']
print q3_1

0.843141964929


#### q3_2) What was the accuracy of the sentiment_model that we learned using all the word counts in the IPython Notebook above from the lectures? 

In [107]:
sentiment_model_eval_result = sentiment_model.evaluate(test_data)
q3_2= sentiment_model_eval_result['accuracy']
print q3_2

0.916256305549


#### q3_3) What is the accuracy of the majority class classifier on this task?

In [108]:
cm = selected_words_model_eval_result['confusion_matrix']
print cm
print cm['count']

# Count test examples per true class by filtering on target_label,
# instead of relying on the row order of the confusion matrix
total_examples = cm['count'].sum()
y_0_class = cm[cm['target_label'] == 0]['count'].sum()
y_1_class = cm[cm['target_label'] == 1]['count'].sum()


print "total_examples = ", total_examples
print "y_0_class = ", y_0_class
print "y_1_class = ", y_1_class

+--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|      0       |        0        |  234  |
|      0       |        1        |  5094 |
|      1       |        1        | 27846 |
|      1       |        0        |  130  |
+--------------+-----------------+-------+
[4 rows x 3 columns]

[234, 5094, 27846, 130]
total_examples =  33304
y_0_class =  5328
y_1_class =  27976


In [109]:
# Majority class = the label with more test examples.
# q3_3 stores that label (not its accuracy; see majority_class_accuracy below)
if y_1_class >= y_0_class:
    q3_3 = 1 # y_1_class >> y_0_class
    count_majority_class = float(y_1_class)
else:
    q3_3 = 0
    count_majority_class = float(y_0_class)
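A majority-class classifier ignores the review text and always predicts the most frequent label. Its accuracy is therefore that label's share of the test set. Below is a minimal plain-Python sketch using `collections.Counter`. The label list is rebuilt from the class sizes printed above (5328 negatives, 27976 positives).

```python
from collections import Counter

def majority_class_baseline(labels):
    """Return (majority_label, accuracy) of always predicting the most frequent label."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / float(len(labels))

# Test-set class sizes from the output above
test_labels = [0] * 5328 + [1] * 27976
label, acc = majority_class_baseline(test_labels)
print("majority label=%d accuracy=%.6f" % (label, acc))
```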

#### q3_4) How do you compare the different learned models with the baseline approach where we are just predicting the majority class? 

In [110]:
print " q3_4 answer: Comparison: accuracy_majority_class vs accuracy_predicted_model"
print " Interpretation: if accuracy_predicted_model is not higher than accuracy_majority_class, the model is no better than always predicting the majority class."

print "\nTEST:"
#print "total_examples = ", total_examples
#print "count_majority_class", count_majority_class

assert(type(count_majority_class) is float)
majority_class_accuracy = count_majority_class / float(total_examples)
print "majority_class.accuracy", majority_class_accuracy
print "select_words_model.accuracy", q3_1
print "all_words_model.accuracy", q3_2


 q3_4 answer: Comparison: accuracy_majority_class vs accuracy_predicted_model
 Interpretation: if accuracy_predicted_model is not higher than accuracy_majority_class, the model is no better than always predicting the majority class.

TEST:
majority_class.accuracy 0.840019216911
select_words_model.accuracy 0.843141964929
all_words_model.accuracy 0.916256305549
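Because the baseline is already about 84% accurate, raw accuracy differences understate how much each model improves on it. One way to compare them, which is not used in the original notebook, is the fraction of the baseline's errors each model removes. Below is a minimal sketch using the three accuracies printed above.

```python
def error_reduction(model_accuracy, baseline_accuracy):
    """Fraction of the baseline's errors that the model removes (negative if worse)."""
    baseline_error = 1.0 - baseline_accuracy
    model_error = 1.0 - model_accuracy
    return (baseline_error - model_error) / baseline_error

baseline = 0.8400192169108815          # majority_class.accuracy
selected_words = 0.8431419649291376    # select_words_model.accuracy
all_words = 0.916256305548883          # all_words_model.accuracy

print("selected_words_model: %.1f%% fewer errors" % (100 * error_reduction(selected_words, baseline)))
print("all_words_model:      %.1f%% fewer errors" % (100 * error_reduction(all_words, baseline)))
```

By this measure the selected-words model removes only about 2% of the baseline's errors, while the all-words model removes almost half of them.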


In [111]:
quizz_result_3['select_words_model.accuracy'] = q3_1
quizz_result_3['all_words_model.accuracy'] = q3_2
quizz_result_3['majority_class.accuracy'] = majority_class_accuracy
quizz_result_3['accuracy majority class classifier'] = q3_3  # majority class label; its accuracy is majority_class_accuracy
quizz_result_3['accuracy comparation'] = "The model learned using all words performed much better than the other two. The other two approaches performed about the same."

print "QUIZZ_RESULT_3:", quizz_result_3

QUIZZ_RESULT_3: {'accuracy comparation': 'The model learned using all words performed much better than the other two. The other two approaches performed about the same.', 'select_words_model.accuracy': 0.8431419649291376, 'all_words_model.accuracy': 0.916256305548883, 'majority_class.accuracy': 0.8400192169108815, 'accuracy majority class classifier': 1}


## Step 4. Interpreting the difference in performance between the models

In [112]:
selected_words_model_eval_result_roc_curve = selected_words_model.evaluate(test_data, metric='roc_curve')
selected_words_model.show(view='Evaluation') # ROC curve (true positive rate vs false positive rate) plus accuracy, precision, ... results

In [113]:
sentiment_model.evaluate(test_data, metric='roc_curve')
sentiment_model.show(view='Evaluation')
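The two Evaluation views are easiest to compare through a single number such as the area under the ROC curve (AUC). Below is a minimal plain-Python sketch that integrates ROC points with the trapezoidal rule. It assumes the `fpr` and `tpr` columns of a `roc_curve` result have been converted to Python lists. The two point sets used here are hypothetical reference curves.

```python
def auc_trapezoid(fpr, tpr):
    """Area under an ROC curve given matching fpr/tpr lists (any order)."""
    points = sorted(zip(fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical ROC points for illustration only
print("%.3f" % auc_trapezoid([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))   # random-guess diagonal: 0.500
print("%.3f" % auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]))   # perfect classifier: 1.000
```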

## Data Subset: product 'Baby Trend Diaper Champ'

In [114]:
diaper_champ_reviews = products[products['name'] == 'Baby Trend Diaper Champ']

In [115]:
#print len(diaper_champ_reviews) #298
diaper_champ_reviews.head()

name,review,rating,word_count,sentiment,awesome,great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate
Baby Trend Diaper Champ,Ok - newsflash. Diapers are just smelly. We've ...,4.0,"{'son': 1, 'just': 2, 'less': 1, '-': 3, ...",1,0,0,0,0,0,0,0,0,0,0,0
Baby Trend Diaper Champ,"My husband and I selected the Diaper ""Champ"" ma ...",1.0,"{'material)': 1, 'bags,': 1, 'less': 1, 'when': 3, ...",0,0,0,0,0,0,0,0,0,0,0,0
Baby Trend Diaper Champ,Excellent diaper disposal unit. I used it in ...,5.0,"{'control': 1, 'am': 1, 'it': 1, 'used': 1, ' ...",1,0,0,0,0,0,0,0,0,0,0,0
Baby Trend Diaper Champ,We love our diaper champ. It is very easy to use ...,5.0,"{'and': 3, 'over.': 1, 'all': 1, 'bags.': 1, ...",1,0,0,0,0,1,0,0,0,0,0,0
Baby Trend Diaper Champ,Two girlfriends and two family members put me ...,5.0,"{'just': 1, '-': 3, 'both': 1, 'results': 1, ...",1,0,0,0,0,0,1,0,0,0,0,0
Baby Trend Diaper Champ,I waited to review this until I saw how it ...,4.0,"{'lysol': 1, 'all': 1, 'mom.': 1, 'busy': 1, ...",1,0,0,0,0,0,0,1,0,0,0,0
Baby Trend Diaper Champ,I have had a diaper genie for almost 4 years since ...,1.0,"{'all': 1, 'bags.': 1, 'just': 1, ""don't"": 2, ...",0,0,0,0,0,0,0,0,0,0,0,0
Baby Trend Diaper Champ,I originally put this item on my baby registry ...,5.0,"{'lysol': 1, 'all': 2, 'bags.': 1, 'feedback': ...",1,0,0,0,0,0,0,0,0,0,0,0
Baby Trend Diaper Champ,I am so glad I got the Diaper Champ instead of ...,5.0,"{'and': 2, 'all': 1, 'just': 1, 'is': 2, ' ...",1,0,0,0,0,0,0,0,0,0,0,0
Baby Trend Diaper Champ,We had 2 diaper Genie's both given to us as a ...,4.0,"{'hand.': 1, 'both': 1, '(required': 1, 'befo ...",1,0,0,0,0,2,0,0,0,0,0,0


In [116]:
# my test
diaper_champ_reviews['rating'].show(view='Categorical')
diaper_champ_reviews['sentiment'].show(view='Categorical')
print "My analysis:", "\nmajority class is '1';", "\nmajority class accuracy = 83.221%", "\ngoal: predict with accuracy greater than 83.221%"

My analysis: 
majority class is '1'; 
majority class accuracy = 83.221% 
goal: predict with accuracy greater than 83.221%
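The 83.221% target is the positive class's share of this product's reviews. Below is a minimal sketch of that arithmetic. The total of 298 comes from the commented-out `len()` call above. The count of 248 positive reviews is an assumption, back-computed from the 83.221% figure rather than printed in the notebook.

```python
def majority_share(n_positive, n_total):
    """Percentage of examples that belong to the positive class."""
    return 100.0 * n_positive / n_total

# 298 reviews from the len() comment above; 248 positives is an assumed,
# back-computed value (not printed in the notebook)
print("%.3f%%" % majority_share(248, 298))
```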


## sentiment_model prediction

In [117]:
diaper_champ_reviews['predicted_sentiment'] = sentiment_model.predict(diaper_champ_reviews, output_type='probability')
diaper_champ_reviews = diaper_champ_reviews.sort('predicted_sentiment', ascending=False)
diaper_champ_reviews.head()

name,review,rating,word_count,sentiment,awesome,great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate,predicted_sentiment
Baby Trend Diaper Champ,Baby Luke can turn a clean diaper to a dirty ...,5.0,"{'all': 1, 'less': 1, ""friend's"": 1, '(which': ...",1,0,0,0,0,0,0,0,0,0,0,0,0.999999937267
Baby Trend Diaper Champ,I LOOOVE this diaper pail! Its the easies ...,5.0,"{'just': 1, 'over': 1, 'rweek': 1, 'sooo': 1, ...",1,0,0,0,0,1,0,0,0,0,0,0,0.999999917406
Baby Trend Diaper Champ,We researched all of the different types of di ...,4.0,"{'all': 2, 'just': 4, ""don't"": 2, 'one,': 1, ...",1,0,0,0,0,0,0,1,0,0,0,0,0.999999899509
Baby Trend Diaper Champ,My baby is now 8 months and the can has been ...,5.0,"{""don't"": 1, 'able': 2, 'over': 1, 'soon': 1, ...",1,0,2,0,0,0,0,1,0,0,0,0,0.999999836182
Baby Trend Diaper Champ,"This is absolutely, by far, the best diaper ...",5.0,"{'just': 3, 'money': 1, 'still': 3, 'fine': 1, ...",1,0,0,0,0,2,0,0,0,0,0,0,0.999999824745
Baby Trend Diaper Champ,Diaper Champ or Diaper Genie? That was my ...,5.0,"{'son': 2, 'all': 1, 'bags.': 1, 'son,': 1, ...",1,0,0,0,0,0,0,0,0,0,0,0,0.999999759315
Baby Trend Diaper Champ,Wow! This is fabulous. It was a toss-up between ...,5.0,"{'and': 4, 'this': 3, 'stink': 1, 'garbage' ...",1,0,0,0,0,0,0,0,0,0,0,0,0.999999692111
Baby Trend Diaper Champ,I originally put this item on my baby registry ...,5.0,"{'lysol': 1, 'all': 2, 'bags.': 1, 'feedback': ...",1,0,0,0,0,0,0,0,0,0,0,0,0.999999642488
Baby Trend Diaper Champ,Two girlfriends and two family members put me ...,5.0,"{'just': 1, '-': 3, 'both': 1, 'results': 1, ...",1,0,0,0,0,0,1,0,0,0,0,0,0.999999604504
Baby Trend Diaper Champ,I am one of those super- critical shoppers who ...,5.0,"{'all': 1, 'humid': 1, 'just': 1, 'less': 1, ...",1,0,0,0,0,1,0,0,0,0,0,0,0.999999486804
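`predict(..., output_type='probability')` returns P(sentiment = +1) for each review, and the cell above sorts reviews by that value in descending order. Below is a minimal sketch of both steps for a logistic model over word counts. It assumes the model is a logistic classifier, as in the lecture notebook. The `love` and `terrible` weights come from QUIZZ_RESULT_2 above; `INTERCEPT` and the three reviews are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Weights from QUIZZ_RESULT_2 above; INTERCEPT is a hypothetical value
WEIGHTS = {"love": 1.399898343017463, "terrible": -2.090499984872607}
INTERCEPT = 1.0  # hypothetical

def predict_probability(word_count):
    """P(sentiment = +1) for a logistic model over word counts; unknown words get weight 0."""
    score = INTERCEPT + sum(WEIGHTS.get(w, 0.0) * c for w, c in word_count.items())
    return sigmoid(score)

# Hypothetical reviews given as word-count dicts
reviews = [
    {"this": 1, "is": 1, "terrible": 1},
    {"we": 1, "love": 2, "it": 1},
    {"it": 1, "works": 1},
]
# Same idea as .sort('predicted_sentiment', ascending=False)
ranked = sorted(reviews, key=predict_probability, reverse=True)
for wc in ranked:
    print("%.4f %s" % (predict_probability(wc), sorted(wc)))
```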


## Quiz result 4
#### q4_1) What is the ‘predicted_sentiment’ for the most positive review for ‘Baby Trend Diaper Champ’ according to the sentiment_model from the IPython Notebook from lecture?

In [118]:
q4_1 = diaper_champ_reviews[0]['predicted_sentiment']
print q4_1 # 0.999999937267
#diaper_champ_reviews[0]['review']
assert(abs(sentiment_model.predict(diaper_champ_reviews[0:1], output_type='probability')[0] - q4_1) < 1e-9) # just first row; compare floats with a tolerance

0.999999937267


#### q4_2) What is the ‘predicted_sentiment’ for the most positive review for ‘Baby Trend Diaper Champ’ according to the selected_words_model you learned using just the selected_words? 

In [119]:
# just show the result, do not create a new column in the dataset
q4_2 = selected_words_model.predict(diaper_champ_reviews[0:1], output_type='probability')
print q4_2
#selected_words_model.predict(diaper_champ_reviews, output_type='probability')

[0.7969408512906712]
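This review has a count of 0 for every selected word (see its row in the table above). If the selected_words_model is a logistic classifier, its score for this review reduces to the intercept, so the prediction equals sigmoid(intercept). Every review without any selected word would then receive this same probability. Below is a minimal sketch that inverts the sigmoid to recover the implied intercept. This is an inference under that assumption, not a value read from the model's coefficients.

```python
import math

def logit(p):
    """Inverse of the logistic sigmoid: log-odds of probability p."""
    return math.log(p / (1.0 - p))

# Probability printed for q4_2 above
p_selected = 0.7969408512906712

# With every selected-word count equal to 0, score = intercept,
# so the prediction is sigmoid(intercept) and intercept = logit(p)
implied_intercept = logit(p_selected)
print("implied intercept = %.4f" % implied_intercept)
```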


#### q4_3) Why is the sentiment_model prediction much more positive than the selected_words_model prediction?

In [120]:
diaper_champ_reviews[0:1]

name,review,rating,word_count,sentiment,awesome,great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate,predicted_sentiment
Baby Trend Diaper Champ,Baby Luke can turn a clean diaper to a dirty ...,5.0,"{'all': 1, 'less': 1, ""friend's"": 1, '(which': ...",1,0,0,0,0,0,0,0,0,0,0,0,0.999999937267


In [121]:
diaper_champ_reviews[0:1]['word_count']

dtype: dict
Rows: 1
[{'all': 1, 'less': 1, "friend's": 1, '(which': 1, 'absolutly': 2, 'to': 3, 'easy': 2, 'around': 1, 'deffinite': 1, 'luke': 1, 'champ': 1, 'turns': 1, 'bag': 1, 'quick': 1, 'found': 1, 'where': 1, "isn't": 1, 'because': 1, 'house': 1, 'are': 1, 'best': 2, 'really': 3, '"what': 1, 'what': 1, 'for': 2, 'product,': 1, 'seconds': 1, '3': 1, 'integrated': 1, 'dirty': 1, 'we': 2, 'pad)': 1, 'odor': 1, 'use': 1, 'flat.': 1, 'on': 1, 'of': 2, 'chanp': 1, 'turn': 1, 'free,': 1, 'purchase.great': 1, 'reinforced': 1, 'garbage': 1, 'vie': 1, 'into': 2, 'one': 3, 'economical,': 1, 'smelly': 1, 'ties': 1, 'nursery.': 1, 'little': 1, 'from': 1, 'there': 3, 'bjorn,': 1, 'needed': 1, 'was': 2, 'that': 2, 'smell"': 1, 'bulk': 1, 'fabulous.updatei': 1, 'hesitated': 1, 'graco': 1, 'baby': 3, 'champ,': 2, 'champ.': 1, 'than': 1, 'loved': 1, 'this': 1, 'work': 1, 'useing': 1, 'can': 1, 'pack': 1, 'and': 6, 'purchases': 1, 'bassinet': 1, 'is': 4, 'use,': 1, 'at': 1, 'have': 1, 'in': 2, 'a

In [122]:
diaper_champ_reviews[0:1]['review']

dtype: str
Rows: 1
['Baby Luke can turn a clean diaper to a dirty diaper in 3 seconds flat. The diaper champ turns the smelly diaper into "what diaper smell" in less time than that. I hesitated and wondered what I REALLY needed for the nursery. This is one of the best purchases we made. The champ, the baby bjorn, fluerville diaper bag, and graco pack and play bassinet all vie for the best baby purchase.Great product, easy to use, economical, effective, absolutly fabulous.UpdateI knew that I loved the champ, and useing the diaper genie at a friend's house REALLY reinforced that!! There is no comparison, the chanp is easy and smell free, the genie was difficult to use one handed (which is absolutly vital if you have a little one on a changing pad) and there was a deffinite odor eminating from the genieplus we found that the quick tie garbage bags where the ties are integrated into the bag work really well because there isn't any added bulk around the sealing edge of the champ.']

In [123]:
q4_3 = "None of the selected_words appeared in the text of this review. "
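print "WARNING: the review does contain 'Great', but glued to the previous word ('purchase.Great' is counted as 'purchase.great')\n the text would need cleaning (e.g. separating punctuation from words) before any analysis"

WARNING: the review does contain 'Great', but glued to the previous word ('purchase.Great' is counted as 'purchase.great')
 the text would need cleaning (e.g. separating punctuation from words) before any analysis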
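The `word_count` keys shown above include lower-cased words ('luke', 'really') alongside tokens with punctuation attached ('purchase.great', 'flat.', 'champ,'). This suggests the counts were built by lower-casing and splitting on whitespace. Below is a minimal sketch that contrasts that tokenization with a hypothetical regex-based cleaner on a shortened excerpt of the review. The regex is one possible choice, not GraphLab's behavior.

```python
import re
from collections import Counter

# Shortened excerpt of the review above
text = "the best baby purchase.Great product, easy to use, economical, effective, absolutly fabulous."

# Lower-case + whitespace split (what the word_count keys above suggest)
naive = Counter(text.lower().split())

# Hypothetical cleaner: keep only runs of letters and apostrophes
cleaned = Counter(re.findall(r"[a-z']+", text.lower()))

print("naive   count of 'great': %d" % naive.get("great", 0))
print("cleaned count of 'great': %d" % cleaned.get("great", 0))
```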


In [124]:
quizz_result_4={}
quizz_result_4['1.- "predicted_sentiment" for the most positive review, using the model from lesson notebook'] = q4_1
quizz_result_4['2.- "predicted_sentiment" for the most positive review, using the selected_words_model'] = q4_2
quizz_result_4['3.- Why is (sentiment_model) prediction much more positive than (selected_words_model) prediction?'] = q4_3

print "QUIZZ_RESULT_4:\n"
for k,v in quizz_result_4.items():
    print k, " = ", v

QUIZZ_RESULT_4:

2.- "predicted_sentiment" for the most positive review, using the selected_words_model  =  [0.7969408512906712]
1.- "predicted_sentiment" for the most positive review, using the model from lesson notebook  =  0.999999937267
3.- Why is (sentiment_model) prediction much more positive than (selected_words_model) prediction?  =  None of the selected_words appeared in the text of this review. 
