In [1]:
import turicreate

In [2]:
products = turicreate.SFrame('./data/amazon_baby.sframe')
# word_count['awesome'] if 'awesome' in word_count else 0

In the Jupyter notebook above, we used the word counts for all words in the reviews to train the sentiment classifier model. Now, we are going to follow a similar path, but only use this subset of the words:

```python
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']
```

Often, ML practitioners will throw out words they consider “unimportant” before training their model.  This procedure can often be helpful in terms of accuracy.  Here, we are going to throw out all words except for the very few above.  Using so few words in our model will hurt our accuracy, but help us interpret what our classifier is doing. 

In [3]:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

# 1. Use `.apply()` to build a new feature with the counts for each of the selected_words
In the notebook (cf. lesson), we created a column ‘word_count’ with the word counts for each review.  Our first task is to create a new column in the products SFrame with the counts for each selected_word above.  In the process, we will see how the method `.apply()` can be used to create new columns in our data (our features) and how to write a Python function, which is an extremely useful concept to grasp!


In [4]:
products['word_count'] = turicreate.text_analytics.count_words(products['review'])

 Our first goal is to create a column `products['awesome']` where each row contains the number of times the word ‘awesome’ showed up in the review for the corresponding product, and 0 if the word didn’t show up.  One way to do this is to look at each row of the ‘word_count’ column and follow this logic:  

  - If ‘awesome’ shows up in the word counts for a particular product (row of the products SFrame), then we know how often ‘awesome’ appeared in the review, 
  - If ‘awesome’ doesn’t appear in the word counts, then it didn’t appear in the review, and we should set the count for ‘awesome’ to 0 in this review.  
  
  We could use a for loop to repeat this logic for each row of the products SFrame, but this approach would be really slow, because the SFrame is not optimized for row-by-row access in a for loop.  Instead, we will use the `.apply()` method to run the logic above on each row of the `products['word_count']` column (which, since it’s a single column, has type SArray).  Read about using the `.apply()` method on an SArray [here](https://apple.github.io/turicreate/docs/api/generated/turicreate.SArray.apply.html?highlight=apply#turicreate.SArray.apply).  

We are now ready to create our new columns:

  First, we will use a Python function to define the logic above. We will write a function called `awesome_count` which takes in the word counts of one review and returns the number of times ‘awesome’ appears in that review.

A few tips:  
  i. Each entry of the ‘word_count’ column is of Python type dictionary.      
  ii. If you have a dictionary called dict, you can access a field in the dictionary using: `dict['awesome']` but only  if ‘awesome’ is one of the fields in the dictionary, otherwise you will get an error.
  iii. In Python, to test if a dictionary has a particular field, you can simply write: `if 'awesome' in dict` and if this condition doesn’t hold, the count of ‘awesome’ should be 0.
  
Using these tips, you can now write the `awesome_count` function. 

In [5]:
def awesome_count(word_count: dict) -> float:
  return word_count['awesome'] if 'awesome' in word_count else 0

Next, we use `.apply()` to iterate awesome_count for each row of `products[‘word_count’]` and create a new column called ‘awesome’ with the resulting counts.  Here is what that looks like: 
```python
products['awesome'] = products['word_count'].apply(awesome_count)
```
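
To make the row-by-row behaviour concrete, here is a minimal plain-Python sketch of the same idea. It does not use Turi Create: `toy_word_counts` is a hypothetical stand-in for `products['word_count']`, and a list comprehension plays the role of `.apply()`.

```python
from typing import Dict, List


def awesome_count(word_count: Dict[str, float]) -> float:
    """Return how often 'awesome' occurs, or 0 when the word is absent."""
    return word_count['awesome'] if 'awesome' in word_count else 0


# Hypothetical stand-in for products['word_count']: one dictionary per review.
toy_word_counts: List[Dict[str, float]] = [
    {'awesome': 2.0, 'stroller': 1.0},
    {'great': 1.0},
    {},
]

# Plain-Python analogue of products['word_count'].apply(awesome_count):
# the function is called once per row and the results form the new column.
awesome_column = [awesome_count(wc) for wc in toy_word_counts]
print(awesome_column)
```

The membership test is what keeps the second and third rows from raising a `KeyError`; `word_count.get('awesome', 0)` is an equivalent shorthand.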

Repeat this process for the other 10 words in selected_words.  (Here, we described a simple procedure to obtain the counts for each selected_word.  There are other more efficient ways of doing this, and we encourage you to explore this further.)

In [6]:
# let's use a lambda (closure) to DRY this column construction
for word in selected_words:
  products[word] = products['word_count'].apply(lambda wc: wc.get(word, 0))
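
A general Python caveat applies to lambdas created inside a loop: a closure looks up `word` when it is called, not when it is defined. In the loop above each lambda is handed to `.apply()` immediately, so it may well see the intended word, and the outputs below are consistent with that. The sketch below shows only the language behaviour, on hypothetical toy data, along with the common default-argument idiom that removes the ambiguity.

```python
words = ['awesome', 'great']
review = {'awesome': 2, 'great': 5}  # hypothetical word counts for one review

# Late binding: every lambda reads `word` at call time,
# i.e. after the loop has finished and `word` is 'great'.
late_bound = []
for word in words:
    late_bound.append(lambda wc: wc.get(word, 0))

# Early binding: `word=word` stores the current value as a default argument.
early_bound = []
for word in words:
    early_bound.append(lambda wc, word=word: wc.get(word, 0))

late_results = [f(review) for f in late_bound]
early_results = [f(review) for f in early_bound]
print(late_results)   # both entries come from 'great'
print(early_results)  # one entry per word
```

If the lambdas were ever stored and evaluated later, writing `lambda wc, word=word: wc.get(word, 0)` would keep each column tied to its own word.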

In [7]:
products.head(5)

name,review,rating,word_count,awesome,great,fantastic
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0,"{'handles': 1.0, 'stripping': 1.0, ...",0.0,0.0,0.0
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'recommend': 1.0, 'disappointed': 1.0, ...",0.0,0.0,0.0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'quilt': 1.0, 'the': 1.0, 'than': 1.0, 'fu ...",0.0,0.0,0.0
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,"{'tool': 1.0, 'clever': 1.0, 'binky': 2.0, ...",0.0,0.0,0.0
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,"{'rock': 1.0, 'many': 1.0, 'headaches': 1.0, ...",0.0,1.0,0.0

amazing,love,horrible,bad,terrible,awful,wow,hate
0.0,0.0,0,0.0,0,0,0,0
0.0,1.0,0,0.0,0,0,0,0
0.0,0.0,0,0.0,0,0,0,0
0.0,2.0,0,0.0,0,0,0,0
0.0,1.0,0,0.0,0,0,0,0


Using the `.sum()` method on each of the new columns you created, answer the following questions:  

Out of the `selected_words`, which one is most used in the dataset?  
Which one is least used? Save these results to answer the quiz at the end.

In [8]:
resp_q1q2 = sorted([(word, int(products[word].sum())) for word in selected_words], 
                   key=lambda t: t[1], reverse=True)
resp_q1q2  

[('great', 59536),
 ('love', 43867),
 ('bad', 4950),
 ('awesome', 4075),
 ('amazing', 2726),
 ('fantastic', 1765),
 ('hate', 1285),
 ('terrible', 1282),
 ('horrible', 1245),
 ('awful', 753),
 ('wow', 461)]

### Q1. Out of the 11 words in selected_words, which one is most used in the reviews in the dataset?
  - ( ) awesome
  - ( ) love
  - ( ) hate
  - ( ) bad
  - **(X) great**
  
### Q2. Out of the 11 words in selected_words, which one is least used in the reviews in the dataset?    
  - **(X) wow**
  - ( ) amazing
  - ( ) terrible
  - ( ) awful
  - ( ) love

# 2. Create a new sentiment analysis model using only the selected_words as features

In the Jupyter Notebook above, we used word counts for all words as features for our sentiment classifier.  Now, you are just going to use the `selected_words`:

 - Use the same train/test split as in the Jupyter Notebook from lecture.


In [9]:
## ignore all 3*  reviews
products = products[products['rating'] != 3]

## positive sentiment = 4-star or 5-star reviews
products['sentiment'] = products['rating'] >= 4

In [10]:
products.head(4)

name,review,rating,word_count,awesome,great,fantastic
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'recommend': 1.0, 'disappointed': 1.0, ...",0.0,0.0,0.0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'quilt': 1.0, 'the': 1.0, 'than': 1.0, 'fu ...",0.0,0.0,0.0
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,"{'tool': 1.0, 'clever': 1.0, 'binky': 2.0, ...",0.0,0.0,0.0
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,"{'rock': 1.0, 'many': 1.0, 'headaches': 1.0, ...",0.0,1.0,0.0

amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment
0.0,1.0,0,0.0,0,0,0,0,1
0.0,0.0,0,0.0,0,0,0,0,1
0.0,2.0,0,0.0,0,0,0,0,1
0.0,1.0,0,0.0,0,0,0,0,1


In [11]:
train_data, test_data = products.random_split(.8, seed=0)
train_data.shape, test_data.shape

((133448, 16), (33304, 16))

Train a logistic regression classifier (use `turicreate.logistic_classifier.create`) using just the `selected_words`.  

*Hint:  you can use this parameter in the `.create()` call to specify the features used to be exactly the new columns you just created: `features=selected_words`*

Call your new model: `selected_words_model`.

In [12]:
selected_words_model = turicreate.logistic_classifier.create(train_data, target='sentiment', features=selected_words, 
                                                             validation_set=test_data)

You will now examine the weights the learned classifier assigned to each of the 11 words in `selected_words` and gain intuition as to what the ML algorithm did for your data using these features. In Turi Create, a learned model, such as the `selected_words_model`, has a field 'coefficients', which lets you look at the learned coefficients. 

In [13]:
selected_words_model.coefficients

name,index,class,value,stderr
(intercept),,1,1.3365913848877602,0.0089299697876565
awesome,,1,1.1335346660341417,0.0839964398318753
great,,1,0.8630655001196592,0.0189550524443769
fantastic,,1,0.885804756881427,0.1116759129339967
amazing,,1,1.1000933113660258,0.0995477626046598
love,,1,1.359268866922512,0.0280683001520992
horrible,,1,-2.2513352367590955,0.0802024938878843
bad,,1,-0.991477880065059,0.0384842866469906
terrible,,1,-2.223661436085129,0.0773173620378575
awful,,1,-2.0529082040313544,0.1009973543525925


The result has a column called `value`, which contains the weight learned for each feature.  

Using this approach, sort the learned coefficients according to the `value` column using .sort().  
Out of the 11 words in selected_words, which one got the most positive weight?  
Which one got the most negative weight?  
Do these values make sense for you?   
Save these results to answer the quiz at the end.

In [14]:
selected_words_model.coefficients['value'].sort(ascending=False)
# positive to negative sentiment

dtype: float
Rows: 12
[1.359268866922512, 1.3365913848877602, 1.1335346660341417, 1.1000933113660258, 0.885804756881427, 0.8630655001196592, -0.009538236067681735, -0.991477880065059, -1.3484407222463144, -2.0529082040313544, -2.2236614360851292, -2.2513352367590955]
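
Sorting only the `value` column drops the feature names, so the sorted numbers have to be matched back to words by hand. As a rough sketch, the values from the coefficients table printed above can be paired with their names in plain Python and ranked. The printout stops after ten rows, so 'wow' and 'hate' are not listed by name and are left out here; the two unnamed values in the sorted output (about -0.0095 and -1.3484) fall between the extremes, so the top and bottom of the ranking are unaffected. The intercept is excluded because it is not a word.

```python
# Values copied from the printed coefficients table (intercept excluded).
coefficients = {
    'awesome': 1.1335346660341417,
    'great': 0.8630655001196592,
    'fantastic': 0.885804756881427,
    'amazing': 1.1000933113660258,
    'love': 1.359268866922512,
    'horrible': -2.2513352367590955,
    'bad': -0.991477880065059,
    'terrible': -2.223661436085129,
    'awful': -2.0529082040313544,
}

# Rank (name, value) pairs from most positive to most negative weight.
ranked = sorted(coefficients.items(), key=lambda kv: kv[1], reverse=True)
for name, value in ranked:
    print(f"{name:10s} {value:+.4f}")

most_positive = ranked[0][0]
most_negative = ranked[-1][0]
print(most_positive, most_negative)
```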

In [15]:
## sentiment model (from lecture)
sentiment_model = turicreate.logistic_classifier.create(train_data,target='sentiment', features=['word_count'], 
                                                        validation_set=test_data)

In [16]:
## sentiment_model evaluation
sentiment_model.evaluate(test_data)

{'accuracy': 0.9176975738650012,
 'auc': 0.9258242975424673,
 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      0       |        1        |  1397 |
 |      1       |        0        |  1344 |
 |      0       |        0        |  3931 |
 |      1       |        1        | 26632 |
 +--------------+-----------------+-------+
 [4 rows x 3 columns],
 'f1_score': 0.951057941255245,
 'log_loss': 0.33047871872409346,
 'precision': 0.9501587641371436,
 'recall': 0.9519588218472976,
 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 1001
 
 Data:
 +-----------+--------------------+--------------------+-------+------+
 | threshold |        fpr         |        tpr         |   p   |  n   |
 +-----------+--------------------+--------------------+-------+------+
 |    0.0 

### Q3. Out of the 11 words in selected_words, which one got the most positive weight in the selected_words_model? 
(Tip: when printing the list of coefficients, make sure to use `print_rows(num_rows=12)` to print ALL coefficients.)

  - ( ) amazing
  - ( ) awesome
  - **(x) love**
  - ( ) fantastic
  - ( ) terrible


### Q4. Out of the 11 words in selected_words, which one got the most negative weight in the selected_words_model?
(Tip: when printing the list of coefficients, make sure to use `print_rows(num_rows=12)` to print ALL coefficients.)

  - **(x) horrible**
  - ( ) terrible - *Might be this one, if using another library*
  - ( ) awful
  - ( ) hate
  - ( ) love

# 3.  Comparing the accuracy of different sentiment analysis models

What is the accuracy of the `selected_words_model` on the test_data?  
What was the accuracy of the sentiment_model that we learned using all the word counts in the Jupyter Notebook above from the lectures?  
What is the accuracy of the majority class classifier on this task?  

How do you compare the different learned models with the baseline approach where we are just predicting the majority class?
Save these results to answer the quiz at the end.

*Hint: we discussed the majority class classifier in lecture, which simply predicts that every data point is from the most common class.  This is a baseline and something we definitely want to beat with models we learn from data.*  

In [17]:
selected_words_model.evaluate(test_data)

{'accuracy': 0.8463848186404036,
 'auc': 0.6935096220934976,
 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      1       |        0        |  159  |
 |      0       |        0        |  371  |
 |      0       |        1        |  4957 |
 |      1       |        1        | 27817 |
 +--------------+-----------------+-------+
 [4 rows x 3 columns],
 'f1_score': 0.9157860082304526,
 'log_loss': 0.3962265467087378,
 'precision': 0.8487520595594068,
 'recall': 0.9943165570488991,
 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 1001
 
 Data:
 +-----------+--------------------+-----+-------+------+
 | threshold |        fpr         | tpr |   p   |  n   |
 +-----------+--------------------+-----+-------+------+
 |    0.0    |        1.0         | 1.0 | 27976 | 5328 

In [18]:
## majority class 1
pvc = products['sentiment'].value_counts()
pvc

value,count
1,140259
0,26493


In [19]:
## majority class 2
##    for 1                           for 0:
(pvc['count'][0] / products.shape[0], pvc['count'][1] / products.shape[0])

(0.8411233448474381, 0.15887665515256189)
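
The ratio above is computed over the full `products` table, whereas the quiz asks about `test_data`. As a cross-check, the sketch below recomputes all three accuracies on the test set using only the confusion-matrix counts printed by the two `evaluate()` calls above. The helper `accuracy` is a name introduced for this illustration, not a Turi Create function.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct predictions given confusion-matrix counts."""
    return (tp + tn) / (tp + tn + fp + fn)


# sentiment_model (all word counts), counts from its confusion matrix.
sentiment_acc = accuracy(tp=26632, tn=3931, fp=1397, fn=1344)

# selected_words_model, counts from its confusion matrix.
selected_acc = accuracy(tp=27817, tn=371, fp=4957, fn=159)

# Majority class classifier: always predict 1, so every positive review
# is a true positive and every negative review is a false positive.
positives = 27817 + 159
negatives = 371 + 4957
majority_acc = accuracy(tp=positives, tn=0, fp=negatives, fn=0)

print(f"sentiment_model      {sentiment_acc:.4f}")
print(f"selected_words_model {selected_acc:.4f}")
print(f"majority class       {majority_acc:.4f}")
```

On the test set the majority class baseline comes out at roughly 0.840, slightly below the full-table figure but in the same quiz range, and within about 0.006 of the `selected_words_model`.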

### Q5. Which of the following ranges contains the accuracy of the selected_words_model on the test_data?

  - ( ) 0.811 to 0.841
  - **(x) 0.841 to 0.871**
  - ( ) 0.871 to 0.901
  - ( ) 0.901 to 0.931

### Q6. Which of the following ranges contains the accuracy of the sentiment_model in the IPython Notebook from lecture on the test_data?

  - ( ) 0.811 to 0.841
  - ( ) 0.841 to 0.871
  - ( ) 0.871 to 0.901
  - **(x) 0.901 to 0.931**

### Q7.  Which of the following ranges contains the accuracy of the majority class classifier, which simply predicts the majority class on the test_data?

  - **(x) 0.811 to 0.843**
  - ( ) 0.843 to 0.871
  - ( ) 0.871 to 0.901
  - ( ) 0.901 to 0.931
  
### Q8. How do you compare the different learned models with the baseline approach where we are just predicting the majority class?   

  - ( ) They all performed about the same.
  - ( ) The model learned using all words performed much better than the one using only the selected_words.  
     And, the model learned using the selected_words performed much better than just predicting the majority class.  
  - **(x) The model learned using all words performed much better than the other two.  The other two approaches performed about the same.**
  - ( ) Simply predicting the majority class performed much better than the other two models.   

# 4. Interpreting the difference in performance between the models
To understand why the model with all word counts performs better than the one with only the `selected_words`, we will now examine the reviews for a particular product.

  - We will investigate a product named ‘Baby Trend Diaper Champ’.  (This is a trash can for soiled baby diapers, which keeps the smell contained.)
  - Just like we did for the reviews for the giraffe toy in the Jupyter Notebook in the lecture video, before we start our analysis you should select all reviews where the product name is ‘Baby Trend Diaper Champ’.  Let’s call this table `diaper_champ_reviews`.


In [20]:
diaper_champ_reviews = products[products['name'] == 'Baby Trend Diaper Champ']
diaper_champ_reviews.head(5)   ## let's look at the first 5 rows

name,review,rating,word_count,awesome,great,fantastic
Baby Trend Diaper Champ,Ok - newsflash. Diapers are just smelly. We've ...,4.0,"{'convenient': 1.0, 'more': 1.0, 'trash': ...",0.0,0.0,0.0
Baby Trend Diaper Champ,"My husband and I selected the Diaper ""Champ"" ma ...",1.0,"{'system': 1.0, 'try': 1.0, 're': 1.0, 'still': ...",0.0,0.0,0.0
Baby Trend Diaper Champ,Excellent diaper disposal unit. I used it in ...,5.0,"{'nose': 1.0, 'for': 2.0, 'investment': 1.0, ...",0.0,0.0,0.0
Baby Trend Diaper Champ,We love our diaper champ. It is very easy to use ...,5.0,"{'out': 1.0, 'pull': 1.0, 'open': 1.0, 'pail': ...",0.0,0.0,0.0
Baby Trend Diaper Champ,Two girlfriends and two family members put me ...,5.0,"{'winter': 1.0, 'outside': 1.0, 'day': ...",0.0,0.0,0.0

amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment
0.0,0.0,0,0.0,0,0,0,0,1
0.0,0.0,0,0.0,0,0,0,0,0
0.0,0.0,0,0.0,0,0,0,0,1
0.0,1.0,0,0.0,0,0,0,0,1
1.0,0.0,1,0.0,0,0,1,0,1


  - Again, just as in the video, use the sentiment_model to predict the sentiment of each review in diaper_champ_reviews and sort the results according to their ‘predicted_sentiment’.

In [21]:
## Apply the sentiment classifier to better understand the 'Baby Trend Diaper Champ' reviews
products['predicted_sentiment'] = sentiment_model.predict(products, output_type='probability')

diaper_champ_reviews = products[products['name'] == 'Baby Trend Diaper Champ']
diaper_champ_reviews.head(5)

name,review,rating,word_count,awesome,great,fantastic
Baby Trend Diaper Champ,Ok - newsflash. Diapers are just smelly. We've ...,4.0,"{'convenient': 1.0, 'more': 1.0, 'trash': ...",0.0,0.0,0.0
Baby Trend Diaper Champ,"My husband and I selected the Diaper ""Champ"" ma ...",1.0,"{'system': 1.0, 'try': 1.0, 're': 1.0, 'still': ...",0.0,0.0,0.0
Baby Trend Diaper Champ,Excellent diaper disposal unit. I used it in ...,5.0,"{'nose': 1.0, 'for': 2.0, 'investment': 1.0, ...",0.0,0.0,0.0
Baby Trend Diaper Champ,We love our diaper champ. It is very easy to use ...,5.0,"{'out': 1.0, 'pull': 1.0, 'open': 1.0, 'pail': ...",0.0,0.0,0.0
Baby Trend Diaper Champ,Two girlfriends and two family members put me ...,5.0,"{'winter': 1.0, 'outside': 1.0, 'day': ...",0.0,0.0,0.0

amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment,predicted_sentiment
0.0,0.0,0,0.0,0,0,0,0,1,0.9950122935570288
0.0,0.0,0,0.0,0,0,0,0,0,5.901414532927823e-13
0.0,0.0,0,0.0,0,0,0,0,1,0.9999996193960344
0.0,1.0,0,0.0,0,0,0,0,1,0.9999567191544853
1.0,0.0,1,0.0,0,0,1,0,1,0.9999997013199228


In [22]:
# Sort the Diaper Champ reviews by predicted sentiment, most positive first
diaper_champ_reviews = diaper_champ_reviews.sort('predicted_sentiment', ascending=False)
diaper_champ_reviews

name,review,rating,word_count,awesome,great,fantastic
Baby Trend Diaper Champ,I read a review below that can explain exactly ...,4.0,"{'key': 1.0, 'have': 1.0, 'pieces': 1.0, 'betwe ...",0.0,0.0,0.0
Baby Trend Diaper Champ,I have never written a review for Amazon but I ...,5.0,"{'priceless': 1.0, 'knows': 1.0, 'parent': ...",0.0,0.0,0.0
Baby Trend Diaper Champ,I originally put this item on my baby registry ...,5.0,"{'price': 1.0, 'suggestions': 1.0, ...",0.0,0.0,0.0
Baby Trend Diaper Champ,Baby Luke can turn a clean diaper to a dirty ...,5.0,"{'around': 1.0, 'any': 1.0, 't': 1.0, 'isn': ...",0.0,1.0,0.0
Baby Trend Diaper Champ,Diaper Champ or Diaper Genie? That was my ...,5.0,"{'either': 1.0, 'be': 1.0, 't': 1.0, 'not': ...",0.0,1.0,0.0
Baby Trend Diaper Champ,I am one of those super- critical shoppers who ...,5.0,"{'hope': 1.0, 'make': 1.0, 'slower': 1.0, ...",0.0,0.0,0.0
Baby Trend Diaper Champ,I LOOOVE this diaper pail! Its the easies ...,5.0,"{'buy': 1.0, 'product': 1.0, 'recommend': 1.0, ...",0.0,0.0,0.0
Baby Trend Diaper Champ,"As a first time mother, I wanted to get the best ...",5.0,"{'ll': 1.0, 'baby': 1.0, 'recommended': 1.0, ' ...",0.0,0.0,0.0
Baby Trend Diaper Champ,I see that there are complaints of stinkiness ...,5.0,"{'very': 1.0, 'told': 1.0, 'all': 1.0, ...",0.0,0.0,0.0
Baby Trend Diaper Champ,I have a 10 year old daughter and an 8 month ...,5.0,"{'sorry': 1.0, 'be': 1.0, 'you': 2.0, 'sell': 1.0, ...",0.0,0.0,0.0

amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment,predicted_sentiment
0.0,0.0,0,0.0,0,0,0,0,1,0.999999999989594
0.0,1.0,0,0.0,0,0,0,0,1,0.9999999999868132
0.0,0.0,0,0.0,0,0,0,0,1,0.9999999999465672
0.0,0.0,0,0.0,0,0,0,0,1,0.9999999999302822
0.0,0.0,0,0.0,0,0,0,0,1,0.9999999999174132
0.0,1.0,0,0.0,0,0,0,0,1,0.9999999998430964
0.0,1.0,0,0.0,0,0,0,0,1,0.9999999997360196
0.0,1.0,0,0.0,0,0,0,0,1,0.9999999995664316
0.0,0.0,0,0.0,0,0,0,0,1,0.9999999985015902
0.0,2.0,0,0.0,0,0,0,0,1,0.999999998056851


  - What is the ‘predicted_sentiment’ for the most positive review for ‘Baby Trend Diaper Champ’ according to the sentiment_model from the Jupyter Notebook from lecture?  
  
    Save this result to answer the quiz at the end.

### Q9. Which of the following ranges contains the ‘predicted_sentiment’ for the most positive review for ‘Baby Trend Diaper Champ’, according to the sentiment_model from the IPython Notebook from lecture?

  - ( ) Below 0.7
  - ( ) 0.7 to 0.8
  - ( ) 0.8 to 0.9
  - **(x) 0.9 to 1.0**

  - Now use the selected_words_model you learned using just the selected_words to predict the sentiment of the most positive review you found above. 
  
*Hint: if you sorted the diaper_champ_reviews in descending order (from most positive to most negative), this command will be helpful to make the prediction you need:*

```python
selected_words_model.predict(diaper_champ_reviews[0:1], output_type='probability')
```

In [23]:
selected_words_model.predict(diaper_champ_reviews[0:1], output_type='probability')

dtype: float
Rows: 1
[0.7919288370624461]

### Q10. Consider the most positive review for ‘Baby Trend Diaper Champ’ according to the sentiment_model from the IPython Notebook from lecture. Which of the following ranges contains the predicted_sentiment for this review, if we use the selected_words_model to analyze it?

  - ( ) Below 0.7
  - **(x)  0.7 to 0.8**
  - ( ) 0.8 to 0.9
  - ( ) 0.9 to 1.0

  - Why is the predicted_sentiment for the most positive review found using the model with all word counts (sentiment_model) much more positive than the one using only the selected_words (selected_words_model)?  

*Hint: examine the text of this review, the extracted word counts for all words, and the word counts for each of the selected_words, and you will see what each model used to make its prediction*

In [24]:
diaper_champ_reviews['review'][0]

"I read a review below that can explain exactly what we experienced. We've had it for 16 months and it has worked wonderful for us. No smells, change it out once a week, easy to clean. Then a diaper snagged this foam material in the head part, so I pulled the rest of the foam out. Big mistake!!! Now it can no loner retain the stinkiness and we're looking for a replacement. Be careful of overloading and never take out that foam piece that is cushioned between pieces. I have figured out that it is key to keeping the stink out."

In [25]:
diaper_champ_reviews['word_count'][0]

{'key': 1.0,
 'have': 1.0,
 'pieces': 1.0,
 'between': 1.0,
 'cushioned': 1.0,
 'piece': 1.0,
 'take': 1.0,
 'overloading': 1.0,
 'be': 1.0,
 'looking': 1.0,
 're': 1.0,
 'stinkiness': 1.0,
 'retain': 1.0,
 'now': 1.0,
 'wonderful': 1.0,
 'worked': 1.0,
 '16': 1.0,
 'and': 3.0,
 'months': 1.0,
 've': 1.0,
 'in': 1.0,
 'us': 1.0,
 'i': 3.0,
 'experienced': 1.0,
 'read': 1.0,
 'easy': 1.0,
 'for': 3.0,
 'to': 2.0,
 'has': 1.0,
 'review': 1.0,
 'keeping': 1.0,
 'replacement': 1.0,
 'out': 5.0,
 'loner': 1.0,
 'clean': 1.0,
 'mistake': 1.0,
 'big': 1.0,
 'pulled': 1.0,
 'it': 5.0,
 'this': 1.0,
 'is': 2.0,
 'explain': 1.0,
 'material': 1.0,
 'exactly': 1.0,
 'a': 4.0,
 'we': 3.0,
 'that': 4.0,
 'had': 1.0,
 'what': 1.0,
 'part': 1.0,
 'no': 2.0,
 'smells': 1.0,
 'can': 2.0,
 'change': 1.0,
 'figured': 1.0,
 'week': 1.0,
 'then': 1.0,
 'snagged': 1.0,
 'diaper': 1.0,
 'careful': 1.0,
 'the': 5.0,
 'never': 1.0,
 'foam': 3.0,
 'head': 1.0,
 'so': 1.0,
 'below': 1.0,
 'rest': 1.0,
 'stink': 1

In [29]:
assert list(filter(lambda t: t[0] in selected_words, diaper_champ_reviews['word_count'][0].items())) == []
# t tuple = (k, v) 

In [None]:
for k, v in diaper_champ_reviews['word_count'][0].items():
  if k in selected_words:
    print(f"({k}, {v})")
    
# None!

###  Q11. Why is the value of the predicted_sentiment for the most positive review found using the sentiment_model much more positive than the value predicted using the selected_words_model?

  - ( ) The sentiment_model is just too positive about everything.
  - ( ) The selected_words_model is just too negative about everything.
  - ( ) This review was positive, but used too many of the negative words in selected_words.
  - **(x) None of the selected_words appeared in the text of this review.**
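
The number 0.7919 reported by the selected_words_model can be reproduced by hand. Assuming the model uses the standard logistic link (probability = sigmoid of intercept plus weighted feature sum), a review in which every selected word has count 0 is scored by the intercept alone. The sketch below applies that assumption to the `(intercept)` value from the coefficients table; `sigmoid` is a helper defined here, not part of Turi Create.

```python
import math


def sigmoid(score: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


# '(intercept)' row of selected_words_model.coefficients.
intercept = 1.3365913848877602

# None of the selected words occur in the review, so every
# weight * count term is 0 and the score reduces to the intercept.
weighted_sum = 0.0
predicted = sigmoid(intercept + weighted_sum)
print(round(predicted, 4))
```

Under that assumption the result agrees with the model output of about 0.7919: with no evidence from its eleven features, the selected_words_model falls back to its prior-like intercept, while the sentiment_model can use words such as 'wonderful' and 'easy'.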