# Homework

In [5]:
import graphlab
import matplotlib.pyplot as plt
%matplotlib inline

In [6]:
products = graphlab.SFrame('amazon_baby.gl/')
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
products = products[products['rating'] != 3]
products['sentiment'] = products['rating'] >=4

Now you are ready! We are going to do four tasks in this assignment. There are several results you need to gather along the way to enter into the quiz after this reading.

We used the word counts for all words in the reviews to train the sentiment classifier model. Now, we are going to follow a similar path, but only use this subset of the words:

```python
selected_words = [
    'awesome', 
    'great', 
    'fantastic', 
    'amazing', 
    'love', 
    'horrible', 
    'bad', 
    'terrible', 
    'awful', 
    'wow', 
    'hate'
]
```
Often, ML practitioners will throw out words they consider “unimportant” before training their model. This procedure can often be helpful in terms of accuracy. Here, we are going to throw out all words except for the very few above. Using so few words in our model will hurt our accuracy, but help us interpret what our classifier is doing.

**1. Use .apply() to build a new feature with the counts for each of the selected_words:** In the notebook above, we created a column `word_count` with the word counts for each review. Our first task is to create a new column in the products SFrame with the counts for each selected_word above. In the process, we will see how the method `.apply()` can be used to create new columns in our data (our features) by passing it a Python function, which is an extremely useful concept to grasp!

Our first goal is to create a column `products['awesome']` where each row contains the number of times the word `awesome` showed up in the review for the corresponding product, and 0 if the word didn't show up. One way to do this is to look at the `word_count` column of each row and follow this logic:

* If `awesome` shows up in the word counts for a particular product (row of the products SFrame), then we know how often `awesome` appeared in the review,
* If `awesome` doesn’t appear in the word counts, then it didn’t appear in the review, and we should set the count for `awesome` to 0 in this review.

Repeat this process for the other 10 words in selected_words. (Here, we described a simple procedure to obtain the counts for each selected_word. There are other more efficient ways of doing this, and we encourage you to explore this further.)
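The lookup logic described above can be sketched on plain Python dictionaries, which stand in here for rows of the `word_count` column. The sample dictionaries are made up for illustration and are not taken from the dataset; `dict.get` returns the stored count when the word is present and the default 0 otherwise.

```python
# Hypothetical word-count dictionaries standing in for rows of products['word_count'].
sample_word_counts = [
    {'this': 1, 'is': 1, 'awesome': 2},
    {'i': 1, 'love': 1, 'it': 1},
    {},
]

def count_word(review_word_count, word):
    # Return the stored count if the word is present, otherwise 0.
    return review_word_count.get(word, 0)

# One list per word, playing the role of a new column per selected word.
awesome_counts = [count_word(wc, 'awesome') for wc in sample_word_counts]
love_counts = [count_word(wc, 'love') for wc in sample_word_counts]

# Summing a column gives the total number of occurrences across all reviews.
total_awesome = sum(awesome_counts)
```

In the notebook itself, the same function would be passed to `.apply()` on the `word_count` column rather than used in a list comprehension.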

Using the `.sum()` method on each of the new columns you created, answer the following questions: Out of the `selected_words`, which one is most used in the dataset? Which one is least used? Save these results to answer the quiz at the end.

In [7]:
selected_words = [
    'awesome', 
    'great', 
    'fantastic', 
    'amazing', 
    'love', 
    'horrible', 
    'bad', 
    'terrible', 
    'awful', 
    'wow', 
    'hate'
]

def count_word(review_word_count, word):
    return 0 if word not in review_word_count else review_word_count[word]

total_frequencies = []

for word in selected_words:
    products[word] = products['word_count'].apply(lambda x : count_word(x, word))
    total_frequencies.append((word, products[word].sum()))
    
print total_frequencies
print max(total_frequencies, key=lambda x : x[1])
print min(total_frequencies, key=lambda x : x[1])

[('awesome', 2002), ('great', 42420), ('fantastic', 873), ('amazing', 1305), ('love', 40277), ('horrible', 659), ('bad', 3197), ('terrible', 673), ('awful', 345), ('wow', 131), ('hate', 1057)]
('great', 42420)
('wow', 131)


**2. Create a new sentiment analysis model using only the selected_words as features:** In the IPython Notebook above, we used word counts for all words as features for our sentiment classifier. Now, you are just going to use the `selected_words`.

Train a logistic regression classifier (use `graphlab.logistic_classifier.create`) using just the selected_words. Hint: you can use this parameter in the `.create()` call to specify the features used to be exactly the new columns you just created:

```python
features=selected_words
```

Call your new model: `selected_words_model`.

You will now examine the weights the learned classifier assigned to each of the 11 words in selected_words and gain intuition as to what the ML algorithm did for your data using these features. In GraphLab Create, a learned model, such as the selected_words_model, has a field 'coefficients', which lets you look at the learned coefficients. You can access it by using:

```python
selected_words_model['coefficients']
```

The result has a column called `value`, which contains the weight learned for each feature.

Using this approach, sort the learned coefficients according to the `value` column using `.sort()`. Out of the 11 words in `selected_words`, which one got the most positive weight? Which one got the most negative weight? Do these values make sense to you?

In [8]:
graphlab.canvas.set_target('ipynb')
train_data, test_data = products.random_split(.8, seed=0)

sentiment_model = graphlab.logistic_classifier.create(train_data,
                                                     target='sentiment',
                                                     features=['word_count'],
                                                     validation_set=test_data)

selected_words_model = graphlab.logistic_classifier.create(train_data,
                                                     target='sentiment',
                                                     features=selected_words,
                                                     validation_set=test_data)

In [9]:
coefficients = selected_words_model['coefficients']
sorted_by_value_coefficients = coefficients.sort('value', ascending=False)
sorted_by_value_coefficients.print_rows(num_rows=12)

+-------------+-------+-------+------------------+------------------+
|     name    | index | class |      value       |      stderr      |
+-------------+-------+-------+------------------+------------------+
|     love    |  None |   1   |  1.39989834302   | 0.0287147460124  |
| (intercept) |  None |   1   |  1.36728315229   | 0.00861805467824 |
|   awesome   |  None |   1   |  1.05800888878   |  0.110865296265  |
|   amazing   |  None |   1   |  0.892802422508  |  0.127989503231  |
|  fantastic  |  None |   1   |  0.891303090304  |  0.154532343591  |
|    great    |  None |   1   |  0.883937894898  | 0.0217379527921  |
|     wow     |  None |   1   | -0.0541450123333 |  0.275616449416  |
|     bad     |  None |   1   | -0.985827369929  | 0.0433603009142  |
|     hate    |  None |   1   |  -1.40916406276  | 0.0771983993506  |
|    awful    |  None |   1   |  -1.76469955631  |  0.134679803365  |
|   horrible  |  None |   1   |  -1.99651800559  | 0.0973584169028  |
|   terrible  |  Non

**3. Comparing the accuracy of different sentiment analysis models:** Use the method

```python
.evaluate(test_data)
```
to answer the following questions. What is the accuracy of the `selected_words_model` on the test_data? What was the accuracy of the `sentiment_model` that we learned using all the word counts in the IPython Notebook above from the lectures? What is the accuracy of the majority class classifier on this task? How do the learned models compare with the baseline approach where we are just predicting the majority class?

**Hint:** we discussed the majority class classifier in lecture, which simply predicts that every data point is from the most common class. This baseline is something we definitely want to beat with models we learn from data.
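As a rough sketch of the baseline, the majority class accuracy can be computed from class counts alone. The four counts below are copied from the confusion matrix that `sentiment_model.evaluate(test_data)` prints in this notebook; the variable names are illustrative only.

```python
# Counts copied from the confusion matrix printed by sentiment_model.evaluate(test_data).
true_negative = 4000    # target 0, predicted 0
false_positive = 1328   # target 0, predicted 1
false_negative = 1461   # target 1, predicted 0
true_positive = 26515   # target 1, predicted 1

num_negative = true_negative + false_positive
num_positive = true_positive + false_negative
total = num_negative + num_positive

# A majority class classifier always predicts the more common class,
# so its accuracy is the share of that class in the test data.
majority_accuracy = max(num_positive, num_negative) / float(total)

# Accuracy of the learned model, for comparison with the baseline.
model_accuracy = (true_positive + true_negative) / float(total)
```

With these counts the baseline comes out at about 0.84, so a learned model has to exceed that figure to be worth using.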

In [10]:
sentiment_model.evaluate(test_data)

{'accuracy': 0.916256305548883,
 'auc': 0.9446492867438502,
 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      0       |        1        |  1328 |
 |      0       |        0        |  4000 |
 |      1       |        1        | 26515 |
 |      1       |        0        |  1461 |
 +--------------+-----------------+-------+
 [4 rows x 3 columns],
 'f1_score': 0.9500349343413533,
 'log_loss': 0.26106698432422487,
 'precision': 0.9523039902309378,
 'recall': 0.9477766657134686,
 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 100001
 
 Data:
 +-----------+----------------+----------------+-------+------+
 | threshold |      fpr       |      tpr       |   p   |  n   |
 +-----------+----------------+----------------+-------+------+
 |    0.0    |      1.0       | 

In [11]:
selected_words_model.evaluate(test_data)
selected_words_model.show(view="Evaluation")

**4. Interpreting the difference in performance between the models:** To understand why the model with all word counts performs better than the one with only the selected_words, we will now examine the reviews for a particular product.

* We will investigate a product named `Baby Trend Diaper Champ`. (This is a trash can for soiled baby diapers, which keeps the smell contained.)

* Just like we did for the reviews for the giraffe toy in the IPython Notebook in the lecture video, before we start our analysis you should select all reviews where the product name is `Baby Trend Diaper Champ`. Let’s call this table `diaper_champ_reviews`.

In [12]:
diaper_champ_reviews = products[products['name'] == 'Baby Trend Diaper Champ']

* Again, just as in the video, use the sentiment_model to predict the sentiment of each review in `diaper_champ_reviews` and sort the results according to their `predicted_sentiment`.
* What is the `predicted_sentiment` for the most positive review for `Baby Trend Diaper Champ` according to the sentiment_model from the IPython Notebook from lecture?

In [13]:
diaper_champ_reviews['predicted_sentiment'] = sentiment_model.predict(diaper_champ_reviews, 
                                                                      output_type='probability')
sorted_diaper_champ_reviews = diaper_champ_reviews.sort('predicted_sentiment', ascending=False)
sorted_diaper_champ_reviews.head()

name,review,rating,word_count,sentiment,awesome
Baby Trend Diaper Champ,Baby Luke can turn a clean diaper to a dirty ...,5.0,"{'all': 1, 'less': 1, ""friend's"": 1, '(which': ...",1,0
Baby Trend Diaper Champ,I LOOOVE this diaper pail! Its the easies ...,5.0,"{'just': 1, 'over': 1, 'rweek': 1, 'sooo': 1, ...",1,0
Baby Trend Diaper Champ,We researched all of the different types of di ...,4.0,"{'all': 2, 'just': 4, ""don't"": 2, 'one,': 1, ...",1,0
Baby Trend Diaper Champ,My baby is now 8 months and the can has been ...,5.0,"{""don't"": 1, 'able': 2, 'over': 1, 'soon': 1, ...",1,0
Baby Trend Diaper Champ,"This is absolutely, by far, the best diaper ...",5.0,"{'just': 3, 'money': 1, 'still': 3, 'fine': 1, ...",1,0
Baby Trend Diaper Champ,Diaper Champ or Diaper Genie? That was my ...,5.0,"{'son': 2, 'all': 1, 'bags.': 1, 'son,': 1, ...",1,0
Baby Trend Diaper Champ,Wow! This is fabulous. It was a toss-up between ...,5.0,"{'and': 4, 'this': 3, 'stink': 1, 'garbage' ...",1,0
Baby Trend Diaper Champ,I originally put this item on my baby registry ...,5.0,"{'lysol': 1, 'all': 2, 'bags.': 1, 'feedback': ...",1,0
Baby Trend Diaper Champ,Two girlfriends and two family members put me ...,5.0,"{'just': 1, '-': 3, 'both': 1, 'results': 1, ...",1,0
Baby Trend Diaper Champ,I am one of those super- critical shoppers who ...,5.0,"{'all': 1, 'humid': 1, 'just': 1, 'less': 1, ...",1,0

great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate,predicted_sentiment
0,0,0,0,0,0,0,0,0,0,0.999999937267
0,0,0,1,0,0,0,0,0,0,0.999999917406
0,0,0,0,0,1,0,0,0,0,0.999999899509
2,0,0,0,0,1,0,0,0,0,0.999999836182
0,0,0,2,0,0,0,0,0,0,0.999999824745
0,0,0,0,0,0,0,0,0,0,0.999999759315
0,0,0,0,0,0,0,0,0,0,0.999999692111
0,0,0,0,0,0,0,0,0,0,0.999999642488
0,0,0,0,1,0,0,0,0,0,0.999999604504
0,0,0,1,0,0,0,0,0,0,0.999999486804


* Now use the `selected_words_model` you learned using just the `selected_words` to predict the sentiment of the most positive review you found above.

In [14]:
sentiment_model.predict(sorted_diaper_champ_reviews[0:1], output_type='probability')

dtype: float
Rows: 1
[0.9999999372669541]

In [17]:
selected_words_model.predict(sorted_diaper_champ_reviews[0:1], output_type='probability')

dtype: float
Rows: 1
[0.796940851290673]
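One way to see where this number comes from, assuming the model is a standard logistic regression that applies the sigmoid function to the intercept plus the weighted word counts: the top-ranked review has a count of 0 for every selected word in the table above, so only the intercept contributes. The intercept and weights below are copied from the coefficients table printed earlier. The weight for `terrible` is cut off in that output and is left out here, which does not matter because its count is 0.

```python
import math

def sigmoid(score):
    # Map a real-valued score to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-score))

# Values copied from the coefficients table printed earlier in this notebook.
intercept = 1.36728315229
weights = {
    'love': 1.39989834302,
    'awesome': 1.05800888878,
    'amazing': 0.892802422508,
    'fantastic': 0.891303090304,
    'great': 0.883937894898,
    'wow': -0.0541450123333,
    'bad': -0.985827369929,
    'hate': -1.40916406276,
    'awful': -1.76469955631,
    'horrible': -1.99651800559,
}

# The top-ranked review contains none of the selected words.
review_counts = {word: 0 for word in weights}

score = intercept + sum(weights[w] * review_counts[w] for w in weights)
probability = sigmoid(score)
```

The result is close to 0.797, which matches the prediction above: with no selected words present, the model falls back to the same probability for every such review.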

In [20]:
review = sorted_diaper_champ_reviews[0]['review']
total_frequencies = []

for word in selected_words:
    total_frequencies.append((word, review.count(word)))
    
print review
print
print total_frequencies

Baby Luke can turn a clean diaper to a dirty diaper in 3 seconds flat. The diaper champ turns the smelly diaper into "what diaper smell" in less time than that. I hesitated and wondered what I REALLY needed for the nursery. This is one of the best purchases we made. The champ, the baby bjorn, fluerville diaper bag, and graco pack and play bassinet all vie for the best baby purchase.Great product, easy to use, economical, effective, absolutly fabulous.UpdateI knew that I loved the champ, and useing the diaper genie at a friend's house REALLY reinforced that!! There is no comparison, the chanp is easy and smell free, the genie was difficult to use one handed (which is absolutly vital if you have a little one on a changing pad) and there was a deffinite odor eminating from the genieplus we found that the quick tie garbage bags where the ties are integrated into the bag work really well because there isn't any added bulk around the sealing edge of the champ.

[('awesome', 0), ('great', 0),
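Note that `review.count(word)` counts case-sensitive substrings, which is not the same as looking a word up in the `word_count` dictionary. The sketch below uses a short excerpt of the review above and approximates the tokenization with `lower().split()`. That approximation is an assumption based on the dictionary keys visible in the output (lowercase, punctuation still attached), not a statement about how `count_words` is implemented.

```python
# Excerpt copied from the review printed above.
excerpt = ("all vie for the best baby purchase.Great product, easy to use, "
           "economical, effective, absolutly fabulous.UpdateI knew that I "
           "loved the champ")

# Substring counting: 'love' matches inside 'loved', 'great' misses 'Great'.
substring_love = excerpt.count('love')
substring_great = excerpt.count('great')

# Assumed approximation of the word_count tokens: lowercase, split on whitespace.
tokens = excerpt.lower().split()
token_love = tokens.count('love')
token_great = tokens.count('great')
```

Under this approximation, neither `love` nor `great` appears as a standalone token (`loved` and `purchase.great` do), which is consistent with the zero counts for this review in the table above.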