### Storing Your Model

You now have everything you need to take a model full circle, from creating it to putting it into production. Let's put that knowledge into practice with some code!

The cell below replicates the code you put together in the previous notebook. This code:

1. Creates **train** and **test** datasets
2. Creates three models:
   * **model_factorization**
   * **model_popular**
   * **model_itemsim**

Run the cell below to get started with these three models.

In [40]:
# run this cell to read in the libraries and data needed
import numpy as np
import pandas as pd
import turicreate as tc
import solution_part3 as sp

ratings_dat = pd.read_csv('../../../data/ratings.dat', sep='::', engine='python',
                          header=None, names=['user_id', 'movie_id', 'rating', 'time'])

ratings_dat2 = ratings_dat.copy(deep=True)
ratings_dat2.columns = ['user_id', 'item_id', 'rating', 'time']
ratings_sframe = tc.SFrame(ratings_dat2[['user_id', 'item_id', 'rating']])

train, test = tc.recommender.util.random_split_by_user(ratings_sframe, 
                                                       user_id = 'user_id',
                                                       item_id = 'item_id',
                                                       max_num_users=None)

# creating your three models of interest
model_factorization = tc.factorization_recommender.create(train, target='rating')
model_popular = tc.popularity_recommender.create(train, target='rating')
model_itemsim = tc.item_similarity_recommender.create(train, target='rating',  similarity_type='cosine')

Because the models are trained with `rating` as the target, the metric reported is `RMSE`. Use the [`evaluate_rmse`](https://apple.github.io/turicreate/docs/api/generated/turicreate.recommender.factorization_recommender.FactorizationRecommender.evaluate_rmse.html?highlight=evaluate_rmse#turicreate.recommender.factorization_recommender.FactorizationRecommender.evaluate_rmse) method of each of the three models above to compare how well each one performs on the `train` data.

Then answer the following question regarding your results.
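
For reference, RMSE is the square root of the mean squared difference between the actual and the predicted ratings. The block below is a minimal pure-Python sketch of that calculation; the function name `rmse` and the sample ratings are made up for illustration and are not part of Turi Create.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of ratings."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one rating")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings, illustrative values only
actual_ratings = [8, 6, 9]
predicted_ratings = [7, 6, 10]
print(rmse(actual_ratings, predicted_ratings))
```

Besides the per-user and per-item tables, the dictionary returned by `evaluate_rmse` also contains an `rmse_overall` entry, which is a single number for the whole dataset.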

In [41]:
model_factorization.evaluate_rmse(train, target='rating')

{'rmse_by_user': Columns:
 	user_id	int
 	rmse	float
 	count	int
 
 Rows: 2967
 
 Data:
 +---------+-----------------------+-------+
 | user_id |          rmse         | count |
 +---------+-----------------------+-------+
 |   2871  |  0.03921762630329574  |   2   |
 |   2464  |  0.014711318963978925 |   1   |
 |   232   |   0.4054382384795427  |   6   |
 |   363   |  0.002536371776675317 |   1   |
 |   2444  |   0.6378639603996955  |   2   |
 |   2238  |  0.004103006368662676 |   1   |
 |   431   |  0.002415730113004777 |   1   |
 |   738   |   0.5084983422560355  |   3   |
 |   1860  | 0.0052909381610978905 |   2   |
 |   2661  |   0.808806751305701   |   5   |
 +---------+-----------------------+-------+
 [2967 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
 'rmse_by_item': Columns:
 	item_id	int
 	rmse	float
 	count	int
 
 Rows: 1704
 
 Data:
 +---------+----------------------+---

In [42]:
model_itemsim.evaluate_rmse(train, target='rating')

{'rmse_by_user': Columns:
 	user_id	int
 	rmse	float
 	count	int
 
 Rows: 2967
 
 Data:
 +---------+--------------------+-------+
 | user_id |        rmse        | count |
 +---------+--------------------+-------+
 |   2871  | 8.032106385604056  |   2   |
 |   2464  |        10.0        |   1   |
 |   232   | 5.719191506539028  |   6   |
 |   363   |        8.0         |   1   |
 |   2444  | 7.649102590799475  |   2   |
 |   2238  |        9.0         |   1   |
 |   431   |        8.0         |   1   |
 |   738   | 4.0471527341173426 |   3   |
 |   1860  | 7.341026782989502  |   2   |
 |   2661  | 4.290789104419295  |   5   |
 +---------+--------------------+-------+
 [2967 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
 'rmse_by_item': Columns:
 	item_id	int
 	rmse	float
 	count	int
 
 Rows: 1704
 
 Data:
 +---------+--------------------+-------+
 | item_id |        rmse        | coun

In [43]:
model_popular.evaluate_rmse(train, target='rating')

{'rmse_by_user': Columns:
 	user_id	int
 	rmse	float
 	count	int
 
 Rows: 2967
 
 Data:
 +---------+--------------------+-------+
 | user_id |        rmse        | count |
 +---------+--------------------+-------+
 |   2871  | 0.5892556509887901 |   2   |
 |   2464  | 1.313463514902363  |   1   |
 |   232   | 1.1858326130416144 |   6   |
 |   363   | 0.1403508771929829 |   1   |
 |   2444  | 1.334166406412633  |   2   |
 |   2238  | 1.833333333333333  |   1   |
 |   431   | 0.4226804123711343 |   1   |
 |   738   | 1.1547005383792517 |   3   |
 |   1860  | 0.5700837992665859 |   2   |
 |   2661  | 1.1627553482998907 |   5   |
 +---------+--------------------+-------+
 [2967 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
 'rmse_by_item': Columns:
 	item_id	int
 	rmse	float
 	count	int
 
 Rows: 1704
 
 Data:
 +---------+-------------------+-------+
 | item_id |        rmse       | count 

**Question 1:** Based on the results, which of the following are True?  

**Add all of the True statements to the `your_answer` list.**

In [None]:
a = "the HIGHER the rmse, the BETTER the recommender"
b = "using the train results, the best model is the popularity model"
c = "using the train results, the best model is the item similarity model"
d = "using the train results, the best model is the matrix factorization model"
e = "the recommender that works best for the training data is the one we should use in the real world"


your_answer = []  # add the true statements here, chosen from [a, b, c, d, e]

sp.answer_one(your_answer)

Now that you have looked at how well each model fits the training data, use `evaluate_rmse` to see how well each model works on the `test` data. Use your results to answer the following question.
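
One way to make the train/test comparison easier to read is to pull the single `rmse_overall` number out of each result. The helper below is a sketch and not part of the original notebook: `summarize_rmse` is a hypothetical name, and it assumes each model exposes `evaluate_rmse(dataset, target=...)` returning a dictionary with an `rmse_overall` key, as the Turi Create recommenders created above do.

```python
def summarize_rmse(models, datasets, target="rating"):
    """Return {model_name: {dataset_name: overall RMSE}}.

    models:   dict mapping a label to a fitted recommender
    datasets: dict mapping a label (e.g. 'train', 'test') to a dataset
    """
    summary = {}
    for model_name, model in models.items():
        summary[model_name] = {}
        for data_name, data in datasets.items():
            result = model.evaluate_rmse(data, target=target)
            summary[model_name][data_name] = result["rmse_overall"]
    return summary

# Possible usage in this notebook (assumes the models and splits created above):
# summary = summarize_rmse(
#     {"factorization": model_factorization,
#      "popular": model_popular,
#      "itemsim": model_itemsim},
#     {"train": train, "test": test},
# )
```

Placing the train and test numbers side by side makes it easier to spot a model that fits the training data closely but does much worse on data it has not seen.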

In [44]:
model_factorization.evaluate_rmse(test, target='rating')

{'rmse_by_user': Columns:
 	user_id	int
 	rmse	float
 	count	int
 
 Rows: 1241
 
 Data:
 +---------+---------------------+-------+
 | user_id |         rmse        | count |
 +---------+---------------------+-------+
 |   2043  |  1.2862519011440021 |   1   |
 |   2238  |  0.4372604921998722 |   1   |
 |   738   |  3.311336084695374  |   1   |
 |   2661  |  1.3706072154062143 |   1   |
 |   764   |  2.2641215619547066 |   4   |
 |   926   |  1.3015737519206745 |   1   |
 |   1323  |   2.00418734694102  |   1   |
 |   2501  | 0.22618168736078914 |   1   |
 |   3172  | 0.46571017456443187 |   1   |
 |   1685  |  0.5766523799445831 |   3   |
 +---------+---------------------+-------+
 [1241 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
 'rmse_by_item': Columns:
 	item_id	int
 	rmse	float
 	count	int
 
 Rows: 616
 
 Data:
 +---------+---------------------+-------+
 | item_id |         rms

In [45]:
model_popular.evaluate_rmse(test, target='rating')

{'rmse_by_user': Columns:
 	user_id	int
 	rmse	float
 	count	int
 
 Rows: 1241
 
 Data:
 +---------+--------------------+-------+
 | user_id |        rmse        | count |
 +---------+--------------------+-------+
 |   2043  | 1.313463514902363  |   1   |
 |   2238  | 1.313463514902363  |   1   |
 |   738   | 0.4226804123711343 |   1   |
 |   2661  | 1.3614457831325302 |   1   |
 |   764   | 3.2083460190064526 |   4   |
 |   926   | 1.313463514902363  |   1   |
 |   1323  | 0.686536485097637  |   1   |
 |   2501  |        1.0         |   1   |
 |   3172  | 0.2857142857142865 |   1   |
 |   1685  | 0.9604076436730911 |   3   |
 +---------+--------------------+-------+
 [1241 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
 'rmse_by_item': Columns:
 	item_id	int
 	rmse	float
 	count	int
 
 Rows: 616
 
 Data:
 +---------+--------------------+-------+
 | item_id |        rmse        | count

In [46]:
model_itemsim.evaluate_rmse(test, target='rating')

{'rmse_by_user': Columns:
 	user_id	int
 	rmse	float
 	count	int
 
 Rows: 1241
 
 Data:
 +---------+--------------------+-------+
 | user_id |        rmse        | count |
 +---------+--------------------+-------+
 |   2043  | 9.953682695627213  |   1   |
 |   2238  | 9.314554512500763  |   1   |
 |   738   |        8.0         |   1   |
 |   2661  |        9.0         |   1   |
 |   764   | 5.0990195135927845 |   4   |
 |   926   | 8.680874049663544  |   1   |
 |   1323  | 7.9695993065834045 |   1   |
 |   2501  |        8.0         |   1   |
 |   3172  |        8.0         |   1   |
 |   1685  | 7.636629733420268  |   3   |
 +---------+--------------------+-------+
 [1241 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
 'rmse_by_item': Columns:
 	item_id	int
 	rmse	float
 	count	int
 
 Rows: 616
 
 Data:
 +---------+-------------------+-------+
 | item_id |        rmse       | count |

**Question 2:** Based on the results, which of the following are True?  

**Add all of the True statements to the `your_answer` list.**

In [None]:
a = "using the test results, the best model is the popularity model"
b = "using the test results, the best model is the item similarity model"
c = "using the test results, the best model is the matrix factorization model"
d = "the recommender that works best for the test data is the one we should use in the real world"

your_answer = []  # add the true statements here, chosen from [a, b, c, d]

sp.answer_two(your_answer)

Consider a situation in which you only know whether an individual watched a movie, but you don't know the rating. Below, a new `ratings_sframe` is created without the `rating` column. The training and testing data are again created for you.

In [57]:
ratings_sframe = tc.SFrame(ratings_dat2[['user_id', 'item_id']])

train, test = tc.recommender.util.random_split_by_user(ratings_sframe, 
                                                       user_id = 'user_id',
                                                       item_id = 'item_id',
                                                       max_num_users=None)

Use the space below to **create** the same kinds of models as above using the **train** data, but this time use only the user-item interactions instead of the ratings. The three types of models you should create are:

1. `ranking_factorization_recommender` 
2. `popularity_recommender`
3. `item_similarity_recommender`

**Notice:** `ranking_factorization_recommender` is needed when you only have interaction (classification) data, whereas `factorization_recommender` is used with ratings (regression) data.

In [59]:
# creating your three models of interest
model_factorization = tc.ranking_factorization_recommender.create(train)
model_popular = tc.popularity_recommender.create(train)
model_itemsim = tc.item_similarity_recommender.create(train)

Since only the user-item relationships are being used, not ratings, you will notice `RMSE` is not used.  Instead, you will want to look at metrics associated with classification problems.

You may remember from earlier sections that these metrics include **precision**, **recall**, and **f1-scores**. Use the [`evaluate`](https://apple.github.io/turicreate/docs/api/generated/turicreate.recommender.factorization_recommender.FactorizationRecommender.evaluate.html) method of each of the three models above to compare how well each one performs on the `test` data.

The results for each model are reported per `cutoff` value, the number of top recommendations considered for each user. Depending on which metric you would like to optimize, you can choose a different cutoff. Notice the trade-off: as the cutoff grows, **recall** increases while **precision** generally decreases (and vice versa).
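
To make the cutoff idea concrete, here is a small sketch of precision and recall at a cutoff `k` for a single user, using the common definitions (hits divided by `k`, and hits divided by the number of relevant items). The function name `precision_recall_at_k` and the item ids are hypothetical, and Turi Create's handling of edge cases may differ; the tables it prints report the mean of these values over users.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user.

    recommended: list of item ids, best first
    relevant:    set of item ids the user actually interacted with (test data)
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    # Assumption: a user with no relevant items gets a recall of 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical recommendations and held-out items for a single user
recommended_items = [10, 20, 30, 40, 50]
held_out_items = {20, 50, 60}

for k in (1, 2, 5):
    print(k, precision_recall_at_k(recommended_items, held_out_items, k))
```

Raising `k` can only add hits, so recall never goes down, while precision is divided by a larger `k` and tends to fall.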

Use the cells below to take a look at the precision-recall values for each model.

In [79]:
results = model_popular.evaluate(test)
results['precision_recall_overall']




Precision and recall summary statistics by cutoff
+--------+---------------------+---------------------+
| cutoff |    mean_precision   |     mean_recall     |
+--------+---------------------+---------------------+
|   1    | 0.23932312651087834 |  0.2001007252215956 |
|   2    |  0.1498791297340854 | 0.24677007789417116 |
|   3    | 0.11952726295997863 |  0.2890075208165457 |
|   4    | 0.10072522159548751 | 0.31896320171904374 |
|   5    | 0.08976631748589839 |  0.351625033575074  |
|   6    |   0.08058017727639  |  0.3775449905989792 |
|   7    | 0.07252215954875096 | 0.39604485629868397 |
|   8    | 0.06678082191780821 | 0.41706285253827563 |
|   9    |  0.0616886023815919 |  0.4332460381412838 |
|   10   | 0.05737308622078964 | 0.44324469513832915 |
+--------+---------------------+---------------------+
[10 rows x 3 columns]



cutoff,precision,recall
1,0.2393231265108782,0.2001007252215955
2,0.1498791297340854,0.2467700778941714
3,0.1195272629599784,0.2890075208165456
4,0.1007252215954875,0.3189632017190437
5,0.0897663174858984,0.3516250335750738
6,0.08058017727639,0.3775449905989791
7,0.0725221595487509,0.3960448562986837
8,0.0667808219178082,0.4170628525382756
9,0.0616886023815919,0.4332460381412838
10,0.0573730862207896,0.4432446951383293


In [77]:
results = model_itemsim.evaluate(test)
results['precision_recall_overall']




Precision and recall summary statistics by cutoff
+--------+----------------------+---------------------+
| cutoff |    mean_precision    |     mean_recall     |
+--------+----------------------+---------------------+
|   1    | 0.06929895245769546  | 0.05059092130002687 |
|   2    | 0.06929895245769539  | 0.10585549288208435 |
|   3    | 0.059897931775449914 | 0.13798012355627184 |
|   4    | 0.05539887187751813  | 0.16681439699167333 |
|   5    | 0.05318291700241743  |  0.2002618855761482 |
|   6    | 0.050362610797743734 | 0.22786059629331187 |
|   7    | 0.056866582249338125 |  0.3181372549019607 |
|   8    | 0.05237711522965351  | 0.33418614020950826 |
|   9    | 0.050138776971975996 | 0.36118049959709925 |
|   10   |  0.0472199838839645  | 0.37628928283642216 |
+--------+----------------------+---------------------+
[10 rows x 3 columns]



cutoff,precision,recall
1,0.0692989524576954,0.0505909213000268
2,0.0692989524576953,0.1058554928820843
3,0.0598979317754498,0.1379801235562718
4,0.0553988718775181,0.1668143969916733
5,0.0531829170024174,0.2002618855761483
6,0.0503626107977437,0.2278605962933118
7,0.0568665822493381,0.3181372549019606
8,0.0523771152296535,0.3341861402095084
9,0.0501387769719759,0.3611804995970991
10,0.0472199838839645,0.3762892828364221


In [78]:
results = model_factorization.evaluate(test)
results['precision_recall_overall']




Precision and recall summary statistics by cutoff
+--------+----------------------+---------------------+
| cutoff |    mean_precision    |     mean_recall     |
+--------+----------------------+---------------------+
|   1    | 0.23932312651087834  | 0.20010072522159547 |
|   2    | 0.15028203062046736  | 0.24703867848509264 |
|   3    | 0.11872146118721452  |  0.2867244157937146 |
|   4    | 0.10092667203867851  |  0.3175530486167067 |
|   5    | 0.08928283642224008  |  0.3479317754499061 |
|   6    | 0.07910287402632286  |  0.3687483212463068 |
|   7    | 0.07252215954875105  |  0.3953062046736504 |
|   8    | 0.06668009669621269  | 0.41618990061778116 |
|   9    |  0.0616886023815919  | 0.43176873489121687 |
|   10   | 0.057292506043513304 |  0.4455345151759333 |
+--------+----------------------+---------------------+
[10 rows x 3 columns]



cutoff,precision,recall
1,0.2393231265108783,0.2001007252215954
2,0.1502820306204673,0.2470386784850926
3,0.1187214611872145,0.2867244157937148
4,0.1009266720386784,0.3175530486167069
5,0.08928283642224,0.3479317754499058
6,0.0791028740263228,0.3687483212463068
7,0.072522159548751,0.39530620467365
8,0.0666800966962127,0.4161899006177812
9,0.0616886023815918,0.4317687348912168
10,0.0572925060435132,0.4455345151759333


**Question 3:** Write a function that takes in the dataframe from `results['precision_recall_overall']` and adds a column for `f1_score` for each `cutoff`.

In [83]:
def create_f1score(df):
    '''
    input:
        df: dataframe with cutoff, precision, and recall
    
    return:
        df: dataframe with cutoff, precision, recall, and f1_score
    '''
    num = df['precision']*df['recall']
    den = df['precision']+df['recall']
    df['f1_score'] = 2*(num/den)
    
    return df

create_f1score(results['precision_recall_overall'])



cutoff,precision,recall,f1_score
1,0.2393231265108782,0.2001007252215955,0.2179614556120253
2,0.1498791297340854,0.2467700778941714,0.1864906512247678
3,0.1195272629599784,0.2890075208165456,0.1691130317899542
4,0.1007252215954875,0.3189632017190437,0.153102336825141
5,0.0897663174858984,0.3516250335750738,0.1430208558641632
6,0.08058017727639,0.3775449905989791,0.1328136693007666
7,0.0725221595487509,0.3960448562986837,0.1225951775756541
8,0.0667808219178082,0.4170628525382756,0.1151272675630263
9,0.0616886023815919,0.4332460381412838,0.1079994827279113
10,0.0573730862207896,0.4432446951383293,0.1015957365399153
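
The column-wise expression in `create_f1score` divides by `precision + recall`, which is zero whenever a cutoff has no hits at all. The scalar sketch below shows the same formula with a guard for that case; the name `f1_score` is hypothetical, and returning `0.0` when both inputs are zero is a convention assumed here, not something the notebook specifies.

```python
import math

def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    denominator = precision + recall
    if denominator == 0:
        return 0.0
    return 2 * precision * recall / denominator

# Precision and recall at cutoff 1, copied from the table above
f1_at_1 = f1_score(0.2393231265108782, 0.2001007252215955)
print(f1_at_1)
```

The printed value should agree, up to rounding, with the `f1_score` in the first row of the table.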


