### Storing Your Model

You now have everything you need to build a model and put it into production. Let's put that knowledge into practice with some code!

The cell below replicates the code you put together in the previous notebook. This code:

1. Creates **train** and **test** datasets
2. Creates three models: 
 * **model_factorization**
 * **model_popular**
 * **model_itemsim**

Run the cell below to get started with these three models.

In [None]:
# run this cell to read in the libraries and data needed
import numpy as np
import pandas as pd
import turicreate as tc
import solution_part3 as sp

ratings_dat = pd.read_csv('../../data/ratings.dat', sep='::', engine='python',
                          header=None, names=['user_id', 'movie_id', 'rating', 'time'])

ratings_dat2 = ratings_dat.copy(deep=True)
ratings_dat2.columns = ['user_id', 'item_id', 'rating', 'time']
ratings_sframe = tc.SFrame(ratings_dat2[['user_id', 'item_id', 'rating']])

train, test = tc.recommender.util.random_split_by_user(ratings_sframe, 
                                                       user_id = 'user_id',
                                                       item_id = 'item_id',
                                                       max_num_users=None)

# creating your three models of interest
model_factorization = tc.factorization_recommender.create(train, target='rating')
model_popular = tc.popularity_recommender.create(train, target='rating')
model_itemsim = tc.item_similarity_recommender.create(train, target='rating',  similarity_type='cosine')

Because the models are fit to the `rating` column, the metric reported is `RMSE`. Use the [`evaluate_rmse`](https://apple.github.io/turicreate/docs/api/generated/turicreate.recommender.factorization_recommender.FactorizationRecommender.evaluate_rmse.html?highlight=evaluate_rmse#turicreate.recommender.factorization_recommender.FactorizationRecommender.evaluate_rmse) method of each of the three models above to compare how well each model performs on the `train` data.
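As a reminder of what `RMSE` measures, here is a minimal sketch that computes it by hand with NumPy. The helper name `rmse` and the ratings are made up for illustration; this is not how Turi Create computes the metric internally, only the same formula applied to a toy example.

```python
import numpy as np

def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length rating sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # square the errors, average them, then take the square root
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# hypothetical ratings, invented for this example only
actual_ratings = [4, 5, 3, 2]
close_predictions = [4, 4, 3, 2]   # off by one on a single rating
far_predictions = [1, 2, 5, 5]     # off by two or three on every rating

rmse_close = rmse(actual_ratings, close_predictions)
rmse_far = rmse(actual_ratings, far_predictions)
print(rmse_close, rmse_far)
```

The value is in the same units as the ratings, so it can be read as a typical size of the prediction error.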

Then answer the following question regarding your results.

In [None]:
# model 1


In [None]:
# model 2


In [None]:
# model 3


**Question 1:** Based on the results, which of the following are True?  

**Add all of the true statements to the `your_answer` list.**

In [None]:
a = "the HIGHER the rmse, the BETTER the recommender"
b = "using the train results, the best model is the popularity model"
c = "using the train results, the best model is the item similarity model"
d = "using the train results, the best model is the matrix factorization model"
e = "the recommender that works best for the training data is the one we should use in the real world"


your_answer = #[a, b, c, d, e]

sp.answer_one(your_answer)

Now that you have looked at how well each model fits the `train`ing data, `evaluate` how well each model works on the `test` data.  Use your results to answer the following question.

In [None]:
# here is one example, look at the others
model_factorization.evaluate_rmse(test, target='rating')

In [None]:
# model 2


In [None]:
# model 3


**Question 2:** Based on the results, which of the following are True?  

**Add all of the true statements to the `your_answer` list.**

In [None]:
a = "using the test results, the best model is the popularity model"
b = "using the test results, the best model is the item similarity model"
c = "using the test results, the best model is the matrix factorization model"
d = "the recommender that works best for the test data is the one we should use in the real world"

your_answer = #[a, b, c, d]

sp.answer_two(your_answer)

Consider a situation in which you only know whether an individual watched a movie or not, but you don't know the rating.  Below, a new `ratings_sframe` is created without the `rating` column.  The training and testing data are again created for you.

In [None]:
ratings_sframe = tc.SFrame(ratings_dat2[['user_id', 'item_id']])

train, test = tc.recommender.util.random_split_by_user(ratings_sframe, 
                                                       user_id = 'user_id',
                                                       item_id = 'item_id',
                                                       max_num_users=None)

Use the space below to **create** the same three kinds of models as above using the **train** data, but instead of using the ratings, use only the user-item interactions.  The three types of models you should create are:

1. `ranking_factorization_recommender` 
2. `popularity_recommender`
3. `item_similarity_recommender`

**Notice:** the `ranking_factorization_recommender` is needed when you have classification data (watched or not watched), whereas the `factorization_recommender` is used with ratings (regression) data.

In [None]:
# creating your three models of interest


Since only the user-item relationships are being used, not ratings, you will notice `RMSE` is not used.  Instead, you will want to look at metrics associated with classification problems.

You may remember from earlier sections that these metrics include **precision**, **recall**, and **f1-scores**.  Use the [`evaluate`](https://apple.github.io/turicreate/docs/api/generated/turicreate.recommender.factorization_recommender.FactorizationRecommender.evaluate.html) method of each of the three models you just created to compare how well each model performs on the `test` data.

The results for each model are based on a `cutoff` value, the number of top recommendations considered for each user. Depending on which metric you would like to optimize, you can choose a different cutoff.  Notice that increasing the **precision** typically decreases the **recall** (and vice-versa).
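To make the role of the cutoff concrete, here is a minimal sketch for a single user in plain Python. The helper `precision_recall_at_k`, the item ids, and the watched set are all hypothetical; Turi Create computes its own values across users inside `evaluate`, so treat this only as an illustration of the idea.

```python
def precision_recall_at_k(recommended, watched, k):
    """Precision and recall for one user, using the top-k recommendations."""
    top_k = recommended[:k]
    # a hit is a recommended item that the user actually watched
    hits = len(set(top_k) & set(watched))
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(watched) if watched else 0.0
    return precision, recall

# hypothetical ranked recommendations and watch history for one user
recommended = [10, 20, 30, 40, 50]
watched = {20, 50, 70}

results_by_cutoff = {k: precision_recall_at_k(recommended, watched, k) for k in (1, 2, 5)}
for k, (p, r) in results_by_cutoff.items():
    print(k, round(p, 3), round(r, 3))
```

In this toy example, moving the cutoff from 2 to 5 raises recall while precision drops, which is the trade-off described above.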

Use the below slots to take a look at the precision-recall values for each model.

In [None]:
# example 1
results_popular = model_popular.evaluate(test)
results_popular['precision_recall_overall']

In [None]:
# example 2


In [None]:
# example 3


**Question 3:** Write a function that takes in the dataframe from `results['precision_recall_overall']` and adds a column for `f1_score` for each `cutoff`. You may find the [wiki page](https://en.wikipedia.org/wiki/F1_score) helpful.
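As a starting point, here is a sketch of the f1-score for a single precision-recall pair, following the harmonic-mean definition on the wiki page. The helper name `f1_from_pr` is hypothetical, and returning `0.0` when both inputs are zero is an assumption made here to avoid dividing by zero. Applying the same formula to every row of the results is left for your function below.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall for a single pair of values."""
    if precision + recall == 0:
        # assumed convention: define f1 as 0 when there are no hits at all
        return 0.0
    return 2 * precision * recall / (precision + recall)

# hypothetical values for one cutoff
f1_example = f1_from_pr(0.5, 0.25)
print(round(f1_example, 4))
```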

In [None]:
def create_f1score(df):
    '''
    input:
        df: dataframe with cutoff, precision, and recall
    
    return:
        df: dataframe with cutoff, precision, recall, and f1_score
    '''
    # your code here
    
    return df


In [None]:
# try your function out
create_f1score(results_popular['precision_recall_overall'])

In [None]:
# second model


In [None]:
# third model


**Question 4:** Looking at the **f1-score** of each of your models on the test set, are the precision-recall results consistent with your findings using rmse in terms of which modeling technique you should use?

In [None]:
a = "the most popular recommender is the best, which matches what we got from rmse test"
b = "the item similarity recommender is best, which does not match what we got from rmse test"
c = "the factorization recommender is best, which does not match what we got from rmse test"
d = "we can't be sure based on the results which is best"

your_answer = #a

sp.answer_four(your_answer)

**Question 5:** Precision in this case means ...

In [None]:
a = "of all the movies, the proportion you got right as watching or right as not watching"
b = "of the movies we recommended, the proportion they actually watched"
c = "of the movies that they actually watched, the proportion you recommended"
d = "none of the above"

your_answer = #a

sp.answer_five(your_answer)

**Question 6:** Recall in this case means...

In [None]:
a = "of all the movies, the proportion you got right as watching or right as not watching"
b = "of the movies we recommended, the proportion they actually watched"
c = "of the movies that they actually watched, the proportion you recommended"
d = "none of the above"

your_answer = #a

sp.answer_six(your_answer)

Now that you have found the best model based on the test set, it is important that you make sure your model performs well in the real world.  In order to re-use models for new situations, you will want to save them.  Look at the [`save`](https://apple.github.io/turicreate/docs/userguide/recommender/using-trained-models.html) method at the bottom of the page here, and use it to save one of your models.

In [None]:
# put some recommendations here, so you can compare with the loaded model
new_user = tc.SFrame({'user_id': [0]})
model_popular.recommend(new_user, k=3)

In [None]:
pth = './factor_model.model' # save your model here


Now look up the load method (**link**) and use it to load your saved model.

In [None]:
# load the model here and make sure the recommendations match from before
loaded_model = # load your model here
loaded_model.recommend(new_user, k=3) # test it has the same predictions

You may also want to do as you did earlier and store the results in a `json` format to be used by other engineering groups.  You could then imagine updating data files, re-creating your models, and then finally creating new predictions.  This is a process you can find in the extra section to simulate how this might work in the real world.
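One way the `json` hand-off might look is sketched below using only the standard library. The recommendations dictionary, the movie ids in it, and the file name `recommendations.json` are made up for illustration; in practice you would build the dictionary from your model's output.

```python
import json
import os
import tempfile

# hypothetical top-3 recommendations per user id (keys are strings, as JSON requires)
recommendations = {"0": [318, 296, 593], "1": [50, 858, 527]}

# write to a temporary directory so the example leaves your project folder untouched
path = os.path.join(tempfile.mkdtemp(), "recommendations.json")
with open(path, "w") as f:
    json.dump(recommendations, f, indent=2)

# another team could then read the file back without needing the model itself
with open(path) as f:
    loaded = json.load(f)

print(loaded == recommendations)
```

Because the file holds only plain ids, it can be consumed by systems that do not have Turi Create installed.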

In [None]:
# run this for good measure
sp.end_value()