<a href="https://colab.research.google.com/github/richardcsuwandi/tc-projects/blob/main/Movie%20Recommender%20using%20Turi%20Create.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Movie Recommender using Turi Create
In this project, we will use Turi Create to build a movie recommender on the MovieLens 100K dataset. [Turi Create](https://apple.github.io/turicreate/docs/api/index.html) is an open-source toolset developed by Apple for creating Core ML models. We will build the recommender with the [recommender systems toolkit](https://apple.github.io/turicreate/docs/userguide/recommender/) that Turi Create provides.

Note: You can find the complete documentation of Turi Create [here](https://apple.github.io/turicreate/docs/api/index.html)

## Installing Turi Create
First, we need to install Turi Create using the following command:

`pip install turicreate`

Note: You can find the list of supported platforms and system requirements [here](https://github.com/apple/turicreate#supported-platforms)

In [None]:
# Install turicreate
!pip install turicreate

After Turi Create is successfully installed, we can import Turi Create using:

`import turicreate as tc`

Note: `tc` here is just an abbreviation for `turicreate`

In [2]:
# Import turicreate
import turicreate as tc

## Loading the data

Next, we can load the MovieLens 100K dataset into SFrames. The dataset contains 100836 ratings across 9742 movies, which were rated by 610 different users. In particular, we will use the `ratings.csv` and `movies.csv` files.

Note: The MovieLens dataset can be downloaded [here](https://grouplens.org/datasets/movielens/)

In [90]:
# Load the ratings and movies data to SFrames
ratings = tc.SFrame.read_csv("https://raw.githubusercontent.com/richardcsuwandi/datasets/master/ratings.csv")
movies = tc.SFrame.read_csv("https://raw.githubusercontent.com/richardcsuwandi/datasets/master/movies.csv")

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,int,float,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,str,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


After successfully loading the data, we can take a look at the data using `SFrame.head()`:

In [91]:
# Display the first 10 rows of the ratings data
ratings.head()

userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
1,6,4.0,964982224
1,47,5.0,964983815
1,50,5.0,964982931
1,70,3.0,964982400
1,101,5.0,964980868
1,110,4.0,964982176
1,151,5.0,964984041
1,157,5.0,964984100


In [92]:
# Display the first 10 rows of the movies data
movies.head()

movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy ...
2,Jumanji (1995),Adventure|Children|Fantasy ...
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995) ...,Comedy
6,Heat (1995),Action|Crime|Thriller
7,Sabrina (1995),Comedy|Romance
8,Tom and Huck (1995),Adventure|Children
9,Sudden Death (1995),Action
10,GoldenEye (1995),Action|Adventure|Thriller


## Preprocessing the data
As we can see above, the ratings SFrame contains the `rating` for each corresponding `userId` and `movieId`. On the other hand, the movies SFrame contains information like the `title` and `genres` for each `movieId`. Let's merge these two SFrames together:

In [106]:
# Merge the ratings and movies SFrames
full_data = ratings.join(movies, on="movieId", how="left")

full_data.head()

userId,movieId,rating,timestamp,title,genres
1,1,4.0,964982703,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy ...
1,3,4.0,964981247,Grumpier Old Men (1995),Comedy|Romance
1,6,4.0,964982224,Heat (1995),Action|Crime|Thriller
1,47,5.0,964983815,Seven (a.k.a. Se7en) (1995) ...,Mystery|Thriller
1,50,5.0,964982931,"Usual Suspects, The (1995) ...",Crime|Mystery|Thriller
1,70,3.0,964982400,From Dusk Till Dawn (1996) ...,Action|Comedy|Horror|Thriller ...
1,101,5.0,964980868,Bottle Rocket (1996),Adventure|Comedy|Crime|Romance ...
1,110,4.0,964982176,Braveheart (1995),Action|Drama|War
1,151,5.0,964984041,Rob Roy (1995),Action|Drama|Romance|War
1,157,5.0,964984100,Canadian Bacon (1995),Comedy|War
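
To make the effect of `how="left"` concrete, here is a minimal plain-Python sketch of a left join on `movieId`. It is not how Turi Create implements `SFrame.join`; the helper `left_join` and the toy rows (including the unmatched `movieId` 999) are hypothetical, and the sketch assumes each `movieId` appears only once on the right-hand side.

```python
# Plain-Python sketch of a left join on "movieId" (illustrative data only).
ratings_rows = [
    {"userId": 1, "movieId": 1, "rating": 4.0},
    {"userId": 1, "movieId": 3, "rating": 4.0},
    {"userId": 2, "movieId": 999, "rating": 2.5},  # hypothetical: no matching movie
]
movies_rows = [
    {"movieId": 1, "title": "Toy Story (1995)"},
    {"movieId": 3, "title": "Grumpier Old Men (1995)"},
]


def left_join(left, right, key):
    """Keep every left row; attach right-hand columns when the key matches."""
    lookup = {row[key]: row for row in right}
    right_columns = [c for c in right[0] if c != key]
    joined = []
    for row in left:
        match = lookup.get(row[key])
        # Unmatched rows are kept, with missing values for the right-hand columns.
        extra = {c: (match[c] if match else None) for c in right_columns}
        joined.append({**row, **extra})
    return joined


joined_rows = left_join(ratings_rows, movies_rows, "movieId")
for row in joined_rows:
    print(row)
```

Every rating survives the join, so the merged data has one row per rating even if a movie's metadata is missing.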


After merging the SFrames, we need to split the data into training and validation sets using `tc.recommender.util.random_split_by_user`:

In [95]:
train_data, val_data = tc.recommender.util.random_split_by_user(full_data, 
                                                                user_id="userId",
                                                                item_id="movieId",
                                                                max_num_users=100,
                                                                item_test_proportion=0.2)

Here, the validation set is built by first choosing up to `max_num_users` users (100 here) from all the users in the dataset. Then, for each chosen user, a portion of that user's items (set by `item_test_proportion`, 0.2 here) is randomly selected for the validation set. We can check the resulting dimensions of the training and validation sets as follows:

In [96]:
# Check the dimensions of the training and validation sets
train_data.shape, val_data.shape

((97976, 5), (2860, 5))
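
The splitting procedure described above can be sketched in plain Python. This is only an illustration of the idea, not Turi Create's implementation: the function `split_by_user`, the per-row random draw, and the toy data are all assumptions made for the example.

```python
import random


def split_by_user(rows, max_num_users, item_test_proportion, seed=0):
    """Hold out a share of the rows belonging to a random sample of users.

    rows: list of (user_id, item_id, rating) tuples.
    Returns (train_rows, validation_rows).
    """
    rng = random.Random(seed)
    users = sorted({user for user, _, _ in rows})
    chosen = set(rng.sample(users, min(max_num_users, len(users))))

    train, validation = [], []
    for row in rows:
        user = row[0]
        # Only rows of the chosen users can land in the validation set.
        if user in chosen and rng.random() < item_test_proportion:
            validation.append(row)
        else:
            train.append(row)
    return train, validation


# Hypothetical toy data: 20 users with 50 ratings each.
toy_rows = [(u, i, 3.0) for u in range(20) for i in range(50)]
train_rows, val_rows = split_by_user(
    toy_rows, max_num_users=5, item_test_proportion=0.2
)
print(len(train_rows), len(val_rows))
```

Because only the sampled users contribute validation rows, the validation set stays small compared with the training set, as in the shapes printed above.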

## Building the model 
Now, we are ready to build our movie recommender model. The easiest way to choose a model is to let Turi Create choose it for us. This is done by calling the default `tc.recommender.create` function, which selects a model based on the data provided to it:

In [103]:
# Build the model
model = tc.recommender.create(train_data,
                              user_id="userId",
                              item_id="title",
                              target="rating")

In [104]:
# Show the model parameters
model

Class                            : RankingFactorizationRecommender

Schema
------
User ID                          : userId
Item ID                          : title
Target                           : rating
Additional observation features  : 2
User side features               : []
Item side features               : []

Statistics
----------
Number of observations           : 97976
Number of users                  : 610
Number of items                  : 9660

Training summary
----------------
Training time                    : 8.9474

Model Parameters
----------------
Model class                      : RankingFactorizationRecommender
num_factors                      : 32
binary_target                    : 0
side_data_factorization          : 1
solver                           : auto
nmf                              : 0
max_iterations                   : 25

Regularization Settings
-----------------------
regularization                   : 0.0
regularization_type              : normal

Using the provided data, Turi Create has chosen a Ranking Factorization Recommender, which learns latent factors for each user and item and uses them to rank recommended items according to the likelihood of observing those (user, item) pairs. This model is a common choice for collaborative filtering on implicit feedback datasets, or on datasets with explicit ratings where ranking prediction is the goal.
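
As a rough illustration of how latent factors produce a ranking, the sketch below scores each item by the dot product of a user vector and an item vector, then sorts by score. It is a simplification under stated assumptions: the functions `predict_score` and `rank_items` and the 2-dimensional factors are made up for the example, and bias terms, side features, and the training procedure are left out entirely.

```python
def predict_score(user_factors, item_factors):
    """Dot product of a user's and an item's latent factor vectors."""
    return sum(u * v for u, v in zip(user_factors, item_factors))


def rank_items(user_factors, item_factor_table, k):
    """Return the k item names with the highest predicted scores."""
    scored = [
        (predict_score(user_factors, factors), name)
        for name, factors in item_factor_table.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical 2-dimensional factors (the model above reports num_factors = 32).
user = [0.9, 0.1]
items = {
    "Heat (1995)": [1.0, 0.0],
    "Sabrina (1995)": [0.0, 1.0],
    "GoldenEye (1995)": [0.8, 0.3],
}
top_items = rank_items(user, items, k=2)
print(top_items)
```

Items whose factor vectors point in a similar direction to the user's vector receive higher scores and therefore appear earlier in the ranking.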

## Evaluating the model
We can evaluate our trained model by calling `model.evaluate` on the validation set:

In [120]:
# Evaluate the model on validation set
model.evaluate(val_data)


Precision and recall summary statistics by cutoff
+--------+---------------------+-----------------------+
| cutoff |    mean_precision   |      mean_recall      |
+--------+---------------------+-----------------------+
|   1    |         0.06        | 0.0028126716953648027 |
|   2    | 0.08000000000000002 |  0.00760916960925404  |
|   3    |         0.09        |  0.011484332611459045 |
|   4    |        0.085        |  0.013928684646944085 |
|   5    |         0.08        |  0.01853489582706831  |
|   6    | 0.07166666666666667 |  0.02049502043766021  |
|   7    | 0.06857142857142856 |  0.023132557020998767 |
|   8    |        0.0675       |  0.025295098401187214 |
|   9    | 0.06111111111111112 |  0.02536312561207156  |
|   10   | 0.05900000000000001 |  0.028217849319736445 |
+--------+---------------------+-----------------------+
[10 rows x 3 columns]


Overall RMSE: 1.1719939366833003

Per User RMSE (best)
+--------+---------------------+-------+
| userId |         rmse        

{'precision_recall_by_user': Columns:
 	userId	int
 	cutoff	int
 	precision	float
 	recall	float
 	count	int
 
 Rows: 1800
 
 Data:
 +--------+--------+-----------+----------------------+-------+
 | userId | cutoff | precision |        recall        | count |
 +--------+--------+-----------+----------------------+-------+
 |   1    |   1    |    0.0    |         0.0          |   44  |
 |   1    |   2    |    0.0    |         0.0          |   44  |
 |   1    |   3    |    0.0    |         0.0          |   44  |
 |   1    |   4    |    0.0    |         0.0          |   44  |
 |   1    |   5    |    0.0    |         0.0          |   44  |
 |   1    |   6    |    0.0    |         0.0          |   44  |
 |   1    |   7    |    0.0    |         0.0          |   44  |
 |   1    |   8    |    0.0    |         0.0          |   44  |
 |   1    |   9    |    0.0    |         0.0          |   44  |
 |   1    |   10   |    0.1    | 0.022727272727272728 |   44  |
 +--------+--------+-----------+----

Two kinds of metrics are used to evaluate the model: RMSE and precision-recall. RMSE measures how closely the model's predicted ratings match the ratings users actually gave, while precision-recall measures how well the items returned by `model.recommend()` match the items users actually chose in the validation set.
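
For reference, these metrics can be computed by hand as in the following sketch. The helper names and the toy data are hypothetical, and this is not Turi Create's own evaluation code. The toy user is built so that 1 of 10 recommendations falls among 44 held-out items, which appears consistent with the `userId` 1, cutoff 10 row shown above.

```python
import math


def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical user: 10 recommendations, 1 of which is among 44 held-out items.
recommended = [f"movie_{i}" for i in range(10)]
relevant = {"movie_9"} | {f"other_{i}" for i in range(43)}
precision, recall = precision_recall_at_k(recommended, relevant, k=10)
print(precision, recall)

# Hypothetical predicted vs. actual ratings for three (user, item) pairs.
error = rmse([4.0, 3.0, 5.0], [5.0, 3.0, 4.0])
print(error)
```

Precision divides the number of hits by the cutoff, while recall divides it by the number of held-out items, which is why recall stays small for users with many validation ratings.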

## Making recommendations

Finally, we can use our trained model to make recommendations. We can use the `model.recommend(k)` function to get the top k movie recommendations for each user. Let's get the top 5 movie recommendations for each user:

In [107]:
# Get the top 5 movie recommendations for each user
results = model.recommend(k=5)

The `model.recommend()` function computes recommendations for all users and returns the result as an SFrame. We can print the first 25 rows of the SFrame using `SFrame.print_rows()`:

In [110]:
# Print the first 25 rows
results.print_rows(25)

+--------+-------------------------------+--------------------+------+
| userId |             title             |       score        | rank |
+--------+-------------------------------+--------------------+------+
|   1    | Lord of the Rings: The Fel... | 5.212723076376367  |  1   |
|   1    | Shawshank Redemption, The ... | 5.177412867578912  |  2   |
|   1    | Lord of the Rings: The Ret... | 5.087076604399133  |  3   |
|   1    |        Magnolia (1999)        | 5.065529013085413  |  4   |
|   1    | Good, the Bad and the Ugly... | 5.018297970327783  |  5   |
|   2    | Harry Potter and the Chamb... | 4.531741276416231  |  1   |
|   2    |  Sleepless in Seattle (1993)  | 4.5262871161431555 |  2   |
|   2    |  Bourne Ultimatum, The (2007) | 4.464553972739029  |  3   |
|   2    | Spanish Apartment, The (L'... | 4.4594226777881865 |  4   |
|   2    | Monty Python's Life of Bri... | 4.395771332117963  |  5   |
|   3    |   Mad Max: Fury Road (2015)   | 4.836166933211732  |  1   |
|   3 

We can also select only a specific subset of the users and get their recommendations:

In [117]:
# Recommend on a subset of users
res = model.recommend(users=[10, 100, 500], k=5)
res.print_rows(15)

+--------+-------------------------------+--------------------+------+
| userId |             title             |       score        | rank |
+--------+-------------------------------+--------------------+------+
|   10   |    Ocean's Thirteen (2007)    | 5.298210129174638  |  1   |
|   10   | Harry Potter and the Chamb... | 5.098013356360841  |  2   |
|   10   |   Look Who's Talking (1989)   | 5.085982665452409  |  3   |
|   10   | Mystery Science Theater 30... | 4.890643760594774  |  4   |
|   10   |      Fugitive, The (1993)     | 4.887112423810411  |  5   |
|  100   | Shawshank Redemption, The ... | 4.6426611394018416 |  1   |
|  100   | How to Lose a Guy in 10 Da... |  4.52993492785399  |  2   |
|  100   | Clear and Present Danger (... | 4.4457693882614855 |  3   |
|  100   |    Bringing Up Baby (1938)    | 4.351187214288164  |  4   |
|  100   |      Fugitive, The (1993)     | 4.275397583875108  |  5   |
|  500   | House of Flying Daggers (S... | 4.726014643940378  |  1   |
|  500

The `model.recommend()` function also handles new users, a situation usually known as the "cold-start" problem. If the model has never seen the user, it defaults to recommending popular items:

In [119]:
# Recommend for new user
model.recommend(users=[1000], k=5)

userId,title,score,rank
1000,"Shawshank Redemption, The (1994) ...",4.160681769284654,1
1000,"Silence of the Lambs, The (1991) ...",4.130045041474748,2
1000,Fight Club (1999),4.127069398793626,3
1000,Star Wars: Episode IV - A New Hope (1977) ...,4.094726905259538,4
1000,Jurassic Park (1993),4.063138410481859,5
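
The fallback idea can be sketched in plain Python as follows. This is an assumption-laden illustration: the function `recommend_with_fallback` is hypothetical, and popularity is measured here by the number of ratings an item received, whereas the rating-like scores in the output above suggest that Turi Create's own notion of popularity may be defined differently.

```python
from collections import Counter


def recommend_with_fallback(user, personalized, ratings, k):
    """Use personalized results for known users, popular items otherwise.

    personalized: dict mapping known user ids to ranked item lists.
    ratings: list of (user_id, item_id, rating) tuples.
    """
    if user in personalized:
        return personalized[user][:k]
    # Cold start: rank items by how many ratings they received (an assumption).
    counts = Counter(item for _, item, _ in ratings)
    return [item for item, _ in counts.most_common(k)]


# Hypothetical toy data with three items of different popularity.
toy_ratings = [
    (1, "A", 5.0), (2, "A", 4.0), (3, "A", 4.5),
    (1, "B", 3.0), (2, "B", 4.0),
    (3, "C", 2.0),
]
toy_personalized = {1: ["C", "B", "A"]}

known_user_recs = recommend_with_fallback(1, toy_personalized, toy_ratings, k=2)
new_user_recs = recommend_with_fallback(1000, toy_personalized, toy_ratings, k=2)
print(known_user_recs, new_user_recs)
```

A new user receives the same list as any other unseen user, since no personal history is available to tailor the ranking.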


## Saving the model
Lastly, we can save the model for later use with `model.save()`:

In [65]:
# Save the model
model.save("movie_recommender.model")

Like other models in Turi Create, the saved model can be loaded back later using `tc.load_model()`.