# Training an "Item Similarity Recommender" model on the MovieLens 26M dataset with Turi Create

In this notebook, you'll use Turi Create to train a movie recommendation model in about 1 minute of runtime (on a MacBook Pro w/ TB).

Start by importing `turicreate` (as `tc` for convenience):

In [1]:
import turicreate as tc

Then, read the "ratings.csv" file from the "ml-latest-small" directory. This file contains 4 columns:

- userId: The unique identifier of the user that watched and rated a certain movie.
- movieId: The unique identifier of the movie that the user watched.
- rating: A rating on a scale of 0.5 to 5.
- timestamp: When the user rated the movie.

In [2]:
ratings = tc.SFrame.read_csv("ml-latest-small/ratings.csv")

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,int,float,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


The "timestamp" column of this file is not needed for this code pattern. It could be used to track how a user's interests change over time, but we won't leverage this data point for now. We'll remove that column from the SFrame:

In [3]:
del ratings["timestamp"]
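If the timestamps are kept for a later experiment instead, they first need to be made readable. The sketch below assumes the values are Unix timestamps (seconds since the epoch, in UTC), which matches the `int` type inferred above. The helper name `rating_time` and the sample value are made up for illustration.

```python
from datetime import datetime, timezone

def rating_time(timestamp):
    """Convert a Unix timestamp (seconds since the epoch) to a UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)

# Sample value used only for illustration.
print(rating_time(964982703).isoformat())
```

Sorting or bucketing ratings by the resulting datetimes would be one way to see how a user's interests shift over time.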

We'll then print out the ratings SFrame, to get an idea of what the table looks like:

In [4]:
print(ratings)

+--------+---------+--------+
| userId | movieId | rating |
+--------+---------+--------+
|   1    |    1    |  4.0   |
|   1    |    3    |  4.0   |
|   1    |    6    |  4.0   |
|   1    |    47   |  5.0   |
|   1    |    50   |  5.0   |
|   1    |    70   |  3.0   |
|   1    |   101   |  5.0   |
|   1    |   110   |  4.0   |
|   1    |   151   |  5.0   |
|   1    |   157   |  5.0   |
+--------+---------+--------+
[100836 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.


Next, you'll train the model itself. Turi Create offers a few different kinds of recommender models, but we're going to use the "item similarity recommender" model, which recommends items similar to the ones a user has already interacted with. It's very useful for recommending items to new users when there are only a few previous interaction data points to feed into the model.
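To make the idea of item similarity concrete, here is a minimal pure-Python sketch of how such a recommender might score movies. It is not Turi Create's implementation: the choice of cosine similarity, the toy ratings, and the helper names `item_vectors`, `cosine`, and `recommend` are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt

# Toy (userId, movieId, rating) triples; made-up values for illustration only.
ratings = [
    (1, 10, 5.0), (1, 20, 4.0),
    (2, 10, 4.0), (2, 20, 5.0), (2, 30, 1.0), (2, 40, 5.0),
    (3, 20, 2.0), (3, 30, 5.0), (3, 40, 4.0),
]

def item_vectors(rows):
    """Group ratings by item: {movieId: {userId: rating}}."""
    vectors = defaultdict(dict)
    for user, item, rating in rows:
        vectors[item][user] = rating
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(rows, user, k=1):
    """Rank items the user has not rated by similarity-weighted ratings."""
    vectors = item_vectors(rows)
    seen = {item: r for u, item, r in rows if u == user}
    scores = {}
    for candidate in vectors:
        if candidate in seen:
            continue
        # Sum each seen item's rating, weighted by its similarity to the candidate.
        scores[candidate] = sum(
            cosine(vectors[candidate], vectors[item]) * r
            for item, r in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend(ratings, user=1, k=2))
```

In this toy data, user 1 has rated only two movies, yet both unseen movies still receive a score because other users rated them alongside the movies user 1 liked. That is why this family of models copes reasonably well with users who have little history.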

We'll also have to tell the model which columns to use: the `user_id` parameter names the column that identifies the user ("userId"); `item_id` names the column for the item the user interacted with ("movieId"); and `target` names the column holding the score the user gave the item ("rating").

In [5]:
model = tc.item_similarity_recommender.create(ratings, user_id="userId", item_id="movieId", target="rating")

Finally, we'll save the model to the "movie_rec" folder. Turi Create models are saved as directories rather than as single files.

In [6]:
model.save("movie_rec")