# Collaborative filtering tutorial

Training a recommender system with fastai, using the [MovieLens dataset](https://doi.org/10.1145/2827872).

In [1]:
!pip install -Uqq fastai

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m363.4/363.4 MB[0m [31m3.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.8/13.8 MB[0m [31m22.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m24.6/24.6 MB[0m [31m20.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m883.7/883.7 kB[0m [31m15.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m664.8/664.8 MB[0m [31m2.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m211.5/211.5 MB[0m [31m5.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m56.3/56.3 MB[0m [31m11.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m127.9/127.9 MB[0m [31m6.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

In [2]:
from fastai.tabular.all import *   # collaborative filtering needs the tabular tools (and pandas as pd), not vision
from fastai.collab import *

Download and decompress the dataset:

In [5]:
path = untar_data(URLs.ML_100k)
path.ls()

(#23) [Path('/root/.fastai/data/ml-100k/u3.test'),Path('/root/.fastai/data/ml-100k/u4.base'),Path('/root/.fastai/data/ml-100k/u2.test'),Path('/root/.fastai/data/ml-100k/allbut.pl'),Path('/root/.fastai/data/ml-100k/u1.test'),Path('/root/.fastai/data/ml-100k/u.info'),Path('/root/.fastai/data/ml-100k/u4.test'),Path('/root/.fastai/data/ml-100k/u2.base'),Path('/root/.fastai/data/ml-100k/ub.base'),Path('/root/.fastai/data/ml-100k/u.item'),Path('/root/.fastai/data/ml-100k/ub.test'),Path('/root/.fastai/data/ml-100k/mku.sh'),Path('/root/.fastai/data/ml-100k/u3.base'),Path('/root/.fastai/data/ml-100k/u1.base'),Path('/root/.fastai/data/ml-100k/u.data'),Path('/root/.fastai/data/ml-100k/u5.test'),Path('/root/.fastai/data/ml-100k/u.genre'),Path('/root/.fastai/data/ml-100k/README'),Path('/root/.fastai/data/ml-100k/ua.test'),Path('/root/.fastai/data/ml-100k/u.user')...]

* The main table is in u.data, a tab-separated file with no header row.
* Since it’s not a proper csv, we have to specify a few things when opening it: the tab delimiter, the columns we want to keep, and their names.

In [6]:
ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None, usecols=(0,1,2), names=['user','movie','rating'])  # the fourth column (timestamp) is not needed
ratings.head()

|   | user | movie | rating |
|---|------|-------|--------|
| 0 | 196  | 242   | 3      |
| 1 | 186  | 302   | 3      |
| 2 | 22   | 377   | 1      |
| 3 | 244  | 51    | 2      |
| 4 | 166  | 346   | 1      |


Raw movie ids are hard to interpret, so we load the mapping from movie id to title, which is stored in the table u.item:

In [8]:
movies = pd.read_csv(path/'u.item',  delimiter='|', encoding='latin-1', usecols=(0,1), names=('movie','title'), header=None)
movies.head()

|   | movie | title             |
|---|-------|-------------------|
| 0 | 1     | Toy Story (1995)  |
| 1 | 2     | GoldenEye (1995)  |
| 2 | 3     | Four Rooms (1995) |
| 3 | 4     | Get Shorty (1995) |
| 4 | 5     | Copycat (1995)    |


Next, we merge it into our ratings table. With no key given, merge joins on the column the two tables share, here movie:

In [9]:
ratings = ratings.merge(movies)
ratings.head()

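|   | user | movie | rating | title                      |
|---|------|-------|--------|----------------------------|
| 0 | 196  | 242   | 3      | Kolya (1996)               |
| 1 | 186  | 302   | 3      | L.A. Confidential (1997)   |
| 2 | 22   | 377   | 1      | Heavyweights (1994)        |
| 3 | 244  | 51    | 2      | Legends of the Fall (1994) |
| 4 | 166  | 346   | 1      | Jackie Brown (1997)        |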
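
To make the join concrete, here is a minimal plain-Python sketch of what the merge does for the first three rows shown above: each rating row picks up the title whose movie id matches. It only illustrates the idea and is not how pandas implements it; the helper name attach_titles is made up for this example.

```python
# Illustrative only: a dictionary lookup that mimics ratings.merge(movies)
# for a few rows copied from the outputs above.
ratings_rows = [
    {"user": 196, "movie": 242, "rating": 3},
    {"user": 186, "movie": 302, "rating": 3},
    {"user": 22, "movie": 377, "rating": 1},
]
titles_by_movie = {
    242: "Kolya (1996)",
    302: "L.A. Confidential (1997)",
    377: "Heavyweights (1994)",
}

def attach_titles(rows, titles):
    """Return new rows with a 'title' field added.

    Rows whose movie id has no title are dropped, which matches the
    behaviour of an inner join (the pandas default for merge).
    """
    return [{**row, "title": titles[row["movie"]]}
            for row in rows if row["movie"] in titles]

merged = attach_titles(ratings_rows, titles_by_movie)
for row in merged:
    print(row)
```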


* We can then build a DataLoaders object from this table.
* By default, it takes the first column for the user, the second column for the item (here our movies) and the third column for the rating.
* In our case, we need to set item_name so that the titles are used instead of the ids:

In [15]:
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)  # without item_name='title', the second column (movie) would be used as the item

In all applications, when the data has been assembled in a DataLoaders, you can have a look at it with the show_batch method:

In [16]:
dls.show_batch()

|   | user | title                         | rating |
|---|------|-------------------------------|--------|
| 0 | 416  | SubUrbia (1997)               | 3      |
| 1 | 153  | Magnificent Seven, The (1954) | 3      |
| 2 | 782  | Night Flier (1997)            | 3      |
| 3 | 422  | Marvin's Room (1996)          | 3      |
| 4 | 595  | Turbulence (1997)             | 2      |
| 5 | 82   | Singin' in the Rain (1952)    | 3      |
| 6 | 83   | Junior (1994)                 | 4      |
| 7 | 174  | Forrest Gump (1994)           | 5      |
| 8 | 477  | Circle of Friends (1995)      | 4      |
| 9 | 194  | Terminator, The (1984)        | 3      |


fastai can create and train a collaborative filtering model by using collab_learner:

In [17]:
learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))
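
It helps to see what kind of model this call sets up. As far as we understand fastai's default (no neural network requested), the prediction for a user/movie pair is the dot product of two learned vectors of length n_factors, plus one bias per user and one per movie, squashed by a sigmoid into y_range. The sketch below is plain Python with made-up numbers and a hypothetical function name, predict_rating; it is not fastai's code.

```python
import math

def predict_rating(user_factors, movie_factors, user_bias, movie_bias, y_range=(0, 5.5)):
    """Dot product of the factor vectors plus both biases, mapped into y_range by a sigmoid."""
    raw = sum(u * m for u, m in zip(user_factors, movie_factors)) + user_bias + movie_bias
    low, high = y_range
    return low + (high - low) / (1 + math.exp(-raw))

# Made-up factors (3 instead of 50, to keep the example readable).
user = [0.8, -0.2, 0.5]
movie = [1.0, 0.1, 0.4]
print(predict_rating(user, movie, user_bias=0.1, movie_bias=0.3))

# With all factors and biases at zero the sigmoid returns 0.5,
# so the prediction is the midpoint of y_range: 2.75.
print(predict_rating([0.0] * 3, [0.0] * 3, 0.0, 0.0))
```

A sigmoid only approaches its limits asymptotically, which is a common reason for setting the upper bound slightly above the top rating (5.5 rather than 5): it lets the model get close to a prediction of 5.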

In [18]:
learn.fit_one_cycle(5, 5e-3, wd=0.1)

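| epoch | train_loss | valid_loss | time  |
|-------|------------|------------|-------|
| 0     | 0.885069   | 0.956298   | 00:12 |
| 1     | 0.683007   | 0.900681   | 00:11 |
| 2     | 0.52224    | 0.873756   | 00:10 |
| 3     | 0.449842   | 0.860899   | 00:10 |
| 4     | 0.431936   | 0.856428   | 00:10 |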
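
The validation loss is easier to read as an error in rating points. Assuming the loss here is mean squared error (which we believe is what collab_learner uses when no loss function is passed), its square root is the RMSE. A small sketch using the valid_loss values from the table above:

```python
import math

# valid_loss values copied from the training output above
valid_losses = [0.956298, 0.900681, 0.873756, 0.860899, 0.856428]

# Assumption: the loss is mean squared error, so its square root
# is the root mean squared error in rating units.
rmse_per_epoch = [math.sqrt(loss) for loss in valid_losses]
for epoch, rmse in enumerate(rmse_per_epoch):
    print(f"epoch {epoch}: RMSE ~ {rmse:.3f}")
```

Under that assumption, the last epoch corresponds to an RMSE of roughly 0.93 on the 1 to 5 rating scale.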
