<a href="https://colab.research.google.com/github/steimel60/ML/blob/main/DeepLearning/MovieRecs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()
from fastbook import *
from fastai.vision.all import *


In [2]:
#Get Movie Rating Data
path = untar_data(URLs.ML_100k)
ratings = pd.read_csv(path/'u.data', delimiter='\t',header=None,names=['user','movie','rating','timestamp'])
ratings.head()

,user,movie,rating,timestamp
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596


In [3]:
movies = pd.read_csv(path/'u.item', delimiter='|', encoding='latin-1', usecols=(0,1), names=('movie','title'), header=None)
movies.head()

,movie,title
0,1,Toy Story (1995)
1,2,GoldenEye (1995)
2,3,Four Rooms (1995)
3,4,Get Shorty (1995)
4,5,Copycat (1995)


In [4]:
#Merge movies and ratings
ratings = ratings.merge(movies)
ratings.head()

,user,movie,rating,timestamp,title
0,196,242,3,881250949,Kolya (1996)
1,63,242,3,875747190,Kolya (1996)
2,226,242,5,883888671,Kolya (1996)
3,154,242,3,879138235,Kolya (1996)
4,306,242,5,876503793,Kolya (1996)


In [6]:
from fastai.collab import *
from fastai.tabular.all import *

#Build Data Loaders
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)
dls.show_batch()

,user,title,rating
0,542,My Left Foot (1989),4
1,422,Event Horizon (1997),3
2,311,"African Queen, The (1951)",4
3,595,Face/Off (1997),4
4,617,Evil Dead II (1987),1
5,158,Jurassic Park (1993),5
6,836,Chasing Amy (1997),3
7,474,Emma (1996),3
8,466,Jackie Chan's First Strike (1996),3
9,554,Scream (1996),3


In [8]:
#Represent users and movies as matrices
n_users = len(dls.classes['user'])
n_movies = len(dls.classes['title'])
n_factors = 5

user_factors = torch.randn(n_users, n_factors)
movie_factors = torch.randn(n_movies, n_factors)

Our model learns a fixed number of latent factors for every user and every movie. The factors are not labeled, but they may end up capturing things like genre or the presence of a popular actor. Taking the dot product of a user's factors and a movie's factors gives a score: it is high when the user's preferences line up with the movie's characteristics, which suggests the user will enjoy the movie.
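As a rough illustration of the dot-product idea, here is a minimal pure-Python sketch. The factor values and their meanings (sci-fi, action, old movie) are invented for this example; the real model learns its factors from the ratings and does not label them.

```python
# Hypothetical 3-factor vectors, made up for illustration only.
# Assumed factor meanings: [sci-fi, action, old-movie]
user = [0.9, 0.8, -0.6]                 # likes sci-fi and action, dislikes old movies
scifi_action_movie = [0.98, 0.9, -0.9]  # a recent sci-fi action film
classic_drama = [-0.99, -0.3, 0.8]      # an old film with little sci-fi or action

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

match = dot(user, scifi_action_movie)   # positive: tastes line up
mismatch = dot(user, classic_drama)     # negative: tastes clash
print(match, mismatch)
```

The `(users * movies).sum(dim=1)` expression in the model computes this same quantity for a whole batch of user/movie pairs at once.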

In [9]:
class DotProduct(Module):
  def __init__(self, n_users, n_movies, n_factors, y_range=(0,5.5)):
    self.user_factors = Embedding(n_users, n_factors)
    self.movie_factors = Embedding(n_movies, n_factors)
    self.y_range = y_range
  
  def forward(self, x):
    #forward runs whenever the model is called on a batch; x[:,0] holds user indices, x[:,1] holds movie indices
    users = self.user_factors(x[:,0])
    movies = self.movie_factors(x[:,1])
    return sigmoid_range((users * movies).sum(dim=1), *self.y_range)
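The `y_range` argument deserves a note. A raw dot product can be any real number, while ratings run from 1 to 5, so the model squashes its output with `sigmoid_range`. The sketch below is a pure-Python stand-in for the scalar case (fastai's version works on tensors), written only to show the shape of the mapping.

```python
import math

def sigmoid_range(x, lo, hi):
    """Scalar stand-in: squash x into the open interval (lo, hi)."""
    return 1 / (1 + math.exp(-x)) * (hi - lo) + lo

# Very negative scores approach lo, very positive scores approach hi.
for raw in (-10.0, 0.0, 10.0):
    print(raw, sigmoid_range(raw, 0, 5.5))
```

Because a sigmoid never quite reaches its upper bound, a range that tops out slightly above 5 lets the model actually predict a rating of 5, which is presumably why the default here is `(0, 5.5)`.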

In [10]:
model = DotProduct(n_users, n_movies, 50)
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(5, 5e-3)

epoch,train_loss,valid_loss,time
0,1.005721,0.999734,00:06
1,0.885945,0.905953,00:06
2,0.693833,0.876002,00:06
3,0.484503,0.874067,00:06
4,0.369077,0.877741,00:06


This model ignores two things: some users rate everything more positively or negatively than others, and some movies are simply better or worse than others. We can account for both by adding a bias term for each user and each movie.
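To see what a bias term does, consider this minimal pure-Python sketch. The helper `predict` and all the numbers are hypothetical: two users share identical taste factors and differ only in their bias, so any difference in the prediction comes from the bias alone.

```python
import math

def sigmoid_range(x, lo, hi):
    """Scalar stand-in for fastai's sigmoid_range."""
    return 1 / (1 + math.exp(-x)) * (hi - lo) + lo

def predict(user_vec, movie_vec, user_bias, movie_bias, y_range=(0, 5.5)):
    """Dot product of the factors plus both biases, squashed into y_range."""
    dot = sum(u * m for u, m in zip(user_vec, movie_vec))
    return sigmoid_range(dot + user_bias + movie_bias, *y_range)

taste = [0.5, -0.2, 0.1]   # made-up user factors
movie = [0.4, 0.3, -0.1]   # made-up movie factors

generous = predict(taste, movie, user_bias=1.0, movie_bias=0.0)
harsh = predict(taste, movie, user_bias=-1.0, movie_bias=0.0)
print(generous, harsh)     # same taste, but the generous rater scores higher
```

A movie bias works the same way: a positive value lifts the predicted rating for every user, regardless of how well their factors match.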

In [12]:
class DotProductBias(Module):
  def __init__(self, n_users, n_movies, n_factors, y_range=(0,5.5)):
    self.user_factors = Embedding(n_users, n_factors)
    self.user_bias = Embedding(n_users, 1)
    self.movie_factors = Embedding(n_movies, n_factors)
    self.movie_bias = Embedding(n_movies, 1)
    self.y_range = y_range

  def forward(self, x):
    #forward runs whenever the model is called on a batch; x[:,0] holds user indices, x[:,1] holds movie indices
    users = self.user_factors(x[:,0])
    movies = self.movie_factors(x[:,1])
    res = (users * movies).sum(dim=1, keepdim=True)
    res += self.user_bias(x[:,0]) + self.movie_bias(x[:,1])
    return sigmoid_range(res, *self.y_range)

In [13]:
model = DotProductBias(n_users, n_movies, 50)
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(5,5e-3)

epoch,train_loss,valid_loss,time
0,0.928226,0.941926,00:11
1,0.821354,0.864699,00:11
2,0.61655,0.869891,00:07
3,0.410764,0.890642,00:07
4,0.292861,0.897089,00:07


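Compared with the first model, the final training loss improved (0.293 vs. 0.369) but the final validation loss got worse (0.897 vs. 0.878). Validation loss bottomed out at epoch 1 (0.865) and rose from there while training loss kept falling, so the model is overfitting.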
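One simple way to spot this is to find the epoch where validation loss was lowest in each run. The sketch below copies the `valid_loss` columns from the two training logs above; `best_epoch` is a hypothetical helper, not a fastai function.

```python
# Validation losses copied from the two training logs above.
valid_no_bias = [0.999734, 0.905953, 0.876002, 0.874067, 0.877741]
valid_bias = [0.941926, 0.864699, 0.869891, 0.890642, 0.897089]

def best_epoch(losses):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(losses)), key=losses.__getitem__)

for name, losses in [("DotProduct", valid_no_bias), ("DotProductBias", valid_bias)]:
    e = best_epoch(losses)
    print(f"{name}: best epoch {e} ({losses[e]}), final epoch {losses[-1]}")
```

The model with biases reaches its best validation loss much earlier and then drifts upward for three more epochs, which is the overfitting pattern described above.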