<a href="https://colab.research.google.com/github/niklasgrimm98/Digital-Organization/blob/main/assignment_7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This assignment again consists of a theoretical part (learning portfolio) and a practical part (assignment). The goal is to train a neural network for a recommender system.

In the first week we will discuss what you learned from the theory part, so you are relatively free in how you fill your Learning Portfolio on this topic. In the following week we will discuss your solutions to the practical part.

# Theory part (filling your Learning Portfolio, June 7)

In preparation for the practical part, please familiarize yourself with the following sources over the next week:

1) Please watch the following videos:

https://www.youtube.com/watch?v=Fmtorg_dmM0&ab_channel=ritvikmath (optional, for an overview only)

https://course.fast.ai/Lessons/lesson7.html (the second part of the lesson, starting with collaborative filtering, is mandatory)

Note: The first part of that video mainly covers tips on using neural networks for a Kaggle competition submission. To follow it properly you would need to watch the end of lesson 6 first, but this is not mandatory.

2) Please download the following notebook and edit it in Google Colab. Try to answer a few of the questions asked at the end. Take notes and update your Learning Portfolio.

https://www.kaggle.com/code/jhoward/collaborative-filtering-deep-dive/notebook


# Practical part (Assignment, June 14)

Find any data set that can be used for a recommender system and try to train and validate a neural network on it.

For this purpose, please download a data set from one of the lists below and use it in your program.

https://gist.github.com/entaroadun/1653794

https://github.com/caserec/Datasets-for-Recommender-Systems

https://grouplens.org/datasets/movielens/

https://eigentaste.berkeley.edu/dataset/

In [1]:
#YOUR TASK

import pandas as pd
from google.colab import drive
from google.colab import data_table
import matplotlib.pyplot as plt
data_table.enable_dataframe_formatter()

drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [3]:
%cd './drive/My Drive/Digital Organisations'

/content/drive/My Drive/Digital Organisations


In [5]:
from fastai.collab import *
from fastai.tabular.all import *
set_seed(42)

In [8]:
ratings = pd.read_csv("ratings.csv")

In [10]:
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931




In [11]:
movies = pd.read_csv("movies.csv")

In [12]:
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [13]:
ratings = ratings.merge(movies)
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
0,1,1,4.0,964982703,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,5,1,4.0,847434962,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,7,1,4.5,1106635946,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
3,15,1,2.5,1510577970,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
4,17,1,4.5,1305696483,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy


In [14]:
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)
dls.show_batch()

Unnamed: 0,userId,title,rating
0,547,Brazil (1985),5.0
1,232,"You, Me and Dupree (2006)",3.0
2,328,Slumdog Millionaire (2008),3.0
3,132,Star Wars: Episode I - The Phantom Menace (1999),2.0
4,5,Shadowlands (1993),3.0
5,288,Mimic (1997),2.0
6,477,Scott Pilgrim vs. the World (2010),5.0
7,385,Hoop Dreams (1994),5.0
8,469,Superman II (1980),4.0
9,68,Junior (1994),1.5


In [17]:
n_users  = len(dls.classes['userId'])
n_movies = len(dls.classes['title'])
n_factors = 5

user_factors = torch.randn(n_users, n_factors)
movie_factors = torch.randn(n_movies, n_factors)

In [18]:
one_hot_3 = one_hot(3, n_users).float()


In [19]:
user_factors.t() @ one_hot_3

tensor([ 0.7339, -0.1020,  1.1491,  1.0992, -0.9457])
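Multiplying the transposed factor matrix by a one-hot vector simply selects one row of the matrix, which is why an embedding layer can replace it with a direct index lookup. The sketch below illustrates this in plain Python instead of PyTorch; the 4x3 `user_factors` table and the helper names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy factor table: 4 users, 3 latent factors each.
user_factors = [
    [0.1, 0.2, 0.3],
    [1.0, 1.1, 1.2],
    [2.0, 2.1, 2.2],
    [3.0, 3.1, 3.2],
]


def one_hot(index, size):
    """Return a list of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def transpose_matvec(matrix, vec):
    """Compute matrix.T @ vec for a matrix stored as a list of rows."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [sum(matrix[r][c] * vec[r] for r in range(n_rows)) for c in range(n_cols)]


# The matrix product with a one-hot vector ...
selected = transpose_matvec(user_factors, one_hot(3, len(user_factors)))
# ... gives the same values as indexing the row directly.
looked_up = user_factors[3]
print(selected == looked_up)
```

The lookup avoids building the one-hot vector and the full matrix product, which matters once there are hundreds of users and thousands of movies.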

In [20]:
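class DotProduct(Module):
    def __init__(self, n_users, n_movies, n_factors):
        # One learnable latent-factor vector per user and per movie
        self.user_factors = Embedding(n_users, n_factors)
        self.movie_factors = Embedding(n_movies, n_factors)

    def forward(self, x):
        # x has shape (batch, 2): column 0 is the user index, column 1 the movie index
        users = self.user_factors(x[:,0])
        movies = self.movie_factors(x[:,1])
        # Predicted rating = dot product of the two factor vectors
        return (users * movies).sum(dim=1)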
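The expression `(users * movies).sum(dim=1)` computes one dot product per row of the batch. A minimal plain-Python sketch of the same arithmetic follows; the two example vectors per side are made up, and `dot_product_scores` is a hypothetical helper, not part of fastai.

```python
def dot_product_scores(user_vecs, movie_vecs):
    """Row-wise dot product: one score per (user, movie) pair in the batch."""
    return [
        sum(u * m for u, m in zip(user_vec, movie_vec))
        for user_vec, movie_vec in zip(user_vecs, movie_vecs)
    ]


# Hypothetical batch of two (user, movie) pairs with 3 latent factors each.
users = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]]
movies = [[3.0, 4.0, 1.0], [2.0, 2.0, 2.0]]

scores = dot_product_scores(users, movies)
print(scores)  # [5.0, 3.0]
```

A score is high when a user's factors and a movie's factors point in the same direction, and nothing in this formulation keeps it inside the valid rating range.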

In [21]:
x,y = dls.one_batch()
x.shape

torch.Size([64, 2])

In [22]:
model = DotProduct(n_users, n_movies, 50)
learn = Learner(dls, model, loss_func=MSELossFlat())

In [23]:
learn.fit_one_cycle(5, 5e-3)


epoch,train_loss,valid_loss,time
0,2.606445,2.43444,00:17
1,1.429983,1.528834,00:18
2,1.061712,1.359398,00:18
3,0.82568,1.238726,00:17
4,0.715486,1.216508,00:17


In [24]:
learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))
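The `y_range=(0, 5.5)` argument squashes the raw model output into that interval with a scaled sigmoid. The sketch below shows the idea for a single number in plain Python; fastai's own `sigmoid_range` operates on tensors, so treat this scalar version as an illustration only.

```python
import math


def sigmoid_range(x, low, high):
    """Map any real x into the open interval (low, high) with a scaled sigmoid."""
    return 1 / (1 + math.exp(-x)) * (high - low) + low


for raw in (-10.0, 0.0, 10.0):
    print(raw, round(sigmoid_range(raw, 0, 5.5), 3))
```

Because a sigmoid only approaches its bounds and never reaches them, the upper bound is set slightly above the highest rating of 5 so that a prediction of 5.0 remains reachable.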

In [25]:
learn.fit_one_cycle(5, 5e-3, wd=0.1)


epoch,train_loss,valid_loss,time
0,0.794284,0.803502,00:19
1,0.715014,0.734097,00:20
2,0.568909,0.709903,00:21
3,0.406219,0.703254,00:18
4,0.303855,0.704915,00:17
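Since the loss is mean squared error, its square root gives the typical prediction error in rating units. The sketch below takes the `valid_loss` column from the table above (copied by hand) and reports the epoch with the lowest validation loss and the corresponding RMSE.

```python
import math

# valid_loss per epoch, copied from the training table above.
valid_losses = [0.803502, 0.734097, 0.709903, 0.703254, 0.704915]

# Index of the epoch with the smallest validation loss.
best_epoch = min(range(len(valid_losses)), key=valid_losses.__getitem__)
# Root mean squared error in rating units (stars).
best_rmse = math.sqrt(valid_losses[best_epoch])

print(f"best epoch: {best_epoch}, RMSE: {best_rmse:.3f}")
```

The training loss keeps falling in the last epoch while the validation loss rises slightly, which suggests the model is starting to overfit at that point.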


In [26]:
movie_bias = learn.model.i_bias.weight.squeeze()
idxs = movie_bias.argsort(descending=True)[:5]
[dls.classes['title'][i] for i in idxs]

['Shawshank Redemption, The (1994)',
 'Forrest Gump (1994)',
 "One Flew Over the Cuckoo's Nest (1975)",
 'Dark Knight, The (2008)',
 'Star Wars: Episode IV - A New Hope (1977)']

In [27]:
# Deep Learning:

embs = get_emb_sz(dls)
embs

[(611, 58), (9720, 274)]
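The embedding widths returned by `get_emb_sz` come from a heuristic based on the number of categories. Assuming fastai's default rule is `min(600, round(1.6 * n_cat**0.56))`, the plain-Python reimplementation below reproduces the two sizes shown above; the function name is reused here for illustration and is not imported from fastai.

```python
def emb_sz_rule(n_cat):
    """Heuristic embedding width for a categorical variable with n_cat levels
    (assumed to match fastai's default rule)."""
    return min(600, round(1.6 * n_cat ** 0.56))


# Category counts taken from the get_emb_sz output above.
for n_cat in (611, 9720):
    print(n_cat, emb_sz_rule(n_cat))
```

The width grows more slowly than the number of categories and is capped at 600, so very large vocabularies do not produce unmanageably wide embeddings.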

In [28]:
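class CollabNN(Module):
    def __init__(self, user_sz, item_sz, y_range=(0,5.5), n_act=100):
        # user_sz and item_sz are (n_categories, embedding_width) tuples from get_emb_sz
        self.user_factors = Embedding(*user_sz)
        self.item_factors = Embedding(*item_sz)
        # Small MLP on the concatenated user and item embeddings
        self.layers = nn.Sequential(
            nn.Linear(user_sz[1]+item_sz[1], n_act),
            nn.ReLU(),
            nn.Linear(n_act, 1))
        self.y_range = y_range

    def forward(self, x):
        # Look up both embeddings, concatenate them, and pass them through the MLP
        embs = self.user_factors(x[:,0]),self.item_factors(x[:,1])
        x = self.layers(torch.cat(embs, dim=1))
        # Squash the output into the valid rating range
        return sigmoid_range(x, *self.y_range)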

In [29]:
model = CollabNN(*embs)


In [30]:
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(5, 5e-3, wd=0.01)

epoch,train_loss,valid_loss,time
0,0.816771,0.806853,00:49
1,0.769506,0.750904,01:00
2,0.691325,0.731672,00:58
3,0.619117,0.733882,00:54
4,0.541423,0.75032,00:48


In [31]:
learn = collab_learner(dls, use_nn=True, y_range=(0, 5.5), layers=[100,50])
learn.fit_one_cycle(5, 5e-3, wd=0.1)

epoch,train_loss,valid_loss,time
0,0.850558,0.832931,00:49
1,0.765464,0.754987,01:00
2,0.725196,0.736526,01:00
3,0.623669,0.73322,00:54
4,0.550176,0.747617,00:49
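To compare the four training runs side by side, the sketch below ranks them by their final-epoch validation loss. The numbers are copied by hand from the tables in this notebook, and the run labels are descriptive names added here for readability.

```python
# Final-epoch validation MSE for each run, copied from the tables in this notebook.
final_valid_mse = {
    "DotProduct (no bias, no y_range)": 1.216508,
    "collab_learner (dot product with bias)": 0.704915,
    "CollabNN (one hidden layer)": 0.750320,
    "collab_learner (use_nn=True)": 0.747617,
}

# Sort runs from lowest (best) to highest validation loss.
ranking = sorted(final_valid_mse.items(), key=lambda kv: kv[1])
for name, mse in ranking:
    print(f"{mse:.4f}  {name}")
```

On this data set and split, the dot-product model with bias and weight decay ends with the lowest validation loss. Both neural-network variants reached their minimum before the last epoch, so a comparison at the best epoch would narrow the gap somewhat.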
