https://betterprogramming.pub/building-a-recommendation-engine-with-pytorch-d64be4856fe7

# Embeddings for Recommendations

Here I use a move rating data set from Kaggle.
In this notebook I will derive embeddings for users and movies
and these will be the ingredients of a matrix factorization
of the full rating matrix.

We will train a simple Neural Net using Pytorch.

In [1]:
import os
import datetime
import pandas as pd
import numpy as np
import torch.cuda

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

In [3]:
!export CUDA_LAUNCH_BLOCKING=1

### Note: I am going to use my GPU card ... or try to anyway.

In [4]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("Device in use:", device)

Device in use: cuda


In [5]:
## Read in the data and show shape and head

In [6]:
rdf = pd.read_csv("ratings.csv")
print(rdf.shape)
rdf.head(2)

(26024289, 4)


Unnamed: 0,userId,movieId,rating,timestamp
0,1,110,1.0,1425941529
1,1,147,4.5,1425942435


In [12]:
print(f"max userId {rdf.userId.max()} num users {rdf.userId.nunique()}")
print(f"max movieId {rdf.movieId.max()} num movies {rdf.movieId.nunique()}")

max userId 270896 num users 270896
max movieId 176275 num movies 45115


In [14]:
subsample = True
if subsample:
    max_user = 100000
    max_movie = 20000
    df = rdf.copy()
    df = df.loc[df.movieId <= max_movie]
    df = df.loc[df.userId <= max_user]
    print(df.shape)    
else:
    df = rdf.copy()

(7840094, 4)


## Partition the dataset
into train, val and test.
We can use val to tune the hyper parameters.

Test will be held out until all training is done.

We could use the sklearn function but I find this just as easy.

In [15]:
def partition(df, pct=0.1):
    size = int(np.floor(df.shape[0])*0.1)
    idx = list(np.random.choice(df.index, size, replace=False))
    subset = df.filter(items=idx, axis=0)
    rest = df.drop(index = idx)
    return subset, rest

testdf, val_train = partition(df, 0.1)
valdf, traindf = partition(val_train, 0.1)


## Dataset and dataloader
I want to use mini-batch training so we need a dataset
and a dataloader.

I adapted some code for converting a pandas dataframe into a dataloader

In [9]:
class CustomDataset(Dataset):
    def __init__(self, dataframe):
        self.dataframe = dataframe

    def __getitem__(self, index):
        row = self.dataframe.iloc[index].to_numpy()
        userid = int(row[0])
        movieid = int(row[1])
        rating = np.float32(row[2])
        return userid, movieid, rating

    def __len__(self):
        return len(self.dataframe)

traindata = CustomDataset(dataframe=traindf)
train_dataloader = DataLoader(traindata, batch_size=1024)

valdata = CustomDataset(dataframe=valdf)
val_dataloader = DataLoader(valdata, batch_size=1024)

## the Model
The model is fairly simple: 2 embedding layers, 
    one each for users and movies.
    
At the end of "froward" we simply do the dot product.

In [19]:
class MF(nn.Module):
    def __init__(self, n_users, n_movies, emb_size=100):
        super(MF, self).__init__()
        self.user_emb = nn.Embedding(n_users, emb_size)
        self.movie_emb = nn.Embedding(n_movies, emb_size)
        # initializing our matrices with a positive number generally will yield better results
        self.user_emb.weight.data.uniform_(0, 0.5)
        self.movie_emb.weight.data.uniform_(0, 0.5)
    def forward(self, users, movies):
        print("in forward")
        print(users, movies)
        u = self.user_emb(users)
        m = self.movie_emb(movies)
        print ( u, m)
        return (u * m).sum(1)  # taking the dot product


## instantiate the Model
and push it to the gpu

In [20]:
n_users = rdf.userId.nunique()
n_movies = rdf.movieId.nunique()
model = MF(n_users, n_movies, emb_size=100)
if torch.cuda.is_available():
    print("using cuda")
    model = model.to(device)
    print(next(model.parameters()).is_cuda)

using cuda


RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

In [12]:
# training
epochs=4
lr=0.01
wd=0.0
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
#model.to(device)
#optimizer.to(device)
#train_dataloader.to(device)
model.train()
for ei in range(epochs):
    print(f" epoch {ei}")
    if False:
        userIds = torch.LongTensor(train.userId.values).to(device)
        movieIds = torch.LongTensor(train.movieId.values).to(device)
        ratings = torch.FloatTensor(train.rating.values).to(device)
        print(userIds)
        y_hat = model(userIds, movieIds)
        loss = F.mse_loss(y_hat, ratings)
        train_loss += np.round(loss.item(), 4)/ users.size[0]
        optimizer.zero_grad()  # reset gradient
        loss.backward()
        optimizer.step()        
    if True:
        train_loss = 0.0
        for tbi, data in enumerate(train_dataloader):
            if tbi % 512 == 0:
                print(f" {tbi} {train_loss}")
            users, movies, ratings = data
            users = users.cuda()
            movies = movies.cuda()
            ratings = ratings.cuda()
            y_hat = model(users, movies)
            loss = F.mse_loss(y_hat, ratings)
            train_loss += np.round(loss.item(), 4)/ users.size()[0]

            optimizer.zero_grad()  # reset gradient
            loss.backward()
            optimizer.step() 

    print("done with training")
    val_loss = 0.0
    if True:
        for i, data in enumerate(val_dataloader):
            users, movies, ratings = data
            users = users.cuda()
            movies = movies.cuda()
            ratings = ratings.cuda()
            y_hat = model(users, movies)
            loss = F.mse_loss(y_hat, ratings)
            val_loss += np.round(loss.item(), 4)/ users.size()[0]

    print(f" train loss {train_loss}  val loss {val_loss}")


 epoch 0
 0 0.0


../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [114,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [114,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [114,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [114,0,0], thread: [67,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [114,0,0], thread: [68,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [114,0,0], thread: [69,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [114,0,0], t

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

In [18]:
type(users)

torch.Tensor

In [None]:
type(users)
print(users)

In [None]:
def test_model(model, df):
    model.eval()
    users = torch.LongTensor(df.userId.values)
    movies = torch.LongTensor(df.movieId.values)
    ratings = torch.FloatTensor(df.rating.values)
    y_hat = model(users, movies)
    loss = F.mse_loss(y_hat, ratings)
    return  loss.item()
val_err = test_model(model, val)
test_err = test_model(model, test)    
print(val_err, test_err)

In [None]:
user = torch.tensor([10])
games = torch.tensor(game_ratings['movieId'].unique().tolist())
predictions = model(user, games).tolist()
print(predictions)

In [None]:
normalized_predictions = [i/max(predictions)*10 for i in predictions]
print(normalized_predictions)

In [None]:
sortedIndices = predictions.argsort()
recommendations = dataset['Title'].unique()[sortedIndices][:30]  # taking top 30
print(recommendations)