# Recommendation System (Collaborative Filtering)

## Libraries

In [6]:
#hide
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

cuda


In [5]:
from fastai.collab import *
from fastai.tabular.all import *
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### Loading the dataset

In [10]:
books = pd.read_csv('BookRating/Books.csv/Books.csv', delimiter=',', encoding='latin-1',
                    usecols=(0,1), names=('ISBN', 'bookTitle'), header=None)

users = pd.read_csv('BookRating/Users.csv/Users.csv', delimiter=',', encoding='latin-1', usecols=(0,))
users.columns = ['userID']

ratings = pd.read_csv('BookRating/Ratings.csv/Ratings.csv', delimiter=',', encoding='latin-1',
                      usecols=(0,1,2), names=('userID', 'ISBN', 'bookRating'), header=None)


In [11]:
# show head of ratings
ratings.head()

Unnamed: 0,userID,ISBN,bookRating
0,User-ID,ISBN,Book-Rating
1,276725,034545104X,0
2,276726,0155061224,5
3,276727,0446520802,0
4,276729,052165615X,3


In [13]:
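# Drop the first row of ratings: it holds the CSV header, which was read in as data
# .copy() gives an independent dataframe, so the column assignment below is safe
ratings = ratings.iloc[1:].copy()
ratings['bookRating'] = pd.to_numeric(ratings['bookRating'])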

In [15]:
# The first line (0) containing junk data is gone
ratings.head()

Unnamed: 0,userID,ISBN,bookRating
1,276725,034545104X,0
2,276726,0155061224,5
3,276727,0446520802,0
4,276729,052165615X,3
5,276729,0521795028,6


## Data Preprocessing

In [17]:
# For training, merge ratings and books on their shared ISBN column so that
# one dataframe holds userID, ISBN, bookRating and bookTitle
ratings = ratings.merge(books, on='ISBN')
ratings.head()

Unnamed: 0,userID,ISBN,bookRating,bookTitle
0,276725,034545104X,0,Flesh Tones: A Novel
1,276726,0155061224,5,Rites of Passage
2,276727,0446520802,0,The Notebook
3,276729,052165615X,3,Help!: Level 1
4,276729,0521795028,6,The Amsterdam Connection : Level 4 (Cambridge English Readers)


##### Now our dataset is ready for the first training attempt

## DataLoader

In [19]:
# Using fastai's CollabDataLoaders.from_df for our dataloader
dls = CollabDataLoaders.from_df(ratings, user_name='userID', item_name='bookTitle', rating_name='bookRating', bs=64)
dls.show_batch()

Unnamed: 0,userID,bookTitle,bookRating
0,31609,Middlesex: A Novel,9
1,218187,La cautiva de Tordesillas,0
2,270211,Fatal Flaw,0
3,16795,Paradise Junction,0
4,223201,TO LOVE AND BE WISE,0
5,131855,7 Keys of Charisma: Unlocking the Secrets of Those Who Have It,0
6,47925,Exit to Eden,5
7,261829,Reading Lolita in Tehran: A Memoir in Books,0
8,171076,Incredible Universe,0
9,238781,"Surprise Delivery (That'S My Baby) (Silhouette Special Edition, 1273)",0


The model is trained with SGD (stochastic gradient descent). I need to initialize some random parameters that serve as the latent factors for users and books.

Step 1) Define the latent factors (parameters)

Step 2) Calculate the predictions as the dot product of the latent factors of users and books

Step 3) Calculate the loss

In [21]:
# our latent factors
n_users = len(dls.classes['userID'])
n_books = len(dls.classes['bookTitle'])
n_factors = 5

# n_factors = N means every user and every book is described by N latent factors
# in this example each user gets a vector of 5 random numbers, and so does each book
users_factors = torch.randn(n_users, n_factors)
books_factors = torch.randn(n_books, n_factors)
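
Steps 2 and 3 can be sketched on a toy example. This is only an illustration under stated assumptions: the sizes, the index pairs, the ratings and the learning rate `lr` are all made up, and the single manual update stands in for what the optimizer does during training.

```python
import torch

torch.manual_seed(0)

# Toy sizes, chosen only for illustration (not the real dataset sizes).
n_users, n_books, n_factors = 4, 6, 5

# Step 1: random latent factors that SGD will update.
users_factors = torch.randn(n_users, n_factors, requires_grad=True)
books_factors = torch.randn(n_books, n_factors, requires_grad=True)

# A made-up batch of (user index, book index) pairs with made-up ratings.
user_idx = torch.tensor([0, 1, 3])
book_idx = torch.tensor([2, 5, 0])
true_ratings = torch.tensor([8.0, 0.0, 5.0])

# Step 2: prediction = dot product of the user's and the book's factors.
preds = (users_factors[user_idx] * books_factors[book_idx]).sum(dim=1)

# Step 3: mean squared error between predictions and true ratings.
loss = ((preds - true_ratings) ** 2).mean()

# One manual SGD step: move the factors against the gradient.
loss.backward()
lr = 0.01  # assumed learning rate, small enough for a stable step
with torch.no_grad():
    users_factors -= lr * users_factors.grad
    books_factors -= lr * books_factors.grad

# Recompute the loss with the updated factors.
new_preds = (users_factors[user_idx] * books_factors[book_idx]).sum(dim=1)
new_loss = ((new_preds - true_ratings) ** 2).mean()
print(loss.item(), new_loss.item())
```

The second printed value should be slightly smaller than the first, which is the whole idea of SGD: each step nudges the factors so the dot products land closer to the observed ratings.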

## Model

Our model is a dot-product model written as a PyTorch module and trained with SGD, with some enhancements. First, the argument y_range is set from the range of the ratings (0-10); the upper bound is slightly above 10 because a sigmoid only approaches its limits and never reaches them. Second, I used Embedding to define the factors and biases of both users and books. Third, when our PyTorch module is called, PyTorch calls its **forward** method and passes along any parameters that are included in the call. The model first calculates the dot product of the factors (parameters), then adds the biases, and finally returns the result squashed by a sigmoid into the range of ratings.

In [23]:
class DotProduct(Module):
  def __init__(self, n_users, n_books, n_factors, y_range=(0, 10.1)):
    self.users_factors = Embedding(n_users, n_factors)
    self.books_factors = Embedding(n_books, n_factors)
    self.users_bias = Embedding(n_users, 1)
    self.books_bias = Embedding(n_books, 1)
    self.y_range = y_range

  def forward(self, x):
    users = self.users_factors(x[:, 0])
    books = self.books_factors(x[:, 1])
    res = ((users * books).sum(dim=1, keepdim=True))
    res += self.users_bias(x[:, 0]) + self.books_bias(x[:, 1])
    return sigmoid_range(res, *self.y_range)
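
The last line of `forward` maps an unbounded score into the rating range. As a rough scalar sketch of that arithmetic (the helper name `sigmoid_range_scalar` is hypothetical; fastai's `sigmoid_range` applies the same formula to tensors):

```python
import math

def sigmoid_range_scalar(x, low, high):
    """Squash a raw score x into the open interval (low, high)."""
    return 1 / (1 + math.exp(-x)) * (high - low) + low

low, high = 0, 10.1  # same bounds as the y_range default above
for raw in (-5.0, 0.0, 5.0):
    # Very negative scores approach `low`, very positive ones approach `high`.
    print(raw, round(sigmoid_range_scalar(raw, low, high), 3))
```

A raw score of 0 lands in the middle of the range, and no finite score ever reaches either bound exactly.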

## Training

In [25]:
model = DotProduct(n_users, n_books, 50)
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(7, 5e-3, wd=0.2)

epoch,train_loss,valid_loss,time
0,14.146245,14.529581,02:04
1,14.260515,14.614294,02:03
2,14.037701,14.62132,02:03
3,14.0599,14.477343,02:03
4,13.985012,14.295668,02:04
5,13.33845,14.141809,02:04
6,13.649458,14.098104,02:12


The validation loss barely moved over seven epochs (from about 14.5 to 14.1), which forced me to look more closely at our ratings dataset.

In [26]:
Counter(ratings['bookRating'])

Counter({0: 647294,
         8: 91804,
         10: 71225,
         7: 66402,
         9: 60778,
         5: 45355,
         6: 31687,
         4: 7617,
         3: 5118,
         2: 2375,
         1: 1481})

As we can see, zeros (0) far outnumber every other rating: about 63% of our ratings (647,294 out of 1,031,136) belong to our friend "0".
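
The share of each rating can be checked directly from the counts. The snippet below is a small sketch in plain Python; the numbers are copied from the `Counter` output above, and the variable names are only illustrative.

```python
from collections import Counter

# Counts copied from the Counter output above.
rating_counts = Counter({0: 647294, 8: 91804, 10: 71225, 7: 66402, 9: 60778,
                         5: 45355, 6: 31687, 4: 7617, 3: 5118, 2: 2375, 1: 1481})

total = sum(rating_counts.values())

# Fraction of all ratings that each rating value accounts for.
shares = {rating: count / total for rating, count in sorted(rating_counts.items())}
for rating, share in shares.items():
    print(f"{rating:>2}: {share:6.1%}")
```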

First, I tried reducing the number of zeros in our dataset, which is the basic solution. That gave a better valid_loss, but something around 7.4 is still not the valid_loss we are looking for. Then I tried weighted ratings, meaning **lower weights** for the ratings that appear **more** often in our dataset, like **"0"**, and **higher weights** for the ratings that appear **less** often, like "2". My first attempt was a waste, but once I normalized the weights, the results became fascinating.

To do this in fastai, I need to supply my own loss function.

### The real part starts here

## Weights

In [34]:
rating_counts = Counter(ratings['bookRating'])

# Weight each rating inversely to how often it appears in the dataset
total = sum(rating_counts.values())
weights = {k: total / (v * len(rating_counts)) for k, v in rating_counts.items()}

weight_tensor = torch.tensor([weights[i] for i in range(11)], dtype=torch.float32)

weight_tensor /= weight_tensor.sum()  # Normalize to sum to 1
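
To see what these weights look like, here is the same calculation in plain Python, using the counts printed earlier. It is a sketch for inspection only; the dictionary names are illustrative and not part of the training code.

```python
# Counts copied from the Counter output earlier in the notebook.
counts = {0: 647294, 8: 91804, 10: 71225, 7: 66402, 9: 60778,
          5: 45355, 6: 31687, 4: 7617, 3: 5118, 2: 2375, 1: 1481}

total = sum(counts.values())

# Inverse-frequency weight: rare ratings get large weights, common ones small.
raw = {k: total / (v * len(counts)) for k, v in counts.items()}

# Normalize so that the 11 weights sum to 1.
norm = sum(raw.values())
weights = {k: w / norm for k, w in sorted(raw.items())}

for rating, w in weights.items():
    print(f"{rating:>2}: {w:.6f}")
```

The ratio between two weights is just the inverse ratio of their counts, so an error on the rarest rating ("1") counts roughly 437 times as much as the same error on a "0".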

## Loss Function

In [37]:
class WeightedMSELoss:
    def __init__(self, weights):
        # weights: tensor of shape (n_ratings,), one weight per rating value
        self.weights = weights
    def __call__(self, input, target):
        # input: model predictions (batch_size, 1)
        # target: true ratings (batch_size)
        # Move the weights to the target's device: I am training on CUDA,
        # and weights left on the CPU caused a device conflict
        weights = self.weights.to(target.device)
        target_long = target.long().squeeze()
        diff = (input.squeeze() - target.squeeze()) ** 2

        # Look up the weight that belongs to each sample's true rating
        batch_weights = weights[target_long]
        weighted_diff = diff * batch_weights

        # Return the weighted mean of the squared differences
        return weighted_diff.mean()
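
A quick sanity check of this idea on numbers small enough to verify by hand. The function below is a simplified functional sketch of the same computation, and the 3-level rating scale and its weights are hypothetical values picked for easy arithmetic.

```python
import torch

def weighted_mse(pred, target, weights):
    """Mean of squared errors, each scaled by the weight of its true rating."""
    per_item_weight = weights[target.long()]
    return ((pred - target) ** 2 * per_item_weight).mean()

# Hypothetical weights for a tiny 3-level rating scale (0, 1, 2).
weights = torch.tensor([0.1, 0.3, 0.6])

pred = torch.tensor([1.0, 1.0, 1.0])
target = torch.tensor([0.0, 1.0, 2.0])

# Squared errors are [1, 0, 1]; after weighting they become [0.1, 0.0, 0.6].
loss = weighted_mse(pred, target, weights)
print(loss.item())  # mean of [0.1, 0.0, 0.6]
```

Both wrong predictions are off by exactly 1, yet the one on the rarer rating contributes six times as much to the loss.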

## Train

In [38]:
model = DotProduct(n_users, n_books, 50)
loss_func = WeightedMSELoss(weight_tensor)
learn = Learner(dls, model, loss_func=loss_func)

learn.fit_one_cycle(5, 5e-3, wd=0.1)

epoch,train_loss,valid_loss,time
0,0.059279,0.063556,02:08
1,0.061095,0.063083,02:03
2,0.056069,0.06296,02:00
3,0.064849,0.062882,02:00
4,0.055771,0.062835,01:59


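As you can see, our valid_loss and train_loss are now far lower. Keep in mind that these values are not directly comparable with the unweighted MSE from the first run, because the normalized weights (which sum to 1) scale every squared error down. With that in mind, we can head to exporting our model and using it in Gradio.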

In [39]:
learn.export('recommendation_model.pkl')

In [40]:
learn.path

Path('.')

In [41]:
torch.save(model.state_dict(), 'recommendation_model.pth')
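
A `.pth` file holds only the weights, so loading it later means rebuilding the model with the same sizes first. The round trip below is a minimal sketch with a hypothetical stand-in class (`TinyDotProduct`) and a temporary file, so it runs without the dataset; for the real model, the same steps would apply to `DotProduct` with the original `n_users`, `n_books` and number of factors.

```python
import os
import tempfile

import torch
from torch import nn

class TinyDotProduct(nn.Module):
    """Hypothetical stand-in for the DotProduct model, small enough to run anywhere."""
    def __init__(self, n_users, n_books, n_factors):
        super().__init__()
        self.users_factors = nn.Embedding(n_users, n_factors)
        self.books_factors = nn.Embedding(n_books, n_factors)

    def forward(self, x):
        users = self.users_factors(x[:, 0])
        books = self.books_factors(x[:, 1])
        return (users * books).sum(dim=1)

trained = TinyDotProduct(4, 6, 3)
batch = torch.tensor([[0, 2], [3, 5]])  # made-up (user, book) index pairs

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_model.pth")
    torch.save(trained.state_dict(), path)

    # Rebuild the architecture with the same sizes, then load the weights.
    restored = TinyDotProduct(4, 6, 3)
    restored.load_state_dict(torch.load(path))
    restored.eval()

# The restored model should reproduce the original predictions exactly.
same = torch.equal(trained(batch), restored(batch))
print(same)
```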