 # **Anime Recommender System**
 
In this notebook, I build anime embeddings from the MyAnimeList ratings dataset. The dataset is available on Kaggle: https://www.kaggle.com/CooperUnion/anime-recommendations-database.

This notebook comprises the following sections:
1. Data preprocessing.
2. Model building.
3. Model training.
4. Results visualization.

We use collaborative filtering to build the recommendation system. We borrow an idea from natural language processing, the GloVe model, to build anime embeddings. In a follow-up project, these embeddings can be used to recommend anime based on their similarity to a list of anime you provide.

**Data Preprocessing**

In [None]:
from collections import Counter, defaultdict
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from sklearn.manifold import TSNE

First, we want to see what our data looks like. We sort the list of anime by id.

In [None]:
anime = pd.read_csv("../input/anime-recommendations-database/anime.csv")
rating = pd.read_csv("../input/anime-recommendations-database/rating.csv")

In [None]:
anime = anime.sort_values('anime_id')
anime

Next, we clean the rating data to fit our purpose. Since we want collaborative embeddings, we need a co-occurrence matrix of recommended anime. Here, all the anime recommended by one person count as one "co-occurrence". Because computation power is limited, I suggest cleaning the data under these assumptions:

1. Anime that a user did not rate presumably did not leave enough of an impression. Remove every entry where the user watched the anime but gave no rating (rating = -1).
2. We only want anime that people recommend. Compute the mean of all ratings across all users and keep only the ratings above that mean.
3. People who recommended too few anime probably do not know anime well enough, and people who recommended too many probably watch anything. We keep only users who gave an above-average rating to between 10 and 20 anime (the bounds are arbitrary).

Once the data is cleaned, we look at what the remaining ratings look like.

In [None]:
rated = rating[rating.rating >= 0]
mean = rated["rating"].mean()
rated = rated[rated.rating > mean]
test = rated.groupby("user_id").filter(lambda x:len(x) >= 10)
test = test.groupby("user_id").filter(lambda x:len(x) <= 20)

test
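
The two `groupby(...).filter(...)` calls above run a Python lambda once per user, which can be slow on millions of rows. As a minimal sketch of a possible alternative, the same three cleaning steps can be written with `transform("size")` on a toy table. The toy data, the names `toy`, `liked_toy`, `kept`, and the bounds `MIN_RECS`/`MAX_RECS` are assumptions made for illustration only.

```python
import pandas as pd

# Toy ratings (made up): -1 marks "watched but not rated"
toy = pd.DataFrame({
    "user_id":  [1, 1, 1, 2, 3, 3, 3],
    "anime_id": [10, 11, 12, 10, 11, 12, 13],
    "rating":   [10, 9, 10, 9, -1, 3, 4],
})

MIN_RECS, MAX_RECS = 2, 3  # toy bounds; the notebook uses 10 and 20

# Step 1: drop unrated entries
rated_toy = toy[toy.rating >= 0]
# Step 2: keep only ratings above the global mean
liked_toy = rated_toy[rated_toy.rating > rated_toy.rating.mean()]
# Step 3: count the remaining rows per user and keep users within the bounds
n_recs = liked_toy.groupby("user_id")["anime_id"].transform("size")
kept = liked_toy[n_recs.between(MIN_RECS, MAX_RECS)]

print(kept)
```

On this toy table only user 1 survives: user 2 has a single above-average rating and user 3 has none.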

We have significantly reduced the number of ratings (hopefully the remainder is still representative enough). To build the co-occurrence matrix, we need a cross tabulation with one row per user and one column per anime. A value of 1 means that the user recommended that anime, and 0 means that they did not. The table is mostly zeroes: of the 12294 anime, each user recommends only 10 to 20, so only that many entries per row are non-zero.

In [None]:
cross = pd.crosstab(index = test.user_id, columns = test.anime_id)
cross

Representing the cross table as a matrix, we obtain the co-occurrence matrix by multiplying the transpose of the matrix with the matrix itself.

In [None]:
cross_int = cross.astype(int)
cooc = cross_int.T.dot(cross_int)
cooc
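
To make the matrix product concrete, here is a small sketch on made-up data (the user and anime ids below are assumptions for illustration). Each diagonal entry counts how many users recommended that anime, and each off-diagonal entry counts how many users recommended both anime.

```python
import pandas as pd

# Toy recommendations: three users, three anime (ids are made up)
toy = pd.DataFrame({
    "user_id":  [1, 1, 2, 2, 2, 3],
    "anime_id": [10, 20, 10, 20, 30, 20],
})

# One row per user, one column per anime, 1 where the user recommended it
toy_cross = pd.crosstab(index=toy.user_id, columns=toy.anime_id)

# Transpose times matrix gives an anime-by-anime table of shared users
toy_cooc = toy_cross.T.dot(toy_cross)
print(toy_cooc)
```

In this toy table, anime 10 and 20 are recommended together by two users, anime 20 is recommended by three users in total, and anime 30 appears only once, so its diagonal entry is 1.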

Some anime appear only once (their diagonal entry is 1). A single appearance is too little evidence to infer their relationship with other anime, so we remove them.

In [None]:
once = [cooc.columns[i] for i in range(len(cooc.columns)) if cooc.iloc[i, i] == 1]
cooc = cooc.drop(columns = once)
cooc = cooc.drop(index = once)
cooc

Our data is now significantly smaller and much cheaper to compute on.

**Model Building**

Let us start by putting the data into the format we need. We define an AnimeDataset class that takes the co-occurrence matrix of the ratings and the anime list, which acts as a dictionary from ids to titles. The model input is the pair of tokenized ids of two anime, and the target is their co-occurrence count.

In [None]:
class AnimeDataset:
    def __init__(self, coocc_matrix, anime_df):
        self.coocc = coocc_matrix
        self.anime = anime_df
        
        self.good_id = list(self.coocc.columns)        
        self.namelen = len(self.good_id)
        
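        # Look up each title by anime_id; fall back to the raw id for anime missing
        # from anime_df so that the names stay aligned with the new ids below
        id_to_title = dict(zip(self.anime["anime_id"], self.anime["name"]))
        self.name = [id_to_title.get(i, str(i)) for i in self.good_id]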
        self.newid = list(range(self.namelen))
        self.id2name = dict(zip(self.newid, self.name))
        
        newcol = list(range(self.namelen))
        
        self.coocc.columns = newcol
        self.coocc.index = newcol
        
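        # Collect every non-zero off-diagonal pair (i, j) with its co-occurrence count.
        # np.nonzero scans in row-major order, the same order as a nested i/j loop.
        counts = self.coocc.to_numpy()
        rows, cols = np.nonzero(counts)
        off_diag = rows != cols
        self._i_idx = rows[off_diag].tolist()
        self._j_idx = cols[off_diag].tolist()
        self._xij = counts[rows[off_diag], cols[off_diag]].tolist()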
        
        self._i_idx = torch.LongTensor(self._i_idx).cuda()
        self._j_idx = torch.LongTensor(self._j_idx).cuda()
        self._xij = torch.FloatTensor(self._xij).cuda()
        
    def get_batches(self, batch_size):
        #Generate random idx
        rand_ids = torch.LongTensor(np.random.choice(len(self._xij), len(self._xij), replace=False))
        
        for p in range(0, len(rand_ids), batch_size):
            batch_ids = rand_ids[p:p+batch_size]
            yield self._xij[batch_ids], self._i_idx[batch_ids], self._j_idx[batch_ids]

We create our dataset object here.

In [None]:
dataset = AnimeDataset(cooc, anime)

Next, we build the model. We adapt the GloVe model from NLP: instead of a context window that slides along the words of a text, the entire list of anime recommended by a single user acts as one window. Unlike words, which count for less the further apart they are, all anime in the same window are given the same weight.

In [None]:
class AnimeGlove(nn.Module):
    def __init__(self, num_embeddings, embedding_dim):
        super(AnimeGlove, self).__init__()
        self.wi = nn.Embedding(num_embeddings, embedding_dim)
        self.wj = nn.Embedding(num_embeddings, embedding_dim)
        self.bi = nn.Embedding(num_embeddings, 1)
        self.bj = nn.Embedding(num_embeddings, 1)
        
        self.wi.weight.data.uniform_(-1, 1)
        self.wj.weight.data.uniform_(-1, 1)
        self.bi.weight.data.zero_()
        self.bj.weight.data.zero_()
        
    def forward(self, i_indices, j_indices):
        w_i = self.wi(i_indices)
        w_j = self.wj(j_indices)
        b_i = self.bi(i_indices).squeeze()
        b_j = self.bj(j_indices).squeeze()
        
        x = torch.sum(w_i * w_j, dim=1) + b_i + b_j
        
        return x
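
For reference, the forward pass above computes the prediction term of the GloVe objective. Written in the notation of the GloVe paper (the symbols $X_{ij}$, $f$, $x_{\max}$ and $\alpha$ are the paper's notation, not names defined in this notebook), with `wi`, `wj`, `bi`, `bj` playing the roles of $w$, $\tilde{w}$, $b$, $\tilde{b}$:

```latex
w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j \;\approx\; \log X_{ij},
\qquad
J = \sum_{i,j} f(X_{ij}) \left( w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
(x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\
1 & \text{otherwise}
\end{cases}
```

Here $X_{ij}$ is the co-occurrence count of anime $i$ and $j$. The training code in this notebook averages the weighted squared error over each batch rather than summing it over all pairs.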

**Model Training**

Now we can start training the model. I choose 100 as the embedding dimension. The number of embeddings equals the number of anime left in the co-occurrence matrix (3014 here).

In [None]:
EMBED_DIM = 100
NAME_LEN = len(cooc)
model = AnimeGlove(NAME_LEN, EMBED_DIM)
model.cuda()

Here, we define the weighting function from the GloVe paper, the loss function, and the optimizer.

In [None]:
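def weight_func(x, x_max, alpha):
    # GloVe weighting: (x / x_max) ** alpha, capped at 1 for counts at or above x_max
    wx = (x/x_max)**alpha
    wx = torch.min(wx, torch.ones_like(wx))
    return wx  # already on the same device as x

def wmse_loss(weights, inputs, targets):
    # Weighted squared error, averaged over the batch
    loss = weights * F.mse_loss(inputs, targets, reduction='none')
    return torch.mean(loss)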

optimizer = optim.Adagrad(model.parameters(), lr=0.05)
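
To get a feel for the weighting, here is a small sketch that evaluates the same formula on a few made-up counts. It uses NumPy as a CPU-only stand-in for the torch function above, and the name `glove_weight` and the sample counts are assumptions for illustration.

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """NumPy stand-in for the weighting idea: (x / x_max) ** alpha, capped at 1."""
    return np.minimum((x / x_max) ** alpha, 1.0)

# Made-up co-occurrence counts, from rare to very frequent
counts = np.array([1.0, 10.0, 100.0, 500.0])
weights = glove_weight(counts)
print(weights)
```

Rare pairs get a small weight (about 0.03 for a count of 1 and about 0.18 for a count of 10), while every pair at or above `x_max` gets the full weight of 1, so very frequent pairs cannot dominate the loss.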

We set up the parameters according to the paper and begin training. 

In [None]:
N_EPOCHS = 100
BATCH_SIZE = 2048
X_MAX = 100
ALPHA = 0.75
n_batches = (len(dataset._xij) + BATCH_SIZE - 1) // BATCH_SIZE  # ceiling division, so the final partial batch is counted
loss_values = list()
for e in range(1, N_EPOCHS+1):
    batch_i = 0

    for x_ij, i_idx, j_idx in dataset.get_batches(BATCH_SIZE):

        batch_i += 1

        optimizer.zero_grad()

        outputs = model(i_idx, j_idx)

        weights_x = weight_func(x_ij, X_MAX, ALPHA)

        loss = wmse_loss(weights_x, outputs, torch.log(x_ij))

        loss.backward()

        optimizer.step()

        loss_values.append(loss.item())

        if batch_i % 100 == 0:
            print("Epoch: {}/{} \t Batch: {}/{} \t Loss: {}".format(e, N_EPOCHS, batch_i, n_batches, np.mean(loss_values[-20:])))  
    
    print("Saving model...")
    torch.save(model.state_dict(), "anime.pt")

We can plot the loss values to see how they converge toward 0.

In [None]:
plt.plot(loss_values)

**Results Visualization**

We want to verify that the model works as intended. Let us visualize the anime using t-SNE, which reduces the 100 dimensions to a 2D scatter plot.

In [None]:
emb_i = model.wi.weight.cpu().data.numpy()
emb_j = model.wj.weight.cpu().data.numpy()
emb = emb_i + emb_j
top_k = 300
tsne = TSNE(metric='cosine', random_state=123)
embed_tsne = tsne.fit_transform(emb[:top_k, :])
fig, ax = plt.subplots(figsize=(14, 14))
for idx in range(top_k):
    plt.scatter(*embed_tsne[idx, :], color='steelblue')
    plt.annotate(dataset.id2name[idx], (embed_tsne[idx, 0], embed_tsne[idx, 1]), alpha=0.7)

While the exact visualization depends on the training run, there are two signs that the embeddings were trained successfully.

1. Anime and their sequels are located close together. For example, look at Hunter x Hunter and Initial D. It is reasonable to expect that people who like an anime also like its sequel.
2. Very popular anime are located close together. Notice how close Naruto and Bleach are. The anime surrounding them are also very popular titles that nearly everyone watches.

For these two reasons, I think it is fair to say that the anime embeddings work pretty well. They can be used to make anime recommendations in a follow-up project.
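
As a rough idea of what such a follow-up could look like, here is a minimal sketch of a nearest-neighbour lookup by cosine similarity. It is not part of this notebook: the function `most_similar` and the toy matrix `toy_emb` are hypothetical, and the toy 2-D vectors only stand in for the trained `emb` matrix.

```python
import numpy as np

def most_similar(emb, query_ids, k=3):
    """Return the k row indices closest (by cosine similarity) to the mean of the query rows."""
    # Normalise rows so that a dot product equals cosine similarity
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Average the query anime into one "taste profile" vector
    profile = unit[query_ids].mean(axis=0)
    scores = unit @ profile
    scores[query_ids] = -np.inf  # never recommend what the user already listed
    return np.argsort(-scores)[:k].tolist()

# Toy 2-D embeddings (made up) standing in for the trained embedding matrix
toy_emb = np.array([
    [1.0, 0.0],   # 0
    [0.9, 0.1],   # 1: points almost the same way as 0
    [0.0, 1.0],   # 2
    [0.1, 0.9],   # 3: points almost the same way as 2
    [-1.0, 0.0],  # 4: opposite of 0
])

print(most_similar(toy_emb, query_ids=[0], k=2))  # [1, 3]
```

With the notebook's data, one would pass `emb` instead of `toy_emb` and translate the returned indices to titles with `dataset.id2name`.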

References:
1. https://nlpython.com/implementing-glove-model-with-pytorch/
2. https://towardsdatascience.com/collaborative-embeddings-for-lipstick-recommendations-98eccfa816bd