# Collaborative Filtering

This notebook is my follow-along for lesson 5 of the Fast.ai course, part 1.

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai.learner import *
from fastai.column_data import *
from sklearn.decomposition import PCA

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)


import os
print(os.listdir("../input"))



In [None]:
path='../input/'
tmp_path='/kaggle/working/tmp/'
models_path='/kaggle/working/models/'

Let's load our dataset:

In [None]:
ratings = pd.read_csv(path+'ratings.csv')
ratings.head()

In [None]:
movies = pd.read_csv(path+'movies.csv')
movies.head()

# Let's jump into it

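Collaborative filtering with the fastai library. First we hold out a random 20% of the ratings as a validation set (the default for `get_cv_idxs`), then set the weight decay and the number of latent factors per user and movie: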

In [None]:
val_idxs = get_cv_idxs(len(ratings))
wd=2e-4
n_factors = 50

In [None]:
cf = CollabFilterDataset.from_csv(path, 'ratings.csv', 'userId', 'movieId', 'rating')
learn = cf.get_learner(n_factors, val_idxs, 64, opt_fn=optim.Adam, tmp_name=tmp_path, models_name=models_path)


In [None]:
learn.fit(1e-2,2,wds=wd,cycle_len=1,cycle_mult=2)

Other benchmarks report RMSE rather than MSE, so we take the square root of the validation loss printed by `learn.fit` above (0.765 in this run):

In [None]:
math.sqrt(0.765)
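As a sketch of the metric itself, RMSE is the square root of the mean squared difference between predictions and targets. The helper `rmse` below is hypothetical (not part of fastai), and the toy arrays are made up for illustration:

```python
import numpy as np

def rmse(preds, targets):
    """Root mean squared error; flattens inputs so (n, 1) and (n,) shapes line up."""
    preds = np.asarray(preds, dtype=np.float64).ravel()
    targets = np.asarray(targets, dtype=np.float64).ravel()
    return float(np.sqrt(np.mean((preds - targets) ** 2)))

# Made-up ratings, just to show the arithmetic
toy_preds = np.array([3.5, 4.0, 2.0])
toy_actual = np.array([4.0, 4.0, 3.0])
print(rmse(toy_preds, toy_actual))  # sqrt((0.25 + 0 + 1) / 3) ≈ 0.6455
```

Once `preds` and `y` are computed below, `rmse(preds, y)` should give a value close to the square root of the reported validation loss; small differences are possible because the loss printed during training may be averaged per batch.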

We can grab the model's predictions on the validation set, which lets us visualize them.

In [None]:
preds = learn.predict()

    
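`preds` holds our predictions and `y` holds the actual ratings. The hex plot shows that higher predictions go with higher actual ratings: for example, when we predict around 3.5, the actual ratings reach up to about 4, which suggests the model is capturing the trend. The histograms along the top and right edges show the distributions of the predictions and of the actual ratings, respectively.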

In [None]:
y=learn.data.val_y
sns.jointplot(preds, y, kind='hex', stat_func=None)

# Results Analysis

#### Movie Bias

Below we build a dictionary that maps each `movieId` to its title, then count how many ratings each movie received. For example, an entry might pair the title *The Mummy Returns* with a count of 25 ratings.

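We then sort the counts in descending order to get `topMovies`, the IDs of the 3,000 most-rated movies, and convert those IDs into the model's internal indices with `cf.item2idx`, storing them in a NumPy array.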

In [None]:
movie_names = movies.set_index('movieId')['title'].to_dict()
g=ratings.groupby('movieId')['rating'].count()
topMovies=g.sort_values(ascending=False).index.values[:3000]
topMovieIdx = np.array([cf.item2idx[o] for o in topMovies])
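The same pipeline on a tiny, made-up DataFrame shows what each step produces. `item2idx` here is a hypothetical stand-in for `cf.item2idx`, which we assume maps each `movieId` to a row of the embedding matrix:

```python
import numpy as np
import pandas as pd

# Made-up data standing in for ratings.csv and movies.csv
toy_ratings = pd.DataFrame({
    'userId':  [1, 1, 2, 2, 3, 3],
    'movieId': [10, 20, 10, 30, 10, 20],
    'rating':  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
})
toy_movies = pd.DataFrame({'movieId': [10, 20, 30],
                           'title': ['Movie A', 'Movie B', 'Movie C']})

# movieId -> title lookup table (a plain dict)
names = toy_movies.set_index('movieId')['title'].to_dict()

# Number of ratings per movie (a Series indexed by movieId)
counts = toy_ratings.groupby('movieId')['rating'].count()

# IDs of the 2 most-rated movies
top = counts.sort_values(ascending=False).index.values[:2]

# Hypothetical stand-in for cf.item2idx: movieId -> embedding row
item2idx = {m: i for i, m in enumerate(toy_ratings.movieId.unique())}
top_idx = np.array([item2idx[m] for m in top])

print([names[m] for m in top], top_idx)  # ['Movie A', 'Movie B'] [0 1]
```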

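`movie_names` is a plain Python dictionary (we called `.to_dict()` on a pandas Series), so we can look up a title by ID with `movie_names[movie_id]`, or iterate over `(movieId, title)` pairs with `movie_names.items()`.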

In [None]:
print(movie_names)

In [None]:
list(movie_names.items())[:10]

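`g` is a pandas Series indexed by `movieId`, whose values are the number of ratings each movie received. We can look up a single movie's count with `g[movie_id]`, sort it with `sort_values`, or view the first few entries with `head()`.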

In [None]:
print(g)

In [None]:
learn.summary()

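The model contains four embeddings:

- `u`: user latent factors (one `n_factors`-long vector per user)
- `i`: item (movie) latent factors
- `ub`: user bias (one number per user)
- `ib`: item (movie) bias (one number per movie)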

In [None]:
m=learn.model; m.cuda()

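Let's look at the movie bias term. The `ib` embedding takes a movie index as input and returns that movie's bias. We wrap the indices with `V` to turn them into a PyTorch `Variable`, and convert the result back to a NumPy array with `to_np`.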

In [None]:
movie_bias = to_np(m.ib(V(topMovieIdx)))

In [None]:
movie_bias
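Conceptually, an embedding layer such as `ib` is just a weight matrix, and calling it with a batch of indices selects the corresponding rows. The NumPy sketch below (hypothetical sizes, random weights) shows that this lookup is equivalent to multiplying one-hot vectors by the weight matrix, only much cheaper. Because `ib` has a single column, `movie_bias` has one one-element row per movie:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 5, 1                  # a bias embedding has a single column
weight = rng.uniform(-0.01, 0.01, size=(n_rows, n_cols))

idx = np.array([3, 0, 4])              # hypothetical movie indices

# Embedding lookup: select the rows of the weight matrix
lookup = weight[idx]

# Equivalent (but wasteful) formulation: one-hot encode and multiply
one_hot = np.eye(n_rows)[idx]          # shape (3, n_rows)
via_matmul = one_hot @ weight          # shape (3, n_cols)

print(lookup.shape, np.allclose(lookup, via_matmul))  # (3, 1) True
```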

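We zip `topMovies` with `movie_bias` and look up each movie's title in `movie_names`. Since each row of `movie_bias` holds a single value, `b[0]` extracts the bias, giving a list of `(bias, title)` pairs.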

In [None]:
movie_ratings = [(b[0],movie_names[i]) for i,b in zip(topMovies, movie_bias) ]

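We sort the pairs by bias, using a lambda as the sort key, and display the first 15, i.e. the movies with the lowest bias. `movie_ratings` is a list of `(bias, title)` tuples, so `o[0]` selects the bias. `itemgetter(0)` from the `operator` module does the same job as the lambda, and `reverse=True` shows the 15 movies with the highest bias instead.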

In [None]:
sorted(movie_ratings, key=lambda o:  o[0])[:15]

In [None]:
sorted(movie_ratings, key=itemgetter(0))[:15]

In [None]:
sorted(movie_ratings, key=lambda o:  o[0],reverse=True)[:15]
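A small check on made-up `(bias, title)` pairs shows that the lambda and `itemgetter(0)` keys produce the same order:

```python
from operator import itemgetter

# Made-up (bias, title) pairs with the same shape as movie_ratings
pairs = [(0.42, 'Movie A'), (-0.31, 'Movie B'), (0.05, 'Movie C')]

lowest_first = sorted(pairs, key=lambda o: o[0])
same_order = sorted(pairs, key=itemgetter(0))
highest_first = sorted(pairs, key=itemgetter(0), reverse=True)

print(lowest_first[0][1], highest_first[0][1])  # Movie B Movie A
```

Plain `sorted(pairs)` would also sort by the first element, but it falls back to comparing titles on ties; an explicit key makes the intent clear.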

In [None]:
len(sorted(movie_ratings, key=lambda o:  o[0]))

We can also interpret the embeddings themselves. Each of the top movies has an `n_factors`-long (50-dimensional) factor vector:

In [None]:
movie_emb = to_np(m.i(V(topMovieIdx)))
movie_emb.shape             

# Principal Component Analysis (PCA)

PCA identifies patterns in data by finding the directions along which the data varies the most, which captures the correlations between variables. It projects a d-dimensional dataset onto a k-dimensional subspace (k < d), reducing the number of dimensions, and therefore the computational cost, while retaining as much of the information (variance) as possible.

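We will use PCA to compress the 50 latent factors into 3 components. Note that we fit PCA on the transpose of the embedding matrix, so `components_` has shape `(3, 3000)`: for each component, one score per movie.
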
In [None]:
pca = PCA(n_components=3)
movie_pca = pca.fit(movie_emb.T).components_

In [None]:
movie_pca.shape
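To see what the PCA step computes, here is a from-scratch sketch using an SVD of the centered data, run on random numbers standing in for `movie_emb` (the function name and sizes are hypothetical). Fitting on the transpose, as the notebook does, treats each of the 50 factors as a sample and each movie as a feature, so each principal axis assigns one score per movie:

```python
import numpy as np

def pca_components(X, k):
    """Top-k principal axes of X (rows = samples, columns = features) via SVD."""
    Xc = X - X.mean(axis=0)                  # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = S**2 / (X.shape[0] - 1)  # variance along each axis
    return Vt[:k], explained_var[:k]

rng = np.random.default_rng(0)
toy_movies, toy_factors = 200, 50            # hypothetical sizes
emb = rng.normal(size=(toy_movies, toy_factors))  # stand-in for movie_emb

# Same orientation as the notebook: fit on the transpose (50 x toy_movies)
comps, var = pca_components(emb.T, k=3)
print(comps.shape)  # (3, 200): one score per movie for each component
```

scikit-learn's `PCA` also centers the data and relies on an SVD, so its `components_` should closely match these axes, up to sign.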

In [None]:
fac0 = movie_pca[0]
movie_comp = [(f, movie_names[i]) for f,i in zip(fac0, topMovies)]

Our first component: 

In [None]:
sorted(movie_comp, key=itemgetter(0), reverse=True)[:10]

The movies scoring highest on the first component appear to be more serious films.

In [None]:
sorted(movie_comp, key=itemgetter(0))[:10]

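The movies scoring lowest appear to be easy-going films, so this component seems to capture something like a serious vs. easy-watching axis. (The sign of a principal component is arbitrary, so on another run the two ends may be flipped.)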

Let's grab the next component:

In [None]:
fac1 = movie_pca[1]
movie_comp = [(f, movie_names[i]) for f,i in zip(fac1, topMovies)]

In [None]:
sorted(movie_comp, key=itemgetter(0), reverse=True)[:10]

The movies scoring highest on the second component appear to be CGI-heavy films.

In [None]:
sorted(movie_comp, key=itemgetter(0))[:10]

We can plot a random sample of 50 of the top movies by their scores on the first two components to see where various films fall:

In [None]:
idxs = np.random.choice(len(topMovies), 50, replace=False)
X = fac0[idxs]
Y = fac1[idxs]
plt.figure(figsize=(15,15))
plt.scatter(X,Y)
for i, x, y in zip(topMovies[idxs], X, Y):
    plt.text(x,y,movie_names[i], color=np.random.rand(3)*0.7, fontsize=11)
plt.show()

# Collaborative filtering from scratch

### Dot product example

First we declare two 2×2 tensors. `T` is a fastai helper that creates a PyTorch tensor from a nested list:

In [None]:
a = T([[1,2],[3,4]])
b = T([[2,2],[10,10]])
a,b

Then we multiply them element-wise:

In [None]:
a*b

Summing the element-wise product along dimension 1 (across the columns of each row) gives the dot product of each pair of corresponding rows:

In [None]:
(a*b).sum(1)
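The same arithmetic in NumPy, with each row's dot product written out; `np.einsum` is an alternative spelling of the row-wise dot product. These NumPy arrays stand in for the PyTorch tensors above and are named differently so they don't overwrite `a` and `b`:

```python
import numpy as np

a_np = np.array([[1, 2], [3, 4]])
b_np = np.array([[2, 2], [10, 10]])

rowwise = (a_np * b_np).sum(1)               # [1*2 + 2*2, 3*10 + 4*10]
via_einsum = np.einsum('ij,ij->i', a_np, b_np)  # same row-wise dot product

print(rowwise)  # [ 6 70]
```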

Next we build our own PyTorch module. Custom modules subclass `nn.Module` and define a `forward` method that computes the output (the activations) from the inputs.

In [None]:
class DotProduct(nn.Module):
    def forward(self, u, m): return (u*m).sum(1)

In [None]:
model=DotProduct()

Calling the module like a function invokes its `forward` method:

In [None]:
model(a,b)

Embedding matrices are indexed by row position, so user and movie IDs must be contiguous integers starting at 0. To achieve this, we take the unique user IDs, assign each one a sequential index with `enumerate`, and replace every `userId` in `ratings` using the resulting `user_to_index` mapping. We then do the same for movies.

In [None]:
unique_users = ratings.userId.unique()
user_to_index= {o:i for i,o in enumerate(unique_users)}
ratings.userId = ratings.userId.apply(lambda x: user_to_index[x])


In [None]:
unique_movies = ratings.movieId.unique()
movie_to_index = {o:i for i,o in enumerate(unique_movies)}
ratings.movieId = ratings.movieId.apply(lambda x:movie_to_index[x] )

In [None]:
n_users=int(ratings.userId.nunique())
n_movies=int(ratings.movieId.nunique())
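Here is the same remapping on a made-up array of IDs, using plain Python and NumPy. `dict.fromkeys` keeps the IDs in order of first appearance, matching the order that pandas' `unique()` returns:

```python
import numpy as np

# Made-up raw IDs with gaps, standing in for ratings.userId
raw_ids = np.array([7, 42, 7, 1003, 42])

# Unique IDs in order of first appearance, then a sequential index for each
unique_ids = list(dict.fromkeys(raw_ids.tolist()))
id_to_index = {o: i for i, o in enumerate(unique_ids)}
contiguous = np.array([id_to_index[x] for x in raw_ids.tolist()])

print(contiguous)       # [0 1 0 2 1]
print(len(unique_ids))  # 3 -> number of rows the embedding matrix needs
```

pandas' `pd.factorize` produces the same codes (and the array of unique values) in a single call.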

# Creating the module

We will create a module that looks up the factors for each user and movie in their embedding matrices and then takes their dot product.

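In `EmbeddingDot` we create one embedding matrix for users and one for movies, and initialize their weights uniformly between 0 and 0.05. The forward pass receives the categorical variables (`cats`, holding user and movie indices) and the continuous variables (`conts`, unused here).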

In [None]:
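class EmbeddingDot(nn.Module):
    def __init__(self, n_users, n_movies):
        super().__init__()
        # One n_factors-long vector per user and per movie
        self.u = nn.Embedding(n_users, n_factors)
        self.m = nn.Embedding(n_movies, n_factors)
        # Start with small random weights
        self.u.weight.data.uniform_(0,0.05)
        self.m.weight.data.uniform_(0,0.05)
        
    def forward(self, cats, conts):
        # cats[:,0] holds user indices, cats[:,1] movie indices; conts is unused
        users,movies = cats[:,0],cats[:,1]
        u,m = self.u(users),self.m(movies)
        # Row-wise dot product, reshaped to a (batch, 1) column
        return (u*m).sum(1).view(-1,1)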

We set up our inputs: `x` contains every column except the rating and timestamp (that is, `userId` and `movieId`), while `y` is the rating we want to predict.

In [None]:
x = ratings.drop(['rating','timestamp'],axis=1)
y =ratings['rating'].astype(np.float32)

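Next we build a fastai `ColumnarModelData` object, which creates the training and validation data loaders (batch size 64) and treats `userId` and `movieId` as categorical variables.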

In [None]:
data = ColumnarModelData.from_data_frame(path, val_idxs, x, y,['userId', 'movieId'], 64)

Then we create the model on the GPU, along with an SGD optimizer that uses momentum and weight decay.

In [None]:
wd=1e-5
model = EmbeddingDot(n_users, n_movies).cuda()
opt = optim.SGD(model.parameters(), 1e-1, weight_decay=wd,momentum=0.9)

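fastai's `fit()` function runs a standard PyTorch training loop; here it trains for 3 epochs using the optimizer above and mean squared error as the loss.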

In [None]:
fit(model,data, 3, opt, F.mse_loss)
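To make the training loop concrete, here is a full-batch gradient-descent sketch of the same dot-product model in plain NumPy, trained on synthetic ratings. It is not fastai's `fit()`: it uses no mini-batches or momentum, and the function name and all sizes are hypothetical. It does show the forward pass, the MSE loss, the gradients, and the weight-decayed parameter update that each training step performs:

```python
import numpy as np

def train_toy_mf(n_users=20, n_movies=15, n_factors=5, n_ratings=300,
                 lr=0.2, wd=1e-5, epochs=2000, seed=0):
    """Full-batch gradient descent on a dot-product model with MSE loss."""
    rng = np.random.default_rng(seed)
    # Synthetic ratings generated from random "true" factors
    true_u = rng.normal(size=(n_users, n_factors))
    true_m = rng.normal(size=(n_movies, n_factors))
    users = rng.integers(0, n_users, size=n_ratings)
    movies = rng.integers(0, n_movies, size=n_ratings)
    targets = (true_u[users] * true_m[movies]).sum(1)

    # Parameters initialised like EmbeddingDot: uniform in [0, 0.05)
    U = rng.uniform(0, 0.05, size=(n_users, n_factors))
    M = rng.uniform(0, 0.05, size=(n_movies, n_factors))

    losses = []
    for _ in range(epochs):
        preds = (U[users] * M[movies]).sum(1)    # forward pass
        err = preds - targets
        losses.append(float(np.mean(err ** 2)))  # MSE loss
        # Backward pass: gradient of the MSE with respect to each used row
        g = 2 * err[:, None] / n_ratings
        gU = np.zeros_like(U)
        gM = np.zeros_like(M)
        np.add.at(gU, users, g * M[movies])      # rows can repeat, so accumulate
        np.add.at(gM, movies, g * U[users])
        # Gradient step; weight decay adds wd * param to each gradient
        U -= lr * (gU + wd * U)
        M -= lr * (gM + wd * M)
    return losses

losses = train_toy_mf()
print(losses[0], losses[-1])
```

Wrapping everything in a function keeps the toy variables from overwriting `ratings`, `n_users` and `n_factors` in the notebook.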

Since the loss is still high, we lower the learning rate to 0.01 (a simple, manual form of learning rate annealing) and train for 3 more epochs.

In [None]:
set_lrs(opt, 0.01)

In [None]:
fit(model,data, 3, opt, F.mse_loss)

# Bias

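We need bias terms because some users rate every movie low (or high), and some movies are rated highly by almost everyone; a dot product of latent factors alone does not capture these per-user and per-movie offsets well. The new model therefore adds a bias per user and per movie. It also differs in that it uses a convenience function, `get_emb`, to create the embeddings, and it scales the output of the forward pass into the range of valid ratings.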

In [None]:
min_rating, max_rating =ratings.rating.min(), ratings.rating.max()
min_rating, max_rating

What is going on here? 

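1. `get_emb(ni, nf)` creates an embedding matrix with `ni` rows (one per user or movie) and `nf` columns, initialized uniformly in [-0.01, 0.01].
2. The user and movie factor matrices and the user and movie bias vectors (embeddings with a single column) are created with `get_emb`.
3. In `forward`, we take the dot product of the user and movie factors, add both biases, and pass the result through a sigmoid scaled to the range [`min_rating`, `max_rating`].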

In [None]:
# 1
def get_emb(ni,nf):
    e = nn.Embedding(ni, nf)
    e.weight.data.uniform_(-0.01,0.01)
    return e

class EmbeddingDotBias(nn.Module):
    def __init__(self, n_users, n_movies):
        super().__init__()
        # 2
        (self.u, self.m, self.ub, self.mb) = [get_emb(*o) for o in [
            (n_users, n_factors), (n_movies, n_factors), (n_users,1),(n_movies,1)
        ]]
        
        # 3
    def forward(self, cats, conts):
        users,movies = cats[:,0],cats[:,1]
        um = (self.u(users)*self.m(movies)).sum(1)
        res = um + self.ub(users).squeeze() + self.mb(movies).squeeze()
        res = F.sigmoid(res) * (max_rating-min_rating) + min_rating
        return res.view(-1,1)
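The forward pass of `EmbeddingDotBias` can be sketched in NumPy as follows. The weights are random, the sizes are hypothetical, and `lo=0.5`, `hi=5.0` are assumed rating bounds standing in for `min_rating` and `max_rating`:

```python
import numpy as np

def scaled_sigmoid(x, lo, hi):
    """Squash any real number into the range (lo, hi)."""
    return 1.0 / (1.0 + np.exp(-x)) * (hi - lo) + lo

def dot_bias_forward(U, M, ub, mb, users, movies, lo, hi):
    """NumPy sketch of EmbeddingDotBias.forward (no autograd)."""
    um = (U[users] * M[movies]).sum(1)                         # dot product of factors
    res = um + ub[users].squeeze(1) + mb[movies].squeeze(1)    # add user and movie biases
    return scaled_sigmoid(res, lo, hi).reshape(-1, 1)          # (batch, 1) column

rng = np.random.default_rng(0)
# Hypothetical sizes; weights drawn like get_emb: uniform in [-0.01, 0.01)
U, M = rng.uniform(-0.01, 0.01, (4, 3)), rng.uniform(-0.01, 0.01, (6, 3))
ub, mb = rng.uniform(-0.01, 0.01, (4, 1)), rng.uniform(-0.01, 0.01, (6, 1))

out = dot_bias_forward(U, M, ub, mb, np.array([0, 3]), np.array([5, 1]), lo=0.5, hi=5.0)
print(out.shape)  # (2, 1)
```

Because the weights start near zero, the sigmoid outputs about 0.5 and every initial prediction lands near the middle of the rating range (2.75 for these bounds).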

In [None]:
wd=2e-4
model = EmbeddingDotBias(cf.n_users, cf.n_items).cuda()
opt = optim.SGD(model.parameters(), 1e-1, weight_decay=wd, momentum=0.9)

In [None]:
fit(model, data, 3, opt, F.mse_loss)

## Mini Neural Net

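Instead of taking a dot product, we concatenate the user and movie embeddings and feed them into a small neural network: a linear layer with a ReLU, followed by a second linear layer that outputs a single value, with dropout applied to the embeddings and to the hidden layer.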

In [None]:
class EmbeddingNet(nn.Module):
    def __init__(self, n_users, n_movies, nh=10, p1=0.05,p2=0.5):
        super().__init__()
        (self.u, self.m) = [get_emb(*o) for o in [
            (n_users, n_factors), (n_movies, n_factors)
        ]]
        self.lin1 = nn.Linear(n_factors*2, nh)
        self.lin2 = nn.Linear(nh,1)
        self.drop1 = nn.Dropout(p1)
        self.drop2 = nn.Dropout(p2)
        
    def forward(self, cats, conts):
        users,movies = cats[:,0],cats[:,1]
        x = self.drop1(torch.cat([self.u(users), self.m(movies)], dim=1))
        x = self.drop2(F.relu(self.lin1(x)))
        return F.sigmoid(self.lin2(x)) * (max_rating-min_rating+1) + min_rating-0.5
    

In [None]:
wd=1e-5
model = EmbeddingNet(n_users, n_movies).cuda()
opt = optim.Adam(model.parameters(), 1e-3, weight_decay=wd)

In [None]:
fit(model, data, 3, opt, F.mse_loss)
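Here is `EmbeddingNet`'s forward pass sketched in NumPy in evaluation mode (dropout disabled). The sizes and weights are hypothetical; note that NumPy's `x @ W1` uses an `(in, out)` weight layout, whereas `nn.Linear` stores its weight as `(out, in)`. The final line mirrors the notebook's scaling, which maps the sigmoid output to `(min_rating - 0.5, max_rating + 0.5)`; `lo=0.5` and `hi=5.0` are assumed bounds:

```python
import numpy as np

def embedding_net_forward(u_emb, m_emb, W1, b1, W2, b2, lo, hi):
    """NumPy sketch of EmbeddingNet.forward in eval mode (dropout disabled)."""
    x = np.concatenate([u_emb, m_emb], axis=1)  # (batch, 2 * n_factors)
    h = np.maximum(0.0, x @ W1 + b1)            # lin1 + ReLU -> (batch, nh)
    z = h @ W2 + b2                             # lin2 -> (batch, 1)
    # Widen the range by 0.5 on each side, as in the notebook's model
    return 1.0 / (1.0 + np.exp(-z)) * (hi - lo + 1) + lo - 0.5

rng = np.random.default_rng(0)
batch, n_fac, nh = 4, 50, 10                    # hypothetical sizes
out = embedding_net_forward(
    rng.normal(size=(batch, n_fac)), rng.normal(size=(batch, n_fac)),
    rng.normal(size=(2 * n_fac, nh)) * 0.1, np.zeros(nh),
    rng.normal(size=(nh, 1)) * 0.1, np.zeros(1),
    lo=0.5, hi=5.0,
)
print(out.shape)  # (4, 1)
```

Since a sigmoid never reaches exactly 0 or 1, widening the range by half a point on each side lets the network predict the minimum and maximum ratings without needing infinitely large activations.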