# Model-based CF

In this step we implement PureSVD, a model-based Collaborative Filtering method.
The steps are:

- Read the train file extracted from the dataset
- Build a sparse ratings matrix from it
- Extract the singular values and singular vectors of this matrix via SVD
- Combine these latent factors to predict a user-item rating
- Recommend the items with the highest score

In [1]:
# import libs
import operator
import scipy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from scipy.sparse import csr_matrix
from collections import OrderedDict
from scipy.sparse import linalg 

# useful command
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

plt.rcParams.update({'font.size': 14})

## Reading train and test files

You can read these files however you prefer. I suggest reading them with pandas and building the sparse matrix afterwards.

In [2]:
# a multi-character separator such as '::' needs the python parsing engine
df_train = pd.read_csv('../Dataset/ML-1M/trainSet.txt', sep='::', engine='python', names=['userId', 'itemId', 'rating', 'timestamp'])
df_test = pd.read_csv('../Dataset/ML-1M/testSet.txt', sep='::', engine='python', names=['userId', 'itemId', 'rating', 'timestamp'])

df_train.head()
df_test.head()


Unnamed: 0,userId,itemId,rating,timestamp
0,1,1193,5.0,978300760.0
1,1,661,3.0,978302109.0
2,1,914,3.0,978301968.0
3,1,3408,4.0,978300275.0
4,1,1197,3.0,978302268.0


Unnamed: 0,userId,itemId,rating,timestamp
0,1,2355,5.0,978824291.0
1,1,595,5.0,978824268.0
2,1,2687,3.0,978824268.0
3,1,48,5.0,978824351.0
4,1,745,3.0,978824268.0
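
As a minimal sketch of the reading step, the snippet below parses a few in-memory lines in the `userId::itemId::rating::timestamp` layout. The sample rows and the name `df_sample` are assumptions made for illustration; they do not come from the real train file.

```python
import io
import pandas as pd

# Hypothetical sample in the "userId::itemId::rating::timestamp" layout
sample = io.StringIO(
    "1::1193::5::978300760\n"
    "1::661::3::978302109\n"
    "2::914::3::978301968\n"
)

# A separator longer than one character is treated as a regular expression,
# which only the python engine supports, so it is requested explicitly.
df_sample = pd.read_csv(sample, sep='::', engine='python',
                        names=['userId', 'itemId', 'rating', 'timestamp'])

print(df_sample.shape)
```

Requesting the python engine explicitly avoids the parser warning that pandas emits when it has to fall back from the C engine.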


## Creating Sparse Matrix

I suggest using csr_matrix from scipy.sparse.

In [3]:
# Select users, items and ratings logs (i.e., all information from each column)
users = df_train['userId']
items = df_train['itemId']
ratings = df_train['rating']

In [4]:
# Define the matrix dimensions based on the max index related to users and items
nb_users = max(users)
nb_items = max(items)

In [5]:
# Creating matrix of ratings
ratings_matrix = csr_matrix((ratings, (users, items)), shape=(nb_users+1, nb_items+1))

ratings_matrix.shape

(6041, 3953)
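
To make the `+1` in the shape concrete, here is a small sketch on a toy log. The arrays `toy_users`, `toy_items` and `toy_ratings` are hypothetical; the point is only that ids starting at 1 can be used directly as indices when one extra row and column are allocated.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy log: ids start at 1, as in the train file
toy_users = np.array([1, 1, 2, 3])
toy_items = np.array([2, 4, 2, 1])
toy_ratings = np.array([5.0, 3.0, 4.0, 2.0])

# One extra row and column so that an id can be used directly as an index
toy_matrix = csr_matrix((toy_ratings, (toy_users, toy_items)),
                        shape=(toy_users.max() + 1, toy_items.max() + 1))

print(toy_matrix.shape)    # (4, 5)
print(toy_matrix[1, 2])    # 5.0
print(toy_matrix.nnz)      # 4 stored ratings
print(toy_matrix[0].nnz)   # 0, row 0 is never used
```

Row 0 and column 0 stay empty, which is why the matrix above has 6041 rows for 6040 user ids.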

## A useful function

This function saves the recommendations to a file.

In [6]:
def dumpRecommendation(recommendation, users_targets, file_name):
    # one line per target user: "<userId>\t[<itemId>:0.0,<itemId>:0.0,...]"
    with open(file_name, 'w') as file_out:
        for userId in users_targets:
            # items are written in ranked order; the score field is a 0.0 placeholder
            issuedItems = ",".join(str(itemId) + ":" + str(0.0) for itemId in recommendation[userId])
            # joining avoids cutting the opening bracket when a user has no items
            file_out.write(str(userId) + "\t" + "[" + issuedItems + "]" + "\n")
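
The layout of one output line can be sketched as follows. `format_recommendation_line` is a hypothetical helper introduced here only to show the format; it is not part of the notebook.

```python
def format_recommendation_line(user_id, item_ids):
    # Hypothetical helper: builds "<userId>\t[<itemId>:0.0,...]" for one user
    issued_items = ",".join(f"{item_id}:0.0" for item_id in item_ids)
    return f"{user_id}\t[{issued_items}]"

line = format_recommendation_line(300, [260, 527, 1210])
print(repr(line))  # '300\t[260:0.0,527:0.0,1210:0.0]'
```

Each item is followed by a constant `0.0`, so only the order of the items carries the ranking.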

## PureSVD Recommendation

In the PureSVD model, the prediction is based on the latent factors extracted via SVD.

- Given a ratings matrix *(m x n)*, we apply a truncated SVD to extract three matrices:
    - *U* represents the user factors *(m x f)*
    - *S* holds the singular values associated with each pair of singular vectors *(f x f, diagonal)*
    - *Q* represents the item factors *(f x n)*


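- The prediction for user *u* and item *i* is computed as:
$$
\widehat{r}_{ui} = r_u \cdot Q^T \cdot q_i
$$
where $r_u$ is the row of the ratings matrix for user *u* *(1 x n)* and $q_i$ is the column of *Q* for item *i* *(f x 1)*.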
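
The formula can be checked on a small example. The sketch below uses a hypothetical 4 x 5 ratings matrix and the dense `np.linalg.svd` for illustration only (the notebook itself uses the sparse solver); it compares the single prediction $r_u \cdot Q^T \cdot q_i$ with the matching entry of the full product $R \cdot Q^T \cdot Q$.

```python
import numpy as np

# Hypothetical ratings matrix: 4 users x 5 items (0 = not rated)
R = np.array([[5., 3., 0., 1., 0.],
              [4., 0., 0., 1., 2.],
              [1., 1., 0., 5., 4.],
              [0., 1., 5., 4., 0.]])

f = 2  # number of latent factors kept

# Dense SVD, singular values in decreasing order; rows of Vt are item factors
_, _, Vt = np.linalg.svd(R, full_matrices=False)
Q = Vt[:f, :]                 # shape (f x n)

u, i = 0, 2
r_u = R[u, :]                 # ratings of user u, shape (n,)
q_i = Q[:, i]                 # factors of item i, shape (f,)
r_hat_ui = r_u @ Q.T @ q_i    # single prediction

# The same value read from the full prediction matrix
R_hat = R @ Q.T @ Q
print(np.isclose(r_hat_ui, R_hat[u, i]))
```

Computing the whole matrix at once replaces one small product per item with a single matrix product.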

### Extracting users and items latent factors

In [7]:
numFactors = 10

In [8]:
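# svds expects a floating-point matrix; S is returned as a 1-D array of singular values
U, S, Q = scipy.sparse.linalg.svds(ratings_matrix.astype(float), k=numFactors, return_singular_vectors=True)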

In [9]:
U.shape
S.shape
Q.shape

(6041, 10)

(10,)

(10, 3953)
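
A short sketch of how the three outputs relate, on a hypothetical random 8 x 6 matrix: since the rows of `Q` are right singular vectors of the ratings matrix, projecting the ratings onto them (`R Q^T Q`) gives the same result as the rank-k reconstruction `U diag(S) Q`. The matrix, the seed and `k = 3` are assumptions chosen for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)

# Hypothetical small ratings matrix: 8 users x 6 items, roughly half unrated
dense = rng.integers(1, 6, size=(8, 6)).astype(float)
dense[rng.random((8, 6)) < 0.5] = 0.0
R = csr_matrix(dense)

k = 3
U, S, Q = svds(R, k=k)            # U: (8, k), S: (k,), Q: (k, 6)

low_rank = (U * S) @ Q            # U diag(S) Q, scaling the columns of U by S
projected = R.dot(Q.T).dot(Q)     # R Q^T Q, as used for the predictions

print(U.shape, S.shape, Q.shape)
print(np.allclose(low_rank, projected, atol=1e-6))
```

This is why the prediction step only needs `Q`: the user factors and singular values are implied by the product of the ratings matrix with the item factors.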

### Predicting items ratings

In [10]:
# the predictions are dense, so a numpy array is used (row assignment into a csr_matrix is slow)
prediction_matrix = np.zeros((nb_users+1, nb_items+1))

prediction_matrix.shape

(6041, 3953)

In [None]:
# Compute the predictions user by user
Q_t = Q.transpose()

for u in range(ratings_matrix.shape[0]):
    r_u = ratings_matrix[u,:]
    # optimization: instead of one product per item vector 'q_i', multiply by the whole Q
    aux = r_u.dot(Q_t)
    prediction_matrix[u,:] = aux.dot(Q)
    
prediction_matrix.shape

In [12]:
# Optimized way: a single matrix product for all users
Q_t = Q.transpose()

aux_matrix = ratings_matrix.dot(Q_t)
prediction_matrix = aux_matrix.dot(Q)

In [13]:
prediction_matrix.shape

(6041, 3953)

### Recommending items

For each user, the items are ranked by predicted score and the top-k items that the user has not rated yet are recommended.

In [14]:
# Size of each recommendation
top_k = 10

In [15]:
# Setting the recommendations for each user
recommendation = {}

for u in range(ratings_matrix.shape[0]):
    recommendation[u] = []
    cont = 0
    # sorting items by decreasing predicted score
    order = np.argsort(prediction_matrix[u,:])[::-1]
    # recommending the best items that the user has not rated yet
    for i in order:
        # recommending the top-k items 
        if (cont < top_k):
            if (ratings_matrix[u,i] == 0):
                recommendation[u].append(i)
                cont += 1
        else:
            break
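
The same selection can also be written without the inner loop. The sketch below is one possible variant, not the notebook's implementation: `top_k_unseen` is a hypothetical helper that masks already rated items with `-inf` before sorting, so they can never enter the top-k.

```python
import numpy as np

def top_k_unseen(scores, seen_mask, k):
    # Hypothetical helper: indices of the k best-scored items not yet rated
    masked = np.where(seen_mask, -np.inf, scores)   # rated items sink to the end
    order = np.argsort(-masked)                     # decreasing score
    n_unseen = int((~seen_mask).sum())
    return order[:min(k, n_unseen)]

scores = np.array([0.1, 2.5, 0.7, 3.0, 1.2])
seen = np.array([False, False, False, True, False])

print(top_k_unseen(scores, seen, 3))   # [1 4 2]
```

Item 3 has the highest score but is skipped because the user has already rated it.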

In [16]:
# Save in a file
users_targets = df_test['userId'].unique()
dumpRecommendation(recommendation, users_targets, "../Recommendation/ML-1M/recList_PureSVD.txt")

In [17]:
recommendation[300]
recommendation[3000]
recommendation[6010]

[260, 527, 1210, 1097, 1304, 1961, 1220, 1208, 1222, 296]

[2628, 1127, 750, 1527, 2396, 2028, 780, 2985, 1, 1372]

[589, 2791, 1079, 480, 3256, 2797, 1200, 2571, 2947, 1968]