<a href="https://colab.research.google.com/github/ihagoSantos/recommendation-systems/blob/main/cf_model_based_ipynb_pure_svd.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Model-based CF
In this step we will implement the PureSVD, one approach of the Collaborative Filtering methods. For this reason, you must do:
- Read the train file extracted from the dataset
- Create a sparse matrix to them
- Extract the eighnvalues and eigenvectors of the matrix via SVD
- Combine these latent factos to predict an user-item rating
- Recommend the items with the highest score

In [1]:
# import libs
import operator
import scipy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from scipy.sparse import csr_matrix
from collections import OrderedDict
from scipy.sparse import linalg

# useful command
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

plt.rcParams.update({'font.size': 14})

# Reading train and test files

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
base_url='/content/drive/My Drive/Pós Graduação PUC Minas/11 - Sistemas de Recomendação/Unidade 1/praticas/dataset/ML-1M'
train_test_names = ['userId', 'itemId', 'rating', 'timestamp']
df_train = pd.read_csv(
    base_url + '/trainSet.txt',
    sep = '::',
    engine = 'python',
    names=train_test_names
)
df_test = pd.read_csv(
    base_url + '/testSet.txt',
    sep='::',
    engine='python',
    names=train_test_names
)

df_train.head()
df_test.head()

Unnamed: 0,userId,itemId,rating,timestamp
0,1,1193,5.0,978300760.0
1,1,661,3.0,978302109.0
2,1,914,3.0,978301968.0
3,1,3408,4.0,978300275.0
4,1,1197,3.0,978302268.0


Unnamed: 0,userId,itemId,rating,timestamp
0,1,2355,5.0,978824291.0
1,1,595,5.0,978824268.0
2,1,2687,3.0,978824268.0
3,1,48,5.0,978824351.0
4,1,745,3.0,978824268.0


# Creating Sparse Matrix


In [5]:
# Select users, items and ratings logs
users = df_train['userId']
items = df_train['itemId']
ratings = df_train['rating']

In [6]:
# Define the matrix dimensions based on the max index related to users and items
nb_users = max(users)
nb_items = max(items)

In [7]:
# Creating matrix of ratings
ratings_matrix = csr_matrix((ratings, (users, items)), shape=(nb_users+1, nb_items+1))
ratings_matrix.shape

(6041, 3953)

# Useful Function
This function is used to save the recommendations in a file.

In [32]:
def dumpRecommendation(recommendation, users_targets, file_name):
  file_out = open(file_name, 'w')

  for userId in users_targets:
    issuedItems = ''

    for itemId in recommendation[userId]:
      issuedItems += str(itemId) + ":" + str(0.0) + ","
    #saving in file in correct format
    string_s = str(userId) + '\t' + '[' + issuedItems
    string_out = string_s[:-1] + ']'
    file_out.write(string_out + '\n')
  file_out.close()

# PureSVD Recommendation
In PureSVD Model, the prediction is based on the latest factors extracted via SVD.

- Let a ratings matrix, we apply the SVD to extract three matrices:
  - U represents the users factors (m x f)
  - S the eigenvalues associated to each eigenvacor (f x f)
  - Q represents the items factors (f x n)
- The prediction is similar to: $$ \hat{r}_{ui} = r_{u} . Q^{T} . q_{i} $$

## Extracting users and items latent factors
Define the number of latent factors and use it to run the SVD method.

In [11]:
numFactors = 10

In [12]:
[U, S, Q_t] = scipy.sparse.linalg.svds(ratings_matrix, numFactors, return_singular_vectors=True)

U.shape
S.shape
Q_t.shape

(6041, 10)

(10,)

(10, 3953)

## Predicting items ratings
Predict ratings for each user-item based on the PureSVD rules.

In [16]:
prediction_matrix = csr_matrix((nb_users + 1, nb_items + 1))
prediction_matrix.shape

(6041, 3953)

In [None]:
# Realize a prediction for each user
Q = Q_t.transpose() # n x 10

for u in range(ratings_matrix.shape[0]):
  r_u = ratings_matrix[u, :] # 1 x n
  # optimization: instead to do 'q_i'
  aux = r_u.dot(Q) # 1 x 10
  prediction_matrix[u,:] = aux.dot(Q_t) # 10 x 3953 => 1 x 3953

In [18]:
# Optimized way
prediction_matrix = csr_matrix((nb_users + 1, nb_items + 1))

Q = Q_t.transpose() # n x 10

aux_matrix = ratings_matrix.dot(Q) # (6041 x 3953) x (3953 X 10) => (6041 X 10)
prediction_matrix = aux_matrix.dot(Q_t) # (6041 X 10) x (10 X 3953) => (6041 x 3953)

In [19]:
prediction_matrix.shape

(6041, 3953)

## Recommending Items
The recommendation is related to the cosine similarity of users and items vectors.

In [21]:
# Size of each recommendation
top_k = 10

In [25]:
# Setting the recommendations for each user
recommendation = {}

for u in range(ratings_matrix.shape[0]):
  recommendation[u] = []
  cont = 0
  # sorting items by relevance
  order = np.argsort(prediction_matrix[u,:])[::-1]
  # Recommending the best items that have never seen by users
  for i in order:
    # recommending the top-k items
    if(cont < top_k):
      if(ratings_matrix[u,1] == 0):
        recommendation[u].append(i)
        cont += 1
      else:
        break

In [33]:
# Save in file
users_targets = df_test['userId'].unique()
dumpRecommendation(recommendation, users_targets, 'recList_PureSVD.txt')

In [34]:
recommendation[300]
recommendation[3000]
recommendation[6010]

[]

[260, 1196, 1210, 2571, 589, 1270, 1240, 541, 1580, 1197]

[2000, 1198, 1036, 1291, 1197, 1610, 1220, 1196, 1210, 2406]