## Collaborative filtering implemented via a Naive Bayes model 

Written by Nicholas Fasano
Last edited: 10/17/2023

Description: Implement collaborative filtering using a Naive Bayes model (with Laplace smoothing) to impute the missing values in a small test ratings matrix. The algorithm is implemented with SciPy's sparse matrices in anticipation of scaling this model to the MovieLens dataset, which is extremely sparse.
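
For reference, the decision rule that the code in this notebook appears to implement can be written as follows. The notation is an assumption introduced here, not taken from the original: $r_{uj}$ is the rating of user $u$ for item $j$ (with $0$ meaning unrated), $S$ is the set of allowed ratings, $I_u$ is the set of items rated by user $u$, and $\alpha$ is the Laplace smoothing parameter.

$$
\hat{r}_{uj} = \arg\max_{s \in S} \; P(r_{uj} = s) \prod_{k \in I_u} P(r_{uk} \mid r_{uj} = s)
$$

$$
P(r_{uj} = s) = \frac{|\{v : r_{vj} = s\}| + \alpha}{|\{v : r_{vj} \neq 0\}| + |S|\,\alpha},
\qquad
P(r_{uk} \mid r_{uj} = s) = \frac{|\{v : r_{vj} = s,\; r_{vk} = r_{uk}\}| + \alpha}{|\{v : r_{vj} = s,\; r_{vk} \neq 0\}| + |S|\,\alpha}
$$

With $\alpha = 0$ these reduce to plain frequency ratios. Any $\alpha > 0$ pulls each estimate toward the uniform value $1/|S|$ and keeps every denominator strictly positive.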

In [1]:
import numpy as np
import scipy.sparse as sps

In [2]:
# create test ratings matrix of size num_users x num_items (0 marks a missing rating)
R = np.array([[1,-1,1,-1,1,-1],[1,1,0,-1,-1,-1],[0,1,1,-1,-1,0],[-1,-1,-1,1,1,1],[-1,0,-1,1,1,1]])
Rs = sps.coo_array(R)
Rsc = Rs.tocsc()
Rsr = Rs.tocsr()
R

array([[ 1, -1,  1, -1,  1, -1],
       [ 1,  1,  0, -1, -1, -1],
       [ 0,  1,  1, -1, -1,  0],
       [-1, -1, -1,  1,  1,  1],
       [-1,  0, -1,  1,  1,  1]])

In [3]:
# get number of users and items
num_users = R.shape[0]
num_items = R.shape[1]

# get number of nonzero entries in each row and each column 
nnz_entries_per_col = np.diff(Rsc.indptr)
nnz_entries_per_row = np.diff(Rsr.indptr)
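
The counting trick above relies on how compressed sparse formats store their data. The following is a minimal sketch on a made-up 3 x 3 matrix (the names `M`, `M_csc`, and `col` are illustrative and not part of the notebook). It shows that `np.diff(indptr)` gives the number of stored entries per column, and that slicing `indices` and `data` with consecutive `indptr` values recovers the rows and values of a single column. Note that this counts stored entries, so it equals the number of nonzeros only if the matrix holds no explicitly stored zeros.

```python
import numpy as np
import scipy.sparse as sps

# Illustrative matrix: column 1 is empty, columns 0 and 2 hold two ratings each
M = np.array([[1, 0, -1],
              [0, 0,  1],
              [1, 0,  0]])
M_csc = sps.csc_array(M)

# indptr has one more element than there are columns; consecutive
# differences give the number of stored entries in each column
stored_per_col = np.diff(M_csc.indptr)

# Slice out the row indices and values stored for one column
col = 2
start, stop = M_csc.indptr[col], M_csc.indptr[col + 1]
rows_in_col = M_csc.indices[start:stop]
vals_in_col = M_csc.data[start:stop]

print(stored_per_col, rows_in_col, vals_in_col)
```

The same pattern applied to a CSR array gives per-row counts and the columns stored in a row, which is how the rated items of each user are looked up in the next cell.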

In [10]:
# implement Naive Bayes model with Laplace smoothing
alpha = 0  # Laplace smoothing parameter (0 disables smoothing; any alpha > 0 keeps the denominators below nonzero)
scoring_options = [-1, 1]
num_scoring_options = len(scoring_options)
new_movie_id = np.arange(num_items)

# loop over all missing ratings and print the prediction to the screen

# loop over all users 
for juser in range(num_users): 
    juser_rated = Rsr.indices[Rsr.indptr[juser]:Rsr.indptr[juser+1]] 
    to_rate = np.delete(new_movie_id,juser_rated)

    # loop over all items that user did not rate
    for jitem in to_rate: 
        rating_prediction = []

        # compute probability of rating for all possible ratings  
        for jprob in scoring_options:
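            # mask over the stored ratings of item jitem: True where the rating equals jprob
            jaa = Rsc.data[Rsc.indptr[jitem]:Rsc.indptr[jitem+1]] == jprob
            aa = Rsc.indices[Rsc.indptr[jitem]:Rsc.indptr[jitem+1]][jaa]  # all users that rated item jitem as jprob
            # smoothed prior: fraction of jitem's raters that gave the rating jprob
            prior = (np.count_nonzero(jaa) + alpha)/(nnz_entries_per_col[jitem] + num_scoring_options*alpha)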
            Pr = 1

            # loop over items that juser rated   
            for j in juser_rated:                          
                ind_j0 = Rsc.indptr[j]
                ind_j1 = Rsc.indptr[j+1]
                bb = Rsc.indices[ind_j0:ind_j1] # all users that rated item j
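                cc = Rsc.data[ind_j0:ind_j1]    # their ratings of item j
                bp = np.isin(bb, aa, assume_unique=True)  # mask: raters of item j that also rated jitem as jprob
                # among those users, count how many gave item j the same rating as juser did
                matches = np.count_nonzero(cc[bp] == cc[bb == juser])
                Pr = Pr * (matches + alpha) / (np.count_nonzero(bp) + num_scoring_options*alpha)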
            
            rating_prediction.append(prior*Pr)

        print(f'prediction: user {juser} will rate item {jitem} a value of {scoring_options[np.argmax(rating_prediction)]}')

prediction: user 1 will rate item 2 a value of 1
prediction: user 2 will rate item 0 a value of 1
prediction: user 2 will rate item 5 a value of -1
prediction: user 4 will rate item 1 a value of -1
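
As a cross-check on the sparse implementation, the same computation can be sketched on the dense matrix, where boolean masks replace the `indptr` slicing. This is not the notebook's method, only an illustrative re-derivation: the function name `predict_missing_dense` and its arguments are hypothetical, and skipping a factor whose denominator is zero is an assumption made here so that `alpha = 0` does not divide by zero. Items that nobody has rated are not handled.

```python
import numpy as np

def predict_missing_dense(R, scoring_options=(-1, 1), alpha=0.0):
    """Return {(user, item): predicted rating} for every 0 entry of dense R."""
    R = np.asarray(R)
    k = len(scoring_options)
    predictions = {}
    for user, item in zip(*np.nonzero(R == 0)):
        rated_items = np.nonzero(R[user])[0]   # items this user has rated
        raters = R[:, item] != 0               # users who rated the target item
        scores = []
        for s in scoring_options:
            peers = R[:, item] == s            # users who gave the target item rating s
            score = (peers.sum() + alpha) / (raters.sum() + k * alpha)  # smoothed prior
            for j in rated_items:
                co_rated = peers & (R[:, j] != 0)            # peers who also rated item j
                agree = co_rated & (R[:, j] == R[user, j])   # ...with the same rating as this user
                denom = co_rated.sum() + k * alpha
                if denom > 0:  # assumption: skip the factor when there is no evidence
                    score *= (agree.sum() + alpha) / denom
            scores.append(score)
        # ties resolve to the first option, as np.argmax does in the sparse version
        predictions[(int(user), int(item))] = scoring_options[int(np.argmax(scores))]
    return predictions

R = np.array([[1, -1, 1, -1, 1, -1],
              [1, 1, 0, -1, -1, -1],
              [0, 1, 1, -1, -1, 0],
              [-1, -1, -1, 1, 1, 1],
              [-1, 0, -1, 1, 1, 1]])
predictions = predict_missing_dense(R)
print(predictions)
```

On the test matrix with `alpha = 0` this yields the same four predictions as the printed output above. The dense version is only practical for small matrices; for data as sparse as MovieLens the `indptr`-based lookups avoid materializing the full array.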
