# Recommendation Systems

https://www.nvidia.com/en-us/glossary/recommendation-system/

## Collaborative Filtering and Memory-Based Modeling
Collaborative filtering recommends items to a user based on how similar users have rated them. The memory-based variant used here computes predictions directly from the stored user-item rating matrix instead of fitting a separate model.


## Step 1: Import Required Libraries and Load the Dataset

- Import the pandas and NumPy libraries
- Load the dataset using pandas


In [4]:
import pandas as pd
import numpy as np

In [5]:
header =['user_id', 'item_id', 'rating', 'timestamp']

df = pd.read_csv('../../Datasets/u.data', sep='\t', names=header)
df.head()

| | user_id | item_id | rating | timestamp |
|---|---|---|---|---|
| 0 | 196 | 242 | 3 | 881250949 |
| 1 | 186 | 302 | 3 | 891717742 |
| 2 | 22 | 377 | 1 | 878887116 |
| 3 | 244 | 51 | 2 | 880606923 |
| 4 | 166 | 346 | 1 | 886397596 |


In [13]:
df.describe()

| | user_id | item_id | rating | timestamp |
|---|---|---|---|---|
| count | 100000.0 | 100000.0 | 100000.0 | 100000.0 |
| mean | 462.48475 | 425.53013 | 3.52986 | 883528900.0 |
| std | 266.61442 | 330.798356 | 1.125674 | 5343856.0 |
| min | 1.0 | 1.0 | 1.0 | 874724700.0 |
| 25% | 254.0 | 175.0 | 3.0 | 879448700.0 |
| 50% | 447.0 | 322.0 | 4.0 | 882826900.0 |
| 75% | 682.0 | 631.0 | 4.0 | 888260000.0 |
| max | 943.0 | 1682.0 | 5.0 | 893286600.0 |


In [14]:
df['rating'].value_counts()

rating
4    34174
3    27145
5    21201
2    11370
1     6110
Name: count, dtype: int64

__Observations:__
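- The file has no header row, so the column names are supplied through `names=header`.
- Each row holds a user_id, an item_id, a rating from 1 to 5, and a timestamp.
- The dataset contains 100,000 ratings, and 4 is the most common rating.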

## Step 2: Count the Unique Users and Items

- Count the unique user IDs and the unique item IDs; these counts set the dimensions of the user-item matrix




In [16]:
# Count unique users and unique items
n_users = df['user_id'].nunique()
n_items = df['item_id'].nunique()
print(f'number of users = {n_users} | number of items = {n_items}')

number of users = 943 | number of items = 1682


__Observation:__
- There are 943 users and 1682 items.
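
These counts also show how sparse the rating data is. The short sketch below hard-codes the figures reported above (943 users, 1682 items, 100,000 ratings) and computes the share of user-item pairs that actually have a rating; the variable names `density` and `sparsity` are illustrative.

```python
# Figures taken from the outputs above
n_users = 943
n_items = 1682
n_ratings = 100_000

# Fraction of user-item cells that actually hold a rating
density = n_ratings / (n_users * n_items)
sparsity = 1.0 - density

print(f"density = {density:.2%} | sparsity = {sparsity:.2%}")
```

Only about 6.3% of the cells are filled, so most entries of the user-item matrix built in the later steps will be zeros.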

## Step 3: Split the Data into Train and Test Sets

- Import train_test_split from sklearn.model_selection
- Split the data into train and test sets


In [7]:
# Hold out part of the ratings so the predictions can be evaluated later

from sklearn.model_selection import train_test_split

# 75% of the ratings go to the training set, 25% to the test set
train_data, test_data = train_test_split(df, test_size=0.25)
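
The split above is random, so the matrices and RMSE values change from run to run. A minimal sketch of a reproducible split, assuming a small hypothetical frame `toy` in place of `df` and an arbitrary seed of 42:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy ratings frame standing in for df (hypothetical values)
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "item_id": [1, 2, 1, 3, 2, 3, 1, 2],
    "rating":  [5, 3, 4, 2, 1, 5, 3, 4],
})

# random_state fixes the shuffle, so repeated runs give the same split
train_a, test_a = train_test_split(toy, test_size=0.25, random_state=42)
train_b, test_b = train_test_split(toy, test_size=0.25, random_state=42)

print(len(train_a), len(test_a))
print(train_a.index.equals(train_b.index))
```

The outputs shown in this notebook come from an unseeded run, so a seeded split will not reproduce them exactly.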

## Step 4: Create a Matrix for Train and Test Sets

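- Create one user-item matrix for the train set and one for the test set
- Each matrix has shape (n_users, n_items), so it is not square; unrated cells stay at 0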

In [15]:
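# Rows are users, columns are items; 0 means "no rating"
train_data_mat = np.zeros((n_users, n_items))

# itertuples yields (index, user_id, item_id, rating, timestamp);
# IDs start at 1, so subtract 1 to get zero-based positions
for line in train_data.itertuples():
    train_data_mat[line[1]-1, line[2]-1] = line[3]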


In [None]:
                      
# Same construction for the held-out test ratings
test_data_mat = np.zeros((n_users, n_items))

for line in test_data.itertuples():
    test_data_mat[line[1]-1, line[2]-1] = line[3]

In [17]:
train_data_mat

array([[5., 3., 0., ..., 0., 0., 0.],
       [4., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [18]:
train_data_mat.shape

(943, 1682)

__Observation:__
- Here, we have created user-item matrices for the train and test sets by writing each rating into the cell given by its user_id (row) and item_id (column).
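
The same matrix can be filled without a Python loop by using NumPy integer-array indexing. The sketch below uses a hypothetical four-row frame `toy` and assumes each (user, item) pair appears at most once, which is what the loop version also relies on.

```python
import numpy as np
import pandas as pd

# Toy ratings (hypothetical values); IDs start at 1 as in the dataset
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "item_id": [1, 3, 2, 3],
    "rating":  [5, 2, 4, 1],
})
n_users, n_items = 3, 3

# Loop version, as in the cells above
loop_mat = np.zeros((n_users, n_items))
for row in toy.itertuples(index=False):
    loop_mat[row.user_id - 1, row.item_id - 1] = row.rating

# Vectorized version: index rows and columns with whole arrays at once
vec_mat = np.zeros((n_users, n_items))
rows = toy["user_id"].to_numpy() - 1
cols = toy["item_id"].to_numpy() - 1
vec_mat[rows, cols] = toy["rating"].to_numpy()

print(np.array_equal(loop_mat, vec_mat))
```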


## Step 5: Calculate Similarity Matrices for Users and Items

- Import pairwise_distances from sklearn.metrics.pairwise
- Calculate similarity matrices for users and items


In [19]:
from sklearn.metrics.pairwise import pairwise_distances

# pairwise_distances returns distances, not similarities:
# a larger value means two rows are less alike.
# Common metrics include cosine distance and Euclidean distance.


# The default metric is Euclidean; compute it between users
# by comparing all pairs of rows
user_sim = pairwise_distances(train_data_mat)

# Transpose the matrix to compare items (columns) instead
item_sim = pairwise_distances(train_data_mat.T)

user_sim

array([[ 0.        , 56.11595139, 56.33826408, ..., 54.73572873,
        59.56509045, 57.06137047],
       [56.11595139,  0.        , 29.51270913, ..., 28.96549672,
        39.33192088, 45.22167622],
       [56.33826408, 29.51270913,  0.        , ..., 24.37211521,
        38.28837944, 43.45112196],
       ...,
       [54.73572873, 28.96549672, 24.37211521, ...,  0.        ,
        37.17526059, 41.95235393],
       [59.56509045, 39.33192088, 38.28837944, ..., 37.17526059,
         0.        , 46.36809248],
       [57.06137047, 45.22167622, 43.45112196, ..., 41.95235393,
        46.36809248,  0.        ]])
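
The values above are Euclidean distances, so the diagonal is 0 and larger numbers mean less similar users. One common alternative, sketched here on a hypothetical toy matrix `toy_mat`, is to compute cosine distance and convert it to a similarity with `1 - distance`, so that more similar rows receive larger weights. This is an illustration under those assumptions, not the computation used for the outputs in this notebook.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

# Toy user-item matrix (hypothetical values): 3 users, 4 items, 0 = unrated
toy_mat = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 5.0, 4.0],
])

# Cosine distance lies in [0, 1] for non-negative ratings
cosine_dist = pairwise_distances(toy_mat, metric="cosine")

# Convert distance to similarity: 1 means same direction, 0 means no overlap
user_similarity = 1.0 - cosine_dist
item_similarity = 1.0 - pairwise_distances(toy_mat.T, metric="cosine")

print(np.round(user_similarity, 3))
```

In this toy data, users 0 and 1 share two rated items and come out far more similar to each other than either is to user 2.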

## Step 6: Define the Prediction Function

- Define a `predict` function that takes the following parameters:
  - ratings: the user-item matrix
  - similarity: the similarity matrix
  - type (default = user): the type of collaborative filtering (user or item)

In [10]:
def predict(ratings, similarity, type='user'):
    if type == 'user':
        # Mean of each user's row; unrated items (0) are included in this mean
        mean_user_rating = ratings.mean(axis=1)
        # Subtract each user's mean rating;
        # a difference of 0 means the rating equals that user's mean
        ratings_diff = (ratings - mean_user_rating[:, np.newaxis])
        # Add the similarity-weighted average deviation back to the user's mean
        pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T
    elif type == 'item':
        # Similarity-weighted average of the user's ratings over all items
        pred = ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)])
    return pred
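
Written out, the function above computes the following, where $r_{u,i}$ is the entry of `ratings` for user $u$ and item $i$, $\bar{r}_u$ is the row mean of user $u$, and $S$ is the matrix passed as `similarity`. The symbols are introduced here for illustration and are read directly from the code.

$$
\hat{r}_{u,i}^{\text{user}} = \bar{r}_u + \frac{\sum_{v} S_{u,v}\,\left(r_{v,i} - \bar{r}_v\right)}{\sum_{v} \left|S_{u,v}\right|}
\qquad
\hat{r}_{u,i}^{\text{item}} = \frac{\sum_{j} r_{u,j}\, S_{j,i}}{\sum_{j} \left|S_{i,j}\right|}
$$

The user-based form centers each user's ratings on that user's own mean before averaging, which compensates for users who rate systematically high or low; the item-based form is a plain weighted average with no centering.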

In [11]:
item_prediction = predict(train_data_mat, item_sim, type='item')
user_prediction = predict(train_data_mat, user_sim, type='user')

In [20]:
user_prediction
# Predictions are low, and sometimes negative, because unrated items
# count as 0 in each user's mean and in the deviations from it

array([[ 1.63383965,  0.59449466,  0.47331431, ...,  0.24952092,
         0.2527604 ,  0.24952092],
       [ 1.42642927,  0.34243236,  0.15165673, ..., -0.12268392,
        -0.11803485, -0.12268392],
       [ 1.44765107,  0.32450321,  0.13082719, ..., -0.15601742,
        -0.1512194 , -0.15601742],
       ...,
       [ 1.41369543,  0.30701483,  0.1082033 , ..., -0.17658338,
        -0.17168437, -0.17658338],
       [ 1.45080652,  0.37747365,  0.22328195, ..., -0.03333912,
        -0.02940304, -0.03333912],
       [ 1.50153238,  0.43435088,  0.29657919, ...,  0.05819066,
         0.06184803,  0.05819066]])

__Observations:__
- The item-based and user-based predictions are stored in `item_prediction` and `user_prediction`.
- The memory-based algorithm is easy to implement, but it has drawbacks: it does not scale well to large real-world datasets, and it does not address the well-known cold start problem.
- The cold start problem arises when a new user or a new item enters the system: with no ratings yet, there is nothing to compute similarities from, so no recommendation can be made.
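
Because `predict` averages over every cell, including the zeros that stand for "no rating", its output sits well below the 1-to-5 rating scale. The sketch below is a hypothetical variant, `predict_user_based_masked`, that takes means and weights only over rated cells. It assumes `similarity` holds true similarities (larger means more alike) and runs on toy values; it is not the function used for the outputs in this notebook.

```python
import numpy as np

def predict_user_based_masked(ratings, similarity):
    """Hypothetical variant of predict(): unrated (0) cells are ignored."""
    rated = ratings > 0
    # Mean over rated items only; users with no ratings fall back to 0
    counts = rated.sum(axis=1)
    user_mean = ratings.sum(axis=1) / np.maximum(counts, 1)
    # Deviation from the user's own mean, kept at 0 where nothing was rated
    diff = np.where(rated, ratings - user_mean[:, np.newaxis], 0.0)
    # Normalize by the similarity mass of users who actually rated each item
    weight = np.abs(similarity).dot(rated.astype(float))
    weighted = similarity.dot(diff)
    adjust = np.divide(weighted, weight,
                       out=np.zeros_like(weighted), where=weight > 0)
    return user_mean[:, np.newaxis] + adjust

# Toy data (hypothetical values): 3 users, 3 items, 0 = unrated
toy_ratings = np.array([
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 2.0],
    [1.0, 5.0, 4.0],
])
toy_similarity = np.array([
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.3],
    [0.2, 0.3, 1.0],
])

pred = predict_user_based_masked(toy_ratings, toy_similarity)
print(np.round(pred, 2))
```

On this toy input every prediction stays between 1 and 5, which makes the values directly comparable with real ratings.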

## Step 7: Create a Function for RMSE

- Import mean_squared_error from sklearn.metrics
- Define the RMSE function
- Calculate RMSE for user-based and item-based predictions


In [12]:
from sklearn.metrics import mean_squared_error
from math import sqrt

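def rmse(prediction, ground_truth):
    # Score only the cells that hold an actual rating in the test set
    prediction = prediction[ground_truth.nonzero()].flatten()
    ground_truth = ground_truth[ground_truth.nonzero()].flatten()
    # mean_squared_error expects (y_true, y_pred)
    return sqrt(mean_squared_error(ground_truth, prediction))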

print('User-based CF RMSE: ' + str(rmse(user_prediction, test_data_mat)))
print('Item-based CF RMSE: ' + str(rmse(item_prediction, test_data_mat)))

User-based CF RMSE: 3.0776249066667103
Item-based CF RMSE: 3.3572923640106085


__Observation:__
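- The RMSE is about 3.08 for user-based and 3.36 for item-based predictions.
- These errors are large for a 1-to-5 rating scale: the predictions shown earlier mostly lie below 2, while the actual ratings average about 3.5.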
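
To see what the masking in `rmse` does, here is a small NumPy-only sketch on hypothetical 2×2 matrices. The helper name `rmse_on_rated` is illustrative; it scores only the cells where the ground truth is non-zero.

```python
import numpy as np

def rmse_on_rated(prediction, ground_truth):
    """RMSE restricted to cells with an actual (non-zero) rating."""
    rated = ground_truth.nonzero()
    errors = prediction[rated] - ground_truth[rated]
    return float(np.sqrt(np.mean(errors ** 2)))

# Toy matrices (hypothetical values); 0 in truth means "not rated"
truth = np.array([[5.0, 0.0],
                  [0.0, 3.0]])
pred = np.array([[4.0, 2.5],
                 [1.0, 5.0]])

# Only the cells (0, 0) and (1, 1) are scored: errors are -1 and 2
score = rmse_on_rated(pred, truth)
print(round(score, 4))
```

The two off-diagonal predictions are ignored because there is no true rating to compare them with, so the result is the square root of (1 + 4) / 2.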


This is how a memory-based collaborative filtering recommender is built and evaluated.