----

# **Collaborative Filtering With Memory**

## **Author**   :  **Muhammad Adil Naeem**

## **Contact**   :   **madilnaeem0@gmail.com**
<br>

----



In [97]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances
from sklearn.metrics import mean_squared_error

import warnings
warnings.filterwarnings('ignore')

### **Load Dataset**

In [89]:
df = pd.read_csv('/content/data.csv')

### **First 5 rows of Dataset**

In [90]:
df.head()

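|   | user_id | item_id | rating | timestamp |
|---|---------|---------|--------|-----------|
| 0 | 196 | 242 | 3 | 881250949 |
| 1 | 186 | 302 | 3 | 891717742 |
| 2 | 22 | 377 | 1 | 878887116 |
| 3 | 244 | 51 | 2 | 880606923 |
| 4 | 166 | 346 | 1 | 886397596 |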


### **Detailed Information About Dataset**

In [91]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39694 entries, 0 to 39693
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   user_id    39694 non-null  int64
 1   item_id    39694 non-null  int64
 2   rating     39694 non-null  int64
 3   timestamp  39694 non-null  int64
dtypes: int64(4)
memory usage: 1.2 MB


### **Counting Unique Users and Items**

- This code counts the unique users and items in the DataFrame `df` with the `unique()` method and prints both counts.

In [92]:
u_users = df['user_id'].unique().shape[0]
u_items = df['item_id'].unique().shape[0]
print('Number of users = ' + str(u_users) + ' | Number of items = ' + str(u_items))

Number of users = 651 | Number of items = 1546


### **Split Data into Train and Test**

In [93]:
train_data, test_data = train_test_split(df, test_size = 0.25)

### **Creating User-Item Interaction Matrices with Bounds Checking**

- This code initializes two user-item interaction matrices, `train_data_matrix` and `test_data_matrix`, filled with zeros, with one row per user and one column per item. It then iterates over `train_data` and `test_data` and writes each rating at position `[user_id - 1, item_id - 1]`. Before each assignment it checks that `user_id` and `item_id` do not exceed `u_users` and `u_items`, which prevents index errors. Ratings whose IDs fall outside these bounds are skipped silently, so if the raw IDs are not contiguous, some ratings never reach the matrices.

In [94]:
train_data_matrix = np.zeros((u_users, u_items))
for line in train_data.itertuples():
    # Check if item_id and user_id are within the bounds before accessing the matrix
    if line[2] <= u_items and line[1] <= u_users:
        train_data_matrix[line[1]-1, line[2]-1] = line[3]

test_data_matrix = np.zeros((u_users, u_items))
for line in test_data.itertuples():
    # Check if item_id and user_id are within the bounds before accessing the matrix
    if line[2] <= u_items and line[1] <= u_users:
        test_data_matrix[line[1]-1, line[2]-1] = line[3]
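
The bounds check above drops any rating whose raw ID is larger than the number of unique IDs. One possible alternative, sketched here on a toy frame, is to map each raw ID to a dense row or column position first. The frame `toy`, its values, and the helper `build_matrix` are illustrative assumptions and are not part of the notebook.

```python
import numpy as np
import pandas as pd

def build_matrix(frame, user_index, item_index):
    """Fill a dense user-item matrix using raw-ID -> position lookups."""
    matrix = np.zeros((len(user_index), len(item_index)))
    rows = frame['user_id'].map(user_index).to_numpy()
    cols = frame['item_id'].map(item_index).to_numpy()
    matrix[rows, cols] = frame['rating'].to_numpy()
    return matrix

# Hypothetical ratings with non-contiguous IDs
toy = pd.DataFrame({
    'user_id': [196, 22, 196, 940],
    'item_id': [242, 377, 51, 242],
    'rating':  [3, 1, 2, 5],
})

# Sorted unique IDs get positions 0, 1, 2, ...
user_index = {uid: pos for pos, uid in enumerate(np.sort(toy['user_id'].unique()))}
item_index = {iid: pos for pos, iid in enumerate(np.sort(toy['item_id'].unique()))}

toy_matrix = build_matrix(toy, user_index, item_index)
print(toy_matrix)
```

If this approach were used in the notebook, the lookups would need to be built from the full `df` rather than from one split, so that the train and test matrices share the same shape and ordering.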

### **Predicting Ratings Using Collaborative Filtering**

The function `predict_rating` builds a full matrix of predicted ratings from a ratings matrix and a similarity matrix.

- If `type` is `'user'`, it starts from each user’s mean rating and adds the weighted average of all users' deviations from their own means, with weights taken from the similarity matrix and normalized by the sum of absolute weights.
- If `type` is `'item'`, it takes the dot product of the ratings and the item similarity matrix and normalizes by the sum of absolute similarities.

The function returns the predicted ratings matrix.

In [95]:
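def predict_rating(ratings, similarity, type='user'):
  if type == 'user':
    # Per-user mean over all columns (unrated items count as 0)
    mean_user_rating = ratings.mean(axis=1)
    # Deviation of every rating from that user's mean
    ratings_diff = (ratings - mean_user_rating[:, np.newaxis])
    # User mean plus the weighted average of all users' deviations
    pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T
  elif type == 'item':
    # Weighted average of each user's ratings across items
    pred = ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)])
  return pred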
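
To make the user-based branch concrete, the following minimal sketch restates it as a standalone function and applies it to a two-user, two-item toy case. The name `user_based_prediction` and the toy values are assumptions chosen so the arithmetic can be checked by hand.

```python
import numpy as np

def user_based_prediction(ratings, similarity):
    """Same arithmetic as the 'user' branch of predict_rating."""
    mean_user_rating = ratings.mean(axis=1, keepdims=True)
    ratings_diff = ratings - mean_user_rating
    weights_sum = np.abs(similarity).sum(axis=1, keepdims=True)
    return mean_user_rating + similarity.dot(ratings_diff) / weights_sum

# Hypothetical ratings: both users have a mean rating of 3
toy_ratings = np.array([[4.0, 2.0],
                        [2.0, 4.0]])
# Hypothetical user-user weights
toy_similarity = np.array([[1.0, 0.5],
                           [0.5, 1.0]])

toy_pred = user_based_prediction(toy_ratings, toy_similarity)
print(toy_pred)
```

For user 0 and item 0 the deviations are `+1` (own) and `-1` (other user), so the weighted sum is `1*1 + 0.5*(-1) = 0.5`. Dividing by the weight total `1.5` and adding the mean `3` gives about `3.33`.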

### **Calculating User and Item Similarities**

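This code calls `pairwise_distances` from `sklearn.metrics.pairwise` with `metric='cosine'`. This function returns the cosine *distance* (1 minus the cosine similarity), not the cosine similarity itself, so larger values mean less similar.

- `user_sim` stores the pairwise cosine distances between the rows (users) of `train_data_matrix`.
- `item_sim` stores the pairwise cosine distances between items, computed on the transposed `train_data_matrix`.

Because `predict_rating` uses these matrices as weights, the least similar users and items receive the largest weights. To weight by similarity instead, use `1 - pairwise_distances(...)` or the already imported `cosine_similarity`.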

In [98]:
user_sim = pairwise_distances(train_data_matrix, metric='cosine')
item_sim = pairwise_distances(train_data_matrix.T, metric='cosine')
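
The relationship between cosine distance and cosine similarity can be checked on a small example. This sketch uses a hypothetical 3x3 matrix `toy` in which the first two rows point in nearly the same direction and the third is orthogonal to both.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

# Hypothetical ratings: rows 0 and 1 are alike, row 2 shares no items with them
toy = np.array([[5.0, 3.0, 0.0],
                [4.0, 2.0, 0.0],
                [0.0, 0.0, 4.0]])

dist = pairwise_distances(toy, metric='cosine')  # 0 = same direction
sim = cosine_similarity(toy)                     # 1 = same direction

print(np.round(dist, 3))
print(np.round(sim, 3))
```

Rows 0 and 1 have a distance close to 0 and a similarity close to 1, while row 2 has a distance of 1 and a similarity of 0 to both.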

### **Predicting Item and User Ratings**

This code calls `predict_rating` with `train_data_matrix` and the matrices computed in the previous step.

- `item_pred` holds the item-based collaborative filtering predictions.
- `user_pred` holds the user-based collaborative filtering predictions.

Both results have the same users × items shape as `train_data_matrix`, so each entry is a predicted rating for one user-item pair. These predictions can be used for recommending items to users.

In [99]:
item_pred = predict_rating(train_data_matrix, item_sim, type='item')
user_pred = predict_rating(train_data_matrix, user_sim, type='user')
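
One way to turn a row of predictions into recommendations is to rank the items the user has not rated yet. The sketch below assumes a hypothetical helper `top_n_unseen` and toy arrays; it is not part of the notebook.

```python
import numpy as np

def top_n_unseen(pred_row, seen_row, n=2):
    """Return column positions of the n highest predictions among unrated items."""
    # Mask items the user has already rated so they are never recommended
    scores = np.where(seen_row > 0, -np.inf, pred_row)
    order = np.argsort(scores)[::-1]
    n_unseen = int((seen_row == 0).sum())
    return order[:min(n, n_unseen)]

# Hypothetical predictions and training ratings for one user
pred_row = np.array([3.2, 4.1, 2.5, 4.8])
seen_row = np.array([0.0, 5.0, 0.0, 0.0])

recommended = top_n_unseen(pred_row, seen_row, n=2)
print(recommended)
```

The returned values are column positions in the matrix. Under the notebook's indexing, position `j` corresponds to `item_id` `j + 1`.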

### **Calculating RMSE for Predictions**

This code defines a function `rmse` that computes the Root Mean Squared Error (RMSE) between predicted and actual ratings. It keeps only the positions where `actual` is nonzero, that is, the user-item pairs that have a rating in the test set, and ignores every other prediction.

The RMSE is then printed for both the user-based and the item-based predictions against `test_data_matrix`; lower values mean more accurate predictions.

In [100]:
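def rmse(pred, actual):
  # Keep only the positions that hold a rating in `actual`
  pred = pred[actual.nonzero()].flatten()
  actual = actual[actual.nonzero()].flatten()
  # mean_squared_error expects (y_true, y_pred); the result is the same either way
  return np.sqrt(mean_squared_error(actual, pred))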

print('User-based CF RMSE: ' + str(rmse(user_pred, test_data_matrix)))
print('Item-based CF RMSE: ' + str(rmse(item_pred, test_data_matrix)))

User-based CF RMSE: 3.3538588334022545
Item-based CF RMSE: 3.5369688010877294
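
The masking step in `rmse` can be traced on a tiny example. This sketch uses a hypothetical function `rmse_on_observed` written with plain NumPy and two toy 2x2 matrices, so only the two rated positions contribute to the error.

```python
import numpy as np

def rmse_on_observed(pred, actual):
    """RMSE over the positions where `actual` holds a rating (nonzero)."""
    mask = actual > 0
    return float(np.sqrt(np.mean((pred[mask] - actual[mask]) ** 2)))

# Hypothetical test ratings: only two user-item pairs are rated
actual = np.array([[5.0, 0.0],
                   [0.0, 3.0]])
pred = np.array([[4.0, 2.0],
                 [1.0, 5.0]])

score = rmse_on_observed(pred, actual)
print(score)
```

The squared errors at the rated positions are `(4 - 5)^2 = 1` and `(5 - 3)^2 = 4`, so the result is the square root of `2.5`. The predictions at the unrated positions have no effect.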
