In [1]:
import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

## User-based Collaborative Filtering

- Input: Original data with a missing rating (here, User 1's rating of Item 5)

- Step1: Calculate each USER's average rating over all other ITEMS

- Step2: Centralise each row over all other ITEMS by subtracting that USER's average

- Step3: Calculate the 2-norm of each USER's centralised ratings over all other ITEMS

- Step4: Compute the Pearson correlation coefficient between the target USER and every other USER

- Output: The predicted rating, i.e. the target USER's average rating plus the most similar USER's centralised rating on the target ITEM
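
The steps above can be written compactly as follows. This is a sketch in notation that is not part of the original notebook: $r_{u,i}$ is the rating of USER $u$ on ITEM $i$, $I$ is the set of all other ITEMS (Items 1-4 here), $a$ is the target USER, $j$ is the target ITEM, and $u^*$ is the most similar USER. It assumes the centralised rating on the target ITEM uses the same average over $I$.

$$
\bar r_u = \frac{1}{|I|}\sum_{i \in I} r_{u,i},
\qquad
\operatorname{sim}(a,u) = \frac{\sum_{i \in I}(r_{a,i}-\bar r_a)(r_{u,i}-\bar r_u)}{\sqrt{\sum_{i \in I}(r_{a,i}-\bar r_a)^2}\;\sqrt{\sum_{i \in I}(r_{u,i}-\bar r_u)^2}}
$$

$$
u^* = \arg\max_{u \neq a}\, \operatorname{sim}(a,u),
\qquad
\hat r_{a,j} = \bar r_a + \left(r_{u^*,j} - \bar r_{u^*}\right)
$$

The numerator is the sum of item-wise products of the centralised rows, and the two square roots are the 2-norms from Step 3.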

In [2]:
# input: original data with missing rating

# import user scores csv
# set index to column 0
user_scores = pd.read_csv('user_scores.csv', index_col=0)

print("\nThese are the original user scores: \n")
user_scores


These are the original user scores: 



Unnamed: 0,Item 1,Item 2,Item 3,Item 4,Item 5
User 1,4,5,4,3,
User 2,5,1,3,2,4.0
User 3,3,4,3,4,4.0
User 4,2,4,5,1,3.0
User 5,3,2,3,1,3.0


In [3]:
# calculate the USER average for all other ITEMS

user_scores['Avg 1-4'] = user_scores[['Item 1', 'Item 2', 'Item 3', 'Item 4']].mean(axis=1)

print("\nThese are the original user scores w/ averages for items 1-4 included: \n")
user_scores


These are the original user scores w/ averages for items 1-4 included: 



Unnamed: 0,Item 1,Item 2,Item 3,Item 4,Item 5,Avg 1-4
User 1,4,5,4,3,,4.0
User 2,5,1,3,2,4.0,2.75
User 3,3,4,3,4,4.0,3.5
User 4,2,4,5,1,3.0,3.0
User 5,3,2,3,1,3.0,2.25


In [7]:
# centralise each row for all other ITEMS

# subtract each user's average from their ratings of Items 1-4 (row-wise);
# .sub returns a new DataFrame, so user_scores itself is left unchanged
cent_scores = user_scores.loc[:, 'Item 1':'Item 4'].sub(user_scores['Avg 1-4'], axis=0)

print("\nThese are the centralised scores: \n")
cent_scores


These are the centralised scores: 



Unnamed: 0,Item 1,Item 2,Item 3,Item 4
User 1,0.0,1.0,0.0,-1.0
User 2,2.25,-1.75,0.25,-0.75
User 3,-0.5,0.5,-0.5,0.5
User 4,-1.0,1.0,2.0,-2.0
User 5,0.75,-0.25,0.75,-1.25


In [36]:
# calculate the 2-norm of each USER with all other ITEMS

cent_scores['2-norm']=np.linalg.norm(cent_scores[['Item 1','Item 2','Item 3','Item 4']].values,axis=1)

cent_scores

Unnamed: 0,Item 1,Item 2,Item 3,Item 4,2-norm
User 1,0.0,1.0,0.0,-1.0,1.414214
User 2,2.25,-1.75,0.25,-0.75,2.95804
User 3,-0.5,0.5,-0.5,0.5,1.0
User 4,-1.0,1.0,2.0,-2.0,3.162278
User 5,0.75,-0.25,0.75,-1.25,1.658312


In [38]:
# compute the Pearson correlation coefficient between USERS

"""
Pearson correlation between User 1 and user x: the sum of the item-wise products of their
centralised scores, divided by the 2-norm of user x and by the 2-norm of User 1
"""

headers=['Pearson', 'Max?', 'U1 Avg', 'Prediction']
rows=['User 2', 'User 3', 'User 4', 'User 5']
pearson = pd.DataFrame(index=rows,columns=headers)

pearson

Unnamed: 0,Pearson,Max?,U1 Avg,Prediction
User 2,,,,
User 3,,,,
User 4,,,,
User 5,,,,
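
One way the `Pearson` column might be filled in is sketched below. It is not the notebook's own cell: the ratings are retyped from the table printed earlier so the block runs without `user_scores.csv`, and the names `items`, `avg`, `cent`, `norms` and `pearson_u1` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Ratings retyped from the table printed earlier; NaN marks the missing rating.
user_scores = pd.DataFrame(
    [[4, 5, 4, 3, np.nan],
     [5, 1, 3, 2, 4],
     [3, 4, 3, 4, 4],
     [2, 4, 5, 1, 3],
     [3, 2, 3, 1, 3]],
    index=[f"User {u}" for u in range(1, 6)],
    columns=[f"Item {i}" for i in range(1, 6)],
)
items = ["Item 1", "Item 2", "Item 3", "Item 4"]

# Steps 1-3: row averages, centralised rows and their 2-norms over Items 1-4.
avg = user_scores[items].mean(axis=1)
cent = user_scores[items].sub(avg, axis=0)
norms = pd.Series(np.linalg.norm(cent.values, axis=1), index=cent.index)

# Step 4: dot product of each other user's centralised row with User 1's row,
# divided by both 2-norms.
target = "User 1"
others = cent.index.drop(target)
dots = cent.loc[others].dot(cent.loc[target])
pearson_u1 = dots / (norms[others] * norms[target])

print(pearson_u1.round(6))
```

With these ratings, User 4 comes out as the most similar user to User 1 (a coefficient of about 0.67). The sketch does not guard against a user whose centralised row is all zeros, which would make the denominator zero.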


In [None]:
# output: the predicted rating as the sum of the average rating of the target USER and the
# centralised rating on the target ITEM of the most similar USER
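
The output step is left empty in the notebook. A minimal sketch of how it could be completed follows, under two assumptions: the most similar user's rating on the target item is centralised with the same average over the other items, and ties or zero-norm users do not need handling. `predict_user_based` is a hypothetical helper, not a function from the notebook or from pandas, and the ratings are retyped from the table printed earlier.

```python
import numpy as np
import pandas as pd

def predict_user_based(ratings, target_user, target_item):
    """Hypothetical helper: predict one missing rating from the most similar user."""
    other_items = ratings.columns.drop(target_item)
    # Steps 1-3: averages, centralised rows and 2-norms over the other items.
    avg = ratings[other_items].mean(axis=1)
    cent = ratings[other_items].sub(avg, axis=0)
    norms = np.sqrt((cent ** 2).sum(axis=1))
    # Step 4: Pearson correlation between the target user and every other user.
    others = ratings.index.drop(target_user)
    sims = cent.loc[others].dot(cent.loc[target_user]) / (norms[others] * norms[target_user])
    best = sims.idxmax()
    # Output: target user's average plus the best match's centralised rating
    # on the target item (assumed to use the same average as above).
    prediction = avg[target_user] + (ratings.loc[best, target_item] - avg[best])
    return best, prediction

# Ratings retyped from the table printed earlier; NaN marks the missing rating.
user_scores = pd.DataFrame(
    [[4, 5, 4, 3, np.nan],
     [5, 1, 3, 2, 4],
     [3, 4, 3, 4, 4],
     [2, 4, 5, 1, 3],
     [3, 2, 3, 1, 3]],
    index=[f"User {u}" for u in range(1, 6)],
    columns=[f"Item {i}" for i in range(1, 6)],
)

best_user, predicted = predict_user_based(user_scores, "User 1", "Item 5")
print(best_user, predicted)
```

For this data the most similar user is User 4, whose rating of Item 5 equals their own average over Items 1-4, so the prediction reduces to User 1's average.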

## Item-based Collaborative Filtering

- Input: Original data with a missing rating
- Step1: Calculate the ITEM average for all other USERS
- Step2: Centralise each column for all other USERS
- Step3: Calculate the 2-norm of each ITEM
- Step4: Compute the adjusted cosine similarity between ITEMS
- Output: The predicted rating based on the rating of the ITEM with maximal similarity
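
The item-based steps could be sketched as below. This is one reading of the list, not a definitive implementation: it takes Step 1 to mean the per-item average over the other users (Users 2-5) and centralises each column with it. Adjusted cosine similarity is often defined by subtracting each user's average instead, which can give a different ranking, so treat the centring choice as an assumption. The ratings are retyped from the table printed earlier, and all variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Ratings retyped from the table printed earlier; NaN marks the missing rating.
user_scores = pd.DataFrame(
    [[4, 5, 4, 3, np.nan],
     [5, 1, 3, 2, 4],
     [3, 4, 3, 4, 4],
     [2, 4, 5, 1, 3],
     [3, 2, 3, 1, 3]],
    index=[f"User {u}" for u in range(1, 6)],
    columns=[f"Item {i}" for i in range(1, 6)],
)
target_user, target_item = "User 1", "Item 5"
other_users = user_scores.index.drop(target_user)

# Steps 1-2: per-item average over the other users, then centralise each column
# (assumed reading; see the note above about user-mean centring).
item_avg = user_scores.loc[other_users].mean(axis=0)
cent = user_scores.loc[other_users] - item_avg

# Step 3: 2-norm of each item's centralised column.
norms = np.sqrt((cent ** 2).sum(axis=0))

# Step 4: similarity between the target item and every other item.
other_items = cent.columns.drop(target_item)
sims = cent[other_items].T.dot(cent[target_item]) / (norms[other_items] * norms[target_item])

# Output: the target user's rating of the most similar item.
best_item = sims.idxmax()
prediction = user_scores.loc[target_user, best_item]
print(best_item, prediction)
```

Under this reading, Item 4 is the item most similar to Item 5, so the prediction is User 1's rating of Item 4.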