- Learn the differences between Item-Based and User-Based Recommender Systems
- Learn the pros and cons of using a Collaborative Filtering Recommender System

# Collaborative Filtering

Collaborative Filtering is one type of Recommender System. It predicts a user's missing ratings from the **collective** behaviour of many other users. It rests on the assumption that people who like similar things give similar ratings, and that people who give similar ratings like similar things. There are 2 approaches to Collaborative Filtering: **Item-Based**, which compares items and predicts a rating from the user's own ratings of similar items, and **User-Based**, which compares users and predicts a rating from what similar users gave the item.

In [1]:
import pandas as pd
import numpy as np

In [2]:
data = pd.read_csv('recommendersystem.csv')
data = data.set_index('UserName')
data

UserName,Aquaman,Avengers: Infinity War,Venom,Black Panther,Ant-Man and the Wasp,Deadpool
Akira,3.0,,3.0,3.5,2.5,3.0
Eve,4.0,3.5,2.5,4.0,3.0,3.0
Chris,3.0,3.0,3.0,,,4.0
Pauline,,3.5,2.0,4.0,2.5,
Josh,,3.0,,3.0,2.0,5.0
Daniel,2.5,4.0,,5.0,3.5,5.0
Grady,,4.5,3.0,4.0,2.0,
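
If `recommendersystem.csv` is not at hand, the table above can be rebuilt inline. The following is a minimal sketch that does so and previews the two building blocks used by the function in the next cell: column-mean imputation and a similarity of the form `1 / (1 + Euclidean distance)`. The names `demo`, `imputed` and `euclidean_similarity` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

movies = ['Aquaman', 'Avengers: Infinity War', 'Venom',
          'Black Panther', 'Ant-Man and the Wasp', 'Deadpool']

# Ratings copied from the table above; np.nan marks a movie the user has not rated
ratings = {
    'Akira':   [3.0, np.nan, 3.0, 3.5, 2.5, 3.0],
    'Eve':     [4.0, 3.5, 2.5, 4.0, 3.0, 3.0],
    'Chris':   [3.0, 3.0, 3.0, np.nan, np.nan, 4.0],
    'Pauline': [np.nan, 3.5, 2.0, 4.0, 2.5, np.nan],
    'Josh':    [np.nan, 3.0, np.nan, 3.0, 2.0, 5.0],
    'Daniel':  [2.5, 4.0, np.nan, 5.0, 3.5, 5.0],
    'Grady':   [np.nan, 4.5, 3.0, 4.0, 2.0, np.nan],
}
demo = pd.DataFrame.from_dict(ratings, orient='index', columns=movies)
demo.index.name = 'UserName'


def euclidean_similarity(a, b):
    """Map the Euclidean distance between two rating vectors into (0, 1]."""
    return 1 / (1 + np.sqrt(((a - b) ** 2).sum()))


# Replace each missing rating with the mean rating of that movie
imputed = demo.fillna(demo.mean())

# Similarity between two movies, compared across all users
sim = euclidean_similarity(imputed['Venom'], imputed['Deadpool'])
print(sim)
```

A similarity of 1 means two movies received identical (imputed) ratings from every user; the value shrinks towards 0 as the ratings diverge.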


In [3]:
# Function that implements the item-based recommender system.
def get_itembased_scores(user, item, df, n=3):
    """
    Return the predicted `user` rating for `item`, using the `n` most similar items.
    """

    # Get the original ratings for the current user
    current_ratings = df.loc[user,:]
    
    # Column mean imputation
    imputed_df = df.fillna(df.mean())
    
    # Get the imputed ratings for the current item
    x = imputed_df.loc[:,item]
    
    # Initialise a dictionary of similarity scores, keyed by item label
    similarity = {}
    
    # Only include items that the user has rated
    rated_items = [col for col in df.columns if not np.isnan(current_ratings[col])]
    
    # Calculate the similarity scores
    for compare_item in rated_items:
        y = imputed_df.loc[:, compare_item]
        diff = (x - y).to_numpy()
        eucl_dist = np.sqrt(np.sum(diff * diff))
        similarity[compare_item] = 1/(1+eucl_dist)

    # Convert `similarity` to a series, and find weights
    similarity = pd.Series(similarity)
    
    # Create `top_n`: the labels of the n most similar items, used for the weighted prediction
    top_n = similarity.sort_values(ascending=False).head(n).index
    
    # Calculate the predicted score as a similarity-weighted average of the user's own ratings
    predicted_score = (current_ratings[top_n]*similarity[top_n]).sum() / similarity[top_n].sum()
    
    return predicted_score
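
In formula form, the function above amounts to the following. The notation is introduced here for illustration only: $\tilde{r}_{v,i}$ is the column-mean imputed rating of user $v$ for item $i$, $r_{u,j}$ is the original rating of the target user $u$ for item $j$, and $N$ is the set of the $n$ items most similar to $i$ among those that $u$ has rated.

```latex
\mathrm{sim}(i, j) = \frac{1}{1 + \sqrt{\sum_{v} \left(\tilde{r}_{v,i} - \tilde{r}_{v,j}\right)^{2}}},
\qquad
\hat{r}_{u,i} = \frac{\sum_{j \in N} \mathrm{sim}(i, j)\, r_{u,j}}{\sum_{j \in N} \mathrm{sim}(i, j)}
```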

In [4]:
# Test the function output
get_itembased_scores('Daniel', 'Venom', data)

3.2861412419268374
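
For comparison, a user-based counterpart can be sketched along the same lines. This is an illustration under stated assumptions rather than part of the original notebook: the function name `get_userbased_scores` is hypothetical, it reuses the same column-mean imputation and `1 / (1 + distance)` similarity as the item-based function, and it rebuilds the ratings table from an inline CSV string so that the block runs on its own.

```python
import io

import numpy as np
import pandas as pd

# Same ratings as the table above, inlined so the sketch is self-contained
CSV = """UserName,Aquaman,Avengers: Infinity War,Venom,Black Panther,Ant-Man and the Wasp,Deadpool
Akira,3.0,,3.0,3.5,2.5,3.0
Eve,4.0,3.5,2.5,4.0,3.0,3.0
Chris,3.0,3.0,3.0,,,4.0
Pauline,,3.5,2.0,4.0,2.5,
Josh,,3.0,,3.0,2.0,5.0
Daniel,2.5,4.0,,5.0,3.5,5.0
Grady,,4.5,3.0,4.0,2.0,
"""
ratings = pd.read_csv(io.StringIO(CSV)).set_index('UserName')


def get_userbased_scores(user, item, df, n=3):
    """
    Hypothetical user-based counterpart: predict the `user` rating for `item`
    from the `n` most similar users who have rated `item`.
    """
    # Column-mean imputation, as in the item-based function
    imputed_df = df.fillna(df.mean())

    # Imputed ratings (one per movie) for the current user
    x = imputed_df.loc[user, :]

    # Only compare against other users who have actually rated the item
    raters = [u for u in df.index if u != user and not np.isnan(df.loc[u, item])]

    # Similarity between the current user and each candidate user
    similarity = {}
    for other in raters:
        y = imputed_df.loc[other, :]
        similarity[other] = 1 / (1 + np.sqrt(((x - y) ** 2).sum()))
    similarity = pd.Series(similarity, dtype=float)

    # Labels of the n most similar users
    top_n = similarity.sort_values(ascending=False).head(n).index

    # Similarity-weighted average of those users' ratings for the item
    return (df.loc[top_n, item] * similarity[top_n]).sum() / similarity[top_n].sum()


userbased_score = get_userbased_scores('Daniel', 'Venom', ratings)
print(userbased_score)
```

The difference from the item-based version is the direction of comparison: rows (users) are compared instead of columns (items), and the weights are applied to other users' ratings of the target item rather than to the target user's ratings of other items. If nobody else has rated the item, this sketch returns `nan`.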

In [5]:
# Program that asks for your own inputs and gives out predicted ratings using the above function
new_user = {}
new_username = input('Provide your username: ')
print()

missing_movies = []
for movie in ['Aquaman', 'Avengers: Infinity War', 'Venom', 'Black Panther', 'Ant-Man and the Wasp', 'Deadpool']:
    new_input = input(f'Provide a 0-5 rating for {movie}. Enter to skip if you have not watched it: ')
    if new_input == '':
        new_user[movie] = np.nan
        missing_movies.append(movie)
    else:
        new_user[movie] = float(new_input)
        
if len(missing_movies) > 3:
    print("\nYou haven't rated enough movies to provide useful recommendations.")
    
else:
    # Update the dataframe (DataFrame.append was removed in pandas 2.0, so use pd.concat)
    new_row = pd.DataFrame.from_dict({new_username: new_user}, orient='index')
    new_data = pd.concat([data, new_row])

    # Loop through movies without a rating and perform item-based recommendation
    for movie in missing_movies:
        print(f"\nYou haven't watched {movie}, but we think that you would rate it:", get_itembased_scores(new_username, movie, new_data))




