# Recommendation System

This notebook explores two popular recommendation techniques: **Content-Based Filtering** and **Neighborhood-Based Collaborative Filtering**. Online platforms such as Netflix, Amazon, and Spotify use methods like these to suggest items (movies, products, or music) based on item characteristics or user preferences.

In [1]:
### Import libraries
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

## Content-Based Filtering

Content-based filtering recommends items based on their attributes rather than on user behavior data. Here, each book is described by binary genre flags, and books with similar genre vectors are treated as similar.


In [3]:
# Sample dataset: Books and their genres
data = {
    'Book': ['Harry Potter', 'Sherlock Holmes', 'Lord of the Rings', 'Gone Girl', 'Pride and Prejudice', 'Moby Dick', '1984', 'War and Peace'],
    'Fiction': [1, 0, 1, 0, 1, 1, 1, 1],
    'Mystery': [0, 1, 0, 1, 0, 0, 0, 0],
    'Adventure': [1, 1, 1, 0, 1, 1, 0, 1]
}
# Convert to DataFrame
df = pd.DataFrame(data)
df

,Book,Fiction,Mystery,Adventure
0,Harry Potter,1,0,1
1,Sherlock Holmes,0,1,1
2,Lord of the Rings,1,0,1
3,Gone Girl,0,1,0
4,Pride and Prejudice,1,0,1
5,Moby Dick,1,0,1
6,1984,1,0,0
7,War and Peace,1,0,1


In [8]:
df['Book'].values

array(['Harry Potter', 'Sherlock Holmes', 'Lord of the Rings',
       'Gone Girl', 'Pride and Prejudice', 'Moby Dick', '1984',
       'War and Peace'], dtype=object)

In [9]:
# Compute similarity based on genres
features = df[['Fiction', 'Mystery', 'Adventure']]
similarity_matrix = cosine_similarity(features)

# Convert to DataFrame and use the book titles as the index
similarity_matrix_df = pd.DataFrame(similarity_matrix, columns=df['Book'].values)
similarity_matrix_df.index = df['Book']
similarity_matrix_df

Book,Harry Potter,Sherlock Holmes,Lord of the Rings,Gone Girl,Pride and Prejudice,Moby Dick,1984,War and Peace
Harry Potter,1.0,0.5,1.0,0.0,1.0,1.0,0.707107,1.0
Sherlock Holmes,0.5,1.0,0.5,0.707107,0.5,0.5,0.0,0.5
Lord of the Rings,1.0,0.5,1.0,0.0,1.0,1.0,0.707107,1.0
Gone Girl,0.0,0.707107,0.0,1.0,0.0,0.0,0.0,0.0
Pride and Prejudice,1.0,0.5,1.0,0.0,1.0,1.0,0.707107,1.0
Moby Dick,1.0,0.5,1.0,0.0,1.0,1.0,0.707107,1.0
1984,0.707107,0.0,0.707107,0.0,0.707107,0.707107,1.0,0.707107
War and Peace,1.0,0.5,1.0,0.0,1.0,1.0,0.707107,1.0
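
To make the numbers in the matrix above easier to check, here is a minimal sketch that computes cosine similarity by hand with NumPy. The helper name `cosine` and the three genre vectors (copied from the `Fiction`, `Mystery`, `Adventure` columns) are introduced here for illustration only.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Genre vectors in the order [Fiction, Mystery, Adventure]
harry_potter = [1, 0, 1]
book_1984 = [1, 0, 0]
gone_girl = [0, 1, 0]

print(round(cosine(harry_potter, book_1984), 6))  # 0.707107, i.e. 1 / sqrt(2)
print(cosine(harry_potter, gone_girl))            # 0.0, no genre in common
```

Books that share exactly the same genre flags get a similarity of 1.0, which is why several rows of the matrix are identical.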


In [16]:
similarity_matrix_df.loc['Harry Potter']

Harry Potter           1.000000
Sherlock Holmes        0.500000
Lord of the Rings      1.000000
Gone Girl              0.000000
Pride and Prejudice    1.000000
Moby Dick              1.000000
1984                   0.707107
War and Peace          1.000000
Name: Harry Potter, dtype: float64

In [10]:
# Get recommendations for the book "Harry Potter"

# Select the 'Harry Potter' column of similarity scores
similar_scores = similarity_matrix_df['Harry Potter']
similar_scores

Book
Harry Potter           1.000000
Sherlock Holmes        0.500000
Lord of the Rings      1.000000
Gone Girl              0.000000
Pride and Prejudice    1.000000
Moby Dick              1.000000
1984                   0.707107
War and Peace          1.000000
Name: Harry Potter, dtype: float64

In [11]:
# Sort the scores from highest to lowest
similar_scores_sorted = similar_scores.sort_values(ascending=False)
similar_scores_sorted

Book
Harry Potter           1.000000
Lord of the Rings      1.000000
Pride and Prejudice    1.000000
Moby Dick              1.000000
War and Peace          1.000000
1984                   0.707107
Sherlock Holmes        0.500000
Gone Girl              0.000000
Name: Harry Potter, dtype: float64

In [13]:
# Exclude the book itself from its own recommendations
similar_scores_sorted[similar_scores_sorted.index != 'Harry Potter']

Book
Lord of the Rings      1.000000
Pride and Prejudice    1.000000
Moby Dick              1.000000
War and Peace          1.000000
1984                   0.707107
Sherlock Holmes        0.500000
Gone Girl              0.000000
Name: Harry Potter, dtype: float64

In [20]:
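# Wrap the steps above in a function that recommends similar books
def recommend(book_name, similarity_matrix_df, top_n):
    """Return the titles of the top_n books most similar to book_name."""
    # Unknown title: print a message and return None
    if book_name not in similarity_matrix_df.columns:
        print("There is no book with this title in our dataset")
        return None

    # Sort similarity scores from highest to lowest
    similar_scores = similarity_matrix_df[book_name]
    similar_scores_sorted = similar_scores.sort_values(ascending=False)

    # Exclude the book itself from its own recommendations
    result = similar_scores_sorted[similar_scores_sorted.index != book_name]

    return list(result[:top_n].index)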

In [21]:
print(recommend('ABC', similarity_matrix_df, 3))

There is no book with this title in our dataset
None


In [23]:
# Example usage
print(recommend('Harry Potter', similarity_matrix_df, 5))

['Lord of the Rings', 'Pride and Prejudice', 'Moby Dick', 'War and Peace', '1984']


In [22]:
# Example usage
print(recommend('1984', similarity_matrix_df, 3))

['Harry Potter', 'Lord of the Rings', 'Pride and Prejudice']
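
In the similarity matrix above, '1984' has the same score (0.707107) with five books, so a top-3 cut-off picks three of them arbitrarily. The following is a rough sketch of one way to surface such ties, not part of the original notebook: the function name `recommend_with_ties` is hypothetical, and it relies on pandas' `Series.nlargest` with `keep='all'`, which keeps every entry tied with the last selected value.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Same sample data as above, with the book title as the index
books = pd.DataFrame({
    'Book': ['Harry Potter', 'Sherlock Holmes', 'Lord of the Rings', 'Gone Girl',
             'Pride and Prejudice', 'Moby Dick', '1984', 'War and Peace'],
    'Fiction': [1, 0, 1, 0, 1, 1, 1, 1],
    'Mystery': [0, 1, 0, 1, 0, 0, 0, 0],
    'Adventure': [1, 1, 1, 0, 1, 1, 0, 1],
}).set_index('Book')

similarity = pd.DataFrame(cosine_similarity(books),
                          index=books.index, columns=books.index)

def recommend_with_ties(book_name, similarity, top_n):
    """Return the top_n most similar books with their scores, keeping all ties."""
    scores = similarity[book_name].drop(book_name)  # exclude the book itself
    return scores.nlargest(top_n, keep='all')       # may return more than top_n rows

print(recommend_with_ties('1984', similarity, 3))
```

Returning the scores alongside the titles makes it visible when the ranking is decided by a tie rather than by a real difference in similarity.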


## Neighborhood-Based Collaborative Filtering

Neighborhood-Based Collaborative Filtering is a type of Collaborative Filtering that makes recommendations based on the preferences of other similar users, often referred to as "neighbors." This method leverages the idea that users who have historically agreed on items (i.e., have similar tastes) will continue to agree in the future.
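
The cells below predict a user's rating for an unrated product as a similarity-weighted average of the ratings given by other users. As a sketch of that calculation (the symbols $\hat{r}_{u,i}$, $N_i$, and $\operatorname{sim}$ are notation introduced here, not taken from the notebook):

$$
\hat{r}_{u,i} = \frac{\sum_{v \in N_i} \operatorname{sim}(u, v)\, r_{v,i}}{\sum_{v \in N_i} \lvert \operatorname{sim}(u, v) \rvert}
$$

Here $N_i$ is the set of users who rated item $i$, $r_{v,i}$ is user $v$'s rating of item $i$, and $\operatorname{sim}(u, v)$ is the cosine similarity between the rating vectors of users $u$ and $v$.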

In [24]:
# Step 1: Sample data (user-product interaction matrix)
# Rows represent users and columns represent products. Values are ratings.
data = {
    'Product A': [5, 4, None, 1, None],
    'Product B': [3, None, None, 1, 4],
    'Product C': [4, 5, None, None, 2],
    'Product D': [None, 3, 4, 2, 5],
    'Product E': [None, None, 5, 4, 3],
}

# Create DataFrame
df = pd.DataFrame(data, index=['User1', 'User2', 'User3', 'User4', 'User5'])
df

,Product A,Product B,Product C,Product D,Product E
User1,5.0,3.0,4.0,,
User2,4.0,,5.0,3.0,
User3,,,,4.0,5.0
User4,1.0,1.0,,2.0,4.0
User5,,4.0,2.0,5.0,3.0


In [26]:
# Step 2: Calculate similarity between users (cosine similarity; missing ratings are filled with 0)
user_similarity = cosine_similarity(df.fillna(0))

# Step 3: Convert similarity into a DataFrame
user_similarity_df = pd.DataFrame(user_similarity, columns=df.index, index=df.index)
user_similarity_df

,User1,User2,User3,User4,User5
User1,1.0,0.8,0.0,0.241209,0.3849
User2,0.8,1.0,0.265036,0.301511,0.481125
User3,0.0,0.265036,1.0,0.932298,0.743839
User4,0.241209,0.301511,0.932298,1.0,0.754337
User5,0.3849,0.481125,0.743839,0.754337,1.0
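
Filling missing ratings with 0 treats "not rated" the same as a very low rating, which can distort the similarities. One common alternative, shown here only as a hedged sketch and not as the notebook's method, is to subtract each user's mean rating before filling the gaps, so that 0 means "no deviation from this user's average". The names `ratings`, `centered`, and `centered_similarity` are introduced for this example.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Same user-product ratings as above (NaN = not rated)
ratings = pd.DataFrame({
    'Product A': [5, 4, np.nan, 1, np.nan],
    'Product B': [3, np.nan, np.nan, 1, 4],
    'Product C': [4, 5, np.nan, np.nan, 2],
    'Product D': [np.nan, 3, 4, 2, 5],
    'Product E': [np.nan, np.nan, 5, 4, 3],
}, index=['User1', 'User2', 'User3', 'User4', 'User5'])

# Subtract each user's mean rating (NaN values are skipped by mean())
centered = ratings.sub(ratings.mean(axis=1), axis=0)

# Unrated products now become 0, i.e. "no deviation from the user's mean"
centered_similarity = pd.DataFrame(
    cosine_similarity(centered.fillna(0)),
    index=ratings.index, columns=ratings.index,
)
print(centered_similarity.round(3))
```

With mean-centering, similarities can be negative (users who deviate from their own averages in opposite directions), which is why the prediction step divides by the absolute values of the weights.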


In [27]:
# use User1 as example

# Get ratings for the specified user
user_ratings = df.loc['User1']
user_ratings

Product A    5.0
Product B    3.0
Product C    4.0
Product D    NaN
Product E    NaN
Name: User1, dtype: float64

In [28]:
# Get the similarity scores of the user with all other users
similar_users = user_similarity_df.loc['User1']
similar_users

User1    1.000000
User2    0.800000
User3    0.000000
User4    0.241209
User5    0.384900
Name: User1, dtype: float64

In [29]:
# Filter out products already rated by the user
unrated_products = user_ratings[user_ratings.isna()]
unrated_products

Product D   NaN
Product E   NaN
Name: User1, dtype: float64

In [30]:
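# Worked example: predict User1's rating for Product D using only User2 as a neighbor
# Numerator: similarity(User1, User2) * User2's rating of Product D
weighted_sum = user_similarity_df['User1']['User2'] * df.loc['User2', 'Product D']
# Denominator: absolute value of the similarity used as the weight
total_weight = abs(user_similarity_df['User1']['User2'])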

weighted_sum, total_weight

(2.4, 0.7999999999999999)

In [31]:
weighted_rating_Product_D = weighted_sum / total_weight
weighted_rating_Product_D

3.0

In [33]:
# Calculate weighted sum of ratings for EACH unrated product
weighted_ratings = {}
for product in unrated_products.index:
    weighted_sum = 0
    total_weight = 0
    
    for other_user in df.index:
        if not np.isnan(df.loc[other_user, product]):
            weighted_sum += user_similarity_df['User1'][other_user] * df.loc[other_user, product]
            total_weight += abs(user_similarity_df['User1'][other_user])
            
    weighted_ratings[product] = weighted_sum / total_weight if total_weight != 0 else 0

In [34]:
weighted_ratings

{'Product D': 3.370652726191084, 'Product E': 3.3852507748271905}
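
The nested loop above can also be written with vectorized pandas operations. The following is a minimal sketch under the same assumptions (cosine similarity on zero-filled ratings, weights summed in absolute value, 0 as the fallback when no weight is available); the function name `predict_unrated` and the variable names `ratings` and `sim` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Same user-product ratings as above (NaN = not rated)
ratings = pd.DataFrame({
    'Product A': [5, 4, np.nan, 1, np.nan],
    'Product B': [3, np.nan, np.nan, 1, 4],
    'Product C': [4, 5, np.nan, np.nan, 2],
    'Product D': [np.nan, 3, 4, 2, 5],
    'Product E': [np.nan, np.nan, 5, 4, 3],
}, index=['User1', 'User2', 'User3', 'User4', 'User5'])

sim = pd.DataFrame(cosine_similarity(ratings.fillna(0)),
                   index=ratings.index, columns=ratings.index)

def predict_unrated(user, ratings, sim):
    """Similarity-weighted average rating for every product the user has not rated."""
    weights = sim.loc[user].drop(user)      # similarities to the other users
    others = ratings.drop(index=user)       # ratings given by the other users
    rated = others.notna().astype(float)    # 1.0 where a rating exists, else 0.0

    numerator = others.fillna(0).mul(weights, axis=0).sum()
    denominator = rated.mul(weights.abs(), axis=0).sum()

    # Avoid division by zero; fall back to 0 like the loop version
    predictions = (numerator / denominator.replace(0, np.nan)).fillna(0)
    unrated = ratings.loc[user].isna()
    return predictions[unrated].sort_values(ascending=False)

print(predict_unrated('User1', ratings, sim))
```

For User1 this should give the same two predictions as the loop, roughly 3.385 for Product E and 3.371 for Product D.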

In [35]:
# Sort products by the weighted rating (recommend the top n products)
recommended_products = sorted(weighted_ratings.items(), key=lambda x: x[1], reverse=True)
recommended_products

[('Product E', 3.3852507748271905), ('Product D', 3.370652726191084)]

In [36]:
# Step 4: Wrap the steps above in a function that recommends products for a given user
def recommend_products(user, df, user_similarity_df, n_recommendations=2):
    """Return the top (product, predicted rating) pairs for products the user has not rated."""
    # Ratings given by the user and similarity of the user to every user
    user_ratings = df.loc[user]
    similar_users = user_similarity_df.loc[user]

    # Only products the user has not rated yet are candidates
    unrated_products = user_ratings[user_ratings.isna()]

    # Predict each unrated product's rating as the similarity-weighted
    # average of the ratings from users who did rate it
    weighted_ratings = {}
    for product in unrated_products.index:
        weighted_sum = 0
        total_weight = 0

        for other_user in df.index:
            if not np.isnan(df.loc[other_user, product]):
                weighted_sum += similar_users[other_user] * df.loc[other_user, product]
                total_weight += abs(similar_users[other_user])

        # Fall back to 0 when the total weight is zero
        weighted_ratings[product] = weighted_sum / total_weight if total_weight != 0 else 0

    # Keep the n highest predicted ratings
    recommended_products = sorted(weighted_ratings.items(), key=lambda x: x[1], reverse=True)[:n_recommendations]

    return recommended_products

In [37]:
# Example: Recommend 2 products for User1
recommended_products = recommend_products('User1', df, user_similarity_df, n_recommendations=2)
print("Recommended products for User1:", recommended_products)

Recommended products for User1: [('Product E', 3.3852507748271905), ('Product D', 3.370652726191084)]


In [38]:
# Example: Recommend 2 products for User2
recommended_products = recommend_products('User2', df, user_similarity_df, n_recommendations=2)
print("Recommended products for User2:", recommended_products)

Recommended products for User2: [('Product E', 3.7937431619817144), ('Product B', 2.92297823314207)]
