A hybrid recommendation system is a recommendation technique that offers a complete and balanced approach by mixing two or more recommendation techniques. It aims to provide more accurate, diverse and personalized recommendations to users leveraging the strengths of different techniques and providing valuable user experience. If you want to know how to build a hybrid recommendation system, this article is for you. In this article, I will take you through building a Hybrid Recommendation System using Python.

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score

In [3]:
data =pd.read_csv("/content/fashion_products.csv")

In [4]:
data.head()

Unnamed: 0,User ID,Product ID,Product Name,Brand,Category,Price,Rating,Color,Size
0,19,1,Dress,Adidas,Men's Fashion,40,1.043159,Black,XL
1,97,2,Shoes,H&M,Women's Fashion,82,4.026416,Black,L
2,25,3,Dress,Adidas,Women's Fashion,44,3.337938,Yellow,XL
3,57,4,Shoes,Zara,Men's Fashion,23,1.049523,White,S
4,79,5,T-shirt,Adidas,Men's Fashion,79,4.302773,Black,M


In [5]:
data.isnull().sum()

Unnamed: 0,0
User ID,0
Product ID,0
Product Name,0
Brand,0
Category,0
Price,0
Rating,0
Color,0
Size,0


In [8]:
from surprise import Dataset, Reader, SVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

The surprise library is imported to use the SVD algorithm. SVD stands for Singular Value Decomposition. Simply put, it is a matrix factorization technique commonly used in collaborative filtering algorithms. You can install it on your systems using the command mentioned below:

For terminal or command prompt: pip install scikit-surprise
For Colab Notebook: !pip install scikit-surprise

First Approach: Content-Based Filtering
Now let’s move forward by creating a recommendation system using content-based filtering:

In [9]:
content_df = data[['Product ID', 'Product Name', 'Brand',
                   'Category', 'Color', 'Size']]
content_df['Content'] = content_df.apply(lambda row: ' '.join(row.dropna().astype(str)), axis=1)

# Use TF-IDF vectorizer to convert content into a matrix of TF-IDF features
tfidf_vectorizer = TfidfVectorizer()
content_matrix = tfidf_vectorizer.fit_transform(content_df['Content'])

content_similarity = linear_kernel(content_matrix, content_matrix)

reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(data[['User ID',
                                  'Product ID',
                                  'Rating']], reader)

def get_content_based_recommendations(product_id, top_n):
    index = content_df[content_df['Product ID'] == product_id].index[0]
    similarity_scores = content_similarity[index]
    similar_indices = similarity_scores.argsort()[::-1][1:top_n + 1]
    recommendations = content_df.loc[similar_indices, 'Product ID'].values
    return recommendations

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  content_df['Content'] = content_df.apply(lambda row: ' '.join(row.dropna().astype(str)), axis=1)


In the above code, we are implementing the content-based filtering component of the hybrid recommender system. We started by selecting relevant features from the dataset, including the product ID, name, brand, category, colour, and size. Then we combined these features into a single “Content” column for each product.

Next, we used the TF-IDF (Term Frequency-Inverse Document Frequency) vectorizer to convert the content into a TF-IDF feature matrix. This matrix represents the importance of each word in the content compared to the whole corpus.



Second Approach: Collaborative Filtering

In [10]:
algo = SVD()
trainset = data.build_full_trainset()
algo.fit(trainset)

def get_collaborative_filtering_recommendations(user_id, top_n):
    testset = trainset.build_anti_testset()
    testset = filter(lambda x: x[0] == user_id, testset)
    predictions = algo.test(testset)
    predictions.sort(key=lambda x: x.est, reverse=True)
    recommendations = [prediction.iid for prediction in predictions[:top_n]]
    return recommendations

In the above code, we implemented the collaborative filtering component of the hybrid recommender system using the SVD (Singular Value Decomposition) algorithm.

First, we initialized the SVD algorithm and trained it on the dataset. This step involves decomposing the user element rating matrix to capture the underlying patterns and latent factors that drive user preferences.

To generate collaborative filtering recommendations, we then created a test set composed of user-item pairs that were not present in the training set. We have filtered this test set to only include items belonging to the target user specified by user_id.


Next, we used the trained SVD model to predict the test set item ratings. These predictions represent the estimated ratings that the user would assign to the items.

The predictions are then sorted by their estimated ratings in descending order. We selected the top N items with the highest estimated ratings as collaborative filtering recommendations for the user.

In [11]:
def get_hybrid_recommendations(user_id, product_id, top_n):
    content_based_recommendations = get_content_based_recommendations(product_id, top_n)
    collaborative_filtering_recommendations = get_collaborative_filtering_recommendations(user_id, top_n)
    hybrid_recommendations = list(set(content_based_recommendations + collaborative_filtering_recommendations))
    return hybrid_recommendations[:top_n]

In the above code, we combined content-based and collaborative filtering approaches to create a hybrid recommender system.

The get_hybrid_recommendations function takes the user_id, the product_id and the desired number of top_n recommendations as input.

First, it calls the get_content_based_recommendations function to retrieve a list of content-based recommendations for the specified product_id. These recommendations are based on the similarity between the characteristics of the given product and other products in the dataset.

Then it calls the get_collaborative_filtering_recommendations function to get a list of collaborative filtering recommendations for the specified user_id. These recommendations are generated by leveraging historical user-item interactions and estimating user preferences based on similar user behaviours.


Next, we combine the content-based and collaborative filtering recommendations by taking the union of the two lists. It ensures that hybrid recommendations include content-based and collaborative filtering recommendations based on user preferences.

Here’s how to use our hybrid recommendation system to recommend products based on the product that a user is viewing:

In [12]:
user_id = 6
product_id = 11
top_n = 10
recommendations = get_hybrid_recommendations(user_id, product_id, top_n)

print(f"Hybrid Recommendations for User {user_id} based on Product {product_id}:")
for i, recommendation in enumerate(recommendations):
    print(f"{i + 1}. Product ID: {recommendation}")
    print(f"{i + 1}. Product ID: {recommendation}")

Hybrid Recommendations for User 6 based on Product 11:
1. Product ID: 737
1. Product ID: 737
2. Product ID: 258
2. Product ID: 258
3. Product ID: 835
3. Product ID: 835
4. Product ID: 1254
4. Product ID: 1254
5. Product ID: 775
5. Product ID: 775
6. Product ID: 464
6. Product ID: 464
7. Product ID: 1169
7. Product ID: 1169
8. Product ID: 854
8. Product ID: 854
9. Product ID: 1083
9. Product ID: 1083
10. Product ID: 1436
10. Product ID: 1436
