In [1]:
import pandas as pd

# Use the below command for surpise moduel instllation instead of pip
# !conda install -c conda-forge scikit-surprise
from surprise import Dataset, Reader, SVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

### Data

In [2]:
data = pd.read_csv('./fashion_products/fashion_products.csv')
data.head()

Unnamed: 0,User ID,Product ID,Product Name,Brand,Category,Price,Rating,Color,Size
0,19,1,Dress,Adidas,Men's Fashion,40,1.043159,Black,XL
1,97,2,Shoes,H&M,Women's Fashion,82,4.026416,Black,L
2,25,3,Dress,Adidas,Women's Fashion,44,3.337938,Yellow,XL
3,57,4,Shoes,Zara,Men's Fashion,23,1.049523,White,S
4,79,5,T-shirt,Adidas,Men's Fashion,79,4.302773,Black,M


Our goal is to create two recommendation systems using collaborative and content-based filtering and then combine the recommendation techniques to build a recommendation system using a hybrid approach.

### Content-Based Filtering

In [3]:
content_df = data[['Product ID', 'Product Name', 'Brand', 'Category', 'Color', 'Size']]
content_df['Content'] = content_df.apply(lambda row: ' '.join(row.dropna().astype(str)), axis=1)
content_df.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  content_df['Content'] = content_df.apply(lambda row: ' '.join(row.dropna().astype(str)), axis=1)


Unnamed: 0,Product ID,Product Name,Brand,Category,Color,Size,Content
0,1,Dress,Adidas,Men's Fashion,Black,XL,1 Dress Adidas Men's Fashion Black XL
1,2,Shoes,H&M,Women's Fashion,Black,L,2 Shoes H&M Women's Fashion Black L
2,3,Dress,Adidas,Women's Fashion,Yellow,XL,3 Dress Adidas Women's Fashion Yellow XL
3,4,Shoes,Zara,Men's Fashion,White,S,4 Shoes Zara Men's Fashion White S
4,5,T-shirt,Adidas,Men's Fashion,Black,M,5 T-shirt Adidas Men's Fashion Black M


Use TF-IDF vectorizer to convert content into a matrix of TF_IDF features

In [4]:
tfidf_vectorizer = TfidfVectorizer()
content_matrix = tfidf_vectorizer.fit_transform(content_df['Content'])

Now we calculate the similarity between the products based on their content using the cosine similarity measure. <br>
This similarity matrix captures the similarity between each pair of products on their content.

In [5]:
content_similarity = linear_kernel(content_matrix, content_matrix)

In [6]:
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(data[['User ID', 'Product ID', 'Rating']], reader)

To get content-based recommendations, we first found the index of the target product in the similarity matrix. Then we sorted the similarity scores in descending order and selected the top N similar products. Finally, we returned the product IDs of the recommended products.

In [7]:
def get_content_based_recommendations(product_id, top_n):
    """_summary_

    Args:
        prodcut_id (_type_): _description_
        top_n (_type_): _description_
    """
    index = content_df[content_df['Product ID'] == product_id].index[0]
    similarity_scores = content_similarity[index]
    similarity_indices = similarity_scores.argsort()[::-1][1:top_n+1]
    recommendations = content_df.loc[similarity_indices, 'Product ID'].values
    
    return recommendations

### Collaborative Filtering

We are implementing this filtering using the SVD (Singular Value Decomposition) algorithm.

First, we initialized the SVD algorithm and trained it on the dataset. This step involves **decomposing the user element rating matrix to capture the underlying patterns and laten factors that drive user preferences.**

In [8]:
algo = SVD()
trainset = data.build_full_trainset()
algo.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x20c96f16350>

To generate collaborative filtering recommendations, we are creating a test set composed of user-item pairs that we not present in the training set and we filter this test set to only include items belonging to the target user specified by user_id. <br>
<br>
Next, we use trained SVD model to predict the test set item ratings. These predictions represent the estimated ratings that the user would assign to the items.

In [9]:
def get_collaborative_filtering_recommendations(user_id, top_n):
    """_summary_

    Args:
        user_id (_type_): _description_
        top_n (_type_): _description_
    """
    testset = trainset.build_anti_testset()
    testset = filter(lambda x: x[0]== user_id, testset)
    
    predictions = algo.test(testset)
    predictions.sort(key= lambda x: x.est, reverse= True)
    
    recommendations = [prediction.iid for prediction in predictions[:top_n]]
    
    return recommendations

### Finally, The Hybrid Approach

In [10]:
def get_hybrid_recommendations(user_id, product_id, top_n):
    """_summary_

    Args:
        user_id (_type_): _description_
        product_id (_type_): _description_
        top_n (_type_): _description_
    """
    
    content_based_recommendations = get_content_based_recommendations(product_id, top_n)
    collaborative_filtering_recommendations = get_collaborative_filtering_recommendations(user_id, top_n)
    
    hybrid_recommendations = list(set(content_based_recommendations + collaborative_filtering_recommendations))
    
    return hybrid_recommendations[:top_n]

Here's how to the output of the hybrid recommendations system based on the product that a user in viewing.

In [11]:
user_id = 6
product_id = 11
top_n = 10
recommendations = get_hybrid_recommendations(user_id, product_id, top_n)

print(f"Hybrid Recommendations for User {user_id} based on Product {product_id}:")
for i, recommendation in enumerate(recommendations):
    print(f"{i+1}. Product ID: {recommendation}")

Hybrid Recommendations for User 6 based on Product 11:
1. Product ID: 1121
2. Product ID: 975
3. Product ID: 912
4. Product ID: 792
5. Product ID: 1362
6. Product ID: 1301
7. Product ID: 502
8. Product ID: 726
9. Product ID: 1112
10. Product ID: 1402
