<a href="https://colab.research.google.com/github/mustufajp/Product-Recommendation/blob/main/recommendation_system_for_products.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Collaborative Recommendation System using Cosine

#Description of the project

The objective of this project is to develop a recommendation system designed to predict the next item a customer might purchase, using real-world business data.

Initially, the system employs collaborative filtering, using cosine similarity, to identify relevant product categories before suggesting specific items within those categories. This approach is preferred over content-based filtering due to the unique characteristics of the men's silver accessory industry. Customers in this domain often purchase diverse items across various categories; for instance, after buying a pendant, they might opt for a ring instead of another pendant.

However, relying solely on collaborative filtering presents challenges due to the dataset's complexity—comprising over 20,000 products—and the sparsity of sales data. Therefore, to suggest specific item within the category, a product with most purchases is recommended.

In future, to improve the system further, may consider implenting hybrid system of collaborative and content based system.

# Importing Libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from scipy import sparse
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Importing Data & Cleaning



In [2]:
df=pd.read_csv('/content/drive/MyDrive/SAAD/サード/Business Analysis/リコメンドシステム/source_data.csv')

#Filtering out any categories containing box since they are complementary items.
df_box_filtered_out=df[~df[ 'product_category'].str.contains( 'box' )]

#Changed customer_id to str
df_box_filtered_out[['customer_id']].astype('str')

#Removed sku_code_na_size=0
df_filtered_out_0_item=df_box_filtered_out[df_box_filtered_out['sku_code_na_size']!='0']

df_clean=df_filtered_out_0_item

# Data Preparation for Machine Learning Model


In [3]:
#Aggregation by customer_id and category.
df_category=df_clean[['customer_id','product_category','product_quantity']].groupby(['customer_id','product_category']).sum().reset_index()
df_category.head()

Unnamed: 0,customer_id,product_category,product_quantity
0,116.0,HBP,1
1,120.0,e,2
2,121.0,HBPSA,1
3,121.0,chnsa,1
4,123.0,HBE,2


In [4]:
#Converting the table to utility matrix where the list of users are the rows and list of items are the columns
df_pivot_category=pd.pivot_table(df_category, values='product_quantity', index=['customer_id'],
                       columns=['product_category'], aggfunc="sum",fill_value=0)
df_pivot_category.head()

product_category,B,BET,BGC,BGO,BRSA,CC,CHNSA,E,FKB,HBB,...,xnc3,xnc4,xnc5,xnc9,xp,xr,xrx,xwc,xwt,z
customer_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
116.0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
120.0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
121.0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
123.0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
124.0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,12


# Machine Learning Model

In [5]:
def recommend_categories(user_id, n=2):

    sparse_purchase_counts=sparse.csr_matrix(df_pivot_category)
    cosine_similarities = cosine_similarity(sparse_purchase_counts.T)

    # Get the user's purchase history
    user_history = sparse_purchase_counts[user_id].toarray().flatten()

    # Compute the average cosine similarity between the user's purchased items and all other items
    similarities = cosine_similarities.dot(user_history)

    # Creating a list of indices (purchased_indices) where the user has made purchases.
    purchased_indices = np.where(user_history > 0)[0]

    #This step sets the similarity scores of items that the user has already purchased to zero. This ensures that these items are not recommended again to the user.
    similarities[purchased_indices] = 0

    # Sort the items by similarity score and return the top n items
    recommended_indices = np.argsort(similarities)[::-1][:n]
    recommended_categories = list(df_pivot_category.columns[recommended_indices])

    return recommended_categories

In [6]:
#Example
recommend_categories(123, n=3)

['HBP', 'bet', 'dbi']

# Data Preparation for model identifying most popular


In [7]:
#Aggregation by customer_id and SKU code excluding size reference.
df_item=df_clean[['customer_id','sku_code_na_size','product_quantity']].groupby(['customer_id','sku_code_na_size']).sum().reset_index()

#Aggregation by SKU code excluding size reference.
df_item_quantity=df_clean[['sku_code_na_size','product_quantity']].groupby(['sku_code_na_size']).sum().reset_index()

In [8]:
df_item.head()

Unnamed: 0,customer_id,sku_code_na_size,product_quantity
0,116.0,HBP448S,1
1,120.0,e111s,1
2,120.0,e92s,1
3,121.0,HBPSA2PGPL-SAP,1
4,121.0,chnsa5pgpl,1


In [9]:
df_item_quantity.head().sort_values (by='product_quantity' , ascending=False)

Unnamed: 0,sku_code_na_size,product_quantity
2,BET73,151
3,BET73SGPL,131
4,BET74,79
0,B105S,14
1,B109S,6


In [10]:
df_pivot_item=pd.pivot_table(df_item, values='product_quantity', index=['customer_id'],
                       columns=['sku_code_na_size'], aggfunc="sum",fill_value=0)



# Identifying most popular item

In [11]:
def item_selection(customer_id,n=2):

#list of items that were purchased
  purchased_items=list(df_pivot_item.columns[df_pivot_item.loc[customer_id] > 0])

  recommended_items=[]

  for i in recommend_categories(customer_id,n):
    # Creating a list of items that are in the recommended category, excluding purchased item.
    items_in_recommended_category=[col for col in df_pivot_item.columns if col.startswith(i) and i not in purchased_items]

    # Limiting the dataframe to the items_in_recommended_category
    df_items_in_recommended_category=df_item_quantity[df_item_quantity['sku_code_na_size'].isin(items_in_recommended_category)]
    highest_row=df_items_in_recommended_category.loc[df_items_in_recommended_category['product_quantity'].idxmax()]
    highest_item = highest_row['sku_code_na_size']
    recommended_items.append(highest_item)
  return recommended_items

In [12]:
#Example
item_selection(124)

['p9sgpl', 'HBP349S']

#Result



In [26]:
# when list of customer IDs are input, it results in a dictionary
def list_recommendation(customer_ids,n=2):
  result = [item_selection(i, n) for i in customer_ids]
  result_dict = {}
  for customer_id, recommendations in zip(customer_ids, result):
    result_dict[customer_id] = recommendations
  return result_dict

In [28]:
#Example
customer_ids=[123,124]
list_recommendation(customer_ids,3)

{123: ['HBP349S', 'bet15oz', 'dbi50oz'],
 124: ['p9sgpl', 'HBP349S', 'otbg46bkbk']}