# Product Recommendation System 

A personalized recommendation system suggests items to each user based on that user's preferences and behavior. It draws on past interactions, such as items viewed, purchased, or rated, together with demographic and contextual information, to tailor its suggestions. Such systems are widely used in e-commerce, social media, and entertainment platforms to improve engagement and satisfaction by surfacing more relevant content.

Methods ❎
1. Popularity-Based Recommendation
2. Content-Based Filtering
3. Collaborative Filtering
4. TensorFlow Recommenders
5. Hybrid Model

In [1]:
import pandas as pd # pandas for data manipulation
data = pd.read_csv("amazon.csv") # load data
data.head(2) # display data

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,₹399,"₹1,099",64%,4.2,24269,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,₹199,₹349,43%,4.0,43994,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...","ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Plac...","RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...


In [2]:
data.info() # information about data

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1465 entries, 0 to 1464
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   product_id           1465 non-null   object
 1   product_name         1465 non-null   object
 2   category             1465 non-null   object
 3   discounted_price     1465 non-null   object
 4   actual_price         1465 non-null   object
 5   discount_percentage  1465 non-null   object
 6   rating               1465 non-null   object
 7   rating_count         1463 non-null   object
 8   about_product        1465 non-null   object
 9   user_id              1465 non-null   object
 10  user_name            1465 non-null   object
 11  review_id            1465 non-null   object
 12  review_title         1465 non-null   object
 13  review_content       1465 non-null   object
 14  img_link             1465 non-null   object
 15  product_link         1465 non-null   object
dtypes: object(16)

In [3]:
data.isnull().sum() # calculating total number of null value

product_id             0
product_name           0
category               0
discounted_price       0
actual_price           0
discount_percentage    0
rating                 0
rating_count           2
about_product          0
user_id                0
user_name              0
review_id              0
review_title           0
review_content         0
img_link               0
product_link           0
dtype: int64

In [4]:
data.duplicated().values.any() # checking duplicates

False

In [5]:
data['rating'].unique() # unique value of rating

array(['4.2', '4.0', '3.9', '4.1', '4.3', '4.4', '4.5', '3.7', '3.3',
       '3.6', '3.4', '3.8', '3.5', '4.6', '3.2', '5.0', '4.7', '3.0',
       '2.8', '4', '3.1', '4.8', '2.3', '|', '2', '3', '2.6', '2.9'],
      dtype=object)

In [6]:
data.drop(index=1279, inplace=True) # drop the row whose rating is the special character '|'

In [7]:
# Remove currency symbols, thousands separators, and percent signs
data['discounted_price'] = data['discounted_price'].str.replace('₹' , '')
data['discounted_price'] = data['discounted_price'].str.replace(',' , '')
data['actual_price'] = data['actual_price'].str.replace('₹' , '')
data['actual_price'] = data['actual_price'].str.replace(',' , '')
data['discount_percentage'] = data['discount_percentage'].str.replace('%' , '')
data['rating_count'] = data['rating_count'].str.replace(',' , '')

In [8]:
# Convert data types 
data['discounted_price'] = data['discounted_price'].astype('float64') 
data['actual_price'] = data['actual_price'].astype('float64') 
data['discount_percentage'] = data['discount_percentage'].astype('float64')
data['rating_count'] = data['rating_count'].astype('float64')
data['rating'] = data['rating'].astype('float64')

In [9]:
# FillNa with mean 
rating_count_mean = data['rating_count'].mean()
data['rating_count'] = data['rating_count'].fillna(rating_count_mean)
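
The cleaning cells above can also be folded into one helper that does not depend on a hard-coded row index. The following is a minimal sketch, assuming the column names shown in `data.info()`; the function name `clean_numeric_columns` and the toy frame are illustrative and not part of the original notebook.

```python
import pandas as pd

NUMERIC_COLUMNS = ["discounted_price", "actual_price", "discount_percentage",
                   "rating", "rating_count"]

def clean_numeric_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Strip currency/percent symbols and convert the numeric columns to float."""
    out = df.copy()
    for col in NUMERIC_COLUMNS:
        stripped = out[col].astype(str).str.replace(r"[₹,%]", "", regex=True)
        # Values that cannot be parsed (such as '|') become NaN instead of raising
        out[col] = pd.to_numeric(stripped, errors="coerce")
    # Drop rows whose rating could not be parsed, then impute missing counts
    out = out.dropna(subset=["rating"])
    out["rating_count"] = out["rating_count"].fillna(out["rating_count"].mean())
    return out

# Toy data (hypothetical values in the same format as the dataset)
toy = pd.DataFrame({
    "discounted_price": ["₹399", "₹1,099", "₹199"],
    "actual_price": ["₹1,099", "₹1,999", "₹349"],
    "discount_percentage": ["64%", "45%", "43%"],
    "rating": ["4.2", "|", "4.0"],
    "rating_count": ["24,269", "100", None],
})
cleaned = clean_numeric_columns(toy)
print(cleaned)
```

Using `pd.to_numeric(..., errors="coerce")` turns any unparseable rating into `NaN`, so bad rows can be dropped by value rather than by position.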

### Popularity Based Recommendation

Popularity-based recommendation suggests the same top products to every user, so it is not personalized, but it is a useful baseline and works for new users with no history. Ranking by average rating alone is unreliable, because a product with a handful of ratings can outrank one with thousands. Rating counts help here: they indicate how many users have rated each product and therefore how much weight its average rating deserves.

In [None]:
popular_data = data[["product_id","user_id","rating","rating_count"]]

In [None]:
popular_data.head()

Unnamed: 0,product_id,user_id,rating,rating_count
0,B07JW9H4J1,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...",4.2,24269.0
1,B098NS6PVG,"AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...",4.0,43994.0
2,B096MSW6CT,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...",3.9,7928.0
3,B08HDJ86NZ,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...",4.2,94363.0
4,B08CF3B7N1,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...",4.2,16905.0


In [None]:
# Calculate the popularity of each product
popularity = popular_data.groupby('product_id')['user_id'].count().reset_index()
popularity.rename(columns = {'user_id':'popularity'}, inplace = True)


In [None]:
# Calculate the average rating for each product
average_rating = popular_data.groupby('product_id')['rating'].mean().reset_index()
average_rating.rename(columns = {'rating':'average_rating'}, inplace = True)

In [None]:
# Calculate the number of ratings for each product
rating_counts = popular_data.groupby('product_id')['rating'].count().reset_index()
rating_counts.rename(columns = {'rating':'rating_counts'}, inplace = True)


In [None]:
# Merge the popularity, average rating, and rating counts data into a single dataframe
product_data = pd.merge(popularity, average_rating, on='product_id')
product_data = pd.merge(product_data, rating_counts, on='product_id')

In [None]:
# Sort the products by popularity, average rating, and rating counts
sorted_products = product_data.sort_values(['popularity', 'average_rating', 'rating_counts'], 
                                            ascending=False)

In [None]:
# Print the top 10 most popular products
print(sorted_products.head(10))

      product_id  popularity  average_rating  rating_counts
876   B09C6HXFC1           3             4.5              3
472   B07XLCFSSN           3             4.4              3
522   B083342NKJ           3             4.4              3
1073  B09W5XR9RT           3             4.4              3
172   B01GGKYKQM           3             4.3              3
261   B077Z65HSD           3             4.3              3
519   B082T6V3DT           3             4.3              3
589   B08CF3D7QR           3             4.3              3
614   B08DDRGWTJ           3             4.3              3
346   B07JW9H4J1           3             4.2              3


The code above builds three dataframes, `popularity`, `average_rating`, and `rating_counts`, and merges them into `product_data`. `average_rating` is the mean rating per product. Both `popularity` and `rating_counts` count the rows for each `product_id`, so they coincide and equal the number of times a product is listed in the dataset (3 for every entry in the output above). The `rating_count` column, which holds the number of user ratings, is not used in this ranking.

`sorted_products` is ordered by popularity, average rating, and rating counts, in that order, so the most frequently listed products come first and ties are broken by average rating.
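
Because the ranking above does not use the `rating_count` column, one common alternative is a weighted rating that shrinks the average of rarely rated products toward the global mean. This is a minimal sketch under stated assumptions: the function name `weighted_rating`, the `min_votes` threshold, and the toy numbers are hypothetical and would need tuning on the real data.

```python
import pandas as pd

def weighted_rating(df: pd.DataFrame, min_votes: float) -> pd.Series:
    """Shrink each product's rating toward the global mean by its rating count."""
    v = df["rating_count"]
    r = df["rating"]
    c = r.mean()  # global mean rating across products
    # Products with many ratings keep their own rating; sparse ones move toward c
    return (v / (v + min_votes)) * r + (min_votes / (v + min_votes)) * c

# Toy data (hypothetical): product A has a perfect rating but only 2 ratings
toy = pd.DataFrame({
    "product_id": ["A", "B", "C", "D"],
    "rating": [5.0, 4.5, 4.0, 3.5],
    "rating_count": [2.0, 24269.0, 43994.0, 10000.0],
})
toy["score"] = weighted_rating(toy, min_votes=1000.0)
ranking = toy.sort_values("score", ascending=False)["product_id"].tolist()
print(ranking)
```

With these toy numbers, product B ranks above product A even though A has the higher raw rating, because A's two ratings carry little weight.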

### Content-Based Recommendation

Content-based recommendation systems use information about the features or attributes of items to recommend similar items to users. These systems first create a profile or representation of each item based on its features or attributes, such as the genre, director, and actors for a movie, or the brand, color, and size for a clothing item. Then, they use this profile to find other items that are similar in feature space to the item being recommended. Content-based systems are useful when there is a lot of information about the items available, but may not be effective if users' preferences change over time or if there is not enough diversity in the features or attributes of the items.

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
content_data = data[["product_id","product_name","about_product"]].copy() # copy so new columns can be added safely

In [None]:
content_data.head()

Unnamed: 0,product_id,product_name,about_product
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,High Compatibility : Compatible With iPhone 12...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,"Compatible with all Type C enabled devices, be..."
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,【 Fast Charger& Data Sync】-With built-in safet...
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,The boAt Deuce USB 300 2 in 1 cable is compati...
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,[CHARGE & SYNC FUNCTION]- This cable comes wit...


In [None]:
# Create a product feature matrix using the product title and description
content_data = content_data.fillna('') # Replace missing values with empty strings
content_data['text'] = content_data['product_name'] + ' ' + content_data['about_product']


In [None]:
# Vectorize the product feature matrix using TF-IDF
vectorizer = TfidfVectorizer()
product_vectors = vectorizer.fit_transform(content_data['text'])


In [None]:
# Calculate the cosine similarity between products based on their TF-IDF vectors
product_similarity = cosine_similarity(product_vectors)

In [None]:
# Convert the similarity matrix into a dataframe for easier indexing
product_similarity_df = pd.DataFrame(product_similarity, index=content_data['product_id'], 
                                      columns=content_data['product_id'])


In [None]:
# Select a product to make recommendations for
product_id = 'B098NS6PVG'

In [None]:
# Find the most similar products to the selected product.
# product_id values repeat in the data, so keep one row and one column per ID;
# the column lookup then returns a single Series of similarity scores.
unique_similarity_df = product_similarity_df.loc[
    ~product_similarity_df.index.duplicated(),
    ~product_similarity_df.columns.duplicated()]

# Sort by similarity score and take the top 5, excluding the product itself
similar_products = (unique_similarity_df[product_id]
                    .drop(product_id)
                    .sort_values(ascending=False)
                    .head(5)
                    .index.tolist())

In [None]:
# Print the recommended products
print('Recommended products for {}:'.format(product_id))
for i, product in enumerate(similar_products):
    print('{}. {}'.format(i+1, product))


We use the product title and description as the product features and vectorize them using TF-IDF. We then calculate the cosine similarity between products based on their TF-IDF vectors, sort the scores for the selected product in descending order, and print the five most similar products, skipping the selected product itself.

### Collaborative filtering recommendation system

In [None]:
collab_data=data[["product_id","user_id","rating"]]

In [None]:
collab_data.head()

Unnamed: 0,product_id,user_id,rating
0,B07JW9H4J1,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...",4.2
1,B098NS6PVG,"AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...",4.0
2,B096MSW6CT,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...",3.9
3,B08HDJ86NZ,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...",4.2
4,B08CF3B7N1,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...",4.2


In [None]:
pip install scikit-surprise


In [None]:
import pandas as pd
from surprise import Dataset
from surprise import Reader
from surprise import SVD
from surprise.model_selection import cross_validate

In [None]:
# Define the reader to read the data with the Surprise library
reader = Reader(rating_scale=(1, 5))


In [None]:
# Load the data into the Surprise library's Dataset format
data = Dataset.load_from_df(collab_data[['user_id', 'product_id', 'rating']], reader)


In [None]:
# Use the SVD algorithm for collaborative filtering
algo = SVD()

In [None]:
# Evaluate the algorithm using cross-validation
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)


Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.2727  0.2979  0.2708  0.2952  0.2823  0.2838  0.0112  
MAE (testset)     0.2013  0.2033  0.2049  0.2039  0.2030  0.2033  0.0012  
Fit time          0.03    0.02    0.02    0.02    0.02    0.02    0.00    
Test time         0.00    0.00    0.00    0.00    0.00    0.00    0.00    


{'test_rmse': array([0.27266718, 0.29791629, 0.27079461, 0.29516334, 0.28228692]),
 'test_mae': array([0.20128686, 0.20326223, 0.20488575, 0.20385144, 0.20302115]),
 'fit_time': (0.02573990821838379,
  0.017937898635864258,
  0.021818161010742188,
  0.017704486846923828,
  0.017697572708129883),
 'test_time': (0.0017971992492675781,
  0.0014145374298095703,
  0.002096891403198242,
  0.0013568401336669922,
  0.0013804435729980469)}

In [None]:
# Train the algorithm on the entire dataset
trainset = data.build_full_trainset()
algo.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7fc3765125e0>

In [None]:
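# Select a user to make recommendations for.
# Note: 100 is not an ID that occurs in collab_data, so Surprise treats it as an
# unknown user and each estimate reduces to the global mean plus the item bias.
user_id = 100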

In [None]:
# Get the products that the user has already rated
rated_products = collab_data[collab_data['user_id'] ==user_id]['product_id'].unique()


In [None]:
# Create a list of all products that the user has not rated
unrated_products = collab_data[~collab_data['product_id'].isin(rated_products)]['product_id'].unique()


In [None]:
# Create a list of tuples with each unrated product and the predicted rating for the user
predictions = []
for product_id in unrated_products:
    predicted_rating = algo.predict(user_id, product_id).est
    predictions.append((product_id, predicted_rating))


In [None]:
# Sort the list of predictions by the predicted rating, in descending order
predictions.sort(key=lambda x: x[1], reverse=True)

In [None]:
# Print the top 5 recommended products
for i, (product_id, predicted_rating) in enumerate(predictions[:5]):
    print(f"Recommendation {i+1}: Product ID {product_id}, Predicted Rating {predicted_rating}")


Recommendation 1: Product ID B09ZHCJDP1, Predicted Rating 4.181107357449973
Recommendation 2: Product ID B0BP7XLX48, Predicted Rating 4.173917864844586
Recommendation 3: Product ID B0B9BXKBC7, Predicted Rating 4.173382355749152
Recommendation 4: Product ID B0BQRJ3C47, Predicted Rating 4.165283734411774
Recommendation 5: Product ID B09F6S8BT6, Predicted Rating 4.159482888310232
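
In this dataset each `user_id` cell holds several comma-separated reviewer IDs, and `rating` is a product-level value. One way to obtain one row per (user, product) pair before fitting a collaborative model is to split and explode that column. This is a minimal sketch on toy data; the function name `explode_users` is hypothetical, and every exploded row inherits the same product-level rating because per-user ratings are not available in the source.

```python
import pandas as pd

def explode_users(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (user_id, product_id) from comma-separated user IDs."""
    out = df.assign(user_id=df["user_id"].str.split(","))
    out = out.explode("user_id")
    out["user_id"] = out["user_id"].str.strip()
    # A user may be listed twice for the same product; keep a single interaction
    return out.drop_duplicates(["user_id", "product_id"]).reset_index(drop=True)

# Toy data (hypothetical IDs in the same layout as collab_data)
toy = pd.DataFrame({
    "product_id": ["P1", "P2"],
    "user_id": ["U1,U2,U3", "U2, U4"],
    "rating": [4.2, 4.0],
})
long_form = explode_users(toy)
print(long_form)
```

The resulting long-form frame has the `user_id`, `product_id`, `rating` columns that `Dataset.load_from_df` expects once reordered.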


### TensorFlow Recommenders



### Retrieval Model

In [None]:
popular_data.head()

Unnamed: 0,product_id,user_id,rating,rating_count
0,B07JW9H4J1,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...",4.2,24269.0
1,B098NS6PVG,"AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...",4.0,43994.0
2,B096MSW6CT,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...",3.9,7928.0
3,B08HDJ86NZ,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...",4.2,94363.0
4,B08CF3B7N1,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...",4.2,16905.0


In [None]:
data=popular_data.copy()

In [None]:
pip install tensorflow-recommenders



In [None]:
import numpy as np
import tensorflow as tf
import tensorflow_recommenders as tfrs
from typing import Dict, Text

In [None]:
### standardize item data types, especially string, float, and integer

data[['user_id', 'product_id']] = data[['user_id', 'product_id']].astype(str)

# ratings are kept as floats so that they match the float cast applied
# when the interactions dataset is built
data['rating'] = data['rating'].astype(float)

In [None]:
### define interactions data and item data

### interactions
### here we create a reference table of the user, item, and rating
interactions_dict = data.groupby(['user_id', 'product_id'])['rating'].sum().reset_index()

## we transform the table into a dictionary, which we then feed into tensor slices
# this step is crucial as this will be the type of data fed into the embedding layers
interactions_dict = {name: np.array(value) for name, value in interactions_dict.items()}
interactions = tf.data.Dataset.from_tensor_slices(interactions_dict)

## we do a similar step for items, where this is the reference table of items to be recommended
items_dict = data[['product_id']].drop_duplicates()
items_dict = {name: np.array(value) for name, value in items_dict.items()}
items = tf.data.Dataset.from_tensor_slices(items_dict)

## map the features in interactions and items to the fields used by the embedding layers
## do it for all the items in the interaction and item tables
## the rating is cast to float explicitly to avoid dtype errors
interactions = interactions.map(lambda x: {
    'user_id' : x['user_id'],
    'product_id' : x['product_id'],
    'rating' : float(x['rating']),
})

items = items.map(lambda x: x['product_id'])

In [None]:
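### get unique item and user id's as a lookup table
unique_item_titles = np.unique(np.concatenate(list(items.batch(1000))))
unique_user_ids = np.unique(np.concatenate(list(interactions.batch(1_000).map(lambda x: x["user_id"]))))

# Randomly shuffle data and split 80/20 between train and test.
# The split sizes are derived from the dataset itself, which has far fewer
# than 60,000 interactions, so that the test set is not left empty.
tf.random.set_seed(42)
n_interactions = len(interactions_dict['user_id'])
shuffled = interactions.shuffle(n_interactions, seed=42, reshuffle_each_iteration=False)

n_train = int(0.8 * n_interactions)
train = shuffled.take(n_train)
test = shuffled.skip(n_train)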

In [None]:
class RetailModel(tfrs.Model):
    def __init__(self, user_model, item_model):
        super().__init__()
        ### Candidate tower (items): a StringLookup layer maps product IDs to
        ### integer indices and an Embedding layer maps those to item embeddings.
        ### The tower is built outside the class and passed in.
        self.item_model: tf.keras.Model = item_model

        ### Query tower (users): built the same way from the unique user IDs.
        self.user_model: tf.keras.Model = user_model

        ### For the retrieval task we track top-k accuracy over all candidate items
        metrics = tfrs.metrics.FactorizedTopK(candidates=items.batch(128).map(self.item_model))

        # Define the task, which is retrieval
        self.task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(metrics=metrics)

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
        # We pick out the user features and pass them into the user model.
        user_embeddings = self.user_model(features["user_id"])
        # And pick out the item features and pass them into the item model,
        # getting embeddings back.
        positive_item_embeddings = self.item_model(features["product_id"])

        # The task computes the loss and the metrics.
        return self.task(user_embeddings, positive_item_embeddings)

In [None]:
### Fitting and evaluating
### we choose the dimensionality of the query and candidate representations.
embedding_dimension = 32
## we build the candidate (item) and query (user) towers and pass them into the model
item_model = tf.keras.Sequential([tf.keras.layers.StringLookup(
                                vocabulary=unique_item_titles, mask_token=None),
                                # We add an additional embedding to account for unknown tokens.
                                tf.keras.layers.Embedding(len(unique_item_titles) + 1, embedding_dimension)])

user_model = tf.keras.Sequential([tf.keras.layers.StringLookup(
                                vocabulary=unique_user_ids, mask_token=None),
                                # We add an additional embedding to account for unknown tokens.
                                tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)])

model = RetailModel(user_model, item_model)


model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01))
cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()

## fit the model with ten epochs
model_hist = model.fit(cached_train, epochs=10)

# evaluate the model
# model.evaluate(cached_test, return_dict=True)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


#### TensorFlow-based recommendation models use deep learning to build more sophisticated and personalized recommenders, and can incorporate additional features and data sources beyond traditional techniques. Here, a simple retrieval model is used.
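
The retrieval model above learns one embedding per user and one per product, and scores a pair by the dot product of the two. The sketch below illustrates only the lookup step in NumPy; random unit vectors stand in for trained embeddings, and the helper name `top_k_items` is hypothetical rather than part of TensorFlow Recommenders.

```python
import numpy as np

def top_k_items(user_vec: np.ndarray, item_matrix: np.ndarray,
                item_ids: list, k: int = 3) -> list:
    """Score every item by dot product with the user vector and return the top k IDs."""
    scores = item_matrix @ user_vec
    top = np.argsort(scores)[::-1][:k]
    return [item_ids[i] for i in top]

rng = np.random.default_rng(42)
embedding_dimension = 32
item_ids = ["B07JW9H4J1", "B098NS6PVG", "B096MSW6CT", "B08HDJ86NZ", "B08CF3B7N1"]

# Random stand-ins for trained item embeddings, normalized to unit length
# so that the demo result below is deterministic
item_embeddings = rng.normal(size=(len(item_ids), embedding_dimension))
item_embeddings /= np.linalg.norm(item_embeddings, axis=1, keepdims=True)

# Make the user vector identical to the second item's embedding so the
# best match is known in advance
user_embedding = item_embeddings[1].copy()
recommended = top_k_items(user_embedding, item_embeddings, item_ids, k=3)
print(recommended)
```

In a trained model the user vector would come from the query tower and the item matrix from the candidate tower; the ranking step itself is the same.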

### Hybrid Model

In [30]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Load the data. `data` was reassigned in the Surprise and TensorFlow sections,
# so the cleaned frames created earlier are reused here.
ratings_data = collab_data[["product_id", "user_id", "rating"]]
products_data = content_data[["product_id", "product_name", "about_product"]]

# Create a product x user pivot table of ratings (0 where there is no rating)
product_features = ratings_data.pivot_table(index='product_id', columns='user_id', values='rating').fillna(0)

# Create a sparse matrix
product_features_matrix = csr_matrix(product_features.values)

# Compute the collaborative (rating-based) cosine similarity between products
product_similarity = cosine_similarity(product_features_matrix)

# Create a content-based filtering model with one row per product, in the same
# order as product_features.index, so that both similarity matrices are aligned
products_aligned = (products_data.drop_duplicates('product_id')
                    .set_index('product_id')
                    .reindex(product_features.index))
cv = CountVectorizer()
product_content = cv.fit_transform(products_aligned['about_product'].fillna(''))
content_similarity = cosine_similarity(product_content)

# Define a hybrid recommendation function
def hybrid_recommendation(product_id, top_n=10):
    # Get the position of the product in the product_features table
    product_idx = product_features.index.get_loc(product_id)

    # Blend the content-based and collaborative similarity scores equally
    hybrid_scores = 0.5 * content_similarity[product_idx] + 0.5 * product_similarity[product_idx]

    # Rank all products by the blended score, skipping the query product itself
    ranked = [i for i in np.argsort(hybrid_scores)[::-1] if i != product_idx]

    # Return the product IDs of the top recommendations
    return product_features.index[ranked[:top_n]].tolist()


In [32]:
# Call the hybrid_recommendation function to get the top 10 recommendations for the product with ID 'B07JW9H4J1'
recommendations = hybrid_recommendation('B07JW9H4J1')

# Print out the list of recommended product IDs
print('Top 10 recommendations:', recommendations)


#### Hybrid models combine multiple recommendation techniques to improve the quality of recommendations. By combining collaborative filtering and content-based filtering, the hybrid model can provide more personalized recommendations and overcome the limitations of each individual technique.
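
The equal 0.5/0.5 weighting used above is one choice among many. The sketch below shows a tunable blend of two similarity matrices that share the same product order; the names `blend_similarity`, `top_n`, and `alpha`, and the toy matrices, are hypothetical and only illustrate how the weight shifts the result.

```python
import numpy as np

def blend_similarity(content_sim: np.ndarray, collab_sim: np.ndarray,
                     alpha: float = 0.5) -> np.ndarray:
    """Weighted average of two similarity matrices with identical row/column order."""
    if content_sim.shape != collab_sim.shape:
        raise ValueError("similarity matrices must be aligned to the same products")
    return alpha * content_sim + (1.0 - alpha) * collab_sim

def top_n(sim_row: np.ndarray, query_idx: int, n: int) -> list:
    """Indices of the n highest-scoring products, excluding the query itself."""
    order = np.argsort(sim_row)[::-1]
    return [int(i) for i in order if i != query_idx][:n]

# Toy 3-product example: product 1 is close to product 0 by content,
# product 2 is close to product 0 by co-rating
content_sim = np.array([[1.0, 0.9, 0.1],
                        [0.9, 1.0, 0.2],
                        [0.1, 0.2, 1.0]])
collab_sim = np.array([[1.0, 0.0, 0.8],
                       [0.0, 1.0, 0.3],
                       [0.8, 0.3, 1.0]])

content_heavy = top_n(blend_similarity(content_sim, collab_sim, alpha=0.9)[0], 0, 1)
collab_heavy = top_n(blend_similarity(content_sim, collab_sim, alpha=0.1)[0], 0, 1)
print(content_heavy, collab_heavy)
```

A larger `alpha` favors content similarity and a smaller one favors co-rating similarity, so the weight is worth validating against held-out interactions rather than fixing in advance.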



#### In conclusion, each type of recommendation model has its strengths and limitations, and the choice of the most appropriate model depends on the specific needs and goals of the recommendation system.