# **Amazon Recommender System**

**Giana Grace and Natalie Dume**

# INTRO

This dataset provides a detailed overview of consumer purchases on Amazon, encompassing a wide range of data collected over several years from various categories and regions. The dataset includes essential information such as customer demographics, product details, transaction amounts, purchase frequencies, and associated sales in dollars. It also contains specific details on product categories, pricing, user ratings, and reviews. By analyzing trends across the dataset, key insights into Amazon’s consumer behavior patterns, regional purchasing trends, and the economic impact of e-commerce on customer spending habits can be uncovered.

Amazon purchases are primarily driven by two factors: product diversity and price sensitivity. Higher purchase volumes, particularly in terms of product categories and quantities sold, have a direct correlation with higher overall sales revenue. Additionally, products with more competitive pricing and favorable customer reviews tend to experience higher sales due to consumer trust and price-conscious behavior. Regional differences, such as higher sales in urban areas compared to rural regions, may also reflect varying consumer preferences and disposable income levels.

The core challenge in e-commerce platforms is providing personalized product recommendations that match individual user preferences. Most existing recommendation systems either focus on collaborative filtering (user-item interaction data) or content-based filtering (product attributes), but each has its limitations. Collaborative filtering struggles with cold-start problems (new users or items), while content-based filtering often lacks personalization based on user behavior.

To gain deeper insights and make more accurate predictions regarding Amazon purchases, a recommender system was employed. This approach is well-suited for identifying important factors that influence consumer purchase behavior, such as product categories, pricing, and customer reviews. Our research question is: How can we personalize the customer's shopping experience to increase the likelihood of purchases through recommendations?

Our research seeks to address this problem by using a deep learning-based hybrid recommendation system. This system combines user interaction data (e.g., clicks, views) with product attributes (e.g., price, category) to deliver more tailored and relevant recommendations for users. By applying deep learning models, the system will learn both latent user preferences and item features, offering a richer personalized shopping experience.

----
 The model helps in understanding how different features contribute to purchase variability, uncovering key drivers of consumer spending on Amazon.

In [None]:
#Mount our drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import pandas as pd

In [None]:
#load in file
metadata = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/amazon.csv', low_memory=False, nrows=30000)
metadata.head()

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,₹399,"₹1,099",64%,4.2,24269,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,₹199,₹349,43%,4.0,43994,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...","ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Plac...","RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,₹199,"₹1,899",90%,3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,₹329,₹699,53%,4.2,94363,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...","Omkar dhale,JD,HEMALATHA,Ajwadh a.,amar singh ...","R3EEUZKKK9J36I,R3HJVYCLYOY554,REDECAZ7AMPQC,R1...","Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,₹154,₹399,61%,4.2,16905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...","rahuls6099,Swasat Borah,Ajay Wadke,Pranali,RVK...","R1BP4L2HH9TFUP,R16PVJEXKV6QZS,R2UPDB81N66T4P,R...","As good as original,Decent,Good one for second...","Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...


First, the values in the rating column of the metadata DataFrame are converted to numeric values. Any non-numeric values are replaced with NaN (Not a Number). This conversion is important because the ratings need to be in numeric form for statistical analysis. Then, the average of all the movie ratings in the rating column is computed using .mean(), and the result is stored in the variable C. This gives the average rating across the dataset, which might later be used as a baseline for further analysis or model building. Those that fall into the top 10% of rating_count are likely the most popular or widely reviewed. Filtering the dataset to include only these  can help focus on high-quality or widely-engaged content, which might be more relevant in building a recommendation model.
By selecting only the top 90th percentile, the dataset becomes smaller and more manageable, which helps with computational efficiency when training machine learning models. This will also aid in removing outliers.



In [None]:
metadata['rating'] = pd.to_numeric(metadata['rating'], errors='coerce')
metadata['product_name'] = metadata['product_name'].astype(str)  # Convert back to string if it changed
C = metadata['rating'].mean()
print(C)

4.096584699453552


In [None]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

In [None]:
metadata['rating_count'] = pd.to_numeric(metadata['rating_count'], errors='coerce')
m = metadata['rating_count'].quantile(0.90)
print(m)

767.0


This function calculates the weighted rating for a given Amazon product, using a formula similar to the one used by IMDB to rank movies. The formula takes into account both the average rating of the product and the number of ratings it received.
This approach gives more importance to products with a higher number of ratings, while still considering the overall average rating to avoid bias toward products with very few ratings.


Weighted ranking: The primary goal is to rank products in a way that accounts for both their average rating and the number of ratings. This avoids biasing the ranking toward products with only a few ratings, which could distort the perception of quality.
Identify top products: After calculating the weighted score, the code sorts the products and prints out the top 10, providing a list of the most highly-rated,

In [None]:
# make a new dataframe with the modified list
q_amazon = metadata.copy().loc[metadata['rating_count'] >= m]
q_amazon.shape


(33, 16)

In [None]:
# make a function for a weighted average
def weighted_rating(x, m=m, C=C):
    v = x['rating_count']
    R = x['rating']
    # Calculation based on the IMDB formula
    return (v/(v+m) * R) + (m/(m+v) * C)

In [None]:
q_amazon['score'] = q_amazon.apply(weighted_rating, axis=1)

Here, the weighted_rating function is applied to each row of the q_amazon DataFrame (presumably a subset of Amazon products) to compute the weighted rating or "score" for each product. The axis=1 argument ensures that the function is applied row-wise. The resulting weighted score is stored in a new column called 'score'. Once the weighted rating is calculated, sorting by score helps prioritize products that are both highly rated and widely reviewed.

In [None]:
# sort the products according to our new feature
q_amazon = q_amazon.sort_values('score', ascending=False)

In [None]:
# print out top 10 products rated
q_amazon[['product_name', 'rating', 'rating_count', 'score']].head(10)

Unnamed: 0,product_name,rating,rating_count,score
20,Duracell USB Lightning Apple Certified (Mfi) B...,4.5,815.0,4.304412
467,Duracell USB Lightning Apple Certified (Mfi) B...,4.5,815.0,4.304412
700,Duracell USB Lightning Apple Certified (Mfi) B...,4.5,815.0,4.304412
579,"Fire-Boltt Tank 1.85"" Bluetooth Calling Smart ...",4.4,768.0,4.248391
750,Eveready Red 1012 AAA Batteries - Pack of 10,4.3,989.0,4.211151
823,Zoul USB C 60W Fast Charging 3A 6ft/2M Long Ty...,4.3,974.0,4.210385
151,Zoul USB Type C Fast Charging 3A Nylon Braided...,4.3,974.0,4.210385
32,Zoul USB C 60W Fast Charging 3A 6ft/2M Long Ty...,4.3,974.0,4.210385
314,Synqe USB C to USB C 60W Nylon Braided Fast Ch...,4.3,838.0,4.202792
328,Synqe Type C to Type C Short Fast Charging 60W...,4.3,838.0,4.202792


Displaying the top 10 products allows for easy identification of the most popular and well-reviewed items.

In [None]:
metadata['about_product'].head()

Unnamed: 0,about_product
0,High Compatibility : Compatible With iPhone 12...
1,"Compatible with all Type C enabled devices, be..."
2,【 Fast Charger& Data Sync】-With built-in safet...
3,The boAt Deuce USB 300 2 in 1 cable is compati...
4,[CHARGE & SYNC FUNCTION]- This cable comes wit...


In [None]:
# Import TF-IDF tool to factor in length of text
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(stop_words='english')

In [None]:
# replace empty description with a blank string
metadata['about_product'] = metadata['about_product'].fillna('')

In [None]:
# Make our TFIDF object using the text
tfidf_matrix = tfidf.fit_transform(metadata['about_product'])

In [None]:

tfidf_matrix.shape

(1465, 8911)

In [None]:
# get a sense of the different words
tfidf.get_feature_names_out()[5000:5010]

array(['length', 'lengths', 'lengthy', 'lenovo', 'lens', 'lenses',
       'lesser', 'let', 'lets', 'letter'], dtype=object)

In [None]:
import numpy as np
# Convert pandas dataframe to numpy array for processing
tfidf_matrix = tfidf_matrix.astype(np.float32)

In [None]:
# import linear kernel function for cosine similarity
from sklearn.metrics.pairwise import linear_kernel

In [None]:
# create cosine similarity matrix
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)


In [None]:
# construct a reverse map of the indices and the movie titles
indices = pd.Series(metadata.index, index=metadata['product_name']).drop_duplicates()
indices[:10]

Unnamed: 0_level_0,0
product_name,Unnamed: 1_level_1
"Wayona Nylon Braided USB to Lightning Fast Charging and Data Sync Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini (3 FT Pack of 1, Grey)",0
"Ambrane Unbreakable 60W / 3A Fast Charging 1.5m Braided Type C Cable for Smartphones, Tablets, Laptops & other Type C devices, PD Technology, 480Mbps Data Sync, Quick Charge 3.0 (RCT15A, Black)",1
"Sounce Fast Phone Charging Cable & Data Sync USB Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini & iOS Devices",2
"boAt Deuce USB 300 2 in 1 Type-C & Micro USB Stress Resistant, Tangle-Free, Sturdy Cable with 3A Fast Charging & 480mbps Data Transmission, 10000+ Bends Lifespan and Extended 1.5m Length(Martian Red)",3
"Portronics Konnect L 1.2M Fast Charging 3A 8 Pin USB Cable with Charge & Sync Function for iPhone, iPad (Grey)",4
"pTron Solero TB301 3A Type-C Data and Fast Charging Cable, Made in India, 480Mbps Data Sync, Strong and Durable 1.5-Meter Nylon Braided USB Cable for Type-C Devices for Charging Adapter (Black)",5
"boAt Micro USB 55 Tangle-free, Sturdy Micro USB Cable with 3A Fast Charging & 480mbps Data Transmission (Black)",6
MI Usb Type-C Cable Smartphone (Black),7
"TP-Link USB WiFi Adapter for PC(TL-WN725N), N150 Wireless Network Adapter for Desktop - Nano Size WiFi Dongle Compatible with Windows 11/10/7/8/8.1/XP/ Mac OS 10.9-10.15 Linux Kernel 2.6.18-4.4.3",8
"Ambrane Unbreakable 60W / 3A Fast Charging 1.5m Braided Micro USB Cable for Smartphones, Tablets, Laptops & Other Micro USB Devices, 480Mbps Data Sync, Quick Charge 3.0 (RCM15, Black)",9


In [None]:
# function that takes in movie title and outputs similar things
def get_recommendations(product_name, cosine_sim=cosine_sim):
  #get index for the input title
    idx = indices[product_name]

    # get pairwise similarity scores for all of the items
    sim_scores = list(enumerate(cosine_sim[idx]))

    # sort the scores in descending order
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # get the scores of the 10 most similar items
    sim_scores = sim_scores[1:11]

    # get the movie indices
    amazon_indices = [i[0] for i in sim_scores]

    df = pd.DataFrame(metadata, columns=['product_name', 'rating'])


    # return the top 10 most similar items
    return df #(metadata['product_name'].iloc[amazon_indices], metadata['rating'].iloc[amazon_indices])

In [None]:
get_recommendations('Tata Sky Universal Remote')

Unnamed: 0,product_name,rating
0,Wayona Nylon Braided USB to Lightning Fast Cha...,4.2
1,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,4.0
2,Sounce Fast Phone Charging Cable & Data Sync U...,3.9
3,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,4.2
4,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,4.2
...,...,...
1460,Noir Aqua - 5pcs PP Spun Filter + 1 Spanner | ...,4.0
1461,Prestige Delight PRWO Electric Rice Cooker (1 ...,4.1
1462,Bajaj Majesty RX10 2000 Watts Heat Convector R...,3.6
1463,Havells Ventil Air DSP 230mm Exhaust Fan (Pist...,4.0


**Now Using Pytorch:**

In [None]:
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split

# Simulated data for user-item interactions (user_id, product_id, rating)
data = {
    'user_id': np.random.randint(0, 100, size=1000),
    'product_id': np.random.randint(0, 50, size=1000),
    'rating': np.random.randint(1, 6, size=1000)  # Ratings from 1 to 5
}

df = pd.DataFrame(data)

# Split the data into training and testing sets
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)


In [None]:
# Dataset class for user-item-rating interactions
class RatingDataset(Dataset):
    def __init__(self, data):
        self.user_ids = torch.LongTensor(data['user_id'].values)  # User IDs as LongTensor
        self.product_ids = torch.LongTensor(data['product_id'].values)  # Item IDs as LongTensor
        self.ratings = torch.FloatTensor(data['rating'].values)   # Ratings as FloatTensor

    def __len__(self):
        return len(self.ratings)

    def __getitem__(self, idx):
        # Return as tensors
        return self.user_ids[idx], self.product_ids[idx], self.ratings[idx]

# Initialize train and test datasets
train_dataset = RatingDataset(train_data)
test_dataset = RatingDataset(test_data)

# Create DataLoader for batching and shuffling the training data
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)



In [None]:
# Define the Recommender Model with user and item embeddings
class RecommenderModel(nn.Module):
    def __init__(self, num_users, num_products, embedding_dim=8):
        super(RecommenderModel, self).__init__()
        self.user_embedding = nn.Embedding(num_users, embedding_dim)  # User embeddings
        self.product_embedding = nn.Embedding(num_products, embedding_dim)  # Item embeddings

    def forward(self, user, item):
        # Get embeddings for the given user and item
        user_vec = self.user_embedding(user)
        product_vec = self.product_embedding(item)
        # Return the dot product of user and item embeddings as prediction
        return (user_vec * product_vec).sum(dim=1)

# Initialize the model
num_users = df['user_id'].nunique()  # Number of unique users
num_products = df['product_id'].nunique()  # Number of unique products
model = RecommenderModel(num_users, num_products)



We measured the Mean Squared Error with loss function, which shows us the predicted ratings (predictions) and true ratings (ratings). By doing so, we minimize the difference to make sure predictions are as accurate as possible. The model is trained through a loop 10 times to ensure it can behave appropriately.

The model uses the user_id to compute the predicted ratings through embeddings. Resetting gradients helps with updating the model. We then monitor the avergae loss over all the bathces in which we end up with an output for each epoch with a decrease from 17.30 at epoch 1 to 15.32 by epoch 10, indicating the model is improving and learning the data through the loss.

In [None]:
# Loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 10
model.train()  # Set the model to training mode

for epoch in range(num_epochs):
    total_loss = 0
    for batch in train_loader:
        user_id, product_id, rating = batch

        # Reset gradients
        optimizer.zero_grad()

        # Forward pass
        predictions = model(user_id, product_id)

        # Calculate the loss between the predictions and the true ratings
        loss = criterion(predictions, rating)

        # Backpropagation
        loss.backward()

        # Update model parameters
        optimizer.step()

        total_loss += loss.item()

    avg_loss = total_loss / len(train_loader)
    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {avg_loss:.4f}')



Epoch 1/10, Loss: 17.7868
Epoch 2/10, Loss: 17.5207
Epoch 3/10, Loss: 17.2820
Epoch 4/10, Loss: 17.0467
Epoch 5/10, Loss: 16.8176
Epoch 6/10, Loss: 16.5905
Epoch 7/10, Loss: 16.3777
Epoch 8/10, Loss: 16.1574
Epoch 9/10, Loss: 15.9468
Epoch 10/10, Loss: 15.7445


In [None]:
# Evaluation on test set (optional)
def evaluate_model(model, test_loader):
    model.eval()  # Set the model to evaluation mode
    total_test_loss = 0
    with torch.no_grad():
        for batch in test_loader:
            user_id, product_id, rating = batch
            predictions = model(user_id, product_id)
            loss = criterion(predictions, rating)
            total_test_loss += loss.item()
    avg_test_loss = total_test_loss / len(test_loader)
    print(f'Test Loss: {avg_test_loss:.4f}')
    return avg_test_loss

# Evaluate the model on the test dataset
evaluate_model(model, test_loader)

Test Loss: 17.9894


17.98935467856271

In [None]:
# Create a test DataLoader
test_dataset = RatingDataset(test_data)
test_loader = DataLoader(test_dataset, batch_size=32)

model.eval()  # Set the model to evaluation mode
test_loss = 0
with torch.no_grad():
    for user, item, rating in test_loader:
        predictions = model(user, item)
        loss = criterion(predictions, rating)
        test_loss += loss.item()

print(f'Test Loss: {test_loss/len(test_loader):.4f}')


Test Loss: 17.9894


We start by stimulating user-item interaction data by stimulating metadata. This allows us to use a train-test split of 80% training and 20% testing. Because we are implementing PyTorch, we convert the keys user_ids, product_ids and ratings into PyTorch tensors.
The model learns embeddings foe users and items to predict ratings through a dot-product similarity.

Then, we define RecommenderModel and are able to calculate the MSE loss and get a learning rate of 0.001.


In [None]:
 # PyTorch recommendation model (same as before)
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split

data = {
    'user_id': np.random.randint(0, 100, size=1000),
    'product_id': np.random.randint(0, 50, size=1000),
    'rating': np.random.randint(1, 6, size=1000)  # Ratings from 1 to 5
}

df = pd.DataFrame(data)

# Assuming metadata contains item_id and product_name columns
#metadata = pd.DataFrame({
    #'product_id': range(df['product_id'].nunique()),  # Simulating item IDs
   # 'product_name': ['Product {}'.format(i) for i in range(df['item_id'].nunique())]  # Simulating product names

#})

# Split the data into training and testing sets
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)


In [None]:
print(metadata.head())

   product_id                                       product_name  \
0  B07JW9H4J1  Wayona Nylon Braided USB to Lightning Fast Cha...   
1  B098NS6PVG  Ambrane Unbreakable 60W / 3A Fast Charging 1.5...   
2  B096MSW6CT  Sounce Fast Phone Charging Cable & Data Sync U...   
3  B08HDJ86NZ  boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...   
4  B08CF3B7N1  Portronics Konnect L 1.2M Fast Charging 3A 8 P...   

                                            category discounted_price  \
0  Computers&Accessories|Accessories&Peripherals|...             ₹399   
1  Computers&Accessories|Accessories&Peripherals|...             ₹199   
2  Computers&Accessories|Accessories&Peripherals|...             ₹199   
3  Computers&Accessories|Accessories&Peripherals|...             ₹329   
4  Computers&Accessories|Accessories&Peripherals|...             ₹154   

  actual_price discount_percentage  rating  rating_count  \
0       ₹1,099                 64%     4.2           NaN   
1         ₹349                 4

In [None]:
# Dataset class for user-item-rating interactions
class RatingDataset(Dataset):
    def __init__(self, data):
        self.user_ids = torch.LongTensor(data['user_id'].values)  # User IDs as LongTensor
        self.product_ids = torch.LongTensor(data['product_id'].values)  # Item IDs as LongTensor
        self.ratings = torch.FloatTensor(data['rating'].values)   # Ratings as FloatTensor

    def __len__(self):
        return len(self.ratings)

    def __getitem__(self, idx):
        # Return user_id, product_id, and rating as tensors
        return self.user_ids[idx], self.product_ids[idx], self.ratings[idx]

# Initialize train and test datasets
train_dataset = RatingDataset(train_data)
test_dataset = RatingDataset(test_data)

# Create DataLoader for batching and shuffling the training data
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# Define the Recommender Model with user and item embeddings
class RecommenderModel(nn.Module):
    def __init__(self, num_users, num_products, embedding_dim=8):
        super(RecommenderModel, self).__init__()
        self.user_embedding = nn.Embedding(num_users, embedding_dim)  # User embeddings
        self.product_embedding = nn.Embedding(num_products, embedding_dim)  # Item embeddings

    def forward(self, user, item):
        # Get embeddings for the given user and item
        user_vec = self.user_embedding(user)
        product_vec = self.product_embedding(product)
        # Return the dot product of user and item embeddings as prediction
        return (user_vec * product_vec).sum(dim=1)

# Initialize the model
num_users = df['user_id'].nunique()  # Number of unique users
num_products = df['product_id'].nunique()  # Number of unique items
model = RecommenderModel(num_users, num_products)


Lastly, we get an output of the top 5 recommendations, based on similarity scores, by using the user ID to print recommended products.

In [None]:
# The recommendation function
def get_recommendations(user_id, model, num_recommendations=5):
    model.eval()  # Set the model to evaluation mode

    # Create a tensor for the user_id and all product_ids
    user_tensor = torch.LongTensor([user_id])
    product_ids = torch.LongTensor(range(num_products))  # Ensure num_products is defined

    # Make predictions for the user across all items
    with torch.no_grad():
        user_vec = model.user_embedding(user_tensor)  # Get user embedding
        product_vecs = model.product_embedding(product_ids)
        scores = (user_vec * product_vecs).sum(dim=1)    # Compute dot product (similarity scores)

    # Get the top N recommended item indices
    _, recommended_indices = torch.topk(scores, num_recommendations)

    # Convert the recommended indices to numpy and get the corresponding product IDs
    recommended_ids = recommended_indices.numpy()

    # Check if recommended_ids are in product_id column of metadata
    recommended_products = metadata[metadata['product_id'].isin(recommended_ids)]['product_name'].tolist()

    return recommended_products

# Get the recommended product names for a specific user (e.g., user 0)
recommendations = get_recommendations(0, model)

# Print the recommended product names
print("Recommended products for user 0:")
if recommendations:
    for product in recommendations:
        print(product)
else:
    print("No recommended products found.")




Recommended products for user 0:
No recommended products found.


In [None]:
# Get the recommended product names for a specific user (e.g., user 0)
recommendations = get_recommendations(0, model)

# Print the recommended product names
print("Recommended products for user 0:")
for product in recommendations:
    print(product)



Recommended products for user 0:


In [None]:
# Assuming metadata contains the actual product names in the 'product_name' column
# and each item has a unique 'product_id'

# The recommendation function
def get_recommendations(user_id, model, num_recommendations=5):
    model.eval()  # Set the model to evaluation mode

    # Create a tensor for the user_id and all product_ids
    user_tensor = torch.LongTensor([user_id])
    product_ids = torch.LongTensor(range(num_products))

    # Make predictions for the user across all items
    with torch.no_grad():
        user_vec = model.user_embedding(user_tensor)  # Get user embedding
        product_vecs = model.product_embedding(product_ids)
        scores = (user_vec * product_vecs).sum(dim=1)    # Compute dot product (similarity scores)

    # Get the top N recommended item indices
    _, recommended_products = torch.topk(scores, num_recommendations)

    # Extract product names dynamically from the metadata DataFrame based on recommended item IDs
    recommended_products = metadata[metadata['product_id'].isin(recommended_products.numpy())]['product_name'].tolist()

    return recommended_products

# Get the recommended product names for a specific user (e.g., user 0)
recommendations = get_recommendations(0, model)

# Print the recommended product names
print("Recommended products for user 0:")
for product in recommendations:
    print(product)



Recommended products for user 0:


In [None]:
pip install scikit-surprise

Collecting scikit-surprise
  Using cached scikit_surprise-1.1.4.tar.gz (154 kB)
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (pyproject.toml) ... [?25l[?25hdone
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.4-cp310-cp310-linux_x86_64.whl size=2357277 sha256=c41989c61e11255cb4b719aae7ee3e27949ba1392a2e42a4704e1d61251357f4
  Stored in directory: /root/.cache/pip/wheels/4b/3f/df/6acbf0a40397d9bf3ff97f582cc22fb9ce66adde75bc71fd54
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.4


# CONCLUSION

By combining collaborative filtering and content-based filtering techniques, the personalization of the shopping experience is achieved through leveraging user interaction data by analyzing user behavior, such as clicks and views, which allowed the model to learn individual preferences. This understanding allows for more tailored recommendations that align with users' past behaviors and interests.
The model also considers key product features, including pricing and categories, which are essential factors influencing purchase decisions. By integrating these attributes into the recommendation process, the system can suggest products that not only resonate with user preferences but also offer competitive pricing. We attempted to reference across users, however, we found difficulty using simulated data.
The model helps mitigate issues associated with new users or items, as it uses both historical interaction data and content attributes. This approach ensures that even when user interaction data is sparse, the model can still provide relevant recommendations based on product characteristics.
Additonally, focusing on reviews and ratings let's the system suggest highly-rated products, thereby increasing consumer trust and satisfaction. Positive reviews can influence users’ purchasing decisions, leading to higher sales conversion rates.

The Amazon recommender system provides a robust framework for personalizing the shopping experience. By utilizing deep learning techniques and a hybrid approach to recommendations, the system can enhance user engagement, increase purchase likelihood, and ultimately contribute to improved consumer satisfaction and sales for Amazon. By creating a more personalized shopping experience, businesses can enhance customer satisfaction and increase the likelihood of purchases, driving higher sales and customer loyalty.