## Final Project Part 2: Product Recommender System Using NLP   

Laine Close  
Marcos Fernandez  
Owen Randolph

In [1]:
# If using Google Colab, save both datasets to Google Drive: they are too large to upload directly to the local '/content' directory
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [15]:
import json
import gzip
import pandas as pd
import nltk

from nltk.corpus import stopwords
import re
import string
from nltk.stem import WordNetLemmatizer
from nltk import word_tokenize

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('omw-1.4')
nltk.download('punkt_tab')
stop_words = set(stopwords.words('english'))

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


In [3]:
# Used for text embedding with BERT
!pip install sentence-transformers
!pip install transformers



In [4]:
# Used for collaborative filtering
!pip install scikit-surprise

Collecting scikit-surprise
  Downloading scikit_surprise-1.1.4.tar.gz (154 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.4/154.4 kB 2.1 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (pyproject.toml) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.4-cp312-cp312-linux_x86_64.whl size=2611201 sha256=b1db0a7a94fef2d4df76073adda263e6504b9fd81b9a9f436678eebf06f2f50c
  Stored in directory: /root/.cache/pip/wheels/75/fa/bc/739bc2cb1fbaab6061854e6cfbb81a0ae52c92a502a7fa454b
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.4


## Part 1. Import Electronics Data

Import the Electronics review file and store it as the df_review dataframe, keeping only records with a non-missing parent_asin value.
The file is gzip-compressed JSON Lines (.jsonl.gz). Because of the file size, only the first 5000 lines are read for now. This limit will be removed prior to the final submission.

In [5]:
# Import 'Electronics' review data
## NOTE: UPDATE PATH TO YOUR SOURCE DATA
#file = '/content/Electronics.jsonl.gz'
file = '/content/drive/MyDrive/Graduate School/Natural Language Processing/Electronics.jsonl.gz'

data = []
with gzip.open(file, 'rt', encoding='utf-8') as fp:
    for i, line in enumerate(fp):
        if i >= 5000:
            break
        record = json.loads(line.strip())
        if record.get('parent_asin'):
            data.append(record)

# Convert to dataframe
df_review = pd.DataFrame(data)
df_review.head()

Unnamed: 0,rating,title,text,images,asin,parent_asin,user_id,timestamp,helpful_vote,verified_purchase
0,3.0,Smells like gasoline! Going back!,First & most offensive: they reek of gasoline ...,[{'small_image_url': 'https://m.media-amazon.c...,B083NRGZMM,B083NRGZMM,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,1658185117948,0,True
1,1.0,Didn’t work at all lenses loose/broken.,These didn’t work. Idk if they were damaged in...,[],B07N69T6TM,B07N69T6TM,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,1592678549731,0,True
2,5.0,Excellent!,I love these. They even come with a carry case...,[],B01G8JO5F2,B01G8JO5F2,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,1523093017534,0,True
3,5.0,Great laptop backpack!,I was searching for a sturdy backpack for scho...,[],B001OC5JKY,B001OC5JKY,AGGZ357AO26RQZVRLGU4D4N52DZQ,1290278495000,18,True
4,5.0,Best Headphones in the Fifties price range!,I've bought these headphones three times becau...,[],B013J7WUGC,B07CJYMRWM,AG2L7H23R5LLKDKLBEF2Q3L2MVDA,1676601581238,0,True
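
The loading step above can also be written as a generator, so the row cap and the parent_asin filter live in one reusable place. The following is a minimal sketch, not the notebook's actual loader: `iter_jsonl_gz`, `demo_path`, and the demo rows are hypothetical names and data introduced for illustration. One difference to be aware of: `itertools.islice` here caps the number of valid records returned, whereas the cell above caps the number of lines read.

```python
import gzip
import itertools
import json
import os
import tempfile


def iter_jsonl_gz(path, required_key=None):
    """Yield one dict per non-empty line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as fp:
        for line in fp:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Skip records whose required key is missing or empty.
            if required_key is not None and not record.get(required_key):
                continue
            yield record


# Build a tiny stand-in file so the sketch runs without the real dataset.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.jsonl.gz")
demo_rows = [
    {"parent_asin": "A1", "rating": 5.0},
    {"parent_asin": "", "rating": 1.0},  # dropped: empty parent_asin
    {"parent_asin": "A2", "rating": 3.0},
    {"parent_asin": "A3", "rating": 4.0},
]
with gzip.open(demo_path, "wt", encoding="utf-8") as fp:
    for row in demo_rows:
        fp.write(json.dumps(row) + "\n")

# Take the first two valid records, then stop reading the file.
sample = list(itertools.islice(iter_jsonl_gz(demo_path, "parent_asin"), 2))
print(sample)
```

Because the generator is lazy, the rest of the file is never decompressed once the cap is reached, which matters for a file of this size.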


Import the Electronics metadata file and store it as the df_meta dataframe, keeping only records with a non-missing parent_asin value.
The file is gzip-compressed JSON Lines (.jsonl.gz). For now, only metadata records whose parent_asin matches an asin in the df_review dataframe are read. This restriction will be removed prior to the final submission.

In [6]:
# Import 'Electronics' metadata
## NOTE: UPDATE PATH TO YOUR SOURCE DATA
#file = '/content/meta_Electronics.jsonl.gz'
file = '/content/drive/MyDrive/Graduate School/Natural Language Processing/meta_Electronics.jsonl.gz'
data = []

# Create a set of valid ASINs from df_review
valid_asins = set(df_review['asin'])

with gzip.open(file, 'rt', encoding='utf-8') as fp:
    for line in fp:
        record = json.loads(line.strip())
        if record.get('parent_asin') in valid_asins and record.get('parent_asin'):
            data.append(record)

# Convert to dataframe
df_meta = pd.DataFrame(data)
df_meta.head()

Unnamed: 0,main_category,title,average_rating,rating_number,features,description,price,images,videos,store,categories,details,parent_asin,bought_together,subtitle,author
0,Amazon Home,"Aproca Hard Storage Travel Case, for AKASO EK7...",4.6,489,[Eco-friendly Material: Made of High-density E...,[],14.99,[{'thumb': 'https://m.media-amazon.com/images/...,[{'title': 'LTGEM EVA Hard Case for AKASO EK70...,Aproca,"[Electronics, Camera & Photo, Bags & Cases, Ca...",{'Package Dimensions': '9.1 x 5.8 x 3.6 inches...,B07ZZ595TG,,,
1,Cell Phones & Accessories,ivencase 3D Penguin Silicone Soft Skin Case Co...,3.4,59,"[New fashion design, Very novel, cute and popu...",[Specifications: [ Install Kit ] 1x screen pro...,,[{'thumb': 'https://m.media-amazon.com/images/...,[],Allstarry,"[Electronics, Headphones, Earbuds & Accessorie...",{'Product Dimensions': '6.3 x 3.15 x 1.97 inch...,B009NPLXGS,,,
2,Home Audio & Theater,Bose Lifestyle V35 Home Theater System (Discon...,3.9,81,[5-speaker surround sound system includes 4 Je...,"[Product Description, The Bose Lifestyle V35 h...",,[{'thumb': 'https://m.media-amazon.com/images/...,[{'title': 'I don't think sound quality can ge...,Bose,"[Electronics, Television & Video, Home Theater...","{'Product Dimensions': '21 x 13 x 19 inches', ...",B003JQLPYC,,,
3,Computers,"Laptop Camera Cover Slide, Anti-spy Webcam Cov...",4.1,43,[[Ultra Thin & Fashion Design ] Super thin des...,[],,[{'thumb': 'https://m.media-amazon.com/images/...,[],Elimoons,"[Electronics, Computers & Accessories, Compute...","{'Brand': 'Elimoons', 'Hardware Platform': 'ta...",B083169BMW,,,
4,Computers,iPad Stand TechMatte Multi-Angle Aluminum Hold...,4.6,1955,[Sleek and stylish two-tone colors beautifully...,[Introducing the TechMatte Multi-Angle Mini Po...,10.99,[{'thumb': 'https://m.media-amazon.com/images/...,"[{'title': 'Simple, well built stand for multi...",TechMatte,"[Electronics, Computers & Accessories, Tablet ...","{'Standing screen display size': '4 Inches', '...",B00HHEAMXC,,,


In [7]:
# Merge reviews and metadata: the review 'asin' is matched against the metadata 'parent_asin'
comb_df = pd.merge(df_review,
                   df_meta,
                   how='inner', # Inner join to only keep records that are in both reviews and meta
                   left_on='asin', # Review key; a review whose asin differs from its parent_asin typically finds no match and is dropped
                   right_on='parent_asin' # Metadata key
)

pd.set_option('display.max_columns', None)
display(comb_df.head())
print(len(comb_df))

Unnamed: 0,rating,title_x,text,images_x,asin,parent_asin_x,user_id,timestamp,helpful_vote,verified_purchase,main_category,title_y,average_rating,rating_number,features,description,price,images_y,videos,store,categories,details,parent_asin_y,bought_together,subtitle,author
0,3.0,Smells like gasoline! Going back!,First & most offensive: they reek of gasoline ...,[{'small_image_url': 'https://m.media-amazon.c...,B083NRGZMM,B083NRGZMM,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,1658185117948,0,True,Camera & Photo,"Binoculars, 12x42 Binoculars for Adults, Binoc...",4.3,134,[],[],,[{'thumb': 'https://m.media-amazon.com/images/...,[{'title': 'Really Good Binoculars with Great ...,Hikkogo,"[Electronics, Camera & Photo, Binoculars & Sco...",{'Package Dimensions': '7.28 x 6.77 x 3.07 inc...,B083NRGZMM,,,
1,1.0,Didn’t work at all lenses loose/broken.,These didn’t work. Idk if they were damaged in...,[],B07N69T6TM,B07N69T6TM,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,1592678549731,0,True,Camera & Photo,"Toys for 4-5 Year Old Boys, Mom&myaboys 8 X 21...",4.1,115,[✔SUPERIOR SAFETY -Soft Rubber Surrounded Eyep...,[],15.99,[{'thumb': 'https://m.media-amazon.com/images/...,"[{'title': ' Binocular for Kids', 'url': 'http...",mom&myaboys,"[Electronics, Camera & Photo, Binoculars & Sco...",{'Package Dimensions': '4.8 x 3.6 x 2.3 inches...,B07N69T6TM,,,
2,5.0,Excellent!,I love these. They even come with a carry case...,[],B01G8JO5F2,B01G8JO5F2,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,1523093017534,0,True,All Electronics,"Senso Bluetooth Headphones, Best Wireless Spor...",4.1,42824,[True HD high Fidelity sound featuring latest ...,[],24.96,[{'thumb': 'https://m.media-amazon.com/images/...,[],Senso,"[Electronics, Headphones, Earbuds & Accessorie...",{'Product Dimensions': '4.9 x 4.7 x 1.3 inches...,B01G8JO5F2,,,
3,5.0,Great laptop backpack!,I was searching for a sturdy backpack for scho...,[],B001OC5JKY,B001OC5JKY,AGGZ357AO26RQZVRLGU4D4N52DZQ,1290278495000,18,True,,"Targus Air Traveler Laptop Backpack, Professio...",4.2,265,[The Targus Zip-Thru Air Traveler Backpack is ...,"[Product Description, The Targus Checkpoint-Fr...",,[{'thumb': 'https://m.media-amazon.com/images/...,"[{'title': 'Durable And Spacious Backpack!', '...",Targus,"[Electronics, Computers & Accessories, Laptop ...",{'Product Dimensions': '17.8 x 14.8 x 3.8 inch...,B001OC5JKY,,,
4,5.0,solid sound for the price,Update 2-they sent a new warranty replacement....,[],B07BHHB5RH,B07BHHB5RH,AGCI7FAH4GL5FI65HYLKWTMFZ2CQ,1565130879386,0,True,Cell Phones & Accessories,"Bluetooth Headphones, Soundcore Spirit Sports ...",4.1,2126,[IPX7 Sweat Guard Technology：Truly sweat proof...,[],,[{'thumb': 'https://m.media-amazon.com/images/...,[{'title': 'HEALTH HAZARD; Random LOUD Buzzing...,Anker,"[Electronics, Headphones, Earbuds & Accessorie...",{'Product Dimensions': '23.6 x 1.3 x 0.47 inch...,B07BHHB5RH,,,


2755
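
The merged frame is smaller than df_review, partly because some reviews carry an asin that differs from their parent_asin (row 4 of the df_review preview is one such case). The sketch below uses toy frames, not the real data, to show how `indicator=True` can count matches under each choice of review key. `reviews`, `meta`, and `matched_rows` are hypothetical names, and the figures it prints say nothing about the real dataset.

```python
import pandas as pd

# Toy stand-ins for df_review and df_meta (illustrative values only).
reviews = pd.DataFrame({
    "asin": ["B1", "B2", "B3"],
    "parent_asin": ["B1", "P2", "B3"],  # B2 is a variant of parent P2
    "rating": [5.0, 3.0, 4.0],
})
meta = pd.DataFrame({
    "parent_asin": ["B1", "P2"],
    "title": ["Item one", "Item two"],
})


def matched_rows(left_key):
    """Count review rows that find a metadata match when joined on left_key."""
    merged = reviews.merge(
        meta,
        how="left",
        left_on=left_key,
        right_on="parent_asin",
        suffixes=("_x", "_y"),
        indicator=True,  # adds a '_merge' column: 'both' or 'left_only'
    )
    return int((merged["_merge"] == "both").sum())


by_asin = matched_rows("asin")
by_parent = matched_rows("parent_asin")
print(f"matched on asin: {by_asin}, matched on parent_asin: {by_parent}")
```

Running the same comparison on the real frames would show how many reviews each key retains before committing to one.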


## Part 2. Preprocessing  
Provide all essential steps that you deem necessary for your application


In [8]:
# Only keep columns required for recommender system
keep_columns = ['rating','text','user_id','title_y','categories','parent_asin_y','description']

df_preprocess = comb_df[keep_columns]
df_preprocess.head()

Unnamed: 0,rating,text,user_id,title_y,categories,parent_asin_y,description
0,3.0,First & most offensive: they reek of gasoline ...,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,"Binoculars, 12x42 Binoculars for Adults, Binoc...","[Electronics, Camera & Photo, Binoculars & Sco...",B083NRGZMM,[]
1,1.0,These didn’t work. Idk if they were damaged in...,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,"Toys for 4-5 Year Old Boys, Mom&myaboys 8 X 21...","[Electronics, Camera & Photo, Binoculars & Sco...",B07N69T6TM,[]
2,5.0,I love these. They even come with a carry case...,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,"Senso Bluetooth Headphones, Best Wireless Spor...","[Electronics, Headphones, Earbuds & Accessorie...",B01G8JO5F2,[]
3,5.0,I was searching for a sturdy backpack for scho...,AGGZ357AO26RQZVRLGU4D4N52DZQ,"Targus Air Traveler Laptop Backpack, Professio...","[Electronics, Computers & Accessories, Laptop ...",B001OC5JKY,"[Product Description, The Targus Checkpoint-Fr..."
4,5.0,Update 2-they sent a new warranty replacement....,AGCI7FAH4GL5FI65HYLKWTMFZ2CQ,"Bluetooth Headphones, Soundcore Spirit Sports ...","[Electronics, Headphones, Earbuds & Accessorie...",B07BHHB5RH,[]


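Remove apostrophes and leftover HTML "nbsp" strings, tokenize the text with the BERT tokenizer, then drop English stop words, punctuation, and tokens of two characters or fewer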

In [12]:
# Clean text. Note that this was pulled and modified from the Week 8 Recommender Systems Demo
# Load BERT tokenizer
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def clean_bert(text):
    text = re.sub("'", "", text) # Remove apostrophes
    text = text.replace("nbsp", "") # Remove leftover 'nbsp' from HTML non-breaking spaces

    tokens = tokenizer.tokenize(text) # Tokenize using Bert

    filtered_tokens = [
        token for token in tokens
        if token not in stop_words and token not in string.punctuation and len(token) > 2 # Remove stop words and short tokens
    ]

    return " ".join(filtered_tokens)

In [13]:
# Combine text columns into a single column for data preprocessing
df_preprocess = df_preprocess.copy()

df_preprocess.loc[:, "prod_desc"] = df_preprocess["title_y"].astype(str) +" "+ df_preprocess["categories"].astype(str) +" "+ df_preprocess["description"].astype(str)
df_preprocess['categories'] = df_preprocess['categories'].astype(str)
df_preprocess.head()

Unnamed: 0,rating,text,user_id,title_y,categories,parent_asin_y,description,prod_desc
0,3.0,First & most offensive: they reek of gasoline ...,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,"Binoculars, 12x42 Binoculars for Adults, Binoc...","['Electronics', 'Camera & Photo', 'Binoculars ...",B083NRGZMM,[],"Binoculars, 12x42 Binoculars for Adults, Binoc..."
1,1.0,These didn’t work. Idk if they were damaged in...,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,"Toys for 4-5 Year Old Boys, Mom&myaboys 8 X 21...","['Electronics', 'Camera & Photo', 'Binoculars ...",B07N69T6TM,[],"Toys for 4-5 Year Old Boys, Mom&myaboys 8 X 21..."
2,5.0,I love these. They even come with a carry case...,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,"Senso Bluetooth Headphones, Best Wireless Spor...","['Electronics', 'Headphones, Earbuds & Accesso...",B01G8JO5F2,[],"Senso Bluetooth Headphones, Best Wireless Spor..."
3,5.0,I was searching for a sturdy backpack for scho...,AGGZ357AO26RQZVRLGU4D4N52DZQ,"Targus Air Traveler Laptop Backpack, Professio...","['Electronics', 'Computers & Accessories', 'La...",B001OC5JKY,"[Product Description, The Targus Checkpoint-Fr...","Targus Air Traveler Laptop Backpack, Professio..."
4,5.0,Update 2-they sent a new warranty replacement....,AGCI7FAH4GL5FI65HYLKWTMFZ2CQ,"Bluetooth Headphones, Soundcore Spirit Sports ...","['Electronics', 'Headphones, Earbuds & Accesso...",B07BHHB5RH,[],"Bluetooth Headphones, Soundcore Spirit Sports ..."


In [18]:
# Apply clean bert function to columns
df_preprocess['prod_desc'] = df_preprocess['prod_desc'].apply(clean_bert)
df_preprocess['text'] = df_preprocess['text'].apply(clean_bert)
df_preprocess['title_y'] = df_preprocess['title_y'].apply(clean_bert)
df_preprocess['categories'] = df_preprocess['categories'].apply(clean_bert)
df_preprocess.head()

Unnamed: 0,rating,text,user_id,title_y,categories,parent_asin_y,description,prod_desc
0,3.0,first offensive gasoline sensitive allergic pe...,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,binoculars binoculars adults binoculars huntin...,electronics camera photo binoculars scope bino...,B083NRGZMM,[],binoculars binoculars adults binoculars huntin...
1,1.0,work damaged shipping lenses loose something c...,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,toys year old boys mom ##ys kids binoculars ch...,electronics camera photo binoculars scope bino...,B07N69T6TM,[],toys year old boys mom ##ys kids binoculars ch...
2,5.0,love even come carry case several sizes ear bu...,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,sen blue tooth head phones best wireless sport...,electronics head phones ear accessories head p...,B01G8JO5F2,[],sen blue tooth head phones best wireless sport...
3,5.0,searching sturdy backpack school would allow c...,AGGZ357AO26RQZVRLGU4D4N52DZQ,tar gus air traveler laptop backpack professio...,electronics computers accessories laptop acces...,B001OC5JKY,"[Product Description, The Targus Checkpoint-Fr...",tar gus air traveler laptop backpack professio...
4,5.0,update sent new warrant replacement good compa...,AGCI7FAH4GL5FI65HYLKWTMFZ2CQ,blue tooth head phones sound core spirit sport...,electronics head phones ear accessories head p...,B07BHHB5RH,[],blue tooth head phones sound core spirit sport...
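
The cleaned columns above still contain WordPiece continuation pieces such as `##ys` and `##s`, because the tokenizer splits out-of-vocabulary words and the filter then treats each piece as its own token. One possible adjustment, shown here as a sketch rather than as what the notebook does, is to glue continuation pieces back onto the preceding token before filtering. `merge_wordpieces`, the example tokens, and the small stop-word set are all hypothetical.

```python
def merge_wordpieces(tokens):
    """Join WordPiece continuation tokens ('##xyz') onto the preceding token."""
    words = []
    for token in tokens:
        if token.startswith("##"):
            piece = token[2:]
            if words:
                words[-1] += piece   # extend the previous word
            else:
                words.append(piece)  # no previous word: keep the bare piece
        else:
            words.append(token)
    return words


# Hypothetical token sequence, not actual tokenizer output.
example_tokens = ["adapt", "##er", "for", "head", "##phones"]
demo_stop_words = {"for", "the", "and"}  # stand-in for the NLTK stop-word set

merged = merge_wordpieces(example_tokens)
cleaned = [w for w in merged if w not in demo_stop_words and len(w) > 2]
print(" ".join(cleaned))
```

Merging first means the length and stop-word checks apply to whole words rather than to fragments.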


## Feature Extraction  
Use BERT for content-based filtering  
-Recommends products similar to those a user liked, based on the review text and the product description.  

Collaborative filtering  
-Recommends products based on what similar users liked

In [48]:
from sentence_transformers import SentenceTransformer
bert_model = SentenceTransformer('all-MiniLM-L6-v2')

item_embeddings = bert_model.encode(df_preprocess['prod_desc'].tolist(), show_progress_bar=True)

Batches:   0%|          | 0/87 [00:00<?, ?it/s]
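
For the content-based side, one common way to use sentence embeddings is to rank items by cosine similarity to a query item. The sketch below uses small hand-written vectors in place of `item_embeddings`, and `top_k_similar` is a hypothetical helper. Note that `item_embeddings` as computed above has one row per review, so in practice the rows would likely need deduplicating by product first, otherwise copies of the same product rank at the top.

```python
import numpy as np


def top_k_similar(embeddings, query_index, k=3):
    """Return (index, cosine similarity) for the k rows most similar to the query row."""
    emb = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sims = unit @ unit[query_index]           # cosine similarity to the query
    sims[query_index] = -np.inf               # never return the query itself
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]


# Toy 2-D vectors standing in for real sentence embeddings.
toy_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])
neighbours = top_k_similar(toy_embeddings, query_index=0, k=2)
print(neighbours)
```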

In [82]:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split

# Encode the user_id and item_id
df_preprocess['user_idx'] = df_preprocess['user_id'].astype('category').cat.codes

# Store the categorical series of prod_desc before getting codes
item_categorical = df_preprocess['prod_desc'].astype('category')
df_preprocess['item_idx'] = item_categorical.cat.codes

# Get the number of unique users and items
num_users = df_preprocess['user_idx'].nunique()
num_items = df_preprocess['item_idx'].nunique()

# Prepare training data
X = df_preprocess[['user_idx', 'item_idx']].values
y = df_preprocess['rating'].values

# Generate train and test sets and split the data at 0.2 for test size
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a PyTorch class representing a dataset. This follows the PyTorch documentation in reference [6]; see the 'class torch.utils.data.Dataset' section
class CF_Dataset(torch.utils.data.Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __getitem__(self, index):
        user, item = self.data[index]
        return torch.tensor(user, dtype=torch.long), torch.tensor(item, dtype=torch.long), torch.tensor(self.labels[index], dtype=torch.float32)

    def __len__(self):
        return len(self.data)

train_dataset = CF_Dataset(X_train, y_train)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=256, shuffle=True)

# Collaborative Filtering Model
## Note: Collaborative filtering code template was pulled from reference [5] and modified for our needs
class CF_Model(nn.Module):
    def __init__(self, num_users, num_items, latent_dim=32):
        super(CF_Model, self).__init__()
        self.user_embedding = nn.Embedding(num_users, latent_dim)
        self.item_embedding = nn.Embedding(num_items, latent_dim)
        self.fc = nn.Sequential(
            nn.Linear(latent_dim * 2, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )

    def forward(self, user, item):
        user_vec = self.user_embedding(user)
        item_vec = self.item_embedding(item)
        x = torch.cat([user_vec, item_vec], dim=-1)
        return self.fc(x).squeeze(-1)  # Drop only the trailing size-1 dimension so a batch of one keeps its batch axis

# Initialize model
model = CF_Model(num_users, num_items)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop - set the number of epochs to 50 for now
for epoch in range(50):
    model.train()
    total_loss = 0
    for user, item, rating in train_loader:
        optimizer.zero_grad()
        output = model(user, item)
        loss = criterion(output, rating)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}, Loss: {total_loss:.4f}")

Epoch 1, Loss: 178.6185
Epoch 2, Loss: 158.2309
Epoch 3, Loss: 139.9699
Epoch 4, Loss: 122.3572
Epoch 5, Loss: 105.1129
Epoch 6, Loss: 89.2037
Epoch 7, Loss: 73.9113
Epoch 8, Loss: 59.9802
Epoch 9, Loss: 48.0482
Epoch 10, Loss: 38.7972
Epoch 11, Loss: 32.1387
Epoch 12, Loss: 27.4119
Epoch 13, Loss: 24.4356
Epoch 14, Loss: 21.7182
Epoch 15, Loss: 19.8577
Epoch 16, Loss: 18.1948
Epoch 17, Loss: 16.9713
Epoch 18, Loss: 15.9031
Epoch 19, Loss: 14.9605
Epoch 20, Loss: 14.1602
Epoch 21, Loss: 13.4507
Epoch 22, Loss: 12.7492
Epoch 23, Loss: 12.2778
Epoch 24, Loss: 11.8140
Epoch 25, Loss: 11.2493
Epoch 26, Loss: 10.8885
Epoch 27, Loss: 10.5506
Epoch 28, Loss: 10.1746
Epoch 29, Loss: 9.9601
Epoch 30, Loss: 9.5413
Epoch 31, Loss: 9.2512
Epoch 32, Loss: 9.0706
Epoch 33, Loss: 8.8393
Epoch 34, Loss: 8.6216
Epoch 35, Loss: 8.3629
Epoch 36, Loss: 8.0716
Epoch 37, Loss: 8.0117
Epoch 38, Loss: 7.6875
Epoch 39, Loss: 7.5203
Epoch 40, Loss: 7.3773
Epoch 41, Loss: 7.1930
Epoch 42, Loss: 7.0288
Epoch 43, 

In [77]:
#Measure how well the model performs
from sklearn.metrics import mean_squared_error
import numpy as np

# Get predictions on the test set
model.eval() # Set the model to evaluation mode
with torch.no_grad(): # Disable gradient calculation
    user_test = torch.tensor(X_test[:, 0], dtype=torch.long)
    item_test = torch.tensor(X_test[:, 1], dtype=torch.long)
    model_predictions = model(user_test, item_test).numpy()

# Calculate RMSE manually
rmse = np.sqrt(mean_squared_error(y_test, model_predictions))
print(f"RMSE on test set: {rmse:.4f}")

RMSE on test set: 1.2815
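
An RMSE value is easier to interpret next to a trivial baseline, such as always predicting the mean training rating. The sketch below works through the formula RMSE = sqrt(mean((y - ŷ)²)) on made-up ratings. `rmse`, `toy_train`, and `toy_test` are hypothetical, and the numbers are not results from this notebook.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up ratings used only to illustrate the calculation.
toy_train = np.array([5.0, 4.0, 5.0, 1.0, 5.0])  # mean is 4.0
toy_test = np.array([5.0, 3.0, 4.0])

# Baseline: predict the mean training rating for every test row.
baseline_pred = np.full_like(toy_test, toy_train.mean())
baseline_rmse = rmse(toy_test, baseline_pred)
print(f"Baseline RMSE: {baseline_rmse:.4f}")
```

Applying the same baseline to `y_train` and `y_test` would indicate whether the model's test RMSE improves on a constant prediction.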


## Main Functionality  
This is your main task. For example, if you are creating a translation app, your main task is Translation.

In [78]:
# Map user_id to user_idx
user_id_to_idx = dict(zip(df_preprocess['user_id'], df_preprocess['user_idx']))

# Map item_idx to title_y for display
item_titles = df_preprocess.drop_duplicates('item_idx')[['item_idx', 'title_y']]
idx_to_title = dict(zip(item_titles['item_idx'], item_titles['title_y']))

# Build user-item history to avoid recommending already rated items
user_rated_items = df_preprocess.groupby('user_idx')['item_idx'].apply(set).to_dict()

In [83]:
# Build a function that recommends the top N items for a specified user
def recommend_top_n_cf(user_id, model, N=10):
    model.eval()
    user_idx = user_id_to_idx.get(user_id)
    if user_idx is None: # If the entered user_id is not found, print 'User ID ... not found.' and return
        print(f"User ID {user_id} not found.")
        return

    rated_items = user_rated_items.get(user_idx, set())
    scores = []

    for item_idx in range(num_items):
        if item_idx in rated_items:
            continue  # Skip items already rated

        with torch.no_grad():
            score = model(torch.tensor(user_idx), torch.tensor(item_idx)).item()
        scores.append((item_idx, score))

    top_items = sorted(scores, key=lambda x: x[1], reverse=True)[:N]

    # Print the top N items, with a line break after each item for better visibility.
    print(f"\nTop {N} recommended products for user id: {user_id}:\n")
    for item_idx, score in top_items:
        print(f"{idx_to_title.get(item_idx, 'Unknown Item')}\n")  # Line break after each title
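
The selection step in the function above (skip rated items, sort by score, keep the top N) can also be done on a whole score array at once. This is a sketch under the assumption that the scores for every item are already available as a one-dimensional array, for example from a single batched forward pass like the one used for the test-set evaluation. `top_n_unrated` and the toy scores are hypothetical.

```python
import numpy as np


def top_n_unrated(scores, rated_items, n=10):
    """Return (item index, score) for the n best-scoring items not yet rated."""
    scores = np.asarray(scores, dtype=float).copy()
    if rated_items:
        scores[list(rated_items)] = -np.inf    # mask items the user already rated
    n = min(n, int(np.isfinite(scores).sum())) # cannot return more than remain
    order = np.argsort(-scores)[:n]            # highest score first
    return [(int(i), float(scores[i])) for i in order]


# Toy predicted ratings for five items; item 2 was already rated by the user.
toy_scores = [4.2, 3.1, 4.9, 2.0, 4.5]
picks = top_n_unrated(toy_scores, rated_items={2}, n=2)
print(picks)
```

Masking with negative infinity keeps the item indices aligned with the original score array, so the title lookup by index still works.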

In [84]:
# Generate the top 5 collaborative filtering recommendations for an example user
recommend_top_n_cf('AFKZENTNBQ7A7V7UXW5JJI6UGRYQ', model, N=5)


Top 5 recommended products for user id: AFKZENTNBQ7A7V7UXW5JJI6UGRYQ:

net universal ethernet adapt ##e 200

lal aka compatible air pods cute case cartoon character silicon animal air pod designer skin wai funny fun cool ring design cover kids teens air pods pro cases girls boys ear blue

vol ##z equilibrium micro usb cable pack fast charging nylon braid tangle free android samsung nokia sony etc

100 wire gel filled bean type connector

kind leather cover steel blue updated design fits kind keyboard



In [85]:
# Enter a user_id to list the products that user has already rated, for comparison with the recommendations
user_id = 'AFKZENTNBQ7A7V7UXW5JJI6UGRYQ'

# Filter rows for this user
user_purchases = df_preprocess[df_preprocess['user_id'] == user_id]

# Print each purchased item's title_y with a line break
print(f"\nItems purchased by user {user_id}:\n")
for title in user_purchases['title_y']:
    print(f"{title}\n")


Items purchased by user AFKZENTNBQ7A7V7UXW5JJI6UGRYQ:

binoculars binoculars adults binoculars hunting compact binoculars trip ##d smartphone adapt hunting bird watching hiking traveling sports

toys year old boys mom ##ys kids binoculars children compact telescope boys gifts years old bird watching scenery yellow

sen blue tooth head phones best wireless sports ear mic water proof stereo sweat proof ear phones gym running workout noise cancel ling ear phones ear noise cancel ling heads ##s



## Personal Contribution Statement  
Summary of tasks and team members' contributions  
Proofreading

## References  
[1] https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023   
[2] https://www.codegenes.net/blog/collaborative-filtering-pytorch/  
[3] https://www.geeksforgeeks.org/machine-learning/build-a-recommendation-engine-with-collaborative-filtering/  
[4] https://pytorch.org/blog/introducing-torchrec/  
[5] https://www.slingacademy.com/article/combining-content-based-and-collaborative-approaches-in-pytorch-recommenders/  
[6] https://docs.pytorch.org/docs/stable/data.html   
