# **Welcome to the notebook**

### Task 1 - Set up project environment

Installing the needed modules

In [2]:
!pip install openai==1.16.2 python-dotenv

Collecting openai==1.16.2
  Downloading openai-1.16.2-py3-none-any.whl.metadata (21 kB)
Collecting python-dotenv
  Downloading python_dotenv-1.1.1-py3-none-any.whl.metadata (24 kB)
Downloading openai-1.16.2-py3-none-any.whl (267 kB)
Downloading python_dotenv-1.1.1-py3-none-any.whl (20 kB)
Installing collected packages: python-dotenv, openai
  Attempting uninstall: openai
    Found existing installation: openai 1.97.0
    Uninstalling openai-1.97.0:
      Successfully uninstalled openai-1.97.0
Successfully installed openai-1.16.2 python-dotenv-1.1.1


In [3]:
!pip install httpx==0.27.2

Collecting httpx==0.27.2
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Downloading httpx-0.27.2-py3-none-any.whl (76 kB)
Installing collected packages: httpx
  Attempting uninstall: httpx
    Found existing installation: httpx 0.28.1
    Uninstalling httpx-0.28.1:
      Successfully uninstalled httpx-0.28.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-genai 1.26.0 requires httpx<1.0.0,>=0.28.1, but you have httpx 0.27.2 which is incompatible.
firebase-admin 6.9.0 requires httpx[http2]==0.28.1, but you have httpx 0.27.2 which is incompatible.
Successfully installed httpx-0.27.2


Importing the needed modules and setting up the OpenAI API client

In [4]:
import pandas as pd
import numpy as np
import os
from openai import OpenAI
from dotenv import load_dotenv
from matplotlib import pyplot as plt
import plotly.express as px

from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

# Loading the API key from a dotenv file
load_dotenv(dotenv_path='apikey.env.txt')

# Retrieving the API key from the environment variables
# (do not overwrite it with a hard-coded value afterwards)
APIKEY = os.getenv("APIKEY")
# ORGID = os.getenv("ORGID")

# Creating an instance of the OpenAI client with the API key (the organization ID is optional)
client = OpenAI(
  # organization=ORGID,
  api_key=APIKEY
)

client

<openai.OpenAI at 0x78a6d4fc77d0>
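
The client is constructed without checking that the key was actually loaded, so a missing or misnamed dotenv file would only surface later as an authentication error. A minimal sketch of an early check, assuming the variable is named `APIKEY` as above; `require_env` is a hypothetical helper, not part of `python-dotenv` or `openai`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check the dotenv file path."
        )
    return value
```

After `load_dotenv(...)`, the key could then be read with `APIKEY = require_env("APIKEY")` so that a configuration problem fails immediately.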

Import our dataset

In [5]:
df = pd.read_csv('products_dataset.csv')
df

Unnamed: 0,product_id,title,description
0,P0,Men's 3X Large Carbon Heather Cotton/Polyester...,"This heavyweight, water-repellent hooded sweat..."
1,P1,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,If you need more length between your existing ...
2,P2,Large Tapestry Bolster Bed,Polyester cover resembling rich Italian tapest...
3,P3,16-Gauge-Sinks Vessel Sink in White with Faucet,It features a rectangle shape. This vessel set...
4,P4,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,This 9 in. black full grain leather logger boo...
...,...,...,...
1995,P1995,Dotty Black and White Black and White Wallpape...,"With a stylish monochrome look, this dotty wal..."
1996,P1996,Abrielle Brown/Light Gray 8 ft. x 10 ft. Orien...,The Abrielle collection features a stunning as...
1997,P1997,20 in. x 2-1/2 in. x 2-1/2 in. Polyurethane As...,"With Fypon balustrade systems, you can transfo..."
1998,P1998,1 gal. #P120-6 Diva Glam Flat Exterior Paint &...,BEHR PREMIUM PLUS Exterior Paint & Primer is a...
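
The `Unnamed: 0` column in this output is typically what pandas produces when a CSV was written with its index and then read back as a regular column. A small sketch of how `index_col=0` would avoid the extra column, using an in-memory stand-in for `products_dataset.csv` (the two rows are made-up placeholders):

```python
import io

import pandas as pd

# Stand-in for products_dataset.csv: the header starts with an empty name,
# which is how a saved index column usually appears in the file.
csv_text = ",product_id,title,description\n0,P0,Title A,Desc A\n1,P1,Title B,Desc B\n"

# Default read: the unnamed first column becomes "Unnamed: 0".
with_extra = pd.read_csv(io.StringIO(csv_text))

# Treating the first column as the index removes the extra column.
without_extra = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(without_extra.columns))
```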


The list of the last 8 products recently viewed by the user:

In [6]:
searched_products_id = [
    'P1938',
    'P1970',
    'P1044',
    'P1838',
    'P1048',
    'P1017',
    'P1310',
    'P1444',
]

### Task 2 - Prepare the dataset

Let's label the data points that were recently viewed.

In [7]:
df['product_status'] = 'not_viewed'
df.loc[df.product_id.isin(searched_products_id),'product_status'] = 'recently_viewed'
df[df.product_status=='recently_viewed']

Unnamed: 0,product_id,title,description,product_status
1017,P1017,1 qt. #660D-7 Blackberry Farm Satin Enamel Int...,Love your space like never before with the hig...,recently_viewed
1044,P1044,1 qt. #M360-4 Marjoram One-Coat Hide Eggshell ...,Introducing the best of BEHR Paint. Featuring ...,recently_viewed
1048,P1048,5 gal. #640C-1 Hosta Flower Extra Durable Sati...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently_viewed
1310,P1310,5 gal. #180A-2 Romantic Morn Extra Durable Sem...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently_viewed
1444,P1444,5 gal. #PPU12-17 Cameroon Green Extra Durable ...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently_viewed
1838,P1838,5 gal. #N340-2 Dune Grass Extra Durable Satin ...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently_viewed
1938,P1938,1 gal. #HDC-SP16-10 Japanese Rose Garden Semi-...,Introducing the best of BEHR Paint. Featuring ...,recently_viewed
1970,P1970,8 oz. #510C-3 Rivers Edge Semi-Gloss Enamel St...,Introducing the best of BEHR Paint. Featuring ...,recently_viewed


Now let's combine the product `title` and `description` and store the result in a column called `combined`.

In [8]:
df['combined'] = df.title + ' ' + df.description
df

Unnamed: 0,product_id,title,description,product_status,combined
0,P0,Men's 3X Large Carbon Heather Cotton/Polyester...,"This heavyweight, water-repellent hooded sweat...",not_viewed,Men's 3X Large Carbon Heather Cotton/Polyester...
1,P1,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,If you need more length between your existing ...,not_viewed,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...
2,P2,Large Tapestry Bolster Bed,Polyester cover resembling rich Italian tapest...,not_viewed,Large Tapestry Bolster Bed Polyester cover res...
3,P3,16-Gauge-Sinks Vessel Sink in White with Faucet,It features a rectangle shape. This vessel set...,not_viewed,16-Gauge-Sinks Vessel Sink in White with Fauce...
4,P4,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,This 9 in. black full grain leather logger boo...,not_viewed,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...
...,...,...,...,...,...
1995,P1995,Dotty Black and White Black and White Wallpape...,"With a stylish monochrome look, this dotty wal...",not_viewed,Dotty Black and White Black and White Wallpape...
1996,P1996,Abrielle Brown/Light Gray 8 ft. x 10 ft. Orien...,The Abrielle collection features a stunning as...,not_viewed,Abrielle Brown/Light Gray 8 ft. x 10 ft. Orien...
1997,P1997,20 in. x 2-1/2 in. x 2-1/2 in. Polyurethane As...,"With Fypon balustrade systems, you can transfo...",not_viewed,20 in. x 2-1/2 in. x 2-1/2 in. Polyurethane As...
1998,P1998,1 gal. #P120-6 Diva Glam Flat Exterior Paint &...,BEHR PREMIUM PLUS Exterior Paint & Primer is a...,not_viewed,1 gal. #P120-6 Diva Glam Flat Exterior Paint &...
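
One caveat with string concatenation in pandas: if either `title` or `description` is missing for a row, the result is `NaN` rather than a string, which would not be a valid input for an embedding request. Whether this dataset contains missing values is not shown here, so the following is only a defensive sketch on made-up rows:

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "title": ["Paint A", "Paint B"],
        "description": ["Satin enamel", None],  # second description is missing
    }
)

# Plain concatenation propagates the missing value.
naive = toy.title + " " + toy.description

# Replacing missing text with an empty string keeps every row a string.
safe = (toy.title.fillna("") + " " + toy.description.fillna("")).str.strip()

print(naive.isna().tolist())
print(safe.tolist())
```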


### Task 3 - Text embedding and visualization


Creating the text embedding vectors

In [9]:
response = client.embeddings.create(
    input = df.combined.to_list(),
    model = 'text-embedding-3-small',
    dimensions = 512
)
vectors = [d.embedding for d in response.data]
df['text_embeddings']= vectors
df

Unnamed: 0,product_id,title,description,product_status,combined,text_embeddings
0,P0,Men's 3X Large Carbon Heather Cotton/Polyester...,"This heavyweight, water-repellent hooded sweat...",not_viewed,Men's 3X Large Carbon Heather Cotton/Polyester...,"[0.042664770036935806, 0.02092622220516205, -0..."
1,P1,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,If you need more length between your existing ...,not_viewed,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,"[0.04413316026329994, 0.009090634062886238, 0...."
2,P2,Large Tapestry Bolster Bed,Polyester cover resembling rich Italian tapest...,not_viewed,Large Tapestry Bolster Bed Polyester cover res...,"[0.042361605912446976, -0.06515178084373474, 0..."
3,P3,16-Gauge-Sinks Vessel Sink in White with Faucet,It features a rectangle shape. This vessel set...,not_viewed,16-Gauge-Sinks Vessel Sink in White with Fauce...,"[-0.049733716994524, -0.011679209768772125, 0...."
4,P4,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,This 9 in. black full grain leather logger boo...,not_viewed,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,"[0.026085881516337395, 0.048493191599845886, -..."
...,...,...,...,...,...,...
1995,P1995,Dotty Black and White Black and White Wallpape...,"With a stylish monochrome look, this dotty wal...",not_viewed,Dotty Black and White Black and White Wallpape...,"[0.09004916995763779, -0.055176541209220886, -..."
1996,P1996,Abrielle Brown/Light Gray 8 ft. x 10 ft. Orien...,The Abrielle collection features a stunning as...,not_viewed,Abrielle Brown/Light Gray 8 ft. x 10 ft. Orien...,"[0.013627329841256142, -0.04656423255801201, 0..."
1997,P1997,20 in. x 2-1/2 in. x 2-1/2 in. Polyurethane As...,"With Fypon balustrade systems, you can transfo...",not_viewed,20 in. x 2-1/2 in. x 2-1/2 in. Polyurethane As...,"[-0.02963854931294918, -0.009340153075754642, ..."
1998,P1998,1 gal. #P120-6 Diva Glam Flat Exterior Paint &...,BEHR PREMIUM PLUS Exterior Paint & Primer is a...,not_viewed,1 gal. #P120-6 Diva Glam Flat Exterior Paint &...,"[-0.01301631424576044, -0.009222120977938175, ..."
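
The call above sends all product texts in a single request. Embedding endpoints generally cap how much one request may carry, so for larger datasets the inputs would have to be split into batches. A minimal sketch of that pattern follows; `embed_in_batches` and its `batch_size` default are illustrative assumptions, and `embed_fn` stands for any callable that maps a list of texts to a list of vectors:

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 500,  # assumed value, tune to the provider's limits
) -> List[List[float]]:
    """Embed `texts` in consecutive batches, preserving input order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        batch_vectors = embed_fn(batch)
        # Guard against a backend that silently drops or adds items.
        if len(batch_vectors) != len(batch):
            raise ValueError("embed_fn returned a different number of vectors than inputs")
        vectors.extend(batch_vectors)
    return vectors
```

With the client from this notebook, `embed_fn` could wrap the same call used above, for example a function that returns `[d.embedding for d in client.embeddings.create(input=batch, model='text-embedding-3-small', dimensions=512).data]`.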


> We know that each vector has 512 dimensions. To visualize the vectors in a scatter plot, we use Principal Component Analysis (PCA) to reduce the dimensionality from 512 to 2.

In [10]:
pca = PCA(2)
vector_2d = pca.fit_transform(df.text_embeddings.to_list())
df['pc1'] = vector_2d[:,0]
df['pc2'] = vector_2d[:,1]
df

Unnamed: 0,product_id,title,description,product_status,combined,text_embeddings,pc1,pc2
0,P0,Men's 3X Large Carbon Heather Cotton/Polyester...,"This heavyweight, water-repellent hooded sweat...",not_viewed,Men's 3X Large Carbon Heather Cotton/Polyester...,"[0.042664770036935806, 0.02092622220516205, -0...",-0.000228,0.066309
1,P1,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,If you need more length between your existing ...,not_viewed,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,"[0.04413316026329994, 0.009090634062886238, 0....",-0.363652,0.235434
2,P2,Large Tapestry Bolster Bed,Polyester cover resembling rich Italian tapest...,not_viewed,Large Tapestry Bolster Bed Polyester cover res...,"[0.042361605912446976, -0.06515178084373474, 0...",-0.209285,-0.214405
3,P3,16-Gauge-Sinks Vessel Sink in White with Faucet,It features a rectangle shape. This vessel set...,not_viewed,16-Gauge-Sinks Vessel Sink in White with Fauce...,"[-0.049733716994524, -0.011679209768772125, 0....",-0.179814,0.039086
4,P4,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,This 9 in. black full grain leather logger boo...,not_viewed,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,"[0.026085881516337395, 0.048493191599845886, -...",-0.212957,0.143710
...,...,...,...,...,...,...,...,...
1995,P1995,Dotty Black and White Black and White Wallpape...,"With a stylish monochrome look, this dotty wal...",not_viewed,Dotty Black and White Black and White Wallpape...,"[0.09004916995763779, -0.055176541209220886, -...",-0.038057,-0.195877
1996,P1996,Abrielle Brown/Light Gray 8 ft. x 10 ft. Orien...,The Abrielle collection features a stunning as...,not_viewed,Abrielle Brown/Light Gray 8 ft. x 10 ft. Orien...,"[0.013627329841256142, -0.04656423255801201, 0...",-0.245514,-0.483502
1997,P1997,20 in. x 2-1/2 in. x 2-1/2 in. Polyurethane As...,"With Fypon balustrade systems, you can transfo...",not_viewed,20 in. x 2-1/2 in. x 2-1/2 in. Polyurethane As...,"[-0.02963854931294918, -0.009340153075754642, ...",-0.081740,0.105612
1998,P1998,1 gal. #P120-6 Diva Glam Flat Exterior Paint &...,BEHR PREMIUM PLUS Exterior Paint & Primer is a...,not_viewed,1 gal. #P120-6 Diva Glam Flat Exterior Paint &...,"[-0.01301631424576044, -0.009222120977938175, ...",0.502578,0.003993
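
A 2D projection necessarily discards much of the information in a 512-dimensional vector, so it is worth checking how much variance the two components retain before reading too much into the plot. In scikit-learn this is exposed as `pca.explained_variance_ratio_` on the fitted object. The NumPy sketch below derives the same quantity from a singular value decomposition, on random stand-in data rather than the real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))  # stand-in for the (n_products, 512) embedding matrix

# PCA works on mean-centered data.
X_centered = X - X.mean(axis=0)

# Squared singular values are proportional to the variance along each principal axis.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)

# Projection onto the first two principal axes (matches PCA(2) up to sign).
projected = X_centered @ Vt[:2].T

print(projected.shape)
print(explained_variance_ratio[:2].sum())
```

A low combined ratio for the first two components would suggest that points appearing close together in the scatter plot are not necessarily close in the original embedding space.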


Now that we have the text embedding vectors in two dimensions, we can use them to create a 2D plot.

In [11]:
px.scatter(df, x='pc1', y='pc2', color= 'product_status')

### Task 4 - Find similar products

In [12]:
df.head()

Unnamed: 0,product_id,title,description,product_status,combined,text_embeddings,pc1,pc2
0,P0,Men's 3X Large Carbon Heather Cotton/Polyester...,"This heavyweight, water-repellent hooded sweat...",not_viewed,Men's 3X Large Carbon Heather Cotton/Polyester...,"[0.042664770036935806, 0.02092622220516205, -0...",-0.000228,0.066309
1,P1,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,If you need more length between your existing ...,not_viewed,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,"[0.04413316026329994, 0.009090634062886238, 0....",-0.363652,0.235434
2,P2,Large Tapestry Bolster Bed,Polyester cover resembling rich Italian tapest...,not_viewed,Large Tapestry Bolster Bed Polyester cover res...,"[0.042361605912446976, -0.06515178084373474, 0...",-0.209285,-0.214405
3,P3,16-Gauge-Sinks Vessel Sink in White with Faucet,It features a rectangle shape. This vessel set...,not_viewed,16-Gauge-Sinks Vessel Sink in White with Fauce...,"[-0.049733716994524, -0.011679209768772125, 0....",-0.179814,0.039086
4,P4,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,This 9 in. black full grain leather logger boo...,not_viewed,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,"[0.026085881516337395, 0.048493191599845886, -...",-0.212957,0.14371


Get the data related to `recently_viewed` and `not_viewed` products

In [14]:
df_recently_viewed = df[df.product_status=='recently_viewed']
df_not_viewed = df[df.product_status=='not_viewed']
df_recently_viewed

Unnamed: 0,product_id,title,description,product_status,combined,text_embeddings,pc1,pc2
1017,P1017,1 qt. #660D-7 Blackberry Farm Satin Enamel Int...,Love your space like never before with the hig...,recently_viewed,1 qt. #660D-7 Blackberry Farm Satin Enamel Int...,"[0.05918155610561371, -0.02796226739883423, 0....",0.470231,-0.057174
1044,P1044,1 qt. #M360-4 Marjoram One-Coat Hide Eggshell ...,Introducing the best of BEHR Paint. Featuring ...,recently_viewed,1 qt. #M360-4 Marjoram One-Coat Hide Eggshell ...,"[0.02989116683602333, -0.02771798148751259, 0....",0.470721,-0.046263
1048,P1048,5 gal. #640C-1 Hosta Flower Extra Durable Sati...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently_viewed,5 gal. #640C-1 Hosta Flower Extra Durable Sati...,"[0.0008034154889173806, -0.027133531868457794,...",0.457054,-0.02932
1310,P1310,5 gal. #180A-2 Romantic Morn Extra Durable Sem...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently_viewed,5 gal. #180A-2 Romantic Morn Extra Durable Sem...,"[0.00219642068259418, -0.006548991892486811, 0...",0.466699,-0.048816
1444,P1444,5 gal. #PPU12-17 Cameroon Green Extra Durable ...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently_viewed,5 gal. #PPU12-17 Cameroon Green Extra Durable ...,"[0.05091223120689392, -0.016536731272935867, 0...",0.464626,-0.051651
1838,P1838,5 gal. #N340-2 Dune Grass Extra Durable Satin ...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently_viewed,5 gal. #N340-2 Dune Grass Extra Durable Satin ...,"[0.00441074650734663, -0.014054427854716778, 0...",0.459178,-0.051003
1938,P1938,1 gal. #HDC-SP16-10 Japanese Rose Garden Semi-...,Introducing the best of BEHR Paint. Featuring ...,recently_viewed,1 gal. #HDC-SP16-10 Japanese Rose Garden Semi-...,"[0.006750619970262051, -0.060344669967889786, ...",0.469535,-0.050441
1970,P1970,8 oz. #510C-3 Rivers Edge Semi-Gloss Enamel St...,Introducing the best of BEHR Paint. Featuring ...,recently_viewed,8 oz. #510C-3 Rivers Edge Semi-Gloss Enamel St...,"[0.03179488703608513, -0.06424853950738907, 0....",0.471999,-0.045673


Convert the embedding vectors to NumPy arrays

In [16]:
vectors_recently_viewed = np.array(df_recently_viewed.text_embeddings.to_list())
vectors_not_viewed = np.array(df_not_viewed.text_embeddings.to_list())
vectors_recently_viewed

array([[ 0.05918156, -0.02796227,  0.05370384, ...,  0.03199653,
        -0.0369931 ,  0.02376145],
       [ 0.02989117, -0.02771798,  0.05270961, ...,  0.01695084,
        -0.08368737,  0.05685841],
       [ 0.00080342, -0.02713353,  0.04528342, ..., -0.00164107,
        -0.04122982,  0.04933701],
       ...,
       [ 0.00441075, -0.01405443,  0.05610869, ..., -0.00937568,
        -0.03830218,  0.05352857],
       [ 0.00675062, -0.06034467,  0.04007325, ..., -0.02216943,
        -0.0787768 ,  0.08100744],
       [ 0.03179489, -0.06424854,  0.01230332, ..., -0.00732387,
        -0.02857859,  0.0742462 ]])

Find the similarity between each viewed product and all the unviewed products.

In [22]:
# Rows: recently viewed products; columns: products not yet viewed
similarity_matrix = cosine_similarity(vectors_recently_viewed, vectors_not_viewed)

# For each recently viewed product, take the position of the most similar unviewed product
top_ids = np.argmax(similarity_matrix, axis=1)

most_similar_product_ids = list(df_not_viewed.iloc[top_ids].product_id)
most_similar_product_ids

['P854', 'P1061', 'P1705', 'P733', 'P1327', 'P1705', 'P1501', 'P314']
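
Taking only the single best match per viewed product can yield duplicates, as with `P1705` above, which appears twice. One possible variation is to take the top `k` candidates per row and deduplicate them. The sketch below shows this with plain NumPy on toy vectors; `top_k_unique` is a hypothetical helper, and the manual cosine computation is intended to mirror what `cosine_similarity` returns:

```python
import numpy as np


def top_k_unique(viewed: np.ndarray, candidates: np.ndarray, k: int = 2) -> list:
    """Return unique candidate row indices, taking the k most similar per viewed row."""
    # Normalize rows so that a dot product equals cosine similarity.
    viewed_n = viewed / np.linalg.norm(viewed, axis=1, keepdims=True)
    cand_n = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    similarity = viewed_n @ cand_n.T  # shape: (n_viewed, n_candidates)

    # Indices of the k highest similarities per row, best first.
    top = np.argsort(-similarity, axis=1)[:, :k]

    # Deduplicate while keeping first-seen order.
    seen = []
    for idx in top.ravel():
        if int(idx) not in seen:
            seen.append(int(idx))
    return seen


# Toy data: two "viewed" vectors and three candidate vectors.
viewed = np.array([[1.0, 0.0], [0.9, 0.1]])
candidates = np.array([[1.0, 0.1], [0.0, 1.0], [0.8, 0.3]])
print(top_k_unique(viewed, candidates, k=2))
```

The returned positions would index into `df_not_viewed` in the same way as `top_ids` does in `df_not_viewed.iloc[top_ids]`.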

### Task 5 - Recommend products based on the searched products

Let's update the status of the top similar products to `recommended`.

In [24]:
df.loc[df.product_id.isin(most_similar_product_ids),'product_status'] = 'recommended'
df[df.product_status=='recommended']

Unnamed: 0,product_id,title,description,product_status,combined,text_embeddings,pc1,pc2
314,P314,8 oz. #230F-7 Florence Brown Semi-Gloss Enamel...,Introducing the best of BEHR Paint. Featuring ...,recommended,8 oz. #230F-7 Florence Brown Semi-Gloss Enamel...,"[-0.003989859018474817, -0.060487765818834305,...",0.493714,-0.052179
733,P733,5 gal. #N440-1 Streetwise Extra Durable Semi-G...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recommended,5 gal. #N440-1 Streetwise Extra Durable Semi-G...,"[0.009646818973124027, -0.017993971705436707, ...",0.459837,-0.004503
854,P854,1 qt. #N460-1 Evening White Satin Enamel Inter...,Love your space like never before with the hig...,recommended,1 qt. #N460-1 Evening White Satin Enamel Inter...,"[0.04255978390574455, -0.019562887027859688, 0...",0.491744,-0.053056
1061,P1061,1 gal. #MQ1-28 Orange Flambe One-Coat Hide Egg...,Introducing the best of BEHR Paint. Featuring ...,recommended,1 gal. #MQ1-28 Orange Flambe One-Coat Hide Egg...,"[0.01663183607161045, -0.026091130450367928, 0...",0.496061,-0.063851
1327,P1327,5 gal. #MQ4-44 Green Dynasty Extra Durable Egg...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recommended,5 gal. #MQ4-44 Green Dynasty Extra Durable Egg...,"[0.0505719892680645, -0.01987646520137787, 0.0...",0.468602,-0.063277
1501,P1501,1 gal. #S-H-620 Midnight Sky Semi-Gloss Enamel...,Introducing the best of BEHR Paint. Featuring ...,recommended,1 gal. #S-H-620 Midnight Sky Semi-Gloss Enamel...,"[0.020269175991415977, -0.041455335915088654, ...",0.492105,-0.039515
1705,P1705,5 gal. #310D-4 Gold Buff Extra Durable Satin E...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recommended,5 gal. #310D-4 Gold Buff Extra Durable Satin E...,"[-0.002052590949460864, -0.012245520949363708,...",0.460866,-0.032162


Let's visualize the recommended products.

In [25]:
px.scatter(df, x='pc1', y='pc2', color='product_status', hover_data=['title'])