<a href="https://www.kaggle.com/code/sacrum/e-commerce-products-search-engine-using-qdrant?scriptVersionId=162650913" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Overview

### Pricegram
`Pricegram` is a price comparison platform that scrapes data from prominent e-commerce stores in Pakistan to compare products.

### Qdrant
`Qdrant` is an open-source vector database and vector search engine written in Rust. It provides a fast and scalable vector similarity search service.

In this project we have used `qdrant` to develop a search engine for `pricegram`.
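To make "vector similarity search" concrete, here is a minimal sketch that ranks a few points by cosine similarity to a query vector. The three-dimensional vectors and the `search` helper are hypothetical and only illustrate the idea. Qdrant itself relies on an approximate nearest-neighbour index (HNSW) rather than the brute-force scan shown here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vector, points, limit=2):
    """Rank stored points by similarity to the query, best first (brute force)."""
    scored = [(cosine_similarity(query_vector, vec), name) for name, vec in points.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:limit]]

# Hypothetical toy embeddings; real text embeddings have hundreds of dimensions.
toy_points = {
    "gaming laptop": [0.9, 0.1, 0.0],
    "office laptop": [0.7, 0.3, 0.1],
    "smartphone": [0.1, 0.9, 0.2],
}

top_matches = search([1.0, 0.0, 0.0], toy_points)
print(top_matches)
```

The two laptop entries rank ahead of the smartphone because their vectors point in nearly the same direction as the query.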

# Dataset

## Loading

In [1]:
import pandas as pd

# path to kaggle downloaded data
DATA_PATH = "/kaggle/input/e-commerce-products-search-engine-recommendation/data.csv"

# load dataset in pandas dataframe
df = pd.read_csv(DATA_PATH)

# setting the index by product's id
df = df.set_index("id")

# print shape of dataframe
print("Shape of DataFrame:", df.shape)

# print first 10 rows in dataframe
print("First 10 rows:")
df.head(10)

Shape of DataFrame: (1666, 21)
First 10 rows:


| id | slug | title | imgs | brand | category | vendor | used | address | availability | currency | ... | discounted_price | specifications | description | delivery_fee | delivery_details | warranty | warranty_type | average_rating | num_ratings | reviews |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | https://www.mega.pk/mobiles_products/23522/Not... | Nothing Phone 1 8GB RAM 256GB Storage Non PTA ... | ['https://www.mega.pk/items_images/Nothing+Pho... |  | Mobile | MEGA.PK | 0 | Office 11, 12, 14 Basement Ahmed Center, I-8 M... |  | PKR | ... |  | {'RAM': '8GB', 'Memory quantity': '', 'Interna... |  |  |  |  |  |  |  | [] |
| 1 | https://www.mega.pk/mobiles_products/23458/Opp... | Oppo F21 Pro 8GB Ram 128GB Storage 5G PTA Appr... | ['https://www.mega.pk/items_images/Oppo+F21+Pr... | OPPO | Mobile | MEGA.PK | 0 | Office 11, 12, 14 Basement Ahmed Center, I-8 M... |  | PKR | ... |  | {'RAM': '8gb', 'Memory quantity': '', 'Interna... |  |  |  |  |  |  |  | [] |
| 2 | https://www.mega.pk/mobiles_products/24393/Tec... | Tecno Spark 10 | ['https://www.mega.pk/items_images/Tecno+Spark... | Tecno | Mobile | MEGA.PK | 0 | Office 11, 12, 14 Basement Ahmed Center, I-8 M... | Coming Soon | PKR | ... |  | {'RAM': '4GB,8GB', 'Memory quantity': '', 'Int... |  |  |  | 1 year |  |  |  | [] |
| 3 | https://www.mega.pk/mobiles_products/24259/Viv... | Vivo V27 5G | ['https://www.mega.pk/items_images/Vivo+V27+5G... | Vivo | Mobile | MEGA.PK | 0 | Office 11, 12, 14 Basement Ahmed Center, I-8 M... | Coming Soon | PKR | ... |  | {'RAM': '8GB,12GB', 'Memory quantity': '', 'In... |  |  |  | 1 year |  |  |  | [] |
| 4 | https://www.mega.pk/mobiles_products/24204/App... | Apple Iphone 15 Pro Max | ['https://www.mega.pk/items_images/Apple+Iphon... | Apple | Mobile | MEGA.PK | 0 | Office 11, 12, 14 Basement Ahmed Center, I-8 M... | Coming Soon | PKR | ... |  | {'RAM': '8GB', 'Memory quantity': '', 'Interna... |  |  |  |  |  |  |  | [] |
| 5 | https://www.mega.pk/mobiles_products/24114/Rea... | Realme GT3 | ['https://www.mega.pk/items_images/Realme+GT3_... | Realme | Mobile | MEGA.PK | 0 | Office 11, 12, 14 Basement Ahmed Center, I-8 M... | Coming Soon | PKR | ... |  | {'RAM': '8GB,12GB,16GB', 'Memory quantity': ''... |  |  |  | 1 year |  |  |  | [] |
| 6 | https://www.mega.pk/mobiles_products/24418/Tec... | Sparx S9 2GB RAM 32GB Storage PTA Approved | ['https://www.mega.pk/items_images/Sparx+S9+2G... | Sparx | Mobile | MEGA.PK | 0 | Office 11, 12, 14 Basement Ahmed Center, I-8 M... |  | PKR | ... |  | {'RAM': '2 GB', 'Memory quantity': '', 'Intern... |  |  |  | 1 year |  |  |  | [] |
| 7 | https://www.mega.pk/mobiles_products/24417/Spa... | Sparx S6 2GB RAM 32GB Storage | ['https://www.mega.pk/items_images/Sparx+S6+2G... | Sparx | Mobile | MEGA.PK | 0 | Office 11, 12, 14 Basement Ahmed Center, I-8 M... |  | PKR | ... |  | {'RAM': '2 GB', 'Memory quantity': '', 'Intern... |  |  |  | 1 year |  |  |  | [] |
| 8 | https://www.mega.pk/mobiles_products/24412/Tec... | Tecno Pova Neo 2 4GB RAM 64GB Storage PTA Appr... | ['https://www.mega.pk/items_images/Tecno+Pova+... | Tecno | Mobile | MEGA.PK | 0 | Office 11, 12, 14 Basement Ahmed Center, I-8 M... |  | PKR | ... |  | {'RAM': '4GB', 'Memory quantity': '', 'Interna... |  |  |  | 1 year |  |  |  | [] |
| 9 | https://www.mega.pk/mobiles_products/24411/Viv... | Vivo Y73 8GB RAM 128GB Storage PTA Approved | ['https://www.mega.pk/items_images/Vivo+Y73+8G... | Vivo | Mobile | MEGA.PK | 0 | Office 11, 12, 14 Basement Ahmed Center, I-8 M... |  | PKR | ... |  | {'RAM': '8 GB', 'Memory quantity': '', 'Intern... |  |  |  | 1 year |  |  |  | [] |


## Preparing Columns
Some columns hold Python lists and dictionaries that were stored as strings in the CSV file, so they need to be converted back to Python data types.

In [2]:
# columns that have been dumped as strings but are python lists and dictionaries
df['reviews'] = df['reviews'].fillna(str([])).map(eval)
df['imgs'] = df['imgs'].map(eval)
df['specifications'] = df['specifications'].map(eval)
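Because `eval` executes whatever code a string contains, a more defensive option for scraped data is `ast.literal_eval`, which only accepts Python literals. The sketch below is one possible variant, not the notebook's method. The `parse_literal` helper and the sample strings are hypothetical, shaped like the `imgs`, `specifications`, and `reviews` cells.

```python
import ast

def parse_literal(text, default=None):
    """Parse a stringified Python literal; return `default` for missing or malformed values."""
    if not isinstance(text, str):
        # e.g. NaN, which pandas uses for an empty CSV cell
        return default
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return default

# Hypothetical cell values, not taken from the dataset.
raw_imgs = "['https://example.com/a.webp', 'https://example.com/b.webp']"
raw_specs = "{'RAM': '8GB', 'Memory quantity': ''}"
raw_reviews = float("nan")

imgs = parse_literal(raw_imgs, default=[])
specs = parse_literal(raw_specs, default={})
reviews = parse_literal(raw_reviews, default=[])
```

With a DataFrame, the same helper could be applied per column, for example `df['imgs'].map(lambda s: parse_literal(s, default=[]))`.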

## Concatenating Columns
To create a document to search over, we merge all the columns that may contain keywords or semantic meaning relevant to a search query.

In [3]:
# Flatten a row into one lowercase string of ". "-separated sentences,
# skipping empty lists, empty dicts, and missing values.

def convert_to_sentences(row):
    # placeholder strings that mark a missing specification value
    invalids = {"-", "N/A", "NA", "N\\A"}

    sents = []
    for k, v in row.items():

        if isinstance(v, list):
            if len(v) == 0: continue
            # e.g. "imgs <url1> <url2>"
            sent = k + " " + " ".join(v)

        elif isinstance(v, dict):
            if len(v) == 0: continue
            # each specification becomes its own "<name> <value>" sentence
            for k2, v2 in v.items():
                if v2 not in invalids:
                    sents.append(f"{k2} {v2}")
            continue

        else:
            if pd.isna(v): continue
            sent = f"{k} {v}"

        sent = sent.lower()
        sent = [i for i in sent.split(". ") if len(i) > 0]
        sents.extend(sent)

    sents = ". ".join(sents)
    sents = sents.lower()

    return sents

convert_to_sentences(df.iloc[0])

'slug https://www.mega.pk/mobiles_products/23522/nothing-phone-1-8gb-ram-256gb-storage-non-pta-5g-black.html. title nothing phone 1 8gb ram 256gb storage non pta 5g black . imgs https://www.mega.pk/items_images/nothing+phone+1+8gb+ram+256gb+storage+non+pta+5g+black+price+in+pakistan%2c+specifications%2c+features_-_23522.webp. category mobile. vendor mega.pk. used 0. address office 11, 12, 14 basement ahmed center, i-8 markaz, islamabad, pakistan. currency pkr. original_price 129999.0. ram 8gb. memory quantity . internal storage space 256gb. main camera pixels 50 mp, f/1.9, 24mm (wide), 1/1.56. battery capacity li-po 4500 mah, non-removable. screen size 6.55 inches. 5g support yes. finger print yes. display technology oled, 1b colors, 120hz, hdr10+, 500 nits (typ), 700 nits (peak). display 6.55 inches oled, 1b colors. number of colours 1b. scratch resistant display . screen resolution 1080 x 2400 pixels. pixel density 402 ppi. dual screens . sd card yes. sdio . compatible memory cards 1

In [4]:
df['sentences'] = df.apply(convert_to_sentences, axis=1)

# Qdrant

## Installing with Pip
Here we install the Qdrant Python client (`qdrant-client`) with the `fastembed` extra. `fastembed` is a fast, lightweight Python library for generating state-of-the-art text embeddings.

In [5]:
!pip install qdrant-client[fastembed]

Collecting qdrant-client[fastembed]
  Downloading qdrant_client-1.7.3-py3-none-any.whl.metadata (9.3 kB)
Collecting grpcio-tools>=1.41.0 (from qdrant-client[fastembed])
  Downloading grpcio_tools-1.60.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.2 kB)
Collecting httpx>=0.14.0 (from httpx[http2]>=0.14.0->qdrant-client[fastembed])
  Downloading httpx-0.26.0-py3-none-any.whl.metadata (7.6 kB)
Collecting portalocker<3.0.0,>=2.7.0 (from qdrant-client[fastembed])
  Downloading portalocker-2.8.2-py3-none-any.whl.metadata (8.5 kB)
Collecting fastembed==0.1.1 (from qdrant-client[fastembed])
  Downloading fastembed-0.1.1-py3-none-any.whl.metadata (3.8 kB)
Collecting onnxruntime<2.0,>=1.15 (from fastembed==0.1.1->qdrant-client[fastembed])
  Downloading onnxruntime-1.17.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.2 kB)
Collecting tokenizers<0.14,>=0.13 (from fastembed==0.1.1->qdrant-client[fastembed])
  Downloading tokenizers-0

## Preparing Inputs
A point is saved in the vector database by providing:
1. `id`: a unique identifier for the point
2. `document`: the text that is embedded and compared by vector similarity
3. `metadata`: extra fields stored with the point, which can be used to filter search results

In [6]:
ids = df.index
documents = df['sentences']
metadata = df[['title', 'brand', 'category', 'vendor', 'used', 'original_price', 'discounted_price']].fillna("").to_dict(orient='records')
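The cell above fills missing values with `""`, so a field such as `discounted_price` ends up holding floats for some products and empty strings for others. If numeric filtering on price is needed later, one alternative is to drop missing fields from each record instead. The sketch below assumes that approach. The `clean_payload` helper and the sample records are hypothetical.

```python
import math

def clean_payload(record):
    """Drop missing values so each remaining field keeps a single type."""
    cleaned = {}
    for key, value in record.items():
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            # pandas stores an empty cell as NaN
            continue
        cleaned[key] = value
    return cleaned

# Hypothetical records shaped like the rows selected for `metadata`.
raw_records = [
    {"title": "Asus M515U Ryzen 5", "brand": "Asus",
     "original_price": 139900.0, "discounted_price": float("nan")},
    {"title": "Dell Inspiron 3515", "brand": None,
     "original_price": 145000.0, "discounted_price": 134499.0},
]

payloads = [clean_payload(r) for r in raw_records]
```

In the notebook this would mean calling `to_dict(orient='records')` without `fillna("")` and passing each record through `clean_payload`.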

## Uploading to Vector Database

In [7]:
# Name of Qdrant Collection for saving vectors
QD_COLLECTION_NAME = "collection_name"

In [8]:
from qdrant_client import QdrantClient

client = QdrantClient(":memory:")

client.add(
    collection_name=QD_COLLECTION_NAME,
    ids=ids,
    documents=documents,
    metadata=metadata,
)

print("Completed")

100%|██████████| 77.7M/77.7M [00:01<00:00, 43.8MiB/s]


Completed


## Using the Search Engine
Now we test the search engine to see how relevant its results are.

In [9]:
# This function prints each result's metadata in a readable format

def display_results(results):
    for i, result in enumerate(results):
        print()
        print(f"{i+1})")
        for k, v in result.metadata.items():
            if k != "document":
                print(f"{k.capitalize()}: {v}")

In [10]:
results = client.query(
    collection_name=QD_COLLECTION_NAME,
    query_text="amd ryzen laptops",
    limit=5
)
display_results(results)


1)
Title: Dell Inspiron 3515 15.6 inches AMD Ryzen 5 3450U (8GB-256GB)
Brand: 
Category: Laptop
Vendor: PriceOye
Used: 0
Original_price: 145000.0
Discounted_price: 134499.0

2)
Title: HP Laptop EQ2180AU 15.6 Inches AMD Ryzen 5 (8GB RAM - 512GBSSD)
Brand: 
Category: Laptop
Vendor: PriceOye
Used: 0
Original_price: 175000.0
Discounted_price: 171499.0

3)
Title: Dell Laptop Inspiron 5515 15.6 inches AMD Ryzen 7 (8GB RAM - 512GB SSD)
Brand: 
Category: Laptop
Vendor: PriceOye
Used: 0
Original_price: 245499.0
Discounted_price: 191499.0

4)
Title: ASUS TUF Gaming A15 FA507R AMD Ryzen 7 8GB RAM 512GB SSD 4GB RTX 3050Ti Windows 11 Mecha Grey 
Brand: Asus
Category: Laptop
Vendor: MEGA.PK
Used: 0
Original_price: 294999.0
Discounted_price: 

5)
Title: Asus M515U Ryzen 5
Brand: Asus
Category: Laptop
Vendor: Paklap
Used: 0
Original_price: 139900.0
Discounted_price: 


In [11]:
from qdrant_client.http.models import Filter, FieldCondition, MatchValue

# searching with a condition: products only belonging to vendor "MEGA.PK"
results = client.query(
    collection_name=QD_COLLECTION_NAME,
    query_text="gaming laptops",
    query_filter=Filter(
        must=[
            FieldCondition(
                key="vendor",
                match=MatchValue(value='MEGA.PK')
            )
        ]
    ),
    limit=5
)

display_results(results)


1)
Title: Asus ROG ZEPHYRUS 16 GU603ZW GAMING Core i9 12th Generation 16GB RAM 1TB SSD 8GB NVIDIA RTX 3070Ti Windows 11 
Brand: Asus
Category: Laptop
Vendor: MEGA.PK
Used: 0
Original_price: 650000.0
Discounted_price: 

2)
Title: Dell G15 5520 Gaming Core i7 12th Generation 16GB RAM 512GB SSD 6GB NVIDIA RTX3060 Windows 11 
Brand: Dell
Category: Laptop
Vendor: MEGA.PK
Used: 0
Original_price: 376999.0
Discounted_price: 

3)
Title: Dell G15 5520 Gaming Core i5 12th Generation 8GB RAM 256GB SSD 4GB NVIDIA RTX3050 DOS 
Brand: Dell
Category: Laptop
Vendor: MEGA.PK
Used: 0
Original_price: 247999.0
Discounted_price: 

4)
Title: Hp Victus Gaming 15 FB0028NR AMD Ryzen 7 16GB RAM 512GB SSD 4GB RTX 3050Ti Windows 11 
Brand: HP
Category: Laptop
Vendor: MEGA.PK
Used: 0
Original_price: 264999.0
Discounted_price: 

5)
Title: ASUS TUF Gaming A15 FA507R AMD Ryzen 7 8GB RAM 512GB SSD 4GB RTX 3050Ti Windows 11 Mecha Grey 
Brand: Asus
Category: Laptop
Vendor: MEGA.PK
Used: 0
Original_price: 294999.0
Discou
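The conditions listed under `must` are combined with a logical AND, and `MatchValue` is an exact match on the stored field. The sketch below mirrors only that logic in plain Python. The `matches_all` helper and the payloads are hypothetical, and Qdrant evaluates the filter as part of the search rather than post-filtering a result list like this.

```python
def matches_all(payload, must):
    """Return True when every (key, value) condition equals the payload field exactly."""
    return all(payload.get(key) == value for key, value in must)

# Hypothetical payloads, loosely modelled on the metadata stored above.
candidates = [
    {"title": "Dell G15 5520 Gaming", "vendor": "MEGA.PK", "category": "Laptop"},
    {"title": "Dell Inspiron 3515", "vendor": "PriceOye", "category": "Laptop"},
    {"title": "Vivo V27 5G", "vendor": "MEGA.PK", "category": "Mobile"},
]

# Two conditions: both must hold for a payload to pass.
must_conditions = [("vendor", "MEGA.PK"), ("category", "Laptop")]
filtered = [p["title"] for p in candidates if matches_all(p, must_conditions)]
print(filtered)
```

Only the first payload satisfies both conditions. Note that a value such as `"mega.pk"` would not match, since the comparison is exact.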

# Explore More

- Create a search engine in 5 minutes using Qdrant on the Quora Question Pairs dataset
    - Read this article on Medium: [Build a search engine in 5 minutes using Qdrant](https://medium.com/@raoarmaghanshakir040/build-a-search-engine-in-5-minutes-using-qdrant-f43df4fbe8d1)
    - See the implementation in the Kaggle notebook: [Quora Search Engine Using Qdrant](https://www.kaggle.com/code/sacrum/quora-search-engine-using-qdrant)
- [Qdrant](https://qdrant.tech)
- [Qdrant Documentation](https://qdrant.tech/documentation/)
- [Qdrant Python Client Documentation](https://python-client.qdrant.tech)
- [Pricegram](https://github.com/Me-AU/pricegram)
