# Semantic Spotter Project - Search Product from Myntra Fashion Database using RAG in LangChain

The "Semantic Spotter Project" is an application designed to facilitate product searches from the Myntra Fashion Database using Retrieval-Augmented Generation (RAG) in LangChain. The project aims to enhance product discovery by leveraging advanced language models to provide contextually relevant product recommendations based on user queries.

The system embeds each product description with OpenAI embeddings and stores the resulting vectors in ChromaDB together with product metadata (such as brand, category, price, sizes, and ratings). A user query is embedded the same way, and the products whose descriptions are semantically closest to the query are retrieved.

**The key steps in the project involve:**

1. **Data Collection:** Extracting product data (such as descriptions and features) from the Myntra fashion database.

2. **Embedding Generation:** Using OpenAI embeddings to convert product information into dense vector representations.

3. **Vector Store:** Storing the embeddings and metadata in ChromaDB for fast retrieval.

4. **Query Processing:** Allowing users to input natural language queries, which are then processed using the LangChain framework to find the most relevant products.

5. **RAG Integration:** Using Retrieval-Augmented Generation (RAG) to ground response generation in the top product matches retrieved for the query.

By using this approach, the project aims to improve the search experience for Myntra’s users by delivering more accurate and context-aware product recommendations, making the fashion search process more intuitive and efficient.
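
To make "semantic similarity" concrete, here is a minimal sketch of top-k retrieval over toy vectors. The three-dimensional vectors, the product names, and the helper names `cosine_similarity` and `top_k` are invented for illustration; real embeddings have far more dimensions, and the vector store may use a different distance metric than cosine similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, product_vecs, k=3):
    """Return the names of the k products most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in product_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Made-up 3-dimensional "embeddings" (assumption, for illustration only).
toy_vectors = {
    "blue cotton t shirt": [0.9, 0.1, 0.0],
    "navy slim fit jeans": [0.1, 0.9, 0.1],
    "printed maxi dress": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]
print(top_k(toy_query, toy_vectors, k=2))
```

The query vector points mostly along the first axis, so the t-shirt ranks first and the dress is dropped when `k=2`.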

In [39]:
# Install required libraries
!pip install -qU langchain langchain-openai langchain-community chromadb tiktoken

In [40]:
# Import required libraries
import pandas as pd
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_openai import OpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document

## Exploratory Data Analysis

In [3]:
# Mount Google Drive and load the Myntra Fashion dataset
# The dataset can be downloaded from: https://www.kaggle.com/datasets/manishmathias/myntra-fashion-dataset
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


In [4]:
# Path to the project folder on Google Drive
drive_path = "/content/drive/MyDrive/semantic_spotter_project/"

In [5]:
# Loading dataset in dataframe
df = pd.read_csv(drive_path+"Myntra Fasion Clothing.csv")
df.head(5)


Unnamed: 0,URL,Product_id,BrandName,Category,Individual_category,category_by_Gender,Description,DiscountPrice (in Rs),OriginalPrice (in Rs),DiscountOffer,SizeOption,Ratings,Reviews
0,https://www.myntra.com/jeans/roadster/roadster...,2296012,Roadster,Bottom Wear,jeans,Men,roadster men navy blue slim fit mid rise clean...,824.0,1499.0,45% OFF,"28, 30, 32, 34, 36",3.9,999.0
1,https://www.myntra.com/track-pants/locomotive/...,13780156,LOCOMOTIVE,Bottom Wear,track-pants,Men,locomotive men black white solid slim fit tra...,517.0,1149.0,55% OFF,"S, M, L, XL",4.0,999.0
2,https://www.myntra.com/shirts/roadster/roadste...,11895958,Roadster,Topwear,shirts,Men,roadster men navy white black geometric print...,629.0,1399.0,55% OFF,"38, 40, 42, 44, 46, 48",4.3,999.0
3,https://www.myntra.com/shapewear/zivame/zivame...,4335679,Zivame,Lingerie & Sleep Wear,shapewear,Women,zivame women black saree shapewear zi3023core0...,893.0,1295.0,31% OFF,"S, M, L, XL, XXL",4.2,999.0
4,https://www.myntra.com/tshirts/roadster/roadst...,11690882,Roadster,Western,tshirts,Women,roadster women white solid v neck pure cotton ...,,599.0,35% OFF,"XS, S, M, L, XL",4.2,999.0


In [6]:
# Size of the data
df.shape

(526564, 13)

In [7]:
# Checking number of null values
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526564 entries, 0 to 526563
Data columns (total 13 columns):
 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   URL                    526564 non-null  object 
 1   Product_id             526564 non-null  int64  
 2   BrandName              526564 non-null  object 
 3   Category               526564 non-null  object 
 4   Individual_category    526564 non-null  object 
 5   category_by_Gender     526564 non-null  object 
 6   Description            526564 non-null  object 
 7   DiscountPrice (in Rs)  333406 non-null  float64
 8   OriginalPrice (in Rs)  526564 non-null  float64
 9   DiscountOffer          452258 non-null  object 
 10  SizeOption             526564 non-null  object 
 11  Ratings                190412 non-null  float64
 12  Reviews                190412 non-null  float64
dtypes: float64(4), int64(1), object(8)
memory usage: 52.2+ MB
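
The non-null counts above show which columns are incomplete. As a small worked calculation, the sketch below turns those counts into missing-value totals and percentages; the numbers are copied from the `df.info()` output, and the variable names are only illustrative.

```python
# Row count and non-null counts copied from the df.info() output above.
total_rows = 526564
non_null = {
    "DiscountPrice (in Rs)": 333406,
    "DiscountOffer": 452258,
    "Ratings": 190412,
    "Reviews": 190412,
}

# Missing values per column = total rows minus non-null entries.
missing = {col: total_rows - count for col, count in non_null.items()}

for col, n_missing in missing.items():
    print(f"{col}: {n_missing} missing ({n_missing / total_rows:.1%})")
```

`Ratings` and `Reviews` are missing in well over half of the rows, so any step that drops rows with a null in any column will discard most of the dataset.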


In [8]:
# Changing columns name
df = df.rename(columns={'URL': 'Url',
                        'Product_id': 'ProductId',
                        'Individual_category': 'SubCategory',
                        'category_by_Gender': 'Gender',
                        'DiscountPrice (in Rs)': 'DiscountPrice',
                        'OriginalPrice (in Rs)': 'OriginalPrice'
                       }
              )
df.columns

Index(['Url', 'ProductId', 'BrandName', 'Category', 'SubCategory', 'Gender',
       'Description', 'DiscountPrice', 'OriginalPrice', 'DiscountOffer',
       'SizeOption', 'Ratings', 'Reviews'],
      dtype='object')

In [9]:
# Drop rows that have a missing value in any column
df = df.dropna()
df.shape

(117541, 13)

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 117541 entries, 0 to 188769
Data columns (total 13 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Url            117541 non-null  object 
 1   ProductId      117541 non-null  int64  
 2   BrandName      117541 non-null  object 
 3   Category       117541 non-null  object 
 4   SubCategory    117541 non-null  object 
 5   Gender         117541 non-null  object 
 6   Description    117541 non-null  object 
 7   DiscountPrice  117541 non-null  float64
 8   OriginalPrice  117541 non-null  float64
 9   DiscountOffer  117541 non-null  object 
 10  SizeOption     117541 non-null  object 
 11  Ratings        117541 non-null  float64
 12  Reviews        117541 non-null  float64
dtypes: float64(4), int64(1), object(8)
memory usage: 12.6+ MB


In [11]:
# Cleaned dataframe
df.head(2)

Unnamed: 0,Url,ProductId,BrandName,Category,SubCategory,Gender,Description,DiscountPrice,OriginalPrice,DiscountOffer,SizeOption,Ratings,Reviews
0,https://www.myntra.com/jeans/roadster/roadster...,2296012,Roadster,Bottom Wear,jeans,Men,roadster men navy blue slim fit mid rise clean...,824.0,1499.0,45% OFF,"28, 30, 32, 34, 36",3.9,999.0
1,https://www.myntra.com/track-pants/locomotive/...,13780156,LOCOMOTIVE,Bottom Wear,track-pants,Men,locomotive men black white solid slim fit tra...,517.0,1149.0,55% OFF,"S, M, L, XL",4.0,999.0


In [41]:
# The dataset is very large, so only the first 5,000 rows are used
df2 = df[:5000]
df2.shape

(5000, 13)

In [42]:
# Preparing data for embedding: the description becomes the document text, the other fields become metadata
documents = []
for index, row in df2.iterrows():
    doc_content = row['Description']
    doc_metadata = {
        'id': row['ProductId'],
        'BrandName': row['BrandName'],
        'Category': row['Category'],
        'SubCategory': row['SubCategory'],
        'Gender': row['Gender'],
        'DiscountPrice': row['DiscountPrice'],
        'OriginalPrice': row['OriginalPrice'],
        'DiscountOffer': row['DiscountOffer'],
        'Size': row['SizeOption'],
        'Ratings': row['Ratings'],
        'Reviews': row['Reviews'],
    }
    documents.append(Document(page_content=doc_content, metadata=doc_metadata))
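
Vector stores generally expect metadata values to be simple scalars (strings, numbers, booleans). The helper below is a defensive sketch, not part of the original notebook: `to_plain_metadata` is a hypothetical name, and it assumes that any value exposing an `.item()` method (as NumPy scalars do) can be converted to a plain Python scalar that way.

```python
import math

def to_plain_metadata(raw):
    """Return a copy of `raw` that holds only str, int, float or bool values."""
    clean = {}
    for key, value in raw.items():
        # NumPy scalars expose .item(), which returns the equivalent Python scalar.
        if hasattr(value, "item") and callable(value.item):
            value = value.item()
        # Skip missing values (None or NaN) instead of storing them.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        # Anything else that is not a simple scalar is stored as its string form.
        if not isinstance(value, (str, int, float, bool)):
            value = str(value)
        clean[key] = value
    return clean

example = to_plain_metadata({
    "id": 2296012,
    "BrandName": "Roadster",
    "Ratings": float("nan"),       # dropped
    "Size": ["28", "30", "32"],    # converted to a string
})
print(example)
```

If used, the call would wrap the metadata dictionary before it is passed to `Document`.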

## Initializing LLM

In [15]:
# Read the OpenAI API key from Google Colab's stored user data
from google.colab import userdata
openai_key = userdata.get('open_ai')

In [34]:
# Initialize OpenAI LLM
llm = OpenAI(openai_api_key=openai_key)

## Creating the Vector Store and Storing Embeddings

In [43]:
# Store documents in ChromaDB
embedding = OpenAIEmbeddings(openai_api_key=openai_key)
vector_store = Chroma.from_documents(documents, embedding)

## Initializing RAG using LangChain

In [44]:
# Set up the retriever for ChromaDB (to retrieve top 3 results)
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 3})

In [45]:
# Function to recommend top 3 products based on user query
def recommend_products(query):
    # Retrieve the top 3 results from ChromaDB
    results = retriever.invoke(query)

    for idx, result in enumerate(results, start=1):
        Description = result.page_content
        ProductId = result.metadata.get('id', 'N/A')
        BrandName = result.metadata.get('BrandName', 'N/A')
        Category = result.metadata.get('Category', 'N/A')
        SubCategory = result.metadata.get('SubCategory', 'N/A')
        Gender = result.metadata.get('Gender', 'N/A')
        DiscountPrice = result.metadata.get('DiscountPrice', 'N/A')
        OriginalPrice = result.metadata.get('OriginalPrice', 'N/A')
        DiscountOffer = result.metadata.get('DiscountOffer', 'N/A')
        Size = result.metadata.get('Size', 'N/A')
        Ratings = result.metadata.get('Ratings', 'N/A')
        Reviews = result.metadata.get('Reviews', 'N/A')

        print(f"Rank: {idx}")
        print(f"Product Id: {ProductId}")
        print(f"Description: {Description}")
        print(f"Brand Name: {BrandName}")
        print(f"Category: {Category}")
        print(f"Sub Category: {SubCategory}")
        print(f"Gender: {Gender}")
        print(f"Discount Price: {DiscountPrice}")
        print(f"Original Price: {OriginalPrice}")
        print(f"Discount Offer: {DiscountOffer}")
        print(f'Size: {Size}')
        print(f'Ratings: {Ratings}')
        print(f'Reviews: {Reviews}')
        print()
        print("=" * 50)

    return ""
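
The function above prints the retrieved documents; the notebook initializes `llm` but does not show a generation step. As one possible sketch of that step, the code below formats retrieved products into a prompt. `ToyDocument` and `build_prompt` are hypothetical names, and `ToyDocument` only mimics the two attributes used above (`page_content` and `metadata`) so the example runs without an API key.

```python
from dataclasses import dataclass, field

@dataclass
class ToyDocument:
    """Stand-in exposing the same two attributes as the retrieved documents."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def build_prompt(query, docs):
    """Format retrieved products as a numbered context block followed by the question."""
    lines = []
    for rank, doc in enumerate(docs, start=1):
        meta = doc.metadata
        lines.append(
            f"{rank}. {doc.page_content}"
            f" | brand: {meta.get('BrandName', 'N/A')}"
            f" | price: {meta.get('DiscountPrice', 'N/A')}"
            f" | sizes: {meta.get('Size', 'N/A')}"
        )
    context = "\n".join(lines)
    return (
        "Answer using only the products listed below.\n\n"
        f"Products:\n{context}\n\n"
        f"Question: {query}"
    )

sample_docs = [
    ToyDocument(
        "roadster men navy blue slim fit jeans",
        {"BrandName": "Roadster", "DiscountPrice": 824.0, "Size": "28, 30, 32, 34, 36"},
    ),
]
prompt = build_prompt("Which jeans are available in size 32?", sample_docs)
print(prompt)
```

The resulting string could then be passed to the language model, for example with `llm.invoke(prompt)`; the prompt wording itself is an assumption and would need tuning.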

## Testing Queries

In [46]:
# Query 1
query = "Show me the latest dresses in size M"
recommend_products(query)

Rank: 1
Product Id: 10996416
Description: myshka women navy blue  orange printed maxi a line dress
Brand Name: Myshka
Category: Indian Wear
Sub Category: dresses
Gender: Women
Discount Price: 899.0
Original Price: 1999.0
Discount Offer: 55% OFF
Size: S, M, L, XL, XXL
Ratings: 4.3
Reviews: 452.0

Rank: 2
Product Id: 9825443
Description: myshka women teal blue  grey printed a line dress
Brand Name: Myshka
Category: Indian Wear
Sub Category: dresses
Gender: Women
Discount Price: 629.0
Original Price: 1499.0
Discount Offer: 58% OFF
Size: S, M, L, XL, XXL
Ratings: 3.6
Reviews: 543.0

Rank: 3
Product Id: 2499823
Description: herenow women maroon printed maxi dress
Brand Name: HERE&NOW
Category: Plus Size
Sub Category: dresses
Gender: Women
Discount Price: 689.0
Original Price: 2299.0
Discount Offer: 70% OFF, Hurry*
Size: XS, S, M, L, XL, XXL
Ratings: 4.3
Reviews: 881.0



''

In [47]:
# Query 2
query = "Show me t-shirts for men in size L and color blue"
recommend_products(query)

Rank: 1
Product Id: 14311750
Description: louis philippe jeans men teal blue pure cotton t shirt
Brand Name: Louis Philippe Jeans
Category: Topwear
Sub Category: tshirts
Gender: Men
Discount Price: 539.0
Original Price: 899.0
Discount Offer: 40% OFF
Size: S, M, L, XL, XXL, 3XL
Ratings: 4.3
Reviews: 344.0

Rank: 2
Product Id: 13920490
Description: herenow men teal blue printed pure cotton t shirt
Brand Name: HERE&NOW
Category: Topwear
Sub Category: tshirts
Gender: Men
Discount Price: 399.0
Original Price: 999.0
Discount Offer: 60% OFF
Size: S, M, L, XL
Ratings: 4.0
Reviews: 553.0

Rank: 3
Product Id: 12787018
Description: herenow men teal blue solid round neck t shirt
Brand Name: HERE&NOW
Category: Topwear
Sub Category: tshirts
Gender: Men
Discount Price: 174.0
Original Price: 699.0
Discount Offer: 75% OFF
Size: S, M, L, XL
Ratings: 4.3
Reviews: 851.0



''

In [48]:
# Query 3
query = "Show me jeans for men under 2000"
recommend_products(query)

Rank: 1
Product Id: 8936551
Description: pepe jeans men blue printed brief 8904311301035
Brand Name: Pepe Jeans
Category: Inner Wear &  Sleep Wear
Sub Category: briefs
Gender: Men
Discount Price: 254.0
Original Price: 299.0
Discount Offer: 15% OFF
Size: S, M, L
Ratings: 4.3
Reviews: 445.0

Rank: 2
Product Id: 8936555
Description: pepe jeans men navy blue solid hipster briefs 8904311300830
Brand Name: Pepe Jeans
Category: Inner Wear &  Sleep Wear
Sub Category: briefs
Gender: Men
Discount Price: 228.0
Original Price: 269.0
Discount Offer: 15% OFF
Size: S, M, L
Ratings: 4.2
Reviews: 472.0

Rank: 3
Product Id: 8936487
Description: pepe jeans men black solid brief 8904311300755
Brand Name: Pepe Jeans
Category: Inner Wear &  Sleep Wear
Sub Category: briefs
Gender: Men
Discount Price: 228.0
Original Price: 269.0
Discount Offer: 15% OFF
Size: S, M, L
Ratings: 4.3
Reviews: 443.0



''
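
Query 3 returns briefs from the brand Pepe Jeans rather than jeans: similarity over description text matches the word "jeans" in the brand name, and it does not enforce the price limit. One way to handle such structured constraints is to apply them to the metadata. The sketch below is an assumption about how that could look; `parse_max_price` and `apply_constraints` are hypothetical helpers, and the regular expression only covers phrasings like "under 2000" or "below 2000".

```python
import re

def parse_max_price(query):
    """Return the number after 'under' or 'below' in the query, or None if absent."""
    match = re.search(r"\b(?:under|below)\s+(\d+(?:\.\d+)?)", query, flags=re.IGNORECASE)
    return float(match.group(1)) if match else None

def apply_constraints(products, sub_category=None, max_price=None):
    """Keep products matching the sub-category and priced below max_price (when given)."""
    kept = []
    for product in products:
        if sub_category is not None and product.get("SubCategory") != sub_category:
            continue
        if max_price is not None and product.get("DiscountPrice", float("inf")) >= max_price:
            continue
        kept.append(product)
    return kept

# Metadata values copied from rows shown earlier in this notebook.
candidates = [
    {"id": 8936551, "BrandName": "Pepe Jeans", "SubCategory": "briefs", "DiscountPrice": 254.0},
    {"id": 2296012, "BrandName": "Roadster", "SubCategory": "jeans", "DiscountPrice": 824.0},
]
max_price = parse_max_price("Show me jeans for men under 2000")
matches = apply_constraints(candidates, sub_category="jeans", max_price=max_price)
print(max_price, [m["id"] for m in matches])
```

Post-filtering like this works best when more than three candidates are retrieved first. Chroma also accepts a metadata `filter` at query time through `search_kwargs`, which may be preferable because it narrows the search before ranking.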

In [49]:
# Query 4
query = "Show me best summer collection for children"
recommend_products(query)

Rank: 1
Product Id: 9717187
Description: bhama couture blue  white striped a line pure cotton top
Brand Name: Bhama Couture
Category: Indian Wear
Sub Category: tops
Gender: Women
Discount Price: 899.0
Original Price: 1799.0
Discount Offer: 50% OFF
Size: S, M, L, XL
Ratings: 4.1
Reviews: 912.0

Rank: 2
Product Id: 9717187
Description: bhama couture blue  white striped a line pure cotton top
Brand Name: Bhama Couture
Category: Indian Wear
Sub Category: tops
Gender: Women
Discount Price: 899.0
Original Price: 1799.0
Discount Offer: 50% OFF
Size: S, M, L, XL
Ratings: 4.1
Reviews: 912.0

Rank: 3
Product Id: 10343555
Description: bhama couture women yellow  white bandhani print kurta with palazzos
Brand Name: Bhama Couture
Category: Indian Wear
Sub Category: kurta-sets
Gender: Women
Discount Price: 1403.0
Original Price: 3599.0
Discount Offer: 61% OFF
Size: S, M, L, XL, XXL
Ratings: 4.4
Reviews: 407.0



''
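
Ranks 1 and 2 for Query 4 are the same product (id 9717187), which suggests the dataset contains duplicate rows. A minimal sketch of removing repeats from a result list while preserving rank order follows; `dedupe_by_id` is a hypothetical helper operating on metadata dictionaries.

```python
def dedupe_by_id(products, id_key="id"):
    """Drop later occurrences of a product id, keeping the first (highest-ranked) one."""
    seen = set()
    unique = []
    for product in products:
        product_id = product.get(id_key)
        if product_id in seen:
            continue
        seen.add(product_id)
        unique.append(product)
    return unique

# Ids and sub-categories copied from the Query 4 output above.
ranked = [
    {"id": 9717187, "SubCategory": "tops"},
    {"id": 9717187, "SubCategory": "tops"},
    {"id": 10343555, "SubCategory": "kurta-sets"},
]
print([p["id"] for p in dedupe_by_id(ranked)])
```

An alternative is to remove duplicates before indexing, for example with `df.drop_duplicates(subset="ProductId")`, so that repeated products never reach the vector store.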

In [50]:
# Query 5
query = "Show me products for highest discounts"
recommend_products(query)

Rank: 1
Product Id: 1518326
Description: dupatta bazaar black embroidered chiffon dupatta
Brand Name: Dupatta Bazaar
Category: Indian Wear
Sub Category: dupatta
Gender: Women
Discount Price: 476.0
Original Price: 899.0
Discount Offer: 47% OFF
Size: Onesize
Ratings: 4.3
Reviews: 418.0

Rank: 2
Product Id: 1763993
Description: dollar bigboss men multicoloured pack of 5 briefs mdbr 02 po5 1
Brand Name: Dollar Bigboss
Category: Inner Wear &  Sleep Wear
Sub Category: briefs
Gender: Men
Discount Price: 671.0
Original Price: 895.0
Discount Offer: 25% OFF
Size: S, M, L, XL, XXL, 3XL, 4XL
Ratings: 4.0
Reviews: 436.0

Rank: 3
Product Id: 12027362
Description: levis men assorted printed  pure cotton boxers 023 boxer shorts
Brand Name: Levis
Category: Inner Wear &  Sleep Wear
Sub Category: boxers
Gender: Men
Discount Price: 494.0
Original Price: 549.0
Discount Offer: 10% OFF
Size: S, M, L, XL
Ratings: 4.4
Reviews: 411.0



''
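
Query 5 asks for the highest discounts, but the results carry discounts of 47%, 25%, and 10%: embedding similarity does not sort by a numeric field. A sketch of ranking by the parsed `DiscountOffer` value is shown below; `discount_percent` and `sort_by_discount` are hypothetical helpers, and the parsing assumes offers contain a percentage such as "55% OFF".

```python
import re

def discount_percent(offer):
    """Extract the first percentage from strings such as '55% OFF'; None if not found."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", str(offer))
    return float(match.group(1)) if match else None

def sort_by_discount(products):
    """Sort products by parsed discount, largest first; unparseable offers go last."""
    def sort_key(product):
        pct = discount_percent(product.get("DiscountOffer", ""))
        return pct if pct is not None else -1.0
    return sorted(products, key=sort_key, reverse=True)

# Offers copied from query outputs above.
offers = [
    {"id": 1518326, "DiscountOffer": "47% OFF"},
    {"id": 12027362, "DiscountOffer": "10% OFF"},
    {"id": 2499823, "DiscountOffer": "70% OFF, Hurry*"},
    {"id": 12787018, "DiscountOffer": "75% OFF"},
]
print([p["id"] for p in sort_by_discount(offers)])
```

Storing the parsed percentage as a numeric metadata field at indexing time would allow the same ordering without re-parsing strings on every query.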