# RAG Fundamentals for Fashion Search

**Project:** AI Fashion Assistant v2.2  
**Focus:** Retrieval-Augmented Generation from scratch  
**Author:** Hatice Baydemir  
**Date:** January 2, 2026

---

## What is RAG?

**Retrieval-Augmented Generation** combines three stages:
1. **Retrieval:** Find relevant documents in a knowledge base
2. **Augmentation:** Add the retrieved context to the LLM prompt
3. **Generation:** The LLM generates an answer using that context

**Why RAG for Fashion?**
- Product catalog as knowledge base (44,417 products)
- Natural language queries ("summer dress for beach wedding")
- Contextual recommendations with reasoning
- Explainable results (cite specific products)

---

## Learning Objectives

✅ Understand RAG architecture  
✅ Implement RAG from scratch (no frameworks)  
✅ Apply to fashion product search  
✅ Evaluate retrieval quality  

---

## 1. Setup & Dependencies

In [1]:
from google.colab import drive
import os

drive.mount('/content/drive')
os.chdir('/content/drive/MyDrive/ai_fashion_assistant_v2')

print('✅ Drive mounted')
print(f'📁 {os.getcwd()}')

Mounted at /content/drive
✅ Drive mounted
📁 /content/drive/MyDrive/ai_fashion_assistant_v2


In [2]:
# Install minimal dependencies
!pip install -q groq sentence-transformers faiss-cpu

print('✅ Packages installed!')
print('   - GROQ (LLM)')
print('   - Sentence Transformers (embeddings)')
print('   - FAISS (vector search)')

[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m138.3/138.3 kB[0m [31m9.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m23.8/23.8 MB[0m [31m85.0 MB/s[0m eta [36m0:00:00[0m
✅ Packages installed!
   - GROQ (LLM)
   - Sentence Transformers (embeddings)
   - FAISS (vector search)


In [3]:
# Imports
import numpy as np
import pandas as pd
from typing import List, Dict
from groq import Groq
from sentence_transformers import SentenceTransformer
import faiss

print('✅ Imports successful')

✅ Imports successful


## 2. Load Fashion Data

In [4]:
# Load metadata
metadata = pd.read_csv('data/processed/meta_ssot.csv')

# Load pre-computed embeddings
text_emb = np.load('v2.0-baseline/embeddings/text/mpnet_768d.npy')
image_emb = np.load('v2.0-baseline/embeddings/image/clip_image_768d_normalized.npy')

# Normalize text embeddings for cosine similarity
text_emb_norm = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

print('✅ Data loaded!')
print(f'   Products: {len(metadata):,}')
print(f'   Text embeddings: {text_emb.shape}')
print(f'   Image embeddings: {image_emb.shape}')

✅ Data loaded!
   Products: 44,417
   Text embeddings: (44417, 768)
   Image embeddings: (44417, 768)


In [5]:
# Inspect sample product
sample = metadata.iloc[0]
print('üì¶ Sample Product:')
print(f'   Name: {sample["productDisplayName"]}')
print(f'   Category: {sample["masterCategory"]}')
print(f'   Type: {sample["articleType"]}')
print(f'   Color: {sample["baseColour"]}')
print(f'   Gender: {sample["gender"]}')
print(f'   Season: {sample["season"]}')

📦 Sample Product:
   Name: Turtle Check Men Navy Blue Shirt
   Category: Apparel
   Type: Shirts
   Color: Navy Blue
   Gender: Men
   Season: Fall


## 3. Create Product Documents

Transform structured data into text documents for RAG.

In [6]:
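def create_product_document(row) -> str:
    """Convert product metadata to a text document."""
    def field(name: str, default: str) -> str:
        # row.get() only falls back when the column is absent,
        # so treat NaN cells (empty CSV fields) as missing too
        value = row.get(name, default)
        return default if pd.isna(value) else str(value)

    return f"""{row['productDisplayName']}.
Category: {field('masterCategory', 'Unknown')}.
Type: {field('articleType', 'Unknown')}.
Color: {field('baseColour', 'Unknown')}.
Gender: {field('gender', 'Unisex')}.
Season: {field('season', 'All')}.
Usage: {field('usage', 'Casual')}."""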

# Create documents for all products
product_docs = [create_product_document(row) for _, row in metadata.iterrows()]

print(f'✅ Created {len(product_docs):,} product documents')
print('\n📄 Sample document:')
print(product_docs[0])

✅ Created 44,417 product documents

📄 Sample document:
Turtle Check Men Navy Blue Shirt. 
Category: Apparel. 
Type: Shirts. 
Color: Navy Blue. 
Gender: Men. 
Season: Fall. 
Usage: Casual.


## 4. Build FAISS Vector Index

FAISS (Facebook AI Similarity Search) enables fast nearest neighbor search.

In [7]:
print('Building FAISS index...')

# Create FAISS index (Inner Product = Cosine Similarity for normalized vectors)
dimension = text_emb_norm.shape[1]
index = faiss.IndexFlatIP(dimension)
index.add(text_emb_norm.astype('float32'))

print('✅ FAISS index built!')
print(f'   Dimension: {dimension}d')
print(f'   Vectors: {index.ntotal:,}')
print(f'   Index type: IndexFlatIP (cosine similarity)')

Building FAISS index...
✅ FAISS index built!
   Dimension: 768d
   Vectors: 44,417
   Index type: IndexFlatIP (cosine similarity)
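
The claim that inner product equals cosine similarity for normalized vectors can be checked without FAISS. The sketch below is an illustration under stated assumptions: `top_k_inner_product` is a hypothetical helper name, and the random 4x3 matrix only stands in for the real product embeddings. It performs the same kind of exact, exhaustive inner-product search that a flat index does, then compares the scores with cosine similarity computed from the unnormalized vectors.

```python
import numpy as np

def top_k_inner_product(vectors: np.ndarray, query: np.ndarray, k: int):
    """Exact top-k search by inner product over every stored vector."""
    scores = vectors @ query            # one inner product per stored vector
    order = np.argsort(-scores)[:k]     # indices of the k highest scores
    return scores[order], order

# Toy data standing in for the product embeddings (assumption: 4 vectors, 3 dims)
rng = np.random.default_rng(0)
raw = rng.normal(size=(4, 3)).astype("float32")
unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)

query = rng.normal(size=3).astype("float32")
query_unit = query / np.linalg.norm(query)

# Cosine similarity computed directly from the raw, unnormalized vectors
cosine = (raw @ query) / (np.linalg.norm(raw, axis=1) * np.linalg.norm(query))

scores, order = top_k_inner_product(unit, query_unit, k=2)
print(np.allclose(scores, cosine[order], atol=1e-5))
```

Because the search is exhaustive, its cost grows linearly with the number of stored vectors. That is usually acceptable at the scale of this catalog (44,417 products).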


## 5. Setup GROQ LLM

GROQ provides fast inference for Llama models.

In [13]:
# GROQ API configuration
GROQ_API_KEY = "YOUR_GROQ_API_KEY_HERE"  # ⚠️ REPLACE THIS!

client = Groq(api_key=GROQ_API_KEY)

def generate_answer(prompt: str, max_tokens: int = 500) -> str:
    """Generate answer using GROQ LLM"""
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,  # Low temperature for consistency
        max_tokens=max_tokens
    )
    return response.choices[0].message.content

print('✅ GROQ LLM configured')
print('   Model: Llama-3.3-70B-Versatile')
print('   Temperature: 0.1')
print('⚠️  Remember to add your API key above!')

✅ GROQ LLM configured
   Model: Llama-3.3-70B-Versatile
   Temperature: 0.1
⚠️  Remember to add your API key above!
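
Pasting the key into the cell makes it easy to leak when the notebook is shared. One common alternative, sketched here as an assumption rather than as part of the original notebook, is to read it from an environment variable. `load_api_key` is a hypothetical helper name, and the variable name `GROQ_API_KEY` simply mirrors the constant used above.

```python
import os

def load_api_key(env_var: str = "GROQ_API_KEY") -> str:
    """Read the API key from an environment variable instead of the notebook source."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the client"
        )
    return key

# Usage, commented out so the sketch runs without a key or the groq package:
# client = Groq(api_key=load_api_key())
```

Failing early with a clear message avoids a less obvious authentication error on the first LLM call.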


## 6. RAG Pipeline Implementation

Three-stage pipeline: Retrieve → Augment → Generate

In [14]:
# Initialize encoder (done once)
encoder = SentenceTransformer(
    'sentence-transformers/paraphrase-multilingual-mpnet-base-v2'
)
print('✅ Sentence encoder loaded')

✅ Sentence encoder loaded


In [15]:
def rag_pipeline(query: str, k: int = 5) -> Dict:
    """
    Complete RAG pipeline for fashion product search.

    Args:
        query: Natural language query
        k: Number of products to retrieve

    Returns:
        Dict with query, answer, retrieved products, scores
    """

    # STAGE 1: RETRIEVE
    # Encode query
    query_emb = encoder.encode([query])[0]
    query_emb = query_emb / np.linalg.norm(query_emb)  # Normalize

    # Search FAISS index
    scores, indices = index.search(
        query_emb.reshape(1, -1).astype('float32'),
        k
    )

    # Get retrieved products
    retrieved_products = [product_docs[i] for i in indices[0]]

    # STAGE 2: AUGMENT
    # Create context from retrieved products
    context = "\n\n".join([
        f"{i+1}. {prod}"
        for i, prod in enumerate(retrieved_products)
    ])

    # Create RAG prompt
    prompt = f"""You are a fashion shopping assistant. Recommend products based on the user's query.

Available Products:
{context}

User Query: {query}

Recommendation (be specific, mention product names):"""

    # STAGE 3: GENERATE
    answer = generate_answer(prompt)

    return {
        'query': query,
        'answer': answer,
        'retrieved_products': retrieved_products,
        'scores': scores[0].tolist(),
        'indices': indices[0].tolist()
    }

print('✅ RAG pipeline function ready!')

✅ RAG pipeline function ready!
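
The augment stage is plain string formatting, so it can be pulled out and inspected without the FAISS index or the LLM. The following is a minimal sketch of that idea. `build_prompt` is a hypothetical helper that is not part of the notebook; its prompt wording is copied from `rag_pipeline` above, and the two sample documents are shortened for illustration.

```python
from typing import List

def build_prompt(query: str, retrieved_products: List[str]) -> str:
    """Number the retrieved documents and embed them in the recommendation prompt."""
    context = "\n\n".join(
        f"{i}. {doc}" for i, doc in enumerate(retrieved_products, start=1)
    )
    return (
        "You are a fashion shopping assistant. "
        "Recommend products based on the user's query.\n\n"
        f"Available Products:\n{context}\n\n"
        f"User Query: {query}\n\n"
        "Recommendation (be specific, mention product names):"
    )

# Shortened sample documents (illustrative only)
docs = [
    "Turtle Check Men Navy Blue Shirt.\nCategory: Apparel.",
    "Scullers For Her Check Blue Shirt.\nCategory: Apparel.",
]
prompt = build_prompt("I need a blue shirt for summer", docs)
print(prompt)
```

Printing the prompt before sending it is a quick way to confirm that the retrieved products, and only those, reach the model.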


## 7. Test RAG System

Run test queries to validate the system.

In [16]:
# Single test query
test_query = "I need a blue shirt for summer"

print(f'üîç Query: "{test_query}"')
print('\nProcessing...')

result = rag_pipeline(test_query, k=5)

print('\n' + '='*70)
print('üìä RETRIEVAL:')
print(f'   Top match score: {result["scores"][0]:.3f}')
print(f'   Products retrieved: {len(result["retrieved_products"])}')

print('\nüìÑ Retrieved Products:')
for i, prod in enumerate(result['retrieved_products'][:3], 1):
    print(f'   {i}. {prod[:80]}...')

print('\nü§ñ RAG ANSWER:')
print(result['answer'])
print('='*70)

🔍 Query: "I need a blue shirt for summer"

Processing...

📊 RETRIEVAL:
   Top match score: 0.765
   Products retrieved: 5

📄 Retrieved Products:
   1. Scullers For Her Check Blue Shirt. 
Category: Apparel. 
Type: Shirts. 
Color: Bl...
   2. s.Oliver Men's All you Need Blue T-shirt. 
Category: Apparel. 
Type: Tshirts. 
C...
   3. Scullers For Her Striped Blue Shirt. 
Category: Apparel. 
Type: Shirts. 
Color: ...

🤖 RAG ANSWER:
Based on your query, I'd be happy to recommend some blue shirts for summer. Here are a few options:

For Women: You may like the Scullers For Her Check Blue Shirt or the Scullers For Her Striped Blue Shirt, both of which are perfect for casual summer wear.

For Men: I'd suggest the s.Oliver Men's All you Need Blue T-shirt, which is a great option for casual summer outings.

For Kids or those who prefer unisex options: You can consider the Tantra Kid's Cool Royal Blue Kidswear or the Tantra Kid's Unisex Caution Blue Kidswear, both of which are suitable 

In [17]:
# Multiple test queries
test_queries = [
    "Show me casual shoes for men",
    "What red dresses do you have?",
    "Winter jackets",
    "Formal wear for office"
]

print('üß™ RUNNING MULTIPLE TESTS')
print('='*70)

for i, query in enumerate(test_queries, 1):
    print(f'\n[{i}/{len(test_queries)}] Query: "{query}"')

    result = rag_pipeline(query, k=5)

    print(f'   Score: {result["scores"][0]:.3f}')
    print(f'   Answer preview: {result["answer"][:100]}...')
    print('-'*70)

print('\n✅ All tests complete!')

🧪 RUNNING MULTIPLE TESTS

[1/4] Query: "Show me casual shoes for men"
   Score: 0.827
   Answer preview: Based on your query, I'd be happy to recommend some casual shoes for men. Here are a few options:

1...
----------------------------------------------------------------------

[2/4] Query: "What red dresses do you have?"
   Score: 0.778
   Answer preview: We have a variety of beautiful red dresses available for you. You can choose from the following opti...
----------------------------------------------------------------------

[3/4] Query: "Winter jackets"
   Score: 0.699
   Answer preview: Based on your query for winter jackets, I would recommend the following products:

1. Just Natural M...
----------------------------------------------------------------------

[4/4] Query: "Formal wear for office"
   Score: 0.574
   Answer preview: Based on your query for formal wear for the office, I would recommend the Avirate Black Formal Dress...
------------------------------------------
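
The top-match scores above show how close the best vector was, but not whether the retrieved products actually fit the request. One simple check, sketched here as an assumption and not as the evaluation method of this project, is precision@k against expected metadata attributes. `precision_at_k` is a hypothetical helper, and the three rows are illustrative values rather than real catalog entries.

```python
from typing import Dict, List

def precision_at_k(
    retrieved: List[Dict[str, str]], expected: Dict[str, str], k: int
) -> float:
    """Fraction of the top-k products whose metadata matches every expected attribute."""
    top = retrieved[:k]
    if not top:
        return 0.0
    hits = sum(
        all(product.get(field) == value for field, value in expected.items())
        for product in top
    )
    # Divide by the number of products actually returned, which may be below k
    return hits / len(top)

# Illustrative metadata rows for a query such as "I need a blue shirt for summer"
retrieved = [
    {"articleType": "Shirts", "baseColour": "Blue"},
    {"articleType": "Tshirts", "baseColour": "Blue"},
    {"articleType": "Shirts", "baseColour": "Blue"},
]
expected = {"articleType": "Shirts", "baseColour": "Blue"}
print(precision_at_k(retrieved, expected, k=3))
```

In the notebook, the rows could be looked up from `metadata` using the `indices` returned by `rag_pipeline`. The expected attributes would need to be labeled by hand for each test query.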

## 8. Summary

**What we built:**
- ✅ Complete RAG pipeline from scratch
- ✅ FAISS vector search (44,417 products)
- ✅ GROQ LLM integration
- ✅ Natural language fashion recommendations

**Key components:**
1. **Retrieve:** FAISS finds similar products using embeddings
2. **Augment:** Context injected into LLM prompt
3. **Generate:** LLM creates natural language recommendations

**Next steps:**
- Notebook 2: Production-ready pipeline class
- Notebook 3: Comprehensive evaluation

---

**Framework-agnostic implementation - Full control, minimal dependencies!** 🚀