# 02 - Embeddings Comparison

This notebook compares two embedding approaches for RAG:

1. **OpenAI Embeddings** - High quality, API-based
2. **HuggingFace Embeddings** - Local, free, privacy-friendly

We'll create FAISS vector stores for both and compare performance.

**Prerequisites:**
- 01_setup_and_basics.ipynb completed
- OpenAI API key configured

**Duration:** ~8 minutes

**Outputs:**
- Vector stores saved to `data/vector_stores/`
- Performance comparison metrics

## 1. Setup

Load the necessary modules and prepare the documents and chunks from the previous notebook.

In [1]:
import sys
sys.path.append('../..')

from shared.config import OPENAI_VECTOR_STORE_PATH, HF_VECTOR_STORE_PATH
from shared.loaders import load_and_split
from shared.utils import print_section_header, save_vector_store
import time

print_section_header("Loading Documents and Chunks")

# Load and split documents
docs, chunks = load_and_split(verbose=True)

print(f"\n✅ Ready with {len(docs)} documents and {len(chunks)} chunks")




LOADING DOCUMENTS AND CHUNKS

Loading 4 documents from web...
  - https://python.langchain.com/docs/use_cases/question_answering/
  - https://python.langchain.com/docs/modules/data_connection/retrievers/
  - https://python.langchain.com/docs/modules/model_io/llms/
  - https://python.langchain.com/docs/use_cases/chatbots/
✓ Loaded 4 documents
✓ Added custom metadata to all documents
Splitting documents...
  - Chunk size: 1000
  - Chunk overlap: 200
✓ Created 120 chunks

  Sample chunk:
    - Length: 839 chars
    - Source: https://python.langchain.com/docs/use_cases/question_answering/
    - Preview: Build a RAG agent with LangChain - Docs by LangChainSkip to main contentWe've raised a $125M Series B to build the platform for agent engineering. Rea...

✅ Ready with 4 documents and 120 chunks
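
The chunk size and overlap reported above interact simply: each new chunk starts `chunk_size - chunk_overlap` characters after the previous one, so neighbouring chunks share their boundary text. The sketch below illustrates this with a plain character window. It is not the code behind `load_and_split`, which is not shown in this notebook and likely uses a separator-aware splitter (the 839-character sample chunk suggests as much). `split_fixed` is a hypothetical name.

```python
def split_fixed(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: real splitters usually prefer to cut
    at paragraph or sentence boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Toy input: 2600 characters of repeating digits
sample = "".join(str(i % 10) for i in range(2600))
pieces = split_fixed(sample)
print([len(p) for p in pieces])
print(pieces[0][-200:] == pieces[1][:200])  # the overlap is shared text
```

A larger overlap preserves more context across chunk boundaries, at the price of more chunks to embed and store.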


## 2. OpenAI Embeddings

### Features
- Model: `text-embedding-3-small`
- Dimensions: 1536
- Cost: ~$0.02 per 1M tokens
- Quality: Excellent

### Use When
- Production quality required
- Budget available
- Internet connection reliable

In [2]:
from langchain_openai import OpenAIEmbeddings

print_section_header("OpenAI Embeddings")

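# Initialize OpenAI embeddings. Name the model explicitly so it matches the
# Features list above; OpenAIEmbeddings() otherwise uses the library default.
print("Initializing OpenAI embeddings...")
openai_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
print("✓ OpenAI embeddings initialized")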

# Test with a sample query
test_query = "What is retrieval-augmented generation?"
print(f"\nTesting with query: '{test_query}'")

start_time = time.time()
test_embedding = openai_embeddings.embed_query(test_query)
elapsed = time.time() - start_time

print(f"\nResults:")
print(f"  Dimension: {len(test_embedding)}")
print(f"  Time: {elapsed:.3f}s")
print(f"  First 5 values: {[f'{v:.4f}' for v in test_embedding[:5]]}")


OPENAI EMBEDDINGS

Initializing OpenAI embeddings...
✓ OpenAI embeddings initialized

Testing with query: 'What is retrieval-augmented generation?'

Results:
  Dimension: 1536
  Time: 0.894s
  First 5 values: ['-0.0403', '-0.0036', '0.0001', '0.0009', '-0.0103']
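
To put the quoted price in perspective, here is a rough cost estimate for embedding this notebook's corpus. The price comes from the Features list above. The four-characters-per-token ratio is an assumption (a common rule of thumb for English text), and `estimate_embedding_cost` is a hypothetical helper, so treat the result as an order-of-magnitude figure.

```python
PRICE_PER_MILLION_TOKENS = 0.02  # USD, from the Features list above
CHARS_PER_TOKEN = 4              # assumption: rough average for English text


def estimate_embedding_cost(num_chunks: int, avg_chars_per_chunk: int) -> float:
    """Return an approximate cost in USD for embedding the chunks once."""
    est_tokens = num_chunks * avg_chars_per_chunk / CHARS_PER_TOKEN
    return est_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS


# 120 chunks of at most 1000 characters each gives an upper bound
cost = estimate_embedding_cost(num_chunks=120, avg_chars_per_chunk=1000)
print(f"Estimated upper bound: ${cost:.6f}")
```

At this scale the cost is negligible. It starts to matter when a corpus reaches millions of chunks or is re-embedded often.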


## 3. HuggingFace Embeddings

### Features
- Model: `sentence-transformers/all-MiniLM-L6-v2`
- Dimensions: 384
- Cost: Free (runs locally)
- Quality: Very good

### Use When
- Privacy is critical
- Offline operation needed
- Cost is a constraint
- Development/testing

In [3]:
from langchain_huggingface import HuggingFaceEmbeddings
import os

print_section_header("HuggingFace Embeddings")

print("Initializing HuggingFace embeddings...")
print("(First run downloads model ~90MB - may take 1-2 minutes)")
print(f"Cache location: {os.path.expanduser('~/.cache/huggingface/')}\n")

try:
    hf_embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    print("✓ HuggingFace embeddings initialized")
    
    # Test with same query
    print(f"\nTesting with query: '{test_query}'")
    
    start_time = time.time()
    test_embedding_hf = hf_embeddings.embed_query(test_query)
    elapsed_hf = time.time() - start_time
    
    print(f"\nResults:")
    print(f"  Dimension: {len(test_embedding_hf)}")
    print(f"  Time: {elapsed_hf:.3f}s")
    print(f"  First 5 values: {[f'{v:.4f}' for v in test_embedding_hf[:5]]}")
    
except Exception as e:
    print(f"✗ Error: {e}")
    print("\nTroubleshooting:")
    print("  1. Check internet connection (first run only)")
    print("  2. Verify disk space (~200MB needed)")
    print("  3. Try: pip install --upgrade sentence-transformers")
    raise


HUGGINGFACE EMBEDDINGS

Initializing HuggingFace embeddings...
(First run downloads model ~90MB - may take 1-2 minutes)
Cache location: /Users/gianlucamazza/.cache/huggingface/

✓ HuggingFace embeddings initialized

Testing with query: 'What is retrieval-augmented generation?'

Results:
  Dimension: 384
  Time: 0.310s
  First 5 values: ['-0.1110', '-0.0263', '-0.0579', '0.0598', '-0.0208']
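
The two test vectors above cannot be compared with each other, because they come from different models and have different dimensions (1536 vs 384). Within a single model, closeness between two texts is commonly measured with cosine similarity. The following is a pure-Python sketch on toy vectors. The function name and sample values are illustrative and not part of the notebook's shared modules.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must come from the same model (same dimension)")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (real ones have 384 or 1536 values)
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(f"same direction: {same_direction:.3f}, orthogonal: {orthogonal:.3f}")
```

In the notebook, the same function could be applied to two vectors returned by `hf_embeddings.embed_query`, or to two returned by `openai_embeddings.embed_query`, but never to one of each.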


## 4. Side-by-Side Comparison

In [4]:
from shared.utils import print_comparison_table
import numpy as np

print_section_header("Embeddings Comparison")

data = [
    ["Feature", "OpenAI", "HuggingFace"],
    ["Dimension", len(test_embedding), len(test_embedding_hf)],
    ["Time (s)", f"{elapsed:.3f}", f"{elapsed_hf:.3f}"],
    ["Mean", f"{np.mean(test_embedding):.4f}", f"{np.mean(test_embedding_hf):.4f}"],
    ["Std Dev", f"{np.std(test_embedding):.4f}", f"{np.std(test_embedding_hf):.4f}"],
    ["Cost", "Paid", "Free"],
    ["Privacy", "Cloud", "Local"]
]

print_comparison_table(data)

print("\n💡 Key Takeaways:")
print("   - OpenAI: Higher dimension, cloud-based, paid")
print("   - HuggingFace: Lower dimension, local, free")
print("   - Both produce high-quality embeddings")
print("   - Choice depends on requirements and constraints")


EMBEDDINGS COMPARISON

Feature    OpenAI   HuggingFace  
---------------------------------
Dimension  1536     384          
Time (s)   0.894    0.310        
Mean       -0.0008  0.0002       
Std Dev    0.0255   0.0510       
Cost       Paid     Free         
Privacy    Cloud    Local        

💡 Key Takeaways:
   - OpenAI: Higher dimension, cloud-based, paid
   - HuggingFace: Lower dimension, local, free
   - Both produce high-quality embeddings
   - Choice depends on requirements and constraints
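
One practical consequence of the dimension difference is index size. The sketch below estimates the raw vector storage for the 120 chunks. It assumes a flat, uncompressed index that stores every value as a 32-bit float, and it ignores the document text and metadata stored alongside the vectors. `flat_index_bytes` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4


def flat_index_bytes(num_vectors: int, dimension: int) -> int:
    """Raw storage for the vectors alone in an uncompressed float32 index."""
    return num_vectors * dimension * BYTES_PER_FLOAT32


for name, dim in [("OpenAI", 1536), ("HuggingFace", 384)]:
    size = flat_index_bytes(num_vectors=120, dimension=dim)
    print(f"{name}: {size / 1024:.0f} KiB")
# OpenAI: 720 KiB
# HuggingFace: 180 KiB
```

The 4x ratio follows directly from 1536 / 384 and stays the same as the corpus grows.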


## 5. Create Vector Stores

Now we'll create FAISS vector stores for both embedding types. These will be saved for reuse in all subsequent notebooks.

In [5]:
from langchain_community.vectorstores import FAISS

print_section_header("Creating Vector Stores")

# Create OpenAI vector store
print("Creating FAISS vector store with OpenAI embeddings...")
start_time = time.time()
vectorstore_openai = FAISS.from_documents(chunks, openai_embeddings)
elapsed_openai = time.time() - start_time

print(f"✓ OpenAI vector store created in {elapsed_openai:.2f}s")
print(f"  - {len(chunks)} documents indexed")
print(f"  - Embedding dimension: 1536")

# Create HuggingFace vector store
print("\nCreating FAISS vector store with HuggingFace embeddings...")
start_time = time.time()
vectorstore_hf = FAISS.from_documents(chunks, hf_embeddings)
elapsed_hf = time.time() - start_time

print(f"✓ HuggingFace vector store created in {elapsed_hf:.2f}s")
print(f"  - {len(chunks)} documents indexed")
print(f"  - Embedding dimension: 384")

print("\n📊 Performance:")
print(f"   OpenAI: {elapsed_openai:.2f}s")
print(f"   HuggingFace: {elapsed_hf:.2f}s")
print(f"   Ratio: {elapsed_hf/elapsed_openai:.2f}x")


CREATING VECTOR STORES

Creating FAISS vector store with OpenAI embeddings...
✓ OpenAI vector store created in 1.19s
  - 120 documents indexed
  - Embedding dimension: 1536

Creating FAISS vector store with HuggingFace embeddings...
✓ HuggingFace vector store created in 0.54s
  - 120 documents indexed
  - Embedding dimension: 384

📊 Performance:
   OpenAI: 1.19s
   HuggingFace: 0.54s
   Ratio: 0.46x
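
The timings above come from a single run with `time.time()`, and the OpenAI figure includes network latency, so they should be read as indicative rather than as a benchmark. Below is a sketch of a small timing helper that uses `time.perf_counter()` (a monotonic clock intended for measuring intervals) and stores each result under its own label instead of reusing one variable. The `timed` helper and the `timings` dictionary are hypothetical names.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Record the wall-clock duration of the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("demo", timings):
    sum(i * i for i in range(100_000))  # stand-in for FAISS.from_documents(...)

print(f"demo: {timings['demo']:.4f}s")
```

In the notebook, the stand-in line would be replaced by the `FAISS.from_documents` calls, one `with` block per embedding model.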


## 6. Save Vector Stores

**IMPORTANT:** We save the vector stores to disk so that later notebooks do not have to re-embed the documents. This:
- Saves time (building both stores took about 2 seconds in the run above, and the time grows with corpus size)
- Reduces API costs (OpenAI charges per token embedded)
- Ensures consistency across notebooks

In [6]:
print_section_header("Saving Vector Stores")

# Save OpenAI vector store
save_vector_store(vectorstore_openai, OPENAI_VECTOR_STORE_PATH, verbose=True)

# Save HuggingFace vector store
save_vector_store(vectorstore_hf, HF_VECTOR_STORE_PATH, verbose=True)

print(f"\n✅ Vector stores saved successfully!")
print(f"\n📂 Locations:")
print(f"   OpenAI: {OPENAI_VECTOR_STORE_PATH}")
print(f"   HuggingFace: {HF_VECTOR_STORE_PATH}")
print(f"\n💡 These will be loaded in subsequent notebooks to avoid re-embedding.")


SAVING VECTOR STORES

✓ Saved vector store to /Users/gianlucamazza/Workspace/notebooks/llm_rag/notebooks/fundamentals/../../data/vector_stores/openai_embeddings
✓ Saved vector store to /Users/gianlucamazza/Workspace/notebooks/llm_rag/notebooks/fundamentals/../../data/vector_stores/huggingface_embeddings

✅ Vector stores saved successfully!

📂 Locations:
   OpenAI: /Users/gianlucamazza/Workspace/notebooks/llm_rag/notebooks/fundamentals/../../data/vector_stores/openai_embeddings
   HuggingFace: /Users/gianlucamazza/Workspace/notebooks/llm_rag/notebooks/fundamentals/../../data/vector_stores/huggingface_embeddings

💡 These will be loaded in subsequent notebooks to avoid re-embedding.
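
Before relying on a saved store in a later notebook, it can help to confirm that the files are actually on disk. The sketch below assumes that `save_vector_store` wraps LangChain's `FAISS.save_local` with its default index name, which writes `index.faiss` and `index.pkl`. The helper's source is not shown here, so check it if the file names differ. `missing_store_files` is a hypothetical name.

```python
import tempfile
from pathlib import Path

# Assumption: file names produced by FAISS.save_local with the default index name
EXPECTED_FILES = ("index.faiss", "index.pkl")


def missing_store_files(store_dir) -> list[str]:
    """Return the expected files that are absent from store_dir."""
    base = Path(store_dir)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


# Self-contained demo on a temporary directory holding only one of the files
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "index.faiss").touch()
    print(missing_store_files(tmp))  # ['index.pkl']
```

In the notebook, `missing_store_files(OPENAI_VECTOR_STORE_PATH)` should return an empty list after the save cell has run. When loading with `FAISS.load_local`, pass the same embedding model that built the store. Recent `langchain_community` versions also require `allow_dangerous_deserialization=True`, because the document store is pickled.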


## 7. Test Similarity Search

Quick test to verify the vector stores work correctly.

In [7]:
from shared.utils import print_results

print_section_header("Testing Similarity Search")

query = "How to build a RAG agent with LangChain?"
print(f"Query: '{query}'\n")

# Test OpenAI
print("\n=== OpenAI Embeddings ===")
results_openai = vectorstore_openai.similarity_search(query, k=3)
print_results(results_openai, max_docs=3, preview_length=200)

# Test HuggingFace
print("\n" + "=" * 80)
print("\n=== HuggingFace Embeddings ===")
results_hf = vectorstore_hf.similarity_search(query, k=3)
print_results(results_hf, max_docs=3, preview_length=200)

print("\n✅ Both vector stores working correctly!")


TESTING SIMILARITY SEARCH

Query: 'How to build a RAG agent with LangChain?'


=== OpenAI Embeddings ===

Retrieved Documents
--------------------------------------------------------------------------------

1. Source: https://python.langchain.com/docs/use_cases/question_answering/
   Type: web_documentation
   Date: 2025-11-12
   Content: Build a RAG agent with LangChain - Docs by LangChainSkip to main contentWe've raised a $125M Series B to build the platform for agent engineering. Read more.Docs by LangChain home pageLangChain + Lang...

2. Source: https://python.langchain.com/docs/use_cases/chatbots/
   Type: web_documentation
   Date: 2025-11-12
   Content: Build a RAG agent with LangChain - Docs by LangChainSkip to main contentWe've raised a $125M Series B to build the platform for agent engineering. Read more.Docs by LangChain home pageLangChain + Lang...

3. Source: https://python.langchain.com/docs/use_cases/question_answering/
   Type: web_documentation
   Date: 2025-11-12
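
Conceptually, `similarity_search(query, k=3)` embeds the query and returns the `k` stored chunks whose vectors lie closest to it. The sketch below shows that idea as a brute-force search on toy 2-dimensional vectors using Euclidean (L2) distance. It assumes the stores use a flat L2 index, which is the default for LangChain's FAISS wrapper as far as we know; FAISS itself is far more optimized. `top_k_l2` is a hypothetical name.

```python
import math


def top_k_l2(query: list[float], vectors: list[list[float]], k: int = 3):
    """Return (index, distance) pairs for the k vectors closest to query."""
    distances = [(i, math.dist(query, vec)) for i, vec in enumerate(vectors)]
    distances.sort(key=lambda pair: pair[1])  # smallest distance first
    return distances[:k]


# Toy "index" of four 2-dimensional vectors
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
nearest = top_k_l2([0.9, 0.1], toy_vectors, k=2)
print([index for index, _ in nearest])  # [1, 0]
```

Because distances are only meaningful within one embedding space, the OpenAI and HuggingFace stores can rank the same chunks differently for the same query.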
 

## Summary

In this notebook, we:

✅ Compared OpenAI vs HuggingFace embeddings  
✅ Created FAISS vector stores for both  
✅ Saved vector stores to disk for reuse  
✅ Tested similarity search  

### Key Takeaways

| Feature | OpenAI | HuggingFace |
|---------|--------|-------------|
| Quality | Excellent | Very Good |
| Dimension | 1536 | 384 |
| Cost | $0.02/1M tokens | Free |
| Privacy | Cloud | Local |
| Query latency (this run) | 0.894 s (API round trip) | 0.310 s (local compute) |

### Recommendation

- **Production:** OpenAI (higher quality, reliable)
- **Development:** HuggingFace (free, fast iteration)
- **Privacy-sensitive:** HuggingFace (data stays local)
- **Offline:** HuggingFace (no internet needed after download)

### Next Steps

Continue to **[03_simple_rag.ipynb](03_simple_rag.ipynb)** to:
- Create retrievers from vector stores
- Build complete RAG chains
- Compare retrieval strategies (Similarity vs MMR)
- Evaluate RAG performance

---

**💾 Important:** Vector stores are now saved and ready for all subsequent notebooks!