# Vector Stores (FAISS & Chroma): Practical Workshop

**Phase:** Storing embeddings for fast semantic search 💡📌🧠🔍

This notebook teaches vector stores (FAISS and Chroma), how they work, and how to build a RAG index. It includes hands-on demos you can run in Google Colab or locally.

## Learning Guide

**What you will learn**

- What vector stores are and why they matter for semantic search and RAG
- Differences between FAISS and ChromaDB (strengths & trade-offs)
- How to build, save, and query FAISS and Chroma indexes
- How to assemble a simple RAG pipeline: Ingest → Split → Embed → Store → Query

**Why it matters**

Vector stores let you search millions of embeddings quickly and are the backbone of retrieval-augmented generation systems.

**Hands-on steps**

1. Install dependencies (optional cell provided)
2. Load or create sample documents
3. Chunk documents (we provide a splitter example)
4. Generate embeddings (local or Gemini)
5. Build FAISS and Chroma stores and run queries


In [None]:
from secrete_key import my_gemini_api_key
API_KEY = my_gemini_api_key()
print('API_KEY loaded (hidden)')

In [None]:
# Uncomment to install dependencies in Colab or a fresh VM
# !pip install --quiet faiss-cpu sentence-transformers chromadb langchain-google-genai numpy scikit-learn
print('If packages are missing, run the pip install line above in your environment.')

## 4.1 What Are Vector Stores?

Vector stores (a.k.a. vector databases or indexes) store embeddings and support fast nearest-neighbor search. Key concepts:

- **Indexing**: Organizes vectors to make search fast (flat, IVF, HNSW, etc.)
- **Distance metric**: L2 (Euclidean) or cosine are common
- **Metadata**: Many stores support storing document metadata for filtered search
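
To make these concepts concrete, here is a minimal sketch of what a "flat" index does: it scores every stored vector against the query and returns the closest ones. The 2-D vectors and the `toy_metadata` list are made-up illustration values, and the helper names (`l2_distance`, `cosine_similarity`, `flat_search`) are hypothetical, not part of FAISS or Chroma.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def flat_search(vectors, query, k=2):
    """Exhaustive search: score every vector, return the k closest as (row, distance)."""
    scored = [(row, l2_distance(vec, query)) for row, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Toy 2-D "embeddings" (made-up values, not produced by a real model)
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
# Metadata kept in a parallel list, addressed by row number
toy_metadata = [{'topic': 'code'}, {'topic': 'code'}, {'topic': 'weather'}]

hits = flat_search(toy_vectors, [1.0, 0.05], k=2)
for row, dist in hits:
    print(row, toy_metadata[row], round(dist, 4))
```

Index structures such as IVF and HNSW exist to avoid this full scan: they trade a little accuracy for far fewer distance computations per query.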


## 4.2 FAISS

**Facebook AI Similarity Search**: a high-performance library for vector similarity search.

- Strengths: Extremely fast, handles millions of vectors, many index types (Flat, IVFFlat, HNSW, PQ)
- Weaknesses: Lower-level API, you must manage metadata externally, persistence is via file saves

We'll build a simple FAISS index (IndexFlatL2) for demo purposes.

In [None]:
import numpy as np
print('FAISS demo will run if `faiss` and `sentence_transformers` are installed.')

sample_documents = [
    'Python is a programming language',
    'Java is also a programming language',
    'The weather is sunny today',
    'Machine learning uses algorithms',
    'Deep learning is a subset of machine learning',
]

try:
    import faiss
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer('all-MiniLM-L6-v2')
    embeddings = model.encode(sample_documents, convert_to_numpy=True, normalize_embeddings=True)
    dim = embeddings.shape[1]
    index = faiss.IndexFlatL2(dim)
    index.add(embeddings.astype('float32'))
    print(f'Built FAISS index with dimension {dim} and {index.ntotal} vectors')
    # Query example
    query = 'What is programming?'
    q_emb = model.encode([query], convert_to_numpy=True, normalize_embeddings=True).astype('float32')
    k = 2
    distances, indices = index.search(q_emb, k)
    print('\nSearch results:')
    for rank, idx in enumerate(indices[0]):
        print(f"{rank+1}. {sample_documents[idx]} (index {idx}, distance {distances[0][rank]:.4f})")
except Exception as e:
    print('Skipping FAISS demo: ensure faiss-cpu and sentence-transformers are installed. Error:', e)

In [None]:
try:
    import faiss
    # Save index
    faiss.write_index(index, 'faiss_demo.index')
    print('Saved FAISS index to faiss_demo.index')
    # Load index
    loaded = faiss.read_index('faiss_demo.index')
    print('Loaded FAISS index, ntotal =', loaded.ntotal)
except Exception as e:
    print('Could not save/load FAISS index:', e)
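
Because a FAISS index file holds only vectors, the document text and metadata need their own storage. One possible arrangement, sketched below with the standard-library `sqlite3` module, is a sidecar table keyed by each vector's position in the index. The table name `chunks`, its columns, and the `lookup` helper are assumptions made for this illustration.

```python
import sqlite3

# Hypothetical sidecar table: one row per vector, keyed by its position in the index.
conn = sqlite3.connect(':memory:')  # pass a file path instead to persist next to the index
conn.execute('CREATE TABLE chunks (faiss_id INTEGER PRIMARY KEY, text TEXT, source TEXT)')

documents = [
    'Python is a programming language',
    'Java is also a programming language',
    'The weather is sunny today',
]
# index.add() assigns ids 0, 1, 2, ... in insertion order, so enumerate() mirrors them.
conn.executemany(
    'INSERT INTO chunks (faiss_id, text, source) VALUES (?, ?, ?)',
    [(i, text, 'sample_documents') for i, text in enumerate(documents)],
)
conn.commit()

def lookup(ids):
    """Return (faiss_id, text, source) rows for the given ids, preserving their order."""
    rows = []
    for faiss_id in ids:
        row = conn.execute(
            'SELECT faiss_id, text, source FROM chunks WHERE faiss_id = ?',
            (int(faiss_id),),
        ).fetchone()
        if row is not None:  # FAISS pads with -1 when fewer than k neighbours exist
            rows.append(row)
    return rows

print(lookup([1, 0]))
```

With a real index, the argument would be `indices[0]` from `index.search`. The `int(...)` conversion is there because those ids arrive as NumPy integers.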

## 4.3 ChromaDB

Chroma is a simple, developer-friendly vector store with built-in persistence and metadata support. It's great for prototypes and small-to-medium projects.

- Strengths: Easy to use, metadata filtering, built-in persistence
- Weaknesses: Not yet as optimized as FAISS for very large datasets

We'll demonstrate creating a Chroma collection, inserting documents, and querying.

In [None]:
try:
    import chromadb
    from sentence_transformers import SentenceTransformer
    print('Chroma import OK')
    # Persistent client (chromadb >= 0.4 API); use chromadb.Client() for an in-memory store.
    # The legacy Settings(chroma_db_impl='duckdb+parquet', ...) configuration was deprecated in 0.4.
    client = chromadb.PersistentClient(path='.chromadb')
    # get_or_create_collection avoids an "already exists" error when the cell is re-run
    collection = client.get_or_create_collection('demo_collection')
    st = SentenceTransformer('all-MiniLM-L6-v2')
    docs = [
        'Python is great for data science',
        'JavaScript runs in browsers',
        'SQL is used for databases',
        'Machine learning predicts outcomes',
    ]
    ids = [f'doc{i}' for i in range(len(docs))]
    embeddings = st.encode(docs, convert_to_numpy=True, normalize_embeddings=True)
    # Add to collection (Chroma can accept precomputed embeddings)
    collection.add(documents=docs, ids=ids, embeddings=embeddings.tolist())
    print('Added documents to Chroma collection. Count:', collection.count())
    # Query
    query = 'Tell me about Python'
    q_emb = st.encode([query], convert_to_numpy=True, normalize_embeddings=True)[0].tolist()
    results = collection.query(query_embeddings=[q_emb], n_results=2)
    print('\nChroma query results:')
    for doc in results['documents'][0]:
        print('-', doc)
except Exception as e:
    print('Skipping Chroma demo: ensure chromadb and sentence-transformers are installed. Error:', e)
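
The metadata filtering listed among Chroma's strengths can be pictured as "filter first, then rank". The sketch below is plain Python and does not call Chroma; in Chroma itself, metadata is supplied through the `metadatas` argument of `collection.add` and a filter through the `where` argument of `collection.query`. The records, the vectors, and the `filtered_search` helper are made up for this illustration.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))

def filtered_search(records, query, where, k=2):
    """Keep records whose metadata matches every key in `where`, then rank by similarity."""
    candidates = [
        r for r in records
        if all(r['metadata'].get(key) == value for key, value in where.items())
    ]
    candidates.sort(key=lambda r: dot(r['vector'], query), reverse=True)
    return candidates[:k]

# Made-up unit-length vectors standing in for real embeddings
records = [
    {'id': 'doc0', 'vector': [1.0, 0.0], 'metadata': {'lang': 'python'}},
    {'id': 'doc1', 'vector': [0.8, 0.6], 'metadata': {'lang': 'javascript'}},
    {'id': 'doc2', 'vector': [0.6, 0.8], 'metadata': {'lang': 'python'}},
]

query = [0.8, 0.6]
python_only = filtered_search(records, query, where={'lang': 'python'})
print([r['id'] for r in python_only])
```

Without the filter, `doc1` would rank first because it points in the same direction as the query. The filter removes it before ranking, which is the behaviour you would otherwise have to implement yourself around a bare FAISS index.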

## 4.4 Building a RAG Index (Ingest → Split → Embed → Store)

High-level steps:

1. **Ingest** documents from files or a database
2. **Split** into chunks using an appropriate splitter (e.g., RecursiveCharacterTextSplitter or semantic chunker)
3. **Embed** chunks using an embeddings provider (Gemini or sentence-transformers for offline)
4. **Store** embeddings + metadata in a vector store (FAISS, Chroma)
5. **Query**: embed the user query, search the store, and pass top chunks as context to the LLM
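
As an illustration of the Split step, here is a minimal fixed-window character splitter with overlap. It is a simplified stand-in, not the `RecursiveCharacterTextSplitter` named above, which additionally tries to break on separators such as paragraphs and sentences. The function name and default sizes are assumptions chosen for the demo.

```python
def split_with_overlap(text, chunk_size=40, overlap=10):
    """Split text into character windows of `chunk_size` that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError('overlap must be smaller than chunk_size')
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

sample = (
    'We offer a 30-day return policy on all products. '
    'Shipping takes 3-5 business days within the US.'
)
for n, chunk in enumerate(split_with_overlap(sample)):
    print(n, repr(chunk))
```

Each chunk repeats the last `overlap` characters of the previous one, so a sentence cut at a window boundary still appears intact in at least one chunk when it is short enough.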


In [None]:
# Example RAG pipeline (small demo)
try:
    from sentence_transformers import SentenceTransformer
    import faiss
    st = SentenceTransformer('all-MiniLM-L6-v2')
    docs = [
        'Our office hours are 9 AM to 5 PM, Monday to Friday.',
        'We offer a 30-day return policy on all products.',
        'Customer support: support@company.com',
        'Shipping takes 3-5 business days within the US.'
    ]
    # Simple splitter: split into sentences (for demo)
    import re
    def simple_split(doc):
        return [s.strip() for s in re.split(r'(?<=[.!?])\s+', doc) if s.strip()]
    chunks = []
    metadatas = []
    for i, d in enumerate(docs):
        sents = simple_split(d)
        for j, s in enumerate(sents):
            chunks.append(s)
            metadatas.append({'source_doc': i})
    embeddings = st.encode(chunks, convert_to_numpy=True, normalize_embeddings=True)
    dim = embeddings.shape[1]
    index = faiss.IndexFlatL2(dim)
    index.add(embeddings.astype('float32'))
    print('Built FAISS index for RAG with', index.ntotal, 'chunks')
    # Query pipeline
    def rag_query(query, k=2):
        q_emb = st.encode([query], convert_to_numpy=True, normalize_embeddings=True).astype('float32')
        distances, indices = index.search(q_emb, k)
        return [(chunks[idx], metadatas[idx], float(distances[0][i])) for i, idx in enumerate(indices[0])]
    print('\nRAG demo query: "How do I return a product?"')
    results = rag_query('How do I return a product?', k=2)
    for r in results:
        print('\nChunk:', r[0])
        print('Metadata:', r[1])
        print('Distance:', r[2])
except Exception as e:
    print('Skipping RAG pipeline demo: ensure sentence-transformers and faiss are installed. Error:', e)
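
The final Query step passes the top chunks to the LLM as context. A minimal sketch of that hand-off is shown below: it formats `(chunk, metadata, distance)` tuples, the shape returned by `rag_query`, into a prompt string. The prompt wording, the `build_prompt` helper, and the distance values in `example_results` are placeholders for illustration, not output from a real model.

```python
def build_prompt(question, retrieved, max_chunks=3):
    """Format retrieved (chunk, metadata, distance) tuples into a grounded prompt string."""
    lines = []
    for n, (chunk, metadata, _distance) in enumerate(retrieved[:max_chunks], start=1):
        lines.append(f"[{n}] (source_doc={metadata.get('source_doc')}) {chunk}")
    context = '\n'.join(lines)
    return (
        'Answer the question using only the context below. '
        "If the context does not contain the answer, say you don't know.\n\n"
        f'Context:\n{context}\n\n'
        f'Question: {question}\nAnswer:'
    )

# Placeholder results in the same shape as rag_query(); distances are invented
example_results = [
    ('We offer a 30-day return policy on all products.', {'source_doc': 1}, 0.8),
    ('Shipping takes 3-5 business days within the US.', {'source_doc': 3}, 1.4),
]

prompt = build_prompt('How do I return a product?', example_results)
print(prompt)
```

The resulting string can be sent to whichever LLM client you use. Numbering the chunks and including `source_doc` makes it possible to ask the model to cite which chunk supported its answer.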

## 4.5 Hands-On Demo & FAISS vs Chroma Comparison

Try both FAISS and Chroma on your dataset and measure:

- Build time
- Query latency
- Ease of working with metadata
- Persistence and portability

Guidance:

- Use FAISS if you need raw speed and are comfortable managing metadata and persistence yourself.
- Use Chroma for rapid prototyping, metadata filtering, and persistence out of the box.


In [None]:
# Performance test stub: generate N random vectors and time FAISS vs simple Chroma ops
try:
    import time
    import numpy as np
    N = 1000
    dim = 384
    vectors = np.random.random((N, dim)).astype('float32')
    # FAISS timing
    import faiss
    idx = faiss.IndexFlatL2(dim)
    t0 = time.time()
    idx.add(vectors)
    t1 = time.time()
    # search
    q = np.random.random((1, dim)).astype('float32')
    t2 = time.time()
    d, ix = idx.search(q, 5)
    t3 = time.time()
    print(f'FAISS add time: {t1-t0:.3f}s, search time: {t3-t2:.6f}s')
    # Chroma timing (add and query timed separately)
    try:
        import chromadb
        # chromadb >= 0.4 API; the legacy Settings(chroma_db_impl=...) configuration was deprecated in 0.4
        client = chromadb.PersistentClient(path='.chromadb_perf')
        coll = client.get_or_create_collection('perf_test')
        t4 = time.time()
        coll.add(ids=[str(i) for i in range(N)], documents=['doc']*N, embeddings=vectors.tolist())
        t5 = time.time()
        res = coll.query(query_embeddings=[q.flatten().tolist()], n_results=5)
        t6 = time.time()
        print(f'Chroma add time: {t5-t4:.3f}s, query time: {t6-t5:.6f}s')
    except Exception as e:
        print('Chroma perf test skipped: chromadb not available or error:', e)
except Exception as e:
    print('Performance test skipped:', e)
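
A single timed call is noisy, especially for sub-millisecond searches. One way to get steadier latency numbers, sketched here with only the standard library, is to repeat the call and report the median. The `time_call` helper and its default repeat counts are assumptions, and the sorting lambda is just a stand-in workload.

```python
import statistics
import time

def time_call(fn, repeats=20, warmup=2):
    """Return the median wall-clock seconds of fn() over `repeats` runs, after `warmup` untimed runs."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialisation before measuring
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Stand-in workload; with a real index this could be: lambda: idx.search(q, 5)
median_s = time_call(lambda: sorted(range(10_000), reverse=True))
print(f'median latency: {median_s * 1e3:.3f} ms')
```

`time.perf_counter` is preferred over `time.time` for short intervals because it is monotonic and uses the highest-resolution clock available.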

## Best practices

- **Choose chunk size** to preserve semantic units; for contracts, larger chunks (500-1000 chars) often work better.
- **Overlap** helps keep context between chunks: 50-200 characters or token-equivalent.
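- **Normalize embeddings** to unit length if you want cosine-similarity ranking from an L2 index; for unit vectors, the nearest neighbors by L2 distance are also the most similar by cosine.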
- **Persist** FAISS indexes to disk and store metadata in a separate table (e.g., SQLite) if using FAISS in production.
- **Test** retrieval quality with real queries before bulk ingestion.
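
The normalization advice in the list above rests on a short identity: for unit-length vectors `a` and `b`, the squared L2 distance equals `2 - 2 * cos(a, b)`, so sorting by L2 distance gives the same order as sorting by cosine similarity. The sketch below checks this numerically with made-up vectors; the helper names are illustrative only.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(a, b):
    """Squared Euclidean distance, the quantity FAISS IndexFlatL2 reports."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a, b):
    """Dot product; for unit vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

a = normalize([3.0, 4.0])
b = normalize([1.0, 2.0])

cosine = dot(a, b)
print('squared L2 :', squared_l2(a, b))
print('2 - 2*cos  :', 2 - 2 * cosine)
```

The two printed values agree up to floating-point rounding. This is also why the distances printed by the FAISS demos in this notebook fall between 0 and 4: the embeddings were encoded with `normalize_embeddings=True`.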
