```{contents}
```
## Vector Stores

### What a Vector Store Is

A **Vector Store** is a specialized database optimized to **store, index, and search embedding vectors** using similarity metrics.

> It enables **semantic search** by finding vectors that are *closest in meaning*, not just matching keywords.

In LangChain, vector stores sit between **embeddings** and **retrievers**.

---

### Why Vector Stores Are Critical

Without vector stores:

* No semantic retrieval
* No scalable RAG
* No fast similarity search

Vector stores provide:

* Approximate Nearest Neighbor (ANN) search
* Low-latency retrieval at scale
* Metadata filtering
* Persistence and updates

---

### Where Vector Stores Fit in RAG

```
Text → Embeddings → Vector Store → Retriever → LLM
```

They are used at:

* **Ingestion time** (store vectors)
* **Query time** (search vectors)

---

### Core Responsibilities of a Vector Store

#### Storage

* Persist vectors (float arrays)
* Persist metadata (source, page, tags)

#### Indexing

* Build ANN indexes for fast search

#### Search

* Similarity (cosine, dot, L2)
* Top-k nearest neighbors

#### Filtering

* Metadata-based constraints
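
To make these responsibilities concrete, here is a minimal sketch of a store that does exact (flat) cosine search with a metadata filter. `TinyVectorStore`, its `where` parameter, and the toy 2-dimensional vectors are hypothetical names and data chosen for illustration; real backends add ANN indexes and persistence on top of the same idea.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TinyVectorStore:
    """Hypothetical exact-search store, for illustration only."""

    def __init__(self):
        self.records = []  # each record: (vector, text, metadata)

    def add(self, vector, text, metadata=None):
        # Storage: keep the vector together with its text and metadata.
        self.records.append((vector, text, metadata or {}))

    def search(self, query, k=2, where=None):
        # Filtering: keep records whose metadata matches every key in `where`.
        candidates = [
            r for r in self.records
            if all(r[2].get(key) == value for key, value in (where or {}).items())
        ]
        # Search: score every candidate (a flat scan) and return the top k.
        ranked = sorted(candidates, key=lambda r: cosine(query, r[0]), reverse=True)
        return [(text, meta) for _, text, meta in ranked[:k]]

store = TinyVectorStore()
store.add([1.0, 0.0], "about cats", {"source": "wiki"})
store.add([0.9, 0.1], "about kittens", {"source": "jira"})
store.add([0.0, 1.0], "about taxes", {"source": "jira"})

top = store.search([1.0, 0.0], k=2)
jira_only = store.search([1.0, 0.0], k=1, where={"source": "jira"})
```

The unfiltered query returns the two vectors pointing closest to the query direction, while the filtered query only considers records tagged `jira`.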

---

### Vector Store vs Traditional Database

| Feature          | Vector Store | SQL / NoSQL (without vector extensions) |
| ---------------- | ------------ | --------------------------------------- |
| Primary data     | Vectors      | Rows / documents                        |
| Query type       | Similarity   | Exact / range                           |
| ANN search       | ✅            | ❌                                       |
| Semantic meaning | ✅            | ❌                                       |

---

### Vector Stores in LangChain

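LangChain exposes its vector store integrations through a **uniform interface**:

```python
# Embed the texts, index them, then wrap the store as a retriever.
vectorstore = FAISS.from_texts(texts, embeddings)
retriever = vectorstore.as_retriever()
```

Because other integrations offer the same methods, swapping the backend mostly means changing the class and its connection arguments.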

---

### Basic Vector Store Demonstration

#### In-Memory Vector Store (FAISS)



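```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Reads the API key from the OPENAI_API_KEY environment variable.
embeddings = OpenAIEmbeddings()

texts = [
    "LangChain is a framework for LLM apps",
    "RAG combines retrieval and generation"
]

# Embeds each text and builds an in-memory FAISS index over the vectors.
vectorstore = FAISS.from_texts(texts, embeddings)
```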




---

### Querying the Vector Store

```python
docs = vectorstore.similarity_search(
    "What is RAG?",
    k=2
)
```

`similarity_search` returns up to `k` `Document` objects, ordered from most to least similar to the query.

---

### Popular Vector Store Backends

#### Local / In-Memory

* FAISS
* Chroma

#### Managed / Cloud

* Pinecone
* Weaviate (also self-hostable)
* Qdrant (also self-hostable)
* Milvus (also self-hostable)

#### Hybrid Databases

* PostgreSQL (pgvector)
* Elasticsearch (dense_vector)

---

### Production-Grade Vector Store Concepts

#### 1. Index Type (ANN Algorithms)

Vector stores use ANN indexes to trade a small amount of **accuracy (recall) for speed**.

Common index types:

* HNSW (graph-based)
* IVF (inverted file, cluster-based)
* Flat (exact brute-force scan; slow on large collections)

**Production rule**:

* Small data → Flat / HNSW
* Large data → HNSW / IVF
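
The accuracy-for-speed trade can be sketched with a toy IVF-style index next to a flat scan. This is an illustration under simplifying assumptions, not how any particular library implements IVF: the centroids are hand-picked (real IVF indexes learn them, typically with k-means), and the function names and `nprobe` parameter are chosen for this example.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flat_search(vectors, query, k):
    """Exact search: compare the query with every vector."""
    return sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))[:k]

def build_ivf(vectors, centroids):
    """Assign each vector id to the bucket of its nearest centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        buckets[nearest].append(i)
    return buckets

def ivf_search(vectors, centroids, buckets, query, k, nprobe=1):
    """Approximate search: scan only the `nprobe` buckets closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in buckets[c]]
    return sorted(candidates, key=lambda i: l2(vectors[i], query))[:k]

vectors = [[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.2, 3.9], [2.1, 2.0]]
centroids = [[0.0, 0.0], [4.0, 4.0]]  # hand-picked; real IVF learns these
buckets = build_ivf(vectors, centroids)

query = [1.9, 1.9]
exact = flat_search(vectors, query, k=2)
approx = ivf_search(vectors, centroids, buckets, query, k=2, nprobe=1)
full = ivf_search(vectors, centroids, buckets, query, k=2, nprobe=2)
```

With `nprobe=1` the search scans only two of the five vectors and misses the true nearest neighbor, which sits in the other bucket. Raising `nprobe` to 2 recovers the exact result at the cost of scanning everything, which is the recall-versus-speed knob that IVF-style indexes expose.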

---

#### 2. Distance Metrics

Choose the metric your embedding model was trained for and stay consistent:

| Metric      | Use Case                                                    |
| ----------- | ----------------------------------------------------------- |
| Cosine      | Text embeddings (most common)                               |
| Dot Product | Unit-normalized embeddings (ranks the same as cosine there) |
| L2          | Image / numeric vectors                                     |

Changing the metric ⇒ **re-index required**.
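
A small sketch of why the metric matters, using toy 2-dimensional vectors invented for this example: on raw vectors, dot product and cosine can disagree about which vector is closer, while on unit-normalized vectors dot product, cosine, and L2 all produce the same ranking.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(a):
    n = norm(a)
    return [x / n for x in a]

query = [1.0, 0.0]
short_aligned = [2.0, 0.0]   # same direction as the query, small magnitude
long_tilted = [10.0, 10.0]   # 45 degrees off, large magnitude

# On raw vectors the metrics disagree: dot product rewards magnitude.
raw_dot_prefers_long = dot(query, long_tilted) > dot(query, short_aligned)
raw_cosine_prefers_short = cosine(query, short_aligned) > cosine(query, long_tilted)

# After unit-normalization, dot product equals cosine, and squared L2
# distance equals 2 - 2 * cosine, so all three metrics rank identically.
q, s, t = normalize(query), normalize(short_aligned), normalize(long_tilted)
dot_equals_cosine = abs(dot(q, t) - cosine(query, long_tilted)) < 1e-9
l2_matches_identity = abs(l2(q, t) ** 2 - (2 - 2 * dot(q, t))) < 1e-9
```

This is also why an index built under one metric cannot simply be queried under another: the neighbor ordering itself can change.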

---

#### 3. Metadata Filtering (Critical)

```python
retriever = vectorstore.as_retriever(
    search_kwargs={
        "filter": {"source": "jira"},
        "k": 5
    }
)
```

The filter syntax and supported operators vary by backend. Filters are used for:

* Tenant isolation
* Access control
* Time-based filtering
* Data partitioning
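
One detail worth checking in any backend is whether the filter runs before or after the top-k cut. The sketch below contrasts the two on toy data; the function names, the `tenant` field, and the records are hypothetical and only illustrate the difference.

```python
def score(query, vector):
    """Dot-product similarity, used here only for illustration."""
    return sum(q * v for q, v in zip(query, vector))

def post_filter_search(records, query, k, tenant):
    """Take the global top-k first, then drop records from other tenants."""
    top = sorted(records, key=lambda r: score(query, r["vector"]), reverse=True)[:k]
    return [r["id"] for r in top if r["tenant"] == tenant]

def pre_filter_search(records, query, k, tenant):
    """Restrict to the tenant's records first, then take the top-k."""
    allowed = [r for r in records if r["tenant"] == tenant]
    top = sorted(allowed, key=lambda r: score(query, r["vector"]), reverse=True)[:k]
    return [r["id"] for r in top]

records = [
    {"id": "a1", "tenant": "acme", "vector": [1.0, 0.0]},
    {"id": "a2", "tenant": "acme", "vector": [0.9, 0.1]},
    {"id": "b1", "tenant": "beta", "vector": [0.6, 0.8]},
    {"id": "b2", "tenant": "beta", "vector": [0.0, 1.0]},
]
query = [1.0, 0.0]

post = post_filter_search(records, query, k=2, tenant="beta")
pre = pre_filter_search(records, query, k=2, tenant="beta")
```

Both variants keep other tenants' data out of the results, but post-filtering returns nothing here because the global top 2 belong to another tenant, while pre-filtering still returns the tenant's own best matches.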

---

#### 4. Chunk-Level Granularity

Production systems embed and store **chunks**, not whole documents.

Each stored record looks roughly like:

```python
{
  "embedding": [...],
  "text": "chunk text",
  "metadata": {
    "doc_id": "...",
    "chunk_id": 3,
    "source": "pdf"
  }
}
```
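
A record layout like this can be produced by a simple chunker. The sketch below splits on fixed-size character windows with overlap, which is only one of several chunking strategies; `chunk_document` and its `chunk_size` / `overlap` parameters are hypothetical names, and the embedding would be attached to each record in a later step.

```python
def chunk_document(doc_id, text, source, chunk_size=40, overlap=10):
    """Split `text` into overlapping character windows with per-chunk metadata."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    records = []
    for chunk_id, start in enumerate(range(0, len(text), step)):
        records.append({
            "text": text[start:start + chunk_size],
            "metadata": {"doc_id": doc_id, "chunk_id": chunk_id, "source": source},
        })
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return records

text = "Vector stores index chunks, not whole documents. " * 2
chunks = chunk_document("doc-1", text, source="pdf")
```

Keeping `doc_id` and `chunk_id` in the metadata is what later makes it possible to delete or replace all chunks of one document.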

---

#### 5. Update & Delete Strategy

Vector stores are **not append-only** in production.

You need:

* Delete by ID
* Upsert
* Re-embedding strategy

Example:

* Document updated → delete old vectors → insert new ones
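
The delete-then-insert flow can be sketched with deterministic chunk IDs. `ChunkIndex` and the `"doc_id:chunk_id"` key format are assumptions made for this example; real backends expose their own delete and upsert calls.

```python
class ChunkIndex:
    """Hypothetical in-memory index keyed by deterministic chunk IDs."""

    def __init__(self):
        self.vectors = {}  # "doc_id:chunk_id" -> (vector, text)

    def upsert(self, doc_id, chunk_id, vector, text):
        # The same ID overwrites the previous entry instead of duplicating it.
        self.vectors[f"{doc_id}:{chunk_id}"] = (vector, text)

    def delete_document(self, doc_id):
        # Remove every chunk that belongs to the document.
        stale = [key for key in self.vectors if key.startswith(f"{doc_id}:")]
        for key in stale:
            del self.vectors[key]
        return len(stale)

    def replace_document(self, doc_id, chunks):
        # Delete first so chunks that no longer exist do not linger.
        self.delete_document(doc_id)
        for chunk_id, (vector, text) in enumerate(chunks):
            self.upsert(doc_id, chunk_id, vector, text)

index = ChunkIndex()
index.replace_document("doc-1", [
    ([0.1, 0.2], "old intro"),
    ([0.3, 0.4], "old body"),
    ([0.5, 0.6], "old appendix"),
])
index.replace_document("doc-2", [([0.7, 0.8], "other doc")])

# The updated document now has only two chunks.
index.replace_document("doc-1", [([0.2, 0.1], "new intro"), ([0.4, 0.3], "new body")])
```

An upsert alone would have left the stale third chunk (`doc-1:2`) in the index; deleting by document first avoids serving outdated text.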

---

#### 6. Re-Embedding Strategy

You must re-embed when:

* Chunking changes
* Embedding model changes
* Language/domain changes

**Never mix embedding models in one index.**
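
One way to enforce this rule is to record the embedding model alongside the index and reject writes from any other model. The sketch below is a hypothetical guard (`GuardedIndex`, `ModelMismatchError`, and the model names are invented for illustration). Note that a matching dimension alone is not enough, since two different models can produce vectors of the same size in incompatible spaces.

```python
class ModelMismatchError(ValueError):
    """Raised when a vector does not come from the index's embedding model."""

class GuardedIndex:
    """Hypothetical index that remembers which model produced its vectors."""

    def __init__(self, model_name, dimension):
        self.model_name = model_name
        self.dimension = dimension
        self.vectors = []

    def add(self, vector, model_name):
        # Check the model name first: equal dimensions do not imply compatibility.
        if model_name != self.model_name:
            raise ModelMismatchError(
                f"index built with {self.model_name!r}, got vector from {model_name!r}"
            )
        if len(vector) != self.dimension:
            raise ModelMismatchError(
                f"expected {self.dimension} dimensions, got {len(vector)}"
            )
        self.vectors.append(vector)

index = GuardedIndex(model_name="model-a", dimension=3)
index.add([0.1, 0.2, 0.3], model_name="model-a")

try:
    index.add([0.1, 0.2, 0.3], model_name="model-b")
    rejected = False
except ModelMismatchError:
    rejected = True
```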

---

#### 7. Consistency & Versioning

Production best practice:

* Version vector indexes
* Blue-green reindexing
* Zero-downtime swaps

```
index_v1 → index_v2 → switch retriever
```
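
The switch can be sketched as an alias that points at whichever index version is live. `IndexRegistry` and its methods are hypothetical; many backends offer a comparable alias or collection-swap feature, but the exact mechanism differs per system.

```python
class IndexRegistry:
    """Hypothetical registry mapping a live pointer to versioned indexes."""

    def __init__(self):
        self.indexes = {}   # version name -> searchable index (a callable here)
        self.live = None    # version currently served to queries

    def register(self, version, index):
        self.indexes[version] = index

    def promote(self, version):
        # Only a registered (fully built) index can go live.
        if version not in self.indexes:
            raise KeyError(f"unknown index version: {version}")
        previous, self.live = self.live, version
        return previous  # keep the old version around for rollback

    def search(self, query):
        return self.indexes[self.live](query)

registry = IndexRegistry()
registry.register("index_v1", lambda q: f"v1 results for {q}")
registry.promote("index_v1")

# Build index_v2 in the background while index_v1 keeps serving traffic.
registry.register("index_v2", lambda q: f"v2 results for {q}")
before = registry.search("rag")
previous = registry.promote("index_v2")
after = registry.search("rag")
```

Queries never see a half-built index, and the returned `previous` version gives a rollback target if the new index misbehaves.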

---

#### 8. Scaling Considerations

Key scaling dimensions:

* Number of vectors
* Query QPS
* Index memory footprint

Strategies:

* Sharding
* Replication
* Separate read/write paths
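
Sharding usually pairs with a scatter-gather query: each shard returns its local top-k and the results are merged. The following is a single-process sketch under that assumption; the hash-based routing, function names, and toy vectors are illustrative, and a real deployment would query shards in parallel over the network.

```python
import heapq
import zlib

def shard_for(vector_id, num_shards):
    """Stable hash so the same ID always lands on the same shard."""
    return zlib.crc32(vector_id.encode("utf-8")) % num_shards

def score(query, vector):
    return sum(q * v for q, v in zip(query, vector))

def search_shard(shard, query, k):
    """Each shard returns its own local top-k as (score, id) pairs."""
    scored = ((score(query, vec), vid) for vid, vec in shard.items())
    return heapq.nlargest(k, scored)

def search_all(shards, query, k):
    """Scatter the query to every shard, then merge the partial results."""
    partial = [hit for shard in shards for hit in search_shard(shard, query, k)]
    return [vid for _, vid in heapq.nlargest(k, partial)]

num_shards = 3
shards = [{} for _ in range(num_shards)]
data = {
    "a": [1.0, 0.0], "b": [0.8, 0.2], "c": [0.1, 0.9],
    "d": [0.5, 0.5], "e": [0.0, 1.0],
}
for vector_id, vector in data.items():
    shards[shard_for(vector_id, num_shards)][vector_id] = vector

top2 = search_all(shards, [1.0, 0.0], k=2)
```

Because every shard contributes its own top-k, the merged result matches a search over the full collection regardless of how the IDs were distributed.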

---

#### 9. Latency Budget

Example production targets (actual budgets depend on the model, hardware, and workload):

* Vector search: < 50 ms
* End-to-end RAG: < 500 ms

Techniques:

* Smaller embedding models
* Lower k
* Rerank only top-N

---

#### 10. Hybrid Search (Production Standard)

Combine:

* Vector similarity
* Keyword/BM25

Benefits:

* Higher recall
* Better precision

Many systems:

```
Hybrid Search → Reranker → LLM
```
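
One common way to combine the two result lists is Reciprocal Rank Fusion (RRF), which only needs the ranks, not the raw scores. The sketch below assumes each retriever has already returned a ranked list of document IDs; the IDs are made up, and `k=60` is a commonly used constant rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

vector_hits = ["d1", "d2", "d3"]   # ranked by embedding similarity
keyword_hits = ["d3", "d1", "d4"]  # ranked by BM25 or a similar keyword score

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents found by both retrievers (`d1`, `d3`) rise to the top, while documents found by only one (`d2`, `d4`) still make the list, which is where the recall gain comes from.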

---

### Vector Store + Retriever (LangChain)

A retriever is the **query-time interface** to the vector store:

```python
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}
)
```

Production systems often customize retrievers heavily.

---

### Vector Store vs Reranker

| Component    | Purpose                  |
| ------------ | ------------------------ |
| Vector Store | Fast candidate retrieval |
| Reranker     | Precise ordering         |

Rerankers are **optional but recommended** at scale.

---

### Common Production Mistakes

#### Using in-memory stores in prod

❌ Data loss on restart

#### No metadata filters

❌ Security issues

#### Large k values

❌ Latency spikes

#### Re-embedding ad-hoc

❌ Inconsistent indexes

---

### Production Best Practices Checklist

* ✅ Stable embedding model
* ✅ Chunk before embedding
* ✅ Metadata-rich vectors
* ✅ Index versioning
* ✅ Filter by tenant/source
* ✅ Monitor recall & latency
* ✅ Rerank when needed
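
For the recall part of monitoring, a simple metric is recall@k: how many of the known-relevant chunks appear in the top k results. The sketch below uses exact (flat) search results as the ground truth for an ANN index; the chunk IDs are hypothetical and `recall_at_k` is a name chosen for this example.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant IDs that appear in the first k retrieved IDs."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = set(retrieved_ids[:k]) & relevant
    return len(hits) / len(relevant)

# Hypothetical evaluation pair: exact search results serve as ground truth.
exact_top3 = ["c7", "c2", "c9"]
ann_top3 = ["c7", "c9", "c4"]

recall = recall_at_k(ann_top3, exact_top3, k=3)
```

Tracking this value over a fixed query set before and after an index or parameter change shows whether a speed-up was bought with lost results.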

---

### Interview-Ready Summary

> “A vector store is a database optimized for storing and searching embedding vectors using similarity metrics. In production RAG systems, vector stores require careful index selection, metadata filtering, versioning, and scaling to ensure low latency, high recall, and data safety.”

---

### Rule of Thumb

* **Small data → FAISS / Chroma**
* **Production → Managed vector DB or pgvector**
* **Scale → ANN + filtering + reranking**
* **Change embeddings → Re-index everything**
