```{contents}
```
## Embeddings

### What Embeddings Are

**Embeddings** are **numerical vector representations of text** that capture **semantic meaning**.
Similar meanings → vectors are **close in vector space**.

> Embeddings convert text into numbers so machines can **search, compare, and retrieve meaning**, not just keywords.

They are a **core primitive** for RAG, semantic search, clustering, and recommendation systems.

---

### Why Embeddings Are Needed

LLMs are generative.
Embeddings are **representational**.

Without embeddings:

* Keyword search only
* Poor semantic recall
* No similarity matching

With embeddings:

* Semantic search
* Context-aware retrieval
* Scalable RAG pipelines
* Efficient similarity comparisons

---

### Where Embeddings Fit in the RAG Pipeline

```
Raw Data
   ↓
Document Loader
   ↓
Text Splitter
   ↓
Chunks
   ↓
Embeddings  ← (THIS STEP)
   ↓
Vector Store
   ↓
Retriever
   ↓
LLM
```

Document embeddings are created at **ingestion time** and reused at **query time**; only the incoming query is embedded per request, using the same model.

---

### What an Embedding Vector Looks Like

Example (simplified):

```text
"Email service is down"
→ [0.021, -0.113, 0.884, ..., -0.042]
```

Properties:

* Fixed length (e.g., 384, 768, 1536 dimensions)
* Dense floating-point numbers
* Meaning encoded geometrically

---

### How Similarity Is Measured

Common distance metrics:

#### Cosine Similarity (Most Common)

```
cosine(v1, v2) = (v1 · v2) / (‖v1‖ × ‖v2‖) → similarity score
```

* Closer to 1 → more similar
* Closer to 0 → unrelated
* Closer to -1 → pointing in opposite directions (the score ranges from -1 to 1)
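A minimal sketch of the cosine computation in plain Python may make the score concrete. The three-dimensional vectors below are hand-written toy values chosen for illustration, not output from any real embedding model.

```python
import math

def cosine_similarity(v1, v2):
    """Cosine of the angle between two equal-length vectors."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)

# Toy vectors (assumed values, not real model output).
email_down = [0.9, 0.1, 0.0]
mail_outage = [0.8, 0.2, 0.1]
pizza_recipe = [0.0, 0.1, 0.9]

print(cosine_similarity(email_down, mail_outage))   # close to 1
print(cosine_similarity(email_down, pizza_recipe))  # close to 0
```

In practice a vector store computes this for you; the sketch only shows what the score means.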

#### Other Metrics

* Euclidean distance
* Dot product

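The metric is configured on the vector store or index, not on the embedding object. When vectors are normalized to unit length, cosine similarity and dot product produce the same ranking.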
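The relationship between these metrics can be checked with a short sketch. It assumes non-zero vectors and uses two hand-picked 2-D vectors; the helper names are illustrative only.

```python
import math

def dot(v1, v2):
    return sum(a * b for a, b in zip(v1, v2))

def euclidean_distance(v1, v2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def normalize(v):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

cosine = dot(a, b)                  # for unit vectors, dot product equals cosine similarity
distance = euclidean_distance(a, b)

# For unit vectors: distance squared equals 2 - 2 * cosine,
# so a smaller distance always means a higher cosine score.
print(cosine, distance ** 2, 2 - 2 * cosine)
```

This is why the choice of metric matters less for normalized embeddings: all three orderings agree.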

---

### Embeddings in LangChain

LangChain provides a **standard Embeddings interface** that abstracts providers.

```python
# Requires the langchain-openai package and an OPENAI_API_KEY environment variable.
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)
```

This object:

* Converts text → vectors
* Handles batching
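* Exposes the same `embed_query` / `embed_documents` methods as every other provider, so it can be swapped without changing downstream code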
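To illustrate what "abstracts providers" means, here is a rough sketch of the two-method shape. `EmbeddingsLike`, `ToyHashEmbeddings`, and `index_size` are hypothetical names invented for this example, not LangChain classes, and the toy embedder only hashes words into buckets, so it captures word overlap rather than semantic meaning.

```python
import hashlib
import math
from typing import Protocol

class EmbeddingsLike(Protocol):
    """Hypothetical stand-in for the two-method embeddings shape."""
    def embed_documents(self, texts: list[str]) -> list[list[float]]: ...
    def embed_query(self, text: str) -> list[float]: ...

class ToyHashEmbeddings:
    """Toy provider: counts words in hashed buckets, then normalizes."""
    def __init__(self, size: int = 16) -> None:
        self.size = size

    def _embed(self, text: str) -> list[float]:
        vector = [0.0] * self.size
        for word in text.lower().split():
            digest = hashlib.sha256(word.encode("utf-8")).digest()
            vector[digest[0] % self.size] += 1.0
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector] if norm else vector

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        return self._embed(text)

def index_size(embedder: EmbeddingsLike, texts: list[str]) -> tuple[int, int]:
    """Works with any object that provides the two methods above."""
    vectors = embedder.embed_documents(texts)
    return len(vectors), len(vectors[0])

print(index_size(ToyHashEmbeddings(), ["LangChain is a framework", "RAG uses retrieval"]))
# prints (2, 16)
```

Code written against the two methods does not need to know which provider sits behind them, which is the point of the shared interface.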

---

### Basic Embedding Demonstration

#### Embedding a Single Query

```python
vector = embeddings.embed_query(
    "What is LangChain?"
)

print(len(vector))
```

Output:

```
1536
```

---

#### Embedding Multiple Documents

```python
texts = [
    "LangChain is a framework for LLM applications",
    "RAG combines retrieval with generation"
]

vectors = embeddings.embed_documents(texts)
```

Each text → one vector.

---

### Embeddings + Vector Store (Typical Usage)

```python
# Requires the langchain-community and faiss-cpu packages.
from langchain_community.vectorstores import FAISS

vectorstore = FAISS.from_texts(
    texts=texts,
    embedding=embeddings
)
```

Now the data is:

* Embedded
* Indexed
* Searchable

---

### Query-Time Embedding

At query time:

```python
retriever = vectorstore.as_retriever()
docs = retriever.invoke(
    "How does RAG work?"
)
```

Internally:

1. Query is embedded with the same model used at ingestion
2. The vector store searches for the nearest stored vectors
3. Top-k documents are returned
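The three internal steps can be sketched as a brute-force search over a tiny in-memory index. This is a simplified illustration, not how FAISS is implemented: the vectors are hand-written toy values, the query vector is hard-coded instead of coming from a model, and `top_k` is a hypothetical helper.

```python
import math

def cosine(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    return dot / (math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2)))

def top_k(query_vector, index, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine(query_vector, vector), text) for text, vector in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy index built at ingestion time: (chunk text, assumed vector).
index = [
    ("RAG combines retrieval with generation", [0.9, 0.1, 0.1]),
    ("LangChain is a framework for LLM applications", [0.2, 0.9, 0.1]),
    ("Pizza dough needs time to rest", [0.0, 0.1, 0.9]),
]

# Step 1 would embed the query text; here the vector is hard-coded.
query_vector = [0.8, 0.3, 0.0]

# Steps 2 and 3: score every stored vector, keep the best k.
for score, text in top_k(query_vector, index, k=2):
    print(f"{score:.3f}  {text}")
```

Real vector stores replace the linear scan with an optimized index, but the inputs and outputs are the same.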

---

### Embeddings vs LLMs (Critical Difference)

| Aspect      | Embeddings         | LLM        |
| ----------- | ------------------ | ---------- |
| Purpose     | Representation     | Generation |
| Output      | Vectors            | Text       |
| Determinism | High               | Variable   |
| Cost        | Low                | Higher     |
| Use case    | Search, clustering | Answering  |

---

### Embeddings vs TF-IDF / BM25

| Feature          | Embeddings | Keyword Search |
| ---------------- | ---------- | -------------- |
| Semantic meaning | ✅          | ❌              |
| Synonyms         | ✅          | ❌              |
| Context-aware    | ✅          | ❌              |
| Explainability   | ❌          | ✅              |

Hybrid search often combines both.
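One common way to combine the two result lists is reciprocal rank fusion (RRF). The source does not say which fusion method to use, so treat this as one possible sketch; the document ids and the constant `k=60` are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

keyword_ranking = ["doc_a", "doc_b", "doc_c"]    # e.g. from BM25
semantic_ranking = ["doc_c", "doc_a", "doc_d"]   # e.g. from a vector store

print(reciprocal_rank_fusion([keyword_ranking, semantic_ranking]))
# prints ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

RRF only needs ranks, not raw scores, which sidesteps the problem that BM25 scores and cosine scores live on different scales.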

---

### Common Embedding Models

#### OpenAI

* `text-embedding-3-small`
* `text-embedding-3-large`

#### Open Source

* SentenceTransformers (e.g., all-MiniLM)
* BGE, E5 families

#### Local Models

* HuggingFace embeddings
* Ollama embeddings

LangChain supports all of these through provider-specific integration packages that implement the same Embeddings interface.

---

### Choosing an Embedding Model

#### Key Factors

* Dimensionality
* Domain relevance
* Latency
* Cost
* Language support

---

#### General Recommendations

| Use Case        | Recommendation        |
| --------------- | --------------------- |
| General RAG     | 768–1536 dims         |
| Low latency     | Smaller models        |
| Domain-specific | Fine-tuned embeddings |
| On-prem         | Open-source models    |

---

### Embeddings and Chunking (Very Important)

Embeddings work on **chunks**, not full documents.

Bad chunking → bad embeddings.

Rules:

* Split before embedding
* Use overlap
* Keep chunks semantically coherent
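The overlap rule can be sketched with a naive character-window splitter. `split_with_overlap` is a hypothetical helper written for this example; it cuts at fixed character offsets, so unlike a separator-aware splitter it does not try to keep chunks semantically coherent.

```python
def split_with_overlap(text, chunk_size=40, overlap=10):
    """Split text into character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

text = "Embeddings work on chunks. Overlap keeps context that straddles a boundary."
for chunk in split_with_overlap(text, chunk_size=40, overlap=10):
    print(repr(chunk))
```

Each chunk repeats the last 10 characters of the previous one, so a sentence that crosses a boundary still appears intact in at least one chunk when it is shorter than the overlap.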

---

### Common Mistakes

#### Embedding full documents

❌ May exceed the model's input limit, and one vector for many topics retrieves poorly

#### No chunk overlap

❌ Boundary context loss

#### Re-embedding documents at query time

❌ Wasteful; only the query needs to be embedded per request

#### Mixing embedding models

❌ Incompatible vectors

---

### Embeddings Are NOT Knowledge

Embeddings:

* Do not store facts
* Do not reason
* Do not answer questions

They only help **find relevant context**.

---

### Best Practices

* Embed once, reuse many times
* Keep embedding model consistent
* Store metadata with vectors
* Keep the index in sync when source data changes (re-embed updated chunks, remove stale ones)
* Re-embed if chunking strategy changes
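A rough sketch of two of these practices, storing metadata with each vector and keeping the embedding model consistent, might look like the following. `VectorRecord` and `check_compatible` are hypothetical names, and the vectors and the `faq.md` source are toy values.

```python
from dataclasses import dataclass, field

@dataclass
class VectorRecord:
    text: str
    vector: list[float]
    model: str                       # which embedding model produced the vector
    metadata: dict = field(default_factory=dict)

def check_compatible(records, query_model, query_vector):
    """Raise if any stored vector cannot be compared with the query vector."""
    for record in records:
        if record.model != query_model:
            raise ValueError(
                f"index uses {record.model!r} but query uses {query_model!r}"
            )
        if len(record.vector) != len(query_vector):
            raise ValueError("vector dimensions do not match")

records = [
    VectorRecord(
        text="RAG combines retrieval with generation",
        vector=[0.9, 0.1, 0.1],
        model="text-embedding-3-small",
        metadata={"source": "faq.md", "chunk": 0},
    ),
]

# Passes silently: same model name and same dimensionality.
check_compatible(records, "text-embedding-3-small", [0.8, 0.3, 0.0])
```

Recording the model name next to each vector makes a mixed-model index fail loudly instead of silently returning poor matches.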

---

### Interview-Ready Summary

> “Embeddings are dense vector representations of text that capture semantic meaning. In LangChain, embeddings are used to index document chunks in vector stores, enabling semantic retrieval for RAG systems.”

---

### Rule of Thumb

* **Search → Embeddings**
* **Answer → LLM**
* **Good RAG → Good chunking + good embeddings**
* **Change chunking → Re-embed**
