```python
import pprint

from dotenv import load_dotenv

# Load environment variables (such as OPENAI_API_KEY) from a local .env file
load_dotenv()
```

# 🧠 Embedding Models

Imagine turning any piece of text (tweets, docs, or books) into a compact, machine-readable form. That's what **embedding models** do! They transform text into numerical vectors that capture meaning, enabling semantic search, ranking, and clustering.

---

## 🔑 Key Concepts

![Embedding Models](assets/embeddings_concept-975a9aaba52de05b457a1aeff9a7393a.png "Embedding Models")

1. **Embed text** ➡️ Convert text into a vector (a list of numbers)
2. **Compare meaning** ➡️ Use math (like cosine similarity) to compare vectors

---

## 🕰️ A Quick History

- 📌 **BERT (2018)**: Google's breakthrough for understanding text
- ⚡ **SBERT**: Tuned for sentence embeddings with better speed & accuracy
- 🧪 **MTEB**: A benchmark to compare modern embedding models

**Explore more:**
- [BERT Paper](#)
- [Cameron Wolfe's Review](#)
- [MTEB Leaderboard](#)

---

## 🧩 LangChain Interface

LangChain makes working with embeddings simple:

- `embed_documents` → takes a list of texts and returns one vector per text
- `embed_query` → takes a single query string and returns one vector

🛠️ **Batch embed:**

```python
from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings()
embeddings = embeddings_model.embed_documents(
    [
        "Hi there!",
        "Oh, hello!",
        "What's your name?",
        "My friends call me World",
        "Hello World!"
    ]
)
# (number of texts, dimensions per vector)
len(embeddings), len(embeddings[0])
```

```
(5, 1536)
```

🛠️ **Query embed:**

```python
query_embedding = embeddings_model.embed_query("What is the meaning of life?")
```

**Learn more:** [Embedding Integrations](#) · [How-to Guides](#)


## 📏 Measuring Similarity

Embeddings are like coordinates in space. The closer two vectors are, the more semantically related their texts are.

### Common Similarity Metrics:

- 📐 **Cosine Similarity**: Cosine of the angle between two vectors; ignores their lengths
- 📍 **Euclidean Distance**: Straight-line distance between two points
- 🎯 **Dot Product**: Sum of element-wise products; grows with both alignment and vector length
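
To make the three metrics concrete, here is a minimal sketch that computes each of them on small hand-made vectors. The vectors `a`, `b`, and `c` and the helper names `cosine`, `euclidean`, and `dot` are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import numpy as np

# Hypothetical 3-dimensional "embeddings", invented for illustration only.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])    # same direction as a, twice the length
c = np.array([-3.0, 0.0, 1.0])   # orthogonal to a (their dot product is 0)

def cosine(u, v):
    """Cosine of the angle between u and v (independent of vector length)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean(u, v):
    """Straight-line distance between u and v."""
    return float(np.linalg.norm(u - v))

def dot(u, v):
    """Sum of element-wise products of u and v."""
    return float(np.dot(u, v))

for name, v in [("b", b), ("c", c)]:
    print(f"a vs {name}: cosine={cosine(a, v):.3f}, "
          f"euclidean={euclidean(a, v):.3f}, dot={dot(a, v):.3f}")
```

Note how `a` and `b` point in the same direction, so their cosine similarity is at its maximum even though their Euclidean distance is not zero. Which behavior is preferable depends on whether vector length carries meaning for the model in use.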


```python
import numpy as np

def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    return dot_product / (norm_vec1 * norm_vec2)

# Compare the query embedding with the first document embedding from the cells above
similarity = cosine_similarity(query_embedding, embeddings[0])
print("Cosine Similarity:", similarity)
```

**Suggested by:** OpenAI, which recommends cosine similarity for comparing embeddings from its models
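
Cosine similarity is typically used to rank a set of documents against a query, which is the core of semantic search. The following is a rough sketch of that idea, not a production implementation: the function name `rank_by_cosine` and the toy 2-D vectors are assumptions made for illustration. It scales every vector to unit length first, because for unit-length vectors a plain dot product equals the cosine similarity.

```python
import numpy as np

def rank_by_cosine(query_vec, doc_vecs):
    """Return (document indices sorted most-similar first, cosine scores)."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    # Scale to unit length so that the dot product below equals cosine similarity.
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q                 # one cosine score per document
    order = np.argsort(-scores)    # highest score first
    return order.tolist(), scores

# Toy 2-D vectors standing in for real embeddings (hypothetical values).
docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
query = [0.9, 0.1]

order, scores = rank_by_cosine(query, docs)
for i in order:
    print(f"doc {i}: score={scores[i]:.3f}")
```

With the variables from the cells above, a call such as `rank_by_cosine(query_embedding, embeddings)` should rank the five example texts by how close they are to the query.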

---

## 📚 Further Reading

- [Simon Willison on Embeddings](https://simonwillison.net/2023/Oct/23/embeddings/)  
- [Google on Similarity Metrics](https://developers.google.com/machine-learning/clustering/dnn-clustering/supervised-similarity)  
- [OpenAI FAQ on Similarity](https://platform.openai.com/docs/guides/embeddings/faq)