# üî¢ Notebook 04: Embeddings and Vector Representations

**LangChain 1.0.5+ | Mixed Level Class**

## üéØ Objectives
1. Understand what embeddings are
2. Use OpenAI Embeddings
3. Use Google Gemini Embeddings
4. Compare embedding models
5. Calculate similarity between vectors

In [3]:
from dotenv import load_dotenv
import os
load_dotenv()
print("‚úÖ Setup complete")

‚úÖ Setup complete


## 1. What are Embeddings? ü§î

### üî∞ BEGINNER

**Embeddings** convert text into numbers (vectors) that capture meaning.

Think of it like a GPS coordinate:
- "dog" ‚Üí [0.2, 0.8, 0.1, ...] (1536 numbers)
- "cat" ‚Üí [0.3, 0.7, 0.2, ...] (close to "dog"!)
- "car" ‚Üí [0.9, 0.1, 0.8, ...] (far from "dog")

Similar meanings = Similar vectors!

## 2. OpenAI Embeddings

In [2]:
from langchain_openai import OpenAIEmbeddings

# Initialize
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create embedding for a query
query = "What is machine learning?"
query_vector = embeddings.embed_query(query)

print(f"Query: {query}")
print(f"Vector dimensions: {len(query_vector)}")
print(f"First 5 values: {query_vector[:5]}")

# Embed multiple documents
docs = [
    "Machine learning is a subset of AI",
    "Deep learning uses neural networks",
    "The weather is sunny today"
]

doc_vectors = embeddings.embed_documents(docs)
print(f"\nEmbedded {len(doc_vectors)} documents")

Query: What is machine learning?
Vector dimensions: 1536
First 5 values: [-0.002476818859577179, -0.012755980715155602, -0.006645360495895147, -0.03157883137464523, 0.028759293258190155]

Embedded 3 documents


## 3. Google Gemini Embeddings

In [5]:
!uv pip install langchain-google-genai

[2K[2mResolved [1m39 packages[0m [2min 907ms[0m[0m                                        [0m
[2K[37m‚†ô[0m [2mPreparing packages...[0m (0/9)                                                   
[2K[1A[37m‚†ô[0m [2mPreparing packages...[0m (0/9)----[0m[0m     0 B/11.23 KiB                     [1A
[2K[1A[37m‚†ô[0m [2mPreparing packages...[0m (0/9)[2m[0m[0m 11.23 KiB/11.23 KiB                   [1A
[2mcachetools[0m [32m------------------------------[2m[0m[0m 11.23 KiB/11.23 KiB
[2K[2A[37m‚†ô[0m [2mPreparing packages...[0m (0/9)---------[0m[0m     0 B/169.63 KiB               [2A
[2mcachetools[0m [32m------------------------------[2m[0m[0m 11.23 KiB/11.23 KiB
[2mgoogle-api-core[0m [32m[2m------------------------------[0m[0m     0 B/169.63 KiB
[2K[3A[37m‚†ô[0m [2mPreparing packages...[0m (0/9)------------------[0m[0m     0 B/290.54 KiB      [3A
[2mgoogle-api-core[0m [32m[2m------------------------------[0m[0m     0 B/1

In [6]:
# Google Gemini Embeddings
if os.getenv("GOOGLE_API_KEY"):
    from langchain_google_genai import GoogleGenerativeAIEmbeddings
    
    gemini_embeddings = GoogleGenerativeAIEmbeddings(
        model="models/embedding-001"
    )
    
    vector = gemini_embeddings.embed_query("Hello world")
    print(f"Gemini vector dimensions: {len(vector)}")
else:
    print("‚ö†Ô∏è Google API key not found. Set GOOGLE_API_KEY in .env")

GoogleGenerativeAIError: Error embedding content: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. 
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0 [links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
  quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
  quota_id: "EmbedContentRequestsPerDayPerProjectPerModel-FreeTier"
}
violations {
  quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
  quota_id: "EmbedContentRequestsPerMinutePerProjectPerModel-FreeTier"
}
violations {
  quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
  quota_id: "EmbedContentRequestsPerMinutePerUserPerProjectPerModel-FreeTier"
}
violations {
  quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
  quota_id: "EmbedContentRequestsPerDayPerUserPerProjectPerModel-FreeTier"
}
]

## 4. Similarity Calculation

In [7]:
import numpy as np

def cosine_similarity(vec1, vec2):
    """Calculate cosine similarity between two vectors"""
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Test similarity
texts = [
    "I love programming",
    "I enjoy coding",
    "The weather is nice"
]

vectors = [embeddings.embed_query(text) for text in texts]

print("Similarity Scores:")
print(f"'{texts[0]}' vs '{texts[1]}': {cosine_similarity(vectors[0], vectors[1]):.4f}")
print(f"'{texts[0]}' vs '{texts[2]}': {cosine_similarity(vectors[0], vectors[2]):.4f}")
print("\nSimilar meanings = Higher similarity score!")

Similarity Scores:
'I love programming' vs 'I enjoy coding': 0.7529
'I love programming' vs 'The weather is nice': 0.2328

Similar meanings = Higher similarity score!


## Summary

‚úÖ Embeddings convert text to vectors
‚úÖ OpenAI text-embedding-3-small: 1536 dimensions
‚úÖ Google Gemini: 768 dimensions
‚úÖ Cosine similarity measures vector similarity

**Next:** Vector Stores for efficient search!