# Embeddings with Gemini

Embeddings are numerical representations of concepts: each piece of text or code is converted to a sequence of numbers (a vector), which makes it easy for computers to measure the relationships between those concepts. Embeddings are useful for working with natural language and code because other machine learning models and algorithms, such as clustering or search, can readily consume and compare them.

### Setup

In [45]:
import os
import google.generativeai as genai
from dotenv import load_dotenv
import numpy as np

load_dotenv('../.env.local')
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))

print("✅ Gemini API configured")

✅ Gemini API configured


### Single Embedding

In [46]:
result = genai.embed_content(
    model="models/text-embedding-004",
    content="What is the capital of France?"
)

embedding = result["embedding"]
print(f"Embedding dimension: {len(embedding)}")
print(f"First 10 values: {embedding[:10]}")

Embedding dimension: 768
First 10 values: [-0.023838477, -0.008524507, 0.010140137, -0.036359083, 0.005881804, 0.017632408, 0.03440266, -0.022488268, -0.03767718, 0.090126775]


### Batch Embeddings

In [47]:
# Embed each text with its own API call and collect the vectors in order
texts = [
    "Python is a programming language",
    "Machine learning is a subset of AI",
    "Paris is the capital of France"
]

embeddings = []
for text in texts:
    result = genai.embed_content(
        model="models/text-embedding-004",
        content=text
    )
    embeddings.append(result['embedding'])

print(f"Created {len(embeddings)} embeddings")
print(f"Each embedding has {len(embeddings[0])} dimensions")

Created 3 embeddings
Each embedding has 768 dimensions
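
The loop above makes one API call per text. As a hedged alternative, the sketch below groups texts into chunks and hands each chunk to an embedding callable. The helper `embed_in_batches`, its `embed_fn` parameter, the `fake_embed` stand-in, and the default `batch_size` of 100 are illustrative assumptions, not part of the SDK. The sketch assumes `embed_fn` takes a list of strings and returns one vector per string.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[Vector]],
    batch_size: int = 100,  # assumed chunk size; check the API's actual limit
) -> List[Vector]:
    """Embed texts chunk by chunk, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        chunk = list(texts[start:start + batch_size])
        chunk_vectors = embed_fn(chunk)
        if len(chunk_vectors) != len(chunk):
            raise ValueError("embed_fn must return one vector per input text")
        vectors.extend(chunk_vectors)
    return vectors


# Hypothetical stand-in embedder so the sketch runs without an API key:
# it maps each text to a 2-dimensional vector [character count, word count].
def fake_embed(batch: List[str]) -> List[Vector]:
    return [[float(len(t)), float(len(t.split()))] for t in batch]


demo = embed_in_batches(
    ["Python is a programming language", "Paris"], fake_embed, batch_size=1
)
print(len(demo), "vectors of", len(demo[0]), "dimensions")
```

With the real API, `embed_fn` could plausibly be `lambda batch: genai.embed_content(model="models/text-embedding-004", content=batch)["embedding"]`. This relies on `embed_content` accepting a list for `content` and returning a list of vectors, which is worth verifying against the installed SDK version.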


### Calculating Similarity between texts

In [50]:
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Compare first two texts
sim = cosine_similarity(embeddings[0], embeddings[1])
print(f"Similarity between text 1 and 2: {sim:.4f}")

# Compare all pairs
for i in range(len(texts)):
    for j in range(i+1, len(texts)):
        sim = cosine_similarity(embeddings[i], embeddings[j])
        print(f"\n'{texts[i][:30]}...' \nvs \n'{texts[j][:30]}...'\nSimilarity: {sim:.4f}")

Similarity between text 1 and 2: 0.4605

'Python is a programming langua...' 
vs 
'Machine learning is a subset o...'
Similarity: 0.4605

'Python is a programming langua...' 
vs 
'Paris is the capital of France...'
Similarity: 0.3913

'Machine learning is a subset o...' 
vs 
'Paris is the capital of France...'
Similarity: 0.2628
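
The nested loop above computes one pair at a time. A minimal sketch of the same idea in vectorized form: normalize every row to unit length, then a single matrix product gives all pairwise cosine similarities. The function name `cosine_similarity_matrix` and the toy vectors are assumptions for illustration; the vectors are hand-made, not model output.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Return an (n, n) matrix whose [i, j] entry is the cosine similarity of rows i and j."""
    matrix = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = matrix / norms  # assumes no row is all zeros
    return unit @ unit.T


# Hand-made toy vectors standing in for real embeddings
toy = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
sims = cosine_similarity_matrix(toy)
print(np.round(sims, 4))
```

Passing the `embeddings` list from the batch cell instead of `toy` should yield a 3x3 matrix whose off-diagonal entries match the pairwise values printed above.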


### Different task types

In [51]:
result = genai.embed_content(
    model="models/text-embedding-004",
    content="What is Python?",
    task_type="retrieval_query"  # or "retrieval_document", "semantic_similarity"
)

print(f"Query embedding created: {len(result['embedding'])} dimensions")

Query embedding created: 768 dimensions
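
The retrieval task types come in pairs: documents are embedded with `"retrieval_document"` and the search query with `"retrieval_query"`, and the query vector is then compared against every document vector. The ranking step can be sketched as follows; `rank_documents`, `top_k`, and the toy vectors are hypothetical names and data chosen for illustration, so the block runs without an API key.

```python
import numpy as np


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return (index, score) pairs for the top_k documents most similar to the query."""
    query = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_vecs, dtype=float)
    # Cosine similarity of the query against each document row
    scores = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
    order = np.argsort(-scores)[:top_k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]


# Toy stand-ins for a query embedding and three document embeddings
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [
    [0.0, 1.0, 0.0],
    [2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
]
ranking = rank_documents(query_vec, doc_vecs, top_k=2)
print(ranking)
```

In a real pipeline the document vectors would be computed once and stored, and only the query would be embedded at search time.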


# Google Gemini Embeddings

## Key Points:
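- **Free**: no cost for embeddings at the time of writing
- **768 dimensions**: half the 1536 dimensions of OpenAI's `text-embedding-3-small`
- **Latest model** (at the time of writing): `text-embedding-004`
- **Access**: direct module-level call to `genai.embed_content` (no client object needed)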

## Usage:
```python
result = genai.embed_content(
    model="models/text-embedding-004",
    content="your text here"
)
embedding = result['embedding']
```