# Vector Embeddings in FireProx

This notebook demonstrates how to work with vector embeddings in FireProx using the `FireVector` class.

## Important Limitations

**Firestore Emulator Does NOT Support Vector Embeddings**

- Vector embeddings are a production-only feature
- The Firestore emulator will reject any operations involving vectors
- All examples in this notebook require a real Firestore instance
- See [GitHub Issue #7216](https://github.com/firebase/firebase-tools/issues/7216)

**Vector Constraints**:
- Maximum 2048 dimensions per vector
- Vectors cannot be nested inside arrays or maps
- Vectors must be at the top level of a document field
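The nesting constraint can be checked in plain Python before a write. The following is a minimal sketch that assumes vectors are represented by a hypothetical `Vec` list subclass (not FireProx's `FireVector`). It walks a document dict and reports the path of every vector found inside an array or map.

```python
from typing import Any


class Vec(list):
    """Hypothetical stand-in for a vector value (not FireProx's FireVector)."""


def find_nested_vectors(doc: dict[str, Any]) -> list[str]:
    """Return the paths of vectors that sit inside an array or map."""
    nested: list[str] = []

    def walk(value: Any, path: str, top_level: bool) -> None:
        if isinstance(value, Vec):  # check Vec before list, since Vec subclasses list
            if not top_level:
                nested.append(path)
        elif isinstance(value, dict):
            for key, item in value.items():
                walk(item, f"{path}.{key}", top_level=False)
        elif isinstance(value, list):
            for i, item in enumerate(value):
                walk(item, f"{path}[{i}]", top_level=False)

    for field, value in doc.items():
        walk(value, field, top_level=True)
    return nested


ok_doc = {"title": "Intro", "embedding": Vec([0.12, 0.45, 0.78])}
bad_doc = {"meta": {"embedding": Vec([0.1])}, "chunks": [Vec([0.2])]}
print(find_nested_vectors(ok_doc))   # []
print(find_nested_vectors(bad_doc))  # ['meta.embedding', 'chunks[0]']
```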

## What are Vector Embeddings?

Vector embeddings are numerical representations of data (text, images, etc.) that capture semantic meaning. They enable:
- Semantic search (find similar documents)
- Clustering and classification
- Recommendation systems
- Question answering
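All of these uses rest on one idea: texts with similar meaning get vectors that point in similar directions. Cosine similarity is a common way to measure this. Here is a minimal sketch in plain Python that uses hand-picked toy 3D vectors, not real model output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy "embeddings" with hand-picked values, for illustration only
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(f"cat vs kitten:  {cosine_similarity(cat, kitten):.3f}")
print(f"cat vs invoice: {cosine_similarity(cat, invoice):.3f}")
```

The semantically close pair scores near 1.0, and the unrelated pair scores near 0.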

FireProx provides the `FireVector` class to wrap Firestore's native Vector type with a Pythonic interface.

## Setup

**Note**: These examples will fail with the emulator. You must use a real Firestore project.
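The `google-cloud-firestore` client sends traffic to the emulator whenever the `FIRESTORE_EMULATOR_HOST` environment variable is set. You can use this to fail fast with a clear message before any vector operation runs. This is a small sketch, and `require_production_firestore` is a hypothetical helper name.

```python
import os


def require_production_firestore() -> None:
    """Raise if this process is configured to use the Firestore emulator."""
    # google-cloud-firestore routes requests to the emulator when this is set
    host = os.environ.get("FIRESTORE_EMULATOR_HOST")
    if host:
        raise RuntimeError(
            f"FIRESTORE_EMULATOR_HOST={host!r} is set; "
            "vector examples need a real Firestore project"
        )
```

You would call `require_production_firestore()` at the top of the setup cell, before creating the clients.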

In [None]:
from google.cloud import firestore

from fire_prox import AsyncFireProx, FireProx, FireVector

# Initialize clients (PRODUCTION ONLY - will not work with emulator)
project_id = 'your-project-id'  # Replace with your actual project ID

# Synchronous client
sync_client = firestore.Client(project=project_id)
db = FireProx(sync_client)

# Asynchronous client
async_client = firestore.AsyncClient(project=project_id)
async_db = AsyncFireProx(async_client)

print("✓ Initialized production Firestore clients")
print("⚠️  Remember: Vector embeddings DO NOT work with the emulator")

## Feature 1: Creating and Storing Vectors (Sync)

Create a `FireVector` from a list of floats and store it in a document.

In [None]:
# Create a collection for documents with embeddings
documents = db.collection('semantic_documents')

# Create a simple 3-dimensional embedding
doc1 = documents.new()
doc1.title = "Introduction to Machine Learning"
doc1.content = "Machine learning is a subset of artificial intelligence..."
doc1.embedding = FireVector([0.12, 0.45, 0.78])  # Simple 3D embedding

# Save to Firestore
doc1.save(doc_id='ml_intro')

print(f"✓ Saved document with {doc1.embedding.dimensions}-dimensional embedding")
print(f"  Title: {doc1.title}")
print(f"  Embedding: {doc1.embedding.to_list()}")

## Feature 2: Reading Vectors from Firestore (Sync)

FireProx automatically converts native Firestore Vectors to `FireVector` objects when reading.

In [None]:
# Read the document back
retrieved = db.doc('semantic_documents/ml_intro')
retrieved.fetch()

# Access the vector - automatically converted to FireVector
print(f"Document: {retrieved.title}")
print(f"Embedding type: {type(retrieved.embedding)}")
print(f"Dimensions: {retrieved.embedding.dimensions}")
print(f"Values: {retrieved.embedding.to_list()}")

# You can iterate over the vector
print("\nVector values:")
for i, value in enumerate(retrieved.embedding):
    print(f"  Dimension {i}: {value}")

## Feature 3: Working with Higher-Dimensional Embeddings

Real-world embeddings typically have many more dimensions (e.g., 384, 768, or 1536).

In [None]:
import random

# Create a document with a realistic 384-dimensional embedding
# (typical for models like sentence-transformers/all-MiniLM-L6-v2)
doc2 = documents.new()
doc2.title = "Deep Learning Fundamentals"
doc2.content = "Deep learning uses neural networks with multiple layers..."

# Generate a random 384-dimensional embedding (in practice, use a real model)
embedding_384d = [random.random() for _ in range(384)]
doc2.embedding = FireVector(embedding_384d)

doc2.save(doc_id='dl_fundamentals')

print(f"✓ Saved document with {doc2.embedding.dimensions}-dimensional embedding")
print(f"  First 5 dimensions: {doc2.embedding.to_list()[:5]}")
print(f"  Last 5 dimensions: {doc2.embedding.to_list()[-5:]}")

## Feature 4: Dimension Validation

FireVector enforces Firestore's maximum dimension limit of 2048.

In [None]:
from fire_prox.fire_vector import MAX_DIMENSIONS

print(f"Firestore maximum dimensions: {MAX_DIMENSIONS}")

# This works - exactly at the limit
max_vector = FireVector([0.1] * MAX_DIMENSIONS)
print(f"✓ Created vector with {max_vector.dimensions} dimensions (max allowed)")

# This will fail - exceeds the limit
try:
    too_large = FireVector([0.1] * (MAX_DIMENSIONS + 1))
except ValueError as e:
    print(f"\n✗ Error: {e}")

# You can disable validation if needed (not recommended)
unvalidated = FireVector([0.1] * 3000, validate=False)
print(f"\n⚠️  Created unvalidated vector with {unvalidated.dimensions} dimensions")
print("   (This will fail when you try to save to Firestore!)")
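The check itself is simple. This sketch shows one way a wrapper can validate at construction time and still let callers opt out. `ValidatedVector` is a hypothetical class, not FireProx's implementation, and its error message is illustrative.

```python
MAX_DIMENSIONS = 2048  # Firestore's documented per-vector limit


class ValidatedVector:
    """Hypothetical minimal wrapper illustrating opt-out dimension validation."""

    def __init__(self, values, validate: bool = True):
        # Copy and coerce to float; non-numeric input fails here
        self._values = [float(v) for v in values]
        if validate and len(self._values) > MAX_DIMENSIONS:
            raise ValueError(
                f"Vector has {len(self._values)} dimensions; maximum is {MAX_DIMENSIONS}"
            )

    @property
    def dimensions(self) -> int:
        return len(self._values)
```

Validating in the constructor surfaces the error where the vector is built, not later during a network write.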

## Feature 5: Async Operations with Vectors

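Vectors work the same way with the async API. The only difference is that `save()` and `fetch()` must be awaited. Top-level `await` works in Jupyter. In a plain script, wrap the calls in an `async def` function and run it with `asyncio.run()`.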

In [None]:
# Async version - store and retrieve vectors
async_documents = async_db.collection('semantic_documents')

# Create and save
async_doc = async_documents.new()
async_doc.title = "Neural Network Architectures"
async_doc.content = "Neural networks consist of interconnected layers..."
async_doc.embedding = FireVector([0.23, 0.56, 0.89])

await async_doc.save(doc_id='nn_architectures')

print(f"✓ Saved async document with {async_doc.embedding.dimensions}D embedding")

# Read back
async_retrieved = async_db.doc('semantic_documents/nn_architectures')
await async_retrieved.fetch()

print(f"\nRetrieved: {async_retrieved.title}")
print(f"Embedding: {async_retrieved.embedding.to_list()}")

## Feature 6: Real-World Example - Text Embeddings

Simulate generating embeddings from text using a hypothetical embedding model.

**Note**: This example shows the pattern. In production, you would use a real embedding model like:
- OpenAI's `text-embedding-ada-002` (1536 dimensions)
- Sentence Transformers (384-768 dimensions)
- Google's Vertex AI embeddings (768 dimensions)
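Each model emits a fixed number of dimensions, so a cheap check before saving can catch accidental model mix-ups. One example is a 384-dimensional vector landing in a collection that otherwise holds 768-dimensional ones. The sketch below uses the dimensions listed above. The dictionary keys and the `check_embedding` helper are hypothetical.

```python
# Output dimensions from the list above; keys are illustrative model identifiers
EXPECTED_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "all-MiniLM-L6-v2": 384,
    "textembedding-gecko": 768,
}


def check_embedding(model: str, embedding: list[float]) -> list[float]:
    """Return `embedding` unchanged, or raise if its size does not match `model`."""
    expected = EXPECTED_DIMENSIONS.get(model)
    if expected is None:
        raise KeyError(f"unknown model: {model}")
    if len(embedding) != expected:
        raise ValueError(
            f"{model} should produce {expected} dimensions, got {len(embedding)}"
        )
    return embedding
```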

In [None]:
import hashlib
import random


def generate_fake_embedding(text: str, dimensions: int = 384) -> list[float]:
    """
    Simulate an embedding model (in production, use a real model).

    Real examples:
    - openai.embeddings.create(input=text, model="text-embedding-ada-002")
    - sentence_transformers.SentenceTransformer('all-MiniLM-L6-v2').encode(text)
    - vertexai.language_models.TextEmbeddingModel.from_pretrained('textembedding-gecko').get_embeddings([text])
    """
    # Seed a private RNG from the text hash so the same text always yields the
    # same "embedding", without resetting the global random module's state
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)

    return [rng.gauss(0, 1) for _ in range(dimensions)]

# Example documents to embed
articles = [
    {
        'title': 'Introduction to Python',
        'content': 'Python is a high-level programming language known for its simplicity and readability.'
    },
    {
        'title': 'JavaScript Basics',
        'content': 'JavaScript is the programming language of the web, enabling interactive websites.'
    },
    {
        'title': 'Database Design Principles',
        'content': 'Good database design ensures data integrity, reduces redundancy, and improves query performance.'
    }
]

# Store articles with embeddings
for i, article in enumerate(articles):
    doc = documents.new()
    doc.title = article['title']
    doc.content = article['content']

    # Generate embedding from content
    embedding = generate_fake_embedding(article['content'])
    doc.embedding = FireVector(embedding)

    doc.save(doc_id=f'article_{i}')
    print(f"✓ Saved: {article['title']} ({doc.embedding.dimensions}D)")

print("\n✓ All articles embedded and stored")

## Feature 7: Vector Properties and Methods

`FireVector` provides a Pythonic interface for working with embeddings.

In [None]:
# Create a vector for demonstration
demo_vector = FireVector([0.1, 0.2, 0.3, 0.4, 0.5])

# 1. Length / Dimensions
print(f"Dimensions: {demo_vector.dimensions}")
print(f"Length: {len(demo_vector)}")

# 2. Indexing
print(f"\nFirst dimension: {demo_vector[0]}")
print(f"Last dimension: {demo_vector[-1]}")

# 3. Iteration
print("\nAll dimensions:")
for i, value in enumerate(demo_vector):
    print(f"  [{i}]: {value}")

# 4. Convert to list (for further processing)
values = demo_vector.to_list()
print(f"\nAs list: {values}")

# 5. Equality comparison
vector_a = FireVector([0.1, 0.2, 0.3])
vector_b = FireVector([0.1, 0.2, 0.3])
vector_c = FireVector([0.1, 0.2, 0.4])

print(f"\nvector_a == vector_b: {vector_a == vector_b}")  # True
print(f"vector_a == vector_c: {vector_a == vector_c}")  # False

# 6. String representations
print(f"\nstr(): {str(demo_vector)}")
print(f"repr(): {repr(demo_vector)}")
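This behavior comes from Python's standard data-model methods. The sketch below shows which dunder method powers each operation in the cell above. `MiniVector` is a hypothetical class written for illustration. It is not FireProx's source.

```python
from collections.abc import Iterator, Sequence


class MiniVector:
    """Hypothetical sequence-like wrapper showing the methods behind a Pythonic API."""

    def __init__(self, values: Sequence[float]):
        self._values = tuple(float(v) for v in values)  # immutable copy

    @property
    def dimensions(self) -> int:
        return len(self._values)

    def __len__(self) -> int:  # len(v)
        return len(self._values)

    def __getitem__(self, index: int) -> float:  # v[0], v[-1]
        return self._values[index]

    def __iter__(self) -> Iterator[float]:  # for x in v
        return iter(self._values)

    def __eq__(self, other: object) -> bool:  # v == w compares values
        if not isinstance(other, MiniVector):
            return NotImplemented
        return self._values == other._values

    def __hash__(self) -> int:  # stay hashable, since __eq__ is overridden
        return hash(self._values)

    def __repr__(self) -> str:
        return f"MiniVector({list(self._values)!r})"

    def to_list(self) -> list[float]:
        return list(self._values)
```

Storing the values as a tuple keeps the wrapper immutable. Immutability is what makes it safe to define `__hash__` alongside value-based equality.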

## Feature 8: Vector Type Conversion

FireProx handles conversion between `FireVector` and native Firestore `Vector` automatically.

In [None]:
# Create a FireVector
fire_vec = FireVector([0.1, 0.2, 0.3])
print(f"FireVector: {type(fire_vec).__name__}")
print(f"  {fire_vec}")

# Convert to native Firestore Vector (happens automatically during save)
native_vec = fire_vec.to_firestore_vector()
print(f"\nNative Vector: {type(native_vec).__name__}")

# Convert back to FireVector (happens automatically during fetch)
converted_back = FireVector.from_firestore_vector(native_vec)
print(f"\nConverted back: {type(converted_back).__name__}")
print(f"  {converted_back}")

# Verify they're equal
print(f"\nEqual after round-trip: {fire_vec == converted_back}")
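Conceptually, the automatic conversion is a field-by-field swap on the way in and on the way out. The sketch below uses two stand-in classes, `NativeVector` and `WrapperVector`. Both are hypothetical placeholders, not the SDK or FireProx types. It converts top-level fields only, which matches the top-level constraint listed under Important Limitations.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class NativeVector:
    """Stand-in for the SDK's native Vector type (assumption: a tuple of floats)."""
    values: tuple[float, ...]


@dataclass(frozen=True)
class WrapperVector:
    """Stand-in for a user-facing wrapper such as FireVector."""
    values: tuple[float, ...]


def to_storage(data: dict[str, Any]) -> dict[str, Any]:
    """Before a write: replace wrapper vectors with native ones."""
    return {
        k: NativeVector(v.values) if isinstance(v, WrapperVector) else v
        for k, v in data.items()
    }


def from_storage(data: dict[str, Any]) -> dict[str, Any]:
    """After a read: replace native vectors with wrappers."""
    return {
        k: WrapperVector(v.values) if isinstance(v, NativeVector) else v
        for k, v in data.items()
    }
```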

## Server-Side Embedding Generation

### Firebase Extension for Automatic Embeddings

Firebase provides an extension that can automatically generate embeddings when documents are created or updated:

**Extension**: `firestore-genkit-embedding`

**How it works**:
1. You configure which collection and field to monitor
2. When a document is created/updated, the extension triggers
3. It sends the text field to an embedding model (Vertex AI / Gemini)
4. The generated embedding is stored back in the document
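Steps 1 to 4 can be roughly simulated in plain Python. This is not the extension's actual code. `handle_write`, its parameters, and the skip rule are all assumptions. Any write-triggered function of this kind has to deal with one detail: writing the embedding back is itself a document write, and that write fires the trigger again. The sketch avoids an endless loop by skipping documents whose input text is unchanged and already embedded.

```python
from typing import Callable, Optional

Embedder = Callable[[str], list[float]]


def handle_write(
    before: Optional[dict],
    after: Optional[dict],
    embed: Embedder,
    input_field: str = "content",
    output_field: str = "embedding",
) -> Optional[dict]:
    """Return the field update to write back, or None if nothing should be written."""
    if after is None:  # the document was deleted
        return None
    text = after.get(input_field)
    if not isinstance(text, str):  # nothing to embed
        return None
    old_text = before.get(input_field) if before else None
    # The write-back fires the trigger again; skip unchanged, already-embedded input
    if text == old_text and output_field in after:
        return None
    return {output_field: embed(text)}
```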

**Configuration Example**:
```yaml
# Extension configuration
collection: semantic_documents
input_field: content
output_field: embedding
model: textembedding-gecko@003  # Vertex AI model
dimensions: 768
```

**Workflow**:
```python
import time

# 1. Save document with text content (no embedding yet)
doc = documents.new()
doc.title = "My Article"
doc.content = "This is the text content to embed..."
doc.save()

# 2. Extension automatically triggers:
#    - Reads doc.content
#    - Calls Vertex AI embedding API
#    - Writes result to doc.embedding

# 3. Read back with embedding (after extension completes)
time.sleep(2)  # Fixed delay for illustration only; actual processing time varies
doc.fetch(force=True)
print(f"Auto-generated embedding: {doc.embedding.dimensions}D")
```

**Important Notes**:
- Extension does NOT work with emulator (production only)
- Requires the Vertex AI API to be enabled
- Incurs costs for embedding API calls
- Processing is asynchronous (not instant)
- Uses Cloud Functions under the hood
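Processing is asynchronous, so the fixed `time.sleep(2)` in the workflow above may be too short or too long. Polling with a timeout is a more robust pattern. The sketch below is generic, and `wait_for` is a hypothetical helper. Your `probe` function would re-fetch the document and return the embedding once it exists, or `None` if it does not yet exist.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    probe: Callable[[], Optional[T]],
    timeout: float = 30.0,
    interval: float = 1.0,
) -> T:
    """Call `probe` until it returns a non-None value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout  # monotonic clock ignores wall-clock jumps
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        time.sleep(interval)
```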

**Alternative: Client-Side Embeddings**

For more control, generate embeddings in your application:

```python
# Using OpenAI
import openai

response = openai.embeddings.create(
    input="Your text here",
    model="text-embedding-ada-002"
)
embedding = response.data[0].embedding

doc.embedding = FireVector(embedding)
doc.save()

# Using Sentence Transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
embedding = model.encode("Your text here").tolist()

doc.embedding = FireVector(embedding)
doc.save()
```

## Cleanup

In [None]:
# Delete test documents
test_docs = [
    'ml_intro',
    'dl_fundamentals',
    'nn_architectures',
    'article_0',
    'article_1',
    'article_2'
]

for doc_id in test_docs:
    try:
        doc = db.doc(f'semantic_documents/{doc_id}')
        doc.delete()
        print(f"✓ Deleted {doc_id}")
    except Exception as e:
        print(f"  (Could not delete {doc_id}: {e})")

print("\n✓ Cleanup complete")

## Summary

### Key Takeaways

1. **FireVector Wrapper**: Pythonic interface for Firestore vector embeddings
2. **Automatic Conversion**: FireProx handles Vector ↔ FireVector conversion seamlessly
3. **Validation**: Enforces 2048 dimension limit by default
4. **Sync & Async**: Works with both synchronous and asynchronous APIs
5. **Production Only**: Vectors do NOT work with Firestore emulator

### Limitations to Remember

- ⚠️ Emulator does not support vectors
- ⚠️ Maximum 2048 dimensions
- ⚠️ Vectors cannot be nested in arrays/maps
- ⚠️ Vectors must be top-level document fields

### Use Cases

- **Semantic Search**: Find documents similar to a query
- **Content Recommendations**: Suggest related articles/products
- **Question Answering**: Match questions to relevant answers
- **Image Search**: Find similar images by embedding
- **Clustering**: Group similar documents together

### Next Steps

To build a complete semantic search system:
1. Choose an embedding model (OpenAI, Sentence Transformers, Vertex AI)
2. Generate embeddings for your documents
3. Store using `FireVector`
4. Implement similarity search (cosine similarity, nearest neighbors)
5. Use Firestore's built-in vector search (`find_nearest` on a collection or query), which requires a vector index on the embedding field
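Step 4 can be prototyped before you set up an index. Exact nearest-neighbor search scores every stored embedding against the query and keeps the best k. Each query costs O(n) work, so this approach suits only small collections or checking an index's results. Firestore's own vector search does the work server-side against an index. The helpers and toy corpus below are hypothetical and self-contained.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exact k-nearest-neighbor search: score every document, keep the best k."""
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in corpus.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy corpus keyed by document ID (hand-picked 3D vectors)
corpus = {
    "ml_intro": [0.12, 0.45, 0.78],
    "nn_architectures": [0.23, 0.56, 0.89],
    "unrelated": [0.9, -0.4, 0.0],
}
for doc_id, score in top_k([0.2, 0.5, 0.8], corpus, k=2):
    print(f"{doc_id}: {score:.3f}")
```

`heapq.nlargest` keeps only k candidates in memory rather than sorting every score.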