# Memvid Adapter Tutorial

This notebook demonstrates how to use Pydapter's Memvid adapters for video-based AI memory storage and retrieval.

## What is Memvid?

Memvid enables storing and retrieving text data by encoding it into video files. This provides:
- Efficient storage and compression
- Semantic search capabilities
- Novel data persistence patterns

In [ ]:
# Import libraries and check availability
from pydantic import BaseModel
from typing import List
import tempfile
import os

# Check Memvid availability
try:
    from pydapter.extras.memvid_ import MemvidAdapter
    print("✅ Basic Memvid adapter available")
    memvid_available = True
except ImportError as e:
    print(f"❌ Memvid not available: {e}")
    print('Install with: pip install "pydapter[memvid]"')
    memvid_available = False

# Check Pulsar integration
try:
    from pydapter.extras.async_memvid_pulsar import AsyncPulsarMemvidAdapter
    print("✅ Pulsar Memvid adapter available")
    pulsar_available = True
except ImportError:
    print("❌ Pulsar integration not available")
    pulsar_available = False

print("\nFeatures available:")
print(f"- Basic Memvid: {memvid_available}")
print(f"- Pulsar streaming: {pulsar_available}")
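
The try/except checks above import the adapter modules outright. A lighter probe is the standard library's `importlib.util.find_spec`, which reports whether a module can be located without executing it. The sketch below is illustrative only: `has_module` is a hypothetical helper, not part of Pydapter, and `memvid` is assumed to be the top-level package name of the Memvid library.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if `name` can be located, without importing the module itself.

    For dotted names, find_spec imports the parent packages first, so a
    missing parent surfaces as ModuleNotFoundError (an ImportError subclass).
    """
    try:
        return find_spec(name) is not None
    except (ImportError, ValueError):
        return False


# "memvid" is an assumed package name; the dotted path comes from the cell above.
for module_name in ("json", "memvid", "pydapter.extras.memvid_"):
    print(f"{module_name}: {has_module(module_name)}")
```

Locating a module does not guarantee that importing it succeeds, since the module may still fail on its own dependencies. The try/except import remains the definitive check.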

In [ ]:
# Define data models
class Document(BaseModel):
    """A document with text content."""
    id: str
    title: str
    text: str
    category: str
    tags: List[str] = []

# Create sample documents
documents = [
    Document(
        id="doc1",
        title="Machine Learning Basics",
        text="Machine learning enables computers to learn from data without explicit programming. It uses algorithms to identify patterns.",
        category="AI",
        tags=["ml", "ai"]
    ),
    Document(
        id="doc2",
        title="Python Programming",
        text="Python is a versatile programming language popular in data science and web development.",
        category="Programming",
        tags=["python", "programming"]
    ),
    Document(
        id="doc3", 
        title="Data Analysis",
        text="Data analysis involves examining datasets to discover insights and patterns that inform decision making.",
        category="Data Science",
        tags=["data", "analysis"]
    )
]

print(f"Created {len(documents)} documents:")
for doc in documents:
    print(f"- {doc.title} ({doc.category})")

## Basic Memvid Usage

The cell below encodes the sample documents into a video file plus an index file, then runs a few semantic searches against them:

In [ ]:
# Basic Memvid usage demo
if memvid_available:
    print("🎬 Creating video memory from documents...")
    
    with tempfile.TemporaryDirectory() as temp_dir:
        video_file = os.path.join(temp_dir, "knowledge_base.mp4")
        index_file = os.path.join(temp_dir, "knowledge_index.json")
        
        print(f"Video file: {video_file}")
        print(f"Index file: {index_file}")
        
        try:
            # Encode documents into video memory
            result = MemvidAdapter.to_obj(
                documents,
                video_file=video_file,
                index_file=index_file,
                text_field="text",  # Field containing text to encode
                chunk_size=256,     # Size of text chunks
                overlap=32          # Overlap between chunks
            )
            
            print("\n✅ Encoding successful!")
            print(f"Result: {result}")
            
            # Check if files were created
            if os.path.exists(video_file) and os.path.exists(index_file):
                video_size = os.path.getsize(video_file)
                index_size = os.path.getsize(index_file)
                print("\n📊 File sizes:")
                print(f"Video: {video_size:,} bytes")
                print(f"Index: {index_size:,} bytes")
                
                # Search the video memory
                print("\n🔍 Searching video memory...")
                
                search_queries = [
                    "machine learning algorithms",
                    "python programming", 
                    "data analysis patterns"
                ]
                
                for query in search_queries:
                    print(f"\nQuery: '{query}'")
                    
                    search_config = {
                        "video_file": video_file,
                        "index_file": index_file,
                        "query": query,
                        "top_k": 2
                    }
                    
                    try:
                        results = MemvidAdapter.from_obj(Document, search_config, many=True)
                        print(f"Found {len(results)} results:")
                        
                        for i, result_doc in enumerate(results, 1):
                            print(f"  {i}. {result_doc.title}")
                            print(f"     Category: {result_doc.category}")
                            
                    except Exception as e:
                        print(f"Search failed: {e}")
            else:
                print("❌ Video files were not created")
                
        except Exception as e:
            print(f"❌ Encoding failed: {e}")
            print("This might be because the memvid library is not properly installed")
            
else:
    print("❌ Memvid not available - please install with:")
    print("   pip install pydapter[memvid]")
    print("\nWhat you would see:")
    print("- Documents encoded into video files")
    print("- Semantic search returning relevant results")
    print("- Efficient storage and retrieval")
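
The `chunk_size` and `overlap` arguments passed to `MemvidAdapter.to_obj` control how text is split before encoding. The sketch below illustrates the general idea with a character-based sliding window. It is an illustration under stated assumptions: `chunk_text` is a hypothetical helper, and Memvid's actual chunker may split on different units (for example tokens or sentences).

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 256, overlap: int = 32) -> List[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Compare how many chunks different settings produce for the same text.
sample = "Machine learning enables computers to learn from data. " * 10
for size, lap in [(256, 32), (128, 16), (64, 8)]:
    pieces = chunk_text(sample, chunk_size=size, overlap=lap)
    print(f"chunk_size={size}, overlap={lap}: {len(pieces)} chunks")
```

Smaller chunks give search results that point at narrower passages but produce more units to encode. Overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks.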

## Error Handling

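The adapter raises typed exceptions for invalid parameters, unknown model fields, and missing files. The cell below triggers each case and shows which exception to catch: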

In [ ]:
# Error handling demo
if memvid_available:
    from pydapter.exceptions import ValidationError, ResourceError
    
    print("🛡️ Testing error handling scenarios:")
    
    # 1. Empty value for a required parameter (video_file)
    try:
        MemvidAdapter.to_obj(documents, video_file="", index_file="test.json")
        print("❌ Should have failed!")
    except ValidationError as e:
        print(f"✅ Caught validation error: {str(e)[:60]}...")
    
    # 2. Wrong field name
    try:
        with tempfile.TemporaryDirectory() as temp_dir:
            MemvidAdapter.to_obj(
                documents,
                video_file=os.path.join(temp_dir, "test.mp4"),
                index_file=os.path.join(temp_dir, "test.json"),
                text_field="nonexistent_field"
            )
        print("❌ Should have failed!")
    except ValidationError as e:
        print(f"✅ Caught field error: {str(e)[:60]}...")
    
    # 3. Empty data handling
    try:
        with tempfile.TemporaryDirectory() as temp_dir:
            result = MemvidAdapter.to_obj(
                [],  # Empty list
                video_file=os.path.join(temp_dir, "empty.mp4"),
                index_file=os.path.join(temp_dir, "empty.json")
            )
        print(f"✅ Empty data handled gracefully: {result}")
    except Exception as e:
        print(f"❌ Unexpected error with empty data: {e}")
    
    # 4. Missing video files for search
    try:
        search_config = {
            "video_file": "/nonexistent/path.mp4",
            "index_file": "/nonexistent/index.json",
            "query": "test"
        }
        MemvidAdapter.from_obj(Document, search_config, many=True)
        print("❌ Should have failed!")
    except ResourceError as e:
        print(f"✅ Caught resource error: {str(e)[:60]}...")
    
    print("\n💡 Always handle these exceptions in production code!")
    
else:
    print("❌ Memvid not available for error handling demo")
    print("\nKey exceptions to handle:")
    print("- ValidationError: Invalid parameters or model fields")
    print("- ResourceError: Missing video/index files")
    print("- ConnectionError: Memvid library issues")
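
In application code, separate try/except blocks like the ones above are often folded into one helper that turns expected failures into a value the caller can inspect. The following is a minimal sketch of that pattern, not a Pydapter API: `Outcome` and `run_guarded` are hypothetical names, and built-in exceptions stand in for `ValidationError` and `ResourceError` so the example runs without Memvid installed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple, Type


@dataclass
class Outcome:
    """Result of a guarded call: either a value or the error that was caught."""
    ok: bool
    value: Any = None
    error: Optional[BaseException] = None


def run_guarded(
    func: Callable[[], Any],
    expected: Tuple[Type[BaseException], ...],
) -> Outcome:
    """Call func(); capture only the expected exception types, let others propagate."""
    try:
        return Outcome(ok=True, value=func())
    except expected as exc:
        return Outcome(ok=False, error=exc)


def read_missing_index() -> str:
    """Stand-in for a search against files that do not exist."""
    with open("/nonexistent/index.json") as fh:
        return fh.read()


# Stand-ins: ValueError plays the role of ValidationError, OSError of ResourceError.
cases = [
    ("valid input", lambda: int("42")),
    ("invalid input", lambda: int("not a number")),
    ("missing file", read_missing_index),
]
for label, func in cases:
    outcome = run_guarded(func, expected=(ValueError, OSError))
    status = "ok" if outcome.ok else f"failed ({type(outcome.error).__name__})"
    print(f"{label}: {status}")
```

With Pydapter installed, the `expected` tuple would presumably be `(ValidationError, ResourceError)` from `pydapter.exceptions`, as imported in the cell above. Unexpected exception types still propagate, so genuine bugs are not silently swallowed.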

## Summary

This tutorial demonstrated the core capabilities of Pydapter's Memvid adapters:

### ✅ What We Learned:
1. **Basic Operations** - Encoding documents into video memory and searching
2. **Error Handling** - Robust exception handling patterns
3. **Integration** - Seamless Pydantic model support

### 🎯 Key Benefits:
- **Novel Storage** - Video-based encoding for efficient data storage
- **Semantic Search** - Find content by meaning, not just keywords
- **Pydantic Integration** - Seamless model conversion
- **Pulsar Streaming** - `AsyncPulsarMemvidAdapter` targets streaming workloads (checked for availability here, not demonstrated)

### 🚀 Next Steps:
- Explore different chunk sizes and overlap settings
- Try the Pulsar integration for enterprise streaming
- Integrate with your existing data pipelines
- Check out the [Memvid documentation](https://github.com/memvid/memvid)

### 📚 Related Resources:
- [Pydapter Documentation](https://github.com/khive-ai/pydapter)
- [Apache Pulsar](https://pulsar.apache.org/)
- [Pydantic Models](https://docs.pydantic.dev/)