# Memvid Adapter Tutorial

This notebook demonstrates how to use Pydapter's Memvid adapters for video-based AI memory storage and retrieval.

## What is Memvid?

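Memvid stores text by splitting it into chunks, encoding each chunk into a frame of a video file (memvid uses QR codes for this), and building a vector index over embeddings of the chunks. This provides:
- Storage that relies on standard video codecs for compression
- Semantic search over the chunk embeddings
- A portable, file-based persistence format: one video file plus one index file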
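To make the idea concrete, here is a toy, pure-Python model of that pipeline: chunks go into numbered "frames", an index holds one vector per frame, and a query returns the top-k most similar chunks. It is an illustration only, under loud assumptions: a Python list stands in for the video file (no QR codes or frames are generated), and word-count cosine similarity stands in for the sentence-transformer embeddings and FAISS index that memvid uses (the encoding run below reports `all-MiniLM-L6-v2` and a `Flat` index). `ToyVideoMemory`, `embed` and `cosine` are hypothetical names, not memvid or Pydapter APIs.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a sentence encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVideoMemory:
    """Hypothetical stand-in: one chunk per 'frame' plus a vector index keyed by frame number."""

    def __init__(self) -> None:
        self.frames: list[str] = []     # plays the role of the video file
        self.index: list[Counter] = []  # plays the role of the embedding index

    def encode(self, chunks: list[str]) -> int:
        """Store each chunk in its own frame and index its vector; return the frame count."""
        for chunk in chunks:
            self.frames.append(chunk)
            self.index.append(embed(chunk))
        return len(self.frames)

    def search(self, query: str, top_k: int = 2) -> list[str]:
        """Rank frames by similarity to the query and return the top_k chunks."""
        q = embed(query)
        ranked = sorted(range(len(self.index)), key=lambda i: cosine(q, self.index[i]), reverse=True)
        return [self.frames[i] for i in ranked[:top_k]]


memory = ToyVideoMemory()
memory.encode([
    "Machine learning enables computers to learn from data.",
    "Python is a versatile programming language.",
])
print(memory.search("python programming", top_k=1))  # → ['Python is a versatile programming language.']
```

The real system differs mainly in the two stand-ins: learned embeddings rank by meaning rather than shared words, and the "frames" are actual video frames that must be decoded to recover the text.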

In [6]:
# Import libraries and check availability
from pydantic import BaseModel
from typing import List
import tempfile
import os

# Check Memvid availability
try:
    from pydapter.extras.memvid_ import MemvidAdapter
    print("✅ Basic Memvid adapter available")
    memvid_available = True
except ImportError as e:
    print(f"❌ Memvid not available: {e}")
    print("Install with: pip install memvid")
    memvid_available = False

# Check Pulsar integration
try:
    from pydapter.extras.async_memvid_pulsar import AsyncPulsarMemvidAdapter
    print("✅ Pulsar Memvid adapter available")
    pulsar_available = True
except ImportError:
    print("❌ Pulsar integration not available")
    pulsar_available = False

print("\nFeatures available:")
print(f"- Basic Memvid: {memvid_available}")
print(f"- Pulsar streaming: {pulsar_available}")

✅ Basic Memvid adapter available
✅ Pulsar Memvid adapter available

Features available:
- Basic Memvid: True
- Pulsar streaming: True
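The try/except probes above follow the usual optional-dependency pattern. If you need the same check in several places, a small helper keeps it in one spot. This is a generic sketch that uses only `importlib` from the standard library; `optional_import` is a hypothetical name, not a Pydapter function.

```python
import importlib
from typing import Any, Optional


def optional_import(module_path: str, attr: str) -> Optional[Any]:
    """Return module_path.attr if it can be imported, otherwise None."""
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        # Covers a missing module and a module whose own dependencies are missing
        return None
    return getattr(module, attr, None)


# In this notebook it would be used like:
# MemvidAdapter = optional_import("pydapter.extras.memvid_", "MemvidAdapter")
# memvid_available = MemvidAdapter is not None
print(optional_import("json", "dumps") is not None)       # → True
print(optional_import("no_such_module_xyz", "anything"))  # → None
```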


In [7]:
# Define data models
from pydapter.core import Adaptable

class Document(Adaptable, BaseModel):
    """A document with text content."""
    id: str
    title: str
    text: str
    category: str
    tags: List[str] = []

# For search results, we need a simpler model since memvid only returns text chunks
class SearchResult(Adaptable, BaseModel):
    """A search result containing found text."""
    text: str
    id: str = "0"  # Default value since memvid assigns sequential IDs

# Register adapters
Document.register_adapter(MemvidAdapter)
SearchResult.register_adapter(MemvidAdapter)

# Create sample documents
documents = [
    Document(
        id="doc1",
        title="Machine Learning Basics",
        text="Machine learning enables computers to learn from data without explicit programming. It uses algorithms to identify patterns.",
        category="AI",
        tags=["ml", "ai"]
    ),
    Document(
        id="doc2",
        title="Python Programming",
        text="Python is a versatile programming language popular in data science and web development.",
        category="Programming",
        tags=["python", "programming"]
    ),
    Document(
        id="doc3", 
        title="Data Analysis",
        text="Data analysis involves examining datasets to discover insights and patterns that inform decision making.",
        category="Data Science",
        tags=["data", "analysis"]
    )
]

print(f"Created {len(documents)} documents:")
for doc in documents:
    print(f"- {doc.title} ({doc.category})")

Created 3 documents:
- Machine Learning Basics (AI)
- Python Programming (Programming)
- Data Analysis (Data Science)
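Judging by the search output later in this notebook, only the field named by `text_field` is encoded: `title`, `category` and `tags` never appear in the retrieved chunks. The sketch below mimics that selection step, including a check that mirrors the adapter's `Text field '...' not found in model` error. It is a hypothetical helper: a stdlib dataclass stands in for the Pydantic `Document` model, and it raises `ValueError` where the real adapter raises Pydapter's `ValidationError`.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Doc:
    """Stand-in for the Pydantic Document model above (stdlib only, for illustration)."""
    id: str
    title: str
    text: str
    category: str
    tags: list[str] = field(default_factory=list)


def texts_for_encoding(items: list, text_field: str = "text") -> list[str]:
    """Hypothetical helper: pull the field to be encoded out of each model instance."""
    if not items:
        return []  # nothing to encode
    names = {f.name for f in fields(items[0])}
    if text_field not in names:
        # The real adapter reports this case as a ValidationError
        raise ValueError(f"Text field '{text_field}' not found in model")
    return [str(getattr(item, text_field)) for item in items]


docs = [Doc("doc1", "Machine Learning Basics", "Machine learning enables computers ...", "AI")]
print(texts_for_encoding(docs))  # → ['Machine learning enables computers ...']
```

This is also why the `SearchResult` model only needs a `text` field: the chunks coming back from the index carry the encoded text, not the other model fields.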


## Basic Memvid Usage

The cell below encodes the three sample documents into a video file plus a JSON index with `MemvidAdapter.to_obj`, then runs semantic queries against that memory with `MemvidAdapter.from_obj`:

In [8]:
# Basic Memvid usage demo
if memvid_available:
    print("🎬 Creating video memory from documents...")
    
    with tempfile.TemporaryDirectory() as temp_dir:
        video_file = os.path.join(temp_dir, "knowledge_base.mp4")
        index_file = os.path.join(temp_dir, "knowledge_index.json")
        
        print(f"Video file: {video_file}")
        print(f"Index file: {index_file}")
        
        try:
            # Encode documents into video memory
            result = MemvidAdapter.to_obj(
                documents,
                video_file=video_file,
                index_file=index_file,
                text_field="text",  # Field containing text to encode
                chunk_size=256,     # Size of text chunks
                overlap=32          # Overlap between chunks
            )
            
            print(f"\n✅ Encoding successful!")
            print(f"Result: {result}")
            
            # Check if files were created
            if os.path.exists(video_file) and os.path.exists(index_file):
                video_size = os.path.getsize(video_file)
                index_size = os.path.getsize(index_file)
                print(f"\n📊 File sizes:")
                print(f"Video: {video_size:,} bytes")
                print(f"Index: {index_size:,} bytes")
                
                # Search the video memory
                print(f"\n🔍 Searching video memory...")
                
                search_queries = [
                    "machine learning algorithms",
                    "python programming", 
                    "data analysis patterns"
                ]
                
                for query in search_queries:
                    print(f"\nQuery: '{query}'")
                    
                    search_config = {
                        "video_file": video_file,
                        "index_file": index_file,
                        "query": query,
                        "top_k": 2
                    }
                    
                    try:
                        # Use SearchResult model for search results
                        results = MemvidAdapter.from_obj(SearchResult, search_config, many=True)
                        print(f"Found {len(results)} results:")
                        
                        for i, hit in enumerate(results, 1):
                            # Display the found text, truncated to 100 characters
                            text_preview = hit.text[:100] + "..." if len(hit.text) > 100 else hit.text
                            print(f"  {i}. {text_preview}")
                            
                    except Exception as e:
                        print(f"Search failed: {e}")
                        
                # Demonstrate single result search
                print(f"\n🎯 Single result search:")
                try:
                    single_config = {
                        "video_file": video_file,
                        "index_file": index_file,
                        "query": "python",
                        "top_k": 1
                    }
                    single_result = MemvidAdapter.from_obj(SearchResult, single_config, many=False)
                    print(f"Best match: {single_result.text[:80]}...")
                except Exception as e:
                    print(f"Single search failed: {e}")
                    
            else:
                print("❌ Video files were not created")
                
        except Exception as e:
            print(f"❌ Encoding failed: {e}")
            print("This may mean the memvid library is not installed correctly")
            
else:
    print("❌ Memvid not available - please install with:")
    print("   pip install pydapter[memvid]")
    print("\nWhat you would see:")
    print("- Documents encoded into video files")
    print("- Semantic search returning relevant text chunks")
    print("- Efficient storage and retrieval")

🎬 Creating video memory from documents...
Video file: /var/folders/5p/rcbw097d29j3s2qt861tsjfh0000gn/T/tmpk_qf158y/knowledge_base.mp4
Index file: /var/folders/5p/rcbw097d29j3s2qt861tsjfh0000gn/T/tmpk_qf158y/knowledge_index.json


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


🐛 FRAMES: 3 files in /var/folders/5p/rcbw097d29j3s2qt861tsjfh0000gn/T/tmpve8sztlq/frames
🐛 FFMPEG: frames=/var/folders/5p/rcbw097d29j3s2qt861tsjfh0000gn/T/tmpve8sztlq/frames → docker_mount=/var/folders/5p/rcbw097d29j3s2qt861tsjfh0000gn/T/tmpve8sztlq

✅ Encoding successful!
Result: {'encoded_count': 3, 'video_file': '/var/folders/5p/rcbw097d29j3s2qt861tsjfh0000gn/T/tmpk_qf158y/knowledge_base.mp4', 'index_file': '/var/folders/5p/rcbw097d29j3s2qt861tsjfh0000gn/T/tmpk_qf158y/knowledge_index.json', 'backend': 'opencv', 'codec': 'mp4v', 'total_frames': 3, 'video_size_mb': 4.1961669921875e-05, 'fps': 15, 'duration_seconds': 0.2, 'total_chunks': 3, 'index_stats': {'total_chunks': 3, 'total_frames': 3, 'index_type': 'Flat', 'embedding_model': 'all-MiniLM-L6-v2', 'dimension': 384, 'avg_chunks_per_frame': 1.0}}

📊 File sizes:
Video: 96,051 bytes
Index: 1,595 bytes

🔍 Searching video memory...

Query: 'machine learning algorithms'
Found 2 results:
  1. Machine learning enables computers to learn from data without explicit programming. It uses algorith...
  2. Data analysis involves examining datasets to discover insights and patterns that inform decision mak...

Query: 'python programming'
Found 2 results:
  1. Python is a versatile programming language popular in data science and web development.
  2. Machine learning enables computers to learn from data without explicit programming. It uses algorith...

Query: 'data analysis patterns'
Found 2 results:
  1. Data analysis involves examining datasets to discover insights and patterns that inform decision mak...
  2. Machine learning enables computers to learn from data without explicit programming. It uses algorith...

🎯 Single result search:
Best match: Python is a versatile programming language popular in data science and web devel...

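The `huggingface/tokenizers` warnings in the output above are harmless but noisy: the embedding model's tokenizer had already been used when the process forked. As the warning itself suggests, setting `TOKENIZERS_PARALLELISM` explicitly silences it. A minimal sketch; it has to run before the tokenizer is first used, so it belongs at the top of the notebook:

```python
import os

# Must be set before the tokenizer is first used (i.e. before any encoding or search).
# "false" disables tokenizer parallelism; setdefault keeps any value you already exported.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
print(os.environ["TOKENIZERS_PARALLELISM"])
```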

## Error Handling

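The adapters report failures through Pydapter's own exception types (`ValidationError`, `ResourceError` and `ConnectionError` from `pydapter.exceptions`). The cell below triggers each failure mode on purpose so you can see which exception to catch: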

In [9]:
# Error handling demo
if memvid_available:
    from pydapter.exceptions import ValidationError, ResourceError, ConnectionError
    
    print("🛡️ Testing error handling scenarios:")
    
    # 1. Missing required parameters
    try:
        MemvidAdapter.to_obj(documents, video_file="", index_file="test.json")
        print("❌ Should have failed!")
    except ValidationError as e:
        print(f"✅ Caught validation error: {str(e)[:60]}...")
    
    # 2. Wrong field name
    try:
        with tempfile.TemporaryDirectory() as temp_dir:
            MemvidAdapter.to_obj(
                documents,
                video_file=os.path.join(temp_dir, "test.mp4"),
                index_file=os.path.join(temp_dir, "test.json"),
                text_field="nonexistent_field"
            )
        print("❌ Should have failed!")
    except ValidationError as e:
        print(f"✅ Caught field error: {str(e)[:60]}...")
    
    # 3. Empty data handling
    try:
        with tempfile.TemporaryDirectory() as temp_dir:
            result = MemvidAdapter.to_obj(
                [],  # Empty list
                video_file=os.path.join(temp_dir, "empty.mp4"),
                index_file=os.path.join(temp_dir, "empty.json")
            )
        print(f"✅ Empty data handled gracefully: {result}")
    except Exception as e:
        print(f"❌ Unexpected error with empty data: {e}")
    
    # 4. Missing video/index files for search (ResourceError or ConnectionError)
    try:
        search_config = {
            "video_file": "/nonexistent/path.mp4",
            "index_file": "/nonexistent/index.json",
            "query": "test"
        }
        MemvidAdapter.from_obj(SearchResult, search_config, many=True)
        print("❌ Should have failed!")
    except (ResourceError, ConnectionError) as e:
        # Both exceptions are possible depending on the specific error
        print(f"✅ Caught file/connection error: {str(e)[:60]}...")
    
    # 5. Empty memory (no search is actually run here)
    try:
        with tempfile.TemporaryDirectory() as temp_dir:
            # Encode an empty document list; the adapter reports encoded_count == 0
            empty_result = MemvidAdapter.to_obj(
                [],
                video_file=os.path.join(temp_dir, "empty.mp4"),
                index_file=os.path.join(temp_dir, "empty.json")
            )
            
            if empty_result['encoded_count'] == 0:
                print("✅ Empty memory created for testing no results scenario")
                # A search that finds no results is expected to raise ResourceError
            
    except Exception as e:
        print(f"Note: {e}")
    
    print("\n💡 Always handle these exceptions in production code:")
    print("- ValidationError: Invalid parameters or model fields")
    print("- ResourceError: Missing files or no search results")
    print("- ConnectionError: Memvid library or FAISS issues")
    
else:
    print("❌ Memvid not available for error handling demo")
    print("\nKey exceptions to handle:")
    print("- ValidationError: Invalid parameters or model fields")
    print("- ResourceError: Missing video/index files")
    print("- ConnectionError: Memvid library issues")

🛡️ Testing error handling scenarios:
✅ Caught validation error: Missing required parameter 'video_file'...
✅ Caught field error: Text field 'nonexistent_field' not found in model...
✅ Empty data handled gracefully: {'encoded_count': 0}
✅ Caught file/connection error: Failed to create MemvidRetriever: Error in faiss::FileIORead...
✅ Empty memory created for testing no results scenario

💡 Always handle these exceptions in production code:
- ValidationError: Invalid parameters or model fields
- ResourceError: Missing files or no search results
- ConnectionError: Memvid library or FAISS issues
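The cell above catches each exception in isolation. In application code you usually decide per exception type: a failed lookup during search can often degrade to an empty result, while `ValidationError` signals a bug in the call and should propagate. Below is a sketch of that policy. To stay runnable without Pydapter installed, it defines stand-in exception classes and takes the search function as a parameter; in real code you would import the exceptions from `pydapter.exceptions` and pass something like `lambda cfg: MemvidAdapter.from_obj(SearchResult, cfg, many=True)`. Since the missing-file case above surfaced through the retriever, you may also want to add `ConnectionError` to the caught types. `search_or_empty` is a hypothetical name.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


# Stand-ins so the sketch runs on its own; replace with the classes from pydapter.exceptions.
class ResourceError(Exception):
    pass


class ValidationError(Exception):
    pass


def search_or_empty(search: Callable[[dict], list], config: dict) -> list:
    """Run a search; treat missing resources as 'no results', let other errors propagate."""
    try:
        return search(config)
    except ResourceError as exc:
        # Missing video/index files or no hits: log and degrade gracefully
        logger.warning("Memvid search returned nothing: %s", exc)
        return []
    # ValidationError (bad parameters) is deliberately not caught


def missing_files(config: dict) -> list:
    """Fake search that fails the way a missing video file would."""
    raise ResourceError(f"File not found: {config['video_file']}")


print(search_or_empty(missing_files, {"video_file": "/nonexistent/path.mp4"}))  # → []
```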


## Summary

This tutorial demonstrated the core capabilities of Pydapter's Memvid adapters:

### ✅ What We Learned:
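1. **Basic Operations** - Encoding documents into video memory with `MemvidAdapter.to_obj` and searching it with `MemvidAdapter.from_obj`
2. **Error Handling** - Which exception (`ValidationError`, `ResourceError`, `ConnectionError`) each failure mode raises
3. **Integration** - Registering `MemvidAdapter` on Pydantic models and mapping search hits to a dedicated `SearchResult` model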

### 🎯 Key Benefits:
- **Novel Storage** - Text chunks stored as frames of an ordinary video file, compressed by standard codecs
- **Semantic Search** - Find content by meaning, not just keywords
- **Pydantic Integration** - Seamless model conversion
- **Streaming Integration** - `AsyncPulsarMemvidAdapter` for Apache Pulsar-based pipelines (not covered in this notebook)

### 🚀 Next Steps:
- Explore different chunk sizes and overlap settings
- Try the Pulsar integration for enterprise streaming
- Integrate with your existing data pipelines
- Check out the [Memvid documentation](https://github.com/memvid/memvid)
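As a starting point for experimenting with `chunk_size` and `overlap`, here is a sliding-window chunker. It encodes an assumption about how the two parameters interact (fixed-size character windows, each starting `chunk_size - overlap` characters after the previous one); memvid's own chunker may differ in details such as snapping to sentence boundaries. It does account for the `total_chunks: 3` reported above: each sample document is well under 256 characters, so each becomes a single chunk.

```python
def chunk_text(text: str, chunk_size: int = 256, overlap: int = 32) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A sample document shorter than 256 characters yields exactly one chunk
print(len(chunk_text("Python is a versatile programming language popular in data science and web development.")))  # → 1

# A 600-character text: windows start at 0, 224 and 448
print([len(c) for c in chunk_text("x" * 600)])  # → [256, 256, 152]
```

Larger overlaps repeat more text across neighbouring chunks (more frames, more index entries) in exchange for a lower chance of splitting a relevant passage across a boundary.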

### 📚 Related Resources:
- [Pydapter Documentation](https://github.com/khive-ai/pydapter)
- [Apache Pulsar](https://pulsar.apache.org/)
- [Pydantic Models](https://docs.pydantic.dev/)


