# SmartGlass AI Agent - Advanced Development Notebook

This notebook is for developers who want to customize and extend the SmartGlass AI Agent.

## Topics Covered
- Custom model configurations
- Performance optimization
- Advanced multimodal scenarios
- Real-time processing pipelines
- Custom use case development

## 1. Setup

In [None]:
# Install dependencies
!pip install -q torch transformers openai-whisper pillow numpy soundfile scipy opencv-python librosa matplotlib

# Clone repository
!git clone https://github.com/farmountain/SmartGlass-AI-Agent.git
%cd SmartGlass-AI-Agent

import sys
sys.path.append('src')

print("✅ Setup complete!")

## 2. Custom Agent Configuration

Learn how to customize the agent for specific needs.

In [None]:
from smartglass_agent import SmartGlassAgent
from whisper_processor import WhisperAudioProcessor
from clip_vision import CLIPVisionProcessor
from gpt2_generator import GPT2TextGenerator

# Option 1: Quick initialization with defaults
agent_default = SmartGlassAgent()

import torch

# Fall back to CPU when no GPU is available (e.g., a CPU-only Colab runtime)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Option 2: Performance-optimized (faster but less accurate)
agent_fast = SmartGlassAgent(
    whisper_model="tiny",
    clip_model="openai/clip-vit-base-patch32",
    gpt2_model="gpt2",
    device=device
)

# Option 3: Accuracy-optimized (slower but more accurate)
agent_accurate = SmartGlassAgent(
    whisper_model="small",
    clip_model="openai/clip-vit-large-patch14",
    gpt2_model="gpt2-medium",
    device=device
)

print("✅ Custom agents initialized!")

## 3. Using Individual Components

Work with components separately for fine-grained control.

In [None]:
# Initialize individual components
audio = WhisperAudioProcessor(model_size="base")
vision = CLIPVisionProcessor()
language = GPT2TextGenerator()

print("✅ Individual components initialized!")

# Example: Custom vision processing
from PIL import Image
import numpy as np

# Random-noise placeholder; replace with a real camera frame for meaningful results
test_image = Image.fromarray(np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))

# Custom queries for specific domain
medical_queries = [
    "medical equipment",
    "patient monitoring device",
    "medical chart",
    "hospital environment"
]

result = vision.understand_image(test_image, medical_queries)
print(f"Medical context: {result['best_match']}")
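Assuming `understand_image` follows the standard CLIP zero-shot recipe (the notebook does not show its internals), `best_match` and `confidence` come from a softmax over scaled cosine similarities between the image embedding and each query's text embedding. The sketch below reproduces that scoring on plain Python lists. `zero_shot_scores` and its `all_scores` key are hypothetical, the toy 3-dimensional vectors stand in for real CLIP embeddings, and the scale of 100 mirrors the cap CLIP places on its learned logit scale.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def zero_shot_scores(image_emb, text_embs, labels, scale=100.0):
    """Hypothetical CLIP-style zero-shot scoring over a fixed list of queries."""
    logits = [scale * cosine(image_emb, t) for t in text_embs]
    # Numerically stable softmax: subtract the largest logit before exponentiating
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return {
        'best_match': labels[best],
        'confidence': probs[best],
        'all_scores': dict(zip(labels, probs)),
    }


# Toy embeddings standing in for real CLIP image/text features
toy_labels = ["medical equipment", "medical chart", "hospital environment"]
toy_text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_image_emb = [0.9, 0.3, 0.1]
toy_result = zero_shot_scores(toy_image_emb, toy_text_embs, toy_labels)
print(toy_result['best_match'], round(toy_result['confidence'], 3))
```

Because the probabilities are normalized across the queries you supply, `best_match` is always one of them, even for an unrelated image such as the random-noise placeholder. Including a "none of these" style query is one way to give the model an escape option.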

## 4. Real-Time Processing Pipeline

Simulate real-time processing for smart glasses.

In [None]:
import time
from queue import Queue

class RealtimeSmartGlassProcessor:
    """Simulates real-time processing pipeline for smart glasses."""
    
    def __init__(self, agent):
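        self.agent = agent
        # Bounded buffers for a threaded capture/processing design; this simple
        # demo calls process_frame/process_audio_chunk directly and leaves them unused
        self.frame_queue = Queue(maxsize=10)
        self.audio_queue = Queue(maxsize=10)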
        
    def process_frame(self, image):
        """Process a single video frame."""
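        start = time.perf_counter()  # monotonic, high-resolution timer for intervals
        
        # Quick scene analysis
        scene = self.agent.analyze_scene(image)
        
        elapsed = time.perf_counter() - start
        # Throughput estimate from this one frame (excludes capture/display overhead)
        fps = 1.0 / elapsed if elapsed > 0 else 0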
        
        return {
            'scene': scene,
            'processing_time': elapsed,
            'fps': fps
        }
    
    def process_audio_chunk(self, audio_chunk):
        """Process audio chunk."""
        start = time.perf_counter()
        
        # Transcribe audio
        text = self.agent.audio_processor.transcribe_realtime(audio_chunk)
        
        elapsed = time.perf_counter() - start
        
        return {
            'text': text,
            'processing_time': elapsed
        }

# Initialize processor
realtime = RealtimeSmartGlassProcessor(agent_fast)

# Test with sample frame
test_frame = Image.fromarray(np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))
result = realtime.process_frame(test_frame)

print(f"Processing time: {result['processing_time']:.3f}s")
print(f"Effective FPS: {result['fps']:.1f}")
print(f"Scene: {result['scene'].get('description', 'N/A')}")
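The `frame_queue` and `audio_queue` above hint at a threaded design, while the demo processes a frame synchronously. One common pattern for live video is a capture thread that pushes into a bounded queue which drops the oldest frame when full, so the processing thread always works on recent input. The sketch below uses only the standard library; `put_latest` and `run_worker` are hypothetical helpers, and `str.upper` stands in for a real call such as `realtime.process_frame`.

```python
import queue
import threading
import time


def put_latest(q, item):
    """Enqueue `item`; if `q` is full, discard the oldest entry first.

    Hypothetical helper: for live video the newest frame matters most, so
    stale frames are dropped instead of blocking the capture loop.
    """
    while True:
        try:
            q.put_nowait(item)
            return
        except queue.Full:
            try:
                q.get_nowait()  # drop the oldest queued frame
            except queue.Empty:
                pass  # the consumer emptied the queue meanwhile; retry the put


def run_worker(q, process, results):
    """Hypothetical consumer: process items until a None stop token arrives."""
    while True:
        item = q.get()
        if item is None:
            break
        start = time.perf_counter()
        output = process(item)
        results.append((item, output, time.perf_counter() - start))


frames = queue.Queue(maxsize=3)
processed = []
worker = threading.Thread(target=run_worker, args=(frames, str.upper, processed))
worker.start()
for i in range(10):
    put_latest(frames, f"frame{i}")  # stand-in for camera frames
frames.put(None)  # blocking put, so the stop token never evicts a frame
worker.join()
print(f"Processed {len(processed)} of 10 frames; last: {processed[-1][0]}")
```

How many frames get processed depends on thread timing, but the newest frame is always handled and the capture side never blocks on a slow model.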

## 5. Custom Use Case: Shopping Assistant

In [None]:
class ShoppingAssistant:
    """Smart glasses shopping assistant."""
    
    def __init__(self, agent):
        self.agent = agent
        self.shopping_list = []
        self.found_items = []
    
    def add_to_list(self, items):
        """Add items to shopping list."""
        self.shopping_list.extend(items)
        return f"Added {len(items)} items to your list"
    
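    def scan_product(self, image):
        """Scan the product in view and check it against the shopping list."""
        # Offer the list items themselves as candidate labels; matching only
        # generic categories ('groceries', ...) could never hit a list entry
        candidate_labels = self.shopping_list + [
            'groceries', 'electronics', 'clothing',
            'household items', 'food products', 'beverages'
        ]
        
        label = self.agent.identify_object(image, candidate_labels)
        
        # Count each list item only once so 'remaining' never goes negative
        if label in self.found_items:
            return f"Already found: {label}"
        if label in self.shopping_list:
            self.found_items.append(label)
            return f"✓ Found: {label} (on your list!)"
        return f"Scanned: {label} (not on your list)"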
    
    def get_recommendations(self, image):
        """Get product recommendations."""
        scene = self.agent.analyze_scene(image)
        
        prompt = f"Based on this shopping context: {scene.get('description', '')}, suggest related products."
        recommendation = self.agent.generate_response(prompt)
        
        return recommendation
    
    def check_list_status(self):
        """Check shopping list status."""
        remaining = len(self.shopping_list) - len(self.found_items)
        return {
            'total': len(self.shopping_list),
            'found': len(self.found_items),
            'remaining': remaining,
            'found_items': self.found_items
        }

# Demo
shopping = ShoppingAssistant(agent_fast)
print(shopping.add_to_list(['milk', 'bread', 'eggs', 'coffee']))

status = shopping.check_list_status()
print(f"Shopping list: {status['total']} items")
print(f"Found: {status['found']}, Remaining: {status['remaining']}")
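Items added by voice (through Whisper transcription) or labels from other sources may not match list entries character for character, for example "Coffee" versus "coffee" or a misheard "coffe". A small normalization step before comparing against the shopping list makes the check more forgiving. This sketch uses `difflib.get_close_matches` from the standard library; `match_to_list` and its 0.8 cutoff are illustrative assumptions, not part of the SmartGlass API.

```python
import difflib


def match_to_list(label, shopping_list, cutoff=0.8):
    """Return the shopping-list entry that best matches `label`, or None.

    Hypothetical helper: compares case- and whitespace-normalized strings,
    trying an exact match first and then a fuzzy match via difflib.
    """
    normalized = {item.strip().lower(): item for item in shopping_list}
    key = label.strip().lower()
    if key in normalized:
        return normalized[key]
    close = difflib.get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
    return normalized[close[0]] if close else None


example_list = ['milk', 'bread', 'eggs', 'coffee']
print(match_to_list('Milk', example_list))    # milk
print(match_to_list('coffe', example_list))   # coffee
print(match_to_list('laptop', example_list))  # None
```

A higher `cutoff` reduces false matches between similar-looking product names at the cost of missing more misspellings.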

## 6. Custom Use Case: Accessibility Assistant

In [None]:
class AccessibilityAssistant:
    """Smart glasses accessibility features for visually impaired users."""
    
    def __init__(self, agent):
        self.agent = agent
    
    def describe_surroundings(self, image):
        """Detailed description of surroundings."""
        # Get scene description
        scene = self.agent.analyze_scene(image)
        description = scene.get('description', 'Unable to analyze')
        
        # Generate detailed description
        prompt = f"Describe this scene in detail for a visually impaired person: {description}"
        detailed = self.agent.generate_response(prompt)
        
        return detailed
    
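    def detect_obstacles(self, image):
        """Report the most likely obstacle in view, or a clear path."""
        # Zero-shot matching always picks one of the labels, so include a
        # "clear path" option; otherwise every frame would produce a warning
        obstacles = [
            'clear path with no obstacles',
            'stairs', 'door', 'wall', 'furniture',
            'person', 'vehicle', 'curb', 'steps'
        ]
        
        detected = self.agent.identify_object(image, obstacles)
        if detected == 'clear path with no obstacles':
            return "✓ Path appears clear"
        return f"⚠️ Detected: {detected}"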
    
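    def read_text(self, image):
        """Check whether the view likely contains text (classification only, no OCR)."""
        text_types = [
            'scene with no visible text',
            'sign with text',
            'label',
            'menu',
            'document',
            'screen display'
        ]
        
        result = self.agent.vision_processor.understand_image(image, text_types)
        
        # Require a minimum confidence so weak matches are not reported as text
        if result['best_match'] != 'scene with no visible text' and result['confidence'] > 0.3:
            return f"Text detected: {result['best_match']}"
        else:
            return "No readable text detected"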
    
    def identify_person(self, image):
        """Detect if people are present."""
        people_queries = [
            'no people present',
            'one person',
            'multiple people',
            'crowd of people'
        ]
        
        result = self.agent.vision_processor.understand_image(image, people_queries)
        return f"👥 {result['best_match']}"

# Demo
accessibility = AccessibilityAssistant(agent_default)

test_image = Image.fromarray(np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))

print("Accessibility Features Demo:")
print("=" * 60)
print(f"\nObstacle detection: {accessibility.detect_obstacles(test_image)}")
print(f"Text detection: {accessibility.read_text(test_image)}")
print(f"People detection: {accessibility.identify_person(test_image)}")
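When these checks run on every frame, the same warning (for example, stairs) would be repeated many times per second. A per-message cooldown keeps spoken or haptic alerts useful. `AlertThrottle` below is a hypothetical helper, the 3-second default is an arbitrary illustrative value, and the clock is injectable so the logic can be exercised without waiting.

```python
import time


class AlertThrottle:
    """Suppress repeats of the same alert within a cooldown window (hypothetical helper)."""

    def __init__(self, cooldown_s=3.0, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._last_sent = {}  # alert text -> time it was last emitted

    def should_alert(self, message):
        """Return True if `message` has not been emitted within the cooldown window."""
        now = self.clock()
        last = self._last_sent.get(message)
        if last is not None and now - last < self.cooldown_s:
            return False
        self._last_sent[message] = now
        return True


# Simulated timestamps (seconds) instead of the real clock
fake_times = iter([0.0, 1.0, 2.5, 3.5])
throttle = AlertThrottle(cooldown_s=3.0, clock=lambda: next(fake_times))
for msg in ["⚠️ Detected: stairs", "⚠️ Detected: stairs", "⚠️ Detected: door", "⚠️ Detected: stairs"]:
    print(msg, "->", "speak" if throttle.should_alert(msg) else "suppressed")
```

Suppressed repeats do not restart the timer, so a persistent hazard is re-announced once per cooldown period rather than silenced indefinitely.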

## 7. Performance Benchmarking

In [None]:
import time
import matplotlib.pyplot as plt

def benchmark_agent(agent, num_iterations=10):
    """Benchmark agent performance."""
    
    results = {
        'vision': [],
        'text_generation': []
    }
    
    # Test image (random-noise placeholder)
    test_img = Image.fromarray(np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))
    
    # Warm-up calls so one-time costs (lazy loading, CUDA initialization) don't skew the first timing
    agent.analyze_scene(test_img)
    agent.generate_response("Test query")
    
    print(f"Running {num_iterations} iterations...")
    
    for i in range(num_iterations):
        # Vision processing
        start = time.perf_counter()
        agent.analyze_scene(test_img)
        results['vision'].append(time.perf_counter() - start)
        
        # Text generation
        start = time.perf_counter()
        agent.generate_response("Test query")
        results['text_generation'].append(time.perf_counter() - start)
        
        if (i + 1) % 5 == 0:
            print(f"Progress: {i + 1}/{num_iterations}")
    
    # Calculate statistics
    stats = {}
    for key, times in results.items():
        stats[key] = {
            'mean': np.mean(times),
            'std': np.std(times),
            'min': np.min(times),
            'max': np.max(times)
        }
    
    # Plot results
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    
    for idx, (key, times) in enumerate(results.items()):
        axes[idx].plot(times, marker='o')
        axes[idx].set_title(f'{key.replace("_", " ").title()} Performance')
        axes[idx].set_xlabel('Iteration')
        axes[idx].set_ylabel('Time (seconds)')
        axes[idx].grid(True, alpha=0.3)
        axes[idx].axhline(y=stats[key]['mean'], color='r', linestyle='--', label=f"Mean: {stats[key]['mean']:.3f}s")
        axes[idx].legend()
    
    plt.tight_layout()
    plt.show()
    
    return stats

# Run benchmark
print("Benchmarking fast agent...")
stats = benchmark_agent(agent_fast, num_iterations=10)

print("\nPerformance Statistics:")
print("=" * 60)
for component, metrics in stats.items():
    print(f"\n{component.upper()}:")
    for metric, value in metrics.items():
        print(f"  {metric}: {value:.3f}s")
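Mean and standard deviation can hide occasional slow frames, which matter most for a wearable that should feel responsive, so tail percentiles (p50/p95) are a common complement. The sketch below uses `statistics.quantiles` with the `inclusive` method so results stay within the observed min/max. `summarize_latencies` and its `warmup` parameter are hypothetical, the timings are illustrative values rather than measurements, and with only 10 iterations the p95 estimate is rough.

```python
import statistics


def summarize_latencies(times, warmup=0):
    """Summarize latency samples (seconds), optionally dropping warm-up runs.

    Hypothetical helper; needs at least two samples after warm-up.
    """
    samples = list(times)[warmup:]
    if len(samples) < 2:
        raise ValueError("need at least two samples after warm-up")
    # n=100 yields 99 cut points: cuts[49] is the 50th percentile, cuts[94] the 95th
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        'mean': statistics.fmean(samples),
        'p50': cuts[49],
        'p95': cuts[94],
        'max': max(samples),
    }


# Illustrative timings where the first run includes one-time startup cost
timings = [1.20, 0.11, 0.10, 0.12, 0.10, 0.35, 0.11, 0.10, 0.12, 0.11]
summary = summarize_latencies(timings, warmup=1)
print({k: round(v, 3) for k, v in summary.items()})
```

To apply this to the benchmark above, `benchmark_agent` would also need to return its raw `results` lists alongside `stats`.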

## 8. Export and Deployment

Tips for deploying the agent to edge devices.

In [None]:
print("Deployment Recommendations:")
print("=" * 60)
print("""
1. Edge Device Selection (rough starting points; benchmark on your own hardware):
   - Raspberry Pi 4 (8GB): Use 'tiny' models on CPU
   - Jetson Nano: Use 'base' models with GPU
   - Jetson Xavier: Use 'small' models with GPU

2. Model Optimization:
   - Export to ONNX for faster inference (e.g., with ONNX Runtime or TensorRT)
   - Apply INT8 quantization to reduce model size and latency
   - Load models once and keep them in memory instead of reloading per request

3. Power Management:
   - Batch frames where latency allows, and skip frames when needed
   - Use trigger-based activation (only process on command)
   - Implement sleep modes

4. Connectivity:
   - Process locally first
   - Offload complex queries to the cloud
   - Cache common responses

5. Real-time Optimization:
   - Use threading to capture and process frames in parallel
   - Skip frames (process every Nth frame)
   - Pre-warm models on startup
""")

# Example: Save agent configuration
import json

config = {
    "whisper_model": "base",
    "clip_model": "openai/clip-vit-base-patch32",
    "gpt2_model": "gpt2",
    "device": "cuda",
    "optimization": {
        "frame_skip": 3,
        "batch_size": 1,
        "use_fp16": True
    }
}

with open('agent_config.json', 'w') as f:
    json.dump(config, f, indent=2)

print("\n✅ Configuration saved to agent_config.json")
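The saved `agent_config.json` mixes constructor arguments with runtime settings. A sketch of reading it back: the top-level model fields become `SmartGlassAgent` keyword arguments, while `optimization` settings such as `frame_skip` drive the capture loop. `load_agent_config` and `should_process` are hypothetical helpers, and treating `frame_skip: 3` as "process one frame out of every three" is an assumption, since the config does not define it. The example round-trips a temporary file so it runs on its own; in the notebook you would call `load_agent_config('agent_config.json')`.

```python
import json
import os
import tempfile

# Keys that SmartGlassAgent accepts in this notebook's examples
AGENT_KEYS = ("whisper_model", "clip_model", "gpt2_model", "device")


def load_agent_config(path):
    """Split a saved config into SmartGlassAgent kwargs and runtime options."""
    with open(path) as f:
        config = json.load(f)
    agent_kwargs = {k: config[k] for k in AGENT_KEYS if k in config}
    optimization = config.get("optimization", {})
    return agent_kwargs, optimization


def should_process(frame_index, frame_skip):
    """Assumed semantics: process one frame out of every `frame_skip` frames."""
    return frame_skip <= 1 or frame_index % frame_skip == 0


example = {"whisper_model": "base", "device": "cpu", "optimization": {"frame_skip": 3}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "agent_config.json")
    with open(path, "w") as f:
        json.dump(example, f)
    agent_kwargs, optimization = load_agent_config(path)

print(agent_kwargs)
print([i for i in range(10) if should_process(i, optimization["frame_skip"])])
# In the notebook: agent = SmartGlassAgent(**agent_kwargs)
```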

## 9. Next Steps

Ideas for further development:

1. **Add More Modalities**
   - GPS/Location context
   - Accelerometer/Gyroscope data
   - Temperature/Light sensors

2. **Enhance Vision**
   - Add OCR (Optical Character Recognition)
   - Implement face recognition
   - Add QR code scanning

3. **Improve Language**
   - Use larger language models (LLaMA, Mistral)
   - Add text-to-speech output
   - Implement multi-turn dialogue

4. **Add Features**
   - Translation capabilities
   - Object tracking
   - Gesture recognition
   - Voice commands

5. **Optimize for Production**
   - Model compression
   - Edge deployment
   - Real-time streaming
   - Battery optimization