# SmartGlass AI Agent - Meta Ray-Ban Testing Notebook

This notebook provides a complete environment for testing the SmartGlass AI Agent with Meta Ray-Ban smart glasses.

## Features
- 🎤 **Whisper**: Speech-to-text transcription
- 👁️ **CLIP**: Vision-language understanding
- 💬 **GPT-2**: Natural language generation

## Compatible with:
- Meta Ray-Ban Wayfarer
- Meta Ray-Ban Stories
- Any smart glasses with camera and audio capabilities

## 1. Setup and Installation

First, let's install all required dependencies.

In [None]:
# Install dependencies
!pip install -q torch transformers openai-whisper pillow numpy soundfile scipy opencv-python librosa matplotlib

print("✅ All dependencies installed successfully!")

### Clone the Repository

Clone the SmartGlass AI Agent repository to access the code.

In [None]:
# Clone the repository
!git clone https://github.com/farmountain/SmartGlass-AI-Agent.git
%cd SmartGlass-AI-Agent

# Add src to Python path
import sys
sys.path.append('src')

print("✅ Repository cloned and path configured!")
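Before importing, you can check that the modules actually resolve from the cloned `src` directory. The helper below, `check_modules`, is a hypothetical convenience and is not part of the repository. It is built on the standard library's `importlib.util.find_spec`, and the module names passed to it are the ones imported in the next cell.

```python
import importlib.util

def check_modules(names):
    """Return a dict mapping each top-level module name to True if Python can locate it."""
    # find_spec returns None when a top-level module cannot be found on sys.path
    return {name: importlib.util.find_spec(name) is not None for name in names}

status = check_modules(
    ["smartglass_agent", "whisper_processor", "clip_vision", "gpt2_generator"]
)
for name, found in status.items():
    print(f"{'✅' if found else '❌'} {name}")
```

A ❌ usually means the `%cd` into the repository did not happen or `src` was not added to `sys.path`.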

### Import Modules

Import the SmartGlass AI Agent components.

In [None]:
from smartglass_agent import SmartGlassAgent
from whisper_processor import WhisperAudioProcessor
from clip_vision import CLIPVisionProcessor
from gpt2_generator import GPT2TextGenerator

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from IPython.display import Audio, display

print("✅ All modules imported successfully!")

## 2. Initialize SmartGlass AI Agent

Initialize the agent with settings suited to the Colab environment.

In [None]:
# Initialize the SmartGlass AI Agent
agent = SmartGlassAgent(
    whisper_model="base",  # Good balance of speed and accuracy
    clip_model="openai/clip-vit-base-patch32",
    gpt2_model="gpt2",
    device=None  # Auto-detect GPU if available
)

print("\n✅ SmartGlass AI Agent ready!")
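Passing `device=None` asks the agent to auto-detect a GPU. The repository's exact logic is not shown in this notebook. The sketch below uses the common PyTorch idiom (`torch.cuda.is_available()`) wrapped in a hypothetical `resolve_device` helper, and it only approximates what the agent may do internally.

```python
def resolve_device(device=None):
    """Return the requested device, or 'cuda'/'cpu' depending on what is available."""
    if device is not None:
        return device  # an explicit choice always wins
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed: fall back to CPU
    return "cuda" if torch.cuda.is_available() else "cpu"

print(f"Resolved device: {resolve_device()}")
```

If this prints `cpu` in Colab, switch the runtime to a GPU before re-running the agent cell.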

### Display Agent Information

In [None]:
# Display agent components information
info = agent.get_agent_info()

print("Agent Components:")
print("=" * 60)
for component, details in info.items():
    print(f"\n{component.upper()}:")
    for key, value in details.items():
        print(f"  {key}: {value}")

## 3. Test Individual Components

Test each component separately before full integration.

### 3.1 Test Vision (CLIP)

In [None]:
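# Create a random-noise test image (upload your own in the next cell)
def create_test_image():
    """Create a 224x224 RGB image of random noise for smoke-testing the pipeline."""
    # randint's upper bound is exclusive, so 256 covers the full 0-255 range
    img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
    return Image.fromarray(img)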

# Create or upload an image
test_image = create_test_image()

# Display the image
plt.figure(figsize=(5, 5))
plt.imshow(test_image)
plt.axis('off')
plt.title('Test Image')
plt.show()

# Test scene understanding
print("\nScene Analysis:")
print("=" * 60)
scene_result = agent.analyze_scene(test_image)
print(scene_result.get('description', 'No description available'))

# Test object classification
print("\nObject Classification:")
print("=" * 60)
objects = ['person', 'car', 'building', 'nature', 'abstract pattern']
identified = agent.identify_object(test_image, objects)
print(f"Identified as: {identified}")
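CLIP classifies an image against a list of labels by comparing embeddings. It computes the cosine similarity between the image embedding and each label's text embedding, multiplies by a learned scale (about 100 in the released checkpoints), and applies a softmax. In Hugging Face `transformers`, `CLIPModel` exposes this as `outputs.logits_per_image.softmax(dim=1)`. The pure-Python sketch below reproduces only the arithmetic. The 3-D vectors are made up for illustration, and whether `identify_object` works exactly this way is an assumption.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_embedding, label_embeddings, logit_scale=100.0):
    """CLIP-style zero-shot classification over a dict of label -> text embedding."""
    labels = list(label_embeddings)
    # Scaled cosine similarities act as logits
    logits = [logit_scale * cosine_similarity(image_embedding, label_embeddings[l]) for l in labels]
    # Numerically stable softmax: subtract the largest logit before exponentiating
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs

# Toy 3-D embeddings (invented; real CLIP ViT-B/32 embeddings are 512-D)
image_vec = [1.0, 0.2, 0.0]
text_vecs = {"person": [1.0, 0.0, 0.0], "car": [0.0, 1.0, 0.0], "building": [0.0, 0.0, 1.0]}
best, probs = zero_shot_classify(image_vec, text_vecs)
print(best, {k: round(v, 3) for k, v in probs.items()})
```

The softmax makes the probabilities sum to 1, so some label always wins. This is why the random-noise test image above still gets "identified" as something.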

### Upload Your Own Image (from Meta Ray-Ban)

In [None]:
from google.colab import files
from PIL import Image
import io

# Upload an image from your Meta Ray-Ban smart glasses
print("Upload an image captured from your Meta Ray-Ban smart glasses:")
uploaded = files.upload()

# Process the first uploaded image
if uploaded:
    image_name = list(uploaded.keys())[0]
    image_data = uploaded[image_name]
    rayban_image = Image.open(io.BytesIO(image_data)).convert("RGB")  # drop alpha/palette modes before CLIP
    
    # Display the image
    plt.figure(figsize=(8, 6))
    plt.imshow(rayban_image)
    plt.axis('off')
    plt.title('Meta Ray-Ban Captured Image')
    plt.show()
    
    # Analyze the scene
    print("\nAnalyzing Meta Ray-Ban image...")
    print("=" * 60)
    scene_analysis = agent.analyze_scene(rayban_image)
    print(scene_analysis.get('description', 'No description'))
    
    print("\n✅ Image analysis complete!")
else:
    print("No image uploaded.")
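`files.upload()` returns a dict that maps each uploaded filename to its raw bytes. The cell above takes the first key, which can be the wrong kind of file if you upload several at once. The hypothetical helper below selects the first upload with a matching extension. The extension sets are assumptions: `IMAGE_EXTENSIONS` lists formats Pillow commonly opens, and `AUDIO_EXTENSIONS` lists formats Whisper can decode through ffmpeg.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".mp4"}

def first_matching_upload(uploaded, extensions):
    """Return (name, data) for the first upload whose suffix is in `extensions`, else None."""
    for name, data in uploaded.items():
        if Path(name).suffix.lower() in extensions:  # case-insensitive, so .JPG matches
            return name, data
    return None

# Example with the upload from the previous cell:
# match = first_matching_upload(uploaded, IMAGE_EXTENSIONS)
# if match:
#     image_name, image_data = match
```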

### 3.2 Test Audio (Whisper)

Test speech-to-text transcription.

In [None]:
# Upload an audio file from Meta Ray-Ban
from google.colab import files  # imported again so this cell also runs on its own

print("Upload an audio file recorded from your Meta Ray-Ban smart glasses:")
uploaded_audio = files.upload()

if uploaded_audio:
    audio_name = list(uploaded_audio.keys())[0]
    
    # Transcribe the audio
    print(f"\nTranscribing audio: {audio_name}")
    print("=" * 60)
    
    transcription = agent.process_audio_command(audio_name)
    
    print(f"Transcribed text: {transcription}")
    print("\n✅ Audio transcription complete!")
else:
    print("No audio file uploaded.")
    print("\nExample usage:")
    print("  text = agent.process_audio_command('audio_file.wav')")
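Whisper operates on 16 kHz mono audio. The `openai-whisper` package's `load_audio` uses ffmpeg to decode and resample input, so other rates and formats are converted automatically. Inspecting a recording first can still catch empty or truncated files. The hypothetical `describe_wav` helper below uses the standard library's `wave` module, which reads only uncompressed PCM WAV files.

```python
import wave

def describe_wav(path):
    """Summarize a PCM WAV file and flag whether it already matches Whisper's 16 kHz mono input."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        frames = wav_file.getnframes()
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_s": frames / sample_rate if sample_rate else 0.0,
        "whisper_native": sample_rate == 16000 and channels == 1,
    }

# Example (assumes the upload in the previous cell was a .wav file):
# print(describe_wav(audio_name))
```

A duration near zero means there is nothing for Whisper to transcribe.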

### 3.3 Test Text Generation (GPT-2)

In [None]:
# Test text generation
print("Testing GPT-2 Text Generation")
print("=" * 60)

# Test query
test_query = "What are smart glasses used for?"
print(f"\nQuery: {test_query}")

# Generate response
response = agent.generate_response(test_query)
print(f"\nResponse: {response}")

print("\n✅ Text generation test complete!")
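GPT-2 is a base language model, not an instruction-tuned assistant, so its answers depend heavily on how the next tokens are sampled. This notebook does not show which settings `generate_response` uses. Hugging Face's `generate()` exposes `do_sample`, `temperature`, `top_k`, and `top_p` for this. The sketch below illustrates temperature scaling plus top-k sampling over a made-up 4-token vocabulary. `sample_next_token` is a hypothetical name.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=50, rng=None):
    """Pick a token index from raw logits using temperature scaling and top-k filtering."""
    rng = rng or random.Random()
    # Keep only the indices of the k highest-scoring tokens
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature below 1 sharpens the distribution; above 1 flattens it
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # unnormalized softmax weights
    return rng.choices(top, weights=weights, k=1)[0]

toy_logits = [2.0, 0.5, 1.0, -1.0]  # invented scores for a 4-token vocabulary
rng = random.Random(0)
print([sample_next_token(toy_logits, temperature=0.7, top_k=2, rng=rng) for _ in range(10)])
```

With `top_k=1` this reduces to greedy decoding, which is deterministic but prone to repetition in GPT-2.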

## 4. Full Multimodal Integration

Test the complete SmartGlass experience with audio, vision, and language.

### Scenario 1: Visual Question Answering

In [None]:
# Scenario: User asks about what they're seeing
print("Scenario: Visual Question Answering")
print("=" * 60)

# Use previously uploaded image or create a test one
if 'rayban_image' in locals():
    query_image = rayban_image
else:
    query_image = create_test_image()

# Display the image
plt.figure(figsize=(6, 4))
plt.imshow(query_image)
plt.axis('off')
plt.title('What the smart glasses see')
plt.show()

# User asks a question
user_question = "What do you see in this image?"
print(f"\nUser: {user_question}")

# Get response with visual context
result = agent.help_identify(query_image, text_query=user_question)
print(f"\nAgent: {result}")

### Scenario 2: Complete Multimodal Query

In [None]:
# Scenario: Combine a text command (standing in for a transcribed voice command) with visual input
print("Scenario: Multimodal Query (Text + Vision)")
print("=" * 60)

# Simulate a complete multimodal interaction
text_command = "Describe what I'm looking at and tell me if it's interesting"

# Process multimodal query
result = agent.process_multimodal_query(
    text_query=text_command,
    image_input=query_image
)

print(f"\nQuery: {result['query']}")
print(f"\nVisual Context: {result['visual_context']}")
print(f"\nResponse: {result['response']}")
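The `visual_context` field suggests the image is first turned into text and then handed to GPT-2 together with the query. The repository's actual prompt template is not shown here. The hypothetical `build_multimodal_prompt` below illustrates that general pattern: prior turns, then the scene description, then the question, ending with a cue for the model to continue.

```python
def build_multimodal_prompt(text_query, visual_context, history=()):
    """Assemble a plain-text prompt that a base language model such as GPT-2 can continue."""
    lines = [f"{speaker}: {utterance}" for speaker, utterance in history]  # prior turns, oldest first
    lines.append(f"Scene: {visual_context}")  # e.g. a caption or the top CLIP label
    lines.append(f"User: {text_query}")
    lines.append("Assistant:")  # the model generates its reply from here
    return "\n".join(lines)

print(build_multimodal_prompt(
    "Describe what I'm looking at and tell me if it's interesting",
    "a photo of a city street",  # placeholder for result['visual_context']
))
```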

### Scenario 3: Interactive Conversation

In [None]:
# Scenario: Have a conversation with the agent
print("Scenario: Interactive Conversation")
print("=" * 60)

# Clear previous conversation
agent.clear_conversation_history()

# Have a conversation
conversation = [
    "Hello! Can you help me?",
    "What time is it?",
    "Thanks for your help!"
]

print("\nConversation:")
print("-" * 60)

for user_msg in conversation:
    print(f"\n👤 User: {user_msg}")
    response = agent.generate_response(user_msg)
    print(f"🤖 Agent: {response}")

# Show conversation history
print("\n" + "=" * 60)
print("Conversation History:")
print("=" * 60)
history = agent.get_conversation_history()
for entry in history:
    print(entry)
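GPT-2's context window is 1024 tokens, so a long conversation history eventually has to be trimmed before it can be included in a prompt. This notebook does not say whether `get_conversation_history` trims anything. The hypothetical `ConversationBuffer` below keeps only the most recent turns by using `collections.deque` with `maxlen`.

```python
from collections import deque

class ConversationBuffer:
    """Keep only the most recent turns so the prompt stays within the model's context window."""

    def __init__(self, max_turns=6):
        self.turns = deque(maxlen=max_turns)  # the oldest turn is dropped automatically

    def add(self, speaker, text):
        self.turns.append((speaker, text))

    def as_prompt(self):
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)

buffer = ConversationBuffer(max_turns=2)
for msg in ["Hello! Can you help me?", "What time is it?", "Thanks for your help!"]:
    buffer.add("User", msg)
print(buffer.as_prompt())
```

This caps the history by number of turns. Counting tokens with the GPT-2 tokenizer instead would be a more precise limit.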

## 5. Custom Use Cases for Meta Ray-Ban

Explore specific use cases for Meta Ray-Ban smart glasses.

### Use Case 1: Object Recognition Assistant

In [None]:
# Object recognition for everyday items
print("Use Case: Object Recognition Assistant")
print("=" * 60)

# Define common objects
common_objects = [
    'phone', 'keys', 'wallet', 'book', 'cup', 'bottle',
    'laptop', 'tablet', 'headphones', 'watch', 'glasses',
    'pen', 'notebook', 'bag', 'card'
]

print(f"\nRecognizable objects: {', '.join(common_objects)}")
print("\nUsage:")
print("  # Upload image from Meta Ray-Ban")
print("  item = agent.identify_object(image, common_objects)")
print("  print(f'I see your {item}')")
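Zero-shot classification always returns some label, even when none of the listed objects is in frame. An assistant should therefore admit uncertainty when the top score is low. The sketch below assumes you can obtain per-label probabilities, which `identify_object` as used here does not expose. Both `describe_detection` and the 0.5 threshold are hypothetical choices.

```python
def describe_detection(probabilities, threshold=0.5):
    """Turn label probabilities into a spoken-style answer, admitting uncertainty below `threshold`."""
    label, score = max(probabilities.items(), key=lambda item: item[1])
    if score < threshold:
        return f"I'm not sure. It might be your {label} ({score:.0%} confidence)."
    return f"I see your {label} ({score:.0%} confidence)."

print(describe_detection({"keys": 0.82, "wallet": 0.10, "phone": 0.08}))
print(describe_detection({"keys": 0.35, "wallet": 0.33, "phone": 0.32}))
```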

### Use Case 2: Navigation Assistant

In [None]:
# Navigation and scene understanding
print("Use Case: Navigation Assistant")
print("=" * 60)

navigation_queries = [
    "indoor hallway",
    "outdoor street",
    "staircase",
    "elevator",
    "door or entrance",
    "crosswalk",
    "sidewalk",
    "parking area"
]

print(f"\nNavigation contexts: {', '.join(navigation_queries)}")
print("\nUsage:")
print("  # Analyze scene for navigation")
print("  scene = agent.analyze_scene(image, custom_queries=navigation_queries)")
print("  print(f\"You are at: {scene['best_match']}\")")
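The CLIP paper reports that wrapping class names in a caption template such as "A photo of a {label}." improves zero-shot accuracy compared with bare labels. Whether `analyze_scene` applies a template to `custom_queries` internally is not stated, so the hypothetical `to_clip_prompts` below shows how you could do it yourself. It also keeps a lookup back to the short label. The vowel-based article rule is a simple heuristic and misfires on words like "hour".

```python
def to_clip_prompts(labels, template="a photo of {article} {label}."):
    """Map each short label to a CLIP-style caption, keeping a lookup from caption back to label."""
    prompts = {}
    for label in labels:
        article = "an" if label[0].lower() in "aeiou" else "a"  # naive a/an choice
        prompts[template.format(article=article, label=label)] = label
    return prompts

for prompt, label in to_clip_prompts(["indoor hallway", "staircase", "elevator"]).items():
    print(f"{prompt!r} -> {label}")
```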

### Use Case 3: Text Reading Assistant

In [None]:
# Text and document recognition
print("Use Case: Text Reading Assistant")
print("=" * 60)

text_queries = [
    "document or paper",
    "book or magazine",
    "sign or label",
    "menu",
    "screen or display",
    "handwritten note"
]

print(f"\nText content types: {', '.join(text_queries)}")
print("\nUsage:")
print("  # Identify text content")
print("  content_type = agent.identify_object(image, text_queries)")
print("  print(f'I see a {content_type}')")
print("  # Then use OCR or other tools for actual text extraction")
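One option for the OCR step is Tesseract through the `pytesseract` wrapper, whose `pytesseract.image_to_string(image)` accepts a PIL image. In Colab it is typically installed with `!apt-get install -y tesseract-ocr` and `!pip install pytesseract`. This is a swapped-in tool, not something the repository provides. Tesseract also handles handwriting poorly, so expect weak results for "handwritten note". The hypothetical `extract_text` helper below accepts any OCR callable and returns `None` when no backend is installed.

```python
def extract_text(image, ocr=None):
    """Return the text found in `image`, or None when no OCR backend is available.

    `ocr` is any callable that takes an image and returns a string; by default
    pytesseract.image_to_string is used if pytesseract can be imported.
    """
    if ocr is None:
        try:
            import pytesseract  # also requires the Tesseract binary on the system
        except ImportError:
            return None
        ocr = pytesseract.image_to_string
    return ocr(image).strip()

# Example: only run OCR when CLIP says the frame contains text-bearing content
# if content_type in text_queries:
#     print(extract_text(image))
```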

## 6. Tips for Meta Ray-Ban Integration

### Best Practices:

1. **Image Quality**: Capture images with the Ray-Ban glasses in good lighting.
2. **Audio Recording**: Record in quiet environments for more accurate transcription.
3. **Processing Time**: The `base` models offer a good balance of speed and accuracy.
4. **Battery Life**: On battery-powered hardware, weigh model size against power consumption.
5. **Privacy**: Respect the privacy of people around you when using smart glasses in public.

### Performance Optimization:

- Use `whisper_model="tiny"` for faster transcription
- Use `whisper_model="base"` for better accuracy (recommended)
- Enable GPU acceleration in Colab (Runtime → Change runtime type → GPU)
- Process images at lower resolution for faster results
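Downscaling should preserve the aspect ratio. CLIP ViT-B/32 preprocessing resizes and center-crops to 224×224 anyway, so shrinking large photos mainly saves decode and transfer time. The hypothetical `fit_within` helper computes the target size. Pillow's `image.thumbnail((640, 640))` does the same in place, and `image.resize(fit_within(*image.size))` returns a resized copy.

```python
def fit_within(width, height, max_side=640):
    """Scale (width, height) so the longer side is at most `max_side`, preserving aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))

print(fit_within(4032, 3024))  # for example, a 4032x3024 photo
```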

### Next Steps:

1. Deploy to an edge device (Raspberry Pi, Jetson Nano)
2. Create a real-time processing pipeline
3. Add text-to-speech for audio responses
4. Train or fine-tune models for specific use cases
5. Integrate with a Meta Ray-Ban API (if one is available)

## 7. Resources and Documentation

- **GitHub Repository**: https://github.com/farmountain/SmartGlass-AI-Agent
- **Whisper Documentation**: https://github.com/openai/whisper
- **CLIP Documentation**: https://github.com/openai/CLIP
- **GPT-2 Documentation**: https://huggingface.co/gpt2
- **Meta Ray-Ban**: https://www.ray-ban.com/usa/discover-ray-ban-stories

### Support

For issues and questions, please visit the GitHub repository.

---

## 🎉 You're all set!

Your SmartGlass AI Agent is ready to use with Meta Ray-Ban smart glasses.

**Happy testing!** 👓🤖