# Ai4 Multimodal Agents Workshop - No API Key Required! 🚀

This version of the multimodal agents workshop uses **pre-created embeddings** and does not require VoyageAI API keys, making it accessible to everyone!

**Workshop Overview:**
- Build a multimodal AI agent that can analyze documents and images
- Use MongoDB Atlas Vector Search for retrieval 
- Implement function calling with Gemini 2.0 Flash
- Add memory and ReAct reasoning capabilities
- **NEW**: Works without VoyageAI API keys using pre-created embeddings

## 🎯 Learning Objectives
By the end of this workshop, you will be able to:
- Load and use pre-created multimodal embeddings
- Set up MongoDB Atlas vector search indexes
- Build an AI agent with tool calling capabilities
- Implement session-based memory for conversational agents
- Create a ReAct (Reasoning + Acting) agent architecture
- **NEW**: Understand embedding fallback strategies for production systems

In [None]:
# Initialize progress tracking and lab utilities
import sys
import os

# Force load from development source if available
dev_path = "/Users/michael.lynn/code/mongodb/developer-days/jupyter-utils/jupyter-lab-progress"
if os.path.exists(dev_path) and dev_path not in sys.path:
    sys.path.insert(0, dev_path)

# Remove any cached modules
modules_to_remove = [key for key in sys.modules.keys() if key.startswith('jupyter_lab_progress')]
for module in modules_to_remove:
    del sys.modules[module]

try:
    from jupyter_lab_progress import (
        LabProgress, LabValidator, show_info, show_warning, 
        show_success, show_error, show_hint
    )
    show_success("Progress tracking libraries loaded successfully! 🎉")
except ImportError as e:
    print(f"Warning: Could not import progress tracking: {e}")
    print("Installing basic fallbacks...")
    def show_info(msg, title=None): print(f"ℹ️ {title or 'Info'}: {msg}")
    def show_warning(msg, title=None): print(f"⚠️ {title or 'Warning'}: {msg}")
    def show_success(msg, title=None): print(f"✅ {title or 'Success'}: {msg}")
    def show_error(msg, title=None): print(f"❌ {title or 'Error'}: {msg}")
    def show_hint(msg, title=None): print(f"💡 {title or 'Hint'}: {msg}")

In [None]:
# Set up comprehensive lab progress tracking
try:
    progress = LabProgress(
        steps=[
            "Environment Setup",
            "Fallback Embeddings Setup", 
            "Pre-created Embeddings Loading",
            "Data Ingestion",
            "Vector Index Creation",
            "Agent Tools Setup", 
            "LLM Integration",
            "Basic Agent Testing",
            "Memory Implementation",
            "ReAct Agent Enhancement"
        ],
        lab_name="Multimodal Agents - No API Keys Required",
        persist=True
    )
    
    # Set up validation
    validator = LabValidator(progress_tracker=progress)
    
    show_success("Lab progress tracking initialized!")
    show_info(f"Workshop: {progress.lab_name}")
    show_info(f"Total steps: {len(progress.steps)}")
    
except NameError:
    show_info("Running without progress tracking")

# Step 1: Environment Setup

Let's start by setting up our environment and connecting to MongoDB Atlas.

**Note**: This version only requires `MONGODB_URI` and `GOOGLE_API_KEY` - no VoyageAI API key needed!

In [None]:
# Show step guidance
try:
    progress.show_step_tips("Environment Setup")
except (NameError, AttributeError):
    show_info("Setting up environment and connections...")

In [None]:
import os
from pymongo import MongoClient

# Load environment variables from .env file (only MongoDB and Google API keys required)
from pathlib import Path

env_path = Path('.') / '.env'
if env_path.exists():
    # Load variables from .env
    with open(env_path) as f:
        for line in f:
            if '=' in line and not line.strip().startswith('#'):
                key, value = line.strip().split('=', 1)
                # Strip whitespace around the key and value so "KEY = value" also works
                os.environ[key.strip()] = value.strip().strip('\"\'')
    show_info("Loaded environment variables from .env file")
else:
    show_warning(".env file not found, using environment variables from system")

# Check required environment variables (VoyageAI not required!)
required_vars = ["MONGODB_URI", "GOOGLE_API_KEY"]
missing_vars = [var for var in required_vars if not os.getenv(var)]

if missing_vars:
    show_error(f"Missing required environment variables: {missing_vars}")
    show_hint("Create a .env file with:\nMONGODB_URI=your_mongodb_connection_string\nGOOGLE_API_KEY=your_google_api_key")
    raise ValueError(f"Missing required environment variables: {missing_vars}")

show_success("All required environment variables are set!")
show_info("✓ MONGODB_URI: Available")
show_info("✓ GOOGLE_API_KEY: Available")
show_success("No VoyageAI API key required for this workshop!")

# Validate connection variables
try:
    validator.validate_variable_exists("MONGODB_URI", {"MONGODB_URI": os.getenv("MONGODB_URI")}, str)
except NameError:
    pass

In [None]:
# Connect to MongoDB Atlas
MONGODB_URI = os.getenv("MONGODB_URI")
SERVERLESS_URL = os.getenv("SERVERLESS_URL")  # Optional fallback
LLM_PROVIDER = "google"

# Initialize MongoDB client
try:
    mongodb_client = MongoClient(MONGODB_URI)
    # Test the connection
    result = mongodb_client.admin.command("ping")
    
    if result.get("ok") == 1:
        show_success("Successfully connected to MongoDB Atlas! 🎉")
        
        # Mark step as complete
        try:
            progress.mark_done("Environment Setup", score=100, notes="MongoDB connection successful")
        except NameError:
            pass
    else:
        show_error("MongoDB connection failed")
        
except Exception as e:
    show_error(f"Connection error: {e}")
    show_hint("Check your connection string and network access settings", 
             "Connection Troubleshooting")

# Step 2: Fallback Embeddings Setup

Set up a lightweight embedding system that works without VoyageAI API keys.

**Strategy**:
- **Document embeddings**: Load from pre-created `data/embeddings.json` file
- **Query embeddings**: Use sentence-transformers as a lightweight fallback
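
The shape handling behind this strategy can be sketched in plain Python. This is only an illustration: `pad_and_normalize` is a hypothetical helper, not part of the workshop code, and it assumes the input is a flat list of numbers. Zero-padding makes a shorter query vector fit a 1024-dimensional index, but it does not move the query into the same embedding space as the documents.

```python
import math

def pad_and_normalize(vector, target_dim=1024):
    """Zero-pad (or truncate) a vector to target_dim, then scale it to unit length."""
    values = [float(x) for x in vector[:target_dim]]
    values += [0.0] * (target_dim - len(values))
    norm = math.sqrt(sum(x * x for x in values))
    # Leave an all-zero vector unchanged to avoid dividing by zero
    return [x / norm for x in values] if norm > 0 else values

# Toy input: a 2-dim vector padded to 8 dimensions
query_vector = pad_and_normalize([3.0, 4.0], target_dim=8)
```

Padding with zeros leaves the vector's length unchanged, so normalizing before or after padding gives the same result.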

In [None]:
# Show step guidance
try:
    progress.show_step_tips("Fallback Embeddings Setup")
except (NameError, AttributeError):
    show_info("Setting up fallback embedding system...")

In [None]:
import numpy as np
import json
from PIL import Image
from pathlib import Path

# Install sentence-transformers if not available (fallback for query embeddings)
try:
    from sentence_transformers import SentenceTransformer
    show_success("sentence-transformers available for query embedding fallback!")
    # Use a lightweight model for query embeddings
    query_encoder = SentenceTransformer('all-MiniLM-L6-v2')  # Fast, 384-dim embeddings
    show_success("Query embedding model loaded!")
except ImportError:
    show_warning("sentence-transformers not available, installing...")
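    import subprocess
    import sys
    # Install into the interpreter running this notebook, not whichever pip is first on PATH
    subprocess.check_call([sys.executable, "-m", "pip", "install", "sentence-transformers"])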
    from sentence_transformers import SentenceTransformer
    query_encoder = SentenceTransformer('all-MiniLM-L6-v2')
    show_success("sentence-transformers installed and ready!")

# Normalize vector function (MongoDB doesn't auto-normalize)
def normalize_vector(v):
    """Normalize a vector to unit length."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

show_success("Vector normalization utility ready")

# Mark step complete
try:
    progress.mark_done("Fallback Embeddings Setup", score=100, 
                      notes="Fallback embedding system configured")
except NameError:
    pass

# Step 3: Pre-created Embeddings Loading

Load the pre-created embeddings instead of generating them with VoyageAI.
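
Each loaded document needs a `page_number` before it is useful to the agent. A minimal sketch of how that value could be recovered from an image path follows. It assumes documents carry a `key` such as `data/images/1.png`; `derive_page_number` is a hypothetical helper, and unlike the loading cell it does not check that the key contains `images/`.

```python
from pathlib import Path

def derive_page_number(key, fallback):
    """Return the page number encoded in an image filename, or fallback if it is not numeric."""
    try:
        return int(Path(key).stem)  # "data/images/12.png" -> 12
    except ValueError:
        return fallback

# Toy documents; the real ones come from data/embeddings.json
docs = [{"key": "data/images/12.png"}, {"key": "cover"}, {}]
for position, doc in enumerate(docs, start=1):
    doc.setdefault("page_number", derive_page_number(doc.get("key", ""), position))
```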

In [None]:
# Show step guidance
try:
    progress.show_step_tips("Pre-created Embeddings Loading")
except (NameError, AttributeError):
    show_info("Loading pre-created embeddings...")

In [None]:
# Define embedding generation function with fallback
def generate_embedding_fallback(data, input_type="document"):
    """
    Generate embedding using fallback methods - no VoyageAI API required!
    
    Args:
        data: PIL Image or text string
        input_type: "document" or "query"
    
    Returns:
        list: Normalized embedding vector
    """
    try:
        if isinstance(data, Image.Image):
            # For images, we'll use pre-created embeddings (loaded separately)
            show_warning("Image embedding requested, but we use pre-created embeddings for images")
            return None
        else:
            # For text queries, use sentence-transformers
            embedding = query_encoder.encode(str(data))
            
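            # Note: all-MiniLM-L6-v2 produces 384-dim embeddings, but the
            # pre-created document embeddings are 1024-dim and come from a
            # different model. Padding makes the shapes match so the index
            # accepts the query, but the two vector spaces are not aligned,
            # so similarity scores are not semantically meaningful.
            
            # Pad to 1024 dimensions to match document embeddings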
            if len(embedding) < 1024:
                padding = np.zeros(1024 - len(embedding))
                embedding = np.concatenate([embedding, padding])
            elif len(embedding) > 1024:
                embedding = embedding[:1024]  # Truncate if too long
            
            # Normalize the embedding
            normalized_embedding = normalize_vector(np.array(embedding)).tolist()
            
            return normalized_embedding
            
    except Exception as e:
        show_error(f"Fallback embedding generation failed: {e}")
        return None

show_success("Fallback embedding generation function ready!")

In [None]:
# Load pre-created embeddings
embeddings_file = Path("data/embeddings.json")

if not embeddings_file.exists():
    show_error(f"Pre-created embeddings file not found: {embeddings_file}")
    show_hint("Make sure the data/embeddings.json file exists in your working directory", 
             "Missing File")
    raise FileNotFoundError(f"Required embeddings file not found: {embeddings_file}")

try:
    with open(embeddings_file, "r") as f:
        embedded_docs = json.load(f)
    
    show_success(f"Loaded {len(embedded_docs)} pre-created document embeddings!")
    
    # Analyze the structure
    if embedded_docs:
        sample = embedded_docs[0]
        show_info(f"Sample document keys: {list(sample.keys())}")
        show_info(f"Embedding dimensions: {len(sample.get('embedding', []))}")
        
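        # Add missing page_number field if needed
        added_page_numbers = 0
        for i, doc in enumerate(embedded_docs):
            if 'page_number' not in doc:
                # Extract page number from image filename (e.g., "data/images/1.png" -> 1)
                key = doc.get('key', '')
                if 'images/' in key:
                    try:
                        page_num = int(key.split('/')[-1].split('.')[0])
                        doc['page_number'] = page_num
                    except ValueError:
                        # Filename is not numeric; fall back to the document's position
                        doc['page_number'] = i + 1
                else:
                    doc['page_number'] = i + 1
                added_page_numbers += 1
        
        # Only report success when at least one field was actually added
        if added_page_numbers:
            show_success(f"Added missing page_number fields to {added_page_numbers} documents")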
    
    # Validate embeddings
    try:
        validator.validate_custom(
            len(embedded_docs) > 0,
            "Pre-created embeddings loaded successfully",
            "No embeddings found in the file"
        )
        
        progress.mark_done("Pre-created Embeddings Loading", score=100, 
                          notes=f"Loaded {len(embedded_docs)} pre-created embeddings")
    except NameError:
        pass
        
except Exception as e:
    show_error(f"Failed to load pre-created embeddings: {e}")
    show_hint("Check that the embeddings.json file is valid JSON", "File Format")
    raise

# Step 4: Data Ingestion

Ingest the pre-created embeddings into MongoDB Atlas.
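
Before inserting, it can help to confirm that every document carries an embedding of the dimension the index will expect. The check below is a sketch under assumptions: `find_invalid_docs` is a hypothetical helper, and it assumes embeddings are stored as plain lists under an `embedding` field, as in the loading step.

```python
def find_invalid_docs(docs, expected_dim=1024, field="embedding"):
    """Return the positions of documents whose embedding is missing or has the wrong length."""
    invalid = []
    for index, doc in enumerate(docs):
        embedding = doc.get(field)
        if not isinstance(embedding, list) or len(embedding) != expected_dim:
            invalid.append(index)
    return invalid

# Toy documents: one valid, one with the wrong dimension, one without an embedding
sample_docs = [
    {"key": "data/images/1.png", "embedding": [0.0] * 1024},
    {"key": "data/images/2.png", "embedding": [0.0] * 512},
    {"key": "data/images/3.png"},
]
invalid_positions = find_invalid_docs(sample_docs)
```

In the notebook this could be run as `find_invalid_docs(embedded_docs)` just before the `insert_many` call.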

In [None]:
# Database configuration  
DB_NAME = "mongodb_aiewf"
COLLECTION_NAME = "multimodal_workshop_fallback"

# Connect to the collection
collection = mongodb_client[DB_NAME][COLLECTION_NAME]

show_info(f"Connected to database: {DB_NAME}")
show_info(f"Using collection: {COLLECTION_NAME}")

In [None]:
# Ingest data into MongoDB
show_info("📚 Reference: https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.insert_many")

try:
    # Clear existing documents
    delete_result = collection.delete_many({})
    show_info(f"Deleted {delete_result.deleted_count} existing documents")
    
    # Bulk insert documents into the collection
    insert_result = collection.insert_many(embedded_docs)
    
    # Verify insertion
    doc_count = collection.count_documents({})
    
    show_success(f"Successfully ingested {doc_count} documents into {COLLECTION_NAME}! 🎉")
    
    # Validate ingestion
    try:
        validator.validate_custom(
            doc_count == len(embedded_docs),
            "All documents ingested successfully",
            f"Document count mismatch: expected {len(embedded_docs)}, got {doc_count}"
        )
        
        progress.mark_done("Data Ingestion", score=100, 
                          notes=f"Ingested {doc_count} documents")
    except NameError:
        pass
        
except Exception as e:
    show_error(f"Data ingestion failed: {e}")
    show_hint("Check your MongoDB connection and permissions", "Database Error")

# Step 5: Vector Search Index Creation

Create a vector search index to enable similarity search on our pre-created embeddings.
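
Index builds are asynchronous, so a query issued right after creation may find nothing. One way to handle this is to poll until the index reports `READY`. The sketch below is not part of the workshop code: `wait_for_index` is a hypothetical helper that takes any callable returning index descriptions, so it can be exercised here with a stand-in instead of a live cluster.

```python
import time

def wait_for_index(list_indexes, index_name, timeout_s=300.0, poll_s=5.0):
    """Poll list_indexes() until index_name reports status READY or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        for index in list_indexes():
            if index.get("name") == index_name and index.get("status") == "READY":
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)

# Stand-in for a live cluster: BUILDING on the first poll, READY on the second
_fake_statuses = iter(["BUILDING", "READY"])

def fake_list_indexes():
    return [{"name": "vector_index_fallback", "status": next(_fake_statuses)}]

is_ready = wait_for_index(fake_list_indexes, "vector_index_fallback", timeout_s=5.0, poll_s=0.0)
```

Against a real collection the call would look like `wait_for_index(collection.list_search_indexes, VS_INDEX_NAME)`.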

In [None]:
# Show step guidance
try:
    progress.show_step_tips("Vector Index Creation")
except (NameError, AttributeError):
    show_info("Creating vector search index...")

In [None]:
VS_INDEX_NAME = "vector_index_fallback"

# Define vector index configuration
model = {
    "name": VS_INDEX_NAME,
    "type": "vectorSearch",
    "definition": {
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 1024,  # Pre-created embeddings are 1024-dim
                "similarity": "cosine",
            }
        ]
    },
}

show_info(f"Index configuration: {VS_INDEX_NAME}")
show_info("Vector field: embedding")
show_info("Dimensions: 1024 (Pre-created embeddings)")
show_info("Similarity metric: cosine")

In [None]:
# Create the vector search index
show_info("📚 Reference: https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.create_search_index")

try:
    # Check if index already exists
    existing_indexes = list(collection.list_search_indexes())
    index_exists = any(idx.get('name') == VS_INDEX_NAME for idx in existing_indexes)
    
    if index_exists:
        show_info(f"Index '{VS_INDEX_NAME}' already exists")
    else:
        show_info("Creating vector search index...")
        
        # Create the vector search index
        collection.create_search_index(model=model)
        
        show_success(f"Vector search index '{VS_INDEX_NAME}' created successfully! 🎉")
    
    # Mark step complete
    try:
        progress.mark_done("Vector Index Creation", score=100, 
                          notes=f"Index '{VS_INDEX_NAME}' ready")
    except NameError:
        pass
        
except Exception as e:
    show_error(f"Index creation failed: {e}")
    show_hint("Index creation may take a few minutes. Check Atlas UI to monitor progress", 
             "Index Status")

In [None]:
# Verify index status
try:
    indexes = list(collection.list_search_indexes())
    
    show_info("Current search indexes:")
    for idx in indexes:
        name = idx.get('name', 'Unknown')
        status = idx.get('status', 'Unknown')
        
        if status == 'READY':
            show_success(f"✅ {name}: {status}")
        else:
            show_warning(f"⏳ {name}: {status}")
    
    # Check if our index is ready
    our_index = next((idx for idx in indexes if idx.get('name') == VS_INDEX_NAME), None)
    
    if our_index and our_index.get('status') == 'READY':
        show_success(f"Index '{VS_INDEX_NAME}' is ready for vector search! 🚀")
    else:
        show_warning(f"Index '{VS_INDEX_NAME}' is still building. Please wait...")
        show_hint("Index creation can take several minutes. Check the Atlas UI for progress.", 
                 "Index Building")
        
except Exception as e:
    show_error(f"Failed to check index status: {e}")

# Step 6: Agent Tools Setup

Create the vector search tool using fallback embedding generation for queries.
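
As a mental model of what the search tool returns, the sketch below ranks a handful of toy documents by cosine similarity to a query vector. It is an exact brute-force search over made-up three-dimensional vectors, whereas `$vectorSearch` with `numCandidates` performs an approximate nearest-neighbor search; `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, docs, k=2):
    """Return the k (key, score) pairs with the highest similarity to the query."""
    scored = [(doc["key"], cosine_similarity(query, doc["embedding"])) for doc in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy 3-dim embeddings; the workshop's real vectors are 1024-dim
toy_docs = [
    {"key": "data/images/1.png", "embedding": [1.0, 0.0, 0.0]},
    {"key": "data/images/2.png", "embedding": [0.0, 1.0, 0.0]},
    {"key": "data/images/3.png", "embedding": [0.7, 0.7, 0.0]},
]
matches = top_k([1.0, 0.1, 0.0], toy_docs, k=2)
```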

In [None]:
from typing import List

show_info("📚 Reference: https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/#ann-examples")

In [None]:
def get_information_for_question_answering(user_query: str) -> List[str]:
    """
    Retrieve information using vector search to answer a user query.
    Uses fallback embedding generation for queries (no API keys required).

    Args:
        user_query (str): The user's query string.

    Returns:
        List[str]: List of image file paths retrieved from vector search.
    """
    try:
        show_info(f"🔍 Searching for: {user_query}")
        
        # Generate query embedding using fallback method
        query_embedding = generate_embedding_fallback(user_query, input_type="query")
        
        if not query_embedding:
            show_error("Failed to generate query embedding")
            return []
        
        show_success(f"Generated query embedding: {len(query_embedding)} dimensions")

        # Define aggregation pipeline with $vectorSearch and $project stages
        pipeline = [
            {
                "$vectorSearch": {
                    "index": VS_INDEX_NAME,
                    "path": "embedding",
                    "queryVector": query_embedding,
                    "numCandidates": 150,  # Higher for better recall
                    "limit": 2,  # Top results
                }
            },
            {
                "$project": {
                    "_id": 0,
                    "key": 1,
                    "width": 1,
                    "height": 1,
                    "page_number": 1,
                    "score": {"$meta": "vectorSearchScore"},
                }
            },
        ]

        # Execute the aggregation pipeline
        results = list(collection.aggregate(pipeline))
        
        # Extract image keys and scores
        keys = [result["key"] for result in results]
        scores = [result["score"] for result in results]
        
        show_success(f"Found {len(keys)} relevant images")
        for i, (key, score) in enumerate(zip(keys, scores)):
            show_info(f"  {i+1}. {key} (score: {score:.4f})")
        
        return keys
        
    except Exception as e:
        show_error(f"Vector search failed: {e}")
        return []

In [None]:
# Define function declaration for Gemini function calling
show_info("📚 Reference: https://ai.google.dev/gemini-api/docs/function-calling#step_1_define_function_declaration")

# Define the function declaration
get_information_for_question_answering_declaration = {
    "name": "get_information_for_question_answering",
    "description": "Retrieve information using vector search to answer a user query. Uses pre-created embeddings and fallback query encoding.",
    "parameters": {
        "type": "object",
        "properties": {
            "user_query": {
                "type": "string",
                "description": "Query string to use for vector search",
            }
        },
        "required": ["user_query"],
    },
}

show_success("Function declaration created for Gemini integration!")

# Mark step complete
try:
    progress.mark_done("Agent Tools Setup", score=100, 
                      notes="Vector search tool with fallback embedding ready")
except NameError:
    pass

# Step 7: LLM Integration

Set up Gemini 2.0 Flash with function calling capabilities.

In [None]:
from google import genai
from google.genai import types
from google.genai.types import FunctionCall

LLM = "gemini-2.0-flash"

try:
    # Use GOOGLE_API_KEY from environment (required)
    api_key = os.getenv("GOOGLE_API_KEY")
    
    # Initialize Gemini client
    gemini_client = genai.Client(api_key=api_key)
    
    show_success(f"Gemini client initialized with model: {LLM}")
    show_info("Using GOOGLE_API_KEY from environment")
    
    # Validate client setup
    try:
        validator.validate_variable_exists('gemini_client', locals(), genai.Client)
    except NameError:
        pass
        
except Exception as e:
    show_error(f"LLM setup failed: {e}")
    show_hint("Check your GOOGLE_API_KEY in .env file", "API Key Error")

In [None]:
# Create generation configuration
try:
    tools = types.Tool(
        function_declarations=[get_information_for_question_answering_declaration]
    )
    tools_config = types.GenerateContentConfig(tools=[tools], temperature=0.0)
    
    show_success("Generation configuration created with function calling enabled!")
    show_info("Temperature: 0.0 (deterministic responses)")
    show_info("Available tools: get_information_for_question_answering")
    
    # Mark step complete
    try:
        progress.mark_done("LLM Integration", score=100, 
                          notes="Gemini 2.0 Flash configured with function calling")
    except NameError:
        pass
        
except Exception as e:
    show_error(f"Configuration failed: {e}")

# Step 8: Basic Agent Implementation

Create the core agent functions for tool selection and response generation.
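
The tool-selection step boils down to mapping a function-call object (a name plus keyword arguments) onto a Python callable. The sketch below shows that dispatch in isolation. Everything in it is hypothetical: `FakeFunctionCall` stands in for the SDK's function-call object, and `lookup_pages` is a placeholder tool rather than the notebook's vector search.

```python
from dataclasses import dataclass, field

@dataclass
class FakeFunctionCall:
    """Stand-in for a function-call object: a tool name plus keyword arguments."""
    name: str
    args: dict = field(default_factory=dict)

def lookup_pages(user_query):
    # Placeholder tool; the notebook's real tool runs a vector search instead
    return [f"page-for:{user_query}"]

# Registry of callable tools, keyed by the name the model is told about
TOOL_REGISTRY = {"lookup_pages": lookup_pages}

def dispatch(tool_call):
    """Run the requested tool, or return an empty result when no known tool was chosen."""
    if tool_call is None or tool_call.name not in TOOL_REGISTRY:
        return []
    return TOOL_REGISTRY[tool_call.name](**tool_call.args)

retrieved = dispatch(FakeFunctionCall("lookup_pages", {"user_query": "AIME 2024"}))
```

A registry like this keeps the dispatch code unchanged when more tools are added later.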

In [None]:
show_info("📚 Reference: https://ai.google.dev/gemini-api/docs/function-calling#step_4_create_user_friendly_response")

In [None]:
def select_tool(messages: List) -> FunctionCall | None:
    """
    Use an LLM to decide which tool to call.

    Args:
        messages (List): Messages as a list

    Returns:
        FunctionCall: Function call object or None
    """
    try:
        system_prompt = [
            (
                "You're an AI assistant. Based on the given information, decide which tool to use. "
                "If the user is asking to explain an image, don't call any tools unless that would help you better explain the image. "
                "Here is the provided information:\n"
            )
        ]
        
        # Input to the LLM
        contents = system_prompt + messages
        
        # Generate response using Gemini
        response = gemini_client.models.generate_content(
            model=LLM, contents=contents, config=tools_config
        )
        
        # Extract and return the first function call, if any part contains one
        if response.candidates and response.candidates[0].content.parts:
            for part in response.candidates[0].content.parts:
                if part.function_call:
                    return part.function_call
        
        return None
        
    except Exception as e:
        show_error(f"Tool selection failed: {e}")
        return None

show_success("Tool selection function created!")

In [None]:
def generate_answer(user_query: str, images: List = []) -> str:
    """
    Execute any tools and generate a response.

    Args:
        user_query (str): User's query string
        images (List): List of image file paths. Defaults to [].

    Returns:
        str: LLM-generated response
    """
    try:
        # Use select_tool to determine if we need to call any tools
        tool_call = select_tool([user_query])
        
        # If a tool call is found and it's our vector search function
        if (
            tool_call is not None
            and tool_call.name == "get_information_for_question_answering"
        ):
            show_info(f"🛠️ Agent calling tool: {tool_call.name}")
            
            # Call the tool with the extracted arguments
            tool_images = get_information_for_question_answering(**tool_call.args)
            
            # Add retrieved images to the input images. Build a new list so the
            # shared default argument (images=[]) is never mutated between calls.
            images = images + tool_images

        # Prepare system prompt
        system_prompt = (
            "Answer the questions based on the provided context only. "
            "If the context is not sufficient, say I DON'T KNOW. "
            "DO NOT use any other information to answer the question."
        )
        
        # Load and validate images
        valid_images = []
        for img_path in images:
            try:
                img = Image.open(img_path)
                valid_images.append(img)
            except Exception as e:
                show_warning(f"Failed to load image {img_path}: {e}")
        
        # Prepare contents for the LLM
        contents = [system_prompt] + [user_query] + valid_images

        # Get the response from the LLM
        response = gemini_client.models.generate_content(
            model=LLM,
            contents=contents,
            config=types.GenerateContentConfig(temperature=0.0),
        )
        
        return response.text
        
    except Exception as e:
        show_error(f"Answer generation failed: {e}")
        return "I apologize, but I encountered an error while processing your question."

show_success("Answer generation function created!")

In [None]:
def execute_agent(user_query: str, images: List = []) -> None:
    """
    Execute the agent and display the response.

    Args:
        user_query (str): User query
        images (List, optional): List of image file paths. Defaults to [].
    """
    try:
        show_info(f"🤖 Processing query: {user_query}")
        
        response = generate_answer(user_query, images)
        
        show_success("🤖 Agent Response:")
        print(f"\n{response}\n")
        
    except Exception as e:
        show_error(f"Agent execution failed: {e}")

show_success("Agent execution function created!")

# Mark step complete
try:
    progress.mark_done("Basic Agent Testing", score=100, 
                      notes="Agent functions with fallback embeddings ready")
except NameError:
    pass

In [None]:
# Test the agent with different types of queries
show_info("🧪 Testing the agent with sample queries...")

# Test 1: Text-based query requiring vector search
show_info("Test 1: Factual question requiring document search")
execute_agent("What is the Pass@1 accuracy of DeepSeek R1 on AIME 2024?")

In [None]:
# Test 2: Simple image analysis if we have the original images
import os

# Check if we have access to any of the original images
test_images = ["data/images/1.png", "data/images/2.png", "data/images/3.png"]
available_images = [img for img in test_images if os.path.exists(img)]

if available_images:
    show_info("Test 2: Document page analysis")
    execute_agent("What can you see in this document page?", [available_images[0]])
else:
    show_warning("No original document images available for testing")
    show_hint("The pre-created embeddings reference images that may not exist locally")

# Step 9: Memory Implementation

Add conversational memory to enable multi-turn conversations with context retention.
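
The idea behind session memory can be sketched without a database: messages are appended under a session ID and read back in chronological order. `InMemoryHistory` below is a hypothetical stand-in for the MongoDB-backed history collection, useful only for seeing the data flow.

```python
from datetime import datetime, timezone

class InMemoryHistory:
    """Toy session store: messages grouped by session ID, returned oldest first."""

    def __init__(self):
        self._messages = {}

    def store(self, session_id, role, content):
        self._messages.setdefault(session_id, []).append(
            {"role": role, "content": content, "timestamp": datetime.now(timezone.utc)}
        )

    def retrieve(self, session_id):
        # sorted() is stable, so messages with equal timestamps keep insertion order
        ordered = sorted(self._messages.get(session_id, []), key=lambda m: m["timestamp"])
        return [m["content"] for m in ordered]

history_store = InMemoryHistory()
history_store.store("session_1", "user", "first question")
history_store.store("session_1", "agent", "first answer")
history_store.store("session_2", "user", "unrelated question")
session_messages = history_store.retrieve("session_1")
```

Keying every message by session ID is what lets a follow-up question see earlier turns from the same conversation and nothing from other sessions.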

In [None]:
from datetime import datetime

# Set up history collection
history_collection = mongodb_client[DB_NAME]["history_fallback"]

show_info(f"Setting up conversation memory in: {DB_NAME}.history_fallback")
show_info("📚 Reference: https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.create_index")

In [None]:
# Create index for efficient session queries
try:
    # Create index on session_id field
    history_collection.create_index("session_id")
    
    show_success("Session index created for conversation history!")
    
except Exception as e:
    show_error(f"Index creation failed: {e}")

In [None]:
def store_chat_message(session_id: str, role: str, message_type: str, content: str) -> None:
    """
    Create chat history document and store it in MongoDB.

    Args:
        session_id (str): Session ID
        role (str): Message role, one of 'user' or 'agent'
        message_type (str): Type of message, one of 'text' or 'image'
        content (str): Content of the message (text or image path)
    """
    try:
        # Create message document (parameter renamed to avoid shadowing the built-in `type`)
        message = {
            "session_id": session_id,
            "role": role,
            "type": message_type,
            "content": content,
            "timestamp": datetime.now(),
        }
        
        # Insert message into history collection
        history_collection.insert_one(message)
        
    except Exception as e:
        show_error(f"Failed to store chat message: {e}")

show_success("Chat message storage function created!")

In [None]:
def retrieve_session_history(session_id: str) -> List:
    """
    Retrieve chat history for a particular session.

    Args:
        session_id (str): Session ID

    Returns:
        List: List of messages (text and images)
    """
    try:
        show_info("📚 Reference: https://pymongo.readthedocs.io/en/stable/api/pymongo/cursor.html#pymongo.cursor.Cursor.sort")
        
        # Query history collection and sort by timestamp
        cursor = history_collection.find({"session_id": session_id}).sort("timestamp", 1)
        
        messages = []
        if cursor:
            for msg in cursor:
                # If message type is text, append content as is
                if msg["type"] == "text":
                    messages.append(msg["content"])
                # If message type is image, open and append the image
                elif msg["type"] == "image":
                    try:
                        messages.append(Image.open(msg["content"]))
                    except Exception as e:
                        show_warning(f"Could not load image {msg['content']}: {e}")
        
        return messages
        
    except Exception as e:
        show_error(f"Failed to retrieve session history: {e}")
        return []

show_success("Session history retrieval function created!")

In [None]:
# Enhanced generate_answer function with memory
def generate_answer_with_memory(session_id: str, user_query: str, images: List = []) -> str:
    """
    Execute tools and generate response with conversation memory.

    Args:
        session_id (str): Session ID for conversation tracking
        user_query (str): User's query string
        images (List): List of image file paths. Defaults to [].

    Returns:
        str: LLM-generated response
    """
    try:
        # Retrieve conversation history
        history = retrieve_session_history(session_id)
        
        show_info(f"Retrieved {len(history)} previous messages for session {session_id}")
        
        # Determine if tools need to be called
        tool_call = select_tool(history + [user_query])
        
        if (
            tool_call is not None
            and tool_call.name == "get_information_for_question_answering"
        ):
            show_info(f"🛠️ Agent calling tool: {tool_call.name}")
            tool_images = get_information_for_question_answering(**tool_call.args)
            # Build a new list so the shared default argument (images=[]) is not mutated
            images = images + tool_images

        # Generate response with history context
        system_prompt = (
            "Answer the questions based on the provided context only. "
            "If the context is not sufficient, say I DON'T KNOW. "
            "DO NOT use any other information to answer the question."
        )
        
        contents = (
            [system_prompt]
            + history
            + [user_query]
            + [Image.open(image) for image in images if os.path.exists(image)]
        )
        
        response = gemini_client.models.generate_content(
            model=LLM,
            contents=contents,
            config=types.GenerateContentConfig(temperature=0.0),
        )
        
        answer = response.text
        
        # Store conversation in memory
        # Store user query
        store_chat_message(session_id, "user", "text", user_query)
        
        # Store image references
        for image in images:
            store_chat_message(session_id, "user", "image", image)
        
        # Store agent response
        store_chat_message(session_id, "agent", "text", answer)
        
        return answer
        
    except Exception as e:
        show_error(f"Memory-enabled answer generation failed: {e}")
        return "I apologize, but I encountered an error while processing your question."

show_success("Memory-enabled answer generation function created!")

In [None]:
# Enhanced execute_agent function with memory
def execute_agent_with_memory(session_id: str, user_query: str, images: List = []) -> None:
    """
    Execute the agent with conversation memory.

    Args:
        session_id (str): Session ID for conversation tracking
        user_query (str): User query
        images (List, optional): List of image file paths. Defaults to [].
    """
    try:
        show_info(f"🧠 Session {session_id} - Processing: {user_query}")
        
        response = generate_answer_with_memory(session_id, user_query, images)
        
        show_success("🤖 Agent Response:")
        print(f"\n{response}\n")
        
    except Exception as e:
        show_error(f"Memory-enabled agent execution failed: {e}")

show_success("Memory-enabled agent execution function created!")

# Mark step complete
try:
    progress.mark_done("Memory Implementation", score=100, 
                      notes="Conversation memory system implemented")
except NameError:
    pass

In [None]:
# Test memory-enabled agent
show_info("🧪 Testing memory-enabled agent...")

# First query in session
show_info("Test 1: Initial query")
execute_agent_with_memory(
    "session_fallback_1",
    "What is the Pass@1 accuracy of Deepseek R1 on the MATH500 benchmark?",
)

In [None]:
# Follow-up query to test memory
show_info("Test 2: Follow-up query to test memory")
execute_agent_with_memory(
    "session_fallback_1",
    "What did I just ask you?",
)
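
The follow-up query can only be answered because both calls share the same session ID, so the second call sees the messages stored by the first. A minimal sketch of that pattern, assuming a plain in-memory dictionary instead of the workshop's own `store_chat_message` and `retrieve_session_history` helpers; the class name `InMemorySessionStore` and its methods are hypothetical and exist only for illustration:

```python
from collections import defaultdict
from typing import Dict, List


class InMemorySessionStore:
    """Hypothetical stand-in for the workshop's history helpers (illustration only)."""

    def __init__(self) -> None:
        # One ordered message list per session ID
        self._messages: Dict[str, List[dict]] = defaultdict(list)

    def store(self, session_id: str, role: str, kind: str, content: str) -> None:
        """Append one message to the given session."""
        self._messages[session_id].append(
            {"role": role, "type": kind, "content": content}
        )

    def history(self, session_id: str) -> List[str]:
        """Return the text messages of a session in insertion order."""
        # .get() avoids creating an empty entry for unknown sessions
        messages = self._messages.get(session_id, [])
        return [m["content"] for m in messages if m["type"] == "text"]


store = InMemorySessionStore()
store.store("s1", "user", "text", "What is vector search?")
store.store("s1", "agent", "text", "It finds the nearest embeddings.")
store.store("s2", "user", "text", "Unrelated question")

# Only messages from session "s1" come back, oldest first
print(store.history("s1"))
```

Sessions are isolated from each other, which is why reusing `"session_fallback_1"` in both test cells matters: a different ID would start from an empty history.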

# Step 10: ReAct Agent Enhancement

Implement a ReAct (Reasoning + Acting) agent that can reason about whether it has enough information and iteratively gather more data if needed.
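
Before wiring the loop to Gemini, it may help to see the control flow on its own. The sketch below is an offline illustration, not the workshop implementation: `parse_decision`, `react_loop`, `fake_model`, and `fake_tool` are hypothetical names, and the model and tool are simple stand-in functions so the example runs without any API access.

```python
from typing import Callable, List, Tuple


def parse_decision(decision: str) -> Tuple[str, str]:
    """Split a model reply into (kind, payload); kind is 'answer', 'tool', or 'unknown'."""
    # ANSWER is checked first, so a reply containing both directives ends the loop
    if "ANSWER:" in decision:
        return "answer", decision.split("ANSWER:", 1)[1].strip()
    if "TOOL:" in decision:
        return "tool", decision.split("TOOL:", 1)[1].strip()
    return "unknown", ""


def react_loop(
    model: Callable[[List[str]], str],
    tool: Callable[[str], List[str]],
    user_query: str,
    max_iterations: int = 3,
) -> str:
    """Alternate between reasoning (model) and acting (tool) until an answer appears."""
    context: List[str] = [f"User query: {user_query}"]
    for _ in range(max_iterations):
        kind, payload = parse_decision(model(context))
        if kind == "answer":
            return payload
        if kind == "tool":
            results = tool(payload)
            context.extend(results or ["No relevant information found."])
        else:
            context.append("Unable to determine next action.")
    return "No definitive answer within the iteration limit."


# Hypothetical stand-ins so the sketch runs offline
def fake_model(context: List[str]) -> str:
    facts = [item for item in context if item.startswith("FACT:")]
    if facts:
        return f"ANSWER: {facts[0][len('FACT:'):].strip()}"
    return "TOOL: sky color"


def fake_tool(query: str) -> List[str]:
    return ["FACT: The sky is blue."]


result = react_loop(fake_model, fake_tool, "What color is the sky?")
print(result)
```

The iteration cap is the important design choice: without it, a model that keeps requesting tools would loop forever.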

In [None]:
def generate_answer_react(user_query: str, images: List = []) -> str:
    """
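    Answer a query with a ReAct (Reasoning + Acting) loop.

    On each iteration the model either returns a final answer or requests a
    vector search (backed by the fallback embeddings); the loop stops after
    a fixed number of iterations.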

    Args:
        user_query (str): User's query string
        images (List): List of image file paths. Defaults to [].

    Returns:
        str: LLM-generated response
    """
    try:
        show_info("🧠 Starting ReAct agent processing with fallback embeddings...")
        
        # Define reasoning prompt
        system_prompt = [
            (
                "You are an AI assistant with access to pre-created document embeddings for search. "
                "Based on the current information, decide if you have enough to answer the user query, or if you need more information. "
                "If you have enough information, respond with 'ANSWER: <your answer>'. "
                "If you need more information, respond with 'TOOL: <question for the tool>'. Keep the question concise. "
                f"User query: {user_query}\n"
                "Current information:\n"
            )
        ]
        
        # Set max iterations to prevent infinite loops
        max_iterations = 3
        current_iteration = 0
        
        # Initialize list to accumulate information
        current_information = []

        # If the user provided images, add the ones that exist on disk to current information
        if images:
            valid_images = [Image.open(image) for image in images if os.path.exists(image)]
            current_information.extend(valid_images)
            show_info(f"Added {len(valid_images)} user-provided images to context")

        # Run the reasoning → action loop
        while current_iteration < max_iterations:
            current_iteration += 1
            show_info(f"🔄 ReAct Iteration {current_iteration}:")
            
            # Generate reasoning and decision
            response = gemini_client.models.generate_content(
                model=LLM,
                contents=system_prompt + current_information,
                config=types.GenerateContentConfig(temperature=0.0),
            )
            
            # response.text may be None when the reply has no text parts; fall back to ""
            decision = response.text or ""
            show_info(f"💭 Agent decision: {decision[:100]}...")
            
            # If the agent has the final answer, return it
            if "ANSWER:" in decision:
                final_answer = decision.split("ANSWER:", 1)[1].strip()
                show_success(f"✅ Final answer reached after {current_iteration} iteration(s)")
                return final_answer
            
            # If the agent decides to use a tool
            elif "TOOL:" in decision:
                tool_query = decision.split("TOOL:", 1)[1].strip()
                show_info(f"🛠️ Agent requesting tool with query: {tool_query}")
                
                # Use tool selection to get the function call
                tool_call = select_tool([tool_query])
                
                if (
                    tool_call is not None
                    and tool_call.name == "get_information_for_question_answering"
                ):
                    show_info(f"📊 Calling fallback-powered vector search with: {tool_call.args}")
                    
                    # Call the tool and add results to current information
                    tool_images = get_information_for_question_answering(**tool_call.args)
                    
                    if tool_images:
                        new_images = []
                        for image in tool_images:
                            if os.path.exists(image):
                                new_images.append(Image.open(image))
                        current_information.extend(new_images)
                        show_success(f"➕ Added {len(new_images)} retrieved images to context")
                    else:
                        show_warning("No relevant images found")
                        current_information.append("No relevant visual information found for this query.")
                else:
                    show_warning("Tool selection failed or returned unexpected tool")
                    current_information.append("Tool call failed.")
            else:
                show_warning("Agent response didn't contain ANSWER or TOOL directive")
                current_information.append("Unable to determine next action.")
        
        # If we've exhausted iterations without a final answer
        show_warning(f"⚠️ Reached maximum iterations ({max_iterations}) without final answer")
        return "I apologize, but I couldn't find a definitive answer after exploring the available information. Please try rephrasing your question or asking for more specific details."
        
    except Exception as e:
        show_error(f"ReAct agent failed: {e}")
        return "I apologize, but I encountered an error while processing your question with the ReAct approach."

show_success("ReAct agent with fallback embeddings completed!")

In [None]:
def execute_react_agent(user_query: str, images: List = []) -> None:
    """
    Execute the ReAct agent.

    Args:
        user_query (str): User query
        images (List, optional): List of image file paths. Defaults to [].
    """
    try:
        show_info(f"🦸‍♀️ ReAct Agent Processing: {user_query}")
        
        response = generate_answer_react(user_query, images)
        
        show_success("🤖 ReAct Agent Final Response:")
        print(f"\n{response}\n")
        
    except Exception as e:
        show_error(f"ReAct agent execution failed: {e}")

show_success("ReAct agent execution function created!")

# Mark final step complete
try:
    progress.mark_done("ReAct Agent Enhancement", score=100, 
                      notes="ReAct reasoning and acting agent with fallback embeddings implemented")
except NameError:
    pass

In [None]:
# Test ReAct agent
show_info("🧪 Testing ReAct agent with fallback embeddings...")

# Test 1: Question requiring document search
show_info("Test 1: Complex factual question")
execute_react_agent("What is the Pass@1 accuracy of Deepseek R1 on the MATH500 benchmark?")

In [None]:
# Test 2: Document analysis if images are available
if available_images:
    show_info("Test 2: Document page analysis with ReAct")
    execute_react_agent("What technical concepts are discussed in this document page?", [available_images[0]])
else:
    show_warning("No document pages available for ReAct testing")

# 🎉 Workshop Complete!

Congratulations! You've built a multimodal AI agent with vector search retrieval, conversation memory, and ReAct reasoning, all **without requiring VoyageAI API keys**!

In [None]:
# Final progress summary
try:
    show_success("🎓 No-API-Key Workshop Completed Successfully!")
    
    # Display final progress
    progress.display_progress(detailed=True)
    
    # Show completion statistics
    completion_rate = progress.get_completion_rate()
    avg_score = progress.get_average_score()
    
    show_info(f"📊 Overall Completion: {completion_rate:.1f}%")
    if avg_score:
        show_info(f"📈 Average Score: {avg_score:.1f}/100")
    
    # Show what was accomplished
    show_success("""
    🚀 What You've Built WITHOUT API Keys:
    
    ✅ Pre-created embeddings loading system
    ✅ Fallback query embedding generation
    ✅ MongoDB Atlas vector search integration
    ✅ AI agent with function calling capabilities
    ✅ Conversational memory system
    ✅ ReAct (Reasoning + Acting) agent architecture
    ✅ Fallback strategies for when an embedding API is unavailable
    ✅ Cost-effective multimodal AI application
    ✅ Educational understanding of embedding systems
    """)
    
    # Next steps
    show_info("""
    🎯 Next Steps and Learning:
    
    • Compare performance vs full VoyageAI integration
    • Experiment with different sentence-transformer models
    • Implement better dimension matching strategies
    • Add support for custom embedding generation
    • Explore hybrid retrieval approaches
    • Build your own embedding generation pipeline
    • Consider upgrading to full API access for production
    """)
    
except NameError:
    show_success("🎓 No-API-Key Workshop completed successfully!")
    show_info("All agent implementations with fallback embeddings are ready for use.")

In [None]:
# Optional: Export progress analytics
try:
    if hasattr(progress, 'export_analytics_json'):
        analytics_file = progress.export_analytics_json()
        show_success(f"📄 Progress analytics exported to: {analytics_file}")
        
        # Show summary
        summary = progress.get_analytics_summary()
        if summary:
            show_info(f"⏱️ Total session time: {summary.get('session_duration', 'N/A')} seconds")
            show_info(f"📝 Total interactions: {summary.get('total_events', 'N/A')}")
except (NameError, AttributeError):
    pass

show_success("Thank you for completing the Multimodal Agents Workshop - No API Keys Required! 🙏")
show_info("🎯 Key Learning: You've mastered embedding fallback strategies for accessible AI!")
show_info("📚 Consider upgrading to VoyageAI or similar services for production applications")