# 📓 The GenAI Revolution Cookbook

**Title:** CrewAI Agent: Build a Production-Ready Planner-Executor with Memory

**Description:** Ship a production-ready CrewAI agent that plans tasks, validates tools, persists memory, and returns deterministic schema-validated JSON with automatic retries.

---

*This Jupyter notebook contains executable code examples. Run the cells below to try out the code yourself!*



---

Real apps fail from hallucinations, context loss, nondeterministic outputs, and tool misuse. We'll fix this with stable planning, strict tool I/O contracts, persistent memory, and schema-enforced outputs with retries. The result is a CrewAI agent that behaves predictably and is safe to ship behind an API. If you want to dive deeper into why LLMs struggle with context loss and how to address it, our article on [Context Rot - Why LLMs "Forget" as Their Memory Grows](/article/context-rot-why-llms-forget-as-their-memory-grows-3) provides practical strategies for managing model memory.

You'll leave with a single Colab notebook you can run end-to-end, demonstrating planning, tool execution, memory persistence, and schema-validated JSON outputs with automatic retries.

---

## Why This Approach Works

CrewAI orchestrates multi-agent workflows, but without constraints, agents hallucinate tool calls, produce malformed JSON, and lose context across steps. This guide enforces:

- **Hierarchical planning**: A planner agent decomposes the query; an executor follows the plan step-by-step, reducing drift.
- **Strict tool contracts**: Each tool validates inputs with Pydantic, returns structured success/error objects, and applies retries with exponential backoff for external APIs.
- **Persistent memory**: ChromaDB stores facts across sessions, enabling context reuse and reducing redundant tool calls.
- **Schema-enforced outputs**: A Pydantic model validates the final JSON. If validation fails, the system injects corrective instructions and retries automatically.
- **Deterministic model settings**: Explicit model, temperature, and token limits reduce nondeterminism.

---

## How It Works (High-Level Overview)

1. **User submits a query** (e.g., "What's the current BTC price in USD, and if it rises by 2.5%, what will it be? Include 2 sources.").
2. **Planner agent** decomposes the query into numbered steps with tool suggestions.
3. **Executor agent** runs each step using validated tools (web search, calculator, crypto price API, memory read/write).
4. **Memory tools** persist useful facts and retrieve prior context.
5. **Schema validator** checks the final JSON against `FinalAnswer`. If invalid, the system retries with stricter instructions.
6. **Output** is a structured JSON with query, plan, step results, answer, and citations.

---

## Setup & Installation

Run this cell first to install dependencies:

In [None]:
!pip install -q "crewai>=0.50.0" "pydantic>=2.8.0" "chromadb>=0.5.0" "httpx>=0.27.0" "python-dotenv>=1.0.1" "tenacity>=8.5.0" "crewai-tools>=0.11.0"

Set your API keys. In Colab, use this cell (replace placeholders with your keys):

In [None]:
import os

# Set API keys directly in Colab (or use .env locally)
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."
os.environ["TAVILY_API_KEY"] = "tvly-..."

# Verify keys are set
required_keys = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "TAVILY_API_KEY"]
missing = [k for k in required_keys if not os.getenv(k)]
if missing:
    raise EnvironmentError(f"Missing API keys: {missing}. Set them in the cell above.")
print("✅ All API keys set.")

Verify installed versions:

In [None]:
import crewai
import pydantic
import chromadb
print(f"CrewAI: {crewai.__version__}")
print(f"Pydantic: {pydantic.__version__}")
print(f"ChromaDB: {chromadb.__version__}")

---

## Step-by-Step Implementation

### Step 1: Define the Output Schema

This Pydantic model enforces the structure of the final JSON. Every field is typed, and only `data`, `error`, and `citations` have defaults, so a missing `query`, `plan`, `results`, or `answer` fails validation.

In [None]:
%%writefile schemas.py
# schemas.py
# Purpose: Define the strict output schema for the agent's final answer.

from pydantic import BaseModel, HttpUrl, Field
from typing import List, Optional

class StepResult(BaseModel):
    """
    Represents the result of a single execution step.
    
    Attributes:
        step (int): Step number in the plan.
        tool (str): Tool used (e.g., "web_search", "calculator").
        success (bool): Whether the step succeeded.
        data (Optional[dict]): Tool output data if successful.
        error (Optional[str]): Error message if failed.
    """
    step: int
    tool: str
    success: bool
    data: Optional[dict] = None
    error: Optional[str] = None

class FinalAnswer(BaseModel):
    """
    The final validated output from the agent.
    
    Attributes:
        query (str): The original user query.
        plan (List[str]): Numbered steps executed.
        results (List[StepResult]): Detailed results for each step.
        answer (str): Natural language answer to the query.
        citations (List[HttpUrl]): Valid URLs cited as sources.
    """
    query: str
    plan: List[str]
    results: List[StepResult]
    answer: str
    citations: List[HttpUrl] = Field(default_factory=list)

**Why this design**: `HttpUrl` ensures citations are valid URLs. `StepResult` captures success/failure per step, enabling debugging and retry logic. `plan` is included so the executor must explicitly copy the planner's steps, reducing drift.
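
To make the validation behavior concrete, here is a minimal sketch that validates a sample payload against compact copies of the two models. The payload is hypothetical (the price and answer text are placeholders, not real data), and the sketch assumes Pydantic v2, as pinned in the install cell.

```python
from typing import List, Optional
from pydantic import BaseModel, Field, HttpUrl, ValidationError

# Compact copies of the models above so the sketch runs on its own
class StepResult(BaseModel):
    step: int
    tool: str
    success: bool
    data: Optional[dict] = None
    error: Optional[str] = None

class FinalAnswer(BaseModel):
    query: str
    plan: List[str]
    results: List[StepResult]
    answer: str
    citations: List[HttpUrl] = Field(default_factory=list)

# Hypothetical payload; the price is a placeholder, not a real quote
payload = {
    "query": "What is the BTC price?",
    "plan": ["1. Fetch BTC price with crypto_price"],
    "results": [{"step": 1, "tool": "crypto_price", "success": True,
                 "data": {"price": 100.0}}],
    "answer": "BTC is at 100.0 USD (placeholder).",
    "citations": ["https://www.coingecko.com/en/coins/bitcoin"],
}

valid = FinalAnswer.model_validate(payload)
print(valid.results[0].error)  # None: optional fields fall back to their defaults

# A malformed citation is rejected instead of passing through silently
bad = {**payload, "citations": ["not-a-url"]}
try:
    FinalAnswer.model_validate(bad)
    rejected = False
except ValidationError as exc:
    rejected = True
    print(f"Rejected with {exc.error_count()} error(s)")
```

The same `ValidationError` is what the retry logic later in this guide reacts to when the executor returns JSON that does not match the schema.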

---

### Step 2: Build Production-Grade Tools

Each tool validates inputs, handles errors gracefully, and returns structured outputs. We apply retries with exponential backoff for external APIs.
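
The `%%writefile tools/...` cells in this step write into a `tools/` directory, and `%%writefile` does not create missing parent directories. A small setup cell along these lines, run once before the tool cells, creates the package. Adding an empty `__init__.py` is an assumption on our part to keep the later `from tools.search import ...` imports unambiguous.

```python
from pathlib import Path

# Create the package directory that the %%writefile cells write into
tools_dir = Path("tools")
tools_dir.mkdir(exist_ok=True)

# An empty __init__.py marks the directory as a regular package
(tools_dir / "__init__.py").touch(exist_ok=True)

print(sorted(p.name for p in tools_dir.iterdir()))
```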

#### Web Search Tool

In [None]:
%%writefile tools/search.py
# tools/search.py
# Purpose: Web search tool with strict input validation, retry logic, and structured error handling.

import httpx
from crewai_tools import BaseTool
from pydantic import BaseModel, Field, ValidationError
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from typing import List, Optional
import os
import logging

logger = logging.getLogger("tools.search")

class SearchInput(BaseModel):
    """Input schema for web search."""
    query: str = Field(..., min_length=1, max_length=500, description="Search query string")
    max_results: int = Field(default=3, ge=1, le=10, description="Max number of results to return")

class TavilySearchTool(BaseTool):
    name: str = "web_search"
    description: str = (
        "Search the web using Tavily API. Returns a list of results with title, URL, and snippet. "
        "Input must be JSON: {\"query\": \"your search\", \"max_results\": 3}. "
        "Example: {\"query\": \"Bitcoin price news\", \"max_results\": 3}"
    )

    @retry(
        reraise=True,
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=0.5, min=0.5, max=4),
        retry=retry_if_exception_type(httpx.HTTPStatusError)
    )
    def _post_search(self, query: str, max_results: int, api_key: str) -> dict:
        """Call Tavily once. HTTP errors propagate so tenacity can retry them."""
        with httpx.Client(timeout=10.0) as client:
            resp = client.post(
                "https://api.tavily.com/search",
                json={"query": query, "max_results": max_results, "api_key": api_key}
            )
            resp.raise_for_status()
            return resp.json()

    def _run(self, query: str, max_results: int = 3) -> dict:
        """
        Execute web search with retry logic.

        Args:
            query (str): Search query.
            max_results (int): Number of results to return.

        Returns:
            dict: {"success": bool, "data": list or None, "error": str or None}
        """
        try:
            # Validate inputs
            validated = SearchInput(query=query, max_results=max_results)
        except ValidationError as e:
            logger.error(f"Search input validation failed: {e}")
            return {"success": False, "data": None, "error": f"Invalid input: {e}"}

        api_key = os.getenv("TAVILY_API_KEY")
        if not api_key:
            return {"success": False, "data": None, "error": "TAVILY_API_KEY not set"}

        try:
            # Retries happen inside _post_search; an error reaches this block
            # only after the final attempt has failed.
            data = self._post_search(validated.query, validated.max_results, api_key)
            # Filter out placeholder URLs and invalid results
            results = [
                {"title": r.get("title", ""), "url": r.get("url", ""), "snippet": r.get("content", "")}
                for r in data.get("results", [])
                if r.get("url") and not r["url"].startswith("https://example.com")
            ]
            logger.info(f"Search returned {len(results)} results for query: {validated.query}")
            return {"success": True, "data": results, "error": None}
        except httpx.HTTPStatusError as e:
            logger.error(f"Tavily API error: {e.response.status_code}")
            return {"success": False, "data": None, "error": f"API error: {e.response.status_code}"}
        except Exception as e:
            logger.error(f"Search failed: {e}")
            return {"success": False, "data": None, "error": str(e)}

**Why retries**: External APIs can fail transiently. Exponential backoff reduces load and increases success rate. **Why structured errors**: The executor can parse `{"success": false, "error": "..."}` and include it in `FinalAnswer.results` without crashing.
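
To illustrate the retry policy without any external service, here is a standard-library sketch of exponential backoff with the same parameters (three attempts, multiplier 0.5, waits clamped to between 0.5 and 4 seconds). The helpers `backoff_delay` and `call_with_retries` are hypothetical, and the delay formula is our assumption about how such a schedule is typically computed, not a copy of tenacity's internals.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def backoff_delay(attempt: int, multiplier: float = 0.5,
                  min_wait: float = 0.5, max_wait: float = 4.0) -> float:
    """Delay after a failed attempt: multiplier * 2**(attempt - 1), clamped to the bounds."""
    return max(min_wait, min(multiplier * 2 ** (attempt - 1), max_wait))

def call_with_retries(fn: Callable[[], T],
                      retry_on: Tuple[Type[BaseException], ...],
                      max_attempts: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on the listed exception types; re-raise after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")

# Hypothetical flaky call: fails twice, then succeeds
calls = {"n": 0}

def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

waits = []  # record the delays instead of actually sleeping
result = call_with_retries(flaky, (ConnectionError,), sleep=waits.append)
print(result, waits)
```

Note that the retried call must raise for the retry to fire. If the exception is caught and converted to an error dict inside the decorated function, the retry wrapper never sees it.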

---

#### Calculator Tool

In [None]:
%%writefile tools/calculator.py
# tools/calculator.py
# Purpose: Safe arithmetic evaluator with strict input validation and operator whitelisting.

import ast
import operator
from typing import ClassVar
from crewai_tools import BaseTool
from pydantic import BaseModel, Field, ValidationError
import logging

logger = logging.getLogger("tools.calculator")

class CalculatorInput(BaseModel):
    """Input schema for calculator."""
    expression: str = Field(..., min_length=1, max_length=200, description="Arithmetic expression to evaluate")

class CalculatorTool(BaseTool):
    name: str = "calculator"
    description: str = (
        "Evaluate arithmetic expressions safely. Supports +, -, *, /, %, //. "
        "Input must be JSON: {\"expression\": \"2 + 2\"}. "
        "Example: {\"expression\": \"100 * 1.025\"}"
    )

    # Whitelist of allowed operators (excludes pow to prevent abuse).
    # ClassVar keeps Pydantic from treating this dict as a model field.
    ALLOWED_OPS: ClassVar[dict] = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
        ast.Mod: operator.mod,
        ast.FloorDiv: operator.floordiv,
        ast.USub: operator.neg,
    }

    def _eval_node(self, node):
        """Recursively evaluate AST node with operator whitelist."""
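        if isinstance(node, ast.Constant):
            # Accept numeric literals only; reject strings, booleans, None, etc.
            if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
                raise ValueError(f"Unsupported constant: {node.value!r}")
            return node.value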
        elif isinstance(node, ast.BinOp):
            if type(node.op) not in self.ALLOWED_OPS:
                raise ValueError(f"Operator {type(node.op).__name__} not allowed")
            left = self._eval_node(node.left)
            right = self._eval_node(node.right)
            return self.ALLOWED_OPS[type(node.op)](left, right)
        elif isinstance(node, ast.UnaryOp):
            if type(node.op) not in self.ALLOWED_OPS:
                raise ValueError(f"Operator {type(node.op).__name__} not allowed")
            operand = self._eval_node(node.operand)
            return self.ALLOWED_OPS[type(node.op)](operand)
        else:
            raise ValueError(f"Unsupported node type: {type(node).__name__}")

    def _run(self, expression: str) -> dict:
        """
        Evaluate arithmetic expression safely.
        
        Args:
            expression (str): Arithmetic expression.
        
        Returns:
            dict: {"success": bool, "data": {"result": float} or None, "error": str or None}
        """
        try:
            validated = CalculatorInput(expression=expression)
        except ValidationError as e:
            logger.error(f"Calculator input validation failed: {e}")
            return {"success": False, "data": None, "error": f"Invalid input: {e}"}

        try:
            tree = ast.parse(validated.expression, mode="eval")
            result = self._eval_node(tree.body)
            logger.info(f"Calculated: {validated.expression} = {result}")
            return {"success": True, "data": {"result": float(result)}, "error": None}
        except Exception as e:
            logger.error(f"Calculation failed: {e}")
            return {"success": False, "data": None, "error": str(e)}

**Why operator whitelist**: Prevents abuse (e.g., `2**9999999` causing hangs). **Why length cap**: Bounds parsing and evaluation cost and rejects oversized inputs early.
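
The whitelist idea can be tried outside CrewAI. The following standalone sketch uses a hypothetical `safe_eval` function (not part of the tool class) that walks the AST the same way and shows which inputs pass and which are rejected.

```python
import ast
import operator

ALLOWED_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Mod: operator.mod,
    ast.FloorDiv: operator.floordiv, ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate arithmetic by walking the AST; anything off the whitelist raises ValueError."""
    def walk(node):
        if isinstance(node, ast.Constant):
            # Numeric literals only; strings, booleans, and None are rejected
            if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
                raise ValueError(f"Unsupported constant: {node.value!r}")
            return node.value
        if isinstance(node, (ast.BinOp, ast.UnaryOp)):
            if type(node.op) not in ALLOWED_OPS:
                raise ValueError(f"Operator {type(node.op).__name__} not allowed")
            if isinstance(node, ast.BinOp):
                return ALLOWED_OPS[type(node.op)](walk(node.left), walk(node.right))
            return ALLOWED_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Unsupported node type: {type(node).__name__}")

    return float(walk(ast.parse(expression, mode="eval").body))

print(safe_eval("-(7 // 2) + 10 % 4"))  # -1.0

# Exponentiation and function calls never reach an operator: they fail the whitelist
for blocked in ("2 ** 9999999", "__import__('os').system('ls')"):
    try:
        safe_eval(blocked)
    except ValueError as exc:
        print(f"{blocked!r} rejected: {exc}")
```

Because nothing is ever passed to `eval`, names, attribute access, and calls are rejected as unsupported node types rather than executed.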

---

#### Crypto Price Tool

In [None]:
%%writefile tools/prices.py
# tools/prices.py
# Purpose: Fetch cryptocurrency prices from CoinGecko with retry logic and strict validation.

import httpx
from crewai_tools import BaseTool
from pydantic import BaseModel, Field, ValidationError
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
import logging

logger = logging.getLogger("tools.prices")

class PriceInput(BaseModel):
    """Input schema for crypto price lookup."""
    coin_id: str = Field(..., min_length=1, max_length=50, description="CoinGecko coin ID (e.g., 'bitcoin')")
    currency: str = Field(default="usd", min_length=3, max_length=3, description="Currency code (e.g., 'usd')")

class CoinGeckoPriceTool(BaseTool):
    name: str = "crypto_price"
    description: str = (
        "Fetch current cryptocurrency price from CoinGecko. "
        "Input must be JSON: {\"coin_id\": \"bitcoin\", \"currency\": \"usd\"}. "
        "Example: {\"coin_id\": \"ethereum\", \"currency\": \"usd\"}"
    )

    @retry(
        reraise=True,
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=0.5, min=0.5, max=4),
        retry=retry_if_exception_type(httpx.HTTPStatusError)
    )
    def _get_price(self, coin_id: str, currency: str) -> dict:
        """Call CoinGecko once. HTTP errors propagate so tenacity can retry them."""
        with httpx.Client(timeout=10.0) as client:
            resp = client.get(
                "https://api.coingecko.com/api/v3/simple/price",
                params={"ids": coin_id, "vs_currencies": currency}
            )
            resp.raise_for_status()
            return resp.json()

    def _run(self, coin_id: str, currency: str = "usd") -> dict:
        """
        Fetch crypto price with retry logic.

        Args:
            coin_id (str): CoinGecko coin ID.
            currency (str): Currency code.

        Returns:
            dict: {"success": bool, "data": {"coin_id": str, "currency": str, "price": float} or None, "error": str or None}
        """
        try:
            validated = PriceInput(coin_id=coin_id, currency=currency)
        except ValidationError as e:
            logger.error(f"Price input validation failed: {e}")
            return {"success": False, "data": None, "error": f"Invalid input: {e}"}

        try:
            # Retries happen inside _get_price; an error reaches this block
            # only after the final attempt has failed.
            data = self._get_price(validated.coin_id, validated.currency)
            # Handle missing coin_id or currency gracefully
            if validated.coin_id not in data or validated.currency not in data[validated.coin_id]:
                return {"success": False, "data": None, "error": f"Price not found for {validated.coin_id}/{validated.currency}"}
            price = float(data[validated.coin_id][validated.currency])
            logger.info(f"Fetched price: {validated.coin_id}/{validated.currency} = {price}")
            return {"success": True, "data": {"coin_id": validated.coin_id, "currency": validated.currency, "price": price}, "error": None}
        except httpx.HTTPStatusError as e:
            logger.error(f"CoinGecko API error: {e.response.status_code}")
            return {"success": False, "data": None, "error": f"API error: {e.response.status_code}"}
        except Exception as e:
            logger.error(f"Price fetch failed: {e}")
            return {"success": False, "data": None, "error": str(e)}

**Why coin_id validation**: Rejects empty or oversized identifiers before they reach the API. **Why graceful key path handling**: Avoids KeyError crashes if the coin or currency is missing from the response or the response structure changes.
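
The key-path handling can be checked offline. This sketch isolates it in a hypothetical `extract_price` helper that takes an already-parsed response of the nested shape the tool code indexes (`data[coin_id][currency]`) and returns the same success/error envelope. The sample responses are made up, and 100.0 is a placeholder rather than a real quote.

```python
from typing import Any, Dict

def extract_price(data: Any, coin_id: str, currency: str) -> Dict[str, Any]:
    """Return the tool's success/error envelope from a parsed price response."""
    coin = data.get(coin_id) if isinstance(data, dict) else None
    price = coin.get(currency) if isinstance(coin, dict) else None
    # Treat anything that is not a plain number as "not found"
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        return {"success": False, "data": None,
                "error": f"Price not found for {coin_id}/{currency}"}
    return {"success": True,
            "data": {"coin_id": coin_id, "currency": currency, "price": float(price)},
            "error": None}

# Hypothetical responses
found = extract_price({"bitcoin": {"usd": 100.0}}, "bitcoin", "usd")
missing = extract_price({}, "invalid_coin_xyz", "usd")
malformed = extract_price({"bitcoin": "unexpected"}, "bitcoin", "usd")

print(found["data"]["price"], missing["error"], malformed["success"])
```

Every path returns a dict with the same three keys, so the executor can record the outcome in `FinalAnswer.results` without special cases.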

---

#### Memory Tools

In [None]:
%%writefile tools/memory.py
# tools/memory.py
# Purpose: Persistent memory tools using ChromaDB for context retention across sessions.

import chromadb
from crewai_tools import BaseTool
from pydantic import BaseModel, Field, ValidationError
import os
import logging

logger = logging.getLogger("tools.memory")

# Ensure ChromaDB persists to a writable directory (Colab-compatible)
CHROMA_DIR = os.getenv("CHROMA_DIR", "./chroma_data")
os.makedirs(CHROMA_DIR, exist_ok=True)
chroma_client = chromadb.PersistentClient(path=CHROMA_DIR)
collection = chroma_client.get_or_create_collection("agent_memory")

class MemoryWriteInput(BaseModel):
    """Input schema for memory write."""
    user_id: str = Field(..., min_length=1, max_length=100, description="User identifier")
    fact: str = Field(..., min_length=1, max_length=1000, description="Fact to store")

class MemorySearchInput(BaseModel):
    """Input schema for memory search."""
    user_id: str = Field(..., min_length=1, max_length=100, description="User identifier")
    query: str = Field(..., min_length=1, max_length=500, description="Search query")
    top_k: int = Field(default=3, ge=1, le=10, description="Number of results to return")

class MemoryWriteTool(BaseTool):
    name: str = "memory_write"
    description: str = (
        "Store a fact in persistent memory for a user. "
        "Input must be JSON: {\"user_id\": \"user123\", \"fact\": \"User prefers BTC over ETH\"}. "
        "Example: {\"user_id\": \"alice\", \"fact\": \"Alice's favorite coin is bitcoin\"}"
    )

    def _run(self, user_id: str, fact: str) -> dict:
        """
        Store a fact in ChromaDB.
        
        Args:
            user_id (str): User identifier.
            fact (str): Fact to store.
        
        Returns:
            dict: {"success": bool, "data": {"stored": str} or None, "error": str or None}
        """
        try:
            validated = MemoryWriteInput(user_id=user_id, fact=fact)
        except ValidationError as e:
            logger.error(f"Memory write input validation failed: {e}")
            return {"success": False, "data": None, "error": f"Invalid input: {e}"}

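        try:
            # Built-in hash() is salted per process for strings, so derive a
            # stable ID from the fact content with SHA-256 instead.
            import hashlib
            digest = hashlib.sha256(validated.fact.encode("utf-8")).hexdigest()[:16]
            doc_id = f"{validated.user_id}_{digest}"
            # upsert overwrites an existing entry when the same fact is stored twice
            collection.upsert(
                documents=[validated.fact],
                metadatas=[{"user_id": validated.user_id}],
                ids=[doc_id]
            )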
            logger.info(f"Stored fact for {validated.user_id}: {validated.fact}")
            return {"success": True, "data": {"stored": validated.fact}, "error": None}
        except Exception as e:
            logger.error(f"Memory write failed: {e}")
            return {"success": False, "data": None, "error": str(e)}

class MemorySearchTool(BaseTool):
    name: str = "memory_search"
    description: str = (
        "Search stored facts for a user. "
        "Input must be JSON: {\"user_id\": \"user123\", \"query\": \"favorite coin\", \"top_k\": 3}. "
        "Example: {\"user_id\": \"alice\", \"query\": \"bitcoin\", \"top_k\": 2}"
    )

    def _run(self, user_id: str, query: str, top_k: int = 3) -> dict:
        """
        Search ChromaDB for relevant facts.
        
        Args:
            user_id (str): User identifier.
            query (str): Search query.
            top_k (int): Number of results to return.
        
        Returns:
            dict: {"success": bool, "data": {"facts": list} or None, "error": str or None}
        """
        try:
            validated = MemorySearchInput(user_id=user_id, query=query, top_k=top_k)
        except ValidationError as e:
            logger.error(f"Memory search input validation failed: {e}")
            return {"success": False, "data": None, "error": f"Invalid input: {e}"}

        try:
            results = collection.query(
                query_texts=[validated.query],
                n_results=validated.top_k,
                where={"user_id": validated.user_id}
            )
            facts = results["documents"][0] if results["documents"] else []
            logger.info(f"Retrieved {len(facts)} facts for {validated.user_id}")
            return {"success": True, "data": {"facts": facts}, "error": None}
        except Exception as e:
            logger.error(f"Memory search failed: {e}")
            return {"success": False, "data": None, "error": str(e)}

**Why user_id scoping**: Prevents cross-user data leakage. **Why persistent client**: Retains memory across notebook restarts (if CHROMA_DIR is mounted or persistent).
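
Persistence across sessions also depends on document IDs being reproducible. Python's built-in `hash()` is randomized per process for strings unless `PYTHONHASHSEED` is fixed, so an ID built from it can differ between notebook restarts. A minimal sketch of a content-derived ID follows; `make_doc_id` is a hypothetical helper and the 16-character truncation is an arbitrary choice.

```python
import hashlib

def make_doc_id(user_id: str, fact: str) -> str:
    """Build an ID that is identical across interpreter restarts."""
    digest = hashlib.sha256(fact.encode("utf-8")).hexdigest()[:16]
    return f"{user_id}_{digest}"

first = make_doc_id("alice", "Alice's favorite coin is bitcoin")
second = make_doc_id("alice", "Alice's favorite coin is bitcoin")
other_user = make_doc_id("bob", "Alice's favorite coin is bitcoin")

# Same user and fact give the same ID; a different user gives a different ID
print(first == second, first != other_user)
```

With stable IDs, storing the same fact twice maps to one record instead of accumulating near-duplicates that crowd the top-k search results.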

---

### Step 3: Assemble the Agent System

Define planner and executor agents with explicit model settings for determinism, then wire them into a hierarchical crew.

In [None]:
%%writefile main.py
# main.py
# Purpose: Entrypoint for a production-grade CrewAI agent with validated tools, persistent memory, and deterministic schema-validated outputs.

import os
import json
import logging
from dotenv import load_dotenv

# Load environment variables
try:
    from google.colab import userdata
    load_dotenv()
    for key in ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "TAVILY_API_KEY"]:
        if key not in os.environ and userdata.get(key):
            os.environ[key] = userdata.get(key)
except ImportError:
    load_dotenv()

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s %(message)s")
logger = logging.getLogger("main")

from crewai import Agent, Task, Crew, Process
from pydantic import ValidationError
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

from tools.search import TavilySearchTool
from tools.calculator import CalculatorTool
from tools.prices import CoinGeckoPriceTool
from tools.memory import MemoryWriteTool, MemorySearchTool
from schemas import FinalAnswer

# Instantiate tools
web_search = TavilySearchTool()
calculator = CalculatorTool()
crypto_price = CoinGeckoPriceTool()
memory_write = MemoryWriteTool()
memory_search = MemorySearchTool()

# Define agents with explicit model settings for determinism
planner = Agent(
    role="Planner",
    goal="Decompose the user request into a minimal set of executable steps with proper tool usage.",
    backstory="You are a precise project planner. You only produce plans; you do not execute.",
    allow_delegation=True,
    verbose=True,
    llm="gpt-4o-mini",  # Explicit model
    temperature=0.2,  # Low temperature for determinism
    max_tokens=1000
)

executor = Agent(
    role="Executor",
    goal=(
        "Execute the plan step-by-step using the available tools, validate tool outputs, "
        "persist useful facts with memory_write, and construct the final JSON strictly matching the given schema. "
        "Use user_id='{user_id}' in all memory_* tool calls."
    ),
    backstory="You are a reliable operator who follows instructions exactly and validates everything.",
    tools=[web_search, calculator, crypto_price, memory_write, memory_search],
    allow_delegation=False,
    verbose=True,
    llm="gpt-4o-mini",
    temperature=0.2,
    max_tokens=2000
)

# Schema instruction for strict output validation
SCHEMA_INSTRUCTION = (
    "You must output a single JSON object that matches this schema exactly:\n"
    "{\n"
    '  "query": string,\n'
    '  "plan": array of strings (copy the numbered steps you executed),\n'
    '  "results": array of { "step": int, "tool": string, "success": boolean, "data": object|null, "error": string|null },\n'
    '  "answer": string,\n'
    '  "citations": array of valid URLs\n'
    "}\n"
    "No extra keys. No markdown. No commentary. JSON only."
)

# Define tasks
planning_task = Task(
    description=(
        "Analyze the user request: '{query}'. "
        "Produce a concise, numbered plan of steps. For each step, specify: objective, suggested tool (if any), and expected output."
    ),
    expected_output="A numbered list of 2-6 concrete steps with tool suggestions.",
    agent=planner
)

execution_task = Task(
    description=(
        "Follow the plan precisely. For each step: select the appropriate tool, run it with validated inputs, "
        "and record success or error in the results array. Use memory_search when relevant; write useful facts via memory_write. "
        f"Finally, produce a strictly valid JSON that matches the provided schema.\n\n{SCHEMA_INSTRUCTION}"
    ),
    expected_output="A single JSON object matching the FinalAnswer schema exactly.",
    agent=executor
)

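# Assemble crew
# In a hierarchical process the manager is passed via manager_agent and must not
# also appear in the agents list, so only the executor is listed here.
crew = Crew(
    agents=[executor],
    tasks=[planning_task, execution_task],
    process=Process.hierarchical,
    manager_agent=planner,
    verbose=True
)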

class SchemaError(Exception):
    """Custom exception for schema validation errors."""
    pass

def validate_output(text: str) -> FinalAnswer:
    """
    Validate the output text against the FinalAnswer schema.
    
    Args:
        text (str): The raw output text from the agent.
    
    Returns:
        FinalAnswer: Parsed and validated FinalAnswer object.
    
    Raises:
        SchemaError: If the output does not match the schema.
    """
    try:
        s = text.strip()
        # Handle code fences
        if s.startswith("```"):
            s = s.strip("`")
            if "json" in s[:10].lower():
                s = s[s.find("\n")+1:]
            s = s[s.find("{"): s.rfind("}") + 1]
        data = json.loads(s)
        return FinalAnswer.model_validate(data)
    except Exception as e:
        logger.error(f"Schema validation failed: {e}")
        raise SchemaError(str(e))

def run_crew_once(user_query: str, user_id: str) -> str:
    """
    Run the CrewAI process once for the given user query.
    
    Args:
        user_query (str): The user's input query.
        user_id (str): User identifier for memory scoping.
    
    Returns:
        str: The raw output from the CrewAI process.
    """
    logger.info(f"Running CrewAI for query: {user_query}, user_id: {user_id}")
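    result = crew.kickoff(inputs={"query": user_query, "user_id": user_id})
    # kickoff may return an output object rather than a plain string;
    # prefer its raw text and fall back to str() otherwise.
    return getattr(result, "raw", None) or str(result)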

@retry(
    reraise=True,
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=0.5, min=0.5, max=4),
    retry=retry_if_exception_type(SchemaError),
)
def run_with_validation(user_query: str, user_id: str = "default_user") -> FinalAnswer:
    """
    Run the CrewAI process with schema validation and automatic retries.
    
    Args:
        user_query (str): The user's input query.
        user_id (str): User identifier for memory scoping.
    
    Returns:
        FinalAnswer: The validated output matching the FinalAnswer schema.
    
    Raises:
        SchemaError: If all retries fail to produce valid output.
    """
    raw = run_crew_once(user_query, user_id)
    try:
        return validate_output(raw)
    except SchemaError as err:
        logger.warning(f"Output failed schema validation: {err}. Retrying with stricter instructions.")
        # Inject corrective instruction (idempotent: only add if not present)
        if "STRICT OUTPUT INSTRUCTION" not in execution_task.description:
            execution_task.description += f"\n\nSTRICT OUTPUT INSTRUCTION:\n{SCHEMA_INSTRUCTION}\n\n"
        execution_task.expected_output = "Strictly valid JSON per schema."
        raw2 = run_crew_once(user_query, user_id)
        return validate_output(raw2)

if __name__ == "__main__":
    q = "What's the current BTC price in USD, and if it rises by 2.5%, what will it be? Include 2 sources."
    try:
        answer = run_with_validation(q, user_id="alice")
        print(answer.model_dump_json(indent=2))
    except Exception as e:
        logger.error(f"Failed to produce valid output: {e}")

**Why hierarchical process**: The planner manages the executor, reducing drift. **Why explicit model/temperature**: Pinning the model and using a low temperature reduces run-to-run variation, although it does not make outputs fully deterministic. **Why user_id in inputs**: Enables memory scoping per user. **Why idempotent retry instruction**: Prevents prompt bloat on repeated retries.
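
Models often wrap JSON in a Markdown code fence even when told not to, which is why the validator strips fences before parsing. Here is a standalone sketch of that step; `extract_json` is a hypothetical helper and a slight variant of `validate_output`, in that it always slices to the outermost braces.

```python
import json

FENCE = "`" * 3  # three backticks, built this way so the example stays fence-safe

def extract_json(text: str) -> dict:
    """Parse a JSON object from raw model output, tolerating a Markdown code fence."""
    s = text.strip()
    if s.startswith(FENCE):
        s = s.strip("`")
    # Keep only the outermost {...} span, dropping a leading "json" tag or commentary
    start, end = s.find("{"), s.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in output")
    return json.loads(s[start:end + 1])

fenced = FENCE + 'json\n{"answer": "42", "citations": []}\n' + FENCE
print(extract_json(fenced))
print(extract_json('{"answer": "plain"}'))

try:
    extract_json("I could not find an answer.")
except ValueError as exc:
    print(f"Rejected: {exc}")
```

The parsed dict would then go to `FinalAnswer.model_validate`, so fence handling and schema checking stay separate concerns.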

---

## Run and Validate

### Test 1: Valid End-to-End Query

Run the main script to execute a full query with planning, tool calls, and schema validation.

```python
from main import run_with_validation

query = "What's the current BTC price in USD, and if it rises by 2.5%, what will it be? Include 2 sources."
result = run_with_validation(query, user_id="alice")
print(result.model_dump_json(indent=2))
```

**Expected output**: A JSON object with `query`, `plan`, `results` (showing successful tool calls), `answer`, and `citations`.

---

### Test 2: Tool Failure Handling

Trigger a tool failure by passing an invalid coin ID.

```python
query_invalid = "What's the current price of invalid_coin_xyz in USD?"
result_invalid = run_with_validation(query_invalid, user_id="alice")
print(result_invalid.model_dump_json(indent=2))
```

**Expected output**: `FinalAnswer.results` includes a step with `"success": false` and a descriptive error message. The agent should still produce valid JSON.

---

### Test 3: Schema Violation and Retry

Manually inject a malformed output to test retry logic (for demonstration, you can modify `validate_output` temporarily to always raise `SchemaError` on first attempt).

```python
# Simulate schema violation by forcing a retry
# (In practice, this happens when the LLM produces invalid JSON)
# The retry mechanism will inject stricter instructions and rerun.
```

**Expected behavior**: The system logs a validation error, injects corrective instructions, and retries. The second attempt should produce valid JSON.
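
Since forcing a real model to emit invalid JSON is unreliable, the retry path can be exercised with a stand-in. In this sketch everything is hypothetical: `fake_crew` replaces `crew.kickoff`, plain `json.loads` replaces the Pydantic check, and `run_with_retry` mirrors the idea of appending the strict instruction only once.

```python
import json

STRICT_MARKER = "STRICT OUTPUT INSTRUCTION"

def run_with_retry(run_once, prompt: str, max_attempts: int = 3) -> dict:
    """Run, validate as JSON, and append the strict instruction once after a failure."""
    for attempt in range(1, max_attempts + 1):
        raw = run_once(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            if attempt == max_attempts:
                raise
            if STRICT_MARKER not in prompt:  # idempotent: never append twice
                prompt += f"\n\n{STRICT_MARKER}: JSON only."
    raise ValueError("max_attempts must be at least 1")

# Stand-in for the crew: malformed on the first call, valid afterwards
prompts_seen = []

def fake_crew(prompt: str) -> str:
    prompts_seen.append(prompt)
    return "Sure! Here is the answer." if len(prompts_seen) == 1 else '{"answer": "ok"}'

result = run_with_retry(fake_crew, "Answer the query.")
print(result, len(prompts_seen))
print(STRICT_MARKER in prompts_seen[0], STRICT_MARKER in prompts_seen[1])
```

The first prompt goes out unchanged, and the second carries the corrective instruction, which is the behavior to look for in the logs of the real system.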

---

### Test 4: Memory Persistence

Run a query that writes a fact, then run a second query that retrieves it.

```python
# First query: store a fact
query1 = "Remember that Alice's favorite cryptocurrency is Bitcoin."
result1 = run_with_validation(query1, user_id="alice")
print("First run:", result1.answer)

# Second query: retrieve the fact
query2 = "What is Alice's favorite cryptocurrency?"
result2 = run_with_validation(query2, user_id="alice")
print("Second run:", result2.answer)
```

**Expected output**: The second query should retrieve "Bitcoin" from memory without calling external APIs.

---

## Conclusion

You've built a production-grade CrewAI agent with:

- **Hierarchical planning** to reduce drift
- **Strict tool contracts** with Pydantic validation and retry logic
- **Persistent memory** using ChromaDB for context retention
- **Schema-enforced outputs** with automatic retries
- **Deterministic model settings** for predictable behavior

This system is ready to deploy behind an API. Key decisions:

- **Why hierarchical process**: Separates planning from execution, reducing hallucinations.
- **Why Pydantic schemas**: Enforces strict input/output contracts, preventing malformed data.
- **Why retries with exponential backoff**: Handles transient API failures gracefully.
- **Why memory scoping by user_id**: Prevents cross-user data leakage.
- **Why explicit model/temperature**: Reduces nondeterminism in outputs.

### Next Steps

- **Add domain allowlists** for web_search to restrict outbound requests.
- **Implement rate limiting** per user to prevent abuse.
- **Extend memory** with semantic search or time-based expiration.
- **Deploy** via FastAPI or Flask with request_id logging for observability.
- **Monitor** tool success rates and schema validation failures in production.
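
As a starting point for the first item, here is one possible reading of a domain allowlist: filter search results by host before they reach the agent. The domain set and the `is_allowed` helper are hypothetical, and the sample URLs are made up for illustration.

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"coingecko.com", "reuters.com"}  # hypothetical allowlist

def is_allowed(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """True if the URL is http(s) and its host is an allowed domain or a subdomain of one."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme not in ("http", "https") or not host:
        return False
    return any(host == d or host.endswith("." + d) for d in allowed)

results = [
    {"title": "A", "url": "https://www.coingecko.com/en/coins/bitcoin"},
    {"title": "B", "url": "https://coingecko.com.evil.example/phish"},
    {"title": "C", "url": "ftp://reuters.com/file"},
]
kept = [r for r in results if is_allowed(r["url"])]
print([r["title"] for r in kept])  # ['A']
```

Matching on the parsed hostname rather than a substring of the URL is what keeps lookalike hosts such as the second entry out.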

For more on model selection and trade-offs, see our guide on [How do you pick an LLM?](/article/how-to-choose-an-ai-model-for-your-app-speed-cost-reliability). To understand prompt structure and information placement, check out [Lost in the Middle: Placing Critical Info in Long Prompts](/article/lost-in-the-middle-placing-critical-info-in-long-prompts).