# 📓 The GenAI Revolution Cookbook

**Title:** How to Build a Multi-Agent Chatbot with CrewAI, ChromaDB, Gradio

**Description:** Build a production-ready multi-agent chatbot with analyst and reviewer agents, ChromaDB RAG, CrewAI, and Gradio, delivering clearer, verified answers consistently.

---

*This Jupyter notebook contains executable code examples. Run the cells below to try out the code yourself!*



This notebook builds a multi-agent chatbot with two roles, Analyst (retrieve/synthesize) and Reviewer (verify/refine), orchestrated by CrewAI. The system uses ChromaDB for vector search (RAG) and Gradio for the chat UI. The review step is intended to reduce hallucinations and produce clearer answers with traceable sources, in a system you can ship or extend. For more on orchestrating multi-agent workflows, see our guide on [how to build multi-agent AI systems with CrewAI and YAML](/article/how-to-build-multi-agent-ai-systems-with-crewai-and-yaml-2).

## What you'll build

A chatbot that answers questions from a knowledge base (e.g., Bank of Canada reports) with citations. Two agents collaborate: the Analyst retrieves and drafts, the Reviewer validates and refines. You'll implement document chunking, embeddings, vector storage, tool creation, agent orchestration, and a chat UI. For practical tips on improving retrieval accuracy in your RAG pipeline, check out [7 retrieval tricks to boost answer accuracy](/article/rag-application-7-retrieval-tricks-to-boost-answer-accuracy-2).

**Stack:**
- **ChromaDB** – persistent vector store for semantic search
- **OpenAI embeddings** – text-embedding-3-small for document encoding
- **CrewAI** – multi-agent orchestration with sequential tasks
- **Gradio** – chat interface with history and examples
- **LangChain** – standardized embedding interface for provider flexibility

**Why this stack?**
- **ChromaDB** persists embeddings on disk, supports metadata filtering, and requires no external service.
- **CrewAI** handles agent coordination, tool calling, and memory without custom loops.
- **Gradio** provides a production-ready chat UI in ~10 lines.
- **LangChain** wraps OpenAI embeddings behind a standard interface, handles batching, and makes it easy to swap providers.

## How it works

1. **Load and chunk** markdown documents by headings, merge to ~1200 chars with overlap.
2. **Embed and index** chunks in ChromaDB with source/section metadata.
3. **Define tools** for semantic search and ambiguity detection.
4. **Create agents**: Analyst (search + draft) and Reviewer (validate + refine).
5. **Orchestrate** with CrewAI sequential tasks: Analyst → Reviewer.
6. **Serve** via Gradio chat interface with conversation history.

User query → Analyst retrieves from Chroma (RAG) and drafts → Reviewer validates, cross-checks, and refines → Gradio returns a final, cited answer.

## Setup

Install dependencies in a Colab or notebook cell:

In [None]:
!pip install "langchain>=0.2" "langchain-openai>=0.1.7" crewai "crewai-tools>=0.1.0" chromadb gradio python-dotenv

Securely load API keys from Colab userdata:

In [None]:
import os
from google.colab import userdata
from google.colab.userdata import SecretNotFoundError

# Only OPENAI_API_KEY is used in this notebook; add other providers' keys here if you swap models.
keys = ["OPENAI_API_KEY"]
missing = []
for k in keys:
    value = None
    try:
        value = userdata.get(k)
    except SecretNotFoundError:
        pass

    os.environ[k] = value if value is not None else ""

    if not os.environ[k]:
        missing.append(k)

if missing:
    raise EnvironmentError(f"Missing keys: {', '.join(missing)}. Add them in Colab → Settings → Secrets.")

print("All keys loaded.")

Create directories and download sample documents:

In [None]:
from pathlib import Path
import urllib.request

DOCS_DIR = Path("data/docs")
DOCS_DIR.mkdir(parents=True, exist_ok=True)

sample_url = "https://raw.githubusercontent.com/example/sample-docs/main/sample.md"
sample_path = DOCS_DIR / "sample.md"
if not sample_path.exists():
    urllib.request.urlretrieve(sample_url, sample_path)
    print(f"Downloaded sample document to {sample_path}")
else:
    print(f"Sample document already exists at {sample_path}")

## Stage 1: Document loading and semantic chunking

Read markdown files, split by headings, and merge into ~1200-character chunks with 200-character overlap. This balances context size and retrieval precision.

In [None]:
import re
import uuid
from pathlib import Path
from typing import List, Dict, Any

DOCS_DIR = Path("data/docs")
PERSIST_DIR = "data/chroma"

def read_markdown_files(folder: Path) -> List[Dict[str, Any]]:
    """
    Read all markdown files from a folder.

    Args:
        folder (Path): Path to the folder containing .md files.

    Returns:
        List[Dict[str, Any]]: List of dicts with 'path' and 'text' keys.
    """
    docs = []
    for path in folder.glob("**/*.md"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        docs.append({"path": str(path), "text": text})
    return docs

def split_by_headings(text: str) -> List[Dict[str, Any]]:
    """
    Split markdown text into sections by headings.

    Args:
        text (str): Markdown document as a string.

    Returns:
        List[Dict[str, Any]]: List of dicts with 'heading' and 'text'.
    """
    parts = re.split(r'(?m)(^#{1,6}\s.*$)', text)
    chunks = []
    current = ""
    current_heading = "Introduction"
    for i in range(len(parts)):
        if re.match(r'(?m)^#{1,6}\s', parts[i] or ""):
            if current.strip():
                chunks.append({"heading": current_heading, "text": current.strip()})
            current_heading = parts[i].strip().lstrip("#").strip()
            current = ""
        else:
            current += parts[i] or ""
    if current.strip():
        chunks.append({"heading": current_heading, "text": current.strip()})
    return chunks

def merge_to_target_size(sections: List[Dict[str, str]], max_chars=1200, overlap=200) -> List[Dict[str, str]]:
    """
    Merge sections into chunks of approximately max_chars, with overlap.

    Args:
        sections (List[Dict[str, str]]): List of section dicts.
        max_chars (int): Target maximum characters per chunk (1200 balances context and precision).
        overlap (int): Number of characters to overlap between chunks (200 preserves context across boundaries).

    Returns:
        List[Dict[str, str]]: List of merged chunk dicts.
    """
    merged = []
    buf = ""
    start_idx = 0
    heading = sections[0]["heading"] if sections else "Document"
    for idx, sec in enumerate(sections):
        candidate = (buf + "\n\n" + f"# {sec['heading']}\n{sec['text']}".strip()).strip()
        if len(candidate) > max_chars and buf:
            merged.append({"heading": heading, "text": buf.strip(), "section_start": start_idx, "section_end": idx - 1})
            buf = (buf[-overlap:] + "\n\n" + f"# {sec['heading']}\n{sec['text']}".strip()) if overlap > 0 else f"# {sec['heading']}\n{sec['text']}".strip()
            heading = sec["heading"]
            start_idx = idx
        else:
            buf = candidate
    if buf.strip():
        merged.append({"heading": heading, "text": buf.strip(), "section_start": start_idx, "section_end": len(sections)-1})
    return merged

def build_chunks(docs: List[Dict[str, Any]], max_chars=1200, overlap=200) -> List[Dict[str, Any]]:
    """
    Build semantic chunks from loaded documents.

    Args:
        docs (List[Dict[str, Any]]): List of loaded documents.
        max_chars (int): Max characters per chunk.
        overlap (int): Overlap in characters between chunks.

    Returns:
        List[Dict[str, Any]]: List of chunk dicts with metadata.
    """
    all_chunks = []
    for d in docs:
        sections = split_by_headings(d["text"])
        merged = merge_to_target_size(sections, max_chars=max_chars, overlap=overlap)
        for i, m in enumerate(merged):
            all_chunks.append({
                "id": str(uuid.uuid4()),
                "text": m["text"],
                "metadata": {
                    "source": d["path"],
                    "section": m["heading"],
                    "chunk_index": i
                }
            })
    return all_chunks

docs = read_markdown_files(DOCS_DIR)
print(f"Loaded {len(docs)} markdown files")
chunks = build_chunks(docs)
print(f"Produced {len(chunks)} chunks")
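
To make the heading split concrete, here is a minimal, self-contained sketch that applies the same regex idea to a made-up markdown string. The helper name `split_sections` and the sample text are illustrative assumptions, not part of the notebook's pipeline.

```python
import re

def split_sections(text: str):
    """Split markdown into (heading, body) pairs.

    Text before the first heading is filed under 'Introduction',
    mirroring the default used by split_by_headings.
    """
    # The capture group keeps the heading lines in the split result.
    parts = re.split(r"(?m)^(#{1,6}\s.*)$", text)
    sections = []
    heading = "Introduction"
    for part in parts:
        if re.match(r"#{1,6}\s", part):
            heading = part.lstrip("#").strip()
        elif part.strip():
            sections.append((heading, part.strip()))
    return sections

# Made-up sample text, for illustration only.
sample = (
    "Preamble text.\n"
    "# Inflation\n"
    "Target is discussed here.\n"
    "## Outlook\n"
    "Risks are listed here.\n"
)
sections = split_sections(sample)
for heading, body in sections:
    print(f"{heading!r}: {body!r}")
```

Each section keeps its heading, which is what later ends up in the `section` metadata field and in citations.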

## Stage 2: Embedding and vector store

Embed chunks with OpenAI text-embedding-3-small and store in ChromaDB with cosine similarity. Persist to disk for reuse across sessions.

In [None]:
import chromadb
from langchain_openai import OpenAIEmbeddings

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
assert OPENAI_API_KEY, "Set OPENAI_API_KEY in your environment or Colab secrets"

def get_embeddings():
    """
    Initialize OpenAI embeddings for document encoding.

    Returns:
        OpenAIEmbeddings: Embedding model instance.
    """
    return OpenAIEmbeddings(model="text-embedding-3-small", api_key=OPENAI_API_KEY)

def get_chroma_collection(persist_dir=PERSIST_DIR, name="kb"):
    """
    Get or create a persistent ChromaDB collection.

    Args:
        persist_dir (str): Directory for ChromaDB persistence.
        name (str): Collection name.

    Returns:
        chromadb.Collection: ChromaDB collection object.
    """
    client = chromadb.PersistentClient(path=persist_dir)
    col = client.get_or_create_collection(name=name, metadata={"hnsw:space": "cosine"})
    return col

def index_chunks(chunks: List[Dict[str, Any]], col, embeddings):
    """
    Embed and index document chunks in ChromaDB.

    Args:
        chunks (List[Dict[str, Any]]): List of chunk dicts.
        col (chromadb.Collection): ChromaDB collection.
        embeddings (OpenAIEmbeddings): Embedding model.
    """
    texts = [c["text"] for c in chunks]
    ids = [c["id"] for c in chunks]
    metas = [c["metadata"] for c in chunks]
    print(f"Embedding {len(texts)} chunks...")
    vecs = embeddings.embed_documents(texts)
    col.add(documents=texts, metadatas=metas, ids=ids, embeddings=vecs)
    print(f"Stored {col.count()} documents in vector database")

emb = get_embeddings()
col = get_chroma_collection()

if col.count() == 0 and chunks:
    index_chunks(chunks, col, emb)
else:
    print(f"Chroma collection has {col.count()} documents")

Test retrieval with a sample query:

In [None]:
def test_retrieval(col, embeddings, query: str, k=5, min_sim=0.7):
    """
    Test semantic retrieval from ChromaDB.

    Args:
        col (chromadb.Collection): ChromaDB collection.
        embeddings (OpenAIEmbeddings): Embedding model.
        query (str): User query.
        k (int): Number of results.
        min_sim (float): Minimum similarity threshold (0.7 filters low-relevance results).

    Returns:
        List[Dict[str, Any]]: List of retrieved docs with similarity and metadata.
    """
    qvec = embeddings.embed_query(query)
    res = col.query(query_embeddings=[qvec], n_results=k, include=["documents", "metadatas", "distances"])
    docs = []
    for doc, meta, dist in zip(res["documents"][0], res["metadatas"][0], res["distances"][0]):
        sim = 1.0 - float(dist)
        if sim >= min_sim:
            docs.append({"similarity": round(sim, 3), "meta": meta, "snippet": (doc[:300] + "...")})
    return docs

examples = test_retrieval(col, emb, "What is the Bank of Canada's inflation target?", k=3)
for d in examples:
    print(d)
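
The `1.0 - dist` conversion above relies on the collection being created with `"hnsw:space": "cosine"`, where the returned distance is one minus the cosine similarity. A small pure-Python sketch of that relationship, using made-up two-dimensional vectors rather than real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (assumptions for illustration, not real embeddings).
query = [1.0, 0.0]
candidates = {
    "same_direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
}

for name, vec in candidates.items():
    sim = cosine_similarity(query, vec)
    dist = 1.0 - sim  # cosine distance, so sim = 1.0 - dist
    print(f"{name}: similarity={sim:.2f}, distance={dist:.2f}")
```

If the collection used a different distance space (for example squared L2), this conversion would no longer produce a similarity in a comparable range, and the thresholds would need to be rethought.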

## Stage 3: CrewAI tools for search and clarification

Define tools for semantic search and ambiguity detection. The Analyst and Reviewer will call these during task execution.

In [None]:
from typing import Tuple, Optional
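try:
    from crewai.tools import tool  # newer CrewAI releases
except ImportError:
    from crewai_tools import tool  # older crewai-tools releases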
import json

def chroma_search(col, embeddings, query: str, k=5, min_sim=0.68, where: Optional[Dict[str, Any]] = None):
    """
    Perform semantic search over ChromaDB.

    Args:
        col (chromadb.Collection): ChromaDB collection.
        embeddings (OpenAIEmbeddings): Embedding model.
        query (str): Search query.
        k (int): Number of results.
        min_sim (float): Minimum similarity threshold (0.68 balances recall and precision).
        where (Optional[Dict[str, Any]]): Metadata filter (e.g., {"source": {"$eq": "data/docs/sample.md"}}).

    Returns:
        List[Dict[str, Any]]: List of matching items.
    """
    qvec = embeddings.embed_query(query)
    res = col.query(query_embeddings=[qvec], n_results=k, where=where, include=["documents", "metadatas", "distances"])
    items = []
    for doc, meta, dist in zip(res["documents"][0], res["metadatas"][0], res["distances"][0]):
        sim = 1.0 - float(dist)
        if sim >= min_sim:
            items.append({"similarity": float(sim), "metadata": meta, "document": doc})
    return items

def is_ambiguous(query: str) -> Tuple[bool, str]:
    """
    Heuristically determine if a query is ambiguous.

    Args:
        query (str): User query.

    Returns:
        Tuple[bool, str]: (Is ambiguous, clarification message)
    """
    q = query.strip().lower()
    too_short = len(q) < 12
    vague_terms = any(t in q for t in ["tell me more", "details", "explain", "what about it", "how is it"])
    if too_short or vague_terms:
        return True, "Your question seems broad. Please specify the topic, time period, or metric you care about."
    return False, ""

_EMB = emb
_COL = col

@tool("search_kb")
def search_kb(query: str, k: int = 5, min_similarity: float = 0.68, source_filter: str = "") -> str:
    """
    Search the knowledge base using semantic similarity.

    Args:
        query (str): The user query.
        k (int): Number of results to return.
        min_similarity (float): Minimum cosine similarity threshold (0-1).
        source_filter (str): Substring that must be in the source path.

    Returns:
        str: JSON with results: [{"similarity": float, "source": str, "section": str, "text": str}, ...]
    """
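    # Chroma's metadata `where` filter supports exact-match and comparison operators,
    # not substring matching, so the source filter is applied in Python after retrieval.
    # Over-fetch when filtering so that up to k results can survive the filter.
    n_fetch = k * 3 if source_filter else k
    items = chroma_search(_COL, _EMB, query, k=n_fetch, min_sim=min_similarity)
    results = []
    for it in items:
        m = it["metadata"] or {}
        if source_filter and source_filter not in str(m.get("source", "")):
            continue
        results.append({
            "similarity": round(it["similarity"], 3),
            "source": m.get("source"),
            "section": m.get("section"),
            "text": it["document"][:1200]
        })
    return json.dumps(results[:k])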

@tool("maybe_ask_for_clarification")
def maybe_ask_for_clarification(query: str) -> str:
    """
    If the query is ambiguous or too broad, return a clarifying question. Otherwise return an empty string.

    Args:
        query (str): User query.

    Returns:
        str: Clarification message or empty string.
    """
    amb, msg = is_ambiguous(query)
    if amb:
        return f"CLARIFICATION_NEEDED: {msg} Example: 'Compare Q2 2023 inflation vs. Q2 2024 and cite sources.'"
    return ""
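
Since `search_kb` returns a JSON string, it can help to see how that payload maps onto the `[source:path#section]` citation style the agents are asked to use. The sketch below is a hypothetical helper (`format_citations` is not part of the notebook), run on a made-up payload with the same keys that `search_kb` emits.

```python
import json

def format_citations(results_json: str) -> list:
    """Turn a search_kb-style JSON payload into de-duplicated citation strings."""
    results = json.loads(results_json)
    seen = set()
    citations = []
    for r in results:
        key = (r.get("source"), r.get("section"))
        if key in seen:
            continue  # several chunks can come from the same section
        seen.add(key)
        citations.append(f"[source:{key[0]}#{key[1]}]")
    return citations

# Made-up payload for illustration; real values come from the vector store.
example_payload = json.dumps([
    {"similarity": 0.81, "source": "data/docs/sample.md", "section": "Inflation", "text": "..."},
    {"similarity": 0.74, "source": "data/docs/sample.md", "section": "Inflation", "text": "..."},
    {"similarity": 0.70, "source": "data/docs/sample.md", "section": "Outlook", "text": "..."},
])
citations = format_citations(example_payload)
print(citations)
```

In the notebook itself the agents produce the citations from the tool output; a helper like this is mainly useful for spot-checking that cited sections actually appear in the retrieved results.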

## Stage 4: CrewAI agents and tasks

Define two agents: Analyst (retrieve and draft) and Reviewer (validate and refine). Each agent has a role, goal, backstory, and tools. If you're interested in foundational agent design patterns, see our walkthrough on [how to build an LLM agent from scratch with GPT-4 ReAct](/article/how-to-build-an-llm-agent-from-scratch-with-gpt-4-react-5).

In [None]:
from crewai import Agent, Task, Crew, Process

def build_agents():
    """
    Build Analyst and Reviewer agents for CrewAI.

    Returns:
        Tuple[Agent, Agent]: (Analyst, Reviewer)
    """
    analyst = Agent(
        role="Financial Research Analyst",
        goal="Find precise, sourced information from the knowledge base and draft a structured answer.",
        backstory=(
            "You are a diligent analyst specializing in central bank reports. "
            "You always verify with the search_kb tool and provide citations."
        ),
        tools=[search_kb, maybe_ask_for_clarification],
        allow_delegation=False,
        verbose=True,
        memory=True,
        llm="gpt-4o-mini"
    )

    reviewer = Agent(
        role="Fact-Checking Reviewer",
        goal="Verify the Analyst's draft against sources, fix errors, and improve clarity.",
        backstory=(
            "You are a critical editor. You cross-check every claim against the provided sources and ensure the answer is unambiguous."
        ),
        tools=[search_kb],
        allow_delegation=False,
        verbose=True,
        memory=True,
        llm="gpt-4o-mini"
    )
    return analyst, reviewer

def build_tasks(analyst: Agent, reviewer: Agent):
    """
    Build Analyst and Reviewer tasks for CrewAI.

    Args:
        analyst (Agent): Analyst agent.
        reviewer (Agent): Reviewer agent.

    Returns:
        Tuple[Task, Task]: (Analyst task, Reviewer task)
    """
    analyst_task = Task(
        description=(
            "Use the tools to address the user's question.\n"
            "Inputs:\n"
            "User Query: {user_query}\n"
            "Conversation Summary: {conv_summary}\n"
            "Instructions:\n"
            "1) If the query is ambiguous, call maybe_ask_for_clarification and STOP until user responds.\n"
            "2) Otherwise, call search_kb with a focused query; review top results; synthesize.\n"
            "3) Produce a structured DRAFT including:\n"
            "   - Direct answer in 2-4 sentences\n"
            "   - Key evidence (bullet list, short quotes)\n"
            "   - Citations as [source:path#section]\n"
            "   - Notes on uncertainty"
        ),
        agent=analyst,
        expected_output="DRAFT with 'Answer', 'Evidence', 'Citations', 'Uncertainty' sections."
    )

    reviewer_task = Task(
        description=(
            "Review the Analyst's DRAFT for factual accuracy and clarity.\n"
            "Steps:\n"
            "1) Re-run search_kb for any claims you doubt.\n"
            "2) Remove any unsourced or conflicting statements.\n"
            "3) Tighten language; ensure citations map to evidence.\n"
            "4) Output FINAL_ANSWER with concise paragraphs and a 'Sources' list.\n"
            "5) If information is missing, state limitations explicitly."
        ),
        agent=reviewer,
        expected_output="FINAL_ANSWER with short paragraphs and 'Sources' section.",
        context=[analyst_task]
    )
    return analyst_task, reviewer_task

def build_crew():
    """
    Build the CrewAI crew with Analyst and Reviewer agents.

    Returns:
        Crew: CrewAI crew object.
    """
    analyst, reviewer = build_agents()
    analyst_task, reviewer_task = build_tasks(analyst, reviewer)
    crew = Crew(
        agents=[analyst, reviewer],
        tasks=[analyst_task, reviewer_task],
        process=Process.sequential,
        verbose=True
    )
    return crew

CREW = build_crew()
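
The `{user_query}` and `{conv_summary}` placeholders in the task description are filled from the `inputs` dict passed to `kickoff`. As a rough analogy only (CrewAI's own interpolation may differ in details), the sketch below uses Python's `str.format` and a hypothetical `missing_inputs` helper to check that every placeholder has a matching input before a run.

```python
from string import Formatter

def placeholders(template: str) -> set:
    """Names of all {placeholders} that appear in a template string."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}

def missing_inputs(template: str, inputs: dict) -> list:
    """Placeholders in the template that have no matching key in inputs."""
    return sorted(placeholders(template) - set(inputs))

# Shortened copy of the Analyst task description, for illustration.
description_template = (
    "User Query: {user_query}\n"
    "Conversation Summary: {conv_summary}"
)

inputs = {"user_query": "What is the inflation target?", "conv_summary": ""}
print(missing_inputs(description_template, inputs))
print(description_template.format(**inputs))
```

Running a check like this before `CREW.kickoff(inputs=...)` can surface a renamed or forgotten key early instead of during an agent run.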

## Stage 5: Gradio chat interface

Wrap the crew in a Gradio chat interface. The chat handler summarizes conversation history and invokes the crew for each user message.

In [None]:
import gradio as gr

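def summarize_history(history: list) -> str:
    """
    Summarize recent chat history for context.

    Args:
        history (list): Conversation history, either (user, bot) pairs or
            {"role": ..., "content": ...} message dicts, depending on the Gradio version.

    Returns:
        str: Summarized conversation string (recent turns, truncated to 2000 chars).
    """
    lines = []
    if history and isinstance(history[0], dict):
        # Messages format: one dict per message, so 10 messages is about 5 exchanges.
        for msg in history[-10:]:
            role = "User" if msg.get("role") == "user" else "Bot"
            lines.append(f"{role}: {str(msg.get('content') or '')[:400]}")
    else:
        # Pair format: one (user, bot) pair per exchange.
        for user_msg, bot_msg in history[-5:]:
            lines.append(f"User: {user_msg}")
            lines.append(f"Bot: {bot_msg[:400] if bot_msg else ''}")
    # Truncate the joined text, not the list of lines, to the last 2000 characters.
    return "\n".join(lines)[-2000:]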

def chat_predict(message, history):
    """
    Gradio chat handler: process user message and return bot response.

    Args:
        message (str): User message.
        history (list): Conversation history.

    Returns:
        str: Bot response.
    """
    amb, clar_msg = is_ambiguous(message)
    if amb:
        return f"{clar_msg}"

    conv_summary = summarize_history(history or [])
    result = CREW.kickoff(inputs={"user_query": message, "conv_summary": conv_summary})
    if isinstance(result, dict) and "final_output" in result:
        return result["final_output"]
    return str(result)

with gr.Blocks(fill_height=True, theme="soft") as demo:
    gr.Markdown("# Multi-Agent RAG Chatbot\nA two-stage Analyst → Reviewer workflow with ChromaDB RAG.")
    examples = [
        "Summarize the Bank of Canada's latest inflation outlook and cite sources.",
        "Compare Q2 2023 vs. Q2 2024 GDP growth figures with citations.",
        "What risks did the report highlight about housing markets?"
    ]
    chat = gr.ChatInterface(
        fn=chat_predict,
        title="Analyst + Reviewer Chatbot",
        description="Ask questions about your knowledge base. Answers include citations and reviewer verification.",
        examples=examples,
        cache_examples=False,
        # retry_btn, undo_btn, and clear_btn were removed from ChatInterface in Gradio 5,
        # so the built-in defaults are used for those controls.
        submit_btn="Ask",
    )
    gr.Markdown("Tip: Be specific (time period, metric). The system asks for clarification when needed.")

# The app is launched in the "Run and validate" section.

## Run and validate

Launch the Gradio app and test with sample queries:

In [None]:
demo.launch(share=True)

Test retrieval accuracy:

In [None]:
test_queries = [
    "What is the Bank of Canada's inflation target?",
    "Compare Q2 2023 vs. Q2 2024 GDP growth",
    "What risks did the report highlight?"
]

for q in test_queries:
    print(f"\nQuery: {q}")
    results = test_retrieval(col, emb, q, k=3, min_sim=0.68)
    for r in results:
        print(f"  Similarity: {r['similarity']}, Source: {r['meta']['source']}, Section: {r['meta']['section']}")

Test clarification path with a vague query:

In [None]:
vague_query = "Tell me more"
amb, msg = is_ambiguous(vague_query)
print(f"Ambiguous: {amb}, Message: {msg}")

## Tuning and trade-offs

**Chunk size and overlap:**
- Smaller chunks (600-800 chars) improve precision but may lose context.
- Larger chunks (1500+ chars) retain context but dilute relevance.
- Start with 1200 chars and 200 overlap; adjust based on your corpus.
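
To get a feel for how chunk size changes the number of chunks, here is a simplified sketch that uses a plain character window rather than the heading-aware merge from Stage 1. The function name `sliding_chunks` and the synthetic 5000-character text are assumptions for illustration.

```python
def sliding_chunks(text: str, max_chars: int, overlap: int) -> list:
    """Cut text into windows of max_chars that share `overlap` characters."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    step = max_chars - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks

# Synthetic text; swap in one of your own documents to compare settings.
text = "x" * 5000
counts = {}
for size in (600, 1200, 1800):
    counts[size] = len(sliding_chunks(text, size, overlap=200))
    print(f"max_chars={size}: {counts[size]} chunks")
```

More chunks mean more embedding calls and a larger index, so chunk size affects cost as well as retrieval quality.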

**Similarity threshold:**
- Lower (0.5-0.6) increases recall but adds noise.
- Higher (0.75+) improves precision but may miss relevant results.
- 0.68 is a balanced default; tune based on retrieval tests.
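
A quick way to pick a threshold is to sweep a few values over the similarity scores from your own retrieval tests and see how many results survive. The scores below are made up for illustration; in practice they would come from `test_retrieval` with a low `min_sim`.

```python
def kept(scores, threshold):
    """Scores that pass a minimum-similarity threshold."""
    return [s for s in scores if s >= threshold]

# Made-up similarity scores for one query (assumption, not measured data).
scores = [0.82, 0.74, 0.69, 0.61, 0.55]

survivors = {}
for threshold in (0.50, 0.68, 0.75):
    survivors[threshold] = kept(scores, threshold)
    print(f"threshold={threshold:.2f}: {len(survivors[threshold])} of {len(scores)} results kept")
```

If a reasonable threshold regularly leaves zero results for questions the corpus should answer, that usually points to a chunking or query-phrasing problem rather than a threshold problem.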

**Model choice:**
- gpt-4o-mini is fast and cost-effective for most queries.
- Upgrade to gpt-4o for complex reasoning or multi-hop questions.
- gpt-3.5-turbo is an older option; compare its current pricing and quality against gpt-4o-mini before choosing it for high-volume, low-complexity workloads.

**Recommended settings by corpus size:**
- Small (< 50 docs): k=5, threshold=0.68, gpt-4o-mini
- Medium (50-500 docs): k=7, threshold=0.70, gpt-4o-mini
- Large (500+ docs): k=10, threshold=0.72, gpt-4o

## Next steps

**Add metadata filters:** Filter by date, author, or document type in search_kb to narrow results.
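
As a sketch of what such a filter could look like, the helper below builds a Chroma-style `where` dictionary using the `$eq`, `$gte`, and `$and` operators. The metadata fields `doc_type` and `year` are hypothetical: the notebook only stores `source`, `section`, and `chunk_index`, so you would need to add these fields at indexing time.

```python
from typing import Any, Dict, Optional

def build_where(doc_type: Optional[str] = None,
                min_year: Optional[int] = None) -> Optional[Dict[str, Any]]:
    """Build a metadata filter, or None when no filter is requested."""
    conditions = []
    if doc_type:
        conditions.append({"doc_type": {"$eq": doc_type}})   # hypothetical field
    if min_year is not None:
        conditions.append({"year": {"$gte": min_year}})       # hypothetical field
    if not conditions:
        return None
    if len(conditions) == 1:
        return conditions[0]  # a single condition is passed without $and
    return {"$and": conditions}

print(build_where())
print(build_where(doc_type="report"))
print(build_where(doc_type="report", min_year=2023))
```

The result can be passed as the `where` argument of `chroma_search`, which forwards it to `col.query`.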

**Implement retry logic:** Wrap crew.kickoff and embedding calls in retry/backoff for API timeouts.
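
One possible shape for this is a small wrapper with exponential backoff and jitter. The sketch below is an assumption about how you might structure it, not part of CrewAI or LangChain; `with_retries` and the `flaky` stand-in function are hypothetical names.

```python
import random
import time

def with_retries(fn, max_attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**(attempt - 1) plus jitter, then retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1)
            sleep(delay)

# Demo with a stand-in function that fails twice before succeeding.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

delays = []
result = with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, calls["n"], [round(d, 1) for d in delays])
```

In the chat handler this could wrap the crew call, for example `with_retries(lambda: CREW.kickoff(inputs=...))`, ideally with `retry_on` narrowed to the timeout and rate-limit exceptions your client actually raises.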

**Deploy with FastAPI:** Wrap chat_predict in a FastAPI endpoint for production serving.

**Add observability:** Log queries, retrieval results, and agent outputs to track performance.
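
A lightweight starting point is one JSON line per event using the standard `logging` module. The logger name `rag_chatbot`, the `log_event` helper, and the event fields below are illustrative assumptions; adapt them to whatever your log pipeline expects.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("rag_chatbot")  # hypothetical logger name

def log_event(event: str, **fields) -> str:
    """Log one structured event as a JSON line and return the line."""
    record = {"ts": time.time(), "event": event, **fields}
    line = json.dumps(record, default=str)  # default=str handles non-JSON types
    logger.info(line)
    return line

# Example events with made-up values.
line = log_event("retrieval", query="What is the inflation target?", n_results=3, top_similarity=0.81)
log_event("answer", chars=512, latency_s=4.2)
```

Calls like these fit naturally inside `chroma_search` (retrieval events) and `chat_predict` (query and answer events).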

**Swap providers:** Replace OpenAI with Azure OpenAI or OSS models by updating base_url and api_key.

**Extend agents:** Add a third agent for summarization or a fourth for fact-checking external sources.