# Lesson 10: Capstone — Research Intelligence Agent

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/harshit-vibes/lyzr-adk-demo/blob/master/notebooks/10_capstone_project.ipynb)


![Advanced](https://img.shields.io/badge/Level-Advanced-red) ![Duration](https://img.shields.io/badge/Duration-45%20min-blue)

**Series:** lyzr-adk in 10 Lessons | **Lesson:** 10 of 10

---

## Overview

You've learned all the building blocks. Now we put them all together.

In this capstone, you'll build a **Research Intelligence Agent** — a production-grade assistant that combines RAG, memory, tools, structured outputs, dynamic contexts, and RAI guardrails into a single, powerful agent.

---

## What You'll Build

The **Research Intelligence Agent** is a senior AI analyst that:

- Draws on a **knowledge base** of market research documents (RAG) to ground its answers in real data
- Uses **custom tools** — a web search stub and a citation formatter — to gather and format information
- Maintains **conversation memory** across a multi-turn research session so every follow-up builds on prior context
- Adapts its behavior using **dynamic contexts** — it knows the current project name, client, deadline, and the researcher's preferred report style
- Enforces **RAI guardrails** to detect toxicity, block prompt injection attacks, and redact PII before it ever reaches the LLM
- Produces a final **structured output** in the form of a `ResearchReport` Pydantic model, with typed fields for `topic`, `executive_summary`, `key_findings`, `recommendations`, `confidence_score`, and `sources_used`

By the end you'll have a working agent you could drop into a real enterprise workflow with minimal changes.

---

## Features Combined in This Lesson

| Feature | Source Lesson |
|---|---|
| `Studio` + `create_agent` + `agent.run` | Lesson 1 |
| Custom LLM provider (`openai/gpt-4o`) | Lesson 2 |
| Agent lifecycle (create, inspect) | Lesson 3 |
| Structured output (`ResearchReport`) | Lesson 4 |
| Memory + multi-turn sessions | Lesson 5 |
| Custom tools (`search_web`, `format_citation`) | Lesson 6 |
| Knowledge base / RAG | Lesson 7 |
| Dynamic contexts (project + profile) | Lesson 8 |
| RAI guardrails (toxicity, PII, injection) | Lesson 9 |

## Prerequisites

> **All Lessons 1–9 must be completed before running this notebook.**

This capstone assumes you are comfortable with every concept introduced in the series:

- **Lesson 1** — Connecting to Lyzr Studio and running your first agent
- **Lesson 2** — Choosing LLM providers and models (OpenAI, Anthropic, Google, etc.)
- **Lesson 3** — The full agent lifecycle: create, list, get, update, delete
- **Lesson 4** — Returning structured data with Pydantic and `response_model`
- **Lesson 5** — Conversation memory and persistent multi-turn sessions with `session_id`
- **Lesson 6** — Registering Python functions as callable tools with `agent.add_tool`
- **Lesson 7** — Building knowledge bases and enabling RAG with `knowledge_base_ids`
- **Lesson 8** — Injecting dynamic runtime context with `studio.create_context`
- **Lesson 9** — Configuring Responsible AI policies for safety and compliance

You'll also need:
- A valid `LYZR_API_KEY` set as an environment variable (or pasted directly below)
- The `lyzr-adk` and `pydantic` packages installed

In [None]:
!pip install lyzr-adk[jupyter] -q

In [None]:
import os
import uuid
import json
from lyzr import Studio
from pydantic import BaseModel, Field
from typing import List

API_KEY = os.getenv("LYZR_API_KEY", "YOUR_LYZR_API_KEY")
studio = Studio(api_key=API_KEY)

print("All imports ready! Let's build our Research Intelligence Agent.")

## Step 1: Define the Research Report Schema

Before we build the agent, we define what we want back from it.

By passing a Pydantic model as `response_model` when the agent is created, you tell lyzr-adk to have the LLM return a JSON object that matches the schema. The SDK then parses and validates the response, so `agent.run` returns a fully typed `ResearchReport` instance rather than a raw string.

Our schema captures everything a strategy team needs in an executive research brief:

| Field | Type | Description |
|---|---|---|
| `topic` | `str` | The research topic analyzed |
| `executive_summary` | `str` | 2–3 sentence high-level summary |
| `key_findings` | `List[str]` | 5–7 specific, evidence-based findings |
| `recommendations` | `List[str]` | 3–5 actionable recommendations |
| `confidence_score` | `float` | Agent's confidence in the findings (0.0–1.0) |
| `sources_used` | `List[str]` | Sources and references cited |

In [None]:
class ResearchReport(BaseModel):
    topic: str = Field(description="The research topic analyzed")
    executive_summary: str = Field(description="2-3 sentence high-level summary for executives")
    key_findings: List[str] = Field(description="5-7 specific, evidence-based findings")
    recommendations: List[str] = Field(description="3-5 actionable recommendations")
    confidence_score: float = Field(ge=0.0, le=1.0, description="Confidence in findings from 0.0 to 1.0")
    sources_used: List[str] = Field(description="List of sources/references cited")

print("ResearchReport schema defined")
print(f"   Fields: {list(ResearchReport.model_fields.keys())}")
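
Pydantic enforces the `ge`/`le` bounds on `confidence_score`, but the "5-7" and "3-5" counts live only in the field descriptions: they guide the model without being validated. The stdlib-only sketch below imitates the parse-and-validate step on a raw JSON string and adds the counts as an explicit extra check. `validate_report_json` and the `SCHEMA` table are hypothetical, built from the schema table above. In real code you would more likely call `ResearchReport.model_validate_json(...)` and, in Pydantic v2, enforce the counts with `Field(min_length=..., max_length=...)` on the list fields.

```python
import json
from typing import Any, Dict, List

# Expected shape, mirroring the ResearchReport table: field -> (type, (min_items, max_items) or None)
SCHEMA = {
    "topic": (str, None),
    "executive_summary": (str, None),
    "key_findings": (list, (5, 7)),
    "recommendations": (list, (3, 5)),
    "confidence_score": (float, None),
    "sources_used": (list, None),
}


def validate_report_json(raw: str) -> List[str]:
    """Return the problems found in a raw JSON report (an empty list means it is valid)."""
    try:
        data: Dict[str, Any] = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    problems: List[str] = []
    for field, (expected_type, count_range) in SCHEMA.items():
        if field not in data:
            problems.append(f"missing field: {field}")
            continue
        value = data[field]
        if expected_type is float:
            # JSON has no separate float type, so accept ints too (but not booleans)
            ok_type = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            ok_type = isinstance(value, expected_type)
        if not ok_type:
            problems.append(f"{field}: expected {expected_type.__name__}")
            continue
        if expected_type is list:
            if not all(isinstance(item, str) for item in value):
                problems.append(f"{field}: all items must be strings")
            if count_range and not count_range[0] <= len(value) <= count_range[1]:
                problems.append(
                    f"{field}: expected {count_range[0]}-{count_range[1]} items, got {len(value)}"
                )

    score = data.get("confidence_score")
    if isinstance(score, (int, float)) and not isinstance(score, bool) and not 0.0 <= score <= 1.0:
        problems.append("confidence_score: must be between 0.0 and 1.0")
    return problems


sample = json.dumps({
    "topic": "AI agent frameworks",
    "executive_summary": "Short summary.",
    "key_findings": ["f1", "f2", "f3", "f4", "f5"],
    "recommendations": ["r1", "r2", "r3"],
    "confidence_score": 1.4,
    "sources_used": ["[ai-market-report-2026]: Market overview"],
})
print(validate_report_json(sample))  # ['confidence_score: must be between 0.0 and 1.0']
```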

## Step 2: Build the Knowledge Base

Our agent needs background material to reason from — a foundation of facts it can cite in its report.

We create a **knowledge base** and load it with a market overview document covering:
- Key AI agent frameworks and their positioning
- Adoption statistics and market trends
- Enterprise usage patterns and top use cases

When the agent runs, lyzr-adk performs a **RAG retrieval**: it finds the most relevant passages in this KB and injects them into the LLM's context automatically, so the agent's answers are grounded in your documents rather than in the model's unsupported guesses.

In [None]:
# Create the research knowledge base
kb = studio.create_knowledge_base(name="ai_trends_research_kb")
print(f"Knowledge base created: {kb.id}")

# Add background research content
research_background = """
# AI Agent Frameworks — Market Overview 2026

## Key Players
- **Lyzr ADK**: Enterprise-focused, production-ready, supports 20+ models. Key features: RAG, memory, tools, RAI guardrails.
- **LangChain**: Open-source framework with large ecosystem. Complex setup, flexible.
- **AutoGen**: Microsoft's multi-agent framework. Strong for agentic workflows.
- **CrewAI**: Role-based multi-agent system. Good for team simulations.

## Market Trends
1. RAG adoption is up 340% year-over-year as enterprises ground LLMs in private data.
2. Safety-first development: 78% of enterprise buyers require built-in guardrails.
3. Multi-model flexibility: teams want to switch between OpenAI, Anthropic, and Google.
4. Memory and sessions: persistent agents outperform stateless agents in user satisfaction (3.2x).
5. Tool calling is now standard: 94% of production agents use at least one external tool.

## Enterprise Adoption
- 62% of Fortune 500 companies are piloting AI agents in 2026.
- Average production agent uses 3.4 tools and 1.2 knowledge bases.
- Top use cases: customer support (34%), internal Q&A (28%), data analysis (19%).
"""

kb.add_text(text=research_background, source="ai-market-report-2026")
print("Background research added to KB")
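
lyzr-adk performs the retrieval for you. To make "find the most relevant passages and inject them" concrete, here is a deliberately simplified sketch that swaps in keyword-overlap scoring for the vector search a production RAG system would typically use (that substitution is an assumption, not a description of Lyzr's retriever). It splits a document into line-level chunks, ranks them against the query, and prepends the best matches to the prompt. `retrieve` and `build_grounded_prompt` are hypothetical helpers.

```python
import re
from typing import List, Set

DOC = """\
RAG adoption is up 340% year-over-year as enterprises ground LLMs in private data.
Safety-first development: 78% of enterprise buyers require built-in guardrails.
Tool calling is now standard: 94% of production agents use at least one external tool.
Top use cases: customer support (34%), internal Q&A (28%), data analysis (19%).
"""


def tokenize(text: str) -> Set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, document: str, k: int = 2) -> List[str]:
    """Return the k chunks (here: non-empty lines) sharing the most words with the query."""
    chunks = [line.strip() for line in document.splitlines() if line.strip()]
    query_tokens = tokenize(query)
    # Score by shared words; sorted() is stable, so ties keep document order
    ranked = sorted(chunks, key=lambda c: len(query_tokens & tokenize(c)), reverse=True)
    return ranked[:k]


def build_grounded_prompt(query: str, document: str, k: int = 2) -> str:
    """Prepend retrieved passages to the user's question, as a RAG pipeline would."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(query, document, k))
    return f"Use these passages to answer:\n{context}\n\nQuestion: {query}"


print(build_grounded_prompt("How many enterprise buyers require guardrails?", DOC, k=1))
```

Real retrievers use embeddings so that "guardrails" can also match "safety policies"; keyword overlap only shows the shape of the pipeline.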

## Step 3: Define Research Tools

Our agent needs two tools:

1. **`search_web`** — simulates a live web search for current market data. In a real deployment you would call a search API such as Tavily or SerpAPI here. The agent calls this tool autonomously when it determines it needs fresh information.

2. **`format_citation`** — takes a source name and a short description and returns a consistently formatted citation string. This ensures every source in `sources_used` follows the same format regardless of which turn the agent found it in.

Key point: the **docstring is the tool's description** for the LLM. Write it clearly — the model reads the docstring to decide when and how to call the function.
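
To see why the docstring matters, here is a sketch of how an SDK could turn a plain Python function into the kind of tool description an LLM receives. It reads the function's name, docstring, and type hints with `inspect` and emits an OpenAI-style function schema. Whether lyzr-adk builds exactly this structure is an assumption, and `function_to_tool_spec` is a hypothetical helper.

```python
import inspect
import json
from typing import Any, Callable, Dict, List

# Map simple Python annotations to JSON Schema type names
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool_spec(func: Callable) -> Dict[str, Any]:
    """Build an OpenAI-style function-tool description from a function's signature and docstring."""
    sig = inspect.signature(func)
    properties: Dict[str, Any] = {}
    required: List[str] = []
    for name, param in sig.parameters.items():
        # Unknown or missing annotations fall back to "string"
        properties[name] = {"type": PY_TO_JSON.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            # inspect.getdoc strips indentation; this text is what the model reads
            "description": inspect.getdoc(func) or "",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def format_citation(source_name: str, description: str) -> str:
    """Format a source name and description into a citation for a research report."""
    return f"[{source_name}]: {description}"


print(json.dumps(function_to_tool_spec(format_citation), indent=2))
```

Everything the model knows about the tool comes from that `description` string and the parameter names, which is why a vague docstring leads to a tool that is called at the wrong time or not at all.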

In [None]:
def search_web(query: str) -> str:
    """Search the web for the latest information, news, and data on a given topic.

    Returns relevant search results including titles, snippets, and sources.
    Use this when you need current information beyond your training data.
    """
    # Simulated search results (in production, call a real search API)
    simulated_results = {
        "ai agent frameworks": (
            "Result: Top frameworks in 2026 include Lyzr ADK, LangChain, AutoGen, CrewAI. "
            "Lyzr leads in enterprise adoption due to built-in safety features."
        ),
        "rag adoption rate": (
            "Result: RAG adoption grew 340% in 2025-2026. "
            "71% of enterprises now use RAG for internal knowledge management."
        ),
        "default": (
            f"Result: Found recent articles and reports about '{query}'. "
            "Multiple authoritative sources confirm growing interest in this area."
        ),
    }
    query_lower = query.lower()
    for key in simulated_results:
        if key in query_lower:
            return simulated_results[key]
    return simulated_results["default"]


def format_citation(source_name: str, description: str) -> str:
    """Format a source name and description into a properly formatted citation for a research report.

    Use this when compiling the sources_used list in the final report.
    """
    return f"[{source_name}]: {description}"


print("Tools defined: search_web, format_citation")
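
When the agent calls a tool "autonomously", the mechanics in most tool-calling runtimes look like this: the model replies with a tool name plus JSON-encoded arguments, the runtime looks up the function, runs it, and sends the result back as the next message. The sketch below fakes the model's reply with a hard-coded dict. The `TOOLS` registry, the `dispatch_tool_call` helper, and the message shape are illustrative assumptions, not lyzr-adk internals, and `search_web` is reduced to a one-line stub.

```python
import json
from typing import Any, Callable, Dict


def search_web(query: str) -> str:
    """Stub search tool (simplified stand-in for the notebook's search_web)."""
    return f"Result: Found recent articles and reports about '{query}'."


def format_citation(source_name: str, description: str) -> str:
    """Format a source name and description into a citation."""
    return f"[{source_name}]: {description}"


# Registry the runtime consults when the model asks for a tool by name
TOOLS: Dict[str, Callable[..., str]] = {f.__name__: f for f in (search_web, format_citation)}


def dispatch_tool_call(tool_call: Dict[str, Any]) -> Dict[str, str]:
    """Run one model-requested tool call and wrap the output as a 'tool' message."""
    name = tool_call["name"]
    if name not in TOOLS:
        return {"role": "tool", "name": name, "content": f"Error: unknown tool '{name}'"}
    # Arguments arrive as a JSON string, so decode them before calling the function
    kwargs = json.loads(tool_call["arguments"])
    return {"role": "tool", "name": name, "content": TOOLS[name](**kwargs)}


# Hard-coded stand-in for what a model might emit
fake_call = {
    "name": "format_citation",
    "arguments": json.dumps({"source_name": "ai-market-report-2026",
                             "description": "AI Agent Frameworks Market Overview"}),
}
print(dispatch_tool_call(fake_call)["content"])
# [ai-market-report-2026]: AI Agent Frameworks Market Overview
```

Returning an error message instead of raising lets the model see the failure and try again, a common choice in agent loops.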

## Step 4: Configure Safety Guardrails

Production agents handle real user input — and real user input can be malicious, sensitive, or toxic. We attach a **Responsible AI policy** to guard against three threat vectors:

| Guardrail | Setting | Effect |
|---|---|---|
| Toxicity | `True` | Blocks requests containing harmful or abusive content |
| Prompt injection | `True` | Detects attempts to hijack the agent's instructions |
| PII | `"redact"` | Strips names, emails, phone numbers, etc. before they reach the LLM |

The policy is enforced at the platform level on every `agent.run` call — no extra code required at inference time.

In [None]:
rai_policy = studio.create_rai_policy(
    name="Research Agent Safety",
    toxicity=True,
    prompt_injection=True,
    pii="redact"   # Redact any PII from research queries before the LLM sees them
)
print(f"RAI policy created: {rai_policy.id}")
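
Lyzr applies the policy server-side, and its detectors are not shown in this lesson. As a rough mental model of what `pii="redact"` means, the sketch below replaces email addresses and phone-number-like strings with placeholder tags before the text would reach an LLM. The regexes, the `redact_pii` helper, and the `[EMAIL]`/`[PHONE]` tags are simplifying assumptions; real PII detection also covers names, addresses, and ID numbers, which plain regexes handle poorly.

```python
import re

# Simplified patterns: production PII detectors are far more thorough than two regexes
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional +country code, then 3-3-4 digits separated by spaces, dots, or dashes
PHONE_RE = re.compile(r"(?:\+\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")


def redact_pii(text: str) -> str:
    """Replace emails and phone numbers with placeholder tags."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


query = "Email jane.doe@techcorp.com or call 415-555-0123 about the AI framework report."
print(redact_pii(query))
# Email [EMAIL] or call [PHONE] about the AI framework report.
```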

## Step 5: Set Up Dynamic Contexts

Contexts let us inject runtime information into the agent's system prompt without hardcoding it.

We create two contexts:

**`research_project`** — tells the agent what project it's working on, who the client is, and when the deliverable is due. This shapes the report's tone and scope.

**`researcher_profile`** — tells the agent the preferences of the human who'll receive the output: their expertise level, preferred report style, and citation format. This makes the output feel tailored rather than generic.

In a real deployment, you would populate these values from a database or the user's session instead of hardcoding them, so the agent picks up current project and user details without code changes.

In [None]:
# Current research project context
project_ctx = studio.create_context(
    name="research_project",
    value=(
        "Project: AI Agent Framework Analysis 2026 | "
        "Client: TechCorp Strategy Team | "
        "Deadline: End of Q1 2026 | "
        "Deliverable: Executive research report"
    )
)

# Researcher profile context
user_ctx = studio.create_context(
    name="researcher_profile",
    value=(
        "Researcher: Senior AI Analyst | "
        "Expertise: Enterprise software, AI/ML | "
        "Report style: Concise, data-driven, executive-friendly | "
        "Citation style: APA"
    )
)

print("Contexts created:")
print(f"   Project: {project_ctx.id}")
print(f"   Profile: {user_ctx.id}")
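
The two context strings above are hardcoded for the lesson. The sketch below shows one way to produce the same pipe-delimited value from a stored record, plus one plausible way named contexts could be appended to a system prompt. `build_context_value` and `render_system_prompt` are hypothetical helpers, and the prompt layout is an assumption, not how Lyzr Studio actually injects contexts.

```python
from typing import Dict


def build_context_value(record: Dict[str, str]) -> str:
    """Join key/value pairs into the 'Key: value | Key: value' format used in this lesson."""
    return " | ".join(f"{key}: {value}" for key, value in record.items())


def render_system_prompt(instructions: str, contexts: Dict[str, str]) -> str:
    """Append each named context under its own bracketed heading."""
    sections = [instructions] + [f"[{name}]\n{value}" for name, value in contexts.items()]
    return "\n\n".join(sections)


# Imagine this row was loaded from a projects table for the current user
project_row = {
    "Project": "AI Agent Framework Analysis 2026",
    "Client": "TechCorp Strategy Team",
    "Deadline": "End of Q1 2026",
    "Deliverable": "Executive research report",
}
project_value = build_context_value(project_row)
print(project_value)
print(render_system_prompt("You are a senior research analyst.", {"research_project": project_value}))
```

The first `print` reproduces the exact `research_project` value used above, so swapping the hardcoded string for a database lookup does not change what the agent sees.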

## Step 6: Assemble the Agent

This is where everything comes together.

We call `studio.create_agent` with a provider, role, goal, instructions, knowledge base, and response model, then attach each remaining feature with an `.add_*` call. The order of the `.add_*` calls doesn't matter: lyzr-adk assembles the full agent configuration before the first `run`.

```
create_agent         → identity, LLM, role, goal, instructions, KB, response_model
  .add_tool          → search_web
  .add_tool          → format_citation
  .add_memory        → 15-message sliding window
  .add_context       → project details
  .add_context       → researcher profile
  .add_rai_policy    → toxicity + injection + PII redaction
```

In [None]:
research_agent = studio.create_agent(
    name="Research Intelligence Agent",
    provider="openai/gpt-4o",
    role="Senior AI research analyst with expertise in enterprise technology",
    goal="Produce comprehensive, accurate, actionable research reports grounded in evidence",
    instructions=(
        "You are a senior research analyst. Your reports are data-driven and concise. "
        "Always use the search_web tool to find current information. "
        "Always use the knowledge base context when available. "
        "Format citations using the format_citation tool. "
        "When generating reports, be specific and include real statistics. "
        "Match the researcher's preferred style from the profile context."
    ),
    knowledge_base_ids=[kb.id],
    response_model=ResearchReport   # structured output set at creation time
)

# Add all features
research_agent.add_tool(search_web)
research_agent.add_tool(format_citation)
research_agent.add_memory(max_messages=15)
research_agent.add_context(project_ctx)
research_agent.add_context(user_ctx)
research_agent.add_rai_policy(rai_policy)

print("Research Intelligence Agent assembled!")
print(f"   Agent ID: {research_agent.id}")
print()
print("Features enabled:")
print("  Knowledge base (RAG)")
print("  Tools (search_web, format_citation)")
print("  Memory (15 messages)")
print("  Contexts (project + profile)")
print("  RAI guardrails (toxicity, PII, injection)")
print("  Structured output (ResearchReport)")

## Step 7: Run the Research Session

We drive the agent through a **multi-turn research workflow** using a shared `session_id`.

Each turn builds on the last:
1. **Turn 1 — Scope:** Define the research question and identify frameworks to analyze
2. **Turn 2 — Data gathering:** Trigger the web search tool for live market data
3. **Turn 3 — Report generation:** Synthesize everything into a structured `ResearchReport`

Because memory is enabled, the agent remembers what it said in Turn 1 when answering Turn 2, and has the full conversation history when composing the final report in Turn 3.
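
A 15-message sliding window means only the most recent 15 messages are replayed to the model; older ones drop off. The sketch below models that with one `collections.deque(maxlen=...)` per `session_id`, which also shows why every turn must reuse the same ID. `SessionMemory` is a hypothetical class, and counting each user message and each agent reply as one message is an assumption about how `max_messages` is counted (tool calls may count too).

```python
import uuid
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)


class SessionMemory:
    """Keep the last `max_messages` messages for each session ID."""

    def __init__(self, max_messages: int = 15) -> None:
        self.max_messages = max_messages
        # deque(maxlen=N) drops the oldest item automatically once it holds N items
        self._store: Dict[str, Deque[Message]] = defaultdict(
            lambda: deque(maxlen=self.max_messages)
        )

    def add(self, session_id: str, role: str, content: str) -> None:
        self._store[session_id].append((role, content))

    def history(self, session_id: str) -> List[Message]:
        """Messages that would be replayed to the model for this session, oldest first."""
        return list(self._store.get(session_id, ()))


memory = SessionMemory(max_messages=15)
session_id = str(uuid.uuid4())
for turn in range(1, 11):  # 10 turns = 20 messages, more than the window holds
    memory.add(session_id, "user", f"question {turn}")
    memory.add(session_id, "assistant", f"answer {turn}")

window = memory.history(session_id)
print(len(window), window[0])  # 15 ('assistant', 'answer 3')
print(memory.history(str(uuid.uuid4())))  # [] because a new session ID starts with no history
```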

In [None]:
# Start a research session
session_id = str(uuid.uuid4())
print(f"Research session started: {session_id}\n")

# Turn 1: Scope the research
r1 = research_agent.run(
    "I need to research AI agent frameworks for our enterprise strategy report. "
    "What are the key frameworks we should analyze?",
    session_id=session_id
)
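# If response_model applies to every run, r1 may already be a ResearchReport; fall back to printing it directly
print(f"Turn 1 — Scoping:\n{getattr(r1, 'response', r1)}\n")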
print("-" * 60)

In [None]:
# Turn 2: Deep dive with the web search tool
r2 = research_agent.run(
    "Search for the latest adoption rates and market trends for AI agent frameworks in enterprise.",
    session_id=session_id
)
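# Same fallback as Turn 1 in case run() returns a ResearchReport instead of a response wrapper
print(f"Turn 2 — Market data:\n{getattr(r2, 'response', r2)}\n")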
print("-" * 60)

In [None]:
# Turn 3: Generate the final structured report
# response_model=ResearchReport was set on the agent, so run() returns ResearchReport directly
print("Generating structured research report...\n")

report: ResearchReport = research_agent.run(
    "Based on our research session, generate a comprehensive research report for the TechCorp strategy team.",
    session_id=session_id
)

# Display the report
print("=" * 60)
print("RESEARCH REPORT")
print("=" * 60)
print(f"\nTopic: {report.topic}")
print(f"\nExecutive Summary:\n{report.executive_summary}")
print(f"\nKey Findings:")
for i, finding in enumerate(report.key_findings, 1):
    print(f"   {i}. {finding}")
print(f"\nRecommendations:")
for i, rec in enumerate(report.recommendations, 1):
    print(f"   {i}. {rec}")
print(f"\nConfidence Score: {report.confidence_score:.1%}")
print(f"\nSources Used:")
for source in report.sources_used:
    print(f"   - {source}")

## What We Built

Here's every feature used in this capstone and where it came from:

| Feature | API call | Lesson |
|---|---|---|
| Studio client | `Studio(api_key=...)` | 1 |
| Custom LLM provider | `provider="openai/gpt-4o"` | 2 |
| Agent creation | `studio.create_agent(...)` | 1, 3 |
| Knowledge base | `studio.create_knowledge_base()` + `kb.add_text()` | 7 |
| RAG retrieval | `knowledge_base_ids=[kb.id]` | 7 |
| Custom tools | `agent.add_tool(search_web)` | 6 |
| Formatting tool | `agent.add_tool(format_citation)` | 6 |
| Conversation memory | `agent.add_memory(max_messages=15)` | 5 |
| Multi-turn session | `session_id=session_id` on every `run` | 5 |
| Dynamic project context | `studio.create_context(name="research_project", ...)` | 8 |
| Dynamic user context | `studio.create_context(name="researcher_profile", ...)` | 8 |
| RAI guardrails | `studio.create_rai_policy(toxicity=True, pii="redact", ...)` | 9 |
| Structured output | `response_model=ResearchReport` on `create_agent` | 4 |
| Typed response access | `research_agent.run(...)` returns a `ResearchReport` | 4 |

Every one of these features works independently — and they compose cleanly because lyzr-adk was designed from the start for production use cases exactly like this one.

## Congratulations!

You've completed all 10 lessons of **lyzr-adk in 10 Lessons**.

Here's the full journey:

```
Lesson 1:  Studio + create_agent + agent.run
Lesson 2:  Providers & models (20+ LLMs)
Lesson 3:  Agent lifecycle (CRUD)
Lesson 4:  Structured outputs (Pydantic)
Lesson 5:  Memory & sessions
Lesson 6:  Custom tools & functions
Lesson 7:  RAG & knowledge bases
Lesson 8:  Dynamic contexts
Lesson 9:  Responsible AI guardrails
Lesson 10: Capstone — all features combined
```

You now know how to build production-grade AI agents that are:
- **Grounded** — RAG keeps answers anchored to real data
- **Capable** — tools extend what the agent can do beyond the LLM alone
- **Persistent** — memory makes sessions feel like a real conversation
- **Adaptive** — contexts let the agent tailor itself to each user or project
- **Safe** — RAI guardrails protect users and your organization
- **Structured** — Pydantic outputs make agent responses programmable

The Research Intelligence Agent you built today is more than a demo: replace the simulated `search_web` with a real search API call and load your own knowledge base documents, and it is ready for a real workflow with minimal changes.

## Beyond This Series

### Optional Advanced Lessons

The series continues with three optional advanced lessons:

- **Lesson 11: Streaming** — stream token-by-token responses for real-time UIs using `agent.stream`
- **Lesson 12: Image & File Generation** — agents that produce images, PDFs, and structured files
- **Lesson 13: Advanced Features** — webhooks, agent-to-agent calls, and deployment patterns

### Deploy to Production

- Full documentation: [docs.lyzr.ai](https://docs.lyzr.ai)
- API reference, deployment guides, and integration examples

### Community & Support

- Join the community: [discord.gg/lyzr](https://discord.gg/lyzr)
- Report issues, share agents, get help from the team and other builders

### Package

- PyPI: [pypi.org/project/lyzr-adk](https://pypi.org/project/lyzr-adk/)
- `pip install lyzr-adk` — always installs the latest stable release

---

*Built with lyzr-adk — the fastest way to ship production AI agents.*