# Multi-Agent Report Generation with AgentWorkflow

In this notebook, we will explore how to use an `AgentWorkflow` in LlamaIndex to create multi-agent systems. Specifically, we will create a system that can generate a report on a given topic.

For this, the notebook is set up to use either a `qwen3-8b` Small Language Model (SLM) served locally by LM Studio or a state-of-the-art model hosted by Google. The run shown below uses `gemini-2.5-pro`. For the LLM inference providers and models supported by LlamaIndex, and how to install and use them, see the [examples documentation](https://docs.llamaindex.ai/en/stable/examples/llm/openai/) or [LlamaHub](https://llamahub.ai/?tab=llms).

Note that if we wanted, each agent could have a different LLM, but for this example, we will use the same LLM for all agents.

## Prerequisites

Before running this notebook, ensure you have the following requirements met:

### 1. API Keys and Services

You'll need to obtain API keys for the following services:

- **Google Gemini API Key**: Required for the LLM functionality
  - Visit [Google AI Studio](https://aistudio.google.com/app/apikey) to get your API key
  - Set as `GOOGLE_API_KEY` environment variable

- **Tavily Search API Key**: Required for web search functionality
  - Visit [Tavily](https://tavily.com/) to create an account and get your API key
  - Set as `TAVILY_SEARCH_API_KEY` environment variable

### 2. Environment Setup

Create a `.env` file in your project root with the following variables:
```
GOOGLE_API_KEY=your_google_api_key_here
TAVILY_SEARCH_API_KEY=your_tavily_api_key_here
```
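A missing or empty key usually shows up only later, as an authentication error deep inside a tool call. It can therefore help to fail fast right after loading the environment. The sketch below assumes the two variable names above. The helper `missing_env_vars` is hypothetical and not part of the notebook.

```python
import os
from typing import Iterable, Mapping

# Keys this notebook expects (see the .env file above)
REQUIRED_KEYS = ("GOOGLE_API_KEY", "TAVILY_SEARCH_API_KEY")


def missing_env_vars(required: Iterable[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]
```

For example, call `missing_env_vars(REQUIRED_KEYS)` right after `load_dotenv()`. Raise an error if the returned list is non-empty.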

### 3. Python Dependencies

We leverage `uv` as a high-performance Python package and project manager. Once you have cloned this repository, just run `uv sync` to handle dependency management.

### 4. Directory Structure

The project has the following directory structure:
```
multi-agent-governance/
├── .env
├── agent_workflow_multi.ipynb
├── data/
│   ├── input/          # Place PDF files here
│   └── output/         # Converted markdown files will be stored here
└── database/
    └── vector_store/   # ChromaDB database will be created here
```

### 5. Optional: Local LLM Setup

If you want to use a local LLM instead of Google Gemini:

- **LM Studio**: Download and install [LM Studio](https://lmstudio.ai/)
- **Model**: Download a compatible model (e.g., qwen3-8b)
- **Configuration**: Start LM Studio server on `http://127.0.0.1:1234/v1`
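Before pointing the notebook at LM Studio, you can check that the server is reachable. LM Studio's OpenAI-compatible server exposes `GET /v1/models`. The sketch below assumes the base URL above and the usual OpenAI list format (`{"data": [{"id": ...}]}`). The helper names `model_ids` and `list_local_models` are hypothetical.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://127.0.0.1:1234/v1"  # LM Studio address used in this notebook


def model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /models response body."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_local_models(base_url: str = BASE_URL, timeout: float = 2.0) -> Optional[list[str]]:
    """Return the served model IDs, or None if the server cannot be reached."""
    url = base_url.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return model_ids(json.load(resp))
    except (OSError, ValueError):
        # URLError/HTTPError and timeouts are OSError subclasses;
        # a non-JSON body raises a ValueError (JSONDecodeError)
        return None
```

If the server is running, `"qwen/qwen3-8b"` should appear in the returned list once the model is loaded in LM Studio.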

### 6. Input Data

- Place any PDF documents you want to process in the `data/input/` directory
- The example run in this notebook uses two NVIDIA financial reports, `nvda-20250126.pdf` and `nvda-20250427.pdf`

**Note**: This notebook demonstrates advanced multi-agent workflows and may take several minutes to complete, especially during the initial setup and PDF processing phases.


## Execution Variables Setup

This section defines all the variables needed to run the RAG pipeline and the agentic flow. We also run a quick test against the model endpoint before proceeding.

In [1]:
# Load environment variables from .env file
import os
from dotenv import load_dotenv
load_dotenv()

# Environment variables for local LM studio inference
model = "qwen/qwen3-8b"
base_url = "http://127.0.0.1:1234/v1"
api_key = ""

# Environment variables for Google GenAI API inference
google_api_key = os.getenv("GOOGLE_API_KEY", "")
google_model = "gemini-2.5-pro"
google_model_eval = "gemini-2.5-pro"

# Environment variables for Ramalama serving
ramalama_model = "llama3.2:3b"
ramalama_url = "http://localhost:8080/v1"

# Environment variables for local data
dir_input = './data/input'
dir_output = './data/output'
dir_chromadb = './database/vector_store/'
chromadb_collection = 'nvidia'

In [11]:
# Fix for "RuntimeError: This event loop is already running"
import nest_asyncio
nest_asyncio.apply()

from llama_index.llms.google_genai import GoogleGenAI
from llama_index.llms.openai_like import OpenAILike
from llama_index.core.base.llms.types import ChatMessage, MessageRole

# Initialize the Google GenAI client with the API key
llm = GoogleGenAI(
    model=google_model,
    api_key=google_api_key,  
)

# Initialize the Ramalama client with OpenAILike
# OpenAILike is a thin wrapper around the OpenAI LLM class that makes it compatible
# with third-party servers exposing an OpenAI-compatible API
#llm = OpenAILike(
#    model=ramalama_model,
#    api_base=ramalama_url,
#    api_key="none",
#    context_window=8096,
#    is_chat_model=True,
#    is_function_calling_model=True,
#)

In [3]:
# Test the LLM endpoint with a simple prompt
response = llm.complete("Write a paragraph on the history of the internet.")
print(str(response))

The history of the internet is a complex and multifaceted story that spans several decades. The modern internet as we know it today began to take shape in the 1960s, when the United States Department of Defense's Advanced Research Projects Agency (ARPA) funded a project to create a network of computers that could communicate with each other. This project, called ARPANET, was the first operational packet switching network, and it was launched in 1969. ARPANET was designed to be a robust and fault-tolerant network that could survive a nuclear attack, and it was initially used by government and academic researchers to communicate with each other. In the 1980s, the Internet Protocol (IP) was developed, which allowed different networks to communicate with each other and formed the basis of the modern internet. The World Wide Web (WWW) was invented in 1989 by Tim Berners-Lee, a British computer scientist, who developed the HTTP protocol and the first web browser. The internet quickly expande

## System Design

Our system will have three agents:

1. A `ResearchAgent` that will search local data as well as the web for information on the given topic.
2. A `WriteAgent` that will write the report by summarising the information found by the `ResearchAgent`.
3. A `ReviewAgent` that will review the report and provide feedback.

We will use the `AgentWorkflow` class to combine these agents into a multi-agent system in which each agent hands off control to the next. In addition, the `ResearchAgent` can call an evaluation tool that scores the RAG pipeline's responses as they arrive. Based on those scores, it can decide whether to rephrase its query or fall back to a web search.
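The handoff rules in this design form a small directed graph. The sketch below uses the `can_handoff_to` lists configured for the three agents later in this notebook. It checks two things: every agent is reachable from the root agent, and no handoff points at an undefined agent. This is plain Python for illustration. `undefined_targets` and `reachable_from` are hypothetical helpers, not part of the LlamaIndex API.

```python
from collections import deque

# Handoff targets as configured via `can_handoff_to` on each agent
HANDOFFS = {
    "ResearchAgent": ["WriteAgent"],
    "WriteAgent": ["ReviewAgent", "ResearchAgent"],
    "ReviewAgent": ["WriteAgent"],
}


def undefined_targets(graph: dict[str, list[str]]) -> set[str]:
    """Handoff targets that do not name a defined agent."""
    return {t for targets in graph.values() for t in targets if t not in graph}


def reachable_from(graph: dict[str, list[str]], root: str) -> set[str]:
    """Breadth-first search over handoff edges starting at the root agent."""
    seen, queue = {root}, deque([root])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

With the root set to `ResearchAgent`, all three agents are reachable. The write/review loop can repeat until the `ReviewAgent` stops handing back to the `WriteAgent`.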

While there are many ways to implement this system, in this case, we will use a few tools to help with the research and writing processes.

1. A `search_web` tool to search the web for information on the given topic.
2. A `query_data` tool to query local documents through a LlamaIndex query engine (RAG).
3. An `evaluate_rag_quality` tool to evaluate the quality of the response from `query_data`.
4. A `record_notes` tool to record notes on the given topic.
5. A `write_report` tool to write the report using the information found by the `ResearchAgent`.
6. A `review_report` tool to review the report and provide feedback.

Utilizing the `Context` class, we can pass state between agents, and each agent will have access to the current state of the system.
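Here is a toy, stdlib-only stand-in for the workflow context that makes the shared-state idea concrete. `ToyContext`, `toy_record_notes`, and `toy_write_report` are hypothetical names. They mirror the shape of the tools defined later in this notebook: each tool reads the shared `state` dict, updates one key, and writes it back. This illustrates the pattern only; it is not LlamaIndex's `Context` implementation.

```python
import asyncio
from typing import Any


class ToyContext:
    """Hypothetical stand-in for a workflow context: an async key-value store."""

    def __init__(self, initial_state: dict[str, Any]) -> None:
        self._store: dict[str, Any] = {"state": initial_state}

    async def get(self, key: str) -> Any:
        return self._store[key]

    async def set(self, key: str, value: Any) -> None:
        self._store[key] = value


async def toy_record_notes(ctx: ToyContext, notes: str, notes_title: str) -> str:
    state = await ctx.get("state")
    state.setdefault("research_notes", {})[notes_title] = notes
    await ctx.set("state", state)
    return "Notes recorded."


async def toy_write_report(ctx: ToyContext, report_content: str) -> str:
    state = await ctx.get("state")
    state["report_content"] = report_content
    await ctx.set("state", state)
    return "Report written."


async def demo() -> dict[str, Any]:
    # One "agent" records notes, another writes the report; both share one state dict
    ctx = ToyContext({"research_notes": {}, "report_content": "Not written yet."})
    await toy_record_notes(ctx, "Filed in December 2018.", "Securities litigation")
    await toy_write_report(ctx, "# Report\n...")
    return await ctx.get("state")
```

Running `asyncio.run(demo())` returns a state that contains both the recorded notes and the report. This is how the `WriteAgent` can ground its report in notes left by the `ResearchAgent`.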


## RAG Pipeline

### Function convert_pdfs_to_markdown

The function takes two arguments: the directory containing the PDF files and the directory where the converted Markdown files will be saved. It creates the output directory if it does not exist. It then iterates over all PDF files in the input directory, converts each one to Markdown with Docling's `DocumentConverter` class, and saves the result in the output directory. PDFs that already have a matching Markdown file are skipped, so re-running the cell converts only new files.

In [4]:
from warnings import filterwarnings
from docling.document_converter import DocumentConverter

# Suppress warning from easyocr to avoid cluttering the output of the conversion process
filterwarnings(action="ignore", category=FutureWarning, module="easyocr") 


def convert_pdfs_to_markdown(pdf_dir, md_dir):
    # Create the output directory if it does not exist yet
    os.makedirs(md_dir, exist_ok=True)

    # Reuse a single converter so its conversion pipeline is initialized once, not per file
    doc_converter = DocumentConverter()

    pdf_files = [f for f in os.listdir(pdf_dir) if f.lower().endswith('.pdf')]
    for pdf_file in pdf_files:
        pdf_path = os.path.join(pdf_dir, pdf_file)
        md_path = os.path.join(md_dir, f"{os.path.splitext(pdf_file)[0]}.md")

        # Skip PDFs that were already converted on a previous run
        if not os.path.exists(md_path):
            print(f"Converting `{pdf_file}` to Markdown ...")
            result = doc_converter.convert(source=pdf_path)

            with open(md_path, 'w', encoding='utf-8') as md_file:
                md_file.write(result.document.export_to_markdown())
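The skip-if-exists check is what keeps re-running the conversion cheap. Isolated from Docling, the same logic can be sketched with `pathlib`. `pending_conversions` is a hypothetical helper that returns the PDF/Markdown pairs still to be converted. It matches the `.pdf` extension case-insensitively, which is an assumption you may want to adjust.

```python
from pathlib import Path


def pending_conversions(pdf_dir: str, md_dir: str) -> list[tuple[Path, Path]]:
    """Return (pdf_path, md_path) pairs whose Markdown output does not exist yet."""
    out = Path(md_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the output directory if needed
    pairs = []
    for pdf in sorted(Path(pdf_dir).iterdir()):
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            md = out / f"{pdf.stem}.md"
            if not md.exists():
                pairs.append((pdf, md))
    return pairs
```

Separating "what needs converting" from "how to convert it" lets you list the pending work, or swap in another converter, without touching Docling.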

### Execute the convert_pdfs_to_markdown function

Convert all PDFs in the input directory to Markdown and save them in the output directory. The function prints a message for each file it converts.

In [5]:
convert_pdfs_to_markdown(dir_input, dir_output)

Converting `nvda-20250126.pdf` to Markdown ...




Converting `nvda-20250427.pdf` to Markdown ...


### Initialize the models and clients required to generate the vector database

We create an embedding model from HuggingFace (`BAAI/bge-small-en-v1.5`), then use `SimpleDirectoryReader` to load the converted Markdown documents from the output directory.

Next, the code initializes a persistent ChromaDB client and creates or retrieves a collection in the database. It wraps that collection in a vector store and creates a storage context that uses it. Finally, it builds a `VectorStoreIndex` from the loaded documents, using the embedding model for vectorization, and prints a message confirming that the vector database was generated.

In [6]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
)
import chromadb

chroma_embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Persistent Chroma client: the collection survives notebook restarts
chroma_client = chromadb.PersistentClient(path=dir_chromadb)
chroma_collection = chroma_client.get_or_create_collection(name=chromadb_collection)
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

if chroma_collection.count() == 0:
    # First run: load the Markdown files, embed them and write them to Chroma
    documents = SimpleDirectoryReader(input_dir=dir_output).load_data()
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context, embed_model=chroma_embed_model
    )
else:
    # Later runs: reuse the stored embeddings instead of inserting duplicate chunks
    index = VectorStoreIndex.from_vector_store(vector_store, embed_model=chroma_embed_model)

print("Vector database successfully generated!")

NLTK download error: File is not a zip file




Vector database successfully generated!
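Querying the index embeds the question and returns the stored chunks whose vectors are most similar to it. `similarity_top_k`, used in a later cell, sets how many chunks come back, and LlamaIndex's default is 2. The toy sketch below shows that ranking step with cosine similarity over hand-made 3-dimensional vectors. Real `BAAI/bge-small-en-v1.5` embeddings have 384 dimensions, and Chroma's own distance metric and indexing are not modeled here. The chunk names and vectors are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda cid: cosine(query_vec, chunks[cid]), reverse=True)
    return ranked[:k]


# Hypothetical chunk embeddings, for illustration only
chunks = {
    "litigation": [0.9, 0.1, 0.0],
    "revenue": [0.1, 0.9, 0.1],
    "loss_contingencies": [0.6, 0.4, 0.1],
}
```

A larger `k` gives the LLM more context to answer from. It can also pull in loosely related chunks, which is what the Contextual Relevancy metric below penalizes.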


### Test a simple query to the vector database

In [7]:
test_query = "What are some recent litigations faced by NVIDIA?"
result = index.as_query_engine(llm=llm).query(test_query)
print(f"Q: {test_query}\nA: {result.response.strip()}\n\nSources:")
display([(n.text, n.metadata) for n in result.source_nodes])

Q: What are some recent litigations faced by NVIDIA?
A: securities class action lawsuit, captioned 4:18-cv-07669-HSG, initially filed on December 21, 2018 in the United States District Court for the Northern District of California, and titled In Re NVIDIA Corporation Securities Litigation, filed an amended complaint on May 13, 2020. 
Another putative derivative action was filed on October 30, 2023 in the Court of Chancery of the State of Delaware, captioned Horanic v. Huang, et al. (Case No. 2023-1096-KSJM). 
Additionally, there is a putative derivative lawsuit pending in the United States District Court for the Northern District of California, captioned 4:19-cv00341-HSG, initially filed January 18, 2019 and titled In re NVIDIA Corporation Consolidated Derivative Litigation, which was stayed pending resolution of the plaintiffs' appeal in the In Re NVIDIA Corporation Securities Litigation action.

Sources:


[("The case has not yet been reopened by the court. The lawsuit asserts claims, purportedly on behalf of us, against certain officers and directors of the Company for breach of fiduciary duty, unjust enrichment, waste of corporate assets, and violations of Sections 14(a), 10(b), and 20(a) of the Exchange Act based on the dissemination of allegedly false and misleading statements related to channel inventory and the impact of cryptocurrency mining on GPU demand. The plaintiffs are seeking unspecified damages and other relief, including reforms and improvements to NVIDIA's corporate governance and internal procedures.\n\nThe putative derivative actions initially filed September 24, 2019 and pending in the United States District Court for the District of Delaware, Lipchitz v. Huang, et al. (Case No. 1:19-cv-01795-MN) and Nelson v. Huang, et. al. (Case No. 1:19-cv-01798-MN), were stayed pending resolution of the plaintiffs' appeal in the In Re NVIDIA Corporation Securities Litigation actio

## DeepEval Integration for RAG/LLM Evaluation

Now we'll set up DeepEval metrics to evaluate the overall agentic application as well as its components, including the RAG pipeline and the report writing built on LlamaIndex.

DeepEval's metrics are powered by LLM-as-a-judge. Here we replace DeepEval's default judge, OpenAI's `gpt-4o`, with Google's `gemini-2.5-pro`. For this agentic application, we consider a set of metrics:

- G-Eval and Bias for overall system evaluation.
- Three RAG metrics (Answer Relevancy, Faithfulness, and Contextual Relevancy) for the RAG pipeline.
- The Summarization metric to assess the quality of the report-writing process.

Together, these can help us measure:

1. **Faithfulness**: Whether the RAG agent's output factually aligns with the contents of the RAG pipeline's retrieval context
2. **Answer Relevancy**: How relevant the RAG agent's answer is to the question
3. **Contextual Relevancy**: How relevant the context retrieved by the RAG pipeline is to the question
4. **Bias Detection**: Whether the response contains biased content
5. **Summarization**: Whether the generated report is factually correct with respect to its source material and includes the necessary details from it
6. **G-Eval**: Lets us use a reasoning LLM as a "judge" that scores the final report generated by our agentic system against our own custom-defined criteria for overall performance

These metrics will be used to assess the quality of our multi-agent system's outputs.
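The three RAG metrics are ratio scores compared against a threshold. In DeepEval, Faithfulness is, roughly, the fraction of claims in the output that the judge does not find contradicted by the retrieval context. Answer Relevancy and Contextual Relevancy are the fractions of statements judged relevant to the input; the first counts statements in the output, the second counts statements in the retrieved context. A metric succeeds when its score is at least its threshold (0.8 in this notebook).

The sketch below assumes the judge LLM has already labeled each claim or statement with a verdict of `"yes"`, `"no"`, or `"idk"`. Counting `"idk"` as non-contradicting, and scoring an empty list as 1.0, are assumptions here. DeepEval's prompts and edge-case handling are not reproduced, and `ratio_score` and `passes` are hypothetical names.

```python
def ratio_score(verdicts: list[str]) -> float:
    """Fraction of judge verdicts that are not "no" (empty input scores 1.0, an assumption)."""
    if not verdicts:
        return 1.0
    supporting = sum(1 for v in verdicts if v.strip().lower() != "no")
    return supporting / len(verdicts)


def passes(score: float, threshold: float = 0.8) -> bool:
    """A metric succeeds when its score reaches the threshold."""
    return score >= threshold
```

For instance, three supporting verdicts out of four give 0.75. That falls short of 0.8, which is the kind of result reported for Contextual Relevancy in the test run below.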


In [8]:
from deepeval import evaluate
from deepeval.metrics import (
    AnswerRelevancyMetric,
    FaithfulnessMetric,
    ContextualRelevancyMetric,
    GEval,
)
from deepeval.models import GeminiModel
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# Use native integration with Google Gemini for evaluation model
eval_model = GeminiModel(
    model_name=google_model_eval,
    api_key=google_api_key
)

metric_answer_relevancy = AnswerRelevancyMetric(
        model=eval_model,
        threshold=0.8)
metric_faithfulness = FaithfulnessMetric(
        model=eval_model,
        threshold=0.8)
metric_contextual_relevancy = ContextualRelevancyMetric(
        model=eval_model,
        threshold=0.8)

# Define a specific GEval metric for completeness
metric_completeness = GEval(
    name="Completeness",
    model=eval_model,
    evaluation_steps=[
        "Determine if the response answers every part of the input or question.",
        "Identify any missing elements, skipped sub-questions, or incomplete reasoning.",
        "Check whether the output provides sufficient detail for each aspect mentioned.",
        "Do not penalize for brevity if the coverage is complete and accurate."
    ],
    evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.INPUT],
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Overriding of current TracerProvider is not allowed


### Test RAG Evaluation Metrics for a simple query on the RAG database

In [12]:
# Run a test query against the indexed local data and evaluate the response
test_query = "What are some recent litigations faced by NVIDIA?"
result = index.as_query_engine(llm=llm, similarity_top_k=3).query(test_query)
print(f"Q: {test_query}\nA: {result.response.strip()}\n\nSources:")
display([(n.text, n.metadata) for n in result.source_nodes])

# Extract the actual output (generated answer) from the model
actual_output = result.response

# Extract the retrieved context used to generate the answer
retrieval_context = [source_node.get_content() for source_node in result.source_nodes]

# Create a test case object to evaluate the model's performance
test_case = LLMTestCase(
    input=test_query,  # the input question
    actual_output=actual_output,  # the model's generated answer
    retrieval_context=retrieval_context  # the supporting retrieved context
)

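# Evaluate the RAG metrics for this test case (prints the "Metrics Summary" below)
evaluate([test_case], [metric_answer_relevancy, metric_faithfulness, metric_contextual_relevancy])

# `evaluate()` scores copies of the metric objects, so these attributes stay None
# (see the output below); call `metric.measure(test_case)` to populate them instead
print(metric_answer_relevancy.score)
print(metric_faithfulness.score)
print(metric_contextual_relevancy.score)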

Q: What are some recent litigations faced by NVIDIA?
A: NVIDIA is facing several securities class action and derivative lawsuits. The core allegations across these cases relate to the company and certain executives making false or misleading statements regarding channel inventory and the effect of cryptocurrency mining on GPU demand.

Here are details on some of the specific legal actions:

*   **In Re NVIDIA Corporation Securities Litigation:** This putative securities class action lawsuit was initially filed in December 2018. It alleges that NVIDIA and certain executives violated the Securities Exchange Act. After being dismissed by a district court, the case was appealed. The Supreme Court reviewed the case but ultimately dismissed the writ of certiorari in December 2024. In February 2025, the case was sent back to the district court for further proceedings. The plaintiffs are seeking class certification and unspecified compensatory damages.

*   **In re NVIDIA Corporation Consolida

[("The case has not yet been reopened by the court. The lawsuit asserts claims, purportedly on behalf of us, against certain officers and directors of the Company for breach of fiduciary duty, unjust enrichment, waste of corporate assets, and violations of Sections 14(a), 10(b), and 20(a) of the Exchange Act based on the dissemination of allegedly false and misleading statements related to channel inventory and the impact of cryptocurrency mining on GPU demand. The plaintiffs are seeking unspecified damages and other relief, including reforms and improvements to NVIDIA's corporate governance and internal procedures.\n\nThe putative derivative actions initially filed September 24, 2019 and pending in the United States District Court for the District of Delaware, Lipchitz v. Huang, et al. (Case No. 1:19-cv-01795-MN) and Nelson v. Huang, et. al. (Case No. 1:19-cv-01798-MN), were stayed pending resolution of the plaintiffs' appeal in the In Re NVIDIA Corporation Securities Litigation actio



Metrics Summary

  - ✅ Answer Relevancy (score: 1.0, threshold: 0.8, strict: False, evaluation model: gemini-2.5-pro, reason: The score is 1.00 because the output is perfectly relevant to the user's request. It directly answers the question about NVIDIA's recent litigations without including any unnecessary information. Excellent job!, error: None)
  - ✅ Faithfulness (score: 1.0, threshold: 0.8, strict: False, evaluation model: gemini-2.5-pro, reason: The score is 1.0 because the output is perfectly faithful to the provided context, with no contradictions found. Excellent work!, error: None)
  - ❌ Contextual Relevancy (score: 0.75, threshold: 0.8, strict: False, evaluation model: gemini-2.5-pro, reason: The score is 0.75 because while the context provides highly relevant details on specific lawsuits like 'Lipchitz v. Huang, et al.' and 'In Re NVIDIA Corporation Securities Litigation', it also contains irrelevant general financial information such as 'Accounting for Loss Contingencies

None
None
None


## Agentic System Setup

We will start by defining the set of tools available to our Agents.

In [16]:
from tavily import AsyncTavilyClient
from llama_index.core.workflow import Context

async def search_web(query: str) -> str:
    """Useful for using the web to answer questions."""
    client = AsyncTavilyClient(api_key=os.environ.get("TAVILY_SEARCH_API_KEY"))
    return str(await client.search(query))

# Global storage for evaluation data, keyed by query string
rag_evaluation_data = {}

async def query_data(query: str) -> str:
    """Query local vector database for information from Quarterly and Yearly Financial Reports."""
    # Use the async query API so the tool does not block the workflow's event loop
    result = await index.as_query_engine(llm=llm).aquery(query)
    sources = [(n.text, n.metadata) for n in result.source_nodes]
    formatted_output = f"Q: {query}\nA: {result.response.strip()}\n\nSources:\n{sources}"

    # Store the generated answer and its retrieved context for later evaluation
    # (the dict is only mutated, so no `global` statement is needed)
    rag_evaluation_data[query] = {
        'query': query,
        'actual_output': result.response,
        'retrieval_context': [n.get_content() for n in result.source_nodes],
    }

    # Return the formatted output immediately without running evaluation
    return formatted_output

async def evaluate_rag_quality(query: str) -> str:
    """Evaluate the quality of the last RAG retrieval and response for a given query."""
    if query not in rag_evaluation_data:
        return "No RAG data found for this query. Please run query_data first."

    try:
        eval_data = rag_evaluation_data[query]

        # Create a test case object to evaluate the model's performance
        test_case = LLMTestCase(
            input=eval_data['query'],
            actual_output=eval_data['actual_output'],
            retrieval_context=eval_data['retrieval_context']
        )

        # Score each metric directly: `a_measure` populates `metric.score` on these
        # instances, whereas `evaluate()` scores copies and leaves `score` as None
        metrics = [metric_answer_relevancy, metric_faithfulness, metric_contextual_relevancy]
        for metric in metrics:
            await metric.a_measure(test_case)

        answer_relevancy_score = metric_answer_relevancy.score or 0
        faithfulness_score = metric_faithfulness.score or 0
        contextual_relevancy_score = metric_contextual_relevancy.score or 0

        # Acceptable when at least two of the three metrics meet the 0.8 threshold
        scores = [answer_relevancy_score, faithfulness_score, contextual_relevancy_score]
        acceptable = sum(score >= 0.8 for score in scores) >= 2

        def verdict(score):
            return '✅ PASS' if score >= 0.8 else '❌ FAIL'

        # Create evaluation summary
        evaluation_summary = f"""RAG Quality Evaluation for query: "{query}"

Metrics:
- Answer Relevancy: {answer_relevancy_score:.2f} (threshold: 0.8)
- Faithfulness: {faithfulness_score:.2f} (threshold: 0.8)
- Contextual Relevancy: {contextual_relevancy_score:.2f} (threshold: 0.8)

Quality Assessment:
- Answer Relevancy: {verdict(answer_relevancy_score)}
- Faithfulness: {verdict(faithfulness_score)}
- Contextual Relevancy: {verdict(contextual_relevancy_score)}

Overall Quality: {'✅ ACCEPTABLE' if acceptable else '❌ NEEDS IMPROVEMENT'}

Recommendation: {'The RAG retrieval quality is good. You can proceed with this answer.' if acceptable else 'The RAG retrieval quality is poor. Consider trying a different query or search approach.'}"""

        return evaluation_summary

    except Exception as e:
        return f"Error during RAG evaluation: {str(e)}"

async def record_notes(ctx: Context, notes: str, notes_title: str) -> str:
    """Useful for recording notes on a given topic. Your input should be notes with a title to save the notes under."""
    # Read and write the shared state through `ctx.store`; the older
    # `ctx.get("state")` / `ctx.set("state", ...)` calls are deprecated in recent LlamaIndex releases
    current_state = await ctx.store.get("state")
    if "research_notes" not in current_state:
        current_state["research_notes"] = {}
    current_state["research_notes"][notes_title] = notes
    await ctx.store.set("state", current_state)
    return "Notes recorded."


async def write_report(ctx: Context, report_content: str) -> str:
    """Useful for writing a report on a given topic. Your input should be a markdown formatted report."""
    current_state = await ctx.store.get("state")
    current_state["report_content"] = report_content
    await ctx.store.set("state", current_state)
    return "Report written."


async def review_report(ctx: Context, review: str) -> str:
    """Useful for reviewing a report and providing feedback. Your input should be a review of the report."""
    current_state = await ctx.store.get("state")
    current_state["review"] = review
    await ctx.store.set("state", current_state)
    return "Report reviewed."
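`evaluate_rag_quality` labels a retrieval acceptable when at least two of its three metric scores reach 0.8. Pulled out as a pure function, the rule is easy to unit-test and tune without calling the LLM judge. The name `quality_gate` and its parameters are hypothetical. Missing scores count as 0, mirroring the `or 0` fallback in the tool.

```python
from typing import Mapping, Optional


def quality_gate(
    scores: Mapping[str, Optional[float]],
    threshold: float = 0.8,
    min_passing: int = 2,
) -> bool:
    """True when at least `min_passing` metrics meet the threshold; None counts as 0."""
    passing = sum((score or 0.0) >= threshold for score in scores.values())
    return passing >= min_passing
```

The earlier test query scored 1.0, 1.0, and 0.75. With those scores this rule reports the retrieval as acceptable, even though Contextual Relevancy fails on its own. Raising `min_passing` to 3 would make the gate stricter.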

With our tools defined, we can now create our agents.

If the LLM you are using supports tool calling, you can use the `FunctionAgent` class. Otherwise, you can use the `ReActAgent` class.

Here, the name and description of each agent are used so that the system knows what each agent is responsible for and when to hand off control to the next agent.

In [17]:
from llama_index.core.agent.workflow import FunctionAgent, ReActAgent

test_agent_search = ReActAgent(
    tools=[search_web],
    llm=llm,
    system_prompt="You are a helpful assistant that can search the web for information.",
)

test_agent_query = ReActAgent(
    tools=[query_data, evaluate_rag_quality],
    llm=llm,
    system_prompt=(
        "You are a helpful assistant that can ONLY answer questions by querying a local vector database for information from Quarterly and Yearly Financial Reports. "
        "IMPORTANT: You do NOT have any recent knowledge of corporate data - you MUST use the query_data tool for every question. "
        "WORKFLOW (follow this exact order): "
        "1. ALWAYS start by using the query_data tool to search the vector database "
        "2. ALWAYS evaluate the quality using the evaluate_rag_quality tool "
        "3. Only then provide your final response in markdown format "
        "4. If quality is poor, try rephrasing the query and repeat steps 1-2 "
        "NEVER answer questions directly without using the query_data tool first. "
        "Your response should include both the answer and the evaluation results."
    ),
)

research_agent = ReActAgent(
    name="ResearchAgent",
    description="Useful for searching the web for information on a given topic and recording notes on the topic.",
    system_prompt=(
        "You are the ResearchAgent that can search local data or the web for information on a given topic and record notes on the topic. "
        "You should first search the local vector database with the query_data tool for information on the topic if relevant to the information stored. "
        "After getting results, you can optionally evaluate the RAG quality using the evaluate_rag_quality tool to ensure good results. "
        "If not sufficient, you should then search web with the search_web tool for information on the topic. "
        "You should always record notes on the topic using the record_notes tool. "
        "Once notes are recorded and once you are satisfied, you should always hand off control to the WriteAgent to write a report on the topic. "
        "You should have at least some notes on a topic before handing off control to the WriteAgent."
    ),
    llm=llm,
    tools=[query_data, search_web, record_notes, evaluate_rag_quality],
    can_handoff_to=["WriteAgent"],
)

write_agent = ReActAgent(
    name="WriteAgent",
    description="Useful for writing a report on a given topic.",
    system_prompt=(
        "You are the WriteAgent that can write a report on a given topic. "
        "Your report should be in a markdown format. The content should be grounded in the research notes. "
        "Once the report is written, you should get feedback at least once from the ReviewAgent."
    ),
    llm=llm,
    tools=[write_report],
    can_handoff_to=["ReviewAgent", "ResearchAgent"],
)

review_agent = ReActAgent(
    name="ReviewAgent",
    description="Useful for reviewing a report and providing feedback.",
    system_prompt=(
        "You are the ReviewAgent that can review the written report and provide feedback. "
        "Your review should either approve the current report or request changes for the WriteAgent to implement. "
        "If you have feedback that requires changes, you should hand off control to the WriteAgent to implement the changes after submitting the review."
    ),
    llm=llm,
    tools=[review_report],
    can_handoff_to=["WriteAgent"],
)

## Testing a single agent

Use the test agents to check that each agent and its tools work with the chosen model.

In [15]:
response = await test_agent_search.run(user_msg="What is the weather in San Francisco?")
print(str(response))

The weather in San Francisco is currently 62.6°F and misty. The wind is blowing from the WSW at 12.1 mph, and the humidity is at 83%.


In [18]:
rag_evaluation_data.clear()
response = await test_agent_query.run(user_msg="What are some recent litigations faced by NVIDIA?")
print(str(response))

# Check what evaluation data was stored
print("Stored RAG evaluation data:")
print("=" * 40)
for query, data in rag_evaluation_data.items():
    print(f"Query: {query}")
    print(f"Answer length: {len(data['actual_output'])} characters")
    print(f"Number of sources: {len(data['retrieval_context'])}")
    print("-" * 40)

Based on recent financial reports, NVIDIA is currently involved in several significant legal proceedings, primarily centered around a securities class action lawsuit and related derivative actions.

Here is a summary of the key litigations:

### **1. In Re NVIDIA Corporation Securities Litigation**

*   **Allegations:** This is the main lawsuit, first filed in December 2018. It alleges that between May 2017 and November 2018, NVIDIA and some of its executives made false or misleading statements about the company's channel inventory and the true impact of cryptocurrency mining on the demand for its GPUs.
*   **Current Status:** The case has a complex history. After being dismissed by a district court, the dismissal was partially reversed on appeal. The U.S. Supreme Court took up the case but ultimately dismissed it in December 2024. As of February 2025, the case has been sent back to the district court to proceed. The plaintiffs are seeking class certification and unspecified damages.



## Running the Workflow

With our agents defined, we can create our `AgentWorkflow` and run it.

In [19]:
from llama_index.core.agent.workflow import AgentWorkflow

agent_workflow = AgentWorkflow(
    agents=[research_agent, write_agent, review_agent],
    root_agent=research_agent.name,
    initial_state={
        "research_notes": {},
        "report_content": "Not written yet.",
        "review": "Review required.",
    },
)

As the workflow is running, we will stream the events to get an idea of what is happening under the hood.

In [20]:
from llama_index.core.agent.workflow import (
    AgentInput,
    AgentOutput,
    ToolCall,
    ToolCallResult,
    AgentStream,
)

workflow_message = (
    "Identify some recent litigations faced by NVIDIA and write a report on it. "
    "You should identify different litigations, their outcomes, and any significant impacts on the company. "
    "Then you should write a commentary on the impact of these litigations on NVIDIA's business and reputation. "
)

handler = agent_workflow.run(user_msg=workflow_message)

current_agent = None
async for event in handler.stream_events():
    # Print a banner whenever control passes to a different agent
    if (
        hasattr(event, "current_agent_name")
        and event.current_agent_name != current_agent
    ):
        current_agent = event.current_agent_name
        print(f"\n{'='*50}")
        print(f"🤖 Agent: {current_agent}")
        print(f"{'='*50}\n")

    # Start a separate if/elif chain so the event type is always checked,
    # including for the event that triggered the banner above
    if isinstance(event, AgentOutput):
        if event.response.content:
            print("📤 Output:", event.response.content)
        if event.tool_calls:
            print(
                "🛠️  Planning to use tools:",
                [call.tool_name for call in event.tool_calls],
            )
    elif isinstance(event, ToolCallResult):
        print(f"🔧 Tool Result ({event.tool_name}):")
        print(f"  Arguments: {event.tool_kwargs}")
        print(f"  Output: {event.tool_output}")
    elif isinstance(event, ToolCall):
        print(f"🔨 Calling Tool: {event.tool_name}")
        print(f"  With arguments: {event.tool_kwargs}")
    # Uncomment to also stream tokens and print agent inputs:
    # elif isinstance(event, AgentStream):
    #     if event.delta:
    #         print(event.delta, end="", flush=True)
    # elif isinstance(event, AgentInput):
    #     print("📥 Input:", event.input)


🤖 Agent: ResearchAgent

📤 Output: Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: search_web
Action Input: {"query": "recent litigations faced by NVIDIA"}
🛠️  Planning to use tools: ['search_web']
🔨 Calling Tool: search_web
  With arguments: {'query': 'recent litigations faced by NVIDIA'}
🔧 Tool Result (search_web):
  Arguments: {'query': 'recent litigations faced by NVIDIA'}
  Output: {'query': 'recent litigations faced by NVIDIA', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://jipel.law.nyu.edu/the-future-of-nvidia-in-light-of-recent-patent-infringement-lawsuit/', 'title': 'The Future of Nvidia in Light of Recent Patent Infringement Lawsuit', 'content': 'Despite its formidable growth, Nvidia may face some troublesome legal consequences due to a recent patent lawsuit filed by a tech startup, Xockets Inc.', 'score': 0.8319231, 'raw_content': None}, {'url': 'https://www.ktmc.com


📤 Output: Thought: I have gathered information on three recent litigations involving NVIDIA, analyzed their potential impacts, and recorded my notes. Now I will hand off to the WriteAgent to write the final report.
Action: handoff
Action Input: {'to_agent': 'WriteAgent', 'reason': 'I have gathered and analyzed the information on recent litigations faced by NVIDIA. The next step is to write a comprehensive report based on these findings. The WriteAgent is best suited for this task.'}
🛠️  Planning to use tools: ['handoff']
🔨 Calling Tool: handoff
  With arguments: {'to_agent': 'WriteAgent', 'reason': 'I have gathered and analyzed the information on recent litigations faced by NVIDIA. The next step is to write a comprehensive report based on these findings. The WriteAgent is best suited for this task.'}
🔧 Tool Result (handoff):
  Arguments: {'to_agent': 'WriteAgent', 'reason': 'I have gathered and analyzed the information on recent litigations faced by NVIDIA. The next step is to write a 


📤 Output: Thought: I have successfully generated the report on recent litigations faced by NVIDIA. The process is complete, and I can now present the final answer.
Answer: Here is the report on recent litigations faced by NVIDIA:

# Report on Recent Litigations Faced by NVIDIA

## Introduction

NVIDIA, a dominant force in the graphics processing unit (GPU) and artificial intelligence (AI) markets, has seen unprecedented growth and influence. However, its success has been accompanied by a series of significant legal and regulatory challenges. This report details several recent litigations faced by the company, examining their nature, current status, and potential impacts on NVIDIA's business operations and reputation.

## 1. Xockets Inc. Patent Infringement Lawsuit

*   **Litigation Overview:** The tech startup Xockets Inc. has filed a lawsuit against NVIDIA, alleging both patent infringement and antitrust violations. Xockets claims that its proprietary Data Processing Unit (DPU) techno

Once the workflow has finished, we can retrieve the final report ourselves from the workflow's shared state.
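Because `report_content` starts as the placeholder `"Not written yet."` in `initial_state`, it is worth checking that the write agent actually replaced it before using the report downstream. A minimal sketch, assuming a stand-in store: `FakeStore` is hypothetical and only mimics an awaitable `get(key)`, in the same way the next cell awaits `handler.ctx.store.get("state")`; the helper name `get_final_report` is also ours.

```python
import asyncio

PLACEHOLDER = "Not written yet."  # matches initial_state in the workflow definition


class FakeStore:
    """Hypothetical stand-in exposing an awaitable get(key)."""

    def __init__(self, data: dict):
        self._data = data

    async def get(self, key: str):
        return self._data[key]


async def get_final_report(store) -> str:
    """Return the report, or raise if the placeholder was never replaced."""
    state = await store.get("state")
    report = state.get("report_content", PLACEHOLDER)
    if not report or report == PLACEHOLDER:
        raise ValueError("The workflow finished without writing a report.")
    return report


ok_store = FakeStore({"state": {"report_content": "# Report on Recent Litigations"}})
# In a notebook, use: await get_final_report(ok_store)
print(asyncio.run(get_final_report(ok_store)))
```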

In [21]:
state = await handler.ctx.store.get("state")
report_content = state["report_content"]
print("\nFinal Report Content:")
print("=" * 50)
print(report_content)


Final Report Content:
# Report on Recent Litigations Faced by NVIDIA

## Introduction

NVIDIA, a dominant force in the graphics processing unit (GPU) and artificial intelligence (AI) markets, has seen unprecedented growth and influence. However, its success has been accompanied by a series of significant legal and regulatory challenges. This report details several recent litigations faced by the company, examining their nature, current status, and potential impacts on NVIDIA's business operations and reputation.

## 1. Xockets Inc. Patent Infringement Lawsuit

*   **Litigation Overview:** The tech startup Xockets Inc. has filed a lawsuit against NVIDIA, alleging both patent infringement and antitrust violations. Xockets claims that its proprietary Data Processing Unit (DPU) technology has been a critical element in NVIDIA's recent success. The lawsuit extends to Microsoft and RPX, accusing them of forming an illegal cartel to suppress the fair market value of Xockets' technology.
*   

Finally, we run the GEval-based completeness evaluation on the output of the overall system.
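The metrics summary printed by the evaluation reports a score, a threshold and a `strict` flag. As we understand DeepEval's behaviour, a metric passes when its score is at least the threshold, and strict mode makes the score binary, so only a perfect score passes (effectively a threshold of 1). The helper below is a hypothetical sketch of that rule to help read the summary; it is not DeepEval's own code.

```python
def passes(score: float, threshold: float, strict: bool = False) -> bool:
    """Hypothetical pass/fail rule: the score must reach the threshold.
    In strict mode the threshold is treated as 1.0 (only a perfect score passes)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("Expected a normalized score in [0, 1].")
    effective_threshold = 1.0 if strict else threshold
    return score >= effective_threshold


print(passes(0.9, 0.5))               # → True
print(passes(0.9, 0.5, strict=True))  # → False
```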

In [22]:
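# Define a test case for the overall agentic flow; actual_output is the final report content
system_test_case = LLMTestCase(input=workflow_message, actual_output=report_content)

# Run the G-Eval completeness metric and print the summary
evaluate([system_test_case], [metric_completeness])

# evaluate() does not populate metric_completeness itself (printing its
# attributes right after evaluate() gives "None None"). Calling measure()
# scores the test case on this metric object directly (an extra LLM call),
# after which its score and reason are set.
metric_completeness.measure(system_test_case)
print(metric_completeness.score, metric_completeness.reason)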



Metrics Summary

  - ✅ Completeness [GEval] (score: 0.9, threshold: 0.5, strict: False, evaluation model: gemini-2.5-pro, reason: The response successfully addresses all components of the input. It identifies three distinct and relevant recent litigations, provides a detailed report on each, and includes a separate commentary on the overall impact on NVIDIA's business and reputation. While the prompt asked for "outcomes," the response accurately states that the cases are ongoing and instead provides their current status and potential impacts, which is a reasonable and thorough way to address that requirement for current events., error: None)

For test case:

  - input: Identify some recent litigations faced by NVIDIA and write a report on it. You should identify different litigations, their outcomes, and any significant impacts on the company. Then you should write a commentary on the impact of these litigations on NVIDIA's business and reputation. 
  - actual output: # Report on Rec

None None
