# A simple two-agent A2A RAG Application

This notebook presents a simple scenario in which an agent uses the A2A protocol to query another agent for information while answering a RAG query. We show how to initialize an agent in Llama Stack and enable it to communicate with another, external agent.

This demo is largely based on the single-agent RAG demo, which can be found in [Level4_RAG_agent.ipynb](../../rag_agentic/notebooks/Level4_RAG_agent.ipynb).

## Overview

This notebook covers the following steps:

1. Setting up a Llama Stack agent capable of retrieving content from a vector DB via the built-in RAG tool.
2. Serving the agent over an A2A server.
3. Initializing another Llama Stack agent capable of communicating with the RAG agent.
4. Launching the second agent and using it to answer user queries about the documents.

## Prerequisites

Before starting, ensure you have the following:
- Python 3.11 or later (`python_requires >= 3.11`).

- You have followed the instructions in the [Setup Guide](../../rag_agentic/notebooks/Level0_getting_started_with_Llama_Stack.ipynb) notebook.

- The Llama Stack server uses Milvus as its vector DB provider.
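
The Python version requirement can be checked before running the rest of the notebook. The following is a minimal sketch; the helper name `python_is_supported` is hypothetical and not part of the demo code.

```python
import sys

MIN_PYTHON = (3, 11)  # minimum version stated in the prerequisites


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    found = sys.version.split()[0]
    print(f"Warning: Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {found}")
```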

## Additional environment variables
This demo requires the following environment variables in addition to those defined in the [Setup Guide](../../rag_agentic/notebooks/Level0_getting_started_with_Llama_Stack.ipynb):
- `RAG_AGENT_LOCAL_PORT`: the port over which we will serve the exported A2A agent with RAG capabilities.
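
As an illustration, the port could be read and validated as sketched below. The helper `read_port` is hypothetical; the default of `10030` is the fallback value this notebook uses later when the variable is unset.

```python
import os


def read_port(var_name="RAG_AGENT_LOCAL_PORT", default=10030, environ=None):
    """Read a TCP port from the environment, falling back to a default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(var_name)
    if raw is None or raw.strip() == "":
        return default
    port = int(raw)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"{var_name}={port} is outside the valid port range 1-65535")
    return port
```

Validating the value early turns a misconfigured port into a clear error rather than a server that fails to bind later on.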


## 1. Setting Up this Notebook
To provide A2A communication capabilities, we will use the [sample implementation by Google](https://github.com/google/A2A/tree/main/samples/python). Please make sure that the content of the referenced directory is available on your Python path. This can be done, for example, by running the following command:

In [1]:
! git clone https://github.com/google-a2a/a2a-samples.git
! pip install -r "../requirements.txt"

fatal: destination path 'a2a-samples' already exists and is not an empty directory.

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


Now, we will add the paths to the A2A library and our own tools to `sys.path`.

In [2]:
import sys
# the path of the A2A library
sys.path.append('./a2a-samples/samples/python')
# the path to our own utils
sys.path.append('../..')
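
Because the cell above may be re-run, and a mistyped relative path fails silently until the first import, a slightly more defensive variant might look like the sketch below. The helper `add_to_sys_path` is hypothetical and only illustrates the idea.

```python
import sys
from pathlib import Path


def add_to_sys_path(path):
    """Append the resolved directory to sys.path once.

    Returns True if the path was added, False if it was already present.
    """
    resolved = Path(path).resolve()
    if not resolved.is_dir():
        raise FileNotFoundError(f"Expected a directory at {resolved}")
    if str(resolved) in sys.path:
        return False
    sys.path.append(str(resolved))
    return True
```

For example, `add_to_sys_path('./a2a-samples/samples/python')` would raise immediately if the repository had not been cloned into the current directory.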

We will now proceed with the necessary imports.

In [3]:
from common.server import A2AServer
from common.types import AgentCard, AgentSkill, AgentCapabilities
from a2a_llama_stack.A2ATool import A2ATool
from a2a_llama_stack.task_manager import AgentTaskManager

# for asynchronously serving the A2A agent
import threading

Next, we will initialize our environment as described in detail in our ["Getting Started" notebook](../../rag_agentic/notebooks/Level0_getting_started_with_Llama_Stack.ipynb). Please refer to it for additional explanations.

In [4]:
# for accessing the environment variables
import os
from dotenv import load_dotenv
load_dotenv()

# for communication with Llama Stack
from llama_stack_client import LlamaStackClient

# agent- and RAG-related imports
import uuid
from llama_stack_client import Agent, RAGDocument
from llama_stack_client.lib.agents.event_logger import EventLogger

# pretty-print the results returned from the model/agent (imported from the rag_agentic demo subdirectory)
import sys
sys.path.append('../../rag_agentic')
from src.utils import step_printer
from termcolor import cprint


base_url = os.getenv("REMOTE_BASE_URL")


# Tavily search API key is required for some of our demos and must be provided to the client upon initialization.
# We will cover it in the agentic demos that use the respective tool. Please ignore this parameter for all other demos.
tavily_search_api_key = os.getenv("TAVILY_SEARCH_API_KEY")
if tavily_search_api_key is None:
    provider_data = None
else:
    provider_data = {"tavily_search_api_key": tavily_search_api_key}


client = LlamaStackClient(
    base_url=base_url,
    provider_data=provider_data
)
    
print(f"Connected to Llama Stack server")

# model_id for the model you wish to use that is configured with the Llama Stack server
model_id = os.getenv("INFERENCE_MODEL_ID")

temperature = float(os.getenv("TEMPERATURE", 0.0))
if temperature > 0.0:
    top_p = float(os.getenv("TOP_P", 0.95))
    strategy = {"type": "top_p", "temperature": temperature, "top_p": top_p}
else:
    strategy = {"type": "greedy"}

max_tokens = int(os.getenv("MAX_TOKENS", 4096))

# sampling_params will later be used to pass the parameters to Llama Stack Agents/Inference APIs
sampling_params = {
    "strategy": strategy,
    "max_tokens": max_tokens,
}

stream_env = os.getenv("STREAM", "False")
# the Boolean 'stream' parameter will later be passed to Llama Stack Agents/Inference APIs
# any value other than the exact string 'False' is treated as True
stream = (stream_env != "False")

print(f"Inference Parameters:\n\tModel: {model_id}\n\tSampling Parameters: {sampling_params}\n\tstream: {stream}")

Connected to Llama Stack server
Inference Parameters:
	Model: llama3.1:8b-instruct-fp16
	Sampling Parameters: {'strategy': {'type': 'greedy'}, 'max_tokens': 512}
	stream: False
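
The parameter logic in the cell above can be restated as two small functions, which makes it easier to see which inputs produce which settings. This is only a sketch that mirrors the notebook's rules; the function names are hypothetical.

```python
def build_sampling_params(temperature=0.0, top_p=0.95, max_tokens=4096):
    """Mirror the notebook: greedy decoding unless temperature is positive."""
    if temperature > 0.0:
        strategy = {"type": "top_p", "temperature": temperature, "top_p": top_p}
    else:
        strategy = {"type": "greedy"}
    return {"strategy": strategy, "max_tokens": max_tokens}


def parse_stream_flag(value):
    """Mirror the notebook: anything other than the string 'False' is True."""
    return value != "False"
```

Note that under this rule `parse_stream_flag("false")` and `parse_stream_flag("0")` both evaluate to `True`, so `STREAM` should be set to exactly `False` to disable streaming.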


## 2. Setting Up and Serving a RAG A2A Agent
We will now initialize an agent connected to a vector DB and capable of serving requests related to the information contained in the indexed documents.

Our first steps will be identical to those demonstrated in [Level4_RAG_agent.ipynb](../../rag_agentic/notebooks/Level4_RAG_agent.ipynb):
- Initialize a new document collection in the target vector DB. All parameters related to the vector DB, such as the embedding model and dimension, must be specified here.

In [5]:
vector_db_id = f"test_vector_db_{uuid.uuid4()}"

client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model=os.getenv("VDB_EMBEDDING"),
    embedding_dimension=int(os.getenv("VDB_EMBEDDING_DIMENSION", 384)),
    provider_id=os.getenv("VDB_PROVIDER"),
)

VectorDBRegisterResponse(embedding_dimension=384, embedding_model='all-MiniLM-L6-v2', identifier='test_vector_db_8f5fc8f5-2e61-4c37-baa2-ae497aca3990', provider_id='faiss', provider_resource_id='test_vector_db_8f5fc8f5-2e61-4c37-baa2-ae497aca3990', type='vector_db', access_attributes=None)

- Provide a list of document URLs to the RAG tool. Llama Stack will handle the fetching, conversion and chunking of the documents' content automatically.

In [6]:
urls = [
    ("https://www.openshift.guide/openshift-guide-screen.pdf", "application/pdf"),
]
documents = [
    RAGDocument(
        document_id=f"num-{i}",
        content=url,
        mime_type=url_type,
        metadata={},
    )
    for i, (url, url_type) in enumerate(urls)
]
client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=int(os.getenv("VECTOR_DB_CHUNK_SIZE", 512)),
)
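
To give an intuition for what `chunk_size_in_tokens` controls, the sketch below splits text into fixed-size chunks. It is a simplified illustration, not Llama Stack's actual implementation: it treats whitespace-separated words as tokens, and the `overlap` parameter and the function name `chunk_tokens` are assumptions made for this example.

```python
def chunk_tokens(text, chunk_size=512, overlap=0):
    """Split text into chunks of at most chunk_size whitespace-separated tokens.

    Consecutive chunks share `overlap` tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reached the end of the text
    return chunks
```

Smaller chunks tend to give more precise retrieval hits but carry less context each, so the chunk size is worth tuning per document collection.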

- Initialize a Llama Stack agent with a list of tools including the built-in RAG tool. The RAG tool specification must include a list of document collection IDs to retrieve from.

In [7]:
rag_agent = Agent(
    client,
    model=model_id,
    instructions="You are a helpful assistant. Use the RAG tool available to you to answer user queries. When a tool is used, only print its output without adding more content.",
    sampling_params=sampling_params,
    tools=[
        dict(
            name="builtin::rag/knowledge_search",
            args={
                "vector_db_ids": [vector_db_id],
            },
        )
    ],
)

Now, our Llama Stack agent is ready to be served as an A2A agent. This includes the following steps:
 - Create an `AgentCard` - an object containing all the details about the agent we are about to serve, including its URL and exposed capabilities.
 - Wrap the Llama Stack agent with an `AgentTaskManager` object - a wrapper/adapter making it possible for the A2A server to redirect incoming requests to the Llama Stack agent.
 - Create and launch an `A2AServer` - a REST API server capable of communicating via the A2A protocol.

In [8]:
rag_agent_local_port = int(os.getenv("RAG_AGENT_LOCAL_PORT", "10030"))
rag_agent_url = f"http://localhost:{rag_agent_local_port}"

agent_card = AgentCard(
    name="OpenShift Knowledge Source Agent",
    description="Provides information about all technical aspects related to Red Hat OpenShift",
    url=rag_agent_url,
    version="0.1.0",
    defaultInputModes=["text/plain"],
    defaultOutputModes=["text/plain"],
    capabilities=AgentCapabilities(streaming=True),
    skills=[
        AgentSkill(id="rag", name="RAG Query related to Red Hat OpenShift"),
    ],
)
task_manager = AgentTaskManager(agent=rag_agent)
server = A2AServer(
    agent_card=agent_card,
    task_manager=task_manager,
    host='localhost',
    port=rag_agent_local_port
)
thread = threading.Thread(target=server.start, daemon=True)
thread.start()

INFO:     Started server process [18664]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:10030 (Press CTRL+C to quit)


INFO:     ::1:53138 - "GET /.well-known/agent.json HTTP/1.1" 200 OK
INFO:     ::1:53150 - "POST / HTTP/1.1" 200 OK
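
Since the server starts in a background thread, it can be useful to wait until the port accepts connections before sending requests. The sketch below uses only the standard library. The helper names are hypothetical; the `/.well-known/agent.json` path is the one visible in the server log above.

```python
import json
import socket
import time
import urllib.request


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


def agent_card_url(base_url):
    """Build the agent card URL for an A2A server base URL."""
    return base_url.rstrip("/") + "/.well-known/agent.json"


def fetch_agent_card(base_url, timeout=5.0):
    """Fetch the agent card and parse it as JSON."""
    with urllib.request.urlopen(agent_card_url(base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With the server above running, `wait_for_port("localhost", rag_agent_local_port)` followed by `fetch_agent_card(rag_agent_url)` should return the card as a dictionary.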


## 3. Setting Up an Agent Capable of A2A Communication with the RAG Agent
This includes the following steps:
 - Create a Llama Stack client tool that wraps A2A communication with the RAG agent.
 - Initialize a client agent with access to the above client tool.

In [9]:
rag_agent_tool = A2ATool(rag_agent_url)
a2a_client_agent = Agent(
    client,
    model=model_id,
    instructions="You are a helpful assistant. When a tool is used, only print its output without adding more content.",
    sampling_params=sampling_params,
    tools=[rag_agent_tool],
)
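
For intuition about what the tool does, the sketch below builds the kind of JSON-RPC 2.0 payload an A2A client might POST to the server root. The `tasks/send` method name and the field layout are assumptions based on early versions of the A2A sample code and may differ in the version you have installed. `A2ATool` handles all of this internally, and `build_send_task_request` is a hypothetical helper.

```python
import uuid


def build_send_task_request(query, session_id=None):
    """Build an illustrative JSON-RPC 2.0 request carrying a text message.

    Assumed shape, not verified against the installed A2A library.
    """
    return {
        "jsonrpc": "2.0",
        "id": uuid.uuid4().hex,  # JSON-RPC request ID
        "method": "tasks/send",  # assumed A2A method name
        "params": {
            "id": uuid.uuid4().hex,  # task ID
            "sessionId": session_id or uuid.uuid4().hex,
            "message": {
                "role": "user",
                "parts": [{"type": "text", "text": query}],
            },
        },
    }
```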

Now, let's use our client agent to serve user requests.

In [10]:
queries = [
    "How to install OpenShift?",
]

for prompt in queries:
    cprint(f"\nUser> {prompt}", "blue")
    
    # create a new turn with a new session ID for each prompt
    response = a2a_client_agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=a2a_client_agent.create_session(f"rag-session_{uuid.uuid4()}"),
        stream=stream,
    )
    
    # print the response, including tool calls output
    if stream:
        for log in EventLogger().log(response):
            log.print()
    else:
        step_printer(response.steps)


User> How to install OpenShift?

---------- 📍 Step 1: InferenceStep ----------
🛠️ Tool call Generated:
Tool call: OpenShift Knowledge Source Agent, Arguments: {'query': 'installing OpenShift'}

---------- 📍 Step 2: ToolExecutionStep ----------
🔧 Executing tool...



---------- 📍 Step 3: InferenceStep ----------
🤖 Model Response:
To install OpenShift, follow these steps:

1. Check the official Red Hat OpenShift Local documentation for an updated list of requirements at the official documentation website.
2. Ensure your system meets the hardware requirements: a recent Intel CPU (except for Macs, where Apple Silicon machines are supported) with at least four physical cores and 16 GB of RAM.
3. Download the latest release of OpenShift Local and the "pull secret" file from console.redhat.com/openshift/create/local.
4. Unzip the file containing the OpenShift Local executable and run the command `crc setup` to prepare your copy of OpenShift Local, verifying requirements and setting the required configuration values.
5. Launch crc start, which can take around 20 minutes on a recent PC.
6. Access the OpenShift Web Console with the crc console command, which will open your default browser. Log in as a low-privilege user using the developer username an

## Key Takeaways
This notebook demonstrated how to use basic A2A functionality with Llama Stack. We did this by creating an agent, making it available over an A2A server, and using another agent to collaborate with it to serve a user request.

Future demos will cover more advanced aspects of agent-to-agent communication.