# Agents using RAG via LlamaIndex

This notebook highlights how to use the LlamaIndex tools in **Docling MCP**.

We will use Llama Stack as the backend to run a local, OpenAI-compatible Responses API.

### Tools:

We will use tools from the Docling MCP server to:
- convert a PDF file from a local or remote location into a unified document representation, the [DoclingDocument](https://docling-project.github.io/docling/concepts/docling_document/).
- chunk the document and ingest it into the LlamaIndex vector database.
- search the document using retrieval-augmented generation (RAG) techniques.

### Runtime:

- backend: Llama Stack, which provides an OpenAI-compatible Responses API
- client: the OpenAI Agents SDK

## Prerequisites

Before starting this notebook, ensure that you have:
1. Followed the instructions in the [Llama Stack README](../llama-stack/README.md) to set up the following resources:
   - An inference model served with Ollama
   - A Llama Stack server running the starter template [distribution-starter](https://hub.docker.com/r/llamastack/distribution-starter)
2. Started the Docling MCP server with the `conversion` and `llama-index-rag` groups. See the [README](./README.md) for details.

You may want to create a virtual environment to run this notebook, for instance, with [uv](https://docs.astral.sh/uv/):

```bash
uv venv
source .venv/bin/activate
uv pip install openai-agents rich
```
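Before running the cells, it can help to confirm that both services are actually listening. The following is a minimal sketch, assuming the Llama Stack server and the Docling MCP server run on `localhost` at ports 8321 and 8000 (the ports that appear in the URLs used later in this notebook). The helper names `is_listening` and `check_services` are hypothetical and not part of any library.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# Assumption: hosts and ports are taken from the URLs used later in this notebook.
SERVICES = {
    "Llama Stack": ("localhost", 8321),
    "Docling MCP": ("localhost", 8000),
}


def check_services(services=SERVICES) -> dict:
    """Map each service name to whether its TCP port accepts connections."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'reachable' if up else 'NOT reachable'}")
```

This only checks that a TCP port accepts connections. It does not verify that the expected service is the one answering on it.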

```python
import uuid

from rich.console import Console

from agents import Agent, ModelSettings, Runner, SQLiteSession, set_trace_processors
from agents.mcp import MCPServerStreamableHttp
from agents.models.openai_provider import OpenAIProvider
from agents.run import RunConfig
from agents.tracing.processors import BatchTraceProcessor, ConsoleSpanExporter
from openai import AsyncOpenAI

console = Console()
```

```python
BASE_URL = "http://localhost:8321/v1/openai/v1"
API_KEY = "none"
MODEL_ID = "llama3.2:3b-instruct-fp16"
```
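If your Llama Stack server runs elsewhere or serves a different model, you may prefer to override these values without editing the notebook. The following is a minimal sketch, assuming environment variables named `LLAMA_STACK_BASE_URL`, `LLAMA_STACK_API_KEY`, and `LLAMA_STACK_MODEL_ID`. These names are hypothetical and are not read by Llama Stack or the Agents SDK themselves.

```python
import os


def load_settings(env=os.environ) -> dict:
    """Read connection settings, falling back to the defaults used in this notebook."""
    return {
        # Hypothetical variable names, chosen for this sketch only.
        "base_url": env.get("LLAMA_STACK_BASE_URL", "http://localhost:8321/v1/openai/v1"),
        "api_key": env.get("LLAMA_STACK_API_KEY", "none"),
        "model_id": env.get("LLAMA_STACK_MODEL_ID", "llama3.2:3b-instruct-fp16"),
    }


settings = load_settings()
BASE_URL = settings["base_url"]
API_KEY = settings["api_key"]
MODEL_ID = settings["model_id"]
```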

```python
client = AsyncOpenAI(base_url=BASE_URL, api_key=API_KEY)

# Configure the OpenAI provider that uses our AsyncOpenAI client for Llama Stack
provider = OpenAIProvider(openai_client=client)

# Tell the Agents SDK to print traces to the console
set_trace_processors([BatchTraceProcessor(exporter=ConsoleSpanExporter())])


async def run_agent():
    # Connect to the Docling MCP server over streamable HTTP
    async with MCPServerStreamableHttp(
        name="Docling MCP",
        params={
            "url": "http://localhost:8000/mcp",
            "timeout": 60.0,
        },
        client_session_timeout_seconds=60,
    ) as server:
        agent = Agent(
            name="Doc QA",
            model=MODEL_ID,
            instructions="You are a helpful assistant. Use the tools you have access to for providing relevant answers.",
            model_settings=ModelSettings(
                temperature=1.0, top_p=0.9, tool_choice="required"
            ),
            mcp_servers=[server],
        )
        # The session keeps the conversation history across the prompts below
        session = SQLiteSession(str(uuid.uuid4()))
        print(f"Created session_id={session.session_id} for Agent({agent.name})")

        user_prompts = [
            "Convert the PDF document on https://arxiv.org/pdf/2206.01062 to DoclingDocument.",
            "Export the document to the vectordb.",
            "Search in the document: How many pages were manually annotated in the dataset?",
        ]
        for prompt in user_prompts:
            console.print(f"[cyan]User> {prompt}[/cyan]")
            response = await Runner.run(
                agent,
                prompt,
                session=session,
                run_config=RunConfig(model_provider=provider),
            )
            console.print(f"[green]Assistant> {response.final_output}[/green]")


await run_agent()
```
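The top-level `await` works here because Jupyter already runs an event loop. Outside a notebook, a plain Python script needs to start the loop itself. The following is a minimal sketch of that pattern, using a hypothetical stand-in coroutine `run_agent_stub` in place of `run_agent()` so that it runs on its own:

```python
import asyncio


async def run_agent_stub() -> str:
    """Hypothetical stand-in for run_agent() so this sketch is self-contained."""
    await asyncio.sleep(0)
    return "done"


def main() -> str:
    # asyncio.run creates an event loop, runs the coroutine to completion,
    # and closes the loop afterwards.
    return asyncio.run(run_agent_stub())


if __name__ == "__main__":
    print(main())
```

In a script version of this notebook, `asyncio.run(run_agent())` would replace the final `await run_agent()` line.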

Created session_id=9028ac0e-bc71-47b2-8f97-816ff1194445 for Agent(Doc QA)


[Exporter] Export trace_id=trace_b77695865cde43b680b34dac44164fb4, name=Agent workflow
[Exporter] Export span: {'object': 'trace.span', 'id': 'span_5daf58a2405f4f35b2ddc696', 'trace_id': 'trace_b77695865cde43b680b34dac44164fb4', 'parent_id': None, 'started_at': '2025-08-18T11:00:20.050557+00:00', 'ended_at': '2025-08-18T11:00:20.061691+00:00', 'span_data': {'type': 'mcp_tools', 'server': 'Docling MCP', 'result': ['is_document_in_local_cache', 'convert_document_into_docling_document', 'export_docling_document_to_vector_db', 'search_documents']}, 'error': None}
[Exporter] Export span: {'object': 'trace.span', 'id': 'span_af58cd701f914cf3a04fa5a5', 'trace_id': 'trace_b77695865cde43b680b34dac44164fb4', 'parent_id': 'span_2c4f5ae3008c4f608da84b7e', 'started_at': '2025-08-18T11:00:20.062017+00:00', 'ended_at': '2025-08-18T11:00:24.823131+00:00', 'span_data': {'type': 'response', 'response_id': 'resp-e9e3d8f7-57b7-4c2a-9dca-8cb030c7238d'}, 'error': None}


[Exporter] Export span: {'object': 'trace.span', 'id': 'span_e6ed81424df14bf9949fb307', 'trace_id': 'trace_b77695865cde43b680b34dac44164fb4', 'parent_id': 'span_2c4f5ae3008c4f608da84b7e', 'started_at': '2025-08-18T11:00:24.823338+00:00', 'ended_at': '2025-08-18T11:00:46.321972+00:00', 'span_data': {'type': 'function', 'name': 'convert_document_into_docling_document', 'input': '{"source":"https://arxiv.org/pdf/2206.01062"}', 'output': '{"type":"text","text":"{\\n  \\"success\\": true,\\n  \\"document_key\\": \\"868f49ae1f0e66e82238a8aea43fd30b\\"\\n}","annotations":null,"meta":null}', 'mcp_data': {'server': 'Docling MCP'}}, 'error': None}
[Exporter] Export span: {'object': 'trace.span', 'id': 'span_11ec2106a80249d98f5a8720', 'trace_id': 'trace_b77695865cde43b680b34dac44164fb4', 'parent_id': 'span_2c4f5ae3008c4f608da84b7e', 'started_at': '2025-08-18T11:00:46.322273+00:00', 'ended_at': '2025-08-18T11:00:46.327432+00:00', 'span_data': {'type': 'mcp_tools', 'server': 'Docling MCP', 'result'

[Exporter] Export span: {'object': 'trace.span', 'id': 'span_590054d85eb8419c92f29eab', 'trace_id': 'trace_04e821a8ac0744a083a1549fdd6c4465', 'parent_id': 'span_0957dcf269204462a27f7a2d', 'started_at': '2025-08-18T11:00:49.056869+00:00', 'ended_at': '2025-08-18T11:00:51.477845+00:00', 'span_data': {'type': 'function', 'name': 'export_docling_document_to_vector_db', 'input': '{"document_key":"868f49ae1f0e66e82238a8aea43fd30b"}', 'output': '{"type":"text","text":"Successful initialisation for document with id 868f49ae1f0e66e82238a8aea43fd30b","annotations":null,"meta":null}', 'mcp_data': {'server': 'Docling MCP'}}, 'error': None}
[Exporter] Export span: {'object': 'trace.span', 'id': 'span_c545efc32dd845a38d365fc7', 'trace_id': 'trace_04e821a8ac0744a083a1549fdd6c4465', 'parent_id': 'span_0957dcf269204462a27f7a2d', 'started_at': '2025-08-18T11:00:51.478065+00:00', 'ended_at': '2025-08-18T11:00:51.481436+00:00', 'span_data': {'type': 'mcp_tools', 'server': 'Docling MCP', 'result': ['is_doc

[Exporter] Export span: {'object': 'trace.span', 'id': 'span_6b46f0309e474aa2ad41861c', 'trace_id': 'trace_8d9ec7fa83834e67a7a9c2a9dfbfc477', 'parent_id': 'span_348fe1f320fc4919b1f12493', 'started_at': '2025-08-18T11:00:55.600716+00:00', 'ended_at': '2025-08-18T11:00:57.821127+00:00', 'span_data': {'type': 'response', 'response_id': 'resp-eabde418-a256-4d72-b066-a9134377a69f'}, 'error': None}
[Exporter] Export span: {'object': 'trace.span', 'id': 'span_348fe1f320fc4919b1f12493', 'trace_id': 'trace_8d9ec7fa83834e67a7a9c2a9dfbfc477', 'parent_id': None, 'started_at': '2025-08-18T11:00:52.590122+00:00', 'ended_at': '2025-08-18T11:00:57.822614+00:00', 'span_data': {'type': 'agent', 'name': 'Doc QA', 'handoffs': [], 'tools': ['is_document_in_local_cache', 'convert_document_into_docling_document', 'export_docling_document_to_vector_db', 'search_documents'], 'output_type': 'str'}, 'error': None}
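
The console exporter prints each span as a Python dict literal after the `[Exporter] Export span:` prefix, which makes the output above easy to post-process. The following is a minimal sketch, assuming the line format stays as shown in this output, that extracts the span type, the tool name (if any), and the duration of each span. The helper names are hypothetical. Lines that were cut off in the captured output fail to parse and are skipped.

```python
import ast
from datetime import datetime

SPAN_PREFIX = "[Exporter] Export span: "


def parse_span(line: str):
    """Parse one exporter line into a dict, or return None if it is not a complete span."""
    if not line.startswith(SPAN_PREFIX):
        return None
    try:
        span = ast.literal_eval(line[len(SPAN_PREFIX):])
    except (SyntaxError, ValueError):
        return None  # for example, a line truncated in the captured output
    return span if isinstance(span, dict) else None


def span_seconds(span: dict) -> float:
    """Return the span duration in seconds from its ISO 8601 timestamps."""
    started = datetime.fromisoformat(span["started_at"])
    ended = datetime.fromisoformat(span["ended_at"])
    return (ended - started).total_seconds()


def summarize(lines):
    """Yield (span type, name or None, duration in seconds) for each parsable span."""
    for line in lines:
        span = parse_span(line.strip())
        if span is None:
            continue
        data = span.get("span_data", {})
        yield data.get("type"), data.get("name"), span_seconds(span)
```

For example, `list(summarize(captured_output.splitlines()))`, where `captured_output` is a string holding the console text, gives one tuple per complete span. Spans of type `function` carry the name of the Docling MCP tool that was called.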
