# Migrating from Agent objects to Responses in Llama Stack

TO DO: Explain the point of this notebook


The development of this notebook was assisted by Google Gemini and Cursor using Claude 4 Sonnet.

## Getting Started

Before getting started, follow the following steps.

First install Llama Stack and all of the other dependencies for this notebook.
One way to do that is:

- First install Python 3.12 or later (do not try this with older versions of Python: it will not work).
- Then make a Python virtual environment.
- Then within that virtual environment run:

```
pip install -r requirements.txt
```

Once everything is installed, run the Llama Stack server:

```
llama stack run run.yaml --image-type venv
```

Also run the National Parks Service Model Context Protocol (MCP) server as described in [README_NPS.md](https://github.com/The-AI-Alliance/llama-stack-examples/blob/main/notebooks/01-responses/README_NPS.md).  Download it from that location and then run it using:

```
python nps_mcp_server.py --transport sse --port 3005
```

Alternatively, you can use this notebook with some other MCP server, but then you will need to change the example query and the server details to match 

## Configuration

Here we point to the locations of the servers we just started up above.  Also, we provide the model ID for the model we want to use.  The model ID should be one that that is specified in [run.yaml](./run.yaml).  In the [run.yaml](./run.yaml) included here, we have the following models defined:

- `openai/gpt-3.5-turbo` and `openai/gpt4o` are models from OpenAI.  They will only work if you have OPENAI_API_KEY set in your environment to a [valid OpenAI API key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key).
- `llama-openai-compat/Llama-3.3-70B-Instruct` is a a model from Meta Llama.  This will only work if you have LLAMA_API_KEY set in your environment to a valid API key for the hosted [Llama API](https://www.llama.com/products/llama-api/).
- `watsonx/Llama-3.3-70B-Instruct` is the same model running on watsonx.ai (which has a somewhat different style for model IDs).  This will only work if you have a WATSONX_API_KEY and WATSONX_PROJECT_ID set (which requires an IBM Cloud account).  You may also need to set WATSONX_BASE_URL set if your watsonx.ai instance is running anywhere other than US South (which is the default).  Note that the watsonx provider in Llama Stack was [not working](https://github.com/llamastack/llama-stack/issues/3165) when this notebook was created, but hopefully it will work by the time you read this.

If you can't or don't want to get any of those API keys, you can update [run.yaml](./run.yaml) to use [another inference provider](https://llama-stack.readthedocs.io/en/latest/providers/inference/index.html#overview).  Llama Stack includes numerous providers for calling hosted models like the ones above.  It also includes providers to call models that you deploy and run yourself using a model serving capability, e.g., the [vLLM provider](https://llama-stack.readthedocs.io/en/latest/providers/inference/remote_vllm.html) or the [ollama provider](https://llama-stack.readthedocs.io/en/latest/providers/inference/remote_ollama.html).

In [1]:
LLAMA_STACK_URL = "http://localhost:8321/"
LLAMA_STACK_MODEL_IDS = [
    "openai/gpt-3.5-turbo",
    "openai/gpt-4o",
    "llama-openai-compat/Llama-3.3-70B-Instruct",
    "watsonx-Llama-3.3-70B-Instruct"
]

# Using gpt-4o for this demo, but feel free to try one of the others or add more to run.yaml.
LLAMA_STACK_MODEL_ID = LLAMA_STACK_MODEL_IDS[1]

Replace these with other values if you are using a different MCP server:

In [13]:
NPS_MCP_URL = "http://localhost:3005/sse/"
NPS_EXAMPLE_PROMPT = "Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them."
NPS_EXAMPLE_FOLLOWUP_PROMPT = "Which of these is happening the soonest?"

NPS_INSTRUCTIONS = "You are a helpful assistant that can answer questions about the National Parks Service."

# The NPS MCP server does not require an access token, but some MCP servers do.
# We are sending it a dummy token here to show how to send the access token to the MCP server.
NPS_ACCESS_TOKEN = "frog"

## Set up the Llama Stack client



In [3]:
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

Test to see if it is working:

In [4]:
chat_completion_response = client.chat.completions.create(
    model=LLAMA_STACK_MODEL_ID,
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/chat/completions "HTTP/1.1 200 OK"


In [5]:
import json
from datetime import date

def pretty_print(obj) -> None:
    """
    Recursively prints an object's __dict__ in a nicely formatted JSON.
    Handles nested objects and lists of objects.
    """
    def recursive_serializer(o):
        if hasattr(o, '__dict__'):
            return o.__dict__
        # Handle types that are not directly JSON serializable
        if isinstance(o, date):
            return o.isoformat()
        # For other types, raise a TypeError to let the default handler fail.
        raise TypeError(f"Object of type {o.__class__.__name__} is not JSON serializable")

    # Determine what to serialize
    data_to_serialize = obj.__dict__ if hasattr(obj, "__dict__") else obj

    print(json.dumps(
        data_to_serialize,
        indent=2,
        default=recursive_serializer
    ))


pretty_print(chat_completion_response)

{
  "id": "chatcmpl-CLDwf46o2y1EIeMhGKlb9YCXMrPJY",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris.",
        "name": null,
        "tool_calls": null
      },
      "logprobs": null
    }
  ],
  "created": 1759174529,
  "model": "gpt-4o-2024-08-06",
  "object": "chat.completion"
}


## The Legacy Agent API

Here is a simple example for how to use the Llama Stack Agent API.  We don't recommend this approach because it is [expected to be deprecated soon](https://github.com/llamastack/llama-stack/issues/3313), but we're showing it here as a stepping stone toward some alternative approaches.

In [15]:
from llama_stack_client import Agent
from llama_stack_client.types.toolgroup_register_params import McpEndpoint
import uuid

client.toolgroups.register(
    toolgroup_id="mcp::nps",
    provider_id="model-context-protocol",
    mcp_endpoint=McpEndpoint(uri=NPS_MCP_URL),
)

agent = Agent(
    model=LLAMA_STACK_MODEL_ID,
    instructions=NPS_INSTRUCTIONS,
    client=client,
    tools=["mcp::nps"],
    extra_headers={
        "X-LlamaStack-Provider-Data": json.dumps(
            {
                "mcp_headers": {
                    NPS_MCP_URL: {
                        "Authorization": f"Bearer {NPS_ACCESS_TOKEN}",
                    },
                },
            }
        ),
    },
)

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/toolgroups "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/agents "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET http://localhost:8321/v1/tools?toolgroup_id=mcp%3A%3Anps "HTTP/1.1 200 OK"


In [16]:
# Generate a unique session ID for the example.
# This is not required, but it is useful to have if you want to create multiple sessions without restarting Llama Stack.
session_id = agent.create_session(f"nps_session-{uuid.uuid4().hex}")

agent_response1 = agent.create_turn(
    messages=[{"role": "user", "content": NPS_EXAMPLE_PROMPT}],
    session_id=session_id,
    stream=False,
)

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/agents/4bf9333f-3e8c-42f7-93e2-55c3b46dc498/session "HTTP/1.1 200 OK"


INFO:httpx:HTTP Request: POST http://localhost:8321/v1/agents/4bf9333f-3e8c-42f7-93e2-55c3b46dc498/session/4d35349f-dd18-44fc-9ad3-a7c699038749/turn "HTTP/1.1 200 OK"


In [None]:
pretty_print(agent_response1)

{
  "input_messages": [
    {
      "content": "Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them.",
      "role": "user",
      "context": null
    }
  ],
  "output_message": {
    "content": "I am currently experiencing difficulties retrieving information about parks in Rhode Island. Please try again later or check the National Park Service website for more details.",
    "role": "assistant",
    "stop_reason": "end_of_turn",
    "tool_calls": []
  },
  "session_id": "d79ddd26-a564-4295-be09-735f5fc50488",
  "started_at": "2025-09-29T19:35:29.749149+00:00",
  "steps": [
    {
      "api_model_response": {
        "content": "",
        "role": "assistant",
        "stop_reason": "end_of_turn",
        "tool_calls": [
          {
            "arguments": {
              "state_code": "RI",
              "park_code": "",
              "query": "",
              "limit": 5.0
            },
            "call_id": "call_8Xo4h3EU9MqRQFPayItZ

In [30]:
agent_response1

Turn(input_messages=[UserMessage(content='Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them.', role='user', context=None)], output_message=CompletionMessage(content="It seems there is an issue with retrieving information about parks in Rhode Island. However, I can provide some general information about national parks in Rhode Island.\n\nRhode Island is home to the Roger Williams National Memorial, which commemorates the life of the founder of Rhode Island and a champion of the ideal of religious freedom. The memorial is located in Providence, RI.\n\nIf you are interested in specific events or more detailed information, I recommend checking the National Park Service's official website or contacting the park directly for the latest updates and events.", role='assistant', stop_reason='end_of_turn', tool_calls=[]), session_id='4d35349f-dd18-44fc-9ad3-a7c699038749', started_at=datetime.datetime(2025, 9, 29, 19, 53, 53, 121289, tzinfo=TzInfo(U

In [19]:
agent_response2 = agent.create_turn(
    messages=[{"role": "user", "content": NPS_EXAMPLE_FOLLOWUP_PROMPT}],
    session_id=session_id,
    stream=False,
)

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/agents/4bf9333f-3e8c-42f7-93e2-55c3b46dc498/session/4d35349f-dd18-44fc-9ad3-a7c699038749/turn "HTTP/1.1 200 OK"


As you can see in the output message, it is not working right now.  Since the API is going away soon, it is probably not worthwhile to resolve whatever is broken here and we'll just move on to looking at how to migrate this to the Responses API.

## The Responses API

Here is some code using the Responses API that is roughly equivalent to the Agent example above.  As you can see, it no longer has separate calls for:

- Registering tools
- Creating an agent
- Creating a session
- Issuing a query for that session

The Responses API takes the list of tools as an argument so there is no need to pre-register the tools.  It does not make a persistent agent object but it does make a session object implicitly within the call -- you can see how that is used in the second call, in which `previous_response_id=responses_api_response1.id` is set to indicate that the second call is a continuation of the first.

Minor side comment: This example uses the `instructions` field in the Responses API, but that appears to be not implemented in Llama Stack right now (see [#3566](https://github.com/llamastack/llama-stack/issues/3566) for details).  That is not particularly problematic for this example because the model we are using is able to respond appropriately without that instruction, but if you have a system instruction that is very important for your use case, you may want to wait for that issue to be resolved or find a work-around.

In [None]:
responses_api_response1 = client.responses.create(
    model=LLAMA_STACK_MODEL_ID,
    input=NPS_EXAMPLE_PROMPT,
    instructions=NPS_INSTRUCTIONS,
    tools=[
        {
            "type": "mcp",
            "server_url": NPS_MCP_URL,
            "server_label": "National Parks Service tools",
        }
    ]
)


INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"


Here we print the entire object:

In [66]:
pretty_print(responses_api_response1)

{
  "id": "resp-4e0679df-0908-4253-98a2-f9509166883d",
  "created_at": 1759175770,
  "model": "openai/gpt-4o",
  "object": "response",
  "output": [
    {
      "id": "mcp_list_aed0132b-1f36-4320-ae23-17718f2c4f97",
      "server_label": "National Parks Service tools",
      "tools": [
        {
          "input_schema": {
            "type": "object",
            "properties": {
              "state_code": {
                "type": "string",
                "description": ""
              },
              "park_code": {
                "type": "string",
                "description": ""
              },
              "query": {
                "type": "string",
                "description": ""
              },
              "limit": {
                "type": "integer",
                "description": ""
              }
            },
            "required": [
              "state_code",
              "park_code",
              "query",
              "limit"
            ]
          },


However, for some applications, we really only care about the actual text.  Here is how to print that:

In [72]:
for output_block in responses_api_response1.output:
    if output_block.type == "message":
        for content_block in output_block.content:
            print(content_block.text)

Here are some national parks in Rhode Island and their upcoming events:

### 1. Blackstone River Valley National Historical Park
Website: [Visit Here](https://www.nps.gov/blrv/index.htm)  
**Upcoming Events:**
- **Old Slater Mill Tour:** Guided tour of the mill that started the American Industrial Revolution. Tours start at 10:30 AM, 12:30 PM, and 2:30 PM.
- **Revolutionary War Pension Files Transcription Event:** Learn about and transcribe pension records from the War of Independence. Occurring on select dates at Heritage Hall.
- **Various walking and archaeology tours:** Including Blackstone River State Park Archaeology Tour and Voting in Our Mill Village event.

### 2. Roger Williams National Memorial
Website: [Visit Here](https://www.nps.gov/rowi/index.htm)  
**Upcoming Events:**
- **Roger Williams: The Separation of Church and State:** Explore the principles of freedom put into practice by Roger Williams, a foundational figure in establishing church and state separation in America

In [None]:
responses_api_response2 = client.responses.create(
    model=LLAMA_STACK_MODEL_ID,
    input=NPS_EXAMPLE_FOLLOWUP_PROMPT,
    instructions=NPS_INSTRUCTIONS,
    previous_response_id=responses_api_response1.id,
    tools=[
        {
            "type": "mcp",
            "server_url": NPS_MCP_URL,
            "server_label": "National Parks Service tools",
        }
    ]
)


INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"


In [74]:
for output_block in responses_api_response2.output:
    if output_block.type == "message":
        for content_block in output_block.content:
            print(content_block.text)

The soonest event is the **Revolutionary War Pension Files Transcription Event** at the Blackstone River Valley National Historical Park. The first session is scheduled for October 1, 2025, from 6:00 PM to 7:30 PM at Heritage Hall in North Smithfield, RI.


## Emulating the Legacy Agent API

In very simple cases, the approach above is probably the best way to migrate from the Agent API to the Responses API.  However, if you have a large amount of code that uses the old API, you might want to have some objects that emulate the old Agent API instead.  Here is a very simple example of how to do that.

The simple example only covers the functionality needed in the original simple example above.  We'll get into some advanced functionality in later sections, but there are also many parameters in the legacy API that we do not provide sample code for.  We don't recommend trying to build a complete implementation of every parameter value in the entire API.  Instead, you can use this example as a starting point and then fill in those parameters and/or values that are important to your application.

In [None]:
TOOLS = {}
SESSIONS = {}

class LegacyMcpEndpoint:
    def __init__(self, uri):
        self.uri = uri

def toolgroups_register(toolgroup_id, provider_id, mcp_endpoint=None):
    """
    This replaces the client.toolgroups.register() call in the legacy agent API.  Since this version manages the
    tool group registration internally, it does not need to use the client object.
    """
    if provider_id == "model-context-protocol":
        TOOLS[toolgroup_id] = {
            "type": "mcp",
            "server_url": mcp_endpoint.uri,
            "server_label": toolgroup_id,
        }
    else:
        # TODO: Add support for other providers, not needed for this example.
        raise ValueError(f"Unsupported provider: {provider_id}")

def convert_response_to_legacy_agent_response(response):
    """
    This is a placeholder for now.  The code to convert objects to the legacy agent response format would go here.
    Note that just returning the response object is good enough for this example because the response object has
    all of the same fields that we are using in the example print statements.  However, in a more complex application
    you may need to do more work here.
    """
    return response

class LegacyAgent:
    def __init__(self, model, instructions, client, tools, extra_headers):
        self.model = model
        self.instructions = instructions
        self.client = client
        self.tool_ids = tools

        header_json = extra_headers["X-LlamaStack-Provider-Data"]
        headers = json.loads(header_json)
        self.headers = headers["mcp_headers"]
    
    def create_session(self, session_id):
        SESSIONS[session_id] = []
        return session_id
    
    def create_turn(self, messages, session_id, stream=False):
        # Note that the stream parameter is not used.  That works for this example because we are not using the stream=True,
        # but if you are using that option, you will need to update this code.
        if session_id not in SESSIONS:
            raise ValueError(f"Session {session_id} not found")
            
        previous_response_id = None
        if len(SESSIONS[session_id]) > 0:
            previous_response_id = SESSIONS[session_id][-1].id

        tools = [TOOLS[tool_id] for tool_id in self.tool_ids]
        for tool in tools:
            if tool["server_url"] in self.headers:
                tool["headers"] = self.headers[tool["server_url"]]

        response = client.responses.create(
            model=self.model,
            # TODO: This is using the last message in the list, which is fine for this example, but
            # if you call create_turn() with multiple messages, you will need to update this code.
            input=messages[-1]["content"],
            previous_response_id=previous_response_id,
            instructions=self.instructions,
            tools=tools
        )
        SESSIONS[session_id].append(response)
        return convert_response_to_legacy_agent_response(response)

With this simple wrapper in place, we can maintain much of the structure of the original Agent code we saw earlier:

In [60]:
toolgroups_register(
    toolgroup_id="mcp::nps",
    provider_id="model-context-protocol",
    mcp_endpoint=LegacyMcpEndpoint(uri=NPS_MCP_URL),
)

agent = LegacyAgent(
    model=LLAMA_STACK_MODEL_ID,
    instructions=NPS_INSTRUCTIONS,
    client=client,
    tools=["mcp::nps"],
    extra_headers={
        "X-LlamaStack-Provider-Data": json.dumps(
            {
                "mcp_headers": {
                    NPS_MCP_URL: {
                        "Authorization": f"Bearer {NPS_ACCESS_TOKEN}",
                    },
                },
            }
        ),
    },
)

In [61]:
session_id = agent.create_session(f"nps_session-{uuid.uuid4().hex}")

agent_response1 = agent.create_turn(
    messages=[{"role": "user", "content": NPS_EXAMPLE_PROMPT}],
    session_id=session_id,
    stream=False,
)

[{'type': 'mcp', 'server_url': 'http://localhost:3005/sse/', 'server_label': 'mcp::nps'}]


INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"


In [None]:
for output_block in responses_api_response1.output:
    if output_block.type == "message":
        for content_block in output_block.content:
            print(content_block.text)

The soonest event is the **Revolutionary War Pension Files Transcription Event** at the Blackstone River Valley National Historical Park. The first session is scheduled for October 1, 2025, from 6:00 PM to 7:30 PM at Heritage Hall in North Smithfield, RI.


In [63]:
agent_response2 = agent.create_turn(
    messages=[{"role": "user", "content": NPS_EXAMPLE_FOLLOWUP_PROMPT}],
    session_id=session_id,
    stream=False,
)

[{'type': 'mcp', 'server_url': 'http://localhost:3005/sse/', 'server_label': 'mcp::nps', 'headers': {'Authorization': 'Bearer frog'}}]


INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"


In [76]:
for output_block in responses_api_response2.output:
    if output_block.type == "message":
        for content_block in output_block.content:
            print(content_block.text)

The soonest event is the **Revolutionary War Pension Files Transcription Event** at the Blackstone River Valley National Historical Park. The first session is scheduled for October 1, 2025, from 6:00 PM to 7:30 PM at Heritage Hall in North Smithfield, RI.


## Multi-process architectures

The examples above assume that the agent creation, session creation, and turn creation are all taking place in the same process.  However, some users of the legacy Agents API might be having one process that creates agents (e.g., an agent management console) and another process that uses the agents (e.g., a chatbot app that invokes that process.  Assuming these processes have different life cycles and run on different kinds of machines, here are some examples of ideas for how to handle these cases:

- If the agent creation logic is simple and static, then you can just move that agent creation logic into the application that uses the agent as in the examples above.  One downside of doing that is that you need to release a new version the application that uses the agent every time the definition of the agent changes.  If that's not feasible for you, consider one of the other options.
- You can have some sort of key-value storage mechanism contain a description of the agent (i.e., the instructions, list of tools).  The process that creates agents can write that configuration to the storage and the process that invokes the agent can read from it.
- You can have the process that creates the agent deploy a container with that agent into a cluster and the process that invokes the agent call that container (which would then call Llama Stack via the Responses API).  This is a much heavier and more disruptive change than just putting agent configuration into a key-value store, but potentially much more powerful too since you can build complex agents with multiple phases that use different models under different circumstances, etc.

There are many other possible architectures to explore too.

## Model safety

TO DO: Expand on the previous example by showing how to apply a safety model on the output from Responses.

## Other Topics???

TO DO: Are there other Agent features that are a priority to cover here?  More thought is needed.

## React Agents

The Llama Stack Python Client has a client-side [ReAct Agent](https://github.com/llamastack/llama-stack-client-python/blob/859d318313d9262755abeef2c4e6057f68c832ad/src/llama_stack_client/lib/agents/react/agent.py#L108) construct.

TO DO: Show how to get the same behavior from the Responses API by copying the key components from the implementation in the Llama Stack Python Client.