Refactor responses-backed Agent sessions #281
Conversation
It would be good to create an e2e test equivalent to https://llamastack.github.io/docs/getting_started/quickstart#step-3-run-the-demo
Will there need to be another change when conversation support is added?
@raghotham I'm confused by your statement.
- replace legacy `client.alpha.agents.*` paths in both sync and async agent implementations with the `/v1/responses` + `/v1/conversations` flow
- treat each `Agent.create_session()` as a lazily created conversation, caching the returned `conv_…` ID for later turns
- stream turns via `client.responses.create(..., stream=True)` and translate OpenAI `ResponseObjectStream` events into the agent event surface introduced in `lib/agents/stream_events.py`
- run client and builtin tool calls by emitting follow-up responses with `previous_response_id`, mirroring the old turn-resume semantics
- remove the legacy `AgentTurnResponseStreamChunk` dependency, introduce a lightweight `AgentStreamChunk`, and keep tool outputs inside `lib/` only
- clean up auxiliary imports, drop the unused `__future__` pragmas, and ensure the entire module passes `ruff check`

This refactor keeps the public `Agent` API (`create_session`/`create_turn`) intact while aligning the implementation with the stable responses/conversations APIs, so users can interoperate with standard OpenAI-compatible clients going forward.
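The lazy-conversation and streaming-turn flow described above can be sketched roughly as follows. This is an illustrative outline, not the actual implementation: the `client` object, the `conversation=` keyword, and the method names are assumed to follow the OpenAI-compatible client surface mentioned in the commit message.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Agent:
    """Hypothetical sketch of the responses/conversations-backed agent."""
    client: object
    model: str
    _conversation_id: Optional[str] = field(default=None)

    def create_session(self) -> None:
        # The conversation is created lazily: nothing happens server-side
        # until the first turn runs.
        self._conversation_id = None

    def _ensure_conversation(self) -> str:
        # Create the conversation on first use and cache the conv_ ID.
        if self._conversation_id is None:
            conv = self.client.conversations.create()
            self._conversation_id = conv.id
        return self._conversation_id

    def create_turn(self, user_message: str):
        conv_id = self._ensure_conversation()
        # Stream the response; a real implementation would translate each
        # ResponseObjectStream event into the agent event surface.
        stream = self.client.responses.create(
            model=self.model,
            input=user_message,
            conversation=conv_id,
            stream=True,
        )
        yield from stream
```

Follow-up responses for tool resumption would pass `previous_response_id` in the same `responses.create` call, per the commit message.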
This commit implements a high-level turn and step event model that wraps the low-level responses API stream events. The new model provides semantic meaning to agent interactions and distinguishes between server-side and client-side tool execution.

Key changes:
- Add `turn_events.py` with new event dataclasses (`TurnStarted`, `StepProgress`, etc.)
- Add `event_synthesizer.py` for stateful event translation
- Update `Agent` and `AsyncAgent` to use the new event system
- Update `event_logger.py` to work with the new event structures
- Separate server-side tools (`file_search`, `web_search`) from client-side function calls

The turn model represents a complete interaction loop that may span multiple responses, with distinct inference and tool_execution steps. Server-side tools execute within responses and are logged as progress events, while client-side function tools trigger separate tool execution steps.
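A minimal sketch of what the event dataclasses in `turn_events.py` might look like. The exact field names and set of events are assumptions based on the names in the commit message (`TurnStarted`, `StepProgress`, etc.), not the actual definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class TurnStarted:
    turn_id: str

@dataclass
class StepStarted:
    step_id: str
    step_type: str  # "inference" or "tool_execution"
    # metadata carries flags such as server_side for tool steps
    metadata: Dict[str, Any] = field(default_factory=dict)

@dataclass
class StepProgress:
    step_id: str
    delta: str  # incremental model text or tool output

@dataclass
class StepCompleted:
    step_id: str

@dataclass
class TurnCompleted:
    turn_id: str
    final_text: str
```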
Major architectural change based on user feedback:
- inference steps = the model thinking/deciding what to do
- tool_execution steps = ANY tool executing (server- OR client-side)

The previous, incorrect design treated server-side tools as progress within inference. In the new, correct design, ALL tools (server and client) appear as tool_execution steps. The difference between server and client tools is operational:
- Server-side (`file_search`, `web_search`, `mcp_call`): execute within the response stream; the synthesizer emits tool_execution boundaries
- Client-side (`function`): break the response stream; `agent.py` emits tool_execution when executing

Both are annotated with `metadata.server_side` for clarity.

Changes:
- Rewrote event_synthesizer to emit tool_execution steps for server-side tools
- Updated event_logger to differentiate server vs. client tools in logs
- Added metadata to `StepStarted` for the `server_side` flag
- Server-side tools now flow: complete inference -> tool_execution step -> new inference
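The server/client classification above can be captured in a few lines. The tool-type names mirror those in the commit message; the helper names themselves are hypothetical, not the synthesizer's actual API.

```python
# Tools that execute inside the response stream, per the commit message.
SERVER_SIDE_TOOL_TYPES = {"file_search", "web_search", "mcp_call"}

def is_server_side(tool_type: str) -> bool:
    """Server-side tools run within the response stream; client-side
    `function` tools break the stream and are executed by agent.py."""
    return tool_type in SERVER_SIDE_TOOL_TYPES

def tool_step_metadata(tool_type: str) -> dict:
    # Both kinds surface as tool_execution steps; the metadata flag
    # records where the tool actually ran.
    return {"server_side": is_server_side(tool_type)}
```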
Three focused tests validate the core architecture:

1. `test_basic_turn_without_tools`
   - Validates a simple text-only turn
   - Verifies the turn_started -> inference step -> turn_completed flow
   - No tool execution steps
2. `test_server_side_file_search_tool` ⭐ KEY TEST
   - Validates that server-side tools appear as tool_execution steps
   - Verifies `metadata.server_side=True`
   - Tests the inference -> tool_execution (server) -> inference flow
3. `test_client_side_function_tool`
   - Validates that client-side tools appear as tool_execution steps
   - Verifies `metadata.server_side=False`
   - Tests the inference -> tool_execution (client) -> inference flow

All tests verify the key principle: tool_execution steps for ALL tools, regardless of where they execute (server or client).
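The sequence check at the heart of tests 2 and 3 can be sketched as a small helper that inspects the ordered step types observed during a turn. This is an illustrative assertion shape, not the actual test code.

```python
from typing import List

def assert_tool_turn_shape(step_types: List[str]) -> None:
    """Assert the inference -> tool_execution -> inference pattern
    described in the tests above."""
    assert step_types[0] == "inference", "turn must start with inference"
    assert "tool_execution" in step_types, "a tool_execution step must appear"
    i = step_types.index("tool_execution")
    assert "inference" in step_types[i + 1:], (
        "a new inference step must follow tool execution"
    )

# The pattern both tool tests expect:
assert_tool_turn_shape(["inference", "tool_execution", "inference"])
```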
Python dataclasses require fields with default values to come after fields without defaults. Reordered all event dataclass fields to fix TypeError: non-default argument follows default argument.
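A minimal reproduction of the error this commit fixes, using a hypothetical event class with one required and one defaulted field:

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Wrong order: a field without a default after a field with one raises
# at decoration time.
try:
    @dataclass
    class BrokenStep:
        metadata: Dict[str, Any] = field(default_factory=dict)
        step_id: str  # no default -> TypeError
except TypeError as exc:
    print(exc)  # non-default argument 'step_id' follows default argument

# Fix: required fields first, defaulted fields last.
@dataclass
class FixedStep:
    step_id: str
    metadata: Dict[str, Any] = field(default_factory=dict)
```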
5b00477 to 4fa1653
Landing this!
…3810) This PR updates the Conversation item related types and improves a couple of critical parts of the implementation:

- it creates a streaming output item for the final assistant message output by the model. Until now we only added content parts and included that message in the final response.
- rewrites the conversation update code completely to account for items other than messages (tool calls, outputs, etc.)

## Test Plan

Used the test script from llamastack/llama-stack-client-python#281 for this:

```
TEST_API_BASE_URL=http://localhost:8321/v1 \
pytest tests/integration/test_agent_turn_step_events.py::test_client_side_function_tool -xvs
```