feat(agents): add agent debugger with SSE step-through (#131)#485
Conversation
…fc#131) - GET /agent/{agent_id}/debug — debugger UI (Pico CSS + htmx) - GET /agent/{agent_id}/debug/events — SSE stream of trace events - POST /agent/{agent_id}/debug/trace — record execution events - POST /agent/{agent_id}/debug/step — advance one step - POST /agent/{agent_id}/debug/continue — run to completion - POST /agent/{agent_id}/debug/clear — clear trace - GET /agent/{agent_id}/debug/status — trace state In-memory per-agent trace store with SSE broadcasting. UI has left panel (event timeline) and right panel (event details) with step/continue/stop buttons and ARIA labels on all interactive elements.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an in-memory per-agent debugger: bounded trace buffers with trimming/rebase, SSE event streaming with replay and heartbeats, step/continue/clear control endpoints, a dark HTML debugger UI, app router registration, and async integration tests. ChangesAgent Debugger
Sequence DiagramsequenceDiagram
participant Browser as Browser (UI)
participant SSE as /agent/{agent_id}/debug/events
participant API as /agent/{agent_id}/debug/*
participant Storage as Per-Agent Trace Storage
Browser->>SSE: GET /agent/{agent_id}/debug/events
SSE->>Storage: Replay stored events
SSE->>Browser: Stream replayed events and position
Browser->>API: POST /agent/{agent_id}/debug/trace (event)
API->>Storage: Validate & append event
API->>SSE: Broadcast event
SSE->>Browser: SSE event
Browser->>API: POST /agent/{agent_id}/debug/step or /continue
API->>Storage: Update position
API->>SSE: Broadcast position
SSE->>Browser: Position update
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_agent_debugger.py`:
- Around line 159-172: The test test_multiple_events_ordered currently only
checks total_events; update it to fetch the recorded events from the status
response (e.g. status.json()["events"] or whichever field returns the event
list) and assert the sequence of event data matches numbers 0 through 9 in order
(inspect each event's data["num"] or equivalent); keep the existing posts to
/agent/test-agent/debug/trace and replace or augment the assert
status.json()["total_events"] == 10 with an ordered-sequence assertion such as
comparing the list of event num values to list(range(10)).
- Around line 17-27: The test fails due to shared module-level debugger state
(_traces in tinyagentos/routes/agent_debugger.py) causing flaky counts; modify
test_trace_records_event to clear that buffer before posting by importing
tinyagentos.routes.agent_debugger._traces and calling its clear() (or otherwise
reset the module-level trace buffer) at the start of the test so the assertion
data["total"] == 1 is deterministic.
- Around line 136-146: The test test_events_endpoint_returns_sse currently uses
await client.get(..., timeout=2) which buffers the entire StreamingResponse and
can hang; change it to use httpx AsyncClient streaming via async with
client.stream("GET", "/agent/test-agent/debug/events", timeout=2) to open the
connection, assert resp.status_code == 200, then read only the first chunk/line
from the response iterator (to validate an SSE frame exists) and then exit the
stream context to close the connection; update any assertions to operate on that
single chunk instead of the buffered body.
In `@tinyagentos/routes/agent_debugger.py`:
- Around line 50-56: _trim_trace currently slices off the oldest events in
_traces but does not adjust _positions correctly; compute the number of dropped
entries (dropped = len(events) - _MAX_TRACE_LENGTH), update _traces[agent_id] =
events[-_MAX_TRACE_LENGTH:], then set _positions[agent_id] = max(0,
min(_positions.get(agent_id, 0) - dropped, max(0, len(_traces[agent_id]) - 1)))
so the position is shifted left by the number dropped and clamped to the new
trace bounds; reference functions/variables: _trim_trace, _traces, _positions,
_MAX_TRACE_LENGTH.
- Around line 79-83: The backlog replay fails when a stored trace has >256
events because queue is created with asyncio.Queue(maxsize=256) and the code
calls queue.put_nowait() for every event from _traces.get(agent_id, []) which
raises QueueFull; fix by avoiding pre-filling a fixed-size queue: either create
the queue with a maxsize equal to the trace length (use
len(_traces.get(agent_id, [])) or None for an unbounded queue) before the loop
that calls queue.put_nowait(), or skip queuing the backlog entirely and stream
the stored events directly to the SSE generator instead of enqueuing them;
update the code paths referencing queue, asyncio.Queue(maxsize=256), _traces,
agent_id, and put_nowait() accordingly.
- Around line 174-191: The persisted trace is currently built only from
{"type","ts","data"} inside the request handler that constructs the `event`
dict; extend this to include stable node identifiers and lineage (e.g. "node_id"
and "parent_id"), timing and duration fields (e.g. "start_ts", "end_ts",
"duration_ms"), token/usage and model metadata (e.g. "token_count", "model",
"latency_ms"), and replay/checkpoint flags (e.g. "replay_checkpoint" or
"checkpoint_id") plus any relevant "metadata" map so the debugger can build
tree, replay, rerun and export flows; populate these fields from the incoming
body when present or synthesize deterministic IDs/timestamps in the same handler
where `event = {...}` is created (ensure names match the debugger contract).
- Around line 34-35: The module lacks a real paused vs running state: populate
and use _step_events and a running/paused flag so debugger_trace() can block
properly and /continue can resume all or single waiters. Change debugger_trace()
to create and store a per-agent asyncio.Event in _step_events when a trace
enters wait, loop waiting on that event (or a per-agent resume_event) while
consulting _positions to decide when to pause vs advance, and clear or replace
the event after a single step; update the /continue handler to set() the
appropriate event(s) from _step_events (or flip a shared _running boolean and
set all resume events) so waiting traces actually resume, and ensure step
semantics only resume one waiter if stepping and resume-all for continue; keep
references to symbols _step_events, debugger_trace(), _positions, and the
/continue handler when making the changes.
In `@tinyagentos/templates/agent_debugger.html`:
- Around line 264-270: The timeline list items (the created li element with
class 'event-item' that currently sets tabindex and onclick to call
selectEvent(idx)) are not keyboard-operable; add keyboard activation by either
moving the interactive control into a real <button> inside the li or by adding
button ARIA semantics and a keydown handler on the li: set role="button" and
handle Enter/Space in a keydown listener to call selectEvent(idx) (and ensure
preventDefault on Space), and keep the existing onclick for mouse support; also
ensure aria-pressed/aria-expanded is set as appropriate for the details pane
state.
- Around line 245-251: After handling the 'clear' event you must refresh the
header counters so the UI reflects the reset state; call the existing updateUI()
(or the specific header update routine) after resetting events, currentPos,
activeIdx and clearing eventList/eventDetail so the header count/position
updates immediately instead of waiting for the next SSE message. Ensure you
invoke updateUI() (or equivalent) at the end of the if (data.type === 'clear')
branch.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: a5c841e3-946b-47e1-9076-2beec761530d
📒 Files selected for processing (4)
tests/test_agent_debugger.pytinyagentos/app.pytinyagentos/routes/agent_debugger.pytinyagentos/templates/agent_debugger.html
Code Review SummaryStatus: 3 Issues Found | Recommendation: Address before merge Overview
Issue Details (click to expand)CRITICALNone Other Observations (not in diff)Issues found in unchanged code that cannot receive inline comments:
Files Reviewed (1 files)
Reviewed by nemotron-3-super-120b-a12b-20230311:free · 168,628 tokens |
_step_events was never populated — step/continue endpoints did .get() on an empty dict, so the step-mode wait loop never activated. Changed all three call sites (step, continue, trace) to use setdefault() so the asyncio.Event is created on first access. Fixes WARNING from kilo-code-bot review on PR jaylfc#485.
- trace endpoint no longer blocks unless a debugger UI is attached (step mode); fixes the POST /trace hang and makes step/continue work - rebase debugger position when the trace is trimmed past the cap - size the SSE replay queue to the backlog so reconnects with >256 buffered events no longer raise QueueFull - release blocked agents on clear and on last-listener disconnect - refresh header counters after a clear event in the UI - make timeline rows keyboard-operable (Enter/Space) - pin CDN assets and add SRI integrity hashes - isolate the total==1 assertion; drive SSE tests via the ASGI app instead of a buffered GET; assert event ordering on replay
- Fix _trim_trace position shift: compute dropped count and clamp position - Use unbounded asyncio.Queue (was 256) to handle up to 10K trace events - Enrich trace events with node_id, parent_id, timing, token/model/metadata fields - Fix test_trace_records_event: clear shared module-level state before assertions - Fix test_multiple_events_ordered: assert data.num ordering, not just count - Fix test_events_endpoint_returns_sse: stream instead of buffered client.get - Add keyboard accessibility to timeline items (Enter/Space, role=button, aria-pressed) - Call updateUI() after clear event to refresh header counters
…gent-debugger # Conflicts: # tests/test_agent_debugger.py # tinyagentos/routes/agent_debugger.py # tinyagentos/templates/agent_debugger.html
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_agent_debugger.py (1)
115-130:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake step/continue assertions deterministic and exact.
Both tests rely on shared in-memory state and only assert lower bounds (
>=), so they can pass even if endpoint behavior regresses. Clear per-agent state first and assert exact positions (1for step,5for continue in this setup).Suggested minimal fix
`@pytest.mark.asyncio` async def test_step_advances_position(client): """POST /agent/{agent_id}/debug/step advances the position.""" + await client.post("/agent/test-agent/debug/clear") await client.post( "/agent/test-agent/debug/trace", json={"type": "tool_call", "data": {"tool": "read"}}, ) await client.post( "/agent/test-agent/debug/trace", json={"type": "tool_result", "data": {"result": "ok"}}, ) resp = await client.post("/agent/test-agent/debug/step") assert resp.status_code == 200 data = resp.json() - assert data["pos"] >= 1 + assert data["pos"] == 1 `@pytest.mark.asyncio` async def test_continue_jumps_to_end(client): """POST /agent/{agent_id}/debug/continue advances to end of trace.""" + await client.post("/agent/test-agent/debug/clear") for i in range(5): await client.post( "/agent/test-agent/debug/trace", json={"type": "log", "data": {"msg": f"step {i}"}}, ) resp = await client.post("/agent/test-agent/debug/continue") assert resp.status_code == 200 data = resp.json() assert data["status"] == "continued" - assert data["pos"] >= 5 + assert data["pos"] == 5Also applies to: 133-146
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_agent_debugger.py` around lines 115 - 130, Reset the per-agent in-memory debug state before sending traces and replace the non-deterministic lower-bound assertions with exact checks: in test_step_advances_position, clear the agent test-agent debug state, post the two trace events, call POST /agent/test-agent/debug/step and assert data["pos"] == 1; similarly for the companion test that calls POST /agent/test-agent/debug/continue, clear state first, send the same traces, call the continue endpoint and assert data["pos"] == 5. Ensure the reset uses your existing test helper or the agent debug reset endpoint so tests don’t share state.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@tests/test_agent_debugger.py`:
- Around line 115-130: Reset the per-agent in-memory debug state before sending
traces and replace the non-deterministic lower-bound assertions with exact
checks: in test_step_advances_position, clear the agent test-agent debug state,
post the two trace events, call POST /agent/test-agent/debug/step and assert
data["pos"] == 1; similarly for the companion test that calls POST
/agent/test-agent/debug/continue, clear state first, send the same traces, call
the continue endpoint and assert data["pos"] == 5. Ensure the reset uses your
existing test helper or the agent debug reset endpoint so tests don’t share
state.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: ae0182ad-2c2b-4c6a-b5b9-5cb0bdc50e81
📒 Files selected for processing (3)
tests/test_agent_debugger.pytinyagentos/routes/agent_debugger.pytinyagentos/templates/agent_debugger.html
🚧 Files skipped from review as they are similar to previous changes (2)
- tinyagentos/templates/agent_debugger.html
- tinyagentos/routes/agent_debugger.py
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
tests/test_agent_debugger.py (2)
20-24: ⚡ Quick winPrefer the public
/debug/clearendpoint over direct module state manipulation.The test directly imports and clears internal module-level dictionaries (
_traces,_positions,_step_events), which couples the test to implementation details. If the storage mechanism changes, the test breaks. Use the existingPOST /agent/test-agent/debug/clearendpoint instead.♻️ Refactor to use the public API
- # Reset shared module-level state for deterministic test - from tinyagentos.routes.agent_debugger import _traces, _positions, _step_events - _traces.clear() - _positions.clear() - _step_events.clear() - + # Clear trace via public endpoint for test isolation + await client.post("/agent/test-agent/debug/clear") + resp = await client.post(🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_agent_debugger.py` around lines 20 - 24, Replace the direct module-level state manipulation in tests/test_agent_debugger.py (the imports and calls to _traces.clear(), _positions.clear(), _step_events.clear()) with a call to the public debug clear endpoint: send a POST to /agent/test-agent/debug/clear before running assertions so the test uses the public API rather than touching internal symbols (_traces, _positions, _step_events), keeping it resilient to internal storage changes.
154-156: ⚡ Quick winSimplify SSE streaming by using the fixture
clientdirectly.The test creates a new
httpx.AsyncClientand accesses the private_transportattribute to reuse the app. The existingclientfixture is already anhttpx.AsyncClientwith the app wired up, so you can callclient.stream(...)directly without creating a new client or accessing internals.♻️ Simplified streaming approach
- # Use streaming to avoid buffering the entire infinite SSE response - async with httpx.AsyncClient( - transport=httpx.ASGITransport(app=client._transport.app), # type: ignore[arg-type] - base_url="http://test", - ) as ac: - async with ac.stream("GET", "/agent/test-agent/debug/events", timeout=2) as resp: + # Use streaming to avoid buffering the entire infinite SSE response + async with client.stream("GET", "/agent/test-agent/debug/events", timeout=2) as resp: - assert resp.status_code == 200 - # Read one chunk to verify SSE frame delivery - chunk = await resp.aiter_raw().__anext__() - assert chunk, "SSE stream returned no data" + assert resp.status_code == 200 + # Read one chunk to verify SSE frame delivery + chunk = await resp.aiter_raw().__anext__() + assert chunk, "SSE stream returned no data"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_agent_debugger.py` around lines 154 - 156, The test currently constructs a new httpx.AsyncClient and reaches into client._transport to reuse the app; instead use the existing test fixture `client` (already an httpx.AsyncClient) and call its stream method directly (e.g., use async with client.stream(...) for the same request and headers) so you don't create a new client or access private attributes; update the code that builds the request to use `client.stream` and remove the custom AsyncClient(...) block and the type: ignore on `_transport`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_agent_debugger.py`:
- Around line 190-194: The test_multiple_events_ordered test incorrectly reads
events from the /agent/{agent_id}/debug/status response (which has no events) so
the ordering assertion is skipped; change the test to connect to the SSE
endpoint /agent/{agent_id}/debug/events, collect streamed frames, filter for
frames with type == "log" (skip the initial "position" frame), extract
event["data"]["num"] from each log frame into nums and assert nums ==
list(range(10)) to verify insertion order.
In `@tinyagentos/routes/agent_debugger.py`:
- Around line 229-231: When clearing agent state you remove the asyncio.Event
from _step_events without setting it, which can leave a coroutine blocked on
step_event.wait(); before popping the event for agent_id call set() on the
stored Event so any waiter is unblocked, then remove it and also remove
corresponding entries from _traces and _positions (refer to symbols
_step_events, _traces, _positions and the waiter that uses step_event.wait()).
- Around line 218-221: There is a race where step_event.clear() is called after
the broadcast so a concurrent debugger_step/debugger_continue set() can be lost;
change the logic so the per-agent event is cleared before broadcasting (i.e.,
call _step_events.setdefault(agent_id, asyncio.Event()).clear() prior to sending
the step signal) or, preferably, replace the binary Event with a per-agent
counter semaphore (e.g., use an asyncio.Semaphore initialized to 0 keyed by
_step_events[agent_id], await semaphore.acquire() here and have
debugger_step/debugger_continue call semaphore.release()) to ensure rapid
set/release operations are not lost; update references to _step_events,
agent_id, step_event.clear(), wait(), and debugger_step/debugger_continue
accordingly.
---
Nitpick comments:
In `@tests/test_agent_debugger.py`:
- Around line 20-24: Replace the direct module-level state manipulation in
tests/test_agent_debugger.py (the imports and calls to _traces.clear(),
_positions.clear(), _step_events.clear()) with a call to the public debug clear
endpoint: send a POST to /agent/test-agent/debug/clear before running assertions
so the test uses the public API rather than touching internal symbols (_traces,
_positions, _step_events), keeping it resilient to internal storage changes.
- Around line 154-156: The test currently constructs a new httpx.AsyncClient and
reaches into client._transport to reuse the app; instead use the existing test
fixture `client` (already an httpx.AsyncClient) and call its stream method
directly (e.g., use async with client.stream(...) for the same request and
headers) so you don't create a new client or access private attributes; update
the code that builds the request to use `client.stream` and remove the custom
AsyncClient(...) block and the type: ignore on `_transport`.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 42db2a77-fff4-4384-9c47-410151ed38a2
📒 Files selected for processing (3)
tests/test_agent_debugger.pytinyagentos/routes/agent_debugger.pytinyagentos/templates/agent_debugger.html
🚧 Files skipped from review as they are similar to previous changes (1)
- tinyagentos/templates/agent_debugger.html
POST /debug/trace unconditionally awaited step_event.wait() with no timeout, causing the test suite to hang indefinitely (both 3.12 and 3.13 jobs hit the 45-minute CI timeout on test_trace_records_event). Now only blocks when _queues has an active SSE listener for the agent. Without a connected debugger UI there is no reason to pause execution.
Builds on the listener-gated wait: also bound the wait with asyncio.wait_for so a UI that disconnects mid-step cannot freeze trace recording, add an autouse fixture that resets the module globals around every test (preventing a leaked SSE queue or step event from hanging a later /trace POST in a full-suite run), and make the SSE test drive the handler with a disconnected request so the infinite generator terminates instead of hanging the runner.
Implements agent debugger with SSE-based step-through execution.
Closes #131
Summary by CodeRabbit
New Features
Tests