Skip to content

feat(agents): add agent debugger with SSE step-through (#131)#485

Merged
jaylfc merged 11 commits into
jaylfc:masterfrom
hognek:feat/agent-debugger
May 31, 2026
Merged

feat(agents): add agent debugger with SSE step-through (#131)#485
jaylfc merged 11 commits into
jaylfc:masterfrom
hognek:feat/agent-debugger

Conversation

@hognek
Copy link
Copy Markdown
Contributor

@hognek hognek commented May 31, 2026

Implements agent debugger with SSE-based step-through execution.

  • GET /agent/{id}/debug — debugger UI
  • GET /agent/{id}/debug/events — SSE stream
  • POST /agent/{id}/debug/step — step to next event
  • POST /agent/{id}/debug/continue — run until breakpoint
  • Pico CSS + htmx UI with ARIA labels

Closes #131

Summary by CodeRabbit

  • New Features

    • Interactive dark-themed agent debugger UI with timeline, details panel, and step/continue/stop controls
    • Real-time event streaming with backlog replay, heartbeat messages, per-agent isolation, and clear/reset of traces
    • Position and total-event counters with selectable event payload viewing; debugger controls advance or jump position
  • Tests

    • Added async integration tests covering UI responses, trace recording/validation, stepping/continuing/clearing, isolation, ordering, and SSE streaming

…fc#131)

- GET /agent/{agent_id}/debug — debugger UI (Pico CSS + htmx)
- GET /agent/{agent_id}/debug/events — SSE stream of trace events
- POST /agent/{agent_id}/debug/trace — record execution events
- POST /agent/{agent_id}/debug/step — advance one step
- POST /agent/{agent_id}/debug/continue — run to completion
- POST /agent/{agent_id}/debug/clear — clear trace
- GET /agent/{agent_id}/debug/status — trace state

In-memory per-agent trace store with SSE broadcasting.
UI has left panel (event timeline) and right panel (event details)
with step/continue/stop buttons and ARIA labels on all interactive elements.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 31, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds an in-memory per-agent debugger: bounded trace buffers with trimming/rebase, SSE event streaming with replay and heartbeats, step/continue/clear control endpoints, a dark HTML debugger UI, app router registration, and async integration tests.

Changes

Agent Debugger

Layer / File(s) Summary
Trace storage and SSE foundation
tinyagentos/routes/agent_debugger.py
In-memory per-agent trace buffers with bounded storage, per-agent SSE listener queues, cursor tracking, trimming/rebase helpers, and the SSE /agent/{agent_id}/debug/events endpoint that replays backlog, registers listener queues, streams events and position updates, emits periodic heartbeats, and cleans up on disconnect.
Trace recording and blocking behavior
tinyagentos/routes/agent_debugger.py
/debug/trace validates incoming event JSON type, timestamps and appends events to a bounded per-agent buffer (trimming with rebasing), broadcasts recorded events to listeners, and—when listeners exist—may block until stepped with a 30s timeout.
Step / Continue / Status endpoints
tinyagentos/routes/agent_debugger.py
/debug/step and /debug/continue update per-agent cursor position, broadcast position changes to SSE listeners, and signal awaiting recorders. /debug/status returns agent id, total_events, position, and whether listeners are registered.
Clear and state maintenance
tinyagentos/routes/agent_debugger.py
/debug/clear resets stored trace, cursor, and step event state, broadcasts a clear SSE event, and releases waiters.
Debugger UI and client integration
tinyagentos/templates/agent_debugger.html
Dark-themed UI with left event timeline and right details panel. Client rewrites REPLACE_AGENT_ID endpoints to inject agentId, opens an SSE EventSource, handles heartbeat, position, and clear messages, appends events in order, updates counters/selection, and renders escaped JSON details.
Router registration
tinyagentos/app.py
Mounts the debugger router into the FastAPI app via app.include_router(...) to expose the debugger routes.
Integration tests
tests/test_agent_debugger.py
Async tests validating UI HTML, trace recording/validation (missing/invalid type), status behavior, step/continue/clear semantics, per-agent isolation, SSE frame presence, no-events state, and ordered multi-event streaming.

Sequence Diagram

sequenceDiagram
  participant Browser as Browser (UI)
  participant SSE as /agent/{agent_id}/debug/events
  participant API as /agent/{agent_id}/debug/*
  participant Storage as Per-Agent Trace Storage
  Browser->>SSE: GET /agent/{agent_id}/debug/events
  SSE->>Storage: Replay stored events
  SSE->>Browser: Stream replayed events and position
  Browser->>API: POST /agent/{agent_id}/debug/trace (event)
  API->>Storage: Validate & append event
  API->>SSE: Broadcast event
  SSE->>Browser: SSE event
  Browser->>API: POST /agent/{agent_id}/debug/step or /continue
  API->>Storage: Update position
  API->>SSE: Broadcast position
  SSE->>Browser: Position update
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

  • #131: Agent debugger: step through an agent's tool calls and intermediate results — This PR implements per-step trace recording, step/continue controls, and a UI aligning with that objective.
  • #492: Extend trace/event schema (node IDs, lineage, timing) — The new in-memory trace and endpoints touch the same storage and serialization paths and are a likely extension point for richer event schema.

Poem

🐰 I hop through streams of traces, soft and spry,

tiny events that shimmer, timestamped by and by,
Step pauses the trail, continue bounds the rest,
Clear sweeps the meadow, tidy and blessed,
I nibble at logs and watch the timeline fly.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Linked Issues check ❓ Inconclusive PR implements partial compliance: UI, SSE streaming, step/continue endpoints, and event recording are complete; 'Replay from here', 'Re-run with different model', and replay export features are not yet implemented. Verify whether partial implementation is acceptable for this PR, or add tasks for replay execution, model swapping, and file export features in follow-up issues.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding an agent debugger with SSE-based step-through functionality, matching the core feature introduced in this PR.
Out of Scope Changes check ✅ Passed All changes are directly scoped to agent debugger functionality: new routes, templates, tests, and app integration align with issue #131 requirements.
Docstring Coverage ✅ Passed Docstring coverage is 85.19% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_agent_debugger.py`:
- Around line 159-172: The test test_multiple_events_ordered currently only
checks total_events; update it to fetch the recorded events from the status
response (e.g. status.json()["events"] or whichever field returns the event
list) and assert the sequence of event data matches numbers 0 through 9 in order
(inspect each event's data["num"] or equivalent); keep the existing posts to
/agent/test-agent/debug/trace and replace or augment the assert
status.json()["total_events"] == 10 with an ordered-sequence assertion such as
comparing the list of event num values to list(range(10)).
- Around line 17-27: The test fails due to shared module-level debugger state
(_traces in tinyagentos/routes/agent_debugger.py) causing flaky counts; modify
test_trace_records_event to clear that buffer before posting by importing
tinyagentos.routes.agent_debugger._traces and calling its clear() (or otherwise
reset the module-level trace buffer) at the start of the test so the assertion
data["total"] == 1 is deterministic.
- Around line 136-146: The test test_events_endpoint_returns_sse currently uses
await client.get(..., timeout=2) which buffers the entire StreamingResponse and
can hang; change it to use httpx AsyncClient streaming via async with
client.stream("GET", "/agent/test-agent/debug/events", timeout=2) to open the
connection, assert resp.status_code == 200, then read only the first chunk/line
from the response iterator (to validate an SSE frame exists) and then exit the
stream context to close the connection; update any assertions to operate on that
single chunk instead of the buffered body.

In `@tinyagentos/routes/agent_debugger.py`:
- Around line 50-56: _trim_trace currently slices off the oldest events in
_traces but does not adjust _positions correctly; compute the number of dropped
entries (dropped = len(events) - _MAX_TRACE_LENGTH), update _traces[agent_id] =
events[-_MAX_TRACE_LENGTH:], then set _positions[agent_id] = max(0,
min(_positions.get(agent_id, 0) - dropped, max(0, len(_traces[agent_id]) - 1)))
so the position is shifted left by the number dropped and clamped to the new
trace bounds; reference functions/variables: _trim_trace, _traces, _positions,
_MAX_TRACE_LENGTH.
- Around line 79-83: The backlog replay fails when a stored trace has >256
events because queue is created with asyncio.Queue(maxsize=256) and the code
calls queue.put_nowait() for every event from _traces.get(agent_id, []) which
raises QueueFull; fix by avoiding pre-filling a fixed-size queue: either create
the queue with a maxsize equal to the trace length (use
len(_traces.get(agent_id, [])) or None for an unbounded queue) before the loop
that calls queue.put_nowait(), or skip queuing the backlog entirely and stream
the stored events directly to the SSE generator instead of enqueuing them;
update the code paths referencing queue, asyncio.Queue(maxsize=256), _traces,
agent_id, and put_nowait() accordingly.
- Around line 174-191: The persisted trace is currently built only from
{"type","ts","data"} inside the request handler that constructs the `event`
dict; extend this to include stable node identifiers and lineage (e.g. "node_id"
and "parent_id"), timing and duration fields (e.g. "start_ts", "end_ts",
"duration_ms"), token/usage and model metadata (e.g. "token_count", "model",
"latency_ms"), and replay/checkpoint flags (e.g. "replay_checkpoint" or
"checkpoint_id") plus any relevant "metadata" map so the debugger can build
tree, replay, rerun and export flows; populate these fields from the incoming
body when present or synthesize deterministic IDs/timestamps in the same handler
where `event = {...}` is created (ensure names match the debugger contract).
- Around line 34-35: The module lacks a real paused vs running state: populate
and use _step_events and a running/paused flag so debugger_trace() can block
properly and /continue can resume all or single waiters. Change debugger_trace()
to create and store a per-agent asyncio.Event in _step_events when a trace
enters wait, loop waiting on that event (or a per-agent resume_event) while
consulting _positions to decide when to pause vs advance, and clear or replace
the event after a single step; update the /continue handler to set() the
appropriate event(s) from _step_events (or flip a shared _running boolean and
set all resume events) so waiting traces actually resume, and ensure step
semantics only resume one waiter if stepping and resume-all for continue; keep
references to symbols _step_events, debugger_trace(), _positions, and the
/continue handler when making the changes.

In `@tinyagentos/templates/agent_debugger.html`:
- Around line 264-270: The timeline list items (the created li element with
class 'event-item' that currently sets tabindex and onclick to call
selectEvent(idx)) are not keyboard-operable; add keyboard activation by either
moving the interactive control into a real <button> inside the li or by adding
button ARIA semantics and a keydown handler on the li: set role="button" and
handle Enter/Space in a keydown listener to call selectEvent(idx) (and ensure
preventDefault on Space), and keep the existing onclick for mouse support; also
ensure aria-pressed/aria-expanded is set as appropriate for the details pane
state.
- Around line 245-251: After handling the 'clear' event you must refresh the
header counters so the UI reflects the reset state; call the existing updateUI()
(or the specific header update routine) after resetting events, currentPos,
activeIdx and clearing eventList/eventDetail so the header count/position
updates immediately instead of waiting for the next SSE message. Ensure you
invoke updateUI() (or equivalent) at the end of the if (data.type === 'clear')
branch.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a5c841e3-946b-47e1-9076-2beec761530d

📥 Commits

Reviewing files that changed from the base of the PR and between c08935a and 8134fc3.

📒 Files selected for processing (4)
  • tests/test_agent_debugger.py
  • tinyagentos/app.py
  • tinyagentos/routes/agent_debugger.py
  • tinyagentos/templates/agent_debugger.html

Comment thread tests/test_agent_debugger.py
Comment thread tests/test_agent_debugger.py Outdated
Comment thread tests/test_agent_debugger.py
Comment thread tinyagentos/routes/agent_debugger.py
Comment thread tinyagentos/routes/agent_debugger.py Outdated
Comment thread tinyagentos/routes/agent_debugger.py Outdated
Comment thread tinyagentos/routes/agent_debugger.py
Comment thread tinyagentos/templates/agent_debugger.html
Comment thread tinyagentos/templates/agent_debugger.html
Comment thread tinyagentos/routes/agent_debugger.py Outdated
Comment thread tinyagentos/templates/agent_debugger.html
Comment thread tinyagentos/templates/agent_debugger.html
@kilo-code-bot
Copy link
Copy Markdown

kilo-code-bot Bot commented May 31, 2026

Code Review Summary

Status: 3 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 0
WARNING 1
SUGGESTION 2
Issue Details (click to expand)

CRITICAL

None

Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

File Line Issue
tinyagentos/routes/agent_debugger.py 79 The SSE queue is created with a fixed maxsize of 256, but the trace may contain more events (up to _MAX_TRACE_LENGTH=10_000). This will cause QueueFull exceptions when replaying the trace for agents with many events.
tinyagentos/templates/agent_debugger.html 7 Consider adding an integrity attribute to the stylesheet link to prevent supply chain attacks.
tinyagentos/templates/agent_debugger.html 8 Consider adding an integrity attribute to the script tag to prevent supply chain attacks.
Files Reviewed (1 files)
  • tinyagentos/routes/agent_debugger.py - 0 issues (all previous issues in this file have been addressed)

Reviewed by nemotron-3-super-120b-a12b-20230311:free · 168,628 tokens

Comment thread tinyagentos/routes/agent_debugger.py
Hogne and others added 5 commits May 31, 2026 12:44
_step_events was never populated — step/continue endpoints did .get()
on an empty dict, so the step-mode wait loop never activated. Changed
all three call sites (step, continue, trace) to use setdefault() so the
asyncio.Event is created on first access.

Fixes WARNING from kilo-code-bot review on PR jaylfc#485.
- trace endpoint no longer blocks unless a debugger UI is attached
  (step mode); fixes the POST /trace hang and makes step/continue work
- rebase debugger position when the trace is trimmed past the cap
- size the SSE replay queue to the backlog so reconnects with >256
  buffered events no longer raise QueueFull
- release blocked agents on clear and on last-listener disconnect
- refresh header counters after a clear event in the UI
- make timeline rows keyboard-operable (Enter/Space)
- pin CDN assets and add SRI integrity hashes
- isolate the total==1 assertion; drive SSE tests via the ASGI app
  instead of a buffered GET; assert event ordering on replay
- Fix _trim_trace position shift: compute dropped count and clamp position
- Use unbounded asyncio.Queue (was 256) to handle up to 10K trace events
- Enrich trace events with node_id, parent_id, timing, token/model/metadata fields
- Fix test_trace_records_event: clear shared module-level state before assertions
- Fix test_multiple_events_ordered: assert data.num ordering, not just count
- Fix test_events_endpoint_returns_sse: stream instead of buffered client.get
- Add keyboard accessibility to timeline items (Enter/Space, role=button, aria-pressed)
- Call updateUI() after clear event to refresh header counters
…gent-debugger

# Conflicts:
#	tests/test_agent_debugger.py
#	tinyagentos/routes/agent_debugger.py
#	tinyagentos/templates/agent_debugger.html
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_agent_debugger.py (1)

115-130: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make step/continue assertions deterministic and exact.

Both tests rely on shared in-memory state and only assert lower bounds (>=), so they can pass even if endpoint behavior regresses. Clear per-agent state first and assert exact positions (1 for step, 5 for continue in this setup).

Suggested minimal fix
 `@pytest.mark.asyncio`
 async def test_step_advances_position(client):
     """POST /agent/{agent_id}/debug/step advances the position."""
+    await client.post("/agent/test-agent/debug/clear")
     await client.post(
         "/agent/test-agent/debug/trace",
         json={"type": "tool_call", "data": {"tool": "read"}},
     )
     await client.post(
         "/agent/test-agent/debug/trace",
         json={"type": "tool_result", "data": {"result": "ok"}},
     )

     resp = await client.post("/agent/test-agent/debug/step")
     assert resp.status_code == 200
     data = resp.json()
-    assert data["pos"] >= 1
+    assert data["pos"] == 1


 `@pytest.mark.asyncio`
 async def test_continue_jumps_to_end(client):
     """POST /agent/{agent_id}/debug/continue advances to end of trace."""
+    await client.post("/agent/test-agent/debug/clear")
     for i in range(5):
         await client.post(
             "/agent/test-agent/debug/trace",
             json={"type": "log", "data": {"msg": f"step {i}"}},
         )

     resp = await client.post("/agent/test-agent/debug/continue")
     assert resp.status_code == 200
     data = resp.json()
     assert data["status"] == "continued"
-    assert data["pos"] >= 5
+    assert data["pos"] == 5

Also applies to: 133-146

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_agent_debugger.py` around lines 115 - 130, Reset the per-agent
in-memory debug state before sending traces and replace the non-deterministic
lower-bound assertions with exact checks: in test_step_advances_position, clear
the agent test-agent debug state, post the two trace events, call POST
/agent/test-agent/debug/step and assert data["pos"] == 1; similarly for the
companion test that calls POST /agent/test-agent/debug/continue, clear state
first, send the same traces, call the continue endpoint and assert data["pos"]
== 5. Ensure the reset uses your existing test helper or the agent debug reset
endpoint so tests don’t share state.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tests/test_agent_debugger.py`:
- Around line 115-130: Reset the per-agent in-memory debug state before sending
traces and replace the non-deterministic lower-bound assertions with exact
checks: in test_step_advances_position, clear the agent test-agent debug state,
post the two trace events, call POST /agent/test-agent/debug/step and assert
data["pos"] == 1; similarly for the companion test that calls POST
/agent/test-agent/debug/continue, clear state first, send the same traces, call
the continue endpoint and assert data["pos"] == 5. Ensure the reset uses your
existing test helper or the agent debug reset endpoint so tests don’t share
state.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ae0182ad-2c2b-4c6a-b5b9-5cb0bdc50e81

📥 Commits

Reviewing files that changed from the base of the PR and between 8134fc3 and 45ebc54.

📒 Files selected for processing (3)
  • tests/test_agent_debugger.py
  • tinyagentos/routes/agent_debugger.py
  • tinyagentos/templates/agent_debugger.html
🚧 Files skipped from review as they are similar to previous changes (2)
  • tinyagentos/templates/agent_debugger.html
  • tinyagentos/routes/agent_debugger.py

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
tests/test_agent_debugger.py (2)

20-24: ⚡ Quick win

Prefer the public /debug/clear endpoint over direct module state manipulation.

The test directly imports and clears internal module-level dictionaries (_traces, _positions, _step_events), which couples the test to implementation details. If the storage mechanism changes, the test breaks. Use the existing POST /agent/test-agent/debug/clear endpoint instead.

♻️ Refactor to use the public API
-    # Reset shared module-level state for deterministic test
-    from tinyagentos.routes.agent_debugger import _traces, _positions, _step_events
-    _traces.clear()
-    _positions.clear()
-    _step_events.clear()
-
+    # Clear trace via public endpoint for test isolation
+    await client.post("/agent/test-agent/debug/clear")
+    
     resp = await client.post(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_agent_debugger.py` around lines 20 - 24, Replace the direct
module-level state manipulation in tests/test_agent_debugger.py (the imports and
calls to _traces.clear(), _positions.clear(), _step_events.clear()) with a call
to the public debug clear endpoint: send a POST to /agent/test-agent/debug/clear
before running assertions so the test uses the public API rather than touching
internal symbols (_traces, _positions, _step_events), keeping it resilient to
internal storage changes.

154-156: ⚡ Quick win

Simplify SSE streaming by using the fixture client directly.

The test creates a new httpx.AsyncClient and accesses the private _transport attribute to reuse the app. The existing client fixture is already an httpx.AsyncClient with the app wired up, so you can call client.stream(...) directly without creating a new client or accessing internals.

♻️ Simplified streaming approach
-    # Use streaming to avoid buffering the entire infinite SSE response
-    async with httpx.AsyncClient(
-        transport=httpx.ASGITransport(app=client._transport.app),  # type: ignore[arg-type]
-        base_url="http://test",
-    ) as ac:
-        async with ac.stream("GET", "/agent/test-agent/debug/events", timeout=2) as resp:
+    # Use streaming to avoid buffering the entire infinite SSE response
+    async with client.stream("GET", "/agent/test-agent/debug/events", timeout=2) as resp:
-            assert resp.status_code == 200
-            # Read one chunk to verify SSE frame delivery
-            chunk = await resp.aiter_raw().__anext__()
-            assert chunk, "SSE stream returned no data"
+        assert resp.status_code == 200
+        # Read one chunk to verify SSE frame delivery
+        chunk = await resp.aiter_raw().__anext__()
+        assert chunk, "SSE stream returned no data"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_agent_debugger.py` around lines 154 - 156, The test currently
constructs a new httpx.AsyncClient and reaches into client._transport to reuse
the app; instead use the existing test fixture `client` (already an
httpx.AsyncClient) and call its stream method directly (e.g., use async with
client.stream(...) for the same request and headers) so you don't create a new
client or access private attributes; update the code that builds the request to
use `client.stream` and remove the custom AsyncClient(...) block and the type:
ignore on `_transport`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_agent_debugger.py`:
- Around line 190-194: The test_multiple_events_ordered test incorrectly reads
events from the /agent/{agent_id}/debug/status response (which has no events) so
the ordering assertion is skipped; change the test to connect to the SSE
endpoint /agent/{agent_id}/debug/events, collect streamed frames, filter for
frames with type == "log" (skip the initial "position" frame), extract
event["data"]["num"] from each log frame into nums and assert nums ==
list(range(10)) to verify insertion order.

In `@tinyagentos/routes/agent_debugger.py`:
- Around line 229-231: When clearing agent state you remove the asyncio.Event
from _step_events without setting it, which can leave a coroutine blocked on
step_event.wait(); before popping the event for agent_id call set() on the
stored Event so any waiter is unblocked, then remove it and also remove
corresponding entries from _traces and _positions (refer to symbols
_step_events, _traces, _positions and the waiter that uses step_event.wait()).
- Around line 218-221: There is a race where step_event.clear() is called after
the broadcast so a concurrent debugger_step/debugger_continue set() can be lost;
change the logic so the per-agent event is cleared before broadcasting (i.e.,
call _step_events.setdefault(agent_id, asyncio.Event()).clear() prior to sending
the step signal) or, preferably, replace the binary Event with a per-agent
counter semaphore (e.g., use an asyncio.Semaphore initialized to 0 keyed by
_step_events[agent_id], await semaphore.acquire() here and have
debugger_step/debugger_continue call semaphore.release()) to ensure rapid
set/release operations are not lost; update references to _step_events,
agent_id, step_event.clear(), wait(), and debugger_step/debugger_continue
accordingly.

---

Nitpick comments:
In `@tests/test_agent_debugger.py`:
- Around line 20-24: Replace the direct module-level state manipulation in
tests/test_agent_debugger.py (the imports and calls to _traces.clear(),
_positions.clear(), _step_events.clear()) with a call to the public debug clear
endpoint: send a POST to /agent/test-agent/debug/clear before running assertions
so the test uses the public API rather than touching internal symbols (_traces,
_positions, _step_events), keeping it resilient to internal storage changes.
- Around line 154-156: The test currently constructs a new httpx.AsyncClient and
reaches into client._transport to reuse the app; instead use the existing test
fixture `client` (already an httpx.AsyncClient) and call its stream method
directly (e.g., use async with client.stream(...) for the same request and
headers) so you don't create a new client or access private attributes; update
the code that builds the request to use `client.stream` and remove the custom
AsyncClient(...) block and the type: ignore on `_transport`.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 42db2a77-fff4-4384-9c47-410151ed38a2

📥 Commits

Reviewing files that changed from the base of the PR and between 45ebc54 and efa33b7.

📒 Files selected for processing (3)
  • tests/test_agent_debugger.py
  • tinyagentos/routes/agent_debugger.py
  • tinyagentos/templates/agent_debugger.html
🚧 Files skipped from review as they are similar to previous changes (1)
  • tinyagentos/templates/agent_debugger.html

Comment thread tests/test_agent_debugger.py Outdated
Comment thread tinyagentos/routes/agent_debugger.py Outdated
Comment thread tinyagentos/routes/agent_debugger.py
jaylfc and others added 4 commits May 31, 2026 13:37
POST /debug/trace unconditionally awaited step_event.wait() with no
timeout, causing the test suite to hang indefinitely (both 3.12 and
3.13 jobs hit the 45-minute CI timeout on test_trace_records_event).

Now only blocks when _queues has an active SSE listener for the agent.
Without a connected debugger UI there is no reason to pause execution.
Builds on the listener-gated wait: also bound the wait with
asyncio.wait_for so a UI that disconnects mid-step cannot freeze
trace recording, add an autouse fixture that resets the module
globals around every test (preventing a leaked SSE queue or step
event from hanging a later /trace POST in a full-suite run), and
make the SSE test drive the handler with a disconnected request so
the infinite generator terminates instead of hanging the runner.
@jaylfc jaylfc merged commit 7ae98d4 into jaylfc:master May 31, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Agent debugger: step through an agent's tool calls and intermediate results

2 participants