Two playtest bugs found 2026-04-24 (sq-playtest-pingpong lines 598, 622).

1. `/api/debug/state` now sorts newest-mtime first and accepts a `?session_key=<slug>` query filter. Also adds `last_activity_ts` to each view so the UI can pick explicitly. The dashboard State tab was stuck on `debugState[0]`, which, under alphabetical sort, was the oldest save, not the active session.

2. `LocalDM._extract_json_object` strips markdown code fences / preamble / trailing commentary before `json.loads`. Haiku was returning fenced responses every turn despite the system prompt saying "no fences", burning 45s/turn on parse failures and emitting `dispatch_count=0`. Every cleanup step is recorded on the `local_dm.decompose` OTEL span (`json_cleanup_steps` attr) so the GM panel sees the contract violation even when the recovery succeeds: no silent fallback. Failure path now records `parse_preview` so unrecoverable responses can be diagnosed from the dashboard instead of from server logs.

Wiring tests:

- test_debug_state_sorts_newest_first_and_filters_by_session_key: creates two saves with staggered mtimes, asserts the newer comes first and the filter narrows to one entry.
- test_local_dm_parses_response_wrapped_in_code_fence: feeds the exact failing shape (```json ...```) and asserts clean parse + `strip_fence` recorded on the span.
- test_local_dm_parse_failure_records_preview_on_span: failure path surfaces a preview for dashboard diagnosis.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
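For reviewers, the sort-and-filter behaviour described in item 1 can be sketched roughly as follows. This is an illustration only, not the code in this PR: `list_debug_views`, the one-JSON-file-per-save layout, and the use of the file mtime as `last_activity_ts` are all assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def list_debug_views(save_dir: Path, session_key: Optional[str] = None) -> list[dict]:
    """Hypothetical helper: one view per save file, newest mtime first.

    Assumes each save is a file named <slug>.json and that the file's
    mtime is a reasonable stand-in for `last_activity_ts`.
    """
    views = []
    for path in save_dir.glob("*.json"):
        slug = path.stem
        # Optional ?session_key=<slug> filter: keep only the matching save.
        if session_key is not None and slug != session_key:
            continue
        views.append({"session_key": slug, "last_activity_ts": path.stat().st_mtime})
    # Newest first, so index 0 is the most recently touched session
    # rather than whichever slug happens to sort first alphabetically.
    views.sort(key=lambda v: v["last_activity_ts"], reverse=True)
    return views
```

With this ordering, a client that still reads index 0 lands on the most recently written save, and a client that knows its slug can pass `session_key` to avoid depending on order at all.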
Merged
Summary
Fixes two OTEL dashboard bugs found during the 2026-04-24 playtest (see `sq-playtest-pingpong.md` lines 598, 622).

- `/api/debug/state` now sorts newest-mtime first, adds `last_activity_ts` to each view, and accepts `?session_key=<slug>` to filter. The dashboard's default `debugState[0]` pick now lands on the active session instead of the oldest save.
- `LocalDM._extract_json_object` strips markdown code fences, preamble, and trailing commentary before `json.loads`. Every cleanup step is recorded on the `local_dm.decompose` span (`json_cleanup_steps`), and failure paths now record a `parse_preview` so the GM panel can diagnose unrecoverable responses from the dashboard. No silent fallback: contract violations remain visible.

Bug 1 (turn aggregator stuck at 0) is fixed in the paired UI PR.
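As a rough illustration of the cleanup described above, here is a minimal standalone sketch. It is not the body of `LocalDM._extract_json_object`: the function name, the return shape, and the step names `strip_preamble` and `strip_trailing` are assumptions for the example (only `strip_fence` is named in this PR), and the real implementation records steps on the OTEL span rather than returning them.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # built indirectly so the example nests cleanly inside a fenced block
FENCE_RE = re.compile(
    re.escape(FENCE) + r"[A-Za-z]*\s*(.*?)" + re.escape(FENCE), re.DOTALL
)


def extract_json_object(raw: str) -> tuple[dict[str, Any], list[str]]:
    """Return (parsed object, cleanup steps applied); raise ValueError if none found."""
    steps: list[str] = []
    text = raw.strip()

    # Step 1: if the model wrapped its answer in a code fence, keep only the inside.
    fenced = FENCE_RE.search(text)
    if fenced:
        text = fenced.group(1).strip()
        steps.append("strip_fence")

    # Step 2: try each "{" as a candidate start; raw_decode parses exactly one
    # JSON value and reports where it ended, so trailing prose is ignored.
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
            continue
        if text[:start].strip():
            steps.append("strip_preamble")  # hypothetical step name
        if text[end:].strip():
            steps.append("strip_trailing")  # hypothetical step name
        return obj, steps

    # Failure path: surface a short preview so the caller can record it.
    raise ValueError(f"no JSON object found; preview={raw[:200]!r}")
```

Returning the step list alongside the parsed object keeps the recovery visible: a caller can attach it to a span attribute such as `json_cleanup_steps` even when parsing succeeds, which matches the "no silent fallback" intent.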
Test plan
- `uv run pytest tests/agents/test_local_dm.py tests/server/test_rest.py`: 40 tests pass locally
- `local_dm.decompose` span should no longer emit `degraded=true` with `parse_failure: JSONDecodeError` each turn

Wiring tests added:

- `test_debug_state_sorts_newest_first_and_filters_by_session_key`
- `test_local_dm_parses_response_wrapped_in_code_fence`
- `test_local_dm_parses_response_with_preamble`
- `test_local_dm_parse_failure_records_preview_on_span`
- `test_extract_json_object_*` (unit coverage for the extractor helper)

🤖 Generated with Claude Code