fix(transcript): filter stale sessions by mtime to prevent cross-task contamination #305
Merged
olearycrew merged 2 commits into pinchbench:main on Apr 14, 2026
Contributor
Code Review Summary
Status: No Issues Found | Recommendation: Merge
Solid fix: the root cause is clearly identified and the guard is applied precisely where the stale session leak occurred.
Files Reviewed (2 files)
Reviewed by claude-4.6-sonnet-20260217 · 62,374 tokens
OpenClaw and Claude Code serialize tool call parameters under 'arguments' rather than 'params'. The read_config check was always returning 0 for these agents even when the file was correctly read.

Fix: fall back to 'arguments' when 'params' is absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
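The fallback described in this commit message can be sketched roughly as follows. This is only an illustration: the helper name `tool_call_params` and the assumption that a tool call is a plain dict are hypothetical, not taken from the repository.

```python
from typing import Any, Dict


def tool_call_params(tool_call: Dict[str, Any]) -> Dict[str, Any]:
    """Return a tool call's parameters, whichever key the agent used.

    Hypothetical helper: assumes a tool call is a dict that stores its
    parameters under 'params' or, for some agents, under 'arguments'.
    """
    params = tool_call.get("params")
    if params is None:
        # 'params' is absent (or null): fall back to 'arguments'.
        params = tool_call.get("arguments")
    # Anything that is not a dict is treated as "no parameters".
    return params if isinstance(params, dict) else {}
```

A check such as read_config could then look up the file path through this helper instead of reading 'params' directly, so both serializations are scored the same way.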
…lver

When a task's agent continued writing async responses after its transcript was archived, the new session file landed in sessions.json and was picked up by the NEXT task's transcript resolver (strategy 1b). This caused the next task to score 0%: it inherited the previous task's tail output and its actual workspace was never evaluated.

Root cause: _find_transcript_path_from_sessions_store had no started_at guard. The glob fallback (strategy 2) already filters by mtime, but strategy 1b ran first and returned the stale path unconditionally.

Fix: pass started_at into _find_transcript_path_from_sessions_store and skip any candidate whose mtime predates the task start (with 5s tolerance).

Observed in the wild: task_24_polymarket_briefing wrote polymarket_briefing.md to a new async session (2276707f) after its transcript was archived. task_25_access_log_anomaly's resolver found 2276707f via sessions.json and scored 0/5 automated checks because anomaly_report.json was never written. Re-running task_25 in isolation scored 100%.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2a02b91 to 8750d29
Member
@mgoulart thanks for this fix!
Problem
When running tasks sequentially, a previous task's agent sometimes continued writing async responses after its transcript was already archived and its sessions were cleared. This created a new session file that appeared in sessions.json, and the next task's transcript resolver (strategy 1b: _find_transcript_path_from_sessions_store) picked it up unconditionally, returning a stale session instead of the current task's session.
Result: The next task scored 0% because it inherited the previous task's output and its actual workspace was never evaluated.
Observed reproduction
Observed while running the full suite against kimi-k2p5-fireworks on OpenClaw.
Re-running task_25 in isolation (clean sessions) scored 100%: kimi correctly identified all three anomalies.
Root cause
_find_transcript_path_from_sessions_store (strategy 1b) had no started_at guard. The glob fallback (strategy 2) already filtered by mtime, but strategy 1b ran first and short-circuited.
Fix
Pass started_at into _find_transcript_path_from_sessions_store and skip candidates whose mtime < started_at - 5s.
The function signature gains an optional started_at: float = 0.0 parameter (default 0.0 = no filter) for backwards compatibility.
🤖 Generated with Claude Code
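The mtime guard described under Fix can be sketched as below. This is a minimal illustration under stated assumptions, not the merged code: `first_fresh_transcript`, `is_fresh_enough`, and `MTIME_TOLERANCE_S` are hypothetical names, and the candidates are assumed to be paths already read out of sessions.json.

```python
from pathlib import Path
from typing import Iterable, Optional

# 5-second tolerance, as stated in the PR description.
MTIME_TOLERANCE_S = 5.0


def is_fresh_enough(path: Path, started_at: float = 0.0,
                    tolerance: float = MTIME_TOLERANCE_S) -> bool:
    """Return True unless the file was last modified before the task began."""
    if started_at <= 0.0:
        # Default of 0.0 means "no filter", matching the old behaviour.
        return True
    return path.stat().st_mtime >= started_at - tolerance


def first_fresh_transcript(candidates: Iterable[Path],
                           started_at: float = 0.0) -> Optional[Path]:
    """Hypothetical stand-in for the sessions-store lookup (strategy 1b)."""
    for path in candidates:
        if not path.exists():
            continue
        if is_fresh_enough(path, started_at):
            return path
        # Stale candidate: mtime < started_at - tolerance, so skip it.
    return None
```

Returning None when every candidate is stale lets the caller fall through to the next strategy (the glob fallback, which already filters by mtime) rather than scoring the wrong session.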