fix(transcript): filter stale sessions by mtime to prevent cross-task contamination#305

Merged
olearycrew merged 2 commits into pinchbench:main from mgoulart:fix/stale-session-transcript-race
Apr 14, 2026

@mgoulart
Contributor

Problem

When tasks run sequentially, a previous task's agent sometimes kept writing async responses after its transcript had been archived and its sessions cleared. Those late writes created a new session file that was registered in sessions.json, and the next task's transcript resolver (strategy 1b: _find_transcript_path_from_sessions_store) picked it up unconditionally, returning the stale session instead of the current task's session.

Result: The next task scored 0% because it inherited the previous task's output and its actual workspace was never evaluated.

Observed reproduction

Running the full suite against kimi-k2p5-fireworks on OpenClaw:

22:05:53 — task_24_polymarket_briefing transcript archived (64 events)
22:05:58 — 2 old sessions cleared
22:05:58 — task_25_access_log_anomaly starts
22:06:00 — workspace prepared, agent prompted
# task_24's agent writes 3 final Polymarket events to NEW session 2276707f
22:07:30 — transcript resolver finds 2276707f via sessions.json (strategy 1b)
           → archives as task_25_access_log_anomaly.jsonl
           → grader finds no anomaly_report.json → 0/5 checks → 0%

Re-running task_25 in isolation (clean sessions) scored 100% — kimi correctly identified all three anomalies.

Root cause

_find_transcript_path_from_sessions_store (strategy 1b) had no started_at guard. The glob fallback (strategy 2) already filtered by mtime, but strategy 1b ran first and short-circuited.

Fix

Pass started_at into _find_transcript_path_from_sessions_store and skip candidates whose mtime < started_at - 5s:

# Before
candidate_from_store = _find_transcript_path_from_sessions_store(agent_id)

# After
candidate_from_store = _find_transcript_path_from_sessions_store(agent_id, started_at)

The function signature gains an optional started_at: float = 0.0 parameter (default 0.0 = no filter) for backwards compatibility.
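For readers who want to see the shape of the guard, here is a minimal sketch of an mtime filter with a tolerance window. It is not the code in scripts/lib_agent.py: the helper name pick_fresh_transcript and its candidates argument are hypothetical, and only started_at, tolerance_seconds, and the 0.0 "no filter" default come from this PR.

```python
from pathlib import Path
from typing import Iterable, Optional


def pick_fresh_transcript(
    candidates: Iterable[Path],
    started_at: float = 0.0,
    tolerance_seconds: float = 5.0,
) -> Optional[Path]:
    """Return the newest candidate that was modified after the task started.

    Hypothetical helper for illustration only. A started_at of 0.0 disables
    the filter, mirroring the backwards-compatible default described above.
    """
    fresh = []
    for path in candidates:
        try:
            mtime = path.stat().st_mtime
        except OSError:
            # The file vanished or is unreadable, so it cannot be the transcript.
            continue
        # Skip sessions last touched before this task began (minus tolerance).
        if started_at > 0.0 and mtime < started_at - tolerance_seconds:
            continue
        fresh.append((mtime, path))
    if not fresh:
        return None
    # Prefer the most recently modified survivor.
    return max(fresh, key=lambda item: item[0])[1]
```

Returning None when every candidate is stale lets the caller fall through to the next strategy (here, the glob fallback) instead of archiving the wrong session.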

🤖 Generated with Claude Code

@kilo-code-bot
Contributor

kilo-code-bot Bot commented Apr 14, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Solid fix — the root cause is clearly identified and the guard is applied precisely where the stale session leak occurred. The tolerance_seconds=5.0 buffer and backwards-compatible default of started_at=0.0 are sensible choices.

Files Reviewed (2 files)
  • scripts/lib_agent.py
  • tasks/task_10_workflow.md

Reviewed by claude-4.6-sonnet-20260217 · 62,374 tokens

mgoulart and others added 2 commits April 13, 2026 23:40
OpenClaw and Claude Code serialize tool call parameters under 'arguments'
rather than 'params'. The read_config check was always returning 0 for
these agents even when the file was correctly read.

Fix: fall back to 'arguments' when 'params' is absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
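The fallback described in this commit message can be sketched as follows. This is an illustrative helper, not the grader code from the task file: the function name tool_call_params and the dictionary shape of a tool call are assumptions. Only the 'params' and 'arguments' keys come from the commit message.

```python
from typing import Any, Dict


def tool_call_params(tool_call: Dict[str, Any]) -> Dict[str, Any]:
    """Return a tool call's parameters regardless of which key holds them.

    Hypothetical helper: assumes a tool call is a dict that stores its
    parameters under 'params' or, for some agents, under 'arguments'.
    """
    params = tool_call.get("params")
    if params is None:
        # Fall back to 'arguments' when 'params' is absent.
        params = tool_call.get("arguments")
    # Normalize anything unexpected to an empty dict so callers can use .get().
    return params if isinstance(params, dict) else {}
```

A check such as read_config can then inspect the returned dict without caring which agent produced the transcript.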
…lver

When a task's agent continued writing async responses after its transcript
was archived, the new session file landed in sessions.json and was picked
up by the NEXT task's transcript resolver (strategy 1b). This caused the
next task to score 0% — it inherited the previous task's tail output and
its actual workspace was never evaluated.

Root cause: _find_transcript_path_from_sessions_store had no started_at
guard. The glob fallback (strategy 2) already filters by mtime, but
strategy 1b ran first and returned the stale path unconditionally.

Fix: pass started_at into _find_transcript_path_from_sessions_store and
skip any candidate whose mtime predates the task start (with 5s tolerance).

Observed in the wild: task_24_polymarket_briefing wrote polymarket_briefing.md
to a new async session (2276707f) after its transcript was archived.
task_25_access_log_anomaly's resolver found 2276707f via sessions.json and
scored 0/5 automated checks because anomaly_report.json was never written.
Re-running task_25 in isolation scored 100%.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@mgoulart mgoulart force-pushed the fix/stale-session-transcript-race branch from 2a02b91 to 8750d29 on April 14, 2026 03:40
@olearycrew
Member

@mgoulart thanks for this fix!

@olearycrew olearycrew merged commit 07b014c into pinchbench:main Apr 14, 2026
1 check passed