fix(cli): fix flush 413 infinite loop, add timeout/progress, fix status pending check#122

Merged
softberries merged 2 commits into main from
fix/flush-413-and-status-pending
Apr 23, 2026

@softberries
Member

Summary

  • 413 infinite retry loop: flush was re-enqueuing events that failed with 413 Payload Too Large, which would fail again forever. Now 413s are dropped with a clear warning.
  • Root cause fix: StreamEventRequest::truncate_large_fields() added to core. It drops transcript_lines, then tool_response, then tool_input, in that order, until the payload is under 512 KB. It is called in stream before writing to pending.jsonl, and in flush before each send (which covers already-queued oversized events).
  • Flush hangs: reqwest::Client::new() had no timeout. Added 60s timeout so a silent server fails fast instead of hanging indefinitely.
  • Flush progress: Added \r Session <id> — event N/M ... output per event so the user can see it's working.
  • Status pending check: session_checks was looking for .pushed marker files, which only the old batch push command ever created — so all sessions always showed as unpushed. Now counts non-empty pending.jsonl files instead, which is the actual streaming queue. Message updated to reference tracevault flush.
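
For readers skimming the truncation behaviour, here is a minimal sketch of the drop order described above. It is not the code from this PR: the struct layout, the field types, and the `approx_size` helper are assumptions made for illustration, and the real implementation probably measures the serialized request rather than summing field lengths.

```rust
/// Hypothetical stand-in for the real request type; the field types are assumptions.
#[derive(Debug, Default)]
struct StreamEventRequest {
    transcript_lines: Option<Vec<String>>,
    tool_response: Option<String>,
    tool_input: Option<String>,
}

/// Size limit named in the PR description.
const MAX_PAYLOAD_BYTES: usize = 512 * 1024;

impl StreamEventRequest {
    /// Rough payload size: total bytes across the three large fields.
    /// Assumption: the real code probably measures the serialized request.
    fn approx_size(&self) -> usize {
        let lines: usize = self
            .transcript_lines
            .as_ref()
            .map_or(0, |v| v.iter().map(|s| s.len()).sum());
        let response = self.tool_response.as_ref().map_or(0, |s| s.len());
        let input = self.tool_input.as_ref().map_or(0, |s| s.len());
        lines + response + input
    }

    /// Drops transcript_lines, then tool_response, then tool_input, stopping
    /// as soon as the estimate fits. Returns true if any field was dropped.
    fn truncate_large_fields(&mut self) -> bool {
        let mut dropped = false;
        if self.approx_size() > MAX_PAYLOAD_BYTES {
            dropped |= self.transcript_lines.take().is_some();
        }
        if self.approx_size() > MAX_PAYLOAD_BYTES {
            dropped |= self.tool_response.take().is_some();
        }
        if self.approx_size() > MAX_PAYLOAD_BYTES {
            dropped |= self.tool_input.take().is_some();
        }
        dropped
    }
}
```

Because the size is re-checked before each drop, an event whose transcript alone pushes it over the limit keeps its tool_response and tool_input.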

Test plan

  • Run tracevault stream with a large file write — confirm no 413 on next flush
  • Manually add an oversized entry to pending.jsonl, run tracevault flush — confirm it truncates and sends (or drops with clear message), does not loop
  • Kill server, run tracevault stream, restore server, run tracevault flush — confirm progress output and successful send
  • Disconnect network during flush — confirm it times out after ~60s instead of hanging
  • Run tracevault status — confirm pending count reflects pending.jsonl contents, not .pushed files
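
To make the last check concrete, the new status rule (count sessions whose pending.jsonl is non-empty, ignore .pushed markers) can be sketched as below. The directory layout (one subdirectory per session under a sessions root) and the function name `count_pending_sessions` are assumptions for illustration, not taken from the PR.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Counts session directories that contain a non-empty pending.jsonl.
/// Assumption: each session lives in its own subdirectory of `sessions_dir`.
fn count_pending_sessions(sessions_dir: &Path) -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(sessions_dir)? {
        let pending = entry?.path().join("pending.jsonl");
        // A missing file or an empty file both mean "nothing queued".
        if let Ok(meta) = fs::metadata(&pending) {
            if meta.is_file() && meta.len() > 0 {
                count += 1;
            }
        }
    }
    Ok(count)
}
```

A session that only has a .pushed marker, or an empty pending.jsonl, does not count as pending under this rule.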

Set update=none on both submodules so --recurse-submodules skips them.
Switch website to HTTPS so it can be initialized without an SSH key.
…g check

- Add StreamEventRequest::truncate_large_fields() to core; drop transcript
  lines, then tool_response, then tool_input until payload is under 512 KB
- Call truncation in stream before saving to pending.jsonl (root cause fix)
- Call truncation in flush before each send (fixes already-queued events)
- Don't re-enqueue 413 failures in flush — retrying oversized events loops forever
- Add per-event progress output to flush (\r overwrite) so it doesn't appear hung
- Add 60s timeout to reqwest client so flush can't hang indefinitely
- Fix status session check: count non-empty pending.jsonl instead of .pushed
  marker files (which only the old batch push command ever created); update
  message to reference `tracevault flush`
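
The flush changes in the list above can be illustrated with a small sketch. It is a simplified model, not the PR's code: the `FlushOutcome` enum, `classify`, and `flush_queue` are hypothetical names, the `send` closure stands in for the real HTTP call and returns only a status code, and re-enqueuing every other failure is an assumption inferred from the description.

```rust
/// What flush does with an event after a send attempt (hypothetical type).
#[derive(Debug, PartialEq)]
enum FlushOutcome {
    Sent,
    Dropped,
    Requeued,
}

/// Maps an HTTP status code to an outcome. A 413 is never retried,
/// because the same oversized payload would fail the same way again.
fn classify(status: u16) -> FlushOutcome {
    match status {
        200..=299 => FlushOutcome::Sent,
        413 => FlushOutcome::Dropped,
        // Assumption: every other failure is still re-enqueued for a later flush.
        _ => FlushOutcome::Requeued,
    }
}

/// Sends each event once, prints \r-overwritten progress to stderr, and
/// returns the events that should be written back to the queue.
fn flush_queue<F>(events: Vec<String>, mut send: F) -> Vec<String>
where
    F: FnMut(&str) -> u16,
{
    let total = events.len();
    let mut requeue = Vec::new();
    for (i, event) in events.into_iter().enumerate() {
        eprint!("\revent {}/{} ...", i + 1, total);
        match classify(send(&event)) {
            FlushOutcome::Sent => {}
            FlushOutcome::Dropped => {
                eprintln!("\nwarning: dropping event {}: 413 Payload Too Large", i + 1)
            }
            FlushOutcome::Requeued => requeue.push(event),
        }
    }
    if total > 0 {
        eprintln!(); // end the progress line
    }
    requeue
}
```

Since the returned queue never contains an event that got a 413, repeated flushes cannot keep retrying the same oversized event.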
@softberries softberries merged commit 154aa4c into main Apr 23, 2026
7 checks passed