
Stream-parse Codex session files to fix oversize-cap drops on heavy users#207

Merged
iamtoruk merged 1 commit into getagentseal:main from ozymandiashh:fix/codex-stream-large-sessions
May 3, 2026

Conversation

@ozymandiashh
Contributor

Context

Follow-up to #204. After installing 0.9.6 from Homebrew, both fixes from that issue (16 KB read cap + info: null estimation) work as advertised on my account. However, while testing with --verbose I noticed that one of my recent rollout files was being silently dropped:

codeburn: skipped oversize file /Users/.../sessions/2026/05/03/rollout-...jsonl
  (259470209 bytes > cap 134217728)

The session is 247 MB in a single .jsonl file. MAX_SESSION_FILE_BYTES = 128 MB in src/fs-utils.ts rejects it before any parsing happens, so the entire session disappears from the dashboard with no on-screen indication unless the user explicitly passes --verbose.

Heavy users on long-running Codex sessions (large file contents in context, multi-hour coding pushes) hit this in practice.

Why bumping the cap isn't enough

The Codex provider reads the file in full via readSessionFile and then does content.split('\n'). split roughly doubles the high-water memory while the new array of line strings is being built, so even with readViaStream keeping the read itself bounded, we'd push V8 toward its ~512 MB string limit on a single 250 MB session.

readSessionLines already exists in fs-utils.ts as the streaming counterpart and is used by readFirstLine. Memory there is bounded to one line at a time, which is what we want for full-file parsing too.
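For context, an async-generator line reader of the kind readSessionLines is described as can be sketched like this (illustrative only: the name streamLines and the use of node:readline are assumptions, not the actual fs-utils.ts code):

```typescript
import * as fs from 'node:fs'
import * as readline from 'node:readline'

// Hypothetical sketch: yield one line at a time so memory stays bounded
// regardless of total file size, unlike readFile + split('\n').
async function* streamLines(path: string): AsyncGenerator<string> {
  const rl = readline.createInterface({
    input: fs.createReadStream(path, { encoding: 'utf8' }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  })
  for await (const line of rl) {
    yield line
  }
}
```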

Changes

src/fs-utils.ts — introduce MAX_STREAM_SESSION_FILE_BYTES = 2 GB and apply it in readSessionLines instead of the full-read cap. The smaller MAX_SESSION_FILE_BYTES (128 MB) stays in place for the two consumers that materialize the whole file (readSessionFile, readSessionFileSync), where the V8 string limit is still a real constraint.
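A hedged sketch of how such a pre-stream size cap might look (the constant name and value come from this PR; the guard function around it is an illustrative assumption, not the actual fs-utils.ts implementation):

```typescript
import { statSync } from 'node:fs'

// Constant name/value from the PR; the surrounding guard is illustrative.
const MAX_STREAM_SESSION_FILE_BYTES = 2 * 1024 * 1024 * 1024 // 2 GB

// true when the path exists and is small enough to stream-parse
function withinStreamCap(path: string): boolean {
  try {
    return statSync(path).size <= MAX_STREAM_SESSION_FILE_BYTES
  } catch {
    return false // unreadable/missing files are skipped, like oversize ones
  }
}
```

Checking the size via stat before opening the stream keeps the oversize warning cheap: no bytes are read for a file that would be rejected anyway.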

src/providers/codex.ts — replace

const content = await readSessionFile(source.path)
if (content === null) return
const lines = content.split('\n').filter(l => l.trim())
for (const line of lines) { ... }

with

let sawAnyLine = false
for await (const rawLine of readSessionLines(source.path)) {
  sawAnyLine = true
  const line = rawLine.trim()
  if (!line) continue
  ...
}
if (!sawAnyLine) return  // preserves early-return on read failure

The sawAnyLine guard means a failed/oversized/empty stream still skips the cache write, so a transient read failure can't pin an empty result set against a fingerprint that would otherwise be re-parsed on the next run.
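The guard's semantics can be modeled in isolation (parsedAnything and emptyStream are hypothetical names for illustration, not code from this PR):

```typescript
// Minimal model of the sawAnyLine guard: an empty/failed stream yields
// nothing, so the consumer can distinguish "no data at all" (skip the
// cache write) from "data processed".
async function* emptyStream(): AsyncGenerator<string> {}

async function parsedAnything(lines: AsyncIterable<string>): Promise<boolean> {
  let sawAnyLine = false
  for await (const _line of lines) {
    sawAnyLine = true
  }
  return sawAnyLine
}
```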

Empirical impact

On my account with one 247 MB rollout file in the 7-day window:

|               | Before (0.9.6 stock) | After this PR | Δ        |
|---------------|----------------------|---------------|----------|
| Cost          | €358.69              | €550.67       | +€191.98 |
| Calls         | 4,536                | 6,111         | +1,575   |
| Sessions      | 61                   | 62            | +1       |
| Input tokens  | 20.1 M               | 37.3 M        | +17.2 M  |
| Output tokens | 1.91 M               | 2.57 M        | +0.66 M  |
| Cache read    | 477 M                | 702 M         | +225 M   |

Sessions under 128 MB show identical numbers before and after, as expected.

Test plan

  • npm run build clean
  • npm test — 31 files / 419 tests pass
  • node dist/cli.js report --period week --provider codex --format json runs to completion on a directory containing the 247 MB rollout
  • CODEBURN_VERBOSE=1 no longer prints skipped oversize file for that path
  • CODEBURN_VERBOSE=1 does still print the warning if a file ever exceeds 2 GB (synthetic test by temporarily lowering MAX_STREAM_SESSION_FILE_BYTES locally)
  • Smaller sessions still parse with identical results to the previous code path

Notes

  • I deliberately kept MAX_SESSION_FILE_BYTES and the readSessionFile / readSessionFileSync exports unchanged. Other providers may rely on them and the V8 string-limit reasoning still applies there. Only the streaming path is allowed to grow.
  • Happy to fold in a similar conversion for any other provider that's seeing the same warning, but didn't want to scope-creep this PR.

Heavy Codex users hit MAX_SESSION_FILE_BYTES (128 MB) on long-running
sessions. The file is read in full via readSessionFile and then split on
'\n', so even bumping the cap eventually runs into V8's 512 MB string
limit (split doubles the high-water mark).

readSessionLines is a streaming generator that already exists in
fs-utils for exactly this case but only readFirstLine was using it.
Switch the Codex provider to consume it and let the cap apply only when
streaming would still be unreasonable.

Changes:
- src/fs-utils.ts: introduce MAX_STREAM_SESSION_FILE_BYTES (2 GB) and
  apply it in readSessionLines instead of the full-read cap. Keep
  MAX_SESSION_FILE_BYTES for readSessionFile / readSessionFileSync
  consumers that materialize the whole file.
- src/providers/codex.ts: replace `readSessionFile -> split('\n')` with
  `for await (... of readSessionLines)`. Add sawAnyLine guard so a
  failed/empty stream skips cache write, preserving the previous
  early-return behavior.

Empirical impact on a real account with one 247 MB rollout: 7-day totals
went from 4,536 calls / €358.69 / 20.1M input tokens to 6,111 calls /
€550.67 / 37.3M input tokens. The previously-skipped session is now
included; no other behavior changes.

Refs getagentseal#204
Member

@iamtoruk iamtoruk left a comment


Clean, well-scoped fix. Streaming path is behaviorally equivalent for all practical cases — the two minor deltas (empty-file and partial-error caching) both favor the new code. 2 GB cap is numerically safe, fingerprint race is pre-existing and unchanged, memory improvement is substantial for the target use case. LGTM.

@iamtoruk iamtoruk merged commit ac8081b into getagentseal:main May 3, 2026
2 of 3 checks passed
