
Stream-parse Codex session files to fix oversize-cap drops on heavy users#207

Merged
iamtoruk merged 1 commit into getagentseal:main from ozymandiashh:fix/codex-stream-large-sessions
May 3, 2026

Conversation

@ozymandiashh
Contributor

Context

Follow-up to #204. After installing 0.9.6 from Homebrew, both fixes from that issue (16 KB read cap + info: null estimation) work as advertised on my account. However, while testing with --verbose I noticed that one of my recent rollout files was being silently dropped:

codeburn: skipped oversize file /Users/.../sessions/2026/05/03/rollout-...jsonl
  (259470209 bytes > cap 134217728)

The session is 247 MB in a single .jsonl file. MAX_SESSION_FILE_BYTES = 128 MB in src/fs-utils.ts rejects it before any parsing happens, so the entire session disappears from the dashboard with no on-screen indication unless the user explicitly passes --verbose.

Heavy users on long-running Codex sessions (large file contents in context, multi-hour coding pushes) hit this in practice.

Why bumping the cap isn't enough

The Codex provider reads the file in full via readSessionFile and then does content.split('\n'). split roughly doubles the high-water memory while the new array of line strings is being built, so even with readViaStream keeping the read itself bounded, we'd push V8 toward its ~512 MB string limit on a single 250 MB session.

readSessionLines already exists in fs-utils.ts as the streaming counterpart and is used by readFirstLine. Memory there is bounded to one line at a time, which is what we want for full-file parsing too.
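For context, an async-generator line reader of the kind readSessionLines is described as can be sketched like this (illustrative only: the name streamLines and the use of node:readline are assumptions, not the actual fs-utils.ts code):

```typescript
import * as fs from 'node:fs'
import * as readline from 'node:readline'

// Hypothetical sketch: yield one line at a time so memory stays bounded
// regardless of total file size, unlike readFile + split('\n').
async function* streamLines(path: string): AsyncGenerator<string> {
  const rl = readline.createInterface({
    input: fs.createReadStream(path, { encoding: 'utf8' }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  })
  for await (const line of rl) {
    yield line
  }
}
```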

Changes

src/fs-utils.ts — introduce MAX_STREAM_SESSION_FILE_BYTES = 2 GB and apply it in readSessionLines instead of the full-read cap. The smaller MAX_SESSION_FILE_BYTES (128 MB) stays in place for the two consumers that materialize the whole file (readSessionFile, readSessionFileSync), where the V8 string limit is still a real constraint.
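A hedged sketch of how such a pre-stream size cap might look (the constant name and value come from this PR; the guard function around it is an illustrative assumption, not the actual fs-utils.ts implementation):

```typescript
import { statSync } from 'node:fs'

// Constant name/value from the PR; the surrounding guard is illustrative.
const MAX_STREAM_SESSION_FILE_BYTES = 2 * 1024 * 1024 * 1024 // 2 GB

// true when the path exists and is small enough to stream-parse
function withinStreamCap(path: string): boolean {
  try {
    return statSync(path).size <= MAX_STREAM_SESSION_FILE_BYTES
  } catch {
    return false // unreadable/missing files are skipped, like oversize ones
  }
}
```

Checking the size via stat before opening the stream keeps the oversize warning cheap: no bytes are read for a file that would be rejected anyway.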

src/providers/codex.ts — replace

const content = await readSessionFile(source.path)
if (content === null) return
const lines = content.split('\n').filter(l => l.trim())
for (const line of lines) { ... }

with

let sawAnyLine = false
for await (const rawLine of readSessionLines(source.path)) {
  sawAnyLine = true
  const line = rawLine.trim()
  if (!line) continue
  ...
}
if (!sawAnyLine) return  // preserves early-return on read failure

The sawAnyLine guard means a failed/oversized/empty stream still skips the cache write, so a transient read failure can't pin an empty result set against a fingerprint that would otherwise be re-parsed on the next run.
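The guard's semantics can be modeled in isolation (parsedAnything and emptyStream are hypothetical names for illustration, not code from this PR):

```typescript
// Minimal model of the sawAnyLine guard: an empty/failed stream yields
// nothing, so the consumer can distinguish "no data at all" (skip the
// cache write) from "data processed".
async function* emptyStream(): AsyncGenerator<string> {}

async function parsedAnything(lines: AsyncIterable<string>): Promise<boolean> {
  let sawAnyLine = false
  for await (const _line of lines) {
    sawAnyLine = true
  }
  return sawAnyLine
}
```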

Empirical impact

On my account with one 247 MB rollout file in the 7-day window:

|               | Before (0.9.6 stock) | After this PR | Δ        |
|---------------|----------------------|---------------|----------|
| Cost          | €358.69              | €550.67       | +€191.98 |
| Calls         | 4,536                | 6,111         | +1,575   |
| Sessions      | 61                   | 62            | +1       |
| Input tokens  | 20.1 M               | 37.3 M        | +17.2 M  |
| Output tokens | 1.91 M               | 2.57 M        | +0.66 M  |
| Cache read    | 477 M                | 702 M         | +225 M   |

Sessions under 128 MB show identical numbers before and after, as expected.

Test plan

  • npm run build clean
  • npm test — 31 files / 419 tests pass
  • node dist/cli.js report --period week --provider codex --format json runs to completion on a directory containing the 247 MB rollout
  • CODEBURN_VERBOSE=1 no longer prints skipped oversize file for that path
  • CODEBURN_VERBOSE=1 does still print the warning if a file ever exceeds 2 GB (synthetic test by temporarily lowering MAX_STREAM_SESSION_FILE_BYTES locally)
  • Smaller sessions still parse with identical results to the previous code path

Notes

  • I deliberately kept MAX_SESSION_FILE_BYTES and the readSessionFile / readSessionFileSync exports unchanged. Other providers may rely on them and the V8 string-limit reasoning still applies there. Only the streaming path is allowed to grow.
  • Happy to fold in a similar conversion for any other provider that's seeing the same warning, but didn't want to scope-creep this PR.

Heavy Codex users hit MAX_SESSION_FILE_BYTES (128 MB) on long-running
sessions. The file is read in full via readSessionFile and then split on
'\n', so even bumping the cap eventually runs into V8's 512 MB string
limit (split doubles the high-water mark).

readSessionLines is a streaming generator that already exists in
fs-utils for exactly this case but only readFirstLine was using it.
Switch the Codex provider to consume it and let the cap apply only when
streaming would still be unreasonable.

Changes:
- src/fs-utils.ts: introduce MAX_STREAM_SESSION_FILE_BYTES (2 GB) and
  apply it in readSessionLines instead of the full-read cap. Keep
  MAX_SESSION_FILE_BYTES for readSessionFile / readSessionFileSync
  consumers that materialize the whole file.
- src/providers/codex.ts: replace `readSessionFile -> split('\n')` with
  `for await (... of readSessionLines)`. Add sawAnyLine guard so a
  failed/empty stream skips cache write, preserving the previous
  early-return behavior.

Empirical impact on a real account with one 247 MB rollout: 7-day totals
went from 4,536 calls / €358.69 / 20.1M input tokens to 6,111 calls /
€550.67 / 37.3M input tokens. The previously-skipped session is now
included; no other behavior changes.

Refs getagentseal#204
Member

@iamtoruk iamtoruk left a comment


Clean, well-scoped fix. Streaming path is behaviorally equivalent for all practical cases — the two minor deltas (empty-file and partial-error caching) both favor the new code. 2 GB cap is numerically safe, fingerprint race is pre-existing and unchanged, memory improvement is substantial for the target use case. LGTM.

@iamtoruk iamtoruk merged commit ac8081b into getagentseal:main May 3, 2026
2 of 3 checks passed
