test: add Codex JSONL shape benchmark by hhh2210 · Pull Request #1017 · steipete/CodexBar

hhh2210 · 2026-05-17T18:27:39Z

This adds an RFC benchmark/survey harness for the Codex JSONL scanner workload:

CostUsageJsonlShapeBenchmarkTests builds a synthetic JSONL file shaped after a local 30-day Codex sessions sample, then compares the current streaming scanner with a front-buffer baseline.
Scripts/cost_jsonl_shape_survey.swift lets maintainers collect the same aggregate shape numbers locally without printing prompts, tool payloads, model values, file paths, or raw JSONL rows.
A small end-to-end parser assertion covers the relevant attribution shape: a turn_context row followed by token_count rows that do not carry an explicit model.
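
The attribution shape in the last item can be illustrated with a minimal sketch. The row layout (`type`, `payload.model`, `payload.total_tokens`) and the names `AttributedUsage` and `attributeTokenCounts` are simplified assumptions for illustration; they are not the scanner's actual types or the full Codex schema:

```swift
import Foundation

// Hypothetical result type; the real scanner's types are not shown in this PR.
struct AttributedUsage: Equatable {
    let model: String
    let totalTokens: Int
}

// Walks JSONL rows in order, remembers the model from the latest turn_context
// row, and applies it to token_count rows that carry no model of their own.
func attributeTokenCounts(in jsonl: String) -> [AttributedUsage] {
    var currentModel: String?
    var usages: [AttributedUsage] = []
    for line in jsonl.split(separator: "\n") {
        guard let row = try? JSONSerialization.jsonObject(with: Data(line.utf8)) as? [String: Any],
              let type = row["type"] as? String,
              let payload = row["payload"] as? [String: Any]
        else { continue }

        if type == "turn_context" {
            // Keep the previous model if this row happens to lack one.
            currentModel = payload["model"] as? String ?? currentModel
        } else if type == "event_msg", payload["type"] as? String == "token_count",
                  let tokens = payload["total_tokens"] as? Int,
                  let model = payload["model"] as? String ?? currentModel {
            usages.append(AttributedUsage(model: model, totalTokens: tokens))
        }
    }
    return usages
}
```

Rows that appear before any turn_context and carry no model are skipped in this sketch; the real scanner may handle that case differently.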

The benchmark intentionally does not assert a speed threshold: it is meant to make scanner tradeoffs reproducible without making CI sensitive to runner noise. The long turn_context correctness fix is handled separately in #1014; this PR only provides measurement and workload-shape coverage for #1016.
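
As a rough illustration of measuring without a threshold, a throughput helper might look like the sketch below. The function name `bestThroughputMBps`, the best-of-three policy, and the newline-counting workload are assumptions for illustration; the actual test harness is not reproduced here:

```swift
import Foundation

// Runs `scan` a few times and returns the best observed throughput in MB/s
// (1 MB = 1_000_000 bytes here, which is an assumption about the units).
func bestThroughputMBps(bytes: Int, iterations: Int = 3, _ scan: () -> Void) -> Double {
    var bestSeconds = Double.infinity
    for _ in 0..<max(iterations, 1) {
        let start = DispatchTime.now().uptimeNanoseconds
        scan()
        let elapsed = DispatchTime.now().uptimeNanoseconds - start
        // Clamp to 1 ns so a very fast run cannot divide by zero.
        bestSeconds = min(bestSeconds, Double(max(elapsed, 1)) / 1_000_000_000)
    }
    return Double(bytes) / 1_000_000 / bestSeconds
}

// Stand-in workload: count newline bytes in a synthetic buffer.
let buffer = [UInt8](repeating: UInt8(ascii: "\n"), count: 1_000_000)
var newlineCount = 0
let rate = bestThroughputMBps(bytes: buffer.count) {
    newlineCount = buffer.reduce(0) { $0 + ($1 == 0x0A ? 1 : 0) }
}

// Report the number; deliberately no assertion on how fast it was.
let formattedRate = String(format: "%.1f", rate)
print("newlines=\(newlineCount) throughput=\(formattedRate)MB/s")
```

Only the correctness of the scan result is checked; the throughput is printed for humans to compare across runs.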

Local aggregate survey from my default Codex sessions over 30 days:

files: 532
total bytes: 1202851897
lines: 146748
relevant Codex scanner lines: 57418
line length p50/p90/p95/p99/max: 651 / 5065 / 12080 / 52701 / 6904210 bytes
lines > 32 KiB: 2602
lines > 256 KiB: 697
turn_context lines: 1945
turn_context lines > 32 KiB: 208
turn_context lines > 256 KiB: 0
turn_context model offset p50/p95/max: 443 / 2255 / 2382 bytes
turn_context model offset < 32 KiB: 1945 / 1945
turn_context model offset < 256 KiB: 1945 / 1945
token_count rows missing an explicit model: 22381 / 22381
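
For readers reproducing the p50/p90/p95/p99 figures, one common definition is the nearest-rank percentile. The sketch below assumes that definition; the survey script's actual method is not shown in this PR, and the sample lengths are made up:

```swift
// Nearest-rank percentile over an ascending-sorted array, using integer math.
// Returns nil for an empty input or a percent outside 1...100.
func nearestRankPercentile(_ percent: Int, ofSorted values: [Int]) -> Int? {
    guard !values.isEmpty, (1...100).contains(percent) else { return nil }
    let rank = (percent * values.count + 99) / 100  // ceil(percent / 100 * n)
    return values[rank - 1]
}

// Made-up line lengths in bytes, sorted before querying.
let lineLengths = [700, 120, 40_000, 651, 5_065].sorted()
let p50 = nearestRankPercentile(50, ofSorted: lineLengths)
let p90 = nearestRankPercentile(90, ofSorted: lineLengths)
print("p50=\(p50 ?? 0) p90=\(p90 ?? 0)")
```

Integer arithmetic avoids floating-point rounding at rank boundaries, which matters when the percentile falls exactly on an element.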

Local benchmark sample:

Codex JSONL shape benchmark: divisor=20 bytes=19621792 lines=7290 truncated=129 current=113.9MB/s baseline=46.9MB/s speedup=2.4x
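
The arithmetic behind that line can be checked with a short sketch. The inputs 145_797 and 2584 come from the `CodexJsonlShapePlan` values quoted in the review diff; reading `truncated` as the scaled count of lines over 32 KiB, and rounding to nearest, are inferences from the numbers rather than something the PR states:

```swift
import Foundation

// Scales a full-sample count down by `divisor`, rounding to the nearest integer.
// Rounding (rather than truncating) is an assumption about the harness.
func scaledCount(_ count: Int, divisor: Int) -> Int {
    Int((Double(count) / Double(divisor)).rounded())
}

// Formats a throughput ratio the way the sample line prints it, e.g. "2.4x".
func speedupLabel(current: Double, baseline: Double) -> String {
    String(format: "%.1fx", current / baseline)
}

let scaledLines = scaledCount(145_797, divisor: 20)
let scaledOver32KiB = scaledCount(2_584, divisor: 20)
let label = speedupLabel(current: 113.9, baseline: 46.9)
print("lines=\(scaledLines) over32KiB=\(scaledOver32KiB) speedup=\(label)")
```

The results match the reported `lines=7290`, `truncated=129`, and `speedup=2.4x`.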

Validation:

swift test --filter CostUsageJsonlShapeBenchmarkTests
Scripts/cost_jsonl_shape_survey.swift --days 30
./Scripts/lint.sh lint
git diff --check

Copilot

Pull request overview

Adds a reproducible benchmark + local survey harness to characterize Codex JSONL (NDJSON) scanner workload shape and compare the current streaming scanner against a front-buffer baseline, plus a small end-to-end attribution assertion for model-less token_count rows.

Changes:

Add CostUsageJsonlShapeBenchmarkTests to generate a synthetic JSONL fixture matching a 30-day Codex session “shape” and benchmark scanning throughput vs a baseline implementation.
Add Scripts/cost_jsonl_shape_survey.swift to scan local Codex session logs and print aggregate shape stats while redacting sensitive content.
Add a parser assertion ensuring turn_context model attribution is applied to subsequent token_count rows that omit an explicit model.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
Tests/CodexBarTests/CostUsageJsonlShapeBenchmarkTests.swift	New synthetic shape fixture + benchmark harness + attribution regression assertion.
Scripts/cost_jsonl_shape_survey.swift	New standalone Swift script to survey local JSONL shape metrics with redacted output.

Comments suppressed due to low confidence (1)

Scripts/cost_jsonl_shape_survey.swift:288

The error paths also call Foundation.exit(...) and use stderr; both typically come from Darwin/Glibc rather than Foundation. Consider importing the appropriate C module (or using FileHandle.standardError) and calling exit(...) directly so the script compiles cross-platform.

} catch let error as UsageError {
    fputs("error: \(error.description)\n\n", stderr)
    printUsage()
    Foundation.exit(2)
} catch {
    fputs("error: \(error.localizedDescription)\n", stderr)
    Foundation.exit(1)
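
One way to follow this suggestion is sketched below. `UsageError` here is a hypothetical stand-in for the script's own type, and `writeError` and `exitStatus(for:)` are illustrative helper names that do not appear in the PR:

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif
import Foundation

// Hypothetical stand-in for the script's UsageError type.
struct UsageError: Error {
    let description: String
}

// Writes a message to standard error without relying on the C `stderr` global.
func writeError(_ message: String) {
    FileHandle.standardError.write(Data("error: \(message)\n".utf8))
}

// Maps an error to the exit status used in the quoted code:
// 2 for usage errors, 1 for everything else.
func exitStatus(for error: Error) -> Int32 {
    error is UsageError ? 2 : 1
}

// In the script's catch blocks this could be used as:
//     writeError(error.localizedDescription)
//     exit(exitStatus(for: error))
```

Keeping the status mapping in a pure function also makes it testable without terminating the process.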


+        case "--help", "-h":
+            printUsage()
+            Foundation.exit(0)
+        default:


+
+func printSummary(_ survey: Survey, options: SurveyOptions) {
+    print("root: \(redactedRootDescription(options.root))")
+    print("window days: \(Int(options.days))")


+private struct CodexJsonlShapePlan {
+    static let localThirtyDaySample = CodexJsonlShapePlan(
+        totalLines: 145_797,
+        relevantLines: 57063,
+        tokenCountWithoutModelLines: 22235,
+        turnContextLines: 1935,
+        longTurnContextLines: 207,
+        linesOver32KiB: 2584,
+        linesOver256KiB: 697)


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 10366add30


chatgpt-codex-connector · 2026-05-17T18:32:40Z

+        let isRelevant = line.contains(Marker.eventMessage)
+            || line.contains(Marker.turnContext)
+            || line.contains(Marker.sessionMetadata)


Restrict relevant-line counting to scanned event subtypes

The survey currently increments relevantLines for every "type":"event_msg" row, but the actual Codex scanner only keeps event_msg rows when they contain token_count or task_started (Sources/CodexBarCore/Vendored/CostUsage/CostUsageScanner.swift lines 805–809). As a result, the reported relevant Codex scanner lines metric is inflated whenever session logs contain other event messages, which skews the shape numbers this benchmark is meant to reproduce.
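
A minimal sketch of the narrower check described above follows. The marker strings and the `isScannerRelevant` name are hypothetical; the real `Marker` constants and the scanner's exact matching rules are not shown in the diff:

```swift
import Foundation

// Hypothetical marker strings; the real `Marker` constants are not shown in the diff.
enum Marker {
    static let eventMessage = "\"type\":\"event_msg\""
    static let turnContext = "\"type\":\"turn_context\""
    static let sessionMetadata = "\"type\":\"session_meta\""
    static let tokenCount = "\"token_count\""
    static let taskStarted = "\"task_started\""
}

// Counts turn_context and session metadata rows unconditionally, but counts
// event_msg rows only when they mention token_count or task_started.
func isScannerRelevant(_ line: String) -> Bool {
    if line.contains(Marker.turnContext) || line.contains(Marker.sessionMetadata) {
        return true
    }
    guard line.contains(Marker.eventMessage) else { return false }
    return line.contains(Marker.tokenCount) || line.contains(Marker.taskStarted)
}
```

Because this is plain substring matching, a row whose payload text happens to contain one of the markers would still be counted; that limitation is shared with the quoted `line.contains(...)` approach.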


Add Codex JSONL shape benchmark

10366ad

hhh2210 mentioned this pull request May 17, 2026

Explore faster local JSONL scanning for cost history #1016

Open

hhh2210 marked this pull request as ready for review May 17, 2026 18:28

Copilot AI review requested due to automatic review settings May 17, 2026 18:28

Copilot started reviewing on behalf of hhh2210 May 17, 2026 18:29

Copilot AI reviewed May 17, 2026

chatgpt-codex-connector Bot reviewed May 17, 2026

steipete merged commit dd2b3cc into steipete:main May 18, 2026
7 of 8 checks passed

steipete merged 1 commit into steipete:main from hhh2210:codex/jsonl-shape-benchmark
