perf: SIMD writeEscaped + telemetry early exit + bench Heisenbench fix#300
perf: SIMD writeEscaped + telemetry early exit + bench Heisenbench fix#300
Conversation
… handleCall The 3-block response assembler was calling the local scalar writeEscaped (1-byte inner loop) for summary, raw data, and guidance. Switch to mcpj.writeEscaped (mcp-zig SIMD, 16-byte vectors) which is already imported — meaningful gain on large tool outputs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
recordToolCall called getRssKb() unconditionally before passing to record(), which guards on self.enabled. Move the early-exit to recordToolCall so --no-telemetry users pay zero syscall overhead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BenchContext.runToolCall previously returned usize (response bytes), so runCase timed the entire call including telem.recordToolCall() and response assembly. With the telemetry early-exit added in 2ad595e, the telem.recordToolCall() on disabled runs shrank from ~25µs (getrusage syscall) to ~0µs, making status appear 35% slower — a Heisenbench. Fix: runToolCall now returns struct { dispatch_ns, response_bytes }. runCase accumulates dispatch_ns (the inner nanoTimestamp delta around dispatch() only). The outer wall-clock timer is removed entirely. Also: BenchContext.runToolCall now uses mcpj.writeEscaped (SIMD) for the bench path, matching handleCall. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Benchmark Regression ReportThreshold: 10.00%
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ef54b6167b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| result.appendSlice(alloc, if (is_error) "],\"isError\":true}" else "],\"isError\":false}") catch return result.items.len; | ||
| return result.items.len; | ||
| result.appendSlice(alloc, if (is_error) "],\"isError\":true}" else "],\"isError\":false}") catch return .{ .dispatch_ns = @intCast(elapsed), .response_bytes = result.items.len }; | ||
| return .{ .dispatch_ns = @intCast(elapsed), .response_bytes = result.items.len }; |
There was a problem hiding this comment.
Switch dispatch timing to monotonic nanoseconds
runToolCall now returns dispatch_ns via @intCast(elapsed), but elapsed is derived from cio.nanoTimestamp() (wall-clock CLOCK_REALTIME) and can go negative or jump during NTP/manual clock adjustments; in that case the cast to u64 traps and aborts zig build bench, and even without a trap it can skew benchmark latencies well beyond the 10% threshold. This was introduced when runCase started trusting dispatch_ns instead of its previous monotonic timer path.
Useful? React with 👍 / 👎.
Summary
Three targeted perf/fix commits cherry-picked cleanly onto current main.
writeEscapedinhandleCall: replaces scalar per-byte loop withmcpj.writeEscaped(16-byte SIMD batches from mcp-zig) in the hot response-assembly path andBenchContext.runToolCall. -3 to -8% on tree/outline.recordToolCallreturns immediately when telemetry is disabled, skipping thegetrusagesyscall. Zero cost for--no-telemetryusers.BenchContext.runToolCallpreviously returnedusize, sorunCasetimed the full call includingtelem.recordToolCall(). When the telemetry early exit removed thegetrusagefrom the timed window,codedb_statusappeared 35% slower. Fixed by returningstruct { dispatch_ns, response_bytes }and accumulating only the inner dispatch timer.Benchmark (dispatch-only, 22-file corpus, 100 iters, current main baseline)
codedb_treecodedb_outlinecodedb_searchcodedb_bundlecodedb_statuszig buildandzig build benchboth clean. No regressions.Test plan
zig buildcompiles clean on macOS arm64zig build benchruns and outputs consistent latency table🤖 Generated with Claude Code