Skip to content

perf(scan): rewrite walker hot path with sync I/O + manual string ops#805

Merged
BYK merged 1 commit intomainfrom
byk/perf/walker-optimize
Apr 21, 2026
Merged

perf(scan): rewrite walker hot path with sync I/O + manual string ops#805
BYK merged 1 commit intomainfrom
byk/perf/walker-optimize

Conversation

@BYK
Copy link
Copy Markdown
Member

@BYK BYK commented Apr 21, 2026

Summary

Walker was 411ms p50 for 10k files on the synthetic/large fixture — roughly 10× slower than a naive sync walk (50ms). Follow-up to PR #791 and #804.

Five targeted optimizations in the hot loop, all micro-benchmarked before landing:

Change Measured win End-to-end win
readdirSync instead of readdir 105→45ms (−60ms) dominant
statSync instead of Bun.file().size −10ms per 10k files
String concat for paths path.join 7ms → concat 0.7ms (10×)
abs.slice(cwdPrefixLen) for relPath path.relative 9ms → slice 0.8ms (11×)
Manual lastIndexOf-based extname 9ms → 7ms (25%)

Perf results (synthetic/large, 10k files, p50)

Op Before After Δ
scan.walk 411ms 231ms −44%
scan.walk.noExt 572ms 448ms −22%
scan.walk.dsnParity 228ms 138ms −39%
scanCodeForDsns 323ms 304ms −6%
detectAllDsns.cold 327ms 308ms −6%
detectAllDsns.warm 27.9ms 27.0ms
scan.grepFiles 322ms 316ms noise
scanCodeForFirstDsn 2.23ms 2.05ms

Walker ops are 22-44% faster. Downstream ops (grep, DSN scanner) benefit less because their time is dominated by content scanning, not walking — but still show consistent ~6% improvements.

Why readdirSync is safe here

The sync vs async tradeoff usually favors async because blocking the event loop is bad in general. But measured per-call cost matters:

Metric readdirSync on 3635 dirs
p50 11µs
p95 24µs
p99 35µs
max 65µs

65µs max block is trivial — setTimeout(0) latency in Node is ~4ms. Blocking for 65µs never causes noticeable event-loop pauses. And we pay ~60µs of microtask overhead for each async readdir, which wipes out any theoretical fairness benefit. Net: 2-3× faster per-dir on walks with many small directories, which is every realistic CLI workload.

If this ever matters for a weird embedded use case, the optimization is trivially reversible — the walker's public API is unchanged.

mtime parity fix

Initial implementation regressed detectAllDsns.warm from 28ms → 304ms because statSync().mtimeMs is a float (e.g. 1776790602458.1033) while Bun.file().lastModified is already an integer. The DSN cache validator compares floored sourceMtimes, so un-floored floats caused cache misses on every warm call. Fixed by flooring explicitly in tryYieldFile — matches the same treatment already applied to onDirectoryVisit's dirMtimes.

Walker v2 design notes

  • Sync readdir + async yield. Readdir blocks briefly (~65µs max), but the surrounding async-generator interface is preserved. Library consumers that insert async work between iterations still get cooperative scheduling.
  • Path ops inlined. path.join / path.relative / path.extname + toLowerCase are all replaced with manual string ops in the hot loop. On Windows, normalizePath is still applied via replaceAll(NATIVE_SEP, "/") — the POSIX fast path uses a cached POSIX_NATIVE module constant.
  • WalkContext.cwdPrefixLen precomputed once per walk, used to slice relative paths from absolute paths.
  • Cached module constants NATIVE_SEP and POSIX_NATIVE avoid per-call path.sep property lookups in the hot loop.

What this PR does NOT change

  • WalkEntry shape: preserved identically (absolutePath, relativePath, size, mtime, isBinary, depth).
  • WalkOptions contract: no new options, no semantics changes.
  • Symlink handling, followSymlinks, cycle detection via visitedInodes: unchanged.
  • IgnoreStack + gitignore semantics: unchanged.
  • onDirectoryVisit, recordMtimes, abortSignal: unchanged.

Test plan

  • bunx tsc --noEmit — clean
  • bun run lint — clean (1 pre-existing warning in src/lib/formatters/markdown.ts unrelated)
  • bun test --timeout 15000 test/lib test/commands test/types5640 pass, 0 fail
  • bun test test/isolated — 138 pass
  • bun test test/lib/scan/walker.test.ts — 34 pass (incl. property tests covering hidden files, symlinks, maxDepth, gitignore interaction)
  • bun test test/lib/dsn/code-scanner.test.ts — 52 pass (incl. dirMtimes / sourceMtimes cache validation)
  • DSN count correctness verified end-to-end: 4 DSNs found on fixture (matches pre-change count)
  • bun run bench --size large --runs 5 --warmup 2 — results in table above

🤖 Generated with Claude Code

Co-authored-by: Claude Opus 4.7 (1M context) noreply@anthropic.com

Walker was 411ms p50 for 10k files on synthetic/large fixture —
roughly 10× slower than a naive sync walk (50ms). Profiling showed
the overhead split roughly as: async readdir 167ms, per-file stat
via Bun.file().size 44ms, IgnoreStack.isIgnored 65ms, path
operations (join/relative/extname) 30ms, and small control-flow
overhead.

Five targeted optimizations in the hot loop, all micro-benchmarked
before landing:

1. **Sync readdir** — `readdirSync` instead of `readdir`. Per-call
   cost measured p50 11µs / p95 24µs / max 65µs over 3635 dirs in
   the fixture. Blocking the event loop for max 65µs is trivially
   safe (setTimeout(0) latency is ~4ms) AND avoids the ~60µs
   microtask overhead each async readdir incurs. Net: 105→45ms.

2. **statSync instead of Bun.file().size** — measured ~15% faster
   per call (30ms vs 36ms for 10k files). When `recordMtimes: true`,
   the same statSync result serves both size and mtime reads — one
   syscall instead of two. Net: ~10ms saved.

3. **String concat for paths** — `frame.absDir + NATIVE_SEP + entry.name`
   instead of `path.join(...)`. Inputs are already clean (absDir is
   absolute without trailing slash, name is a pure basename per
   dirent semantics), so the normalization path.join does is wasted
   work. Measured 10× faster (7ms vs 0.7ms for 13k calls).

4. **slice for relativePath** — `abs.slice(cwdPrefixLen)` instead of
   `path.relative(cwd, abs)`. Safe because every abs is guaranteed
   under cwd by construction. Measured 11× faster (9ms vs 0.8ms).
   Windows: `normalizePath` fallback via `replaceAll(NATIVE_SEP, "/")`.

5. **Manual extname** — `name.lastIndexOf(".")` + slice + toLowerCase
   instead of `path.extname(name).toLowerCase()`. Measured 25% faster
   (9ms vs 7ms for 13k calls).

Also: precompute `cwdPrefixLen` once on `WalkContext` instead of
recomputing `cfg.cwd.length + 1` per entry. Cache `path.sep` /
`POSIX_NATIVE` at module scope to avoid per-call property lookups.

## mtime parity fix

Initial implementation regressed `detectAllDsns.warm` from 28ms →
304ms because `statSync().mtimeMs` is a float (e.g.
`1776790602458.1033`) while `Bun.file().lastModified` is already an
integer. The DSN cache validator compares floored `sourceMtimes`,
so un-floored floats caused cache misses on every warm call. Fixed
by flooring explicitly in `tryYieldFile` — matches the same
treatment already applied to `onDirectoryVisit`'s dirMtimes.

## Perf (synthetic/large, 10k files, p50)

| Op | Before | After | Δ |
|---|---:|---:|---:|
| `scan.walk` | 411ms | **231ms** | **−44%** |
| `scan.walk.noExt` | 572ms | 448ms | **−22%** |
| `scan.walk.dsnParity` | 228ms | **138ms** | **−39%** |
| `scanCodeForDsns` | 323ms | 304ms | −6% |
| `detectAllDsns.cold` | 327ms | 308ms | −6% |
| `detectAllDsns.warm` | 27.9ms | 27.0ms | — |
| `scan.grepFiles` | 322ms | 316ms | noise |

Walker ops are 22-44% faster. Downstream ops (grep, DSN scanner)
benefit less because their time is dominated by content scanning,
not walking — but still show consistent ~6% improvements.

## Test plan

- [x] `bunx tsc --noEmit` — clean
- [x] `bun run lint` — clean (1 pre-existing warning unrelated)
- [x] `bun test --timeout 15000 test/lib test/commands test/types` —
      **5640 pass, 0 fail**
- [x] `bun test test/isolated` — 138 pass
- [x] `bun test test/lib/scan/walker.test.ts` — 34 pass (incl.
      property tests covering hidden files, symlinks, maxDepth,
      gitignore interaction)
- [x] `bun test test/lib/dsn/code-scanner.test.ts` — 52 pass (incl.
      dirMtimes / sourceMtimes cache validation)
- [x] DSN count correctness verified end-to-end: 4 DSNs found on
      fixture (matches pre-change count)
@github-actions
Copy link
Copy Markdown
Contributor

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://cli.sentry.dev/_preview/pr-805/

Built to branch gh-pages at 2026-04-21 20:24 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@github-actions
Copy link
Copy Markdown
Contributor

Codecov Results 📊

138 passed | Total: 138 | Pass Rate: 100% | Execution Time: 0ms

📊 Comparison with Base Branch

Metric Change
Total Tests
Passed Tests
Failed Tests
Skipped Tests

✨ No test changes detected

All tests are passing successfully.

✅ Patch coverage is 85.71%. Project has 1772 uncovered lines.
❌ Project coverage is 95.62%. Comparing base (base) to head (head).

Files with missing lines (1)
File Patch % Lines
src/lib/scan/walker.ts 85.71% ⚠️ 5 Missing
Coverage diff
@@            Coverage Diff             @@
##          main       #PR       +/-##
==========================================
- Coverage    95.63%    95.62%    -0.01%
==========================================
  Files          281       281         —
  Lines        40442     40463       +21
  Branches         0         0         —
==========================================
+ Hits         38676     38691       +15
- Misses        1766      1772        +6
- Partials         0         0         —

Generated by Codecov Action

@BYK BYK merged commit 7c4f082 into main Apr 21, 2026
24 checks passed
@BYK BYK deleted the byk/perf/walker-optimize branch April 21, 2026 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant