perf(scan): rewrite walker hot path with sync I/O + manual string ops #805
Merged
Walker was 411ms p50 for 10k files on the synthetic/large fixture,
roughly 8× slower than a naive sync walk (50ms). Profiling showed
the overhead split roughly as: async readdir 167ms, per-file stat
via Bun.file().size 44ms, IgnoreStack.isIgnored 65ms, path
operations (join/relative/extname) 30ms, and small control-flow
overhead.
Five targeted optimizations in the hot loop, all micro-benchmarked
before landing:
1. **Sync readdir**: `readdirSync` instead of `readdir`. Per-call
cost measured p50 11µs / p95 24µs / max 65µs over 3635 dirs in
the fixture. Blocking the event loop for at most 65µs is trivially
safe (Node clamps `setTimeout(0)` to a 1ms minimum delay), and it avoids
the ~60µs of microtask overhead each async readdir incurs. Net: 105→45ms.
2. **statSync instead of Bun.file().size**: measured ~15% faster
per call (36ms → 30ms for 10k files). When `recordMtimes: true`,
the same statSync result serves both the size and mtime reads: one
syscall instead of two. Net: ~10ms saved.
3. **String concat for paths**: `frame.absDir + NATIVE_SEP + entry.name`
instead of `path.join(...)`. The inputs are already clean (absDir is
absolute without a trailing slash, and name is a pure basename per
dirent semantics), so the normalization path.join performs is wasted
work. Measured 10× faster (7ms → 0.7ms for 13k calls).
4. **slice for relativePath**: `abs.slice(cwdPrefixLen)` instead of
`path.relative(cwd, abs)`. Safe because every abs is guaranteed to be
under cwd by construction. Measured 11× faster (9ms → 0.8ms).
On Windows, the result still goes through the `normalizePath` fallback via `replaceAll(NATIVE_SEP, "/")`.
5. **Manual extname**: `name.lastIndexOf(".")` + slice + toLowerCase
instead of `path.extname(name).toLowerCase()`. Measured ~25% faster
(9ms → 7ms for 13k calls).
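Items 3 to 5 can be sketched as standalone helpers. This is a minimal illustration, assuming `absDir` is absolute with no trailing separator and `cwd` is not the filesystem root; the names `joinChild`, `relativeFromPrefix`, and `lowerExt` are hypothetical, not the walker's actual functions, while `NATIVE_SEP` / `POSIX_NATIVE` mirror the module-scope constants this PR caches.

```typescript
import * as path from "node:path";

// Cached once at module scope so hot-loop code never re-reads path.sep.
const NATIVE_SEP = path.sep;
const POSIX_NATIVE = NATIVE_SEP === "/";

// Item 3 (hypothetical helper): absDir is absolute with no trailing separator and
// name is a bare dirent basename, so plain concatenation gives the same result as path.join.
function joinChild(absDir: string, name: string): string {
  return absDir + NATIVE_SEP + name;
}

// Item 4 (hypothetical helper): cwdPrefixLen is cwd.length + 1 (cwd plus its separator),
// computed once per walk. Only valid when abs is known to live under cwd.
function relativeFromPrefix(abs: string, cwdPrefixLen: number): string {
  const rel = abs.slice(cwdPrefixLen);
  // Windows fallback: report forward-slash relative paths.
  return POSIX_NATIVE ? rel : rel.replaceAll(NATIVE_SEP, "/");
}

// Item 5 (hypothetical helper): lowercased extension including the dot.
// A leading dot (".bashrc") is not an extension, matching path.extname.
// The one divergence is the name "..", which readdir never returns.
function lowerExt(name: string): string {
  const dot = name.lastIndexOf(".");
  return dot <= 0 ? "" : name.slice(dot).toLowerCase();
}
```

For dirent basenames, `lowerExt(name)` returns the same value as `path.extname(name).toLowerCase()` without going through the general-purpose path parser.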
Also: precompute `cwdPrefixLen` once on `WalkContext` instead of
recomputing `cfg.cwd.length + 1` per entry. Cache `path.sep` /
`POSIX_NATIVE` at module scope to avoid per-call property lookups.
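Putting these pieces together, a simplified synchronous walk could look like the sketch below. It is an illustration under stated assumptions, not the PR's walker: `walkSync`, `WalkContext`, and `MiniEntry` are hypothetical trimmed-down shapes, and the sketch omits ignore rules, symlink following, binary detection, `maxDepth`, and abort handling.

```typescript
import { readdirSync, statSync } from "node:fs";
import * as path from "node:path";

const NATIVE_SEP = path.sep;
const POSIX_NATIVE = NATIVE_SEP === "/";

// Hypothetical, trimmed-down shapes for this sketch only.
interface WalkContext {
  cwd: string;
  cwdPrefixLen: number; // cwd.length + 1, computed once per walk
}

interface MiniEntry {
  absolutePath: string;
  relativePath: string;
  size: number;
  mtime: number; // integer milliseconds
}

function* walkSync(root: string): Generator<MiniEntry> {
  // path.resolve yields an absolute path without a trailing separator
  // (assumes root is not the filesystem root itself).
  const cwd = path.resolve(root);
  const ctx: WalkContext = { cwd, cwdPrefixLen: cwd.length + 1 };
  const stack: string[] = [cwd];

  while (stack.length > 0) {
    const absDir = stack.pop()!;
    // One short blocking call per directory instead of an awaited readdir.
    for (const dirent of readdirSync(absDir, { withFileTypes: true })) {
      const abs = absDir + NATIVE_SEP + dirent.name; // no path.join
      if (dirent.isDirectory()) {
        stack.push(abs);
        continue;
      }
      if (!dirent.isFile()) continue; // sketch skips symlinks, sockets, etc.
      const st = statSync(abs); // a single stat serves both size and mtime
      const rel = abs.slice(ctx.cwdPrefixLen); // no path.relative
      yield {
        absolutePath: abs,
        relativePath: POSIX_NATIVE ? rel : rel.replaceAll(NATIVE_SEP, "/"),
        size: st.size,
        mtime: Math.floor(st.mtimeMs), // floored to integer ms
      };
    }
  }
}
```

Using `withFileTypes: true` lets the loop classify entries from the directory listing itself, so the only per-file stat is the one that actually reads size and mtime.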
## mtime parity fix
Initial implementation regressed `detectAllDsns.warm` from 28ms →
304ms because `statSync().mtimeMs` is a float (e.g.
`1776790602458.1033`) while `Bun.file().lastModified` is already an
integer. The DSN cache validator compares floored `sourceMtimes`,
so un-floored floats caused cache misses on every warm call. Fixed
by flooring explicitly in `tryYieldFile` — matches the same
treatment already applied to `onDirectoryVisit`'s dirMtimes.
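To make the regression concrete, here is a minimal sketch of an exact-match mtime validator. `MtimeMap`, `normalizeMtime`, and `isCacheValid` are hypothetical stand-ins for the DSN cache's validation logic, assuming cached values are stored as integer milliseconds.

```typescript
// Hypothetical cache shape: relative path -> mtime in integer milliseconds.
type MtimeMap = Record<string, number>;

// statSync().mtimeMs may carry a sub-millisecond fraction; floor it so it is
// comparable with integer-millisecond values.
function normalizeMtime(mtimeMs: number): number {
  return Math.floor(mtimeMs);
}

// Exact-match validation: any changed, missing, or extra path invalidates the cache.
function isCacheValid(cached: MtimeMap, current: MtimeMap): boolean {
  const cachedKeys = Object.keys(cached);
  if (cachedKeys.length !== Object.keys(current).length) return false;
  return cachedKeys.every((k) => current[k] === cached[k]);
}

const cached: MtimeMap = { "src/app.ts": 1776790602458 };
const raw = 1776790602458.1033; // the float mtimeMs from the regression described above

console.log(isCacheValid(cached, { "src/app.ts": raw })); // false: fractional value never equals the integer
console.log(isCacheValid(cached, { "src/app.ts": normalizeMtime(raw) })); // true
```

With exact equality, a single un-floored value is enough to turn every warm call into a miss, which matches the 28ms → 304ms regression.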
## Perf (synthetic/large, 10k files, p50)
| Op | Before | After | Δ |
|---|---:|---:|---:|
| `scan.walk` | 411ms | **231ms** | **−44%** |
| `scan.walk.noExt` | 572ms | 448ms | **−22%** |
| `scan.walk.dsnParity` | 228ms | **138ms** | **−39%** |
| `scanCodeForDsns` | 323ms | 304ms | −6% |
| `detectAllDsns.cold` | 327ms | 308ms | −6% |
| `detectAllDsns.warm` | 27.9ms | 27.0ms | — |
| `scan.grepFiles` | 322ms | 316ms | noise |
Walker ops are 22-44% faster. Downstream ops benefit less because
their time is dominated by content scanning, not walking: the DSN
scanner ops still improve a consistent ~6%, while `scan.grepFiles`
moves by only ~2%, within noise.
## Test plan
- [x] `bunx tsc --noEmit` — clean
- [x] `bun run lint`: clean (one unrelated pre-existing warning)
- [x] `bun test --timeout 15000 test/lib test/commands test/types` —
**5640 pass, 0 fail**
- [x] `bun test test/isolated` — 138 pass
- [x] `bun test test/lib/scan/walker.test.ts` — 34 pass (incl.
property tests covering hidden files, symlinks, maxDepth,
gitignore interaction)
- [x] `bun test test/lib/dsn/code-scanner.test.ts` — 52 pass (incl.
dirMtimes / sourceMtimes cache validation)
- [x] DSN count correctness verified end-to-end: 4 DSNs found on
fixture (matches pre-change count)
**Codecov Results**: 138 passed of 138 total (100% pass rate, execution time 0ms). Comparison with base branch: no test changes detected; all tests are passing. Patch coverage is 85.71%. The project has 1772 uncovered lines; 1 file has missing lines.
| Coverage diff | main | #PR | +/- |
|---|---:|---:|---:|
| Coverage | 95.63% | 95.62% | -0.01% |
| Files | 281 | 281 | — |
| Lines | 40442 | 40463 | +21 |
| Branches | 0 | 0 | — |
| Hits | 38676 | 38691 | +15 |
| Misses | 1766 | 1772 | +6 |
| Partials | 0 | 0 | — |

Generated by Codecov Action
## Summary
Walker was 411ms p50 for 10k files on the synthetic/large fixture, roughly 8× slower than a naive sync walk (50ms). Follow-up to PR #791 and #804.
Five targeted optimizations in the hot loop, all micro-benchmarked before landing:
- `readdirSync` instead of `readdir`
- `statSync` instead of `Bun.file().size`
- `abs.slice(cwdPrefixLen)` for relPath
- `lastIndexOf`-based extname
## Perf results (synthetic/large, 10k files, p50)
Ops benchmarked: `scan.walk`, `scan.walk.noExt`, `scan.walk.dsnParity`, `scanCodeForDsns`, `detectAllDsns.cold`, `detectAllDsns.warm`, `scan.grepFiles`, `scanCodeForFirstDsn`.

Walker ops are 22-44% faster. Downstream ops (grep, DSN scanner) benefit less because their time is dominated by content scanning, not walking; the DSN scanner ops still improve a consistent ~6%, while grep stays within noise.
## Why `readdirSync` is safe here
The sync vs async tradeoff usually favors async, because blocking the event loop is bad in general. But the measured per-call cost matters.

Across `readdirSync` on 3635 dirs, the 65µs max block is trivial: Node clamps `setTimeout(0)` to a 1ms minimum delay, so blocking for 65µs never causes a noticeable event-loop pause. Meanwhile, each async readdir pays ~60µs of microtask overhead, which wipes out any theoretical fairness benefit. Net: 2-3× faster per directory on walks with many small directories, which describes most realistic CLI workloads.

If this ever matters for an unusual embedded use case, the optimization is trivially reversible: the walker's public API is unchanged.
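A per-call blocking budget like this can be checked with a small harness along these lines. This is a hypothetical sketch (`measureReaddirSync` and `percentile` are illustrative names, not part of the PR): it times each `readdirSync` call with `performance.now()` and reports nearest-rank p50/p95/max in microseconds.

```typescript
import { readdirSync } from "node:fs";
import * as path from "node:path";

// All latency fields are in microseconds.
interface LatencyStats {
  dirs: number;
  p50: number;
  p95: number;
  max: number;
}

// Nearest-rank percentile over an ascending-sorted, non-empty array.
function percentile(sorted: number[], p: number): number {
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.min(sorted.length - 1, Math.max(0, idx))];
}

function measureReaddirSync(root: string): LatencyStats {
  const samples: number[] = [];
  const stack = [path.resolve(root)];
  while (stack.length > 0) {
    const dir = stack.pop()!;
    const t0 = performance.now();
    const entries = readdirSync(dir, { withFileTypes: true });
    samples.push((performance.now() - t0) * 1000); // ms -> µs of blocking time
    for (const e of entries) {
      // Dirent.isDirectory() is false for symlinks, so this walk cannot cycle.
      if (e.isDirectory()) stack.push(path.join(dir, e.name));
    }
  }
  samples.sort((a, b) => a - b);
  return {
    dirs: samples.length,
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    max: samples[samples.length - 1],
  };
}
```

Run it against a representative checkout (for example `measureReaddirSync(process.cwd())`) and compare `max` with the event-loop latency the CLI can tolerate.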
## Walker v2 design notes
- `path.join` / `path.relative` / `path.extname` + `toLowerCase` are all replaced with manual string ops in the hot loop. On Windows, `normalizePath` is still applied via `replaceAll(NATIVE_SEP, "/")`; the POSIX fast path uses a cached `POSIX_NATIVE` module constant.
- `WalkContext.cwdPrefixLen`: precomputed once per walk, used to slice relative paths from absolute paths.
- `NATIVE_SEP` and `POSIX_NATIVE` avoid per-call `path.sep` property lookups in the hot loop.
## What this PR does NOT change
- `WalkEntry` shape: preserved identically (`absolutePath`, `relativePath`, `size`, `mtime`, `isBinary`, `depth`).
- `WalkOptions` contract: no new options, no semantics changes.
- `followSymlinks`, cycle detection via `visitedInodes`: unchanged.
- `onDirectoryVisit`, `recordMtimes`, `abortSignal`: unchanged.
## Test plan
- [x] `bunx tsc --noEmit`: clean
- [x] `bun run lint`: clean (one pre-existing warning in `src/lib/formatters/markdown.ts`, unrelated)
- [x] `bun test --timeout 15000 test/lib test/commands test/types`: 5640 pass, 0 fail
- [x] `bun test test/isolated`: 138 pass
- [x] `bun test test/lib/scan/walker.test.ts`: 34 pass (incl. property tests covering hidden files, symlinks, maxDepth, gitignore interaction)
- [x] `bun test test/lib/dsn/code-scanner.test.ts`: 52 pass (incl. dirMtimes / sourceMtimes cache validation)
- [x] `bun run bench --size large --runs 5 --warmup 2`: results in table above

🤖 Generated with Claude Code
Co-authored-by: Claude Opus 4.7 (1M context) noreply@anthropic.com