Skip to content

perf: unify cache/result path into hashed ResolverPath#231

Closed
stormslowly wants to merge 6 commits into
mainfrom
perf/hashed-resolver-path
Closed

perf: unify cache/result path into hashed ResolverPath#231
stormslowly wants to merge 6 commits into
mainfrom
perf/hashed-resolver-path

Conversation

@stormslowly
Copy link
Copy Markdown
Collaborator

Why

Resolver consumers (rspack) re-hash the returned PathBuf for their own module-id maps. Today every resolver result pays a path hash twice — once inside the cache, once downstream. The internal cache also threads paths through OsStr-backed Path::hash, which on Windows walks per component.

This PR unifies the internal CachedPath and the public Resolution.path: PathBuf into a single ResolverPath:

  • carries a precomputed FxHasher u64 over the path bytes
  • holds the path as Box<str> (UTF-8)
  • cheap Arc clone, same single-allocation footprint as the old Arc<CachedPathImpl>
  • exposed off Resolution via resolver_path() / into_resolver_path()

Downstream can plug the prehash straight into an FxHashMap without re-walking the bytes.

What

  • ResolverPath replaces CachedPath everywhere in the resolver and is the type carried by Resolution.
  • New public accessors Resolution::resolver_path / into_resolver_path. Existing path() / into_path_buf() / full_path() / query() / fragment() / package_json() keep their signatures — no public input API change.
  • Cache::value now returns Result<ResolverPath, ResolveError>. New error variant ResolveError::PathNotUtf8(PathBuf) is raised at the boundary instead of the previous path_to_str panic.
  • Internal hot-path joins use join_str / parent_str on &str, skipping OsStr ↔ str conversions and the per-component walk in Path::hash (the win is most visible on Windows; Unix is at parity with perf(cache): hash CachedPath by raw bytes on unix #226).

Before / after

// before
let res = resolver.resolve(dir, spec).await?;
let mut h = FxHasher::default();
h.write(res.path().as_os_str().as_bytes()); // second walk over the bytes
let hash = h.finish();

// after
let res = resolver.resolve(dir, spec).await?;
let hash = res.resolver_path().hash(); // already computed inside the cache

Tests

  • 144 lib + 9 cache contract + 7 integration tests all pass.
  • New tests: ResolverPath::hash matches FxHasher::write(bytes), equal lookups share Arc, parent walk matches direct lookup, non-UTF-8 input returns PathNotUtf8.
  • napi build (cargo check -p rspack_napi_resolver) clean.

Tiny string-slicing parent walker and Path::join-compatible joiner.
Used by the upcoming ResolverPath cache that operates on `&str` instead
of `&Path` to skip OsStr↔str conversions on the hot path.
Surfaced from Cache::value when the input path is not valid UTF-8. The
new ResolverPath stores paths as Box<str>, and the cache validates at
the boundary instead of carrying a slow fallback for non-UTF8 paths.
ResolverPath is the unified path value used by both the internal cache
and the public Resolution result. It carries:

- a precomputed FxHasher u64 over the path bytes
- the path as Box<str> (UTF-8)
- the parent ResolverPath (cheap Arc clone)
- the existing OnceCell-cached fs metadata / package.json fields

Parent walks now slice &str via parent_str; module joins use join_str.
Lookups skip OsStr↔str conversions on the hot path, and downstream
consumers (rspack) can read the prehash directly off the returned
ResolverPath instead of re-hashing the PathBuf.

Cache::value gains a Result return: non-UTF8 input now surfaces
ResolveError::PathNotUtf8 instead of panicking inside path_to_str.
Resolution.path is now a ResolverPath instead of PathBuf. The existing
accessors (path / into_path_buf / full_path / query / fragment /
package_json) keep their signatures.

New accessors give the caller the underlying ResolverPath:

  pub fn resolver_path(&self) -> &ResolverPath
  pub fn into_resolver_path(self) -> ResolverPath

Downstream consumers can read .hash() directly off the result and reuse
it as the FxHashMap key hash, avoiding the second walk over the path
bytes that today every resolver result pays.
Copilot AI review requested due to automatic review settings May 21, 2026 15:09
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: de4faa7e16

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/path.rs
Comment on lines +56 to +57
let trimmed = trim_one_trailing_sep(s);
let idx = last_sep_index(trimmed)?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve relative parent hop in parent_str

parent_str now short-circuits to None for single-component relative paths because there is no separator, but the previous Path::parent() behavior returned Some(""). That removes one ancestor step from the cache parent chain, so load_node_modules's successors(..., p.parent()) walk stops at "a" instead of continuing to ""; with a relative context (e.g. resolve("a", "pkg")), ./node_modules/pkg is no longer considered.

Useful? React with 👍 / 👎.

Comment thread src/lib.rs
if let Some(specifier) = specifier.strip_prefix(SLASH_START) {
for root in &self.options.roots {
let cached_path = self.cache.value(root);
let cached_path = self.cache.value(root).ok()?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Continue scanning roots after UTF-8 conversion failures

load_roots currently uses self.cache.value(root).ok()?, which converts PathNotUtf8 into None and exits the whole function on the first invalid root. That means one non-UTF-8 options.roots entry prevents all later valid roots from being checked, producing an unexpected miss (and hiding the actual path encoding error) instead of continuing the loop.

Useful? React with 👍 / 👎.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 21, 2026

Merging this PR will not alter performance

✅ 12 untouched benchmarks


Comparing perf/hashed-resolver-path (bb5e476) with main (c8af902)

Open in CodSpeed

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR unifies the resolver’s internal cached path representation and the public Resolution path into a single ResolverPath that carries a precomputed FxHasher hash plus a UTF-8 string path, aiming to eliminate downstream re-hashing and reduce hot-path overhead (notably on Windows).

Changes:

  • Introduces ResolverPath (hashed, UTF-8, Arc-backed) and uses it across cache and resolution results.
  • Makes Cache::value fallible (Result<ResolverPath, ResolveError>) and adds ResolveError::PathNotUtf8 instead of panicking on non-UTF-8 paths.
  • Adds Resolution::{resolver_path, into_resolver_path} while keeping existing Resolution path APIs.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/cache.rs Replaces cached path type with ResolverPath, adds UTF-8 validation + precomputed hashing, updates cache keying and adds related tests.
src/lib.rs Threads ResolverPath through resolver flow and updates cache lookups to handle fallible Cache::value.
src/resolution.rs Stores ResolverPath inside Resolution, adds accessors for downstream hash reuse, and updates tests.
src/path.rs Adds parent_str/join_str helpers to avoid repeated OsStr/Path conversions on hot paths and adds unit tests.
src/error.rs Adds ResolveError::PathNotUtf8 and a display test for the new error variant.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/path.rs
Comment on lines +14 to +18
/// Strip a single trailing path separator (platform-aware). Used so that
/// `parent_str("/a/b/")` walks to `"/a"` instead of `"/a/b"`.
#[inline]
fn trim_one_trailing_sep(s: &str) -> &str {
#[cfg(windows)]
Comment thread src/lib.rs
if let Some(specifier) = specifier.strip_prefix(SLASH_START) {
for root in &self.options.roots {
let cached_path = self.cache.value(root);
let cached_path = self.cache.value(root).ok()?;
Comment thread src/resolution.rs
Comment on lines +47 to 50
/// Returns the path without query and fragment.
pub fn into_path_buf(self) -> PathBuf {
self.path
self.path.path().to_path_buf()
}
resolve_impl previously did:

  let path_buf = self.load_realpath(&cached_path).await?;  // PathBuf
  let path     = self.cache.value(&path_buf)?;              // extra cache trip

That second cache.value runs for every resolve, even when realpath()
returned the input bytes unchanged (the common case: no symlink in
chain). It costs one FxHasher walk + DashSet lookup + a fresh
Arc<ResolverPathInner> insert per result file. With `clear_cache()`
called once per bench iter and 1000 deps resolved per iter, that adds
up to the 4.5% / 3.5% Simulation regressions and the 22% memory bump
CodSpeed flagged on this PR.

Make load_realpath return ResolverPath and short-circuit:

- options.symlinks = false  → reuse cached_path (pure Arc bump)
- symlinks on, no symlink   → compare path bytes, reuse cached_path
- symlinks on, real symlink → cache.value(realpath) (unchanged path)

Public API unchanged.
The first fix (6fabea5) only helped when realpath() returned the input
bytes unchanged. With pnpm projects (and the resolver bench), every
resolved file goes through a .pnpm/<pkg>@<ver>/... symlink, so realpath
always changes the path and we always fell through to
cache.value(&path_buf), which:

- inserts the realpath result into the cache
- builds and inserts its entire parent chain into the cache
- per-iter, that's ~1000 fresh entries with full .pnpm/ parent chains
  → matches CodSpeed's +22% memory bump

ResolverPath::shallow() builds a single Arc<ResolverPathInner> without
inserting into the cache and without constructing the parent chain.
Caching the realpath result was never load-bearing for correctness —
each ResolverPath already memoizes its own realpath via the
canonicalized OnceCell — so dropping the cache insert is purely a
memory and instruction-count win.

Public API is unchanged. Resolution::resolver_path().parent() returns
None when symlinks resolved to a different path (same as if the caller
had called .parent() on the old PathBuf result, which had no parent
caching either).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants