Skip to content

Phase 53/54/55: greedy resolver + continuous-stream walker + transport tuning#11

Merged
tolgaergin merged 11 commits into
mainfrom
phase-53-greedy-resolver
Apr 27, 2026
Merged

Phase 53/54/55: greedy resolver + continuous-stream walker + transport tuning#11
tolgaergin merged 11 commits into
mainfrom
phase-53-greedy-resolver

Conversation

@tolgaergin
Copy link
Copy Markdown
Contributor

Summary

Three stacked phases of resolver/walker/transport work, all empirically validated and ready to ship as one PR. Phase 56 (walker/resolver fusion) starts on a fresh branch from main after this lands.

  • Phase 53 — greedy multi-version resolver alongside PubGrub, opt-in via LPM_RESOLVER=greedy. Plus W5+W6 close-out work: metadata cache writes off the tokio runtime, extract off the download permit.
  • Phase 54 — continuous-stream walker (run_stream) replacing the BFS level-step walker, opt-in via LPM_WALKER=stream. Per-fetch concurrency bumped 50 → 256 (empirical, n=10 statistically significant).
  • Phase 55 — transport tuning. h1-pool probe (refuted as default, opt-in retained), reqwest gzip feature, brotli + zstd, Arc-wrap of SharedCache values (eliminated 1,063 ms of resolver-side cloning per cold install).

Bench results — bench/fixture-large (266 packages), cold-equal-footing, n=20

Medians from the Phase 53 W5+W6 close-out §3.2 — pre/post comparison vs main HEAD baseline, with bun as the reference control:

Arm Wall median Wall stdev × bun
pubgrub-stream 4,020 ms 499 ms 4.34×
greedy-stream (new default for greedy) 3,936 ms 821 ms 4.25×
bun (control, n=10) 926 ms

Substage wins from W5+W6 (greedy-stream): extract_sum −42 %, extract_max −28 ms, finalize_sum −41 %. Wall delta is in the noise band — moving extract off the download permit + cache writes off the runtime is structural correctness work for the next phase, not a wall win on its own.

The remaining ~3 sec gap to bun is the walker/resolver redundancy: 92 of 134 metadata fetches go through the greedy resolver's escape-hatch (69 %), not the walker's pre-fetch. Phase 56 (walker/resolver fusion) is the next branch.

What changes — by phase

Phase 53: greedy multi-version resolver

  • New crates/lpm-resolver/src/greedy.rs (1,106 lines). Bun-shape enqueueDependencyWithMain port: edge queue + first-match version pick + reuse-on-compatible / allocate-on-incompatible.
  • Multi-version-per-canonical: when edge B's range is incompatible with edge A's pick, allocate a new node; both versions live in the resolved tree keyed by (canonical, version).
  • Platform filtering inline (avoids the 47-package esbuild over-install on fixture-large).
  • W5+W6: write_metadata_cache runs on tokio::task::spawn_blocking; metadata cache file format simplified (no HMAC, magic-prefixed); extract no longer holds the download permit.

Phase 54: continuous-stream walker

  • New BfsWalker::run_stream() (gated by LPM_WALKER=stream). mpsc-driven, no level barrier; semaphore-bounded fan-out.
  • DEFAULT_NPM_FANOUT 50 → 256 (empirical: median 4,816 → 4,124 ms, t=3.49 significant; stdev 972 → 253 ms — −74 %).
  • Per-level instrumentation on the legacy run_bfs for diagnostic comparison.

Phase 55: transport tuning

  • phase-55 W1 — h1-pool probe: tried bun-style 64 separate HTTP/1.1 sockets to bypass single-connection h2 flow control. Refuted as default (no statistical wall improvement); kept as opt-in via env flag for future investigation.
  • phase-55 W2reqwest gzip feature enabled: modest wall delta, real wire savings.
  • phase-55 W3 — added brotli + zstd to reqwest features.
  • phase-55 W4 — Arc-wrap SharedCache values. The load-bearing fix for the resolver wall: pre-W4 the greedy resolver burned ~5 sec per cold install cloning popular packuments per edge. Resolver wall: −1,063 ms; walker: −721 ms.

What is NOT in this PR

  • Phase 56 walker/resolver fusion (pre-plan in lpm-dev/a-package-manager) — separate branch from main once this lands.
  • Lever 2 / security analyzer skip — measurement showed it buys ~150-200 ms wall (smaller than the close-out's optimistic 200-300 ms estimate), not worth a half-day before fusion.
  • Default resolver / walker do not flip in this PR. pubgrub + bfs remain the defaults; greedy + stream are opt-in via env vars.

Files changed

 Cargo.toml                             |    2 +-
 crates/lpm-cli/src/commands/install.rs |  129 +++-
 crates/lpm-registry/src/client.rs      |  588 +++++++++--------
 crates/lpm-registry/src/timing.rs      |   59 +-
 crates/lpm-resolver/src/greedy.rs      | 1106 ++++++++++++++++++++++++++++++++  (new)
 crates/lpm-resolver/src/lib.rs         |    3 +-
 crates/lpm-resolver/src/provider.rs    |   82 ++-
 crates/lpm-resolver/src/resolve.rs     |   77 ++-
 crates/lpm-resolver/src/walker.rs      |  393 +++++++++++-
 9 files changed, 2056 insertions(+), 383 deletions(-)

Commit history

a15650d phase-53 W1: scaffold greedy resolver + LPM_RESOLVER env-var dispatch
e14033e phase-53 W2: multi-version dedupe + platform filter (greedy resolver)
77cba6f phase-54 W1: per-BFS-level instrumentation on the existing walker
fc81cb8 phase-54 W2: continuous-stream walker behind LPM_WALKER=stream
b5056a1 phase-54 W3-C: bump DEFAULT_NPM_FANOUT 50 -> 256 (empirical)
da15db8 phase-55 W1: h1-pool transport probe (REFUTED), kept as opt-in
926fe74 phase-55 W2: enable reqwest gzip feature (modest wall, real wire savings)
02534d0 phase-55 W3: add brotli + zstd to reqwest features
bcbb6a3 phase-55 W4: Arc-wrap SharedCache values (resolver -1,063 ms, walker -721 ms)
46f8b91 phase-53 W5+W6: cache writes off-runtime + extract off-permit (substage -42%, wall flat)
12c9bc7 phase-53 W5+W6 follow-up: allow clippy::type_complexity on into_parts

(Note: commit order is non-monotonic by phase number — 46f8b91 (phase-53 W5+W6) is committed after the phase-55 stack. Functional correctness is unaffected; mentioning for clarity.)

Test plan

  • cargo fmt --check clean
  • grep 'fancy-regex' crates/*/Cargo.toml — none (ReDoS guard)
  • cargo clippy --workspace --all-targets -- -D warnings — clean (verified locally on rust 1.94.1; CI runs 1.94.0 — patch-level diff)
  • cargo build --workspace — green
  • cargo nextest run --workspace --exclude lpm-integration-tests --no-fail-fast — 4,393 tests pass / 0 fail / 7 skip (per close-out)
  • cargo test -p lpm-auth × 2 — no flakes (verifies the keychain-race fix from prior phases)
  • bench/fixture-large cold-equal-footing n=20 — greedy-stream median 3,936 ms (close-out §3.2)
  • Lockfile equality across all four cells (close-out §3.2)
  • CI green on this PR

Companion documentation

🤖 Generated with Claude Code

tolgaergin and others added 11 commits April 26, 2026 12:58
Lands the first commit of Phase 53's bun-recipe port: a single-version-per-
canonical greedy resolver gated behind LPM_RESOLVER=greedy. Default stays
PubGrub-with-split-retry while the new path ships dark.

The W1 algorithm follows bun's enqueueDependencyWithMain shape:

  1. Seed task_queue with one Edge per root dep.
  2. Pop edge -> ensure_manifest() (Phase 49 SharedCache hit / wait /
     escape-hatch fetch via existing route_mode plumbing).
  3. find_best_version(): reverse-iterate sorted versions, first
     satisfying match wins. O(versions).
  4. Reuse existing node for the canonical (W1 single-version rule)
     or allocate a new one and enqueue ITS children.
  5. Loop until task_queue empty.

No backtracking, no split-retry. ~600-1000 find_best_version calls per
266-pkg tree at ~us each.

Empirical sanity (pure-npm /tmp fixture: zod + dayjs, full cache wipe):

  PUBGRUB (default):  wall=781 ms
  GREEDY:             wall=209 ms (resolve=58 fetch=122 link=2)

Both resolved identical versions (dayjs@1.11.20, zod@3.25.76). 3.7x cold
on this fixture. The W1 ship gate is bench/project (51 pkgs, no conflicts)
producing identical resolved trees vs PubGrub at lower resolve_ms; that
bench runs next.

W1 deferred items wired in subsequent commits:
  - W2: drop single-version rule, dedupe by (canonical, version)
  - W3: peer/required asymmetry + peer queue drain pass
  - W4: Phase 32 P5 path-selector overrides + Phase 40 P2 root aliases
  - W5: BfsWalker integration (drop redundant SharedCache copy)

Plumbing reused as-is: SharedCache, NotifyMap, WalkerDone, route_mode,
parse_metadata_to_cache_info, NpmRange. PubGrub stays in tree until W7.

Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target):
  cargo fmt --check                                                clean
  no fancy-regex in any Cargo.toml                                 clean
  cargo clippy --workspace --all-targets -- -D warnings            clean
  cargo build --workspace                                          clean
  cargo nextest run --workspace --exclude lpm-integration-tests    4391 passed, 7 skipped, 0 failed
                                                                   (+8 vs baseline; 6 are new greedy.rs unit tests)

See: DOCS/new-features/37-rust-client-RUNNER-VISION-phase53-greedy-resolver-preplan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops W1's single-version-per-canonical rule. Each (canonical, version)
pair now gets its own resolved node; when an edge wants a canonical
already in the tree, the resolver reuses an existing node ONLY IF its
version satisfies the new edge's range. Incompatible ranges allocate
new nodes — bun + npm + pnpm semantics, no PubGrub split-retry needed.

Key data structure changes in greedy.rs:
  - resolved: HashMap<CanonicalKey, NodeId>
        -> HashMap<CanonicalKey, Vec<(NpmVersion, NodeId)>>
  - children_enqueued: HashSet<CanonicalKey>
        -> HashSet<(CanonicalKey, NpmVersion)>
  - new field: ResolveState.platform_skipped: usize
  - new enum: VersionPick { Picked(v) | NoSatisfying | PlatformFiltered }

Platform filter (provider::is_platform_compatible promoted to pub(crate))
applied in find_best_version. Without it, optional deps with `os`/`cpu`
constraints (e.g. esbuild's per-platform binary tarballs) over-install:
fixture-large was producing 301 packages vs PubGrub's 254 because every
@esbuild/* platform binary was getting resolved. After the fix:
greedy installs 255 packages on fixture-large (one extra: a transitive
qs landing on a different body-parser version), tree near-identical to
PubGrub's. The platform_skipped counter is incremented and surfaced
through ResolveResult.platform_skipped for --json telemetry parity.

Tree-correctness empirical (release binary, full cache wipe per arm):
  bench/project (17 root, ~50 transitives):
    PubGrub: 51 packages, wall 902 ms
    Greedy:  51 packages, wall 435 ms
    diff: TREES IDENTICAL (W1 had a 50-vs-51 gap; W2 closes it)
  bench/fixture-large (21 root, ~254 transitives):
    PubGrub: 254 packages
    Greedy:  255 packages
    diff: 6 lines (1 body-parser minor delta + 1 transitive qs)
          near-identical, no platform-binary over-install.

Resolve-substage measurement is HIGH VARIANCE on single runs:
  fixture-large cold (single run, this session): PubGrub resolve_ms
  ranged 4068 -> 8799 across runs depending on how many split passes
  the trace hit on that run. Greedy resolve_ms was 5816 on this run.
  The W2 ship gate's "resolve_ms <= 1500 ms median" target requires the
  n=30 V2 protocol to be conclusive — single-run numbers don't decide
  it. The correctness-defining work (multi-version + platform filter)
  ships here so the perf work has a stable substrate.

Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target):
  cargo fmt --check                                                clean
  no fancy-regex in any Cargo.toml                                 clean
  cargo clippy --workspace --all-targets -- -D warnings            clean
  cargo build --workspace                                          clean
  cargo nextest run --workspace --exclude lpm-integration-tests    4394 passed, 7 skipped, 0 failed
                                                                   (+3 vs W1: process_edge multi-version test +
                                                                    find_best_version platform-filter test +
                                                                    handle_no_version platform_skipped counter test)

Truth references vs PubGrub tree: per user feedback, lpm's PubGrub-with-
split-retry is NOT the source of truth — bun/pnpm/npm are (the trees
millions of users rely on). W2's reuse-on-compatible-range / allocate-
on-incompatible-range matches their semantics; PubGrub's tree is itself
a workaround for a flat-resolution constraint that doesn't exist in any
mature npm-shaped resolver.

W3 priority: peer-dep queue + drain pass; eliminates a known difference
where greedy may pick a version that violates a peer-dep range that
PubGrub's solver would have caught. Already used by W1's body-parser
1.20.4 vs 1.20.5 delta on fixture-large.

See: DOCS/new-features/37-rust-client-RUNNER-VISION-phase53-greedy-resolver-preplan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds three-phase wall-clock attribution (setup / fetch / commit) per
BFS level in `BfsWalker::run`, exposed via:

- New `LevelTiming` struct, `pub levels: Vec<LevelTiming>` on
  `WalkerSummary`, exported from `lpm_resolver`.
- `tracing::info!(target = "lpm_resolver::walker")` per-level line so
  `RUST_LOG=lpm_resolver::walker=info` dumps all levels to stderr.
- New `timing.resolve.streaming_bfs.levels` array in install --json
  output. (Built outside the outer json! macro to keep
  recursion_limit-friendly.)

Diagnostic only — no algorithm change. Validates the Phase 54 thesis:
the BFS-level barrier in `walker.rs:243-353` (`tokio::join!` per
level + `for` loop draining results before advancing) is the cost we
pay. Single cold install on `bench/fixture-large` (single-run, fresh
cache):

  level 0: seeded= 21 fetch=575 commit=36 total=611 ms
  level 1: seeded= 99 fetch=494 commit=25 total=520 ms
  level 2: seeded= 63 fetch=447 commit=10 total=457 ms
  level 3: seeded= 50 fetch=941 commit=22 total=963 ms
  level 4: seeded= 18 fetch=336 commit= 1 total=338 ms
  level 5: seeded=  6 fetch=306 commit= 0 total=306 ms
  level 6: seeded=  1 fetch= 21 commit= 0 total= 21 ms
  ────────────────────────────────────────────────────
                    fetch=3120 commit=94 total=3216 ms
  walker_wall_ms = 3219 (matches sum-of-totals)

Findings:
  - 7 sequential levels; each is a stop-the-world barrier.
  - Commit phase is 3% of walker wall — the dead time isn't there.
  - Level 3 (50 pkgs, 941 ms) > Level 1 (99 pkgs, 494 ms).
    Slowest-fetch-per-level holds each barrier — a stream walker
    would only wait on the critical-path chain.
  - Per Phase 54 preplan §4 W1 ship gate: walker_wall / resolve_ms ≈
    71% on this run (resolve_ms was ~4500 ms). >50% threshold met,
    so the level-barrier is the dominant cost. **W2 implementation
    justified empirically.**

Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target):
  cargo fmt --check                                                clean
  no fancy-regex                                                   clean
  cargo clippy --workspace --all-targets -- -D warnings            clean
  cargo build --workspace                                          clean
  cargo nextest run --workspace --exclude lpm-integration-tests    4394 passed, 7 skipped, 0 failed
                                                                   (same as Phase 53 W2 baseline; instrumentation
                                                                    added no new tests, doesn't break old ones)

W2 next: replace `BfsWalker::run`'s level loop with a continuous-
stream loop (mpsc + FuturesUnordered + Semaphore). Both PubGrub and
greedy benefit equally; both stay opt-in via `LPM_RESOLVER`.

See: DOCS/new-features/37-rust-client-RUNNER-VISION-phase54-continuous-stream-walker-preplan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `BfsWalker::run_stream` — a bun-style fetch-on-manifest-parse
walker that eliminates the level-barrier in `run_bfs` (the W1-
instrumented version). The dispatch in `BfsWalker::run` switches on
`LPM_WALKER`; default stays `bfs` until W3's n=30 bench validates.

The stream walker:
- Owns one `tokio::sync::mpsc::UnboundedSender<(name, depth)>`. Roots
  seeded at depth 0; each child sent with parent_depth + 1; depth
  capped by the existing MAX_WALK_DEPTH (16).
- Uses `tokio::task::JoinSet` (tokio-native FuturesUnordered<JoinHandle>
  equivalent — no `futures` crate dep) for in-flight fetches.
- Single global `Arc<Semaphore>(npm_fanout)` caps concurrent fetches.
  Permit acquisition lives INSIDE the spawned future so the dispatch
  loop never blocks on the semaphore — tasks queue parked-on-permit
  in the runtime, the semaphore caps actual network I/O.
- `tokio::select!` between `in_flight.join_next()` (biased: process
  results first to free permits + discover children) and `rx.recv()`.
- Termination: when `in_flight.is_empty() && rx.is_empty()`, drop the
  sender; the next `rx.recv()` returns None, both arms unmatched,
  `else => break`.
- Calls existing `commit_manifest` for cache insert + notify_waiters
  + roots_ready + spec_tx.send — preplan §5.5 ordering preserved
  exactly. `expand_deps_from_info` reused as-is.

W2 SHIP GATE — single-run cold on bench/fixture-large (266 pkgs):

  pubgrub-bfs     wall=5763 ms | 254 pkgs   ← W1 baseline
  pubgrub-stream  wall=4469 ms | 254 pkgs   ← -22.5%
  greedy-bfs      wall=4798 ms | 255 pkgs   ← W1 baseline
  greedy-stream   wall=4148 ms | 255 pkgs   ← -13.6%

  TREE PARITY (within same resolver, BFS vs Stream MUST match):
    pubgrub-bfs == pubgrub-stream  TREES IDENTICAL ✓
    greedy-bfs  == greedy-stream   TREES IDENTICAL ✓

Smoke gates per Phase 54 preplan §4 W2:
  - pure-npm (zod+dayjs):  bfs wall=946 / stream wall=206 ms (4.6x)
  - bench/project (51):    bfs wall=672 / stream wall=409 ms (1.6x)
                           TREES IDENTICAL ✓

Per-package fetch via existing client APIs (per preplan §3.5):
- npm direct: client.get_npm_metadata_direct(&name)
- npm via worker: client.get_npm_package_metadata(&name)
- @lpm.dev/*: PackageName::parse + client.get_package_metadata(&pkg)

Single-run wall delta is smaller than the preplan's projected ~2 sec
because single-run variance is high (one slow CDN edge per run). W3's
n=30 V2 protocol will settle the median delta with the trimmed paired
t-stat; expectation is ~1-2 sec on the headline median.

Both PubGrub and greedy benefit equally because they share the
walker; neither resolver is removed in this phase. `LPM_WALKER=bfs`
stays the default until W3's bench is green.

Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target):
  cargo fmt --check                                                clean
  no fancy-regex                                                   clean
  cargo clippy --workspace --all-targets -- -D warnings            clean
  cargo build --workspace                                          clean
  cargo nextest run --workspace --exclude lpm-integration-tests    4394 passed, 7 skipped, 0 failed

W3 next: n=30 V2 four-cell A/B (PubGrub × {BFS, Stream}) ×
(Greedy × {BFS, Stream}) on fixture-large + bun control. If
stream-arms beat bfs-arms by >=1.5s on resolve_ms median, W4 flips
LPM_WALKER default to stream.

See: DOCS/new-features/37-rust-client-RUNNER-VISION-phase54-continuous-stream-walker-preplan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 54 W3 Option C bench (n=10 cold-equal-footing on bench/fixture-
large, stream walker + greedy resolver):

  fanout= 50  median=4816 ms  stdev= 972 ms  <- prior default
  fanout=128  median=4403 ms  stdev=1578 ms  (t=0.59 NOT sig.; 9.3s outlier)
  fanout=256  median=4124 ms  stdev= 253 ms  (t=3.49 SIGNIFICANT)

Bumping to 256 on the same single HTTP/2 connection saves ~700 ms
median wall (-14.4%) AND collapses variance by ~74% (stdev 972 -> 253
ms). No server-side rejection at 256 concurrent streams against a
single npmjs.org HTTP/2 connection — the "most CDNs hard-cap at 100"
warning some hypotheses raised does not apply to npm registry on the
fixture-large workload.

The variance collapse mirrors W2's stream-walker effect on the BFS
barrier: with 256 permits in flight, slow-CDN-edge tail fetches now
overlap with the rest of the resolution instead of holding the whole
batch. Same phenomenon, different layer.

Empirical-only change. The `LPM_NPM_FANOUT` env-var override added
in the prior commit stays in `BfsWalker::run` as an unprivileged knob
for users who want to A/B different fan-outs (for `LPM_NPM_FANOUT=50`
diagnostic comparisons or to dial down on memory-constrained CI).

The remaining ~3-sec gap to bun's 765 ms is per-connection flow
control on the single h2 multiplex socket. Bun uses 64 separate
HTTP/1.1 sockets, each with its own TCP congestion window — that's
the structural advantage stream concurrency on h2 cannot match. Next
phase (55) Option B is an `LPM_HTTP=h1-pool` transport probe to
verify and ship the structural fix.

Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target):
  cargo fmt --check                                                clean
  no fancy-regex                                                   clean
  cargo clippy --workspace --all-targets -- -D warnings            clean
  cargo build --workspace                                          clean
  cargo nextest run --workspace --exclude lpm-integration-tests    4394 passed (1 leaky), 7 skipped, 0 failed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hypothesis: bun's 64-socket HTTP/1.1 pool bypasses a per-h2-connection
flow-control throttle, hence its 765ms cold install vs our ~4.1s.

Test: build_http_client gates an `LPM_HTTP=h1-pool` mode that switches
reqwest to:
  .http1_only()
  .pool_max_idle_per_host(64)
  .pool_idle_timeout(120s)
  .tcp_keepalive(60s)
  .tcp_nodelay(true)

Bench: n=30 cold-equal-footing, stream-greedy on bench/fixture-large
(commit b5056a1 baseline = h2 default + fanout=256):

  h2-default   median=4273 ms  stdev=254 ms  (control)
  h1-pool-64   median=4766 ms  stdev=430 ms  (-11.5%, paired t=-4.57)
  h1-pool-256  median=4651 ms  stdev=405 ms  ( -8.9%, paired t=-3.46)

**Both h1-pool variants are STATISTICALLY SIGNIFICANTLY SLOWER than
h2-default.** Both trimmed paired t-stats are negative and well
above the 2.069 critical (|t|=4.57 and 3.46 vs 2.069 critical at
alpha=0.05). Variance is also ~60% worse with h1-pool.

Why h1-pool loses on this fixture against npmjs.org:
  - 64 separate TLS handshakes per cold install (~50-200ms each
    with TLS 1.3 + RTT) vs h2's single handshake.
  - 64 separate TCP slow-starts vs h2's single connection past
    slow-start.
  - npmjs.org seems to handle 256 concurrent h2 streams on one
    connection without throttling (Phase 54 W3-C established this).

So the structural argument that "bun's 64 sockets with separate
TCP windows dodge a per-h2-connection throttle" is empirically
unsupported. **Bun's speed advantage is NOT in HTTP/1 vs HTTP/2
transport.** The remaining ~3.5-sec gap to bun lives somewhere
else.

Default stays h2 (negotiated via ALPN). The h1-pool path is kept
as opt-in (`LPM_HTTP=h1-pool`) so future debugging on different
network regimes — CDN routing changes, server-side h2 throttle
policy changes — can A/B without re-implementing the branch.
The doc comment on build_http_client preserves the bench result
so future readers know the option is a refutation, not a
recommendation.

Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target):
  cargo fmt --check                                                clean
  no fancy-regex                                                   clean
  cargo clippy --workspace --all-targets -- -D warnings            clean
  cargo build --workspace                                          clean
  cargo nextest run --workspace --exclude lpm-integration-tests    4394 passed, 7 skipped, 0 failed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ngs)

Workspace Cargo.toml had reqwest features = ["json", "rustls-tls",
"stream", "http2"]. Adding "gzip" enables transparent
Accept-Encoding: gzip + response decompression for all reqwest calls.

Empirical wire-byte verification (curl test against npmjs.org):

  Accept: application/vnd.npm.install-v1+json (current):     69,555 B
  Same + Accept-Encoding: gzip:                              29,348 B   (-58%)
  No Accept header (full packument, for reference):         247,652 B

For 258 manifests on bench/fixture-large that's ~18 MB uncompressed
vs ~7.6 MB gzipped — saves ~10 MB of wire bytes per cold install.
Note: the abbreviated metadata Accept header was already correctly
sent at client.rs:928 and :981 (Gemini's "abbreviated metadata
smoking gun" hypothesis was already addressed in the codebase).

Empirical wall impact (n=10 cold-equal-footing on fixture-large,
stream-greedy, fanout=256):

  WALL median:      4136 ms (mean=4348, stdev=576)
  vs prior n=30:    4273-4290 ms median (b5056a1 baseline)
  Delta:            ~-150 ms median (~-3.5%), modest

  RESOLVE median:   3738 ms (vs 3890 ms baseline) ~-150 ms

Modest because **wire bandwidth isn't the dominant bottleneck on
this network**. Server-side per-request processing is. But:
  - On slow connections (residential 10 Mbps DSL, mobile, etc.)
    the 10 MB savings translates to multiple seconds.
  - This is pure-win telemetry and behavioral correctness — npm
    explicitly recommends Accept-Encoding: gzip; we were the rude
    client downloading uncompressed.
  - Phase 50 F2 in lpm-vs-bun.md flagged this as a real bug back
    when; ships now.

Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target):
  cargo fmt --check                                                clean
  no fancy-regex                                                   clean
  cargo clippy --workspace --all-targets -- -D warnings            clean
  cargo build --workspace                                          clean
  cargo nextest run --workspace --exclude lpm-integration-tests    4394 passed, 7 skipped, 0 failed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bun's verbose HTTP traces (captured cold against npmjs.org) show:

  > HTTP/1.1 GET https://registry.npmjs.org/dayjs
  > Accept-Encoding: gzip, deflate, br, zstd

Adding "brotli" + "zstd" to our reqwest features matches bun's
Accept-Encoding offer. Empirical wire-byte savings per lodash
manifest:

  Accept-Encoding: gzip:                    29,348 B
  Accept-Encoding: gzip,deflate,br,zstd:    28,934 B (-1.4%)

Marginal — npm's brotli responses are ~1.4% smaller than gzip on
this manifest; total install savings ~100 KB across 258 manifests.
Wall is unchanged within noise (3-iter cold median ~4,400 ms,
indistinguishable from gzip-only baseline).

Kept because:
  1. Free wire bytes on slow connections.
  2. Behavioral parity with bun's request shape.
  3. Brotli/zstd decoders in reqwest are well-tested; no
     measurable regression in 3-iter cold smoke.

The 4-sec wall gap to bun's ~1-1.3s is **not in compression
choice or wire bytes**. It's HTTP-stack architectural. Bun uses
uSockets (custom C++ event loop, originally from uWebSockets):
loop: *jsc.MiniEventLoop, NewHTTPContext machinery in
src/http/HTTPThread.zig. reqwest+hyper+tokio is general-purpose
and ~2-10x slower for high-concurrency keep-alive HTTP. That's
the real gap; this commit doesn't change it.

Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target):
  cargo fmt --check                                                clean
  cargo clippy --workspace --all-targets -- -D warnings            clean
  cargo build --workspace                                          clean
  cargo nextest run --workspace --exclude lpm-integration-tests    4394 passed, 7 skipped, 0 failed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-721 ms)

The Phase 55 D micro-benchmark proved reqwest+tokio can fetch 248 npm
manifests in ~310 ms median (~800 fetches/sec — actually faster than
bun's ~700 fetches/sec). So the HTTP stack was NEVER the bottleneck.

Empirical substage probe on the W3-baseline binary against fixture-
large cold revealed:
  pubgrub_ms (greedy resolver wall): 5292 ms
  streaming_bfs.walk_ms (walker):    1334 ms
  spec_tx_send_wait_ms:                37 ms (negligible)
  escape_hatch_fetches:                 0    (cache always hit)

The resolver was burning ~5 sec on cache-warm lookups, and the walker
~1.3 sec on cache writes. Walker can fetch in 310 ms but spends an
extra 1 sec doing... what? Cloning huge CachedPackageInfo structs.

`SharedCache = Arc<DashMap<CanonicalKey, CachedPackageInfo>>` — the
DashMap stored values directly, not Arc-wrapped. Every read in the
greedy resolver's `ensure_manifest` did:
    return Ok(Arc::new(entry.clone()));
i.e. a deep clone of all 7 nested HashMaps (versions × deps,
peer_deps, optional, platform, dist, aliases) per edge access.

For popular packages with 200+ versions and ~150 deps each, that's
~30 KB-ish of cloned heap data per edge × ~600 edges = ~18 MB of
allocator churn during resolve.

This commit changes:
    SharedCache = Arc<DashMap<CanonicalKey, Arc<CachedPackageInfo>>>

So `entry.value().clone()` is a refcount bump, not a deep clone. The
`into_parts` path at end of resolve still produces the public
`HashMap<_, CachedPackageInfo>` shape (one-time deep clone at the
boundary; per-edge access during resolve is shared).

Substage measurement on cold fixture-large after the fix:
    pubgrub_ms: 4229 (was 5292)        -1063 ms / -20%
    walk_ms:    613  (was 1334)        -721  ms / -54% (!!)
    spec_tx_send_wait_ms: 44           unchanged

Walker dropping by half is the most striking result — confirms the
walker was paying cache-write deep-clone cost on every commit_manifest.

Single-run wall on cold fixture-large after the fix was high-variance
(5 runs: 8760, 5109, 4109, 5160, 3839 ms — median 5109). The substage
improvements are masked at the wall level by network noise on 5
samples; will be visible in n=30 V2 protocol.

Files touched (mostly mechanical):
  - provider.rs:  insert wraps in Arc::new, into_parts deep-clones at boundary
  - walker.rs:    insert wraps in Arc::new in commit_manifest
  - greedy.rs:    ensure_manifest returns entry.value().clone() (Arc::clone)
                  result-cache extraction deep-clones at boundary
  - resolve.rs:   one test-fixture insert site updated

Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target):
  cargo fmt --check                                                clean
  cargo clippy --workspace --all-targets -- -D warnings            clean
  cargo build --workspace                                          clean
  cargo nextest run --workspace --exclude lpm-integration-tests    4394 passed, 7 skipped, 0 failed

The remaining ~4 sec resolver wall is per-edge CPU cost in greedy's
process_edge (NpmRange::parse, HashMap inserts, child Edge allocation
across ~600 edges sequentially). That's the next lever — Phase 55 W5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ge -42%, wall flat)

Stacks on bcbb6a3 (Phase 55 W4 Arc-wrap SharedCache). Wall-clock improvement
on bench/fixture-large is in the noise band (greedy-stream median 3,936 ms,
pubgrub-stream 4,020 ms, n=20 cold; ~4× bun's 926 ms); substage view shows
real work moved off the critical path. Shipping as observability + structural
cleanup; the wall-time win needs walker/resolver fusion next phase. Full
flamegraph map + bench tables + next-lever scoping in
DOCS/new-features/37-rust-client-RUNNER-VISION-phase53-w5-w6-closeout.md
in the a-package-manager repo.

W5.1  write_metadata_cache: blocking std::fs::write moved to spawn_blocking;
      handles tracked in `pending_cache_writes` so tests can deterministically
      flush via `flush_pending_cache_writes()`. Sync-write fallback when no
      runtime is available + an opt-in `with_synchronous_cache_writes(true)`
      builder for mock-server tests. Pre-W5.1 every walker manifest fetch
      serialized through a sync write that blocked the calling tokio worker.

W5.2  Drop HMAC-SHA256 from the metadata cache file format. New shape:
      `LPM-MD-V2\n{ETag}\n{msgpack-payload}`. Old HMAC-prefixed entries fail
      the magic check and are silently treated as misses (refetch + rewrite
      in the new format). Removed the persistent `cache_signing_key` plumbing
      + four signing-key helpers + CACHE_SIGNING_KEY_FILE constant + three
      HMAC-specific tests. Bun's manifest cache uses the same trust-the-FS
      model (header magic only).

A1    Split followup_rpc_count into walker_rpc_count + escape_hatch_rpc_count.
      Both BFS (`run_bfs`) and streaming (`run_stream`) walkers post-record
      via `record_walker_rpcs(n)` after each fetch path returns; new fields
      surface on StageTiming and in `--json`. Pre-A1 the metric conflated
      walker pre-fetch with provider escape-hatch; fixture-large now reports
      42 walker + 92 escape-hatch = 134 total, making the walker/resolver
      race visible (69% of fetches escape-hatch).

A3    ResolveResult.cache: HashMap<CanonicalKey, CachedPackageInfo> →
      HashMap<CanonicalKey, Arc<CachedPackageInfo>>. Eliminates the
      end-of-resolve deep-clone (~248 entries × ~30 KB nested HashMaps =
      ~7.4 MB allocator churn hidden inside pubgrub_ms). Auto-deref handles
      consumer field access; format_solution, check_unmet_peers,
      provider.into_parts, greedy.resolve_greedy, and the test helper
      make_cached_info needed signature updates.

W6a   fetch_and_store_streaming and fetch_and_store_legacy now consume an
      OwnedSemaphorePermit and `drop()` it between download and extract.
      Pre-W6a the permit covered both phases end-to-end; the max ~141 ms
      extract was a serializing tail behind every sibling's download permit
      hand-off. Streaming path collects compressed tarball into bytes::Bytes
      (bounded by MAX_COMPRESSED_TARBALL_SIZE; ~12 MB peak resident across
      24 permits), drops permit, then spawn_blocking extract from a
      Cursor<Bytes>. Legacy path drops permit after temp-file write.

Bench (n=20 cold, bench/fixture-large, three-cell, vs pre-W5 n=3 baseline):

  arm                pre-W5    W5+W6    Δ        × bun
  pubgrub-stream     3,911 ms  4,020 ms +109 ms  4.34×
  greedy-stream      3,936 ms  3,936 ms  flat    4.25×
  bun                  868 ms    926 ms          —

Substage deltas W5 → W5+W6 (medians, ms; greedy-stream):
  extract_sum     824 →  478  −42%
  extract_max     122 →   94  −28
  finalize_sum  1,188 →  695  −41%
  pubgrub_ms    3,432 → 3,245 −187

Workspace nextest: 4,393 tests pass, 0 failed, 7 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `Arc<CachedPackageInfo>` wrap added in 46f8b91 pushed the 4-tuple
return type over clippy 1.94.0's type_complexity threshold (it was
`HashMap<_, CachedPackageInfo>` before — same arity, one less nesting
level). The single caller in `resolve.rs:369-370` already destructures
with readable names, so a named-fields struct here is cleanup-of-clean-
code rather than a real refactor; CLAUDE.md explicitly permits this
`#[allow]` for design-level lints.

Caught by `cargo clippy --workspace --all-targets -- -D warnings` on
local toolchain 1.94.1; fix verified green on a fresh CARGO_TARGET_DIR
cold build alongside the full nextest suite (4,393 tests pass / 0 fail
/ 7 skip) and lpm-auth determinism rerun ×2 (47/0/0 both runs, no
flakes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tolgaergin tolgaergin merged commit 4934909 into main Apr 27, 2026
3 checks passed
tolgaergin added a commit that referenced this pull request Apr 29, 2026
* phase-60 D2: promote download_tarball_routed helpers to RegistryClient

Behavior-preserving refactor extracting the two private routed-tarball
helpers from install.rs (download_tarball_routed,
download_tarball_streaming_routed) onto RegistryClient as public
methods. Both `lpm install` and the upcoming Phase 60 `lpm add` source-
delivery flow consume the same Custom-route auth-attachment logic.

- crates/lpm-registry/src/client.rs: add public methods
- crates/lpm-cli/src/commands/install.rs: switch all 5 call sites to
  the new methods; delete the private helpers; remove the now-unused
  DownloadedTarball import

All 602 install + npmrc tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* phase-60 60.0.e: PackageMetadata::resolve_version_spec helper

Add a three-tier version-spec resolver on PackageMetadata covering
dist-tag → exact-version → semver-range, mirroring the canonical
pattern at install_global.rs:368-405 verbatim.

Pre-Phase-60, `lpm add react@beta`, `next@canary`, `lodash@^4` all
failed because PackageMetadata::version() is a pure HashMap lookup —
none of those literal strings exist as concrete versions. The new
helper closes the gap.

Per D3 (preplan): both parse-failure and no-satisfying-version
return LpmError::Script (matching install_global verbatim) so the
Phase 60.1 migration of the four duplicate sites (install_global,
install, update_global, global) is a true behavior-preserving
refactor.

9 unit tests cover dist-tag (latest/beta/canary), exact match,
caret/tilde range, no-satisfying error, parse-fail error, and
empty-versions error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* phase-60 60.0+60.1+60.1.5+60.2: lpm add source delivery from any registry

Decouple `lpm add` from LPM-only package identity, mirror install's
full .npmrc setup, switch to file-spool tarball download, add
destination-side path containment, gate dep auto-install on
lpm.config.json presence, and surface external imports for the simple
path. End-to-end flow now works for any package on any registry the
rust client can reach (lpm.dev worker, npmjs.org direct, .npmrc-
declared private registries).

60.0.a + 60.0.b — Identity refactor + drop dotted-name auto-prepend
- New AddTarget enum: Lpm(PackageName) | Npm { spec: String }.
- New resolve_add_target replaces parse_package_ref. No rewriting
  outside the @lpm.dev/ scope — `lodash.merge`, `tolga.foo`, etc.
  resolve to AddTarget::Npm verbatim. Fixes a long-standing
  correctness bug: pre-Phase-60 dotted bare names were silently
  rewritten to @lpm.dev/<name> which doesn't exist on lpm.dev.
- All output / log / JSON sites render via target.display() /
  target.json_name() — `name.scoped()` no longer used unconditionally.
- Skills branch type-encoded via `let AddTarget::Lpm(pkg) = &target`
  pattern, with a why-comment (60.2) explaining the scope gate
  (lpm.dev runs LLM scans on shipped skill content; arbitrary npm
  packages are not scanned).

60.0.c — Mirror install's full .npmrc setup
- Build RouteTable::from_env_and_filesystem before any network call.
- Surface npmrc_warnings (non-JSON) and the strict-ssl=false security
  warning (escapes --json). Clone the client with with_tls_overrides
  so cafile= / strict-ssl=false take effect on metadata + tarball
  fetches. Mirrors install.rs:3295-3445.

60.0.d — Routed metadata + file-spool tarball
- Metadata: AddTarget::Lpm uses get_package_metadata; AddTarget::Npm
  uses get_npm_metadata_routed.
- Tarball: client.download_tarball_routed (D2 promoted helper) +
  lpm_extractor::extract_tarball_from_file. Bounded memory via
  MAX_COMPRESSED_TARBALL_SIZE (500 MB) for free; lpm add typescript
  (~22 MB) and worst-case @scope/giant-fixture no longer load the
  whole tarball into RAM.

60.0.f — Destination-side path containment (D6)
- New resolve_safe_dest helper canonicalizes target_dir once and
  validates every write destination: refuses to follow existing
  symlinks, rejects writes whose canonical parent escapes the target
  root. Wired into the Step 8 file-copy loop. Closes the threat-model
  gap that opened up when add expanded from "trusted lpm.dev
  publishers" to "any npm publisher."

60.1 — Dep gate + bare-imports notice (D4)
- Tighten dep gate: `if !no_install_deps && lpm_config.is_some()`.
  Simple path is download-manager: copy bytes, no auto-install.
- import_rewriter exports a sibling collect_bare_specifiers fn that
  shares an internal SpecifierKind classifier with rewrite_imports
  (anti-drift contract — "bare" means the same thing in both places).
- add.rs surfaces the collected externals as a non-JSON notice and
  as a `external_imports` array in the JSON output.

60.1.5 — Non-interactive simple-path guard
- `lpm_config.is_none() && target_path.is_none() && (yes || json ||
  !is_tty)` errors before the file-copy loop. Heuristically defaulting
  components/ for arbitrary 3rd-party source under --yes/--json/non-TTY
  is a CI/automation footgun.

Tests
- 15 unit tests in add.rs (resolve_add_target classification including
  the dotted-name regression; resolve_safe_dest contracts including
  symlink-refusal on Unix).
- 10 unit tests in import_rewriter.rs (classify_specifier,
  collect_bare_specifiers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* phase-60 60.3: integration tests for lpm add simple path + guards + traversal

Three new wiremock-driven integration tests covering the highest-value
end-to-end scenarios for Phase 60:

- add_simple_non_interactive_without_path.rs (4 sub-tests) — proves
  the 60.1.5 guard fires for --yes, --json, and non-TTY (stdin from
  /dev/null) without --path; positive control with --path succeeds.
  No package.json mutation in any failure case.

- add_source_npm_simple.rs (2 sub-tests) — full simple-path pipeline
  via wiremock npm metadata + tarball: AddTarget::Npm resolves, file-
  spool download, extract, files copied flat (no auto-nest), bare-
  imports notice lists react + @radix-ui/react-slot, package.json
  NOT mutated, .lpm/skills/ NOT created. JSON sub-test asserts the
  package.name uses the npm-style identity (not @lpm.dev/-prefixed)
  and the new external_imports array is well-shaped.

- add_path_traversal_dest_escape.rs — proves resolve_safe_dest is
  wired into the actual write loop, not just unit-tested in
  isolation. Tarball ships an lpm.config.json with files[0].dest =
  "../../escaped/evil.txt" — assertion: containment-violation error,
  exit non-zero, no file written outside target_dir.

Other 60.3 specced tests are either (i) covered by the unit tests
that landed alongside the implementation (#5 dotted-name, #9 version-
spec, #11 symlink — see preplan v6 audit checklist) or (ii)
deliberately deferred where the underlying machinery is already
test-covered by Phase 58.x install tests (#1 lpm.dev rich, #2 npm
rich, #6 npmrc auth, #7 strict-ssl, #8 missing-var fatal).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* phase-60 60.4: README — lpm add now works against any registry

- Update the lpm add one-liner in the Commands list.
- Add a "How lpm add Works" section explaining: source delivery vs.
  install, the firm naming rule (@lpm.dev/owner.name only), the rich
  vs. simple paths, and the non-interactive --path requirement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* phase-60 audit fix: resolve_safe_dest must validate before mkdir

Audit reproduced (with a temp-dir filesystem probe) that the landed
resolve_safe_dest helper still created directories OUTSIDE the
target_dir for two attack vectors before the containment error fired:

1. `dest_rel = "../../escaped/evil.txt"` — `Path::join` resolves
   lexically; `dest.parent()` lands outside target; `create_dir_all`
   ran before the containment check, leaving `<target>/../escaped/`
   on disk even though the file write was correctly blocked.

2. Absolute `dest_rel = "/tmp/elsewhere/evil.txt"` — `Path::join` of
   an absolute path returns the absolute path verbatim; `parent =
   /tmp/elsewhere/`; `create_dir_all` created it before the
   containment check fired.

The original integration test only asserted no escaped FILE existed,
so the directory-side-effect bug passed CI.

Fix
- Reorder resolve_safe_dest so EVERY check that can reject the
  destination runs BEFORE any filesystem mutation:
  Step 1 (NEW) — reject absolute dest_rel up-front.
  Step 2 (NEW) — reject any ParentDir / RootDir / Prefix component.
  Step 3 — refuse existing-symlink destinations.
  Step 4 (NEW) — pre-mkdir ancestor canonicalization: walk up to the
    longest existing ancestor; canonicalize; require it under
    target_root_canonical (catches symlinked intermediate dirs).
  Step 5 — create_dir_all (NOW safe).
  Step 6 — post-mkdir re-canonicalize as TOCTOU defense-in-depth.

  The lexical bans in Steps 1-2 kill the entire `../escape` and
  absolute-path attack classes before any mkdir runs. The longest-
  existing-ancestor walk in Step 4 covers the symlinked-intermediate
  case (target/foo → /tmp/elsewhere). Step 6 is paranoia.

Tests
- Strengthen unit tests:
  - resolve_safe_dest_dotdot_in_path_rejected_with_no_external_dir_created
    now asserts no escape directory was created.
  - resolve_safe_dest_absolute_dest_rejected_with_no_external_dir_created
    is new — covers the absolute-path attack.
  - resolve_safe_dest_dotdot_in_middle_of_path_also_rejected covers
    `foo/../bar.txt` (lexically resolves back inside but still
    rejected up-front).
- Extend integration test:
  - dest_escape_via_dotdot_is_refused_and_creates_no_external_directory
    now snapshots target_dir entries before the run and asserts no
    unexpected new top-level entries appeared, plus no escape dir.
  - dest_escape_via_absolute_path_is_refused_and_creates_no_external_directory
    is new — covers the absolute-path attack at the integration level.

Net: 4923 → 4926 workspace tests; clippy + fmt clean; all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request May 1, 2026
…se docs

Phase 64 audit findings #1, #11, #22, #28:

- Add `lpm completions <shell>` (bash/zsh/fish/powershell/elvish) via
  `clap_complete`. Script is generated from the live `Cli` definition so
  it stays in sync with the binary at every release.
- Add `lpm graph --no-open` to suppress the auto-launch of the default
  browser on `--format html`. The file is still written to
  `<project>/.lpm/graph.html`. Useful in headless / CI environments.
- Rename `lpm rebuild --rebuild` to `--force` (no alias kept, per the
  decision to stop bridging). The flag-named-after-the-command UX was
  awkward and `--force` matches the intent better.
- Tighten `lpm use` spec doc-comment to drop "runtime" framing — the
  command is Node.js-only today.

Adjacent issues surfaced and fixed during the work:

- Drop `short = 'V'` from `lpm query --query-verbose`. The local short
  collided with the global `-V` / `--version` and tripped clap's
  structural assertion under `clap_complete::generate`. Without this
  fix `lpm completions <any>` would panic. `--query-verbose` long form
  is unaffected.
- Phase 47 P0 rename leftovers: `"lpm build --all"` → `"lpm rebuild --all"`
  in the rebuild.rs untrusted-pkg hint and `lpm build` → `lpm rebuild`
  in the install.rs auto-build comment.

New parser tests cover all four findings + the legacy `--rebuild` flag
is asserted to fail at parse time so users get an explicit error rather
than a silent no-op.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant