Phase 53/54/55: greedy resolver + continuous-stream walker + transport tuning by tolgaergin · Pull Request #11 · lpm-dev/rust-client

tolgaergin · 2026-04-27T01:13:23Z

Summary

Three stacked phases of resolver/walker/transport work, all empirically validated and ready to ship as one PR. Phase 56 (walker/resolver fusion) starts on a fresh branch from main after this lands.

Phase 53 — greedy multi-version resolver alongside PubGrub, opt-in via LPM_RESOLVER=greedy. Plus W5+W6 close-out work: metadata cache writes off the tokio runtime, extract off the download permit.
Phase 54 — continuous-stream walker (run_stream) replacing the BFS level-step walker, opt-in via LPM_WALKER=stream. Per-fetch concurrency bumped 50 → 256 (empirical, n=10 statistically significant).
Phase 55 — transport tuning. h1-pool probe (refuted as default, opt-in retained), reqwest gzip feature, brotli + zstd, Arc-wrap of SharedCache values (eliminated 1,063 ms of resolver-side cloning per cold install).

Bench results — `bench/fixture-large` (266 packages), cold-equal-footing, n=20

Medians from the Phase 53 W5+W6 close-out §3.2 — pre/post comparison vs main HEAD baseline, with bun as the reference control:

Arm	Wall median	Wall stdev	× bun
`pubgrub-stream`	4,020 ms	499 ms	4.34×
`greedy-stream` (new default for greedy)	3,936 ms	821 ms	4.25×
bun (control, n=10)	926 ms	—	—

Substage wins from W5+W6 (greedy-stream): extract_sum −42 %, extract_max −28 ms, finalize_sum −41 %. Wall delta is in the noise band — moving extract off the download permit + cache writes off the runtime is structural correctness work for the next phase, not a wall win on its own.

The remaining ~3 sec gap to bun is the walker/resolver redundancy: 92 of 134 metadata fetches go through the greedy resolver's escape-hatch (69 %), not the walker's pre-fetch. Phase 56 (walker/resolver fusion) is the next branch.

What changes — by phase

Phase 53: greedy multi-version resolver

New crates/lpm-resolver/src/greedy.rs (1,106 lines). Bun-shape enqueueDependencyWithMain port: edge queue + first-match version pick + reuse-on-compatible / allocate-on-incompatible.
Multi-version-per-canonical: when edge B's range is incompatible with edge A's pick, allocate a new node; both versions live in the resolved tree keyed by (canonical, version).
Platform filtering inline (avoids the 47-package esbuild over-install on fixture-large).
W5+W6: write_metadata_cache runs on tokio::task::spawn_blocking; metadata cache file format simplified (no HMAC, magic-prefixed); extract no longer holds the download permit.

Phase 54: continuous-stream walker

New BfsWalker::run_stream() (gated by LPM_WALKER=stream). mpsc-driven, no level barrier; semaphore-bounded fan-out.
DEFAULT_NPM_FANOUT 50 → 256 (empirical: median 4,816 → 4,124 ms, t=3.49 significant; stdev 972 → 253 ms — −74 %).
Per-level instrumentation on the legacy run_bfs for diagnostic comparison.

Phase 55: transport tuning

phase-55 W1 — h1-pool probe: tried bun-style 64 separate HTTP/1.1 sockets to bypass single-connection h2 flow control. Refuted as default (no statistical wall improvement); kept as opt-in via env flag for future investigation.
phase-55 W2 — reqwest gzip feature enabled: modest wall delta, real wire savings.
phase-55 W3 — added brotli + zstd to reqwest features.
phase-55 W4 — Arc-wrap SharedCache values. The load-bearing fix for the resolver wall: pre-W4 the greedy resolver burned ~5 sec per cold install cloning popular packuments per edge. Resolver wall: −1,063 ms; walker: −721 ms.

What is NOT in this PR

Phase 56 walker/resolver fusion (pre-plan in lpm-dev/a-package-manager) — separate branch from main once this lands.
Lever 2 / security analyzer skip — measurement showed it buys ~150-200 ms wall (smaller than the close-out's optimistic 200-300 ms estimate), not worth a half-day before fusion.
Default resolver / walker do not flip in this PR. pubgrub + bfs remain the defaults; greedy + stream are opt-in via env vars.

Files changed

 Cargo.toml                             |    2 +-
 crates/lpm-cli/src/commands/install.rs |  129 +++-
 crates/lpm-registry/src/client.rs      |  588 +++++++++--------
 crates/lpm-registry/src/timing.rs      |   59 +-
 crates/lpm-resolver/src/greedy.rs      | 1106 ++++++++++++++++++++++++++++++++  (new)
 crates/lpm-resolver/src/lib.rs         |    3 +-
 crates/lpm-resolver/src/provider.rs    |   82 ++-
 crates/lpm-resolver/src/resolve.rs     |   77 ++-
 crates/lpm-resolver/src/walker.rs      |  393 +++++++++++-
 9 files changed, 2056 insertions(+), 383 deletions(-)

Commit history

a15650d phase-53 W1: scaffold greedy resolver + LPM_RESOLVER env-var dispatch
e14033e phase-53 W2: multi-version dedupe + platform filter (greedy resolver)
77cba6f phase-54 W1: per-BFS-level instrumentation on the existing walker
fc81cb8 phase-54 W2: continuous-stream walker behind LPM_WALKER=stream
b5056a1 phase-54 W3-C: bump DEFAULT_NPM_FANOUT 50 -> 256 (empirical)
da15db8 phase-55 W1: h1-pool transport probe (REFUTED), kept as opt-in
926fe74 phase-55 W2: enable reqwest gzip feature (modest wall, real wire savings)
02534d0 phase-55 W3: add brotli + zstd to reqwest features
bcbb6a3 phase-55 W4: Arc-wrap SharedCache values (resolver -1,063 ms, walker -721 ms)
46f8b91 phase-53 W5+W6: cache writes off-runtime + extract off-permit (substage -42%, wall flat)
12c9bc7 phase-53 W5+W6 follow-up: allow clippy::type_complexity on into_parts

(Note: commit order is non-monotonic by phase number — 46f8b91 (phase-53 W5+W6) is committed after the phase-55 stack. Functional correctness is unaffected; mentioning for clarity.)

Test plan

cargo fmt --check clean
grep 'fancy-regex' crates/*/Cargo.toml — none (ReDoS guard)
cargo clippy --workspace --all-targets -- -D warnings — clean (verified locally on rust 1.94.1; CI runs 1.94.0 — patch-level diff)
cargo build --workspace — green
cargo nextest run --workspace --exclude lpm-integration-tests --no-fail-fast — 4,393 tests pass / 0 fail / 7 skip (per close-out)
cargo test -p lpm-auth × 2 — no flakes (verifies the keychain-race fix from prior phases)
bench/fixture-large cold-equal-footing n=20 — greedy-stream median 3,936 ms (close-out §3.2)
Lockfile equality across all four cells (close-out §3.2)
CI green on this PR

Companion documentation

Phase 53 W5+W6 close-out (in lpm-dev/a-package-manager)
Phase 54 continuous-stream walker pre-plan
Phase 56 walker/resolver fusion pre-plan — next branch after this merges

🤖 Generated with Claude Code

Lands the first commit of Phase 53's bun-recipe port: a single-version-per- canonical greedy resolver gated behind LPM_RESOLVER=greedy. Default stays PubGrub-with-split-retry while the new path ships dark. The W1 algorithm follows bun's enqueueDependencyWithMain shape: 1. Seed task_queue with one Edge per root dep. 2. Pop edge -> ensure_manifest() (Phase 49 SharedCache hit / wait / escape-hatch fetch via existing route_mode plumbing). 3. find_best_version(): reverse-iterate sorted versions, first satisfying match wins. O(versions). 4. Reuse existing node for the canonical (W1 single-version rule) or allocate a new one and enqueue ITS children. 5. Loop until task_queue empty. No backtracking, no split-retry. ~600-1000 find_best_version calls per 266-pkg tree at ~us each. Empirical sanity (pure-npm /tmp fixture: zod + dayjs, full cache wipe): PUBGRUB (default): wall=781 ms GREEDY: wall=209 ms (resolve=58 fetch=122 link=2) Both resolved identical versions (dayjs@1.11.20, zod@3.25.76). 3.7x cold on this fixture. The W1 ship gate is bench/project (51 pkgs, no conflicts) producing identical resolved trees vs PubGrub at lower resolve_ms; that bench runs next. W1 deferred items wired in subsequent commits: - W2: drop single-version rule, dedupe by (canonical, version) - W3: peer/required asymmetry + peer queue drain pass - W4: Phase 32 P5 path-selector overrides + Phase 40 P2 root aliases - W5: BfsWalker integration (drop redundant SharedCache copy) Plumbing reused as-is: SharedCache, NotifyMap, WalkerDone, route_mode, parse_metadata_to_cache_info, NpmRange. PubGrub stays in tree until W7. Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target): cargo fmt --check clean no fancy-regex in any Cargo.toml clean cargo clippy --workspace --all-targets -- -D warnings clean cargo build --workspace clean cargo nextest run --workspace --exclude lpm-integration-tests 4391 passed, 7 skipped, 0 failed (+8 vs baseline; 6 are new greedy.rs unit tests) See: DOCS/new-features/37-rust-client-RUNNER-VISION-phase53-greedy-resolver-preplan.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drops W1's single-version-per-canonical rule. Each (canonical, version) pair now gets its own resolved node; when an edge wants a canonical already in the tree, the resolver reuses an existing node ONLY IF its version satisfies the new edge's range. Incompatible ranges allocate new nodes — bun + npm + pnpm semantics, no PubGrub split-retry needed. Key data structure changes in greedy.rs: - resolved: HashMap<CanonicalKey, NodeId> -> HashMap<CanonicalKey, Vec<(NpmVersion, NodeId)>> - children_enqueued: HashSet<CanonicalKey> -> HashSet<(CanonicalKey, NpmVersion)> - new field: ResolveState.platform_skipped: usize - new enum: VersionPick { Picked(v) | NoSatisfying | PlatformFiltered } Platform filter (provider::is_platform_compatible promoted to pub(crate)) applied in find_best_version. Without it, optional deps with `os`/`cpu` constraints (e.g. esbuild's per-platform binary tarballs) over-install: fixture-large was producing 301 packages vs PubGrub's 254 because every @esbuild/* platform binary was getting resolved. After the fix: greedy installs 255 packages on fixture-large (one extra: a transitive qs landing on a different body-parser version), tree near-identical to PubGrub's. The platform_skipped counter is incremented and surfaced through ResolveResult.platform_skipped for --json telemetry parity. Tree-correctness empirical (release binary, full cache wipe per arm): bench/project (17 root, ~50 transitives): PubGrub: 51 packages, wall 902 ms Greedy: 51 packages, wall 435 ms diff: TREES IDENTICAL (W1 had a 50-vs-51 gap; W2 closes it) bench/fixture-large (21 root, ~254 transitives): PubGrub: 254 packages Greedy: 255 packages diff: 6 lines (1 body-parser minor delta + 1 transitive qs) near-identical, no platform-binary over-install. Resolve-substage measurement is HIGH VARIANCE on single runs: fixture-large cold (single run, this session): PubGrub resolve_ms ranged 4068 -> 8799 across runs depending on how many split passes the trace hit on that run. Greedy resolve_ms was 5816 on this run. The W2 ship gate's "resolve_ms <= 1500 ms median" target requires the n=30 V2 protocol to be conclusive — single-run numbers don't decide it. The correctness-defining work (multi-version + platform filter) ships here so the perf work has a stable substrate. Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target): cargo fmt --check clean no fancy-regex in any Cargo.toml clean cargo clippy --workspace --all-targets -- -D warnings clean cargo build --workspace clean cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed, 7 skipped, 0 failed (+3 vs W1: process_edge multi-version test + find_best_version platform-filter test + handle_no_version platform_skipped counter test) Truth references vs PubGrub tree: per user feedback, lpm's PubGrub-with- split-retry is NOT the source of truth — bun/pnpm/npm are (the trees millions of users rely on). W2's reuse-on-compatible-range / allocate- on-incompatible-range matches their semantics; PubGrub's tree is itself a workaround for a flat-resolution constraint that doesn't exist in any mature npm-shaped resolver. W3 priority: peer-dep queue + drain pass; eliminates a known difference where greedy may pick a version that violates a peer-dep range that PubGrub's solver would have caught. Already used by W1's body-parser 1.20.4 vs 1.20.5 delta on fixture-large. See: DOCS/new-features/37-rust-client-RUNNER-VISION-phase53-greedy-resolver-preplan.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds three-phase wall-clock attribution (setup / fetch / commit) per BFS level in `BfsWalker::run`, exposed via: - New `LevelTiming` struct, `pub levels: Vec<LevelTiming>` on `WalkerSummary`, exported from `lpm_resolver`. - `tracing::info!(target = "lpm_resolver::walker")` per-level line so `RUST_LOG=lpm_resolver::walker=info` dumps all levels to stderr. - New `timing.resolve.streaming_bfs.levels` array in install --json output. (Built outside the outer json! macro to keep recursion_limit-friendly.) Diagnostic only — no algorithm change. Validates the Phase 54 thesis: the BFS-level barrier in `walker.rs:243-353` (`tokio::join!` per level + `for` loop draining results before advancing) is the cost we pay. Single cold install on `bench/fixture-large` (single-run, fresh cache): level 0: seeded= 21 fetch=575 commit=36 total=611 ms level 1: seeded= 99 fetch=494 commit=25 total=520 ms level 2: seeded= 63 fetch=447 commit=10 total=457 ms level 3: seeded= 50 fetch=941 commit=22 total=963 ms level 4: seeded= 18 fetch=336 commit= 1 total=338 ms level 5: seeded= 6 fetch=306 commit= 0 total=306 ms level 6: seeded= 1 fetch= 21 commit= 0 total= 21 ms ──────────────────────────────────────────────────── fetch=3120 commit=94 total=3216 ms walker_wall_ms = 3219 (matches sum-of-totals) Findings: - 7 sequential levels; each is a stop-the-world barrier. - Commit phase is 3% of walker wall — the dead time isn't there. - Level 3 (50 pkgs, 941 ms) > Level 1 (99 pkgs, 494 ms). Slowest-fetch-per-level holds each barrier — a stream walker would only wait on the critical-path chain. - Per Phase 54 preplan §4 W1 ship gate: walker_wall / resolve_ms ≈ 71% on this run (resolve_ms was ~4500 ms). >50% threshold met, so the level-barrier is the dominant cost. **W2 implementation justified empirically.** Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target): cargo fmt --check clean no fancy-regex clean cargo clippy --workspace --all-targets -- -D warnings clean cargo build --workspace clean cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed, 7 skipped, 0 failed (same as Phase 53 W2 baseline; instrumentation added no new tests, doesn't break old ones) W2 next: replace `BfsWalker::run`'s level loop with a continuous- stream loop (mpsc + FuturesUnordered + Semaphore). Both PubGrub and greedy benefit equally; both stay opt-in via `LPM_RESOLVER`. See: DOCS/new-features/37-rust-client-RUNNER-VISION-phase54-continuous-stream-walker-preplan.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds `BfsWalker::run_stream` — a bun-style fetch-on-manifest-parse walker that eliminates the level-barrier in `run_bfs` (the W1- instrumented version). The dispatch in `BfsWalker::run` switches on `LPM_WALKER`; default stays `bfs` until W3's n=30 bench validates. The stream walker: - Owns one `tokio::sync::mpsc::UnboundedSender<(name, depth)>`. Roots seeded at depth 0; each child sent with parent_depth + 1; depth capped by the existing MAX_WALK_DEPTH (16). - Uses `tokio::task::JoinSet` (tokio-native FuturesUnordered<JoinHandle> equivalent — no `futures` crate dep) for in-flight fetches. - Single global `Arc<Semaphore>(npm_fanout)` caps concurrent fetches. Permit acquisition lives INSIDE the spawned future so the dispatch loop never blocks on the semaphore — tasks queue parked-on-permit in the runtime, the semaphore caps actual network I/O. - `tokio::select!` between `in_flight.join_next()` (biased: process results first to free permits + discover children) and `rx.recv()`. - Termination: when `in_flight.is_empty() && rx.is_empty()`, drop the sender; the next `rx.recv()` returns None, both arms unmatched, `else => break`. - Calls existing `commit_manifest` for cache insert + notify_waiters + roots_ready + spec_tx.send — preplan §5.5 ordering preserved exactly. `expand_deps_from_info` reused as-is. W2 SHIP GATE — single-run cold on bench/fixture-large (266 pkgs): pubgrub-bfs wall=5763 ms | 254 pkgs ← W1 baseline pubgrub-stream wall=4469 ms | 254 pkgs ← -22.5% greedy-bfs wall=4798 ms | 255 pkgs ← W1 baseline greedy-stream wall=4148 ms | 255 pkgs ← -13.6% TREE PARITY (within same resolver, BFS vs Stream MUST match): pubgrub-bfs == pubgrub-stream TREES IDENTICAL ✓ greedy-bfs == greedy-stream TREES IDENTICAL ✓ Smoke gates per Phase 54 preplan §4 W2: - pure-npm (zod+dayjs): bfs wall=946 / stream wall=206 ms (4.6x) - bench/project (51): bfs wall=672 / stream wall=409 ms (1.6x) TREES IDENTICAL ✓ Per-package fetch via existing client APIs (per preplan §3.5): - npm direct: client.get_npm_metadata_direct(&name) - npm via worker: client.get_npm_package_metadata(&name) - @lpm.dev/*: PackageName::parse + client.get_package_metadata(&pkg) Single-run wall delta is smaller than the preplan's projected ~2 sec because single-run variance is high (one slow CDN edge per run). W3's n=30 V2 protocol will settle the median delta with the trimmed paired t-stat; expectation is ~1-2 sec on the headline median. Both PubGrub and greedy benefit equally because they share the walker; neither resolver is removed in this phase. `LPM_WALKER=bfs` stays the default until W3's bench is green. Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target): cargo fmt --check clean no fancy-regex clean cargo clippy --workspace --all-targets -- -D warnings clean cargo build --workspace clean cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed, 7 skipped, 0 failed W3 next: n=30 V2 four-cell A/B (PubGrub × {BFS, Stream}) × (Greedy × {BFS, Stream}) on fixture-large + bun control. If stream-arms beat bfs-arms by >=1.5s on resolve_ms median, W4 flips LPM_WALKER default to stream. See: DOCS/new-features/37-rust-client-RUNNER-VISION-phase54-continuous-stream-walker-preplan.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase 54 W3 Option C bench (n=10 cold-equal-footing on bench/fixture- large, stream walker + greedy resolver): fanout= 50 median=4816 ms stdev= 972 ms <- prior default fanout=128 median=4403 ms stdev=1578 ms (t=0.59 NOT sig.; 9.3s outlier) fanout=256 median=4124 ms stdev= 253 ms (t=3.49 SIGNIFICANT) Bumping to 256 on the same single HTTP/2 connection saves ~700 ms median wall (-14.4%) AND collapses variance by ~74% (stdev 972 -> 253 ms). No server-side rejection at 256 concurrent streams against a single npmjs.org HTTP/2 connection — the "most CDNs hard-cap at 100" warning some hypotheses raised does not apply to npm registry on the fixture-large workload. The variance collapse mirrors W2's stream-walker effect on the BFS barrier: with 256 permits in flight, slow-CDN-edge tail fetches now overlap with the rest of the resolution instead of holding the whole batch. Same phenomenon, different layer. Empirical-only change. The `LPM_NPM_FANOUT` env-var override added in the prior commit stays in `BfsWalker::run` as an unprivileged knob for users who want to A/B different fan-outs (for `LPM_NPM_FANOUT=50` diagnostic comparisons or to dial down on memory-constrained CI). The remaining ~3-sec gap to bun's 765 ms is per-connection flow control on the single h2 multiplex socket. Bun uses 64 separate HTTP/1.1 sockets, each with its own TCP congestion window — that's the structural advantage stream concurrency on h2 cannot match. Next phase (55) Option B is an `LPM_HTTP=h1-pool` transport probe to verify and ship the structural fix. Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target): cargo fmt --check clean no fancy-regex clean cargo clippy --workspace --all-targets -- -D warnings clean cargo build --workspace clean cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed (1 leaky), 7 skipped, 0 failed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Hypothesis: bun's 64-socket HTTP/1.1 pool bypasses a per-h2-connection flow-control throttle, hence its 765ms cold install vs our ~4.1s. Test: build_http_client gates an `LPM_HTTP=h1-pool` mode that switches reqwest to: .http1_only() .pool_max_idle_per_host(64) .pool_idle_timeout(120s) .tcp_keepalive(60s) .tcp_nodelay(true) Bench: n=30 cold-equal-footing, stream-greedy on bench/fixture-large (commit b5056a1 baseline = h2 default + fanout=256): h2-default median=4273 ms stdev=254 ms (control) h1-pool-64 median=4766 ms stdev=430 ms (-11.5%, paired t=-4.57) h1-pool-256 median=4651 ms stdev=405 ms ( -8.9%, paired t=-3.46) **Both h1-pool variants are STATISTICALLY SIGNIFICANTLY SLOWER than h2-default.** Both trimmed paired t-stats are negative and well above the 2.069 critical (|t|=4.57 and 3.46 vs 2.069 critical at alpha=0.05). Variance is also ~60% worse with h1-pool. Why h1-pool loses on this fixture against npmjs.org: - 64 separate TLS handshakes per cold install (~50-200ms each with TLS 1.3 + RTT) vs h2's single handshake. - 64 separate TCP slow-starts vs h2's single connection past slow-start. - npmjs.org seems to handle 256 concurrent h2 streams on one connection without throttling (Phase 54 W3-C established this). So the structural argument that "bun's 64 sockets with separate TCP windows dodge a per-h2-connection throttle" is empirically unsupported. **Bun's speed advantage is NOT in HTTP/1 vs HTTP/2 transport.** The remaining ~3.5-sec gap to bun lives somewhere else. Default stays h2 (negotiated via ALPN). The h1-pool path is kept as opt-in (`LPM_HTTP=h1-pool`) so future debugging on different network regimes — CDN routing changes, server-side h2 throttle policy changes — can A/B without re-implementing the branch. The doc comment on build_http_client preserves the bench result so future readers know the option is a refutation, not a recommendation. Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target): cargo fmt --check clean no fancy-regex clean cargo clippy --workspace --all-targets -- -D warnings clean cargo build --workspace clean cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed, 7 skipped, 0 failed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ngs) Workspace Cargo.toml had reqwest features = ["json", "rustls-tls", "stream", "http2"]. Adding "gzip" enables transparent Accept-Encoding: gzip + response decompression for all reqwest calls. Empirical wire-byte verification (curl test against npmjs.org): Accept: application/vnd.npm.install-v1+json (current): 69,555 B Same + Accept-Encoding: gzip: 29,348 B (-58%) No Accept header (full packument, for reference): 247,652 B For 258 manifests on bench/fixture-large that's ~18 MB uncompressed vs ~7.6 MB gzipped — saves ~10 MB of wire bytes per cold install. Note: the abbreviated metadata Accept header was already correctly sent at client.rs:928 and :981 (Gemini's "abbreviated metadata smoking gun" hypothesis was already addressed in the codebase). Empirical wall impact (n=10 cold-equal-footing on fixture-large, stream-greedy, fanout=256): WALL median: 4136 ms (mean=4348, stdev=576) vs prior n=30: 4273-4290 ms median (b5056a1 baseline) Delta: ~-150 ms median (~-3.5%), modest RESOLVE median: 3738 ms (vs 3890 ms baseline) ~-150 ms Modest because **wire bandwidth isn't the dominant bottleneck on this network**. Server-side per-request processing is. But: - On slow connections (residential 10 Mbps DSL, mobile, etc.) the 10 MB savings translates to multiple seconds. - This is pure-win telemetry and behavioral correctness — npm explicitly recommends Accept-Encoding: gzip; we were the rude client downloading uncompressed. - Phase 50 F2 in lpm-vs-bun.md flagged this as a real bug back when; ships now. Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target): cargo fmt --check clean no fancy-regex clean cargo clippy --workspace --all-targets -- -D warnings clean cargo build --workspace clean cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed, 7 skipped, 0 failed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bun's verbose HTTP traces (captured cold against npmjs.org) show: > HTTP/1.1 GET https://registry.npmjs.org/dayjs > Accept-Encoding: gzip, deflate, br, zstd Adding "brotli" + "zstd" to our reqwest features matches bun's Accept-Encoding offer. Empirical wire-byte savings per lodash manifest: Accept-Encoding: gzip: 29,348 B Accept-Encoding: gzip,deflate,br,zstd: 28,934 B (-1.4%) Marginal — npm's brotli responses are ~1.4% smaller than gzip on this manifest; total install savings ~100 KB across 258 manifests. Wall is unchanged within noise (3-iter cold median ~4,400 ms, indistinguishable from gzip-only baseline). Kept because: 1. Free wire bytes on slow connections. 2. Behavioral parity with bun's request shape. 3. Brotli/zstd decoders in reqwest are well-tested; no measurable regression in 3-iter cold smoke. The 4-sec wall gap to bun's ~1-1.3s is **not in compression choice or wire bytes**. It's HTTP-stack architectural. Bun uses uSockets (custom C++ event loop, originally from uWebSockets): loop: *jsc.MiniEventLoop, NewHTTPContext machinery in src/http/HTTPThread.zig. reqwest+hyper+tokio is general-purpose and ~2-10x slower for high-concurrency keep-alive HTTP. That's the real gap; this commit doesn't change it. Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target): cargo fmt --check clean cargo clippy --workspace --all-targets -- -D warnings clean cargo build --workspace clean cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed, 7 skipped, 0 failed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-721 ms) The Phase 55 D micro-benchmark proved reqwest+tokio can fetch 248 npm manifests in ~310 ms median (~800 fetches/sec — actually faster than bun's ~700 fetches/sec). So the HTTP stack was NEVER the bottleneck. Empirical substage probe on the W3-baseline binary against fixture- large cold revealed: pubgrub_ms (greedy resolver wall): 5292 ms streaming_bfs.walk_ms (walker): 1334 ms spec_tx_send_wait_ms: 37 ms (negligible) escape_hatch_fetches: 0 (cache always hit) The resolver was burning ~5 sec on cache-warm lookups, and the walker ~1.3 sec on cache writes. Walker can fetch in 310 ms but spends an extra 1 sec doing... what? Cloning huge CachedPackageInfo structs. `SharedCache = Arc<DashMap<CanonicalKey, CachedPackageInfo>>` — the DashMap stored values directly, not Arc-wrapped. Every read in the greedy resolver's `ensure_manifest` did: return Ok(Arc::new(entry.clone())); i.e. a deep clone of all 7 nested HashMaps (versions × deps, peer_deps, optional, platform, dist, aliases) per edge access. For popular packages with 200+ versions and ~150 deps each, that's ~30 KB-ish of cloned heap data per edge × ~600 edges = ~18 MB of allocator churn during resolve. This commit changes: SharedCache = Arc<DashMap<CanonicalKey, Arc<CachedPackageInfo>>> So `entry.value().clone()` is a refcount bump, not a deep clone. The `into_parts` path at end of resolve still produces the public `HashMap<_, CachedPackageInfo>` shape (one-time deep clone at the boundary; per-edge access during resolve is shared). Substage measurement on cold fixture-large after the fix: pubgrub_ms: 4229 (was 5292) -1063 ms / -20% walk_ms: 613 (was 1334) -721 ms / -54% (!!) spec_tx_send_wait_ms: 44 unchanged Walker dropping by half is the most striking result — confirms the walker was paying cache-write deep-clone cost on every commit_manifest. Single-run wall on cold fixture-large after the fix was high-variance (5 runs: 8760, 5109, 4109, 5160, 3839 ms — median 5109). The substage improvements are masked at the wall level by network noise on 5 samples; will be visible in n=30 V2 protocol. Files touched (mostly mechanical): - provider.rs: insert wraps in Arc::new, into_parts deep-clones at boundary - walker.rs: insert wraps in Arc::new in commit_manifest - greedy.rs: ensure_manifest returns entry.value().clone() (Arc::clone) result-cache extraction deep-clones at boundary - resolve.rs: one test-fixture insert site updated Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target): cargo fmt --check clean cargo clippy --workspace --all-targets -- -D warnings clean cargo build --workspace clean cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed, 7 skipped, 0 failed The remaining ~4 sec resolver wall is per-edge CPU cost in greedy's process_edge (NpmRange::parse, HashMap inserts, child Edge allocation across ~600 edges sequentially). That's the next lever — Phase 55 W5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ge -42%, wall flat) Stacks on bcbb6a3 (Phase 55 W4 Arc-wrap SharedCache). Wall-clock improvement on bench/fixture-large is in the noise band (greedy-stream median 3,936 ms, pubgrub-stream 4,020 ms, n=20 cold; ~4× bun's 926 ms); substage view shows real work moved off the critical path. Shipping as observability + structural cleanup; the wall-time win needs walker/resolver fusion next phase. Full flamegraph map + bench tables + next-lever scoping in DOCS/new-features/37-rust-client-RUNNER-VISION-phase53-w5-w6-closeout.md in the a-package-manager repo. W5.1 write_metadata_cache: blocking std::fs::write moved to spawn_blocking; handles tracked in `pending_cache_writes` so tests can deterministically flush via `flush_pending_cache_writes()`. Sync-write fallback when no runtime is available + an opt-in `with_synchronous_cache_writes(true)` builder for mock-server tests. Pre-W5.1 every walker manifest fetch serialized through a sync write that blocked the calling tokio worker. W5.2 Drop HMAC-SHA256 from the metadata cache file format. New shape: `LPM-MD-V2\n{ETag}\n{msgpack-payload}`. Old HMAC-prefixed entries fail the magic check and are silently treated as misses (refetch + rewrite in the new format). Removed the persistent `cache_signing_key` plumbing + four signing-key helpers + CACHE_SIGNING_KEY_FILE constant + three HMAC-specific tests. Bun's manifest cache uses the same trust-the-FS model (header magic only). A1 Split followup_rpc_count into walker_rpc_count + escape_hatch_rpc_count. Both BFS (`run_bfs`) and streaming (`run_stream`) walkers post-record via `record_walker_rpcs(n)` after each fetch path returns; new fields surface on StageTiming and in `--json`. Pre-A1 the metric conflated walker pre-fetch with provider escape-hatch; fixture-large now reports 42 walker + 92 escape-hatch = 134 total, making the walker/resolver race visible (69% of fetches escape-hatch). A3 ResolveResult.cache: HashMap<CanonicalKey, CachedPackageInfo> → HashMap<CanonicalKey, Arc<CachedPackageInfo>>. Eliminates the end-of-resolve deep-clone (~248 entries × ~30 KB nested HashMaps = ~7.4 MB allocator churn hidden inside pubgrub_ms). Auto-deref handles consumer field access; format_solution, check_unmet_peers, provider.into_parts, greedy.resolve_greedy, and the test helper make_cached_info needed signature updates. W6a fetch_and_store_streaming and fetch_and_store_legacy now consume an OwnedSemaphorePermit and `drop()` it between download and extract. Pre-W6a the permit covered both phases end-to-end; the max ~141 ms extract was a serializing tail behind every sibling's download permit hand-off. Streaming path collects compressed tarball into bytes::Bytes (bounded by MAX_COMPRESSED_TARBALL_SIZE; ~12 MB peak resident across 24 permits), drops permit, then spawn_blocking extract from a Cursor<Bytes>. Legacy path drops permit after temp-file write. Bench (n=20 cold, bench/fixture-large, three-cell, vs pre-W5 n=3 baseline): arm pre-W5 W5+W6 Δ × bun pubgrub-stream 3,911 ms 4,020 ms +109 ms 4.34× greedy-stream 3,936 ms 3,936 ms flat 4.25× bun 868 ms 926 ms — Substage deltas W5 → W5+W6 (medians, ms; greedy-stream): extract_sum 824 → 478 −42% extract_max 122 → 94 −28 finalize_sum 1,188 → 695 −41% pubgrub_ms 3,432 → 3,245 −187 Workspace nextest: 4,393 tests pass, 0 failed, 7 skipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The `Arc<CachedPackageInfo>` wrap added in 46f8b91 pushed the 4-tuple return type over clippy 1.94.0's type_complexity threshold (it was `HashMap<_, CachedPackageInfo>` before — same arity, one less nesting level). The single caller in `resolve.rs:369-370` already destructures with readable names, so a named-fields struct here is cleanup-of-clean- code rather than a real refactor; CLAUDE.md explicitly permits this `#[allow]` for design-level lints. Caught by `cargo clippy --workspace --all-targets -- -D warnings` on local toolchain 1.94.1; fix verified green on a fresh CARGO_TARGET_DIR cold build alongside the full nextest suite (4,393 tests pass / 0 fail / 7 skip) and lpm-auth determinism rerun ×2 (47/0/0 both runs, no flakes). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* phase-60 D2: promote download_tarball_routed helpers to RegistryClient Behavior-preserving refactor extracting the two private routed-tarball helpers from install.rs (download_tarball_routed, download_tarball_streaming_routed) onto RegistryClient as public methods. Both `lpm install` and the upcoming Phase 60 `lpm add` source- delivery flow consume the same Custom-route auth-attachment logic. - crates/lpm-registry/src/client.rs: add public methods - crates/lpm-cli/src/commands/install.rs: switch all 5 call sites to the new methods; delete the private helpers; remove the now-unused DownloadedTarball import All 602 install + npmrc tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase-60 60.0.e: PackageMetadata::resolve_version_spec helper Add a three-tier version-spec resolver on PackageMetadata covering dist-tag → exact-version → semver-range, mirroring the canonical pattern at install_global.rs:368-405 verbatim. Pre-Phase-60, `lpm add react@beta`, `next@canary`, `lodash@^4` all failed because PackageMetadata::version() is a pure HashMap lookup — none of those literal strings exist as concrete versions. The new helper closes the gap. Per D3 (preplan): both parse-failure and no-satisfying-version return LpmError::Script (matching install_global verbatim) so the Phase 60.1 migration of the four duplicate sites (install_global, install, update_global, global) is a true behavior-preserving refactor. 9 unit tests cover dist-tag (latest/beta/canary), exact match, caret/tilde range, no-satisfying error, parse-fail error, and empty-versions error. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase-60 60.0+60.1+60.1.5+60.2: lpm add source delivery from any registry Decouple `lpm add` from LPM-only package identity, mirror install's full .npmrc setup, switch to file-spool tarball download, add destination-side path containment, gate dep auto-install on lpm.config.json presence, and surface external imports for the simple path. End-to-end flow now works for any package on any registry the rust client can reach (lpm.dev worker, npmjs.org direct, .npmrc- declared private registries). 60.0.a + 60.0.b — Identity refactor + drop dotted-name auto-prepend - New AddTarget enum: Lpm(PackageName) | Npm { spec: String }. - New resolve_add_target replaces parse_package_ref. No rewriting outside the @lpm.dev/ scope — `lodash.merge`, `tolga.foo`, etc. resolve to AddTarget::Npm verbatim. Fixes a long-standing correctness bug: pre-Phase-60 dotted bare names were silently rewritten to @lpm.dev/<name> which doesn't exist on lpm.dev. - All output / log / JSON sites render via target.display() / target.json_name() — `name.scoped()` no longer used unconditionally. - Skills branch type-encoded via `let AddTarget::Lpm(pkg) = &target` pattern, with a why-comment (60.2) explaining the scope gate (lpm.dev runs LLM scans on shipped skill content; arbitrary npm packages are not scanned). 60.0.c — Mirror install's full .npmrc setup - Build RouteTable::from_env_and_filesystem before any network call. - Surface npmrc_warnings (non-JSON) and the strict-ssl=false security warning (escapes --json). Clone the client with with_tls_overrides so cafile= / strict-ssl=false take effect on metadata + tarball fetches. Mirrors install.rs:3295-3445. 60.0.d — Routed metadata + file-spool tarball - Metadata: AddTarget::Lpm uses get_package_metadata; AddTarget::Npm uses get_npm_metadata_routed. - Tarball: client.download_tarball_routed (D2 promoted helper) + lpm_extractor::extract_tarball_from_file. Bounded memory via MAX_COMPRESSED_TARBALL_SIZE (500 MB) for free; lpm add typescript (~22 MB) and worst-case @scope/giant-fixture no longer load the whole tarball into RAM. 60.0.f — Destination-side path containment (D6) - New resolve_safe_dest helper canonicalizes target_dir once and validates every write destination: refuses to follow existing symlinks, rejects writes whose canonical parent escapes the target root. Wired into the Step 8 file-copy loop. Closes the threat-model gap that opened up when add expanded from "trusted lpm.dev publishers" to "any npm publisher." 60.1 — Dep gate + bare-imports notice (D4) - Tighten dep gate: `if !no_install_deps && lpm_config.is_some()`. Simple path is download-manager: copy bytes, no auto-install. - import_rewriter exports a sibling collect_bare_specifiers fn that shares an internal SpecifierKind classifier with rewrite_imports (anti-drift contract — "bare" means the same thing in both places). - add.rs surfaces the collected externals as a non-JSON notice and as a `external_imports` array in the JSON output. 60.1.5 — Non-interactive simple-path guard - `lpm_config.is_none() && target_path.is_none() && (yes || json || !is_tty)` errors before the file-copy loop. Heuristically defaulting components/ for arbitrary 3rd-party source under --yes/--json/non-TTY is a CI/automation footgun. Tests - 15 unit tests in add.rs (resolve_add_target classification including the dotted-name regression; resolve_safe_dest contracts including symlink-refusal on Unix). - 10 unit tests in import_rewriter.rs (classify_specifier, collect_bare_specifiers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase-60 60.3: integration tests for lpm add simple path + guards + traversal Three new wiremock-driven integration tests covering the highest-value end-to-end scenarios for Phase 60: - add_simple_non_interactive_without_path.rs (4 sub-tests) — proves the 60.1.5 guard fires for --yes, --json, and non-TTY (stdin from /dev/null) without --path; positive control with --path succeeds. No package.json mutation in any failure case. - add_source_npm_simple.rs (2 sub-tests) — full simple-path pipeline via wiremock npm metadata + tarball: AddTarget::Npm resolves, file- spool download, extract, files copied flat (no auto-nest), bare- imports notice lists react + @radix-ui/react-slot, package.json NOT mutated, .lpm/skills/ NOT created. JSON sub-test asserts the package.name uses the npm-style identity (not @lpm.dev/-prefixed) and the new external_imports array is well-shaped. - add_path_traversal_dest_escape.rs — proves resolve_safe_dest is wired into the actual write loop, not just unit-tested in isolation. Tarball ships an lpm.config.json with files[0].dest = "../../escaped/evil.txt" — assertion: containment-violation error, exit non-zero, no file written outside target_dir. Other 60.3 specced tests are either (i) covered by the unit tests that landed alongside the implementation (#5 dotted-name, #9 version- spec, #11 symlink — see preplan v6 audit checklist) or (ii) deliberately deferred where the underlying machinery is already test-covered by Phase 58.x install tests (#1 lpm.dev rich, #2 npm rich, #6 npmrc auth, #7 strict-ssl, #8 missing-var fatal). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase-60 60.4: README — lpm add now works against any registry - Update the lpm add one-liner in the Commands list. - Add a "How lpm add Works" section explaining: source delivery vs. install, the firm naming rule (@lpm.dev/owner.name only), the rich vs. simple paths, and the non-interactive --path requirement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase-60 audit fix: resolve_safe_dest must validate before mkdir Audit reproduced (with a temp-dir filesystem probe) that the landed resolve_safe_dest helper still created directories OUTSIDE the target_dir for two attack vectors before the containment error fired: 1. `dest_rel = "../../escaped/evil.txt"` — `Path::join` resolves lexically; `dest.parent()` lands outside target; `create_dir_all` ran before the containment check, leaving `<target>/../escaped/` on disk even though the file write was correctly blocked. 2. Absolute `dest_rel = "/tmp/elsewhere/evil.txt"` — `Path::join` of an absolute path returns the absolute path verbatim; `parent = /tmp/elsewhere/`; `create_dir_all` created it before the containment check fired. The original integration test only asserted no escaped FILE existed, so the directory-side-effect bug passed CI. Fix - Reorder resolve_safe_dest so EVERY check that can reject the destination runs BEFORE any filesystem mutation: Step 1 (NEW) — reject absolute dest_rel up-front. Step 2 (NEW) — reject any ParentDir / RootDir / Prefix component. Step 3 — refuse existing-symlink destinations. Step 4 (NEW) — pre-mkdir ancestor canonicalization: walk up to the longest existing ancestor; canonicalize; require it under target_root_canonical (catches symlinked intermediate dirs). Step 5 — create_dir_all (NOW safe). Step 6 — post-mkdir re-canonicalize as TOCTOU defense-in-depth. The lexical bans in Steps 1-2 kill the entire `../escape` and absolute-path attack classes before any mkdir runs. The longest- existing-ancestor walk in Step 4 covers the symlinked-intermediate case (target/foo → /tmp/elsewhere). Step 6 is paranoia. Tests - Strengthen unit tests: - resolve_safe_dest_dotdot_in_path_rejected_with_no_external_dir_created now asserts no escape directory was created. - resolve_safe_dest_absolute_dest_rejected_with_no_external_dir_created is new — covers the absolute-path attack. - resolve_safe_dest_dotdot_in_middle_of_path_also_rejected covers `foo/../bar.txt` (lexically resolves back inside but still rejected up-front). - Extend integration test: - dest_escape_via_dotdot_is_refused_and_creates_no_external_directory now snapshots target_dir entries before the run and asserts no unexpected new top-level entries appeared, plus no escape dir. - dest_escape_via_absolute_path_is_refused_and_creates_no_external_directory is new — covers the absolute-path attack at the integration level. Net: 4923 → 4926 workspace tests; clippy + fmt clean; all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…se docs Phase 64 audit findings #1, #11, #22, #28: - Add `lpm completions <shell>` (bash/zsh/fish/powershell/elvish) via `clap_complete`. Script is generated from the live `Cli` definition so it stays in sync with the binary at every release. - Add `lpm graph --no-open` to suppress the auto-launch of the default browser on `--format html`. The file is still written to `<project>/.lpm/graph.html`. Useful in headless / CI environments. - Rename `lpm rebuild --rebuild` to `--force` (no alias kept, per the decision to stop bridging). The flag-named-after-the-command UX was awkward and `--force` matches the intent better. - Tighten `lpm use` spec doc-comment to drop "runtime" framing — the command is Node.js-only today. Adjacent issues surfaced and fixed during the work: - Drop `short = 'V'` from `lpm query --query-verbose`. The local short collided with the global `-V` / `--version` and tripped clap's structural assertion under `clap_complete::generate`. Without this fix `lpm completions <any>` would panic. `--query-verbose` long form is unaffected. - Phase 47 P0 rename leftovers: `"lpm build --all"` → `"lpm rebuild --all"` in the rebuild.rs untrusted-pkg hint and `lpm build` → `lpm rebuild` in the install.rs auto-build comment. New parser tests cover all four findings + the legacy `--rebuild` flag is asserted to fail at parse time so users get an explicit error rather than a silent no-op.

tolgaergin and others added 11 commits April 26, 2026 12:58

tolgaergin merged commit 4934909 into main Apr 27, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Phase 53/54/55: greedy resolver + continuous-stream walker + transport tuning#11

Phase 53/54/55: greedy resolver + continuous-stream walker + transport tuning#11
tolgaergin merged 11 commits into
mainfrom
phase-53-greedy-resolver

tolgaergin commented Apr 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tolgaergin commented Apr 27, 2026

Summary

Bench results — bench/fixture-large (266 packages), cold-equal-footing, n=20

What changes — by phase

Phase 53: greedy multi-version resolver

Phase 54: continuous-stream walker

Phase 55: transport tuning

What is NOT in this PR

Files changed

Commit history

Test plan

Companion documentation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Bench results — `bench/fixture-large` (266 packages), cold-equal-footing, n=20