Phase 53/54/55: greedy resolver + continuous-stream walker + transport tuning#11
Merged
Conversation
Lands the first commit of Phase 53's bun-recipe port: a single-version-per-
canonical greedy resolver gated behind LPM_RESOLVER=greedy. Default stays
PubGrub-with-split-retry while the new path ships dark.
The W1 algorithm follows bun's enqueueDependencyWithMain shape:
1. Seed task_queue with one Edge per root dep.
2. Pop edge -> ensure_manifest() (Phase 49 SharedCache hit / wait /
escape-hatch fetch via existing route_mode plumbing).
3. find_best_version(): reverse-iterate sorted versions, first
satisfying match wins. O(versions).
4. Reuse existing node for the canonical (W1 single-version rule)
or allocate a new one and enqueue ITS children.
5. Loop until task_queue empty.
No backtracking, no split-retry. ~600-1000 find_best_version calls per
266-pkg tree at ~us each.
Empirical sanity (pure-npm /tmp fixture: zod + dayjs, full cache wipe):
PUBGRUB (default): wall=781 ms
GREEDY: wall=209 ms (resolve=58 fetch=122 link=2)
Both resolved identical versions (dayjs@1.11.20, zod@3.25.76). 3.7x cold
on this fixture. The W1 ship gate is bench/project (51 pkgs, no conflicts)
producing identical resolved trees vs PubGrub at lower resolve_ms; that
bench runs next.
W1 deferred items wired in subsequent commits:
- W2: drop single-version rule, dedupe by (canonical, version)
- W3: peer/required asymmetry + peer queue drain pass
- W4: Phase 32 P5 path-selector overrides + Phase 40 P2 root aliases
- W5: BfsWalker integration (drop redundant SharedCache copy)
Plumbing reused as-is: SharedCache, NotifyMap, WalkerDone, route_mode,
parse_metadata_to_cache_info, NpmRange. PubGrub stays in tree until W7.
Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target):
cargo fmt --check clean
no fancy-regex in any Cargo.toml clean
cargo clippy --workspace --all-targets -- -D warnings clean
cargo build --workspace clean
cargo nextest run --workspace --exclude lpm-integration-tests 4391 passed, 7 skipped, 0 failed
(+8 vs baseline; 6 are new greedy.rs unit tests)
See: DOCS/new-features/37-rust-client-RUNNER-VISION-phase53-greedy-resolver-preplan.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops W1's single-version-per-canonical rule. Each (canonical, version)
pair now gets its own resolved node; when an edge wants a canonical
already in the tree, the resolver reuses an existing node ONLY IF its
version satisfies the new edge's range. Incompatible ranges allocate
new nodes — bun + npm + pnpm semantics, no PubGrub split-retry needed.
Key data structure changes in greedy.rs:
- resolved: HashMap<CanonicalKey, NodeId>
-> HashMap<CanonicalKey, Vec<(NpmVersion, NodeId)>>
- children_enqueued: HashSet<CanonicalKey>
-> HashSet<(CanonicalKey, NpmVersion)>
- new field: ResolveState.platform_skipped: usize
- new enum: VersionPick { Picked(v) | NoSatisfying | PlatformFiltered }
Platform filter (provider::is_platform_compatible promoted to pub(crate))
applied in find_best_version. Without it, optional deps with `os`/`cpu`
constraints (e.g. esbuild's per-platform binary tarballs) over-install:
fixture-large was producing 301 packages vs PubGrub's 254 because every
@esbuild/* platform binary was getting resolved. After the fix:
greedy installs 255 packages on fixture-large (one extra: a transitive
qs landing on a different body-parser version), tree near-identical to
PubGrub's. The platform_skipped counter is incremented and surfaced
through ResolveResult.platform_skipped for --json telemetry parity.
Tree-correctness empirical (release binary, full cache wipe per arm):
bench/project (17 root, ~50 transitives):
PubGrub: 51 packages, wall 902 ms
Greedy: 51 packages, wall 435 ms
diff: TREES IDENTICAL (W1 had a 50-vs-51 gap; W2 closes it)
bench/fixture-large (21 root, ~254 transitives):
PubGrub: 254 packages
Greedy: 255 packages
diff: 6 lines (1 body-parser minor delta + 1 transitive qs)
near-identical, no platform-binary over-install.
Resolve-substage measurement is HIGH VARIANCE on single runs:
fixture-large cold (single run, this session): PubGrub resolve_ms
ranged 4068 -> 8799 across runs depending on how many split passes
the trace hit on that run. Greedy resolve_ms was 5816 on this run.
The W2 ship gate's "resolve_ms <= 1500 ms median" target requires the
n=30 V2 protocol to be conclusive — single-run numbers don't decide
it. The correctness-defining work (multi-version + platform filter)
ships here so the perf work has a stable substrate.
Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target):
cargo fmt --check clean
no fancy-regex in any Cargo.toml clean
cargo clippy --workspace --all-targets -- -D warnings clean
cargo build --workspace clean
cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed, 7 skipped, 0 failed
(+3 vs W1: process_edge multi-version test +
find_best_version platform-filter test +
handle_no_version platform_skipped counter test)
Truth references vs PubGrub tree: per user feedback, lpm's PubGrub-with-
split-retry is NOT the source of truth — bun/pnpm/npm are (the trees
millions of users rely on). W2's reuse-on-compatible-range / allocate-
on-incompatible-range matches their semantics; PubGrub's tree is itself
a workaround for a flat-resolution constraint that doesn't exist in any
mature npm-shaped resolver.
W3 priority: peer-dep queue + drain pass; eliminates a known difference
where greedy may pick a version that violates a peer-dep range that
PubGrub's solver would have caught. Already used by W1's body-parser
1.20.4 vs 1.20.5 delta on fixture-large.
See: DOCS/new-features/37-rust-client-RUNNER-VISION-phase53-greedy-resolver-preplan.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds three-phase wall-clock attribution (setup / fetch / commit) per
BFS level in `BfsWalker::run`, exposed via:
- New `LevelTiming` struct, `pub levels: Vec<LevelTiming>` on
`WalkerSummary`, exported from `lpm_resolver`.
- `tracing::info!(target = "lpm_resolver::walker")` per-level line so
`RUST_LOG=lpm_resolver::walker=info` dumps all levels to stderr.
- New `timing.resolve.streaming_bfs.levels` array in install --json
output. (Built outside the outer json! macro to keep
recursion_limit-friendly.)
Diagnostic only — no algorithm change. Validates the Phase 54 thesis:
the BFS-level barrier in `walker.rs:243-353` (`tokio::join!` per
level + `for` loop draining results before advancing) is the cost we
pay. Single cold install on `bench/fixture-large` (single-run, fresh
cache):
level 0: seeded= 21 fetch=575 commit=36 total=611 ms
level 1: seeded= 99 fetch=494 commit=25 total=520 ms
level 2: seeded= 63 fetch=447 commit=10 total=457 ms
level 3: seeded= 50 fetch=941 commit=22 total=963 ms
level 4: seeded= 18 fetch=336 commit= 1 total=338 ms
level 5: seeded= 6 fetch=306 commit= 0 total=306 ms
level 6: seeded= 1 fetch= 21 commit= 0 total= 21 ms
────────────────────────────────────────────────────
fetch=3120 commit=94 total=3216 ms
walker_wall_ms = 3219 (matches sum-of-totals)
Findings:
- 7 sequential levels; each is a stop-the-world barrier.
- Commit phase is 3% of walker wall — the dead time isn't there.
- Level 3 (50 pkgs, 941 ms) > Level 1 (99 pkgs, 494 ms).
Slowest-fetch-per-level holds each barrier — a stream walker
would only wait on the critical-path chain.
- Per Phase 54 preplan §4 W1 ship gate: walker_wall / resolve_ms ≈
71% on this run (resolve_ms was ~4500 ms). >50% threshold met,
so the level-barrier is the dominant cost. **W2 implementation
justified empirically.**
Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target):
cargo fmt --check clean
no fancy-regex clean
cargo clippy --workspace --all-targets -- -D warnings clean
cargo build --workspace clean
cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed, 7 skipped, 0 failed
(same as Phase 53 W2 baseline; instrumentation
added no new tests, doesn't break old ones)
W2 next: replace `BfsWalker::run`'s level loop with a continuous-
stream loop (mpsc + FuturesUnordered + Semaphore). Both PubGrub and
greedy benefit equally; both stay opt-in via `LPM_RESOLVER`.
See: DOCS/new-features/37-rust-client-RUNNER-VISION-phase54-continuous-stream-walker-preplan.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `BfsWalker::run_stream` — a bun-style fetch-on-manifest-parse
walker that eliminates the level-barrier in `run_bfs` (the W1-
instrumented version). The dispatch in `BfsWalker::run` switches on
`LPM_WALKER`; default stays `bfs` until W3's n=30 bench validates.
The stream walker:
- Owns one `tokio::sync::mpsc::UnboundedSender<(name, depth)>`. Roots
seeded at depth 0; each child sent with parent_depth + 1; depth
capped by the existing MAX_WALK_DEPTH (16).
- Uses `tokio::task::JoinSet` (tokio-native FuturesUnordered<JoinHandle>
equivalent — no `futures` crate dep) for in-flight fetches.
- Single global `Arc<Semaphore>(npm_fanout)` caps concurrent fetches.
Permit acquisition lives INSIDE the spawned future so the dispatch
loop never blocks on the semaphore — tasks queue parked-on-permit
in the runtime, the semaphore caps actual network I/O.
- `tokio::select!` between `in_flight.join_next()` (biased: process
results first to free permits + discover children) and `rx.recv()`.
- Termination: when `in_flight.is_empty() && rx.is_empty()`, drop the
sender; the next `rx.recv()` returns None, both arms unmatched,
`else => break`.
- Calls existing `commit_manifest` for cache insert + notify_waiters
+ roots_ready + spec_tx.send — preplan §5.5 ordering preserved
exactly. `expand_deps_from_info` reused as-is.
W2 SHIP GATE — single-run cold on bench/fixture-large (266 pkgs):
pubgrub-bfs wall=5763 ms | 254 pkgs ← W1 baseline
pubgrub-stream wall=4469 ms | 254 pkgs ← -22.5%
greedy-bfs wall=4798 ms | 255 pkgs ← W1 baseline
greedy-stream wall=4148 ms | 255 pkgs ← -13.6%
TREE PARITY (within same resolver, BFS vs Stream MUST match):
pubgrub-bfs == pubgrub-stream TREES IDENTICAL ✓
greedy-bfs == greedy-stream TREES IDENTICAL ✓
Smoke gates per Phase 54 preplan §4 W2:
- pure-npm (zod+dayjs): bfs wall=946 / stream wall=206 ms (4.6x)
- bench/project (51): bfs wall=672 / stream wall=409 ms (1.6x)
TREES IDENTICAL ✓
Per-package fetch via existing client APIs (per preplan §3.5):
- npm direct: client.get_npm_metadata_direct(&name)
- npm via worker: client.get_npm_package_metadata(&name)
- @lpm.dev/*: PackageName::parse + client.get_package_metadata(&pkg)
Single-run wall delta is smaller than the preplan's projected ~2 sec
because single-run variance is high (one slow CDN edge per run). W3's
n=30 V2 protocol will settle the median delta with the trimmed paired
t-stat; expectation is ~1-2 sec on the headline median.
Both PubGrub and greedy benefit equally because they share the
walker; neither resolver is removed in this phase. `LPM_WALKER=bfs`
stays the default until W3's bench is green.
Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target):
cargo fmt --check clean
no fancy-regex clean
cargo clippy --workspace --all-targets -- -D warnings clean
cargo build --workspace clean
cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed, 7 skipped, 0 failed
W3 next: n=30 V2 four-cell A/B (PubGrub × {BFS, Stream}) ×
(Greedy × {BFS, Stream}) on fixture-large + bun control. If
stream-arms beat bfs-arms by >=1.5s on resolve_ms median, W4 flips
LPM_WALKER default to stream.
See: DOCS/new-features/37-rust-client-RUNNER-VISION-phase54-continuous-stream-walker-preplan.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 54 W3 Option C bench (n=10 cold-equal-footing on bench/fixture- large, stream walker + greedy resolver): fanout= 50 median=4816 ms stdev= 972 ms <- prior default fanout=128 median=4403 ms stdev=1578 ms (t=0.59 NOT sig.; 9.3s outlier) fanout=256 median=4124 ms stdev= 253 ms (t=3.49 SIGNIFICANT) Bumping to 256 on the same single HTTP/2 connection saves ~700 ms median wall (-14.4%) AND collapses variance by ~74% (stdev 972 -> 253 ms). No server-side rejection at 256 concurrent streams against a single npmjs.org HTTP/2 connection — the "most CDNs hard-cap at 100" warning some hypotheses raised does not apply to npm registry on the fixture-large workload. The variance collapse mirrors W2's stream-walker effect on the BFS barrier: with 256 permits in flight, slow-CDN-edge tail fetches now overlap with the rest of the resolution instead of holding the whole batch. Same phenomenon, different layer. Empirical-only change. The `LPM_NPM_FANOUT` env-var override added in the prior commit stays in `BfsWalker::run` as an unprivileged knob for users who want to A/B different fan-outs (for `LPM_NPM_FANOUT=50` diagnostic comparisons or to dial down on memory-constrained CI). The remaining ~3-sec gap to bun's 765 ms is per-connection flow control on the single h2 multiplex socket. Bun uses 64 separate HTTP/1.1 sockets, each with its own TCP congestion window — that's the structural advantage stream concurrency on h2 cannot match. Next phase (55) Option B is an `LPM_HTTP=h1-pool` transport probe to verify and ship the structural fix. Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target): cargo fmt --check clean no fancy-regex clean cargo clippy --workspace --all-targets -- -D warnings clean cargo build --workspace clean cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed (1 leaky), 7 skipped, 0 failed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hypothesis: bun's 64-socket HTTP/1.1 pool bypasses a per-h2-connection flow-control throttle, hence its 765ms cold install vs our ~4.1s. Test: build_http_client gates an `LPM_HTTP=h1-pool` mode that switches reqwest to: .http1_only() .pool_max_idle_per_host(64) .pool_idle_timeout(120s) .tcp_keepalive(60s) .tcp_nodelay(true) Bench: n=30 cold-equal-footing, stream-greedy on bench/fixture-large (commit b5056a1 baseline = h2 default + fanout=256): h2-default median=4273 ms stdev=254 ms (control) h1-pool-64 median=4766 ms stdev=430 ms (-11.5%, paired t=-4.57) h1-pool-256 median=4651 ms stdev=405 ms ( -8.9%, paired t=-3.46) **Both h1-pool variants are STATISTICALLY SIGNIFICANTLY SLOWER than h2-default.** Both trimmed paired t-stats are negative and well above the 2.069 critical (|t|=4.57 and 3.46 vs 2.069 critical at alpha=0.05). Variance is also ~60% worse with h1-pool. Why h1-pool loses on this fixture against npmjs.org: - 64 separate TLS handshakes per cold install (~50-200ms each with TLS 1.3 + RTT) vs h2's single handshake. - 64 separate TCP slow-starts vs h2's single connection past slow-start. - npmjs.org seems to handle 256 concurrent h2 streams on one connection without throttling (Phase 54 W3-C established this). So the structural argument that "bun's 64 sockets with separate TCP windows dodge a per-h2-connection throttle" is empirically unsupported. **Bun's speed advantage is NOT in HTTP/1 vs HTTP/2 transport.** The remaining ~3.5-sec gap to bun lives somewhere else. Default stays h2 (negotiated via ALPN). The h1-pool path is kept as opt-in (`LPM_HTTP=h1-pool`) so future debugging on different network regimes — CDN routing changes, server-side h2 throttle policy changes — can A/B without re-implementing the branch. The doc comment on build_http_client preserves the bench result so future readers know the option is a refutation, not a recommendation. Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target): cargo fmt --check clean no fancy-regex clean cargo clippy --workspace --all-targets -- -D warnings clean cargo build --workspace clean cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed, 7 skipped, 0 failed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ngs) Workspace Cargo.toml had reqwest features = ["json", "rustls-tls", "stream", "http2"]. Adding "gzip" enables transparent Accept-Encoding: gzip + response decompression for all reqwest calls. Empirical wire-byte verification (curl test against npmjs.org): Accept: application/vnd.npm.install-v1+json (current): 69,555 B Same + Accept-Encoding: gzip: 29,348 B (-58%) No Accept header (full packument, for reference): 247,652 B For 258 manifests on bench/fixture-large that's ~18 MB uncompressed vs ~7.6 MB gzipped — saves ~10 MB of wire bytes per cold install. Note: the abbreviated metadata Accept header was already correctly sent at client.rs:928 and :981 (Gemini's "abbreviated metadata smoking gun" hypothesis was already addressed in the codebase). Empirical wall impact (n=10 cold-equal-footing on fixture-large, stream-greedy, fanout=256): WALL median: 4136 ms (mean=4348, stdev=576) vs prior n=30: 4273-4290 ms median (b5056a1 baseline) Delta: ~-150 ms median (~-3.5%), modest RESOLVE median: 3738 ms (vs 3890 ms baseline) ~-150 ms Modest because **wire bandwidth isn't the dominant bottleneck on this network**. Server-side per-request processing is. But: - On slow connections (residential 10 Mbps DSL, mobile, etc.) the 10 MB savings translates to multiple seconds. - This is pure-win telemetry and behavioral correctness — npm explicitly recommends Accept-Encoding: gzip; we were the rude client downloading uncompressed. - Phase 50 F2 in lpm-vs-bun.md flagged this as a real bug back when; ships now. Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target): cargo fmt --check clean no fancy-regex clean cargo clippy --workspace --all-targets -- -D warnings clean cargo build --workspace clean cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed, 7 skipped, 0 failed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bun's verbose HTTP traces (captured cold against npmjs.org) show: > HTTP/1.1 GET https://registry.npmjs.org/dayjs > Accept-Encoding: gzip, deflate, br, zstd Adding "brotli" + "zstd" to our reqwest features matches bun's Accept-Encoding offer. Empirical wire-byte savings per lodash manifest: Accept-Encoding: gzip: 29,348 B Accept-Encoding: gzip,deflate,br,zstd: 28,934 B (-1.4%) Marginal — npm's brotli responses are ~1.4% smaller than gzip on this manifest; total install savings ~100 KB across 258 manifests. Wall is unchanged within noise (3-iter cold median ~4,400 ms, indistinguishable from gzip-only baseline). Kept because: 1. Free wire bytes on slow connections. 2. Behavioral parity with bun's request shape. 3. Brotli/zstd decoders in reqwest are well-tested; no measurable regression in 3-iter cold smoke. The 4-sec wall gap to bun's ~1-1.3s is **not in compression choice or wire bytes**. It's HTTP-stack architectural. Bun uses uSockets (custom C++ event loop, originally from uWebSockets): loop: *jsc.MiniEventLoop, NewHTTPContext machinery in src/http/HTTPThread.zig. reqwest+hyper+tokio is general-purpose and ~2-10x slower for high-concurrency keep-alive HTTP. That's the real gap; this commit doesn't change it. Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target): cargo fmt --check clean cargo clippy --workspace --all-targets -- -D warnings clean cargo build --workspace clean cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed, 7 skipped, 0 failed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-721 ms)
The Phase 55 D micro-benchmark proved reqwest+tokio can fetch 248 npm
manifests in ~310 ms median (~800 fetches/sec — actually faster than
bun's ~700 fetches/sec). So the HTTP stack was NEVER the bottleneck.
Empirical substage probe on the W3-baseline binary against fixture-
large cold revealed:
pubgrub_ms (greedy resolver wall): 5292 ms
streaming_bfs.walk_ms (walker): 1334 ms
spec_tx_send_wait_ms: 37 ms (negligible)
escape_hatch_fetches: 0 (cache always hit)
The resolver was burning ~5 sec on cache-warm lookups, and the walker
~1.3 sec on cache writes. Walker can fetch in 310 ms but spends an
extra 1 sec doing... what? Cloning huge CachedPackageInfo structs.
`SharedCache = Arc<DashMap<CanonicalKey, CachedPackageInfo>>` — the
DashMap stored values directly, not Arc-wrapped. Every read in the
greedy resolver's `ensure_manifest` did:
return Ok(Arc::new(entry.clone()));
i.e. a deep clone of all 7 nested HashMaps (versions × deps,
peer_deps, optional, platform, dist, aliases) per edge access.
For popular packages with 200+ versions and ~150 deps each, that's
~30 KB-ish of cloned heap data per edge × ~600 edges = ~18 MB of
allocator churn during resolve.
This commit changes:
SharedCache = Arc<DashMap<CanonicalKey, Arc<CachedPackageInfo>>>
So `entry.value().clone()` is a refcount bump, not a deep clone. The
`into_parts` path at end of resolve still produces the public
`HashMap<_, CachedPackageInfo>` shape (one-time deep clone at the
boundary; per-edge access during resolve is shared).
Substage measurement on cold fixture-large after the fix:
pubgrub_ms: 4229 (was 5292) -1063 ms / -20%
walk_ms: 613 (was 1334) -721 ms / -54% (!!)
spec_tx_send_wait_ms: 44 unchanged
Walker dropping by half is the most striking result — confirms the
walker was paying cache-write deep-clone cost on every commit_manifest.
Single-run wall on cold fixture-large after the fix was high-variance
(5 runs: 8760, 5109, 4109, 5160, 3839 ms — median 5109). The substage
improvements are masked at the wall level by network noise on 5
samples; will be visible in n=30 V2 protocol.
Files touched (mostly mechanical):
- provider.rs: insert wraps in Arc::new, into_parts deep-clones at boundary
- walker.rs: insert wraps in Arc::new in commit_manifest
- greedy.rs: ensure_manifest returns entry.value().clone() (Arc::clone)
result-cache extraction deep-clones at boundary
- resolve.rs: one test-fixture insert site updated
Pre-merge gate (CARGO_TARGET_DIR=/tmp/lpm-rs-phase53-target):
cargo fmt --check clean
cargo clippy --workspace --all-targets -- -D warnings clean
cargo build --workspace clean
cargo nextest run --workspace --exclude lpm-integration-tests 4394 passed, 7 skipped, 0 failed
The remaining ~4 sec resolver wall is per-edge CPU cost in greedy's
process_edge (NpmRange::parse, HashMap inserts, child Edge allocation
across ~600 edges sequentially). That's the next lever — Phase 55 W5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ge -42%, wall flat) Stacks on bcbb6a3 (Phase 55 W4 Arc-wrap SharedCache). Wall-clock improvement on bench/fixture-large is in the noise band (greedy-stream median 3,936 ms, pubgrub-stream 4,020 ms, n=20 cold; ~4× bun's 926 ms); substage view shows real work moved off the critical path. Shipping as observability + structural cleanup; the wall-time win needs walker/resolver fusion next phase. Full flamegraph map + bench tables + next-lever scoping in DOCS/new-features/37-rust-client-RUNNER-VISION-phase53-w5-w6-closeout.md in the a-package-manager repo. W5.1 write_metadata_cache: blocking std::fs::write moved to spawn_blocking; handles tracked in `pending_cache_writes` so tests can deterministically flush via `flush_pending_cache_writes()`. Sync-write fallback when no runtime is available + an opt-in `with_synchronous_cache_writes(true)` builder for mock-server tests. Pre-W5.1 every walker manifest fetch serialized through a sync write that blocked the calling tokio worker. W5.2 Drop HMAC-SHA256 from the metadata cache file format. New shape: `LPM-MD-V2\n{ETag}\n{msgpack-payload}`. Old HMAC-prefixed entries fail the magic check and are silently treated as misses (refetch + rewrite in the new format). Removed the persistent `cache_signing_key` plumbing + four signing-key helpers + CACHE_SIGNING_KEY_FILE constant + three HMAC-specific tests. Bun's manifest cache uses the same trust-the-FS model (header magic only). A1 Split followup_rpc_count into walker_rpc_count + escape_hatch_rpc_count. Both BFS (`run_bfs`) and streaming (`run_stream`) walkers post-record via `record_walker_rpcs(n)` after each fetch path returns; new fields surface on StageTiming and in `--json`. Pre-A1 the metric conflated walker pre-fetch with provider escape-hatch; fixture-large now reports 42 walker + 92 escape-hatch = 134 total, making the walker/resolver race visible (69% of fetches escape-hatch). A3 ResolveResult.cache: HashMap<CanonicalKey, CachedPackageInfo> → HashMap<CanonicalKey, Arc<CachedPackageInfo>>. Eliminates the end-of-resolve deep-clone (~248 entries × ~30 KB nested HashMaps = ~7.4 MB allocator churn hidden inside pubgrub_ms). Auto-deref handles consumer field access; format_solution, check_unmet_peers, provider.into_parts, greedy.resolve_greedy, and the test helper make_cached_info needed signature updates. W6a fetch_and_store_streaming and fetch_and_store_legacy now consume an OwnedSemaphorePermit and `drop()` it between download and extract. Pre-W6a the permit covered both phases end-to-end; the max ~141 ms extract was a serializing tail behind every sibling's download permit hand-off. Streaming path collects compressed tarball into bytes::Bytes (bounded by MAX_COMPRESSED_TARBALL_SIZE; ~12 MB peak resident across 24 permits), drops permit, then spawn_blocking extract from a Cursor<Bytes>. Legacy path drops permit after temp-file write. Bench (n=20 cold, bench/fixture-large, three-cell, vs pre-W5 n=3 baseline): arm pre-W5 W5+W6 Δ × bun pubgrub-stream 3,911 ms 4,020 ms +109 ms 4.34× greedy-stream 3,936 ms 3,936 ms flat 4.25× bun 868 ms 926 ms — Substage deltas W5 → W5+W6 (medians, ms; greedy-stream): extract_sum 824 → 478 −42% extract_max 122 → 94 −28 finalize_sum 1,188 → 695 −41% pubgrub_ms 3,432 → 3,245 −187 Workspace nextest: 4,393 tests pass, 0 failed, 7 skipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `Arc<CachedPackageInfo>` wrap added in 46f8b91 pushed the 4-tuple return type over clippy 1.94.0's type_complexity threshold (it was `HashMap<_, CachedPackageInfo>` before — same arity, one less nesting level). The single caller in `resolve.rs:369-370` already destructures with readable names, so a named-fields struct here is cleanup-of-clean- code rather than a real refactor; CLAUDE.md explicitly permits this `#[allow]` for design-level lints. Caught by `cargo clippy --workspace --all-targets -- -D warnings` on local toolchain 1.94.1; fix verified green on a fresh CARGO_TARGET_DIR cold build alongside the full nextest suite (4,393 tests pass / 0 fail / 7 skip) and lpm-auth determinism rerun ×2 (47/0/0 both runs, no flakes). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin
added a commit
that referenced
this pull request
Apr 29, 2026
* phase-60 D2: promote download_tarball_routed helpers to RegistryClient
Behavior-preserving refactor extracting the two private routed-tarball
helpers from install.rs (download_tarball_routed,
download_tarball_streaming_routed) onto RegistryClient as public
methods. Both `lpm install` and the upcoming Phase 60 `lpm add` source-
delivery flow consume the same Custom-route auth-attachment logic.
- crates/lpm-registry/src/client.rs: add public methods
- crates/lpm-cli/src/commands/install.rs: switch all 5 call sites to
the new methods; delete the private helpers; remove the now-unused
DownloadedTarball import
All 602 install + npmrc tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* phase-60 60.0.e: PackageMetadata::resolve_version_spec helper
Add a three-tier version-spec resolver on PackageMetadata covering
dist-tag → exact-version → semver-range, mirroring the canonical
pattern at install_global.rs:368-405 verbatim.
Pre-Phase-60, `lpm add react@beta`, `next@canary`, `lodash@^4` all
failed because PackageMetadata::version() is a pure HashMap lookup —
none of those literal strings exist as concrete versions. The new
helper closes the gap.
Per D3 (preplan): both parse-failure and no-satisfying-version
return LpmError::Script (matching install_global verbatim) so the
Phase 60.1 migration of the four duplicate sites (install_global,
install, update_global, global) is a true behavior-preserving
refactor.
9 unit tests cover dist-tag (latest/beta/canary), exact match,
caret/tilde range, no-satisfying error, parse-fail error, and
empty-versions error.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* phase-60 60.0+60.1+60.1.5+60.2: lpm add source delivery from any registry
Decouple `lpm add` from LPM-only package identity, mirror install's
full .npmrc setup, switch to file-spool tarball download, add
destination-side path containment, gate dep auto-install on
lpm.config.json presence, and surface external imports for the simple
path. End-to-end flow now works for any package on any registry the
rust client can reach (lpm.dev worker, npmjs.org direct, .npmrc-
declared private registries).
60.0.a + 60.0.b — Identity refactor + drop dotted-name auto-prepend
- New AddTarget enum: Lpm(PackageName) | Npm { spec: String }.
- New resolve_add_target replaces parse_package_ref. No rewriting
outside the @lpm.dev/ scope — `lodash.merge`, `tolga.foo`, etc.
resolve to AddTarget::Npm verbatim. Fixes a long-standing
correctness bug: pre-Phase-60 dotted bare names were silently
rewritten to @lpm.dev/<name> which doesn't exist on lpm.dev.
- All output / log / JSON sites render via target.display() /
target.json_name() — `name.scoped()` no longer used unconditionally.
- Skills branch type-encoded via `let AddTarget::Lpm(pkg) = &target`
pattern, with a why-comment (60.2) explaining the scope gate
(lpm.dev runs LLM scans on shipped skill content; arbitrary npm
packages are not scanned).
60.0.c — Mirror install's full .npmrc setup
- Build RouteTable::from_env_and_filesystem before any network call.
- Surface npmrc_warnings (non-JSON) and the strict-ssl=false security
warning (escapes --json). Clone the client with with_tls_overrides
so cafile= / strict-ssl=false take effect on metadata + tarball
fetches. Mirrors install.rs:3295-3445.
60.0.d — Routed metadata + file-spool tarball
- Metadata: AddTarget::Lpm uses get_package_metadata; AddTarget::Npm
uses get_npm_metadata_routed.
- Tarball: client.download_tarball_routed (D2 promoted helper) +
lpm_extractor::extract_tarball_from_file. Bounded memory via
MAX_COMPRESSED_TARBALL_SIZE (500 MB) for free; lpm add typescript
(~22 MB) and worst-case @scope/giant-fixture no longer load the
whole tarball into RAM.
60.0.f — Destination-side path containment (D6)
- New resolve_safe_dest helper canonicalizes target_dir once and
validates every write destination: refuses to follow existing
symlinks, rejects writes whose canonical parent escapes the target
root. Wired into the Step 8 file-copy loop. Closes the threat-model
gap that opened up when add expanded from "trusted lpm.dev
publishers" to "any npm publisher."
60.1 — Dep gate + bare-imports notice (D4)
- Tighten dep gate: `if !no_install_deps && lpm_config.is_some()`.
Simple path is download-manager: copy bytes, no auto-install.
- import_rewriter exports a sibling collect_bare_specifiers fn that
shares an internal SpecifierKind classifier with rewrite_imports
(anti-drift contract — "bare" means the same thing in both places).
- add.rs surfaces the collected externals as a non-JSON notice and
as a `external_imports` array in the JSON output.
60.1.5 — Non-interactive simple-path guard
- `lpm_config.is_none() && target_path.is_none() && (yes || json ||
!is_tty)` errors before the file-copy loop. Heuristically defaulting
components/ for arbitrary 3rd-party source under --yes/--json/non-TTY
is a CI/automation footgun.
Tests
- 15 unit tests in add.rs (resolve_add_target classification including
the dotted-name regression; resolve_safe_dest contracts including
symlink-refusal on Unix).
- 10 unit tests in import_rewriter.rs (classify_specifier,
collect_bare_specifiers).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* phase-60 60.3: integration tests for lpm add simple path + guards + traversal
Three new wiremock-driven integration tests covering the highest-value
end-to-end scenarios for Phase 60:
- add_simple_non_interactive_without_path.rs (4 sub-tests) — proves
the 60.1.5 guard fires for --yes, --json, and non-TTY (stdin from
/dev/null) without --path; positive control with --path succeeds.
No package.json mutation in any failure case.
- add_source_npm_simple.rs (2 sub-tests) — full simple-path pipeline
via wiremock npm metadata + tarball: AddTarget::Npm resolves, file-
spool download, extract, files copied flat (no auto-nest), bare-
imports notice lists react + @radix-ui/react-slot, package.json
NOT mutated, .lpm/skills/ NOT created. JSON sub-test asserts the
package.name uses the npm-style identity (not @lpm.dev/-prefixed)
and the new external_imports array is well-shaped.
- add_path_traversal_dest_escape.rs — proves resolve_safe_dest is
wired into the actual write loop, not just unit-tested in
isolation. Tarball ships an lpm.config.json with files[0].dest =
"../../escaped/evil.txt" — assertion: containment-violation error,
exit non-zero, no file written outside target_dir.
Other 60.3 specced tests are either (i) covered by the unit tests
that landed alongside the implementation (#5 dotted-name, #9 version-
spec, #11 symlink — see preplan v6 audit checklist) or (ii)
deliberately deferred where the underlying machinery is already
test-covered by Phase 58.x install tests (#1 lpm.dev rich, #2 npm
rich, #6 npmrc auth, #7 strict-ssl, #8 missing-var fatal).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* phase-60 60.4: README — lpm add now works against any registry
- Update the lpm add one-liner in the Commands list.
- Add a "How lpm add Works" section explaining: source delivery vs.
install, the firm naming rule (@lpm.dev/owner.name only), the rich
vs. simple paths, and the non-interactive --path requirement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* phase-60 audit fix: resolve_safe_dest must validate before mkdir
Audit reproduced (with a temp-dir filesystem probe) that the landed
resolve_safe_dest helper still created directories OUTSIDE the
target_dir for two attack vectors before the containment error fired:
1. `dest_rel = "../../escaped/evil.txt"` — `Path::join` resolves
lexically; `dest.parent()` lands outside target; `create_dir_all`
ran before the containment check, leaving `<target>/../escaped/`
on disk even though the file write was correctly blocked.
2. Absolute `dest_rel = "/tmp/elsewhere/evil.txt"` — `Path::join` of
an absolute path returns the absolute path verbatim; `parent =
/tmp/elsewhere/`; `create_dir_all` created it before the
containment check fired.
The original integration test only asserted no escaped FILE existed,
so the directory-side-effect bug passed CI.
Fix
- Reorder resolve_safe_dest so EVERY check that can reject the
destination runs BEFORE any filesystem mutation:
Step 1 (NEW) — reject absolute dest_rel up-front.
Step 2 (NEW) — reject any ParentDir / RootDir / Prefix component.
Step 3 — refuse existing-symlink destinations.
Step 4 (NEW) — pre-mkdir ancestor canonicalization: walk up to the
longest existing ancestor; canonicalize; require it under
target_root_canonical (catches symlinked intermediate dirs).
Step 5 — create_dir_all (NOW safe).
Step 6 — post-mkdir re-canonicalize as TOCTOU defense-in-depth.
The lexical bans in Steps 1-2 kill the entire `../escape` and
absolute-path attack classes before any mkdir runs. The longest-
existing-ancestor walk in Step 4 covers the symlinked-intermediate
case (target/foo → /tmp/elsewhere). Step 6 is paranoia.
Tests
- Strengthen unit tests:
- resolve_safe_dest_dotdot_in_path_rejected_with_no_external_dir_created
now asserts no escape directory was created.
- resolve_safe_dest_absolute_dest_rejected_with_no_external_dir_created
is new — covers the absolute-path attack.
- resolve_safe_dest_dotdot_in_middle_of_path_also_rejected covers
`foo/../bar.txt` (lexically resolves back inside but still
rejected up-front).
- Extend integration test:
- dest_escape_via_dotdot_is_refused_and_creates_no_external_directory
now snapshots target_dir entries before the run and asserts no
unexpected new top-level entries appeared, plus no escape dir.
- dest_escape_via_absolute_path_is_refused_and_creates_no_external_directory
is new — covers the absolute-path attack at the integration level.
Net: 4923 → 4926 workspace tests; clippy + fmt clean; all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin
added a commit
that referenced
this pull request
May 1, 2026
…se docs Phase 64 audit findings #1, #11, #22, #28: - Add `lpm completions <shell>` (bash/zsh/fish/powershell/elvish) via `clap_complete`. Script is generated from the live `Cli` definition so it stays in sync with the binary at every release. - Add `lpm graph --no-open` to suppress the auto-launch of the default browser on `--format html`. The file is still written to `<project>/.lpm/graph.html`. Useful in headless / CI environments. - Rename `lpm rebuild --rebuild` to `--force` (no alias kept, per the decision to stop bridging). The flag-named-after-the-command UX was awkward and `--force` matches the intent better. - Tighten `lpm use` spec doc-comment to drop "runtime" framing — the command is Node.js-only today. Adjacent issues surfaced and fixed during the work: - Drop `short = 'V'` from `lpm query --query-verbose`. The local short collided with the global `-V` / `--version` and tripped clap's structural assertion under `clap_complete::generate`. Without this fix `lpm completions <any>` would panic. `--query-verbose` long form is unaffected. - Phase 47 P0 rename leftovers: `"lpm build --all"` → `"lpm rebuild --all"` in the rebuild.rs untrusted-pkg hint and `lpm build` → `lpm rebuild` in the install.rs auto-build comment. New parser tests cover all four findings + the legacy `--rebuild` flag is asserted to fail at parse time so users get an explicit error rather than a silent no-op.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three stacked phases of resolver/walker/transport work, all empirically validated and ready to ship as one PR. Phase 56 (walker/resolver fusion) starts on a fresh branch from main after this lands.
LPM_RESOLVER=greedy. Plus W5+W6 close-out work: metadata cache writes off the tokio runtime, extract off the download permit.run_stream) replacing the BFS level-step walker, opt-in viaLPM_WALKER=stream. Per-fetch concurrency bumped 50 → 256 (empirical, n=10 statistically significant).SharedCachevalues (eliminated 1,063 ms of resolver-side cloning per cold install).Bench results —
bench/fixture-large(266 packages), cold-equal-footing, n=20Medians from the Phase 53 W5+W6 close-out §3.2 — pre/post comparison vs main HEAD baseline, with bun as the reference control:
pubgrub-streamgreedy-stream(new default for greedy)Substage wins from W5+W6 (greedy-stream):
extract_sum −42 %,extract_max −28 ms,finalize_sum −41 %. Wall delta is in the noise band — moving extract off the download permit + cache writes off the runtime is structural correctness work for the next phase, not a wall win on its own.The remaining ~3 sec gap to bun is the walker/resolver redundancy: 92 of 134 metadata fetches go through the greedy resolver's escape-hatch (69 %), not the walker's pre-fetch. Phase 56 (walker/resolver fusion) is the next branch.
What changes — by phase
Phase 53: greedy multi-version resolver
crates/lpm-resolver/src/greedy.rs(1,106 lines). Bun-shapeenqueueDependencyWithMainport: edge queue + first-match version pick + reuse-on-compatible / allocate-on-incompatible.(canonical, version).write_metadata_cacheruns ontokio::task::spawn_blocking; metadata cache file format simplified (no HMAC, magic-prefixed); extract no longer holds the download permit.Phase 54: continuous-stream walker
BfsWalker::run_stream()(gated byLPM_WALKER=stream). mpsc-driven, no level barrier; semaphore-bounded fan-out.DEFAULT_NPM_FANOUT50 → 256 (empirical: median 4,816 → 4,124 ms, t=3.49 significant; stdev 972 → 253 ms — −74 %).run_bfsfor diagnostic comparison.Phase 55: transport tuning
phase-55 W1— h1-pool probe: tried bun-style 64 separate HTTP/1.1 sockets to bypass single-connection h2 flow control. Refuted as default (no statistical wall improvement); kept as opt-in via env flag for future investigation.phase-55 W2—reqwestgzip feature enabled: modest wall delta, real wire savings.phase-55 W3— added brotli + zstd toreqwestfeatures.phase-55 W4— Arc-wrapSharedCachevalues. The load-bearing fix for the resolver wall: pre-W4 the greedy resolver burned ~5 sec per cold install cloning popular packuments per edge. Resolver wall: −1,063 ms; walker: −721 ms.What is NOT in this PR
lpm-dev/a-package-manager) — separate branch from main once this lands.pubgrub+bfsremain the defaults; greedy + stream are opt-in via env vars.Files changed
Commit history
(Note: commit order is non-monotonic by phase number —
46f8b91(phase-53 W5+W6) is committed after the phase-55 stack. Functional correctness is unaffected; mentioning for clarity.)Test plan
cargo fmt --checkcleangrep 'fancy-regex' crates/*/Cargo.toml— none (ReDoS guard)cargo clippy --workspace --all-targets -- -D warnings— clean (verified locally on rust 1.94.1; CI runs 1.94.0 — patch-level diff)cargo build --workspace— greencargo nextest run --workspace --exclude lpm-integration-tests --no-fail-fast— 4,393 tests pass / 0 fail / 7 skip (per close-out)cargo test -p lpm-auth× 2 — no flakes (verifies the keychain-race fix from prior phases)bench/fixture-largecold-equal-footing n=20 — greedy-stream median 3,936 ms (close-out §3.2)Companion documentation
lpm-dev/a-package-manager)🤖 Generated with Claude Code