Phase 51: Sigstore Bundle v0.3 fix + provenance tracing + scriptable_package_rows parallelization by tolgaergin · Pull Request #9 · lpm-dev/rust-client

tolgaergin · 2026-04-25T22:52:42Z

Summary

Resolves the Phase 50 close-out's P0 finding: warm installs of bench/project (51 pkgs) were paying ~700–1300 ms of prov_sum_ms on every run because ~18 of the attested packages silently failed to parse and never landed in the disk cache.

Root cause: npm migrated to Sigstore Bundle spec v0.3, which collapsed the leaf cert from verificationMaterial.x509CertificateChain.certificates[] into a single verificationMaterial.certificate field. The original find_leaf_cert_rawbytes only knew the v0.2 chain shape, so v0.3 attestations parsed past the JSON stage and bailed at cert lookup, returning Err(()) → Ok(None) (degraded/unknown — never cached).

What's in this PR

Three commits, two files touched:

Commit	Scope
`af6d9f2`	W1c: per-failure-point `tracing::debug!` lines at every `Err(())` site in `fetch_and_parse` and `parse_sigstore_bundle`. Caller contract unchanged; default log filter still silent.
`af6d9f2`	W2: parallelize `scriptable_package_rows` via rayon and hoist trustedScopes parse out of a 266-deep N+1 disk-read pattern. Adds `perf.scriptable_package_rows pkgs=N ms=W` log line.
`6b4b249`	W1d: extend `find_leaf_cert_rawbytes` with a Shape 2 branch for `verificationMaterial.certificate.rawBytes` (Sigstore Bundle v0.3). Ordered after the v0.2 chain branch so cache-key stability is preserved (cert SHAs are part of the drift-check identity). 4 bug-first regression tests added.

Empirical impact

Bench/project (51 pkgs) — isolated-cache warm A/B (n=5):

BEFORE warm-on-broken-cache: median 233 ms (cache files: 1)
AFTER  warm-on-fixed-cache:  median  72 ms (cache files: 6)
WARM delta: −161 ms (−69 %)   ← matches the close-out's 150-200 ms estimate

Bench/fixture-large (266 pkgs) — isolated-cache warm A/B (n=10):

BEFORE warm-on-broken-cache: median 630 ms stdev 123 (range 475-771)
AFTER  warm-on-fixed-cache:  median 316 ms stdev  10 (range 292-321)
WARM delta: −315 ms (−50 %)

Cold install regression check (V2, n=10 cold-equal-footing on 266 pkgs):

BEFORE wall median: 5418 ms
AFTER  wall median: 5397 ms
delta: −21 ms (within ±1099 ms variance; no regression)

A roughly 12× lower stdev (123 ms → 10 ms) on the AFTER side at 266 pkgs is a secondary stability win: broken-cache installs vary based on which network calls fail and retry per run.

Bug-first regression tests

Empirically verified by temporarily reverting find_leaf_cert_rawbytes to its pre-fix shape and watching the v0.3 test fail with Result::unwrap() on an Err value: ():

parse_bundle_v3_single_cert_shape_extracts_identity_phase_51_regression
parse_bundle_npm_real_world_skips_publickey_falls_through_to_v3_cert
parse_bundle_npm_publickey_only_with_no_cert_yields_err
find_leaf_cert_rawbytes_prefers_v2_chain_when_both_shapes_coexist

CI gate (run locally pre-merge)

cargo clippy --workspace -- -D warnings ✓
cargo fmt --check ✓
grep fancy-regex → none ✓
cargo build --workspace ✓
cargo nextest run --workspace --exclude lpm-integration-tests --no-fail-fast → 4383 passed (4 new) ✓
cargo test -p lpm-auth × 3 reruns → all pass deterministically ✓

Test plan

Bug-first regression tests added and verified to fail without the fix
Existing 30+ provenance_fetch tests still pass
Full workspace nextest gate clean
Empirical V1 (zero parse failures), V2 (no cold regression), V4 51-pkg + 266-pkg warm wins captured

Related docs

Phase 51 close-out (pending push)
Phase 50 close-out (the prior session's hand-off)

🤖 Generated with Claude Code

…e_rows parallelization

Phase 50 close-out flagged two diagnosable post-install hot-path issues that the existing instrumentation couldn't resolve. This commit lands the smallest correct fix for each so the next 266-pkg bench produces actionable per-failure-mode signal.

W1c: Granular tracing in `provenance_fetch.rs`.

Every `Err(())` site in `fetch_and_parse` and `parse_sigstore_bundle` now emits a `tracing::debug!` line with a `stage` field and the URL or body context, replacing the prior `.map_err(|_| ())` discard pattern. Stages covered: send, status, content_length_cap, chunk, stream_cap, parse, json_parse, cert_lookup (with top-level keys for shape-drift diagnosis), base64_decode.

Caller still maps to `Ok(None)` so the drift-check contract is unchanged; the only behavioural delta is that `RUST_LOG=lpm_cli::provenance_fetch=debug` now reveals which of the 8 failure points is firing on the ~18 of 51 attested packages whose warm-install cache never gets populated. Production users with default log filters see no change.

W2: `scriptable_package_rows` in `rebuild.rs`.

Three problems found on the 266-pkg fixture and addressed together:

1. `is_scope_trusted` was called inside the per-package loop, which re-read AND re-parsed `project_dir/package.json` once per package: 266 redundant disk reads of the same file. Hoisted into `parse_trusted_scopes` (reads once) + the pure `name_matches_trusted_scope` matcher (called per pkg). `is_scope_trusted` retained as a thin wrapper for the build runner's one-off call site.
2. The walk was sequential despite each iteration being independent. Migrated to `rayon::par_iter().filter_map().collect()` matching the pattern already in `build_state::compute_blocked_packages_with_metadata`.
3. The walk was not instrumented. Added `perf.scriptable_package_rows pkgs=N ms=W` debug log so the close-out's "458 ms unaccounted on 266 pkgs" can be split.

Behavior preserved: same trust gate, same row content, same input ordering on output (rayon's stable collect). All 4379 workspace tests pass; clippy + fmt clean; lpm-auth deterministic over 3 reruns.

Refs: 37-rust-client-RUNNER-VISION-phase50-bun-parity-closeout.md §3, §6.2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
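The W2 hoist can be sketched as follows. This is a minimal illustration, not the code in `rebuild.rs`: it assumes trusted scopes are plain scope prefixes such as `@acme` (the real `trustedScopes` format is not shown in this PR), and it uses `std::thread::scope` in place of rayon so the sketch has no external dependencies. `trusted_flags` is a hypothetical name.

```rust
use std::thread;

/// Pure matcher: no I/O, so it is cheap to call once per package and safe
/// to call from worker threads. ASSUMPTION: a trusted scope is a prefix
/// like "@acme" that matches names of the form "@acme/<pkg>".
fn name_matches_trusted_scope(name: &str, trusted: &[String]) -> bool {
    trusted.iter().any(|scope| {
        name.strip_prefix(scope.as_str())
            .map_or(false, |rest| rest.starts_with('/'))
    })
}

/// Hypothetical helper: evaluates the matcher for every package in parallel
/// while keeping output order equal to input order. The caller parses the
/// trusted scopes ONCE and passes them in, instead of re-reading the
/// manifest inside the loop.
fn trusted_flags(names: &[String], trusted: &[String], workers: usize) -> Vec<bool> {
    let workers = workers.max(1);
    let chunk_len = ((names.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        // Spawn one worker per chunk; collect the handles first so all
        // workers are running before any join blocks.
        let handles: Vec<_> = names
            .chunks(chunk_len)
            .map(|part| {
                s.spawn(move || {
                    part.iter()
                        .map(|name| name_matches_trusted_scope(name, trusted))
                        .collect::<Vec<bool>>()
                })
            })
            .collect();
        // Joining in spawn order preserves the input ordering.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```

The point of the split is that the only I/O (reading and parsing the manifest) happens before the loop, so the per-package work becomes a pure function that parallelizes trivially.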

…e drift fix)

The Phase 50 close-out flagged ~18 of 51 attested packages on bench/project where every warm install re-fetches the attestation bundle because nothing ever lands in the disk cache. W1c tracing in the previous commit narrowed it to 100 % of failures hitting the `cert_lookup` stage with `top_level_keys=["attestations"]`.

Curling six failing URLs against registry.npmjs.org confirms the cause: npm now serves a 2-element attestations list per package:

  attestations[0]  Sigstore Bundle v0.2 with `verificationMaterial.publicKey` (npm's own publish-time keypair attestation; no Fulcio cert)
  attestations[1]  Sigstore Bundle v0.3 with `verificationMaterial.certificate` (Fulcio-issued GitHub Actions provenance; the leaf we want)

The Sigstore Bundle v0.3 protobuf-spec change collapsed the `x509CertificateChain.certificates[]` array into a single `certificate` field. The original `find_leaf_cert_rawbytes` only knew the v0.1/v0.2 chain shape, so v0.3 attestations parsed past the JSON stage and then bailed at cert lookup, returning Err(()) which the caller maps to Ok(None) (degraded/unknown: never cached, retried on every install).

Fix: extend `find_leaf_cert_rawbytes` with a v0.3 single-cert branch (`verificationMaterial.certificate.rawBytes`), placed after the v0.2 chain branch so the legacy lookup order is preserved (important: cert SHAs are part of the drift-check identity, so flipping order would invalidate every cached entry). Recursion into npm's attestations-list wrapper now skips publicKey-only entries automatically and lands on the v0.3 cert-bearing entry.

Empirical impact (bench/fixture-large, 266 pkgs):

  V1 diagnostic: zero parse failures (was 30+ per warm install)
  Cache file count after cold install: 19 → 37 (+18 newly cached)
  V4 isolated-cache warm install A/B (n=10):
    BEFORE warm-on-broken-cache: median 630 ms (stdev 123)
    AFTER  warm-on-fixed-cache:  median 316 ms (stdev 10)
    Median delta: −315 ms (−49.96 % of BEFORE)

Bug-first regression tests (4 new):

  parse_bundle_v3_single_cert_shape_extracts_identity_phase_51_regression: pins the v0.3 shape; fails without the fix (verified by temporarily reverting find_leaf_cert_rawbytes and rerunning)
  parse_bundle_npm_real_world_skips_publickey_falls_through_to_v3_cert: encodes the actual 2026-04-25 production wrapper
  parse_bundle_npm_publickey_only_with_no_cert_yields_err: wrapper with no Fulcio cert anywhere stays Err (degraded)
  find_leaf_cert_rawbytes_prefers_v2_chain_when_both_shapes_coexist: defensive; preserves cache-key stability if a future bundle grows both shapes in one verificationMaterial

CI gate: clippy --workspace clean, fmt clean, no fancy-regex, 4383 nextest run / 4383 passed (4 new), lpm-auth deterministic over 3 reruns.

Refs: 37-rust-client-RUNNER-VISION-phase50-bun-parity-closeout.md §3, §6.1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
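The lookup order described above can be sketched like this. It is an illustration under stated assumptions, not the function in `provenance_fetch.rs`: the `Json` enum and the `obj` helper are hypothetical stand-ins for a real JSON value type, the return type is assumed, and the leaf is assumed to be the first entry of the v0.2 chain.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal JSON value, only rich enough for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

impl Json {
    fn get(&self, key: &str) -> Option<&Json> {
        match self {
            Json::Object(map) => map.get(key),
            _ => None,
        }
    }

    fn as_str(&self) -> Option<&str> {
        match self {
            Json::Str(s) => Some(s.as_str()),
            _ => None,
        }
    }
}

/// Hypothetical convenience constructor for objects.
fn obj(pairs: &[(&str, Json)]) -> Json {
    Json::Object(pairs.iter().map(|(k, v)| (k.to_string(), v.clone())).collect())
}

/// Returns the base64 `rawBytes` string of the leaf certificate, if any.
fn find_leaf_cert_rawbytes(value: &Json) -> Option<&str> {
    if let Some(vm) = value.get("verificationMaterial") {
        // Shape 1 (v0.2): x509CertificateChain.certificates[0].rawBytes.
        // Checked FIRST so existing cache keys keep resolving the same way.
        let chain_leaf = vm
            .get("x509CertificateChain")
            .and_then(|chain| chain.get("certificates"))
            .and_then(|certs| match certs {
                Json::Array(items) => items.first(),
                _ => None,
            })
            .and_then(|leaf| leaf.get("rawBytes"))
            .and_then(Json::as_str);
        if chain_leaf.is_some() {
            return chain_leaf;
        }
        // Shape 2 (v0.3): certificate.rawBytes.
        let single = vm
            .get("certificate")
            .and_then(|cert| cert.get("rawBytes"))
            .and_then(Json::as_str);
        if single.is_some() {
            return single;
        }
    }
    // No cert at this level (for example a publicKey-only entry): recurse
    // into wrappers such as the attestations list and take the first hit.
    match value {
        Json::Array(items) => items.iter().find_map(find_leaf_cert_rawbytes),
        Json::Object(map) => map.values().find_map(find_leaf_cert_rawbytes),
        Json::Str(_) => None,
    }
}
```

Because a publicKey-only entry matches neither shape, the recursion simply moves on to the next list element, which is how the wrapper case ends up at the v0.3 entry.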

* phase-60 D2: promote download_tarball_routed helpers to RegistryClient Behavior-preserving refactor extracting the two private routed-tarball helpers from install.rs (download_tarball_routed, download_tarball_streaming_routed) onto RegistryClient as public methods. Both `lpm install` and the upcoming Phase 60 `lpm add` source- delivery flow consume the same Custom-route auth-attachment logic. - crates/lpm-registry/src/client.rs: add public methods - crates/lpm-cli/src/commands/install.rs: switch all 5 call sites to the new methods; delete the private helpers; remove the now-unused DownloadedTarball import All 602 install + npmrc tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase-60 60.0.e: PackageMetadata::resolve_version_spec helper Add a three-tier version-spec resolver on PackageMetadata covering dist-tag → exact-version → semver-range, mirroring the canonical pattern at install_global.rs:368-405 verbatim. Pre-Phase-60, `lpm add react@beta`, `next@canary`, `lodash@^4` all failed because PackageMetadata::version() is a pure HashMap lookup — none of those literal strings exist as concrete versions. The new helper closes the gap. Per D3 (preplan): both parse-failure and no-satisfying-version return LpmError::Script (matching install_global verbatim) so the Phase 60.1 migration of the four duplicate sites (install_global, install, update_global, global) is a true behavior-preserving refactor. 9 unit tests cover dist-tag (latest/beta/canary), exact match, caret/tilde range, no-satisfying error, parse-fail error, and empty-versions error. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase-60 60.0+60.1+60.1.5+60.2: lpm add source delivery from any registry Decouple `lpm add` from LPM-only package identity, mirror install's full .npmrc setup, switch to file-spool tarball download, add destination-side path containment, gate dep auto-install on lpm.config.json presence, and surface external imports for the simple path. End-to-end flow now works for any package on any registry the rust client can reach (lpm.dev worker, npmjs.org direct, .npmrc- declared private registries). 60.0.a + 60.0.b — Identity refactor + drop dotted-name auto-prepend - New AddTarget enum: Lpm(PackageName) | Npm { spec: String }. - New resolve_add_target replaces parse_package_ref. No rewriting outside the @lpm.dev/ scope — `lodash.merge`, `tolga.foo`, etc. resolve to AddTarget::Npm verbatim. Fixes a long-standing correctness bug: pre-Phase-60 dotted bare names were silently rewritten to @lpm.dev/<name> which doesn't exist on lpm.dev. - All output / log / JSON sites render via target.display() / target.json_name() — `name.scoped()` no longer used unconditionally. - Skills branch type-encoded via `let AddTarget::Lpm(pkg) = &target` pattern, with a why-comment (60.2) explaining the scope gate (lpm.dev runs LLM scans on shipped skill content; arbitrary npm packages are not scanned). 60.0.c — Mirror install's full .npmrc setup - Build RouteTable::from_env_and_filesystem before any network call. - Surface npmrc_warnings (non-JSON) and the strict-ssl=false security warning (escapes --json). Clone the client with with_tls_overrides so cafile= / strict-ssl=false take effect on metadata + tarball fetches. Mirrors install.rs:3295-3445. 60.0.d — Routed metadata + file-spool tarball - Metadata: AddTarget::Lpm uses get_package_metadata; AddTarget::Npm uses get_npm_metadata_routed. - Tarball: client.download_tarball_routed (D2 promoted helper) + lpm_extractor::extract_tarball_from_file. 
Bounded memory via MAX_COMPRESSED_TARBALL_SIZE (500 MB) for free; lpm add typescript (~22 MB) and worst-case @scope/giant-fixture no longer load the whole tarball into RAM. 60.0.f — Destination-side path containment (D6) - New resolve_safe_dest helper canonicalizes target_dir once and validates every write destination: refuses to follow existing symlinks, rejects writes whose canonical parent escapes the target root. Wired into the Step 8 file-copy loop. Closes the threat-model gap that opened up when add expanded from "trusted lpm.dev publishers" to "any npm publisher." 60.1 — Dep gate + bare-imports notice (D4) - Tighten dep gate: `if !no_install_deps && lpm_config.is_some()`. Simple path is download-manager: copy bytes, no auto-install. - import_rewriter exports a sibling collect_bare_specifiers fn that shares an internal SpecifierKind classifier with rewrite_imports (anti-drift contract — "bare" means the same thing in both places). - add.rs surfaces the collected externals as a non-JSON notice and as a `external_imports` array in the JSON output. 60.1.5 — Non-interactive simple-path guard - `lpm_config.is_none() && target_path.is_none() && (yes || json || !is_tty)` errors before the file-copy loop. Heuristically defaulting components/ for arbitrary 3rd-party source under --yes/--json/non-TTY is a CI/automation footgun. Tests - 15 unit tests in add.rs (resolve_add_target classification including the dotted-name regression; resolve_safe_dest contracts including symlink-refusal on Unix). - 10 unit tests in import_rewriter.rs (classify_specifier, collect_bare_specifiers). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase-60 60.3: integration tests for lpm add simple path + guards + traversal Three new wiremock-driven integration tests covering the highest-value end-to-end scenarios for Phase 60: - add_simple_non_interactive_without_path.rs (4 sub-tests) — proves the 60.1.5 guard fires for --yes, --json, and non-TTY (stdin from /dev/null) without --path; positive control with --path succeeds. No package.json mutation in any failure case. - add_source_npm_simple.rs (2 sub-tests) — full simple-path pipeline via wiremock npm metadata + tarball: AddTarget::Npm resolves, file- spool download, extract, files copied flat (no auto-nest), bare- imports notice lists react + @radix-ui/react-slot, package.json NOT mutated, .lpm/skills/ NOT created. JSON sub-test asserts the package.name uses the npm-style identity (not @lpm.dev/-prefixed) and the new external_imports array is well-shaped. - add_path_traversal_dest_escape.rs — proves resolve_safe_dest is wired into the actual write loop, not just unit-tested in isolation. Tarball ships an lpm.config.json with files[0].dest = "../../escaped/evil.txt" — assertion: containment-violation error, exit non-zero, no file written outside target_dir. Other 60.3 specced tests are either (i) covered by the unit tests that landed alongside the implementation (#5 dotted-name, #9 version- spec, #11 symlink — see preplan v6 audit checklist) or (ii) deliberately deferred where the underlying machinery is already test-covered by Phase 58.x install tests (#1 lpm.dev rich, #2 npm rich, #6 npmrc auth, #7 strict-ssl, #8 missing-var fatal). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase-60 60.4: README — lpm add now works against any registry - Update the lpm add one-liner in the Commands list. - Add a "How lpm add Works" section explaining: source delivery vs. install, the firm naming rule (@lpm.dev/owner.name only), the rich vs. 
simple paths, and the non-interactive --path requirement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* phase-60 audit fix: resolve_safe_dest must validate before mkdir

Audit reproduced (with a temp-dir filesystem probe) that the landed resolve_safe_dest helper still created directories OUTSIDE the target_dir for two attack vectors before the containment error fired:

1. `dest_rel = "../../escaped/evil.txt"`: `Path::join` resolves lexically; `dest.parent()` lands outside target; `create_dir_all` ran before the containment check, leaving `<target>/../escaped/` on disk even though the file write was correctly blocked.
2. Absolute `dest_rel = "/tmp/elsewhere/evil.txt"`: `Path::join` of an absolute path returns the absolute path verbatim; `parent = /tmp/elsewhere/`; `create_dir_all` created it before the containment check fired.

The original integration test only asserted no escaped FILE existed, so the directory-side-effect bug passed CI.

Fix

- Reorder resolve_safe_dest so EVERY check that can reject the destination runs BEFORE any filesystem mutation:
    Step 1 (NEW): reject absolute dest_rel up-front.
    Step 2 (NEW): reject any ParentDir / RootDir / Prefix component.
    Step 3: refuse existing-symlink destinations.
    Step 4 (NEW): pre-mkdir ancestor canonicalization: walk up to the longest existing ancestor; canonicalize; require it under target_root_canonical (catches symlinked intermediate dirs).
    Step 5: create_dir_all (NOW safe).
    Step 6: post-mkdir re-canonicalize as TOCTOU defense-in-depth.

The lexical bans in Steps 1-2 kill the entire `../escape` and absolute-path attack classes before any mkdir runs. The longest-existing-ancestor walk in Step 4 covers the symlinked-intermediate case (target/foo → /tmp/elsewhere). Step 6 is paranoia.

Tests

- Strengthen unit tests:
  - resolve_safe_dest_dotdot_in_path_rejected_with_no_external_dir_created now asserts no escape directory was created.
  - resolve_safe_dest_absolute_dest_rejected_with_no_external_dir_created is new; covers the absolute-path attack.
  - resolve_safe_dest_dotdot_in_middle_of_path_also_rejected covers `foo/../bar.txt` (lexically resolves back inside but still rejected up-front).
- Extend integration test:
  - dest_escape_via_dotdot_is_refused_and_creates_no_external_directory now snapshots target_dir entries before the run and asserts no unexpected new top-level entries appeared, plus no escape dir.
  - dest_escape_via_absolute_path_is_refused_and_creates_no_external_directory is new; covers the absolute-path attack at the integration level.

Net: 4923 → 4926 workspace tests; clippy + fmt clean; all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
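Steps 1-2 of the reordered check can be sketched with `std::path::Component`. This is a minimal illustration of the lexical bans only; `DestError` and `check_dest_rel_lexically` are hypothetical names, and the symlink and canonicalization steps (3, 4, 6) are deliberately left out.

```rust
use std::path::{Component, Path};

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum DestError {
    Absolute,
    Traversal,
}

/// Rejects a relative destination purely from its spelling, before any
/// filesystem call. Nothing here touches the disk, so no directory can be
/// created as a side effect of a rejected path.
fn check_dest_rel_lexically(dest_rel: &str) -> Result<(), DestError> {
    let path = Path::new(dest_rel);

    // Step 1: absolute paths would make `target_dir.join(dest_rel)`
    // discard `target_dir` entirely.
    if path.is_absolute() {
        return Err(DestError::Absolute);
    }

    // Step 2: ban `..`, root, and drive-prefix components anywhere in the
    // path, even when the path would lexically resolve back inside.
    for component in path.components() {
        match component {
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => {
                return Err(DestError::Traversal);
            }
            Component::CurDir | Component::Normal(_) => {}
        }
    }
    Ok(())
}
```

Rejecting `foo/../bar.txt` outright, instead of normalizing it, keeps the rule simple to audit: a destination either contains a banned component or it does not.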

…t schema wording Two doc/contract drifts caught on second-pass audit of the #9 fix: 1. Fallback gate was `deps.is_empty()` — fired whenever the config- driven collection yielded nothing, including when `dependencies` is declared but no conditional branch matches the consumer's config. That contradicts the schema description ("when dependencies is omitted") and surprises authors: declaring conditional deps and landing in an unmatched branch silently pulled every entry from the package's own `package.json#dependencies`, including names the author didn't intend to ship. Tighten the gate to fire only when `lpm.config.json#dependencies` is absent entirely. Authors who declare the field — even with empty or non-matching branches — opt out of the legacy fallback. Mirrors how `files[]` works: declared = source of truth. 2. Schema description and author docs claimed deps are resolved by "the trailing `lpm install`," but `lpm add --pm <npm|pnpm|yarn|bun>` dispatches through the selected package manager. Reword schema description, public mirror, and lpm-config-json.mdx to spell out the `--pm` selection. Also document the new fallback contract on the same surface. Tests: two new cases in source_pkg_deps — - `legacy_fallback_does_not_fire_when_dependencies_field_present_but_unmatched`: encodes the new author contract. - `legacy_fallback_fires_only_when_dependencies_field_absent`: covers both shapes of "absent" (no lpm.config.json at all, and lpm.config.json present without a `dependencies` key). CI gate: clippy clean, fmt clean, nextest 5245/5245 pass, schema-drift test green against the synced public mirror. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
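The tightened gate comes down to distinguishing "field absent" from "field present but empty". A minimal sketch, assuming the config-driven collection is modelled as an `Option` (`collect_deps_sketch` is a hypothetical name, not the collector in `source_pkg_deps`):

```rust
/// `declared` is `None` when `lpm.config.json#dependencies` is absent
/// (or there is no lpm.config.json at all), and `Some(matches)` when the
/// field is declared, even if no conditional branch matched.
fn collect_deps_sketch(declared: Option<Vec<String>>, legacy: &[String]) -> Vec<String> {
    match declared {
        // Declared = source of truth, including the empty result.
        Some(matched) => matched,
        // Only a fully absent field falls back to the package's own
        // `package.json#dependencies`.
        None => legacy.to_vec(),
    }
}
```

The old `deps.is_empty()` gate could not tell these cases apart, because both collapse to an empty list once the `Option` is flattened.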

Third-pass audit on #9: the schema description, public mirror, and author docs enumerated `lpm/npm/pnpm/yarn/bun` but omitted `auto`, which the CLI also accepts (main.rs:526). Add it with a one-phrase explanation of what it does ("project-state detection"). Three internal doc-comments in add.rs still said "trailing `lpm install`" — aligned to "the selected package manager (`--pm`) runs its install step" matching the public wording. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…op `*`) Tier 2 of #9 — preserve explicit version specs from `lpm.config.json#dependencies` and resolve bare entries against the registry before mutating the manifest. Pre-fix, `lpm add` wrote every collected dep as `"*"`, then ran the trailing install. The Phase 33 save policy explicitly preserves user wildcards verbatim ("explicit user input wins"), so the install never rewrote `*` to `^x.y.z`. Source-package consumers thus accumulated wildcard ranges for every conditional dep — defeats reproducibility, extra-risky for `@lpm.dev/*` and private-registry entries where `*` means "next publish gets installed automatically without review." Hybrid fix: - `lpm.config.json#dependencies` arrays now accept `name@range` specs alongside bare names. Authors who want pinning write `"react@^18"`; authors who don't write `"react"`. - The collector returns `Vec<(name, UserSaveIntent)>` instead of `Vec<String>`, parsed via `save_spec::parse_user_save_intent` so the scoped/unscoped @-splitting matches `lpm install <pkg>` exactly. - `handle_dependencies` resolves every Bare/DistTag entry up-front via `RegistryClient::batch_metadata` (one round-trip for N), then runs each entry through `save_spec::decide_saved_dependency_spec` — honoring `--save-prefix`/`save-exact` from `~/.lpm/config.toml` and the prerelease-exact safety rule, same as `lpm install`. - New `build_save_decisions` helper isolates the policy logic for unit testing; takes an injected `resolved_latest` map so tests don't need a real registry. Fail-fast on resolve failure: an unresolvable Bare/DistTag entry errors before any package.json mutation. Avoids the pre-fix failure mode where a stranded `*` survived a failed install indefinitely. The error message points the author at the explicit-version workaround and `lpm login` for `@lpm.dev/*` access issues. 
Legacy `package.json` fallback (used when `lpm.config.json#dependencies` is omitted entirely) reconstructs each entry as `name@range` so the declared version range from the package's own manifest carries through verbatim — `react: "^18"` lands as `^18`, not `*`. Schema description (source + public mirror) and author docs at `lpm-config-json.mdx` updated to document the `name@range` syntax, the four spec shapes (bare/range/exact/dist-tag/wildcard), and the resolve-then-write policy. Save-policy table mirrors `lpm install`. Tests: 11 new unit tests covering bare→caret, explicit ranges preserved, exact preserved, wildcard preserved, dist-tag stable→caret, dist-tag prerelease→exact, fail-fast on missing resolve, save-exact config honored, mixed-intent end-to-end, dedup-by-name first-wins, and the legacy fallback's range preservation. Workspace nextest: 5256 pass (was 5245, +11), schema-drift green, clippy + fmt clean, Fumadocs build green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
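The resolve-then-write policy can be sketched as a single decision function. This is an illustration only: the variant names of `UserSaveIntent` other than Bare and DistTag, the `decide_spec_sketch` name, and the string-based prerelease check are assumptions, and `--save-prefix` handling is omitted.

```rust
/// Hypothetical shape of the parsed author intent.
enum UserSaveIntent {
    Bare,
    DistTag(String),
    Range(String),
    Exact(String),
    Wildcard,
}

/// Decides the spec written to package.json. `resolved` is the version the
/// registry returned for Bare/DistTag entries (None if resolution failed).
fn decide_spec_sketch(
    intent: &UserSaveIntent,
    resolved: Option<&str>,
    save_exact: bool,
) -> Result<String, String> {
    match intent {
        // Explicit author input is preserved verbatim.
        UserSaveIntent::Range(range) => Ok(range.clone()),
        UserSaveIntent::Exact(version) => Ok(version.clone()),
        UserSaveIntent::Wildcard => Ok("*".to_string()),
        UserSaveIntent::Bare | UserSaveIntent::DistTag(_) => {
            // Fail fast: no manifest mutation without a resolved version.
            let version = resolved
                .ok_or_else(|| "could not resolve a version; pin one explicitly".to_string())?;
            // Prerelease = a '-' before any '+build' metadata.
            let core = version.split('+').next().unwrap_or(version);
            if save_exact || core.contains('-') {
                Ok(version.to_string())
            } else {
                Ok(format!("^{version}"))
            }
        }
    }
}
```

Returning `Err` for an unresolved Bare/DistTag entry is what prevents a stranded `*` from ever reaching the manifest.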

…e table Second-pass audit on #9.1 caught that the new pre-resolution path used a single `RegistryClient::batch_metadata` call for every Bare/DistTag entry. That endpoint only handles `@lpm.dev/*` packages — npm-published and `.npmrc`-declared private-registry entries were broken. The walker already documents the three-arm dispatch model at crates/lpm-resolver/src/walker.rs:435-485 (Phase 58 day-4); the new `handle_dependencies` flow now follows the same pattern: - `@lpm.dev/*` names take the LPM-direct metadata route via `client.get_package_metadata`, the same call `add::run` uses for the source package itself when its target classifies as `AddTarget::Lpm`. - Everything else routes per-package via `route_table.route_for_package(name)` → `client.get_npm_metadata_routed(name, route)`. That dispatcher already handles all three upstream variants (LpmWorker proxy, npm direct, custom `.npmrc`-declared registry) including origin-scoped auth attachment. Without this, an author shipping `lpm.config.json#dependencies` with a bare `@corp/ui` (resolved through a `.npmrc` `@corp:registry=...` declaration) would have failed at resolve time unless they pinned an explicit range — defeating the "any registry" contract the schema and docs promise. Schema description (source + public mirror) also corrected: the previous wording said "explicit ranges/exacts/wildcards/dist-tags are preserved verbatim," but dist-tags resolve against the registry and apply the stable→caret / prerelease→exact safety policy. Author docs at lpm-config-json.mdx already had this right; the schema mirrors were drifted. Now consistent across all three. `route_table` threaded as a new parameter to `handle_dependencies`; already in scope at the `add::run` call site so the change is local. Serial routed fetches over the typical < 10 source-package deps — the walker's parallel-fan-out pattern is overkill at this scale and network setup dominates wall time anyway. 
CI gate: clippy clean, fmt clean, nextest 5256/5256 pass, schema-drift green, Fumadocs build green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…llback tx Closes #9.2. `lpm add` previously mutated package.json with the resolved specs, then ran the trailing install (LPM or external) separately — warning and continuing on install failure. The result was a half-applied manifest: dep entries pointing at versions that didn't actually link, lockfile and node_modules either missing or stale. Re-running `lpm install` after manually fixing the underlying error didn't always recover, because the manifest entries the failed install partially populated could now mismatch the live state. This patch wraps the mutation + trailing install in a `ManifestTransaction`, the same Drop-based snapshot guard that `lpm install <pkg>` uses for its stage→install→finalize flow: - The snapshot covers package.json (required), the LPM lockfiles (`lpm.lock`, `lpm.lockb`, optional), the selected package manager's lockfile (`package-lock.json` for npm, `pnpm-lock.yaml` for pnpm, `yarn.lock` for yarn, both `bun.lock` + `bun.lockb` for bun, all optional), and `.lpm/install-hash` (invalidate-only). - All four PM dispatch arms now return `Err` instead of warn-and- continue when the install fails. The `?` propagates and the tx drops without commit, restoring every snapshotted file and deleting the install-hash cache so the next run re-derives it. - `effective_pm` is resolved (handling `--pm auto`) BEFORE the snapshot opens, so the per-PM lockfile is included in the rollback surface. Without this extension, an `npm install` partial write would leave a manifest/lockfile split-brain (caught by the second- pass audit on this fix). The boundary intentionally stops at the manifest + lockfile + cache surface. Source files that `lpm add` copied earlier in the run are NOT rolled back — that's a known limitation of the manifest-tx contract, documented at manifest_tx.rs:33-43. Worse-case-than-today? No: today the manifest is also broken on failure. Filed as a follow-up at phase64-findings #9.3 (source-file orphan cleanup, full atomicity). 
Helper `pm_lockfile_paths(pm, project_dir)` returns the per-PM lockfile name(s) for the snapshot. Six unit tests in `source_pkg_deps` cover npm/pnpm/yarn/bun (both forms)/lpm (empty, already covered)/ unknown (defensive empty). The rollback semantics themselves are covered by the existing `manifest_tx::tests` suite — the new tx call site composes the primitive without changing it. Author docs at lpm-config-json.mdx ("Resolve-then-write, with rollback") updated to document the new scope: which files snap back, which (source files) don't, and the "re-run lpm add to converge" guidance for trailing-install failures. CI gate: clippy clean, fmt clean, nextest 5262/5262 pass (5256 → 5262, +6 new), schema-drift unchanged (no schema modifications), Fumadocs build green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
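A minimal sketch of the helper, using the lockfile names listed in this commit message. The parameter and return types are assumptions; only the name and the two parameter names come from the commit.

```rust
use std::path::{Path, PathBuf};

/// Returns the lockfile path(s) the selected package manager may write,
/// so they can be added to the rollback snapshot.
fn pm_lockfile_paths(pm: &str, project_dir: &Path) -> Vec<PathBuf> {
    let names: &[&str] = match pm {
        "npm" => &["package-lock.json"],
        "pnpm" => &["pnpm-lock.yaml"],
        "yarn" => &["yarn.lock"],
        // bun has a text and a binary lockfile form; snapshot both.
        "bun" => &["bun.lock", "bun.lockb"],
        // "lpm" lockfiles are already covered by the base snapshot, and
        // unknown managers defensively contribute nothing.
        _ => &[],
    };
    names.iter().map(|name| project_dir.join(name)).collect()
}
```

Resolving `--pm auto` before calling this matters: the snapshot has to be built from the manager that will actually run.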

Closes #9.3. The #9.2 ManifestTransaction covered package.json + lockfiles + install-hash, but Step 8's source-file copies happened BEFORE the tx opened, so a failure between Step 8 and Step 9.1 left copied source files orphaned in the project even though the manifest rolled back. The whole point of the rollback contract was to leave a clean project on failure; the gap broke that for any source-file-involving error.

Lift tx ownership from `handle_dependencies` up to `add::run` and extend the snapshot to include every dest path Step 8 will write to. Two structural pieces had to land first:

1. **Split `resolve_safe_dest` into validate + prepare phases.** The original Step 5 called `create_dir_all(parent)` mid-validation, a side effect that would corrupt the rollback boundary if it ran before the tx snapshot opened (parent dirs would survive rollback, defeating containment). New shape:
   - `resolve_safe_dest_validate(target_root_canonical, target_dir, dest_rel)`: pure validation, no mkdir. Pre-snapshot phase calls this for every dest_rel.
   - `prepare_safe_dest_parent(parent, target_root_canonical)`: create_dir_all + post-mkdir re-canonicalize. Runs inside Step 8's loop, after the snapshot has captured every dest path.

   The original `resolve_safe_dest` survives as a `#[cfg(test)]` wrapper composing both phases, so the existing six containment tests still encode the user-visible contract.

2. **Commit the tx after Step 9.1, NOT after Step 11.** The Swift recursion at Step 10 calls `Box::pin(run(...)).await?` per Swift dep; each recursive `lpm add` opens its own tx. If the outer tx stayed open across that boundary, a recursive failure could roll back the root package's already-applied mutations while leaving recursive `lpm add` side effects intact (worse split-brain than no rollback). Step 11 output is read-only; Step 12 skills are non-fatal best-effort. All three sit outside the tx by design.

Snapshot list under the new shape:

- Optional: package.json, lpm.lock, lpm.lockb, the selected PM's lockfile (per `pm_lockfile_paths`), every validated dest path from Step 8.
- Invalidate: .lpm/install-hash.

`handle_dependencies` reverts to a "do work, return Ok/Err" shape with no internal tx (caller owns the rollback boundary now). Takes `effective_pm: &str` instead of computing it itself; the resolution of `--pm auto` happens at `run()`'s level so the snapshot can include the right per-PM lockfile.

Tests: 4 new unit tests on `resolve_safe_dest_validate` proving it rejects `..`/absolute/existing-symlink dest paths without ever calling `create_dir_all` (the mkdir side effect that Phase 60.0.f originally fixed). All 6 existing `resolve_safe_dest` containment tests still pass; they exercise the wrapper which composes the two phases. The rollback primitive itself remains covered by `manifest_tx::tests`.

Boundary explicitly out of scope (documented in lpm-config-json.mdx "Resolve-then-write, with rollback" section): empty parent directories created during the file copy stay on disk, recursive Swift `lpm add` is its own scope, agent skills (Step 12) run after commit and are non-fatal by contract.

CI gate: workspace clippy clean, fmt clean, nextest 5262 → 5266 pass (+4 new), Fumadocs build green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
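The Drop-based snapshot guard can be sketched as below. This is not the `ManifestTransaction` in `manifest_tx.rs`: `ManifestTxSketch` is a hypothetical type that models only "optional" snapshot entries (a file missing at open time is removed again on rollback) and leaves out the invalidate-only `.lpm/install-hash` handling.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical guard: remembers each path's prior bytes (or its absence)
/// and restores them on drop unless `commit` was called.
struct ManifestTxSketch {
    snapshots: Vec<(PathBuf, Option<Vec<u8>>)>,
    committed: bool,
}

impl ManifestTxSketch {
    fn open(paths: &[PathBuf]) -> io::Result<Self> {
        let mut snapshots = Vec::with_capacity(paths.len());
        for path in paths {
            let prior = match fs::read(path) {
                Ok(bytes) => Some(bytes),
                // Optional entry: remember that the file did not exist.
                Err(e) if e.kind() == io::ErrorKind::NotFound => None,
                Err(e) => return Err(e),
            };
            snapshots.push((path.clone(), prior));
        }
        Ok(Self { snapshots, committed: false })
    }

    /// Consumes the guard; the Drop impl then sees `committed == true`.
    fn commit(mut self) {
        self.committed = true;
    }
}

impl Drop for ManifestTxSketch {
    fn drop(&mut self) {
        if self.committed {
            return;
        }
        // Best-effort rollback: errors are ignored because Drop cannot
        // return them.
        for (path, prior) in &self.snapshots {
            let _ = match prior {
                Some(bytes) => fs::write(path, bytes),
                None => fs::remove_file(path),
            };
        }
    }
}
```

With this shape, any early return through `?` between open and commit drops the guard and triggers the rollback, which is why moving the commit point (after Step 9.1 rather than Step 11) changes what a later failure can undo.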

Phase 64 #9.3 second-pass-audit regression. The prior #9.3 commit (`ee4f2a0`) split `resolve_safe_dest` into validate + prepare phases and called both inside the Step 8 loop — but the production loop discarded `prepare_safe_dest_parent`'s return value (the canonicalized parent) and wrote to the pre-canonicalize `dest_path` instead. That re-opened the post-mkdir TOCTOU window: a symlink swap between the canonicalize check and the write would route the actual write through the new symlink chain, escaping containment. The `#[cfg(test)]` `resolve_safe_dest` wrapper still composed canonical correctly, so the existing six containment tests passed while the real production path was broken. Fix: move both `resolve_safe_dest_validate` and `prepare_safe_dest_parent` into the pre-snapshot phase. Compose `final_dest_paths[i] = parent_canonical.join(file_name)` once per file before opening the transaction. The snapshot then registers canonical-pinned paths, and Step 8's read / conflict-check / write all flow through `final_dest_paths[i]` directly. Snapshot path == write path; rollback restores exactly what got written. Trade-off: parent directories are now created BEFORE the tx opens. A rolled-back failure leaves empty directories on disk that the transaction can't clean up. That trade is necessary for the snapshot to track canonical paths — without canonicalize-before-snapshot, the rollback path and the write path could diverge under intermediate- symlink resolution. Documented as part of the rollback boundary in `lpm-config-json.mdx`'s "Resolve-then-write, with rollback" section, which also corrects an earlier overstatement claiming `lpm add` restores "the same state as before you ran the command" — the install target directory and any canonical parents on the resolution chain stay on disk. 
New regression test `step_8_write_path_pins_canonical_parent_through_intermediate_symlink` exercises the production composition logic against a symlinked-inside-target intermediate directory and asserts the final write path is the canonical resolution, not the pre-canonicalize alias.

CI gate: workspace clippy clean, fmt clean, nextest 5266 → 5267 pass (+1 new), Fumadocs build green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
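The validate-then-pin split described in this commit can be illustrated with a minimal sketch. The function names `validate_dest` and `pin_final_dest` are hypothetical stand-ins, not the real `resolve_safe_dest_validate` / `prepare_safe_dest_parent`; the sketch covers only the lexical checks (`..` and absolute paths) and assumes the caller has already canonicalized the parent directory. The existing-symlink rejection mentioned above needs filesystem access and is left out.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical pure validation phase: rejects a destination that is empty,
/// absolute, or contains `..`. It never touches the disk, so it cannot
/// create directories as a side effect.
fn validate_dest(rel: &Path) -> Result<(), String> {
    if rel.as_os_str().is_empty() {
        return Err("empty dest path".to_string());
    }
    for component in rel.components() {
        match component {
            Component::Normal(_) | Component::CurDir => {}
            Component::ParentDir => {
                return Err(format!("`..` in dest path: {}", rel.display()));
            }
            Component::RootDir | Component::Prefix(_) => {
                return Err(format!("absolute dest path: {}", rel.display()));
            }
        }
    }
    Ok(())
}

/// Hypothetical composition step: joins the file name onto a parent that the
/// caller has ALREADY canonicalized, so the path registered in the snapshot
/// and the path written to are the same value.
fn pin_final_dest(parent_canonical: &Path, rel: &Path) -> Option<PathBuf> {
    rel.file_name().map(|name| parent_canonical.join(name))
}
```

Computing the pinned path once and reusing it for both the snapshot and the write is the point of the fix: there is no second path value that could resolve differently.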

…nifest project

Closes #9.4. Pre-fix, `lpm add` against a project with no `package.json` would copy source files first and only warn at the end of `handle_dependencies` that deps weren't installed. Users got source files importing packages they couldn't install, with a confusing late-stage warning instead of a clear early error.

Add a preflight gate (Step 6.2) that runs after extraction + dry-run exit but BEFORE Step 7 prompts and Step 8 file copies. Hard-errors when ALL of:

- the user did NOT pass `--no-install-deps` (an explicit "I'll handle deps myself" opt-out, which is respected),
- the source ships an `lpm.config.json` (simple-path tarballs skip auto-install entirely; bare-imports notice surfaces what the user needs),
- the project has no `package.json`,
- `collect_source_pkg_deps` would return a non-empty list (config-driven OR legacy fallback via the source's own `package.json`).

Error message points the user at `lpm init` / `npm init -y` to create a manifest, with a fallback note about `--no-install-deps` for "copy source only, I'll resolve imports myself" workflows.

Going with option (a) "fail loudly with remediation" rather than option (b) "auto-create a minimal package.json" because the latter adds policy questions (private vs public, fields to populate, drift risk against `lpm init`) and creates a competing bootstrap path. Future "just works" UX can be added explicitly via `--init-manifest` or an interactive prompt routed through the same helper that owns `lpm init`.

Tests: 6 new cases in `source_pkg_deps` covering the 4 gate conditions (config-json deps, legacy-fallback deps, manifest exists, no deps declared, --no-install-deps escape, simple-path no lpm.config.json) plus error-message assertions for `lpm init` / `--no-install-deps` remediation hints. Author docs at `add.mdx` gain a "Prerequisites" subsection that shows the error and explains the `--no-install-deps` escape hatch.
CI gate: workspace clippy clean, fmt clean, nextest 5267 → 5273 pass (+6 new), Fumadocs build green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
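The four-way conjunction behind the Step 6.2 gate can be sketched as a pure function. This is an illustration only: `PreflightInputs`, its field names, and the error wording are assumptions for the example, not the code in `add.rs`.

```rust
/// Hypothetical inputs to the preflight decision; field names are illustrative.
struct PreflightInputs {
    no_install_deps: bool,          // user passed `--no-install-deps`
    source_has_lpm_config: bool,    // source tarball ships `lpm.config.json`
    project_has_package_json: bool, // target project already has a manifest
    declared_deps: Vec<String>,     // what dep collection would return
}

/// Errors only when ALL four conditions from the commit message hold.
/// The message text is invented for the sketch.
fn preflight(inputs: &PreflightInputs) -> Result<(), String> {
    let must_fail = !inputs.no_install_deps
        && inputs.source_has_lpm_config
        && !inputs.project_has_package_json
        && !inputs.declared_deps.is_empty();
    if must_fail {
        Err(format!(
            "{} dependencies declared but no package.json found; run `lpm init` \
             or `npm init -y`, or pass `--no-install-deps`",
            inputs.declared_deps.len()
        ))
    } else {
        Ok(())
    }
}
```

Because the gate is a conjunction, flipping any single input (opting out, adding a manifest, shipping no config, or declaring no deps) lets the command proceed.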

…st gap

GPT's audits across #9.1 / #9.2 / #9.3 / #9.4 consistently flagged the absence of an `lpm add`-specific integration test exercising the real CLI binary end-to-end. Helper-level unit tests in crates/lpm-cli/src/commands/add.rs covered each layer's contract, but a regression that broke the wiring between layers (registry routing, save-spec composition, transaction rollback) would have slipped through.

New `tests/workflows/tests/add.rs` closes that gap with three composed tests against the mock-registry harness:

1. Happy path (#9 + #9.1): source package declaring a bare name plus an explicit Exact spec. Asserts the bare name caret-resolves via the registry to `^0.400.0`, the Exact is preserved verbatim in package.json, source files land at the right canonical paths, and the trailing install populates node_modules.
2. Preflight (#9.4): deps-declaring source against a project with no package.json. Asserts the command exits non-zero with the `lpm init` / `npm init -y` / `--no-install-deps` remediation hints in stderr, AND that no source-file copy happened (preflight ran before Step 8).
3. Rollback (#9.2 + #9.3): source declares `unfetchable@1.0.0`; the mock mounts metadata but not the tarball. Trailing install 404s on download, the transaction drops uncommitted, package.json is byte-identical to its pre-`lpm add` state, and the source-file copy was deleted on rollback.

Helper `make_source_pkg_tarball(name, version, lpm_config, files)` composes the source-package tarball shape (package.json stub + lpm.config.json + arbitrary source files at the tarball root) for the three tests; reusable for future workflow coverage. The `?withTests=true` URL spec syntax pre-answers the conditional config field deterministically, which avoids relying on schema-default coercion under `--yes` to fire the dep map.

Workspace nextest: 5273 → 5276 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Round 1 — `lpm rebuild --no-sandbox` pairing hardening (Phase 64 #4 + #38): - Parser test pinning the clap `requires = "unsafe_full_env"` constraint - Defense-in-depth `debug_assert!` in `run_under_store_lock` - Command-level `--help` expanded to enumerate executed phases (preinstall/install/postinstall) and recognized-but-not-executed phases (prepare/prepublishOnly), the sandbox-on-by-default contract, and the `--unsafe-full-env --no-sandbox` partner pairing - `prepare` correction across rebuild / install / approve-scripts / glossary / npm-compatibility docs Round 2 — `lpm test --watch` silent-drop fix (Phase 64 #14): - Detect watch flags in forwarded args; rewrite the vitest base from `vitest run` to `vitest` so `--watch` is honored. Pre-fix, vitest silently dropped `--watch` under the `run` subcommand. Jest / mocha unchanged. `lpm bench` unaffected (vitest's `bench` subcommand respects `--watch` natively). Round 3 — `lpm add` source-package dep flow rewrite (Phase 64 #9 / #9.1 / #9.2 / #9.3 / #9.4): #9 — drop the @lpm.dev/* filter that silently lost dep entries declared in source packages. Registry-agnostic dep collection now; shared `collect_source_pkg_deps` helper drives both install and preview / skip-count surfaces. Tightened legacy-fallback gate so a declared-but-unmatched `dependencies` block opts out of the fallback. #9.1 — preserve author-pinned `name@range` specs verbatim; bare names caret-resolve via the registry per Phase 33 save policy. Per-package routing through `.npmrc` so `@corp/ui` from a private registry works the same as a bare npm name. Fail-fast posture: unresolvable bare/dist-tag entries error before `package.json` is mutated. #9.2 — wrap the manifest mutation + trailing install in a `ManifestTransaction`. Snapshot includes the selected PM's lockfile (package-lock.json / pnpm-lock.yaml / yarn.lock / bun.lock+lockb) so external-PM partial writes don't create manifest/lockfile split-brain. 
All four `--pm` dispatch arms now error-and-rollback instead of warn-and-continue.

#9.3: extend the snapshot to include source-file dest paths. Step 8 file copies roll back too. Required splitting `resolve_safe_dest` into pure validate + mkdir/canonicalize phases, then composing canonical-pinned final dest paths before the snapshot opens (so snapshot path == write path under intermediate-symlink resolution).

#9.4: preflight gate that hard-errors before any side effects when a deps-declaring source package would land in a project with no `package.json`. Remediation message points at `lpm init` / `npm init -y`. `--no-install-deps` escape hatch preserved.

Plus composed integration tests at `tests/workflows/tests/add.rs` exercising the real CLI binary end-to-end (happy path / preflight / rollback), closing the test-depth gap audited across the #9.x chain.

Schema: `lpm.config.json#dependencies` entries now accept `name@range` syntax alongside bare names. Author docs and JSON Schema description updated in lockstep with the public mirror; drift-guard test pins the parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
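To make the `name@range` schema change concrete, here is a minimal sketch of how a dependency entry could be split into a bare name or a pinned spec while leaving a scope prefix such as `@corp/ui` intact. `DepEntry` and `parse_dep_entry` are hypothetical names for illustration, and the sketch validates neither the package name nor the range.

```rust
/// Hypothetical classification of one `lpm.config.json#dependencies` entry.
#[derive(Debug, PartialEq)]
enum DepEntry {
    /// Bare name: would be caret-resolved via the registry.
    Bare(String),
    /// Author-pinned `name@range`: preserved verbatim.
    Pinned { name: String, range: String },
}

/// Splits on the first `@` that is NOT the leading scope marker.
/// Returns `None` for an empty entry or an empty range (e.g. `zod@`).
fn parse_dep_entry(entry: &str) -> Option<DepEntry> {
    let entry = entry.trim();
    if entry.is_empty() {
        return None;
    }
    // Skip the first character so a scope's leading `@` is never the separator.
    match entry.char_indices().skip(1).find(|&(_, c)| c == '@') {
        Some((idx, _)) => {
            let (name, range) = (&entry[..idx], &entry[idx + 1..]);
            if range.is_empty() {
                None
            } else {
                Some(DepEntry::Pinned {
                    name: name.to_string(),
                    range: range.to_string(),
                })
            }
        }
        None => Some(DepEntry::Bare(entry.to_string())),
    }
}
```

A caller would then route `Bare` entries through registry resolution and write `Pinned` entries to `package.json` unchanged, matching the #9.1 behaviour described above.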

…ng + GraphKey disambiguation Closes the three load-bearing 4d (default-flip) blockers from Phase 66 4b's deferred-items list as a single coordinated change. Without these, every line of Phase 4c would build on top of an empty-peers GraphKey assumption and a `(name, version)`-only key map — exactly the cheap-now / refactor-later trap the user called out. #9 — peer-context threading (resolver → install → linker): - `lpm_resolver::ResolvedPackage` gains `peers: Vec<(String, String)>`, populated from `CachedPackageInfo.peer_deps[version]` intersected against the resolved-versions lookup the resolver already builds for `format_solution` / greedy `into_resolved_packages`. Sorted by peer_name for deterministic GraphKey hashing. Both resolver arms (pubgrub + greedy) populate symmetrically through `compute_resolved_peers` (pubgrub) / inline lookup (greedy, since the node table is the lookup). - `InstallPackage.peers` carries the resolver's output verbatim through `resolved_to_install_packages`. Source-kind paths (Tarball / Directory / Link / lockfile fast-path) populate empty for now; the v2 linker's `ensure_peer_context` re-derives from the just-extracted `package.json` when the field arrives empty, keeping cold-resolve and warm-fast-path producing the same GraphKeys. - `LinkTarget.peers` propagates from `InstallPackage.peers` at every install→link conversion site. v1 ignores the field; hoisted-mode v1 wanting cross-project sharing later can fold it in without further plumbing. #4 — fold peers into the GraphKey: - `lpm_store::v2::GraphKeyInputs::with_peers` now receives the resolved `PeerEntry` list from `LinkTarget.peers` instead of the empty `Vec<PeerEntry>::new()` placeholder. The hash field contract was already in place (`peers` slot in `derive`); we just stop passing nothing into it. 
- New `with_wrapper_id` setter folds the source-identity disambiguator into the hash so `Source::Registry { foo@1.0.0 }` and `Source::Tarball { foo@1.0.0 from URL X }` produce distinct keys. Empty `wrapper_id` (registry default) preserves the pre-Phase-66 hash so existing v2 store entries don't get invalidated by this addition.

#8: multi-source-same-coords disambiguation in v2 linker key map:

- New `KeyMap` type with two indexes: `by_triple` keyed on `(name, version, wrapper_id)` for the consumer's own key lookup, `by_coords` keyed on `(name, version)` for dep / peer edge lookups (which carry only coords today). At construction time, a `(name, version)` collision across distinct `wrapper_id`s surfaces a hard `LpmError::Store` rather than silently aliasing the second target onto the first. Audit-fixtures don't exercise multi-source-same-coords today, so the error is reachable only via a malformed install set; lifting the constraint requires threading wrapper_id through dep edges, a Phase 4 follow-up.

v2 linker behavior changes:

- `augment_with_peer_edges` renamed to `ensure_peer_context` and rewritten to populate `LinkTarget.peers` (separate Vec) instead of mutating `LinkTarget.dependencies`. The fixed-point closure loop is gone: each consumer's resolved peers is a single per-package fact (the resolver / package.json intersection), not a transitive graph property. Transitive resolution flows through the per-target loop: when peer B is also a LinkTarget, ITS link entry gets ITS own peer siblings.
- `populate_one` synthesizes peer-edge sibling symlinks ALONGSIDE dep-edge siblings (peers were previously folded into `dependencies`; now they're a separate pass with explicit dedupe against already-declared deps).
- `peerDependenciesMeta.optional` controls trace verbosity for missing peers: required-but-missing emits a debug log pointing at the upstream `check_unmet_peers` gap; optional-missing is silent (npm-compat).
Tests:

- `link_packages_v2_distinct_keys_for_peer_divergent_projects`: same consumer + same edge graph + DIFFERENT resolved peer versions across two projects must produce distinct GraphKeys (no silent cross-project sharing under peer-pinning divergence).
- `link_packages_v2_shares_keys_for_peer_identical_projects`: same consumer + same edge graph + SAME resolved peer version across two projects must produce the same GraphKey (cross-project sharing actually works under peer-pinning agreement, which is the win the v2 rewrite is supposed to unlock).
- `link_packages_v2_errors_on_multi_source_same_coords`: malformed install set with two LinkTargets at the same `(name, version)` but distinct `wrapper_id`s produces a clear `multi-source LinkTarget collision` error rather than aliasing.

Pre-merge gate green:

- cargo clippy --workspace --all-targets -- -D warnings ✓
- cargo fmt --check ✓
- cargo nextest run --workspace --exclude lpm-integration-tests ✓ (5711/5711 pass; one transient lpm-inspect sqlite-races-under-load flake on first run, clean on rerun)
- cargo test -p lpm-auth (2× parallel-deterministic) ✓
- audit-fixtures: 17 PASS / 1 SKIP / 0 mixed under both default v1 and `LPM_STORE_VERSION=v2`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
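The two-index `KeyMap` and its collision rule can be sketched roughly as follows. This is an assumption-laden illustration, not the `lpm_store::v2` type: graph keys are plain strings, the error is a `String` rather than `LpmError::Store`, and the method names `insert`, `lookup_own`, and `lookup_edge` are invented for the example.

```rust
use std::collections::HashMap;

/// Illustrative two-index map; the real `KeyMap` may differ in every detail.
#[derive(Debug, Default)]
struct KeyMap {
    /// (name, version, wrapper_id) -> graph key, for a target's own lookup.
    by_triple: HashMap<(String, String, String), String>,
    /// (name, version) -> (wrapper_id, graph key), for dep / peer edges that
    /// carry only coordinates.
    by_coords: HashMap<(String, String), (String, String)>,
}

impl KeyMap {
    /// Registers a target. The same coordinates under a different
    /// `wrapper_id` are a hard error instead of a silent alias.
    fn insert(&mut self, name: &str, version: &str, wrapper_id: &str, key: &str)
        -> Result<(), String>
    {
        let coords = (name.to_string(), version.to_string());
        if let Some((existing, _)) = self.by_coords.get(&coords) {
            if existing.as_str() != wrapper_id {
                return Err(format!(
                    "multi-source LinkTarget collision: {name}@{version}"
                ));
            }
        }
        self.by_triple.insert(
            (name.to_string(), version.to_string(), wrapper_id.to_string()),
            key.to_string(),
        );
        self.by_coords.insert(coords, (wrapper_id.to_string(), key.to_string()));
        Ok(())
    }

    /// Lookup used by the consumer itself (full triple).
    fn lookup_own(&self, name: &str, version: &str, wrapper_id: &str) -> Option<&str> {
        self.by_triple
            .get(&(name.to_string(), version.to_string(), wrapper_id.to_string()))
            .map(String::as_str)
    }

    /// Lookup used for dep / peer edges (coordinates only).
    fn lookup_edge(&self, name: &str, version: &str) -> Option<&str> {
        self.by_coords
            .get(&(name.to_string(), version.to_string()))
            .map(|(_, key)| key.as_str())
    }
}
```

Rejecting the collision at insertion time keeps `lookup_edge` unambiguous, which is why the constraint can only be lifted once edges carry a `wrapper_id` as well.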

…ix (#58) * test(workflows): pin concurrency + recovery contracts for lpm install Adds tests/workflows/tests/install_concurrency.rs with 13 falsifiable tests covering production failure modes that had zero coverage: Category A — process racing: * two concurrent installs on same project (pins finding-#77 floor) * install + concurrent store-clean serialize via shared/exclusive store_lock (probed via try_with_exclusive_lock on the actual lock file, not a directory-existence proxy) * two concurrent `lpm install -g` via global_tx_lock — proves final manifest + WAL coherence under serialized commits Category B — interruption recovery: * kill mid-tarball-fetch leaves no .lpm/install-hash * next `lpm install` converges to a coherent end state Category C — network faults: * tarball 503 → 200 succeeds after retry (counting Respond impl) * metadata 404 fails immediately without retry (<2s wall-clock) Category D — filesystem faults: * readonly project dir fails with actionable error (no panic); POSIX-only via #[cfg(unix)], RAII guard restores permissions * `<project>/.lpm` planted as a regular file fails clearly Category E — partial state recovery: * stale install-hash triggers re-resolve + refetch * partial node_modules re-links to full state * truncated lpm.lockb either recovers or fails cleanly (no panic) Category F — WAL recovery hook: * torn WAL tail (3 garbage bytes) gets truncated by the dispatcher's recovery hook before the command runs; idempotent on re-invocation Support helper refactor (same commit so the new helper has callers): * extracts env-isolation set into `LpmEnvSink` trait + `apply_lpm_env(cmd, project)` shared by `lpm()` (assert_cmd) and the new `lpm_spawnable()` / `lpm_spawnable_with_registry()` (std::process::Command, supports Child::kill()) * trait impl on both Command variants ensures the two helpers cannot drift on the ~30 env knobs that gate test isolation Surfaced findings during this work: * #77 — no project-level install lock: concurrent installs 
silently drop one side's work AND/OR fail with atomic-rename races (3 observed failure modes documented in findings.md). Fix shape: LpmRoot::project_install_lock + with_exclusive_lock_async wrap.
* #78: retry-backoff has no test-friendly knob; retry-exhaustion tests take 15s+. Fix shape: LPM_RETRY_BACKOFF_MS_OVERRIDE env in debug builds.

CI gate locally green:

- clippy --workspace --all-targets -- -D warnings: clean
- cargo fmt --check: clean
- fancy-regex ban: empty
- cargo build --workspace: clean
- cargo nextest run --workspace --exclude lpm-integration-tests: 6439 passed, 7 skipped, 1 leaky (pre-existing)

Deferred (filed under "next session" in the followup plan):

- B.3 (kill doesn't tear lockfile): subsumed by B.1/B.2
- B.4 (panic injection): needs LPM_TEST_PANIC_AT env hook
- C.2 (retry exhaustion): blocked by finding #78
- C.3 (truncated body): needs custom Respond with Content-Length mismatch
- D.3 (disk-full simulation): no portable mechanism
- F.2, F.3 (orphan WAL, torn WAL with real records): needs framed-WAL construction helpers

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(workflows): pin lpm.lock well-formedness + recovery skip-on-contention

Closes B.3 and F.2 of the concurrency tranche (13 → 15 tests), meeting the "≥15 of 21" acceptance criterion for Item 2.

B.3, `install_killed_mid_pipeline_leaves_well_formed_or_absent_lockfile`: Exercises two SIGKILL windows on the install pipeline: a fresh project, and a project with a committed lpm.lock from a prior install. After each kill, asserts the on-disk lpm.lock is either absent OR parses as TOML. Never half-written. Adds `toml = { workspace = true }` as a workflow-tests dev-dep for the parse assertion. Helper `assert_lockfile_well_formed_or_absent` shared between both windows.

F.2, `lpm_command_skips_recovery_when_another_lpm_holds_global_tx_lock`: Validates the dispatcher's `try_with_exclusive_lock` idempotent-skip path at `main.rs:2531`.
A background thread acquires `global_tx_lock` via `lpm_common::with_exclusive_lock` and blocks on a channel. With the lock held, runs `lpm global list` against a project with a torn-WAL prefix and asserts the WAL bytes are UNCHANGED (skip arm fired, recovery did not run). Then releases the lock and re-runs; asserts the WAL is now truncated (recovery defers correctly to the next lock-free invocation). Exercises both branches of the `try_with_exclusive_lock` Ok(None) / Ok(Some) arm.

CI gate locally green:

- cargo clippy --workspace --all-targets -- -D warnings: clean
- cargo fmt --check: clean
- cargo nextest run --workspace --exclude lpm-integration-tests: 6441/6441 passed, 7 skipped
- 5x parallel re-run of install_concurrency: 15/15 stable each run

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(workflows): pin truncated-tarball + orphan-WAL recovery contracts

Two new tests in tests/workflows/tests/install_concurrency.rs:

- C.3 tarball_connection_dropped_mid_body_fails_or_retries: a custom wiremock Respond impl serves half a tarball with a Content-Length header naming the full length. Pins the install pipeline's retry-then-fail behavior on transport-class failures (~14s wall-clock for the full 4-attempt retry schedule). Hyper 1.9 server-side panics on the Content-Length lie, dropping the connection, which is a valid surrogate for a broken upstream / CDN dropping mid-body. Surfaced 8 tarball GETs per install (deterministic, 3-of-3 reproducer), explained by two distinct download_tarball_* call sites in install.rs each running the 4-attempt retry budget.
- F.3 lpm_command_with_orphan_pending_tx_emits_recovery_banner: plants both halves of an orphan transaction (WAL Intent record without matching Commit/Abort + matching [pending.<pkg>] row in manifest.toml pointing at a non-existent install root) and asserts the dispatcher's recovery hook fires the RolledBack banner from main.rs:2543.
Sets RUST_LOG=lpm=info to lift the default lpm=warn filter so the tracing::info! line surfaces. Adds lpm-global as a workflow dev-dep for WalWriter / IntentPayload / write_for. Pins post-state: orphan pending row gone, no spurious active row. Together these close the C.3 and F.3 gaps in Item 2 of the test coverage follow-up plan: 17/21 scenarios pinned (was 15/21). The four remaining items all need source-side hooks (LPM_TEST_PANIC_AT, LPM_RETRY_BACKOFF_MS_OVERRIDE, container infra) and are out of scope for this tranche. Full CI gate green: clippy clean, fmt clean, fancy-regex empty, 6443/6443 nextest pass (was 6441 pre-tranche). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(workflows): pin tarball-extraction security contracts at install tier New file tests/workflows/tests/tarball_security.rs ships phase 1 of Item 3 (tarball-extraction security): 5 of 10 planned tests covering the most distinct security contracts at the install-pipeline tier. Each test constructs its malicious tarball in-line via tar::Builder (no checked-in fixtures), serves it through MockRegistry, and runs lpm install end-to-end so any pipeline-level regression that bypasses the extractor's hardening is caught. Tests landed: - #1 tarball_with_dot_dot_path_entry_is_rejected_by_install — pokes package/../escape.txt into the raw tar header bytes; install fails with "path traversal detected"; outside sentinel never created. - #3 tarball_with_absolute_path_entry_is_normalized_to_relative_under_package_dir — renamed from "rejected" to reflect actual contract. The extractor's strip_first_component consumes the RootDir; an entry like /etc/lpm-pwned.txt extracts as node_modules/<pkg>/etc/lpm-pwned.txt. Install SUCCEEDS; literal /etc/lpm-pwned.txt is never written. Defensible: malformed-but-safe input normalized rather than refused. - #2 tarball_with_symlink_to_outside_path_is_silently_skipped — renamed. 
The is_file() gate at lib.rs:398 silently drops symlinks; install succeeds with byte-identical outside sentinel.
- #5 tarball_with_hard_link_to_outside_file_is_silently_skipped (renamed). Same is_file() gate; hardlinks silently skipped; outside victim file unmodified.
- #8 tarball_with_setuid_executable_extracts_with_setuid_bit_stripped (POSIX-only): tarball entry mode 0o4755 extracts as 0o755. SUID, SGID, and sticky bits all cleared via set_preserve_permissions(false) + the explicit `0o644 | exec_bits` mode set after write. Exec bits preserved.

Three tests carry a "plan-vs-actual" docstring section explaining why the rename is defensible: the actual extractor contract differs from the plan's prescribed phrasing in safe ways, not in regression-grade ways. No findings filed.

Phase 2 (5 remaining tests: Unicode normalization, device file, FIFO, zero-byte sanity, OS-max path) is deferred to a follow-up tranche with rationale + lift estimate documented in the plan. None blocks phase 1 acceptance.

Pre-merge gate green: clippy clean, fmt clean, fancy-regex empty, 6448/6448 nextest pass (was 6443; +5 for the new tests). 0.18s wall-clock for the full file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): per-project lock prevents concurrent-install data loss

Closes finding #77. Two `lpm install <pkg>` invocations on the same project no longer race on the manifest snapshot+commit window.

Pre-fix, both processes acquired only a SHARED store_lock and proceeded in parallel. Each opened its own per-process ManifestTransaction snapshot of the pre-edit package.json, staged its own dep on top, and ran the install pipeline. Whoever wrote package.json + lpm.lock last won; the other process's edits (including its node_modules link) silently vanished. Both processes still exited 0 with success-path output. CI scripts that ran two installs in parallel saw no signal of the data loss.
The fix introduces: - crates/lpm-common/src/paths.rs::project_install_lock(project_dir): free helper returning <project_dir>/.lpm/.install.lock. Re-exported from crates/lpm-common/src/lib.rs. - run_add_packages and run_install_filtered_add in crates/lpm-cli/src/commands/install.rs now wrap the snapshot → stage → install → finalize → commit window in with_exclusive_lock_async against the project lock. The lock is per-project (no cross-project contention) and held across all ?-early-exits via the async block's return. For the workspace path, the lock sits at the discovered workspace root (not per-member) so two concurrent `lpm install --filter <member>` invocations on the same workspace serialize without per-member deadlock-ordering complexity. run_with_options (the inner install pipeline) does NOT acquire this lock — it's called from inside both run_add_packages's wrap and from many other commands; double-acquiring the same fd-lock would deadlock in-process. Deferred (phase 2, not exercised by A.1): lpm add (add.rs:723-904) has a similar 180-line transaction with recursive Swift handling. Wrapping it is invasive and the race surface is theoretical (users don't typically run `lpm add` and `lpm install` concurrently). Defer to a separate tranche if a concurrent `lpm add` × `lpm install` race is ever observed. Test contract tightening (bug-first per CLAUDE.md): two_concurrent_installs_on_same_project_leave_well_formed_manifest in tests/workflows/tests/install_concurrency.rs went from "at-least-one survives + manifest is well-formed JSON" (the floor) to "BOTH installs succeed, BOTH packages present in package.json deps, BOTH packages linked in node_modules/" (the contract). Pre-fix: 1/1 fail (pkg-b silently dropped). Post-fix: 5/5 pass with no flakes (~1.2s wall-clock each — install B observes pkg-a's commit and reports "Resolved 2 packages"). Pre-merge gate green: clippy --workspace --all-targets clean, fmt clean, fancy-regex empty, 6448/6448 nextest pass. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(registry): test-only retry-backoff override env knob Closes finding #78 + lands C.2 (`tarball_503_exhausts_retries_fails_with_http_status`). Pre-fix, retry-exhaustion tests were blocked: the registry client's backoff schedule (1+2+4+8s, capped at 10s) made every retry-exhaustion test take ~15s per fetch site (~28s with the install pipeline's 2 distinct download_tarball_* call sites). MAX_RETRIES, RETRY_BASE_DELAY, and RETRY_MAX_DELAY are private const with no env override. C.2 therefore had to be #[ignore]-gated behind LPM_RUN_SLOW_TESTS=1, and the retry-exhaustion contract went unproven on `cargo nextest run`. The fix introduces: - crates/lpm-registry/src/client.rs::backoff_override(): reads LPM_RETRY_BACKOFF_MS_OVERRIDE (a u64 ms value) gated by cfg!(debug_assertions) || LPM_TEST_MODE=1. Returns Some(Duration) when both conditions hold; None otherwise. Production retry policy is immune — release builds without LPM_TEST_MODE=1 silently ignore the env. - backoff_delay(attempt) consults the override before computing the exponential schedule. - The two 429 Retry-After sleep sites also consult the override so a future 429-flood retry-exhaustion test wouldn't hang on the server-supplied header. C.2 test landed alongside (bug-first per CLAUDE.md): - Mock returns 503 on every tarball request — no recovery path. - Test sets LPM_RETRY_BACKOFF_MS_OVERRIDE=10 on the lpm subprocess. - Asserts: install fails non-zero, no panic, ≥4 attempts (proves the retry loop fired), elapsed < 2s (load-bearing — without the knob this fails at ~14s), stderr contains an actionable HTTP-class noun (503 / status / http / network / etc). - Surfaces 8 tarball GETs per install (4 attempts × 2 distinct download_tarball_* call sites — matches C.3's observation). Pre-fix verification: same C.2 against the unfixed client.rs failed on the elapsed assertion at 14.04s (knob ignored). Post-fix: passes in 1.6s cold / 0.1s warm. 
5/5 passes with no flakes. Pre-merge gate green: clippy --workspace --all-targets clean, fmt clean, fancy-regex empty, 6449/6449 nextest pass (was 6448 pre-fix; +1 for C.2). Item 2 of the test-coverage-followup-plan now at 18/21 (was 17/21). Both findings #77 and #78 fixed in production. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(workflows): tarball-security phase 2 — Unicode, device, FIFO, zero-byte, long-path Adds 5 more tests to tarball_security.rs, completing Item 3 of the test-coverage follow-up plan. Each test pins the actual extractor contract under malicious-or-edge-case tarball shapes that reach the install pipeline through MockRegistry. Tests landed: - #4 tarball_with_unicode_lookalike_parent_dir_extracts_safely_as_literal_bytes — renamed from "_normalization_traversal_rejected" to reflect the actual contract. Tarball entry path uses full-width dots U+FF0E `．．` (bytewise NOT ASCII `..`). Component::ParentDir is byte-exact, so `．．` becomes Component::Normal. Install SUCCEEDS; `．．` materializes as a literal directory under node_modules/<pkg>/; outside sentinel byte-identical. Defensible because Path::components() doesn't NFKC-normalize on POSIX. - #6 tarball_with_character_device_entry_is_silently_skipped (POSIX-only). EntryType::Char with /dev/null-shaped major/minor. Same is_file() gate as symlinks/hardlinks — silently skipped. Install SUCCEEDS; no device file at the expected path. - #7 tarball_with_fifo_entry_is_silently_skipped (POSIX-only). EntryType::Fifo. Same posture as #6. - #9 tarball_with_zero_byte_regular_file_extracts_as_empty_file. Sanity check that empty files still extract correctly (legitimate npm shape: .gitkeep, license placeholders). - #10 tarball_with_single_path_component_exceeding_name_max_fails_cleanly. 300-byte single-component name, well over POSIX NAME_MAX=255. Tar wire format succeeds via GNU long-name extension; the FILESYSTEM rejects on extraction (ENAMETOOLONG). 
Extractor wraps as LpmError::Io → install fails non-zero with the OS error visible and an actionable noun in stderr. Three of the five tests are renamed to reflect actual extractor contract vs the plan's prescribed phrasing — same "plan-vs-actual" docstring pattern as phase 1. No findings filed; all 10 contracts across phase 1 + 2 are defensible-as-implemented. Pre-merge gate green: clippy --workspace --all-targets clean, fmt clean, fancy-regex empty, 6454/6454 nextest pass (was 6449 pre-tranche; +5 for the new tests). Full file 0.2s wall-clock for all 10 tests. Item 3 now COMPLETE (10/10). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(workflows): cross-command flows Item 4 — migrate→rebuild + workspace filter isolation Closes Item 4 of the test-coverage-followup-plan at 6/6 (target was ≥5). Two additions to tests/workflows/tests/cross_command_flows.rs: - Plan #1 — extended flow_migrate_install_audit_lockfile_round_trips with a `lpm rebuild --dry-run --policy=deny` step. Pins the full migrate → install → audit → rebuild lifecycle. Asserts the rebuild step exits 0 + does not mutate the post-audit state (lpm.lock + lpm.lockb still present). Catches regressions where rebuild's lockfile or build-state parser breaks against a freshly-migrated manifest. - Plan #5 — added flow_workspace_install_filter_member_a_does_not_mutate_member_b (new test, 159 LOC). Pins the workspace-member isolation contract using the workspace-monorepo fixture (3 members: app, core, utils): 1. Initial filtered install on @test/core (re-pinning its existing semver dep) populates core's per-member quadruple: lpm.lock=319 B, lockb=230 B, install_hash=118 B. 2. Snapshot core's full quadruple. 3. Run `lpm install chalk@5.3.0 --filter @test/app` to add a new dep to app ONLY. 4. Assert app's package.json gained chalk; core's quadruple (package.json + lpm.lock + lpm.lockb + install-hash) is BYTE-IDENTICAL post-install; chalk does NOT appear in core's node_modules/. 
Catches a regression where a per-member filtered install accidentally also mutates a sibling member's package.json / lockfile / install-hash — a real bug class because run_install_filtered_add shares the workspace-root project lock (added in #77 fix) and could over-snapshot if the target-set computation drifts. Helper `mount_pkg_full(mock, name, version)` factors out the three-step metadata + batch-metadata + tarball mount so the test body stays readable. Other 4 plan flows already covered pre-tranche: - Plan #2: flow_add_install_graph_added_dep_visible - Plan #3: flow_install_patch_patch_commit_install_persists_patch - Plan #4: flow_token_rotate_publish_dry_run_picks_new_token - Plan #6: flow_install_upgrade_major_audit_picks_new_version Pre-merge gate green: clippy --workspace --all-targets clean, fmt clean, fancy-regex empty, 6455/6455 nextest pass (was 6454; +1 for the new flow). Plan #5 stable across 5/5 reruns at ~0.11s each. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(install): LPM_TEST_PANIC_AT hook + B.4 panic-rollback contract Adds a deterministic panic-injection hook to the install pipeline + unblocks the long-deferred B.4 contract test for ManifestTransaction Drop-based rollback on panic. The hook (`maybe_test_panic(stage)` in crates/lpm-cli/src/commands/install.rs) reads LPM_TEST_PANIC_AT and panics when the env value matches the stage name. Gated to `cfg!(debug_assertions) || LPM_TEST_MODE=1` — same pattern as the #78 retry-backoff override. Production builds without LPM_TEST_MODE=1 silently treat the env as no-op. 
Wired 4 stages in `run_add_packages`: - "after-snapshot" — manifest unchanged; Drop is no-op - "after-stage" — placeholder `*` written to package.json (load-bearing) - "after-install" — pipeline complete; manifest still has `*` - "after-finalize" — concrete versions written; pre-commit only The hook unblocks B.4 (`install_panics_mid_pipeline_rollback_restores_manifest`), deferred since the original Item 2 tranche because there was no deterministic way to trigger a panic mid-install from a workflow test. Recoverable errors fire `?`-rollback (covered by E.1/E.2/E.3); SIGKILL bypasses Drop entirely (B.1/B.2/B.3 cover that). The panic path was the missing rollback proof. B.4 sets LPM_TEST_PANIC_AT=after-stage and asserts: - process exits non-zero (panic propagates to runtime) - stderr contains `"panicked at"` AND `"LPM_TEST_PANIC_AT=after-stage"` - package.json BYTE-IDENTICAL to pre-stage (Drop ran on unwind, snapshot bytes restored — load-bearing) - the new pkg is NOT in dependencies (placeholder rollback worked) - .lpm/install-hash absent (invalidate-on-rollback) - lpm.lock absent (matched optional snapshot's None pre-state) Catches a regression where: - panic = "abort" added to release profile (no Drop on panic) - ManifestTransaction Drop logic stops restoring snapshot bytes - The `lpm install` snapshot+commit window grows without re-wiring Drop Test runs in 0.07s warm. 5/5 stable across reruns. Pre-merge gate green: clippy --workspace --all-targets clean, fmt clean, fancy-regex empty, 6456/6456 nextest pass (was 6455; +1 for B.4). install_concurrency now at 19/19. Item 2 of test-coverage-followup-plan moves to 19/21 — only A.2 (no contract) and D.3 (needs container infra) remain deferred indefinitely. 
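The Drop-based rollback that B.4 pins can be demonstrated with a small in-memory sketch. Everything here is a stand-in: `Tx` is not `ManifestTransaction`, the manifest is a shared `String` rather than a file, and the stage value is passed as an argument instead of being read from `LPM_TEST_PANIC_AT`, so the example stays deterministic. It assumes the default `panic = "unwind"` profile.

```rust
use std::cell::RefCell;
use std::panic::{self, AssertUnwindSafe};
use std::rc::Rc;

/// In-memory stand-in for a manifest file on disk.
type Manifest = Rc<RefCell<String>>;

/// Hypothetical transaction guard: restores the snapshot on drop unless
/// `commit` was called first.
struct Tx {
    target: Manifest,
    snapshot: String,
    committed: bool,
}

impl Tx {
    fn begin(target: &Manifest) -> Self {
        Tx { target: Rc::clone(target), snapshot: target.borrow().clone(), committed: false }
    }
    fn commit(mut self) {
        self.committed = true; // `self` is dropped here; Drop sees the flag.
    }
}

impl Drop for Tx {
    fn drop(&mut self) {
        if !self.committed {
            // Runs on early return AND on panic unwind.
            *self.target.borrow_mut() = self.snapshot.clone();
        }
    }
}

/// Pure decision for the injection hook; the real hook reads an env var.
fn should_panic_at(configured: Option<&str>, stage: &str, hook_enabled: bool) -> bool {
    hook_enabled && configured == Some(stage)
}

/// Stages a placeholder dep, optionally panics, then commits.
fn add_dependency(manifest: &Manifest, panic_at: Option<&str>) {
    let tx = Tx::begin(manifest);
    manifest.borrow_mut().push_str(" +pkg@*");
    if should_panic_at(panic_at, "after-stage", true) {
        panic!("LPM_TEST_PANIC_AT=after-stage");
    }
    tx.commit();
}

/// Runs `add_dependency` and reports whether it completed without panicking.
fn add_dependency_catching(manifest: &Manifest, panic_at: Option<&str>) -> bool {
    panic::catch_unwind(AssertUnwindSafe(|| add_dependency(manifest, panic_at))).is_ok()
}
```

Under `panic = "abort"` the `Drop` impl would never run, which is one of the regressions the commit message says the real test is meant to catch.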
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(workflows): align MockRegistry tarball URL shape with production /-/ gate

Workflow tests mounted tarballs at `/tarballs/{name}-{version}.tgz`, missing the `/-/` path segment that the registry-client's `evaluate_cached_url` gate at [crates/lpm-registry/src/client.rs#L4117] requires (`.tgz` suffix AND `/-/` substring). The gate is a defense-in-depth check that blocks the H1 auth-token leak: a tampered lockfile URL like `/api/admin/foo.tgz` (no `/-/`) would otherwise attach the bearer to a non-registry endpoint.

The mismatch produced two test-environment side effects that don't manifest in production:

1. **WARN noise**: every install test that read a tarball URL from the lockfile fast path logged `cached tarball URL for X@Y failed shape check; falling back to on-demand lookup`. Polluted stderr across the suite.
2. **`shape_mismatch_count` defeated**: the registry-client documents this counter as a "BUG signal — the writer should never emit a gate-rejectable URL". Test runs incremented it on every install, making the counter useless for catching real bugs.

This commit migrates the mock to the production-shape `/tarballs/{name}/-/{name}-{version}.tgz` everywhere: both the helper methods (`MockRegistry::tarball_path` / `tarball_url`) and the ~60 hard-coded `format!` sites across 14 test files + 1 snapshot. The new `tarball_path` helper is `pub` with a prominent docstring warning future test authors not to re-introduce the legacy shape. Internal mounts in `with_package_and_deps` / `with_package_published_at` / `with_full_package_metadata` all route through it.

Post-fix verification: WARN gone, gate `Accepted` path runs, all 691 lpm-workflows tests pass (0 leaky in the latest full-workspace run, down from 1-3 leaky pre-fix; fewer fallback paths firing).
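The shape check described above reduces to two string predicates. The sketch below is an approximation under stated assumptions: `passes_shape_check` is a hypothetical name standing in for the relevant part of `evaluate_cached_url`, and the free functions `tarball_path` / `legacy_tarball_path` only mirror the path templates quoted in the commit message, not the actual `MockRegistry` methods.

```rust
/// Hypothetical reduction of the gate: a cached tarball URL is accepted only
/// if it ends in `.tgz` AND contains the `/-/` path segment.
fn passes_shape_check(url: &str) -> bool {
    url.ends_with(".tgz") && url.contains("/-/")
}

/// Production-shape mock path: `/tarballs/{name}/-/{name}-{version}.tgz`.
fn tarball_path(name: &str, version: &str) -> String {
    format!("/tarballs/{0}/-/{0}-{1}.tgz", name, version)
}

/// Legacy mock path that the gate rejects: `/tarballs/{name}-{version}.tgz`.
fn legacy_tarball_path(name: &str, version: &str) -> String {
    format!("/tarballs/{}-{}.tgz", name, version)
}
```

Under this sketch the legacy path and a tampered URL such as `/api/admin/foo.tgz` both fail the check, while the migrated path passes, which is why the WARN line and the `shape_mismatch_count` increments disappear after the migration.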
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(workflows): test-coverage-followup tranche — Items 2/3/4/5

Closes the remaining open rows from `private/test-coverage-followup-plan.md` across four items. ~2,600 LOC of new test code + fixture + budget infra.

**Item 3 — tarball-security additional candidate surfaces (7 tests in `tarball_security.rs`):**
- `tarball_with_pax_path_traversal_rejected`: PAX extended `path` header smuggling `..` is rejected by the extractor's `Component::ParentDir` check after the tar crate resolves the override.
- `tarball_with_gnu_longname_traversal_rejected`: symmetric GNU `L` entry; same rejection path.
- `tarball_rejects_or_rolls_back_when_later_entry_is_malicious`: pins the `rollback_extraction` contract: valid first entry is cleaned up when a later `..`-traversal entry trips rejection mid-stream.
- `tarball_with_duplicate_member_path_rejected_or_deterministic`: pins current last-write-wins contract (defensible; flagged scanner-disagreement risk in test comment).
- `tarball_with_truncated_gzip_rolls_back_partial_extract`: half-truncated gzip stream → libdeflate fails cleanly → no partial extract.
- `tarball_ignores_uid_gid_ownership_metadata` (POSIX): bogus uid/gid in tar header is ignored; extracted files owned by process uid.
- `tarball_with_sparse_huge_file_rejected_by_declared_size`: manually-constructed tarball with header declaring `MAX_FILE_SIZE + 1` and empty on-wire body; extractor rejects on the pre-check at lib.rs:306 before draining body.

**Item 4 — cross-command flows additional candidate surfaces (2 tests in `cross_command_flows.rs`):**
- `flow_install_uninstall_install_graph_round_trip`: pins manifest / link / graph hand-off through a full round-trip.
- `flow_cache_clean_then_offline_install_uses_store_or_fails_helpfully`: pins the cache/store boundary: `cache clean` must not corrupt offline install; store-side bytes byte-identical after a clean.
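The `Component::ParentDir` rejection that the Item 3 traversal tests pin can be illustrated with the standard library alone. This is a sketch, not the extractor's code: `contains_parent_dir` is a hypothetical helper name, and the real check runs on the path after the tar crate has resolved PAX and GNU long-name overrides.

```rust
use std::path::{Component, Path};

/// Hypothetical helper: true when any component of the entry path is `..`,
/// which is the condition the extractor is described as rejecting.
fn contains_parent_dir(entry_path: &Path) -> bool {
    entry_path
        .components()
        .any(|component| matches!(component, Component::ParentDir))
}
```

Matching on parsed components rather than searching for the substring `..` means a legitimate name such as `package/..hidden/file` is not flagged, because `..hidden` parses as a single `Component::Normal`.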
**Item 2 — concurrency/recovery additional candidate surfaces (3 tests in `install_concurrency.rs`):**
- `cache_clean_during_slow_tarball_install_does_not_corrupt_install` (G.4): install + cache clean run concurrently (different lock paths, no serialization); install succeeds despite metadata cache wipe mid-stream. Empirical timing observed: install elapsed 1.57s, cache clean fired at t=30-39ms cleanly inside the install window.
- `install_panics_after_install_hash_write_rollback_invalidates_hash` (G.5): reuses existing `LPM_TEST_PANIC_AT=after-install` stage (no new source-side hook needed, since `write_post_install_v6_hash` runs inside `run_with_options` which returns BEFORE that stage fires). Pins that Drop-based rollback restores manifest AND deletes the freshly-written install-hash.
- `malformed_registry_json_fails_without_manifest_or_lockfile_mutation` (G.6): truncated JSON on all three metadata endpoints; install fails cleanly, no panic/backtrace, package.json byte-identical, no torn lockfile.

**Verdaccio-npm parity for `which@4.0.0` (`install_real_registry.rs`):**
- `verdaccio_npm_parity_for_bin_package_pins_metadata_and_shim_presence`: extends the existing lodash byte-diff with a bin-shipping target package. Asserts metadata equivalence + `.bin/<name>` shim present on both sides + bin target file materialized + exec bits non-zero (POSIX).

**Item 5 — realworld fidelity (new fixture + new test file):**
- `tests/fixtures/realworld-nextjs/` (package.json + README): pinned Next.js 14.2.13 + React 18.3.1 + TypeScript 5.6.3 + 3 `@types/*` packages. Resolves to ~28 transitive deps empirically. README documents the calibration methodology including raw measurement data.
- `tests/workflows/tests/install_realworld.rs`: `install_realworld_nextjs_fixture_succeeds_through_verdaccio` installs the fixture through Verdaccio→npmjs and asserts end-to-end success at production scale. Always logs cold + warm wall-clock + peak RSS to stderr for calibration data.
- **`LPM_BUDGET_GATE=1`-gated budget assertions**: cold ≤ 25s, warm ≤ 25ms, cold peak RSS ≤ 1500 MiB. Calibrated from N=6 cold + N=3 warm + N=3 RSS runs on M-series macOS, 2026-05-14. Memory measurement via `/usr/bin/time -l` (macOS) / `-v` (Linux); Windows skips with a clear warning.

This closes Item 5 entirely (all 4 acceptance criteria green) and brings Items 2/3/4 to the parked-by-design or infrastructure-blocked baseline.

CI gate: clippy `--workspace --all-targets -- -D warnings` clean, fmt clean, fancy-regex empty, build clean, `cargo nextest run --workspace` 6471/6471 pass. Suite runtime ~2:40 (was ~2:24 pre-tranche; +15s for the realworld test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(workflows): collapse Linux-only let-chain in parse_peak_rss

CI lint on Linux failed on `clippy::collapsible_if` in the Linux-cfg'd branch of `parse_peak_rss`. The macOS branch had an intermediate `let bytes_str = rest.trim();` between the two `if let`s, which is why the local clippy run on macOS didn't catch this: only the macOS-cfg branch compiled there. Collapse the Linux branch to use `&&` (stable let-chains) so it satisfies the lint while preserving the same semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
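For orientation, peak-RSS extraction from the two `time` output formats mentioned above could look roughly like the sketch below. It is not the repository's `parse_peak_rss`: the function name, the decision to try both formats instead of using `cfg` branches, and the byte-normalized return value are all assumptions. It relies on BSD `time -l` printing the value in bytes before the label and GNU `time -v` printing kilobytes after it.

```rust
/// Hypothetical parser: returns peak RSS in bytes from `/usr/bin/time` output.
/// macOS (`-l`) line:  "  123456789  maximum resident set size"   (bytes)
/// Linux (`-v`) line:  "Maximum resident set size (kbytes): 120563"
fn parse_peak_rss_bytes(time_output: &str) -> Option<u64> {
    for line in time_output.lines() {
        let line = line.trim();
        // GNU time: label first, value in kilobytes.
        if let Some(rest) = line.strip_prefix("Maximum resident set size (kbytes):") {
            return rest.trim().parse::<u64>().ok().map(|kib| kib * 1024);
        }
        // BSD time: value in bytes first, label last.
        if let Some(value) = line.strip_suffix("maximum resident set size") {
            return value.trim().parse::<u64>().ok();
        }
    }
    None
}
```

Handling both formats in one function keeps every branch compiled on every platform, which sidesteps the situation described in the last commit where a lint only fired for the branch that the local OS did not compile.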

tolgaergin and others added 2 commits April 25, 2026 22:42

tolgaergin merged commit e35890b into main Apr 25, 2026
3 checks passed

tolgaergin merged 2 commits into main from phase-51-investigation

tolgaergin commented Apr 25, 2026


