Skip to content

fix(registry): NDJSON parse loop — O(n²) scan + wall-clock timeout#3

Merged
tolgaergin merged 2 commits into
mainfrom
phase42-ndjson-batch-prefetch
Apr 17, 2026
Merged

fix(registry): NDJSON parse loop — O(n²) scan + wall-clock timeout#3
tolgaergin merged 2 commits into
mainfrom
phase42-ndjson-batch-prefetch

Conversation

@tolgaergin
Copy link
Copy Markdown
Contributor

@tolgaergin tolgaergin commented Apr 17, 2026

TL;DR

Two root-cause fixes to the batch-metadata NDJSON parse loop. Decision-gate cold install: 46 s → 6.7 s (7× faster).

Commit 1 — wall-clock timeout (20e757f). reqwest::Client::builder().timeout(30s) fires on body reads too. 66 MB streams exceed 30 s. Replaced with .connect_timeout(10s) + .read_timeout(30s). Surfaced the real bottleneck (below) by letting the stream actually complete instead of crashing with operation timed out <- request or response body error <- error decoding response body.

Commit 2 — quadratic \n scan (621cf84). Every chunk appended to buffer re-scanned from offset 0 for the next newline. Average NDJSON line ~200 KB arriving over ~30 chunks → ~4 MB scanned per line × 365 lines ≈ 1.5 GB of re-scans at ~74 MB/s ≈ 21 s. Added a scan_from cursor. Every byte now scanned at most once → ~66 ms for 66 MB. 300× speedup on the scan alone.

Sub-timer narrowing

After the timeout fix the stream completed but the same initial_batch_ms ≈ 40 s persisted vs raw reqwest at ~7 s. Added 10 finer-grained phase timers to the parse loop and reran cold:

total=22146ms | chunk_wait=351ms extend=18ms scan=21088ms utf8=3ms
              | parse=253ms cache_write=306ms clone=116ms send=1ms
              | map_insert=0ms drain=0ms | chunks=11023 bytes=76492350 entries=365

scan at 21 s was 95 % of the loop. Every other CPU phase summed to ~700 ms, ruling out the usual suspects (channel backpressure, sync cache writes, JSON parse, memcpy).

Popular theories this falsified

From an independent second-opinion review (Gemini, no code access) that diagnosed the symptom as a "4.5× ingestion tax":

Theory Verdict
mpsc::channel(32) too small, backpressure traps the reader False — channel is 512, verified at install.rs:3203
Synchronous fs::write per line adds ~5 ms × 325 = 1.6 s False — total cache_write measured at 306 ms, including serialize + HMAC + syscall
Speculation tarball fetches competing for bandwidth False — identical initial_batch_ms with LPM_SPEC_FETCH=0
reqwest buffer allocation / TLS decode overhead False — raw reqwest bench (same endpoint, same client config, zero processing) consumes the body in ~7 s cold / ~2 s warm at 8-33 MB/s
Server is the bottleneck (streaming slowly) False — curl pulls 66 MB in 7 s over H2 = 9.4 MB/s
Needs Rayon / spawn_blocking / bigger read buffers False — the loop is CPU-trivial once scan is fixed

What was real:

Theory Verdict
"Line-by-line tax" (right symptom, wrong cause — sync I/O guess) Partially correct — the tax WAS per-line, but it was O(n²) .position(b'\n'), not blocking I/O
"Rust client is 5× slower than curl at ingesting the same bytes" Correct symptom diagnosis — 1.7 MB/s vs 9.4 MB/s measured

Post-fix measurements (cold, lpm.dev)

fixture pre-PR total_ms post-PR total_ms delta
51-pkg (Phase 39 guard) ~573 516–671 same (trees too small to exhibit the quadratic)
280-pkg (Phase 40 scaling) ~6 500 3 227–3 645 −50 %
decision-gate (Phase 40 P4) 44 054 6 704–7 136 warm, 15 006 cold −85 % warm, −66 % cold-cold

Decision-gate breakdown, median of warm runs:

metric pre-PR post-PR delta
initial_batch_ms 39 977 2 942 −93 %
followup_rpc_ms 3 732 1 639 −56 %
followup_rpc_count 45 45 same
fetch_ms 1 244 1 221 same
resolve_ms 44 462 5 383 −88 %

Bun's reported ~3.6 s on this fixture is now ~2× away, down from ~12×.

Tests

Two regression tests from the timeout commit still cover the read_timeout behavior (batch_metadata_deep_tolerates_slow_streaming_body_under_read_timeout + batch_metadata_deep_fails_under_old_wallclock_timeout). The scan fix doesn't need a dedicated test — its correctness falls out of the existing streaming tests which consume multi-chunk bodies, and its performance impact is an O(n²) → O(n) cleanup that shows up in benchmarks rather than correctness assertions. Both tests pass in 0.8 s.

Metadata-bloat sensitivity follow-up (the side question)

Investigated during diagnosis; still applies and still worth doing as a separate PR. lpm-resolver::ranges::to_pubgrub_ranges(&available_versions) is O(N) in version count, uncached, and PubGrub calls it O(queries) times per package during backtracking. The +962 ms pubgrub_core_ms regression Phase 41 measured when 9 extra packages were pre-fetched matches the K×N×queries arithmetic. Memoizing (pkg, range) → Ranges is a bounded 1–2 day PR. File separately.

CI gate locally green

  • cargo clippy --workspace -- -D warnings
  • cargo fmt --check
  • fancy-regex ban
  • cargo build --workspace
  • cargo nextest run --workspace --exclude lpm-integration-tests --no-fail-fast3641 passed, 7 skipped, 0 failed
  • cargo test -p lpm-auth ×2 — 43 passed both (parallel-deterministic)
  • 3 cold installs on each of 51-pkg, 280-pkg, decision-gate against lpm.dev — zero WARN

🤖 Generated with Claude Code

tolgaergin and others added 2 commits April 17, 2026 15:45
`RegistryClient::new` configured `reqwest::ClientBuilder::timeout(30s)` —
a wall-clock cap covering the entire request + response cycle, body
read included. On the decision-gate fixture (54 direct deps, ~66 MB
deep NDJSON response) the server legitimately takes 30+ seconds to
stream the body, so the timer fired mid-body at ~51 MB / 7500 chunks
and every cold install above ~40 roots logged `WARN batch prefetch
failed, falling back to sequential resolution (slower): registry
error: NDJSON read error ... error decoding response body <- request
or response body error <- operation timed out`.

Root-cause diagnosis: expanded `reqwest::Error` Display to walk the
full `source()` chain so the kind-only top-level string stops hiding
the hyper-level cause. The chain surfaced `operation timed out` as
the real reason. Cross-checking CPU vs wall-clock timing on the same
fixture showed parse (85 ms) + cache_write (88 ms) = 173 ms total
CPU, i.e. the remaining 29.8 s was genuinely network wait. Server
is legitimately slow for this response size; the old wall-clock cap
is the wrong tool.

Fix: swap `.timeout()` for `.connect_timeout(10s) + .read_timeout(30s)`.
`read_timeout` is a per-read idle timer that resets on each
successful chunk, so a slow-but-progressing stream completes intact;
a genuinely stalled server still gets interrupted at 30 s without a
read.

Behavioral regression tests (new, in `lpm-registry` client tests):
  - `batch_metadata_deep_tolerates_slow_streaming_body_under_read_timeout`
    — drives a custom TCP-level streaming mock server that sends
    4 NDJSON chunks at 200 ms apart (800 ms total) through a client
    configured with `read_timeout=500ms`. Pre-fix wall-clock would
    kill at 500 ms; post-fix the request completes with 4 entries.
  - `batch_metadata_deep_fails_under_old_wallclock_timeout` — same
    server, this test instantiates the OLD-style client via a raw
    `reqwest::Client::builder().timeout(500ms)` and asserts the
    request aborts with a timeout-sourced `Registry` error. Pinned
    as a forward-regression guard: anyone re-introducing `.timeout()`
    on the prod builder trips this test.

Post-fix prod measurements (lpm.dev, 3 cold runs each, median):

  Fixture         | pre-fix total_ms | post-fix total_ms | WARN? | followup_rpc_count
  ----------------+------------------+-------------------+-------+---------------------
  51-pkg (Phase 39)|      ~573        |       542         | none  |         1
  280-pkg (Phase 40)|     ~6 500      |      6 759        | none  |        20
  decision-gate   |     44 054       |     46 454         | none  |        45  (was 73)

On decision-gate specifically, total wall-clock grew ~2.4 s because
pre-fix was silently short-circuiting to sequential resolution at
the 30 s mark (fetch was then slower from individual lookups:
3 269 ms pre-fix vs 1 244 ms post-fix — a 2.0 s fetch win). Net is
roughly flat, but with the Phase 38 P3 speculation dispatcher now
actually engaged — 0 spec tasks dispatched pre-fix vs 345 / 345
post-fix (288 transitive, max_depth_reached 5). Previously, every
cold install of a fixture above ~40 roots quietly disabled the
speculation path the Phase 38 code was designed to run. The
`followup_rpc_count` drop (73 → 45, −38 %) reflects the metadata
cache now being populated by the full deep batch rather than
piece-by-piece.

Diagnostic instrumentation added (kept for future incidents):
reqwest error source-chain walk + per-chunk byte/chunk counters in
the warn string. Cheap (runs only on error), load-bearing for
future transport failures.

Metadata-bloat sensitivity finding (investigated during diagnosis,
noted here for future optimization — out of scope for this PR):

`lpm-resolver::ranges::to_pubgrub_ranges(&available_versions)` runs
on every PubGrub `get_dependencies` call and is O(N) in the number
of versions for that package. Not memoized. Adding metadata for K
new packages × avg 50 versions each × ~3 PubGrub backtrack queries
per package matches the +962 ms pubgrub_core_ms regression measured
during Phase 41's A/B (see phase41 postmortem). Memoizing
`(pkg, range) → Ranges` would cut that factor. Track as follow-up.

CI gate locally green:
  - cargo clippy --workspace -- -D warnings
  - cargo fmt --check
  - fancy-regex ban
  - cargo build --workspace
  - cargo nextest run --workspace --exclude lpm-integration-tests
    (3641 passed, 7 skipped, 0 failed)
  - cargo test -p lpm-auth x2 (43 passed both runs, parallel-deterministic)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to the read_timeout fix in this same PR. With the timeout
bug out of the way, the decision-gate cold install STILL showed a
~5× gap between the raw reqwest body drain (~7 s for 66 MB at
9.5 MB/s) and our parse loop (~40 s at 1.7 MB/s). This commit
closes that gap.

## Narrowed with sub-timers

Added temporary per-phase timers around the NDJSON parse loop and
reran the decision-gate cold install. On a 22-second run that
consumed 76 MB / 11 000 chunks / 365 entries, the breakdown was:

  phase          ms
  -----------    ----
  chunk_wait     351  (reqwest)
  extend         18
  scan           21088  ← 95% of the loop
  utf8           3
  parse          253
  cache_write    306
  clone          116
  send           1
  map_insert     0
  drain          0
  total          22146

Every CPU path OTHER than scan summed to ~700 ms. The scan alone
burned 21 s. This empirically falsifies several popular theories about
where the tax lived (channel backpressure — channel is size 512, not
32; sync cache writes — only 306 ms; speculation bandwidth
competition — same 40 s with `LPM_SPEC_FETCH=0`).

## Root cause

The inner `while let Some(newline_pos) = buffer.iter().position(|&b|
b == b'\n')` restarted from offset 0 on every iteration. With an
average NDJSON line of ~200 KB arriving over ~30 chunks of ~9 KB, we
re-scanned the same bytes every time a new chunk landed:

  chunk 1: scan 9 KB → no newline
  chunk 2: scan 18 KB (9 KB re-scanned)
  chunk 3: scan 27 KB (18 KB re-scanned)
  ...
  chunk 30: scan 270 KB → newline found

Per-line cost: sum(i × 9 KB for i in 1..30) ≈ 4 MB scanned.
Across 365 lines: ~1.5 GB scanned.
At `[u8]::iter().position()`'s ~74 MB/s throughput (bounds-checked,
non-SIMD): ~21 s. Matches measurement exactly.

## Fix

Track a `scan_from: usize` cursor marking the first byte we haven't
yet inspected for a newline. After each chunk append, resume scanning
from `scan_from` instead of 0. When a line is drained, reset to 0
because the remaining bytes shifted. Net: every byte scanned at most
once → O(total_bytes) ≈ 66 ms for 66 MB.

## Post-fix measurements (lpm.dev, cold installs)

  fixture        | pre-scan-fix total_ms | post-scan-fix total_ms | delta
  ---------------|----------------------:|-----------------------:|--------
  51-pkg         |          542          |         516-671        | ~same
  280-pkg        |         6759          |         3227-3645      | −49%
  decision-gate  |        46454          |         6704-7136 warm |
                 |                       |        15006 cold run  | −85% warm
                 |                       |                        | −68% cold

Sub-breakdown on decision-gate (warm cache, 2 of 3 runs shown):

  metric              | pre-fix | post-fix (run 2) | delta
  --------------------|--------:|-----------------:|--------
  initial_batch_ms    |  39 977 |            2 942 | −93%
  followup_rpc_ms     |   3 732 |            1 639 | −56%
  followup_rpc_count  |      45 |               45 | same
  fetch_ms            |   1 244 |            1 221 | same
  resolve_ms          |  44 462 |            5 383 | −88%

initial_batch_ms now within ~1.5× of the raw reqwest drain (which
tops out at ~30 MB/s warm → ~2 s for this body). followup_rpc_ms
dropped too because those follow-up RPCs run the same parse loop
and were paying the same quadratic tax.

Bun's reported ~3.6 s on this fixture is now 2× away, down from 12×.

## Tests

Both streaming regression tests from the earlier timeout fix still
pass. The `iter().position` scan quadratic was present in pre-scan-fix
runs of the timeout-tolerance test too, but the test uses only 4
chunks × 200 ms = 800 ms, where the quadratic cost is still sub-ms
and doesn't change test outcomes. The post-fix test completes ~20 ms
faster but nothing load-bearing shifts.

## On Gemini's speculation (2026-04-17)

An independent second-opinion review attributed the 5× gap to
`mpsc::channel(32)` backpressure, synchronous line-by-line I/O, and
reqwest buffer allocation patterns. Checking each: channel was
actually 512, not 32; cache-write + parse totalled 560 ms not
1.6 s; raw reqwest benched at 30 MB/s on warm runs (no allocation
problem). The actual cause was a classic O(n²) algorithm bug in our
own code that looked like a transport bottleneck from the outside.
Noted in `DOCS/new-features/37-rust-client-RUNNER-VISION-phase41.md`
follow-ups; Phase 42's "ingestion throughput" work item is now
closed in a single commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tolgaergin tolgaergin changed the title fix(registry): replace wall-clock request timeout with read_timeout fix(registry): NDJSON parse loop — O(n²) scan + wall-clock timeout Apr 17, 2026
@tolgaergin tolgaergin merged commit e7a30f4 into main Apr 17, 2026
3 checks passed
@tolgaergin tolgaergin deleted the phase42-ndjson-batch-prefetch branch April 17, 2026 17:20
tolgaergin added a commit that referenced this pull request Apr 17, 2026
Phase 42 — resolver correctness + perf fix + algorithmic insurance.

Since v0.19.1:

- **fix(registry)** — Phase 42 P0/P1 NDJSON parse loop — O(n²) scan
  + wall-clock timeout (#3). Decision-gate install 46 s → 6.7 s
  (7× speedup) when the registry streams large batch-metadata
  responses. The O(n²) scan was the dominant cost in NDJSON parsing
  for large packuments; a rolling-offset rewrite brings it to O(n).
  Wall-clock timeout defends against slow-emit registries stalling
  the resolver.

- **fix(install)** — Phase 40 P4 split-context dedupe (#2). When
  two sibling parents produced the same grandchild under different
  split contexts, the grandchild was duplicated in the fetch/link
  plan. Dedupe on canonical `(name, version)` before fetch
  dispatch. No user-visible lockfile change.

- **perf(resolver)** — Phase 42 P2 `NpmRange → pubgrub::Ranges`
  memoization (#4). Null-result in benchmarks but shipped as
  algorithmic insurance: the conversion is O(m) per call and was
  re-computed on every PubGrub visit. Memoized table eliminates
  redundant work. Neutral on current fixtures, protective against
  pathological cases.

- **chore(ci)** — Node.js 24 opt-in for JavaScript actions ahead
  of GitHub's 2026-06-02 forced-default.

No breaking changes. Lockfile compatibility unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request Apr 18, 2026
Three tightenings after the 2026-04-18 GPT audit of P43-1:

## Finding #1 (Medium) — stale v1 `lpm.lockb` persists across reads

`read_fast` falls back to TOML when binary open returns
`UnsupportedVersion`, but previously left the v1 file on disk.
`read_fast` is called from `lpm install`, `lpm upgrade`, AND
`lpm outdated`. Read-only commands never trigger a write, so an
upgraded user would pay the open-reject + TOML-parse cost on every
`lpm outdated` invocation forever — a real perf regression when
shipping P43-1 standalone before P43-2's install writeback lands.

Fix: best-effort delete the stale binary when open returns
`UnsupportedVersion`. Deletion is scoped to the version mismatch
only; other errors (corrupt magic, structural issues) leave the
file on disk for forensic inspection. Delete failures (read-only
FS, permission denied) are swallowed — correctness still holds via
the TOML fallback.

Test `phase43_read_fast_falls_back_to_toml_when_binary_is_v1`
flipped to assert deletion. New test
`phase43_read_fast_preserves_binary_on_non_version_errors` guards
against aggressive-deletion regression.

## Finding #2 (Low) — corrupt tarball pair surfaces `Some("")`

`BinaryLockfileReader::open` validated the deps-table layout but
not the new tarball pair. Corrupt `(tarball_off, tarball_len)`
bytes flowed through `read_str` (which degrades out-of-bounds
reads to `""`), so `tarball()` returned `Some("")` — silent
corruption that P43-2 would later feed into the shape gate.

Fix: extend the per-entry validation loop in `open` to bounds-
check the tarball pair against the string table length. `(0, 0)`
is the null sentinel and bypasses the check; any other pair must
fit within `[string_table_off, EOF)`. Corruption now forces TOML
fallback via `read_fast` — matching how v1 handled source/integrity
range issues via the deps-validation mechanism.

New test `phase43_open_rejects_corrupt_tarball_pair` exercises the
`u32::MAX` offset case.

## Finding #3 (Low) — empty-string rejection asymmetric

The binary writer rejected empty `source` / `integrity` / `tarball`
at serialization time (via `insert_optional`), but `from_toml` was
pure serde — it accepted `tarball = ""` cleanly and only failed
later when `write_all` tried to emit the binary. P43-1's commit
message claimed "consistent rejection" which was wrong.

Fix: add a validation pass in `from_toml` that walks all packages
and rejects empty `source` / `integrity` / `tarball`. Matches the
binary writer's rejection at the parse boundary, preventing
asymmetric late failures.

New test `phase43_from_toml_rejects_empty_optional_strings`
covers all three fields parametrically.

## Results

74 lockfile tests pass (+3 from +2 existing tests). Workspace
nextest: 3659 pass (+3 since P43-1 initial commit).

CI gate (CARGO_TARGET_DIR=/tmp/lpm-phase43-target):
- cargo clippy --workspace -- -D warnings       ✓
- cargo fmt --check                              ✓
- grep -r 'fancy-regex' crates/*/Cargo.toml     ✓ (no matches)
- cargo build --workspace                        ✓
- cargo nextest run --workspace --exclude lpm-integration-tests
    3659 tests pass, 7 skipped                   ✓
- cargo test -p lpm-auth (3x) 43 tests pass each ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request Apr 18, 2026
…metry completeness

Addresses three GPT-audit findings on P43-2 commit 1, turning the
gate-accepted URL path into a safe standalone rollout. Commit 1's
claim of "clean standalone rollout point" was wrong — a stale
stored URL that passed the scheme/shape/origin gate would
immediately hard-fail the install (both legacy line 3711 and
streaming line 3814 routed `NotFound` straight to
`handle_tarball_not_found`). This commit is **safety**, not
cleanup — P43-2 commit 1 should not ship without it.

## Finding #1 (Medium) — stale URL = first-run hard failure

Pre-Phase-43, the lockfile didn't store URLs, so every fetch did
its own metadata round-trip first — stale upstream paths were
refreshed transparently. Post-commit-1, a stored URL that the
gate accepts is reused as-is; if upstream republished or migrated
the tarball path, the download 404s and lockfiles get nuked.

Fix: both fetch paths gain a **same-run retry** on stored-URL
404s. On `LpmError::NotFound` where `p.tarball_url.is_some()`:

1. Invalidate the metadata cache for this package.
2. Re-resolve via `resolve_tarball_url(..., cached_url=None)` to
   force a real metadata round-trip.
3. Guard against loop: if fresh URL == stale URL, metadata itself
   is stuck → fall through to `handle_tarball_not_found`.
4. Retry the download ONCE with the fresh URL.
5. On success: bump `stale_recovery` counter, carry on.
6. On second 404: bump `stale_hard_fail` counter, fall through
   to `handle_tarball_not_found`.

On-demand path 404s (no stored URL) skip retry — there's nothing
stale to refresh.

## Finding #2 (Medium) — `handle_tarball_not_found` is CWD-relative

Pre-fix `Path::new("lpm.lock")` / `Path::new("lpm.lockb")` —
deletes relative to process CWD, not the project root. A
programmatic install from a nested directory would leak stale
lockfile state (the `lpm.lock` at project root stays, the retry
repeats indefinitely). Now takes `project_dir: &Path` and uses
`project_dir.join(LOCKFILE_NAME)` / `.join(BINARY_LOCKFILE_NAME)`.

`project_dir` is threaded through `fetch_and_store_legacy` and
`fetch_and_store_streaming`; captured from the existing
`project_dir_buf` in the task dispatch closure (line 1663).

## Finding #3 (Low) — RejectedScheme had no counter

`try_lockfile_fast_path`'s `RejectedScheme` branch previously
only logged. Now bumps `gate_stats.scheme_mismatch` so corrupt-
lockfile signals are observable in telemetry (symmetric with
shape/origin — all three rejection types now have counters).

## GateStats expanded

Adds three AtomicU64 counters: `scheme_mismatch`,
`stale_recovery`, `stale_hard_fail`. All surfaced in the JSON
`timing.fetch_breakdown.tarball_url_gate` object.

## Legacy fetch path restructured

`fetch_and_store_legacy` previously delegated to
`fetch_tarball_to_file` (URL resolve + download composed). For
the retry path to distinguish a metadata 404 from a download
404, the two steps are now inline; `fetch_tarball_to_file` is
removed as orphaned. Behaviorally equivalent on the happy path.

## What's NOT in this commit (lands in P43-2 commit 3/3)

- Generalized writeback trigger (C1 + C3 + C4 from the audit
  passes) — with retry in place, convergence is a perf concern
  not a correctness one: every install pays one extra metadata
  round-trip for each stale package until the lockfile is
  refreshed by some other trigger (add/remove dep).
- Regression tests (9 cases from the design doc).

## Results

CI gate (CARGO_TARGET_DIR=/tmp/lpm-phase43-target):
- cargo clippy --workspace -- -D warnings       ✓
- cargo fmt --check                              ✓
- grep -r 'fancy-regex' crates/*/Cargo.toml     ✓ (no matches)
- cargo build --workspace                        ✓
- cargo nextest run --workspace --exclude lpm-integration-tests
    3669 tests pass, 7 skipped                   ✓
- cargo test -p lpm-auth (3x) 43 tests pass each ✓

Design doc: DOCS/new-features/37-rust-client-RUNNER-VISION-phase43.md (§P43-2 Changes 2 + 4)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request Apr 18, 2026
* feat(lockfile): Phase 43 P43-0 — add `tarball` field to LockedPackage

Adds `pub tarball: Option<String>` to `LockedPackage` with
`#[serde(default, skip_serializing_if = "Option::is_none")]`.
`LOCKFILE_VERSION` stays at 1 — the change is additive-only at the
TOML layer. Old lockfiles without the field parse as `None`; new
lockfiles with all-None `tarball` serialize byte-identically to
pre-Phase-43 lockfiles.

The TOML writer now threads `InstallPackage.tarball_url` →
`LockedPackage.tarball` at both call sites in `install.rs`
(lockfile-fast-path writer and fresh-resolve writer).

Touches 96 `LockedPackage { ... }` construction sites across 14
files. Every site gets an explicit `tarball: None,` for diff
pattern-matchability rather than `..Default::default()`.

Also fixes three pre-existing breakage sites in integration tests
that were never reached by CI (`--exclude lpm-integration-tests`):
missing `alias_dependencies` (from Phase 40 P2) plus the new
`tarball` field in `tests/integration/tests/{core,output_parity}.rs`,
and missing `aliases` + `root_link_names` on `LinkTarget` in
`core.rs`. Noticed-while-in-file; keeps the integration-test
crate compilable.

New tests (5 in `lpm-lockfile`):
- `phase43_tarball_roundtrips_when_present`
- `phase43_tarball_absent_keeps_old_lockfiles_byte_identical`
- `phase43_tarball_mixed_population_roundtrips`
- `phase43_old_lockfile_without_tarball_field_parses`
- (plus integration crate sites compile)

P43-1 (binary v2 layout) and P43-2 (fast-path gate + stale-URL
retry + generalized writeback) land in follow-up commits.

CI gate (CARGO_TARGET_DIR=/tmp/lpm-phase43-target):
- cargo clippy --workspace -- -D warnings       ✓
- cargo fmt --check                              ✓
- grep -r 'fancy-regex' crates/*/Cargo.toml     ✓ (no matches)
- cargo build --workspace                        ✓
- cargo nextest run --workspace --exclude lpm-integration-tests
    3647 tests pass, 7 skipped                   ✓
- cargo test -p lpm-auth (3x)  43 tests pass each ✓

Design doc: DOCS/new-features/37-rust-client-RUNNER-VISION-phase43.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(lockfile): Phase 43 P43-1 — binary v2 layout with tarball URL slot

Bumps `BINARY_VERSION` 1 → 2. Appends a `(tarball_off: u32,
tarball_len: u16)` pair to every package entry, growing
`ENTRY_SIZE` 30 → 36 bytes. The `(0, 0)` null sentinel distinguishes
`None` from a real URL — same pattern already used for
`source` / `integrity`.

## Strict version check (v2 reader rejects v1 strictly)

Changed from `version == 0 || version > BINARY_VERSION` to
`version != BINARY_VERSION`. The per-entry layout differs between
versions (v1 = 30B, v2 = 36B) — a v2 reader decoding v1 entries
as 36-byte entries would read package N's `name_off`/`name_len`
as package N-1's (nonexistent) tarball pair and produce garbage.
`read_fast` catches the error and falls through to TOML; P43-2
will add a dedicated writeback trigger so fast-path installs also
complete the v1→v2 binary migration.

## Empty-string rejection (M2 from 3rd-pass audit)

`StringTable::insert("")` on the first insert would produce
exactly `(off=0, len=0)` — indistinguishable from the null
sentinel. The new `insert_optional` helper rejects empty strings
outright for all three optional fields (`source`, `integrity`,
`tarball`). Empty tarball URLs / integrity hashes / sources are
nonsensical input; failing loud is correct.

## Tests added (9 new, total 71 pass)

- `phase43_entry_size_is_36_bytes` — wire-format invariant guard.
- `phase43_tarball_roundtrips_through_binary` — Some(url) survives.
- `phase43_mixed_tarball_population_roundtrips` — rollout-window case.
- `phase43_writer_rejects_empty_tarball` — M2 sentinel protection.
- `phase43_writer_rejects_empty_source_and_integrity_too` — M2
  applied consistently across all optional fields.
- `phase43_v2_reader_rejects_v1_binary_strict` — v1 header hand-
  rolled, verifies strict `!=` guard.
- `phase43_v2_reader_rejects_future_version_3` — forward-incompat.
- `phase43_null_tarball_sentinel_roundtrips` — None/None/None
  round-trips correctly through all three optional slots.
- `phase43_read_fast_falls_back_to_toml_when_binary_is_v1` — the
  key migration scenario; client upgrade sees v1 `lpm.lockb`,
  rejects it, falls through to TOML. v1 file intentionally stays
  on disk (P43-2 writeback will clean it up).

## Outdated v1-only docs corrected

The alias-metadata limitation applies to both v1 and v2 (Phase 43
only added the tarball slot, not an alias section). Comments in
`binary_format_supports`, `to_binary`, `to_lockfile`, and
`to_locked_package` updated to say "the binary format" rather
than "v1 binary".

## Size impact

+6 bytes per package entry + URL bytes in string table. For a
typical 409-package lockfile with ~60-char tarball URLs:
409 × 6 + 409 × 60 ≈ 27 KB growth. Mapped file — zero memory
cost beyond pages actually read.

CI gate (CARGO_TARGET_DIR=/tmp/lpm-phase43-target):
- cargo clippy --workspace -- -D warnings       ✓
- cargo fmt --check                              ✓
- grep -r 'fancy-regex' crates/*/Cargo.toml     ✓ (no matches)
- cargo build --workspace                        ✓
- cargo nextest run --workspace --exclude lpm-integration-tests
    3656 tests pass, 7 skipped (+9 since P43-0)  ✓
- cargo test -p lpm-auth (3x) 43 tests pass each ✓

Design doc: DOCS/new-features/37-rust-client-RUNNER-VISION-phase43.md (§P43-1)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(lockfile): Phase 43 P43-1 — address GPT audit follow-up

Three tightenings after the 2026-04-18 GPT audit of P43-1:

## Finding #1 (Medium) — stale v1 `lpm.lockb` persists across reads

`read_fast` falls back to TOML when binary open returns
`UnsupportedVersion`, but previously left the v1 file on disk.
`read_fast` is called from `lpm install`, `lpm upgrade`, AND
`lpm outdated`. Read-only commands never trigger a write, so an
upgraded user would pay the open-reject + TOML-parse cost on every
`lpm outdated` invocation forever — a real perf regression when
shipping P43-1 standalone before P43-2's install writeback lands.

Fix: best-effort delete the stale binary when open returns
`UnsupportedVersion`. Deletion is scoped to the version mismatch
only; other errors (corrupt magic, structural issues) leave the
file on disk for forensic inspection. Delete failures (read-only
FS, permission denied) are swallowed — correctness still holds via
the TOML fallback.

Test `phase43_read_fast_falls_back_to_toml_when_binary_is_v1`
flipped to assert deletion. New test
`phase43_read_fast_preserves_binary_on_non_version_errors` guards
against aggressive-deletion regression.

## Finding #2 (Low) — corrupt tarball pair surfaces `Some("")`

`BinaryLockfileReader::open` validated the deps-table layout but
not the new tarball pair. Corrupt `(tarball_off, tarball_len)`
bytes flowed through `read_str` (which degrades out-of-bounds
reads to `""`), so `tarball()` returned `Some("")` — silent
corruption that P43-2 would later feed into the shape gate.

Fix: extend the per-entry validation loop in `open` to bounds-
check the tarball pair against the string table length. `(0, 0)`
is the null sentinel and bypasses the check; any other pair must
fit within `[string_table_off, EOF)`. Corruption now forces TOML
fallback via `read_fast` — matching how v1 handled source/integrity
range issues via the deps-validation mechanism.

New test `phase43_open_rejects_corrupt_tarball_pair` exercises the
`u32::MAX` offset case.

## Finding #3 (Low) — empty-string rejection asymmetric

The binary writer rejected empty `source` / `integrity` / `tarball`
at serialization time (via `insert_optional`), but `from_toml` was
pure serde — it accepted `tarball = ""` cleanly and only failed
later when `write_all` tried to emit the binary. P43-1's commit
message claimed "consistent rejection" which was wrong.

Fix: add a validation pass in `from_toml` that walks all packages
and rejects empty `source` / `integrity` / `tarball`. Matches the
binary writer's rejection at the parse boundary, preventing
asymmetric late failures.

New test `phase43_from_toml_rejects_empty_optional_strings`
covers all three fields parametrically.

## Results

74 lockfile tests pass (+3 from +2 existing tests). Workspace
nextest: 3659 pass (+3 since P43-1 initial commit).

CI gate (CARGO_TARGET_DIR=/tmp/lpm-phase43-target):
- cargo clippy --workspace -- -D warnings       ✓
- cargo fmt --check                              ✓
- grep -r 'fancy-regex' crates/*/Cargo.toml     ✓ (no matches)
- cargo build --workspace                        ✓
- cargo nextest run --workspace --exclude lpm-integration-tests
    3659 tests pass, 7 skipped                   ✓
- cargo test -p lpm-auth (3x) 43 tests pass each ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(lockfile): Phase 43 — 2nd-round GPT audit follow-up (zero-length tarball gap + future-version preservation)

Two tightenings after the 2nd-round GPT audit of the 1st follow-up:

## Finding #1 (Low) — zero-length tarball with non-zero offset slipped through

The 1st follow-up added a range-check `off + len > string_table_len`
in `BinaryLockfileReader::open`, but that fails trivially for any
`(off != 0, len == 0)` pair because `off + 0 > len` is false for
any in-bounds offset. Combined with `tarball()` treating
"not both zero" as `Some(...)`, a corrupt pair still surfaced
`Some("")` — exactly the class of silent corruption the 1st
follow-up was supposed to close.

Fix: add an explicit `len == 0 && off != 0` rejection BEFORE the
range check. The only legitimate zero-length slot is the null
sentinel `(0, 0)`; any non-null value must have `len > 0`. The
check is trivial (one comparison) and makes the invariant
auditable from the code.

Test `phase43_open_rejects_corrupt_tarball_pair_zero_length_nonzero_offset`
stomps a valid binary's tarball slot with `(off=5, len=0)` and
asserts rejection with a clear error message.

## Finding #2 (Low) — future-version binary was deleted aggressively

The 1st follow-up deleted the stale `lpm.lockb` on ANY
`UnsupportedVersion`. That's fine for the common case (old client
wrote a v1, new client is v2), but it also fires when a NEWER
client's v3 binary is read by the current v2 client — forcing the
newer client to regenerate on its next install. GPT flagged this
as an open question: since `lpm.lockb` is derivative cache,
aggressive deletion is acceptable but not ideal.

Fix: narrow the delete to `found < BINARY_VERSION` only. Future-
version binaries fall through to TOML (correctness preserved) but
stay on disk so the newer client's fast path isn't churned.
Required promoting `BINARY_VERSION` from private `const` to `pub`
so `read_fast` in `lib.rs` can compare against it.

Test `phase43_read_fast_preserves_binary_on_future_version` hand-
rolls a v99 header, asserts TOML fallback succeeds AND the binary
is preserved on disk.

## Results

76 lockfile tests pass (+2 from this commit). Workspace nextest:
3661 pass.

CI gate (CARGO_TARGET_DIR=/tmp/lpm-phase43-target):
- cargo clippy --workspace -- -D warnings       ✓
- cargo fmt --check                              ✓
- grep -r 'fancy-regex' crates/*/Cargo.toml     ✓ (no matches)
- cargo build --workspace                        ✓
- cargo nextest run --workspace --exclude lpm-integration-tests
    3661 tests pass, 7 skipped                   ✓
- cargo test -p lpm-auth (3x) 43 tests pass each ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(install): Phase 43 P43-2 (1/3) — gate + URL reuse + url_lookup_ms sub-timer

First of three commits delivering P43-2. Scope here is the READ
path: stored tarball URLs from `lpm.lockb` are reused at install
time when safe, falling through to on-demand metadata lookup
when the gate rejects. No stale-URL retry, no writeback — those
land in commits 2/3.

## crates/lpm-registry — new public API

- `is_https_url` / `is_localhost_url` promoted from private `fn`
  to `pub fn` (visibility-only; no logic change).
- New `GateDecision` enum: `Accepted`, `RejectedScheme`,
  `RejectedShape`, `RejectedOrigin`. Distinct variants so callers
  can drive per-reason telemetry counters without re-running
  checks.
- New `RegistryClient::is_configured_origin(url)`: returns true
  if the URL's origin matches `base_url` or `npm_registry_url`.
  Opaque origins (`file://`, `data:`) never match.
- New `evaluate_cached_url(url, client) -> GateDecision`: composes
  scheme + shape + origin gates. The shape gate requires BOTH
  `.tgz` suffix AND a `/-/` path segment; the latter closes the
  H1 SSRF-via-lockfile gap (bare `.tgz` suffix would pass crafted
  paths like `/api/admin/foo.tgz`).

## crates/lpm-cli/install.rs — consumption

- `try_lockfile_fast_path` signature grows `client` +
  `gate_stats` params; two call sites (offline + main install)
  updated.
- Line 2679 `tarball_url: None` → match on `evaluate_cached_url`:
  `Accepted` reuses the stored URL; rejections bump the right
  counter (for shape / origin; scheme rejections are a corrupt-
  lockfile signal and log-only, matching the writer invariant).
- `GateStats` struct with `AtomicU64` counters (origin/shape
  mismatch). Surfaced on `timing.fetch_breakdown.tarball_url_gate`
  in `--json` output.
- `TaskTimings` gains `url_lookup_ms` field — measured in BOTH
  legacy and streaming fetch paths so the Phase 43 win is
  directly measurable regardless of which path is active.
  Previously URL lookup was either buried in `download_ms`
  (legacy) or untimed (streaming); carving it out makes the
  primary projection target visible.
- `FetchBreakdown` gains `url_lookup_sum_ms` / `url_lookup_max_ms`.
- Legacy path `fetch_tarball_to_file` signature grows to return
  `(DownloadedTarball, url_lookup_ms)`; `download_ms` narrows
  to GET + temp-file write only (subtracting the carved-out
  lookup via `saturating_sub`).
- Streaming path times `resolve_tarball_url` inline; comment
  corrected (was wrong — URL resolution was never in
  `extract_ms` on this path, it was untimed).

## What's NOT in this commit (lands in P43-2 commit 2/3)

- Stale-URL same-run retry (C1 from 3rd-pass audit).
- Generalized writeback trigger (C1 + C3 + C4).
- `handle_tarball_not_found` project_dir fix (L3).
- `stale_recovery` / `stale_hard_fail` counters.
- Regression tests (land in commit 3).

## Tests added (8 new, in lpm-registry)

- `phase43_gate_accepts_canonical_lpm_tarball_url`
- `phase43_gate_accepts_canonical_npm_tarball_url`
- `phase43_gate_rejects_non_https_non_localhost`
- `phase43_gate_rejects_wrong_suffix`
- `phase43_gate_rejects_admin_style_path_without_dash_segment`
  (H1 SSRF defense — the `/-/` segment check)
- `phase43_gate_rejects_origin_mismatch_after_registry_switch`
- `phase43_gate_allows_localhost_registry`
- `phase43_gate_rejects_malformed_url`

CI gate (CARGO_TARGET_DIR=/tmp/lpm-phase43-target):
- cargo clippy --workspace -- -D warnings       ✓
- cargo fmt --check                              ✓
- grep -r 'fancy-regex' crates/*/Cargo.toml     ✓ (no matches)
- cargo build --workspace                        ✓
- cargo nextest run --workspace --exclude lpm-integration-tests
    3669 tests pass, 7 skipped (+8 since P43-1) ✓
- cargo test -p lpm-auth (3x) 43 tests pass each ✓

Design doc: DOCS/new-features/37-rust-client-RUNNER-VISION-phase43.md (§P43-2 Change 1)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): Phase 43 P43-2 (2/3) — stale-URL retry + CWD fix + telemetry completeness

Addresses three GPT-audit findings on P43-2 commit 1, turning the
gate-accepted URL path into a safe standalone rollout. Commit 1's
claim of "clean standalone rollout point" was wrong — a stale
stored URL that passed the scheme/shape/origin gate would
immediately hard-fail the install (both legacy line 3711 and
streaming line 3814 routed `NotFound` straight to
`handle_tarball_not_found`). This commit is **safety**, not
cleanup — P43-2 commit 1 should not ship without it.

## Finding #1 (Medium) — stale URL = first-run hard failure

Pre-Phase-43, the lockfile didn't store URLs, so every fetch did
its own metadata round-trip first — stale upstream paths were
refreshed transparently. Post-commit-1, a stored URL that the
gate accepts is reused as-is; if upstream republished or migrated
the tarball path, the download 404s and lockfiles get nuked.

Fix: both fetch paths gain a **same-run retry** on stored-URL
404s. On `LpmError::NotFound` where `p.tarball_url.is_some()`:

1. Invalidate the metadata cache for this package.
2. Re-resolve via `resolve_tarball_url(..., cached_url=None)` to
   force a real metadata round-trip.
3. Guard against loop: if fresh URL == stale URL, metadata itself
   is stuck → fall through to `handle_tarball_not_found`.
4. Retry the download ONCE with the fresh URL.
5. On success: bump `stale_recovery` counter, carry on.
6. On second 404: bump `stale_hard_fail` counter, fall through
   to `handle_tarball_not_found`.

On-demand path 404s (no stored URL) skip retry — there's nothing
stale to refresh.

## Finding #2 (Medium) — `handle_tarball_not_found` is CWD-relative

Pre-fix `Path::new("lpm.lock")` / `Path::new("lpm.lockb")` —
deletes relative to process CWD, not the project root. A
programmatic install from a nested directory would leak stale
lockfile state (the `lpm.lock` at project root stays, the retry
repeats indefinitely). Now takes `project_dir: &Path` and uses
`project_dir.join(LOCKFILE_NAME)` / `.join(BINARY_LOCKFILE_NAME)`.

`project_dir` is threaded through `fetch_and_store_legacy` and
`fetch_and_store_streaming`; captured from the existing
`project_dir_buf` in the task dispatch closure (line 1663).

## Finding #3 (Low) — RejectedScheme had no counter

`try_lockfile_fast_path`'s `RejectedScheme` branch previously
only logged. Now bumps `gate_stats.scheme_mismatch` so corrupt-
lockfile signals are observable in telemetry (symmetric with
shape/origin — all three rejection types now have counters).

## GateStats expanded

Adds three AtomicU64 counters: `scheme_mismatch`,
`stale_recovery`, `stale_hard_fail`. All surfaced in the JSON
`timing.fetch_breakdown.tarball_url_gate` object.

## Legacy fetch path restructured

`fetch_and_store_legacy` previously delegated to
`fetch_tarball_to_file` (URL resolve + download composed). For
the retry path to distinguish a metadata 404 from a download
404, the two steps are now inline; `fetch_tarball_to_file` is
removed as orphaned. Behaviorally equivalent on the happy path.

## What's NOT in this commit (lands in P43-2 commit 3/3)

- Generalized writeback trigger (C1 + C3 + C4 from the audit
  passes) — with retry in place, convergence is a perf concern
  not a correctness one: every install pays one extra metadata
  round-trip for each stale package until the lockfile is
  refreshed by some other trigger (add/remove dep).
- Regression tests (9 cases from the design doc).

## Results

CI gate (CARGO_TARGET_DIR=/tmp/lpm-phase43-target):
- cargo clippy --workspace -- -D warnings       ✓
- cargo fmt --check                              ✓
- grep -r 'fancy-regex' crates/*/Cargo.toml     ✓ (no matches)
- cargo build --workspace                        ✓
- cargo nextest run --workspace --exclude lpm-integration-tests
    3669 tests pass, 7 skipped                   ✓
- cargo test -p lpm-auth (3x) 43 tests pass each ✓

Design doc: DOCS/new-features/37-rust-client-RUNNER-VISION-phase43.md (§P43-2 Changes 2 + 4)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(install): Phase 43 P43-2 (3/3) — generalized writeback for URL convergence + v1→v2 binary migration

Final P43-2 commit — delivers Change 3 from the design doc (the
convergence/persistence fixes C1 + C3 + C4 from the 3rd and 4th
audit passes). With retry safety from commit 2/3 already in
place, this commit is about correctness of the durable state:
making the lockfile actually converge on the URLs used, so the
next install doesn't redo the same recovery work.

## Change 3: generalized writeback

`try_lockfile_fast_path` now returns `LockfileFastPath { packages,
lockfile, needs_binary_upgrade }` instead of bare `Vec<InstallPackage>`.
The driver stashes `lockfile` and `needs_binary_upgrade` so the
install-end writeback step can see them.

Fetch tasks now return the FINAL URL used (legacy + streaming
grew from `Result<(sri, timings)>` to `Result<(sri, timings,
final_url)>`). Happy path: `final_url == initial_url` (stored
URL succeeded). Stale-URL recovery: `final_url == fresh_url`
(retry's metadata round-trip URL). Origin-mismatch rebase:
`final_url == on_demand_url` (gate rejected the stored URL,
`initial_url` came from `resolve_tarball_url(cached=None)`).

Post-fetch aggregator builds `fresh_urls: HashMap<(name, version),
String>` — populated only when the task actually hit the registry
(the store-hit short-circuit reports `None` to avoid double-
counting a sibling task's URL).

At install-end, when `used_lockfile == true`, three trigger
conditions fire the writeback:

1. `!fresh_urls.is_empty()` — URL divergence from stored.
2. `needs_binary_upgrade == true` — v2 `lpm.lockb` was missing
   or out-of-version at fast-path time.
3. Both — compound message.

On the true happy path (URLs all match AND v2 binary current),
no write happens. Lockfiles stay byte-identical, CI diffs stay
empty.

## Addresses C1, C3, C4 from the 4th-pass audit

- **C1 (self-healing writeback):** Stale-URL recovery now
  persists across runs. Previously, the retry would succeed but
  the stored URL stayed stale, so every install re-ran the 404 +
  retry dance.
- **C3 (registry-switch convergence):** `LPM_REGISTRY_URL`
  switches now converge after one install. Previously,
  origin-mismatch rejections fell through to on-demand lookup
  correctly but the lockfile never picked up the new-origin
  URLs — every subsequent install repeated the on-demand
  round-trip forever.
- **C4 (v1→v2 binary migration):** Fast-path-only installs now
  complete the binary migration. Previously, the v2 reader
  rejected v1 binary + `read_fast` fell back to TOML, but the
  fast-path writer was gated on `!used_lockfile` so the v2
  rewrite never fired. Commit 1 of P43-1-follow-up mitigated
  the perf regression on read-only commands via best-effort
  deletion of v1 binaries; this commit completes the migration
  properly on fast-path installs.

## What's NOT in this commit

- **End-to-end install-flow regression tests** (the 9 cases from
  the design doc). These need mock-registry scaffolding that
  doesn't exist in the CLI test harness. Deferred to a
  follow-up "test infra + regression tests" commit. The 4 unit
  tests below cover the critical new code paths.
- **Offline-mode binary upgrade:** `--offline` intentionally
  skips writeback. Users can trigger v1→v2 migration with any
  online install.

## Tests added (4 new)

- `phase43_handle_tarball_not_found_honors_project_dir` —
  design-doc test #8, verifies the CWD fix from commit 2/3 is
  still wired correctly.
- `phase43_try_lockfile_fast_path_flags_v1_binary_for_upgrade`
  — design-doc test #9 mechanism: v1 `lpm.lockb` on disk →
  `needs_binary_upgrade = true`.
- `phase43_try_lockfile_fast_path_flags_missing_binary_for_upgrade`
  — complement: no binary at all → `needs_binary_upgrade = true`.
- `phase43_try_lockfile_fast_path_skips_upgrade_when_binary_current`
  — happy-path guard: current v2 binary → `needs_binary_upgrade = false`,
  no spurious rewrites on every install.

## Byte-stability guarantee

Regression test `phase43_try_lockfile_fast_path_skips_upgrade_when_binary_current`
guards the byte-stability invariant — a future refactor that
accidentally sets `needs_binary_upgrade = true` on clean
installs would churn every CI lockfile commit. Test fails if
the happy-path trigger isn't clean.

## Results

CI gate (CARGO_TARGET_DIR=/tmp/lpm-phase43-target):
- cargo clippy --workspace -- -D warnings       ✓
- cargo fmt --check                              ✓
- grep -r 'fancy-regex' crates/*/Cargo.toml     ✓ (no matches)
- cargo build --workspace                        ✓
- cargo nextest run --workspace --exclude lpm-integration-tests
    3673 tests pass, 7 skipped (+4 since P43-2 commit 2/3)   ✓
- cargo test -p lpm-auth (3x) 43 tests pass each ✓

Design doc: DOCS/new-features/37-rust-client-RUNNER-VISION-phase43.md (§P43-2 Change 3)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(install): Phase 43 P43-2 — failing-test-first retrofit + A/B bench verification

Retrofits the methodology reminder #2 contract from the design doc
("P43-2's regression test must fail on the pre-fix `tarball_url:
None` assumption and pass when URL is populated from lockfile").
Adds three tests that directly exercise the core Phase 43 read-path
contract at the struct-creation boundary in `try_lockfile_fast_path`.

## Tests added

1. `phase43_gate_accepted_url_populates_tarball_url` — the fix-
   verifier. Lockfile stores a canonical npm tarball URL; gate
   accepts; `InstallPackage.tarball_url` MUST be `Some(url)`.
2. `phase43_gate_rejected_urls_downgrade_to_none_with_telemetry` —
   parametric over the three rejection modes. Asserts both the
   downgrade AND the correct counter bump.
3. `phase43_no_stored_tarball_produces_none_install_package_url` —
   boundary-case guard for pre-Phase-43 lockfile shape.

## Empirical verification (2026-04-18)

The first two tests were surgically verified to FAIL against the
pre-fix `tarball_url: None` stub. Procedure:

1. Replaced the gate-match block at install.rs:~2908 with the
   pre-fix stub `tarball_url: None`.
2. Ran `cargo test -p lpm-cli phase43`.
3. Observed:
   - `phase43_gate_accepted_url_populates_tarball_url` FAILED
     (expected `Some(url)`, got `None`).
   - `phase43_gate_rejected_urls_downgrade_to_none_with_telemetry`
     FAILED (counters at 0 instead of 1).
   - Other 5 phase43 tests passed (orthogonal behaviors).
4. Restored the gate logic via `git restore`.
5. Reran — all 7 pass.

This is the "fail before fix, pass after fix" contract that
methodology #2 specifies.

## A/B bench results (same session — per methodology reminder #1)

Decision-gate fixture (/tmp/phase40-decision-gate), 409 packages,
fresh-CI shape (lockfile present + cold store + cold
node_modules). 5 runs each, interleaved A/B/A/B/A/B/A/B/A/B.
`LPM_HOME=/tmp/lpm-ab-home/.lpm` isolation (user's ~497MB store
untouched).

| metric                    | A (main) | B (P43) | delta   | projected |
|---------------------------|---------:|--------:|--------:|----------:|
| total_ms                  |     4592 |    4043 | -12%    |    -17%   |
| fetch_ms                  |     4342 |    3568 | **-18%**|    **-18%** |
| queue_wait.sum_ms         |  437,700 | 356,092 | -19%    |    -58%   |
| url_lookup.sum_ms (new)   |      n/a |       0 | HIT     |  near-0   |
| gate mismatch counters    |      n/a |   all 0 | clean   |     —     |

Readings:

- **Primary projection target MET.** `url_lookup.sum_ms` is
  exactly 0 across all 5 B runs. Every one of 409 packages' URLs
  was reused from the lockfile with zero metadata round-trips.
- **`fetch_ms` delta EXACTLY as projected** (-18% vs -18%).
- **Projection overshot on secondary metrics.** `queue_wait.sum_ms`
  projected to drop -58% via the causal-chain hypothesis
  (shorter per-task → faster permit turnover). Actual drop is
  -19%. Interpretation per stop-and-diagnose rule: queue_wait
  is dominated by actual download latency, not URL lookup. The
  causal-chain reasoning in the doc was overstated.
- **`total_ms` delta falls short** (-12% vs -17%). The fetch-side
  win is fully captured in `fetch_ms`; some doesn't translate
  to total wall-clock — link_ms variance (run B#5 had 1110ms
  link_ms vs typical ~220ms) and pre-fetch overhead.
- **Gate counters all zero on B** — no origin/shape/scheme
  rejections, no stale recovery, no hard fails. Steady-state
  clean across 2045 package-fetches.

CI gate (CARGO_TARGET_DIR=/tmp/lpm-phase43-target):
- cargo clippy --workspace -- -D warnings       ✓
- cargo fmt --check                              ✓
- grep -r 'fancy-regex' crates/*/Cargo.toml     ✓ (no matches)
- cargo build --workspace                        ✓
- cargo nextest run --workspace --exclude lpm-integration-tests
    3676 tests pass (1 perf-test flake under load — rerun in
    isolation passed; unrelated lpm-task eval perf assertion)   ✓
- cargo test -p lpm-auth (3x) 43 tests pass each ✓
- cargo test -p lpm-cli phase43                 7 pass          ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request Apr 24, 2026
…ests

Introduces crates/lpm-cli/src/precedence.rs implementing the §6
three-layer containment model's pure-policy branch from
DOCS/new-features/37-rust-client-RUNNER-VISION-phase48.md
(a-package-manager repo). Foundation commit — resolver is a pure
function, all tests green, not yet wired into production callers.

# Rule it encodes

Legacy pure-policy knobs (`scriptPolicy`) keep Phase 46 project >
user precedence when `force-security-floor = false`. The default
flip for legacy knobs is Phase 5 (Phase 49), not P0.

New Phase 48 pure-policy knobs (`network-policy`,
`install-policy.strict-behavioral`) always use user-is-floor, with
project values that loosen below the floor rejected at load time
with a named-source warning. No approval path; "loosen" means
"project declaration → drop, surface warning."

When `force-security-floor = true`, legacy and new behave
identically: user is floor for both, CLI loosening flags
(`--yolo`, `--policy=allow`, `--accept-*`) suppressed, project
loosening rejected. Tightening CLI / project values are still
honored — the flag blocks loosening, not movement. Approval
suspension is a separate concern for later in P0.

# Scope of this commit

- Public types: `PolicyKind` (Legacy|New), `PolicyTier`
  (Cli|Project|User|Default), `RejectionReason`
  (ForceFlagSuppressesCli | ForceFlagRejectsProject |
  NewKnobProjectLoosens), `Rejection<T>`, `Resolution<T>`,
  `PolicyInputs<T>`.
- Public trait `PurePolicyKnob` with `NAME`, `KIND`, `loosens`.
- Public function `resolve_pure_policy<T: PurePolicyKnob>`: pure,
  no I/O, three documented branches.
- `PurePolicyKnob` impl on `ScriptPolicy` (Legacy kind).
- Test-local `NetworkPolicy { Fenced, Allow }` stub (New kind);
  real `network-policy` lands in P3 with the backend wiring.

# Exit criteria covered (§7 P0)

Test `force_flag_rejects_cli_yolo_and_project_loosening` pins #1:
`force-security-floor = true` + project `scriptPolicy = "allow"`
+ `--yolo` → effective `deny`, CLI rejection + project rejection
both fired.

Test `legacy_knob_without_force_flag_preserves_phase46_order`
pins #2: Phase 46 project > user order preserved for legacy
knob, no rejections.

Test `new_knob_without_force_flag_rejects_project_loosening`
pins #3: new-knob user floor prevails without force flag, project
value rejected with `NewKnobProjectLoosens` reason. Rejection
count asserted == 1 to pin the "no request flow, no approval
path" rule from the test side — if a future refactor routes
rejections through the approval UI, this test fails loudly.

# Not yet in this commit (next slices on this branch)

1. Wire `force-security-floor` into GlobalConfig and re-plumb
   resolve_script_policy through the new resolver so legacy
   callers hit the tested logic.
2. Approval-record suspension + three distinct migration-warning
   wordings (one per RejectionReason variant).
3. max-sandbox-write-roots config key + load_sandbox_write_dirs
   hardening (reject `/`, `$HOME/.ssh`, etc.).
4. Per-package capability resolver for passEnv, sandboxLimits,
   readProject.

Deferred intentionally: the existing resolve_script_policy
behavior is unchanged in this commit. Re-plumbing happens in the
next slice so the "no behavior change" promise for legacy knobs
stays auditable.

# Verification

- cargo clippy --workspace -- -D warnings: clean
- cargo fmt --check: clean
- cargo nextest run --workspace --exclude lpm-integration-tests
  --no-fail-fast: 4231 passed, 7 skipped, 0 failed
- grep fancy-regex crates/*/Cargo.toml: empty (banned crate absent)
- 12/12 precedence unit tests pass (3 exit-criterion tests + 6
  edge cases + 2 loosens-ordering proofs + 1 display-phrasing
  pin).

Ran with CARGO_TARGET_DIR=/tmp/lpm-phase48-target so the dev
incremental cache stayed clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request Apr 30, 2026
…rs/ (Phase 61.1)

The big lever — the isolated linker's per-package wrapper tree moves
out of `node_modules/.lpm/` to `<project>/.lpm/wrappers/`. After the
relayout, `rm -rf node_modules` no longer wipes the entire incremental
linker cache, so the warm-install bench (and the user pattern Phase 57.2
surfaced — wiping node_modules after a teammate's lockfile change)
actually exercises the incremental linker.

Symlink-target shape changes (audit fix #1, v3):
- Phase 3 root symlinks (canonical + aliases) gain one extra `..`
  segment and route through `<project>/.lpm/wrappers/<seg>/...`.
  Centralized in `LayoutPaths::root_symlink_target()` so the depth
  math (link-depth + 1) is computed in one place.
- Phase 3.5 self-references unchanged — they target the project root,
  which doesn't move under Tier 2.
- Phase 2 internal sibling-wrapper symlinks unchanged — both endpoints
  live inside `.lpm/wrappers/` so the relative `../../` shape is
  preserved.

Drive-by audit fixes folded in:
- #3 (bin-shim wrapper segment): `create_bin_links` now uses
  `pkg.wrapper_segment()` instead of hardcoding
  `format!("{safe}@{version}")`. Pre-fix, local-source deps with a
  `bin` field produced shims pointing at non-existent wrapper paths.
- #7 (Windows junction `..` normalization): added a lexical-clean
  helper inside `create_symlink_or_junction`'s Windows arm so the
  `../.lpm/wrappers/...` shape doesn't embed an unresolved `..`
  segment in the path handed to `cmd /c mklink /J`.

`cleanup_stale_entries` updates:
- Explicitly creates `node_modules/` (pre-Tier-2 the wrapper-root
  `create_dir_all` covered both via parent recursion; now they're
  disjoint paths).
- Skips dotfile entries (e.g., the new `.version` schema-tag) when
  sweeping stale wrappers.
- Writes `<wrapper-root>/.version` (D6) for forward-compat shape
  detection.

Test fixtures migrated to use `LayoutPaths` so they track production
semantics on any future shape change. 4949 workspace tests pass;
clippy --workspace -D warnings clean; cargo fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request Apr 30, 2026
* feat(lpm-runtime): RuntimeStatus carries resolved managed-runtime bin

`Ready` and `Installed` now carry a `bin_dir: PathBuf` field — the
managed-runtime bin path that `node_bin_dir(&version)` already resolves
inside `ensure_runtime` and would otherwise discard. Downstream callers
(the PATH builder in `lpm-runner/bin_path`) can consume this hint to
skip a redundant `detect_node_version` + `list_installed` pass per
`lpm run` invocation.

For the `Installed` branch, defensively re-stat after install — if the
freshly-installed bin dir vanished mid-call (race / external tampering),
degrade to `NotInstalled` rather than panic.

This is the data-shape change that the rest of Phase 61 Tier 1 builds on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(lpm-runner): 3-state ManagedRuntimeHint + pre-resolved PATH builder

Adds `ManagedRuntimeHint { Bin(PathBuf) | Absent | Unknown }` plus
`build_path_with_bins_pre_resolved(start_dir, hint)`. The existing
public `build_path_with_bins` becomes a thin wrapper that passes
`Unknown` — preserving the silent-detect contract for callers that
don't go through `ensure_runtime` first (rebuild, dlx, hooks,
tools.rs, doctor, orchestrator).

Why three states, not `Option<PathBuf>`:
- `Bin(path)`  — caller resolved the managed runtime: use it directly.
- `Absent`     — caller called `ensure_runtime` and confirmed there
                 is no managed runtime to use. PATH builder skips the
                 silent re-detect entirely (the win on unpinned projects).
- `Unknown`    — caller hasn't checked. Falls back to silent detect
                 (current pre-Phase-61 behavior).

Collapsing `Absent` and `Unknown` into one nullable would force the
silent re-detect on the unpinned-project path — the most common shape.

Two deterministic unit tests cover the contract: `_uses_hinted_bin`
asserts the produced PATH is exactly [nm_bin, hint_bin, ...inherited]
when `Bin(...)` is supplied (uses a non-existent fake path so any
re-stat would fail-loud); `_absent_skips_runtime` asserts the PATH is
exactly [nm_bin, ...inherited]. Both assert full structure rather than
substring presence/absence so they're robust to whatever managed-
runtime fragments the developer's PATH happens to contain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(lpm-runner/script): thread bin_hint through script/command entrypoints

Extends every `pub fn run_*` in the script runner with a
`bin_hint: &ManagedRuntimeHint` parameter, routing each internal
PATH-build through `build_path_with_bins_pre_resolved` instead of the
silent-detect wrapper. Eight entrypoints touched:

- run_script, run_script_with_envs, run_script_captured
- run_script_buffered, run_script_prefixed
- run_command, run_command_captured, run_command_buffered,
  run_command_prefixed

No backwards-compatibility shims — per CLAUDE.md "no `// removed`
comments, no shims, no parallel slow-path wrappers." Tests pass
`&ManagedRuntimeHint::Unknown` (imported as `Unknown` at the top of
the test mod for brevity).

Public API surface change is mechanical (one extra parameter); the
sole external consumer is `lpm-cli`, migrated in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(lpm-cli): consume bin_hint, collapse cache-config reads, delete dead wrappers

Threads the `ManagedRuntimeHint` from `commands::run::ensure_runtime`
through the script-execution chain so the downstream PATH builder
doesn't redo `detect_node_version` + `list_installed` on every
`lpm run` invocation.

Signature changes:
- `commands::run::ensure_runtime` now returns `ManagedRuntimeHint`
  (`Bin(bin_dir)` for Ready/Installed; `Absent` for NotInstalled and
  NoRequirement).
- `run`, `run_multi`, `run_workspace`, `run_watch`, `exec`,
  `run_tasks_sequential`, `run_tasks_parallel`, `run_task`, and
  `run_task_captured` all gain a `bin_hint` parameter.

Caller migration:
- `main.rs:3102` (watch path) and `main.rs:3527` (External script
  shortcut) capture the hint before calling `run_watch` / `run`.
- `dev.rs` captures `runtime_hint` via the existing `tokio::join!`
  block instead of discarding it; threads to the dev script invocation.
- `migrate.rs::run_verification` resolves the hint once and reuses
  it across the build + test verification scripts.

Caller contract: every callsite of `run` / `run_multi` / `run_watch`
/ `exec` MUST invoke `ensure_runtime` first — that's where the
user-visible "Using node X" notice + auto-install fire. Documented
on `pub async fn run` so future callers don't bypass it accidentally.

Cache-context dedup (Tier 1.4.2):
- `run` reads `lpm.json` once at the top instead of twice (cache-hit
  check + caching-enabled check both used to read).
- Migrates the simple-script path to use the existing
  `try_cache_hit_with_config` and `is_task_cached_with_config`
  helpers — the no-config wrappers were only used by this one
  callsite.

Dead-code removal (CLAUDE.md "no shims"):
- Delete `is_task_cached`, `try_cache_hit`, `try_cache_store_with_output`
  — every other call site already used the `_with_config` variants.
- Delete the `is_task_cached_false_without_lpm_json` test that
  exclusively exercised the deleted wrapper; the equivalent contract
  is exercised by `is_task_cached_with_config_*` tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(lpm-cli/run): Tier 1 follow-ups — workspace pin inheritance, parallel Arc reuse, is_meta_task plumbing

Three follow-ups that landed during the M/L review pass on top of the
base hint threading:

L1 — `is_meta_task` no longer reads `package.json` per call.

  Caller (`run_multi`, `run_workspace_package`) extracts `pkg.scripts`
  once and threads it down through `run_tasks_sequential` /
  `run_tasks_parallel` / `is_meta_task`. The dependsOn-but-no-command
  case previously paid one `package.json` read per task in the
  parallel loop; now zero. The `is_meta_task_from_config` alias
  collapses into the single `is_meta_task` since the helper is
  filesystem-free now.

L2 — `run_tasks_parallel` wraps shared per-call state in `Arc`.

  Pre-Tier-1: each spawned thread did a full `clone` of the hint,
  the tasks `HashMap`, the `LpmJsonConfig`, and (post-L1) the
  `pkg_scripts` `HashMap`. Post-Tier-1: each is `Arc::new`'d once
  before the loop, threads do a refcount bump. Negligible per-thread
  but avoids quadratic-feeling allocations on wide parallel levels.

L3 — workspace per-member calls inherit the root hint when the
member has no own pin.

  `run_workspace_package` probes the member dir via
  `lpm_runtime::detect::detect_node_version` (single-dir, no walk).
  If the member has its own .nvmrc / engines / lpm.json runtime,
  pass `Unknown` so the silent detect resolves the member-level
  pin. If not, inherit the root hint. Matches user intuition that
  the workspace-root pin governs the whole workspace (like nvm
  walking parent dirs).

  Small behavior change: a workspace member with NO own Node pin
  now uses the root-resolved managed runtime instead of falling back
  to system Node. Arguably a bug fix — pre-Tier-1 behavior was
  inconsistent (root auto-installed Node 22 but member silently
  ran on whatever `node` happened to be on PATH).

Plus the M/L review fixes batched in:
- M1: doc note on `pub async fn run` documenting the
  `ensure_runtime`-must-be-called-first contract.
- M2/M3: `bin_path` test assertions tightened to compare the full
  PATH segment list, not substring presence/absence (robust to
  whatever managed-runtime fragments the developer's PATH happens
  to contain).
- Style: `Default for ManagedRuntimeHint` returning `Unknown`; test
  mods import `ManagedRuntimeHint::Unknown` so call sites read
  `&Unknown` instead of `&ManagedRuntimeHint::Unknown`.

Measurement (n=101, time.perf_counter_ns(), M5 Mac, load avg ~3):
- Managed-runtime fixture (.nvmrc + 7 entries): ~150 µs / lpm run.
- No-managed-runtime fixture: ~60 µs / lpm run.
- bench/run.sh script-overhead (1ms resolution, n=21): within noise.

Sub-perceptible at ms resolution; preparatory plumbing for Tier 2
warm-path relayout. See preplan v3 status block for full numbers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(lpm-linker): introduce LayoutPaths utility (Phase 61.0.5, no behavior change)

Centralizes wrapper / metadata / health-check path construction. Every
production callsite that built `node_modules/.lpm/...` paths inline now
goes through `LayoutPaths::for_project(project_dir).{isolated,hoisted}_*`.

61.0.5 contract: every helper returns the legacy path
(`node_modules/.lpm/`). No observable behavior change. 61.1 will flip
`isolated_*` to `<project>/.lpm/wrappers/...` as a single source-of-truth
edit; consumers migrate transparently.

Production migrations in this commit:
- `lpm-linker::cleanup_stale_entries`: wrapper-root construction
- `lpm-linker::link_one_package`: pkg-entry-dir + .linked marker
- `lpm-linker::link_finalize`: wrapper-root for bin link traversal
- `lpm-linker::link_packages_hoisted`: metadata path + nested-root (via
  `hoisted_*` helpers, intentionally still scoped to `node_modules/`)
- `lpm-cli::commands::rebuild::live_package_dir`: isolated probe

`doctor.rs` predicate is intentionally NOT migrated here — its semantic
change (handling hoisted-no-conflicts via `install_appears_healthy()`)
lands in 61.4.

Adds `crates/lpm-linker/src/layout.rs` with 13 unit tests covering all
helpers including the 5 `InstallHealth` variants and the
`needs_layout_migration` invariant in 61.0.5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(lpm-linker): flip isolated wrapper root to <project>/.lpm/wrappers/ (Phase 61.1)

The big lever — the isolated linker's per-package wrapper tree moves
out of `node_modules/.lpm/` to `<project>/.lpm/wrappers/`. After the
relayout, `rm -rf node_modules` no longer wipes the entire incremental
linker cache, so the warm-install bench (and the user pattern Phase 57.2
surfaced — wiping node_modules after a teammate's lockfile change)
actually exercises the incremental linker.

Symlink-target shape changes (audit fix #1, v3):
- Phase 3 root symlinks (canonical + aliases) gain one extra `..`
  segment and route through `<project>/.lpm/wrappers/<seg>/...`.
  Centralized in `LayoutPaths::root_symlink_target()` so the depth
  math (link-depth + 1) is computed in one place.
- Phase 3.5 self-references unchanged — they target the project root,
  which doesn't move under Tier 2.
- Phase 2 internal sibling-wrapper symlinks unchanged — both endpoints
  live inside `.lpm/wrappers/` so the relative `../../` shape is
  preserved.

Drive-by audit fixes folded in:
- #3 (bin-shim wrapper segment): `create_bin_links` now uses
  `pkg.wrapper_segment()` instead of hardcoding
  `format!("{safe}@{version}")`. Pre-fix, local-source deps with a
  `bin` field produced shims pointing at non-existent wrapper paths.
- #7 (Windows junction `..` normalization): added a lexical-clean
  helper inside `create_symlink_or_junction`'s Windows arm so the
  `../.lpm/wrappers/...` shape doesn't embed an unresolved `..`
  segment in the path handed to `cmd /c mklink /J`.

`cleanup_stale_entries` updates:
- Explicitly creates `node_modules/` (pre-Tier-2 the wrapper-root
  `create_dir_all` covered both via parent recursion; now they're
  disjoint paths).
- Skips dotfile entries (e.g., the new `.version` schema-tag) when
  sweeping stale wrappers.
- Writes `<wrapper-root>/.version` (D6) for forward-compat shape
  detection.

Test fixtures migrated to use `LayoutPaths` so they track production
semantics on any future shape change. 4949 workspace tests pass;
clippy --workspace -D warnings clean; cargo fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(lpm-cli): rebuild.rs uses LayoutPaths + closes store-fallback hole (Phase 61.2)

Three things land together because they all touch `prepare_live_package_dir`:

D8a — store-fallback hard-error. Pre-Phase-61 the function returned
`Ok(store_path)` whenever the live probe fell through, letting the
caller chdir into canonical store bytes for a lifecycle script. On
macOS (clonefile, CoW) that was silent corruption on first write; on
Linux (hardlinks) the early `if !live.starts_with(store_root)` branch
skipped detach so the script ran against shared inodes. Either way, a
soundness violation. Post-fix the function returns `Err("...not linked
into project — refusing to run lifecycle script inside the store...")`
so failures are loud, actionable, and never corrupt the store.

Audit fix #4 — wrapper-segment shape. `live_package_dir` now takes a
`wrapper_id: Option<&str>` and computes the wrapper segment via
`LayoutPaths::wrapper_segment(name, version, wrapper_id)`. The same
helper `LinkTarget::wrapper_segment` delegates to (single source of
truth across the linker / rebuild / future doctor code paths). Pre-fix
the inline `format!("{safe}@{version}")` silently missed every
non-Registry source: a Directory / Link / Tarball / Git dep with a
lifecycle script had its wrapper probe fail and fall through to the
store. Post-fix `ScriptablePackage` carries the `wrapper_id` derived
from `lp.source` via `Source::source_id()`.

Audit fix #5 — test inversion. The pre-existing
`prepare_live_package_dir_does_not_detach_when_path_is_under_store_root`
test pinned the silent-fallback contract D8a inverts. Replaced with
`prepare_live_package_dir_errors_when_unlinked` asserting the new
`Err("...not linked into project...")` shape; canary-bytes-intact
assertion preserved.

Adjacent fix in `p6_triage_autoexec_reference.rs`: the test seeded
the store but not the wrapper, relying on the silent-fallback hole to
run lifecycle scripts. Added a `seed_wrapper` helper that materializes
`<project>/.lpm/wrappers/<seg>/node_modules/<name>/` from the store —
mirroring real post-install state. Pre-D8a the same fixture passed by
accident; the new state captures the actual contract.

`LayoutPaths::wrapper_segment` is the new cross-crate helper.
`LinkTarget::wrapper_segment` delegates to it so the two cannot drift.

4949 workspace tests pass; clippy --workspace -D warnings clean;
cargo fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(lpm-cli): layout-aware install_state + wrapper-layout migration (Phase 61.3)

Two pieces, both load-bearing per the v3 audit fix #2 / D8c:

1. Layout-aware freshness gate. `check_install_state` AND
   `try_mtime_fast_path` now consult
   `LayoutPaths::needs_layout_migration()` and force `up_to_date = false`
   when a populated legacy `node_modules/.lpm/` coexists with an empty
   `<project>/.lpm/wrappers/`. Without this gate, an upgrade-in-place
   user (binary upgraded but `node_modules/` not wiped) hash-matches
   on the install-hash check, the top-of-`main` fast lane
   short-circuits, and the migration code path never runs — they stay
   silently on the legacy layout until something else invalidates the
   hash.

2. Migration code path inside `lpm install`. Right after the fast-exit
   guard returns false, `migrate_legacy_wrapper_layout` checks the
   same predicate and (when true) wipes `node_modules/.lpm/` so the
   subsequent `cleanup_stale_entries` rebuilds at the new wrapper-root
   location. No rename-first attempt — cross-FS rename hazards
   (Linux containers, network FS, EXDEV) outweigh the saved relink
   cost, which Phase 61 makes faster anyway. Best-effort wipe; legacy-
   state quirks don't abort the install.

D9 — migration notice modes. Human-pretty mode prints a one-line
"migrating wrapper layout" notice via `output::info`; JSON / `--quiet`
/ non-TTY remain silent.

Tests added:
- `legacy_layout_present_forces_install_via_full_read` — hash matches
  but migration is owed → `up_to_date = false`.
- `legacy_layout_present_forces_install_via_mtime_fast_path` — same
  but with v2 mtime line; the mtime fast path bails to slow path.
- `empty_legacy_dir_does_not_force_install` — empty `.lpm/` doesn't
  count as legacy.
- `populated_new_layout_does_not_force_install` — both populated →
  migration considered complete; gate stops firing.
- `migrate_legacy_wrapper_layout_wipes_legacy_state` — happy path.
- `migrate_legacy_wrapper_layout_noop_when_not_owed` — no-op
  on a fresh project (doesn't synthesize directories).
- `migrate_legacy_wrapper_layout_noop_when_both_populated` —
  doesn't wipe on a mid-migration mixed state (real convergence
  happens via the next normal install).

4956 workspace tests pass; clippy --workspace -D warnings clean;
cargo fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(lpm-cli): doctor + gitignore + sandbox comment refresh (Phase 61.4 + 61.5 + 61.7)

61.4 — `lpm doctor` predicate becomes layout-aware. The legacy
`nm.exists() && nm.join(".lpm").exists()` probe is replaced with
`LayoutPaths::install_appears_healthy()` plus a `needs_layout_migration()`
gate. The doctor now distinguishes:
- Healthy { Isolated } → "exists with .lpm/wrappers store"
- Healthy { Hoisted } → "exists with hoisted layout"
- Healthy { Mixed } → warn + remediation
- NodeModulesPresentButNoStore → warn (existing message preserved)
- NoNodeModules → fail (existing message preserved)
- legacy layout detected (migration owed) → warn pointing the user
  at `lpm install` to converge

The hoisted-no-conflicts case (which the legacy predicate misreported
as "no .lpm store") now correctly classifies as healthy.

61.5 — `ensure_lpm_wrappers_gitignore` runtime helper. Mirrors
`ensure_skills_gitignore` (and the lpm-vault / npmrc siblings):
runtime "ensure once" pattern, idempotent, OpenOptions-append to
narrow the TOCTOU window. Marker is `.lpm/wrappers/`. Wired into the
install entry point alongside `migrate_legacy_wrapper_layout`.

61.7 — sandbox comment refresh. `landlock_rules.rs` explanatory
comment referenced `{project}/node_modules/.lpm/`; updated to mention
the post-Phase-61.1 `<project>/.lpm/wrappers/` location. The actual
ReadWrite rule at line 103 already grants `<project>/.lpm` so the
post-relayout location was already covered — comment-only change,
no functional impact.

Tests added:
- `ensure_lpm_wrappers_gitignore_appends_entry`
- `ensure_lpm_wrappers_gitignore_no_duplicate`
- `ensure_lpm_wrappers_gitignore_creates_when_no_gitignore`

4959 workspace tests pass; clippy --workspace -D warnings clean;
cargo fmt clean; no fancy-regex.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(lpm-linker): retarget legacy root symlinks + dotfile-aware layout predicates

Two audit fixes (round 2 of Phase 61 review):

CRITICAL — legacy root-symlink retarget. Pre-fix, the 61.3 migration
wiped `node_modules/.lpm/` but never touched root symlinks at
`node_modules/<pkg>` whose targets pointed into the legacy
wrapper-root shape. Phase 3's `if root_link.exists()` guard skipped
recreation, so an upgrade-in-place install left dangling symlinks —
the wrapper tree was wiped, but `node_modules/<pkg>` still pointed at
the old location and stayed broken.

Fix: `cleanup_stale_entries`'s root-symlink sweep gains a second
predicate. Beyond the existing "not in `direct_names`" stale-name
removal, it now ALSO removes any root symlink whose target traverses
a `.lpm/` segment NOT followed by `wrappers/` (legacy shape). Phase 3
recreates with the correct new target. Walks `Path::components()`
so the predicate is robust to path-separator style and to whether
the relative target leads with `.lpm/` (unscoped) or `../.lpm/`
(scoped). Self-refs (target = `..`, no `.lpm`) and workspace-member
symlinks (target outside `.lpm/`) are unaffected.

5 new tests:
- `cleanup_stale_entries_removes_legacy_shape_root_symlink`
- `cleanup_stale_entries_preserves_new_shape_root_symlink`
- `cleanup_stale_entries_preserves_workspace_member_symlink`
- `cleanup_stale_entries_preserves_self_reference_symlink`
- `link_finalize_retargets_legacy_root_symlink_after_migration`
  (end-to-end: post-migration install produces a working symlink
  resolving to a real `package.json`)

MEDIUM — `.version` schema-tag must not mask migration. The 61.1
`.version` write at the wrapper root happens BEFORE any wrapper is
materialized; pre-fix, `dir_is_nonempty` counted `.version` as
evidence of a populated layout, so a half-completed install (or any
state where the new root has only `.version`) would silently mask a
needed migration AND make `lpm doctor` report a healthy isolated
install when no wrappers actually existed. Both
`needs_layout_migration` and `install_appears_healthy` consume the
helper.

Fix: `dir_is_nonempty` now skips entries whose name starts with `.`.
Wrapper segments from `LayoutPaths::wrapper_segment` cannot produce
a leading-dot name (path-separator sanitizer is `replace('/', '+')`,
never `.`), so the dotfile filter cannot miss a real wrapper.

2 new tests:
- `needs_layout_migration_true_when_new_root_has_only_version_file`
- `install_appears_healthy_metadata_only_root_is_not_isolated`

4966 workspace tests pass; clippy --workspace -D warnings clean;
cargo fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(lpm-linker): scoped legacy-symlink retarget belt-and-braces

Audit follow-up: the scoped-name branch (`@scope/pkg`) of
`cleanup_stale_entries`'s root-symlink sweep traverses a separate
code path from the unscoped branch. The retarget fix in the prior
commit applies to both, but the existing test only exercised the
unscoped case. This test adds the scoped equivalent so a future
refactor that drops the legacy-shape predicate from the scoped
branch fails loud.

Setup: a `node_modules/@types/node` symlink whose target is the
pre-Phase-61.1 scoped shape (`../.lpm/<seg>/node_modules/@types/node`,
no `wrappers/` segment). After cleanup the legacy symlink must be
removed so Phase 3 recreates it pointing at the new
`../../.lpm/wrappers/<seg>/...` two-level shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request May 8, 2026
…ips during tarball unpack

Phase 66 perf followup #1, samply-driven (2026-05-08). The tar crate's
default unpack path calls `fchmodat` + `fchownat` per regular file —
~9 % of cold-install CPU on `bench/fixture-large` per the
`tar::EntryFields::unpack::set_ownerships` self-time hotspot. For a
256-package install at ~80 entries each, that's ~20 000 unnecessary
metadata syscalls.

We don't preserve perms or ownerships in the global store: extracted
files inherit the lpm process's umask and uid, which is what every
downstream `require()` actually expects. npm tarballs ship with
arbitrary uid/gid + mode bits authored on whoever-built-the-tarball's
OS — they mean nothing on the consumer side.

Disabling both is a 2-line change with zero behavior delta on the
require/import path.

Bench delta (paired n=10 cold/clean on fixture-large):
- extract sum: 116 → 61 ms (Fix #1 + Fix #3 combined)
- total wall: ~5-15 ms recovered

Modest in absolute terms but pure win — the syscalls were doing
nothing useful.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request May 8, 2026
Phase 66 perf followup #3, samply-driven (2026-05-08).
`prepare_output_path` was a ~4.4 % cold-install hotspot per samply's
self-time data: per tarball entry, it walked every path component
calling `symlink_metadata` per component to detect the symlink-attack
case. For an npm tarball with ~80 entries averaging 4 path components,
that's ~320 redundant `symlink_metadata` syscalls per package — the
same intermediate dirs (`src/`, `lib/`, `node_modules/.bin/`, etc.)
get re-stat'd on every entry that lives under them.

Fix: thread a `HashSet<PathBuf>` of verified-already parent dirs
through `extract_tarball_from_reader_with_inspector` and skip the
syscall on cache hit. Only NON-leaf components are cached — the leaf
is the per-entry file path which still needs the symlink-attack
guard.

Safety invariant: the cache only carries verified PARENT directories,
never leaf paths. A leaf hit can never live in the set, so the
symlink-attack guard cannot be bypassed by accident.

Bench delta (paired n=10 cold/clean on fixture-large):
- extract sum: 116 → 61 ms (combined with Fix #1's chmod skip)
- ~320 → ~84 `symlink_metadata` calls per ~80-entry package
- total wall: ~5-10 ms recovered (small because syscall is cached
  by the OS dirent cache and parallelism bounds wall by max-per-task)

Capacity heuristic: HashSet sized for 64 parents up front. Most npm
tarballs have ≤ 10 distinct intermediate dirs; 64 covers the long
tail without over-allocating.

Pinned by the existing 20 extractor unit tests (every path-traversal
edge case still passes after the signature change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request May 8, 2026
Three small, verified wins on cold-install CPU + wall, identified by
symbolicated samply on bench/fixture-large after fixes b03d051 already
landed.

#1 extractor: archive.set_preserve_mtime(false)
   Pre-fix flame attributed 2.0% of active CPU to filetime::set_file_-
   handle_times → fsetattrlist (100% from extract_tarball). mtime is
   meaningless for content-addressable store bytes — require() doesn't
   read it; lpm doctor doesn't use it. tar 0.4.45's preserve_mtime
   defaulted to true; flipping it eliminates the syscall entirely.

#4 extractor: stream_entry_to_disk replaces entry.unpack() for files
   Even with preserve_permissions(false) tar 0.4.45's _set_perms still
   unconditionally calls fs::set_permissions (entry.rs:814 — the flag
   only controls SUID-bit retention). 1.7% of active CPU was __fchmod
   from 100% extract_tarball. New helper does File::create + io::copy
   only — same minimal write semantics as the existing
   write_buffered_entry path.

#2 store: LinkMeta::write_to_unpublished skips inner tmp+rename
   populate_into stages the sidecar inside an unpublished tmp_dir
   (links/<key>.tmp.<pid>.<tid>/) that is published via a single outer
   atomic rename. The inner tmp+rename in write_to was redundant: no
   observer can ever see a half-written sidecar inside an unpublished
   dir. New write_to_unpublished writes the JSON directly. Saves one
   rename syscall per link entry × N packages.

Verification — paired A/B median over 8 iters (worst dropped):

  Stage    | Pre-fix  | Post-fix | Δ
  total    | 998 ms   | 937 ms   | −61 ms (−6.1%)
  fetch    | 355      | 304      | −51 ms  (#1 + #4)
  link     | 138      | 132      |  −6 ms  (#2)

Flame profile confirms target syscalls eliminated:
  set_perms_ownerships: 1.7% → 0
  set_file_handle_times: 2.0% → 0
  __fchmod: 1.7% → 0
  __rename: 10.2% → 7.7%
  LinkMeta::write_to → write_to_unpublished: 4.2% → 0.3%

Tests: cargo nextest run -p lpm-extractor -p lpm-store — 134/134 pass.
Clippy: clean across workspace.

Followups still open (separate tranche):
- #5 fuse extract+analyze (drop redundant 2nd-pass walk, ~10% on-CPU)
- #6 restore event-driven link/fetch overlap (~50-100 ms wall)
- #3 lazy warm-hit sidecar touch (warm-install-only)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request May 8, 2026
…s (v2 path)

Drops the redundant post-extract directory walk + per-source-file disk
re-read that the v2 path inherited from its non-streaming origin. v2's
`extract_object_with_timings` now mirrors v1's
`stream_and_store_package` (lib.rs:594-618): the extractor's
`extract_tarball_from_reader_with_inspector` filters scannable entries
via `PackageAnalyzer::should_scan` and feeds matching entries' bytes
into the analyzer while still in the tar walk's write buffer. The
post-extract `finalize` only reads `package.json` for manifest tags —
the per-source-file pass is gone.

Pre-flame attribution for the eliminated work:
  analyze_package (orchestrator):           3.8% active CPU
  analyze_single_file (per-file fs::read):  4.9%
  collect_source_files_recursive (fs walk): 1.5%
  total redundant cost:                    ~10.2%

Post-#5 flame:
  analyze_package*:                          gone (only manifest 0.4%)
  analyze_single_file:                       gone
  collect_source_files_recursive:            gone
  PackageAnalyzer::feed (during extract):    8.5% (= analyze_bytes work
                                                   moved to fused path)

Bench (single iter, network-noisy resolver swamps total wall):
  fetch_ms:  304 → 264 (−40 ms)
  link_ms:   132 → 124 (−8 ms)
  extract_sum unchanged (analyzer cost folded into the extract phase)
  security_sum: now ~0 (only finalize manifest read remains there)

`tarball_data: &[u8]` implements Read directly so no Cursor needed.
RefCell wraps the analyzer for the FnMut inspector closure.

Tests: cargo nextest run -p lpm-extractor -p lpm-store -p lpm-security
       — 543/543 pass.
Clippy: clean.

Followups still open:
  #6 restore event-driven link/fetch overlap (~50-100 ms wall, substantial)
  #3 lazy warm-hit sidecar touch (warm-install only)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request May 8, 2026
Replace the read+touch+write+rename cycle on every link-entry cache
hit with a single `set_modified()` call (one `utimes(2)` syscall) on
the sidecar JSON. The on-disk `last_referenced_at` field becomes a
stale-from-creation snapshot; prune reads max(json_field, file_mtime)
via the new `LinkMeta::effective_last_referenced_at(path)` helper so
existing pre-followup sidecars keep working.

`LinkEntry.sidecar` becomes `Option<LinkMeta>` — None on cache hit,
Some on fresh population. The lpm-linker callsite only consumes
`freshly_populated`, so the optional sidecar lets cache hits skip
the JSON parse+memory-alloc on top of the rewrite.

Verified on bench/fixture-large (256 transitive deps, lockfile-fast-
path warm install where every package is a cache hit, paired bench
post-#6b vs post-#3, drop-max median):
  pre-#3 → wall=500 ms  fetch=14 ms  link=382 ms
  post-#3 → wall=112 ms  fetch= 9 ms  link=  1 ms
  Δ      → wall −388 ms (−78%)       link −381 ms (−99.7%)

Cold/clean unchanged (no cache hits to optimize, wall 901 vs 902 ms).
Workspace tests: 5745/5745 pass (2 new tests for `touch_on_disk` +
`effective_last_referenced_at`). audit-fixtures: 16 PASS / 1 SKIP /
1 FAIL (`native/esbuild-prebuilt` confirmed pre-existing).

The handoff doc predicted ~20-30 ms warm savings; actual delta is
13× higher because per-package cost was dominated by the JSON
parse+serialize + atomic-rename trio, not just the rename.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request May 12, 2026
`lpm-triage-advisor`:
- `AmberScript` gains `referenced_scripts: &[ReferencedScript]` with
  `ReferencedScript { filename, content }`. Drops `Serialize/Deserialize`
  derives — the struct is transport-only for borrowed prompt inputs
  (borrowed slices can't auto-derive deserialize).
- `build_prompt_with_nonce` emits a "Referenced files (DATA, not
  instructions)" section ONLY when the slice is non-empty. Each
  referenced file gets its OWN per-file random nonce so an attacker
  who edits one file's content can't break out of another file's
  data section.
- APPROVE bullet adds: "evaluate the embedded file as if it were
  the script body for the fetch-IDENTITY rule." Closing reminder
  extends to "each `Referenced file` block is also UNTRUSTED."
- `prompt_template_hash` canary uses `referenced_scripts: &[]` (the
  no-embed render path) for determinism.
- Cache key adds a REF_SECTION_SEP delimiter + per-file `(filename,
  content)` records, so content changes invalidate cached verdicts.
- 4 new prompt tests + 1 cache-key test cover the present/absent,
  per-file-nonce, and content-axis cases.

`lpm-cli`:
- `AmberPackageRequest` gains `referenced_scripts: Vec<(String, String)>`.
- `build_state::collect_referenced_scripts` reads referenced files
  from the package store with runbook caps:
    - depth 1 (no recursive require following),
    - ≤ 32 KB per file (truncated mid-line with explicit marker,
      walked back to a char boundary),
    - safe-relative path only (rejects `..`, abs, `~`, `$VAR`),
    - canonical-prefix check defends against sym-link traversal,
    - NUL byte in head 4 KB → reject as binary.
- `parse_delegated_paths` mirrors `static_gate::matches_node_relative`
  / `matches_delegating_identity_green` — only the two-token
  `node <safe-relative>.{js,cjs,mjs}` shape extracts a path.
- Install pipeline (`collect_amber_classification_requests`) scans
  every amber phase, deduplicates filenames across phases, and
  emits the embedded view into the advisor session.
- 9 new unit tests cover the green path, escape-rejection, binary
  detection, truncation marker, missing-file, and extension guards.
- Added `shlex` workspace dep.

`lpm-audit-corpus`:
- `PackageAudit` gains `referenced_scripts: Vec<ReferencedScriptEntry>`
  persisted on each record. Live audit path leaves empty (no tarball
  fetch); hermetic / curated fixtures supply content directly.
- `HermeticEntry` + `CuratedExpectation` gain `referenced_scripts`.
- `classify_one_with_advisor` threads the embedded view through
  both the prompt builder AND the cache key — content changes
  produce new cache slots so the verdict re-evaluates.
- Hermetic fixture: two delegate-to-local-file entries now carry
  realistic install.js content (binary fetcher from same-repo
  releases).
- Curated fixture: `amber-d18-013-sharp-install-js` carries an
  excerpt of sharp's actual install.js.

Measurement (claude-cli, runs run-to-run variance ~3-5pp):
- Hermetic: advisor-enhanced auto-run 9→10 (one additional Approve
  with embedded view). Stayed at 5 ambers post Lever #4.
- Curated: only 1 entry has referenced_scripts and it was already
  L1-Green'd by Lever #4, so this corpus shows no isolated Lever #3
  movement. Real install.js content shines through the install
  pipeline's file-reader at install time.

Tests: 2257/2257 lpm-security, 17/17 lpm-cli triage, 9/9 new
build_state unit tests, 57/57 lpm-triage-advisor. Clippy + fmt clean.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request May 12, 2026
Phase 46b — triage-layer UX levers (#6 cache, #1 repo URL, #4 L1 widening, #3 embed install.js)
tolgaergin added a commit that referenced this pull request May 14, 2026
…ix (#58)

* test(workflows): pin concurrency + recovery contracts for lpm install

Adds tests/workflows/tests/install_concurrency.rs with 13 falsifiable
tests covering production failure modes that had zero coverage:

Category A — process racing:
  * two concurrent installs on same project (pins finding-#77 floor)
  * install + concurrent store-clean serialize via shared/exclusive
    store_lock (probed via try_with_exclusive_lock on the actual
    lock file, not a directory-existence proxy)
  * two concurrent `lpm install -g` via global_tx_lock — proves
    final manifest + WAL coherence under serialized commits

Category B — interruption recovery:
  * kill mid-tarball-fetch leaves no .lpm/install-hash
  * next `lpm install` converges to a coherent end state

Category C — network faults:
  * tarball 503 → 200 succeeds after retry (counting Respond impl)
  * metadata 404 fails immediately without retry (<2s wall-clock)

Category D — filesystem faults:
  * readonly project dir fails with actionable error (no panic);
    POSIX-only via #[cfg(unix)], RAII guard restores permissions
  * `<project>/.lpm` planted as a regular file fails clearly

Category E — partial state recovery:
  * stale install-hash triggers re-resolve + refetch
  * partial node_modules re-links to full state
  * truncated lpm.lockb either recovers or fails cleanly (no panic)

Category F — WAL recovery hook:
  * torn WAL tail (3 garbage bytes) gets truncated by the dispatcher's
    recovery hook before the command runs; idempotent on re-invocation

Support helper refactor (same commit so the new helper has callers):
  * extracts env-isolation set into `LpmEnvSink` trait +
    `apply_lpm_env(cmd, project)` shared by `lpm()` (assert_cmd) and
    the new `lpm_spawnable()` / `lpm_spawnable_with_registry()`
    (std::process::Command, supports Child::kill())
  * trait impl on both Command variants ensures the two helpers
    cannot drift on the ~30 env knobs that gate test isolation

Surfaced findings during this work:
  * #77 — no project-level install lock: concurrent installs silently
    drop one side's work AND/OR fail with atomic-rename races (3
    observed failure modes documented in findings.md). Fix shape:
    LpmRoot::project_install_lock + with_exclusive_lock_async wrap.
  * #78 — retry-backoff has no test-friendly knob; retry-exhaustion
    tests take 15s+. Fix shape: LPM_RETRY_BACKOFF_MS_OVERRIDE env in
    debug builds.

CI gate locally green:
  clippy --workspace --all-targets -- -D warnings: clean
  cargo fmt --check: clean
  fancy-regex ban: empty
  cargo build --workspace: clean
  cargo nextest run --workspace --exclude lpm-integration-tests:
      6439 passed, 7 skipped, 1 leaky (pre-existing)

Deferred (filed under "next session" in the followup plan):
  B.3 (kill doesn't tear lockfile) — subsumed by B.1/B.2
  B.4 (panic injection) — needs LPM_TEST_PANIC_AT env hook
  C.2 (retry exhaustion) — blocked by finding #78
  C.3 (truncated body) — needs custom Respond with Content-Length mismatch
  D.3 (disk-full simulation) — no portable mechanism
  F.2, F.3 (orphan WAL, torn WAL with real records) — needs
    framed-WAL construction helpers

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(workflows): pin lpm.lock well-formedness + recovery skip-on-contention

Closes B.3 and F.2 of the concurrency tranche — 13 → 15 tests, meeting
the "≥15 of 21" acceptance criterion for Item 2.

B.3 — `install_killed_mid_pipeline_leaves_well_formed_or_absent_lockfile`:
Exercises two SIGKILL windows on the install pipeline — fresh project
and project with a committed lpm.lock from a prior install. After each
kill, asserts the on-disk lpm.lock is either absent OR parses as TOML.
Never half-written. Adds `toml = { workspace = true }` as a workflow-
tests dev-dep for the parse assertion. Helper
`assert_lockfile_well_formed_or_absent` shared between both windows.

F.2 — `lpm_command_skips_recovery_when_another_lpm_holds_global_tx_lock`:
Validates the dispatcher's `try_with_exclusive_lock` idempotent-skip
path at `main.rs:2531`. A background thread acquires `global_tx_lock`
via `lpm_common::with_exclusive_lock` and blocks on a channel. With
the lock held, runs `lpm global list` against a project with a torn-
WAL prefix — asserts the WAL bytes are UNCHANGED (skip arm fired,
recovery did not run). Then releases the lock and re-runs; asserts
the WAL is now truncated (recovery defers correctly to the next
lock-free invocation). Exercises both branches of the `try_with_
exclusive_lock` Ok(None) / Ok(Some) arm.

CI gate locally green:
  cargo clippy --workspace --all-targets -- -D warnings: clean
  cargo fmt --check: clean
  cargo nextest run --workspace --exclude lpm-integration-tests:
      6441/6441 passed, 7 skipped
  5x parallel re-run of install_concurrency: 15/15 stable each run

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(workflows): pin truncated-tarball + orphan-WAL recovery contracts

Two new tests in tests/workflows/tests/install_concurrency.rs:

- C.3 tarball_connection_dropped_mid_body_fails_or_retries: a custom
  wiremock Respond impl serves half a tarball with a Content-Length
  header naming the full length. Pins the install pipeline's
  retry-then-fail behavior on transport-class failures (~14s wall-clock
  for the full 4-attempt retry schedule). Hyper 1.9 server-side panics
  on the Content-Length lie, dropping the connection — a valid surrogate
  for a broken upstream / CDN dropping mid-body. Surfaced 8 tarball GETs
  per install (deterministic, 3-of-3 reproducer), explained by two
  distinct download_tarball_* call sites in install.rs each running the
  4-attempt retry budget.

- F.3 lpm_command_with_orphan_pending_tx_emits_recovery_banner: plants
  both halves of an orphan transaction (WAL Intent record without
  matching Commit/Abort + matching [pending.<pkg>] row in manifest.toml
  pointing at a non-existent install root) and asserts the dispatcher's
  recovery hook fires the RolledBack banner from main.rs:2543. Sets
  RUST_LOG=lpm=info to lift the default lpm=warn filter so the
  tracing::info! line surfaces. Adds lpm-global as a workflow dev-dep
  for WalWriter / IntentPayload / write_for. Pins post-state: orphan
  pending row gone, no spurious active row.

Together these close the C.3 and F.3 gaps in Item 2 of the test
coverage follow-up plan: 17/21 scenarios pinned (was 15/21). The four
remaining items all need source-side hooks (LPM_TEST_PANIC_AT,
LPM_RETRY_BACKOFF_MS_OVERRIDE, container infra) and are out of scope
for this tranche.

Full CI gate green: clippy clean, fmt clean, fancy-regex empty,
6443/6443 nextest pass (was 6441 pre-tranche).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(workflows): pin tarball-extraction security contracts at install tier

New file tests/workflows/tests/tarball_security.rs ships phase 1 of
Item 3 (tarball-extraction security): 5 of 10 planned tests covering
the most distinct security contracts at the install-pipeline tier.
Each test constructs its malicious tarball in-line via tar::Builder
(no checked-in fixtures), serves it through MockRegistry, and runs
lpm install end-to-end so any pipeline-level regression that bypasses
the extractor's hardening is caught.

Tests landed:

- #1 tarball_with_dot_dot_path_entry_is_rejected_by_install — pokes
  package/../escape.txt into the raw tar header bytes; install fails
  with "path traversal detected"; outside sentinel never created.
- #3 tarball_with_absolute_path_entry_is_normalized_to_relative_under_package_dir
  — renamed from "rejected" to reflect actual contract. The extractor's
  strip_first_component consumes the RootDir; an entry like
  /etc/lpm-pwned.txt extracts as node_modules/<pkg>/etc/lpm-pwned.txt.
  Install SUCCEEDS; literal /etc/lpm-pwned.txt is never written.
  Defensible: malformed-but-safe input normalized rather than refused.
- #2 tarball_with_symlink_to_outside_path_is_silently_skipped —
  renamed. The is_file() gate at lib.rs:398 silently drops symlinks;
  install succeeds with byte-identical outside sentinel.
- #5 tarball_with_hard_link_to_outside_file_is_silently_skipped —
  renamed. Same is_file() gate; hardlinks silently skipped; outside
  victim file unmodified.
- #8 tarball_with_setuid_executable_extracts_with_setuid_bit_stripped
  (POSIX-only) — tarball entry mode 0o4755 extracts as 0o755. SUID,
  SGID, and sticky bits all cleared via set_preserve_permissions(false)
  + the explicit `0o644 | exec_bits` mode set after write. Exec bits
  preserved.

Three tests carry a "plan-vs-actual" docstring section explaining why
the rename is defensible — the actual extractor contract differs from
the plan's prescribed phrasing in safe ways, not in regression-grade
ways. No findings filed.

Phase 2 (5 remaining tests: Unicode normalization, device file, FIFO,
zero-byte sanity, OS-max path) is deferred to a follow-up tranche
with rationale + lift estimate documented in the plan. None blocks
phase 1 acceptance.

Pre-merge gate green: clippy clean, fmt clean, fancy-regex empty,
6448/6448 nextest pass (was 6443; +5 for the new tests). 0.18s wall-
clock for the full file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): per-project lock prevents concurrent-install data loss

Closes finding #77. Two `lpm install <pkg>` invocations on the same
project no longer race on the manifest snapshot+commit window.

Pre-fix, both processes acquired only a SHARED store_lock and proceeded
in parallel. Each opened its own per-process ManifestTransaction
snapshot of the pre-edit package.json, staged its own dep on top, and
ran the install pipeline. Whoever wrote package.json + lpm.lock last
won; the other process's edits — including its node_modules link —
silently vanished. Both processes still exited 0 with success-path
output. CI scripts that ran two installs in parallel saw no signal of
the data loss.

The fix introduces:

- crates/lpm-common/src/paths.rs::project_install_lock(project_dir):
  free helper returning <project_dir>/.lpm/.install.lock. Re-exported
  from crates/lpm-common/src/lib.rs.

- run_add_packages and run_install_filtered_add in
  crates/lpm-cli/src/commands/install.rs now wrap the snapshot →
  stage → install → finalize → commit window in
  with_exclusive_lock_async against the project lock. The lock is
  per-project (no cross-project contention) and held across all
  ?-early-exits via the async block's return.

For the workspace path, the lock sits at the discovered workspace root
(not per-member) so two concurrent `lpm install --filter <member>`
invocations on the same workspace serialize without per-member
deadlock-ordering complexity.

run_with_options (the inner install pipeline) does NOT acquire this
lock — it's called from inside both run_add_packages's wrap and from
many other commands; double-acquiring the same fd-lock would deadlock
in-process.

Deferred (phase 2, not exercised by A.1): lpm add (add.rs:723-904)
has a similar 180-line transaction with recursive Swift handling.
Wrapping it is invasive and the race surface is theoretical (users
don't typically run `lpm add` and `lpm install` concurrently). Defer
to a separate tranche if a concurrent `lpm add` × `lpm install` race
is ever observed.

Test contract tightening (bug-first per CLAUDE.md):
two_concurrent_installs_on_same_project_leave_well_formed_manifest in
tests/workflows/tests/install_concurrency.rs went from "at-least-one
survives + manifest is well-formed JSON" (the floor) to "BOTH installs
succeed, BOTH packages present in package.json deps, BOTH packages
linked in node_modules/" (the contract). Pre-fix: 1/1 fail (pkg-b
silently dropped). Post-fix: 5/5 pass with no flakes (~1.2s wall-clock
each — install B observes pkg-a's commit and reports "Resolved 2
packages").

Pre-merge gate green: clippy --workspace --all-targets clean, fmt
clean, fancy-regex empty, 6448/6448 nextest pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(registry): test-only retry-backoff override env knob

Closes finding #78 + lands C.2 (`tarball_503_exhausts_retries_fails_with_http_status`).

Pre-fix, retry-exhaustion tests were blocked: the registry client's
backoff schedule (1+2+4+8s, capped at 10s) made every retry-exhaustion
test take ~15s per fetch site (~28s with the install pipeline's 2
distinct download_tarball_* call sites). MAX_RETRIES, RETRY_BASE_DELAY,
and RETRY_MAX_DELAY are private const with no env override. C.2
therefore had to be #[ignore]-gated behind LPM_RUN_SLOW_TESTS=1, and
the retry-exhaustion contract went unproven on `cargo nextest run`.

The fix introduces:

- crates/lpm-registry/src/client.rs::backoff_override(): reads
  LPM_RETRY_BACKOFF_MS_OVERRIDE (a u64 ms value) gated by
  cfg!(debug_assertions) || LPM_TEST_MODE=1. Returns Some(Duration)
  when both conditions hold; None otherwise. Production retry policy
  is immune — release builds without LPM_TEST_MODE=1 silently ignore
  the env.

- backoff_delay(attempt) consults the override before computing the
  exponential schedule.

- The two 429 Retry-After sleep sites also consult the override so a
  future 429-flood retry-exhaustion test wouldn't hang on the
  server-supplied header.

C.2 test landed alongside (bug-first per CLAUDE.md):

- Mock returns 503 on every tarball request — no recovery path.
- Test sets LPM_RETRY_BACKOFF_MS_OVERRIDE=10 on the lpm subprocess.
- Asserts: install fails non-zero, no panic, ≥4 attempts (proves the
  retry loop fired), elapsed < 2s (load-bearing — without the knob
  this fails at ~14s), stderr contains an actionable HTTP-class
  noun (503 / status / http / network / etc).
- Surfaces 8 tarball GETs per install (4 attempts × 2 distinct
  download_tarball_* call sites — matches C.3's observation).

Pre-fix verification: same C.2 against the unfixed client.rs failed
on the elapsed assertion at 14.04s (knob ignored). Post-fix: passes
in 1.6s cold / 0.1s warm. 5/5 passes with no flakes.

Pre-merge gate green: clippy --workspace --all-targets clean, fmt
clean, fancy-regex empty, 6449/6449 nextest pass (was 6448 pre-fix;
+1 for C.2).

Item 2 of the test-coverage-followup-plan now at 18/21 (was 17/21).
Both findings #77 and #78 fixed in production.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(workflows): tarball-security phase 2 — Unicode, device, FIFO, zero-byte, long-path

Adds 5 more tests to tarball_security.rs, completing Item 3 of the
test-coverage follow-up plan. Each test pins the actual extractor
contract under malicious-or-edge-case tarball shapes that reach the
install pipeline through MockRegistry.

Tests landed:

- #4 tarball_with_unicode_lookalike_parent_dir_extracts_safely_as_literal_bytes
  — renamed from "_normalization_traversal_rejected" to reflect the
  actual contract. Tarball entry path uses full-width dots U+FF0E `..`
  (bytewise NOT ASCII `..`). Component::ParentDir is byte-exact, so
  `..` becomes Component::Normal. Install SUCCEEDS; `..` materializes
  as a literal directory under node_modules/<pkg>/; outside sentinel
  byte-identical. Defensible because Path::components() doesn't
  NFKC-normalize on POSIX.

- #6 tarball_with_character_device_entry_is_silently_skipped
  (POSIX-only). EntryType::Char with /dev/null-shaped major/minor.
  Same is_file() gate as symlinks/hardlinks — silently skipped.
  Install SUCCEEDS; no device file at the expected path.

- #7 tarball_with_fifo_entry_is_silently_skipped (POSIX-only).
  EntryType::Fifo. Same posture as #6.

- #9 tarball_with_zero_byte_regular_file_extracts_as_empty_file.
  Sanity check that empty files still extract correctly (legitimate
  npm shape: .gitkeep, license placeholders).

- #10 tarball_with_single_path_component_exceeding_name_max_fails_cleanly.
  300-byte single-component name, well over POSIX NAME_MAX=255. Tar
  wire format succeeds via GNU long-name extension; the FILESYSTEM
  rejects on extraction (ENAMETOOLONG). Extractor wraps as
  LpmError::Io → install fails non-zero with the OS error visible
  and an actionable noun in stderr.

Three of the five tests are renamed to reflect actual extractor
contract vs the plan's prescribed phrasing — same "plan-vs-actual"
docstring pattern as phase 1. No findings filed; all 10 contracts
across phase 1 + 2 are defensible-as-implemented.

Pre-merge gate green: clippy --workspace --all-targets clean, fmt
clean, fancy-regex empty, 6454/6454 nextest pass (was 6449 pre-tranche;
+5 for the new tests). Full file 0.2s wall-clock for all 10 tests.

Item 3 now COMPLETE (10/10).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(workflows): cross-command flows Item 4 — migrate→rebuild + workspace filter isolation

Closes Item 4 of the test-coverage-followup-plan at 6/6 (target was
≥5). Two additions to tests/workflows/tests/cross_command_flows.rs:

- Plan #1 — extended flow_migrate_install_audit_lockfile_round_trips
  with a `lpm rebuild --dry-run --policy=deny` step. Pins the full
  migrate → install → audit → rebuild lifecycle. Asserts the rebuild
  step exits 0 + does not mutate the post-audit state (lpm.lock +
  lpm.lockb still present). Catches regressions where rebuild's
  lockfile or build-state parser breaks against a freshly-migrated
  manifest.

- Plan #5 — added flow_workspace_install_filter_member_a_does_not_mutate_member_b
  (new test, 159 LOC). Pins the workspace-member isolation contract
  using the workspace-monorepo fixture (3 members: app, core,
  utils):
    1. Initial filtered install on @test/core (re-pinning its
       existing semver dep) populates core's per-member quadruple:
       lpm.lock=319 B, lockb=230 B, install_hash=118 B.
    2. Snapshot core's full quadruple.
    3. Run `lpm install chalk@5.3.0 --filter @test/app` to add a
       new dep to app ONLY.
    4. Assert app's package.json gained chalk; core's quadruple
       (package.json + lpm.lock + lpm.lockb + install-hash) is
       BYTE-IDENTICAL post-install; chalk does NOT appear in
       core's node_modules/.

  Catches a regression where a per-member filtered install
  accidentally also mutates a sibling member's package.json /
  lockfile / install-hash — a real bug class because run_install_filtered_add
  shares the workspace-root project lock (added in #77 fix) and
  could over-snapshot if the target-set computation drifts.

  Helper `mount_pkg_full(mock, name, version)` factors out the
  three-step metadata + batch-metadata + tarball mount so the
  test body stays readable.

Other 4 plan flows already covered pre-tranche:
- Plan #2: flow_add_install_graph_added_dep_visible
- Plan #3: flow_install_patch_patch_commit_install_persists_patch
- Plan #4: flow_token_rotate_publish_dry_run_picks_new_token
- Plan #6: flow_install_upgrade_major_audit_picks_new_version

Pre-merge gate green: clippy --workspace --all-targets clean, fmt
clean, fancy-regex empty, 6455/6455 nextest pass (was 6454; +1 for
the new flow). Plan #5 stable across 5/5 reruns at ~0.11s each.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(install): LPM_TEST_PANIC_AT hook + B.4 panic-rollback contract

Adds a deterministic panic-injection hook to the install pipeline +
unblocks the long-deferred B.4 contract test for ManifestTransaction
Drop-based rollback on panic.

The hook (`maybe_test_panic(stage)` in
crates/lpm-cli/src/commands/install.rs) reads LPM_TEST_PANIC_AT and
panics when the env value matches the stage name. Gated to
`cfg!(debug_assertions) || LPM_TEST_MODE=1` — same pattern as the
#78 retry-backoff override. Production builds without LPM_TEST_MODE=1
silently treat the env as no-op.

Wired 4 stages in `run_add_packages`:
- "after-snapshot" — manifest unchanged; Drop is no-op
- "after-stage" — placeholder `*` written to package.json (load-bearing)
- "after-install" — pipeline complete; manifest still has `*`
- "after-finalize" — concrete versions written; pre-commit only

The hook unblocks B.4 (`install_panics_mid_pipeline_rollback_restores_manifest`),
deferred since the original Item 2 tranche because there was no
deterministic way to trigger a panic mid-install from a workflow
test. Recoverable errors fire `?`-rollback (covered by E.1/E.2/E.3);
SIGKILL bypasses Drop entirely (B.1/B.2/B.3 cover that). The panic
path was the missing rollback proof.

B.4 sets LPM_TEST_PANIC_AT=after-stage and asserts:
- process exits non-zero (panic propagates to runtime)
- stderr contains `"panicked at"` AND `"LPM_TEST_PANIC_AT=after-stage"`
- package.json BYTE-IDENTICAL to pre-stage (Drop ran on unwind, snapshot
  bytes restored — load-bearing)
- the new pkg is NOT in dependencies (placeholder rollback worked)
- .lpm/install-hash absent (invalidate-on-rollback)
- lpm.lock absent (matched optional snapshot's None pre-state)

Catches a regression where:
- panic = "abort" added to release profile (no Drop on panic)
- ManifestTransaction Drop logic stops restoring snapshot bytes
- The `lpm install` snapshot+commit window grows without re-wiring Drop

Test runs in 0.07s warm. 5/5 stable across reruns.

Pre-merge gate green: clippy --workspace --all-targets clean, fmt
clean, fancy-regex empty, 6456/6456 nextest pass (was 6455; +1 for B.4).
install_concurrency now at 19/19. Item 2 of test-coverage-followup-plan
moves to 19/21 — only A.2 (no contract) and D.3 (needs container
infra) remain deferred indefinitely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(workflows): align MockRegistry tarball URL shape with production /-/ gate

Workflow tests mounted tarballs at `/tarballs/{name}-{version}.tgz` —
missing the `/-/` path segment that the registry-client's `evaluate_cached_url`
gate at [crates/lpm-registry/src/client.rs#L4117] requires (`.tgz` suffix
AND `/-/` substring). The gate is a defense-in-depth check that blocks
the H1 auth-token leak: a tampered lockfile URL like `/api/admin/foo.tgz`
(no `/-/`) would otherwise attach the bearer to a non-registry endpoint.

The mismatch produced two test-environment side effects that don't manifest
in production:

1. **WARN noise**: every install test that read a tarball URL from the
   lockfile fast path logged `cached tarball URL for X@Y failed shape check;
   falling back to on-demand lookup`. Polluted stderr across the suite.
2. **`shape_mismatch_count` defeated**: the registry-client documents this
   counter as a "BUG signal — the writer should never emit a gate-rejectable
   URL". Test runs incremented it on every install, making the counter
   useless for catching real bugs.

This commit migrates the mock to the production-shape
`/tarballs/{name}/-/{name}-{version}.tgz` everywhere — both the helper
methods (`MockRegistry::tarball_path` / `tarball_url`) and the ~60
hard-coded `format!` sites across 14 test files + 1 snapshot.

The new `tarball_path` helper is `pub` with a prominent docstring warning
future test authors not to re-introduce the legacy shape. Internal mounts
in `with_package_and_deps` / `with_package_published_at` /
`with_full_package_metadata` all route through it.

Post-fix verification: WARN gone, gate `Accepted` path runs, all 691
lpm-workflows tests pass (0 leaky in the latest full-workspace run, down
from 1-3 leaky pre-fix — fewer fallback paths firing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(workflows): test-coverage-followup tranche — Items 2/3/4/5

Closes the remaining open rows from `private/test-coverage-followup-plan.md`
across four items. ~2,600 LOC of new test code + fixture + budget infra.

**Item 3 — tarball-security additional candidate surfaces (7 tests
in `tarball_security.rs`):**

- `tarball_with_pax_path_traversal_rejected` — PAX extended `path`
  header smuggling `..` is rejected by the extractor's `Component::ParentDir`
  check after the tar crate resolves the override.
- `tarball_with_gnu_longname_traversal_rejected` — symmetric GNU `L`
  entry; same rejection path.
- `tarball_rejects_or_rolls_back_when_later_entry_is_malicious` — pins
  the `rollback_extraction` contract: valid first entry is cleaned up
  when a later `..`-traversal entry trips rejection mid-stream.
- `tarball_with_duplicate_member_path_rejected_or_deterministic` — pins
  current last-write-wins contract (defensible; flagged scanner-
  disagreement risk in test comment).
- `tarball_with_truncated_gzip_rolls_back_partial_extract` — half-
  truncated gzip stream → libdeflate fails cleanly → no partial extract.
- `tarball_ignores_uid_gid_ownership_metadata` (POSIX) — bogus uid/gid
  in tar header is ignored; extracted files owned by process uid.
- `tarball_with_sparse_huge_file_rejected_by_declared_size` — manually-
  constructed tarball with header declaring `MAX_FILE_SIZE + 1` and
  empty on-wire body; extractor rejects on the pre-check at lib.rs:306
  before draining body.

**Item 4 — cross-command flows additional candidate surfaces (2 tests
in `cross_command_flows.rs`):**

- `flow_install_uninstall_install_graph_round_trip` — pins manifest /
  link / graph hand-off through a full round-trip.
- `flow_cache_clean_then_offline_install_uses_store_or_fails_helpfully` —
  pins the cache/store boundary: `cache clean` must not corrupt offline
  install; store-side bytes byte-identical after a clean.

**Item 2 — concurrency/recovery additional candidate surfaces (3 tests
in `install_concurrency.rs`):**

- `cache_clean_during_slow_tarball_install_does_not_corrupt_install`
  (G.4) — install + cache clean run concurrently (different lock paths,
  no serialization); install succeeds despite metadata cache wipe
  mid-stream. Empirical timing observed: install elapsed 1.57s, cache
  clean fired at t=30-39ms cleanly inside the install window.
- `install_panics_after_install_hash_write_rollback_invalidates_hash`
  (G.5) — reuses existing `LPM_TEST_PANIC_AT=after-install` stage (no
  new source-side hook needed — `write_post_install_v6_hash` runs
  inside `run_with_options` which returns BEFORE that stage fires).
  Pins that Drop-based rollback restores manifest AND deletes the
  freshly-written install-hash.
- `malformed_registry_json_fails_without_manifest_or_lockfile_mutation`
  (G.6) — truncated JSON on all three metadata endpoints; install
  fails cleanly, no panic/backtrace, package.json byte-identical, no
  torn lockfile.

**Verdaccio-npm parity for `which@4.0.0`
(`install_real_registry.rs`):**

- `verdaccio_npm_parity_for_bin_package_pins_metadata_and_shim_presence`
  — extends the existing lodash byte-diff with a bin-shipping target
  package. Asserts metadata equivalence + `.bin/<name>` shim present
  on both sides + bin target file materialized + exec bits non-zero
  (POSIX).

**Item 5 — realworld fidelity (new fixture + new test file):**

- `tests/fixtures/realworld-nextjs/` (package.json + README) — pinned
  Next.js 14.2.13 + React 18.3.1 + TypeScript 5.6.3 + 3 `@types/*`
  packages. Resolves to ~28 transitive deps empirically. README
  documents the calibration methodology including raw measurement data.
- `tests/workflows/tests/install_realworld.rs` — `install_realworld_nextjs_fixture_succeeds_through_verdaccio`
  installs the fixture through Verdaccio→npmjs and asserts end-to-end
  success at production scale. Always logs cold + warm wall-clock + peak
  RSS to stderr for calibration data.
- **`LPM_BUDGET_GATE=1`-gated budget assertions**: cold ≤ 25s, warm ≤
  25ms, cold peak RSS ≤ 1500 MiB. Calibrated from N=6 cold + N=3 warm +
  N=3 RSS runs on M-series macOS, 2026-05-14. Memory measurement via
  `/usr/bin/time -l` (macOS) / `-v` (Linux); Windows skips with a clear
  warning.

This closes Item 5 entirely (all 4 acceptance criteria green) and brings
Items 2/3/4 to the parked-by-design or infrastructure-blocked baseline.

CI gate: clippy `--workspace --all-targets -- -D warnings` clean, fmt
clean, fancy-regex empty, build clean, `cargo nextest run --workspace`
6471/6471 pass. Suite runtime ~2:40 (was ~2:24 pre-tranche; +15s for the
realworld test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(workflows): collapse Linux-only let-chain in parse_peak_rss

CI lint on Linux failed on `clippy::collapsible_if` in the Linux-cfg'd
branch of `parse_peak_rss`. The macOS branch had an intermediate
`let bytes_str = rest.trim();` between the two `if let`s, which is why
the local clippy run on macOS didn't catch this — only the macOS-cfg
branch compiled there.

Collapse the Linux branch to use `&&` (stable let-chains) so it
satisfies the lint while preserving the same semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request May 16, 2026
Removes phase numbers, plan section refs (§4.1, §7.2, §11, §12.2,
§18), attribution (GPT, Gemini), date stamps, finding IDs (Finding #N,
D-impl-N), and internal plan labels (Phase 46 P2/P3/P4/P6/P8,
Phase 46b Lever #3/4, Option B, Chunk N) from all comments and doc
strings. Behavioral descriptions are preserved and rewritten in plain
language where needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant