Skip to content

Flip install resolver default to greedy-fusion#20

Merged
tolgaergin merged 2 commits into
mainfrom
default-fusion-resolver
Apr 29, 2026
Merged

Flip install resolver default to greedy-fusion#20
tolgaergin merged 2 commits into
mainfrom
default-fusion-resolver

Conversation

@tolgaergin
Copy link
Copy Markdown
Contributor

Summary

Make greedy-fusion the global install default. Phase 56 W4 (commit 102353e) made fusion the default for the greedy arm (opt-in via LPM_RESOLVER=greedy); PubGrub-with-split-retry + legacy walker remained the global install default. After Phase 60 (lpm add source-anywhere) shipped clean, flip the global default so lpm install (no env vars) reaches the fast path everyone has been measuring against.

Resolver dispatch matrix (post-flip)

LPM_RESOLVER LPM_GREEDY_FUSION Result
unset (default) unset / non-"0" greedy-fusion (NEW)
unset (default) "0" greedy + legacy walker
"greedy" unset / non-"0" greedy-fusion
"greedy" "0" greedy + legacy walker
"pubgrub" (any) PubGrub + legacy walker

Escape hatches:

  • LPM_RESOLVER=pubgrub — full opt-out to the previous install default. Use only if you hit a greedy-fusion edge case in the wild and need a tested fallback.
  • LPM_GREEDY_FUSION=0 — opt-out from the fused dispatcher to the legacy walker arm with greedy resolver behavior held constant.

What changed

  • crates/lpm-cli/src/commands/install.rs:3825 — fusion gate: == "greedy"!= "pubgrub"
  • crates/lpm-resolver/src/resolve.rs:317 — legacy-walker greedy gate: same flip
  • 3 PubGrub-arm-specific resolver tests (split-retry / npm-alias range parsing) now pin LPM_RESOLVER=pubgrub via a guard struct + StdMutex serialising env-var mutation across async tests, mirroring the pattern in lpm-vault/src/sync.rs
  • 1 workflow test (install_json_output_contains_package_list) asserts the PubGrub-walker arm's timing.resolve.streaming_bfs telemetry contract; subprocess gets LPM_RESOLVER=pubgrub via Command::env

Test plan

  • cargo clippy --workspace -- -D warnings clean

  • cargo fmt --check clean

  • cargo build --workspace clean

  • cargo nextest run --workspace --exclude lpm-integration-tests4926/4926 pass

  • n=20 bench, cold install, bench/fixture-large:

    Arm n median mean trim-mean (10%) stdev
    default (post-flip) 20 984 ms 1149 ms 1079 ms 371 ms
    greedy-fusion (explicit) 20 925 ms 923 ms 923 ms 40 ms
    bun 20 853 ms 901 ms 874 ms 185 ms

    All fusion gates PASS for the default arm: hard ≤1500 ms (516 ms margin), stretch ≤1000 ms (16 ms margin), stdev 371 ms ≤500 ms. Default median 984 ms is inside the bench/scripts/README.md "all good" band of 850-1050 ms.

    On the variance asymmetry between default (stdev 371 ms) and explicit greedy-fusion (stdev 40 ms): the dispatch logic is identical post-flip, so this is most likely round-robin position bias. The bench script runs default → greedy-fusion → bun per outer iter, so default starts each iter without warm npm DNS/TCP state from a prior arm; greedy-fusion benefits from that warmth. Real users see the default arm's behavior (no prior arm's network warmth), and 984 ms still beats both the hard and stretch gates. bun also had unusually high variance this run (185 ms vs typical 112 ms in W4 baseline), suggesting variable network conditions.

Pre-existing follow-up not addressed here

Lockfile::new() at crates/lpm-lockfile/src/lib.rs:245 hardcodes resolved_with: "pubgrub" even when greedy-fusion writes the lockfile. This was already wrong post-Phase-56-W4 whenever LPM_RESOLVER=greedy was set; the flip widens the bug to the default path. Worth a follow-up that records the actual resolver used at lockfile-write time. Not blocking this merge — resolved_with is purely informational metadata; no code branches on it.

🤖 Generated with Claude Code

tolgaergin and others added 2 commits April 29, 2026 16:51
Phase 56 W4 (commit 102353e) made greedy-fusion the default *for the
greedy arm* — opt-in via `LPM_RESOLVER=greedy`. PubGrub-with-split-retry
+ legacy walker remained the global install default. After Phase 60
shipped the `lpm add` work, flip the global install default.

Resolver dispatch matrix (post-flip):

| LPM_RESOLVER    | LPM_GREEDY_FUSION | Result                  |
|-----------------|-------------------|-------------------------|
| unset (default) | unset / non-"0"   | greedy-fusion (NEW)     |
| unset (default) | "0"               | greedy + legacy walker  |
| "greedy"        | unset / non-"0"   | greedy-fusion           |
| "greedy"        | "0"               | greedy + legacy walker  |
| "pubgrub"       | (any)             | PubGrub + legacy walker |

Escape hatches:
  - `LPM_RESOLVER=pubgrub` — full opt-out to the previous install
    default (PubGrub-with-split-retry + walker).
  - `LPM_GREEDY_FUSION=0` — opt-out from the fused dispatcher to the
    legacy walker arm with the resolver behavior held constant.

Code change is the one-line semantic flip at:
  - install.rs:3825 — fusion gate (`== "greedy"` → `!= "pubgrub"`)
  - resolve.rs:317 — legacy-walker greedy gate (same flip)

Test fixes
- 3 PubGrub-arm-specific resolver tests (split-retry / npm-alias range
  parsing) now pin `LPM_RESOLVER=pubgrub` via a guard struct + StdMutex
  serialising the env-var mutation across async tests, mirroring the
  pattern in lpm-vault/src/sync.rs.
- 1 workflow test (`install_json_output_contains_package_list`) asserts
  the PubGrub-walker arm's `timing.resolve.streaming_bfs` telemetry
  contract; subprocess gets `LPM_RESOLVER=pubgrub` via Command::env.

Validation
- cargo clippy --workspace -- -D warnings: clean
- cargo fmt --check: clean
- cargo nextest run --workspace --exclude lpm-integration-tests: 4926/4926 pass
- n=20 bench (cold install, bench/fixture-large):
    default       median 984 ms,  stdev 371 ms (incl. 4 network outliers)
    greedy-fusion median 925 ms,  stdev  40 ms
    bun           median 853 ms,  stdev 185 ms
  All fusion gates PASS for the default arm: hard ≤1500 ms, stretch
  ≤1000 ms, stdev ≤500 ms. Default median is 984 ms which is in the
  bench README's "all good" band of 850-1050 ms. The +225 ms paired
  delta vs the explicit greedy-fusion arm is most likely round-robin
  position bias (default runs first per outer iter and doesn't
  benefit from the warmer npm DNS/TCP state that the second arm sees);
  the dispatch logic is identical post-flip.

Pre-existing follow-ups (not addressed here):
- `Lockfile::new()` hardcodes `resolved_with: "pubgrub"` even when
  greedy-fusion writes the lockfile. This was already wrong post-W4
  whenever `LPM_RESOLVER=greedy` was set; the flip widens the bug to
  the default path. Worth a follow-up that records the actual resolver
  used at lockfile-write time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's `cargo clippy --workspace --all-targets -- -D warnings` (the lint
job invocation, which my local pre-push gate missed by running plain
`--workspace`) caught `await_holding_lock` on the env-lock guards I
added in the previous commit. The 3 PubGrub-pinned resolver tests
hold `let _env = env_lock().lock().unwrap()` across the `.await` on
`resolve_with_prefetch`, which is unsafe under a single-threaded
runtime and dangerous under any runtime.

Switch the env-mutation lock from `std::sync::Mutex` to
`tokio::sync::Mutex` (async-aware), mirroring the existing pattern in
`crates/lpm-vault/src/sync.rs`. The three test sites get
`.lock().await` instead of `.lock().unwrap()`.

Local re-run of the full CI gate now passes:
  cargo clippy --workspace --all-targets -- -D warnings  → clean
  cargo fmt --check                                       → clean
  cargo nextest run --workspace --exclude lpm-integration-tests
    → 4926/4926 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tolgaergin tolgaergin merged commit e9792f5 into main Apr 29, 2026
3 checks passed
@tolgaergin tolgaergin deleted the default-fusion-resolver branch April 29, 2026 17:18
tolgaergin added a commit that referenced this pull request Apr 29, 2026
Highlights since v0.27.0:

- **Phase 60: `lpm add` source delivery from any registry** (#19).
  Decouples `lpm add` from the LPM-only package identity. `react`,
  `lodash.merge`, `@juggle/resize-observer`, `@private/internal-pkg`
  via `.npmrc` — anything on any registry works. Adds destination-
  side path containment, file-spool tarball download, version-spec
  resolution (dist-tags + semver ranges), and a non-interactive
  simple-path guard. ~44 new tests.

- **Resolver default flipped to greedy-fusion** (#20). `lpm install`
  with no env vars now reaches the fast path (~1s on
  `bench/fixture-large`) directly. PubGrub-with-split-retry remains
  as `LPM_RESOLVER=pubgrub` opt-out; `LPM_GREEDY_FUSION=0` still
  falls back to the legacy walker arm.

- **README benchmarks updated** (#21, #22). New round-robin lpm-vs-bun
  methodology (`bench/scripts/run-readme.sh`) + corrected `bun.lock`
  wipe in `bench/run.sh` give honest apples-to-apples numbers on
  `bench/fixture-large` (266 packages, the Phase 49+ ship-gate
  fixture). Cold install equal-footing: lpm 962ms vs bun 1005ms
  (0.96×); warm/up-to-date/script-overhead/lint/fmt unchanged.

CI gate green on this commit:
- cargo clippy --workspace --all-targets -- -D warnings: clean
- cargo fmt --check: clean
- cargo build --workspace: clean
- cargo nextest run --workspace --exclude lpm-integration-tests: 4926/4926

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request May 6, 2026
…hness fold, single-writer ownership

Phase 64 findings cluster (#20, #24, #31, #32, #39) plus follow-on
multi-pass linker validation work and the upstream `lpm download`
panic fix.

Linker validation (#20):
- Shared `linker_config::resolve_effective_linker` is the single
  source of truth for the precedence chain (CLI > config.toml >
  LPM_LINKER > package.json > default isolated). Used by both the
  install pipeline validation seam AND the freshness check, so they
  can't drift.
- New `LinkerCli` clap value-enum on `lpm install --linker` clamps
  unknown values at parse time. `LinkerMode::parse_str` + warn-on-
  failure on the env / config.toml surfaces.
- Resolution moved into `run_with_options_under_store_lock` so every
  caller (add, upgrade, dev, migrate, deploy, dlx, install -g, global
  update, doctor) honors the same chain without per-callsite wiring.
- Validation runs BEFORE the up-to-date fast-exit so invalid env
  values fail loud regardless of cache state.

v6 install-hash schema:
- `compute_install_hash_v6(pkg, lock, file_link, linker_mode)` folds
  the resolved linker mode into the SHA-256.
- `.lpm/install-hash` gains an `l:<isolated|hoisted>` line so the
  mtime fast path detects post-install env/config linker flips
  without re-hashing.
- Schema bump v5 -> v6 invalidates stale caches once.

Single-writer ownership of `.lpm/install-hash`:
- New `write_post_install_v6_hash` helper called from the empty-deps
  short-circuit AND the canonical end-of-function path; freshness
  stays consistent across exit paths.
- New `materialize_empty_install_artifacts` writes `lpm.lock` /
  `lpm.lockb` / `.gitattributes` and creates `node_modules/` for the
  empty-deps case so a freshly-installed empty-deps project actually
  satisfies the freshness gate on the next probe.
- `dev.rs::auto_install_if_stale` no longer writes a competing bare-
  hash post-install (was clobbering v6 mtime/linker metadata).
- Single mtime probe in `check_install_state_with_linker` (was
  probed twice on the stale path).
- `tracing::warn!` on hash-write failure (was silent `let _ = ...`).

Other Phase 64 fixes:
- #24: `LPM_CONCURRENT_DOWNLOADS` warns on invalid input + documents
  the default. Stale "16-wide" comment generalized.
- #31: `lpm download --json` envelope adds `tarball_url`, `integrity`
  (SRI string), canonicalized absolute `output_dir`.
- Workspace test helper scrubs `LPM_LINKER` and
  `LPM_CONCURRENT_DOWNLOADS` so a developer's exported state can't
  pollute test results.

Upstream `lpm download` panic fix:
- Pre-existing on main: `lpm download <pkg>` panicked with "Mismatch
  between definition and access of \`version\`" because the global
  `--version: bool` flag collided with the `Download.version:
  Option<String>` subcommand arg.
- Removed `long = "version"` from the global flag (now `-V` / `-v`
  short-only). Renamed Rust field `version` -> `version_flag`.
- New `argv_requests_top_level_version` pre-clap argv check handles
  the bare `lpm --version` case before clap parses anything.
- Subcommand `version` fields renamed to `package_version` with
  `#[arg(long = "version")]` keeping the user-facing flag intact.
- Same root cause was breaking `lpm info <pkg> --version <v>` too;
  both fixed.

Tests added:
- 8 linker workflow tests (CLI flag rejection, package.json rejection,
  env var rejection, sync-fast-lane invalid-linker fail-loud,
  freshness-cache-invalidates-on-LPM_LINKER-flip, migrate-honors-env,
  empty-deps-writes-v6, empty-deps-second-install-up-to-date).
- 2 download/json_output workflow tests (version-flag + canonicalize-
  output_dir contract).
- 4 in-crate parser tests for the version-flag collision fix.
- 1 dev composed test (`#[tokio::test]` + ScopedEnv isolation,
  invokes auto_install_if_stale and asserts v6 shape on disk).
- 2 install_state unit pins (linker fold + mtime fast path bails on
  stored-linker mismatch).
- 4 linker_config unit tests using the existing process-wide
  `test_env::ScopedEnv` env-mutation lock.

CI gate: clippy clean (-D warnings), fmt clean, build clean,
nextest 5600/5600 pass / 7 skipped, lpm-auth deterministic across
two parallel runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant