feat: unified install integrity — .mcpp_ok marker + auto-cleanup#73
Merged
Conversation
Unified mechanism for detecting and recovering from interrupted installs (Ctrl+C, network failure, kill -9). Applies to all package types: toolchains, bootstrap tools, and modular libraries. src/fallback/install_integrity.cppm: - is_install_complete(): check .mcpp_ok marker or backward-compat heuristic - mark_install_complete(): write .mcpp_ok after verified install - clean_incomplete_install(): remove directory if incomplete - clean_all_incomplete(): scan xpkgs/ and clean all residue resolve_xpkg_path() now: 1. Check complete (marker or heuristic) → use 2. Clean incomplete residue → install → mark complete 3. Install failed → clean residue → copy fallback → mark complete 4. All failed → clear error with hint mcpp self init now scans and cleans incomplete xpkgs. Bootstrap tools (patchelf/ninja) get .mcpp_ok after ensure.
1. is_install_complete: handle mcpplibs nested layout (single subdir with src/ or mcpp.toml), prevents false deletion of old packages 2. copy_from_global: check return value and verify completeness before marking .mcpp_ok; clean partial copies on failure 3. Restore !inst error propagation in resolve_xpkg_path — don't mask xlings launch/protocol errors behind generic "payload missing" 4. Per-command bootstrap gating: check_base_init is deferred to get_cfg(requireBootstrap=true) in build/run/toolchain-install, not run globally in load_or_init. Light commands (self env, toolchain list) skip the check.
1. toolchain install now calls check_base_init() before proceeding, failing early if patchelf/ninja bootstrap is incomplete 2. Preserve original xlings install error (exit code + message) in final error instead of masking with generic "payload missing" 3. One-time legacy migration: migrate_legacy_installs() scans xpkgs and writes .mcpp_ok markers for old complete packages on first run. is_install_complete() still has legacy fallback for un-migrated packages but migration ensures it's rarely needed. 4. Clean trailing whitespace in .agents/docs/*.md
Round-3 review fixes: 1. Bootstrap marker: mark_install_complete() only called after verifying the actual binary (bin/patchelf, bin/ninja) exists, not just after ensure_*() returns (which may have failed). 2. Remove automatic migrate_legacy_installs() from load_or_init(). Heuristic-based marker writing could stamp half-extracted packages as complete. Legacy heuristic remains in is_install_complete() as read-only fallback (won't delete old packages), but .mcpp_ok is only written on explicit success paths or via `mcpp self init`.
Round-4 review fixes: 1. clean_incomplete_install() now uses marker-only check: - Has .mcpp_ok → keep (verified install) - No marker but looks_complete_legacy → keep (pre-upgrade package) - No marker, no legacy content → clean (genuinely incomplete) This prevents half-extracted packages that happen to have bin/lib from escaping cleanup. 2. Remove migrate_legacy_installs() — it was dead code (declared but never called). The legacy fallback in is_install_complete() handles old packages read-only without writing markers.
Round-4 review fix: clean_incomplete_install() now uses STRICT marker-only semantics. Used on the resolve/install path for the CURRENT target — absence of .mcpp_ok unambiguously means the install attempt was incomplete. A half-extracted dir with bin/ would otherwise escape cleanup and corrupt subsequent installs. clean_all_incomplete() (global scan via `mcpp self init`) keeps the legacy-aware behavior: packages without marker but with legacy content dirs are preserved for backward compatibility with pre-upgrade installs. is_install_complete() retains the legacy fallback for read-only compat in resolve_xpkg_path() — old packages are recognized as usable, but this doesn't shield them from explicit cleanup on the install path.
Round-5 review fix: is_install_complete() is now strict marker-only. No more legacy heuristic fallback on the resolve/install path. Rationale: from directory layout alone we cannot distinguish a legacy-complete install (bin/ exists, full) from a half-extracted residue (bin/ exists, partial). Adopting the latter silently corrupts the user's toolchain. Strict semantics close this gap. Cost: upgrade users do a one-time reinstall per toolchain. In practice this hits the fast copy_xpkg_from_global() fallback that reuses ~/.xlings/, so it's rarely a real download. clean_all_incomplete() (mcpp self init) still preserves legacy packages (no marker + legacy layout) as user-visible assets — that's a separate concern from the resolve path's strictness. looks_complete_legacy() is now exported for explicit legacy-aware call sites (currently only clean_all_incomplete uses it).
Round-6 review fix:
After Round-5 made is_install_complete() strict marker-only, the copy
fallback path broke:
bool copyOk = copy_xpkg_from_global(verdir);
if (copyOk && is_install_complete(verdir)) { // ← always false
mark_install_complete(...); // never reached
return make_payload();
}
clean_incomplete_install(verdir); // ← wipes the copy
copy_xpkg_from_global() doesn't (and can't) write .mcpp_ok in the
copied directory, so the marker-only check would always fail, and the
just-copied package would be immediately wiped, returning "xpkg payload
missing".
Fix: validate the copied content via looks_complete_legacy() (the
structural heuristic) before writing the marker. This is safe in this
context because:
1. Step 2 of the resolve chain already cleaned any pre-existing
residue using strict marker-only semantics — so anything at
verdir now MUST be the result of our just-completed copy.
2. copy_xpkg_from_global() only returns true on a clean recursive
copy (no partial copies reach this branch).
3. The heuristic validates that the source actually had content
(rules out copying from an empty/broken global xlings dir).
This restores the documented "copy_xpkg_from_global is the typical fast
fallback" behavior that Round-5 unintentionally broke.
Round-7 review fix:
Previously the copy fallback ran after ANY xlings install failure
(exitCode != 0), copying whatever was in ~/.xlings/ and validating
with looks_complete_legacy(). That heuristic only checks for top-level
bin/lib/include/share — a half-extracted residue in the GLOBAL xlings
directory (which we cannot clean) would pass this check and get
marked as complete, permanently masking the broken install.
Fix: split into three branches based on what xlings reported.
- exitCode == 0 && verdir exists → normal success, mark complete
- exitCode == 0 && verdir missing → XLINGS_HOME propagation bug;
this is the ONLY scenario where
we trust the global location
enough to fall through to copy
- exitCode != 0 → genuine install failure;
propagate the original error
without trying global copy
(global may also be residue
from the same failure, and
looks_complete_legacy can't
tell them apart)
Also clarifies the autoInstall=false branch: still allow copy from
global if the user previously installed via system xlings (no install
attempt to confuse the state).
The autoInstall=false branch was performing copy_xpkg_from_global recovery without a "this session's xlings install reported success" witness, falling outside the safety boundary established in round-7. Currently no caller passes autoInstall=false, so this is a no-op cleanup that removes a future foot-gun: anyone adding such a caller would inadvertently re-introduce the "half-extracted residue marked as complete" window. Semantic: when the caller explicitly disables auto-install, do not perform any implicit recovery — return "payload missing" so the caller (and the user) sees the truth instead of a silently-recovered possibly-broken package.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Unified mechanism for detecting and recovering from interrupted installs. Fixes the "Ctrl+C then permanently stuck" problem for all package types.
How it works
.mcpp_okmarker file written after every verified successful installis_install_complete()is strict marker-only — no legacy heuristic fallback on the resolve/install pathclean_incomplete_install()is also strict marker-only — any directory without.mcpp_okis treated as residue from an interrupted install~/.xlings/, the content is validated vialooks_complete_legacy()(structural check) before writing the markerclean_all_incomplete()(used bymcpp self init) preserves legacy packages (no marker but valid layout) — user-visible asset protection only, separate from the install pathTrade-offs
copy_xpkg_from_globalfrom~/.xlings/(no real download)mcpp self init: preserves legacy packages so users don't lose themChanges
src/fallback/install_integrity.cppm— unified integrity APIresolve_xpkg_path()— strict marker check → clean residue → install → copy fallback (with content validation)mcpp self init— scans and cleans incomplete xpkgs (legacy-aware)ensure_patchelf/ensure_ninja) — verify actual binary before marking completetoolchain install— explicit bootstrap integrity check at entryUser experience
Test plan
self initcleans incomplete, keeps complete legacytoolchain installwrites.mcpp_okmarker after successbuild + runworks normallytoolchain installfails early if bootstrap is incomplete