docs(dev-flow): Phase 5 live bake - items #1 #2 #4 #51

Merged
githubrobbi merged 7 commits into main from test/phase-5-preview-bake
Apr 24, 2026
@githubrobbi

Collaborator

What

Scratch PR that doubles as the live validation bench for three Phase 5 checklist items currently unticked in @/Users/rnio/Private/Github/UltraFastFileSearch/docs/architecture/dev-flow-implementation-plan.md:1904-1924:

| # | Item | Priority |
|---|------|----------|
| 1 | Label-trigger path: preview-binaries label fires the preview workflow | normal |
| 2 | Same-SHA integrity: manifest.git_sha + per-file sha256 match PR head | normal |
| 4 | Pre-fast-gate enforcement: deliberate PR Fast CI failure blocks preview | 🔴 Critical |

Bake sequence on this PR

  1. Commit 1 (already pushed): plan-doc ticks for items #1, #2, #4 (lands only if bake passes).
  2. Wait for PR Fast CI / required → green.
  3. Apply preview-binaries label → preview workflow triggers on labeled event. Validates #1.
  4. Preview pipeline runs: gate → verify-pr-fast-green (passes, since PR Fast CI is green) → build-windows + build-test-archive → smoke-windows (executes nextest archive on windows-latest) → manifest.
  5. Download manifest-<sha> + windows-preview-<sha> artifacts. Verify manifest.git_sha == PR head SHA and files[].sha256 == sha256sum(downloaded). Validates #2.
  6. Push commit 2: temporary sabotage (exit 1 first step of fmt job in pr-fast.yml). PR Fast CI / required goes red on the new SHA.
  7. Preview workflow re-triggers on synchronize (label still applied). verify-pr-fast-green polls check-runs for the new SHA, detects PR Fast CI / required = failure, aborts preview before build-windows / build-test-archive start. Validates #4 🔴.
  8. Remove label, push commit 3: revert the sabotage. PR Fast CI / required returns to green.
  9. Squash-merge. Only the plan-doc diff lands on main (sabotage and revert cancel).
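
The per-file half of the integrity check in step 5 can be sketched as a small shell helper. This is a minimal sketch, not the workflow's actual code: `verify_sha256` is a hypothetical name, and extracting `git_sha` and `files[].sha256` from the manifest is left out because the manifest layout is not shown here.

```shell
# verify_sha256 FILE EXPECTED_HEX  (hypothetical helper, not from the workflow)
# Succeeds only if the SHA-256 of FILE equals EXPECTED_HEX.
verify_sha256() {
  file=$1
  expected=$2
  # sha256sum prints "<hex>  <path>"; keep only the hex digest.
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK  $file"
  else
    echo "MISMATCH  $file: expected $expected, got $actual" >&2
    return 1
  fi
}
```

Calling it once per downloaded artifact, with the expected digest taken from the manifest entry for that file, would fail the check on the first mismatch.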

What does NOT get validated here

Close / do-not-merge contract

If ANY step of the bake fails unexpectedly (e.g. preview builds succeed on the sabotaged commit — which would be the catastrophic false-success case for #4), I close this PR without merging, revert nothing, and open a bug report against preview-artifacts.yml.

Checklist

…s PR)

Preemptively mark the Label-trigger / Same-SHA integrity / Pre-fast-gate enforcement validations in §10.3 Phase 5 checklist as baked.  This commit lands only if the live bake on this PR succeeds; if any check fails, the ticks get reverted before the PR is merged.

Bake plan (executed on this PR):

1. This commit (docs-only) runs `PR Fast CI` → green.

2. Apply `preview-binaries` label → validates item #1 (trigger wiring).

3. Wait for preview workflow to complete → download `manifest-<sha>` + `windows-preview-<sha>` artifacts → verify `manifest.git_sha == PR head SHA` and `files[].sha256 == sha256sum(downloaded)` → validates item #2 (integrity).

4. Push a temporary sabotage commit (`exit 1` in `fmt` job of `pr-fast.yml`) → `PR Fast CI / required` goes red → preview workflow re-triggers on synchronize → `verify-pr-fast-green` detects the failure and aborts preview before Windows runner minutes are spent → validates item #4 (🔴 Critical gate enforcement).

5. Revert the sabotage; CI returns to green; PR merged.

Item #3 downgraded from deferred to partial-satisfied: preview's own `smoke-windows` job already does the archive round-trip on `windows-latest` against the pinned SHA.  The bullet now tracks the remaining external-box verification gap only.
@githubrobbi added the `preview-binaries` label ("Apply to trigger opt-in Windows preview artifact build + nextest smoke") on Apr 24, 2026
Bug caught by the live Phase 5 bake on this scratch PR.

Root cause: `cargo nextest --version` on 0.9.132 emits multiple lines where awk `$2` evaluates to `0.9.132` on more than one of them.  The `$(...)` command substitution preserves the inner newline, producing a multi-line value that the GitHub Actions output-file parser rejects with:

    Error: Unable to process file command `output` successfully.

    Error: Invalid format `0.9.132`

Fix: `awk 'NR==1 {print $2}'`, which restricts processing to the first line so no banner / upgrade-notice / self-check line nextest might add later can pollute $GITHUB_OUTPUT.
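
A minimal sketch of the difference, for illustration only. The `sample` text below is made up to reproduce the shape of the problem (field 2 repeating on a later line); it is not a capture of real nextest output, and `first_line_version` is a hypothetical wrapper name.

```shell
# first_line_version: read version text on stdin, print field 2 of line 1 only.
first_line_version() {
  awk 'NR==1 {print $2}'
}

# Hypothetical two-line input where field 2 is the version on both lines.
sample='cargo-nextest 0.9.132 (hypothetical-build-info)
release: 0.9.132'

# Unguarded: prints the version twice, one per line, so a key=value line
# written to $GITHUB_OUTPUT from this value would span two lines.
printf '%s\n' "$sample" | awk '{print $2}'

# Guarded: prints exactly one line.
printf '%s\n' "$sample" | first_line_version
```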

Regression-guard comment added inline so the failure mode stays legible and the fragility is not silently reintroduced by a future refactor.

Also adds a §10.5 Deviations log entry documenting the discovery and fix.  The same PR that surfaces the bug also lands the Phase 5 validation ticks in §10.3 once the re-bake passes.
Second bug caught by the live Phase 5 bake on this scratch PR.

Root cause: the polling cap (60 × 10 s = 10 min, `timeout-minutes: 12`) was calibrated implicitly for the docs-only / short-circuited case where PR Fast CI finishes in under 2 min.  On any full-matrix PR (infra-change or rust-change) the `tests` job alone runs 10–15 min cold plus `test-build` sequentially before it, pushing the aggregator completion to minute 20–25.  The poller keeps seeing `status=missing` because `PR Fast CI / required` is not yet registered as a check-run, and at retry 60 it fails with `⏱️  Timed out waiting for PR Fast CI / required`, a **false negative**: the PR would have gone green 5 min later.

This was anticipated in the plan's Phase 5 notes ("if PR-fast is slower than 10 min, increase the cap in one commit") but nobody had exercised it against a real full-matrix PR before today.

Fix: bump to 120 × 15 s = 30 min polling, `timeout-minutes: 32`.  Factored the magic numbers into `MAX_RETRIES` / `RETRY_DELAY_MS` constants so the next recalibration is a one-line bump rather than a hunt through an inline loop.  Expanded the job header comment with: (a) the 2026-04-23 incident reference, (b) a guardrail against dropping the budget below p99 PR Fast CI wall-clock without adding explicit queue-awareness.
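
The shape of a bounded poller with named constants can be sketched in shell as follows. This is an illustration under assumptions, not the job's real code: the delay is expressed in seconds as `RETRY_DELAY_S` instead of the `RETRY_DELAY_MS` named above, and `poll_check`, `budget_minutes`, and the status-callback convention are hypothetical.

```shell
MAX_RETRIES=120     # number of polling attempts
RETRY_DELAY_S=15    # seconds between attempts (assumption: seconds, not ms)

# budget_minutes: total polling budget implied by the two constants.
budget_minutes() {
  echo $(( MAX_RETRIES * RETRY_DELAY_S / 60 ))
}

# poll_check STATUS_CMD
# STATUS_CMD is called with the attempt number and must print one of
# "success", "failure", or anything else (treated as "not ready yet").
# Returns 0 on success, 1 on failure, 2 on timeout.
poll_check() {
  status_cmd=$1
  attempt=1
  while [ "$attempt" -le "$MAX_RETRIES" ]; do
    status=$("$status_cmd" "$attempt")
    case "$status" in
      success) echo "green at retry $attempt/$MAX_RETRIES"; return 0 ;;
      failure) echo "red at retry $attempt/$MAX_RETRIES"; return 1 ;;
    esac
    sleep "$RETRY_DELAY_S"
    attempt=$(( attempt + 1 ))
  done
  echo "timed out after $MAX_RETRIES retries"
  return 2
}
```

Keeping timeout (2) distinct from a detected failure (1) makes a false negative like the one above easy to tell apart from a genuinely red aggregator.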

§10.5 Deviations log gains a second entry for this.  §10.3 Phase 5 Notes updated: the stale "10 minutes" claim now reads "120 × 15 s = 30 min" with a cross-ref to §10.5.
Live bake on this scratch PR surfaced a deeper upstream blocker than any of the three fixes landed here:

  * `cargo nextest archive` defaults to debug profile

  * Debug xcompile to x86_64-pc-windows-msvc produces a ~5.5 GB `polars-ops` rlib

  * That rlib exceeds the COFF archive format's string-table offset capacity

  * `lld-link` (and likely native `link.exe`) dies with "truncated or malformed archive"

Root-cause + fix recipe already live at `docs/xwin-msvc-rlib-size-root-cause-and-workarounds.md` (dedicated `xwin-dev` profile + per-package polars overrides).  Being worked on a concurrent branch.

Impact on Phase 5 validation bake:

- Item #1 (Label-trigger path): still ✓. The preview workflow correctly triggers on the `labeled` event, and the `gate` + `verify-pr-fast-green` jobs both ran against the pinned PR head SHA before the pipeline failed downstream.  Trigger wiring is proven.

- Item #2 (Same-SHA integrity): reverted to ❌ with a blocker note pointing at the polars issue + §10.5 log entry.  Can't validate manifest integrity without a completed preview pipeline.  Re-bake on the next preview run after the polars fix lands.

- Item #3 (Nextest round-trip): "partially satisfied by smoke-windows" claim softened to "will be, once the polars blocker is resolved", since smoke-windows depends on build-test-archive.

- Item #4 (Pre-fast-gate enforcement, 🔴 Critical): unaffected by the polars blocker.  Validated separately via the sabotage commit that follows this one on the same PR.

Plan updates:

- §10.3 Phase 5 checklist: items #2 and #3 revised as above.

- §10.5 Deviations: new entry consolidating the investigation (proximal lib.exe error → xwin subcommand gap → polars-rlib ceiling) and documenting why my attempted "move build-test-archive to windows-latest" was rolled back (it would have shifted the failure mode to a polars-rlib error instead of a lib.exe error, not actually fixed anything, and would have collided with the concurrent branch's xwin-centric direction).

- §10.6 Active: prepended the polars blocker as the top-priority active item.
Adds `- run: exit 1` as the first step of the `file-size` job so `PR Fast CI / required` goes red on the pinned SHA.  The preview workflow should re-trigger on synchronize (preview-binaries label still applied) and `verify-pr-fast-green` should correctly detect the failed aggregator and fail the preview at the gate, before `build-windows` / `build-test-archive` / `smoke-windows` start.  That's the 🔴 Critical Phase 5 item #4 validation.

REMOVE-BEFORE-MERGE.  Next commit on this branch is the revert.  Squash-merge cancels both.
@githubrobbi removed the `preview-binaries` label ("Apply to trigger opt-in Windows preview artifact build + nextest smoke") on Apr 24, 2026
Replaces the preemptive placeholder note with the actual evidence from the live sabotage bake on this same PR:

- Sabotage target: `file-size` job (not `fmt`: fmt doesn't run on infra-only changes due to its `if: rust=true` gate).

- Sabotaged SHA: 0600ce6.

- `PR Fast CI / required` = FAILURE on that SHA.

- `verify-pr-fast-green` detected the red aggregator at poll retry 48/120 and set `core.setFailed`.

- Downstream `build-windows` / `build-test-archive` / `smoke-windows` / `manifest` all correctly stayed `skipped`: zero Windows runner minutes spent on a red PR.

This is precisely the 🔴 Critical behavior the gate was designed for.  The previous commit's revert undoes the sabotage so the file-size policy check returns to normal.
@githubrobbi marked this pull request as ready for review April 24, 2026 04:32
@githubrobbi merged commit b9a67f2 into main Apr 24, 2026
19 checks passed
@githubrobbi deleted the test/phase-5-preview-bake branch April 24, 2026 04:32
githubrobbi added a commit that referenced this pull request Apr 24, 2026
… spawn (bug #4)

Surfaced by the preview re-bake on PR #52 (run 24873105115, SHA dbdbbb7).  `build-windows` failed with:

    winresource: failed to embed icon + manifest:

    Os { code: 2, kind: NotFound, message: "No such file or directory" }

from `crates/uffs-cli/build.rs:106`.

## Root cause

`winresource v0.1.31` at `src/lib.rs:735-736` hardcodes `PathBuf::from("llvm-rc")` on `cfg(unix)` and spawns it unqualified.  `cargo-xwin` wires MSVC CRT/SDK env but does NOT prepend any LLVM `bin/` dir to PATH.  On ubuntu-22.04 runners `llvm-rc` is preinstalled but lives at `/usr/lib/llvm-<N>/bin/llvm-rc` (not default PATH).

Net: `res.compile()` spawns `"llvm-rc"` → `execvp` → ENOENT → panic.

## Why this is bug #4 in the same logjam

Three bugs prevented the preview lane from reaching `build-windows` on prior runs:

  #1. nextest multi-line output → `$GITHUB_OUTPUT` parse error  (PR #51)

  #2. 10-min polling budget → `verify-pr-fast-green` false-negative  (PR #51)

  #3. `build-test-archive` on ubuntu-22.04 + xwin gap → `lib.exe` NotFound  (this PR, earlier commit)

  #4. `winresource` hardcoded `llvm-rc` + missing PATH → ENOENT  (this commit)

Each bug masked the next.  Bug #4 stayed latent because bug #3 always aborted the preview before `build-windows` finished compiling polars-ops to reach the `uffs-cli` build.rs call.

## Fix

Added a `Locate llvm-rc` step to `build-windows` that scans `/usr/lib/llvm-*/bin/llvm-rc`, picks the highest-versioned match via `sort -V | tail -1`, and exports `RC_PATH` to `$GITHUB_ENV`.  `winresource` honors `RC_PATH` at `lib.rs:733-734` ahead of the hardcoded fallback, so no crate patch needed.

Version-robust against runner image bumps: if LLVM 15 is replaced with 16 tomorrow, the sort still picks up the new binary.
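
A minimal sketch of what such a locate step could look like. It is not the step's actual script: `pick_latest_llvm_rc` and `export_rc_path` are hypothetical names, and the search root is a parameter only so the logic can be exercised outside a runner.

```shell
# pick_latest_llvm_rc [ROOT]
# Print the highest-versioned ROOT/llvm-*/bin/llvm-rc (ROOT defaults to /usr/lib).
# Prints nothing if no executable candidate exists.
pick_latest_llvm_rc() {
  root=${1:-/usr/lib}
  for candidate in "$root"/llvm-*/bin/llvm-rc; do
    # An unmatched glob stays literal and fails this test, so it is skipped.
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
    fi
  done | sort -V | tail -n 1
}

# export_rc_path [ROOT]
# Append RC_PATH=<path> to the file named by $GITHUB_ENV so later steps see it.
export_rc_path() {
  rc_path=$(pick_latest_llvm_rc "${1:-/usr/lib}")
  if [ -z "$rc_path" ]; then
    echo "llvm-rc not found under ${1:-/usr/lib}" >&2
    return 1
  fi
  echo "RC_PATH=$rc_path" >> "${GITHUB_ENV:?GITHUB_ENV must be set}"
}
```

Version sort matters here because a plain lexical sort would rank `llvm-9` above `llvm-15`.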

## Docs

Added §10.5 row (2026-04-24) for bug #4 with full crate/runner anchoring.