fix: first-run breakage (#559, #561) + #560 platform-aware diagnosis #562
Open
ruvnet wants to merge 3 commits into
…gnosis

Three related fixes. A fresh-clone user hitting any of these would conclude the project doesn't work; #557's "feels like mock" narrative is fed in part by these breakages.

## #559: `./verify` pointed at removed `v1/` paths

The wrapper hard-coded `v1/data/proof` / `v1/src`, but the proof scripts moved to `archive/v1/` long ago. A fresh clone failed before the pipeline could even run. User `Fewmanism` provided the exact diff in the issue. Applied verbatim across four hits (PROOF_DIR, V1_SRC, the Phase 3 scan-message, and the SKIP-state recovery hint).

    ./verify   # now PASS end-to-end

## #561: firmware README would misflash and point at the wrong provisioner

Two real bring-up bugs:

1. Manual flash command put the app at `0x10000`. The partition tables (`partitions_display.csv`, `partitions_4mb.csv`) define `ota_0` at `0x20000`. `0x10000` is the start of `phy_init` data, so flashing the app binary there would corrupt the PHY init data and the app would never run. The QEMU section already had the right `0x20000`, so this was an internal contradiction. Both occurrences fixed. Also added `0xf000 ota_data_initial.bin` to the manual flash command: the release bundle ships this binary, and without it the bootloader can refuse to boot after a factory wipe.
2. `python scripts/provision.py` referenced the wrong file. There are actually TWO `provision.py` files in the repo (`scripts/`: 275 lines, stale; `firmware/esp32-csi-node/`: 348 lines, has the issue #391 full-replace semantics fix). The canonical one is in the firmware dir. Both README occurrences fixed to point at the canonical path. (The stale `scripts/provision.py` is a separate cleanup; the historical ADRs that reference it are intentionally not touched.)

## #560: proof hash mismatches on macOS arm64 / Accelerate

User `Fewmanism` reports that with the same pinned `numpy 1.26.4` / `scipy 1.14.1` on macOS arm64, the proof's SHA-256 differs from the published expected hash.

The proof passes on linux-x86_64 and windows-x86_64 (where wheels ship OpenBLAS); it mismatches on darwin-arm64 (where numpy/scipy use Accelerate.framework). That is not a code bug: Accelerate's FFT and BLAS produce bit-different output from OpenBLAS on identical IEEE 754 inputs, so the proof's bit-exact contract cannot hold across backends.

What this commit changes:

- `verify.py` now prints a RUNTIME ENVIRONMENT block before the pipeline runs: platform, machine, Python version, numpy BLAS backend. Users on a non-reference backend see the cause up front.
- The FAIL message reorders causes: platform BLAS/FFT backend is now the *primary* suspect (not "unlikely"), with a pointer to the printed RUNTIME ENVIRONMENT block.
- New `archive/v1/data/proof/REFERENCE_PLATFORMS.md` documents the reference platforms (linux-x86_64 + windows-x86_64 with OpenBLAS), the expected-MISMATCH platforms (darwin-arm64 with Accelerate, any MKL install), and three workable responses for users hitting a non-reference backend: run on a reference platform, generate a local-reference hash, or use tolerance-based comparison (that last one is the roadmap path).

This converts #560 from "the proof is broken on my Mac" to "the proof has a documented single-backend contract".

## Verification

- `./verify` (Windows x86_64 / OpenBLAS): VERDICT PASS, hash `8c0680d7…51c6` matches expected. RUNTIME ENVIRONMENT block prints numpy BLAS = `scipy-openblas`.
- `grep -E '0x10000|scripts/provision\.py' firmware/esp32-csi-node/README.md`: no matches.

Co-Authored-By: claude-flow <ruv@ruv.net>
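A minimal sketch of what a RUNTIME ENVIRONMENT block like the one described above could look like. This is not the code in `verify.py`; the function names `blas_backend`, `runtime_environment`, and `print_runtime_environment` are hypothetical, and the BLAS lookup assumes numpy 1.25 or newer, where `numpy.show_config(mode="dicts")` returns a dictionary.

```python
import platform
import sys


def blas_backend() -> str:
    """Best-effort name of the BLAS library numpy was built against."""
    try:
        import numpy as np
    except ImportError:
        return "numpy not installed"
    try:
        # mode="dicts" exists in numpy >= 1.25; older versions raise TypeError.
        config = np.show_config(mode="dicts")
        return str(config["Build Dependencies"]["blas"]["name"])
    except (TypeError, KeyError, AttributeError):
        return "unknown"


def runtime_environment() -> dict:
    """Collect the four fields the commit message lists (hypothetical layout)."""
    return {
        "platform": sys.platform,
        "machine": platform.machine(),
        "python": platform.python_version(),
        "numpy BLAS": blas_backend(),
    }


def print_runtime_environment() -> None:
    print("RUNTIME ENVIRONMENT")
    for key, value in runtime_environment().items():
        print(f"  {key}: {value}")
```

Calling `print_runtime_environment()` before the pipeline starts puts the backend name in front of the user before any hash comparison is reported.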
This was referenced May 14, 2026
Thank you for the quick and thorough follow-up. I appreciate you applying the verify fix, clarifying the ESP32-S3 flashing/provisioning docs, and documenting the macOS arm64 BLAS/hash behavior.
Same drift as #559 but in CI: the workflow ran `working-directory: v1` on the two verify steps, but the Python codebase moved to `archive/v1/` ages ago. The job failed with:

    An error occurred trying to start process '/usr/bin/bash' with working directory '/home/runner/work/RuView/RuView/v1'. No such file or directory

Fixed both occurrences (working-directory: v1 -> working-directory: archive/v1).

Also added `SECRET_KEY` env var to both steps. `verify.py` transitively imports `src.app` -> `src.config.settings` (since PR #547 introduced pydantic-settings with a required `secret_key` field). The value is never used for any auth path in the proof pipeline; it just needs to satisfy the import chain. Same env-var workaround used locally to make `./verify` pass.

After this commit, "Verify Pipeline Determinism (3.11)" should go green on this PR.

Co-Authored-By: claude-flow <ruv@ruv.net>
    # uses pydantic-settings with a required `secret_key` field. The proof
    # only needs the import chain to resolve; the value is never used for
    # any auth path in the proof pipeline.
    SECRET_KEY: ci-proof-replay-only-not-a-real-secret

    - working-directory: v1
    + working-directory: archive/v1
      env:
        SECRET_KEY: ci-proof-replay-only-not-a-real-secret
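To illustrate why a placeholder `SECRET_KEY` is enough, here is a stdlib-only stand-in for a settings loader with one required field. It is not pydantic-settings and not the project's `src.config.settings`; `load_settings` and `MissingSettingError` are hypothetical names. The point is only that loading fails when the variable is absent and succeeds with any value, even one nothing downstream reads.

```python
import os


class MissingSettingError(RuntimeError):
    """Raised when a required setting is absent from the environment."""


def load_settings(environ=os.environ) -> dict:
    """Hypothetical stand-in for a settings class with a required field."""
    try:
        secret_key = environ["SECRET_KEY"]
    except KeyError:
        # A settings library would raise its own validation error here.
        raise MissingSettingError("SECRET_KEY is required") from None
    return {"secret_key": secret_key}
```

In a module that builds its settings at import time, this failure surfaces as an import error in every consumer, which is why the CI steps need the variable set even though the proof never uses it.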
Two real bugs found while pushing the v0.8.0 image to Docker Hub:

## Rust 1.85 -> 1.90

`hnsw_rs 0.3.4` (transitive via wifi-densepose-ruvector -> ruvector-attn-mincut -> hnsw_rs) calls `nbp.is_multiple_of(500_000)`. `is_multiple_of` on unsigned integers was stabilised in Rust 1.87 (rust-lang/rust#128101, RFC 3565). On 1.85 the compile fails with:

    error[E0658]: use of unstable library feature `unsigned_is_multiple_of`
      --> hnsw_rs-0.3.4/src/hnswio.rs:736:20

Pinned to 1.90 for reproducibility. A comment in the Dockerfile flags the 1.87 MSRV requirement so a future downgrade can't quietly break it.

## .gitattributes: force LF on shell scripts + Dockerfile

Without a `.gitattributes`, git's default `core.autocrlf=true` on Windows converts shell scripts to CRLF on checkout. `COPY`ing `docker/docker-entrypoint.sh` into a Linux image then preserves CRLF. The shebang line `#!/bin/sh\r\n` causes `exec /app/docker-entrypoint.sh` to fail with:

    exec /app/docker-entrypoint.sh: no such file or directory

The kernel tries to look up an interpreter literally named `/bin/sh\r`, which doesn't exist. Container exits immediately. The first v0.8.0 image push (digest sha256:7957…44fa) suffered exactly this; the re-pushed image (digest sha256:e9f4…d38315) was built on a renormalised tree.

The .gitattributes rule forces LF for:

- *.sh / *.bash
- Dockerfile*
- docker/* (covers docker-entrypoint.sh + docker-compose.yml)
- scripts/*
- `verify` (the proof-replay wrapper; same root cause as if it had landed CRLF in someone's clone)

Binary file globs (*.bin, *.wasm, *.rvf, *.pcap, etc.) explicitly marked binary so text-normalisation never touches them.

## CHANGELOG: drop the false `--introspection` flag claim

The CHANGELOG entry for v0.8.0 said the introspection endpoints were "off by default, enabled via `--introspection`". That isn't true: `sensing-server --help` has no such flag. The routes are mounted unconditionally in `main.rs`. The per-frame `update()` p99 of 0.041 ms (~24× under D4's 1 ms budget) makes always-on viable; the "off by default" framing came from an earlier draft of ADR-099 that the implementation outgrew. Corrected.

## Verification

End-to-end smoke test of the pushed image:

    docker run -d -p 13000:3000 -e CSI_SOURCE=simulated -e SENSING_BIND_ADDR=0.0.0.0 ruvnet/wifi-densepose:v0.8.0

    /health -> {"status":"ok","source":"simulated",...}
    /api/v1/info -> {"backend":"rust","features":{"ruvector":true,"signal_processing":true,...}}
    /api/v1/introspection/snapshot -> {"regime":"unknown", "regime_changed":false,"top_k_similarity":[]} (ADR-099 shape exact)
    /ui/observatory.html -> HTTP 200, 15 KB

Published manifest digests:

    ruvnet/wifi-densepose:v0.8.0 -> sha256:e9f4c5af…d38315
    ruvnet/wifi-densepose:latest -> sha256:e9f4c5af…d38315

Co-Authored-By: claude-flow <ruv@ruv.net>
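The CRLF shebang failure described above can be caught before an image is built. The following is a rough pre-build check, not part of this PR; `has_crlf_shebang` and `find_crlf_scripts` are hypothetical helpers, and the default glob patterns only cover the `*.sh` / `*.bash` entries from the `.gitattributes` list.

```python
from pathlib import Path


def has_crlf_shebang(data: bytes) -> bool:
    """True if the first line is a shebang terminated by CRLF."""
    first_line = data.split(b"\n", 1)[0]
    # A trailing \r becomes part of the interpreter path the kernel looks up.
    return first_line.startswith(b"#!") and first_line.endswith(b"\r")


def find_crlf_scripts(root: Path, patterns=("*.sh", "*.bash")) -> list[Path]:
    """Return scripts under root whose shebang line ends in CRLF."""
    hits = []
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            if path.is_file() and has_crlf_shebang(path.read_bytes()):
                hits.append(path)
    return hits
```

Running `find_crlf_scripts(Path("docker"))` in a Windows checkout before `docker build` would flag an entrypoint that was converted on checkout.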
Three real first-run breakages reported in the last few hours. Fresh-clone users hitting any of these would conclude the project doesn't work, and that perception is part of what feeds the "feels like mock" narrative in #557.

What's fixed

#559: `./verify` pointed at removed `v1/` paths

Wrapper hard-coded `v1/data/proof` / `v1/src`, but the proof scripts moved to `archive/v1/` long ago. Fresh clone failed before the pipeline ran. Reporter (@Fewmanism) provided the exact diff in the issue; applied verbatim across all four hits.

    ./verify   # now end-to-end PASS

#561: firmware README would misflash and point at the wrong provisioner

1. Manual flash command put the app at `0x10000`. Partition tables (`partitions_display.csv`, `partitions_4mb.csv`) put `ota_0` at `0x20000`. `0x10000` is `phy_init`, so flashing the app there would corrupt PHY data. Both occurrences fixed; added `0xf000 ota_data_initial.bin`, which release bundles ship.
2. README referenced `python scripts/provision.py`. There are two `provision.py` files in the repo: `scripts/provision.py` (275 lines, stale) and `firmware/esp32-csi-node/provision.py` (348 lines, has the #391 full-replace fix). README updated to point at the canonical one. The stale duplicate is a separate cleanup.

#560: proof hash mismatches on macOS arm64 / Accelerate

@Fewmanism reports that with pinned `numpy 1.26.4` / `scipy 1.14.1` on macOS arm64, the proof's SHA-256 differs. Root cause: numpy/scipy use Accelerate.framework on darwin-arm64 and OpenBLAS on linux/windows x86_64. Accelerate's FFT + BLAS produce bit-different IEEE 754 output. That is not a code bug: the proof's bit-exact contract cannot hold across BLAS backends.

What this PR changes:

- `verify.py` now prints a RUNTIME ENVIRONMENT block before the pipeline runs (platform, machine, Python version, numpy BLAS backend).
- `archive/v1/data/proof/REFERENCE_PLATFORMS.md` documents the reference platforms (linux/windows x86_64 with OpenBLAS), the expected-MISMATCH platforms (darwin-arm64 with Accelerate, any MKL install), and three workable responses.

Converts #560 from "the proof is broken on my Mac" → "the proof has a documented single-backend contract".
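A small illustration of why a bit-exact hash can disagree while a tolerance check passes. It uses plain Python floats rather than the project's pipeline: the same three inputs are accumulated in two different orders, which is one way two BLAS backends can legitimately differ. The helper `sha256_of_floats` and the `rel_tol` value are assumptions for this sketch, not the project's roadmap implementation.

```python
import hashlib
import math
import struct


def sha256_of_floats(values) -> str:
    """Hash the exact IEEE 754 double bit patterns of a sequence."""
    payload = b"".join(struct.pack("<d", v) for v in values)
    return hashlib.sha256(payload).hexdigest()


# Same inputs, two accumulation orders. Floating-point addition is not
# associative, so the two results differ in the last bit.
forward = (0.1 + 0.2) + 0.3
reverse = (0.3 + 0.2) + 0.1

# Bit-exact comparison: any single-bit difference changes the digest.
bit_exact_match = sha256_of_floats([forward]) == sha256_of_floats([reverse])

# Tolerance-based comparison: accepts differences at rounding-error scale.
tolerance_match = math.isclose(forward, reverse, rel_tol=1e-12)

print(f"bit-exact match: {bit_exact_match}")
print(f"tolerance match: {tolerance_match}")
```

The digests differ even though the values agree to about sixteen significant digits, which is the trade-off between the single-backend contract and a tolerance-based comparison.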
Verification
- `./verify` on Windows x86_64 / OpenBLAS: VERDICT PASS, hash `8c0680d7…51c6` matches expected. RUNTIME ENVIRONMENT block prints `numpy BLAS: scipy-openblas`.
- `grep -E '0x10000|scripts/provision\.py' firmware/esp32-csi-node/README.md`: no matches.

Closes
🤖 Generated with claude-flow