Skip to content

feat(rootfs): native armhf on aarch64 host via runtime-disable of qemu-arm#113

Draft
iav wants to merge 6 commits into
mainfrom
feat/native-armhf-on-arm64
Draft

feat(rootfs): native armhf on aarch64 host via runtime-disable of qemu-arm#113
iav wants to merge 6 commits into
mainfrom
feat/native-armhf-on-arm64

Conversation

@iav
Copy link
Copy Markdown
Owner

@iav iav commented May 12, 2026

Fork-internal codex review trap. Mirrors armbian#9769 (`feat/native-armhf-on-arm64`, head 5cab79ef). Not for merge. CodeRabbit reviews on armbian#9769 (paid by Armbian, only sees upstream PRs); Codex reviews here (paid by iav, only sees fork PRs). Re-trigger codex on each force-push via @chatgpt-codex-connector review.

--- Below: verbatim copy of the upstream PR description for codex context. ---

Summary

Continues armbian#9284 (arm64-compat-vdso extension + custom_kernel_make_params hook).
That PR enabled COMPAT_VDSO in arm64 kernels; this PR uses the resulting
32-bit-capable kernels to run 32-bit ARM ELF directly instead of through
qemu-arm emulation in chroot work — typically ~12× faster than qemu-arm-static
on Cortex-A53/A55/A72/A73/A76.

How

Disable qemu-arm in /proc/sys/fs/binfmt_misc/ for the duration of the
build; kernel binfmt_elf then handles 32-bit ARM ELFs natively via
CONFIG_COMPAT. Activation is delayed until AFTER mmdebstrap, whose
cross-arch path needs a working qemu-arm registration to populate the
chroot before libc/ld-linux-armhf.so.3 exist.

Killswitch: NATIVE_ARMHF_ON_ARM64=no (synonyms never, disabled).

Concurrency — kernel flock on the binfmt entry

Concurrent armbian builds on the same host kernel are coordinated purely
via kernel BSD flock(2) on /proc/sys/fs/binfmt_misc/qemu-arm itself.
No userspace state, no per-builder lockfiles, no /run/lock directory.

  • Each builder holds LOCK_SH on the binfmt entry via a long-lived fd.
    Kernel BSD-flock counter is the refcount; kernel releases the fd on
    process exit (crash-safe).
  • First-arrival idempotently echo 0 to disable; subsequent arrivals
    observe 0 and proceed without writing.
  • On exit, release LOCK_SH; last-out detects via LOCK_EX-LOCK_NB on
    a fresh fd, succeeds iff zero other LOCK_SH holders. Last-out
    re-enables qemu-arm.

Trade-offs

  1. Prior qemu-arm state is not recorded across builds. Last-out
    unconditionally re-enables. An admin's pre-existing disabled
    policy is not preserved across the build window.
  2. No defense against an external agent toggling qemu-arm mid-build.
    Don't, or use NATIVE_ARMHF_ON_ARM64=no on builds that should not
    interfere.
  3. Cleanup ordering invariant — BSD flock is per-OFD; a forked subshell
    inheriting our SH-fd shares the same lock entry. We rely on
    add_cleanup_handler running registrations in order and the umount
    teardown handlers (registered first) killing the docker container
    before our restore handler runs. Documented in the trap handler's
    docstring; if a future refactor breaks ordering, POSIX F_SETLK or
    explicit descendant-kill are the documented escape hatches.

The earlier revision of this work coordinated via a userspace
owner-flock directory (~250 lines, depended on /run/lock being
host-bind-mounted into build containers — which Armbian docker does
not do). Replaced with the kernel-flock approach above; available in
PR history if needed.

Empirical

./compile.sh build BOARD=helios4 BRANCH=edge BUILD_MINIMAL=yes RELEASE=noble
on droid (RK3328) and m1 (RK3568, Cortex-A55):

Scenario Real time Speedup
baseline, rootfs cache-miss 60:35
this PR, rootfs cache-miss 19:27 3.12×
this PR, rootfs cache-hit 6:38 ~9× over expected baseline cache-hit

mmdebstrap itself (~10 min of every cache-miss build) cannot be
accelerated — speedup applies to subsequent chroot operations.

Test plan

Phase 1 — feature on iron (single builder):

  • helios4 cache-miss + cache-hit, full image
  • cleanup re-enables qemu-arm on exit
  • DOCKER_PRIVILEGED=no (CAP_SYS_ADMIN unconditional)
  • killswitch (NATIVE_ARMHF_ON_ARM64=no/never/disabled) — silent return
  • x86_64 host regression — silent return
  • aarch64 native (no-container) PREFER_DOCKER=no smoke

Phase 2 — concurrency, all on Hetzner CAX21 (kernel 6.8.0-90, binfmt_misc fs):

  • flock primitives on /proc/sys/fs/binfmt_misc/qemu-arm work as expected
  • single-tree build: setup + 5 phase markers + last-out restore
  • two independent worktrees concurrent: both enter native path, kernel
    SH-counter holds qemu-arm disabled, first-finished suppresses restore,
    last-finished restores
  • NATIVE_ARMHF_ON_ARM64=no opt-out with concurrent native builder —
    refused-fast guard rejects (avoids clobbering)
  • SIGINT mid-chroot — cleanup ordering invariant holds; last-out fires,
    qemu-arm restored, no stale state

Reviewed with codex (gpt-5.5) over multiple rounds and coderabbitai. The
final review surfaced the cleanup-ordering invariant; documented inline
and verified by SIGINT test above.

Summary by CodeRabbit

  • New Features

    • Enable native 32-bit ARM (armhf) execution on ARM64 hosts during builds so chroot operations can run 32-bit ARM binaries without injecting emulation binaries.
  • Chores

    • Coordinate host binfmt/qemu management directly via kernel flock on /proc/sys/fs/binfmt_misc/qemu-arm; first-arrival disables qemu-arm, last-departure restores it. Setup failures are non‑fatal so image builds continue.

Summary by CodeRabbit

  • New Features

    • Allow native execution of 32-bit ARM binaries on compatible 64-bit hosts to speed and simplify ARM image builds.
    • Skip copying emulation binaries into chroots when native native-ARM handling is active.
  • Bug Fixes

    • Improve coordination and recovery around host binary-format handling to avoid interference between concurrent builds and ignore non-fatal setup failures.

Review Change Stack

iav added 2 commits May 10, 2026 06:00
…u-arm

When building an armhf image on an aarch64 host that supports 32-bit ARM
execution (kernel CONFIG_COMPAT, CPU has 32-bit user mode), disable the
qemu-arm handler in /proc/sys/fs/binfmt_misc/ for the duration of the
build. The kernel's own binfmt_elf then handles 32-bit ARM ELFs natively
via CONFIG_COMPAT — typically ~12x faster than qemu-arm-static emulation
on Cortex-A53/A55/A72/A73/A76.

Continues armbian#9284 (arm64-compat-vdso extension + custom_kernel_make_params
hook), which enabled COMPAT_VDSO in arm64 kernels.

Activation point is delayed to AFTER mmdebstrap. Its cross-arch path
extracts base packages and runs their postinst hooks, which require a
working qemu-arm registration to populate the chroot before
libc/ld-linux-armhf.so.3 exist for binfmt_elf to fall through to.
The speedup applies to all subsequent chroot operations
(install_distribution_specific, customize_image, update-initramfs).

Killswitch: NATIVE_ARMHF_ON_ARM64=no (synonyms: never, disabled).

Concurrent armbian builds on the same host kernel are coordinated purely
via kernel BSD flock on the binfmt entry itself — no userspace state, no
per-builder lockfiles, no /run/lock directory tree:

- Each builder holds LOCK_SH on /proc/sys/fs/binfmt_misc/qemu-arm via a
  long-lived fd. Kernel BSD-flock counter is the refcount; the kernel
  releases the fd on process exit (crash-safe).
- First-arrival idempotently `echo 0` to disable. Subsequent arrivals
  observe 0 and proceed without writing.
- On exit, release LOCK_SH; last-out detects via LOCK_EX-LOCK_NB on a
  fresh fd, succeeds iff zero other LOCK_SH holders. Last-out re-enables
  qemu-arm.

Trade-offs (documented):
- Prior qemu-arm state is not recorded across independent builds. Last-
  out unconditionally re-enables. Admin's pre-existing `disabled` policy
  is not preserved.
- No defense against an external agent toggling qemu-arm mid-build.
  Documented; either don't, or set NATIVE_ARMHF_ON_ARM64=no on builds
  that should not interfere.
- Cleanup ordering invariant: BSD flock is per-OFD, so a forked subshell
  inheriting our SH-fd shares the same lock entry. add_cleanup_handler
  runs in registration order; we register after the umount/SDCARD/MOUNT
  teardown handlers, so by the time we run, the docker container is
  killed and its child-tree (with our inherited fd) gone. Documented in
  the trap handler's docstring; if a future refactor breaks ordering,
  POSIX F_SETLK or explicit descendant-kill are the documented escape
  hatches.

Empirical (helios4 mvebu/armhf, BUILD_MINIMAL=yes BRANCH=edge RELEASE=noble):
- baseline (rootfs cache-miss): 60:35
- this PR (rootfs cache-miss): 19:27 (3.12x)
- this PR (rootfs cache-hit): 6:38 (~9x over expected baseline cache-hit)

Concurrency verified on Hetzner CAX21 (kernel 6.8.0-90-generic): two
independent armbian worktrees concurrent on same host, both enter native
path, kernel SH-counter holds qemu-arm disabled across both, first-
finished suppresses restore (LOCK_EX-NB blocked by other's SH), last-
finished restores qemu-arm to enabled. SIGINT mid-chroot also cleans up
correctly.

Reviewed iteratively with codex (gpt-5.5) and coderabbitai. The earlier
revision of this work coordinated via a userspace owner-flock directory
(~250 lines, depended on /run/lock being host-bind-mounted into build
containers, which Armbian docker does not do); replaced with the kernel-
flock-on-binfmt-entry approach above.

Assisted-by: Claude:claude-opus-4.7
…ative builder probes EX-NB before disabling qemu-arm

K-builder (NATIVE_ARMHF_ON_ARM64=no) now takes a SH-lock on
/proc/sys/fs/binfmt_misc/qemu-arm for its lifetime instead of returning
immediately. N-builder (default) probes EX-NB on a fresh fd before
acquiring its own SH; if probe fails AND qemu-arm is observably enabled
(state==1), a K-builder holds the emulation anchor and switching to
native mid-flight would corrupt their qemu-arm-static routing — refuse
with exit_with_error.

Symmetric closure of the K↔N race:
- N→K: K sees qemu-arm=0 → fail-fast via the existing prepare_host
  guard (addressed in 9d8b848).
- K→N: N sees EX-NB fail + state=1 → fail-fast with 'concurrent build
  with NATIVE_ARMHF_ON_ARM64=no holds emulation lock' (this commit).

Verified end-to-end on Hetzner CAX21 (4-core Ampere Altra, Ubuntu 6.8):
- N first, K second: K exits 8s with 'qemu-arm globally disabled by
  another concurrent build'.
- K first, N second: N exits 12s with 'cannot enable native armhf:
  concurrent build with NATIVE_ARMHF_ON_ARM64=no holds emulation lock'.
- Cleanup of both K (release SH) and N (last-out restore echo 1) leaves
  qemu-arm in 'enabled' state.

Assisted-by: Claude:claude-opus-4.7
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5cab79ef0e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread lib/functions/rootfs/qemu-static.sh Outdated

# Register cleanup BEFORE the authoritative arch-test, so a failure
# there still releases the lock via the trap handler.
add_cleanup_handler trap_handler_native_armhf_restore_qemu_arm
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fix cleanup handler ordering before restoring qemu-arm

For an interrupted arm64→armhf build after this handler is registered, it runs before trap_handler_cleanup_rootfs_and_image, not after it: add_cleanup_handler prepends callbacks and run_cleanup_handlers iterates that array in order (lib/functions/logging/traps.sh:118-140). That means the native-armhf lock can be released and qemu-arm re-enabled while the chroot/container teardown is still pending, which is exactly the inherited-fd/descendant case this code says must be avoided and can route remaining target execs through qemu unexpectedly.

Useful? React with 👍 / 👎.

Comment thread lib/functions/rootfs/qemu-static.sh Outdated
Comment on lines +22 to +24
if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
display_alert "Native armhf via binfmt_elf" "skipping qemu binary deployment during ${caller}" "info"
return 0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve target qemu binaries on native armhf

When native armhf is active, this returns before the existing preservation logic can move a target-owned /usr/bin/${QEMU_BINARY} aside. Later undeploy_qemu_binary_from_chroot still treats any such file as the host copy and removes it, so an armhf image/rootfs that intentionally installs qemu-user-static (for example via package lists or customization) loses that package's binary even though this deploy path never copied it.

Useful? React with 👍 / 👎.

Comment thread lib/functions/rootfs/qemu-static.sh Outdated
Comment on lines +238 to +239
if flock -x -n "${_native_armhf_lock_fd}"; then
flock -u "${_native_armhf_lock_fd}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Close the flock race before disabling qemu-arm

After the exclusive probe succeeds, unlocking here before the later shared lock acquisition leaves a window where a NATIVE_ARMHF_ON_ARM64=no builder can acquire its shared emulation lock; because shared locks are compatible, this builder will then also take a shared lock and still echo 0, disabling qemu-arm while the killswitch build is relying on it for chroot execution. The probe and the state change need to be atomic with respect to new shared-lock entrants.

Useful? React with 👍 / 👎.

Comment thread lib/functions/rootfs/qemu-static.sh Outdated
Comment on lines +22 to +24
if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
display_alert "Native armhf via binfmt_elf" "skipping qemu binary deployment during ${caller}" "info"
return 0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve target qemu binaries on native armhf

When native armhf is active, this returns before the existing preservation logic can move a target-owned /usr/bin/${QEMU_BINARY} aside. Later undeploy_qemu_binary_from_chroot still treats any such file as the host copy and removes it, so an armhf image/rootfs that intentionally installs qemu-user-static during image customization loses that package's binary even though this deploy path never copied it.

Useful? React with 👍 / 👎.

Comment thread lib/functions/rootfs/qemu-static.sh Outdated
Comment on lines +238 to +239
if flock -x -n "${_native_armhf_lock_fd}"; then
flock -u "${_native_armhf_lock_fd}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Close the flock race before disabling qemu-arm

After the exclusive probe succeeds, unlocking here before the later shared lock acquisition leaves a window where a NATIVE_ARMHF_ON_ARM64=no builder can acquire its shared emulation lock; because flock --help defines -s as shared, -x as exclusive, and -u as removing the lock, this builder will then also take a shared lock and still echo 0, disabling qemu-arm while the killswitch build is relying on it for chroot execution. The probe and the state change need to be atomic with respect to new shared-lock entrants.

Useful? React with 👍 / 👎.

Comment thread lib/functions/rootfs/qemu-static.sh Outdated

# Register cleanup BEFORE the authoritative arch-test, so a failure
# there still releases the lock via the trap handler.
add_cleanup_handler trap_handler_native_armhf_restore_qemu_arm
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fix cleanup handler ordering before restoring qemu-arm

When activation happens from the rootfs/image path rather than the earlier host-prep path, this handler is added after trap_handler_cleanup_rootfs_and_image and therefore runs before it: add_cleanup_handler prepends callbacks and run_cleanup_handlers iterates that array in order (lib/functions/logging/traps.sh:118-140). In that case an interrupted arm64→armhf build can release the native-armhf lock and re-enable qemu-arm while chroot/container teardown is still pending, which is exactly the inherited-fd/descendant case this code says must be avoided.

Useful? React with 👍 / 👎.

Comment thread lib/functions/rootfs/qemu-static.sh Outdated
iav added a commit that referenced this pull request May 13, 2026
…r verifies state=1 after SH-NB

Addresses Codex P2 review feedback on PR #113 / armbian#9769:

- Race in N-builder entry (line:239): EX-NB → release → re-take SH split
  let a K-builder slip in between observation and the state-check write,
  silently disabling qemu-arm while K relied on it. Restructure to hold
  EX during state check + echo 0, then atomically downgrade EX→SH on the
  same fd via Linux flock(2)'s flc_lock transition. SH waiters are
  granted only after our SH is in place and qemu-arm is already 0.

- K-builder symmetric check: after acquiring the emulation-mode SH-NB,
  verify state=1; if 0, concurrent N-builders have disabled qemu-arm
  and the killswitch cannot honor its contract — bail with a clear
  error rather than running a broken anchor.

- Cleanup-ordering invariant comment was inverted. add_cleanup_handler
  PREPENDS, so later registrations run FIRST. The invariant survives
  because setup is called before mount_chroot in both call sites; the
  comment now states the actual mechanism and documents the failure
  mode if a future call site inverts the order.

Assisted-by: Claude:claude-opus-4.7
@iav
Copy link
Copy Markdown
Owner Author

iav commented May 13, 2026

@chatgpt-codex-connector review

New commit ecde3c1 addresses:

  • Race in N-builder entry (line:239) — restructured to hold EX during state-check + echo 0, then atomically downgrade EX→SH on the same fd. Linux flock(2) performs the transition under the inode's flc_lock, so SH waiters are granted only after our SH is in place and qemu-arm is already 0.

  • K-builder symmetric check — after acquiring the emulation-mode SH-NB, K verifies state=1; if 0 (concurrent N-builders already disabled qemu-arm), bail with a clear error instead of running a broken anchor.

  • Cleanup-ordering invariant comment — was inverted. add_cleanup_handler PREPENDS, so later registrations run FIRST. The invariant survives because setup is invoked before mount_chroot in both call sites; the new comment states the actual mechanism and the failure mode if a future call site inverts the order.

line:24 (preserve target qemu binaries) and line:432 (killswitch + qemu absent) — deferred / not-a-bug; replies in their threads.

Repository owner deleted a comment from chatgpt-codex-connector Bot May 13, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ecde3c177b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

display_alert "Native armhf via binfmt_elf" "concurrent build holds emulation-mode lock (NATIVE_ARMHF_ON_ARM64=no)" "err"
exit_with_error "cannot enable native armhf: concurrent build with NATIVE_ARMHF_ON_ARM64=no holds emulation lock. Wait for it to finish or run on a separate host."
else
if ! flock -s -w 30 "${_native_armhf_lock_fd}"; then
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Recheck qemu-arm after joining as a shared holder

When this joiner path observes state=0, there is still a window before it actually acquires the shared lock where the last existing native builder can release its SH lock, take EX in trap_handler_native_armhf_restore_qemu_arm, and write qemu-arm back to 1. This branch then acquires SH without rechecking the state; the post-check can pass via the restored qemu registration, ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF is set, and later deploys skip copying qemu-arm-static, so chroot execs can fail because binfmt points at a qemu binary that is absent inside the chroot. Recheck that the entry is still disabled after flock -s succeeds, or retry the first-arrival path if it was restored.

Useful? React with 👍 / 👎.

Comment thread lib/functions/rootfs/qemu-static.sh Outdated
# (or becoming first) keeps qemu-arm disabled coherently and lets
# /usr/share/binfmts/qemu-arm absence in this container be a non-issue.
if [[ "${host_arch}" == "aarch64" && "${wanted_arch}" == "arm" && "${ARCH}" == "armhf" ]]; then
if _native_armhf_setup_binfmt_elf; then
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Don’t switch to native mode before mmdebstrap

On an aarch64 armhf cache-miss build with the normal PRE_PREPARED_HOST!=yes flow, prepare_host runs before create_new_rootfs_cache_via_debootstrap, so this early call disables qemu-arm and sets ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes before the rootfs exists. The subsequent rootfs creation then skips deploy_qemu_binary_to_chroot at line 134 while running mmdebstrap --arch=armhf; this contradicts the later activation point added after mmdebstrap and leaves the bootstrap phase without the qemu registration/binary it relies on. Keep host preparation in emulation mode and only call _native_armhf_setup_binfmt_elf after mmdebstrap has populated libc/ld-linux.

Useful? React with 👍 / 👎.

# native-armhf builders detect us via EX-NB probe and refuse to switch
# qemu-arm off. Without this anchor an N-builder arriving mid-K-chroot
# would echo 0 and silently break K's qemu-arm-static routing.
if [[ "${killswitch}" == "yes" ]]; then
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Gate the killswitch path to armhf-on-aarch64 builds

Because the killswitch branch runs before the ARCH==armhf and host-architecture checks, any build with NATIVE_ARMHF_ON_ARM64=no calls this path even when it is not an armhf-on-aarch64 build. For example, an unrelated arm64 or amd64 target on the same host will either take the qemu-arm SH lock and block native armhf builders unnecessarily, or exit here if a concurrent native builder already has qemu-arm disabled, even though that build does not need qemu-arm at all. Return before the killswitch handling unless the current build is actually the armhf/aarch64 case.

Useful? React with 👍 / 👎.

@iav iav changed the title test: codex review trap for native-armhf-on-arm64 (mirrors armbian/build#9769) feat(rootfs): native armhf on aarch64 host via runtime-disable of qemu-arm May 13, 2026
iav added a commit that referenced this pull request May 13, 2026
… + clarify ordering invariant

Addresses codex review feedback on PR #113 / armbian#9769.

N-builder entry race: EX-NB → release → re-take SH split let a K-builder
slip in between observation and the state-check write, silently disabling
qemu-arm while K relied on it. Hold EX during state check + echo 0, then
atomically downgrade EX→SH on the same fd via Linux flock(2)'s flc_lock
transition. SH waiters are granted only after our SH is in place and
qemu-arm is already 0.

K-builder symmetric state check: after acquiring the emulation-mode
SH-NB, verify state=1; if 0, concurrent N-builders have disabled qemu-arm
and the killswitch cannot honor its contract — bail with a clear error
rather than running a broken anchor.

Joiner state-recheck: in the state=0 / EX-NB-fail path, between observing
state=0 and acquiring SH the last live N-builder may release SH and run
the restorer (echo 1). Recheck state after SH and bail back to emulation
if state has flipped — otherwise we'd skip qemu deploy while binfmt
actually routes to qemu, and chroot exec fails because the qemu binary
is not in the chroot.

Killswitch gated by ARCH/host: NATIVE_ARMHF_ON_ARM64=no only meaningful
for armhf builds on aarch64 hosts. Moved idempotent + ARCH/host gates
above killswitch evaluation so unrelated builds (amd64/arm64 targets,
x86 hosts) skip qemu-arm entirely.

Early call site removed: prepare_host_binfmt_qemu_cross previously called
_native_armhf_setup_binfmt_elf BEFORE mmdebstrap, leaving the bootstrap
phase without qemu registration on a chroot that has no libc/ld-linux
yet — armhf maintainer-script exec fails. Native activation stays where
it belongs: post-mmdebstrap call sites in rootfs-create.sh and
rootfs-image.sh.

Cleanup-ordering invariant comment was inverted. add_cleanup_handler
PREPENDS, so later registrations run FIRST. The invariant survives
because setup is called before mount_chroot in both call sites; the
comment now states the actual mechanism and documents the failure mode
if a future call site inverts the order.

Assisted-by: Claude:claude-opus-4.7
@iav iav force-pushed the feat/native-armhf-on-arm64 branch from ecde3c1 to 6138bd0 Compare May 13, 2026 02:13
@iav
Copy link
Copy Markdown
Owner Author

iav commented May 13, 2026

@chatgpt-codex-connector review

Force-pushed 6138bd03 addressing all three P1/P2 findings from your previous review:

  • P1 (line:407 → removed) — early _native_armhf_setup_binfmt_elf call in prepare_host_binfmt_qemu_cross deleted. Native activation stays where it belongs: post-mmdebstrap (rootfs-create.sh, rootfs-image.sh). The container relaunch / Apple-Silicon race the early call was trying to mitigate is a separate concern — addressing it without breaking cache-miss mmdebstrap is out of scope here.

  • P2 (line:275 → joiner state-recheck) — added in the else branch. After flock -s -w 30 succeeds, observe state again; if the last-out restorer raced in and wrote qemu-arm back to 1, release SH and fall back to emulation rather than continuing as a phantom native joiner.

  • P2 (line:189 → killswitch ARCH-gated) — moved idempotent + ARCH/host gates above killswitch evaluation. Unrelated builds (amd64/arm64 targets, x86 hosts) skip the qemu-arm SH-anchor entirely instead of pointlessly holding it.

Build cache-miss test on real aarch64 host (m1, RK3568) with the prior ecde3c1 is still running in the background — will report empirical confirmation once it completes mmdebstrap.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6138bd0370

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread lib/functions/rootfs/qemu-static.sh Outdated
Comment on lines +22 to +24
if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
display_alert "Native armhf via binfmt_elf" "skipping qemu binary deployment during ${caller}" "info"
return 0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve existing target qemu binary on native path

When native armhf mode is active, this early return skips the preservation step for an existing ${chroot_target}/usr/bin/qemu-arm-static. The later undeploy_qemu_binary_from_chroot still removes any file at that path, so a cache-hit/image that legitimately contains qemu-user-static will have its package-owned binary deleted instead of restored. The native fast path still needs to record that no host binary was deployed, or make undeploy a no-op for files it did not copy.

Useful? React with 👍 / 👎.

Comment thread lib/functions/rootfs/qemu-static.sh Outdated
Comment on lines +202 to +204
if [[ -e /proc/sys/fs/binfmt_misc/qemu-arm ]] &&
{ exec {_native_armhf_emul_lock_fd}< /proc/sys/fs/binfmt_misc/qemu-arm; } 2> /dev/null &&
flock -s -n "${_native_armhf_emul_lock_fd}"; then
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Wait or abort when the killswitch lock is contended

With NATIVE_ARMHF_ON_ARM64=no, if a native builder is briefly holding the exclusive flock while switching qemu-arm, this nonblocking shared flock fails and the killswitch build falls through to emulation without holding the SH anchor. The peer can then downgrade with qemu-arm disabled while this build later enters its chroot expecting qemu-arm-static routing, causing execs to fail; this path should wait/recheck or fail instead of continuing unanchored.

Useful? React with 👍 / 👎.

# so subsequent chroot apt-get/dpkg/customize calls fall through to kernel binfmt_elf
# and run 32-bit ARM ELF natively via CONFIG_COMPAT. mmdebstrap above used qemu-arm
# because its cross-arch path requires that registration to be present.
_native_armhf_setup_binfmt_elf || true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep qemu-arm enabled while peers bootstrap

A concurrent armhf build can still be inside the mmdebstrap command above when this build reaches the post-mmdebstrap switch and disables the global qemu-arm registration. Those bootstrapping peers have not called _native_armhf_setup_binfmt_elf yet, so they hold no SH lock or killswitch anchor, but their cross-arch mmdebstrap path still depends on qemu-arm; the first cache-miss build to finish bootstrap can therefore break another cache-miss build that is still running mmdebstrap.

Useful? React with 👍 / 👎.

Comment thread lib/functions/rootfs/qemu-static.sh Outdated

# Register cleanup BEFORE the authoritative arch-test, so a failure
# there still releases the lock via the trap handler.
add_cleanup_handler trap_handler_native_armhf_restore_qemu_arm
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore qemu-arm after rootfs cleanup has killed children

This cleanup is registered with add_cleanup_handler, which prepends handlers, while the earlier rootfs cleanup handler that unmounts/kills chroot users remains later in the list and mount_chroot does not add another handler. On SIGINT or a failing chroot command, trap_handler_native_armhf_restore_qemu_arm therefore runs before rootfs cleanup; any still-running child that inherited the flock fd keeps the SH lock alive, the last-out EX probe fails, and qemu-arm can be left globally disabled after the build exits.

Useful? React with 👍 / 👎.

… + clarify ordering invariant

Addresses codex review feedback on PR #113 / armbian#9769.

N-builder entry race: EX-NB → release → re-take SH split let a K-builder
slip in between observation and the state-check write, silently disabling
qemu-arm while K relied on it. Hold EX during state check + echo 0, then
atomically downgrade EX→SH on the same fd via Linux flock(2)'s flc_lock
transition. SH waiters are granted only after our SH is in place and
qemu-arm is already 0.

K-builder symmetric state check (success path): after acquiring the
emulation-mode SH-NB, verify state=1; if 0, concurrent N-builders have
disabled qemu-arm and the killswitch cannot honor its contract — bail
with a clear error rather than running a broken anchor.

K-builder SH-NB → blocking SH-with-timeout (failure path): the old
non-blocking flock -s -n would fail when an N-builder briefly held EX
(probe / disable / EX→SH downgrade window). Falling through to qemu
emulation without holding the SH anchor let the peer N complete its
transition with qemu-arm=0, breaking K's chroot exec routing. Now block
up to 30s; if SH lands, re-check state — anchor on state=1, exit_with_error
on state=0. Sub-millisecond peer transitions resolve well within the
window; persistent contention falls back without anchor (preserves old
behavior for degenerate cases).

Joiner state-recheck: in the state=0 / EX-NB-fail path, between observing
state=0 and acquiring SH the last live N-builder may release SH and run
the restorer (echo 1). Recheck state after SH and bail back to emulation
if state has flipped — otherwise we'd skip qemu deploy while binfmt
actually routes to qemu, and chroot exec fails because the qemu binary
is not in the chroot.

Killswitch gated by ARCH/host: NATIVE_ARMHF_ON_ARM64=no only meaningful
for armhf builds on aarch64 hosts. Moved idempotent + ARCH/host gates
above killswitch evaluation so unrelated builds (amd64/arm64 targets,
x86 hosts) skip qemu-arm entirely.

Early call site removed: prepare_host_binfmt_qemu_cross previously called
_native_armhf_setup_binfmt_elf BEFORE mmdebstrap, leaving the bootstrap
phase without qemu registration on a chroot that has no libc/ld-linux
yet — armhf maintainer-script exec fails. Native activation stays where
it belongs: post-mmdebstrap call sites in rootfs-create.sh and
rootfs-image.sh.

Cleanup-ordering invariant comment was inverted. add_cleanup_handler
PREPENDS, so later registrations run FIRST. The invariant survives
because setup is called before mount_chroot in both call sites; the
comment now states the actual mechanism and documents the failure mode
if a future call site inverts the order.

Assisted-by: Claude:claude-opus-4.7
@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 6138bd0 to c4a1f8b Compare May 13, 2026 03:25
@iav
Copy link
Copy Markdown
Owner Author

iav commented May 13, 2026

@chatgpt-codex-connector review

Force-pushed c4a1f8be addressing the K-builder SH-NB failure-path race from your previous review (line:204):

The killswitch path no longer falls through to qemu emulation when flock -s -n fails — that case was where a brief peer EX hold (probe / disable / EX→SH downgrade window) caused us to drop the anchor and let the peer complete its transition with qemu-arm=0. Now we use blocking flock -s -w 30 and recheck state after acquiring SH: anchor on state=1, exit_with_error on state=0. Sub-millisecond peer transitions resolve well within the window; if SH stays unreachable for 30s the code still falls back without anchor but only after the timeout, not on the first probe.

The remaining findings from your prior review are deferred for now:

  • P1 line:148 rootfs-create.sh (peer-bootstrap-in-mmdebstrap race): correct concern; needs serialization at prepare_host_binfmt_qemu_cross (host-wide EX-NB probe before any builder enters mmdebstrap, fail-fast if a concurrent native peer already won). Out of scope for the current minimal fix; will track as a follow-up.
  • P1 line:306 (cleanup ordering): I am still verifying call-site registration order — the empirical SIGINT test holds, but if your reading is correct (rootfs cleanup registered earlier in time → later in PREPEND list → runs after us) we need explicit descendant-kill or POSIX F_OFD on a helper fd. Will reproduce with a controlled SIGINT test before patching.
  • P2 line:24 (preserve target qemu binary): still treated as out-of-scope edge case (armhf image deliberately bundling qemu-user-static). Acknowledged but not in this round.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c4a1f8be3d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# so subsequent chroot apt-get/dpkg/customize calls fall through to kernel binfmt_elf
# and run 32-bit ARM ELF natively via CONFIG_COMPAT. mmdebstrap above used qemu-arm
# because its cross-arch path requires that registration to be present.
_native_armhf_setup_binfmt_elf || true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Anchor qemu-arm before mmdebstrap can be interrupted

When two armhf builds run on the same aarch64 host, this disables the global qemu-arm registration as soon as one build finishes mmdebstrap, but another build may still be inside the mmdebstrap call above and still needs that registration for its maintainer-script execution. Because no emulation SH-lock is held before/during mmdebstrap, the first finisher can switch the host to native mode and break the peer's bootstrap. The native-owner transition needs to be serialized against builders that have not yet left mmdebstrap, not only after this line.

Useful? React with 👍 / 👎.

Comment thread lib/functions/rootfs/qemu-static.sh Outdated

# Register cleanup BEFORE the authoritative arch-test, so a failure
# there still releases the lock via the trap handler.
add_cleanup_handler trap_handler_native_armhf_restore_qemu_arm
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore qemu-arm after rootfs cleanup runs

This cleanup is registered after prepare_rootfs_build_params_and_trap has already registered trap_handler_cleanup_rootfs_and_image, and add_cleanup_handler prepends handlers, so this restore handler runs before the rootfs cleanup that unmounts/kills chroot work. On an interrupt while a chroot/container child still has the inherited flock fd open, the last-out LOCK_EX -n probe sees that inherited holder and skips re-enabling qemu-arm, leaving the host with arm emulation disabled after the build exits.

Useful? React with 👍 / 👎.

Comment thread lib/functions/rootfs/qemu-static.sh Outdated
Comment on lines 94 to 99
if [[ ! -f "${dst_target}" ]]; then
if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
display_alert "Native armhf via binfmt_elf" "no qemu binary to remove during ${caller}" "debug"
return 0
fi
exit_with_error "Missing qemu binary during undeploy_qemu_binary_from_chroot from ${caller}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve target-owned qemu binaries when deploy was skipped

On the cache-hit/image path _native_armhf_setup_binfmt_elf runs before deploy_qemu_binary_to_chroot, so native mode makes deploy skip without creating any .armbian.orig marker. If the rootfs legitimately contains /usr/bin/${QEMU_BINARY} (for example it installs qemu-user-static), this missing-file-only shortcut does not fire and the later undeploy falls through to remove the target's own binary. Track whether this invocation actually deployed a host copy, or no-op all native-mode undeploys that were preceded by a skipped deploy.

Useful? React with 👍 / 👎.

Comment thread lib/functions/rootfs/qemu-static.sh
Comment thread lib/functions/rootfs/qemu-static.sh Outdated
@iav
Copy link
Copy Markdown
Owner Author

iav commented May 13, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8bf0f6944a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread lib/functions/rootfs/qemu-static.sh Outdated

# Register cleanup BEFORE the authoritative arch-test, so a failure
# there still releases the lock via the trap handler.
add_cleanup_handler trap_handler_native_armhf_restore_qemu_arm
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore qemu-arm after rootfs cleanup handlers

When a native armhf build is interrupted or fails while chroot work is still running, this cleanup is prepended after trap_handler_cleanup_rootfs_and_image, so run_cleanup_handlers executes it first (see lib/functions/logging/traps.sh lines 117-140). Because BSD flock locks are inherited by forked chroot/subshell processes, releasing this fd before the rootfs cleanup kills/unmounts those descendants makes the last-out LOCK_EX -n probe see the inherited shared lock and skip echo 1, leaving /proc/sys/fs/binfmt_misc/qemu-arm disabled on the host.

Useful? React with 👍 / 👎.

Comment on lines +272 to +274
if [[ "$(_native_armhf_observe_qemu_arm_state)" == "1" ]]; then
if ! echo 0 > /proc/sys/fs/binfmt_misc/qemu-arm 2> /dev/null; then
display_alert "Native armhf via binfmt_elf" "could not disable qemu-arm (no CAP_SYS_ADMIN?); falling back to qemu-arm-static emulation" "wrn"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Block new mmdebstrap runs while qemu-arm is disabled

If a second armhf cache-miss build starts after another native build reaches this echo 0, prepare_host_binfmt_qemu_cross does not re-enable qemu-arm when /proc/sys/fs/binfmt_misc/qemu-arm already exists, and its final arch-test armhf still passes via native compat. That lets the new build enter mmdebstrap with qemu-arm disabled even though rootfs-create.sh explicitly depends on qemu-arm until libc/ld-linux exist, so staggered concurrent builds can fail during bootstrap rather than joining only after mmdebstrap.

Useful? React with 👍 / 👎.

Comment thread lib/functions/rootfs/qemu-static.sh Outdated
Comment on lines +22 to +24
if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
display_alert "Native armhf via binfmt_elf" "skipping qemu binary deployment during ${caller}" "info"
return 0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not undeploy qemu when deployment was skipped

When native armhf is active on the cache-hit/image path, this early return skips the backup step that normally protects an already-installed /usr/bin/${QEMU_BINARY} in the rootfs. The matching undeploy still removes any existing file if it is present, so images whose cached rootfs legitimately contains qemu-user-static/qemu-user-binfmt lose that package-owned binary even though this build never deployed it.

Useful? React with 👍 / 👎.

# so subsequent chroot apt-get/dpkg/customize calls fall through to kernel binfmt_elf
# and run 32-bit ARM ELF natively via CONFIG_COMPAT. mmdebstrap above used qemu-arm
# because its cross-arch path requires that registration to be present.
_native_armhf_setup_binfmt_elf || true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Protect peer mmdebstrap before disabling qemu-arm

In concurrent armhf cache-miss builds, one build can reach this post-mmdebstrap setup and globally disable qemu-arm while another build is still inside its own run_host_command_logged "${debootstrap_bin}" step above. That peer has already deployed the qemu binary but has not reached the point where libc/ld-linux are guaranteed usable natively, so its remaining mmdebstrap maintainer-script executions can suddenly stop going through qemu-arm and fail mid-bootstrap.

Useful? React with 👍 / 👎.

…(cleanup ordering)

Two related codex P2 findings on PR #113, both about
`_native_armhf_setup_binfmt_elf` / `prepare_host_binfmt_qemu_cross_arm64_host_armhf_target`
behaving incorrectly when the build is interrupted or when the user opts
out of native armhf execution.

== Codex P2 :478 — Honor the armhf native killswitch when qemu is absent ==

`prepare_host_binfmt_qemu_cross_arm64_host_armhf_target` is reached only
via the NATIVE_ARMHF_ON_ARM64=no/never/disabled killswitch — its job is
to make sure armhf chroot execution goes through qemu-arm, not the
kernel's native armhf compat.

The previous gate

  if arch-test armhf; then
      ... assume qemu emulation already setup, skip import ...
  else
      ... import + enable qemu-arm ...

is broken on a native-capable aarch64 host: `arch-test armhf` returns 0
whenever the kernel can run armhf at all, which is true even when
`/proc/sys/fs/binfmt_misc/qemu-arm` is absent or its
`/usr/share/binfmts/qemu-arm` descriptor is missing. The helper then
skips importing qemu-arm and armhf chroots run natively, silently
defeating the killswitch opt-out.

`arch-test(1)` reports whether the kernel can run an arch at all, not
which mechanism does the running. Probe the binfmt_misc entry directly
instead: require both `/proc/sys/fs/binfmt_misc/qemu-arm` to exist and
report `enabled`, AND `/usr/share/binfmts/qemu-arm` to be present (so a
later `update-binfmts --enable` won't NOOP). Anything else → fall
through to the import branch.

The post-import smoke-test below is left as-is — it's a diagnostic alert
on the import path, not a gate.

== Codex P2 :265 — Fix cleanup handler ordering before restoring qemu-arm ==

`_native_armhf_setup_binfmt_elf` is invoked from rootfs-create.sh after
`prepare_rootfs_build_params_and_trap` has already registered
`trap_handler_cleanup_rootfs_and_image`. `add_cleanup_handler` PREPENDS
to the handler array, and `run_cleanup_handlers` iterates that array in
order, so a plain `add_cleanup_handler` for our restore would put it at
the head — running BEFORE `trap_handler_cleanup_rootfs_and_image` on
interrupt. That releases the native-armhf SH-fd and re-enables qemu-arm
while chroot/loop teardown is still pending: any straggling subshell
inherits the SH-fd (so the LOCK_EX-NB probe spuriously blocks AND
neighbouring native-armhf builds see false contention), and any in-
flight exec re-routes through qemu-arm — which the native deploy path
never copied into the chroot, so the exec fails.

Add `add_cleanup_handler_last` to logging/traps.sh — same registration
API but appends to the tail of the handler list — and use it for
`trap_handler_native_armhf_restore_qemu_arm`. This pins our restore
after every PREPEND-registered handler, including
`trap_handler_cleanup_rootfs_and_image` and `mount_chroot`'s umount
handlers. By the time we run, the chroot is unmounted, the loop is
freed, and inheriting subshells have exited; the EX-NB probe sees a
clean lock and the qemu-arm re-enable is safe.

The ordering-invariant comment is rewritten to describe the new pinning
strategy explicitly.

Assisted-by: Claude:claude-opus-4.7
@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 8bf0f69 to f3c87e5 Compare May 13, 2026 16:05
…path (codex P2 :24)

`deploy_qemu_binary_to_chroot` returns early when
`ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes` — the kernel binfmt_elf path
runs 32-bit ARM ELF natively, so no qemu-arm-static needs to be copied
into the chroot. But the early return happened BEFORE the preservation
block that moves any existing `${chroot_target}/usr/bin/qemu-arm-static`
to `…/qemu-arm-static.armbian.orig`. An armhf image that ships
`qemu-user-static` as a package (via package lists or customization) had
that binary in place before deploy ran; native-armhf skipped preserve;
the matching `undeploy_qemu_binary_from_chroot` then treated the
package-owned file as the host copy and `rm -fv`'d it. The package's
binary disappeared from the produced image even though our deploy path
never put a host file there.

Fix:

1. In deploy, move the dst_target/.orig path computation and the
   "preserve existing target binary" block BEFORE the native-armhf early
   return. Resolving `src_host` (host's qemu-arm-static lookup) is
   deferred to after the early return — a native armhf builder may
   intentionally have no qemu-user-static on the host.

2. In undeploy, stop short-circuiting on
   `ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes` when dst_target is missing:
   that branch must still run the `.armbian.orig` restore step, or a
   preserved target binary stays in .orig forever. New flow:

      if dst_target present:                     rm  (deploy ran cross-arch)
      elif native-armhf was not active:          exit_with_error (state loss)
      else:                                      fall through to restore
      …
      restore .armbian.orig if present.

The .armbian.orig marker is the only signal undeploy has to tell a
target-shipped binary from a deploy-copied one. The remaining edge case
— target apt-installing qemu-user-static AFTER deploy ran (so no .orig
exists yet at deploy time) — is not covered here; a true fix needs an
explicit "we copied this" marker file, called out in a TODO comment in
undeploy.

Assisted-by: Claude:claude-opus-4.7
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: abf328e87b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +385 to +386
exec {_native_armhf_lock_fd}>&-
unset _native_armhf_lock_fd
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Serialize restore before releasing native lock

When the last native armhf builder exits while a NATIVE_ARMHF_ON_ARM64=no builder is starting, closing the shared fd here creates a gap before the last_fd exclusive probe below. I checked flock --help: -s takes a shared lock, -x an exclusive lock, and -n fails on conflict; in that gap the killswitch builder can acquire SH, observe qemu-arm is still 0, and while it is exiting/releasing, this handler's EX-NB probe can fail, leaving no remaining owner to write echo 1 and restoring never happens. Fresh evidence is that the current diff still closes _native_armhf_lock_fd before opening/probing last_fd, so the last-out transition is not serialized.

Useful? React with 👍 / 👎.

display_alert "Native armhf via binfmt_elf" "killswitch requested but qemu-arm already disabled by concurrent native-armhf builders" "err"
exit_with_error "cannot honor NATIVE_ARMHF_ON_ARM64=no: concurrent native-armhf builders have disabled qemu-arm. Wait for them to finish or run on a separate host."
fi
add_cleanup_handler trap_handler_native_armhf_release_emul_lock
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Hold the killswitch lock until chroot cleanup finishes

When a NATIVE_ARMHF_ON_ARM64=no build is interrupted or fails after this point, this cleanup was registered with add_cleanup_handler, which prepends handlers and therefore runs before the earlier rootfs teardown handler that kills/unmounts chroot users. That releases the emulation SH anchor while killswitch chroot descendants can still be alive and still depend on qemu-arm; a concurrent native armhf builder can then take EX and write 0, defeating the lock's purpose of blocking switchover until the emulation-mode build is done with chroot execution.

Useful? React with 👍 / 👎.

… P2 :222)

`_native_armhf_setup_binfmt_elf`'s killswitch path (entered when
`NATIVE_ARMHF_ON_ARM64=no/never/disabled`) used to fall through with a
`wrn` and `return 1` on three anchor-establish failures:

  1. `/proc/sys/fs/binfmt_misc/qemu-arm` not registered
  2. open of that file failed (no CAP_SYS_ADMIN)
  3. blocking SH-lock not granted within 30s

The fall-through is unsafe on a native-capable aarch64 host: the kernel
will run armhf chroot execs through binfmt_elf (COMPAT) regardless of
whether qemu-arm is registered, so the build still runs natively despite
the operator explicitly opting out via the killswitch. Codex P2 :222
flags exactly case (1).

Every "killswitch can't be honored" branch now emits a clear error and
exits, matching the already-fatal case (4) at line 247 ("concurrent
native-armhf builders have disabled qemu-arm"). NATIVE_ARMHF_ON_ARM64=no
is treated as a hard contract — if we can't anchor the emulation lock,
the build stops rather than silently running native.

Each error message tells the operator the actual blocking condition and
suggests the relevant fix (install qemu-user-static / grant CAP_SYS_ADMIN
/ wait for stuck peer).

Assisted-by: Claude:claude-opus-4.7
@iav iav marked this pull request as draft May 14, 2026 00:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant