fix: catch Cranelift JIT panics and fall back to non-JIT engines #319
Merged
Upstream cranelift-jit 0.116 has an AArch64 near-call relocation
assertion (compiled_blob.rs:90) that fires non-deterministically when
the JIT code-cache and runtime memory end up more than ±64 MB apart.
On the first run after a fresh cargo build, that surfaces as a hard
process crash on macOS arm64, with no recovery path for the user.
Add JitCallError::Panic { msg } and wrap compile_and_call in
std::panic::catch_unwind so the panic becomes a recoverable error
the dispatcher can fall back from. The panic-hook chain is installed
once via Once and scoped by a thread-local IN_JIT_DISPATCH flag, so
concurrent JIT dispatches on other threads do not race on set_hook
and panics outside the JIT keep their default rendering.
Defensively resets the JIT bump arena and drains the runtime-error
TLS cell in the panic arm: if the panic fired mid-call() after some
helper allocations but before the normal tail reset, those would
otherwise leak into the next invocation.
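The panic-arm cleanup can be sketched with two thread-locals standing in for the JIT bump arena and the runtime-error TLS cell. JIT_ARENA, RUNTIME_ERROR, reset_call_state and guarded_call are hypothetical names, and a plain Vec stands in for the real arena. The point is that the panic arm performs the same reset the normal tail would have done.

```rust
use std::cell::RefCell;
use std::panic::{self, AssertUnwindSafe};

thread_local! {
    // Hypothetical stand-in for the per-call JIT bump arena.
    static JIT_ARENA: RefCell<Vec<u64>> = RefCell::new(Vec::new());
    // Hypothetical stand-in for the runtime-error cell helpers write into.
    static RUNTIME_ERROR: RefCell<Option<String>> = RefCell::new(None);
}

// Drop helper allocations and drain any pending runtime error.
fn reset_call_state() {
    JIT_ARENA.with(|a| a.borrow_mut().clear());
    RUNTIME_ERROR.with(|e| {
        e.borrow_mut().take();
    });
}

fn guarded_call<F: FnOnce() -> i64>(call: F) -> Result<i64, String> {
    match panic::catch_unwind(AssertUnwindSafe(call)) {
        Ok(value) => {
            // Normal tail: surface a pending runtime error, then reset.
            let pending = RUNTIME_ERROR.with(|e| e.borrow_mut().take());
            JIT_ARENA.with(|a| a.borrow_mut().clear());
            match pending {
                Some(msg) => Err(msg),
                None => Ok(value),
            }
        }
        Err(_payload) => {
            // Panic arm: the normal tail never ran, so reset defensively.
            reset_call_state();
            Err("JIT call panicked".to_string())
        }
    }
}
```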
Debug-build-only test hooks (FORCE_PANIC_FOR_TEST thread-local and
ILO_FORCE_JIT_PANIC env var) raise a synthetic panic at the same
call site so tests can exercise the fallback path without depending
on the AArch64-specific upstream bug. Both gated on debug_assertions
so release binaries do not carry them.
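A sketch of how debug-only hooks like FORCE_PANIC_FOR_TEST and ILO_FORCE_JIT_PANIC can be compiled out of release builds with cfg(debug_assertions). The helper name maybe_force_test_panic, the panic message, and treating the env-var value 1 as "enabled" are assumptions for illustration.

```rust
#[cfg(debug_assertions)]
use std::cell::Cell;

#[cfg(debug_assertions)]
thread_local! {
    // Per-thread switch for unit tests; absent from release builds.
    pub static FORCE_PANIC_FOR_TEST: Cell<bool> = Cell::new(false);
}

/// Called at the JIT call site. In release builds the body is empty,
/// so neither the thread-local nor the env-var lookup exists there.
#[inline]
fn maybe_force_test_panic() {
    #[cfg(debug_assertions)]
    {
        let by_tls = FORCE_PANIC_FOR_TEST.with(|f| f.get());
        // Assumption: only the value "1" enables the env-var hook.
        let by_env = matches!(std::env::var("ILO_FORCE_JIT_PANIC").as_deref(), Ok("1"));
        if by_tls || by_env {
            panic!("synthetic JIT panic (test hook)");
        }
    }
}
```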
Handle JitCallError::Panic in both engine dispatch paths. run_default (the default file/inline path) falls through to the tree interpreter, matching the existing NotEligible arm. run_cranelift_engine (explicit --run-cranelift) falls back to the bytecode VM, since the user opted into a JIT engine and VM is the closest non-JIT tier. Both paths emit a single-line stderr breadcrumb including the panic payload so the upstream cranelift issue stays measurable in production logs rather than collapsing into a generic message or a silent engine swap.
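A simplified sketch of the two dispatch arms. It assumes the JIT result arrives as a Result and the fallback engines are closures returning the program value. Engine, breadcrumb and the breadcrumb wording are hypothetical; the PR states only that the line is single-line and carries the panic payload. Treating a Runtime error as a reported failure (not a retry), and simply reporting non-Panic errors on the explicit path, are also assumptions of the sketch.

```rust
#[derive(Debug)]
enum JitCallError {
    NotEligible,
    Runtime(String),
    Panic { msg: String },
}

#[derive(Debug, PartialEq)]
enum Engine {
    Cranelift,
    Tree,
    Vm,
}

// Hypothetical single-line breadcrumb that carries the panic payload verbatim.
fn breadcrumb(msg: &str, fallback: &str) -> String {
    format!("ilo: cranelift JIT panicked ({msg}); falling back to {fallback}")
}

// Default path: Panic takes the same route as NotEligible (tree interpreter).
fn run_default(
    jit: Result<i64, JitCallError>,
    tree: impl FnOnce() -> i64,
) -> (Engine, Result<i64, String>) {
    match jit {
        Ok(v) => (Engine::Cranelift, Ok(v)),
        Err(JitCallError::NotEligible) => (Engine::Tree, Ok(tree())),
        Err(JitCallError::Panic { msg }) => {
            eprintln!("{}", breadcrumb(&msg, "tree interpreter"));
            (Engine::Tree, Ok(tree()))
        }
        Err(JitCallError::Runtime(e)) => (Engine::Cranelift, Err(e)),
    }
}

// Explicit --run-cranelift: Panic drops one tier to the bytecode VM.
fn run_cranelift_engine(
    jit: Result<i64, JitCallError>,
    vm: impl FnOnce() -> i64,
) -> (Engine, Result<i64, String>) {
    match jit {
        Ok(v) => (Engine::Cranelift, Ok(v)),
        Err(JitCallError::Panic { msg }) => {
            eprintln!("{}", breadcrumb(&msg, "bytecode VM"));
            (Engine::Vm, Ok(vm()))
        }
        Err(other) => (Engine::Cranelift, Err(format!("{other:?}"))),
    }
}
```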
Three integration tests using ILO_FORCE_JIT_PANIC=1 against the ilo binary:
- default engine falls through to the tree interpreter and produces the expected program output
- --run-cranelift falls back to the bytecode VM
- the stderr breadcrumb includes the panic payload, so the upstream issue stays searchable in production logs
Plus a cross-engine examples/*.ilo pin so the broader regression harness exercises the same numeric pipeline shape on tree and VM. The integration tests are gated on cfg(debug_assertions), since the env-var hook only exists in debug builds.
danieljohnmorris added a commit that referenced this pull request on May 16, 2026:
eight fixes since 0.11.4, all from rerun5 personas: bare-bang silent-nil regression (#324), Cranelift AArch64 panic catch_unwind fallback (#319), multi-line body span drift (#318), HOF tree-bridge error parity on Cranelift (#321), bool-ternary brace sugar (#323), single-line body diagnostic with brace-block bodies (#322), unknown-subcommand error in multi-fn files (#320), window perf cliff fused flt/map (#325).
Summary
Upstream cranelift-jit 0.116 has a non-deterministic AArch64 near-call relocation assertion (compiled_blob.rs:90: (diff >> 26 == -1) || (diff >> 26 == 0)) that fires when the JIT code-cache and runtime memory end up more than ±64 MB apart. On macOS arm64 this surfaces as a hard process crash on the first run after a fresh cargo build, with no recovery path for the user. Five subsequent invocations of the same command line typically run clean, but the one that crashed killed the pipeline.
Wrapping the Cranelift JIT dispatch in std::panic::catch_unwind converts the crash into a measurable diagnostic plus a slower-but-correct run on the next engine down. The panic payload is surfaced verbatim in a single-line stderr breadcrumb, so the upstream issue stays searchable rather than degrading silently into the fallback engine.
Repro before/after
Before this PR, on macOS arm64 (intermittent, first run after fresh build):
After this PR, same panic forced via the debug-build env-var hook:
One clean breadcrumb on stderr, correct program output on stdout, exit 0.
What's in the diff
Commit 1 - jit: wrap Cranelift dispatch in catch_unwind, add Panic variant
- JitCallError::Panic { msg } joins NotEligible and Runtime in src/vm/jit_cranelift.rs.
- compile_and_call now wraps the whole dispatch in std::panic::catch_unwind with a thread-scoped panic-hook chain installed once via Once. The hook reads IN_JIT_DISPATCH (a thread-local) and suppresses the default backtrace only when the calling thread is inside the JIT entry. Concurrent JIT dispatches on other threads do not race on set_hook, and panics outside the JIT keep their default rendering.
- IN_JIT_DISPATCH flag so a nested compile_and_call (none today, but a JIT helper could legitimately re-enter) does not clobber the outer's flag on return.
- FORCE_PANIC_FOR_TEST thread-local and ILO_FORCE_JIT_PANIC env var, both gated on cfg(debug_assertions) so release binaries do not carry them.
Commit 2 - cli: fall back to VM/tree on Cranelift panic, with stderr breadcrumb
- run_default (the default file/inline path) falls through to the tree interpreter, matching the existing NotEligible arm.
- run_cranelift_engine (explicit --run-cranelift) falls back to the bytecode VM via vm::run, since the user opted into a JIT engine and VM is the closest non-JIT tier.
Commit 3 - test: cover Cranelift panic fallback across engines
- tests/regression_cranelift_panic_fallback.rs driving the binary with ILO_FORCE_JIT_PANIC=1:
  - --run-cranelift falls back to VM, exit 0, breadcrumb mentions VM
- FORCE_PANIC_FOR_TEST thread-local: one in src/vm/jit_cranelift.rs (asserts compile_and_call returns Panic and the next call recovers normally), two in src/main.rs (assert run_default and run_cranelift_engine exit 0 on fallback).
- examples/cranelift-panic-fallback.ilo pins the same numeric pipeline shape across tree and VM via the existing examples_engines harness.
Decisions
- catch_unwind rather than bumping cranelift-jit to 0.131. It holds regardless of upstream state (provided the build keeps the default panic = "unwind" strategy, since catch_unwind cannot intercept an aborting panic), with no API churn and a contained blast radius. A version bump can come later as a separate PR if 0.117+ ships the fix.
- Once. Avoids the global set_hook/take_hook race that the obvious "save and restore around the catch" approach would have introduced.
- cargo test runs multi-threaded by default; embedding ilo as a library is a foreseeable use case.
- NotEligible shape, same user expectation - JIT is an optimisation).
- --run-cranelift falls to VM (the user asked for an optimised engine; tree would be a much larger surprise).
- debug_assertions. Release binaries never carry the synthetic-panic path; the only behaviour change in release is the catch_unwind wrapper itself.
Test plan
- cargo test --release --features cranelift - full suite green
- cargo clippy --features cranelift --tests --release -- -D warnings - clean
- cargo fmt --all -- --check - clean
- cranelift_compile_and_call_catches_panic, run_default_cranelift_panic_falls_back_to_interpreter, run_cranelift_engine_panic_falls_back_to_vm, cranelift_panic_default_falls_back_to_interpreter, cranelift_panic_explicit_engine_falls_back_to_vm, cranelift_panic_breadcrumb_includes_payload
- ILO_FORCE_JIT_PANIC=1 ilo 'f x:n>n;*x 2' f 5 prints breadcrumb on stderr and 10 on stdout, exits 0
Follow-ups