fix: cranelift JIT srt-after-map TLS desync silent miscompile by danieljohnmorris · Pull Request #306 · ilo-lang/ilo

danieljohnmorris · 2026-05-16T15:34:58Z

Summary

Cranelift JIT silently returned nil from srt fn xs (and grp / uniqby / partition) when another HOF with an inline-lambda callback had run earlier in the same function. Tree and VM produced the correct sorted list. Caught by pdf-analyst rerun4 via the frq + mget + srt pattern, then minimised down to two inline lambdas in the same function body.

Repro

main>_;a=map (x:n>n;+x 1) [1 2 3];sk=srt (p:L _>n;p.0) [[2 "a"] [1 "b"]];sk

Before:

--run-tree → [[1, b], [2, a]]
--run-vm → [[1, b], [2, a]]
--run-cranelift → nil

After: all three engines agree on [[1, b], [2, a]].

Root cause

The four TLS slots (ACTIVE_REGISTRY, ACTIVE_FUNC_NAMES, ACTIVE_PROGRAM, ACTIVE_AST_PROGRAM) had drop guards that set the slot to null unconditionally on scope exit. That is safe at the top-level Cranelift entry, but a per-element HOF callback re-enters the VM via jit_call_dyn -> VM::new(program).call(...); the guards in the inner execute() then null the slots when the inner call returns, leaving the outer JIT with no TLS state.

The next tree-bridge HOF in the same Cranelift entry saw null ACTIVE_FUNC_NAMES, deserialised its FnRef arg to a synthetic <user_fn:N> placeholder Text, failed to dispatch it through the tree interpreter, and the bridge swallowed the failure as TAG_NIL (legacy nil-sweep gap, filed separately).
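The failure mode described above can be reproduced in miniature. The following is a minimal sketch, not the ilo source: ACTIVE_SLOT, ClearingGuard, and outer_view_after_nested_entry are hypothetical names, a single usize stands in for one of the four pointer slots, and 0 plays the role of null.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical stand-in for one slot such as ACTIVE_FUNC_NAMES;
    // 0 plays the role of a null pointer.
    static ACTIVE_SLOT: Cell<usize> = Cell::new(0);
}

/// Guard that clears the slot unconditionally on drop (the pre-fix shape).
struct ClearingGuard;

impl ClearingGuard {
    fn install(value: usize) -> Self {
        ACTIVE_SLOT.with(|slot| slot.set(value));
        ClearingGuard
    }
}

impl Drop for ClearingGuard {
    fn drop(&mut self) {
        // No memory of what was installed before this guard existed.
        ACTIVE_SLOT.with(|slot| slot.set(0));
    }
}

fn current() -> usize {
    ACTIVE_SLOT.with(|slot| slot.get())
}

/// Returns what the outer scope observes after a nested entry has returned.
fn outer_view_after_nested_entry() -> usize {
    let _outer = ClearingGuard::install(0xA);
    {
        // Stand-in for a per-element callback re-entering the VM.
        let _inner = ClearingGuard::install(0xB);
    } // inner guard drops here and wipes the slot
    current()
}
```

Calling outer_view_after_nested_entry() returns 0 even though the outer guard is still alive, which is the state the next tree-bridge HOF would observe in this model.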

Regressed when the HOF dispatch chain (#277 / #278 / #279 / #280 / #283) introduced the sub-VM re-entry path. Pre-#277, map was tree-bridge-only and the inner-VM re-entry didn't exist, so the TLS desync never happened.

What is in the diff

Commit 1 — vm: save-restore TLS guards. The four guards now snapshot the previous pointer at construction and restore it on drop. Linear stack discipline across arbitrary nesting; the outermost restore still ends at null. clear_active_registry() retained for the AOT runtime tear-down path where there is no prior pointer to restore.
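The save-restore pattern from Commit 1 can be sketched as follows. This is an illustration under stated assumptions rather than the code in the diff: ACTIVE_SLOT, SlotGuard, and nested_entry_trace are hypothetical names, and one raw-pointer slot stands in for the four real ones.

```rust
use std::cell::Cell;
use std::ptr;

thread_local! {
    // Hypothetical stand-in for one of the four TLS slots.
    static ACTIVE_SLOT: Cell<*const u8> = Cell::new(ptr::null());
}

/// Guard that remembers the previous pointer and puts it back on drop.
struct SlotGuard {
    prev: *const u8,
}

impl SlotGuard {
    fn install(next: *const u8) -> Self {
        // Cell::replace stores `next` and hands back the old value.
        let prev = ACTIVE_SLOT.with(|slot| slot.replace(next));
        SlotGuard { prev }
    }
}

impl Drop for SlotGuard {
    fn drop(&mut self) {
        // Restore whatever was active before this guard was constructed.
        ACTIVE_SLOT.with(|slot| slot.set(self.prev));
    }
}

fn current() -> *const u8 {
    ACTIVE_SLOT.with(|slot| slot.get())
}

/// Records what each nesting level observes:
/// [inner sees its own state, outer state restored, slot ends at null].
fn nested_entry_trace() -> [bool; 3] {
    let outer_state = 1u8;
    let inner_state = 2u8;
    let outer_ptr: *const u8 = &outer_state;
    let inner_ptr: *const u8 = &inner_state;
    let mut trace = [false; 3];
    {
        let _outer = SlotGuard::install(outer_ptr);
        {
            // Stand-in for the sub-VM re-entry from a HOF callback.
            let _inner = SlotGuard::install(inner_ptr);
            trace[0] = current() == inner_ptr;
        }
        trace[1] = current() == outer_ptr;
    }
    trace[2] = current().is_null();
    trace
}
```

In this model nested_entry_trace() returns [true, true, true]: the inner entry sees its own state, the outer state is back after the inner guard drops, and the outermost drop leaves the slot at null because its saved previous pointer was null. Guards must drop in reverse order of construction for this to hold, which Rust scoping provides by default.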

Commit 2 — test: cross-engine regression. Nine new tests in regression_lambdas_cross_engine.rs cover every native HOF (map/flt/fld/flatmap) followed by every tree-bridge HOF (srt/grp/uniqby/partition). Plus the original pdf-analyst frq + mget shape, and a native->bridge->native sandwich. New examples/srt-after-map-inline-lambda.ilo exercises the same pattern through tests/examples_engines.rs so the higher-level harness catches future drift too.

Test plan

Repro from pdf-analyst rerun4 (min2.ilo) now matches tree/VM on Cranelift
Nine new cross-engine regression tests pass on tree / VM / Cranelift
New examples/srt-after-map-inline-lambda.ilo runs identically on every engine via tests/examples_engines.rs
Full cargo test --release --features cranelift green (3071 + 24 + 30 + ... passes, 0 failures)
cargo fmt --check and cargo clippy --release --features cranelift -- -D warnings clean

Follow-ups (filed, out of scope here)

Promote tree-bridge errors from silent TAG_NIL to runtime errors for HOFs (srt, grp, uniqby, partition) so a callback failure surfaces as ILO-R instead of nil. Independent of this PR; logged in ilo_assessment_feedback.md parked section.

… JIT state

The four TLS slots (ACTIVE_REGISTRY, ACTIVE_FUNC_NAMES, ACTIVE_PROGRAM, ACTIVE_AST_PROGRAM) were cleared to null on guard drop. That's fine at the top-level Cranelift entry, but a per-element HOF callback re-enters the VM via jit_call_dyn -> VM::new(program).call(...); the inner execute()'s guards then null the slots when the inner returns, leaving the outer JIT with no TLS state.

The next tree-bridge HOF in the same Cranelift entry would see null ACTIVE_FUNC_NAMES, deserialise its FnRef arg to a synthetic <user_fn:N> placeholder Text, fail to dispatch it through the tree interpreter, and the bridge would swallow the failure as TAG_NIL.

Manifested as a silent miscompile: srt-after-map returned nil on Cranelift, correct on tree/VM. pdf-analyst rerun4 caught it via the frq + mget + srt pattern.

Guards now snapshot the previous pointer at construction and restore it on drop. Linear stack discipline across arbitrary nesting depth; top-level restore still ends at null since the outer caller's prev was null. clear_active_registry() retained for the AOT runtime tear-down path where there is no prior pointer to restore.

Nine new tests in regression_lambdas_cross_engine.rs cover every combination of a native-dispatch HOF (map/flt/fld/flatmap) followed by a tree-bridge HOF (srt/grp/uniqby/partition) in the same function body. Pre-fix, the second HOF returned nil on Cranelift while tree and VM produced the correct result; post-fix all three engines agree. Includes the original pdf-analyst rerun4 shape (frq + mget loop building [count word] pairs, then srt-by-count) and a native->bridge->native sandwich to confirm the fix doesn't only help the bridge call. New example examples/srt-after-map-inline-lambda.ilo exercises the same pattern through tests/examples_engines.rs so any future regression breaks at the higher-level harness too.

codecov · 2026-05-16T15:38:04Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.


three P0 fixes since 0.11.3, all surfaced by rerun4 personas: srt-cranelift TLS desync silent miscompile (#306), CLI auto-run for main and inline programs restored (#307), OP_LISTAPPEND O(n^2) memory regression in in-process Cranelift JIT (#308).

twelve fixes since 0.11.3, surfaced by rerun4 personas plus standing asks: srt-Cranelift TLS desync (#306), CLI auto-run restoration (#307), OP_LISTAPPEND O(n^2) JIT memory regression (#308), precedence-pair hint false-positive on parens (#309), prefix ?? accepts call expression (#310), += pure-shape docs (#311), bare-mutation silent no-op verifier warning ILO-T033 (#312), asin/acos/atan inverse trig builtins (#313), flat cross-engine (#314), cond{~v} discard hint multi-stmt false-positive (#315), rsrt fn xs key-fn overloads (#316), xs.(expr) paren-after-dot diagnostic hint (#317).

danieljohnmorris added 2 commits May 16, 2026 16:34

danieljohnmorris merged commit 684189c into main May 16, 2026
5 checks passed

danieljohnmorris deleted the fix/srt-cranelift-nil branch May 16, 2026 15:38

danieljohnmorris mentioned this pull request May 16, 2026

vm: HOF tree-bridge error parity on Cranelift #321

Merged

4 tasks
