fix: cranelift JIT srt-after-map TLS desync silent miscompile #306
Merged
… JIT state The four TLS slots (ACTIVE_REGISTRY, ACTIVE_FUNC_NAMES, ACTIVE_PROGRAM, ACTIVE_AST_PROGRAM) were cleared to null on guard drop. That's fine at the top-level Cranelift entry, but a per-element HOF callback re-enters the VM via jit_call_dyn -> VM::new(program).call(...); the inner execute()'s guards then null the slots when the inner returns, leaving the outer JIT with no TLS state. The next tree-bridge HOF in the same Cranelift entry would see null ACTIVE_FUNC_NAMES, deserialise its FnRef arg to a synthetic <user_fn:N> placeholder Text, fail to dispatch it through the tree interpreter, and the bridge would swallow the failure as TAG_NIL. Manifested as a silent miscompile: srt-after-map returned nil on Cranelift, correct on tree/VM. pdf-analyst rerun4 caught it via the frq + mget + srt pattern. Guards now snapshot the previous pointer at construction and restore it on drop. Linear stack discipline across arbitrary nesting depth; top-level restore still ends at null since the outer caller's prev was null. clear_active_registry() retained for the AOT runtime tear-down path where there is no prior pointer to restore.
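The save-and-restore discipline described in this commit message can be illustrated with a minimal, self-contained sketch. It is not the code from the diff: `ACTIVE_SLOT`, `SlotGuard`, `install`, and `current` are hypothetical names, and a `*const u32` stands in for whatever pointer types the four real slots hold.

```rust
use std::cell::Cell;
use std::ptr;

// Hypothetical stand-in for one of the four TLS slots; the real slot
// types are not shown on this page, so a raw pointer to u32 is assumed.
thread_local! {
    static ACTIVE_SLOT: Cell<*const u32> = Cell::new(ptr::null());
}

/// Guard that installs a pointer and remembers whatever was there before.
struct SlotGuard {
    prev: *const u32,
}

impl SlotGuard {
    /// Snapshot the previous pointer at construction and install the new one.
    fn install(new_ptr: *const u32) -> Self {
        let prev = ACTIVE_SLOT.with(|slot| slot.replace(new_ptr));
        SlotGuard { prev }
    }
}

impl Drop for SlotGuard {
    /// Restore the snapshot instead of unconditionally writing null.
    fn drop(&mut self) {
        ACTIVE_SLOT.with(|slot| slot.set(self.prev));
    }
}

/// Read the pointer currently installed on this thread.
fn current() -> *const u32 {
    ACTIVE_SLOT.with(|slot| slot.get())
}

fn main() {
    let outer = 1u32;
    let inner = 2u32;

    assert!(current().is_null());
    {
        // Outer entry installs its state.
        let _outer_guard = SlotGuard::install(&outer);
        {
            // Nested re-entry (the per-element callback case) installs its own.
            let _inner_guard = SlotGuard::install(&inner);
            assert_eq!(current(), &inner as *const u32);
        }
        // The inner drop restored the outer pointer rather than nulling it.
        assert_eq!(current(), &outer as *const u32);
    }
    // The outermost restore ends at null because its snapshot was null.
    assert!(current().is_null());
}
```

If `drop` wrote `ptr::null()` instead of `self.prev`, the middle assertion would fail. That is the same shape as the desync described above: the outer scope is left without its state once the nested scope exits.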
Nine new tests in regression_lambdas_cross_engine.rs cover every combination of a native-dispatch HOF (map/flt/fld/flatmap) followed by a tree-bridge HOF (srt/grp/uniqby/partition) in the same function body. Pre-fix, the second HOF returned nil on Cranelift while tree and VM produced the correct result; post-fix all three engines agree. Includes the original pdf-analyst rerun4 shape (frq + mget loop building [count word] pairs, then srt-by-count) and a native->bridge->native sandwich to confirm the fix doesn't only help the bridge call. New example examples/srt-after-map-inline-lambda.ilo exercises the same pattern through tests/examples_engines.rs so any future regression breaks at the higher-level harness too.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
danieljohnmorris added a commit that referenced this pull request on May 16, 2026
twelve fixes since 0.11.3, surfaced by rerun4 personas plus standing asks: srt-Cranelift TLS desync (#306), CLI auto-run restoration (#307), OP_LISTAPPEND O(n^2) JIT memory regression (#308), precedence-pair hint false-positive on parens (#309), prefix ?? accepts call expression (#310), += pure-shape docs (#311), bare-mutation silent no-op verifier warning ILO-T033 (#312), asin/acos/atan inverse trig builtins (#313), flat cross-engine (#314), cond{~v} discard hint multi-stmt false-positive (#315), rsrt fn xs key-fn overloads (#316), xs.(expr) paren-after-dot diagnostic hint (#317).
Summary
Cranelift JIT silently returned nil from srt fn xs (and grp/uniqby/partition) when another HOF with an inline-lambda callback had run earlier in the same function. Tree and VM produced the correct sorted list. Caught by pdf-analyst rerun4 via the frq + mget + srt pattern, then minimised down to two inline lambdas in the same function body.

Repro
Before:
--run-tree → [[1, b], [2, a]]
--run-vm → [[1, b], [2, a]]
--run-cranelift → nil

After: all three engines agree on [[1, b], [2, a]].

Root cause
The four TLS slots (ACTIVE_REGISTRY, ACTIVE_FUNC_NAMES, ACTIVE_PROGRAM, ACTIVE_AST_PROGRAM) had drop guards that NULL'd the slot unconditionally on scope exit. That is safe at the top-level Cranelift entry, but a per-element HOF callback re-enters the VM via jit_call_dyn -> VM::new(program).call(...); the inner execute()'s guards then NULL the slots when the inner call returns, leaving the outer JIT with no TLS state.

The next tree-bridge HOF in the same Cranelift entry saw null ACTIVE_FUNC_NAMES, deserialised its FnRef arg to a synthetic <user_fn:N> placeholder Text, failed to dispatch it through the tree interpreter, and the bridge swallowed the failure as TAG_NIL (legacy nil-sweep gap, filed separately).

Regressed when the HOF dispatch chain (#277 / #278 / #279 / #280 / #283) introduced the sub-VM re-entry path. Pre-#277, map was tree-bridge-only and the inner-VM re-entry didn't exist, so the TLS desync never happened.

What is in the diff
Commit 1 - vm: save-restore TLS guards. The four guards now snapshot the previous pointer at construction and restore it on drop. Linear stack discipline across arbitrary nesting; the outermost restore still ends at null. clear_active_registry() retained for the AOT runtime tear-down path where there is no prior pointer to restore.

Commit 2 - test: cross-engine regression. Nine new tests in regression_lambdas_cross_engine.rs cover every native HOF (map/flt/fld/flatmap) followed by every tree-bridge HOF (srt/grp/uniqby/partition). Plus the original pdf-analyst frq + mget shape, and a native->bridge->native sandwich. New examples/srt-after-map-inline-lambda.ilo exercises the same pattern through tests/examples_engines.rs so the higher-level harness catches future drift too.

Test plan
- … min2.ilo) now matches tree/VM on Cranelift
- examples/srt-after-map-inline-lambda.ilo runs identically on every engine via tests/examples_engines.rs
- cargo test --release --features cranelift green (3071 + 24 + 30 + ... passes, 0 failures)
- cargo fmt --check and cargo clippy --release --features cranelift -- -D warnings clean

Follow-ups (filed, out of scope here)
- … TAG_NIL to runtime errors for HOFs (srt, grp, uniqby, partition) so a callback failure surfaces as ILO-R instead of nil. Independent of this PR; logged in ilo_assessment_feedback.md parked section.