split call_function into 14 inline-never family dispatchers#504
Closed
danieljohnmorris wants to merge 3 commits into
Closed
split call_function into 14 inline-never family dispatchers#504danieljohnmorris wants to merge 3 commits into
danieljohnmorris wants to merge 3 commits into
Conversation
The monolithic ~4500-line call_function with ~134 arms caused stack overflows in debug builds because rustc reserves stack slots for every local in a frame simultaneously, even across mutually exclusive branches. PR #494 and PR #500 both needed band-aids (per-test thread wrapper and RUST_MIN_STACK=16MiB) as a result. Extract the arms into 14 #[inline(never)] family dispatchers: dispatch_map_builtins, dispatch_linalg_builtins, dispatch_math_builtins, dispatch_datetime_builtins, dispatch_basic_builtins, dispatch_list_builtins, dispatch_io_builtins, dispatch_text_builtins, dispatch_fs_builtins, dispatch_json_builtins, dispatch_hof_builtins, dispatch_stat_builtins, dispatch_regex_builtins, dispatch_fft_builtins. The new call_function is a thin match router (<100 lines). Each recursive call now uses only the router frame plus the one family that executes, so per-frame pressure is O(one family's locals) regardless of total builtin count. HOF dispatchers (list, hof) receive env: &mut Env because srt/rsrt key-fn and map/flt/fld/etc. call call_function recursively. Non-HOF dispatchers are pure. resolve_fn_ref and closure_captures are promoted from nested functions inside call_function to module-level items, visible to all dispatchers.
fib(30) = 832040 and fold over range 0..1000 (sum = 499500) must complete on the default 8 MiB thread stack in debug builds. Both tests run on VM and JIT engines to cover all call_function dispatch paths. Before the fix these tests would have hit SIGSEGV at around fib(22-25) depending on platform.
Demonstrates fib(30) and sum via fld over range 0..1000 - the same patterns that would have overflowed with the old monolithic call_function. Acts as an in-context learning example for agents and an additional regression harness via examples_engines.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
7 tasks
Collaborator
Author
|
Closing to unstick the merge queue. The keep-both rebase strategy can't handle this PR cleanly (it touches the same dispatch table every other PR appends to, producing broken-brace artifacts on rebase). Will reimplement against current main after the rest of the queue drains. |
4 tasks
Collaborator
Author
|
Superseded by fresh-impl PR (in flight on feature/-v2 branch). The original PR's keep-both rebase produced broken Rust that can't be unstuck without manual brace surgery. The v2 PR is a clean reimpl against current main. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
call_functionwas a single ~4,500-line function with ~134if builtin == ...arms. In debug builds, rustc reserves stack slots for every local in a frame simultaneously even across mutually exclusive branches, so each recursive user-fn call consumed several hundred KiB of stack.RUST_MIN_STACK=16777216in CI respectively.#[inline(never)]family dispatchers. The newcall_functionis a thin match router (<100 lines). Per-frame stack pressure is now O(one family's locals) regardless of how many builtins exist.Repro before / after
Before (debug build, default 8 MiB stack):
After:
What's in the diff
Commit 1 - split call_function into 14 inline-never family dispatchers
dispatch_map_builtins,dispatch_linalg_builtins,dispatch_math_builtins,dispatch_datetime_builtins,dispatch_basic_builtins,dispatch_list_builtins,dispatch_io_builtins,dispatch_text_builtins,dispatch_fs_builtins,dispatch_json_builtins,dispatch_hof_builtins,dispatch_stat_builtins,dispatch_regex_builtins,dispatch_fft_builtinsresolve_fn_refandclosure_capturespromoted from nested functions insidecall_functionto module-level items, shared by list and HOF dispatchersenv: &mut Envbecause srt/rsrt key-fn and map/flt/fld callcall_functionrecursivelycall_functionis a thinmatch builtin { ... }router with Len inlined (high-frequency, trivial)Commit 2 - regression tests (
tests/regression_call_function_stack.rs)fib30_no_stack_overflow_vm/fib30_no_stack_overflow_jitfold_range_no_stack_overflow_vm/fold_range_no_stack_overflow_jitfib30_cross_engine_parityCommit 3 -
examples/deep-recursion.ilo-- run:/-- out:assertions; exercises all example enginesTest plan
cargo buildclean (no warnings)cargo clippy -- -D warningscleancargo test --lib- 3274 passed / 0 failed (AOT tests need libilo.a via symlink)cargo test- all integration tests pass including newregression_call_function_stackcargo test --test examples_engines- deep-recursion.ilo assertions passFollow-ups
RUST_MIN_STACK=16777216inrust.ymlon PR feature: --allow-net/read/write/run CLI capability flags #500 should be dropped in its rebase