ilo run: cap runtime + stdout to prevent runaway programs (#5ar) #544
Closed
danieljohnmorris wants to merge 5 commits into
danieljohnmorris added a commit to ilo-lang/ilo-site that referenced this pull request on May 21, 2026:
Mirror the SPEC.md + skills/ilo updates from ilo-lang/ilo#544. The two CLI flags and two runtime diagnostic codes guard `ilo run` against runaway programs: 60 s wall-clock cap raises ILO-R016, ~100 MB stdout cap raises ILO-R017. Both flags accept 0 to disable for batch / training workloads.
A mandelbrot persona run (2026-05-20) missed a `col=col+1` loop increment, spun in an infinite loop, and produced 165 MB of stdout before the harness killed it. The transcript was useless to the agent: just a wall of dots, with no signal about what went wrong.

New `runtime_guard` module installs two process-wide caps for `ilo run`:

- wall-clock budget (default 60 s) via a watchdog thread that polls elapsed time at 100 ms granularity and aborts with ILO-R016
- stdout byte budget (default ~100 MB), accounted at every print site; aborts with ILO-R017 on the first write over the cap

Both abort by writing a structured diagnostic to stderr (JSON or plain text, following the active output mode) and calling exit(1). There is no graceful return path, on purpose: threading a cancellation token through every engine's eval loop is a much bigger change for a guard that only fires on already-broken programs.

Registry entries explain the most common cause for each code (missing loop increment, recursion without base case) and point agents at the override flag.
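A minimal sketch of what such a guard could look like, assuming a std-only implementation. The names `install`, `record_output` and `abort_with` come from the PR description, but every signature here, the extra `charge` helper, and the diagnostic text are assumptions; the real module also emits JSON diagnostics, which this sketch omits.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical sketch, not the PR's runtime_guard.rs: all signatures are assumed.
const POLL: Duration = Duration::from_millis(100); // watchdog granularity from the PR text

static OUTPUT_CAP: AtomicU64 = AtomicU64::new(0); // 0 means "no output cap"
static OUTPUT_USED: AtomicU64 = AtomicU64::new(0);

/// Writes a plain-text diagnostic to stderr and exits 1 (no graceful return path).
pub fn abort_with(code: &str, flag: &str) -> ! {
    eprintln!("{code}: budget exceeded; check loop increments or adjust {flag} (0 disables)");
    std::process::exit(1);
}

/// Polls elapsed wall-clock time every POLL until the budget is spent, then aborts.
fn watchdog(start: Instant, budget: Duration) {
    while start.elapsed() < budget {
        thread::sleep(POLL);
    }
    abort_with("ILO-R016", "--max-runtime");
}

/// Installs both caps once per process; `None` disables a cap.
pub fn install(max_runtime: Option<Duration>, max_output_bytes: Option<u64>) {
    OUTPUT_CAP.store(max_output_bytes.unwrap_or(0), Ordering::Relaxed);
    if let Some(budget) = max_runtime {
        let start = Instant::now();
        thread::spawn(move || watchdog(start, budget));
    }
}

/// Adds `n` bytes to the running total; false once the total exceeds a non-zero cap.
pub fn charge(n: u64) -> bool {
    let cap = OUTPUT_CAP.load(Ordering::Relaxed);
    let used = OUTPUT_USED.fetch_add(n, Ordering::Relaxed) + n;
    cap == 0 || used <= cap
}

/// Called at each print site before writing; aborts on the first write over the cap.
pub fn record_output(n: u64) {
    if !charge(n) {
        abort_with("ILO-R017", "--max-output-bytes");
    }
}
```

With this polling design the wall-clock abort can land up to one poll interval (100 ms) after the deadline, which is negligible against a 60 s budget.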
Plumb the runtime guard through the CLI surface. Both flags are global (they work on `ilo`, `ilo run`, and anywhere else we execute user code), and the guard is installed once from `fn main` after dispatch is resolved; non-run subcommands (check, build, serv, graph) skip the watchdog. `--max-runtime 0` and `--max-output-bytes 0` disable the respective caps for batch / training workloads the operator already knows are long. Defaults come from `runtime_guard` (60 s, 100 MB): high enough that no legitimate program is bothered, low enough that a runaway gets killed inside a single agent turn. Both the explicit `Cmd::Run` arm and the bare-positional dispatch install the guard, since both ultimately execute user code via the same engine surface.
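The flag semantics (`0` disables, any other value is the budget) reduce to a small mapping. This hand-rolled, std-only parser is illustrative only: the PR's `cli/args.rs` is not shown, and the `Caps` struct, the `parse_caps` function, and the decimal reading of "~100 MB" are assumptions.

```rust
use std::time::Duration;

// Assumed defaults matching the PR text: 60 s and ~100 MB (read here as decimal).
const DEFAULT_MAX_RUNTIME_SECS: u64 = 60;
const DEFAULT_MAX_OUTPUT_BYTES: u64 = 100_000_000;

/// Hypothetical parsed caps; `None` means the cap is disabled.
#[derive(Debug, PartialEq)]
pub struct Caps {
    pub max_runtime: Option<Duration>,
    pub max_output_bytes: Option<u64>,
}

/// `0` disables a cap; any other value is the budget.
fn nonzero(v: u64) -> Option<u64> {
    if v == 0 { None } else { Some(v) }
}

/// Scans the whole argument list, since both flags are global.
pub fn parse_caps(args: &[&str]) -> Result<Caps, String> {
    let mut runtime_secs = DEFAULT_MAX_RUNTIME_SECS;
    let mut output_bytes = DEFAULT_MAX_OUTPUT_BYTES;
    let mut it = args.iter();
    while let Some(&arg) = it.next() {
        if arg == "--max-runtime" || arg == "--max-output-bytes" {
            let raw = it.next().ok_or_else(|| format!("{arg} needs a value"))?;
            let value: u64 = raw.parse().map_err(|_| format!("{arg}: not a number: {raw}"))?;
            if arg == "--max-runtime" {
                runtime_secs = value;
            } else {
                output_bytes = value;
            }
        }
        // Any other argument is left to the rest of the CLI.
    }
    Ok(Caps {
        max_runtime: nonzero(runtime_secs).map(Duration::from_secs),
        max_output_bytes: nonzero(output_bytes),
    })
}
```

Representing a disabled cap as `None` keeps the "0 disables" rule in one place, so neither the watchdog nor the output counter needs to special-case zero.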
Three call sites carry the user's `prnt` builtin across engines:
- `src/interpreter/mod.rs` Builtin::Prnt arm (tree-walker fallback)
- `src/vm/mod.rs` OP_PRT (bytecode VM, default engine)
- `src/vm/mod.rs` jit_prt extern (Cranelift JIT)
Each now formats the value, charges `len() + 1` (the trailing
newline) against the runtime_guard budget, then prints. The +1
matters: a `wh true{prnt 0}` loop in the JIT prints "0\n"
repeatedly; without charging the newline the counter would see
only 1 byte for every 2 written, so the abort would fire only
after roughly twice the configured cap.
`print_value` (the entry-point result print in main.rs) is NOT
instrumented - it runs once at program exit, never in a loop, so
the cap doesn't apply to it. Keeps the test surface stable.
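The accounting order at each print site (format, charge `len() + 1`, then print) can be sketched as follows. `OutputBudget` and `prnt` are hypothetical stand-ins for the runtime_guard counter and the three real call sites; instead of exiting, the sketch returns `false` so the arithmetic is easy to check.

```rust
use std::fmt::Display;
use std::io::{self, Write};

/// Hypothetical stand-in for the runtime_guard byte counter (cap 0 = disabled).
struct OutputBudget {
    cap: u64,
    used: u64,
}

impl OutputBudget {
    /// Adds `n` bytes; returns false once the running total exceeds the cap.
    fn charge(&mut self, n: u64) -> bool {
        self.used += n;
        self.cap == 0 || self.used <= self.cap
    }
}

/// Sketch of one `prnt` site: format, charge len() + 1 for the newline, then print.
/// Returns Ok(false) where the real guard would abort with ILO-R017.
fn prnt<W: Write, V: Display>(out: &mut W, budget: &mut OutputBudget, value: &V) -> io::Result<bool> {
    let text = value.to_string();
    if !budget.charge(text.len() as u64 + 1) {
        return Ok(false);
    }
    writeln!(out, "{text}")?;
    Ok(true)
}
```

With a 200-byte cap (the value the PR's tests pass to `--max-output-bytes`), a `prnt 0` loop stops after 100 writes, exactly 200 bytes. Charging only `len()` would let 200 writes (400 bytes) through before the cap fired.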
Six subprocess tests pinning the guard behaviour:

- infinite-loop VM aborts on --max-runtime 1 with ILO-R016
- infinite-loop JIT aborts on --max-runtime 1 with ILO-R016
- runaway prnt loop on VM aborts on --max-output-bytes 200 with ILO-R017
- runaway prnt loop on JIT aborts on --max-output-bytes 200 with ILO-R017
- a well-behaved program runs clean under the default 60 s / 100 MB caps
- --max-runtime 0 disables the wall-clock guard for batch workloads

Tests must run as subprocesses because the guard calls process::exit(1) after writing the diagnostic; there is no graceful return path, by design. Each diagnostic assertion checks both the code (ILO-R016 / ILO-R017) and the override flag name, so the suite catches a future copy-paste that drops the hint.

Tree-walker isn't exercised directly: --run-tree was removed from the public CLI in 0.12.1, so the tree path now only runs as an internal HOF/builtin fallback the VM bails to. The Prnt site there still accounts for output, and the unit tests in src/runtime_guard.rs cover the counter mechanics; cross-engine parity for tree bridges is pinned elsewhere.

Adds examples/runtime-guard.ilo as in-context documentation that exercises the bounded happy path (a triangle-number loop). The examples harness runs it on every engine, giving a higher-level regression check that the guards don't affect well-behaved programs.
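Because the guard exits the process, its tests have to observe a child process. A minimal sketch of the assertion described above (exit code 1 plus both the diagnostic code and the override flag on stderr), with hypothetical helpers `is_guard_abort` and `run_ilo`; the exact arguments passed to `ilo` are left to the caller because the real test invocations are not shown.

```rust
use std::io;
use std::process::Command;

/// True when a run looks like a guard abort: exit code 1 and a stderr
/// diagnostic naming both the code and the override flag.
fn is_guard_abort(exit_code: Option<i32>, stderr: &str, code: &str, flag: &str) -> bool {
    exit_code == Some(1) && stderr.contains(code) && stderr.contains(flag)
}

/// Hypothetical subprocess runner: returns the child's exit code and stderr.
fn run_ilo(bin: &str, args: &[&str]) -> io::Result<(Option<i32>, String)> {
    let out = Command::new(bin).args(args).output()?;
    Ok((out.status.code(), String::from_utf8_lossy(&out.stderr).into_owned()))
}
```

In a Cargo integration test, `bin` would typically be `env!("CARGO_BIN_EXE_ilo")`, assuming the binary target is named `ilo`.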
SPEC.md gets a new CLI section row for each flag and a short "production-safety guards" explainer alongside the existing P103 paragraph. ai.txt regenerates from SPEC.md via build.rs. skills/ilo/ilo-errors.md adds R016 and R017 one-liner rows pointing at the most common cause for each (missing loop increment, prnt inside an unbounded loop) and the override flag. The agent skill gets a "Runtime + output caps" section next to the AST-depth-cap section so agents hitting the cap know to check their loop variables or bump the budget before retrying. The site docs (cli.md, diagnostics.md) live in the separate ilo-lang/site repo and follow in a companion commit there.
671002e to 0ddbc26
Superseded by fresh-impl PR (in flight on feature/-v2 branch). The original PR's keep-both rebase produced broken Rust that can't be unstuck without manual brace surgery. The v2 PR is a clean reimpl against current main.
Summary
A mandelbrot persona run (2026-05-20) missed a `col=col+1` loop increment, spun in an infinite loop, and produced 165 MB of stdout before the harness killed the process. The transcript was useless to the agent: a wall of dots with no signal about what went wrong. This PR adds two production-safety guards so the next runaway aborts cleanly with a structured diagnostic the agent can act on.

- `--max-runtime SECS` (default 60) caps wall-clock time; going over budget raises `ILO-R016` and exits 1.
- `--max-output-bytes BYTES` (default ~100 MB) caps stdout; going over budget raises `ILO-R017` and exits 1.

Both flags accept `0` to disable the cap for batch / training workloads. The defaults are well above any legitimate agent task (real ones finish in under 10 s and produce kilobytes), so well-behaved programs are unaffected; a regression test pins this.

Repro before / after
Before:
After:
Output budget:
What's in the diff
Five commits, each a single coherent change:

- `src/runtime_guard.rs` with `install()` / `record_output()` / `abort_with()`. The watchdog thread polls elapsed time at 100 ms granularity. Registry entries for `ILO-R016` and `ILO-R017` explain the most common cause and the override flag.
- CLI flags in `cli/args.rs` and `main.rs`. Both the explicit `Cmd::Run` arm and the bare-positional dispatch install the guard before executing user code.
- Output accounting at the three print sites (`Builtin::Prnt` in the interpreter, `OP_PRT` in the VM, `jit_prt` in Cranelift). Each charges `len() + 1` (for the newline) against the budget before printing.
- Subprocess tests, including the disable path (`0` disables). A subprocess is required because the guard calls `process::exit(1)` after writing the diagnostic; there is no graceful return path, on purpose.

Why a watchdog thread instead of cancellation tokens
Threading a cancellation token through every engine's eval loop (tree, VM, Cranelift JIT) is a much bigger change for a guard that only fires on already-broken programs. A watchdog that writes a diagnostic and calls `exit(1)` is async-signal-safe enough and ships today. Output accounting at the three print sites covers the symmetric case, a tight print loop, without touching the eval dispatch.

Test plan
- `cargo test --release --features cranelift`: full suite green (3322 + 199 + ... + 1, no failures)
- `cargo fmt --check`: clean
- `cargo clippy --release --features cranelift --all-targets -- -D warnings`: clean

Follow-ups