ilo run: cap runtime + stdout to prevent runaway programs (#5ar)#544

Closed
danieljohnmorris wants to merge 5 commits into main from fix/exec-time-guard

@danieljohnmorris
Collaborator

Summary

A mandelbrot persona run (2026-05-20) missed a col=col+1 loop increment, spun in an infinite loop, and produced 165 MB of stdout before the harness killed the process. The transcript was useless to the agent: a wall of dots with no signal about what went wrong. This PR adds two production-safety guards so the next runaway aborts cleanly with a structured diagnostic the agent can act on.

  • --max-runtime SECS (default 60) caps wall-clock; over budget raises ILO-R016 and exits 1.
  • --max-output-bytes BYTES (default ~100 MB) caps stdout; over budget raises ILO-R017 and exits 1.

Both flags accept 0 to disable the cap for batch / training workloads. The defaults sit well above any legitimate agent task (real ones finish in under 10 s and produce kilobytes of output), so well-behaved programs are unaffected; a regression test pins this.
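The "0 disables" convention maps naturally onto `Option`. Below is a minimal Rust sketch of how the flag values could be resolved. The helper names (`runtime_budget`, `output_budget`) are hypothetical and do not come from the PR. The 60 s default is taken from the text. The exact byte count behind "~100 MB" is not stated, so the sketch leaves that default to the caller.

```rust
use std::time::Duration;

/// Default wall-clock budget in seconds, as stated for `--max-runtime`.
const DEFAULT_MAX_RUNTIME_SECS: u64 = 60;

/// Hypothetical: resolve the `--max-runtime SECS` value.
/// `None` input means the flag was not given (use the default);
/// a value of 0 disables the cap and resolves to `None`.
fn runtime_budget(flag: Option<u64>) -> Option<Duration> {
    match flag.unwrap_or(DEFAULT_MAX_RUNTIME_SECS) {
        0 => None,
        secs => Some(Duration::from_secs(secs)),
    }
}

/// Hypothetical: same convention for `--max-output-bytes BYTES`.
/// `default_bytes` stands in for the "~100 MB" default, whose exact value is an assumption.
fn output_budget(flag: Option<u64>, default_bytes: u64) -> Option<u64> {
    match flag.unwrap_or(default_bytes) {
        0 => None,
        bytes => Some(bytes),
    }
}
```

In this sketch, resolving to `None` once, up front, means nothing downstream has to special-case 0. No budget simply means no watchdog is spawned and no cap is checked.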

Repro before / after

Before:

$ ilo 'main>n;n=0;wh true{n=+n 1};n'
^C    -- after burning CPU until the operator notices

After:

$ ilo --max-runtime 1 --json 'main>n;n=0;wh true{n=+n 1};n'
{"error":{"code":"ILO-R016","hint":"infinite loop is the most common cause - check loop variables increment, recursion has a base case, or pass `--max-runtime N` if a legitimate program needs longer.","message":"wall-clock runtime exceeded 1000 ms (--max-runtime 1)"}}
$ echo $?
1

Output budget:

$ ilo --max-output-bytes 100 'main>n;n=0;wh true{prnt n;n=+n 1};0'
0
1
2
...
{"error":{"code":"ILO-R017","hint":"a loop printing without a break or increment is the most common cause - check `prnt` calls inside `wh`/`fa` bodies. raise the cap with `--max-output-bytes N` if a legitimate program needs more.","message":"stdout output exceeded 100 bytes (--max-output-bytes)"}}

What's in the diff

Five commits, each one coherent change:

  1. runtime_guard module + diagnostic registry entries - new src/runtime_guard.rs with install() / record_output() / abort_with(). Watchdog thread polls elapsed at 100 ms granularity. Registry entries for ILO-R016 and ILO-R017 explain the most common cause of each and name the override flag.
  2. CLI: --max-runtime / --max-output-bytes flags - global flags wired through cli/args.rs and main.rs. Both the explicit Cmd::Run arm and the bare-positional dispatch install the guard before executing user code.
  3. Engines: output accounting at every print site - three sites total (Builtin::Prnt in interpreter, OP_PRT in VM, jit_prt in Cranelift). Each charges len() + 1 (for the newline) against the budget before printing.
  4. Tests: cover R016 + R017 across VM and JIT engines - six subprocess tests (one per engine x guard, plus a happy-path run and a check that 0 disables the cap). Subprocesses are required because the guard calls process::exit(1) after writing the diagnostic; there is deliberately no graceful return path.
  5. Docs sync - SPEC.md gets the two new CLI rows + a production-safety paragraph; ai.txt regenerates via build.rs; skills/ilo/ilo-agent.md gets a "Runtime + output caps" section; skills/ilo/ilo-errors.md adds R016 / R017 one-liners.

Why a watchdog thread instead of cancellation tokens

Threading a cancellation token through every engine's eval loop (tree, VM, Cranelift JIT) is a much bigger change for a guard that only fires on already-broken programs. A watchdog thread that writes a diagnostic and calls exit(1) is safe enough for a last-resort abort and ships today. Accounting for output at the three print sites covers the other runaway case, a tight printing loop, without touching the eval dispatch either.
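A minimal sketch of the watchdog idea in Rust. It assumes a hypothetical `spawn_watchdog` helper, not the PR's actual `install()` signature. To keep it testable, the expiry action is passed in as a callback. In the guard described here, that callback would write ILO-R016 to stderr and call `std::process::exit(1)`.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Poll interval, matching the 100 ms granularity described above.
const POLL: Duration = Duration::from_millis(100);

/// Hypothetical watchdog: wakes every `POLL`, and once `budget` of wall-clock
/// time has passed since it was spawned, calls `on_expire` with the elapsed
/// time and stops. The thread running user code never checks anything,
/// so no engine's eval loop has to change.
fn spawn_watchdog<F>(budget: Duration, on_expire: F) -> thread::JoinHandle<()>
where
    F: FnOnce(Duration) + Send + 'static,
{
    let start = Instant::now();
    thread::spawn(move || loop {
        thread::sleep(POLL);
        let elapsed = start.elapsed();
        if elapsed >= budget {
            // Production version (assumed): print the ILO-R016 diagnostic, then process::exit(1).
            on_expire(elapsed);
            return;
        }
    })
}
```

Polling trades precision for simplicity. The abort fires somewhere between the budget and roughly one poll interval after it, which is negligible against a 60 s default.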

Test plan

  • cargo test --release --features cranelift: full suite green (3322 + 199 + ... + 1, no failures)
  • cargo fmt --check — clean
  • cargo clippy --release --features cranelift --all-targets -- -D warnings — clean
  • Manual smoke tests on every engine (VM default, JIT): infinite loops abort cleanly, output overflows abort cleanly, and the happy-path triangle-number example runs

Follow-ups

  • Site docs (cli.md, diagnostics.md) ship in the separate ilo-lang/site repo with the same wording.
  • Persona harness should treat ILO-R016 / ILO-R017 as a signal to look at loop variables before re-dispatching (no code change needed; the diagnostic hint already says this).

danieljohnmorris added a commit to ilo-lang/ilo-site that referenced this pull request May 21, 2026
Mirror the SPEC.md + skills/ilo updates from ilo-lang/ilo#544.
The two CLI flags and two runtime diagnostic codes guard `ilo run`
against runaway programs: 60 s wall-clock cap raises ILO-R016,
~100 MB stdout cap raises ILO-R017. Both flags accept 0 to disable
for batch / training workloads.
@codecov

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 96.87500% with 6 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/runtime_guard.rs | 93.10% | 6 Missing ⚠️ |

A mandelbrot persona run (2026-05-20) missed a `col=col+1` loop
increment, spun in an infinite loop, and produced 165 MB of stdout
before the harness killed it. The transcript was useless to the
agent - just a wall of dots, no signal about what went wrong.

New `runtime_guard` module installs two process-wide caps for
`ilo run`:

- wall-clock budget (default 60 s) via a watchdog thread that
  polls elapsed at 100 ms granularity and aborts with ILO-R016
- stdout byte budget (default ~100 MB) accounted at every print
  site, aborts with ILO-R017 on the first write over the cap

Both abort by writing a structured diagnostic to stderr (JSON or
plain text following the active output mode) and exit(1). No
graceful return path on purpose - threading a cancellation token
through every engine's eval loop is a much bigger change for a
guard that only fires on already-broken programs.
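Both abort paths end in the same rendering step. The sketch below shows how such a diagnostic could be formatted in either output mode. It assumes a hypothetical `render_diagnostic` helper and a `json` flag for the active mode. The JSON field layout mirrors the samples in the PR description. The plain-text layout is invented purely for illustration. A real implementation would more likely use `serde_json` than the hand-written escaping shown here.

```rust
/// Minimal JSON string escaping: quote, backslash, and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical renderer for an abort diagnostic such as ILO-R016 / ILO-R017.
fn render_diagnostic(json: bool, code: &str, message: &str, hint: &str) -> String {
    if json {
        // Key order follows the samples in the PR description: code, hint, message.
        format!(
            r#"{{"error":{{"code":"{}","hint":"{}","message":"{}"}}}}"#,
            json_escape(code),
            json_escape(hint),
            json_escape(message)
        )
    } else {
        // Illustrative plain-text layout (assumption, not taken from the PR).
        format!("error[{code}]: {message}\n  hint: {hint}")
    }
}
```

In the guard, the rendered string would be written to stderr immediately before `std::process::exit(1)`.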

Registry entries explain the most common cause for each code
(missing loop increment, recursion without base case) and point
agents at the override flag.

Plumb the runtime guard through the CLI surface. Both flags are
global (work on `ilo`, `ilo run`, anywhere we execute user code)
and installed once from `fn main` after dispatch is resolved -
non-run subcommands (check, build, serv, graph) skip the watchdog.

`--max-runtime 0` and `--max-output-bytes 0` disable the
respective caps for batch / training workloads the operator
already knows are long. Defaults come from `runtime_guard`
(60 s, 100 MB) - high enough that no legitimate program is
bothered, low enough that a runaway gets killed inside a single
agent turn.

Both the explicit `Cmd::Run` arm and the bare-positional
dispatch install the guard, since both ultimately execute user
code via the same engine surface.
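The dispatch rule above can be sketched as a single `match`: install the guard on any path that executes user code, and skip it for the tooling subcommands. The `Cmd` enum below is a hypothetical, trimmed stand-in for the real one in `cli/args.rs`. Only the subcommands named in the commit message appear, and `BarePositional` is an invented name for the bare `ilo 'code'` form.

```rust
/// Hypothetical, trimmed-down command enum; the real one lives in cli/args.rs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Cmd {
    /// `ilo run ...`
    Run,
    /// `ilo 'code'` with no subcommand (bare-positional dispatch)
    BarePositional,
    Check,
    Build,
    Serv,
    Graph,
}

/// True when the command executes user code and therefore needs the
/// watchdog and the output budget installed first.
fn executes_user_code(cmd: Cmd) -> bool {
    match cmd {
        Cmd::Run | Cmd::BarePositional => true,
        // Subcommands the commit message lists as skipping the watchdog.
        Cmd::Check | Cmd::Build | Cmd::Serv | Cmd::Graph => false,
    }
}
```

Because the `match` is exhaustive and has no `_` arm, any future subcommand must be classified explicitly. A new execution path therefore cannot silently skip the guard.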

Three call sites carry the user's `prnt` builtin across engines:

- `src/interpreter/mod.rs` Builtin::Prnt arm (tree-walker fallback)
- `src/vm/mod.rs` OP_PRT (bytecode VM, default engine)
- `src/vm/mod.rs` jit_prt extern (Cranelift JIT)

Each now formats the value, charges `len() + 1` (the trailing
newline) against the runtime_guard budget, then prints. The +1
matters: a `wh true{prnt 0}` loop in the JIT prints "0\n"
repeatedly. Without charging the newline, the counter would see
only one of every two bytes written, so the abort would fire later
than the operator expects (for this loop, at roughly twice the
configured cap).
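The per-print charge can be sketched with one shared atomic counter. This is a minimal Rust sketch, not the PR's `record_output()`. The `OutputBudget` type, its methods, and `lines_before_abort` are hypothetical names. A `cap` of 0 disables the check, mirroring `--max-output-bytes 0`. The test works through the newline arithmetic for a `prnt 0` loop.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical process-wide stdout budget shared by every print site.
struct OutputBudget {
    cap: u64,        // 0 disables the cap
    used: AtomicU64, // bytes charged so far
}

impl OutputBudget {
    fn new(cap: u64) -> Self {
        OutputBudget { cap, used: AtomicU64::new(0) }
    }

    /// Charge `bytes` and return the new running total. Returns Err on the
    /// first charge that pushes the total over the cap; that is where the
    /// guard would raise ILO-R017, before the line is printed.
    fn charge(&self, bytes: u64) -> Result<u64, u64> {
        let total = self.used.fetch_add(bytes, Ordering::Relaxed) + bytes;
        if self.cap != 0 && total > self.cap { Err(total) } else { Ok(total) }
    }

    /// What a print site does for one `prnt`: the formatted text plus its '\n'.
    fn charge_line(&self, text: &str) -> Result<u64, u64> {
        self.charge(text.len() as u64 + 1)
    }
}

/// How many lines of `text` print before the budget trips, with or without
/// charging the trailing newline. `cap` must be non-zero, or this never returns.
fn lines_before_abort(cap: u64, text: &str, charge_newline: bool) -> u64 {
    assert!(cap != 0, "a disabled cap never trips");
    let budget = OutputBudget::new(cap);
    let cost = text.len() as u64 + if charge_newline { 1 } else { 0 };
    let mut lines = 0;
    while budget.charge(cost).is_ok() {
        lines += 1;
    }
    lines
}
```

With a 100-byte cap, charging the newline stops after 50 lines of "0\n" (exactly 100 bytes). Without it, 100 lines get through, which is 200 real bytes, or twice the cap. Since the counter only ever grows, `Relaxed` ordering suffices. `fetch_add` is atomic regardless, so the running total stays exact across print sites.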

`print_value` (the entry-point result print in main.rs) is NOT
instrumented - it runs once at program exit, never in a loop, so
the cap doesn't apply to it. Leaving it uninstrumented also keeps
the test surface stable.

Six subprocess tests pinning the guard behaviour:

- infinite-loop VM aborts on --max-runtime 1 with ILO-R016
- infinite-loop JIT aborts on --max-runtime 1 with ILO-R016
- runaway prnt loop on VM aborts on --max-output-bytes 200 with ILO-R017
- runaway prnt loop on JIT aborts on --max-output-bytes 200 with ILO-R017
- a well-behaved program runs clean under the default 60 s / 100 MB caps
- --max-runtime 0 disables the wall-clock guard for batch workloads

Tests must run as subprocesses because the guard calls process::exit(1)
after writing the diagnostic - there is no graceful return path, by
design. Each diagnostic assertion checks both the code (ILO-R016 /
ILO-R017) and the override flag name so the regression catches a
future copy-paste that drops the hint.
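Because the guard exits the process, a test can only observe it from outside. The sketch below shows that pattern with `std::process::Command`. It assumes an `ilo` binary on `PATH`; a real cargo integration test would more likely locate the binary through the `CARGO_BIN_EXE_ilo` environment variable that Cargo provides to integration tests. The helper names are hypothetical. `check_abort` is kept pure so the assertion logic can be exercised without the binary.

```rust
use std::process::Command;

/// Hypothetical: run `ilo` with the given args and capture exit code + stderr.
/// Example call (not run here):
///   run_ilo(&["--max-runtime", "1", "--json", "main>n;n=0;wh true{n=+n 1};n"])
fn run_ilo(args: &[&str]) -> std::io::Result<(Option<i32>, String)> {
    let out = Command::new("ilo").args(args).output()?;
    Ok((out.status.code(), String::from_utf8_lossy(&out.stderr).into_owned()))
}

/// Check the shape the tests pin: exit status 1, the diagnostic code, and the
/// override flag name (so a hint dropped by copy-paste fails the test).
fn check_abort(status: Option<i32>, stderr: &str, diag: &str, flag: &str) -> Result<(), String> {
    if status != Some(1) {
        return Err(format!("expected exit status 1, got {status:?}"));
    }
    if !stderr.contains(diag) {
        return Err(format!("stderr missing {diag}"));
    }
    if !stderr.contains(flag) {
        return Err(format!("stderr missing override flag {flag}"));
    }
    Ok(())
}
```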

Tree-walker isn't exercised directly: --run-tree was removed from the
public CLI in 0.12.1, so the tree path now only runs as an internal
HOF/builtin fallback the VM bails to. The Prnt site there still accounts
for output and the unit tests in src/runtime_guard.rs cover the counter
mechanics; cross-engine parity for tree bridges is pinned elsewhere.

Adds examples/runtime-guard.ilo as in-context documentation that
exercises the bounded happy-path (a triangle-number loop). The
examples harness runs it on every engine, giving a higher-level
regression that the guards don't affect well-behaved programs.

SPEC.md gets a new CLI section row for each flag and a short
"production-safety guards" explainer alongside the existing P103
paragraph. ai.txt regenerates from SPEC.md via build.rs.

skills/ilo/ilo-errors.md adds R016 and R017 one-liner rows pointing
at the most common cause for each (missing loop increment, prnt
inside an unbounded loop) and the override flag. The agent skill
gets a "Runtime + output caps" section next to the AST-depth-cap
section so agents hitting the cap know to check their loop variables
or bump the budget before retrying.

The site docs (cli.md, diagnostics.md) live in the separate
ilo-lang/site repo and follow in a companion commit there.
@danieljohnmorris
Collaborator Author

Closing: the keep-both rebase produces broken Rust syntax (the install_runtime_guard call is mangled). Will reimplement against post-drain main, following the same path as #503/#504/#494.

@danieljohnmorris
Collaborator Author

Superseded by fresh-impl PR (in flight on feature/-v2 branch). The original PR's keep-both rebase produced broken Rust that can't be unstuck without manual brace surgery. The v2 PR is a clean reimpl against current main.
