feat: add `--per-depth-timeout` option for kontrol prove by Stevengre · Pull Request #1141 · runtimeverification/kontrol

Stevengre · 2026-05-12T08:16:48Z

Summary

Adds --per-depth-timeout SECONDS to kontrol prove. When set (>0), each prove attempt runs in a forked subprocess. The parent treats current_depth * per_depth_timeout as a stall window rather than a hard total budget: every window the parent polls the proof's on-disk subdir for file mtime changes; any change means the prove committed a step and the window resets for another round. Only when a full window passes with no progress does the parent SIGKILL the entire subprocess session (Python + KoreServer subprocess + parallel-frontier worker threads), halve current_depth (floor 1), and start the next attempt — which resumes from the disk-persisted KCFG state. Default 0 disables the wrapper.

Example: --max-depth 1000 --per-depth-timeout 10 starts with a 10000s stall window at depth=1000. As long as KCFG keeps growing within each 10000s window, the prove runs as long as it likes. If no node is committed for 10000s, kill → halve to depth=500 with a 5000s window → … → depth=1 with a 10s window.
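The schedule in the example can be reproduced with a few lines of Python. This is only an illustrative sketch: `stall_schedule` is a hypothetical helper, not a function in this PR, and it assumes halving is integer division with a floor of 1 and that the schedule ends once depth 1 has been tried.

```python
def stall_schedule(max_depth: int, per_depth_timeout: int) -> list[tuple[int, int]]:
    """Return (depth, stall_window_seconds) for each attempt, in order.

    Assumption: halving is integer division with a floor of 1, and the
    schedule stops after the depth=1 attempt.
    """
    if max_depth < 1:
        raise ValueError("max_depth must be >= 1")
    schedule = []
    depth = max_depth
    while True:
        schedule.append((depth, depth * per_depth_timeout))
        if depth == 1:
            break
        depth = max(1, depth // 2)
    return schedule


if __name__ == "__main__":
    for depth, window in stall_schedule(1000, 10):
        print(f"depth={depth:>4}  stall window={window}s")
```

Under these assumptions, --max-depth 1000 passes through ten depths (1000, 500, 250, 125, 62, 31, 15, 7, 3, 1) before the window reaches 10s.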

Why stall window (vs. hard wall-clock cap)

A hard total budget kills proofs that are productively making progress just because the wall-clock crossed a threshold. The intent of progressive halving is to detect when execute_depth is too coarse to break out of a stuck step — exactly the case where no new nodes appear. Stall-window semantics matches the symptom directly: if the prove keeps writing to disk, leave it alone; if it stops writing, the next step is stuck and shrinking execute_depth is the right move.

Why subprocess + SIGKILL (vs. callback inside `run_prover`)

advance_proof's maintenance callback only fires after step_proof returns. If a single step takes minutes (deep symbolic execution, expensive SMT), the callback never gets a chance to run, so a callback-based timeout cannot interrupt it. Subprocess + SIGKILL on the child's process group gives a wall-clock cutoff regardless of what the prover is doing internally: SIGKILL cannot be caught or blocked, and killpg delivers it to every process in the group, which also ends all of their threads.
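The kill mechanism can be tried in isolation. The sketch below is not code from this PR: `_attempt` and `kill_attempt_group` are hypothetical names, a sleeping Python process stands in for the backend server, and a pipe is used only so the parent can tell when every process in the group is gone. It shows that after os.setsid() a single os.killpg takes down the stuck child and the helper it started.

```python
import multiprocessing as mp
import os
import select
import signal
import subprocess
import sys
import time


def _attempt(write_fd: int) -> None:
    os.setsid()  # new session and process group, both led by this child
    # Stand-in for a backend server: it stays in the child's process group
    # and holds write_fd open for as long as it lives.
    subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        pass_fds=(write_fd,),
    )
    os.write(write_fd, b"r")  # tell the parent that setup is complete
    time.sleep(60)  # stand-in for a single step that never returns


def kill_attempt_group() -> bool:
    """SIGKILL a stuck attempt's process group; True if child and helper both died."""
    read_fd, write_fd = os.pipe()
    proc = mp.get_context("fork").Process(target=_attempt, args=(write_fd,))
    proc.start()
    os.close(write_fd)  # now only the child and its helper hold the write end
    assert os.read(read_fd, 1) == b"r"  # wait until setsid() has happened
    pgid = os.getpgid(proc.pid)
    assert pgid != os.getpgrp()  # never signal the parent's own group
    os.killpg(pgid, signal.SIGKILL)
    proc.join()
    # End-of-file on the pipe means every holder of the write end is gone.
    ready, _, _ = select.select([read_fd], [], [], 5.0)
    all_dead = bool(ready) and os.read(read_fd, 1) == b""
    os.close(read_fd)
    return all_dead


if __name__ == "__main__":
    print("whole group killed:", kill_attempt_group())
```

Note that killpg addresses a process group, not a session. The two coincide here only because setsid() creates both and the helper never moves to a different group.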

Implementation

cli.py: new --per-depth-timeout flag.
options.py: new ProveOptions.per_depth_timeout (default 0).
prove.py:
- When per_depth_timeout > 0, init_and_run_proof delegates to _init_and_run_proof_progressive, which loops over halving depths.
- _run_attempt_under_timeout(test, attempt_max_depth, budget_s) runs one attempt:
  - mp.get_context('fork').Process so the child inherits everything via CoW (no pickling, no spawn cost).
  - Child calls os.setsid() to become a session leader, sets options.max_depth = attempt_max_depth and options.per_depth_timeout = 0 on its fork-local copy of the options (the latter prevents recursion), then re-enters init_and_run_proof. The new KoreServer it starts is in the same session.
  - Parent polls every budget_s via proc.join(timeout=budget_s). If child exited → break out and read the result via Pipe. If still alive → walk foundry.proofs_dir / test.id and compare max file mtime against the previous snapshot.
  - Marker changed → grant another budget_s window. Marker unchanged → os.killpg(os.getpgid(proc.pid), SIGKILL) reaps the whole session, return _ATTEMPT_TIMEOUT.
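The parent-side loop described above can be sketched as follows. This is a simplified stand-in, not the code in prove.py: `progress_marker`, `run_attempt`, `_child_main` and `ATTEMPT_TIMEOUT` are hypothetical names, the attempt is an arbitrary callable instead of init_and_run_proof, and the result is assumed to be small enough to pass through a pipe without blocking the child.

```python
import multiprocessing as mp
import os
import signal
import stat
from pathlib import Path

ATTEMPT_TIMEOUT = object()  # hypothetical sentinel, standing in for _ATTEMPT_TIMEOUT


def progress_marker(proof_dir: Path) -> int:
    """Newest mtime (ns) of any regular file under proof_dir, 0 if none."""
    newest = 0
    for path in proof_dir.rglob("*"):
        try:
            st = path.stat()
        except FileNotFoundError:  # renamed or removed while we were walking
            continue
        if stat.S_ISREG(st.st_mode):
            newest = max(newest, st.st_mtime_ns)
    return newest


def _child_main(conn, work) -> None:
    os.setsid()  # own process group, so one killpg covers the whole subtree
    conn.send(work())  # assumption: the result is small and picklable
    conn.close()


def run_attempt(work, proof_dir: Path, budget_s: float):
    """Run work() in a forked child; kill it after one window without disk progress."""
    ctx = mp.get_context("fork")
    recv_conn, send_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child_main, args=(send_conn, work))
    marker = progress_marker(proof_dir)
    proc.start()
    send_conn.close()  # parent keeps only the receiving end
    while True:
        proc.join(timeout=budget_s)
        if not proc.is_alive():
            return recv_conn.recv() if recv_conn.poll() else None
        current = progress_marker(proof_dir)
        if current != marker:
            marker = current  # progress observed: grant another window
            continue
        pgid = os.getpgid(proc.pid)
        if pgid != os.getpgrp():  # guard: child has really left our group
            os.killpg(pgid, signal.SIGKILL)
        else:
            proc.kill()
        proc.join()
        return ATTEMPT_TIMEOUT
```

The guard before killpg is an addition in this sketch: if the child were signalled before it had called os.setsid(), its process group would still be the parent's, and killpg would take the parent down with it.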

Test plan

kontrol prove --help lists --per-depth-timeout.
Default-off: existing test suite passes unchanged (no subprocess overhead, run_prover(...) is the unaltered code path).
Stall detection: a contrived proof that hangs in a single long step_proof is killed within budget_s + poll_overhead; logs show depth=N attempt exhausted Ms budget; halving.
No false positives: a productive proof making steady progress (any committed step inside each window) runs to completion without being killed.
Hard kill cleanup: while a --per-depth-timeout 10 --max-depth 100 proof runs, pgrep kore-rpc-booster | wc -l returns to baseline within a second of the kill (kore-rpc subprocess reaped via session-group SIGKILL, not orphaned).
Resumption: after a forced halving, the next attempt's KCFG starts from where the previous one left off (no full restart).
workers > 1: each test's per-attempt subprocess is independent; no cross-contamination of progress signals (the marker walks proofs_dir / test.id, scoped per test).

Caveats

Uses mp.get_context('fork'); requires POSIX (already a kontrol requirement for kore-rpc-booster).
Each halving spawns a fresh KoreServer; per-attempt startup cost adds a few seconds.
Atomicity: pyk's proof.write_proof_data() is assumed to write atomically (temp + rename). A SIGKILL during write could yield a partial state, but pyk's loader can re-fetch from logs.
Polling cadence equals budget_s, so a stalled attempt is killed between one and two windows after its last committed write, depending on where in the window that write landed. Not millisecond-precise, but adequate for budgets measured in seconds.

Adds progressive depth-halving with a per-attempt **stall window**. When `--per-depth-timeout S` is set, each attempt is given an initial window of `current_depth * S` seconds. The parent polls the proof's on-disk subdir every window: if any file mtime changed (i.e. the prove committed at least one step), the window is reset for another round. If no progress is observed across a full window, the entire subprocess session is reaped with `os.killpg(..., SIGKILL)` (Python + KoreServer + parallel-frontier worker threads), `current_depth` is halved (floor 1), and the next attempt resumes from the disk-persisted KCFG state. Each attempt runs in a forked subprocess that calls `os.setsid()` to become its own session leader, so a single `killpg` reaps the entire subtree. The proof state is persisted by `advance_proof`'s maintenance loop (maintenance_rate=1, default), so on-disk KCFG state is current up to the last committed step at the moment of the kill. Default 0 disables the wrapper: the existing single-attempt `run_prover(...)` path is taken with no subprocess overhead.

Stevengre force-pushed the progressive-depth-timeout branch from 1f0cc6a to 36d97eb on May 13, 2026 02:38

Stevengre force-pushed the progressive-depth-timeout branch from 36d97eb to a20aacb on May 26, 2026 03:07

Stevengre mentioned this pull request May 26, 2026

kevm-pyk: add per_depth_timeout to run_prover for progressive depth halving runtimeverification/evm-semantics#2850

Closed

4 tasks

Stevengre force-pushed the progressive-depth-timeout branch from a20aacb to bd56284 on May 26, 2026 09:55

Stevengre changed the title ~~feat: add --per-depth-timeout option for progressive depth-halving prove~~ feat: add --per-depth-timeout option for kontrol prove (blocked by kevm-pyk #2850) May 26, 2026

Stevengre force-pushed the progressive-depth-timeout branch from bd56284 to a0fce21 on May 26, 2026 12:16

Stevengre changed the title ~~feat: add --per-depth-timeout option for kontrol prove (blocked by kevm-pyk #2850)~~ feat: add --per-depth-timeout option for kontrol prove May 26, 2026

Stevengre force-pushed the progressive-depth-timeout branch from a0fce21 to 62ed2fd on May 26, 2026 13:42

Stevengre wants to merge 1 commit into master from progressive-depth-timeout
