fix(checker): self-driving stack-depth measurement + CI guard (#37) by hyperpolymath · Pull Request #80 · hyperpolymath/my-lang

hyperpolymath · 2026-05-30T20:58:20Z

Closes the one remaining open DoD item on #37: the Windows-CI leg of the stack-budget measurement that confirms MAX_EXPR_DEPTH = 128 is safe on the 1 MiB msvc main-thread stack.

Background

Subtleties 1 & 3 of #37 (general iterative AST teardown; MAX_EXPR_DEPTH re-derived 256 → 128) were fixed in #43. That PR explicitly left #37 open for a single datapoint: "the Windows-CI leg of the stack measurement — needs CI access … Leave #37 open until that datapoint confirms the 128 budget on the msvc toolchain."

The blocker was that examples/measure_depth.rs was a single-shot probe — it ran one walk at one depth/stack and relied on a human reading the exit code across many runs. Nothing in CI drove it, so the Windows number was never produced.

What this does

examples/measure_depth.rs — rewritten into a self-driving measurement. A stack overflow aborts the process, so it can't be caught in-process; the example now re-execs itself as worker subprocesses and binary-searches the overflow cliff for each of: recursive Drop, the guarded check_expr walk, and the iterative teardown. It prints per-platform bytes/level and exits non-zero if MAX_EXPR_DEPTH no longer fits the 1 MiB floor with headroom. The old worker mode (measure_depth <mode> <depth> <stack_kib>) is preserved as the subprocess entry point.

.github/workflows/stack-depth.yml — new. Runs the measurement + the regression test on ubuntu-latest and windows-latest (same SHA-pinned checkout / rust-toolchain actions as checker-scaling.yml). The Windows datapoint is now produced and locked in on every change — a future bump that breaks the budget fails CI on the affected OS.

crates/my-lang/tests/stack_depth_37.rs — new. Asserts, on a thread pinned to exactly 1 MiB (the Windows budget), that the depth-MAX_EXPR_DEPTH guarded walk and a 1,000,000-deep iterative teardown both survive. This is the OS-portable regression guard.

checker.rs — doc update. The MAX_EXPR_DEPTH doc-comment now records that the budget is reconfirmed automatically on both OSes, not by a one-off manual run.
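The cliff search described for examples/measure_depth.rs can be sketched as a binary search over a monotone "does this depth survive?" predicate. This is a minimal sketch, not the code in the PR: `find_cliff` and its closure parameter are hypothetical names, and in the real driver the predicate would be one worker-subprocess run rather than an in-process closure.

```rust
/// Largest depth in `lo..=hi` for which `survives(depth)` is true.
/// Assumes the predicate is monotone: true up to the cliff, false beyond it.
/// Returns `None` if even `lo` does not survive.
fn find_cliff(
    lo: usize,
    hi: usize,
    mut survives: impl FnMut(usize) -> bool,
) -> Option<usize> {
    if !survives(lo) {
        return None;
    }
    // Invariant: `good` survives; `bad` either overflows or is the
    // out-of-range sentinel `hi + 1`.
    let (mut good, mut bad) = (lo, hi + 1);
    while bad - good > 1 {
        let mid = good + (bad - good) / 2;
        if survives(mid) {
            good = mid;
        } else {
            bad = mid;
        }
    }
    Some(good)
}
```

Each probe costs one subprocess launch, so the search needs only about log2(hi - lo) launches per measured mode.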

Verification (Linux, this run)

recursive Drop      : cliff ~4016 levels @ 256 KiB  (~65 B/level)
checker @ depth 128 : fits in >= 111 KiB  (~888 B/level)
iterative teardown  : survives 1000000 levels @ 256 KiB: true
PASS: MAX_EXPR_DEPTH=128 is safe within 819 KiB of the 1024 KiB floor on linux.
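
The per-level figures above follow from dividing the stack size by the number of levels. The sketch below reproduces that arithmetic. The function names are hypothetical, and the 80% budget fraction is an assumption: it matches the reported 819 KiB of 1024 KiB, but the PR does not state how the driver derives that figure.

```rust
/// Approximate stack cost per recursion level, in bytes (integer division).
fn bytes_per_level(stack_kib: u64, levels: u64) -> u64 {
    stack_kib * 1024 / levels
}

/// Usable budget in KiB, given a floor and a percentage of it.
/// ASSUMPTION: the percentage is illustrative, not taken from the PR.
fn budget_kib(floor_kib: u64, budget_percent: u64) -> u64 {
    floor_kib * budget_percent / 100
}

/// True if the measured requirement fits inside the budget.
fn fits(required_kib: u64, floor_kib: u64, budget_percent: u64) -> bool {
    required_kib <= budget_kib(floor_kib, budget_percent)
}
```

With the Linux numbers from this run, `bytes_per_level(256, 4016)` gives 65 and `bytes_per_level(111, 128)` gives 888, and the 111 KiB needed at depth 128 sits well inside an 819 KiB budget.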

cargo test -p my-lang --test stack_depth_37 --release → 2 passed, 0 failed. The Windows leg runs once this lands and the windows-latest job executes.
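
The shape of the 1 MiB regression guard can be illustrated with `std::thread::Builder::stack_size`. This is a sketch under stated assumptions, not the contents of stack_depth_37.rs: `walk` is a hypothetical stand-in for the guarded check_expr walk, and `run_on_1mib` is a hypothetical helper name.

```rust
use std::thread;

/// The Windows main-thread budget the test pins to.
const STACK_BYTES: usize = 1024 * 1024;

/// Hypothetical stand-in for the guarded walk: recurses `depth` levels
/// and keeps a small buffer alive in every frame.
fn walk(depth: usize) -> usize {
    let pad = [0u8; 512];
    std::hint::black_box(&pad);
    if depth == 0 {
        0
    } else {
        1 + walk(depth - 1)
    }
}

/// Runs `walk(depth)` on a thread with a 1 MiB stack.
/// Returns `None` on spawn failure or if the thread panicked.
/// A real stack overflow aborts the whole process instead, so it
/// shows up as a failed test run rather than as `None`.
fn run_on_1mib(depth: usize) -> Option<usize> {
    let handle = thread::Builder::new()
        .stack_size(STACK_BYTES)
        .spawn(move || walk(depth))
        .ok()?;
    handle.join().ok()
}
```

Pinning the stack size on a spawned thread is what makes the guard OS-portable: it does not depend on the platform's default main-thread stack.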

DoD status (#37)

Measure stack-frame cost per recursion level — checker & Drop, now on Linux and Windows via CI (was Linux-only)
Replace shape-specific drop helper with a general non-overflowing teardown (done in #43)
Re-derive MAX_EXPR_DEPTH from the measured budget (done in #43; now CI-verified on msvc)
Regression test with a deep, non-Call-shaped AST (extended here to a 1 MiB-stack guard on both OSes)

Once the Stack Depth (#37) workflow goes green on windows-latest, #37 can close.

https://claude.ai/code/session_013JnrUmkCpMHsABRmhLVyd2

Generated by Claude Code

Closes the one open DoD item on #37: the Windows-CI leg of the stack-budget measurement confirming MAX_EXPR_DEPTH=128 is safe on the 1 MiB msvc main-thread stack. Subtleties 1 & 3 were fixed in #43; the value was left open until that datapoint, and nothing yet ran the probe.

- examples/measure_depth.rs: rewrite the single-shot probe into a self-driving measurement. It re-execs itself as worker subprocesses, binary-searches the overflow cliff for the recursive Drop, the guarded checker walk and the iterative teardown, prints per-platform bytes/level, and exits non-zero if MAX_EXPR_DEPTH no longer fits the 1 MiB floor with headroom. One command now produces and asserts the datapoint on any platform (was a probe needing a manual exit-code wrapper).
- .github/workflows/stack-depth.yml: run that measurement and the regression test on ubuntu-latest + windows-latest (same pinned actions as checker-scaling.yml), so the Windows datapoint is produced and locked in on every change instead of relying on a manual run.
- tests/stack_depth_37.rs: assert the depth-128 guarded walk and a 1e6-deep iterative teardown both survive a 1 MiB stack.
- checker.rs: document that the budget is now reconfirmed automatically on both OSes rather than by a one-off manual measurement.

https://claude.ai/code/session_013JnrUmkCpMHsABRmhLVyd2

Review of the self-driving measurement flagged that survives() collapsed both subprocess spawn-failure and any non-zero exit into false ("cliffed"), with no positive proof the walk ran: an infra failure (or a future path that exits 0 without doing the work) could skew a measured cliff or risk a false PASS.

- survives() now distinguishes three outcomes: spawn-failure / exit-0-without-running-the-walk are fatal (exit 3, never a datapoint); a survived run requires exit 0 AND the worker's "OK ..." completion line; a non-zero exit is the genuine overflow cliff signal.
- Document that overflow-aborts-the-process is the cliff mechanism and that join().is_err() only covers the unwinding-panic path (default panic=unwind; abort would also exit non-zero, so either strategy is safe).

Driver still PASSes locally (MAX_EXPR_DEPTH=128 safe within 819 KiB of the 1 MiB floor); the binary searches were independently verified correct.

https://claude.ai/code/session_013JnrUmkCpMHsABRmhLVyd2
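
The three-way outcome split described in this commit can be sketched as a pure classification step. This is an illustration only: `Outcome` and `classify` are hypothetical names, the worker's completion line is assumed to start with "OK" (the commit message elides the rest), and the real survives() works on a spawned subprocess rather than on pre-collected values.

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Exit 0 and the completion line was seen: a valid "survived" datapoint.
    Survived,
    /// Non-zero exit: treated as the genuine overflow cliff signal.
    Cliffed,
    /// Never a datapoint; the driver would stop (exit 3 in the PR).
    Fatal(&'static str),
}

/// `run` is `Err` when the worker could not be spawned, otherwise
/// `(exited_with_zero, captured_stdout)`.
fn classify(run: Result<(bool, &str), &str>) -> Outcome {
    match run {
        Err(_) => Outcome::Fatal("spawn failure"),
        // ASSUMPTION: any stdout line beginning with "OK" counts as proof
        // that the walk actually ran to completion.
        Ok((true, out)) if out.lines().any(|l| l.starts_with("OK")) => {
            Outcome::Survived
        }
        Ok((true, _)) => Outcome::Fatal("exit 0 without completion line"),
        Ok((false, _)) => Outcome::Cliffed,
    }
}
```

Keeping the fatal cases separate from `Cliffed` means an infrastructure failure cannot move the measured cliff, and an exit 0 with no completion line cannot produce a false PASS.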

https://claude.ai/code/session_013JnrUmkCpMHsABRmhLVyd2 Co-authored-by: Claude <noreply@anthropic.com>

github-actions · 2026-05-31T11:08:20Z

🔍 Hypatia Security Scan

Findings: 96 issues detected

Severity	Count
🔴 Critical	6
🟠 High	39
🟡 Medium	51

⚠️ Action Required: Critical security issues found!

[
  {
    "reason": "Action perpolymath/standards/.github/workflows/governance-reusable.yml@main\n needs attention",
    "type": "unpinned_action",
    "file": "governance.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in cflite_batch.yml",
    "type": "missing_timeout_minutes",
    "file": "cflite_batch.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in cflite_batch.yml",
    "type": "missing_timeout_minutes",
    "file": "cflite_batch.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in cflite_pr.yml",
    "type": "missing_timeout_minutes",
    "file": "cflite_pr.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in cflite_pr.yml",
    "type": "missing_timeout_minutes",
    "file": "cflite_pr.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in checker-scaling.yml",
    "type": "missing_timeout_minutes",
    "file": "checker-scaling.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in codeql.yml",
    "type": "missing_timeout_minutes",
    "file": "codeql.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in governance.yml",
    "type": "missing_timeout_minutes",
    "file": "governance.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in hypatia-scan.yml",
    "type": "missing_timeout_minutes",
    "file": "hypatia-scan.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in mirror.yml",
    "type": "missing_timeout_minutes",
    "file": "mirror.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  }
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence

claude added 2 commits May 30, 2026 20:55

hyperpolymath marked this pull request as ready for review May 30, 2026 22:06

hyperpolymath merged commit efba529 into main May 30, 2026
15 of 24 checks passed

hyperpolymath deleted the claude/bold-noether-8uSuC branch May 30, 2026 22:06

hyperpolymath mentioned this pull request May 30, 2026

Recursive AST Drop stack-overflows on very deep inputs; revisit MAX_EXPR_DEPTH value #37

Closed

4 tasks

hyperpolymath pushed a commit that referenced this pull request May 30, 2026

docs(changelog): record #80 stack-depth CI guard (closes #37)

a6dcae6

https://claude.ai/code/session_013JnrUmkCpMHsABRmhLVyd2

hyperpolymath mentioned this pull request May 30, 2026

docs(changelog): record #80 stack-depth CI guard (closes #37) #81

Merged

hyperpolymath added a commit that referenced this pull request May 30, 2026

docs(changelog): record #80 stack-depth CI guard (closes #37) (#81)

18e209e

https://claude.ai/code/session_013JnrUmkCpMHsABRmhLVyd2 Co-authored-by: Claude <noreply@anthropic.com>
