Skip to content

Add Concrete Syntax for Unstructured Programs in Strata Core#1132

Open
PROgram52bc wants to merge 58 commits into
mainfrom
htd/unstructured-procedure
Open

Add Concrete Syntax for Unstructured Programs in Strata Core#1132
PROgram52bc wants to merge 58 commits into
mainfrom
htd/unstructured-procedure

Conversation

@PROgram52bc
Copy link
Copy Markdown
Contributor

@PROgram52bc PROgram52bc commented May 6, 2026

Add Support for Unstructured Programs in Strata Core

Description of changes

  • Adds syntax and parser support for unstructured (CFG-based) procedure bodies in Strata Core. For example,
cfg entry { 
    // Deterministic block (default): 
    entry: {  
      x := 0; 
      if (x < n) goto entry else done;       
    } 
    done: {
      return; 
    }
  }
  
cfg entry {
    // Nondeterministic block:   
    entry: {   
      y := y + 1;     
      goto entry, done;   // nondeterministic choice 
    }
    done: {
      return; 
    }
  }

Prior to this PR, the only way to obtain unstructured programs was to apply StructuredToUnstructured transformation.

  • Adds .core.st file examples illustrating the syntax of unstructured programs.
  • Introduces Procedure.Body as a sum type:
  inductive Procedure.Body where
    | structured : List Statement → Procedure.Body
    | cfg : DetCFG → Procedure.Body
  • Adapts uses of Procedure.body to handle the cfg case based on the context
  • Adds metadata support to unstructured programs (new since initial PR draft)
  • Propagates loop invariants/decreases measure to unstructured CFG during transformation (new since initial PR draft)
  • Adds CFG-based Core-to-GOTO pipeline alongside direct path (new since initial PR draft)

Adaptation Methods for Procedure.body Uses

  1. CFG (Implemented) — Functions that handle both structured and CFG bodies with meaningful separate logic for each case.
  2. CFG (N/A) — Operations where CFGs legitimately contribute nothing (e.g., no local funcDecls, no structured loops), so returning [] or skipping is semantically correct.
  3. CFG (deferred) — Operations that currently error on CFGs or produce placeholder results, where proper CFG support is desirable but requires non-trivial additional work (e.g., dominator analysis, two-stage
    pipelines, graph-level inlining).
  4. Proof (Implemented) — Formal correctness proofs adapted to the Procedure.Body sum type.
  5. Proof (deferred) — Proof obligations where CFG correctness is not yet captured; the current formulation is trivially true for CFGs and is left for future PRs.

(Bolded items are updates since the initial PR draft)

Category Specific Strategy File Function/Location
Proof (Implemented) postconditionsValid adapted with CoreBodyExec Transform/CoreSpecification.lean ProcedureCorrect.postconditionsValid
Proves structured if transform succeeds Transform/ProcBodyVerifyCorrect.lean procToVerifyStmt_is_structured (new helper theorem)
Unifies two ss witnesses via subst Transform/ProcBodyVerifyCorrect.lean procBodyVerify_procedureCorrect
Adjusted case splits Languages/Core/ObligationExtraction.lean extractGo_ok proof
CFG (Implemented) Symbolic evaluation of CFG bodies using fuel measure, without path merging. Languages/Core/ProcedureEval.lean eval
Type-check CFG bodies (labels, vars, targets) implemented. Languages/Core/ProcedureType.lean typeCheck, checkModificationRights
Alternative two-stage pipeline implemented by procedureToGotoCtxViaCFG Backends/CBMC/GOTO/CoreToCProverGOTO.lean transformToGoto
Alternative two-stage pipeline implemented by procedureToGotoCtxViaCFG Backends/CBMC/GOTO/CoreToGOTOPipeline.lean procedureToGotoCtx
Alternative two-stage pipeline implemented by procedureToGotoCtxViaCFG StrataTest/.../E2E_CoreToGOTO.lean coreToGotoJsonWithSummary
Alternative two-stage pipeline implemented by procedureToGotoCtxViaCFG Backends/CBMC/CoreToCBMC.lean createImplementationSymbolFromAST
extractCallsFromStatements vs extractCallsFromDetCFG Languages/Core/CallGraph.lean extractCallsFromProcedure
blockToCST vs detCFGToCST Languages/Core/DDMTransform/FormatCore.lean procToCST
runStmt vs runCFG Languages/Core/StatementEval.lean Command.runCall
Use CoreBodyExec with two constructors, and CoreStepStar vs CoreCFGStepStar Languages/Core/StatementSemantics.lean EvalCommand.call_sem
Lang.core vs Lang.coreCFG Transform/CoreSpecification.lean AssertValidInProcedure
transformStmts vs transformDetCFG Transform/PrecondElim.lean precondElim
eraseTypes, stripMetaData, getVars implemented and used for CFG Languages/Core/Procedure.lean eraseTypes, stripMetaData, getVars
extractFromStatements vs extractFromDetCFG Languages/Core/ObligationExtraction.lean extractObligations
CFG (N/A) return .none on CFG, skip inlining, because the inlined procedure could be structured or unstructured. Semantics is largely undefined. Transform/ProcedureInlining.lean inlineCallCmd
stmtsToCFG vs identity, latter is a no-op. StrataTest/.../Loops.lean singleCFG
Remove loops in structured, pass CFG through because it has no loops Transform/LoopElim.lean loopElim
[] for CFG (no local funcDecls in CFG bodies) Languages/Core/Core.lean buildEnv
CFG (deferred) Dominator-based path-condition propagation to be implemented Languages/Core/ObligationExtraction.lean extractFromDetCFG
throw on CFG, due to incompatible return type of List Statement instead of List Command Transform/CoreTransform.lean runProgram
throw on CFG, VC for CFG bodies to be implemented, proof to be adapted as well. Transform/ProcBodyVerify.lean procToVerifyStmt
Encode structured procedures, handling for CFG procedures to be implemented. Transform/ANFEncoder.lean anfEncodeProgram
CFG body renaming for inlining to be implemented Transform/ProcedureInlining.lean renameAllLocalNames
Call extraction for CFG bodies to be implemented StrataTest/Boole/global_readonly_call.lean callHelper
Proof (deferred) WF Properties wfstmts, wfloclnd, bodyExitsCovered conditioned on structured procedures, props for CFG to be implemented Languages/Core/WF.lean WFProcedureProp

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Comment thread Strata/Backends/CBMC/GOTO/CoreToCProverGOTO.lean
Comment thread Strata/Backends/CBMC/GOTO/CoreToGOTOPipeline.lean Outdated
Address PR review comments: note that the GOTO backend could be
refactored into a two-stage pipeline (structured→cfg, then cfg→GOTO)
to eliminate the pattern matching on Procedure.Body.
Comment thread Strata/Backends/CBMC/CoreToCBMC.lean Outdated
Comment thread Strata/Languages/Core/StatementEval.lean Outdated
Comment thread Strata/Languages/Core/Procedure.lean
Comment thread Strata/Languages/Core/DDMTransform/Grammar.lean
David Deng added 5 commits May 6, 2026 14:08
- Add Body.getCfg accessor (mirrors getStructured)
- CoreToCBMC: throw error on CFG body instead of returning []
- StatementEval: interpret CFG bodies by linearizing blocks
- Grammar: add comment explaining why 'branch' is used instead of 'if'
  (DDM registers tokens globally, causing conflict with if-statement)
Comment thread Strata/Languages/Core/Procedure.lean Outdated
Comment thread Strata/Languages/Core/WF.lean Outdated
Comment thread Strata/Languages/Core/Procedure.lean Outdated
Comment thread StrataTest/Transform/ProcedureInlining.lean Outdated
Comment thread StrataTest/Languages/Boole/global_readonly_call.lean Outdated
David Deng and others added 2 commits May 11, 2026 12:40
Test 1 — Type-checking rejection: confirms Core.typeCheck rejects a CFG
procedure with "expected structured body, got CFG" diagnostic.

Test 2 — CFG body preservation: confirms a parsed CFG procedure retains
.cfg body (isCfg=true, isStructured=false) with correct entry label and
block count, guarding against the prior bug where type-checking collapsed
CFG bodies to .structured [].

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… type

TerminationCheck.lean (from main) constructs a Procedure with
body := stmts, but this branch changed Procedure.body from
List Statement to Procedure.Body. Wrap with .structured to fix
the merge-queue CI build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
David Deng and others added 2 commits May 12, 2026 09:04
…eBodyExec TODOs

The postconditionsValid field in ProcedureCorrect was vacuously true for
CFG procedures — the match expression produced [] for CFGs, making
CoreStepStar terminate immediately with rho' = rho0.

Split into two explicit fields:
- postconditionsValid_structured: for structured bodies, uses CoreStepStar
  on the statement list (preserves existing proven behavior)
- postconditionsValid_cfg: for CFG bodies, uses CoreCFGStepStar directly,
  making the soundness gap explicit and non-vacuous

Updated procBodyVerify_procedureCorrect proof to construct all three
fields. The CFG field is discharged by contradiction since
procToVerifyStmt only succeeds for structured bodies.

Added TODO comments on CoreBodyExec documenting:
- Should be wired into postconditionsValid_cfg
- The cfg constructor drops terminal eval (limiting postcondition support)
- An equivalence theorem structured_iff_CoreStepStar would bridge the
  two representations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ionsValid

Add a δ' : CoreEval output parameter to CoreBodyExec so it exposes the
terminal evaluator. The structured constructor outputs ρ'.eval; the cfg
constructor passes through the initial δ (since CoreCFGStepStar does not
track eval changes).

This enables unifying postconditionsValid_structured and
postconditionsValid_cfg back into a single postconditionsValid field
that uses CoreBodyExec to abstract over both body kinds.

Updated call_sem to use the terminal eval (δ_final) from CoreBodyExec
when checking postconditions, and updated procBodyVerify_procedureCorrect
proof to invert the CoreBodyExec.structured constructor and reconcile
the initial environment via ProcEnvWF.noFailure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@PROgram52bc PROgram52bc requested a review from tautschnig May 12, 2026 17:37
David Deng and others added 13 commits May 12, 2026 13:06
structural checks (unique labels, entry exists, valid targets) and expression-level type-checking.
- Uses a fuel-measure and throws error on running out of fuel
- No path-merging
- Added #guard_msg tests
  1. Trivial CFG (single finish block)
  2. Linear CFG (assignment + goto + postcondition holds)
  3. Missing block error
  4. Fuel exhaustion on loop back-edge
  5. Postcondition failure (non-trivial proof obligation)
  6. Diamond CFG with symbolic branch (two paths, two proof obligations with path conditions)
During structured-to-CFG lowering, loop invariants and decreases
measures were only preserved as lowered assert commands, losing the
connection to the original spec. Downstream CFG passes that need to
recover the contract (e.g., for invariant inference or refinement)
had no way to distinguish spec-level asserts from user asserts.

Add `MetaData.specLoopInvariant` and `MetaData.specDecreases` fields
and attach them to the loop entry block's transfer command metadata.
The existing assert lowering is preserved (both representations
coexist), so this is a non-breaking addition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The grammar previously accepted `goto label1, label2, ..., labelN;`
via CommaSepBy, but the translator rejected >2 targets at translation
time. This meant users could write syntactically valid programs that
failed during elaboration with a confusing error.

Split into two grammar ops:
- `transfer_goto (label : Ident)` — unconditional goto (1 target)
- `transfer_nondet_goto (label1 : Ident, label2 : Ident)` — nondet (2 targets)

The concrete syntax is unchanged: `goto a;` and `goto a, b;` parse
as before. `goto a, b, c;` now fails at parse time with a clear
syntax error instead of passing through to translation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
translateTransfer now increments the var_def counter after reading it
for the $__nondet_N variable name, so multiple nondeterministic gotos
in the same procedure get distinct names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 5 instances of the "match body with structured/cfg-error"
pattern with the existing getStructured helper, which was defined
but unused. Uses .mapError where the caller's error type differs
from Except String.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add `procedureToGotoCtxViaCFG`, a parallel pipeline that translates Core
procedures to CProver GOTO via the CFG representation:

  Structured body → stmtsToCFG → coreCFGToGotoTransform → GOTO
  CFG body → coreCFGToGotoTransform → GOTO

This coexists with the existing direct path in `procedureToGotoCtx`, which
remains unchanged.

Changes:

1. CFGToCProverGOTO.lean — close two gaps in `detCFGToGotoTransform`:
   - Source locations: transfer command metadata is now used to derive
     source locations via `metadataToSourceLoc` (previously ignored as
     `_md`).
   - Loop contracts: backward-edge GOTOs targeting loop entry blocks are
     annotated with `#spec_loop_invariant` and `#spec_decreases` named
     fields on the guard, matching CBMC's DFCC expectations. A post-
     processing pass detects loop entries (by presence of contract
     metadata on their condGoto transfer) and annotates GOTOs whose
     target location ≤ source location (i.e., backward edges).

2. CoreToGOTOPipeline.lean — make `renameIdent`, `renameExpr`,
   `renameCmd`, and `collectFuncDecls` non-private so the new pipeline
   can reuse them.

3. CoreCFGToGOTOPipeline.lean (new) — contains:
   - Rename helpers for Core commands (`CmdExt`): `renameCoreCommand`,
     `renameCoreStmt`, `renameCoreDetCFG`.
   - `coreCFGToGotoTransform`: Core-specific CFG-to-GOTO translation
     that handles `CmdExt.call` (emits FUNCTION_CALL instructions) and
     delegates `CmdExt.cmd` to `Cmd.toGotoInstructions`. Also handles
     source locations and loop contract annotation.
   - `procedureToGotoCtxViaCFG`: full pipeline wrapper mirroring
     `procedureToGotoCtx` — renaming, type environment, axioms,
     distinct declarations, contracts, lifted functions — but routing
     through `stmtsToCFG` + `coreCFGToGotoTransform`.

4. E2E_CFGPipeline.lean (new) — 11 equivalence tests that run both
   pipelines on the same Core programs and compare:
   - Semantic instruction types (DECL, ASSIGN, ASSERT, ASSUME, etc.)
     match between direct and CFG paths.
   - Contract annotations (#spec_requires, #spec_ensures) match.
   - Both paths produce valid, non-null JSON output.
   Test programs cover: simple assert, var decl/assign, if-then-else,
   contracts, axioms/distinct, free specs, cover, bitvector ops,
   assume, multiple commands, and CFG-only output validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread Strata/Languages/Core/ProcedureType.lean Outdated
Comment thread Strata/Languages/Core/StatementSemantics.lean Outdated
Comment thread Strata/Transform/ProcedureInlining.lean Outdated
Comment thread Strata/Languages/Core/DDMTransform/FormatCore.lean Outdated
Comment thread StrataTest/Languages/Core/Tests/CFGParseTests.lean Outdated
@PROgram52bc PROgram52bc requested a review from atomb May 14, 2026 20:58
Localized merge.

---------

Co-authored-by: David Deng <htd@amazon.com>
Conflicts:
- Strata/Transform/StructuredToUnstructured.lean: keep branch's
  loop-contract metadata (specLoopInvariant/specDecreases pushed to the
  transfer) and exit metadata propagation, while adopting main's
  String-only exit label and synthesizedMd-tagged synthesized commands.
- Strata/Transform/ProcBodyVerifyCorrect.lean: adopt main's new
  Config.block (Option String × σ_parent × inner) shape, but keep the
  branch's destructured `ss : List Statement` (from procToVerifyStmt_is_structured)
  instead of `proc.body : Procedure.Body`.

Follow-on fixes:
- ProcBodyVerifyCorrect.lean lines 856/859/862: replace `proc.body` with
  `ss` so the new wf-preservation lemmas (core_wfVar_preserved,
  core_wfCong_preserved, core_wfExprCongr_preserved) typecheck.
- Loops.lean / Exit.lean snapshots: replace `[fileRange]` tag with
  `[provenance]` and update shifted byte offsets to match the new
  Provenance metadata format introduced by #1140.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants