feat(laurel): Seq<T> and Array<T> with bounds-checked subscript and diagnostics #1073
fabiomadge wants to merge 8 commits into main
handleZeroaryOps fell back to logging an error and returning re.none() for any 0-ary op outside the regex set. That silently substituted a regex primitive for unrelated ops in VC printer output; users saw re.none() where e.g. Sequence.empty() was intended. Switch the fallback to mkGenericCall, matching how handleUnaryOps and handleBinaryOps already handle unknown ops. The printer now emits the op name as a free-variable reference, preserving the intent. Parseable Sequence.empty<T>() syntax is still a separate grammar-level feature; this commit only fixes the printer-side noise.
polyUneval is the combinator used to declare unevaluated polymorphic functions with axioms. Unlike unaryOp and binaryOp, it had no way to attach preconditions; callers had to hand-build the WFLFunc. Add a 'preconditions' parameter and the matching free-vars proof obligation (subset of the function's input names), defaulting to empty. No behavioral change for existing callers.
Sequence.select and Sequence.update now require `0 <= i < length(s)`; Sequence.take and Sequence.drop require `0 <= n <= length(s)`. PrecondElim picks these up and generates VC obligations at call sites, both in statement positions (via transformStmt) and in pure positions (via mkContractWFProc / mkFuncWFProc), so requires/ensures/quantifier-body subscripts are also covered.

Obligations carry the propertyType metadata "outOfBoundsAccess" (new MetaData constant) and flow through a new PropertyType.outOfBoundsAccess enum variant, with matching entries in the statement-eval / obligation-extraction / cmd-eval metadata-to-PropertyType conversion sites, to finally render as "out-of-bounds-access" in SARIF output, matching how divisionByZero and arithmeticOverflow are classified.

Side effect: `propertyTypeToClassification` in SarifOutput.lean was previously dead code; `vcResultToSarifResult` never set `properties.propertyType`, so the SARIF output defaulted every obligation to "assert". Wiring this up means divisionByZero and arithmeticOverflow obligations now also classify correctly in SARIF, a pre-existing bug this PR incidentally fixes.
New tests in StrataTest/Transform/PrecondElim.lean:
- Test 10: Sequence.select in a procedure body emits the bounds assert
(PrecondElim is unconditional — it inserts regardless of any
surrounding requires guard; the SMT solver discharges).
- Test 10c: Sequence.select inside a requires clause triggers the
$$wf-procedure path (mkContractWFProc).
- Test 10d: Sequence.select inside a function body triggers the
function-body $$wf path (mkFuncWFStmts).
- Test 11: collectPrecondAsserts attaches outOfBoundsAccess metadata
for all four partial ops and a nested call. Mirrors
OverflowCheckTest.lean. Also verifies Sequence.length emits
no obligation (it is total).
- Test 12: Sequence.empty printer regression for the commit-1 fix —
renders as a generic call, not re.none().
New property-classification tests in
StrataTest/Languages/Core/Tests/SarifOutputTests.lean cover all five
PropertyType variants, exercising the SARIF wiring fix in commit 3.
Collateral test updates for real behavioral changes:
- StrataTest/Languages/Core/Examples/Seq.lean: expected VC output
includes the new bounds obligations (all SMT-provable from the
surrounding context, except the pre-existing contains_yes unknown).
- StrataTest/Languages/Core/Tests/ProgramEvalTests.lean: Sequence func
signatures now render with the attached requires clauses.
- StrataTest/Languages/Core/Examples/Loops.lean: commit-1 printer fix
propagates (re.none() -> top, error message format updated).
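As a rough illustration of the two bound shapes this commit attaches (not the Core encoding itself), the following Lean sketch models them as Boolean checks. The names `selectInBounds` and `takeInBounds` are hypothetical and do not appear in the PR; only the inequalities are taken from the commit message.

```lean
/-- Hypothetical model of the `Sequence.select` / `Sequence.update`
    precondition: `0 <= i < length(s)`. -/
def selectInBounds (i len : Int) : Bool :=
  0 ≤ i && i < len

/-- Hypothetical model of the `Sequence.take` / `Sequence.drop`
    precondition: `0 <= n <= length(s)`. -/
def takeInBounds (n len : Int) : Bool :=
  0 ≤ n && n ≤ len

#eval selectInBounds 2 3     -- true
#eval selectInBounds 3 3     -- false
#eval takeInBounds 3 3       -- true
#eval takeInBounds (-1) 3    -- false
```

The only difference between the two is the upper bound: `take` and `drop` accept `n = length(s)`, while `select` and `update` need a strictly smaller index.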
589d615 to 4f5dd33
1468e37 to 6424973
…tions

Sequences (immutable value types):
- TSeq variant in HighType; Seq<T> grammar syntax
- [1, 2, 3] sequence literals (desugared to Sequence.build chains)
- s[i] subscript read and s[i := v] functional update
- 9 external Sequence.* operations (empty, build, select, update,
  length, append, contains, take, drop)
- Seq<T> translates to Core's polymorphic Sequence type

Arrays (mutable heap-backed):
- TArray variant in HighType; Array<T> grammar syntax
- a[i] read and a[i] := v write with heap semantics (aliasing)
- Seq literal to Array conversion: var a: Array<int> := [1, 2, 3]
- Synthetic $Array composite with $data: Seq<T> field
- Conditional injection: no $Array in programs that don't use arrays
- Array<T> recognized as composite in modifies clauses
- Array.length(a) desugared to Sequence.length(a.$data)

Shared infrastructure:
- Subscript AST node with type-aware SubscriptElim pass
- Grammar productions: seqType, arrayType, subscript, seqLiteral

Co-authored-by: Fabio Madge <fmadge@amazon.com>
…tics

Add a Laurel-layer validator (ValidateSubscriptUsage) that runs alongside validateDiamondFieldAccesses and flags four common misuses with Dafny-style messages that suggest the correct syntax:

1. `a[i := v]` on `Array<T>`: arrays are mutable; use `a[i] := v` or declare `a` as `Seq<T>`.
2. `s[i] := v` on `Seq<T>`: sequences are immutable; use `s[i := v]` or declare `s` as `Array<T>`.
3. `Array.length(x)` where `x` is not an `Array<T>`: reports the actual argument type.
4. `Array<T>` where `T ≠ int`: flagged with a note about the current SMT limitation.

Pipeline integration:
- runLaurelPasses now returns a `skipCore : Bool` flag (true when the validator emitted diagnostics) so translateWithLaurel can skip Core translation and VC generation when the Laurel-layer diagnostic is the actionable error. This prevents confusing Core type-checking noise from stacking on top of the validator's helpful message.
- SubscriptElim hardens a couple of edge cases so downstream passes don't stack follow-on errors when the validator has already flagged a misuse (no-op for Seq destructive update; LiteralInt 0 fallback for Array.length on a non-Array).

Negative tests and docs:
- T18_Sequences: negative case for diagnostic 2.
- T19_Arrays: negative cases for diagnostics 1, 3, and 4.
- docs/verso/LaurelDoc: new "Common mistakes" subsection with example snippets for each of the four validator diagnostics.
Adds the Sequence.fromArray(a) builtin for taking an immutable Seq<T> snapshot of an Array<T>'s current contents. SubscriptElim rewrites calls into the a#$data internal field, preserving the existing Array-layout convention. Validator gains a fifth diagnostic flagging Sequence.fromArray calls whose argument type is not an Array<T>.

Tests:
- Positive cases in T19 covering snapshot semantics (mutation to the array after extraction is not reflected in the captured sequence).
- Negative case asserting the new validator diagnostic fires on a non-Array argument.
- T19's inter-procedural setFirst now also ensures length-preservation, which is required under the Core-level bounds preconditions introduced by the preceding Core commits in this branch (fixes a failure that surfaced when bounds obligations became checks rather than no-ops).

Docs:
- Rewrites the out-of-bounds semantics note from 'unconstrained' to 'verification obligation', matching the Core preconditions introduced upstream in this branch.
- New 'Array to sequence conversion' section covering Sequence.fromArray and why there is no implicit Array to Seq coercion.
- New 'Common mistakes' entry for Sequence.fromArray on a non-Array argument.
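The snapshot semantics described in this commit can be pictured with a small Lean model. This is only a sketch under stated assumptions: `Heap`, `writeAt`, and the Lean-side `fromArray` are hypothetical names, the heap is modelled as a function from array references to their `$data` contents, and none of this is the actual SubscriptElim or Core encoding.

```lean
/-- Hypothetical heap model: each array reference maps to its `$data` sequence. -/
abbrev Heap := Nat → List Int

/-- Destructive write `a[i] := v`: only the contents stored under `a` change. -/
def writeAt (h : Heap) (a i : Nat) (v : Int) : Heap :=
  fun r => if r = a then (h r).set i v else h r

/-- Snapshot in the spirit of `Sequence.fromArray(a)`:
    read the current contents as an immutable value. -/
def fromArray (h : Heap) (a : Nat) : List Int := h a

def h0 : Heap := fun _ => [1, 2, 3]
def snap : List Int := fromArray h0 0   -- snapshot taken before the write
def h1 : Heap := writeAt h0 0 0 42      -- a[0] := 42

#eval snap            -- [1, 2, 3]
#eval fromArray h1 0  -- [42, 2, 3]
```

Because `snap` is a plain value read from the earlier heap, the later write is visible only through a fresh read of the array, which is the behaviour the positive T19 cases are described as checking.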
The TSeq arms of boxConstructorName/boxDestructorName/boxConstructorDef returned a fixed 'BoxSeq' / 'Box..SeqVal!' string regardless of the sequence's element type. HeapParameterization deduplicates usedBoxConstructors by name, so two composites with Seq fields of different element types produced two BoxSeq entries with incompatible argument types; only one survived dedup and Core type-checking then failed on the other.

Derive a per-element-type tag and append it to the base name: Seq<int> becomes BoxSeq_int, Seq<bool> becomes BoxSeq_bool, Seq<Seq<int>> becomes BoxSeq_Seq_int, etc. Mirrors the existing per-primitive approach (BoxInt, BoxBool, ...). The new highTypeTag helper is also prepared to handle TArray / TSet / TMap / Pure / TTypedField by recursion, so the same pattern extends if box constructors are ever needed for those.

T22_MixedSeqFields regression exercises two composites with Seq<int> and Seq<bool> fields and writes/reads through both. Fixes the bug previously tracked in #1101 (now closed).
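The naming scheme in this commit can be sketched as a recursion over the element type. The code below is a minimal illustration, not the PR's `highTypeTag`: `ElemTy`, `elemTag`, and `boxSeqName` are hypothetical names, and the type language is cut down to three cases.

```lean
/-- Hypothetical, cut-down stand-in for Laurel's `HighType`. -/
inductive ElemTy where
  | int
  | bool
  | seq (elem : ElemTy)

/-- Per-element-type tag, built by recursion on the element type. -/
def elemTag : ElemTy → String
  | .int   => "int"
  | .bool  => "bool"
  | .seq t => "Seq_" ++ elemTag t

/-- Box constructor name for a `Seq` field whose element type is `elem`. -/
def boxSeqName (elem : ElemTy) : String :=
  "BoxSeq_" ++ elemTag elem

#eval boxSeqName .int         -- "BoxSeq_int"
#eval boxSeqName .bool        -- "BoxSeq_bool"
#eval boxSeqName (.seq .int)  -- "BoxSeq_Seq_int"
```

Since distinct element types yield distinct names, deduplicating by name no longer merges constructors with incompatible argument types.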
6424973 to 1f4b0ce
MikaelMayer left a comment
🤖 Well-structured PR. The separation between ValidateSubscriptUsage (pure diagnostics) and SubscriptElim (rewriting) is clean, and the centralized SeqOp.* / arrayCompositeName constants prevent name drift between passes. The highTypeTag approach for per-element-type Box constructors is a good solution to the deduplication collision. The bounds-precondition infrastructure (mkSeqBoundsPrecond + classifyPrecondition) integrates naturally with the existing PrecondElim pattern.
A few items below.
    | other => other
    let program' := { program with
      types := types'
      staticProcedures := program.staticProcedures.map (elimProcedure model) }
🤖 subscriptElim processes types (instance procedures) and staticProcedures, but not program.constants. A constant with a subscript initializer (e.g. const x: int := someSeq[0]) would leave an un-eliminated Subscript node in the tree, triggering the "Subscript should have been eliminated" diagnostic in LaurelToCoreTranslator. Unlikely in practice today, but worth either handling or documenting as unsupported.
    /-- Eliminate `Subscript` nodes and desugar `Array.length` across a program.
    Conditionally injects the `$Array` synthetic composite when the program
    uses `Array<T>` anywhere. -/
    public def subscriptElim (_model : SemanticModel) (program : Program)
🤖 Nit: _model is accepted and immediately discarded. Since the pipeline's LaurelPass.run signature requires Program → SemanticModel → ..., this is unavoidable at the call site, but the docstring could note why the caller's model is stale (it predates $Array injection) to save future readers from wondering whether this is a bug.
    /-- `Sequence.length(s) : int` — length of the sequence. -/
    def SeqOp.length := "Sequence.length"
    /-- Name of the `$data` field on the synthetic `$Array` composite. -/
    def SeqOp.dataField := "$data"
🤖 Nit: SeqOp.dataField is semantically different from the other SeqOp.* entries — it's a field name on the synthetic composite, not a Sequence.* operation name. Consider a separate namespace or at least a comment distinguishing it from the operation names above.
Adds `Seq<T>` (immutable value sequences) and `Array<T>` (mutable heap-backed arrays) to Laurel, with type-aware desugaring, bounds-checked subscript, validator diagnostics for common misuses, and a `Sequence.fromArray` snapshot operation.

Supersedes #787, which became stale against `main`.

Depends on #1100 (Core PR for the Sequence well-formedness infrastructure). The first four commits of this branch are the Core PR; once it merges, this branch will be rebased onto `main`.

Scope
Sequences (`Seq<T>`, immutable):
- `Seq<T>` types, `[a, b, c]` sequence literals, `s[i]` and `s[i := v]` subscript.
- `Sequence.*` primitives: `empty`, `build`, `select`, `update`, `length`, `append`, `contains`, `take`, `drop`.
- Translates to Core's polymorphic `Sequence`.

Arrays (`Array<T>`, mutable heap-backed):
- `Array<T>` types, `a[i]` read, `a[i] := v` destructive write, `Array.length(a)` length.
- Synthetic `$Array` composite with a `$data: Seq<int>` field (the `int` element type matches the current `Array<int>`-only restriction; see the validator diagnostic below). The `$` prefix follows the existing convention for compiler-internal names.
- Conditional injection: programs that don't use `Array<T>` don't get the synthetic composite.
- Recognized as composite in `modifies` clauses.
- `Sequence.fromArray(a)` takes an immutable snapshot of an `Array<T>`'s current contents. The snapshot is independent of subsequent mutations.

Bounds checking
Handled by Core preconditions from #1100:
- `Sequence.select`, `Sequence.update`: `0 <= i < length(s)`
- `Sequence.take`, `Sequence.drop`: `0 <= n <= length(s)`

Core's `PrecondElim` pass generates VC obligations at every call site, both in imperative code (via inserted asserts) and in pure positions like `requires`, `ensures`, quantifier bodies, and function bodies (via synthetic `$$wf` procedures). Errors are classified as `outOfBoundsAccess` for SARIF reporting, matching how division by zero is handled.

Validator diagnostics
`ValidateSubscriptUsage` flags five syntactic misuses before verification runs:
1. `a[i := v]` on `Array<T>`: functional update not supported on mutable arrays.
2. `s[i] := v` on `Seq<T>`: destructive update not allowed on immutable sequences.
3. `Array.length(x)` where `x` is not an `Array<T>`.
4. `Array<T>` where `T ≠ int` (current SMT limitation).
5. `Sequence.fromArray(x)` where `x` is not an `Array<T>`.

When the validator fires, Core translation is skipped to prevent follow-on type-checker noise from obscuring the helpful message.
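The five checks and the skip decision can be summarised in a short Lean sketch. Everything here is a hypothetical model for illustration: `Ty`, `Use`, `diagnose`, and the Lean-side `skipCore` are made-up names, the result is just the diagnostic's number from the list above rather than a real message, and the actual `ValidateSubscriptUsage` works on the Laurel AST.

```lean
/-- Hypothetical, cut-down type language. -/
inductive Ty where
  | int
  | bool
  | seq (elem : Ty)
  | array (elem : Ty)

/-- Hypothetical summary of the constructs the validator inspects. -/
inductive Use where
  | functionalUpdate (target : Ty)   -- x[i := v]
  | destructiveUpdate (target : Ty)  -- x[i] := v
  | arrayLength (arg : Ty)           -- Array.length(x)
  | arrayType (elem : Ty)            -- a mention of Array<T>
  | fromArray (arg : Ty)             -- Sequence.fromArray(x)

/-- Number of the diagnostic that fires for a construct, if any. -/
def diagnose : Use → Option Nat
  | .functionalUpdate (.array _) => some 1
  | .destructiveUpdate (.seq _)  => some 2
  | .arrayLength (.array _)      => none
  | .arrayLength _               => some 3
  | .arrayType .int              => none
  | .arrayType _                 => some 4
  | .fromArray (.array _)        => none
  | .fromArray _                 => some 5
  | _                            => none

/-- Core translation is skipped as soon as any diagnostic fired. -/
def skipCore (uses : List Use) : Bool :=
  uses.any fun u => (diagnose u).isSome

#eval diagnose (Use.functionalUpdate (Ty.array Ty.int))  -- some 1
#eval diagnose (Use.arrayType Ty.bool)                   -- some 4
#eval skipCore [Use.fromArray (Ty.array Ty.int)]         -- false
#eval skipCore [Use.arrayLength (Ty.array Ty.int),
                Use.destructiveUpdate (Ty.seq Ty.int)]   -- true
```

Keeping the checks as a pure function over already-typed constructs mirrors the split between diagnostics and rewriting that the review comment above praises.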
BoxSeq per-element-type constructor names
Initial versions of this PR shared a single `BoxSeq` constructor across all `Seq<T>` field types. This collided in `HeapParameterization` when a program had a composite with `Seq` fields of different element types: deduplication by constructor name kept one, and Core type-checking then failed on the other. Fixed by deriving a per-element-type tag (`BoxSeq_int`, `BoxSeq_bool`, …) matching the existing per-primitive `BoxInt`/`BoxBool`/… approach. `T22_MixedSeqFields` is the regression test for it.

Out of scope
- `Array<T>` for `T ≠ int`: rejected by diagnostic 4. Lifting this would require per-element-type `$Array` injection (or similar) at the Laurel layer.
- `Sequence.empty<T>()` syntax in raw Core source: separate grammar-level design.

Tests
`StrataTest/Languages/Laurel/Examples/Fundamentals/T18_Sequences.lean`, `T19_Arrays.lean`, and `T22_MixedSeqFields.lean` cover: `Sequence.*` operations, contracts with `requires`/`ensures`/`opaque`, quantifiers, nested sequences, aliasing, loops, inter-procedural `modifies`, `Sequence.fromArray` snapshot semantics, and one composite carrying two `Seq` fields of different element types.

Core-side tests for the bounds preconditions live in #1100.
Docs
`docs/verso/LaurelDoc.lean` gains a `# Sequences and Arrays` section covering literals, subscripts, operations, `Array.length`, `Sequence.fromArray` with snapshot semantics, the verification-obligation treatment of OOB, a "Common mistakes" list tied to the five validator diagnostics, and internal representation.