feat(laurel): Seq<T> and Array<T> with bounds-checked subscript and diagnostics by fabiomadge · Pull Request #1073 · strata-org/Strata

fabiomadge · 2026-04-29T00:57:30Z

Adds Seq<T> (immutable value sequences) and Array<T> (mutable heap-backed arrays) to Laurel, with type-aware desugaring, bounds-checked subscript, validator diagnostics for common misuses, and a Sequence.fromArray snapshot operation.

Supersedes #787, which became stale against main.

Depends on #1100 (Core PR for the Sequence well-formedness infrastructure). The first four commits of this branch are the Core PR; once it merges, this branch will be rebased onto main.

Scope

Sequences (Seq<T>, immutable):

Seq<T> types, [a, b, c] sequence literals, s[i] subscript read and s[i := v] functional update.
9 external Sequence.* primitives: empty, build, select, update, length, append, contains, take, drop.
Translates to Core's polymorphic Sequence.
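To make the literal desugaring concrete, here is a small Lean sketch that models Seq<T> as a List and turns a literal into a chain of builds starting from empty. It assumes that Sequence.build(s, x) appends x at the end of s (as in Dafny-style sequence encodings). SeqSketch, LSeq, and the functions in it are hypothetical names, not Strata's API. The partial operations (select, update, take, drop) are left out because their bounds preconditions are covered separately.

```lean
namespace SeqSketch

-- Hypothetical model: a Laurel Seq<T> is represented as a Lean List.
abbrev LSeq (α : Type) := List α

variable {α : Type}

-- Assumed semantics of some total Sequence.* primitives.
def empty : LSeq α := []
def build (s : LSeq α) (x : α) : LSeq α := s ++ [x]  -- assumption: append at the end
def length (s : LSeq α) : Int := (List.length s : Int)
def append (s t : LSeq α) : LSeq α := s ++ t
def contains [BEq α] (s : LSeq α) (x : α) : Bool := List.contains s x

-- A literal [a, b, c] becomes build(build(build(empty, a), b), c).
def literal (xs : List α) : LSeq α := List.foldl build empty xs

end SeqSketch
```

For example, `SeqSketch.literal [1, 2, 3]` folds three builds over the empty sequence and yields `[1, 2, 3]`.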

Arrays (Array<T>, mutable heap-backed):

Array<T> types, a[i] read, a[i] := v destructive write, Array.length(a) length.
Represented internally by a synthetic $Array composite with a $data: Seq<int> field (the int element type matches the current Array<int>-only restriction; see the validator diagnostic below). The $ prefix follows the existing convention for compiler-internal names.
Conditional injection — programs that don't use Array<T> don't get the synthetic composite.
Participates in modifies clauses.
Sequence.fromArray(a) takes an immutable snapshot of an Array<T>'s current contents. The snapshot is independent of subsequent mutations to the array.
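The following Lean sketch illustrates the representation and the snapshot semantics under simplifying assumptions: the heap is a list of $Array-like objects addressed by index, the field `data` stands in for `$data`, and out-of-range accesses return `none` instead of raising verification obligations. All names (ArraySketch, ArrayObj, Heap, length, read, write, fromArray) are hypothetical.

```lean
namespace ArraySketch

-- Hypothetical stand-in for the synthetic `$Array` composite: its single
-- field `data` plays the role of `$data : Seq<int>`.
structure ArrayObj where
  data : List Int
  deriving BEq, Repr

-- A toy heap: an array reference is a Nat index into a list of objects.
abbrev Heap := List ArrayObj

-- `Array.length(a)` desugars to `Sequence.length(a.$data)`.
def length (h : Heap) (a : Nat) : Option Int :=
  (h[a]?).map fun o => (o.data.length : Int)

-- `a[i]` reads element `i` of `a.$data` (`none` if out of range).
def read (h : Heap) (a i : Nat) : Option Int :=
  (h[a]?).bind fun o => o.data[i]?

-- `a[i] := v` stores a functionally updated sequence back into `$data`,
-- so every alias holding the same reference observes the write.
def write (h : Heap) (a i : Nat) (v : Int) : Heap :=
  match h[a]? with
  | some o => List.set h a { o with data := List.set o.data i v }
  | none => h

-- `Sequence.fromArray(a)` copies out the current `$data` value.
def fromArray (h : Heap) (a : Nat) : Option (List Int) :=
  (h[a]?).map fun o => o.data

end ArraySketch
```

Because `fromArray` returns the sequence value rather than a reference, a later `write` to the same array leaves the snapshot unchanged, while a `read` through the reference sees the new element.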

Bounds checking

Handled by Core preconditions from #1100:

Sequence.select, Sequence.update: 0 <= i < length(s)
Sequence.take, Sequence.drop: 0 <= n <= length(s)

Core's PrecondElim pass generates VC obligations at every call site — both in imperative code (via inserted asserts) and in pure positions like requires, ensures, quantifier bodies, and function bodies (via synthetic $$wf procedures). Errors are classified as outOfBoundsAccess for SARIF reporting, matching how division by zero is handled.
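As a rough illustration of call-site obligation collection, the Lean sketch below walks a toy expression tree and emits one classified obligation per partial Sequence.* call, visiting arguments before the enclosing call so that nested calls are covered. The mini-AST, the Obligation record, and the functions ownPre and collect are hypothetical; `partial` is used only to keep the sketch short. Sequence.length produces no obligation because it is total.

```lean
namespace PrecondSketch

-- Hypothetical mini-AST; Strata's Core expressions are considerably richer.
inductive Expr where
  | var (name : String)
  | int (n : Int)
  | call (fn : String) (args : List Expr)

-- An obligation pairs a condition with its property-type classification.
structure Obligation where
  cond : Expr
  propertyType : String

def len (s : Expr) : Expr := .call "Sequence.length" [s]

-- Builds `0 <= x && upper`, tagged as an out-of-bounds obligation.
def bounds (x upper : Expr) : Obligation :=
  { cond := .call "&&" [.call "<=" [.int 0, x], upper],
    propertyType := "outOfBoundsAccess" }

-- Preconditions of a single call, following the rules listed above.
def ownPre : String → List Expr → List Obligation
  | "Sequence.select", [s, i]    => [bounds i (.call "<" [i, len s])]
  | "Sequence.update", [s, i, _] => [bounds i (.call "<" [i, len s])]
  | "Sequence.take",   [s, n]    => [bounds n (.call "<=" [n, len s])]
  | "Sequence.drop",   [s, n]    => [bounds n (.call "<=" [n, len s])]
  | _, _ => []

-- Collects obligations for every call: arguments first, then the call itself.
partial def collect : Expr → List Obligation
  | .call fn args => args.foldr (fun a acc => collect a ++ acc) [] ++ ownPre fn args
  | _ => []

end PrecondSketch
```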

Validator diagnostics

ValidateSubscriptUsage flags five common misuses before verification runs:

a[i := v] on Array<T> — functional update not supported on mutable arrays.
s[i] := v on Seq<T> — destructive update not allowed on immutable sequences.
Array.length(x) where x is not an Array<T>.
Array<T> where T ≠ int (current SMT limitation).
Sequence.fromArray(x) where x is not an Array<T>.

When the validator fires, Core translation is skipped to prevent follow-on type-checker noise from obscuring the helpful message.
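A simplified Lean model of the five checks and of the skip-Core decision. The types Ty and Use, the functions diagnose and skipCore, and the message strings are hypothetical paraphrases, not ValidateSubscriptUsage's actual code or wording.

```lean
namespace ValidatorSketch

-- Hypothetical, heavily simplified Laurel types.
inductive Ty where
  | int
  | bool
  | seq (elem : Ty)
  | array (elem : Ty)

-- The surface forms the validator inspects, tagged with the relevant type.
inductive Use where
  | functionalUpdate (target : Ty)   -- e[i := v]
  | destructiveUpdate (target : Ty)  -- e[i] := v
  | arrayLength (arg : Ty)           -- Array.length(e)
  | fromArray (arg : Ty)             -- Sequence.fromArray(e)
  | arrayType (elem : Ty)            -- the type annotation Array<T>

-- One case per diagnostic; the messages are illustrative paraphrases.
def diagnose : Use → Option String
  | .functionalUpdate (.array _) => some "arrays are mutable: use a[i] := v"
  | .destructiveUpdate (.seq _)  => some "sequences are immutable: use s[i := v]"
  | .arrayLength (.array _)      => none
  | .arrayLength _               => some "Array.length expects an Array<T>"
  | .fromArray (.array _)        => none
  | .fromArray _                 => some "Sequence.fromArray expects an Array<T>"
  | .arrayType .int              => none
  | .arrayType _                 => some "Array<T> currently requires T = int"
  | _                            => none

-- Any diagnostic suppresses Core translation.
def skipCore (uses : List Use) : Bool :=
  uses.any fun u => (diagnose u).isSome

end ValidatorSketch
```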

BoxSeq per-element-type constructor names

Initial versions of this PR shared a single BoxSeq constructor across all Seq<T> field types. This collided in HeapParameterization when a program had a composite with Seq fields of different element types: deduplication by constructor name kept one, and Core type-checking then failed on the other. Fixed by deriving a per-element-type tag (BoxSeq_int, BoxSeq_bool, …) matching the existing per-primitive BoxInt/BoxBool/… approach. T22_MixedSeqFields covers this as a regression test.
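The naming scheme can be sketched in Lean as a recursive tag function over a cut-down HighType. HighTy, tag, and boxConstructorName here are hypothetical stand-ins (the real helper is described as highTypeTag); the expected names BoxSeq_int, BoxSeq_bool, and BoxSeq_Seq_int come from the commit message describing this fix.

```lean
namespace BoxTagSketch

-- Hypothetical subset of Laurel's HighType.
inductive HighTy where
  | int
  | bool
  | seq (elem : HighTy)

-- Per-element-type tag, recursing through nested sequences.
def tag : HighTy → String
  | .int => "int"
  | .bool => "bool"
  | .seq e => "Seq_" ++ tag e

-- Primitives keep fixed names; each Seq element type gets its own
-- constructor, so deduplicating by name no longer merges Seq<int> and Seq<bool>.
def boxConstructorName : HighTy → String
  | .int => "BoxInt"
  | .bool => "BoxBool"
  | .seq e => "BoxSeq_" ++ tag e

end BoxTagSketch
```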

Out of scope

Array<T> for T ≠ int: rejected by diagnostic 4. Lifting this would require per-element-type $Array injection (or similar) at the Laurel layer.
Parseable Sequence.empty<T>() syntax in raw Core source — separate grammar-level design.

Tests

StrataTest/Languages/Laurel/Examples/Fundamentals/T18_Sequences.lean, T19_Arrays.lean, and T22_MixedSeqFields.lean cover:

Positive: literal construction, subscript read/update, all Sequence.* operations, contracts with requires/ensures/opaque, quantifiers, nested sequences, aliasing, loops, inter-procedural modifies, Sequence.fromArray snapshot semantics, one composite carrying two Seq fields of different element types.
Negative: one test per validator diagnostic, pinned on substring of each error message.

Core-side tests for the bounds preconditions live in #1100.

Docs

docs/verso/LaurelDoc.lean gains a # Sequences and Arrays section covering literals, subscripts, operations, Array.length, Sequence.fromArray with snapshot semantics, the verification-obligation treatment of OOB, a "Common mistakes" list tied to the five validator diagnostics, and internal representation.

handleZeroaryOps fell back to logging an error and returning re.none() for any 0-ary op outside the regex set. That silently substituted a regex primitive for unrelated ops in VC printer output; users saw re.none() where e.g. Sequence.empty() was intended. Switch the fallback to mkGenericCall, matching how handleUnaryOps and handleBinaryOps already handle unknown ops. The printer now emits the op name as a free-variable reference, preserving the intent. Parseable Sequence.empty<T>() syntax is still a separate grammar-level feature; this commit only fixes the printer-side noise.
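A minimal Lean sketch of the before/after printer behavior. The string-based rendering, the list of regex constants, and the names handleZeroaryOpOld and handleZeroaryOp are assumptions for illustration; mkGenericCall here is a string-based stand-in for the real helper of the same name, which operates on terms rather than strings.

```lean
namespace PrinterSketch

-- Zero-ary ops rendered specially (hypothetical: regex constants only).
def regexZeroary : List String := ["re.none", "re.all", "re.allchar"]

-- Generic rendering: a bare name (free-variable reference) when there are
-- no arguments, otherwise op(a, b, ...).
def mkGenericCall (op : String) (args : List String) : String :=
  if args.isEmpty then op else op ++ "(" ++ ", ".intercalate args ++ ")"

-- Before: any unrecognized 0-ary op collapsed to the regex constant.
def handleZeroaryOpOld (op : String) : String :=
  if regexZeroary.contains op then op ++ "()" else "re.none()"

-- After: unrecognized 0-ary ops keep their own name.
def handleZeroaryOp (op : String) : String :=
  if regexZeroary.contains op then op ++ "()" else mkGenericCall op []

end PrinterSketch
```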

polyUneval is the combinator used to declare unevaluated polymorphic functions with axioms. Unlike unaryOp and binaryOp, it had no way to attach preconditions; callers had to hand-build the WFLFunc. Add a 'preconditions' parameter, defaulting to empty, together with the matching free-vars proof obligation (the preconditions' free variables must be a subset of the function's input names). No behavioral change for existing callers.
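A Lean sketch of the new parameter and its scoping side condition, assuming each precondition can be summarized by its free-variable names. PolyFunc and wellScoped are hypothetical, and the sketch uses a Boolean check where the real change adds a proof obligation. The default value of `preconditions` shows why existing callers are unaffected.

```lean
namespace PolyUnevalSketch

-- Hypothetical record for an unevaluated polymorphic function; each
-- precondition is represented only by its free-variable names.
structure PolyFunc where
  name : String
  inputs : List String
  preconditions : List (List String) := []  -- default: no preconditions

-- Side condition: every free variable of every precondition is an input.
def PolyFunc.wellScoped (f : PolyFunc) : Bool :=
  f.preconditions.all fun fvs => fvs.all f.inputs.contains

end PolyUnevalSketch
```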

Sequence.select and Sequence.update now require `0 <= i < length(s)`; Sequence.take and Sequence.drop require `0 <= n <= length(s)`. PrecondElim picks these up and generates VC obligations at call sites, both in statement positions (via transformStmt) and in pure positions (via mkContractWFProc / mkFuncWFProc), so requires/ensures/quantifier-body subscripts are also covered. Obligations carry the propertyType metadata "outOfBoundsAccess" (a new MetaData constant) and flow through a new PropertyType.outOfBoundsAccess enum variant (with matching entries at the statement-eval, obligation-extraction, and cmd-eval metadata-to-PropertyType conversion sites) before rendering as "out-of-bounds-access" in SARIF output, matching how divisionByZero and arithmeticOverflow are classified.

Side effect: `propertyTypeToClassification` in SarifOutput.lean was previously dead code; `vcResultToSarifResult` never set `properties.propertyType`, so the SARIF output defaulted every obligation to "assert". Wiring this up means divisionByZero and arithmeticOverflow obligations now also classify correctly in SARIF, incidentally fixing a pre-existing bug.
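The classification path can be sketched in Lean as metadata string, then enum, then SARIF string. Only four of the five PropertyType variants are modelled, the constructor name `assertion` stands in for the default case, and the spellings division-by-zero and arithmetic-overflow are assumed by analogy with out-of-bounds-access. classifyBefore mimics the old behavior, in which the property type was never set.

```lean
namespace SarifSketch

-- Hypothetical subset of PropertyType (the PR's enum has five variants).
inductive PropertyType where
  | assertion
  | divisionByZero
  | arithmeticOverflow
  | outOfBoundsAccess

-- Metadata string to PropertyType, as done at the conversion sites.
def ofMetadata : String → PropertyType
  | "divisionByZero" => .divisionByZero
  | "arithmeticOverflow" => .arithmeticOverflow
  | "outOfBoundsAccess" => .outOfBoundsAccess
  | _ => .assertion

-- SARIF classification strings ("assert" and "out-of-bounds-access" come
-- from the PR text; the other two spellings are assumed).
def classification : PropertyType → String
  | .assertion => "assert"
  | .divisionByZero => "division-by-zero"
  | .arithmeticOverflow => "arithmetic-overflow"
  | .outOfBoundsAccess => "out-of-bounds-access"

-- Before the fix: no property type reached the result, so the default won.
def classifyBefore (_md : String) : String := classification .assertion

-- After the fix: the metadata flows through to the classification.
def classifyAfter (md : String) : String := classification (ofMetadata md)

end SarifSketch
```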

New tests in StrataTest/Transform/PrecondElim.lean:

- Test 10: Sequence.select in a procedure body emits the bounds assert. (PrecondElim is unconditional: it inserts the assert regardless of any surrounding requires guard, and the SMT solver discharges it.)
- Test 10c: Sequence.select inside a requires clause triggers the $$wf-procedure path (mkContractWFProc).
- Test 10d: Sequence.select inside a function body triggers the function-body $$wf path (mkFuncWFStmts).
- Test 11: collectPrecondAsserts attaches outOfBoundsAccess metadata for all four partial ops and a nested call, mirroring OverflowCheckTest.lean. It also verifies that Sequence.length emits no obligation (it is total).
- Test 12: Sequence.empty printer regression for the commit-1 fix; it renders as a generic call, not re.none().

New property-classification tests in StrataTest/Languages/Core/Tests/SarifOutputTests.lean cover all five PropertyType variants, exercising the SARIF wiring fix in commit 3.

Collateral test updates for real behavioral changes:

- StrataTest/Languages/Core/Examples/Seq.lean: expected VC output includes the new bounds obligations (all SMT-provable from the surrounding context, except the pre-existing contains_yes unknown).
- StrataTest/Languages/Core/Tests/ProgramEvalTests.lean: Sequence func signatures now render with the attached requires clauses.
- StrataTest/Languages/Core/Examples/Loops.lean: the commit-1 printer fix propagates (re.none() -> top, error message format updated).

…tions

Sequences (immutable value types):

- TSeq variant in HighType; Seq<T> grammar syntax
- [1, 2, 3] sequence literals (desugared to Sequence.build chains)
- s[i] subscript read and s[i := v] functional update
- 9 external Sequence.* operations (empty, build, select, update, length, append, contains, take, drop)
- Seq<T> translates to Core's polymorphic Sequence type

Arrays (mutable heap-backed):

- TArray variant in HighType; Array<T> grammar syntax
- a[i] read and a[i] := v write with heap semantics (aliasing)
- Seq literal to Array conversion: var a: Array<int> := [1, 2, 3]
- Synthetic $Array composite with $data: Seq<T> field
- Conditional injection: no $Array in programs that don't use arrays
- Array<T> recognized as composite in modifies clauses
- Array.length(a) desugared to Sequence.length(a.$data)

Shared infrastructure:

- Subscript AST node with type-aware SubscriptElim pass
- Grammar productions: seqType, arrayType, subscript, seqLiteral

Co-authored-by: Fabio Madge <fmadge@amazon.com>

…tics

Add a Laurel-layer validator (ValidateSubscriptUsage) that runs alongside validateDiamondFieldAccesses and flags four common misuses with Dafny-style messages that suggest the correct syntax:

1. `a[i := v]` on `Array<T>`: arrays are mutable; use `a[i] := v` or declare `a` as `Seq<T>`.
2. `s[i] := v` on `Seq<T>`: sequences are immutable; use `s[i := v]` or declare `s` as `Array<T>`.
3. `Array.length(x)` where `x` is not an `Array<T>`: reports the actual argument type.
4. `Array<T>` where `T ≠ int`: flagged with a note about the current SMT limitation.

Pipeline integration:

- runLaurelPasses now returns a `skipCore : Bool` flag (true when the validator emitted diagnostics) so translateWithLaurel can skip Core translation and VC generation when the Laurel-layer diagnostic is the actionable error. This prevents confusing Core type-checking noise from stacking on top of the validator's helpful message.
- SubscriptElim hardens a couple of edge cases so downstream passes don't stack follow-on errors when the validator has already flagged a misuse (no-op for Seq destructive update; LiteralInt 0 fallback for Array.length on a non-Array).

Negative tests and docs:

- T18_Sequences: negative case for diagnostic 2.
- T19_Arrays: negative cases for diagnostics 1, 3, and 4.
- docs/verso/LaurelDoc: new "Common mistakes" subsection with example snippets for each of the four validator diagnostics.

Adds the Sequence.fromArray(a) builtin for taking an immutable Seq<T> snapshot of an Array<T>'s current contents. SubscriptElim rewrites calls into the a#$data internal field, preserving the existing Array-layout convention. Validator gains a fifth diagnostic flagging Sequence.fromArray calls whose argument type is not an Array<T>.

Tests:

- Positive cases in T19 covering snapshot semantics (mutation to the array after extraction is not reflected in the captured sequence).
- Negative case asserting the new validator diagnostic fires on a non-Array argument.
- T19's inter-procedural setFirst now also ensures length preservation, which is required under the Core-level bounds preconditions introduced by the preceding Core commits in this branch (fixes a failure that surfaced when bounds obligations became checks rather than no-ops).

Docs:

- Rewrites the out-of-bounds semantics note from 'unconstrained' to 'verification obligation', matching the Core preconditions introduced upstream in this branch.
- New 'Array to sequence conversion' section covering Sequence.fromArray and why there is no implicit Array to Seq coercion.
- New 'Common mistakes' entry for Sequence.fromArray on a non-Array argument.

The TSeq arms of boxConstructorName/boxDestructorName/boxConstructorDef returned a fixed 'BoxSeq' / 'Box..SeqVal!' string regardless of the sequence's element type. HeapParameterization deduplicates usedBoxConstructors by name, so two composites with Seq fields of different element types produced two BoxSeq entries with incompatible argument types; only one survived dedup and Core type-checking then failed on the other. Derive a per-element-type tag and append it to the base name: Seq<int> becomes BoxSeq_int, Seq<bool> becomes BoxSeq_bool, Seq<Seq<int>> becomes BoxSeq_Seq_int, etc. Mirrors the existing per-primitive approach (BoxInt, BoxBool, ...). The new highTypeTag helper is also prepared to handle TArray / TSet / TMap / Pure / TTypedField by recursion, so the same pattern extends if box constructors are ever needed for those. T22_MixedSeqFields regression exercises two composites with Seq<int> and Seq<bool> fields and writes/reads through both. Fixes the bug previously tracked in #1101 (now closed).

MikaelMayer

🤖 Well-structured PR. The separation between ValidateSubscriptUsage (pure diagnostics) and SubscriptElim (rewriting) is clean, and the centralized SeqOp.* / arrayCompositeName constants prevent name drift between passes. The highTypeTag approach for per-element-type Box constructors is a good solution to the deduplication collision. The bounds-precondition infrastructure (mkSeqBoundsPrecond + classifyPrecondition) integrates naturally with the existing PrecondElim pattern.

A few items below.

MikaelMayer · 2026-05-05T13:52:37Z

+    | other => other
+  let program' := { program with
+    types := types'
+    staticProcedures := program.staticProcedures.map (elimProcedure model) }


🤖 subscriptElim processes types (instance procedures) and staticProcedures, but not program.constants. A constant with a subscript initializer (e.g. const x: int := someSeq[0]) would leave an un-eliminated Subscript node in the tree, triggering the "Subscript should have been eliminated" diagnostic in LaurelToCoreTranslator. Unlikely in practice today, but worth either handling or documenting as unsupported.

MikaelMayer · 2026-05-05T13:52:37Z

+/-- Eliminate `Subscript` nodes and desugar `Array.length` across a program.
+    Conditionally injects the `$Array` synthetic composite when the program
+    uses `Array<T>` anywhere. -/
+public def subscriptElim (_model : SemanticModel) (program : Program)


🤖 Nit: _model is accepted and immediately discarded. Since the pipeline's LaurelPass.run signature requires Program → SemanticModel → ..., this is unavoidable at the call site, but the docstring could note why the caller's model is stale (it predates $Array injection) to save future readers from wondering whether this is a bug.

MikaelMayer · 2026-05-05T13:52:37Z

+/-- `Sequence.length(s) : int` — length of the sequence. -/
+def SeqOp.length   := "Sequence.length"
+/-- Name of the `$data` field on the synthetic `$Array` composite. -/
+def SeqOp.dataField := "$data"


🤖 Nit: SeqOp.dataField is semantically different from the other SeqOp.* entries — it's a field name on the synthetic composite, not a Sequence.* operation name. Consider a separate namespace or at least a comment distinguishing it from the operation names above.

github-actions Bot added the Laurel, Python, and Git conflicts labels Apr 29, 2026

fabiomadge added 2 commits May 2, 2026 17:16

fabiomadge mentioned this pull request May 2, 2026

Core: Sequence bounds preconditions and VC-printer fallback fix #1100

Open

fabiomadge added 2 commits May 3, 2026 00:52

fabiomadge mentioned this pull request May 3, 2026

BoxSeq collision when composites have Seq fields of different element types #1101

Closed

fabiomadge force-pushed the feat/laurel-seq-array branch from 589d615 to 4f5dd33 May 3, 2026 00:15

github-actions Bot added the Core label and removed the Git conflicts label May 3, 2026

fabiomadge changed the title from ~~feat(laurel): Add Seq<T> and Array<T> types with usage diagnostics~~ to feat(laurel): Seq<T> and Array<T> with bounds-checked subscript and diagnostics May 3, 2026

fabiomadge force-pushed the feat/laurel-seq-array branch from 1468e37 to 6424973 May 3, 2026 00:35

fabiomadge added 4 commits May 3, 2026 03:13

fabiomadge force-pushed the feat/laurel-seq-array branch from 6424973 to 1f4b0ce May 3, 2026 01:15

github-actions Bot added the Git conflicts label May 5, 2026

MikaelMayer reviewed May 5, 2026

fabiomadge wants to merge 8 commits into main from feat/laurel-seq-array
