feat(laurel): Seq<T> and Array<T> with bounds-checked subscript and diagnostics#1073

Draft
fabiomadge wants to merge 8 commits into main from feat/laurel-seq-array


@fabiomadge fabiomadge commented Apr 29, 2026

Adds Seq<T> (immutable value sequences) and Array<T> (mutable heap-backed arrays) to Laurel, with type-aware desugaring, bounds-checked subscript, validator diagnostics for common misuses, and a Sequence.fromArray snapshot operation.

Supersedes #787, which became stale against main.

Depends on #1100 (Core PR for the Sequence well-formedness infrastructure). The first four commits of this branch are the Core PR; once it merges, this branch will be rebased onto main.

Scope

Sequences (Seq<T>, immutable):

  • Seq<T> types, [a, b, c] sequence literals, s[i] and s[i := v] subscript.
  • 9 external Sequence.* primitives: empty, build, select, update, length, append, contains, take, drop.
  • Translates to Core's polymorphic Sequence.

Arrays (Array<T>, mutable heap-backed):

  • Array<T> types, a[i] read, a[i] := v destructive write, Array.length(a) length.
  • Represented internally by a synthetic $Array composite with a $data: Seq<int> field (the int element type matches the current Array<int>-only restriction; see the validator diagnostic below). The $ prefix follows the existing convention for compiler-internal names.
  • Conditional injection — programs that don't use Array<T> don't get the synthetic composite.
  • Participates in modifies clauses.
  • Sequence.fromArray(a) takes an immutable snapshot of an Array<T>'s current contents. Snapshot is independent of subsequent mutations.
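
The snapshot behaviour described above can be illustrated with a small standalone Lean model. This is only a sketch: `ArrayObj`, `ArrayObj.write`, and `fromArray` are hypothetical stand-ins for the synthetic `$Array` composite and its operations, not the definitions in this PR, and mutation is modelled as producing a new object value rather than updating a real heap.

```lean
/-- Hypothetical stand-in for the synthetic `$Array` composite: a single data field. -/
structure ArrayObj where
  data : List Int

/-- Models the destructive write `a[i] := v` as replacing the object's data field. -/
def ArrayObj.write (a : ArrayObj) (i : Nat) (v : Int) : ArrayObj :=
  { a with data := a.data.set i v }

/-- Models `Sequence.fromArray(a)`: returns the current contents as a plain value. -/
def fromArray (a : ArrayObj) : List Int := a.data

/-- Models `Array.length(a)` as the length of the data field. -/
def ArrayObj.length (a : ArrayObj) : Nat := a.data.length

/-- Take a snapshot, then mutate; return the snapshot and the new contents. -/
def snapshotDemo : List Int × List Int :=
  let a0 : ArrayObj := { data := [1, 2, 3] }
  let snap := fromArray a0      -- snapshot taken before the write
  let a1 := a0.write 0 9        -- later mutation of the array
  (snap, a1.data)

#eval snapshotDemo  -- ([1, 2, 3], [9, 2, 3])
```

Because the snapshot is an ordinary sequence value, the later write cannot reach it, which is the property the positive tests in T19 are described as checking.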

Bounds checking

Handled by Core preconditions from #1100:

  • Sequence.select, Sequence.update: 0 <= i < length(s)
  • Sequence.take, Sequence.drop: 0 <= n <= length(s)
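
As a rough illustration of what these preconditions mean, the sketch below states them as proof arguments in plain Lean. The names `seqSelect`, `seqUpdate`, `seqTake`, and `seqDrop` are hypothetical and are not the Core definitions from #1100. Indices are `Nat` here, so the `0 <=` half of each bound holds by construction and only the upper bound is spelled out.

```lean
/-- Models `Sequence.select` with the bound `i < length(s)` as a proof argument. -/
def seqSelect (s : List Int) (i : Nat) (h : i < s.length) : Int :=
  s[i]'h

/-- Models `Sequence.update` under the same strict bound. -/
def seqUpdate (s : List Int) (i : Nat) (v : Int) (_h : i < s.length) : List Int :=
  s.set i v

/-- Models `Sequence.take`; note the non-strict bound `n ≤ length(s)`. -/
def seqTake (s : List Int) (n : Nat) (_h : n ≤ s.length) : List Int :=
  s.take n

/-- Models `Sequence.drop` with the same non-strict bound. -/
def seqDrop (s : List Int) (n : Nat) (_h : n ≤ s.length) : List Int :=
  s.drop n

#eval seqSelect [10, 20, 30] 1 (by decide)  -- 20
#eval seqTake [10, 20, 30] 3 (by decide)    -- [10, 20, 30]
-- `seqSelect [10, 20, 30] 3 (by decide)` is rejected: `3 < 3` cannot be proved.
```

The difference between the strict and non-strict bounds is visible in the last two calls: taking all three elements is allowed, while selecting index 3 is not.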

Core's PrecondElim pass generates VC obligations at every call site — both in imperative code (via inserted asserts) and in pure positions like requires, ensures, quantifier bodies, and function bodies (via synthetic $$wf procedures). Errors are classified as outOfBoundsAccess for SARIF reporting, matching how division by zero is handled.
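
The idea of generating one obligation per call site can be sketched on a toy expression type. This is an assumption-laden illustration, not the PrecondElim implementation: `SeqExpr`, `Obligation`, and `collectObligations` are hypothetical names, and the real pass works on Core's AST and emits asserts or `$$wf` procedures rather than a list.

```lean
/-- A tiny expression language with two Sequence operations (hypothetical). -/
inductive SeqExpr where
  | var (name : String)
  | lit (n : Int)
  | length (s : SeqExpr)
  | select (s i : SeqExpr)
  deriving Repr

/-- A bounds obligation standing for `0 <= index < length(target)`. -/
structure Obligation where
  target : SeqExpr
  index : SeqExpr
  deriving Repr

/-- Collect one obligation per `select`, inner calls first.
`length` is total, so it contributes nothing of its own. -/
def collectObligations : SeqExpr → List Obligation
  | .var _ => []
  | .lit _ => []
  | .length s => collectObligations s
  | .select s i =>
      collectObligations s ++ collectObligations i ++ [{ target := s, index := i }]

/-- `s[t[0]]`: two partial calls, so two obligations are expected. -/
def nestedSelect : SeqExpr :=
  .select (.var "s") (.select (.var "t") (.lit 0))

#eval (collectObligations nestedSelect).length  -- 2
```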

Validator diagnostics

ValidateSubscriptUsage flags five syntactic misuses before verification runs:

  1. a[i := v] on Array<T> — functional update not supported on mutable arrays.
  2. s[i] := v on Seq<T> — destructive update not allowed on immutable sequences.
  3. Array.length(x) where x is not an Array<T>.
  4. Array<T> where T ≠ int (current SMT limitation).
  5. Sequence.fromArray(x) where x is not an Array<T>.
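
The five checks can be read as a single classification function over the construct and the type it is applied to. The sketch below is a standalone model under that reading: `Ty`, `Use`, `diagnose`, and `skipCore` are hypothetical names, and the message strings are placeholders rather than the validator's actual wording.

```lean
/-- Simplified types (hypothetical; the real `HighType` has more variants). -/
inductive Ty where
  | int
  | bool
  | seq (elem : Ty)
  | array (elem : Ty)

/-- The constructs the validator looks at, paired with the relevant type. -/
inductive Use where
  | functionalUpdate (target : Ty)   -- x[i := v]
  | destructiveUpdate (target : Ty)  -- x[i] := v
  | arrayLength (arg : Ty)           -- Array.length(x)
  | arrayType (elem : Ty)            -- a mention of Array<T>
  | fromArray (arg : Ty)             -- Sequence.fromArray(x)

/-- Returns a placeholder message for each of the five misuses, `none` otherwise. -/
def diagnose : Use → Option String
  | .functionalUpdate (.array _) => some "functional update on Array<T>"
  | .destructiveUpdate (.seq _) => some "destructive update on Seq<T>"
  | .arrayLength (.array _) => none
  | .arrayLength _ => some "Array.length on a non-Array argument"
  | .arrayType .int => none
  | .arrayType _ => some "Array<T> with T other than int"
  | .fromArray (.array _) => none
  | .fromArray _ => some "Sequence.fromArray on a non-Array argument"
  | _ => none

/-- Core translation is skipped as soon as any diagnostic fires. -/
def skipCore (uses : List Use) : Bool :=
  uses.any fun u => (diagnose u).isSome

#eval skipCore [.destructiveUpdate (.seq .int)]                             -- true
#eval skipCore [.functionalUpdate (.seq .int), .arrayLength (.array .int)]  -- false
```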

When the validator fires, Core translation is skipped to prevent follow-on type-checker noise from obscuring the helpful message.

BoxSeq per-element-type constructor names

Initial versions of this PR shared a single BoxSeq constructor across all Seq<T> field types. This collided in HeapParameterization when a program had a composite with Seq fields of different element types: deduplication by constructor name kept one, and Core type-checking then failed on the other. Fixed by deriving a per-element-type tag (BoxSeq_int, BoxSeq_bool, …) matching the existing per-primitive BoxInt/BoxBool/… approach. T22_MixedSeqFields is the regression test for this fix.
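
A minimal sketch of the naming scheme, assuming a reduced type with only `int`, `bool`, and `Seq`: `ElemType`, `elemTag`, and `boxSeqName` are hypothetical stand-ins for the PR's `HighType`, `highTypeTag`, and `boxConstructorName`, which cover more cases.

```lean
/-- Reduced element types (hypothetical; the real `HighType` has more variants). -/
inductive ElemType where
  | int
  | bool
  | seq (elem : ElemType)

/-- Derive a tag from the element type by recursion on its structure. -/
def elemTag : ElemType → String
  | .int => "int"
  | .bool => "bool"
  | .seq e => "Seq_" ++ elemTag e

/-- Box constructor name for a `Seq` field with the given element type. -/
def boxSeqName (elem : ElemType) : String :=
  "BoxSeq_" ++ elemTag elem

#eval boxSeqName .int         -- "BoxSeq_int"
#eval boxSeqName .bool        -- "BoxSeq_bool"
#eval boxSeqName (.seq .int)  -- "BoxSeq_Seq_int"
```

Since distinct element types now yield distinct names, deduplication by name no longer merges constructors with incompatible argument types.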

Out of scope

  • Array<T> for T ≠ int: rejected by diagnostic 4. Lifting this would require per-element-type $Array injection (or similar) at the Laurel layer.
  • Parseable Sequence.empty<T>() syntax in raw Core source — separate grammar-level design.

Tests

StrataTest/Languages/Laurel/Examples/Fundamentals/T18_Sequences.lean, T19_Arrays.lean, and T22_MixedSeqFields.lean cover:

  • Positive: literal construction, subscript read/update, all Sequence.* operations, contracts with requires/ensures/opaque, quantifiers, nested sequences, aliasing, loops, inter-procedural modifies, Sequence.fromArray snapshot semantics, one composite carrying two Seq fields of different element types.
  • Negative: one test per validator diagnostic, pinned on a substring of each error message.

Core-side tests for the bounds preconditions live in #1100.

Docs

docs/verso/LaurelDoc.lean gains a # Sequences and Arrays section covering literals, subscripts, operations, Array.length, Sequence.fromArray with snapshot semantics, the verification-obligation treatment of OOB, a "Common mistakes" list tied to the five validator diagnostics, and internal representation.

fabiomadge added 2 commits May 2, 2026 17:16
handleZeroaryOps fell back to logging an error and returning re.none() for
any 0-ary op outside the regex set. That silently substituted a regex
primitive for unrelated ops in VC printer output; users saw re.none()
where e.g. Sequence.empty() was intended.

Switch the fallback to mkGenericCall, matching how handleUnaryOps and
handleBinaryOps already handle unknown ops. The printer now emits the
op name as a free-variable reference, preserving the intent.

Parseable Sequence.empty<T>() syntax is still a separate grammar-level
feature; this commit only fixes the printer-side noise.

polyUneval is the combinator used to declare unevaluated polymorphic
functions with axioms. Unlike unaryOp and binaryOp, it had no way to
attach preconditions; callers had to hand-build the WFLFunc.

Add a 'preconditions' parameter and the matching free-vars proof
obligation (subset of the function's input names), defaulting to empty.
No behavioral change for existing callers.
fabiomadge added 2 commits May 3, 2026 00:52
Sequence.select and Sequence.update now require `0 <= i < length(s)`;
Sequence.take and Sequence.drop require `0 <= n <= length(s)`. PrecondElim
picks these up and generates VC obligations at call sites, both in
statement positions (via transformStmt) and in pure positions (via
mkContractWFProc / mkFuncWFProc) — so requires/ensures/quantifier-body
subscripts are also covered.

Obligations carry the propertyType metadata "outOfBoundsAccess" (new
MetaData constant) and flow through a new PropertyType.outOfBoundsAccess
enum variant — with matching entries in the statement-eval /
obligation-extraction / cmd-eval metadata-to-PropertyType conversion
sites — to finally render as "out-of-bounds-access" in SARIF output,
matching how divisionByZero and arithmeticOverflow are classified.

Side effect: `propertyTypeToClassification` in SarifOutput.lean was
previously dead code; `vcResultToSarifResult` never set
`properties.propertyType` so the SARIF output defaulted every obligation
to "assert". Wiring this up means divisionByZero and arithmeticOverflow
obligations now also classify correctly in SARIF — a pre-existing bug
this PR incidentally fixes.

New tests in StrataTest/Transform/PrecondElim.lean:
- Test 10:  Sequence.select in a procedure body emits the bounds assert
            (PrecondElim is unconditional — it inserts regardless of any
            surrounding requires guard; the SMT solver discharges).
- Test 10c: Sequence.select inside a requires clause triggers the
            $$wf-procedure path (mkContractWFProc).
- Test 10d: Sequence.select inside a function body triggers the
            function-body $$wf path (mkFuncWFStmts).
- Test 11:  collectPrecondAsserts attaches outOfBoundsAccess metadata
            for all four partial ops and a nested call. Mirrors
            OverflowCheckTest.lean. Also verifies Sequence.length emits
            no obligation (it is total).
- Test 12:  Sequence.empty printer regression for the commit-1 fix —
            renders as a generic call, not re.none().

New property-classification tests in
StrataTest/Languages/Core/Tests/SarifOutputTests.lean cover all five
PropertyType variants, exercising the SARIF wiring fix in commit 3.

Collateral test updates for real behavioral changes:
- StrataTest/Languages/Core/Examples/Seq.lean: expected VC output
  includes the new bounds obligations (all SMT-provable from the
  surrounding context, except the pre-existing contains_yes unknown).
- StrataTest/Languages/Core/Tests/ProgramEvalTests.lean: Sequence func
  signatures now render with the attached requires clauses.
- StrataTest/Languages/Core/Examples/Loops.lean: commit-1 printer fix
  propagates (re.none() -> top, error message format updated).
@fabiomadge fabiomadge force-pushed the feat/laurel-seq-array branch from 589d615 to 4f5dd33 Compare May 3, 2026 00:15
@fabiomadge fabiomadge changed the title from "feat(laurel): Add Seq<T> and Array<T> types with usage diagnostics" to "feat(laurel): Seq<T> and Array<T> with bounds-checked subscript and diagnostics" May 3, 2026
@fabiomadge fabiomadge force-pushed the feat/laurel-seq-array branch from 1468e37 to 6424973 Compare May 3, 2026 00:35
fabiomadge added 4 commits May 3, 2026 03:13
…tions

Sequences (immutable value types):
- TSeq variant in HighType; Seq<T> grammar syntax
- [1, 2, 3] sequence literals (desugared to Sequence.build chains)
- s[i] subscript read and s[i := v] functional update
- 9 external Sequence.* operations (empty, build, select, update, length,
  append, contains, take, drop)
- Seq<T> translates to Core's polymorphic Sequence type
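
The literal desugaring mentioned above can be pictured as a left fold over the elements. This sketch assumes that `Sequence.build` takes the sequence first and the new element second and that the chain nests to the left; `SeqTerm`, `desugarLiteral`, and `SeqTerm.toList` are hypothetical names used only for illustration.

```lean
/-- Terms built from `Sequence.empty` and `Sequence.build` (hypothetical model). -/
inductive SeqTerm where
  | empty
  | build (s : SeqTerm) (x : Int)
  deriving Repr

/-- Desugar `[x1, x2, ...]` into `build(build(empty, x1), x2) ...`. -/
def desugarLiteral (xs : List Int) : SeqTerm :=
  xs.foldl SeqTerm.build SeqTerm.empty

/-- Evaluate a build chain back to a list, to check that element order is kept. -/
def SeqTerm.toList : SeqTerm → List Int
  | .empty => []
  | .build s x => s.toList ++ [x]

#eval (desugarLiteral [1, 2, 3]).toList  -- [1, 2, 3]
```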

Arrays (mutable heap-backed):
- TArray variant in HighType; Array<T> grammar syntax
- a[i] read and a[i] := v write with heap semantics (aliasing)
- Seq literal to Array conversion: var a: Array<int> := [1, 2, 3]
- Synthetic $Array composite with $data: Seq<T> field
- Conditional injection — no $Array in programs that don't use arrays
- Array<T> recognized as composite in modifies clauses
- Array.length(a) desugared to Sequence.length(a.$data)

Shared infrastructure:
- Subscript AST node with type-aware SubscriptElim pass
- Grammar productions: seqType, arrayType, subscript, seqLiteral

Co-authored-by: Fabio Madge <fmadge@amazon.com>
…tics

Add a Laurel-layer validator (ValidateSubscriptUsage) that runs alongside
validateDiamondFieldAccesses and flags four common misuses with Dafny-style
messages that suggest the correct syntax:

1. `a[i := v]` on `Array<T>` — arrays are mutable; use `a[i] := v`
   or declare `a` as `Seq<T>`.
2. `s[i] := v` on `Seq<T>` — sequences are immutable; use `s[i := v]`
   or declare `s` as `Array<T>`.
3. `Array.length(x)` where `x` is not an `Array<T>` — reports the
   actual argument type.
4. `Array<T>` where `T ≠ int` — flagged with a note about the current
   SMT limitation.

Pipeline integration:
- runLaurelPasses now returns a `skipCore : Bool` flag (true when the
  validator emitted diagnostics) so translateWithLaurel can skip Core
  translation and VC generation when the Laurel-layer diagnostic is the
  actionable error. This prevents confusing Core type-checking noise from
  stacking on top of the validator's helpful message.
- SubscriptElim hardens a couple of edge cases so downstream passes don't
  stack follow-on errors when the validator has already flagged a misuse
  (no-op for Seq destructive update; LiteralInt 0 fallback for
  Array.length on a non-Array).

Negative tests and docs:
- T18_Sequences: negative case for diagnostic 2.
- T19_Arrays: negative cases for diagnostics 1, 3, and 4.
- docs/verso/LaurelDoc: new "Common mistakes" subsection with example
  snippets for each of the four validator diagnostics.

Adds the Sequence.fromArray(a) builtin for taking an immutable Seq<T>
snapshot of an Array<T>'s current contents. SubscriptElim rewrites calls
into the a#$data internal field, preserving the existing Array-layout
convention. Validator gains a fifth diagnostic flagging
Sequence.fromArray calls whose argument type is not an Array<T>.

Tests:
- Positive cases in T19 covering snapshot semantics (mutation to the
  array after extraction is not reflected in the captured sequence).
- Negative case asserting the new validator diagnostic fires on a
  non-Array argument.
- T19's inter-procedural setFirst now also ensures length-preservation,
  which is required under the Core-level bounds preconditions introduced
  by the preceding Core commits in this branch (fixes a failure that
  surfaced when bounds obligations became checks rather than no-ops).

Docs:
- Rewrites the out-of-bounds semantics note from 'unconstrained' to
  'verification obligation', matching the Core preconditions introduced
  upstream in this branch.
- New 'Array to sequence conversion' section covering Sequence.fromArray
  and why there is no implicit Array to Seq coercion.
- New 'Common mistakes' entry for Sequence.fromArray on a non-Array
  argument.

The TSeq arms of boxConstructorName/boxDestructorName/boxConstructorDef
returned a fixed 'BoxSeq' / 'Box..SeqVal!' string regardless of the
sequence's element type. HeapParameterization deduplicates
usedBoxConstructors by name, so two composites with Seq fields of
different element types produced two BoxSeq entries with incompatible
argument types; only one survived dedup and Core type-checking then
failed on the other.

Derive a per-element-type tag and append it to the base name: Seq<int>
becomes BoxSeq_int, Seq<bool> becomes BoxSeq_bool, Seq<Seq<int>>
becomes BoxSeq_Seq_int, etc. Mirrors the existing per-primitive approach
(BoxInt, BoxBool, ...). The new highTypeTag helper is also prepared to
handle TArray / TSet / TMap / Pure / TTypedField by recursion, so the
same pattern extends if box constructors are ever needed for those.

T22_MixedSeqFields regression exercises two composites with Seq<int>
and Seq<bool> fields and writes/reads through both.

Fixes the bug previously tracked in #1101 (now closed).

@MikaelMayer MikaelMayer left a comment

🤖 Well-structured PR. The separation between ValidateSubscriptUsage (pure diagnostics) and SubscriptElim (rewriting) is clean, and the centralized SeqOp.* / arrayCompositeName constants prevent name drift between passes. The highTypeTag approach for per-element-type Box constructors is a good solution to the deduplication collision. The bounds-precondition infrastructure (mkSeqBoundsPrecond + classifyPrecondition) integrates naturally with the existing PrecondElim pattern.

A few items below.

| other => other
let program' := { program with
types := types'
staticProcedures := program.staticProcedures.map (elimProcedure model) }

🤖 subscriptElim processes types (instance procedures) and staticProcedures, but not program.constants. A constant with a subscript initializer (e.g. const x: int := someSeq[0]) would leave an un-eliminated Subscript node in the tree, triggering the "Subscript should have been eliminated" diagnostic in LaurelToCoreTranslator. Unlikely in practice today, but worth either handling or documenting as unsupported.

/-- Eliminate `Subscript` nodes and desugar `Array.length` across a program.
Conditionally injects the `$Array` synthetic composite when the program
uses `Array<T>` anywhere. -/
public def subscriptElim (_model : SemanticModel) (program : Program)

🤖 Nit: _model is accepted and immediately discarded. Since the pipeline's LaurelPass.run signature requires Program → SemanticModel → ..., this is unavoidable at the call site, but the docstring could note why the caller's model is stale (it predates $Array injection) to save future readers from wondering whether this is a bug.

/-- `Sequence.length(s) : int` — length of the sequence. -/
def SeqOp.length := "Sequence.length"
/-- Name of the `$data` field on the synthetic `$Array` composite. -/
def SeqOp.dataField := "$data"

🤖 Nit: SeqOp.dataField is semantically different from the other SeqOp.* entries — it's a field name on the synthetic composite, not a Sequence.* operation name. Consider a separate namespace or at least a comment distinguishing it from the operation names above.
