Skip to content

Restore CDDL array and map serializers for #244#263

Merged
tahina-pro merged 77 commits intoproject-everest:_nik_smt_univs_2025from
tahina-pro:_taramana_cddl_serialize_gen
Mar 16, 2026
Merged

Restore CDDL array and map serializers for #244#263
tahina-pro merged 77 commits intoproject-everest:_nik_smt_univs_2025from
tahina-pro:_taramana_cddl_serialize_gen

Conversation

@tahina-pro
Copy link
Copy Markdown
Member

This PR solves the admit()s in CDDL.Pulse.Serialize.ArrayGroup and CDDL.Pulse.Serialize.MapGroup introduced by #244 , by introducing a more generic version in preparation for extending CDDL serialization to generic CBOR parsers (as proposed by #251, but the actual support for non-deterministic CBOR has not been made yet.)

Thus, this PR also re-enables "basic" CDDL unit tests.

For array groups: I wrote the Gen.ArrayGroup.Base combinator specifications by hand, but combinators and their proofs have been written by AI.

For map groups: I wrote the Gen.MapGroup.Base combinator specifications, the executable Pulse code for ZeroOrMore, and invariant0 and invariant1 by hand, but invariant2 and the proofs have entirely been written by AI, as well as most other map group combinators.

tahina-pro and others added 30 commits February 28, 2026 17:28
    - Change len from nat to int in valid/invalid definitions
    - Add size_before: SZ.t parameter to impl_serialize_array_group_post
    - Replace existential size in precondition with pts_to out_size size_before

    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

    .fsti excerpt from 76ee67a
Replace the three admit() calls at the zero_or_more_iterator,
one_or_more_slice, and one_or_more_iterator functions with verified
implementations.

Key changes:
- impl_serialize_array_group_zero_or_more_iterator: Full loop
  implementation with iterator operations, guarded invariant for
  v == l1 ++ l2 (since iterator advances in false branch), and
  split exit handling via exit_false helper.
- impl_serialize_array_group_one_or_more_slice: Check if empty then
  return false, else delegate to zero_or_more_slice with conversion
  lemmas.
- impl_serialize_array_group_one_or_more_iterator: Check if empty then
  return false, else delegate to zero_or_more_iterator with conversion
  lemmas.
- Added helper lemmas: one_or_more_empty, one_or_more_nonempty, and
  post_zero_or_more_exit_false.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add helper lemmas (impl_serialize_elim_lemma and impl_serialize_intro_lemma)
that bridge Base.impl_serialize_post and Gen.impl_serialize_post using
Cbor.cbor_det_serialize_parse, with SMT patterns for automatic application.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- MapGroup.Base: increase z3rlimit for cbor_map_fold_max_length_accu induction
- Raw.Format: add round-trip assertion and increase rlimit for parse_cbor_map_equiv
- Nondet.Gen: add assertions for argument_as_uint64 refinement precondition

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Provide the .fst implementation for the ArrayGroup serialization interface.
- Add bridge lemmas connecting non-Gen and Gen pre/post conditions
- Implement intro/elim as Pulse fn wrappers with erasable arguments
- Implement all combinators by composing intro/elim with Gen combinators
- Fix intro/elim signatures to use [@@@Erasable] on ghost arguments

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
pointed out by the AI on ArrayGroup
Pulse implementation of the MapGroup serialization interface, bridging
between deterministic (local) and generic (Gen) pre/post conditions.

- Bridge lemmas for pre, valid, and post condition conversion
- impl_serialize_map_group_intro/elim wrappers
- cbor_det_serialize_map_insert_elim and cbor_det_parse_elim bridges
- All combinators: map, ext, nop, choice, zero_or_one, either_left,
  concat, match_item_for, zero_or_more

Zero admits; fully verified.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implements the Gen (generic) version of map group serializer
combinators. Most combinators verified; iterator loop has all
branches implemented with helper lemmas for valid_false proofs
in failure branches.

Remaining admits:
- impl_serialize_map_group_nop (fsti design issue)
- impl_serialize_match_item_for_post_false (abstract maxl)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ite-specific lemmas

Split the single admitted lemma impl_serialize_match_item_for_post_false into
4 lemmas with preconditions specific to each call site in
impl_serialize_match_item_for, and fully prove all of them:

1. impl_serialize_match_item_for_post_false_insert_failed:
   Insert failed (cbor_map_defined key l => disjoint is false)

2. impl_serialize_match_item_for_post_false_value_ser_failed:
   Value serialization returned 0 (cbor_map_max_length reasoning)

3. impl_serialize_match_item_for_post_false_key_ser_failed:
   Key serialization returned 0 (cbor_map_max_length reasoning)

4. impl_serialize_match_item_for_post_false_count_overflow:
   Count overflow (fits is false when count >= pow2 64 - 1)

Also add S.pts_to_len out before the first S.split to establish
buffer length facts needed by the call sites, and increase
z3rlimit from 128 to 512.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Added valid_false_sz1_gen and valid_false_sz2_gen helper lemmas
  (currently admit) to prove valid==false when key/value
  serialization fails due to buffer overflow
- Added contradiction proofs for unreachable parse-None branches
- Added explicit snoc_gen call in insert-failed branch
- Removed all --admit_smt_queries true wrappers
- Fixed push/pop-options balance
- Fixed minl parameter passing in helper call sites

Remaining admits (4):
- impl_serialize_map_group_nop (fsti design issue)
- impl_serialize_match_item_for_post_false (abstract maxl)
- valid_false_sz1_gen (needs max_length reasoning)
- valid_false_sz2_gen (needs max_length reasoning)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
tahina-pro and others added 29 commits March 14, 2026 06:21
- Created Lemma5.fsti/fst: proves invariant_min preservation across iterator step
- Created Lemma6.fsti/fst: proves invariant_max preservation across iterator step
- Both use SMTPat on full conclusion to fire automatically from Pulse
- Added rel_len call in Aux2.fst to provide map_of_list_maps_to_nonempty v'
- Replaced assume pure with assert pure for both invariant_min and invariant_max
- Proof technique: top-level aux lemma + Classical.forall_intro_2/move_requires
  to avoid context pruning issues inside Classical.exists_elim callbacks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ions

Replace each of the 6 assume pure False statements in the loop body
with assume pure (impl_serialize_map_zero_or_more_iterator_gen_invariant ...)
using the correct current state values at each point.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Create 8 new lemma interface files (Lemma7-Lemma14.fsti) and replace
each assume pure statement with a call to the corresponding lemma
followed by assert pure:

- Lemma7: invariant_init (pre-loop initialization)
- Lemma8: invariant_count_overflow (count overflow error)
- Lemma9: invariant_key_ser_fail (key serialization failure)
- Lemma10: invariant_value_ser_fail (value serialization failure)
- Lemma11: invariant_excluded (excluded entry)
- Lemma12: invariant_insert_success (successful insert)
- Lemma13: invariant_insert_dup (duplicate key)
- Lemma14: invariant_to_post (loop invariant to postcondition)

No .fst implementations since invariant is still an assume val.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add specific preconditions derived from function pre, loop invariant,
function call postconditions, and branch conditions:
- Lemma7: impl_serialize_map_group_pre
- Lemma8: count overflow (U64.v count == pow2 64 - 1)
- Lemma9/10/11/13: Seq.length/size facts via S.pts_to_len
- Lemma14: loop exit condition (res == false \/ em == true)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Trace pure conditions at each lemma call site from while condition,
branch conditions, and function call postconditions:

- Lemma8: add em == false (while condition)
- Lemma9-11,13: add em == false, U64.v count <> pow2 64 - 1 (branch),
  map_of_list_maps_to_nonempty v (rel_len postcondition)
- Lemma12: add map_of_list_maps_to_nonempty v (rel_len postcondition)
- Lemma14: add Seq.length vout == SZ.v (S.len out) (S.pts_to_len)
- Aux2.fst: add S.pts_to_len out before invariant_to_post call

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add existential preconditions to Lemma9-14 that capture the unique
pure facts from each code path in the loop body:

- Lemma9 (key ser fail): key from iterator not serializable or doesn't fit
- Lemma10 (value ser fail): key succeeded, value not serializable/doesn't fit
- Lemma11 (excluded): both serializable, entry excluded by map constraint
- Lemma12 (insert success): both serializable, entry not excluded
- Lemma13 (insert dup): both serializable, not excluded, duplicate key in map
- Lemma14 (post-loop): initial map precondition from function entry

Each error branch now has distinct preconditions reflecting its specific
code path through the iterator next, serialize, parse, and insert operations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add the loop invariant (with res == true) as an existentially
quantified precondition to Lemma9-13, which are called from within
the loop body. The old state values are existentially quantified
since the current state at each call site differs from the loop
entry state after next/serialize/parse operations.

Lemma8 already had the invariant directly (state unchanged at that
point). Lemma14 (post-loop) is unchanged as it's outside the loop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Combine the two separate existential blocks in Lemma9-13 into a
single existential that tracks the relationships between loop-entry
state and current state at each call site:

- v_old_iter = map_of_list_cons key_eq gk gv v (from next)
- min = update_min(min_old, gk, gv)
- max = update_max(max_old, gk, gv)

For error paths (Lemma9-11, 13): em, count, m, size are unchanged
from loop entry, so the invariant uses them directly. Only vout_old,
min_old, max_old remain existentially quantified.

For Lemma12 (insert success): all old state values are existential
since count, size, m, em are all modified by the success path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
For Lemma9-13, promote vout_old, gk, gv, min_old, max_old from
existentially quantified precondition variables to explicit
Ghost.erased parameters. Only keq remains existential (adding more
args hits a Pulse elaboration limit). Lemma14 keeps its 3 explicit
args (count_init, size_init, w_init) with no remaining existentials.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Plain types work since Pulse lemma calls are in the ghost effect,
so ghost values from with-bindings can be passed directly without
Ghost.erased wrapping. This also removes all Ghost.reveal calls
from the requires clauses of Lemma9-14.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Lemma10: promote sz1 (nat) as 6th explicit argument.
Lemma12: promote em_old, size_old, count_old, m_old as 4 more
explicit arguments (9 total new args).

The only remaining existential across all lemmas is keq
(EqTest.eq_test tkey), which has a function type that triggers
Pulse elaboration Error 19 when passed as an explicit argument.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Split the while loop body from Aux2.fst into a new
Aux2.LoopBody.fsti/.fst module to reduce memory consumption
during verification. The loop invariant existentials are lifted
as explicit Ghost.erased arguments to the loop body function.

Aux2.fst now contains only the loop setup, the while loop with
a single call to the loop body, and the post-loop teardown.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move the insert_success (24.9 GB) and insert_dup (21.8 GB) code paths
from LoopBody.fst into a separate InsertBranch.fsti/.fst module.
LoopBody.fst now calls InsertBranch at the else-branch of the exception
check. The InsertBranch body currently uses admit(); the precondition
uses pure True. The interface and call site are verified.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Define insert_branch_pure_pre named predicate in .fsti to avoid
  Pulse val/fn pure clause matching issues with Ghost.erased
  function-typed params
- Restore full body code: cbor_serialize_map_insert_pre_intro,
  insert, invariant_insert_success/dup branches
- All verification passes with make cddl

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace admit() in invariant2 with actual definition using
  invariant1 plus cbor_map_union equality (res==true) and
  impl_serialize_map_group_valid negation (res==false)
- Add (l: cbor_map) as last parameter to invariant2, invariant,
  and invariant_reveal in Invariant.fsti/fst
- Propagate l parameter to all 14 files that call invariant:
  Proof0, Lemma7-14, Aux2, LoopBody, InsertBranch (.fsti/.fst)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Proof0.fst: trivial proof via invariant_reveal
- Lemma7.fst: init proof via invariant_reveal (z3rlimit 64)
- Lemma14.fst: post proof (invariant -> postcondition)
  - Helper lemma serializer_empty_is_empty: proves mg_serializer
    of Map.empty is cbor_map_empty using Set.fold_eq
  - Uses cbor_parse_map_prefix_full for prefix property
  - Uses cbor_map_length_disjoint_union for fits property
- No admit() or assume left in any proof file

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add disjointness hypothesis to invariant2's backward direction:
keys remaining in v must not have their serialized form in m.
This allows Lemma12 (insert success) to derive ~ Map.defined gk v
from the fact that sp1.serializer gk was just inserted into m,
without requiring it as an explicit precondition.

New lemma proofs:
- Lemma8: count overflow (map_nonempty_length helper)
- Lemma9: key serialization failure (max_length chain)
- Lemma10: value serialization failure (enriched .fsti)
- Lemma11: excluded entry (serializable decomposition)
- Lemma12: insert success (derives ~ Map.defined from invariant)
- Lemma13: insert duplicate (disjointness contradiction)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…pro/quackyducky into _taramana_cddl_serialize_gen
Move the parse+exclusion/insert branch (the heaviest code path at ~14GB)
into a separate ParseBranch module. LoopBody now calls ParseBranch after
joining slices back, reducing its peak memory from ~14GB to ~8.7GB.

ParseBranch.fst verifies at ~978MB independently.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tahina-pro tahina-pro merged commit 241926a into project-everest:_nik_smt_univs_2025 Mar 16, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant