ANFEncoder: iterate to fixpoint to eliminate nested duplicates by tautschnig · Pull Request #1135 · strata-org/Strata

tautschnig · 2026-05-07T08:15:12Z

The ANF encoder extracts duplicated subexpressions into fresh variables, but the existing single-pass implementation can leave large duplicated sub-subexpressions behind.

Root cause: `removeSubsumed` drops candidate duplicates that are subexpressions of other (larger) candidate duplicates, to avoid creating redundant variable declarations. But this means that if only the outer expression appears at the top level, the inner duplicates are hidden inside the lifted var declaration and are never extracted.
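A minimal Lean sketch of the subsumption filter on a toy expression type shows the root cause. `Expr`, `Expr.subterms`, and this `removeSubsumed` are hypothetical stand-ins, not Strata's definitions. The only assumption is that a candidate is discarded whenever it occurs strictly inside another candidate.

```lean
-- Toy expression language (hypothetical; Strata's encoder works on Core expressions).
inductive Expr where
  | var : String → Expr
  | lit : Nat → Expr
  | add : Expr → Expr → Expr
  | mul : Expr → Expr → Expr
  deriving BEq, Repr

/-- Every subterm occurrence, including the expression itself. -/
def Expr.subterms : Expr → List Expr
  | .add a b => Expr.add a b :: (a.subterms ++ b.subterms)
  | .mul a b => Expr.mul a b :: (a.subterms ++ b.subterms)
  | e => [e]

/-- Hypothetical `removeSubsumed`: drop any candidate that is a strict subterm
of another candidate. -/
def removeSubsumed (cands : List Expr) : List Expr :=
  cands.filter (fun c => !(cands.any (fun d => d != c && d.subterms.contains c)))

def xPlus1 : Expr := .add (.var "x") (.lit 1)   -- x + 1
def outer : Expr := .mul xPlus1 (.lit 2)         -- (x + 1) * 2

-- Suppose both are duplicated. Only `outer` survives, so `x + 1` is not lifted
-- on this pass; it stays inside the initializer of the variable for `outer`.
#eval removeSubsumed [outer, xPlus1] == [outer]   -- true
```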

Example (from PyAnalyzeLaurel benchmark check_storage_costs):

Original: `assert Any_to_bool(Any_get(response, "Datapoints")) ...`

After partial evaluation, `Any_to_bool` inlines its 7-branch body, each branch referencing the argument. With `Any_get` also inlined as an `ite (is-DictStrAny response) (DictStrAny_get ...) (List_get ...)`, the `Any_get` expression ends up duplicated 62 times inside a single assert.

Old ANF output:
var $__anf.0 : bool := <9KB body with 62 duplicates of Any_get>;
assert $__anf.0

New ANF output (after iteration):
var $__anf.3 : Any := Any_get(response, "Datapoints");
var $__anf.0 : bool := <body using $__anf.3>;
assert $__anf.0

Effect: VC file size on one benchmark drops from 32KB to 11KB (~65% reduction), and another benchmark now completes verification in 36s where it previously hit a 60s timeout.
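The fixpoint iteration can be sketched end to end in Lean on a toy language. Everything below is hypothetical (`Expr`, `Stmt`, `anfPass`, `anfLoop`, the `anf.N` names, and list-based duplicate counting in place of a hash-based search). It assumes only the behaviour this PR describes: lift the duplicates that survive subsumption, rewrite every occurrence, repeat until a pass finds no targets, and bound the loop by fuel equal to the initial body's total expression size.

```lean
-- Toy model; every name here is hypothetical, not Strata's API.
inductive Expr where
  | var : String → Expr
  | lit : Nat → Expr
  | add : Expr → Expr → Expr
  | mul : Expr → Expr → Expr
  deriving BEq, Repr

inductive Stmt where
  | decl : String → Expr → Stmt   -- var x := e
  | assert : Expr → Stmt
  deriving Repr

def Expr.subterms : Expr → List Expr
  | .add a b => Expr.add a b :: (a.subterms ++ b.subterms)
  | .mul a b => Expr.mul a b :: (a.subterms ++ b.subterms)
  | e => [e]

def Expr.size : Expr → Nat
  | .add a b => 1 + a.size + b.size
  | .mul a b => 1 + a.size + b.size
  | _ => 1

def Expr.isLeaf : Expr → Bool
  | .var _ => true
  | .lit _ => true
  | _ => false

def lookupVar (r : List (Expr × String)) (e : Expr) : Option String :=
  (r.find? (fun q => q.1 == e)).map Prod.snd

-- Rewrite outermost occurrences of lifted expressions to their variables.
def Expr.replace (r : List (Expr × String)) : Expr → Expr
  | .add a b =>
    match lookupVar r (.add a b) with
    | some x => .var x
    | none => .add (a.replace r) (b.replace r)
  | .mul a b =>
    match lookupVar r (.mul a b) with
    | some x => .var x
    | none => .mul (a.replace r) (b.replace r)
  | e => e  -- leaves are never lifted

def Stmt.expr : Stmt → Expr
  | .decl _ e => e
  | .assert e => e

def Stmt.replace (r : List (Expr × String)) : Stmt → Stmt
  | .decl x e => .decl x (e.replace r)
  | .assert e => .assert (e.replace r)

-- Non-leaf subterms occurring at least twice anywhere in the body, without repeats.
def findDuplicates (body : List Stmt) : List Expr :=
  let allSubs := body.foldr (fun s (acc : List Expr) => s.expr.subterms ++ acc) []
  let occs := allSubs.filter (fun e => !e.isLeaf)
  occs.foldl (fun (acc : List Expr) e =>
    if 2 ≤ (occs.filter (· == e)).length ∧ acc.contains e = false then acc ++ [e] else acc) []

def removeSubsumed (cands : List Expr) : List Expr :=
  cands.filter (fun c => !(cands.any (fun d => d != c && d.subterms.contains c)))

-- One pass: lift the surviving duplicates; `none` means there was nothing to lift.
def anfPass (n : Nat) (body : List Stmt) : Option (Nat × List Stmt) :=
  let targets := removeSubsumed (findDuplicates body)
  if targets.isEmpty then none
  else
    let named := targets.zip ((List.range targets.length).map (fun i => s!"anf.{n + i}"))
    let decls := named.map (fun q => Stmt.decl q.2 q.1)
    some (n + targets.length, decls ++ body.map (fun s => s.replace named))

-- Iterate to a fixpoint; the Nat fuel decreases structurally, so no `partial` is needed.
def anfLoop : Nat → Nat → List Stmt → List Stmt
  | 0, _, body => body
  | fuel + 1, n, body =>
    match anfPass n body with
    | none => body
    | some (n', body') => anfLoop fuel n' body'

def anfEncode (body : List Stmt) : List Stmt :=
  anfLoop (body.foldr (fun s (acc : Nat) => s.expr.size + acc) 0) 0 body

-- Shaped like the PR's `nestedDupProg` test.
def xPlus1 : Expr := .add (.var "x") (.lit 1)
def twice : Expr := .mul xPlus1 (.lit 2)
def nestedDup : List Stmt := [.assert twice, .assert twice, .assert (.add xPlus1 (.lit 5))]

#eval anfEncode nestedDup
```

Tracing `anfEncode nestedDup` by hand: pass 1 lifts only `(x + 1) * 2` (as `anf.0`), because `removeSubsumed` discards `x + 1`; pass 2 finds `x + 1` twice, in the new `anf.0` initializer and in the third assert, and lifts it as `anf.1`, declared ahead of `anf.0`; pass 3 finds no duplicates and stops. Calling `anfLoop 1 0 nestedDup` instead reproduces the single-pass behaviour, leaving `x + 1` inside `anf.0`.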

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Copilot

Pull request overview

This PR improves the Core ANF encoder so it repeatedly extracts duplicated subexpressions until reaching a fixpoint, reducing duplication that can remain hidden inside newly introduced var declarations after a single extraction pass.

Changes:

- Update `anfEncodeBody` to iterate until no further ANF encoder targets are found.
- Adjust documentation to explain why fixpoint iteration is needed for nested-duplicate elimination.


MikaelMayer

🤖 Clean, well-motivated change. The fixpoint iteration is a natural solution to the nested-duplicate problem, and the early-exit on targets.isEmpty guarantees termination in practice (each pass extracts at least one duplicate, strictly reducing the pool of non-leaf subexpressions).

One suggestion below about test coverage for the new iteration behavior.

Three changes responding to the review threads:

1. Make the replacements map collision-safe (Copilot). The map's value type is now `List (Expr × Expr)` keyed by `UInt64`. Insertions append to the bucket list and lookups walk the list with structural `==`, so two distinct duplicates that share a hash do not displace each other. The cost is at most a few extra equality checks per lookup, and the bottom-up O(n) hash computation in `replaceExprs` is preserved.
2. Convert `anfEncodeBody` from `partial def` to a real `def` with a structurally decreasing fuel parameter (joscoh). The fuel value is the total expression size of the initial body, which is a sound upper bound on the iteration count. Writing S(body) for the set of non-leaf subexpressions of the body, the docstring now states the termination argument explicitly: each pass replaces every duplicate occurrence by a fresh fvar (a leaf, filtered from future S(body)) and adds at most one var-decl init per duplicate (already in S(body)), so S strictly shrinks in non-trivial passes and is finite.
3. Add a multi-pass regression test (MikaelMayer). `nestedDupProg` has `(x + 1) * 2` (duplicate) and `x + 1` (subsumed by the larger duplicate in pass 1). Pass 1 lifts `(x + 1) * 2`; pass 2 then lifts `x + 1` after it becomes visible in the new var-decl init and the third assert. The single-pass version of the encoder would leave the inner duplicate in place.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
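A small Lean sketch of the collision-safe bucket scheme from the first change, assuming Lean 4's `Std.HashMap`. `bucketInsert`, `bucketLookup`, and the constant-hash `Key` type are hypothetical; the sketch mirrors only the stated design: a `UInt64`-keyed map whose values are lists of pairs, appended to on insert and searched with structural `==` on lookup.

```lean
import Std.Data.HashMap

-- Hypothetical sketch of a collision-safe replacements map: each `UInt64` bucket
-- holds every (duplicate, replacement) pair whose duplicate has that hash.

/-- Insert by appending to the bucket, so entries with colliding hashes coexist. -/
def bucketInsert {α : Type} [Hashable α] (m : Std.HashMap UInt64 (List (α × α)))
    (orig repl : α) : Std.HashMap UInt64 (List (α × α)) :=
  let h := hash orig
  m.insert h (m.getD h [] ++ [(orig, repl)])

/-- Select the bucket by hash, then confirm the match with structural `==`. -/
def bucketLookup {α : Type} [BEq α] [Hashable α] (m : Std.HashMap UInt64 (List (α × α)))
    (e : α) : Option α :=
  ((m.getD (hash e) []).find? (fun q => q.1 == e)).map Prod.snd

/-- Toy key type whose hash is constant, so every pair of keys collides. -/
structure Key where
  name : String
  deriving BEq, Repr

instance : Hashable Key := ⟨fun _ => 0⟩

#eval
  let m0 : Std.HashMap UInt64 (List (Key × Key)) := ∅
  let m := bucketInsert (bucketInsert m0 (Key.mk "a") (Key.mk "x0")) (Key.mk "b") (Key.mk "x1")
  (bucketLookup m (Key.mk "a"), bucketLookup m (Key.mk "b"))
```

Because every `Key` hashes to `0`, both pairs share one bucket. A map storing a single pair per hash would let the second insert overwrite the first; here both lookups return `some`.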


MikaelMayer

🤖🔍 LGTM — all comments addressed.

Drop the "(cf. PR #1135 review)" parenthetical so the docstring is authoritative on its own (per MikaelMayer). Replace it with a sentence that makes explicit what the parenthetical was hand-waving toward: the collision-safe lookup is what lets anfEncodeBody's termination argument claim that every duplicate found by findANFEncoderTargets is actually rewritten on the same pass, so no unreplaced duplicate can survive into the next iteration. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>


MikaelMayer

🤖🔍 All previous comments appear addressed — reviewer sign-off still needed.

tautschnig self-assigned this May 7, 2026

tautschnig requested review from a team and Copilot May 7, 2026 08:15

github-actions Bot added the Waiting-For-Review label May 7, 2026

Copilot started reviewing on behalf of tautschnig May 7, 2026 08:15

Copilot AI reviewed May 7, 2026

Comment thread Strata/Transform/ANFEncoder.lean Outdated

MikaelMayer reviewed May 7, 2026

Comment thread Strata/Transform/ANFEncoder.lean Outdated

joscoh reviewed May 15, 2026

Comment thread Strata/Transform/ANFEncoder.lean Outdated

tautschnig and others added 2 commits May 16, 2026 00:06

Merge remote-tracking branch 'origin/main' into tautschnig/anf-fixedpoint

95d5570

MikaelMayer reviewed May 16, 2026

joscoh previously approved these changes May 16, 2026

github-actions Bot added Has 1 approval and removed Waiting-For-Review labels May 16, 2026

tautschnig assigned MikaelMayer and unassigned tautschnig May 18, 2026

MikaelMayer reviewed May 18, 2026

Comment thread Strata/Transform/ANFEncoder.lean Outdated

tautschnig and others added 2 commits May 19, 2026 08:15

Merge remote-tracking branch 'origin/main' into tautschnig/anf-fixedpoint

3bd9acf

tautschnig dismissed joscoh’s stale review via 3bd9acf May 19, 2026 08:28

github-actions Bot added Waiting-For-Review and removed Has 1 approval labels May 19, 2026

tautschnig assigned joscoh May 19, 2026

MikaelMayer reviewed May 19, 2026

joscoh approved these changes May 19, 2026

github-actions Bot added Has 1 approval and removed Waiting-For-Review labels May 19, 2026

MikaelMayer approved these changes May 19, 2026

github-actions Bot removed the Has 1 approval label May 19, 2026

MikaelMayer added this pull request to the merge queue May 19, 2026

Merged via the queue into main with commit 18878bd May 19, 2026
23 of 24 checks passed

MikaelMayer deleted the tautschnig/anf-fixedpoint branch May 19, 2026 15:00

MikaelMayer merged 5 commits into main from tautschnig/anf-fixedpoint


4 participants
