Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Acir size increases exponentially as loop iterations increase, instead of linearly #4629

Open
jfecher opened this issue Mar 25, 2024 · 11 comments · Fixed by #5493
Open

Acir size increases exponentially as loop iterations increase, instead of linearly #4629

jfecher opened this issue Mar 25, 2024 · 11 comments · Fixed by #5493
Assignees
Labels
bug Something isn't working

Comments

@jfecher
Copy link
Contributor

jfecher commented Mar 25, 2024

Aim

Testing the code in https://github.com/zac-williamson/noir-poseidon1-array-bug.

Expected Behavior

Expected the acir opcode count and backend circuit size to increase by a constant 6 for each additional iteration of the loop in poseidon::bn254::permute.

Bug

Actual change in acir and backend circuit size for iteration total N is:

// N | Acir size | Acir change | Backend Circuit Size | circuit change
// 1 | 10        |             | 15                   |
// 2 | 16        | 6           | 21                   | 6
// 3 | 23        | 7           | 28                   | 7
// 4 | 31        | 8           | 36                   | 8
// 5 | 40        | 9           | 45                   | 9
// 6 | 50        | 10          | 55                   | 10
// 7 | 61        | 11          | 66                   | 11
// 8 | 73        | 12          | 78                   | 12
// 9 | 86        | 13          | 91                   | 13
// 10| 100       | 14          | 105                  | 14
// 11| 115       | 15          | 120                  | 15
// 12| 131       | 16          | 136                  | 16
// 13| 148       | 17          | 153                  | 17

To Reproduce

Project Impact

None

Impact Context

No response

Workaround

None

Workaround Description

No response

Additional Context

No response

Installation Method

None

Nargo Version

No response

NoirJS Version

No response

Would you like to submit a PR for this Issue?

None

Support Needs

No response

@jfecher jfecher added the bug Something isn't working label Mar 25, 2024
@jfecher
Copy link
Contributor Author

jfecher commented Mar 25, 2024

I've confirmed that SSA increases linearly by 14 extra instructions each time. The extra instructions are (in order):

2x add
4x mul
4x add then mul

(The last iteration of the loop only has 2x add then mul since the last two modify array indices 1 & 2 which are unused in main and are thus eliminated by DIE)

So this looks like an Acir-gen issue.

@jfecher
Copy link
Contributor Author

jfecher commented Mar 25, 2024

I've confirmed this is unrelated to array copying in Acir. This makes sense since no arrays are modified.

@jfecher
Copy link
Contributor Author

jfecher commented Mar 25, 2024

For reference, the SSA for a run with 3 iterations looks like this:

acir fn main f0 {
  b0(v0: Field, v1: Field):
    // permute iteration 1
    v770 = add v0, v1
    v775 = add v0, Field -5512639236676924786014155380588025764096985869754559557744523208552434701087
    v776 = mul v775, v775
    v777 = mul v776, v776
    v778 = mul v775, v777
    v779 = mul v778, Field 7511745149465107256748700652201246547602992235352608707588321460060273774987
    v780 = add v778, v779
    v781 = mul v1, Field 1781874611967874592137274483616240894881315449294815307306613366069350853425
    v782 = add v780, v781
    v783 = mul v770, Field 9676220459425127104563807626505378474104527268335041816433595157913150665495
    v784 = add v782, v783
    v785 = mul v784, Field 9676220459425127104563807626505378474104527268335041816433595157913150665495
    v786 = add v1, v785
    v787 = mul v784, Field 8364259238812534287689210722577399963878179320345509803468849104367466297989

    // iteration 2
    v788 = add v770, v787
    v789 = add v784, Field -2⁴×387493572659975021470226716906358013055803406560817195243522295635460328127
    v790 = mul v789, v789
    v791 = mul v790, v790
    v792 = mul v789, v791
    v793 = mul v792, Field 7511745149465107256748700652201246547602992235352608707588321460060273774987
    v794 = add v792, v793
    v795 = mul v786, Field -6684379154708237978759272567577041337887670303253204317176013706256788968730
    v796 = add v794, v795
    v797 = mul v788, Field 1645017323598148583308153743253948043010266295265950623794066679542803673813
    v798 = add v796, v797
    v799 = mul v798, Field 1645017323598148583308153743253948043010266295265950623794066679542803673813
    v800 = add v786, v799
    v801 = mul v798, Field -6902316737387657021175622823110739310551009794185512225013048131011375572021

    // iteration 3
    v802 = add v788, v801
    v803 = add v798, Field 2⁴×593199705942557006050540024999671509991229882396447677974095356964325900814
    v804 = mul v803, v803
    v805 = mul v804, v804
    v806 = mul v803, v805
    v807 = mul v806, Field 7511745149465107256748700652201246547602992235352608707588321460060273774987
    v808 = add v806, v807
    v809 = mul v800, Field -3778477114939312735135329793763823326274743295264527892924859844466140293618
    v810 = add v808, v809
    v811 = mul v802, Field 8034324828084400593020431506480243533881627849088152439427470035355284392177
    v812 = add v810, v811

    // main
    v817 = eq v812, v1
    constrain v817 == u1 0
    return 
}

@TomAFrench
Copy link
Member

TomAFrench commented Mar 25, 2024

The issue results from us not reusing intermediate variables optimally during circuit transformation:

Doesn't seem like this is the case. Keeping this here for reference however.

// Untransformed
Compiled ACIR for main (unoptimized):
func 0
current witness index : 16
public parameters indices : [1]
return value indices : []
EXPR [ (1, _0) (-1, _2) -5512639236676924786014155380588025764096985869754559557744523208552434701087 ]
EXPR [ (1, _2, _2) (-1, _3) 0 ]
EXPR [ (1, _3, _3) (-1, _4) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _2, _4) (9676220459425127104563807626505378474104527268335041816433595157913150665495, _0) (11458095071393001696701082110121619368985842717629857123740208523982501518920, _1) (-1, _5) -2⁴×387493572659975021470226716906358013055803406560817195243522295635460328127 ]
EXPR [ (1, _5, _5) (-1, _6) 0 ]
EXPR [ (1, _6, _6) (-1, _7) 0 ]
EXPR [ (10302764019366810656324304069659347464634037313761163346014007424803634884116, _2, _4) (7511745149465107256748700652201246547602992235352608707588321460060273774988, _5, _7) (6604154496063676802804756384484347298797004655271010100320031032419788549325, _0) (4105605326965746781005342339392760270064105286305864323348731164214007389462, _1) (-1, _8) 2⁴×593199705942557006050540024999671509991229882396447677974095356964325900814 ]
EXPR [ (1, _8, _8) (-1, _9) 0 ]
EXPR [ (1, _9, _9) (-1, _10) 0 ]
EXPR [ (2056738702329958338528069809778020848023814166851283194299370842509476243522, _2, _4) (692246069124406706574478606975526033386351398548558144542515845293232943739, _5, _7) (7511745149465107256748700652201246547602992235352608707588321460060273774988, _8, _10) (-4945605217586328132698244186628229593098542664068970726496460897751429780929, _0) (6377772023842086820811309884692226915529324425293982843669511452955944766649, _1) (-1, _11) 9797453712978351739894993124526343599910864939600507506817907398049628087845 ]
EXPR [ (1, _11, _11) (-1, _12) 0 ]
EXPR [ (1, _12, _12) (-1, _13) 0 ]
EXPR [ (8244951158953924849441820526537414617456405719694233716653789536929031657217, _2, _4) (1245000739171333043151662300492364762247881201867951700385117535552263642718, _5, _7) (-8866078365483905008300628909657660427368986553774165888273494658969433806135, _8, _10) (-7511745149465107256748700652201246547602992235352608707588321460060273774988, _11, _13) (7538040645284361082572458070655519332106145662083850722888221131624634527860, _0) (4089505136743306378354241962475825377338462365227913938936236421229037646862, _1) (-1, _14) 0 ]
BRILLIG: inputs: [Single(Expression { mul_terms: [], linear_combinations: [(1, Witness(14))], q_c: 0 })]
outputs: [Simple(Witness(15))]
[CalldataCopy { destination_address: MemoryAddress(0), size: 1, offset: 0 }, JumpIfNot { condition: MemoryAddress(0), location: 3 }, Const { destination: MemoryAddress(1), bit_size: 254, value: Value { inner: 1 } }, BinaryFieldOp { destination: MemoryAddress(0), op: Div, lhs: MemoryAddress(1), rhs: MemoryAddress(0) }, Stop { return_data_offset: 0, return_data_size: 1 }]

EXPR [ (1, _14, _15) (1, _16) -1 ]
EXPR [ (1, _14, _16) 0 ]
EXPR [ (1, _16) 0 ]

// transformed

current witness index : 44
public parameters indices : [1]
return value indices : []
EXPR [ (1, _0) (-1, _2) -5512639236676924786014155380588025764096985869754559557744523208552434701087 ]
EXPR [ (1, _2, _2) (-1, _3) 0 ]
EXPR [ (1, _3, _3) (-1, _4) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _2, _4) (-1, _17) 0 ]
EXPR [ (9676220459425127104563807626505378474104527268335041816433595157913150665495, _0) (11458095071393001696701082110121619368985842717629857123740208523982501518920, _1) (-1, _18) 0 ]
EXPR [ (-1, _5) (1, _17) (1, _18) -2⁴×387493572659975021470226716906358013055803406560817195243522295635460328127 ]
EXPR [ (1, _5, _5) (-1, _6) 0 ]
EXPR [ (1, _6, _6) (-1, _7) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _5, _7) (-1, _21) 0 ]
EXPR [ (6604154496063676802804756384484347298797004655271010100320031032419788549325, _0) (4105605326965746781005342339392760270064105286305864323348731164214007389462, _1) (-1, _22) 0 ]
EXPR [ (11478095976109321424005083581332697351833154779671800717176968483932168473324, _17) (1, _21) (-1, _23) 0 ]
EXPR [ (-1, _8) (1, _22) (1, _23) 2⁴×593199705942557006050540024999671509991229882396447677974095356964325900814 ]
EXPR [ (1, _8, _8) (-1, _9) 0 ]
EXPR [ (1, _9, _9) (-1, _10) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _8, _10) (-1, _27) 0 ]
EXPR [ (-4945605217586328132698244186628229593098542664068970726496460897751429780929, _0) (6377772023842086820811309884692226915529324425293982843669511452955944766649, _1) (-1, _28) 0 ]
EXPR [ (-8171050935427428633424824527321522461511072821809794285567532217425125208927, _17) (-5514733471089785794374270767163942360219071296880559339102618199968142537142, _21) (-1, _29) 0 ]
EXPR [ (1, _27) (1, _28) (-1, _30) 0 ]
EXPR [ (-1, _11) (1, _29) (1, _30) 9797453712978351739894993124526343599910864939600507506817907398049628087845 ]
EXPR [ (1, _11, _11) (-1, _12) 0 ]
EXPR [ (1, _12, _12) (-1, _13) 0 ]
EXPR [ (-7511745149465107256748700652201246547602992235352608707588321460060273774988, _11, _13) (-1, _35) 0 ]
EXPR [ (7538040645284361082572458070655519332106145662083850722888221131624634527860, _0) (4089505136743306378354241962475825377338462365227913938936236421229037646862, _1) (-1, _36) 0 ]
EXPR [ (-6379787701423272764930233206555704103213006823652545026586546074778741051842, _17) (-6528941988757818671164751416148170116685212390218786282564565476950079182897, _21) (-1, _37) 0 ]
EXPR [ (-4913523097929112444230457750999582340681777120025624541356118512330689264044, _27) (1, _35) (-1, _38) 0 ]
EXPR [ (1, _36) (1, _37) (-1, _39) 0 ]
EXPR [ (-1, _14) (1, _38) (1, _39) 0 ]
BRILLIG: inputs: [Single(Expression { mul_terms: [], linear_combinations: [(1, Witness(14))], q_c: 0 })]
outputs: [Simple(Witness(15))]
[CalldataCopy { destination_address: MemoryAddress(0), size: 1, offset: 0 }, JumpIfNot { condition: MemoryAddress(0), location: 3 }, Const { destination: MemoryAddress(1), bit_size: 254, value: Value { inner: 1 } }, BinaryFieldOp { destination: MemoryAddress(0), op: Div, lhs: MemoryAddress(1), rhs: MemoryAddress(0) }, Stop { return_data_offset: 0, return_data_size: 1 }]

EXPR [ (1, _14, _15) (1, _16) -1 ]
EXPR [ (1, _14, _16) 0 ]
EXPR [ (1, _16) 0 ]

+---------------+----------------------+--------------+----------------------+
| Package       | Expression Width     | ACIR Opcodes | Backend Circuit Size |
+---------------+----------------------+--------------+----------------------+
| test_poseidon | Bounded { width: 3 } | 31           | 36                   |
+---------------+----------------------+--------------+----------------------+

@TomAFrench
Copy link
Member

TomAFrench commented Mar 25, 2024

In the happy case of the initial state being comprised on witnesses only then the number of opcodes required for each iteration should be something like this:

let t = s0+const; // 1 opcode
let tttt = t^4; // 2 opcodes (one for each square)
let a = s0 * tttt // 1 opcode
let new_s0 = a + s1*const + s2*const // 1 opcode
let new_s1 = s1 + new_s0 * const // 1 opcode
let new_s2 = s1 + new_s0 * const // 1 opcode

We should have a constant 7 opcodes per iteration

My gut feeling is that the issue is due how we're laying down constraints lazily we're accumulating more and more quadratic terms inside s1 and s2 and we're not recognising that we have a common value of new_s0 being added into it which should be simplified. We're then doing a lot of recalculation because of this.

This can be seen as if we use the loop below (note the large number of iterations plus the range opcodes being used to enforce that the state is always represented as witnesses) I get a circuit size of 556. Commenting out these range opcodes causes the circuit to grow to 1480.

    for _r in 0..50 {
        s0 += C[((rf as u64 / 2)+1)*state.len() + _r as u64];
        let t = s0;
        let tt = t * t;
        let tttt = tt * tt;
        s0 *= tttt;
        s0 += s0 * S[(5)*_r ];
        s0 += s1 * S[(5)*_r +1];
        s0 += s2 * S[(5)*_r +2];
        s1 += s0 * S[5 * _r +2];
        s2 += s0 * S[5*_r  + 3];
        s1.assert_max_bit_size(253);
        s2.assert_max_bit_size(253);
    }

I also get a constant difference of 11 opcodes if I change the number of loop iterations in this case. Removing 2 for the range opcodes, this doesn't quite line up with the expected 7 but 🤷

@jfecher
Copy link
Contributor Author

jfecher commented Mar 25, 2024

@TomAFrench Going from 1 to 2 iterations in the original code I see a change of 6 opcodes, which lines up with Zac's original estimate.

I also just finished testing acir-gen/Ssa::into_acir and found that that does increase by a constant 3 additional opcodes each iteration when going from 1 iteration to 2 to 3. So acir-gen seems fine here.

Sounds like we're missing a way to factor out common instances of new_s0 then? Or maybe we just can't lay down constraints lazily anymore.

@TomAFrench
Copy link
Member

TomAFrench commented Mar 25, 2024

Sounds like we're missing a way to factor out common instances of new_s0 then?

It's kinda yucky as it exposes knowledge about circuits to the end user, but yeah it does seem that a force_witness(expr) would be useful for some of these algorithms.

Or maybe we just can't lay down constraints lazily anymore.

I don't think so, we're always going to need some laziness as emitting an acir opcode per SSA instruction will mean that we'll never fill up a plonk gate. I'd need to think on if we can identify variables such as s1 and s2 automatically.

@jfecher
Copy link
Contributor Author

jfecher commented Mar 25, 2024

I've confirmed the extra constraints are being pushed here: https://github.com/noir-lang/noir/blob/master/acvm-repo/acvm/src/compiler/transformers/mod.rs#L85-L96, 3 for the first iter, 4 for the second, then 5, etc.

github-merge-queue bot pushed a commit that referenced this issue May 21, 2024
…ss to be equal to a variable (#4641)

# Description

## Problem\*

Quick fix to #4629 

## Summary\*

This PR adds a `as_witness` function which takes a `Field` and instructs
the compiler to lay down an opcode which constrains a witness to be
equal to this value.

This prevents the issue in #4629 where we collect ever growing
expressions which need to be broken down multiple times while reshaping
the circuit. Doing this during acirgen instead allows us to avoid this
repetition.

## Additional Context



## Documentation\*

Check one:
- [ ] No documentation needed.
- [ ] Documentation included in this PR.
- [x] **[For Experimental Features]** Documentation to be submitted in a
separate PR.
Feature is experimental

# PR Checklist\*

- [x] I have tested the changes locally.
- [x] I have formatted the changes with [Prettier](https://prettier.io/)
and/or `cargo fmt` on default settings.
github-merge-queue bot pushed a commit that referenced this issue May 24, 2024
# Description

## Problem\*

Related to  #4629

## Summary\*
Do not create a witness when multiplying a witness (or an affine
transformation of a witness) with an expression of degree 1.


## Additional Context
This PR does NOT fix the issue #4629, but it should help reducing the
overall gates number.
The arithmetic expressions resulting from the multiplication should not
explode in width since I only consider multiplication with a witness (or
'affine witness').

## Documentation\*

Check one:
- [X] No documentation needed.
- [ ] Documentation included in this PR.
- [ ] **[For Experimental Features]** Documentation to be submitted in a
separate PR.

# PR Checklist\*

- [X] I have tested the changes locally.
- [X] I have formatted the changes with [Prettier](https://prettier.io/)
and/or `cargo fmt` on default settings.
@TomAFrench
Copy link
Member

@vezenovm do you want to take a look at making ACIR gen aware of the expression width?

@vezenovm
Copy link
Contributor

@vezenovm do you want to take a look at making ACIR gen aware of the expression width?

Yeah happy to take this on.

github-merge-queue bot pushed a commit that referenced this issue Jul 24, 2024
# Description

## Problem\*

Resolves #4629 

## Summary\*

This PR updates how we add vars in ACIR gen to account for an expression
width. This is under a compile time flag `--bounded-codegen`.

If the sum of two expressions is going to go over the specified
expression width we automatically create witnesses for either the lhs,
the rhs, or both, before then generating a sum expression.

This new bounded codegen could provide an easy way for developers to
tell whether their program can be optimized using `as_witness`.
Reference the additional context section below for some numbers.

## Additional Context


There are some limitations in this approach as it is pretty naive in how
it decides to when to generate a new witness. It doesn't look ahead at
all other than for whether the two AcirVar's being added are going to
create an expression over the specified ACIR width.

For example we can see the following gate counts for
`poseidonsponge_x5_254`:
```
No width awareness:
+-----------------------+----------+----------------------+--------------+
| Package               | Function | Expression Width     | ACIR Opcodes |
+-----------------------+----------+----------------------+--------------+
| poseidonsponge_x5_254 | main     | Bounded { width: 4 } | 3096         |
+-----------------------+----------+----------------------+--------------+

No width awareness w/ as_witness (this is the currently optimized poseidon we have in the stdlib):
+-----------------------+----------+----------------------+--------------+
| Package               | Function | Expression Width     | ACIR Opcodes |
+-----------------------+----------+----------------------+--------------+
| poseidonsponge_x5_254 | main     | Bounded { width: 4 } | 1302         |
+-----------------------+----------+----------------------+--------------+

Width awareness:
+-----------------------+----------+----------------------+--------------+
| Package               | Function | Expression Width     | ACIR Opcodes |
+-----------------------+----------+----------------------+--------------+
| poseidonsponge_x5_254 | main     | Bounded { width: 4 } | 2114         |
+-----------------------+----------+----------------------+--------------+

Width awareness w/ as_witness:
+-----------------------+----------+----------------------+--------------+
| Package               | Function | Expression Width     | ACIR Opcodes |
+-----------------------+----------+----------------------+--------------+
| poseidonsponge_x5_254 | main     | Bounded { width: 4 } | 1792         |
+-----------------------+----------+----------------------+--------------+
```
From the above we can see that we actually have a degradation when using
the addition strategy used in this PR with a hand optimized program
using `as_witness`. Although this PR still gives an improvement in the
default.

Another example is the following program:
```rust
fn main(x: Field, y: pub Field) {
    let state = [x, y];
    let state = oh_no_not_again(state);

    // This assert will fail if we execute
    assert(state[0] + state[1] == 0);
}

fn oh_no_not_again(mut state: [Field; 2]) -> [Field; 2] {
    for _ in 0..200 {
        state[0] = state[0] * state[0] + state[1];
        state[1] += state[0];
    }
    state
}
```
Without any width awareness we get 1150 ACIR gates. With this PR we will
get 399 gates. If we substitute `oh_no_not_again` for the following:
```rust
fn oh_no_not_again_as_witness(mut state: [Field; 2]) -> [Field; 2] {
    for i in 0..200 {
        state[0] = state[0] * state[0] + state[1];
        std::as_witness(state[0]);
        state[1] += state[0];
        if (i & 1 == 1) {
            std::as_witness(state[1]);
        }
    }
    state
}
```
We will get 301 gates if the method above is called instead of
`oh_no_not_again`.

## Documentation\*

Check one:
- [X] No documentation needed.
- [ ] Documentation included in this PR.
- [ ] **[For Experimental Features]** Documentation to be submitted in a
separate PR.

# PR Checklist\*

- [X] I have tested the changes locally.
- [X] I have formatted the changes with [Prettier](https://prettier.io/)
and/or `cargo fmt` on default settings.

---------

Co-authored-by: Tom French <15848336+TomAFrench@users.noreply.github.com>
@jfecher
Copy link
Contributor Author

jfecher commented Jul 24, 2024

Reopening this since the flag introduced by #5493 is not enabled by default

@jfecher jfecher reopened this Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants