Vectorized aggregation #1009
Conversation
There are plenty of changes coming from #1008 that add some noise; can you rebase this on top of main after #1008 is merged?
Codecov Report. Attention: Patch coverage is

```
@@            Coverage Diff             @@
##             main    #1009      +/-   ##
==========================================
+ Coverage   90.24%   90.43%   +0.19%
==========================================
  Files         172      173       +1
  Lines       25727    26390     +663
==========================================
+ Hits        23218    23867     +649
- Misses       2509     2523      +14
```

☔ View full report in Codecov by Sentry.
```diff
@@ -272,7 +261,7 @@ impl<'a, B: ShardBinding, F: ExtendableField> UpgradedContext<F> for Upgraded<'a
     T: Send,
     UpgradeContext<'a, Self, F>: UpgradeToMalicious<'a, T, M>,
 {
-    UpgradeContext::new(self.narrow(&UpgradeStep::UpgradeSemiHonest), NoRecord)
+    UpgradeContext::new(self.clone(), NoRecord)
```
I don't really understand what's going on here (even given the comments above), and I'm not sure this change is appropriate -- but it did resolve issues I had with compact gate after removing modulus conversion from OPRF IPA.
iirc, it was implemented because an early version of compact gate had to use an ignorelist for steps that don't trigger communications, and tracked active steps through `send`. #711 may shed some light on why it was done this way.
I could be wrong, but I believe compact gate uses `narrow` to track steps now, which may be why you're not getting a panic. If you could share the errors you're getting from modulus conversion, we can dig deeper into this. For now it seems that it should just work.
you may want to fix the comment too (lines 254..258)
The errors were `error: ipa_macros::step expects an enum with variants that match the steps in steps.txt` and ``error[E0277]: the trait bound `compact::Compact: StepNarrow<UpgradeStep>` is not satisfied`` (i.e., the usual errors when a step is not exercised in the steps.txt generation).
Part of why I was unsure what to do here is that these methods were not being exercised -- so I wasn't sure there would be test coverage for whatever change I made.
But with the benefit of your comment and some more thinking, I have a proposal here that I'm more confident in. I'm going to open a separate PR for it (to be considered in advance of this one).
#1030. After this PR is merged / rebased to include that, it will need to stub these impls based on `cfg(descriptive-gate)`, because it removes the last place we invoke share upgrades in the protocol (modulus conversion). They'll later need to come back when we add MAC-based malicious security for PRF evaluation and shuffle.
```rust
pub type AggResult<const B: usize> = Result<BitDecomposed<Replicated<Boolean, B>>, Error>;

pub async fn aggregate_values<'fut, C, OV, const B: usize>(
```
Maybe nice to add a test for just this `aggregate_values` function.
```rust
let bit_futures = index_contribution
    .zip(repeat((bit_of_bdkey, bucket_c.clone())))
    .enumerate()
    .map(|(i, (a, (b, bucket_c)))| {
        a.multiply(b, bucket_c.narrow(&BitOpStep::Bit(i)), record_id)
    });
```
This got a lot more complex looking.
Does it no longer work to multiply a multi-bit value with a single bit? That's too bad. Could we migrate from `BitDecomposed` to a `BooleanArray` and use `select` instead?
Doesn't need to be in this PR - this PR is too long already =).
The issue here is that we're working with a 2-D array. The type `BitDecomposed<AdditiveShare<Boolean, N>>` actually does contain a boolean array -- the vector type in `AdditiveShare<Boolean, N>` is the boolean array type of length `N`.
We could flatten everything into a boolean array of size N * HV_BITS, but then we'd have to do a bunch of 2-D array index calculations, which I think would make things even more complicated.
I do like the idea of using select, though. I'll see if I can do something with that.
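For intuition, the index arithmetic that the flattened `N * HV_BITS` layout would require can be sketched in plain Rust. This is a toy illustration only: `HV_BITS`, `flat_index`, and `unflatten` are assumed names, not the actual ipa types.

```rust
// Toy sketch of the flattened N * HV_BITS layout discussed above.
// HV_BITS is an assumed histogram-value width, not a real constant.
const HV_BITS: usize = 8;

/// Bit `bit` of record `record` lands at this offset in the flat array.
fn flat_index(record: usize, bit: usize) -> usize {
    record * HV_BITS + bit
}

/// Inverse mapping, recovering (record, bit) from a flat offset.
fn unflatten(idx: usize) -> (usize, usize) {
    (idx / HV_BITS, idx % HV_BITS)
}

fn main() {
    assert_eq!(flat_index(3, 5), 29);
    assert_eq!(unflatten(29), (3, 5));
}
```

Every 2-D access in the protocol would need one of these conversions, which is the extra bookkeeping the comment above refers to.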
After looking at it some, I don't think it makes sense to do in this PR... it's a trade of roughly 100 lines elsewhere for 10 lines here. That may be the right tradeoff if it improves the modularity of the code, but I don't think it's the right tradeoff if those 100 lines are getting added to this PR.
The concrete options I see are:
- Implementing `BooleanArrayMul` for `BitDecomposed<AdditiveShare<Boolean, N>>`. This in turn requires either adding some more `Foo: SecureMul<C>` bounds in various places, or else adding a `C` type parameter to the `BooleanArrayMul` trait.
- Converting `select` into a trait so we can implement it for arbitrary types.
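For the second option, a minimal sketch of what a `select` trait might look like. The trait shape and the plaintext `Vec<u8>` stand-in are hypothetical, not the actual ipa API; a real MPC implementation would compute `b * t + (1 - b) * f` with a vectorized secure multiplication rather than a plaintext branch.

```rust
// Hypothetical generalization of `select` into a trait, per the discussion
// above. The Vec<u8> impl is a plaintext stand-in for a share vector.
trait Select {
    /// Returns `if_true` when the bit is set, `if_false` otherwise.
    fn select(bit: bool, if_true: &Self, if_false: &Self) -> Self;
}

impl Select for Vec<u8> {
    fn select(bit: bool, if_true: &Self, if_false: &Self) -> Self {
        if_true
            .iter()
            .zip(if_false)
            .map(|(t, f)| if bit { *t } else { *f })
            .collect()
    }
}

fn main() {
    let t: Vec<u8> = vec![1, 2, 3];
    let f: Vec<u8> = vec![4, 5, 6];
    assert_eq!(<Vec<u8> as Select>::select(true, &t, &f), vec![1, 2, 3]);
    assert_eq!(<Vec<u8> as Select>::select(false, &t, &f), vec![4, 5, 6]);
}
```

The benefit of the trait form is exactly what the thread suggests: arbitrary container types (including `BitDecomposed`) could opt in without a blanket `BooleanArrayMul` impl.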
I'm totally supportive of landing this PR as is and working on this in a future PR.
My plan is to defer this and saturating addition to a follow-up. I am working on the (rather gruesome) merge conflicts presently, but I'm hoping this can be ready tonight or tomorrow.
```diff
 // bdbit_contribution is either zero or equal to row_contribution. So it
 // is okay to do a carryless "subtraction" here.
 for (r, b) in row_contribution[left_index]
     .iter_mut()
     .zip(bdbit_contribution.iter())
 {
     *r -= b;
 }
 if right_index < breakdown_count {
-    row_contribution[right_index] = bdbit_contribution;
+    for (r, b) in row_contribution[right_index]
+        .iter_mut()
+        .zip(bdbit_contribution)
+    {
+        *r = b;
+    }
```
This part also seems like it could be simplified if we stopped using `BitDecomposed` and used `BooleanArray` instead.
This is a very long PR... I really wish it were split into smaller, more digestible chunks...
I mostly focused my review on the protocols. It's pretty awesome how you were able to eliminate the conversions to the prime field. I LOVE the way you are able to just chunk streams now (very nice!), and the highly efficient bitwise addition that only deals with as many digits as actually necessary is great.
This PR actually does about 1/2 the work I was trying to do in the `feature_label_dot_product` protocol... I should have read this PR first before starting to implement!
So I'm keen to get this landed so that I can build upon this work. I know @bmcase is as well.
The only concern I have is with the wrapping addition (not saturating). I would suggest a follow-up PR to just make it saturating.
```rust
if a.len() < usize::try_from(OV::BITS).unwrap() {
    sum.push(carry);
}
```
So this implements wrapping addition, basically addition modulo `2^OV::BITS`. This could potentially yield unexpected results (like zero) when the sum wraps around.
An alternative would be to implement a "saturating addition".
We could implement that with something like:

```rust
let sum = if /* it could potentially overflow on this step */ {
    integer_sat_add(/* stuff */).await?
} else {
    let (mut sum, carry) = integer_add(/* stuff */).await?;
    sum.push(carry);
    sum
};
```
Do we want to move this circuit to a separate function? Seeing `wrapping_integer_add` or `sat_integer_add` here makes the intent more obvious imo.
The old version didn't have saturating math either, but I agree that now that we can do it relatively cheaply, it's probably worth doing. Is there any interaction with DP?
The reason we never had such logic before is that we were moving into a huge field, and assumed we would never encounter a sum that reached 2^32. Now that we wrap around at a configurable value, it's possible to select a number that's fairly low.
I think it might be fine to just document this behavior and warn everyone to please select an output type that's sufficiently large so that there is no overflow. That would keep the code simpler. I don't think there's any impact on DP. If there is wrapping, it will essentially decrease the overall contribution of some people to the query output. But it's not predictable. Some people will get their contribution removed and others will not. So if we take a "worst case" scenario type of approach, we probably can't say anything definitive and cannot benefit from this.
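To make the tradeoff concrete, here is a plaintext (non-MPC) illustration of the two behaviors at a configurable output width. The function names and widths are illustrative only, not the protocol's circuits.

```rust
// Plaintext illustration of wrapping vs. saturating addition at a
// configurable output width `bits` (standing in for OV::BITS).
fn wrapping_add(a: u64, b: u64, bits: u32) -> u64 {
    // Addition modulo 2^bits: keep only the low `bits` bits.
    (a + b) & ((1u64 << bits) - 1)
}

fn saturating_add(a: u64, b: u64, bits: u32) -> u64 {
    // Clamp the sum to the largest representable value instead of wrapping.
    let max = (1u64 << bits) - 1;
    (a + b).min(max)
}

fn main() {
    // With an 8-bit output value, 200 + 100 wraps to 44 but saturates to 255.
    assert_eq!(wrapping_add(200, 100, 8), 44);
    assert_eq!(saturating_add(200, 100, 8), 255);
}
```

The wrapped result (44) is smaller than either input, which is the "unexpected results" failure mode described above; saturation at least preserves "this bucket was large".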
```rust
None => {
    // Input stream ended at a chunk boundary. Unlike the partial chunk case, we return
    // None, so we shouldn't be polled again regardless of dummy_fn signalling, but
    // might as well make this a fused stream since it's easy.
```
Yeah, shouldn't we just fuse the inner stream? I haven't looked closely into what would happen, but it appears that it may work too. Maybe there is a better argument name for `dummy_fn` too (`padder`?).
The case where fusing the inner stream doesn't work very well is when we terminate due to an error -- we'd have to drain it.
```rust
/// Process stream through a function that operates on chunks.
///
/// Processes `stream` by collecting chunks of `N` items into `buffer`, then calling `process_fn`
/// for each chunk. If there is a partial chunk at the end of the stream, `dummy_fn` is called
```
Is it enough for our use case to require `T: Default`?
Yes, I considered that -- it would require implementing `Default` on several layers, down to `AdditiveShare`. There is also a semantic requirement in some places this is used that isn't exactly `Default` -- the dummy records need to contribute nothing to the overall result.
I think I would slightly prefer the version with `Default`, but I didn't want to presume to make that change given the impact.
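For reference, the padding behavior under discussion can be sketched with plain vectors. This is a toy model, not the real adapter (which operates on streams); the names `process_fn` and `dummy_fn` follow the doc comment quoted above, and `dummy_fn` is the caller-supplied "contribute nothing" record factory that a `T: Default` bound would replace.

```rust
// Toy model of chunked processing with a caller-supplied padding function.
fn process_by_chunks<T: Clone, R>(
    items: Vec<T>,
    n: usize,
    mut process_fn: impl FnMut(&[T]) -> R,
    mut dummy_fn: impl FnMut() -> T,
) -> Vec<R> {
    let mut out = Vec::new();
    for chunk in items.chunks(n) {
        if chunk.len() == n {
            out.push(process_fn(chunk));
        } else {
            // Partial final chunk: pad with dummy records that must
            // contribute nothing to the overall result.
            let mut padded = chunk.to_vec();
            while padded.len() < n {
                padded.push(dummy_fn());
            }
            out.push(process_fn(&padded));
        }
    }
    out
}

fn main() {
    let sums = process_by_chunks(vec![1, 2, 3, 4, 5], 2, |c| c.iter().sum::<i32>(), || 0);
    assert_eq!(sums, vec![3, 7, 5]);
}
```

With `T: Default` the last argument would disappear, but as noted above, "contributes nothing" is a semantic property that `Default` does not guarantee in general.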
```diff
-    get_bits::<Fp32BitPrime>(breakdown_key.try_into().unwrap(), Gf8Bit::BITS);
-    let value = Fp32BitPrime::truncate_from(VALUE);
+async fn move_to_bucket(count: usize, breakdown_key: usize, robust: bool) -> Vec<BA8> {
+    let breakdown_key_bits = BitDecomposed::decompose(Gf8Bit::BITS, |i| {
```
nit: it may be worth explaining why an 8-bit value is enough here. (I believe it is related to `MAX_BREAKDOWN_COUNT`, and if that's true, maybe a static assertion is possible too.)
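If it is indeed tied to `MAX_BREAKDOWN_COUNT`, a compile-time assertion along these lines could encode the relationship. The constant's value here is an assumption for illustration, not the real one.

```rust
// Hypothetical static assertion that an 8-bit breakdown key (BA8 / Gf8Bit)
// can address every breakdown bucket. 256 is an assumed value.
const MAX_BREAKDOWN_COUNT: usize = 256;

// Evaluated at compile time: the build fails if the breakdown count
// ever outgrows what 8 bits can represent.
const _: () = assert!(
    MAX_BREAKDOWN_COUNT <= 1 << 8,
    "8-bit breakdown keys cannot represent MAX_BREAKDOWN_COUNT buckets"
);

fn main() {
    println!("static assertion held at compile time");
}
```

`assert!` in a `const` item is stable Rust, so no extra dependency is needed for this kind of check.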
```rust
        .set_total_records(num_outputs),
    flattened_stream,
    0..<FV as SharedValue>::BITS,
let flattened_stream = Box::pin(
```
Does this now work w/o multithreading, or do we still rely on it being on for aggregation?
I believe it is still required, perhaps even more so. We can add the buffered stream adapter if we really want things to work without multi-threading.
```diff
-        32 => oprf_ipa::<C, BA8, BA3, BA20, BA5, F>(ctx, input, aws).await,
-        64 => oprf_ipa::<C, BA8, BA3, BA20, BA6, F>(ctx, input, aws).await,
-        128 => oprf_ipa::<C, BA8, BA3, BA20, BA7, F>(ctx, input, aws).await,
+        8 => oprf_ipa::<C, BA8, BA3, HV, BA20, BA3, 256>(ctx, input, aws).await,
```
Is it worth having a generic histogram value parameter with everything else being hardcoded, or am I misunderstanding what's going on here?
I think that's a fair question -- I was modeling it after what was already there (`HV` is the bitwise equivalent of `F`). Eventually all of these (maybe excluding `SS`) need to be runtime-configurable (#953). I don't have a good sense of how similar that implementation will look to the type-generic version.
Yeah, this is an unsolved problem. We never made anything runtime configurable. We just re-built the code for every different combination of query params =).
Capturing some open items:
Double-posting a comment I made in a thread here for better visibility: My plan is to defer this [using
I think it's fine to defer the saturating addition as well.
(They are unused for the moment now that modulus conversion is gone, but they will come back at some point.)
#1046 has the promised follow-up changes.
This changes aggregation to work with shares in $\mathbb{F}_2$ rather than $\mathbb{F}_p$ (required to support DZKP-based malicious security), and vectorizes it.