-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Oblivious attribution protocol [part 1 - attribution] #108
Conversation
Rust check workflow failing because of a clippy error in a code comment. I'll fix this in the next commit when I address feedbacks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a short review, based on a few things that I noticed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. I recommend you coordinate with @martinthomson to see if you should land this first, or wait until he lands his first.
… not so subtle operations
…plement Iterator.
Rebase onto @martinthomson's renewed Step scheme. The most recent commit to be reviewed: 9e4d537 |
)); | ||
} | ||
|
||
let results = try_join_all(accumulation_futures).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martinthomson in many protocols there seem to be a pattern of generating large number of futures and then polling them all. Do you see it as a problem, given that no work is done until the first poll?
|
||
fn next(&mut self) -> &Self { | ||
self.count += 1; | ||
self.id = format!("{}_{}", self.name, self.count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there will be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should go away once we decide to move away from AsRef<str>
. I understand AsRef<str>
for steps is temporary for this prototype?
/// Accumulation step for Oblivious Attribution protocol. | ||
#[allow(dead_code)] | ||
pub struct AccumulateCredit<'a, F> { | ||
input: &'a Batch<AttributionInputRow<F>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this input format implies that somebody after doing the sort must convert the regular batch of sorted secret shares into this "expanded" format with helper bits. Is the plan to insert an "intermediate" step between sort and attribution to convert shares format or you want AccumulateCredit
to eventually implement the conversion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Helper bits will be a part of the sort output. IIUC, the sort protocol will output a vector or vectors of index which corresponds to the sorted locations of the input list. I think there will be an intermediate step where we might remove unnecessary fields like match keys and timestamps, but that is still unknown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have this strong suspicion that generating the "helper bits" within the MPC will be more efficient than trying to generate them in a separate step.
// iteration, and the interaction do not depend on the calculation results | ||
// of other elements, allowing the algorithm to be executed in parallel. | ||
|
||
let mut iteration_step = IterStep::new("iteration"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please correct me if I am wrong, but it seems we are using record_id
as step and iteration_step
as record_id. Is it intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iteration_step
and multiply_step
are used to create sub-contexts for each input row per step_size iteration.
This is the issue @benjaminsavage and I were talking about during the meeting. For each row in the input, I call the sub-protocol get_accumulated_credit
. Since this is a sub-protocol, I wanted to pass a reference to the ctx, with each row i
having RecordId::from(i)
. There are 4 multiplications in the sub-protocol, so let's give them steps mult_1, mult_2, .... With this set up, in the sub-protocol, the first multiplication's step should look like protocol/iteration_1/mult_1
and recrod_id=i
. But when I try to run that, I get this error:
thread 'protocol::attribution::accumulate_credit::tests::accumulate' panicked at 'Refined 'protocol' with step 'mult_1' twice', src/protocol/mod.rs:94:13
This is because I'm using the same mult_1 for all rows even though they have different record ids. It seems that we should concatenate record_id=i
at the end of step str to generate a unique step identifier.
As a workaround, I ended up calling another ctx.narrow()
for each row to give them unique steps (i.e. "row_i"), and each multiplication as separate records.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't have to do this. This is an issue with where that "check" happens in the code. Looking forward to an infra fix that helps avoid the need to do this!
// 1. Create credit and stop_bit vectors | ||
// These vectors are updated in each iteration to help accumulate values and determine when to stop accumulating. | ||
|
||
let one = Replicated::one(ctx.mesh().identity()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it is possible to make credit and stop bit part of the input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Credits and stop bits are the states/results generated by running this protocol, so it seems reasonable to create them inside this protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @taikiy - these are basically local variables to this protocol. They're all initialized to Replicated::one(), or one of the other inputs at the beginning of the loop (and over-written in each iteration) so there's no value in passing them in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all looking really good, with the exception of needing to work around that infra issue that prevents us from using the same step name on different records.
/// Accumulation step for Oblivious Attribution protocol. | ||
#[allow(dead_code)] | ||
pub struct AccumulateCredit<'a, F> { | ||
input: &'a Batch<AttributionInputRow<F>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have this strong suspicion that generating the "helper bits" within the MPC will be more efficient than trying to generate them in a separate step.
// 1. Create credit and stop_bit vectors | ||
// These vectors are updated in each iteration to help accumulate values and determine when to stop accumulating. | ||
|
||
let one = Replicated::one(ctx.mesh().identity()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @taikiy - these are basically local variables to this protocol. They're all initialized to Replicated::one(), or one of the other inputs at the beginning of the loop (and over-written in each iteration) so there's no value in passing them in.
// iteration, and the interaction do not depend on the calculation results | ||
// of other elements, allowing the algorithm to be executed in parallel. | ||
|
||
let mut iteration_step = IterStep::new("iteration"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't have to do this. This is an issue with where that "check" happens in the code. Looking forward to an infra fix that helps avoid the need to do this!
|
||
// first, calculate [successor.helper_bit * successor.trigger_bit] | ||
let mut b = ctx | ||
.multiply(RecordId::from(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems wrong. This should be a proper record_id, not just 0 or 1. This is related to the same infra issue from above - we need the ability to run N multiplications in parallel, with the same step, and only the record_id differing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is wrong, too. The only way I was able to make the code work was either like this, or create yet another step for each multiplication and pass record_id = 0 to all of them, which seems equally wrong. I'm looking forward to fixing this part when the infra is updated.
Addressed the latest PR comments. All should be good except for the steps part @benjaminsavage mentioned. Would it be possible to merge this and address the steps issue in a separate PR? |
OK, that sounds reasonable. We will fix this soon (once possible). In the meantime - let's land this and you can move on to the next protocol! |
Overview
This diff implements the first part (out of two - "attribution" and "aggregation") of the last touch oblivious attribution in a secure multiparty computation. Currently it supports "Last Touch" attribution only. Produced result will be used in the second part - "aggregation", which requires sorting of the list.
Details
Last touch attribution algorithm is described in details here:
https://github.com/patcg-individual-drafts/ipa/blob/main/IPA-End-to-End.md#oblivious-last-touch-attribution
First part of the protocol accesses and accumulates data of each node in a sorted list, and produces an intermediate result that has accumulated credits for each breakdown key. The document also describes how the logic can be optimized as follows:
This is what's implemented in
last_touch_attribution()
with multiplications done on a MPC. I believe there could be more optimizations if we put more constrains on the input, but for this round this basic/general implementation should be enough for us to measure baseline benchmarks.