
Oblivious attribution protocol [part 1 - attribution] #108

Merged (6 commits, Oct 18, 2022)

Conversation

@taikiy (Contributor) commented Oct 10, 2022

Overview

This diff implements the first of the two parts ("attribution" and "aggregation") of oblivious last touch attribution in a secure multiparty computation. Currently it supports "Last Touch" attribution only. The produced result will be used in the second part, "aggregation", which requires sorting the list.

Details

The last touch attribution algorithm is described in detail here:
https://github.com/patcg-individual-drafts/ipa/blob/main/IPA-End-to-End.md#oblivious-last-touch-attribution

The first part of the protocol accesses and accumulates the data of each node in a sorted list, and produces an intermediate result containing accumulated credits for each breakdown key. The document also describes how the logic can be optimized as follows:

flag = current.stop_bit * successor.helper_bit * successor.trigger_bit
current.credit = current.credit + flag * successor.credit
current.stop_bit = flag * successor.stop_bit

This is what's implemented in last_touch_attribution(), with the multiplications done in MPC. I believe there could be more optimizations if we put more constraints on the input, but for this round this basic/general implementation should be enough for us to measure baseline benchmarks.
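For illustration, here is a minimal sketch of that per-node update over clear (non-secret-shared) values. The Row struct, its field names, and the single forward pass are hypothetical stand-ins; in the actual protocol the values are secret shares, the three products above are MPC multiplications, and credit/stop_bit are initialized by the protocol before the loop.

// A plain-Rust sketch of the accumulation formulas above, on clear values.
// Fields mirror the pseudocode; semantics are as described in the linked doc.
#[derive(Clone, Copy)]
struct Row {
    helper_bit: u8,
    trigger_bit: u8,
    credit: u32,
    stop_bit: u8,
}

fn accumulate_last_touch(rows: &mut [Row]) {
    // Each element looks at its immediate successor in the sorted list.
    for i in 0..rows.len().saturating_sub(1) {
        let successor = rows[i + 1];
        let current = &mut rows[i];
        // flag = current.stop_bit * successor.helper_bit * successor.trigger_bit
        let flag = current.stop_bit * successor.helper_bit * successor.trigger_bit;
        // current.credit = current.credit + flag * successor.credit
        current.credit += u32::from(flag) * successor.credit;
        // current.stop_bit = flag * successor.stop_bit
        current.stop_bit = flag * successor.stop_bit;
    }
}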

@taikiy (Contributor, Author) commented Oct 10, 2022

The Rust check workflow is failing because of a clippy error in a code comment. I'll fix this in the next commit when I address feedback.

@martinthomson (Member) left a comment

Just a short review, based on a few things that I noticed.

10 review threads on src/protocol/attribute.rs (outdated, resolved)
@benjaminsavage (Collaborator) left a comment

This looks good to me. I recommend you coordinate with @martinthomson to see if you should land this first, or wait until he lands his first.

@taikiy (Contributor, Author) commented Oct 17, 2022

Rebased onto @martinthomson's renewed Step scheme. The most recent commit to be reviewed is 9e4d537.

4 review threads on src/protocol/batch.rs and 1 on src/protocol/attribution/mod.rs (resolved)
));
}

let results = try_join_all(accumulation_futures).await?;
Collaborator commented:

@martinthomson, in many protocols there seems to be a pattern of generating a large number of futures and then polling them all. Do you see that as a problem, given that no work is done until the first poll?
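For reference, a minimal self-contained sketch of the pattern in question (function names are hypothetical, using the futures crate): the futures are only constructed up front, and no work happens until try_join_all polls them.

use futures::future::try_join_all;

// Hypothetical stand-in for one unit of protocol work.
async fn work(i: usize) -> Result<usize, std::io::Error> {
    Ok(i * 2)
}

async fn run_all(n: usize) -> Result<Vec<usize>, std::io::Error> {
    // Build every future first; futures are lazy, so this only allocates the
    // Vec and the future state, it does not start any computation.
    let futures: Vec<_> = (0..n).map(work).collect();
    // All futures are driven concurrently once awaited here.
    try_join_all(futures).await
}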


fn next(&mut self) -> &Self {
    self.count += 1;
    self.id = format!("{}_{}", self.name, self.count);
Collaborator commented:

There will be $O(N)$ allocations here; we probably want to mark this as a TODO to fix. I think @martinthomson was planning to add a cheap way to create a unique step id for release builds.
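To make the allocation point concrete, here is a toy stand-in (not the real IterStep) showing that every next() call builds a fresh String on the heap:

// Toy stand-in, not the real type: each next() call allocates a new String.
struct IterStep {
    name: &'static str,
    count: usize,
    id: String,
}

impl IterStep {
    fn new(name: &'static str) -> Self {
        Self { name, count: 0, id: String::new() }
    }

    fn next(&mut self) -> &str {
        self.count += 1;
        // One heap allocation per call; over N rows that is O(N) allocations.
        // A release-build scheme could derive a numeric step id instead.
        self.id = format!("{}_{}", self.name, self.count);
        &self.id
    }
}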

@taikiy (Contributor, Author) replied Oct 18, 2022:

This should go away once we decide to move away from AsRef<str>. My understanding is that AsRef<str> for steps is temporary for this prototype?

/// Accumulation step for Oblivious Attribution protocol.
#[allow(dead_code)]
pub struct AccumulateCredit<'a, F> {
    input: &'a Batch<AttributionInputRow<F>>,
Collaborator commented:

This input format implies that somebody, after doing the sort, must convert the regular batch of sorted secret shares into this "expanded" format with helper bits. Is the plan to insert an "intermediate" step between sort and attribution to convert the share format, or do you want AccumulateCredit to eventually implement the conversion?

@taikiy (Contributor, Author) replied:

Helper bits will be part of the sort output. IIUC, the sort protocol will output a vector (or vectors) of indices which correspond to the sorted locations of the input list. I think there will be an intermediate step where we might remove unnecessary fields like match keys and timestamps, but that is still unknown.

Collaborator commented:

I have this strong suspicion that generating the "helper bits" within the MPC will be more efficient than trying to generate them in a separate step.

// iteration, and the interaction do not depend on the calculation results
// of other elements, allowing the algorithm to be executed in parallel.

let mut iteration_step = IterStep::new("iteration");
Collaborator commented:

Please correct me if I am wrong, but it seems we are using record_id as the step and iteration_step as the record_id. Is this intended?

@taikiy (Contributor, Author) replied Oct 18, 2022:

iteration_step and multiply_step are used to create sub-contexts for each input row in each step_size iteration.

This is the issue @benjaminsavage and I were talking about during the meeting. For each row in the input, I call the sub-protocol get_accumulated_credit. Since this is a sub-protocol, I wanted to pass a reference to the ctx, with each row i having RecordId::from(i). There are 4 multiplications in the sub-protocol, so let's give them steps mult_1, mult_2, .... With this setup, in the sub-protocol, the first multiplication's step should look like protocol/iteration_1/mult_1 with record_id = i. But when I try to run that, I get this error:

thread 'protocol::attribution::accumulate_credit::tests::accumulate' panicked at 'Refined 'protocol' with step 'mult_1' twice', src/protocol/mod.rs:94:13

This is because I'm using the same mult_1 step for all rows even though they have different record ids. It seems that we would have to concatenate record_id = i to the end of the step string to generate a unique step identifier.

As a workaround, I ended up calling another ctx.narrow() for each row to give them unique steps (i.e. "row_i"), and treated each multiplication as a separate record.
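As an aside, here is a toy model of that behavior (not the real ipa Context API; names and types are purely illustrative): a narrow() that rejects reusing the same step path, and the per-row narrowing workaround that avoids the panic.

use std::cell::RefCell;
use std::collections::HashSet;
use std::rc::Rc;

#[derive(Clone)]
struct ToyCtx {
    path: String,
    used: Rc<RefCell<HashSet<String>>>,
}

impl ToyCtx {
    fn new() -> Self {
        Self {
            path: "protocol".to_string(),
            used: Rc::new(RefCell::new(HashSet::new())),
        }
    }

    // Returns a narrowed sub-context; panics if the same step path is created
    // twice, mimicking the "Refined ... with step ... twice" check.
    fn narrow(&self, step: &str) -> Self {
        let path = format!("{}/{}", self.path, step);
        assert!(
            self.used.borrow_mut().insert(path.clone()),
            "Refined '{}' with step '{}' twice",
            self.path,
            step
        );
        Self { path, used: Rc::clone(&self.used) }
    }
}

fn main() {
    let ctx = ToyCtx::new();
    // Workaround from the comment above: one sub-context per row, so the same
    // multiplication step name ("mult_1") is unique under each row's context.
    for i in 0..3 {
        let row_ctx = ctx.narrow(&format!("row_{i}"));
        let _m1 = row_ctx.narrow("mult_1");
        let _m2 = row_ctx.narrow("mult_2");
    }
    // Narrowing "mult_1" twice directly under `ctx` would hit the panic instead.
}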

Collaborator commented:

We shouldn't have to do this. This is an issue with where that "check" happens in the code. Looking forward to an infra fix that helps avoid the need to do this!

// 1. Create credit and stop_bit vectors
// These vectors are updated in each iteration to help accumulate values and determine when to stop accumulating.

let one = Replicated::one(ctx.mesh().identity());
Collaborator commented:

I wonder if it is possible to make credit and stop bit part of the input?

@taikiy (Contributor, Author) replied Oct 18, 2022:

Credits and stop bits are the states/results generated by running this protocol, so it seems reasonable to create them inside this protocol.

Collaborator commented:

I agree with @taikiy - these are basically local variables to this protocol. They're all initialized to Replicated::one(), or one of the other inputs at the beginning of the loop (and over-written in each iteration) so there's no value in passing them in.

@benjaminsavage (Collaborator) left a comment

This is all looking really good, with the exception of needing to work around that infra issue that prevents us from using the same step name on different records.


// first, calculate [successor.helper_bit * successor.trigger_bit]
let mut b = ctx
    .multiply(RecordId::from(0))
Collaborator commented:

This seems wrong. This should be a proper record_id, not just 0 or 1. This is related to the same infra issue from above - we need the ability to run N multiplications in parallel, with the same step, and only the record_id differing.

@taikiy (Contributor, Author) replied:

I agree this is wrong, too. The only way I was able to make the code work was either like this, or to create yet another step for each multiplication and pass record_id = 0 to all of them, which seems equally wrong. I'm looking forward to fixing this part when the infra is updated.

@taikiy (Contributor, Author) commented Oct 18, 2022

Addressed the latest PR comments. All should be good except for the steps part @benjaminsavage mentioned. Would it be possible to merge this and address the steps issue in a separate PR?

@benjaminsavage (Collaborator) commented:

OK, that sounds reasonable. We will fix this as soon as possible. In the meantime, let's land this and you can move on to the next protocol!

@benjaminsavage benjaminsavage merged commit 9fe8280 into private-attribution:main Oct 18, 2022
@taikiy taikiy deleted the attribution branch October 18, 2022 08:36