IPA Biweekly Meeting Agenda Request - Additional Sharding Designs #75

Open
bmcase opened this issue May 17, 2023 · 6 comments
@bmcase
Collaborator

bmcase commented May 17, 2023

Agenda+: What do you want to discuss?

@marianapr and @schoppmp have been thinking about additional ideas for sharding that could benefit IPA scalability. We would like to let them present and get feedback on this direction in a future PATCG/IPA call.

To raise a few questions about their design in advance:

  1. What are the setting and assumptions? Can it match IPA's malicious, honest-majority setting?
  2. Can it work as a wrapper around IPA's main MPC queries, in the sense that it partitions the data into shards and each shard runs IPA's MPC query?
  3. Does it work with stable sharding on timestamps, or would sorting on timestamp be needed afterwards?
  4. How does it prevent leakage, such as cardinality attacks on matchkeys, and how does it ensure that shard sizes are differentially private?

Time

We can probably spend most of a meeting on this.

Issue Link

Other Links

A couple other issues related to sharding for IPA are:
#35
#49

@bmcase bmcase added the agenda+ Agenda request for Biweekly Call label May 17, 2023
@bmcase
Collaborator Author

bmcase commented May 17, 2023

@marianapr and @schoppmp, would it work for you to present the sharding designs for IPA that you've been looking into at next week's PATCG/IPA call (next Tuesday, 3pm PT)? Looking forward to hearing about them and discussing.

@schoppmp

schoppmp commented Jun 2, 2023

I added an agenda request to present our protocol at the next PATCG meeting: patcg/meetings#125

@schoppmp

schoppmp commented Jun 2, 2023

Some quick responses to these:

What are the setting and assumptions? Can it match IPA's malicious, honest-majority setting?

The model is 3PC with an honest majority and malicious parties. We wrote our protocol assuming that inputs (matchkeys and payloads) are encrypted by the user (not the report collector). An alternative that still works, and probably fits IPA's query model better, is to assume an MPC functionality ShareAndEncryptPayload(x) = Enc(P1, x1), Enc(P2, x2), Enc(P3, x3), where x1, x2, x3 are fresh secret shares of x generated in MPC, with x1 + x2 + x3 = x. The report collector could invoke it for each payload before running the partitioning protocol.
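
To make the shape of ShareAndEncryptPayload concrete, here is a minimal sketch of the sharing step only. It runs in the clear on one machine, whereas the functionality described above generates the shares inside MPC. The ring size `MODULUS`, the function names, and the `encrypt_for` callback are assumptions for illustration; no real encryption scheme is implemented.

```python
import secrets

# Assumed share ring; the thread does not specify one.
MODULUS = 2**32


def share(x, modulus=MODULUS):
    """Split x into three fresh additive shares with x1 + x2 + x3 = x (mod modulus)."""
    x1 = secrets.randbelow(modulus)
    x2 = secrets.randbelow(modulus)
    x3 = (x - x1 - x2) % modulus
    return x1, x2, x3


def reconstruct(shares, modulus=MODULUS):
    """Recombine additive shares."""
    return sum(shares) % modulus


def share_and_encrypt_payload(x, encrypt_for):
    """Return (Enc(P1, x1), Enc(P2, x2), Enc(P3, x3)).

    encrypt_for(party, value) is a hypothetical caller-supplied callback that
    encrypts value under the named party's public key.
    """
    parties = ("P1", "P2", "P3")
    return tuple(encrypt_for(p, s) for p, s in zip(parties, share(x)))


# Placeholder "encryption" that only tags the share with its recipient.
tagged = share_and_encrypt_payload(7, lambda party, value: (party, value))
```

Each party would decrypt only its own share, so no single party learns x.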

Can it work as a wrapper around IPA's main MPC queries, in the sense that it partitions the data into shards and each shard runs IPA's MPC query?

Yes, that's the idea.
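
A rough sketch of what such a wrapper could look like, assuming records are routed to shards by a keyed PRF of the matchkey so that all records for one matchkey land in the same shard. HMAC-SHA256 stands in for the OPRF here and is not oblivious; `shard_of`, `sharded_query`, and `toy_query` are hypothetical names, and `toy_query` is only a stand-in for an IPA query whose per-shard outputs can be added.

```python
import hashlib
import hmac
from collections import Counter, defaultdict


def shard_of(key, matchkey, num_shards):
    """Map a matchkey to a shard index via a keyed PRF (stand-in for an OPRF)."""
    digest = hmac.new(key, matchkey.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def sharded_query(records, key, num_shards, run_query):
    """Partition records by matchkey, run the query per shard, and sum the results."""
    shards = defaultdict(list)
    for record in records:
        shards[shard_of(key, record["matchkey"], num_shards)].append(record)
    total = Counter()
    for shard_records in shards.values():
        total.update(run_query(shard_records))  # adds per-breakdown counts
    return total


def toy_query(records):
    """Stand-in for the per-shard MPC query: count records per breakdown key."""
    return Counter(r["breakdown"] for r in records)


records = [
    {"matchkey": "a", "breakdown": "x"},
    {"matchkey": "b", "breakdown": "x"},
    {"matchkey": "a", "breakdown": "y"},
    {"matchkey": "c", "breakdown": "y"},
]
result = sharded_query(records, b"demo-key", 4, toy_query)
```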

Does it work with stable sharding on timestamps or would sorting on timestamp be needed after?

The partitioning is shuffling-based and therefore not stable. However, the second variant of our protocol can compute an OPRF with a large, sparse output domain, which would allow sorting by matchkey in the clear. The MPC in each partition would then only need to sort by timestamp, which may be more efficient than sorting by matchkey in MPC.
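
As an illustration of that data flow, the sketch below groups records by a pseudonym of the matchkey in the clear and then sorts only by timestamp within each group. HMAC-SHA256 is used as a stand-in PRF and is not oblivious; in the protocol described above the OPRF would be evaluated jointly by the helper parties, and the per-group timestamp sort would run in MPC. The function names and record layout are assumptions.

```python
import hashlib
import hmac
from collections import defaultdict


def pseudonym(key, matchkey):
    """Stand-in for an OPRF with a large, sparse output domain."""
    return hmac.new(key, matchkey.encode(), hashlib.sha256).hexdigest()


def group_then_sort(records, key):
    """Group (matchkey, timestamp, payload) rows by pseudonym, then sort by timestamp.

    Grouping and ordering by pseudonym needs no MPC; only the timestamp sort
    inside each group would.
    """
    groups = defaultdict(list)
    for matchkey, timestamp, payload in records:
        groups[pseudonym(key, matchkey)].append((timestamp, payload))
    return [
        (tag, sorted(rows, key=lambda row: row[0]))
        for tag, rows in sorted(groups.items())
    ]


rows = [("u1", 30, "c"), ("u2", 10, "a"), ("u1", 20, "b"), ("u2", 5, "d")]
grouped = group_then_sort(rows, b"demo-key")
```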

How does it prevent leakage, such as cardinality attacks on matchkeys, and how does it ensure that shard sizes are differentially private?

Shard sizes are made private by inserting dummies, either into all buckets (small OPRF domain) or as in this paper (large OPRF domain). The dummies would have to be filtered out in MPC in each partition.
As long as all inputs are encrypted by the client (or the helper parties, see above) such that only the helper parties can decrypt, duplicating unknown matchkeys should not be possible.
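
For the small-domain case, dummy padding of every bucket could be sketched as follows. The noise distribution (a shifted two-sided geometric, clamped at zero) is an assumption; the thread does not say which distribution the protocol uses. The `is_dummy` flag is visible here only for illustration: in the protocol the dummies would be indistinguishable from real records and would be filtered out inside MPC.

```python
import math
import random


def two_sided_geometric(alpha, rng):
    """Sample Z with P(Z = k) proportional to alpha**abs(k), for 0 < alpha < 1."""
    def geometric():
        u = 1.0 - rng.random()  # uniform in (0, 1]
        return int(math.floor(math.log(u) / math.log(alpha)))
    return geometric() - geometric()


def pad_buckets(buckets, shift, alpha, rng):
    """Add max(0, shift + noise) dummies to every bucket, including empty ones."""
    padded = {}
    for bucket_id, records in buckets.items():
        n_dummies = max(0, shift + two_sided_geometric(alpha, rng))
        rows = [dict(r, is_dummy=False) for r in records]
        rows += [{"is_dummy": True} for _ in range(n_dummies)]
        rng.shuffle(rows)  # hide where the dummies sit within the bucket
        padded[bucket_id] = rows
    return padded


def filter_dummies(rows):
    """Stand-in for the in-MPC filtering step."""
    return [r for r in rows if not r["is_dummy"]]


rng = random.Random(0)
buckets = {0: [{"v": 1}, {"v": 2}], 1: [], 2: [{"v": 3}]}
padded = pad_buckets(buckets, shift=20, alpha=0.5, rng=rng)
```

`buckets` has to list every bucket in the domain, including empty ones, because a bucket that receives no dummies would reveal that it is empty.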

@bmcase
Collaborator Author

bmcase commented Jun 9, 2023

An alternative way that still works and probably fits IPA's query model better is to assume an MPC functionality ShareAndEncryptPayload(x) = Enc(P1, x1), Enc(P2, x2), Enc(P3, x3), where x1 + x2 + x3 = x are fresh secret-shares of x generated in MPC

Yes, that could be done, but I would be concerned about the cost of doing an HE encryption inside the MPC. Encryption by the client might be better, but we'd need to see how much more expensive that would be. I think the next step overall should be to sketch out some performance estimates for these different approaches.

However, the second variant of our protocol can compute an OPRF with large, sparse output domain, which would allow sorting by matchkey in the clear. ... Shard sizes are made private by inserting dummies, either to all buckets (small OPRF domain), or as in this paper (large OPRF domain).

I think my concern here is that this paper assumes a sensitivity bound of 1, whereas in IPA we may be able to get a reasonable upper bound on the sensitivity, but it will be a lot larger than 1. For making shard sizes differentially private, I don't think a larger sensitivity bound leads to too many extra dummy records (as in #35), but if we want to keep the granularity at the matchkey level, I'm concerned that the amount of noise needed to get a decent DP bound would be very large.
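
To get a rough feel for how the sensitivity bound affects the number of dummies, the sketch below sizes the shift for a shifted two-sided geometric noise with parameter alpha = exp(-epsilon / sensitivity). The noise model and the sizing rule are assumptions for illustration and are not taken from the paper or from #35: the shift is the smallest value for which the noisy dummy count falls below the sensitivity with probability at most delta. This is a back-of-the-envelope estimate, not a privacy proof.

```python
import math


def tail_probability(sensitivity, epsilon, shift):
    """P(shift + Z < sensitivity) for two-sided geometric Z with alpha = exp(-eps/sens).

    Valid for shift >= sensitivity.
    """
    alpha = math.exp(-epsilon / sensitivity)
    m = shift - sensitivity + 1
    return alpha**m / (1.0 + alpha)


def dummy_shift(sensitivity, epsilon, delta):
    """Smallest shift with tail_probability(sensitivity, epsilon, shift) <= delta."""
    alpha = math.exp(-epsilon / sensitivity)
    m = math.ceil(math.log(1.0 / (delta * (1.0 + alpha))) / math.log(1.0 / alpha))
    return sensitivity - 1 + max(1, m)


# Expected dummies per bucket are roughly the shift, so a total is about buckets * shift.
shifts = {s: dummy_shift(s, epsilon=1.0, delta=1e-6) for s in (1, 10, 100)}
```

Under this model the shift grows roughly linearly in the sensitivity bound. That stays cheap when it is paid once per shard, and becomes expensive if something similar had to be paid per matchkey-level bucket; the large-domain technique in the paper may behave differently.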

@marianapr

I think the paper also has estimates with epsilon 5. We can get estimates for other epsilons as well (I would be surprised if the number of dummies turned out to be very large). I think the sharing of the payload can be done after the shuffle.

@bmcase
Collaborator Author

bmcase commented Jul 10, 2023

The IPA team has done some internal brainstorming on this and on how we might get a hard cap on the sensitivity, that is, on how many times a matchkey can occur. We have another agenda item for the first 45 minutes tomorrow, but we can use the last 15 minutes of the call to give an update if @schoppmp, @marianapr, or other interested folks are there.
