
Differential privacy primitives use insecure noise generation #23002

Open · TedTed opened this issue Jun 13, 2024 · 2 comments
TedTed commented Jun 13, 2024

Hi folks,

This method adds noise to a sum for the purpose of enforcing differential privacy (as described in a recent talk at PEPR '24). The noise is generated by naively calling java.util.Random.nextGaussian, which makes it vulnerable to the floating-point attacks described in this 2012 paper, or (since this is Gaussian noise rather than Laplace noise) in this paper or this one.

This could allow an attacker to extract more information from the output data than they should be able to, in potentially catastrophic ways: precision-based attacks, for example, are very simple and allow an attacker to perfectly distinguish between true inputs of 0 and 1 more than 25% of the time. I have not gone through the trouble of actually installing Presto and building a PoC, but this is such a textbook example of a vulnerable implementation that I hope you'll take this seriously even without one.
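To make the problem concrete, here is a minimal sketch (my own illustration, not Presto code; the 2^40 magnitude and the seed are arbitrary) of why the granularity of floating-point noise depends on the secret value it is added to, which is exactly the structure that precision-based attacks exploit:

```java
import java.util.Random;

public class NoiseGranularity {
    /** Returns true iff the noise g survives addition to base bit-for-bit.
     *  The subtraction below is exact for these operands, so this really
     *  tests whether the addition rounded g. */
    static boolean noiseSurvives(double base, double g) {
        double released = base + g;   // what a naive noisy-sum primitive outputs
        return released - base == g;
    }

    public static void main(String[] args) {
        double g = new Random(7).nextGaussian();
        // Near 0, the noise keeps its full 52-bit mantissa precision...
        System.out.println(noiseSurvives(0.0, g));
        // ...but near 2^40, doubles are spaced ~2^-12 apart, so the noise is
        // rounded onto a coarse grid that depends on the true sum's magnitude.
        System.out.println(noiseSurvives((double) (1L << 40), g));
    }
}
```

An attacker who knows which grids the candidate inputs induce can check which grid the released value lies on.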

duykienvp (Contributor) commented
Thanks @TedTed for bringing this up.
We are aware of this attack. I just want to clarify that the intent of these noisy functions was NOT to be fully differentially private.
There is some documentation about this issue: https://github.com/prestodb/presto/pull/22715/files#diff-7461e30f5827d33fc08a54932f0a32b06827971adf505b00dfd531351c824891R185
but it has not been released to prestodb.io yet. Some of the limitations mentioned in that doc are just the nature of the Presto engine, so we ask practitioners who want to build a DP system to consult with suitable technical experts first.
And we also welcome anyone to help us address these limitations.

Thanks again

TedTed commented Jun 26, 2024

Hi Kien,

I'm going to be honest — I find this response disheartening. You gave a talk at PEPR '24 explaining that you built differential privacy support in Presto, that parts of the code were open-source (even though the rewriter isn't), and that this was used for production use cases across Meta platforms (frustratingly, without giving any more details).

Now you're telling me that this isn't actually trying to implement differential privacy, and that using it for a DP system would require consulting with technical experts. Which experts are you talking to, and why are they not giving you advice such as "first off, make your noise addition primitives safer"? The person who wrote the original paper about floating-point attacks works at Meta. Have you asked him for guidance when building this? If the goal of this work is not to be used to implement differential privacy, then what is the purpose of this code, and why was your PEPR talk suggesting otherwise?

You're saying you welcome help to address these limitations. A very very basic first step would be to fix the noise generation logic, for example by using the primitives from GoogleDP, or re-implementing interval refining in Java. You also probably want to fix this bit of code while you're at it — this is almost certainly not the way you want to compute a DP average. But a lot more things can go wrong when implementing DP, and nobody will be able to help you as long as the rewriter logic is not open-source.
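As a rough illustration of what "safer noise addition" can look like, here is a sketch of the grid-rounding idea behind Mironov's snapping mechanism and the granularity-based noise in GoogleDP (my own sketch, not a vetted implementation; the parameters are placeholders, and a real fix needs a secure sampler plus a privacy analysis that accounts for the rounding):

```java
import java.security.SecureRandom;

/** Sketch only: release noisy sums on a fixed power-of-two grid so that the
 *  low-order bits of the output no longer depend on the secret input. */
public class GridNoise {
    private static final SecureRandom RNG = new SecureRandom();

    /** Round x to the nearest multiple of granularity (assumed to be a power
     *  of two, so the division and multiplication below are both exact). */
    static double roundToGrid(double x, double granularity) {
        return Math.rint(x / granularity) * granularity;
    }

    static double noisySum(double clampedSum, double sigma, double granularity) {
        // Placeholder sampler: a real fix would use a securely sampled,
        // correctly discretized distribution rather than nextGaussian.
        double noisy = clampedSum + sigma * RNG.nextGaussian();
        return roundToGrid(noisy, granularity);
    }
}
```

Because every released value lands on the same fixed grid regardless of the input, the attacker can no longer use the output's floating-point granularity as a distinguisher.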
