Deanonymizable/extractable data #7
Thanks @FishmanL for your comment. Would you mind elaborating on the search-decision reduction argument, and suggesting how to mitigate this shortcoming? Thank you 🙏
Sure: DP = differential privacy. Search-decision: you repeatedly submit your location at slightly different places to triangulate the precise location of every person with the virus, first executing a coordinated attack in which you drop new pins on a grid across the area you want to search in order to locate every case. You can mitigate this by adding a small amount of noise to each person's initial location, and increasing the amount of added noise over time.
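The noise-addition idea can be sketched with a Laplace mechanism on coordinates. This is a minimal illustration, not a vetted DP implementation: `blur_location`, the 100 m sensitivity default, and the independent per-coordinate Laplace draws are all assumptions for the example (real geo-indistinguishability schemes use a 2-D planar Laplace distribution).

```python
import math
import random

def laplace_noise(scale):
    """Inverse-CDF sample from a zero-mean Laplace distribution.

    scale = sensitivity / epsilon; larger scale means more noise.
    """
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def blur_location(lat, lon, epsilon, sensitivity_m=100.0):
    """Perturb a (lat, lon) pair with Laplace noise calibrated in meters.

    Sketch-level assumptions: noise is drawn independently per coordinate,
    and meters are converted to degrees with a rough constant
    (~111 km per degree of latitude).
    """
    scale_m = sensitivity_m / epsilon
    dlat = laplace_noise(scale_m) / 111_000.0
    dlon = laplace_noise(scale_m) / (111_000.0 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```

Increasing the noise over time, as suggested above, would correspond to growing `sensitivity_m` (or shrinking `epsilon`) as a report ages.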
@FishmanL we are currently working on a limited MVP where we can build this. It would be great to add this to the development roadmap and implement it as we roll out. Are there off-the-shelf DP libraries in Rust that you can point us to?
None that I know of. |
Thanks for your insights @FishmanL. I would like to challenge your assumptions here, because I question whether what you propose applies to the workflow we envision, which is as follows:
So my question is: how can she obtain any information about any user in the set, if those individuals take the precaution of not including home addresses or other locations that can uniquely identify them by themselves? I understand how DP works, but I don't think it applies in the data flow we are envisioning. Thoughts?
A constant circle is no better than a single point, since you can figure out the bounds of the circle with enough queries and just note the center. (In fact, no deterministic answering procedure solves this issue; you need randomness.) The same is true for time. This doesn't fully deanonymize users on its own; it just gets you exact times and locations (the number of individuals is also recoverable by repeated queries, which lets you split a circle up into multiple separate ones). How you get from there to actual users is... let's say 'left to the reader' (in smaller towns it's trivial, in cities it's harder).
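The "figure out the bounds with enough queries" attack is easy to demonstrate in one dimension. Below, a hypothetical deterministic exposure oracle ("is any case within radius r of point x?") is bisected from both sides to recover the hidden point; all names (`make_oracle`, `find_edge`) and the numbers are invented for the sketch:

```python
def make_oracle(secret, radius):
    """Deterministic exposure check: 'is the hidden case within radius of x?'"""
    return lambda x: abs(x - secret) <= radius

def find_edge(oracle, inside, outside, iters=60):
    """Bisect between a point the oracle accepts and one it rejects
    to locate one boundary of the 'yes' interval."""
    for _ in range(iters):
        mid = (inside + outside) / 2.0
        if oracle(mid):
            inside = mid
        else:
            outside = mid
    return (inside + outside) / 2.0

# Attacker flow: a coarse grid scan first finds any accepted point
# (here 40.0); two bisections then pin down both edges of the circle,
# and the midpoint of the edges is the hidden location.
oracle = make_oracle(secret=37.2, radius=5.0)
left = find_edge(oracle, inside=40.0, outside=0.0)     # edge near 32.2
right = find_edge(oracle, inside=40.0, outside=100.0)  # edge near 42.2
recovered = (left + right) / 2.0
```

Because every query is answered deterministically, the attacker's error halves with each query; only a randomized answering procedure breaks this.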
@FishmanL would rate-limiting queries and expiring old data (i.e., anything older than 14 days) reduce the risk here? As I understand your model, an attacker is essentially creating a series of fake data sets and modifying the time and location slightly every time to "scan" for matches. This could be addressed by, say, only allowing once-per-day-per-user updates, or possibly trying to identify and limit this behavior in the client code.
I mean, I don't see any way to handle scaling this to lots of users (which is the only way it's really useful) without risking 'an attacker makes lots of fake users that are near each other' |
Yeah, it's worth looking into the best options for making it difficult or expensive to create a ton of fake users without compromising privacy. We could require sign-on with some service that provides a layer of Sybil protection.
This platform as currently defined allows for extraction of the location of everyone else who has entered data, via search-decision reductions; it needs DP.