Deanonymizable/extractable data #7

Open
FishmanL opened this issue Mar 24, 2020 · 10 comments

@FishmanL

This platform as currently defined allows an attacker to extract the location of everyone else who has entered data, via a search-decision reduction; it needs DP.

@lacabra
Contributor

lacabra commented Mar 24, 2020

Thanks @FishmanL for your comment. Can you spell out DP for clarity, please?

Would you also mind elaborating on the search-decision reduction argument, and suggesting how to mitigate this shortcoming?

Thank you 🙏

@FishmanL
Author

Sure: DP = differential privacy

Search-decision: you first execute a coordinated attack, dropping new pins on a grid across the area you want to search in order to locate every case, then you repeatedly enter your location at slightly different points to figure out / triangulate the precise location of every person with the virus.
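
Roughly, the probe loop looks like this (a rough sketch in Rust; `query_match` stands in for whatever "am I at risk?" lookup the service exposes, and its name and signature are made up here):

```rust
/// Sketch of the grid-scan probe. `query_match` is a placeholder for the
/// service's "was an infected person here?" check; the real API will differ.
fn scan_area(
    lat_range: (f64, f64),
    lon_range: (f64, f64),
    step: f64,
    timestamp: u64,
    query_match: impl Fn(f64, f64, u64) -> bool,
) -> Vec<(f64, f64)> {
    let mut hits = Vec::new();
    let mut lat = lat_range.0;
    while lat <= lat_range.1 {
        let mut lon = lon_range.0;
        while lon <= lon_range.1 {
            // Each probe is a fake "am I at risk?" query dropped at a grid point.
            if query_match(lat, lon, timestamp) {
                hits.push((lat, lon));
            }
            lon += step;
        }
        lat += step;
    }
    // Shrinking `step` around each hit triangulates individual locations.
    hits
}
```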

You can mitigate this by adding a small amount of noise to each person's initial location, and increasing the amount of added noise over time.
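
Something like this on the client side, before the location is ever uploaded (a minimal sketch assuming the `rand` crate; per-coordinate Laplace noise is a simplification of a proper planar-Laplace / geo-indistinguishability mechanism, and the function names and scale parameters are illustrative):

```rust
use rand::Rng;

/// One draw from a Laplace(0, scale) distribution via inverse-CDF sampling.
fn laplace(scale: f64, rng: &mut impl Rng) -> f64 {
    let u: f64 = rng.gen_range(-0.5..0.5);
    -scale * u.signum() * (1.0 - 2.0 * u.abs()).ln()
}

/// Perturb a reported (lat, lon) before it leaves the client.
/// `base_scale` is in degrees; older points get proportionally more noise.
fn noisy_location(lat: f64, lon: f64, base_scale: f64, age_days: f64) -> (f64, f64) {
    let mut rng = rand::thread_rng();
    let scale = base_scale * (1.0 + age_days); // noise grows as the data ages
    (lat + laplace(scale, &mut rng), lon + laplace(scale, &mut rng))
}
```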

@cankisagun
Contributor

@FishmanL we are currently working on a limited MVP on top of which we can build this. It would be great to add it to the development roadmap and implement it as we roll out.

Are there off-the-shelf DP libraries in Rust that you can point us to?

@FishmanL
Author

None that I know of.

@lacabra
Contributor

lacabra commented Mar 25, 2020

Thanks for your insights @FishmanL. I would like to challenge your assumptions here, because I question whether what you propose applies to the workflow we envision, which is as follows:

  1. Users who tested positive add timestamped locations to the dataset inside the enclave.
  2. An attacker who wants to de-anonymize data knows neither the number of users who have uploaded data nor the number of locations each user has entered. When the attacker queries the enclave for a match, she will get a timestamped location where at least one individual who tested positive has been, within a parametrizable radius r (which we can set to be no smaller than a given threshold) and a time interval t (again no smaller than a threshold) for a given time and location, but she will not know whether there was one individual or more, nor will she get any other user id for that match (see the sketch after this list).
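
For concreteness, the match check in step 2 amounts to something like this (a sketch only; the point struct, haversine helper and thresholds are illustrative, not the actual enclave code):

```rust
/// A timestamped location as stored in the enclave (illustrative layout).
struct Point {
    lat: f64,
    lon: f64,
    ts: i64, // unix seconds
}

/// Great-circle distance in metres between two points (haversine formula).
fn haversine_m(a: &Point, b: &Point) -> f64 {
    let r = 6_371_000.0_f64;
    let (la1, la2) = (a.lat.to_radians(), b.lat.to_radians());
    let dla = (b.lat - a.lat).to_radians();
    let dlo = (b.lon - a.lon).to_radians();
    let h = (dla / 2.0).sin().powi(2) + la1.cos() * la2.cos() * (dlo / 2.0).sin().powi(2);
    2.0 * r * h.sqrt().asin()
}

/// A query matches if some stored point is within radius_m metres and dt_secs seconds.
fn matches(query: &Point, dataset: &[Point], radius_m: f64, dt_secs: i64) -> bool {
    dataset
        .iter()
        .any(|p| (p.ts - query.ts).abs() <= dt_secs && haversine_m(query, p) <= radius_m)
}
```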

So my question is: how can she leverage this to obtain any information about any user in the set, if those individuals take the precaution of not including home addresses or other locations that could uniquely identify them on their own? I understand how DP works, but I don't think it applies to the data flow we are envisioning.

Thoughts?

@FishmanL
Author

A constant circle is no better than a single point, since with enough queries you can figure out the bounds of the circle and just note the center. (In fact, no deterministic answering procedure solves this issue; you need randomness.) The same is true for time.
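
Concretely, with a deterministic yes/no answer you can binary-search for where the answer flips and recover the circle (sketch only, reusing the same hypothetical `query_match` as in the grid sketch above):

```rust
/// Find one longitude where the deterministic yes/no answer flips, to within
/// `eps` degrees. Do the same for the opposite edge and average the two to get
/// the circle centre's longitude; repeat along latitude for the full centre.
fn boundary_lon(
    lat: f64,
    lon_inside: f64,  // any longitude known to answer "yes"
    lon_outside: f64, // any longitude known to answer "no"
    timestamp: u64,
    eps: f64,
    query_match: impl Fn(f64, f64, u64) -> bool,
) -> f64 {
    let (mut inside, mut outside) = (lon_inside, lon_outside);
    while (outside - inside).abs() > eps {
        let mid = (inside + outside) / 2.0;
        if query_match(lat, mid, timestamp) {
            inside = mid;
        } else {
            outside = mid;
        }
    }
    inside
}
```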

This doesn't fully deanonymize users on its own; it just gives you exact times and locations (the number of individuals is also recoverable through repeated queries, which let you split overlapping circles into multiple separate ones). How you get from there to actual users is, let's say, 'left to the reader' (in smaller towns it's trivial, in cities it's harder).

@ainsleys
Contributor

ainsleys commented Apr 8, 2020

@FishmanL would rate-limiting queries and deleting trailing data (i.e., anything older than 14 days) reduce the risk here? As I understand your model, an attacker is essentially creating a series of fake data sets and modifying the time and location slightly every time to "scan" for matches. This could be addressed by, say, only allowing once-per-day-per-user updates, or possibly trying to identify and limit this behavior in the client code.
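
Something along those lines could look like this on the server side (a rough sketch; the helpers, constants and 14-day window are illustrative, not a worked-out design):

```rust
use std::collections::HashMap;

const RETENTION_SECS: i64 = 14 * 24 * 3600; // drop anything older than 14 days

/// Allow at most one upload per user per calendar day.
/// `last_upload_day` maps user id -> day number (days since epoch) of the last upload.
fn allow_upload(last_upload_day: &mut HashMap<String, i64>, user: &str, now_secs: i64) -> bool {
    let day = now_secs / 86_400;
    match last_upload_day.get(user) {
        Some(&d) if d == day => false, // already uploaded today
        _ => {
            last_upload_day.insert(user.to_string(), day);
            true
        }
    }
}

/// Prune trailing data: drop any (lat, lon, timestamp) older than the retention window.
fn prune(points: &mut Vec<(f64, f64, i64)>, now_secs: i64) {
    points.retain(|&(_, _, ts)| now_secs - ts <= RETENTION_SECS);
}
```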

@FishmanL
Author

FishmanL commented Apr 8, 2020

I mean, I don't see any way to handle scaling this to lots of users (which is the only way it's really useful) without risking 'an attacker makes lots of fake users that are near each other'.

@ainsleys
Contributor

ainsleys commented Apr 8, 2020

Yeah, it's worth looking into the best options for making it difficult or expensive to create a ton of fake users without compromising privacy. We could require sign-on with some service that provides a layer of Sybil protection.

@ainsleys
Contributor

ainsleys commented Apr 9, 2020

See #43 for current work on this, @FishmanL.
