A flexible approach to privacy and location #61

Closed
jfh01 opened this issue Sep 12, 2018 · 9 comments
Labels
enhancement New feature or request Provider Specific to the Provider API

Comments

@jfh01
Contributor

jfh01 commented Sep 12, 2018

The MDS technical workshop made clear that LADOT wants precise location data as part of their interactions with providers. LADOT feels that it can manage any real or perceived privacy risks that stem from possessing anonymized, location-precise trip data, and that the benefits of this data are significant.

From comments made at the workshop, it seems that there is not universal agreement on this point, especially given the variability in state-level FOIA laws and the different attitudes individual cities have about the broad topic.

I'd like to propose that MDS be extended to allow for flexibility in how location data is reported for trips and routes. LA can choose to require precise data, but other jurisdictions would have the flexibility to request less detailed information. It is ultimately a policy decision, and a universal spec needs to support some level of policy variance across its users.

Two approaches to (intentionally) reducing the accuracy of GPS data:

  1. Blurring data to report less precise coordinates. These would correspond to a radius of uncertainty (e.g. accurate to within 500m).

  2. Aggregating data into defined boundary areas (e.g. census block, ZCTA, neighborhood, etc.).

Option 1 seems like the simplest approach. It could be done without change to the MDS specification. Providers and agencies would simply agree on the level of specificity in the route's GeoJSON FeatureCollection. With this approach, the specification can simply be modified to acknowledge the possibility of deliberately imprecise data and leave it up to the stakeholders to agree on the details.
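A minimal Python sketch of what option 1 could look like, assuming each route feature is a GeoJSON Point and that snapping to a grid of roughly `cell_m` meters is an acceptable way to blur. The names `blur_point` and `blur_route` and the 500 m default are illustrative only and are not part of MDS.

```python
import math

METERS_PER_DEGREE_LAT = 111_320.0  # approximation; good enough for blurring


def blur_point(lon, lat, cell_m=500.0):
    """Snap a lon/lat pair to the center of a grid cell about cell_m wide."""
    lat_step = cell_m / METERS_PER_DEGREE_LAT
    snapped_lat = (math.floor(lat / lat_step) + 0.5) * lat_step
    # A degree of longitude shrinks with cos(latitude); not valid at the poles.
    lon_step = cell_m / (METERS_PER_DEGREE_LAT * math.cos(math.radians(snapped_lat)))
    snapped_lon = (math.floor(lon / lon_step) + 0.5) * lon_step
    return round(snapped_lon, 6), round(snapped_lat, 6)


def blur_route(route, cell_m=500.0):
    """Return a copy of a FeatureCollection of Points with blurred coordinates."""
    features = []
    for feature in route["features"]:
        lon, lat = feature["geometry"]["coordinates"][:2]
        blurred = dict(feature)  # shallow copy; properties are kept as-is
        blurred["geometry"] = {
            "type": "Point",
            "coordinates": list(blur_point(lon, lat, cell_m)),
        }
        features.append(blurred)
    return {"type": "FeatureCollection", "features": features}
```

Every point in the same cell maps to the same output coordinate, so the agreed cell size is the only parameter the provider and agency would need to settle.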

Option 2 may be easier for cities to consume (since they're already used to aggregating data into boundaries like census blocks), though it would necessitate more pre-processing by the providers. It would also require modification to the specification, since GeoJSON does not have an obvious way to describe a location within specific externally defined boundary areas.
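One way option 2 might be expressed, sketched in Python: GeoJSON permits a Feature whose `geometry` is null, so the boundary could be referenced from `properties`. The property names `boundary_type` and `boundary_id` and the ID value below are hypothetical placeholders, not fields from MDS or any published spec.

```python
import json


def area_feature(boundary_type, boundary_id, timestamp):
    """Build a GeoJSON Feature that names an external boundary instead of a point."""
    return {
        "type": "Feature",
        "geometry": None,  # GeoJSON allows null geometry for unlocated features
        "properties": {
            "timestamp": timestamp,
            "boundary_type": boundary_type,  # hypothetical, e.g. "census_block"
            "boundary_id": boundary_id,      # hypothetical external identifier
        },
    }


feature = area_feature("census_block", "EXAMPLE-BLOCK-1", 1536770000)
encoded = json.dumps(feature)
```

The consumer would still need out-of-band agreement on which boundary dataset the identifiers refer to, which is the spec change this option implies.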

Thoughts on:

  1. Whether we should modify the spec to support imprecise location sharing?
  2. If yes, what method to use?
@aickin
Contributor

aickin commented Sep 12, 2018

Nitpick: I think that option 1 would require a small change to the spec in that you’d need to remove the line:

Additionally, routes must include all possible GPS samples collected by a provider.

I vote option 1. It’s much more flexible and allows for accurate routes for providers and agencies who are comfortable with that. I suspect that many agencies will want to know traffic patterns, and that is only discernible via option 1.

Note too that a provider and agency could agree to implement option 2 with the spec of option 1 by just saying that the provider will move each point to the center of its census block (or neighborhood or whatever). The converse (implementing option 1 with the spec of option 2) is not possible.
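A rough Python sketch of that "move each point to the center of its area" idea, assuming each area is a simple polygon given as a list of (lon, lat) vertices and treating coordinates as planar. The function names and the `areas` mapping are illustrative; a real deployment would more likely use a GIS library.

```python
def contains(ring, lon, lat):
    """Ray-casting point-in-polygon test for a simple ring of (lon, lat) vertices."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > lat) != (y2 > lat):  # edge straddles the horizontal ray
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def centroid(ring):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (3 * area2), cy / (3 * area2)


def snap_to_area(lon, lat, areas):
    """Return (area_id, centroid) for the first area containing the point, else None."""
    for area_id, ring in areas.items():
        if contains(ring, lon, lat):
            return area_id, centroid(ring)
    return None
```

The output is still an ordinary point, so it fits the option 1 spec unchanged; only the agreed snapping rule differs.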

Also, I’d argue that option 2 is worrisome given that there is no international block/neighborhood standard.

@ezheidtmann

This is a great discussion. We've heard reports of certain vendors truncating their values to 2 decimal places already! At the least, LADOT should be clear that they are requiring 6-7 decimal places in latitude and longitude values, without snapping.

I don't believe that truncation is an appropriate anonymization technique, but a snapping technique that is related to population density (like census tract) might be. However, such low-precision locations are not useful for street-level insights, as @aickin points out.
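For a rough sense of scale, the ground distance implied by a given number of decimal places can be estimated as below. This is a sketch that assumes about 111,320 m per degree of latitude (an approximation) and uses a hypothetical helper name.

```python
import math

METERS_PER_DEGREE_LAT = 111_320.0  # approximate


def decimal_place_resolution_m(places, lat_deg):
    """Return (north_south_m, east_west_m) covered by one step at `places` decimals."""
    step_deg = 10.0 ** -places
    north_south = step_deg * METERS_PER_DEGREE_LAT
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


# Two decimal places is on the order of a kilometer; six is on the order of 0.1 m.
coarse = decimal_place_resolution_m(2, 34.05)
fine = decimal_place_resolution_m(6, 34.05)
```

Truncation also shifts every point toward one corner of its cell, which is one reason snapping to a cell center or a boundary centroid may be preferable.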

@thekaveman
Collaborator

Agreed with @ezheidtmann, great discussion on this topic!

I think the proposal over at #51 would actually cover @jfh01's Option 1 above, correct?

@jfh01
Contributor Author

jfh01 commented Sep 12, 2018

I think it would. I don't think we need a way to reflect GPS accuracy separately from intentional "deprecisioning."

As @ezheidtmann points out though, there's a larger question about whether this approach can achieve a good balance between utility and privacy. The answer may end up being dependent on the needs/attitudes of each city.

There's a related question on time-specificity. One additional privacy measure would be to provide trip data with the start and end times rounded to the nearest 15 minute or 1 hour increment.
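A small Python sketch of that rounding, assuming trip start and end times are integer Unix timestamps in seconds (if they are in milliseconds, the increment just needs to be in milliseconds too). The name `round_epoch` is illustrative.

```python
def round_epoch(timestamp, increment=900):
    """Round an integer timestamp to the nearest multiple of `increment`.

    With the default of 900 seconds this is the nearest 15 minutes;
    exact halves round up.
    """
    return ((timestamp + increment // 2) // increment) * increment


quarter_hour = round_epoch(1536770000)        # nearest 15 minutes
full_hour = round_epoch(1536770000, 3600)     # nearest hour
```

Rounding start and end independently can change the apparent trip duration, so an agency that needs durations may want that reported as a separate field.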

@jfh01
Contributor Author

jfh01 commented Sep 14, 2018

A good article on the specific risks with geo-precise, anonymized data:

https://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset/

Even without persistent user IDs, there is some risk of reidentification or misuse.

@asadowns
Contributor

A little late to this thread, but option 1 seems like a good option. It's worth distinguishing between accuracy and precision. What we are really recording is precision, not accuracy: even if we provide 12 decimal places, the accuracy of those readings will probably be lower because of the limitations of GPS and factors that affect it, like urban canyons.

Good StackExchange thread on the subject.

@ezheidtmann

I agree that we don't want to conflate imprecision due to the fundamentals of GPS sensors (called accuracy on both major smartphone platforms) with intentional lowering of precision via truncation or snapping.

Imagine a tool that processes MDS entities -- that tool should be able to know whether it's working with a stream of GPS locations, a list of census tract centroids, or a list of locations snapped to a 1km grid. (just a few examples)
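To make that concrete, here is a hedged Python sketch of how a consuming tool might keep the two notions apart. Every name here (`LocationKind`, `ReportedLocation`, the field names, the 20 m threshold) is hypothetical and not taken from MDS.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LocationKind(Enum):
    GPS = "gps"                      # raw sensor reading
    AREA_CENTROID = "area_centroid"  # snapped to e.g. a census tract centroid
    GRID_SNAPPED = "grid_snapped"    # snapped to a regular grid


@dataclass(frozen=True)
class ReportedLocation:
    lon: float
    lat: float
    kind: LocationKind
    sensor_accuracy_m: Optional[float] = None  # device-reported GPS accuracy
    snap_resolution_m: Optional[float] = None  # intentional loss of precision
    area_id: Optional[str] = None              # external boundary identifier


def usable_for_street_level(loc, max_error_m=20.0):
    """True only for raw GPS points whose reported accuracy is good enough."""
    return (
        loc.kind is LocationKind.GPS
        and loc.sensor_accuracy_m is not None
        and loc.sensor_accuracy_m <= max_error_m
    )
```

Keeping sensor accuracy and intentional snapping in separate fields lets a tool filter on either one without guessing which kind of imprecision it is looking at.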

@d-wasserman

d-wasserman commented May 21, 2019

Could one version of the specification just take block-level aggregations to a SharedStreets ID or OSM ID? I realize this is an old thread, but I think this provides the degree of operational data many cities would need for curbside management and related applications. I think route choice applications might be lost, but this might be a problem that requires tiers of deployment, perhaps?

I agree with @aickin that this could be compatible with option 1 by snapping all points to the nearest "centroid" of a SS ID or OSM ID. The benefit of these is that, in theory, they should be internationally compatible if based around OSM.

My only concern here is it might not be "anonymized" enough.

@hunterowens hunterowens added this to the Future milestone Jun 27, 2019
@sarob sarob added enhancement New feature or request Provider Specific to the Provider API labels Dec 19, 2019
@jfh01
Contributor Author

jfh01 commented Apr 9, 2020

Per the conversation in the Provider Services Working Group, we are closing this issue and will open new issue(s) with more considered approaches to this topic.

@jfh01 jfh01 closed this as completed Apr 9, 2020
@schnuerle schnuerle removed this from the Future milestone Oct 22, 2020
9 participants