Provider: make Trip 'route' field optional for privacy reasons. #504

@vperron commented May 22, 2020

Explain pull request
For many regulatory and privacy-related reasons, cities or providers may not want external actors to access the geolocated route data of trips throughout the city.

We propose to make this route field optional to better reflect that choice.

The distance, cost and duration fields, all available in the general Trip payload, still contain valuable, aggregated information and should therefore remain accessible, even if the route is not.

Is this a breaking change?

No, not breaking. A mandatory field is now optional.

Which spec(s) will this pull request impact?

provider

For many regulatory and privacy-related reasons, cities or providers
may not want external actors to access the geolocated data of trips
throughout the city.

The distance, average cost, speed and duration of those trips are still
valuable, aggregated information and should therefore be accessible,
even if the route may not be.
@vperron requested a review from a team as a code owner May 22, 2020 13:16
@vperron requested a review from a team May 22, 2020 13:16
@CLAassistant commented May 22, 2020

CLA assistant check
All committers have signed the CLA.

@schnuerle (Member)

About the city limit change, I think it should be removed from this PR and put in a separate one. Additionally, there are cases where a city needs to know about routes outside its city limits/jurisdiction. See issue #491 for some discussion.

@vperron force-pushed the vperron/trip-route-optional branch from 84704bc to ad32ddd May 22, 2020 14:35
@vperron (Author) commented May 22, 2020

Done with #505, @schnuerle.

@vperron changed the title from "Provider: make Trip 'route' field optional for privacy reasons, fix inconsistency." to "Provider: make Trip 'route' field optional for privacy reasons." May 22, 2020
@quicklywilliam (Contributor)

We support this change. It's easy to perform basic tasks like getting trip counts via the trips endpoint, which is why many cities use it (the provider/trips endpoint was the most commonly used endpoint in the MDS Maturity survey). It would be great if we could meet such needs without requiring access to route data.

Another thing that we've noticed is that compared to status events, the trips endpoint tends to be more consistent from operator to operator in terms of the data it returns. Hence, a secondary need for a routeless /trips endpoint is to help catch and diagnose issues with status feeds.

@quicklywilliam (Contributor)

This change is complementary to #480, which makes location and telemetry data optional.

@thekaveman (Collaborator)

I think it should be a little clearer that optional here means it is up to the consuming city/agency. There are many consumers of Provider that continue to rely on route objects, including for purposes of accurate cap counting via a blending of status and trips.

Maybe clarity around optional fields is a broader need that should be addressed in the General Information document? Tagging @jfh01 for visibility.

@thekaveman added the Schema (Implications for JSON Schema or OpenAPI) and Provider (Specific to the Provider API) labels May 27, 2020
@thekaveman added this to the 1.0.0 milestone May 27, 2020
@schnuerle (Member)

It seems the main issue that is trying to be addressed with this PR is that cities may not want trip line data. I’d like some cities to chime in to see if this is just a ‘possible’ scenario, or a current need.

Another way that this could be solved if needed is to create a way to make this and other fields optional via the API. An example would be the inclusion of a new parameter, say exclude, that could have a list of fields the city does not want to have returned in the response. If a city didn’t want trips, then they can add exclude=trips to the API call. If they don’t want trip or fee data, they could add exclude=trips,fees. This list of possible options can be defined.

This would also solve future issues like this one, where cities don't want or can't receive certain data. It also has the benefit of returning only the data needed for the task at hand, reducing the response size and the processing on the provider's side.
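A minimal sketch of how a provider might honor the proposed `exclude` parameter. Only the parameter name and its comma-separated form come from the comment above; the set of excludable fields, the field names, and both helper functions are hypothetical and not part of MDS.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical: the fields a provider agrees to drop on request (not defined by MDS).
EXCLUDABLE_FIELDS = {"route", "fees"}


def parse_exclude(url: str) -> set[str]:
    """Parse a comma-separated `exclude` query parameter, e.g. ?exclude=route,fees."""
    query = parse_qs(urlparse(url).query)
    requested: set[str] = set()
    for value in query.get("exclude", []):
        requested.update(part.strip() for part in value.split(",") if part.strip())
    # Reject names outside the agreed list rather than silently ignoring them.
    unknown = requested - EXCLUDABLE_FIELDS
    if unknown:
        raise ValueError(f"unsupported exclude values: {sorted(unknown)}")
    return requested


def apply_exclude(record: dict, excluded: set[str]) -> dict:
    """Return a copy of a response record without the excluded top-level fields."""
    return {key: value for key, value in record.items() if key not in excluded}
```

For example, `apply_exclude(trip, parse_exclude("https://provider.example/trips?exclude=route"))` drops only the `route` key and leaves distance, duration and cost untouched. Rejecting unknown names, rather than ignoring them, is one possible design choice; it makes a typo in the request visible instead of silently returning data the city asked not to receive.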

I propose we move this discussion over to the new #507 issue, so we can talk about solutions separate from the details of a PR.

@vperron (Author) commented May 27, 2020

@thekaveman I second that!

@schnuerle great idea, I also answered on #507.

@quicklywilliam (Contributor)

@thekaveman makes a good point about using routes for cap counting and such. I think those needs could be met by just having start/end location but no other route data?

Arguably sharing start/end locations would be equivalent to this PR in terms of privacy impact, because start/end data is already exposed via status changes. So, perhaps this issue could be solved while retaining the use cases @thekaveman mentions via either:

  • a simple language tweak that makes it clear that sharing 2-location (start/end) routes is an acceptable practice
  • start/end location fields that can be provided in lieu of the route field
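To make the two-location option concrete, here is a sketch that reduces a full route to a start/end route. It assumes the route is a GeoJSON FeatureCollection of Point features, each carrying a `properties.timestamp`, which is our reading of the MDS Provider `route` field; `start_end_route` is a hypothetical helper. It only illustrates the data shape and does not settle whether start/end points are themselves privacy-sensitive.

```python
import copy


def start_end_route(route: dict) -> dict:
    """Keep only the earliest and latest points of a route FeatureCollection (hypothetical helper)."""
    # Order points by time so "start" and "end" do not depend on input order.
    features = sorted(route["features"], key=lambda f: f["properties"]["timestamp"])
    if len(features) < 2:
        raise ValueError("route must contain at least a start and an end point")
    # Deep-copy so the caller's original route is not shared or mutated.
    return {
        "type": "FeatureCollection",
        "features": [copy.deepcopy(features[0]), copy.deepcopy(features[-1])],
    }
```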

@thekaveman (Collaborator)

In my prior experience implementing cap counting in Santa Monica, having just the start/end location would not be enough. The method we used looks at each point of the route within city limits, and uses the earliest/latest timestamps to reconstruct the window of time the vehicle spent inside the boundary.
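The boundary-window method described above can be sketched as follows. Assumptions: route points are GeoJSON Point features with a `properties.timestamp` (our reading of the MDS route format), the city boundary is a single simple polygon of (longitude, latitude) vertices, and a planar ray-casting test is accurate enough at city scale. The function names are hypothetical; this is not the Santa Monica implementation.

```python
from typing import Optional

Point = tuple[float, float]  # (longitude, latitude)


def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Ray-casting test against a simple polygon ring (planar approximation)."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges that straddle the horizontal line through the point
        # and cross it to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def time_inside_boundary(route: dict, boundary: list[Point]) -> Optional[tuple[int, int]]:
    """Earliest and latest timestamps of route points inside the boundary, or None."""
    stamps = [
        feature["properties"]["timestamp"]
        for feature in route["features"]
        if point_in_polygon(tuple(feature["geometry"]["coordinates"][:2]), boundary)
    ]
    if not stamps:
        return None
    return min(stamps), max(stamps)
```

Because only the earliest and latest in-boundary timestamps are kept, a vehicle that leaves and re-enters is treated as inside for the whole span, which is a simplification. It also shows why start/end points alone fall short: for a trip that starts inside and ends outside, only the start point is inside, so the moment the vehicle crossed the boundary cannot be recovered.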

@vperron (Author) commented May 27, 2020

@thekaveman That is very much in line with the point I'm raising, I think.

Of course, if the routes are available (because the city/agency decided so), any client app can use them to build features such as accurate heatmaps.

If they are not, advanced features may not be usable, but that would not discard all of the information provided by the /trips endpoint (distance, cost, vehicles involved, etc.), and a simpler, best-effort version can be implemented.

Right now, apps cannot use the endpoint at all if, for any reason, the routes are not made available.

@quicklywilliam start/end may indeed help, but I'm not entirely sure that every city/agency would be willing to provide them, for the same privacy reasons.

@quicklywilliam (Contributor)

@vperron are these cities using /trips without the /status endpoint, then?

@vperron (Author) commented May 27, 2020

@quicklywilliam I don't know, that was purely hypothetical! My point is that if privacy is the reason routes are not provided within trips, replacing them with start and end points may not change that issue much.

So my first thought is that they would probably have to be optional as well, even though, I entirely agree, those start/end points could also be found in the /status endpoint, perhaps less easily.

So we see 4 options now:

  • Do nothing: the routes are mandatory and no trip info is given without it
  • Make the route optional: all the trip metadata is accessible to apps, no privacy issues with /trips
  • Replace route by start/end points in some cases: better info, but maybe still has privacy issues
  • Allow payloads to be parametrized: extremely flexible, but probably tough to implement for all providers

@quicklywilliam (Contributor)

Ah, thank you for clarifying! Apologies, I think I might have misunderstood your original use case.

Could you clarify what kinds of external actors you have in mind for this use case? If the use case involves sharing trips data widely beyond the agency, I think even removing route data might not suffice to address potential privacy issues.

@vperron (Author) commented May 27, 2020

The sensitive data, from the point of view of the cities and agencies we've contacted, is the geolocation of a particular vehicle combined with a timestamp, especially if it becomes easy to determine frequent routes from point A to B (commuting, for instance).

If this route information is stripped, the /trips payload still contains duration, vehicle, distance and cost information, which can be used for city-wide mobility analysis and trends but cannot identify any specific place, area or district.

My point is, we should consider making this basic information accessible even if the cities or agencies are not willing to expose route information, thus making the route "optional".
On the other hand, if the route info is present, further analysis (district-to-district balancing, etc) becomes possible.
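A sketch of the kind of route-free, city-wide summary described above, with route-dependent analysis gated on routes actually being present. The field names (`vehicle_id`, `trip_duration` in seconds, `trip_distance` in meters) follow our understanding of the MDS Provider trips schema and should be treated as assumptions; `summarize_trips` is hypothetical.

```python
from statistics import mean


def summarize_trips(trips: list[dict]) -> dict:
    """Aggregate metrics that need no route geometry (hypothetical helper)."""
    summary = {
        "trip_count": len(trips),
        "vehicle_count": len({t["vehicle_id"] for t in trips}),
        "total_distance_m": sum(t["trip_distance"] for t in trips),
        # statistics.mean raises on empty input, so guard it.
        "mean_duration_s": mean(t["trip_duration"] for t in trips) if trips else None,
    }
    # Trips whose route is absent or null still count toward the aggregates above.
    routed = [t for t in trips if t.get("route")]
    summary["trips_with_route"] = len(routed)
    # District-level analysis (e.g. balancing) only makes sense if every trip has a route.
    summary["route_analysis_possible"] = bool(trips) and len(routed) == len(trips)
    return summary
```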

@quicklywilliam (Contributor)

Got it, thank you for clarifying! For this use case I would advocate for sharing via a different means. I am concerned that in raw format trip data can be attacked, even without route data. For example, if I know a trip's duration along with the exact times it begins and ends, I can likely attack a GBFS feed by comparing those timestamps and durations with when vehicles disappear from a given location. From there, I could likely establish the exact origin/destination (O/D) for a portion of trips.

@schnuerle modified the milestones: 1.0.0, Future Jun 9, 2020
@schnuerle added the privacy label (Implications around privacy for the attention of the OMF Privacy Committee) Sep 16, 2020
@schnuerle (Member)

Since the original use case by @vperron is about sharing the feeds with other entities, and removing routes from trips may not be enough to protect privacy, and #480 is now implemented in 1.1.0, and we are discussing this more broadly in #507, I think we should close this specific PR.

@vperron (Author) commented Jan 26, 2021

Agreed.

@vperron closed this Jan 26, 2021
@schnuerle (Member)

This has been addressed even more explicitly with #646 and the ability for cities to exclude route data from provider endpoints.

@schnuerle modified the milestones: Future, 1.2.0 Sep 2, 2021
Labels: privacy, Provider, Schema
5 participants