Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provider Reports - Static #607

Merged
merged 10 commits into from
Dec 16, 2020
Merged

Provider Reports - Static #607

merged 10 commits into from
Dec 16, 2020

Conversation

schnuerle
Copy link
Member

Explain pull request

This is a beta implementation of the #569 special groups aggregate metrics issue. It adds a /reports endpoint within the Provider API that handles returning monthly trip counts of special groups, in this case just low income groups.

The response served up is a pre-generated CSV file of all data.

Is this a breaking change

  • No, not breaking (new feature)

Impacted Spec

Which spec(s) will this pull request impact?

  • provider

Additional context

See PR #606 for a version that is dynamic and not static like this one.

@schnuerle schnuerle requested a review from a team as a code owner December 11, 2020 22:18
@schnuerle schnuerle requested a review from a team December 11, 2020 22:18
@schnuerle schnuerle self-assigned this Dec 11, 2020
@schnuerle schnuerle added enhancement New feature or request privacy Implications around privacy for the attention of the OMF Privacy Committee Provider Specific to the Provider API labels Dec 11, 2020
@schnuerle schnuerle added this to the 1.1.0 milestone Dec 11, 2020
@schnuerle schnuerle linked an issue Dec 11, 2020 that may be closed by this pull request
@schnuerle schnuerle marked this pull request as draft December 11, 2020 22:20
@schnuerle schnuerle mentioned this pull request Dec 11, 2020
provider/README.md Outdated Show resolved Hide resolved
@quicklywilliam
Copy link
Contributor

I like this PR - it'll get this data out of email and into a standard format in MDS, but it's a small enough change from the current behavior that it's reasonable for cities to begin requiring operators to adopt it in the near future. It would be a happy outcome indeed if, through that adoption, we get operator feedback (from folks besides Spin, thank you @dirkdk and @joshuaandrewjohnson1!).

@quicklywilliam
Copy link
Contributor

quicklywilliam commented Dec 15, 2020

We should clarify that the k=10 requirement is applied to both trips and unique riders. We don't want data being shared about single rider who takes a daily trip, particularly when that rider is a member of a Special Group.

@quicklywilliam
Copy link
Contributor

I want to raise one concern about this sentence:

All geography IDs included in the city published Geography API endpoint are included in the report results.

In practice, I worry that this could lead to situations where a small and/or sensitive geography inadvertently becomes part of a report about Special Groups. For example, the city might have a no parking zone around a single public park that is the home of political protests. Clarifying the k-value requirement as per the above would help address this, but it still violates the principle of data minimization.

Longer term, I believe the fix for this is allowing the Policy API to describe data sharing requirements through a new Policy Rule type. Shorter term, I wonder if we should modify this to say that the Geographies in Reports should be agreed upon by the City and the Operator?

@thekaveman
Copy link
Collaborator

@quicklywilliam

In practice, I worry that this could lead to situations where a small and/or sensitive geography inadvertently becomes part of a report about Special Groups. For example, the city might have a no parking zone around a single public park that is the home of political protests.

I wonder if this is a real concern though? For one, we're talking about a monthly report - so even in the case of a tiny park / geography, you're still not getting any better temporal granularity. The protest happened over a weekend? Great, it's drowned by the rest of the month. The protest happened over the entire month? Great, that's a lot of data aggregation! If there were less than 10 riders and/or trips in that geometry over the month, you get no data anyway.

We touched on the idea of the "City and Provider should discuss which geographies are returned"; it can definitely work, but also doesn't scale as well on either side as things change. I like your longer-term vision for Policy models of data sharing agreements. As this is beta (and Geography itself only now starting to pick up some momentum) we wanted to keep it simple.

@schnuerle
Copy link
Member Author

We should clarify that the k=10 requirement is applied to both trips and unique riders. We don't want data being shared about single rider who takes a daily trip, particularly when that rider is a member of a Special Group.

Yes I will add that clarification thanks!

@quicklywilliam
Copy link
Contributor

@thekaveman Agreed, the concern here is principled (Data Minimization) rather than real world. I gave a pretty poor example, should have been more clear on that point.

I'm OK with leaving it as-is as long as we think it will be OK to make a breaking change in the future that limits Reports to only explicitly requested geographies. I believe that as a beta feature that shouldn't be a problem.

@schnuerle
Copy link
Member Author

schnuerle commented Dec 16, 2020

We touched on the idea of the "City and Provider should discuss which geographies are returned"; it can definitely work, but also doesn't scale as well on either side as things change... As this is beta (and Geography itself only now starting to pick up some momentum) we wanted to keep it simple.

I think keeping it as simple as possible for this version is best, which means not having to communicate which geographies to return ahead of time. This way the entire /reports can be created without any needed communication about the details between the city and provider.

I also agree with @thekaveman that in this case the risk is very minimal with counts in geographies redacted when the values are low over a month period. It's more data and possible some geographies that are not needed, but the effort to calculate and return these is less than the effort/time to communicate and agree on the list of IDs.

I also like the idea of modeling data sharing agreements. This idea came up in a WG call by @whereissean - a sort of meta data MDS json that communicated all the optional parts of MDS that a city would like providers to support: endpoints, optional fields, IDs (like geography for reports in this case), beta features (like GDEs), optional location data, etc. This would make it clear what is required and what is not. A good proposal for 2.0!

@schnuerle schnuerle marked this pull request as ready for review December 16, 2020 01:33
@schnuerle schnuerle merged commit 14380b0 into dev Dec 16, 2020
@quicklywilliam
Copy link
Contributor

@schnuerle My thoughts exactly. I went ahead and filed #608 to track this.

@schnuerle schnuerle deleted the feature-provider-reports-static branch January 26, 2021 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request privacy Implications around privacy for the attention of the OMF Privacy Committee Provider Specific to the Provider API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Special groups aggregate metrics
3 participants