This repository has been archived by the owner on Oct 4, 2023. It is now read-only.

Data Privacy with Addresses #20

Open
kinlane opened this issue May 12, 2017 · 14 comments

@kinlane
Contributor

kinlane commented May 12, 2017

This has been migrated from Ohana (codeforamerica/ohana-api#419 (comment)) - worth revisiting:

JordanLyons commented 23 days ago
Several programs and agencies in my area don't want their physical or mailing addresses made public. I see that this is a required field for several of the CSVs, and I wonder if there's anything that could be done to prevent that kind of sensitive info from getting published in the database.
@greggish

greggish commented 22 days ago
This is a known challenge for things like domestic violence shelters, human trafficking services, etc. Before 2016, I would have called it "an important edge case" but now I don't think that 'edge case' does it justice.

The simple scenario seems to be that there's usually a 'front gate' point of contact that is public, without sharing details about where people actually go once they've entered into relationship with the service.

I'm going to cc @kinlane on this (although I suspect that delineating between public and private addresses is not likely to be an issue we can directly address before 2.0). I'm also curious about how other referral providers handle stuff like this, and it might be an issue you could bring to the Google Group for discussion.

@switzersc

+1 on this issue. We've been thinking about this a lot recently and are exploring possible attributes that could indicate the private or confidential nature of a location. As a note, we do not store or display sensitive addresses due to the sensitivity of this information, and consider it the organization's prerogative to share that information with a client/individual when they've confirmed the need via public-facing contact, rather than the referral provider's responsibility.

Some ideas:

  • having a field on Location indicating that the location's address is public or confidential, and then that location not having a physical address;
  • that location simply not having a physical address and the application logic deals with things like noting to the user that given certain taxonomy or service type, there won't be an address provided.
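
A rough Python sketch of how the two ideas above could differ in practice. This is only an illustration under assumptions: the `Location` class, the `address_confidential` field, and the `display_address` helper are hypothetical names, not part of HSDS or Ohana.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    """Hypothetical location record; field names are illustrative only."""
    name: str
    physical_address: Optional[str] = None
    # Idea 1: an explicit flag that explains why no address is stored.
    address_confidential: bool = False

    def __post_init__(self) -> None:
        # A confidential location must never carry a physical address.
        if self.address_confidential and self.physical_address is not None:
            raise ValueError("confidential locations must not store a physical address")


def display_address(location: Location) -> str:
    """Return the text an application could show where the address would go."""
    if location.physical_address is not None:
        return location.physical_address
    if location.address_confidential:
        return "Address is confidential; contact the organization directly."
    # Idea 2: no flag, so the application only knows the address is absent.
    return "No address provided."
```

With the flag, the data itself says why the address is missing; without it, the application has to infer the reason from taxonomy or service type.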

@kinlane
Contributor Author

kinlane commented May 23, 2017

I am also thinking of this in the context of API management. Which endpoints are publicly available vs requiring registration and keying up to access, for monitoring purposes.

I get a lot of open data folks who say, "it's public data. it shouldn't be keyed up, and locked away." when in reality keying up is about monitoring, and quality control, not locking away.

We need some guidance on crafting API access tiers and on which endpoints require a key to view. Like you said, maybe there is a schema-level switch here, so that you have to be in a trusted API access tier before you see some aspects of the schema. IDK.
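
As a minimal sketch of the endpoint-level idea in this comment, assuming just two tiers: the endpoint paths, the tier names, and the `can_read` helper below are hypothetical and not defined by any spec.

```python
from typing import Dict, Optional, Set

# Hypothetical endpoint tiers; paths and tier names are illustrative only.
ENDPOINT_TIER: Dict[str, str] = {
    "/organizations": "public",
    "/services": "public",
    "/contacts": "registered",
}

# Hypothetical set of keys issued to registered consumers.
REGISTERED_KEYS: Set[str] = {"demo-key"}


def can_read(endpoint: str, api_key: Optional[str] = None) -> bool:
    """Decide whether a request may read an endpoint."""
    tier = ENDPOINT_TIER.get(endpoint)
    if tier is None:
        return False  # unknown endpoints are denied by default
    if tier == "public":
        return True  # no key needed
    return api_key in REGISTERED_KEYS
```

A real deployment would more likely configure this in an API management layer than in application code.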

@switzersc

Interesting points. I'm leaning towards the position that no database or API should have confidential information like domestic abuse shelter physical addresses, regardless of access tier, because these are highly sensitive and should be shared purely at the discretion of the provider (e.g. over the phone). Having a "confidential" flag or other indication at the schema level would simply serve as an explanation for why no physical address (or other data point) exists or will ever exist.

In terms of broader data/API access tiers (beyond the above topic), I'm not sure if I can think of any off the top of my head for general READ access to implementations of the core API. By that I mean that the data models defined in the spec are all generally publicly accessible information that is "public data" (and I would argue the shelter addresses are not public data), so they are things that are available to people regardless (just not in this great format or API), so I'm not sure if I can envision a use case for tiering access to that. I can see the argument for requiring a registered API key for any read access, but not necessarily different read authorization levels based on API key/client. Do you have any specific examples you're thinking of in terms of more complex read authorization?

Write access, however, I do see as a tier-able scenario that could have a lot more authorization logic applied.

@kinlane
Contributor Author

kinlane commented May 24, 2017

So agree with you on the confidential information. I really would rather those fields have [contact person reference] to get more information. The politics around this is too much.

Regarding read access I see it purely as identifying and being aware of consumers. Helping alleviate scraping concerns of data stewards, and giving them the ability to control (or not) who has access.

When it comes to tiered READ access, I'm still exploring and brainstorming on this. I think the only separation for first round guidance is public or private, and read or write. Maybe in the future we separate between hobbyist, civic activist, partner, student, researcher, or commercial. A commercial directory should get unlimited calls to the API for sites they have advertising on? IDK.

I'm hitting the service composition side of the guidance around the spec, and I want to keep it simple at first, but show the control that data stewards and operators can have.

I thoroughly enjoy having you to talk through this stuff with! thanks.

@switzersc

I thoroughly enjoy having you to talk through this stuff with!

Me too! These conversations are so much fun =D

@ckoppelman

ckoppelman commented Jun 7, 2017

I work at Polaris, which operates the National Human Trafficking Hotline. @greggish pointed me to this thread and I thought I'd chime in.

We have information for many many providers who do not want their information shared more broadly than us (and sometimes with individual help-seekers and law enforcement). At the scale that we operate, it is prohibitive to keep all of this information in paper records. Some of that information includes:

  • Cell phones of directors of emergency shelter
  • Names and email addresses of staff at orgs
  • Addresses of domestic violence shelters
  • Nearly all information about certain international antislavery organizations.

We currently solve that problem by publishing a subset of information that we have, but that means that public search tooling and in-house search tooling are very different (as are the public and in-house databases). It also means that our closest allies only have large-scale access to the same information as the general public - there are only two levels of access.

I would prefer to maintain fewer sets of data, and to provide more granular access controls to this data. Granting API keys for role-based authorization would be ideal for me. I'd want to be able to define an authorization matrix: some categories of data are visible only to certain roles (e.g., only internal users can access individual names), and some providers as a whole are only visible to certain roles (e.g., domestic violence shelter addresses are available to partners with that clearance).

And I probably want to be able to grant access to the public without issuing API keys because that's a lot of maintenance for one-time users.
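
The authorization matrix described above might look something like the following Python sketch. Everything here is an assumption made for illustration: the role names, the field names, the sample API keys, and the `restricted_to` marker are hypothetical and do not come from HSDS or any existing system.

```python
from typing import Any, Dict, Optional, Set

# Hypothetical matrix: which roles may see which field.
FIELD_ACCESS: Dict[str, Set[str]] = {
    "name": {"public", "partner", "internal"},
    "phone": {"public", "partner", "internal"},
    "address": {"partner", "internal"},
    "staff_names": {"internal"},
}

# Hypothetical key store mapping issued API keys to roles.
API_KEY_ROLES: Dict[str, str] = {
    "partner-key-123": "partner",
    "internal-key-456": "internal",
}


def role_for(api_key: Optional[str]) -> str:
    """Requests with no key, or an unknown key, are treated as public."""
    if api_key is None:
        return "public"
    return API_KEY_ROLES.get(api_key, "public")


def visible_record(record: Dict[str, Any],
                   api_key: Optional[str] = None) -> Optional[Dict[str, Any]]:
    """Apply record-level, then field-level, rules for the caller's role."""
    role = role_for(api_key)
    # Record-level rule: an optional "restricted_to" set hides the whole provider.
    restricted_to = record.get("restricted_to")
    if restricted_to is not None and role not in restricted_to:
        return None
    # Field-level rule: keep only fields the role may see; unlisted fields are dropped.
    return {
        field: value
        for field, value in record.items()
        if role in FIELD_ACCESS.get(field, set())
    }
```

Treating a missing key as the public role is one way to serve one-time users without issuing them keys.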

@kinlane
Contributor Author

kinlane commented Jun 8, 2017

Hey @ckoppelman I would love to talk more about your perspective and awareness. There is a concept in tech circles of "service composition": with APIs + API keys you can compose exactly the level of access each partner or public consumer gets. When done well this can be at the API endpoint level (resource, i.e. contacts, locations, services, etc.), and at a granular level (field, timeframe, etc.).

Additionally, you get logging and analysis for each API key, or for groups of API keys. This allows for auditing, billing, security, and many other functions. Ideally, the API management solution handles the overhead of issuing, revoking, and dealing with keys, so that all public users can be required to get keys and you can monitor, audit, and stay in tune with each user / app who is truly "active". Lots more benefits.

I'm intrigued to hear more about how we can further apply this to sensitive information. I'm always wanting to understand how we can use APIs for good and not just for revenue and locking up data. How we can use them to truly secure and keep people safe, while still making data accessible. Ping me anytime at kinlane@gmail.com or via the Open Referral slack, and we can talk more.

@NeilMcKLogic

I thought HSDS 1.x had flags for confidential information? Physical addresses are obvious candidates for such flags, as are contact people and methods. Then API authentication/authorization should in my mind apply whatever filtering is needed to mask (or not) some/all of the fields marked confidential on a record-by-record basis. If folks agree, then the standard would need more than a broad method of auth/auth but something with more granularity to support this kind of scenario.

@greggish
Member

greggish commented Jun 15, 2017 via email

@kinlane
Contributor Author

kinlane commented Jun 16, 2017

The concept of service composition is very much an API management discussion. We are squarely in define, design, deploy (with prototype), and portal. The next logical step is to approach service composition in the context of API authorization and management -- this is the layer we'll use to guide the business model discussion. Alas, I agree with Greg that it is a 2.0 item.

@NeilMcKLogic

My preferred approach to this is, as @ckoppelman also mentions, asserting this specifically for the API Key each request contains. For any particular API Key whose settings in our system do not allow private data to be emitted in the API (per the wishes of our client organizations who own the data), we will either exclude it or replace it with something like: [private].

@NeilMcKLogic

....and I'm not sure this is something HSDS or HSDA needs to incorporate except maybe expressing a preference for how to represent redacted data. Otherwise we'll all come up with different ways to do it e.g. my "[private]" idea above.
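
To make the two representations mentioned above concrete, here is a small Python sketch of omitting versus replacing redacted fields. The `redact` function, its parameters, and the sample field names are hypothetical; only the "[private]" placeholder comes from the comment above.

```python
from typing import Any, Dict, Set

REDACTED = "[private]"  # placeholder text suggested in this thread


def redact(record: Dict[str, Any],
           confidential_fields: Set[str],
           may_see_private: bool,
           mode: str = "replace") -> Dict[str, Any]:
    """Return a copy of the record that is safe to emit for this API key."""
    if may_see_private:
        return dict(record)
    if mode == "omit":
        # Leave confidential fields out of the response entirely.
        return {k: v for k, v in record.items() if k not in confidential_fields}
    if mode == "replace":
        # Keep the field names but swap each value for the placeholder.
        return {k: (REDACTED if k in confidential_fields else v)
                for k, v in record.items()}
    raise ValueError(f"unknown redaction mode: {mode!r}")
```

Replacing keeps the response shape stable and tells the consumer that something was withheld, while omitting reveals less; a shared convention would spare each implementer from inventing their own.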

@timgdavies
Contributor

I've opened an issue against the HSDS spec to explore including a flag for confidential information: openreferral/specification#166

This would likely be a simple binary, whereas APIs could offer more fine-grained access control based on API key as described above (albeit setting a confidentiality flag appropriately when giving out any information that should not be openly shared).

@ckoppelman

@NeilMcKechnie I think the safest way to omit data is to not display it. That's sort of the "Glomar" approach.

That works cleanly regarding record-level "Forbiddens" and field-level "Forbiddens" for non-required fields.

When the field is a required field, things get hairier. I think if a field is required according to the spec, and not available to the requester, then the record as a whole may be Forbidden as well.
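
The rule proposed above could be sketched in Python roughly like this. It is a sketch under assumptions: `REQUIRED_FIELDS` is a made-up stand-in for whatever the spec marks as required, and `filter_record` is a hypothetical helper, not an existing API.

```python
from typing import Any, Dict, Optional, Set

# Hypothetical stand-in for the fields a spec marks as required.
REQUIRED_FIELDS: Set[str] = {"id", "name", "address"}


def filter_record(record: Dict[str, Any],
                  hidden_fields: Set[str]) -> Optional[Dict[str, Any]]:
    """Drop fields hidden from this requester.

    Returns None when a required field is hidden, meaning the record
    as a whole should be withheld rather than emitted incomplete.
    """
    if hidden_fields & REQUIRED_FIELDS:
        return None
    # Only optional fields are hidden, so omit them silently.
    return {k: v for k, v in record.items() if k not in hidden_fields}
```

An API built this way would have to decide how a `None` result is reported to the caller, for example by leaving the record out of list responses.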


6 participants