Should deleted personal data be accessible through the API? #566

Open
asbjornst opened this issue Sep 9, 2019 · 27 comments

@asbjornst asbjornst commented Sep 9, 2019

Deleted objects can be queried with, e.g., /api/net?status=deleted&since=1&id__lt=50.

For most object types that's probably not a problem. For poc, however, there might be a problem with GDPR, as the name, phone and email are considered personal information.

While many poc objects are non-personal contacts like NOC, peering, abuse or sales, some are individuals, and they probably don't expect or want these to be accessible after deletion.

Currently this exposes 3080 unique email addresses from deleted poc objects, 541 of which have public visibility. These numbers only include addresses not found in non-deleted objects.
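
One way to arrive at a count like the one above is a set difference over email addresses. This is only a sketch: it assumes the poc records have already been fetched into lists of dicts with an `email` key, and the helper name `emails_only_in_deleted` and the sample data are made up for illustration.

```python
def emails_only_in_deleted(deleted_pocs, live_pocs):
    """Return emails found in deleted poc records but in no non-deleted record."""
    def emails(records):
        # Normalise case and whitespace so the same address collapses to one entry.
        cleaned = ((r.get("email") or "").strip().lower() for r in records)
        return {e for e in cleaned if e}

    return emails(deleted_pocs) - emails(live_pocs)


# Invented sample data, not real PeeringDB records.
deleted = [
    {"id": 1, "email": "jane@example.net"},
    {"id": 2, "email": "noc@example.net"},
]
live = [{"id": 3, "email": "NOC@example.net"}]

only_deleted = emails_only_in_deleted(deleted, live)
```

Here `only_deleted` contains just the address that no longer appears on any non-deleted object.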

Is this as designed, or should deleted objects be limited to the essential fields for synchronization (id, status, updated)?
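
The second option in the question could look roughly like the sketch below. It assumes API objects are plain dicts carrying a `status` key. The function name `minimise_if_deleted` and the sample record are hypothetical and not taken from the PeeringDB code base.

```python
# The three fields named in the question above as essential for synchronization.
ESSENTIAL_FIELDS = ("id", "status", "updated")


def minimise_if_deleted(obj):
    """Return obj unchanged unless it is deleted; then keep only the sync fields."""
    if obj.get("status") != "deleted":
        return obj
    return {k: obj[k] for k in ESSENTIAL_FIELDS if k in obj}


# Hypothetical record; all values are invented.
poc = {
    "id": 42,
    "status": "deleted",
    "updated": "2019-09-09T00:00:00Z",
    "name": "Jane Doe",
    "email": "jane@example.net",
    "phone": "+1 555 0100",
}
minimal = minimise_if_deleted(poc)
```

A client that only synchronizes would still learn that object 42 was deleted and when, without receiving the name, phone or email.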

@job job commented Sep 9, 2019

@arnoldnipper arnoldnipper commented Sep 9, 2019

> Good find! Yeah, we should close this.

Shouldn't we discuss it first? This information has been visible and arguably is out in the wild. As such, it is not a data leak, imho.

@job job commented Sep 9, 2019

@asbjornst asbjornst commented Sep 10, 2019

Job, I don't know the code base well enough to make a good patch quickly.

I won't have time to take a stab at it before EPF is through.

@job job commented Sep 10, 2019

@asbjornst asbjornst commented Sep 10, 2019

Arnold,

Here are a few links to help determine whether this is a GDPR breach. Note "data minimisation": personal data may not be kept once it has outlived its original purpose. Deleted data has probably done just that. GDPR kicks in as soon as you hold the first personal data record about an EU citizen.

What is personal data?

https://gdpr-info.eu/issues/personal-data/

The data subjects are identifiable if they can be directly or indirectly identified, especially by reference to an identifier such as a name, an identification number, location data, an online identifier or one of several special characteristics, which expresses the physical, physiological, genetic, mental, commercial, cultural or social identity of these natural persons. In practice, these also include all data which are or can be assigned to a person in any kind of way. For example, the telephone, credit card or personnel number of a person, account data, number plate, appearance, customer number or address are all personal data.

Principles relating to processing of personal data

https://gdpr-info.eu/art-5-gdpr/

  1. Personal data shall be:
    (a) processed lawfully, fairly and in a transparent manner in relation to the data subject (‘lawfulness, fairness and transparency’);
    (b) collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes; further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes shall, in accordance with Article 89(1), not be considered to be incompatible with the initial purposes (‘purpose limitation’);
    (c) adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed (‘data minimisation’);
    (d) accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that personal data that are inaccurate, having regard to the purposes for which they are processed, are erased or rectified without delay (‘accuracy’);
    (f) processed in a manner that ensures appropriate security of the personal data, including protection against unauthorised or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures (‘integrity and confidentiality’).
  2. The controller shall be responsible for, and be able to demonstrate compliance with, paragraph 1 (‘accountability’).

Notification obligation regarding rectification or erasure of personal data or restriction of processing

https://gdpr-info.eu/art-19-gdpr/

The controller shall communicate any rectification or erasure of personal data or restriction of processing carried out in accordance with Article 16, Article 17(1) and Article 18 to each recipient to whom the personal data have been disclosed, unless this proves impossible or involves disproportionate effort. The controller shall inform the data subject about those recipients if the data subject requests it.

Notification of a personal data breach to the supervisory authority

https://gdpr-info.eu/art-33-gdpr/

In the case of a personal data breach, the controller shall without undue delay and, where feasible, not later than 72 hours after having become aware of it, notify the personal data breach to the supervisory authority competent in accordance with Article 55, unless the personal data breach is unlikely to result in a risk to the rights and freedoms of natural persons. Where the notification to the supervisory authority is not made within 72 hours, it shall be accompanied by reasons for the delay.

Who's the supervisory authority of a US organization?

https://www.hipaajournal.com/u-s-companies-appoint-gdpr-lead-supervisory-authority/

A U.S. company that does not have a base in an EU member state has a problem. If it does not have a base in an EU member state where data procession decisions are made, it will not benefit from the one-stop-shop mechanism. Even if a company has a representative in an EU member state, that does not trigger the one-stop-shop mechanism.

The company must therefore deal with the supervisory authority in every member state where the company is active, through its local representative. There would not be any lead supervisory authority. Article 27 of GDPR details the requirement to appoint a local representative in an EU member state.

@arnoldnipper arnoldnipper commented Sep 10, 2019

In my opinion deleted data shouldn’t be retrievable.

@job, this was added as a feature recently. See #451

@job job commented Sep 10, 2019

@koalafil koalafil commented Sep 11, 2019

I think it is very confusing to show deleted data in general.
Furthermore, making it retrievable also contradicts, in practice, the users' specific wish to be removed from the db.

PeeringDB keeping deleted data somewhere safe and internal is one thing, but making it retrievable after deletion is a whole different issue.

+1 with @job and @asbjornst's concerns.

I think it may help PeeringDB to keep the deleted data for archival purposes internally (and also to help users if deletion was accidental) but I think deleted data should not be retrievable by any means by other users of PeeringDB.

@koalafil koalafil added this to the Decide milestone Sep 11, 2019

@koalafil koalafil commented Sep 11, 2019

Relates to discussions happening under #121

@arnoldnipper arnoldnipper commented Sep 11, 2019

> I think it is very confusing to show deleted data in general.

This is not the case. You explicitly have to specify that you want to see data with status=deleted. Hence the users know what they are doing. No confusion at all.

There are several use cases where it makes sense to be able to retrieve data with status=deleted.

@arnoldnipper arnoldnipper commented Sep 11, 2019

> Relates to discussions happening under #121

Why?

@job job commented Sep 11, 2019

@koalafil koalafil commented Sep 11, 2019

> > I think it is very confusing to show deleted data in general.

> This is not the case. You explicitly have to specify that you want to see data with status=deleted. Hence the users know what they are doing. No confusion at all.

> There are several use cases where it makes sense to be able to retrieve data with status=deleted.

Can you give some examples of these cases?
If I requested deletion of my own POC from PeeringDB, I really doubt I would want that record to be seen by others. What about my use case here, or the right to be "forgotten" in wider terms?

Re your question: #121 relates to this issue because it also talks about deleting some data, and the Product Committee is gearing towards keeping the deleted data. While #121 seems to be specific to org objects, the general discussion and the Product Committee members' suggestions are relevant to this issue too, i.e. keeping deleted data.

It would be ideal to avoid contradictory results out of these issues, if any. Hence the reference, as a cautious action from my end, to keep an eye on both issues and the discussions within them.

@ccaputo ccaputo commented Sep 11, 2019

As I understand it historically, access to deleted objects (such as in the IRR or RIR realm) is useful for research purposes, but the research is concerned with objects of a non-PII nature.

Regardless of GDPR, but emphasized by GDPR, if a user of PeeringDB anywhere in the world wants their PII data to be deleted from public accessibility, we should be doing just that.

In the sense of the upcoming Task Force on data ownership, it may be determined that the owner of PII data is the person themself. Off the cuff, that sounds logical to me.

@arnoldnipper arnoldnipper commented Sep 11, 2019

> It would be ideal to avoid contradictory results out of these issues, if any. Hence the reference, as a cautious action from my end, to keep an eye on both issues and the discussions within them.

#121 deals with providing a button to directly delete an org record. Regardless of the outcome, it will never conflict with whether deleted objects are retrievable via the API.

@arnoldnipper arnoldnipper commented Sep 11, 2019

> As I understand it historically, access to deleted objects (such as in the IRR or RIR realm) is useful for research purposes, but the research is concerned with objects of a non-PII nature.
>
> Regardless of GDPR, but emphasized by GDPR, if a user of PeeringDB anywhere in the world wants their PII data to be deleted from public accessibility, we should be doing just that.
>
> In the sense of the upcoming Task Force on data ownership, it may be determined that the owner of PII data is the person themself. Off the cuff, that sounds logical to me.

My understanding is that POC data deals with role accounts only. Hence per se non-PII. No?

@ccaputo ccaputo commented Sep 11, 2019

> My understanding is that POC data deals with role accounts only. Hence per se non-PII. No?

If you mean Point of Contact data in IRR or RIR realms, there are plenty of cases of people using personal email addresses and phone numbers and names. That counts as PII data.

@arnoldnipper arnoldnipper commented Sep 11, 2019

> If you mean Point of Contact data in IRR or RIR realms, there are plenty of cases of people using personal email addresses and phone numbers and names.

PeeringDB contact information only deals with roles.

@asbjornst asbjornst commented Sep 11, 2019

> PeeringDB contact information only deals with roles

Which is also PII when it can be directly or indirectly tied to an individual.
If it was all noc, peering, abuse, sales, ... then it wouldn't be PII.

Executing the query below yields 2876 deleted poc records, of which only a few are
non-PII, and since it's safe to say that at least one of them is an EU citizen, those
PII records fall under GDPR protection.

SELECT DISTINCT name, phone, email FROM deleted_poc WHERE
  email NOT ILIKE '%noc%@%' AND
  email NOT ILIKE '%peering%@%' AND
  email NOT ILIKE '%sales%@%' AND
  email NOT ILIKE '%helpdesk%@%' AND
  email NOT ILIKE '%abuse%@%' AND
  email NOT ILIKE '%bgp%@%' AND
  name NOT ILIKE '%admin%' AND
  name NOT ILIKE '%group%' AND
  name NOT ILIKE '%team%' AND
  name NOT ILIKE '%staff%' AND
  name NOT ILIKE '%engineer%' AND
  name NOT ILIKE '%tech%' AND
  name NOT ILIKE '%noc%' AND
  name NOT ILIKE '%planned%' AND
  name NOT ILIKE '%@%' AND
  name NOT ILIKE '%operation%' AND
  name NOT ILIKE '%public%' AND
  name NOT ILIKE '%relations%' AND
  name NOT ILIKE '%peer%' AND
  name NOT ILIKE '%help%';

@ccaputo ccaputo commented Sep 11, 2019

> PeeringDB contact information only deals with roles

I am confused by this assertion. When I run the following as an unauthenticated user, I get a ton of personal names and personal emails:

curl -sG https://peeringdb.com/api/poc?status=deleted\&since=1 | tr "," "\n"

@koalafil koalafil commented Sep 11, 2019

> > It would be ideal to avoid contradictory results out of these issues, if any. Hence the reference, as a cautious action from my end, to keep an eye on both issues and the discussions within them.

> #121 deals with providing a button to directly delete an org record. Regardless of the outcome, it will never conflict with whether deleted objects are retrievable via the API.

My note says "it relates to". It does not say the two issues are identical.
Deleting objects but keeping them are common discussion points noted in both issues.
We'd better catch any outcome that can be relevant in the other issue. A reference does not hurt.
Let's focus on the content.

@arnoldnipper arnoldnipper commented Sep 11, 2019

> What’s the use case for PII data though?

Sorry ... I should have been more specific. There are use cases for showing deleted records (net, fac, ix and the like) per se.

For poc I also do not see a use case atm. Maybe we should also add a hint not to include any PII data in poc information.

@arnoldnipper arnoldnipper commented Sep 11, 2019

> Deleting objects but keeping them are common discussion points noted in both issues.

Imho this should be discussed in a separate issue. In particular, it has nothing to do with whether we provide a delete button for org or not.

@job job commented Sep 11, 2019

@arnoldnipper arnoldnipper commented Sep 11, 2019

> Alternatively, we don’t return POC data if the status is ‘deleted’

See #569. I guess we should be able to roll this in with the next release.

@asbjornst asbjornst commented Sep 12, 2019

Another way to solve it would be to blank the PII fields in POC objects when status is set to deleted.
That would be in line with GDPR's data minimization requirement.
#569 isn't, since it will keep the PII data around forever, for no good reason.
If really needed, it could be kept in an audit log for a fixed amount of time.
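
As a rough sketch of that idea, assuming poc records are plain dicts and the audit log is a simple list: the helper names `soft_delete_poc` and `purge_expired` and the 30-day retention period are assumptions for illustration only, since the thread names no concrete period and this is not PeeringDB code.

```python
from datetime import datetime, timedelta, timezone

PII_FIELDS = ("name", "phone", "email")  # the poc fields named earlier in the thread
AUDIT_RETENTION = timedelta(days=30)     # assumed value; the thread names no period


def soft_delete_poc(poc, audit_log, now=None):
    """Mark a poc as deleted, blank its PII, and keep a time-limited audit copy."""
    now = now or datetime.now(timezone.utc)
    # The original values survive only in the audit log, with an expiry time.
    audit_log.append({
        "poc_id": poc["id"],
        "snapshot": {f: poc.get(f, "") for f in PII_FIELDS},
        "expires": now + AUDIT_RETENTION,
    })
    blanked = dict(poc, status="deleted", updated=now.isoformat())
    for field in PII_FIELDS:
        blanked[field] = ""
    return blanked


def purge_expired(audit_log, now=None):
    """Return only the audit entries whose retention period has not yet passed."""
    now = now or datetime.now(timezone.utc)
    return [entry for entry in audit_log if entry["expires"] > now]


# Invented example record.
log = []
t0 = datetime(2019, 9, 12, tzinfo=timezone.utc)
record = {"id": 7, "status": "ok", "name": "Jane Doe",
          "phone": "+1 555 0100", "email": "jane@example.net"}
deleted = soft_delete_poc(record, log, now=t0)
remaining = purge_expired(log, now=t0 + timedelta(days=31))
```

With this shape, the object served for status=deleted carries no PII at all, and the audit copy disappears once `purge_expired` runs after the retention window.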
