initial set of principles about measurement #52

Merged (6 commits) on Apr 10, 2024

Conversation

@npdoty (Collaborator) commented Sep 11, 2023

These principles have been referred to in measurement proposals already.

This is not an advocacy position; it just documents some common positions in shorter form.

This is not an exhaustive list of principles, not even of the ones described so far by proposals in the group.

Addresses #49

@AramZS added the call-for-consensus label (We are calling for participants to reach consensus; 2 weeks from being added, or handled via agenda+) on Sep 11, 2023
</section>

<section>
<h4>Measurement should not significantly enable inferences about individual people</h4>


Is there a way to be more specific than "inferences"? It doesn't feel like a stretch to say all of these proposals allow some form of inference. If geo is a breakdown key, and I get a rough sense of the conversion rate for that geo, then I can infer a conversion rate for every individual in that geo.

It feels like the intention behind most of these proposals is to prevent deterministic data collection on individuals and to prevent cross-site data collection.

npdoty (Collaborator, author) replied:

It seems like a differential privacy protection, for example, would include limiting how much you can infer about whether I as an individual converted from measuring the conversion rate from my geographic region.
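As a rough illustration (not taken from any particular proposal, and assuming a simple Laplace mechanism with an illustrative epsilon), a minimal sketch of that kind of protection might look like this:

```python
# A minimal sketch (not from any of the proposals) of limiting inference
# from participation, assuming a simple Laplace mechanism and an
# illustrative epsilon; real systems differ in many details.
import random


def noisy_conversion_count(conversions, epsilon=1.0):
    """Release a geo-level conversion count with Laplace noise.

    Each person contributes at most one conversion, so the count has
    sensitivity 1 and Laplace noise with scale 1/epsilon suffices.
    """
    # Difference of two exponentials with rate epsilon ~ Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return sum(conversions) + noise


# One geo's participants: 1 = converted, 0 = did not convert.
others = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]

# The released count (and hence the geo's conversion rate) has nearly the
# same distribution whether or not my own record shows a conversion.
print(noisy_conversion_count(others + [1]))  # if I converted
print(noisy_conversion_count(others + [0]))  # if I did not
```

Whether I converted changes the released value's distribution only slightly, by an amount controlled by epsilon, which is the kind of quantitative knob that "significantly" would eventually need to pin down.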

Deterministic data collection seems far too narrow a constraint: a site learning that I probably-but-not-definitely accessed a certain healthcare resource would often be unexpected and inappropriate, for example.

"significantly" is still open to be defined further -- like how high an increased probability or how individualized an inference might be.


I think your language there may solve the problem. "Measurement should limit inferences about individual people"

Your response also brings up another nuance though. You say "limiting how much you can infer about whether I as an individual converted from measuring the conversion rate from my geographic region" which speaks to making inference about past behavior. Is the restriction on inference scoped to past behavior?

A collaborator replied:

Hm if I make an inference about you without even using your sensitive data, is that a privacy violation? https://differentialprivacy.org/inference-is-not-a-privacy-violation/ argues that this is not a privacy violation and I agree. Additionally, it is an extremely difficult boundary to enforce (e.g. it prevents learning any sort of model of humanity or human nature - this is the "smoking causes cancer" example).

I think in part this section is getting at the "sensitive information disclosure" aspect of https://w3cping.github.io/privacy-threat-model/#hl-sensitive-information which is about the explicit disclosure of the user's sensitive information.

npdoty (Collaborator, author) replied:

People may still have a privacy interest in what is learned about them even without use of their data, but I do think our focus here is on what can be inferred about an individual from their having participated in the measurement. That's the only kind of protection that differential privacy can afford, for example.

If that's unclear, I could add a sentence to the explanation here, or we could document elsewhere that the scope of the privacy principles about measurement is limited to participation in the measurement.

(I don't think that explicit disclosure is the only thing to protect against, and that's an old draft not currently under development.)

A collaborator replied:

I would appreciate a little bit more nuance added here, because as written it seems to directly forbid doing any kind of science on populations.

@npdoty mentioned this pull request Oct 12, 2023
@AramZS added the Principles Document (This pertains to the Principles and is in an editorial mode.) and Merge Pending (Change is set to be merged pending editor or chair action. Last chance to comment.) labels, and removed the call-for-consensus label, on Oct 30, 2023
@npdoty (Collaborator, author) commented Nov 10, 2023

@tgreasby @csharrison are y'all okay with merging this as is? Or would you prefer an explicit qualifier that this is about inferences from having participated in the measurement?

@npdoty (Collaborator, author) commented Nov 10, 2023

working on a revision to note the qualifier about participation in the measurement itself.

@npdoty added the needs-work label (Indicates a PR or Issue needs further work before we can pick it up for further discussion or action) on Nov 10, 2023
@npdoty removed the needs-work label on Nov 21, 2023
@npdoty (Collaborator, author) commented Nov 21, 2023

@csharrison @tgreasby please take a look at the updated clarification

@npdoty self-assigned this Nov 21, 2023
@martinthomson (Collaborator) left a comment:

These are fine aspirational goals, but after thinking about them for a while, I'm not optimistic that they can be achieved.

That said, I will continue to think about ways that things might be improved. It's not too late for that, and I've been wrong in the past.

As for this change, I think that we should accept something like this. The user-comprehension and user-level accountability pieces are good.

I would want it to be clearer that the research/oversight/audit piece is more aspirational than a firm stricture. That said, if we don't have something like this to aspire to, it's easy to dismiss it as unimportant, and that would be a bad outcome.

Especially if we don't achieve those goals, it will be good to have a reminder that we could have done better. Then, maybe one day, we might find it is easier to justify the work to do the better thing when it becomes possible.


<p>Some privacy harms -- including to small groups or vulnerable people -- cannot reasonably be identified in the individual case, but only with some aggregate analysis.</p>

<p>Auditors, with internal access to at least one of the participating systems, should be able to investigate and document whether abuse has occurred (for example, collusion between non-colluding helper parties, or interfering with results). When evidence of abuse is discovered, affected parties must be notified.</p>
A collaborator commented:

The call today observed that this requirement is not one of our usual strong mathematical ones, but one that is backed by audits. It might be worth calling that out.


<p>Most users will not choose to investigate or be able to interpret individual data about measurements. Independent researchers can provide an important accountability function by identifying potentially significant or privacy-harmful outcomes.</p>

<p>Some privacy harms -- including to small groups or vulnerable people -- cannot reasonably be identified in the individual case, but only with some aggregate analysis.</p>
A collaborator commented:

First paragraph of this section? Maybe reverse the order of the first three paragraphs.

<section>
<h4>Researchers, regulators and auditors should be able to investigate how a system is used and whether abuse is occurring.</h4>

<p>Researchers should be able to learn what measurements are taking place, in order to identify unexpected or potentially abusive behavior and to explain the implications of the system to users (whose individual data may not be satisfyingly explanatory).</p>
A collaborator commented:

If you are looking for a problem statement, try this on for size:

Any system that handles user data in the aggregate needs to provide strong constraints that limit the possibility that the data is misused. However, the uses of data that are permitted within those constraints might still admit narrower forms of abuse. For measurement, this might involve selectively targeting individuals or groups of individuals for the purposes of obtaining more and more actionable data about their online activities.

For advertising purposes, this sort of targeting is often a primary goal of measurement systems. A problem arises when this targeting is repeated to the point that it puts individuals at greater risk of exploitation based on the information that is obtained.

The distinction between abusive uses and ordinary uses of these systems could be hard to make without additional information about the inputs to the system.

The measurement systems being proposed all rely on oblivious computation to some degree. This means that access to their internal operation reveals no meaningful information. As a result, most of the information of interest is held by companies in the advertising market: ad techs, publishers, and advertisers.

The key challenge in attempting to access that information is that any information that might be needed to detect abuse is also virtually guaranteed to be commercially sensitive. Revealing information about the conduct of measurement also reveals how advertisers place their advertisements, how they structure their bidding strategies, and even details of their clients.

It might be possible for an independent researcher or auditor to gain access to this sort of information. They might be able to convince participants to allow access to the information for certain narrow purposes. The current environment does not establish good incentives for market participants to accede to that sort of inspection. Inspection carries risks both to that commercially sensitive data and to the reputation of the advertiser, with no real upside.

The question we need to ask is whether there is any change to how the system operates that might make the system more open to these sorts of aggregate, independent systems of accountability. In doing so, we need to balance the commercial sensitivity interests of those participating in advertising with those goals. And we need to sustain the high standards we have for privacy at the same time.

@eriktaubeneck (Collaborator) commented:

> These are fine aspirational goals, but after thinking about them for a while, I'm not optimistic that they can be achieved.

I tend to agree with @martinthomson here if we are assuming that all of these need to be achieved technically within the standard that we design. However, if we broaden the approach here to include an external auditing system which users could opt into (via a browser extension or something similar), these may become more achievable.

It would be helpful to distinguish which of these goals we believe should be managed technically by the system, and which of these should be managed by a subset of users opting into some sort of auditing system.

Commit: qualifier suggested by erik

Co-authored-by: Erik Taubeneck <erik.taubeneck@gmail.com>
<section>
<h3>Accountability</h3>
<section>
<h4>Users should be able to investigate how data about them is used and shared.</h4>
A contributor commented:

Concern about this section was raised.

@AramZS (Contributor) commented Apr 10, 2024

Agreement to merge for now and continue to define.
