initial set of principles about measurement #52
Conversation
update front matter links
principles/index.html
</section>
<section>
<h4>Measurement should not significantly enable inferences about individual people</h4>
Is there a way to be more specific than "inferences"? It doesn't feel like a stretch to say all of these proposals allow some form of inference. If geo is a breakdown key, and I get a rough sense of conversion rate for that geo, then I can infer conversion rate for every individual in the geo
It feels like the intention behind most of these proposals is to prevent deterministic data collection on individuals and to prevent cross-site data collection
It seems like a differential privacy protection, for example, would include limiting how much you can infer about whether I as an individual converted from measuring the conversion rate from my geographic region.
Deterministic data collection seems far too narrow a constraint: a site learning that I probably-but-not-definitely accessed a certain healthcare resource would be often unexpected and inappropriate, for example.
"significantly" is still open to be defined further -- like how high an increased probability or how individualized an inference might be.
I think your language there may solve the problem. "Measurement should limit inferences about individual people"
Your response also brings up another nuance, though. You say "limiting how much you can infer about whether I as an individual converted from measuring the conversion rate from my geographic region", which speaks to making inferences about past behavior. Is the restriction on inference scoped to past behavior?
Hm if I make an inference about you without even using your sensitive data, is that a privacy violation? https://differentialprivacy.org/inference-is-not-a-privacy-violation/ argues that this is not a privacy violation and I agree. Additionally, it is an extremely difficult boundary to enforce (e.g. it prevents learning any sort of model of humanity or human nature - this is the "smoking causes cancer" example).
I think in part this section is getting at the "sensitive information disclosure" aspect of https://w3cping.github.io/privacy-threat-model/#hl-sensitive-information which is about the explicit disclosure of the user's sensitive information.
People may still have a privacy interest in what is learned about them even without use of their data, but I do think our focus here is on what can be inferred about an individual from their having participated in the measurement. That's the only kind of protection that differential privacy can afford, for example.
If that's unclear, I could add a sentence to the explanation here, or we could document elsewhere that the scope of the privacy principles about measurement is to participation in the measurement.
(I don't think that explicit disclosure is the only thing to protect against, and that's an old draft not currently under development.)
I would appreciate a little bit more nuance added here, because as written it seems to directly forbid doing any kind of science on populations.
@tgreasby @csharrison are y'all okay with merging this as is? Or would you prefer an explicit qualifier that this is about inferences from having participated in the measurement?
working on a revision to note the qualifier about participation in the measurement itself.
@csharrison @tgreasby please take a look at the updated clarification
These are fine aspirational goals, but after thinking about them for a while, I'm not optimistic that they can be achieved.
That said, I will continue to think about ways that things might be improved. It's not too late for that, and I've been wrong in the past.
As for this change, I think that we should accept something like this. The user-comprehension and user-level accountability pieces are good.
I would want it to be clearer about it being more aspirational than a firm stricture when it comes to the research/oversight/audit piece. That said, if we don't have something like this to aspire to, it's easy to dismiss it as unimportant and that would be a bad outcome.
Especially if we don't achieve those goals, it will be good to have a reminder that we could have done better. Then, maybe one day, we might find it is easier to justify the work to do the better thing when it becomes possible.
<p>Some privacy harms -- including to small groups or vulnerable people -- cannot reasonably be identified in the individual case, but only with some aggregate analysis.</p>
<p>Auditors, with internal access to at least one of the participating systems, should be able to investigate and document whether abuse has occurred (for example, collusion between non-colluding helper parties, or interfering with results). When evidence of abuse is discovered, affected parties must be notified.</p>
The call today observed that this requirement is not one of our usual strong mathematical ones, but one that is backed by audits. It might be worth calling that out.
<p>Most users will not choose to investigate or be able to interpret individual data about measurements. Independent researchers can provide an important accountability function by identifying potentially significant or privacy-harmful outcomes.</p>
<p>Some privacy harms -- including to small groups or vulnerable people -- cannot reasonably be identified in the individual case, but only with some aggregate analysis.</p>
First paragraph of this section? Maybe reverse the order of the first three paragraphs.
<section>
<h4>Researchers, regulators and auditors should be able to investigate how a system is used and whether abuse is occurring.</h4>
<p>Researchers should be able to learn what measurements are taking place, in order to identify unexpected or potentially abusive behavior and to explain the implications of the system to users (whose individual data may not be satisfyingly explanatory).</p>
If you are looking for a problem statement, try this on for size:
Any system that handles user data in the aggregate needs to provide strong constraints that limit the possibility that the data is misused. However, the uses of data that are permitted within those constraints might still admit narrower forms of abuse. For measurement, this might involve selectively targeting individuals or groups of individuals for the purposes of obtaining more and more actionable data about their online activities.
For advertising purposes, this sort of targeting is often a primary goal of measurement systems. A problem arises when this targeting is repeated to the point that it puts individuals at greater risk of exploitation based on the information that is obtained.
The distinction between abusive uses and ordinary uses of these systems could be hard to make without additional information about the inputs to the system.
The measurement systems being proposed all rely on oblivious computation to some degree. This means that access to their internal operation reveals no meaningful information. To that end, most of the information of interest is held by companies in the advertising market: ad techs, publishers, and advertisers.
In attempting to access that information, the key challenge is that any information that might be needed to detect abuse is also virtually guaranteed to be commercially sensitive. Revealing information about the conduct of measurement also reveals information about how advertisers place their advertisements, how they structure their bidding strategies, and even details of clients.
It might be possible for an independent researcher or auditor to gain access to this sort of information. They might be able to convince participants to allow access to the information for certain narrow purposes. The current environment does not establish good incentives for market participants to accede to that sort of inspection. Inspection carries risks both to that commercially sensitive data and to the reputation of the advertiser, with no real upside.
The question we need to ask is whether there is any change to how the system operates that might make the system more open to these sorts of aggregate, independent systems of accountability. In doing so, we need to balance the commercial sensitivity interests of those participating in advertising with those goals. And we need to sustain the high standards we have for privacy at the same time.
I tend to agree with @martinthomson here if we are assuming that all of these need to be achieved technically within the standard that we design. However, if we broaden the approach here to include an external auditing system which users could opt into (via a browser extension or something similar), these may become more achievable. It would be helpful to distinguish which of these goals we believe should be managed technically by the system, and which should be managed by a subset of users opting into some sort of auditing system.
qualifier suggested by erik Co-authored-by: Erik Taubeneck <erik.taubeneck@gmail.com>
<section>
<h3>Accountability</h3>
<section>
<h4>Users should be able to investigate how data about them is used and shared.</h4>
Concern about this section was raised.
Agreement to merge for now and continue to define.
These principles have been referred to in measurement proposals already.
This is not an advocacy position, just documenting some common positions in shorter form.
This is not an exhaustive list of the principles, even the ones described so far by proposals in the group.
Addresses #49