Let browsers have different privacy settings #11

Closed
michaelkleber opened this issue Jun 3, 2019 · 24 comments

@michaelkleber

Hi John: I'd like to explore ways to make the developer-facing API surface more flexible, so that multiple browsers can provide the same API but still make their own independent decisions on details like the adcampaignid="..." value being 6 bits.

You currently build the limitation directly into the ad HTML with the requirement "If any of the conditions do not hold, such as the ad campaign id being larger than 6-bit, the request for ad click attribution is ignored." This seems to make it hard to change, both for other implementers now, and for any browser which ever wants to change the limits it places on the API's information flow.

To propose a concrete alternative, how would you feel about the click-time ad-campaign-id and the conversion-time ad-attribution-data both being arbitrary strings over some specified alphabet, where the browser limited information flow by truncating the string to some chosen length?  If you were to allow 256 or 100 campaign IDs then we could use hex or decimal; if you need to stick to 64 then I guess this API could use octal (chmod fans will love us!). I'm saying truncation, rather than e.g. mod-n, so that a server receiving an ad click attribution HTTP request can easily tell the granularity to which it was limited; of course there are other ways to achieve that goal.
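A minimal sketch of the truncation idea follows. It assumes the browser states its limit as a whole number of bits and that the alphabet is agreed in advance. `truncate_to_budget` and the alphabet constants are hypothetical names, not part of any spec. Because the browser truncates rather than reducing mod n, a server can infer the granularity it received from the length of the string.

```python
import math

# Example alphabets mentioned above (hex, decimal, octal); names are hypothetical.
HEX = "0123456789abcdef"
DECIMAL = "0123456789"
OCTAL = "01234567"


def truncate_to_budget(value: str, alphabet: str, max_bits: int) -> str:
    """Return the longest prefix of `value` that fits in `max_bits` bits.

    Each character drawn from an alphabet of size k carries log2(k) bits,
    so at most floor(max_bits / log2(k)) characters survive.
    """
    if len(alphabet) < 2:
        raise ValueError("alphabet needs at least two symbols")
    if any(ch not in alphabet for ch in value):
        raise ValueError("value contains characters outside the alphabet")
    bits_per_char = math.log2(len(alphabet))
    keep = math.floor(max_bits / bits_per_char)
    return value[:keep]


# The same markup value, truncated under two hypothetical browser budgets.
print(truncate_to_budget("5731", OCTAL, 6))  # -> 57 (2 octal digits = 6 bits)
print(truncate_to_budget("5731", OCTAL, 9))  # -> 573 (3 octal digits = 9 bits)
```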

To be transparent, the kind of choice I'd like Chrome to be able to make is to allow more information in the (click-time) campaign ID, but compensate for it by allowing less data in the (conversion-time) ad-attribution-data value. The use cases we've been hearing about in the W3C web-advertising Business Group rely on having more data about what click it was that did or didn't lead to a conversion, and from that point of view the 6+6 bits proposed here isn't the balance we would choose.

Anyway, while I'm happy to debate the specifics of the privacy settings in some other forum, that's not my goal here — I just want us to offer developers a single API across multiple browsers, each of which can come to their own conclusions on thornier questions.

@jasonanovak

Hi @michaelkleber! Sorry for our delayed response, @johnwilander and I are tied up with WWDC this week. Our read of the comment -- that we would appreciate your confirmation on -- is that there are three pieces here regarding making the API generalizable:

  1. Budget split -- whether the 12 bits could be split differently than the existing six and six;
  2. Increased budget -- whether it is possible to change the spec to increase the number of bits; and
  3. Budget representation -- if the bits can be arbitrary strings over some specified alphabet and the browser can implement the capping logic.

@michaelkleber
Author

Thanks @jasonanovak, sorry to intrude on your busy week, and happy to wait for when you're less encumbered.

I'm not sure I would split up the questions quite that way, because bits of entropy aren't simply additive — after all, the conversion itself is an example where the browser is happy to report zero bits of entropy from the ad-campaign-id and infinity bits from the ad-attribution-data. For a less trivial example, if the browser were willing to report only that the conversion happened (zero additional bits from ad-attribution-data), then how many bits from the ad-campaign-id would be appropriate to pair it with? I haven't figured out a reasonable answer to that question.

So I think my pieces are more like:

  1. Budget choices differ by browser — whether we can write the spec so that different browsers can make choices other than the 6+6 that Safari is using today;
  2. Budget representation — if the bits can be arbitrary strings over some specified alphabet (or an alternative) so that developers can write markup that works across browsers implementing different capping logic.

@jasonanovak

Can you clarify this:

> I'm not sure I would split up the questions quite that way, because bits of entropy aren't simply additive — after all, the conversion itself is an example where the browser is happy to report zero bits of entropy from the ad-campaign-id and infinity bits from the ad-attribution-data.

@johnwilander and I read the "infinite bits" piece differently. Are you referring to the unlimited data that can be sent in pixel request URLs or expanding the number of bits beyond 12?

@michaelkleber
Author

I meant that you could get unlimited data from just one side of the click via a pixel request URL, not using the API at all.

@jasonanovak

Thanks. Our thinking is that in the future with the JS API, there wouldn't be unlimited data from the pixel request URL.

@michaelkleber
Author

I look forward to hearing about your JS API thoughts! But apologies for that digression into "6+6"-vs-"0+∞" bits. I still don't know your feelings on the key questions:

  1. Budget choices differ by browser — whether we can write the spec so that different browsers can make choices other than the 6+6 that Safari is using today;
  2. Budget representation — if the bits can be arbitrary strings over some specified alphabet (or an alternative) so that developers can write markup that works across browsers implementing different capping logic.

@othermaciej

It’s harder for us to answer in your reframing because we have different feelings about changing the split and increasing the budget.

Flexing from 6+6 to 4+8 seems in the spirit of the proposal, even if under publisher or advertiser control (rather than browser control). It’s not really clear why budget split would be useful as a browser choice, rather than publisher or advertiser.

Increasing the total to 32 would make the proposal no longer privacy preserving, since that’s enough for a globally unique user ID. A smaller increase might still keep the name accurate, and, depending on the evidence, we are not necessarily stuck on 12 or on a single fixed value.

I think your example of 0+infinity is not on point. With a shot in the dark tracking pixel, the publisher has no way to know the same user saw an ad for the advertiser on their site (at least under ITP, which generally prevents making this association using cookies). So such a mechanism can’t be used for attribution or for cross-site tracking. It’s associating some bits of information with the publisher-advertiser pair and knowledge that some user interacted with both that provides attribution as well as tracking risk.

@jasonanovak

Budget Split

Building on what @othermaciej said, maybe there’s a way to do a 4+4+4 split where the middle 4 bits can be used by either website and could be negotiated by the two first parties as part of their business agreements. We cannot let the click source signal to the click destination how many bits it has used, since that in itself is a carrier of entropy. So a dynamic budget split will require the click destination to signal the conversion ID as a 4+4 value where only the first 4 bits are guaranteed to be used.
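A sketch of the 4+4+4 layout as bit fields follows. It assumes the 12 bits travel as one integer, with the click source's bits on top, the negotiated middle bits next, and the click destination's bits last. That ordering and the names `pack_report`/`unpack_report` are illustrative assumptions; the thread does not fix a wire layout. Whichever party fills the middle field, the report still carries 12 bits in total, and who uses them is settled offline.

```python
FIELD_BITS = 4
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0b1111, the largest 4-bit value


def pack_report(source: int, shared: int, destination: int) -> int:
    """Pack three 4-bit fields into one 12-bit value: source | shared | destination."""
    for field in (source, shared, destination):
        if not 0 <= field <= FIELD_MASK:
            raise ValueError("each field must fit in 4 bits")
    return (source << 2 * FIELD_BITS) | (shared << FIELD_BITS) | destination


def unpack_report(report: int) -> tuple[int, int, int]:
    """Split a 12-bit value back into its (source, shared, destination) fields."""
    if not 0 <= report < 1 << (3 * FIELD_BITS):
        raise ValueError("report must fit in 12 bits")
    return (
        (report >> 2 * FIELD_BITS) & FIELD_MASK,
        (report >> FIELD_BITS) & FIELD_MASK,
        report & FIELD_MASK,
    )


report = pack_report(source=0xA, shared=0x3, destination=0x7)
print(report, unpack_report(report))  # -> 2615 (10, 3, 7)
```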

Increased Budget

Key to the “Privacy Preserving” nature of “Privacy Preserving Ad Click Attribution” is the limitation on the number of bits stored and sent to prevent cross-site tracking. The existing budget of 12 bits allows for the unique identification of 4,096 individuals if the ad click source and the ad click destination can work out a scheme to tie both 6-bit parts to an individual user. As an alternative to increasing the 12 bit budget, we could also add a new field for additional bits:

  • The extra bits would have to be explicitly optional for the user-agent to implement, even in its developer-facing name. Making those bits explicitly optional ensures that browsers which don't support them are not only compliant but also perceived as compliant.
  • Because it is explicitly optional, it would have to be something like a third click source attribute, e.g.
    <a adDestination="https://shop.example" adCampaignID="31" optionalEventID="11">
  • The total number of optional bits would have to be capped so that PPACA is still privacy preserving.
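A sketch of how a browser might handle this three-attribute markup follows. It assumes decimal values and a 6-bit cap on both fields; `parse_click_attributes`, the `ClickAttribution` type, and the 6-bit optional cap are assumptions for illustration. A browser that does not support the optional bits still registers the click, just without them, which is the "compliant and perceived as compliant" property the bullets describe.

```python
import re
from dataclasses import dataclass
from typing import Optional

CAMPAIGN_ID_BITS = 6        # required adCampaignID budget in the current draft
OPTIONAL_EVENT_ID_BITS = 6  # assumed cap for the optional field
DECIMAL = re.compile(r"[0-9]+")


@dataclass
class ClickAttribution:
    ad_destination: str
    ad_campaign_id: int
    optional_event_id: Optional[int]  # None when unsupported, absent or invalid


def parse_decimal_within(value: str, bits: int) -> Optional[int]:
    """Return `value` as an int if it is plain ASCII decimal and fits in `bits` bits."""
    if DECIMAL.fullmatch(value) is None:
        return None
    number = int(value)
    return number if number < (1 << bits) else None


def parse_click_attributes(
    attrs: dict[str, str], supports_optional: bool
) -> Optional[ClickAttribution]:
    destination = attrs.get("adDestination", "")
    campaign = parse_decimal_within(attrs.get("adCampaignID", ""), CAMPAIGN_ID_BITS)
    if not destination or campaign is None:
        return None  # the request for ad click attribution is ignored
    event_id = None
    if supports_optional and "optionalEventID" in attrs:
        # Assumption: an invalid optional value drops only the optional part.
        event_id = parse_decimal_within(attrs["optionalEventID"], OPTIONAL_EVENT_ID_BITS)
    return ClickAttribution(destination, campaign, event_id)


attrs = {"adDestination": "https://shop.example", "adCampaignID": "31", "optionalEventID": "11"}
print(parse_click_attributes(attrs, supports_optional=True))
print(parse_click_attributes(attrs, supports_optional=False))
```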

Budget Representation

Budget representation, or the way the bits are expressed (arbitrary string, hex, decimal …), becomes less of an issue if we split out the optionalEventID attribute. A standalone optionalEventID attribute also relieves browsers of truncation logic.

@johnwilander
Collaborator

Just to explain our draft process here, what Jason posted above reflects our joint view. Both of us are editors so we try to coordinate when responding to concrete change proposals such as this.

In short:

  • A split other than 6+6 bit should be fine for a few of the bits.
  • Additional bits for browsers who want to be less strict will have to keep the total significantly below 32 bits. Otherwise "privacy preserving" doesn't mean anything.
  • Additional bits on the click side should be separated from the campaign ID and named explicitly so that it's clear to developers that they are optional.
  • With separate additional bits, we don't think we need a more complex representation than the current decimal value.

@michaelkleber
Author

I fully agree with any extra bits being clearly marked as optional; @jasonanovak your "not only compliant but also perceived as compliant" point is well said.

I still would hope to find something more tunable than a second optionalEventID attribute, though. It seems like that would expand the number of implementation options from 1 to 2, while my aspiration was for a way to let each browser make their own choice, change over time, etc. I'm trying to understand why this bugs me more than it bugs you.

I feel like @johnwilander your requestStorageAccess() API comes from this same philosophical direction: in the Algorithm for requestStorageAccess() you included step "9. Check any additional rules that the browser has." This reflects the fact that browsers are figuring out heuristics for granting access to information, an ongoing learning process. Why do you feel differently here?

@johnwilander
Collaborator

> I fully agree with any extra bits being clearly marked as optional; @jasonanovak your "not only compliant but also perceived as compliant" point is well said.
>
> I still would hope to find something more tunable than a second optionalEventID attribute, though. It seems like that would expand the number of implementation options from 1 to 2, while my aspiration was for a way to let each browser make their own choice, change over time, etc. I'm trying to understand why this bugs me more than it bugs you.
>
> I feel like @johnwilander your requestStorageAccess() API comes from this same philosophical direction: in the Algorithm for requestStorageAccess() you included step "9. Check any additional rules that the browser has." This reflects the fact that browsers are figuring out heuristics for granting access to information, an ongoing learning process. Why do you feel differently here?

I think the two are different in that requestStorageAccess() is a binary thing with an immediate result and PPACA is more of a speculative thing (will the user convert?) and involves a delay between click and conversion as well as between conversion and reporting. What I'm trying to say is that the feedback loop of "nope, this browser didn't accept that" is much easier for developers to handle in the case of requestStorageAccess().

Also, in the algorithm for requestStorageAccess(), the wiggle room is for further restrictions, not for relaxations, which is what is being asked for here.

Are you thinking of an adCampaignID format like "33optionalEvent11" where the part to the right of 33 is optional for browsers to support?
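To make the question concrete, here is a sketch of how a browser might parse such a combined value. The regex, the function name, and treating `optionalEvent` as a literal separator are assumptions taken from the example string, and bit-range checks are left out for brevity. It also illustrates the feedback-loop concern: a browser that ignores the suffix still returns a valid result, so nothing in the markup tells the page that the optional data was dropped.

```python
import re
from typing import Optional, Tuple

# Hypothetical combined format: "<campaign ID>" optionally followed by "optionalEvent<event ID>".
COMBINED_ID = re.compile(r"(?P<campaign>[0-9]+)(?:optionalEvent(?P<event>[0-9]+))?")


def split_combined_id(
    value: str, supports_optional: bool
) -> Optional[Tuple[int, Optional[int]]]:
    """Return (campaign, event) from a combined ID, or None if the value is malformed.

    A browser without support for the optional suffix keeps only the campaign part.
    """
    match = COMBINED_ID.fullmatch(value)
    if match is None:
        return None
    campaign = int(match.group("campaign"))
    event = match.group("event")
    if event is None or not supports_optional:
        return campaign, None
    return campaign, int(event)


print(split_combined_id("33optionalEvent11", supports_optional=True))   # -> (33, 11)
print(split_combined_id("33optionalEvent11", supports_optional=False))  # -> (33, None)
```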

@michael-oneill
Copy link

michael-oneill commented Jun 11, 2019

I agree the whole point is for there to be verifiably no user tracking possible, which means the entropy of any ad/campaign identifier should be kept as low as possible. Even a combined 32 bits is enough to single out individuals from most of the planet's population.

In Web Advertising BG issue 19 (w3c/web-advertising#19), I suggested a declaration of an ad-identifying dictionary (aka an "alphabet") on the originating website.

The "alphabet" could be quite small, as it would only need to be current for a fairly short time. If there were 1024 ad/campaign IDs in use for a particular publisher (or advertiser), then the report would only need 10 bits, and so on.
So the dictionary would look like:


0: PG-Daz-233346/Campaign 12366
1: PG-Daz-233346/Campaign 45776
2: ...
3: ...
...
1023: PG-Gilette-A224/Campaign 4888AM

Each ad/campaign identifier can be an arbitrarily long string, but there are a finite number in the dictionary.

The dictionary would always be accessible, say as JSON at a .well-known location, e.g. in Origin Policy.
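A sketch of the receiving side follows. It assumes the dictionary is published as a JSON object mapping decimal indices to identifier strings; the JSON shape and the function names are assumptions, since the comment only says "JSON at a .well-known location". The report carries only the index, so its size depends on how many entries are in use, not on how long the identifiers are.

```python
import json

# Hypothetical JSON shape for the published dictionary: index -> ad/campaign label.
SAMPLE_DICTIONARY = """{
  "0": "PG-Daz-233346/Campaign 12366",
  "1": "PG-Daz-233346/Campaign 45776",
  "1023": "PG-Gilette-A224/Campaign 4888AM"
}"""


def load_dictionary(json_text: str) -> dict[int, str]:
    """Parse the published dictionary, converting the JSON string keys to ints."""
    return {int(index): label for index, label in json.loads(json_text).items()}


def report_bits(dictionary: dict[int, str]) -> int:
    """Bits a report needs to carry any index up to the largest one in use."""
    return max(1, max(dictionary).bit_length())


def resolve(report_index: int, dictionary: dict[int, str]) -> str:
    """Map an index received in a report back to the full identifier."""
    if report_index not in dictionary:
        raise KeyError(f"index {report_index} is not in the published dictionary")
    return dictionary[report_index]


dictionary = load_dictionary(SAMPLE_DICTIONARY)
print(report_bits(dictionary))  # -> 10
print(resolve(1, dictionary))   # -> PG-Daz-233346/Campaign 45776
```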

@michaelkleber
Author

michaelkleber commented Jun 11, 2019

@johnwilander If we remain at only two spec'd implementation options (with or without one blob of optionalEventID) then separating it out into its own parameter is fine. That doesn't have a particularly nice path to greater flexibility (<a adCampaignID="54" optionTwoMoreBits="3" optionalNextTwoBitsAfterThose="2" optionalFifthAndSixthBit="1">), but if that isn't a need that resonates with other folks here, I'll let it go.

@johnwilander
Collaborator

> @johnwilander If we remain at only two spec'd implementation options (with or without one blob of optionalEventID) then separating it out into its own parameter is fine. That doesn't have a particularly nice path to greater flexibility (<a adCampaignID="54" optionTwoMoreBits="3" optionalNextTwoBitsAfterThose="2" optionalFifthAndSixthBit="1">), but if that isn't a need that resonates with other folks here, I'll let it go.

I see. One way would be to give the third attribute a more general name so that it can cover all the optional cases. Then it would of course have to support some more intricate syntax, for instance:
<a adDestination="https://shop.example" adCampaignID="31" optionalAdData="EventID11">
… which becomes a case of var args. 😀

However, given the constraint of significantly less than 32 bits of total entropy, can we really put much more than an event ID in there? Are you thinking of instructions to the browser that will not be part of the attribution report and thus not count toward the entropy?

@michaelkleber
Author

Looking further into the future, @csharrison and I are definitely interested in expanding the amount of metadata that can be associated with a click or conversion event to allow aggregate reporting; this only got a brief mention in https://github.com/csharrison/conversion-measurement-api#browser-control-of-information but seems to have a lot of potential.

But I don't think we should try to design for that here.

@johnwilander
Collaborator

The more complex things we want to support, the harder it's going to be to squeeze it into element attributes.

The anchor tag is a regular HTML element which can have an ID. Maybe we should stick with what's already in the draft and consider a version 2 with a JavaScript API that can push complex data associated with the ad based on its element ID? That would allow you to "talk to the browser" in a much broader way.

@ehsan

ehsan commented Jun 14, 2019

With an extra attribute like optionalEventID or optionalAdData, I wonder if folks have any ideas on how pages would detect if the browser has accepted the optional entropy bits and is going to submit them with the conversion events or not. Hopefully obtaining this knowledge wouldn't be something that is going to require sniffing the user agent...

@johnwilander
Collaborator

I did give this a quick thought, since support for attributes isn’t feature detectable afaik. Nothing’s going to break, though; the data will just be ignored in browsers that don’t support it.

But this situation makes me lean toward a programmatic/JavaScript way of expanding with optional data over extra attributes. That way the markup is supported if the feature is supported at all and the rest can be feature detectable.

@jasonanovak

@johnwilander and I talked and it seems like there's agreement on Budget Split whereby the 12 bits should be redistributed as 4/4/4 where the middle four bits are negotiated by the two first parties as part of their offline contractual arrangements. We'll work on a PR to clarify that.

Regarding Increased Budget, it seems like there are still opens on:

  • How those bits are represented and conveyed to the browser.
  • The number of bits.

The representation of those bits seems like it has two paths: (1) the optionalAdEventID parameter, which would be an easy addition to this iteration of the API; and (2) a richer set of metadata that can be added via JavaScript. We would propose focusing on (1) for now, with the understanding that whatever entropy the optionalAdEventID adds would need to be traded off against whatever entropy a JavaScript API adds. (And because the event should be scoped to Privacy Preserving Ad Attribution, we’ve renamed it to optionalAdEventID in this comment.)

In terms of the number of bits, we think that this has to be low, as currently only 4096 browsers can be uniquely identified by Privacy Preserving Ad Click Attribution. Adding just six more bits would increase that number 64-fold to 262,144 browsers that can be uniquely identified, so six optional bits seems like the most entropy we would want to add to the spec, for browsers that are willing to take the risk of that much entropy (Safari will not).
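The arithmetic behind these figures is that b bits take at most 2^b distinct values, so they can name at most that many browsers, assuming the click source and destination could coordinate on what each value means. The function and constant names below are illustrative only; the last line shows why a 32-bit total was considered too much.

```python
def distinguishable_ids(bits: int) -> int:
    """How many distinct values (and so distinct browsers) `bits` bits can name."""
    return 2 ** bits


BASE_BITS = 12           # 4+4+4 budget
OPTIONAL_EVENT_BITS = 6  # proposed optionalAdEventID cap

base = distinguishable_ids(BASE_BITS)                            # 4,096
extended = distinguishable_ids(BASE_BITS + OPTIONAL_EVENT_BITS)  # 262,144
print(f"{base:,} -> {extended:,} ({extended // base}x)")         # -> 4,096 -> 262,144 (64x)
print(f"32 bits: {distinguishable_ids(32):,}")                   # -> 32 bits: 4,294,967,296
```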

@michaelkleber
Author

Thanks for the discussion; this has helped me understand your underlying philosophical positions. My mental model still feels a little shaky when I read "currently only 4096 browsers can be uniquely identified", though: isn't that how many browsers could be uniquely named in the 12 bits of a conversion report if the publisher and advertiser already had a shared ID for the user? (But in that case they wouldn't use this API at all.)

Regarding bit counts, I would love to hear thoughts from potential API users, but it doesn't look like any of them are chiming in on this issue.

@johnwilander
Collaborator

I'd like to resolve this issue since we seem to have agreement on a change. Would a PR covering:

  • A 4+4+4 bit entropy budget, and
  • An additional optionalAdEventID attribute supporting 6 more bits on the click source side for browsers that want it

… be good enough to close this?

@michaelkleber
Author

Sounds good to me, thanks.

@johnwilander
Collaborator

The specifics of how to layer the extra data on top of PCM will be discussed in #26.

@johnwilander
Collaborator

#28 goes into detail on how to split the entropy budget and how to layer the two similar proposals. I think that's where this discussion should continue. Thanks for all the feedback above. It has all gone into where we're taking this.


6 participants