This repository has been archived by the owner on Mar 16, 2023. It is now read-only.

Defining the level playing field | Google as a third party #18

Open
achimschloss opened this issue Jul 30, 2020 · 5 comments

Comments

@achimschloss

While we have had a lot of lively discussions around the cohort mechanism, gatekeeper(s) or the lack of them, and the technical/theoretical aspects of these APIs in terms of privacy, it seems about time to start a conversation about their legal / user-facing aspects and ultimately what the level playing field should look like. This is a somewhat longer post to provide an initial framing.

It is certainly not the most beloved topic for engineers (meaning the formalities), but with the limited time available, to me there is an urgency to also prototype these aspects with the first APIs. With https://github.com/WICG/WebID we have already been in a rather in-depth discussion around this for a couple of weeks, but I would argue that we should also start with the advertising-related APIs now, as we have enough information about at least one of them.

With FloCs:

  1. being the most simple API from a functionality perspective
  2. having prototype implementations making its way into the chromium sourcecode https://chromium-review.googlesource.com/q/FloC

it would serve well as the first example for discussing these topics... I'm referring to the GDPR in the following, since I guess we can agree it's the most advanced regulation in that regard and one where we have a lot of policy experience in the market.

Why is the legal framing important?

The FloC mechanism works by

  1. Calculating cohorts out of the user's browsing history, which is personal user data and therefore subject to the GDPR - i.e. one needs to think about the purposes and extent of that processing, the legal ground for this processing (as one can see in the chromium commits) and ultimately who is formally controlling and responsible for it.
  2. These cohorts are used to address users with interest-based advertising and are therefore shared between parties (for example with an advertiser) which have different relations to and knowledge about that user (i.e. there is additionally a relation to other personal data processing).

Given that

  1. The FloC function is not a self-contained browser function like a password safe, where one would argue that it's just a product feature used by the user on their own behalf and for their own benefit; it enables personal data processing beyond that.
  2. From a user's/DPA's perspective the responsible party will be the Publisher where FloC-based ads are shown and browsing history is collected, not the Browser nor the Advertiser.
  3. Publishers will need answers to these questions. Even if FloC-IDs might theoretically be anonymous by themselves, that does not change this observation, as it relates to the full extent of the processing, which leads to the display of interest-based ads to the user.

The level playing field

Independently of the legal framing for the processing, publishers have a reasonable demand for 100% clarity on how these APIs are and will be entangled with other Google Services, given examples like the upcoming iOS 14 changes and the ongoing antitrust investigations around the globe into bundling services. For now, we have a high-level alignment on establishing a level playing field; with FloCs we can really define it now.

Looking at the commits for the prototype it looks like:

  1. For now, the user control and legal grounds are bound to Google services and privacy policies. Practically, it's fully bound to Google Services for the PoC. That makes sense to me, given one could not otherwise even run the PoC with real users, but it naturally raises the concern of whether it will stay that way or be removed down the road.

"Queries google to find out if user has enabled 'web and app activity' and 'ad personalization', and if the account type is NOT a child account."

  2. Secondly, it seems that FloC-IDs are also synchronised to Google Backend Services, which again seems understandable for a PoC, but raises similar concerns.

It's a service that is supposed to (as some functions are incomplete) regularly compute the floc id by sim hashing the navigation history and log it to chrome sync
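
For readers unfamiliar with the term, the "sim hashing" mentioned in that description can be illustrated roughly as follows. This is a generic SimHash sketch in Python, not the Chromium code: the choice of visited domains as input features, the 64-bit width, the use of SHA-256 and the function names are all assumptions made for illustration.

```python
import hashlib


def simhash(features, bits=64):
    """Return a `bits`-bit SimHash fingerprint for an iterable of strings.

    Assumption: each feature is a visited domain. The real input and any
    weighting used by the prototype are not described in this thread.
    """
    counts = [0] * bits
    for feature in features:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each feature votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


history = ["news.example", "recipes.example", "cycling.example"]
cohort_fingerprint = simhash(history)
```

The point of the technique is that histories sharing many features tend to produce fingerprints with a small Hamming distance, which is what allows similar users to be grouped. It also shows why the input is personal data even if the output is meant to be shared by many people.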

Looking Forward:

Once the FloC API is used to actually address users with personalised ads, one needs to answer at least these questions:

  • What is the independent legal framing for addressing a user based on FloC? To me it seems that there needs to be a means for a publisher to offer the user control, and most probably even a consent, to enable this. Within TCF that would most probably be Purposes 3 and 4 (for the Publisher). Vendors would be less relevant here.
  • What are the UI components and who is operating them? TCF within the Browser, the Browser accepting a publisher consent signal, bespoke UIs?
  • If and how Google Services are de-coupled

My suggestion would be to also prototype these questions independently of the engineering aspects, to get publishers and advertisers more engaged and comfortable with these APIs and the general process.

@michaelkleber
Collaborator

Hi Achim,

I'm happy to try to answer your questions. But I'll note that I'm an engineer, not a lawyer. So in my engineer sort of way, my response will be to make as clear as possible what happens: who is responsible for doing each thing, what the intended purpose is, and who learns what information.

I like breaking out the different stages in the FLoC mechanism, but let me split things up even finer than the cohort calculating-vs-using division that you mentioned.

  1. Cohort calculation begins with an on-device step where a web browser is taking data that it already has (e.g. a browsing history) and performing a sort of anonymization calculation.

    The browser's goal here is to transform some personal, potentially-sensitive data into an attribute that's common to a large group of people.

  2. Some privacy properties of the flock assignments might be hard to establish entirely on-device. For example, in the original explainer we said "The browser ensures that flocks are well distributed, so that each flock represents thousands of people." That could be done using some kind of multi-browser aggregation service in which the flock values themselves are entirely anonymous.

    That aggregation service doesn't exist yet, so as you noted, for our proof-of-concept Chrome will rely on Google services and privacy policies for this.

  3. A particular domain's server indicates that it wishes to receive flocks from the browser, by sending an Accept-CH = Sec-CH-Flock header.

    The flock itself is designed to be a piece of anonymous data, but the server is asking for it to be attached to future HTTP requests to that domain, and only the party making the request knows how it intends to use that information. So that party will need to do its own analysis of any legal requirements for collecting and using the flock in combination with other data it receives.

  4. The browser may send the flock to the server in a Sec-CH-Flock header on future requests.

    This would of course be governed by whatever permissions and UI controls the browser has for the feature.

  5. The server that requested the flock decides whether to use the value when it performs ad targeting.

    Since the flock came attached to an ad request, there is plenty of opportunity for the publisher to pass along any relevant information about permissions, controls, etc.

  6. Specifically for Chrome users who choose to sync their browsing activity to a Google account, the flock (which is a function of that browsing history) may be synced as well.
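
The header exchange in steps 3 to 5 above can be sketched roughly as follows. This is a simplified Python model, not browser or server code: the header names come from the description above, while the function names, the set of opted-in origins, the permission flag and the flock value are hypothetical.

```python
FLOC_HINT = "Sec-CH-Flock"


def server_opts_in(response_headers):
    """Step 3: True if the server's Accept-CH header lists the flock hint."""
    accept_ch = response_headers.get("Accept-CH", "")
    tokens = [token.strip().lower() for token in accept_ch.split(",")]
    return FLOC_HINT.lower() in tokens


def build_request_headers(origin, opted_in_origins, user_allows_floc, flock):
    """Step 4: attach the flock only if the origin asked for it and the
    browser's own permissions and UI controls allow it."""
    headers = {}
    if user_allows_floc and flock is not None and origin in opted_in_origins:
        headers[FLOC_HINT] = flock
    return headers


# Hypothetical origin and flock value, for illustration only.
opted_in = set()
if server_opts_in({"Accept-CH": "Sec-CH-Flock"}):
    opted_in.add("https://ads.example")

request_headers = build_request_headers(
    "https://ads.example", opted_in, user_allows_floc=True, flock="43A7"
)
```

In this model the browser decides whether the value is sent at all, and the requesting party decides what to do with it once received (step 5), which mirrors the split of responsibilities described above.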

I hope this makes clear who the parties are and what roles they all play.

Regarding UI components, certainly the browser will need to include a control and information about FLoC; each browser that implements it will need to make their own decision on the details. Similarly, any consent management system will need to make some new decisions, about UI and what questions to even ask.

@darobin

darobin commented Oct 22, 2020

Thanks for starting this discussion @asr-enid. One thing I'm not certain of is whether we disagree on part of your framing or if we are looking at different aspects.

You indicate that publishers would be the responsible party. I believe that is only true in the case in which the publisher is actually processing FLoC data — so presumably they've asked for it and the browser has accepted it. But that part does not seem particularly different from the publisher requesting specific personal data and processing it for advertising purposes. No?

However, while the browser is using publisher content and user behaviour in order to establish cohorts the publisher cannot be the responsible party. Clearly for that processing the browser would have to be the data controller. Now given how surprising the processing is (as well as novel, and potentially risky), I don't believe that any legal basis other than consent could apply here. And this consent can't be bundled into any previously obtained consent (even assuming it to be valid) anyway so the browser will have to have shown some sort of dialog to consent users into FLoC processing, at least in Europe. Is this what you had in mind when you mentioned the TCF? Because, if so, I don't think that the TCF needs to be involved for this part, no? It would be unlawful for the browser to even profile users into cohorts without specific informed consent, even if they don't share the information (and there aren't many browsers, so accountability is easier) so if you're receiving FLoC data at all then the user has to have consented.

It would be useful to reinforce that by clearly documenting in the standard that browsers are the data controller for FLoC data and therefore assume responsibility for the lawfulness of their processing. (This can be written in a legislation-agnostic manner.) This would avoid the complex mechanics of having to assert that downstream.

For the latter part, I believe that the publisher may only be (partly) responsible if there is a way for publishers to prevent their content from being used in FLoC at all. That would be a good addition (on other grounds), in which case it will be useful to see to what extent the publisher has a responsibility to the user for this.

@achimschloss
Author

achimschloss commented Oct 22, 2020

Thanks for following up here, I actually had missed digging deeper prior to starting this. To your points:

You indicate that publishers would be the responsible party. I believe that is only true in the case in which the publisher is actually processing FLoC data — so presumably they've asked for it and the browser has accepted it

Agreed in general - the publisher does not even have access to the full dataset (browsing history), but in terms of a user's perception, a personalized ad would be shown on a publisher's site, and the publisher would at least facilitate this and also allow a FloC-based ad to be shown. This might lead to a discussion around joint controllership, but that would require knowing what the exact end-to-end setup looks like.

However, while the browser is using publisher content and user behaviour in order to establish cohorts the publisher cannot be the responsible party. Clearly for that processing the browser would have to be the data controller. Now given how surprising the processing is (as well as novel, and potentially risky), I don't believe that any legal basis other than consent could apply here.

Agreed, as I noted this is a function that goes way beyond what a user would expect a browser to do, so I don't see a point in arguing it is part of the general service agreement, or even simply a tool that assists a user (like a password safe). Given it is quite extensive processing for personalization, consent seems to be the applicable legal basis.

Is this what you had in mind when you mentioned the TCF? Because, if so, I don't think that the TCF needs to be involved for this part, no? It would be unlawful for the browser to even profile users into cohorts without specific informed consent, even if they don't share the information (and there aren't many browsers, so accountability is easier) so if you're receiving FLoC data at all then the user has to have consented.

Depends I guess

  • it's definitely not mandatory by any means; the browser could add some sort of bespoke consent to its proprietary permission system. That would again lead to more fragmentation of user control. A user could, for example, reject consent for personalized ads via the CMP, but still get FloC-based personalization. I don't think this would be explainable to an end user.
  • that said, given Google is now part of TCF they could for example (assuming they would see themselves as controller) make FloC-based processing dependent on the status of Google Advertising in the current context on the publisher side - the browser could query the __tcfapi and inspect the status for Google on each site, like they do with Google Analytics now. If there is no consent, FloCs should not be leveraged on this site or for this part of the browsing history.
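
To make the second option more concrete, here is a rough Python sketch of the gating logic only. It assumes the browser has somehow obtained, per site, a consent record shaped like the TCData object a CMP returns from __tcfapi (with `vendor.consents` and `purpose.consents` maps). How a browser would reliably obtain that record is left open; the function names, the use of Purposes 3 and 4 as the required set, and the vendor ID are assumptions to be checked against the TCF policies and the Global Vendor List.

```python
# Assumed vendor ID for Google in the Global Vendor List; verify before use.
EXAMPLE_VENDOR_ID = 755


def floc_allowed(tc_data, vendor_id, required_purposes=(3, 4)):
    """True only if the vendor and every required purpose have consent."""
    vendor_consents = tc_data.get("vendor", {}).get("consents", {})
    purpose_consents = tc_data.get("purpose", {}).get("consents", {})
    if not vendor_consents.get(vendor_id, False):
        return False
    return all(purpose_consents.get(p, False) for p in required_purposes)


def eligible_domains(visits, vendor_id):
    """Keep only the visited domains whose consent record permits FloC use.

    `visits` is a list of (domain, tc_data) pairs; this pairing is a
    hypothetical data model, not something the browser exposes today.
    """
    return [domain for domain, tc_data in visits if floc_allowed(tc_data, vendor_id)]


consented = {
    "vendor": {"consents": {EXAMPLE_VENDOR_ID: True}},
    "purpose": {"consents": {3: True, 4: True}},
}
rejected = {
    "vendor": {"consents": {EXAMPLE_VENDOR_ID: False}},
    "purpose": {"consents": {3: False, 4: False}},
}
visits = [("news.example", consented), ("health.example", rejected)]
usable_history = eligible_domains(visits, EXAMPLE_VENDOR_ID)
```

With this kind of filter, a site where the user rejected personalized ads in the CMP would neither contribute to the cohort calculation nor receive the FloC, which addresses the fragmentation concern from the first bullet.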

It would be useful to reinforce that by clearly documenting in the standard that browsers are the data controller for FLoC data and therefore assume responsibility for the lawfulness of their processing

Absolutely, that was my main intention here, also to get more clarity on the relation to other Google services.

@darobin

darobin commented Oct 22, 2020

I think we're aligned on the broad lines, particularly on the necessity for the draft to include a discussion about who is responsible for which processing. Very quick notes:

  • I would be very hesitant to have the browser reach inside runtime variables. TCF technical documents tend to be underspecified compared with the level of precision expected in Web standards, I'm not sure how to specify accessing __tcfapi in a way that would be reliable at the tech level. That said, the idea that FLoC profiling would be opt-in per context certainly makes sense since I could certainly see how users might be comfortable in some contexts and not in others. Giving users any kind of real agency with respect to their data is not something you usually find in Google-designed tech, though, so I'm not sure that is something that would be acceptable.
  • Given that the data can be provided to anyone, I don't see how this could be considered part of "Google Advertising" (not that consenting to that would be specific enough to boot!).

@achimschloss
Author

TCF technical documents tend to be underspecified compared with the level of precision expected in Web standards, I'm not sure how to specify accessing __tcfapi in a way that would be reliable at the tech level

I guess the documentation could be more explicit in a lot of regards; that said, TCF is just a generic framework in that sense and would not directly describe the use for an explicit use case like FloC. When we talk about an explicit interpretation and use by a potential controller, that would anyway lead to in-depth guidance on how the controller expects the CMP to be set up - see here for Google Advertising and Google Analytics:

Publisher Guidance for Google Ad Products
Publisher Guidance for Google Analytics

The open question before even looking into this is anyway what Google's position w.r.t. a potential Chrome implementation of FloC would be (not a tech problem).

Given that the data can be provided to anyone, I don't see how this could be considered part of "Google Advertising"

Agreed, it is tricky given we have a somewhat unusual situation where

  1. a Cohort is calculated on personal data - with the above considerations of how this would be framed from a privacy perspective. The calculation of the cohort is most probably a separate concern from its actual use.
  2. That Cohort information is leveraged for a variety of other processing purposes; in that sense it would need to be disclosed properly when establishing transparency or consent for that processing (depending on what we are talking about). Just sending it everywhere without that does not seem appropriate.
