Arbitrary/opaque nature of "organizations" as the arbiter of first-party sets #14

Closed
pbannist opened this issue May 19, 2020 · 8 comments

pbannist commented May 19, 2020

The First-Party Sets proposal states that a desired outcome is “to provide the user’s desired functionality on the site they are interacting with.” However, first-party sets implicitly give large organizations cross-domain capabilities to track and target users across the domains they own. Additionally, the proposal puts forth that first-party sets should be defined by “organizations” and that “otherwise unrelated sites forming a consortium … would be considered abuse”. We can all agree that this proposal should not be biased towards large organizations at the expense of small ones.

An “organization” such as a company, government entity, or otherwise is a relatively arbitrary and opaque construct. Users rarely have any concept of what an organization is, how it is managed, and who the members are. A UI element is a way to address this, but it does not address the underlying bias caused by using “organization” as the only construct that first-party sets can be managed by.

As a theoretical example, Berkshire Hathaway, a major conglomerate, owns many companies including GEICO (insurance), Duracell (batteries) and Dairy Queen (ice cream). It could create a first-party set that allows it to track users across geico.com, duracell.com, and dairyqueen.com. There are many other examples of companies that own highly disparate brands (domains) that users would not understand are sharing data. It is also not necessarily the case that all subdivisions of an organization would have the same privacy practices for how data is accessed, stored, and shared.

The proposal states that “a collection of completely unrelated sites” would be “clearly unacceptable.” However, upon closer analysis, many “valid” first-party sets (such as Berkshire Hathaway's) would appear, to the user, to be completely unrelated. So the relationship between sites, or potential user intuition about which sites are related, is not a valid consideration for what constitutes an acceptable first-party set. It follows that if any construct similar to first-party sets is to be adopted, users are unlikely to be able to intuitively understand which domains are grouped together except via UX elements.

I believe that the First-Party Sets proposal should be modified to:

  1. Not be based on “organizations” as the controllers and arbiters of those sets
  2. Solely rely on UX signals and unbiased third-party validation as the arbiters of the sets

@jwrosewell

The Competition and Markets Authority (CMA) interim study into online advertising highlights the problem.

“Google is the platform with the largest dataset collected from its leading consumer-facing services such as YouTube, Google Maps, Gmail, Android, Google Chrome and from partner sites using Google pixel tags, analytical and advertising services. A Google internal document recognises this advantage saying that ‘Google has more data, of more types, from more sources than anyone else’.”

The report is due to be published in early July 2020 and should inform these proposals.

@burk504

burk504 commented Jun 1, 2020

But Chrome leaders have made it very clear that they believe there is no such thing as an 'un-biased third party'.

@mowexler

mowexler commented Jun 9, 2020

While I'm ok with a UX showing what entities or sites are in a set upon visiting any member of the set, it's unclear what 3rd party would or should be "judging" the veracity of any grouping. I also don't understand what UX signal you would prefer, other than the obvious one of having the same domain, which of course obviates the need for sets. I will suggest that if an org owns a domain name, then it has access to the data created in and for it. I also suggest that our UX be a clear list of the names in the set (domains and the "readable" name) to reveal that your data may be shared with the "following list of other sites".

There are many brands that we are surprised to discover are related. Braun shavers? Owned by P&G. Tumi suitcases? Samsonite. Converse Hi-tops? Nike. Priceline, along with Kayak, OpenTable, CheapFlights, and momondo are all part of Booking.com, and the Expedia Group controls ebookers, Expedia, HomeAway, Hotels.com, Hotwire, CheapTickets, Orbitz, Travelocity, and trivago. While subtle and not overt, these are all documented in Wikipedia. And some are sort of hidden, but not very much: Toyota and Lexus, or Mini and BMW. In each case, the parent organization can provide value and simplicity to a user by recognizing that the user is the same across these entities. The brands, however, have to balance that benefit with the trust the brand has created, and the damage that could come from sharing beyond expectations.

So, the entity gets the right to assign any domains it owns to an org; we do have a clear indicator of org-relationship from that pov and no "judgement" is required. But by surfacing the connection, we give the user the option to no longer interact with them, should they choose, instead of being surprised to discover it later on. And if a brand doesn't wish to disclose their ownership of a domain, then they do not participate in the sets feature.

And yes, in the example of Berkshire Hathaway as an org, Forest River RVs could hypothetically share a cookiespace with Justin Brands, maker of Chippewa Boots, Ben Bridge Jewelers, Pampered Chef, Dairy Queen, and Oriental Trading Company, as an org. (I leave out Geico, as in the US, information for insurance is more regulated, but we can squint and add them here). This would be disclosed in the browser via UX upon visiting any of these sites. Of course, some of these companies have data policies which prevent such sharing, and others risk losing customer trust if they choose to participate in the space. But if the companies have chosen to do business in a way that encourages such sharing, and can deliver value to a consumer by doing so, we should both a) allow it with first party sets and b) make it clear that it's happening via UX, so consumers can choose to take their money elsewhere if they wish. As Mr. Buffett has discovered, in fact there are competitors for every one of the brands I mention, any of which can choose to handle information differently and provide a different approach.

@othermaciej

I agree with the point of concern about large orgs. But how would unbiased third-party validation work in practice? Who are these third parties? How do they affect which first party sets a browser would accept? What kind of UX signal could conceivably put the user on notice that they are being tracked by something like the Berkshire Hathaway conglomerate, or even domains from totally separate organizations? I don't think a credible example has been given.

I thumbed the original post in both directions, because I think it highlights a real problem, but it does not present a workable solution.

@jdcauley

It seems to me that there is a way to address the complications around First Party Sets: a re-evaluation of changes to the Storage Access API.

If the Storage Access API provided a mechanism by which a user could opt in to universal allowance, or even just bulk allowance of multiple domains for a third party, rather than needing to approve that third party on each individual site, you could address any number of use cases.

For large organizations needing cross domain auth, a single auth event and browser consent would allow them to navigate freely.

For CMP and general privacy management, again a single point of management becomes readily viable.

I'm not going to pretend there isn't some opportunity for abuse, but this could, I believe, be managed through some additional elements of policy and UX design, such as requiring a reauth or re-allow action any time a new domain is added to a predefined list.

I.e., First Party Sets could list the domains that the third-party storage is allowed to operate on against a single user interaction, but adding new domains to the First Party Set would require users to re-approve the change.
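
To make the re-approval rule concrete, here is a minimal sketch in Python, assuming the browser keeps some record of which domains the user approved for a given third party. The names `ConsentRecord`, `needs_reapproval`, and `approve`, along with the `.example` domains, are hypothetical and purely illustrative; nothing here is part of the Storage Access API or the First Party Sets proposal.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical record of one user approval for a third party."""
    third_party: str
    approved_domains: frozenset = field(default_factory=frozenset)


def needs_reapproval(record: ConsentRecord, current_set: set) -> bool:
    """Return True if the declared set contains a domain the user never approved."""
    return not set(current_set) <= record.approved_domains


def approve(record: ConsentRecord, current_set: set) -> ConsentRecord:
    """Model the user approving the full, updated list in a single interaction."""
    return ConsentRecord(record.third_party, frozenset(current_set))


record = ConsentRecord("sso.example", frozenset({"a.example", "b.example"}))

# The list is unchanged, so the earlier approval still covers it.
print(needs_reapproval(record, {"a.example", "b.example"}))

# A new domain was added to the list, so the user would be asked again.
print(needs_reapproval(record, {"a.example", "b.example", "c.example"}))
```

In this sketch, removing a domain from the list does not trigger a new prompt, since the remaining domains were all approved before. Whether that is the right policy is a design question the comment above leaves open.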

@dmarti

dmarti commented Mar 12, 2021

@othermaciej raises some good points about factors that would need to be addressed for validating First Party Sets.

First Party Sets should only include sites that have the same policies on user privacy and data stewardship, and obviously share a common user-visible brand.

Common ownership is not really relevant, though. The question of corporate ownership is largely unrelated to the question of which sites the user expects to share data. Some sites have common ownership without clear common branding, and many users are not aware of which companies own which other companies.

It is unrealistic to expect browser developers to parse corporate ownership structures to see what constitutes common ownership.

  • At what stage in a complex corporate acquisition process are the two firms allowed to form a First Party Set?

  • Can Company A and its subsidiary Company B be in a First Party Set together, even if Company C owns enough convertible debt issued by B to take control of B any time they want?

Deciding which domains can be part of a First Party Set with each other would turn web standards development into a debate about mergers and acquisitions and corporate governance.

A more realistic standard would include

  • Common, crawlable policies on user data handling

  • Common user-visible design elements

Any resources required for common First Party Set membership could be checked by looking for matching content under .well-known.

  • Privacy policy

  • Data stewardship policy

  • Common branding resources and guidelines (this might include a common logo and a size at which it must be visible on all sites in the group.)

The simple way to match this content is to make sure that identical resources are present for privacy policy and other documents. In addition, all sites in the group should have identical content for other relevant policy-related items. For example, .well-known/gpc.json should be identical for all sites in the group.

It should be possible to check most of this automatically, and for sites to apply for a third-party evaluation and a signature. Existing services that check ads.txt and other resources on a site could be extended to check that first-party sets are valid.
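
As a rough illustration of the automatic check, the sketch below compares SHA-256 digests of the policy resources across all sites in a proposed set. It assumes the resources have already been fetched into memory. Apart from `.well-known/gpc.json`, the paths in `REQUIRED_PATHS` are hypothetical placeholders, as are the function names and the `.example` domains.

```python
import hashlib

# Hypothetical resource paths; only .well-known/gpc.json is named in this thread.
REQUIRED_PATHS = [
    ".well-known/privacy-policy",
    ".well-known/data-stewardship-policy",
    ".well-known/gpc.json",
]


def digest(content: bytes) -> str:
    """Hex SHA-256 digest of a fetched resource body."""
    return hashlib.sha256(content).hexdigest()


def mismatched_paths(sites: dict) -> list:
    """Given domain -> {path: bytes}, return the required paths that are
    missing on some site or not byte-identical across all sites."""
    bad = []
    for path in REQUIRED_PATHS:
        digests = set()
        for resources in sites.values():
            content = resources.get(path)
            digests.add(None if content is None else digest(content))
        # Valid only if every site serves the resource and all copies match.
        if len(digests) != 1 or None in digests:
            bad.append(path)
    return bad


common = {
    ".well-known/privacy-policy": b"We do not sell data.",
    ".well-known/data-stewardship-policy": b"Retention: 30 days.",
    ".well-known/gpc.json": b'{"gpc": true}',
}
sites = {
    "brand-a.example": dict(common),
    "brand-b.example": {**common, ".well-known/gpc.json": b'{"gpc": false}'},
}
print(mismatched_paths(sites))
```

Byte-for-byte comparison is the simplest reading of "identical resources". A real validator would also need to decide how to treat whitespace, encoding, and redirects, which this sketch ignores.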

@jwrosewell

@dmarti makes excellent points about the requirements for internet domains to be considered part of the same set. Even if these requirements were adopted, there would remain further areas to resolve concerning the difference between brand and data processing entity, and factors other than privacy such as administrative overhead, performance and governance.

I’m now firmly of the opinion that the assumptions made about data processing entities and internet domains at the W3C and by many browser vendors will need to be revised before first party sets, and other "Privacy Sandbox" related proposals, can be moved forward legitimately at the W3C. These assumptions are:

  1. That there is a one-to-one relationship between an internet domain and a data processing entity. Consider etsy.com which supports multiple data processing entities.

  2. That an internet domain providing resources and services to an internet domain that is displayed in the address bar cannot be trusted and is assumed to be a bad actor. More work is needed to determine the conditions under which it can and cannot be trusted to the same extent as the internet domain displayed in the address bar. Without addressing this issue, small organisations that must band together to compete with larger organisations will be disadvantaged.

Perhaps it is time to revisit the following issues, and others like them, before attempting to move forward with this proposal?

Supply chains can be trusted - expand document to consider this possibility
Bad (or good) behaviour by first parties

As the TAG chairs saw fit to ban me from participating in TAG discussions after they were unable to justify their assumptions or provide any evidence to support them, others will need to take action if they wish to pursue this.

Note: I would caution when dealing with TAG to be mindful of the Code of Ethics and Professional Conduct (CEPC). TAG chairs are requesting that repeating the same question, irrespective of the quality of the answer provided, should constitute a violation of the CEPC. They are assuming such commenters are acting in bad faith. This is disappointing and leads to discrimination and a lack of robust debate.

dmarti added a commit to dmarti/first-party-sets that referenced this issue Aug 23, 2021
 * Remove reference to Do Not Track

 * Add a source and definition of "controller"

 * Remove language on ownership, replace with more consistent mentions of "controller"

 * Mention that common branding should apply to users of assistive technologies

Ownership verification is complex, does not add enforceable protections for users beyond the common controller requirement, and is likely to create costs and risks for some sites that would make it hard to use this feature.

Refs: WICG#14 WICG#18 WICG#20 WICG#49 WICG#55

@johannhof (Member)

I think this discussion has largely run its course and the proposal has changed a lot since. Some of the concerns voiced here have been addressed, I believe, but there were many points made so I don't think this issue in itself is actionable. Closing now, feel free to file new issues with specific pieces of concern or feedback.
