Consider alternative per-caller filtering methods #114

cilvento · 2022-11-08T05:26:16Z

The Topics API aims to reduce the usefulness of topics as a fingerprinting surface by using a coarse taxonomy of ~350 human-curated topics and returning a uniformly random topic from the taxonomy with 5% probability for each new site visit. API callers are only able to receive topics that they have previously observed for the user in a recent epoch. However, this per-caller filtering can make the Topics API harder to use: it reduces recall and introduces consistent cross-site signals that may be exploitable (e.g., #74).

Furthermore, per-caller filtering may make it more difficult to reason about changes or expansions to the topics taxonomy. Some feedback has suggested increasing the granularity or size of the taxonomy. However, if the taxonomy grows, recall may become worse because callers may observe disjoint sets of highly granular topics, counteracting any utility gained from the more detailed information that topics would provide.

Consider approximate per-caller filtering

The Topics API could allow API callers to receive topics within the same sub-tree as a topic that the caller has previously observed for the user. For example, a caller could be allowed to receive any topic that is a descendant of a topic it has observed for the user (which would provide more detail than the caller's prior observation), or a caller could be allowed to receive any topic that is an ancestor of a topic it has observed for the user (which provides less new information). More nuanced criteria could also be applied, e.g., allowing a caller to receive a topic if its least common ancestor with some observed topic is that observed topic's parent.

This would still require callers to have had some interaction with the user on a closely-related topic in the past, but may improve recall if a user is assigned a similar, but not exactly matching, topic to the observations made by the caller.
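To make the three criteria above concrete, here is a minimal sketch in Python. The toy taxonomy, the `PARENT` map, the function names, and the mode labels are all hypothetical and chosen only for illustration; nothing here reflects how a browser implements or would implement filtering.

```python
# Hypothetical toy taxonomy: child -> parent (None marks a root).
PARENT = {
    "Sports": None,
    "Sports/Soccer": "Sports",
    "Sports/Tennis": "Sports",
    "Sports/Soccer/Leagues": "Sports/Soccer",
    "Travel": None,
    "Travel/Hotels": "Travel",
}


def ancestors(topic):
    """Return the strict ancestors of a topic, nearest first."""
    chain = []
    node = PARENT[topic]
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def lca(a, b):
    """Least common ancestor of a and b, or None if they share no tree."""
    seen = {a, *ancestors(a)}
    for node in [b, *ancestors(b)]:
        if node in seen:
            return node
    return None


def may_receive(candidate, observed, mode):
    """Decide whether a caller that observed `observed` may receive `candidate`."""
    if candidate in observed:
        return True  # exact match: plain per-caller filtering
    if mode == "descendants":
        # candidate lies below some observed topic
        return any(a in observed for a in ancestors(candidate))
    if mode == "ancestors":
        # candidate lies above some observed topic
        return any(candidate in ancestors(o) for o in observed)
    if mode == "lca_parent":
        # LCA with some observed topic equals that topic's parent
        return any(
            PARENT[o] is not None and lca(candidate, o) == PARENT[o]
            for o in observed
        )
    raise ValueError(f"unknown mode: {mode}")
```

With `observed = {"Sports/Soccer"}`, the `descendants` mode admits `Sports/Soccer/Leagues`, the `ancestors` mode admits `Sports`, and the `lca_parent` mode admits the sibling `Sports/Tennis`, while `Travel/Hotels` is rejected in every mode.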

Consider noisy per-caller filtering

An alternative approach could be to introduce noise into per-caller filtering. For example, at the end of each epoch, the browser could apply the following logic to each caller who has observed at least one topic in the previous epoch: for each topic in the taxonomy, independently sample the "observed" bit with a probability that decreases with the distance to the nearest observed topic in the taxonomy. The new set of "observed" topics should not be shared with the caller directly, and should only be used in filtering. This logic could instead be applied per site visit (further breaking the consistency of matching-topic signals across sites), but this may be cumbersome for the browser to implement. This randomization method would allow callers to (probabilistically) receive topics close to topics they have previously observed.
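A rough sketch of the per-epoch sampling step might look like the following Python. It assumes that the inclusion probability shrinks as distance grows (consistent with the stated goal of receiving topics close to observed ones) and uses a geometric form, `decay ** distance`, purely as an illustration. The toy taxonomy, the edge-count distance with a virtual root joining separate trees, the `decay` parameter, and all function names are hypothetical.

```python
import random

# Hypothetical toy taxonomy: child -> parent (None marks a root).
PARENT = {
    "Sports": None,
    "Sports/Soccer": "Sports",
    "Sports/Tennis": "Sports",
    "Sports/Soccer/Leagues": "Sports/Soccer",
    "Travel": None,
    "Travel/Hotels": "Travel",
}


def path_to_root(topic):
    """Return [topic, parent, grandparent, ...] up to its root."""
    path = [topic]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a, b):
    """Number of edges between a and b; separate trees meet at a virtual root."""
    pa, pb = path_to_root(a), path_to_root(b)
    depth_in_a = {node: i for i, node in enumerate(pa)}
    for j, node in enumerate(pb):
        if node in depth_in_a:
            return depth_in_a[node] + j
    return len(pa) + len(pb)  # route through the virtual root


def noisy_observed_set(observed, decay=0.3, rng=None):
    """Sample a noisy 'observed' set used only for filtering, never shown to the caller."""
    if not observed:
        return set()  # the logic applies only to callers with at least one observation
    rng = rng or random.Random()
    noisy = set()
    for topic in PARENT:
        d = min(tree_distance(topic, o) for o in observed)
        # Assumed noise model: probability decay**d, so d == 0 is always kept.
        if rng.random() < decay ** d:
            noisy.add(topic)
    return noisy
```

Under this assumed model a truly observed topic is always retained, `decay=0` reduces to exact per-caller filtering, and `decay=1` disables filtering entirely; whether observed topics should themselves be subject to noise is a design choice the proposal leaves open.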

Potential benefits of these approaches could be:

Improved recall for similar/nearby topics.
(For approximate per-caller filtering) Can use approximations that should be straightforward for callers and end-users to comprehend (e.g., subcategory).
(For noisy per-caller filtering) Reduces the certainty of coordinated cross-site attacks based on per-caller filtering consistency, e.g., #74. The size of this benefit will depend on the noise level and any limitations on the number of callers per site.
(For noisy per-caller filtering) In combination with re-randomization of the epoch topics per site (i.e., sampling topics close to the 5 topics assigned per epoch rather than consistently returning one of the same 5), this could significantly reduce the fingerprinting potential of topics, even if the taxonomy size increases.

Potential downsides of these approaches could be:

May not preserve the original goal of preventing additional cross-site information sharing beyond the scope of third-party cookies (depending on approximation and noise choices). It's possible that a caller could learn about a topic or category that it had not observed for the user in the past, even for a topic where the caller has no existing site integrations (particularly for noisy per-caller filtering).
May be difficult for callers and end-users to reason about or understand noise in per-caller filtering.
