rchild-okta/itp

Safari Intelligent Tracking Prevention

Statistics collection

Subframes and redirects

When a host loads a cross-origin iframe, the iframe host is recorded as a subframe under that host. When a host redirects to another host, this redirect relationship is also recorded by the WebResourceLoadStatisticsStore.

This statistics collection is not affected by any combination of:

  • Original subframe location - a subframe can start on the same origin and redirect to a "tracker", or it can start on the "tracker" and redirect back to the same origin.
  • Nested iframes - a subframe on the same origin can load a nested iframe that redirects between hosts.
  • Iframe sandboxing - a subframe can have the sandbox="allow-scripts" attribute, but it will have no effect on stats collection.
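As a rough illustration of the bookkeeping described above, the sketch below records subframe and redirect relationships per host. The class name `StatisticsSketch` and its fields are hypothetical and are not taken from WebKit's WebResourceLoadStatisticsStore; the sketch only assumes that same-host loads and redirects are not recorded.

```javascript
// Hypothetical in-memory model; names are illustrative, not WebKit's.
class StatisticsSketch {
  constructor() {
    this.records = new Map();
  }

  // Return the record for a host, creating it on first use.
  recordFor(host) {
    if (!this.records.has(host)) {
      this.records.set(host, { subframeUnder: new Set(), redirectsTo: new Set() });
    }
    return this.records.get(host);
  }

  // A cross-origin iframe on frameHost was loaded under topHost.
  recordSubframe(topHost, frameHost) {
    if (topHost === frameHost) return;
    this.recordFor(frameHost).subframeUnder.add(topHost);
  }

  // fromHost redirected to toHost.
  recordRedirect(fromHost, toHost) {
    if (fromHost === toHost) return;
    this.recordFor(fromHost).redirectsTo.add(toHost);
  }
}
```

Using sets means that repeated loads under the same top host count once, which matches the "unique" counts that the classifiers consume.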

Test References:

Hardcoded SSO exceptions

Dow Jones

⚠️ Removed in #188756 - Remove experimental affiliated domain code now that StorageAccess API is available.

Statistics are not logged when navigating between pages on the same domain. There is also hardcoded logic for ignoring navigations between associated domains: a hardcoded list of domain associations that are exempt from statistics collection. It stems from #174661 ("May get frequently logged out of wsj.com").

The original commit associates dowjones.com, wsj.com, barrons.com, marketwatch.com, and wsjplus.com, which form a classic SSO scenario - WSJ, Barrons, and the others have a sign-in link that redirects to a shared sso.accounts.dowjones.com account chooser.

The code has since moved to ResourceLoadStatistics.cpp, and Dow Jones is still the only exception in the list.

Adobe

⚠️ Removed in #188710 - Remove Adobe SSO exception now that StorageAccess API is available.

Another, more general approach to whitelisting SSO scenarios comes from #174533 ("Can no longer log in on abc.go.com"), where abc.go.com and others used sp.auth.adobe.com as their IdP and it became marked as prevalent. In this approach, there are no associated domains - a host is whitelisted from any statistics collection and cannot be marked as prevalent.

More detail can be found in the commit. Adobe is still the only host in the whitelist. Note - this whitelist is tied to the needsSiteSpecificQuirks setting, which means that it can be disabled.

Classification

Prevalence

A prevalent resource is a domain that has demonstrated the ability to track users cross-site. It's another name for tracker, with the extra nuance that it's unknown what the domain does with its ability to track (#182664).

When statistics for a resource have changed, they're run through a classifier that categorizes the resource as prevalent or not-prevalent. The prevalence level, combined with user interaction signals, determines what actions to take against the site.

There are two classifiers in the WebKit codebase:

  • Vector Threshold - The older, simpler model that is used when Core Prediction is not enabled.
  • Core Prediction - The newer, machine learning based model.

Vector Threshold

Vector Threshold is the classifier that's used when Core Prediction is not enabled - notably, in all ResourceLoadStatistics tests. It uses raw counts and a simple vector length algorithm to classify prevalence into three buckets - Low (not-prevalent), High (prevalent), and Very High (very prevalent).

a = subresourceUnderTopFrameOriginsCount
b = subresourceUniqueRedirectsToCount
c = subframeUnderTopFrameOriginsCount
d = topFrameUniqueRedirectsToCount

vectorLength(x, y, z) = sqrt(x^2 + y^2 + z^2)

if vectorLength(a, b, c) > 30
  # Prevalence is Very High
else if currentPrevalence == High
  || a > 3
  || b > 3
  || c > 3
  || d > 3
  || vectorLength(a, b, c) > 3
  # Prevalence is High
else
  # Prevalence is Low

The important levels are Low (not-prevalent) and High/Very High (prevalent). The distinction for Very High is only used to report back to Apple via telemetry (#183218). Some examples of statistics that would classify a resource as prevalent:

  • A domain is redirected to by more than 3 other domains
  • A domain is loaded in a subframe by more than 3 other domains
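The pseudocode above can be written out as a small runnable function. This is a minimal sketch that follows the pseudocode in this document rather than the WebKit source; the function name `classifyPrevalence`, the shape of the `counts` object, and the string labels are assumptions made for illustration.

```javascript
// Thresholds taken from the pseudocode in this section.
const HIGH_THRESHOLD = 3;
const VERY_HIGH_VECTOR_LENGTH = 30;

function vectorLength(x, y, z) {
  return Math.sqrt(x * x + y * y + z * z);
}

// counts carries the four inputs named a, b, c, d in the pseudocode.
// Returns 'Low', 'High', or 'VeryHigh'.
function classifyPrevalence(counts, currentPrevalence = 'Low') {
  const a = counts.subresourceUnderTopFrameOriginsCount;
  const b = counts.subresourceUniqueRedirectsToCount;
  const c = counts.subframeUnderTopFrameOriginsCount;
  const d = counts.topFrameUniqueRedirectsToCount;
  const length = vectorLength(a, b, c);

  if (length > VERY_HIGH_VECTOR_LENGTH) {
    return 'VeryHigh';
  }
  if (
    currentPrevalence === 'High' ||
    a > HIGH_THRESHOLD ||
    b > HIGH_THRESHOLD ||
    c > HIGH_THRESHOLD ||
    d > HIGH_THRESHOLD ||
    length > HIGH_THRESHOLD
  ) {
    return 'High';
  }
  return 'Low';
}
```

Note that the vector length check means a resource can be classified as High without any single count exceeding 3: counts of 2, 2, and 2 give a length of about 3.46.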

Source References

Test References

Core Prediction

Core Prediction is an SVM Classifier that calculates prevalence using a machine learning model trained on three features:

  • subresourceUnderTopFrameOriginsCount
  • subresourceUniqueRedirectsToCount
  • subframeUnderTopFrameOriginsCount

Aside from the underlying algorithm, there are some other key differences from Vector Threshold:

  • It does not have the granularity of Vector Threshold's Low, High, and Very High. Core Prediction only returns true if prevalent, false if not.
  • It does not make use of top frame redirects. Either this is a missing feature, or Vector Threshold will always be used in some code paths so that first-party bounce trackers can still be caught.

Source References

User interaction

User interaction on a resource is used to determine how to handle sites that are marked as prevalent. It is categorized into three buckets: no user interaction, recent user interaction, and non-recent user interaction.

Even if a user interaction event is triggered in a subframe, it is always recorded as an interaction on the top domain. From #174120:

No matter where on a webpage a user interacts, the interaction should always be recorded for the top document. Users have no way of understanding what a cross-origin iframe is and they have no visual way to tell content from different origins apart.

To allow users to maintain sessions in embedded third-party services they rarely use as first party, Apple introduced the Storage Access API. From the User Prompt for the Storage Access API section in ITP 2.0:

Successful use of the Storage Access API now counts as user interaction with the third-party and refreshes the 30 days of use before ITP purges the third-party’s website data. By successful use we mean the user was either prompted right now and chose “Allow,” or had previously chosen “Allow.” The fact that successful Storage Access API calls now count as user interaction allows users to stay logged into services they rarely use as first party but keep using as embedded third parties.

User interaction is reported immediately to the UI process, which means that websites can immediately get out of a blocked state instead of waiting for the prevalence digest cycle to run.

Any handled event is a candidate for user interaction, where a handled event is an event that Safari recognizes and internally delegates. An example of an unhandled event that will not be classified as user interaction is an unfinished key chord sequence, e.g. pressing meta with no following keypress.

Source References

Test References

Bounce trackers and redirect collusion

ITP provides protection against bounce trackers, and can detect tracker collusion.

From Intelligent Tracking Prevention 2.0:

ITP 2.0 has the ability to detect when a domain is solely used as a “first party bounce tracker,” meaning that it is never used as a third party content provider but tracks the user purely through navigational redirects.

This translates to marking prevalent any resource that directly or indirectly redirects to a prevalent resource, where redirects are defined as having a 3xx status code. This applies to both top frame and subframe redirects, and is retroactive - not only are future redirects to this resource classified as prevalent, but also all resources that have redirected to this resource in the past.

Direct redirects

In the following example, A is prevalent and B is not prevalent. After B redirects to A, B is also marked as prevalent.

# A is prevalent. B is not prevalent.
B -> A
# A,B are prevalent

Tracker collusion

A chain of redirects through multiple resources that ends in a prevalent resource is described as tracker collusion, and the graph of redirect nodes is described as a collusion graph. All resources in the collusion graph are marked as prevalent through recursion - the resource closest to the prevalent resource is marked first, then the resource that redirected to it, and so on up to a max limit of 50 cycles.

In the following example, A is prevalent, and B, C, D are not prevalent. In the first pass, B is marked as prevalent. In the next pass, C is marked as prevalent. And in the final pass, D is marked as prevalent, which completes the collusion graph.

# A is prevalent. B,C,D are not prevalent.
D -> C -> B -> A
# A,B,C,D are prevalent
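The pass-by-pass marking described above can be sketched as follows. This is an illustrative model and not the WebKit implementation: the function name `markColludingDomains` and the `Map` of redirect edges are assumptions, and only the limit of 50 passes comes from the text.

```javascript
// Limit taken from the "max limit of 50 cycles" described in this section.
const MAX_PASSES = 50;

// redirects: Map from a domain to the Set of domains it has redirected to.
// prevalent: iterable of domains already classified as prevalent.
// Returns a new Set containing every domain that ends up marked prevalent.
function markColludingDomains(redirects, prevalent) {
  const marked = new Set(prevalent);
  for (let pass = 0; pass < MAX_PASSES; pass++) {
    // Collect first and mark afterwards, so each pass advances one hop.
    const newlyMarked = [];
    for (const [from, targets] of redirects) {
      if (!marked.has(from) && [...targets].some((t) => marked.has(t))) {
        newlyMarked.push(from);
      }
    }
    if (newlyMarked.length === 0) break;
    newlyMarked.forEach((domain) => marked.add(domain));
  }
  return marked;
}
```

For the D -> C -> B -> A example, the first pass marks B, the second marks C, and the third marks D. A domain that only redirects to non-prevalent domains is left unmarked.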

Source References

Test References

Effects of classification

Access to cookies in a third-party context

Definitions

  • First-party cookie - A cookie that is set and read on the primary domain
  • Third-party context - Cross-origin subframe
  • Third-party cookie - A first-party cookie that is accessed in a third-party context

When a cross-origin iframe is loaded for a resource that's classified as prevalent, access to third-party cookies is blocked.

This had been the intention before ITP 2.0, but previous versions had more leeway around enforcement:

  • There was a 1-day window after user interaction where cookies could be accessed in a third-party context for a prevalent resource (see ITP 1.1 and ITP 1.0).
  • There was a separate class of partitioned cookies that gave cross-origin iframes a way to read and write cookies keyed to the subframe and top frame, which worked as long as the user had interacted with the third-party resource in the last 30 days. Partitioned cookies were recently removed in this commit.

With ITP 2.0, non-prevalent resources can read and write third-party cookies. Prevalent resources cannot, unless they request storage access via the Storage Access API.

Source References

Test References

Automatic cookie access for popups

There is a temporary compatibility fix to access first-party cookies from a third-party subframe when a popup window to the third-party is opened on the same page.

An example scenario:

  1. User clicks a social login button on rp.com
  2. Popup window is opened to idp.com, which has previously been marked prevalent
  3. User interacts with the popup to login to idp.com, which sets a first-party session cookie
  4. The popup closes and sends a postMessage to window.opener, rp.com
  5. The main page opens a cross-origin iframe to idp.com. Because of the compatibility fix, this subframe has access to the first-party session cookie that was previously set.

The popup origin must have had recent user interaction to grant access in the third-party subframe. If there is no recent user interaction, the first-party cookie is not accessible.

From the Automatic storage access for popups section in ITP 2.0:

Many federated logins send authentication tokens in URLs or through the postMessage API, both of which work fine under ITP 2.0. However, some federated logins use popups and then rely on third-party cookie access once the user is back on the opener page. Some instances of this latter category stopped working under ITP 2.0 since domains with tracking abilities are permanently partitioned.

Our temporary compatibility fix is to detect such popup scenarios and automatically forward storage access for the third party under the opener page. Since popups require user interaction, the third party could just as well had called the Storage Access API instead.

Developer Advice: If you provide federated login services, we encourage you to first call the Storage Access API to get cookie access and only do a popup to log the user in or acquire specific consent. The Storage Access API provides a better user experience without new windows and navigations. We’d also like to stress that the compatibility fix for popups is a temporary one. Longterm, your only option will be to call the Storage Access API.

Source Resources

Test Resources

Purging cookies

If a resource is prevalent, it is subject to having its first-party cookies purged depending on when the user has last interacted with the domain. If the user has never interacted with the prevalent domain, or has not interacted within the TimeToLiveUserInteraction (last 30 days), first-party cookies will be deleted.

# Setup: idp.com is prevalent

1. User visits idp.com, interacts with login form, new SID cookie.
2. After 15 days, user redirects to idp.com, no user interaction. SID still exists.

# -- after 30 days with no user interaction, SID cookie is purged --

3. User redirects to idp.com after 30 days, SID is not available.

When cookies are purged, they're deleted and can no longer be accessed even if the resource later becomes non-prevalent. Non-prevalent resources never get their first-party cookies purged, regardless of user interaction.
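The purge rule can be expressed as a small predicate. This is a sketch under the assumptions stated in this section (a 30-day TimeToLiveUserInteraction, and purging that applies only to prevalent resources); the function name `shouldPurgeCookies` and the record shape are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
// 30-day window described in this section.
const TIME_TO_LIVE_USER_INTERACTION_MS = 30 * DAY_MS;

// record.lastUserInteractionMs is a timestamp in milliseconds,
// or null if the user has never interacted with the domain.
function shouldPurgeCookies(record, nowMs) {
  if (!record.isPrevalent) {
    return false; // non-prevalent resources are never purged
  }
  if (record.lastUserInteractionMs === null) {
    return true; // prevalent, and never interacted with
  }
  return nowMs - record.lastUserInteractionMs > TIME_TO_LIVE_USER_INTERACTION_MS;
}
```

In the idp.com walkthrough, the check at day 15 returns false and the SID cookie survives, while a check after day 30 returns true.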

Source References

Test References

First-party cookies and redirects

Access to first-party cookies in top level navigation redirects is never blocked. This is good for SSO redirects which rely on an existing session in the IdP - even if the domain is prevalent, the existing first-party session cookie will always be available.

However, it is counterintuitive - once the commit to not block first party cookies on redirects was merged, protection against bounce trackers became much more limited. Bounce trackers will be able to set and read cookies on redirects (which is what they want to do), and are only blocked when reading cookies from a third-party context.

Source References

Test References

Stripping Referer Header

When making CORS requests or redirecting to a prevalent resource, the Referer header is restricted - only the origin is sent. This limits the user data that can be collected by a domain that's known to be able to track users.

Example:

# Referer header that's sent to a non-prevalent resource
http://some-store.com/some-category/item123

# Referer header that's sent to a prevalent resource
http://some-store.com/
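The trimming shown in the example can be reproduced with the standard URL API. This is a simplified sketch: the function name `refererFor` is hypothetical, and a real browser applies further referrer policy rules that are not modeled here.

```javascript
// Returns the Referer value that would be sent for a request from documentUrl.
function refererFor(documentUrl, destinationIsPrevalent) {
  if (!destinationIsPrevalent) {
    return documentUrl; // full URL for non-prevalent destinations
  }
  // Only the origin (scheme, host, port) is kept for prevalent destinations.
  return new URL(documentUrl).origin + '/';
}
```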

Source References

Test References

Storage access

When a cross-origin iframe is loaded from a prevalent domain, access to first-party cookies is blocked by default. To support embedded third-party content that uses first-party cookies for authentication, Apple introduced the Storage Access API. This API provides a way for cross-origin iframes to request access to their first-party cookies when processing a user gesture such as a tap or a click.

The problem statement, Apple's proposed solution, and community feedback can be found in their WHATWG Proposal.

hasStorageAccess

The document.hasStorageAccess() method can be called at any time to check whether access is already granted, and does not require a user gesture. If it is true, the caller can read and write first-party cookies in a third-party context.

Example

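// await is only valid inside an async function (or a module)
async function checkStorageAccess() {
  const hasAccess = await document.hasStorageAccess();
  if (hasAccess) {
    // Can access first-party cookies
  } else {
    // Cannot access first-party cookies
  }
}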

Before a call to requestStorageAccess(), the response from hasStorageAccess() depends only on the prevalence of the domain, and not whether there has been previous user interaction:

| Prevalence | Has had user interaction? | hasStorageAccess default response |
| --- | --- | --- |
| prevalent | yes | false |
| prevalent | no | false |
| not-prevalent | yes | true |
| not-prevalent | no | true |

Test Resources

requestStorageAccess

The document.requestStorageAccess() method returns a promise that is resolved if storage access was granted and is rejected if access was denied. It must be called within the context of a user gesture, like a tap or a click - if it is not, it will immediately reject before prompting the user for consent.

Example

<script>
function makeRequestWithUserGesture() {
  document.requestStorageAccess().then(
    () => {
      // Storage access is granted:
      // - Access to first-party cookies is available
      // - Subsequent calls to hasStorageAccess() will return true
    },
    () => {
      // Storage access is denied
    }
  );
}
</script>
<button onclick="makeRequestWithUserGesture()"></button>

The algorithm for requestStorageAccess is laid out in full in the Storage Access API WHATWG Proposal:

#1: If the document already has been granted access, resolve.

If requesting access for a prevalent domain where the user has already given consent for this subframe, this will resolve.

Access must be requested for each third-party resource on the page - consent given to one subframe will not implicitly transfer to another subframe, even if that subframe is on the same domain. However, if the user allows access, their choice is persisted for that domain and the next requestStorageAccess call will not prompt the user.

#2: If the document has a null origin, reject.

By default, a sandboxed iframe will have a unique, or null, origin. If this is the case, requestStorageAccess immediately rejects because there are no first-party cookies to return. To get around this, a sandboxed iframe must have the allow-same-origin token, which allows the content to be treated as coming from its original, third-party origin.

#3: If the document's frame is the main frame, resolve.

Resolve immediately if requestStorageAccess is called in the top frame, which will always have access to its own first-party cookies.

#4: If the sub frame's origin is equal to the main frame's, resolve.

Resolve immediately if requestStorageAccess is called in a subframe that is on the same origin as the main frame - ITP only affects access to cookies in a cross-origin context.

For ITP, Safari has a looser definition of "origin" than described in the same-origin policy. It is defined as eTLD+1, where the effective top level domain "eTLD" is .com, .co.uk, etc. This means that resources that share the same eTLD+1 are not affected by ITP, and will be able to access cookies even if the domain is marked as prevalent.

// Subdomains are not affected by ITP 2.0
etld_1('login.social.com')   == 'social.com'; // true
etld_1('email.social.com')   == 'social.com'; // true
etld_1('foo.bar.social.com') == 'social.com'; // true
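The `etld_1` helper above is pseudocode. A minimal runnable sketch of the same idea follows, assuming a tiny hard-coded suffix set; real browsers consult the full Public Suffix List, and the names `SAMPLE_SUFFIXES`, `registrableDomain`, and `sameSite` are hypothetical.

```javascript
// Tiny illustrative suffix set; NOT the real Public Suffix List.
const SAMPLE_SUFFIXES = new Set(['com', 'co.uk']);

// Returns the eTLD+1 for a host, or null if none can be determined.
function registrableDomain(host) {
  const labels = host.toLowerCase().split('.');
  // Try the longest candidate suffix first, then shorter ones.
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join('.');
    if (SAMPLE_SUFFIXES.has(candidate)) {
      // Keep one extra label in front of the suffix, if there is one.
      return i === 0 ? null : labels.slice(i - 1).join('.');
    }
  }
  return null;
}

// Two hosts are treated as the same party when they share an eTLD+1.
function sameSite(hostA, hostB) {
  const a = registrableDomain(hostA);
  return a !== null && a === registrableDomain(hostB);
}
```

With this sketch, login.social.com and email.social.com compare as the same site, while social.com and other.com do not.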

#5: If the sub frame is not sandboxed, skip to step 7.

#6: If the sub frame doesn't have the token "allow-storage-access-by-user-activation", reject.

The default for sandboxed iframes is to not have access to the new storage functions. Apple introduced a new token allow-storage-access-by-user-activation to opt-in to these storage functions, which means that a sandboxed iframe must have at least these three tokens:

  • allow-scripts - allow scripts to run, which is a prerequisite to calling the storage methods
  • allow-storage-access-by-user-activation - access to the storage methods
  • allow-same-origin - use the original origin for the frame, covered in #2

<iframe
  src="https://thirdpartyorigin.com/example"
  sandbox="allow-storage-access-by-user-activation allow-same-origin allow-scripts"></iframe>

#7: If the sub frame's parent frame is not the top frame, reject.

Nested iframes that have passed the previous same-origin and main document checks are never allowed storage access, even if they have the right sandbox tokens, etc. A nested iframe is a frame whose parent frame is not the top frame - i.e. a page embeds an iframe, which embeds another nested iframe.

#8: If the browser is not processing a user gesture, reject.

A call to document.requestStorageAccess() must be made within the context of a user gesture, like a tap or a click.

<!-- This is valid -->
<script>
function makeRequestWithUserGesture() {
  document.requestStorageAccess().then(/* this is valid */);
}
</script>
<button onclick="makeRequestWithUserGesture()"></button>

<!-- This will immediately reject -->
<script>
function makeRequestWithoutUserGesture() {
  document.requestStorageAccess().then(/* this will always reject */);
}
</script>
<body onload="makeRequestWithoutUserGesture()"></body>

#9: Check any additional rules that the browser has. Examples: Whitelists, blacklists, on-device classification, user settings, anti-clickjacking heuristics, or prompting the user for explicit permission. Reject if some rule is not fulfilled.

ITP classification is used in this step.

If requesting access for a non-prevalent domain, this will resolve, regardless of whether or not there has been previous user interaction on the domain.

If the domain is prevalent and there has been no previous user interaction on the domain, this will immediately reject. There's no chance for prompting the user if the user has not previously interacted with the prevalent domain in a first-party context.

If the domain is prevalent and there has been previous user interaction, Safari will show the ITP 2.0 storage consent prompt. If the user chooses to not allow the domain to track their activity, this will reject.

#10: Grant the document access to cookies and store that fact for the purposes of future calls to hasStorageAccess() and requestStorageAccess().
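The ten steps can be condensed into a single decision function to make the ordering of the checks easier to follow. This is a sketch of the algorithm as described in this document, not browser code: the function name `decideStorageAccess` and every field on the `frame` object are hypothetical, and `'prompt'` stands for showing the consent prompt, whose answer then decides between resolve and reject.

```javascript
const STORAGE_ACCESS_TOKEN = 'allow-storage-access-by-user-activation';

// Returns 'resolve', 'reject', or 'prompt' for a hypothetical frame description.
function decideStorageAccess(frame) {
  if (frame.alreadyGranted) return 'resolve'; // #1
  if (frame.hasNullOrigin) return 'reject'; // #2
  if (frame.isMainFrame) return 'resolve'; // #3
  if (frame.sameSiteAsMainFrame) return 'resolve'; // #4 (eTLD+1 comparison)
  if (frame.isSandboxed && !frame.sandboxTokens.includes(STORAGE_ACCESS_TOKEN)) {
    return 'reject'; // #5 and #6
  }
  if (!frame.parentIsTopFrame) return 'reject'; // #7
  if (!frame.isProcessingUserGesture) return 'reject'; // #8

  // #9: ITP classification
  if (!frame.isPrevalent) return 'resolve';
  if (!frame.hasHadUserInteraction) return 'reject';
  if (frame.userPreviouslyAllowed) return 'resolve'; // persisted choice, no prompt
  return 'prompt';
}
```

Step #10, granting access and remembering the grant, happens only after a `'resolve'` outcome or an accepted prompt, so it is left to the caller in this sketch.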

Source References

Test References

Lifetime of storage access

From the Access Removal section of the Storage Access API WHATWG proposal:

Storage access is granted for the life of the document and as long as the document's frame is attached to the DOM. This means:

  • Access is removed when the sub frame navigates.
  • Access is removed when the sub frame is detached from the DOM.
  • Access is removed when the top frame navigates.
  • Access is removed when the webpage goes away, such as a tab close.

Some concrete examples:

  • If a subframe has been granted storage access and the top frame changes the src of that subframe, the subframe loses storage access.
  • If there are two subframes for the same cross-origin domain on the same page and one has been granted storage access, storage access is not automatically transferred to the other frame. The other frame must also request storage access.
  • If a subframe has been granted storage access and it's moved in the DOM (e.g. by appending it to another element), it loses storage access. This is an example of detaching it from the DOM.
  • If a subframe has been granted storage access and it internally navigates to a different page (e.g. in a redirect scenario), it loses storage access.

Source Resources

Test Resources

User prompt when requesting access

From the User Prompt for the Storage Access API section in ITP 2.0:

If the user allows access, their choice is persisted. If the user declines, their choice is not persisted which allows them to change their mind if they at a later point tap a similar embedded widget that calls the Storage Access API.

The choice to allow access via the requestStorageAccess prompt is persisted for the same lifetime as that domain's cookies and website data:

  • A rolling 30 day window from the last logged user interaction on the site.
  • Or, it is reset when storage is cleared via the remove all website data feature.

Calling requestStorageAccess with this saved value does not prompt the user. However, this call must still be made within the context of a user gesture - if it is not, it will reject.

Example embedded iframe redirect flow

  1. rp.example loads an embedded iframe to a prevalent resource, idp.example/page1.
    • Because idp.example is prevalent, hasStorageAccess() is false on frame load.
  2. idp.example/page1 has two buttons:
    1. A request access button that calls requestStorageAccess().
    2. A navigate button that redirects to idp.example/page2
  3. The user clicks request access and is prompted with the storage access prompt. They grant access.
    • Because they granted access, hasStorageAccess() is now true.
  4. The user then clicks navigate, which redirects them to idp.example/page2.
    • Because there's an internal redirect, storage access is revoked. hasStorageAccess() is false.
    • Calling requestStorageAccess() directly will reject, even though the user has already granted access to this domain. It must be called within the context of a user gesture.
    • Clicking a button that calls requestStorageAccess() will resolve without prompting the user. hasStorageAccess is true.

Source Resources

Exceptions

Popups after requestStorageAccess

Both document.requestStorageAccess() and window.open depend on being called in the context of a user gesture. For cross-origin iframes that open a popup window, this presents a problem - requestStorageAccess is async, so the user gesture context wouldn't normally be set for its callback handlers. This commit provides an override to allow popups to be opened after the resolution of requestStorageAccess.

<script>
function makeRequestWithUserGesture() {
  document.requestStorageAccess().then(
    () => {
      // User gesture still intact, even after the async callback
      window.open('test-window.html', 'test window');
    }
  );
}
</script>
<button onclick="makeRequestWithUserGesture()"></button>

Source References

Test References

Miscellaneous

Resetting ITP statistics and classification

All ITP statistics and classifications are stored locally on device, and can be cleared by using the remove all website data feature.

Source References

Test References

Pruning Statistics

There is a cap on the number of statistics records that are saved in the statistics store. When this limit is exceeded, the store is pruned in order of least importance:

  • Non-prevalent resources without user interaction
  • Prevalent resources without user interaction
  • Non-prevalent resources with user interaction
  • Prevalent resources with user interaction
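The pruning order above can be modeled as a sort key. This sketch only illustrates the ordering: the function names `pruningRank` and `selectForPruning` and the record fields are hypothetical, and the actual cap on the number of records is not stated in this document.

```javascript
// Lower rank is pruned first, matching the order of the list above:
// 0 = non-prevalent without interaction, 1 = prevalent without interaction,
// 2 = non-prevalent with interaction,    3 = prevalent with interaction.
function pruningRank(record) {
  return (record.hadUserInteraction ? 2 : 0) + (record.isPrevalent ? 1 : 0);
}

// Returns the records that would be dropped to get back under maxRecords.
function selectForPruning(records, maxRecords) {
  const excess = records.length - maxRecords;
  if (excess <= 0) return [];
  return [...records]
    .sort((x, y) => pruningRank(x) - pruningRank(y))
    .slice(0, excess);
}
```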

Source References

Test References

ITP Debug mode

ITP comes with a developer debug setting that can be enabled in Develop -> Experimental Features -> ITP Debug Mode. This feature will:

  • Show log messages for ITP classifications and storage decisions
  • Set http://3rdpartytestwebkit.org to be prevalent
  • Allow developers to set their own custom domain as prevalent

Read more about working with ITP Debug mode in the Safari Technology Preview 62 release notes.

Source References

Test References

Telemetry

ITP statistics and classification are mostly on device, but there is some telemetry that is sent to Apple. This is limited to an aggregated set of metrics for things like the number of prevalent resources that had user interaction, the median number of times a top prevalent domain has been loaded as a subframe, etc.

Source References

Test References
