
Third Party Web

Data on third party entities and their impact on the web.

This document is a summary of which third party scripts are most responsible for excessive JavaScript execution on the web today.

Table of Contents

  1. Goals
  2. Methodology
  3. NPM Module
  4. Updates
  5. Data
    1. Summary
    2. How to Interpret
    3. Third Parties by Category
      1. Ads
      2. Analytics
      3. Social
      4. Video
      5. Developer Utilities
      6. Hosting Platforms
      7. Marketing
      8. Customer Success
      9. Content & Publishing
      10. Libraries
      11. Mixed / Other
    4. Third Parties by Total Impact
  6. Future Work
  7. FAQs
  8. Contributing

Goals

  1. Quantify the impact of third party scripts on the web.
  2. Identify the third party scripts on the web that have the greatest performance cost.
  3. Give developers the information they need to make informed decisions about which third parties to include on their sites.
  4. Incentivize responsible third party script behavior.
  5. Make this information accessible and useful.

Methodology

HTTP Archive is an initiative that tracks how the web is built. Twice a month, ~4 million sites are crawled with Lighthouse on mobile. Lighthouse breaks down the total script execution time of each page and attributes the execution to a URL. Using BigQuery, this project aggregates script execution at the origin level and assigns each origin to the responsible entity.
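The aggregation step can be illustrated in a few lines of JavaScript. This is a minimal sketch, not the project's BigQuery query. It assumes each input record is a hypothetical `{ url, executionMs }` pair for one script on one page. It groups execution time by hostname and then maps each hostname to an entity, using objects shaped like the entries in data/entities.json.

```javascript
// Minimal sketch of origin-level aggregation and entity assignment.
// Assumption: each record is a hypothetical { url, executionMs } pair describing
// one script on one page; the real pipeline does this in BigQuery, not in Node.

// Sum execution time per hostname (the "origin" level used in this README).
function aggregateByOrigin(records) {
  const byOrigin = new Map();
  for (const { url, executionMs } of records) {
    const host = new URL(url).hostname;
    byOrigin.set(host, (byOrigin.get(host) || 0) + executionMs);
  }
  return byOrigin;
}

// Map each hostname to its entity using entities.json-shaped objects
// ({ name, domains }); unmatched hostnames go to a hypothetical "(unidentified)" bucket.
function assignToEntities(byOrigin, entities) {
  const domainToEntity = new Map();
  for (const entity of entities) {
    for (const domain of entity.domains) domainToEntity.set(domain, entity.name);
  }
  const byEntity = new Map();
  for (const [host, ms] of byOrigin) {
    const name = domainToEntity.get(host) || "(unidentified)";
    byEntity.set(name, (byEntity.get(name) || 0) + ms);
  }
  return byEntity;
}
```

In the real data, the unmatched hostnames would include first-party scripts as well as third parties that have not been classified yet.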

NPM Module

The entity classification data is available as an NPM module.

const { getEntity } = require("third-party-web");
const entity = getEntity("https://d36mpcpuzc4ztk.cloudfront.net/js/visitor.js");
console.log(entity);
//   {
//     "name": "Freshdesk",
//     "homepage": "https://freshdesk.com/",
//     "categories": ["customer-success"],
//     "domains": ["d36mpcpuzc4ztk.cloudfront.net"]
//   }

Updates

2019-02-01 dataset

Huge props to WordAds for reducing their average impact from ~2.5s to ~200ms! A few entities are showing considerably less data this cycle (Media Math, Crazy Egg, DoubleVerify, Bootstrap CDN). Perhaps they've added new CDNs/hostnames that we haven't identified, or the basket of sites in HTTP Archive has shifted away from using them.

Data

Summary

Across the top ~1 million sites, ~800 origins account for ~65% of all script execution time, with the top 100 entities already accounting for ~59%. Third party scripts make up the majority of script execution on the web today, and it's important to make informed choices.
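The percentages above are shares of total script execution. Below is a minimal sketch of that arithmetic. It assumes a hypothetical `totals` map from entity name to execution seconds. It also assumes a `grandTotal` for all script execution on the crawled pages, which is a figure this README does not publish.

```javascript
// Minimal sketch: fraction of all script execution attributable to the n heaviest entities.
// Assumptions: `totals` is a Map of entity name -> execution seconds, and `grandTotal`
// is the execution time of every script on the crawled pages (not published here).
function shareOfTopN(totals, grandTotal, n) {
  const heaviestFirst = [...totals.values()].sort((a, b) => b - a);
  const topSum = heaviestFirst.slice(0, n).reduce((sum, s) => sum + s, 0);
  return topSum / grandTotal;
}
```

For example, `shareOfTopN(new Map([["a", 50], ["b", 30], ["c", 20]]), 200, 2)` returns 0.4.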

How to Interpret

Each entity has a number of data points available.

  1. Usage (Total Number of Occurrences) - how many scripts from their origins were included on pages
  2. Total Impact (Total Execution Time) - how many seconds were spent executing their scripts across the web
  3. Average Impact (Average Execution Time) - on average, how many milliseconds were spent executing each script
  4. Category - what type of script this is
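The function below gives a rough sketch of how these data points relate. It derives Usage, Total Impact and Average Impact from per-script records. The `{ entity, executionMs }` record shape and the rounding to whole milliseconds are assumptions made for illustration.

```javascript
// Minimal sketch: derive Usage, Total Impact and Average Impact per entity.
// Assumption: `occurrences` is an array of hypothetical { entity, executionMs }
// records, one per script included on a page.
function summarize(occurrences) {
  const stats = new Map();
  for (const { entity, executionMs } of occurrences) {
    const s = stats.get(entity) || { usage: 0, totalMs: 0 };
    s.usage += 1;
    s.totalMs += executionMs;
    stats.set(entity, s);
  }
  const result = [];
  for (const [entity, { usage, totalMs }] of stats) {
    result.push({
      entity,
      usage, // Usage: number of occurrences
      totalSeconds: totalMs / 1000, // Total Impact, reported in seconds
      averageMs: Math.round(totalMs / usage), // Average Impact, rounded to whole ms
    });
  }
  return result;
}
```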

Third Parties by Category

This section breaks down third parties by category. The third parties in each category are ranked by the average impact of their scripts, from lowest to highest. Perhaps the most important comparisons lie here. You always need to pick an analytics provider, but at least you can pick the most well-behaved analytics provider.
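The per-category ranking amounts to a filter and a sort. Below is a minimal sketch. It assumes hypothetical `{ name, categories, averageMs }` rows, with category slugs in the style of data/entities.json (e.g. "social").

```javascript
// Minimal sketch: rank the entities of one category by average impact, lowest first.
// Assumption: rows are hypothetical { name, categories, averageMs } objects.
function rankCategory(rows, category) {
  return rows
    .filter((row) => row.categories.includes(category)) // new array, input is not mutated
    .sort((a, b) => a.averageMs - b.averageMs)
    .map((row, i) => ({ rank: i + 1, name: row.name, averageMs: row.averageMs }));
}
```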

Overall Breakdown

Unsurprisingly, ads account for the largest identifiable chunk of third party script execution. The "Mixed / Other" category balloons primarily because of Google Tag Manager, which is used to deliver scripts in multiple categories. Google Tag Manager script execution alone is responsible for more than half of the "Mixed / Other" category.

(Figure: breakdown of third party script execution by category)
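The Google Tag Manager claim can be checked against the published numbers. This sketch assumes the "Mixed / Other" total is simply the sum of the Total Impact figures for its four listed entities. Those figures are copied from the Third Parties by Total Impact table.

```javascript
// Minimal sketch: each entity's share of its category's total execution time.
// Assumption: the category total is the sum of the listed entities' Total Impact.
function sharesWithinCategory(totalsSeconds) {
  const categoryTotal = Object.values(totalsSeconds).reduce((a, b) => a + b, 0);
  const shares = {};
  for (const [name, seconds] of Object.entries(totalsSeconds)) {
    shares[name] = seconds / categoryTotal;
  }
  return shares;
}

// Total Impact (seconds) for the Mixed / Other entities, from the table in this README.
const mixedOther = {
  "Google Tag Manager": 473333,
  "All Other 3rd Parties": 274947,
  "Amazon S3": 5008,
  "Parking Crew": 2093,
};
console.log(sharesWithinCategory(mixedOther)["Google Tag Manager"].toFixed(3)); // prints 0.627
```

With these figures, Google Tag Manager comes out at roughly 63% of the category.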

Ads

These scripts are part of advertising networks, either serving or measuring.

Rank Name Usage Average Impact
1 Media Math 662 68 ms
2 Adroll 3,198 94 ms
3 Amazon Ads 22,090 94 ms
4 Scorecard Research 3,578 103 ms
5 Rubicon Project 3,905 106 ms
6 MGID 10,317 114 ms
7 Criteo 64,547 116 ms
8 Market GID 3,873 153 ms
9 Taboola 23,853 176 ms
10 WordAds 32,295 212 ms
11 Google/Doubleclick Ads 1,206,843 215 ms
12 Pubmatic 3,140 225 ms
13 Yahoo Ads 9,578 225 ms
14 AppNexus 14,694 265 ms
15 Yandex Ads 39,330 272 ms
16 Integral Ads 24,532 305 ms
17 Sizmek 4,011 374 ms
18 DoubleVerify 1,988 600 ms
19 MediaVine 9,801 706 ms
20 Moat 14,337 708 ms
21 OpenX 10,729 836 ms
22 33 Across 20,137 863 ms
23 Popads 5,009 1288 ms

Analytics

These scripts measure or track users and their actions. There's a wide range in impact here depending on what's being tracked.

Rank Name Usage Average Impact
1 Alexa 1,265 50 ms
2 Google Analytics 1,163,249 77 ms
3 Mixpanel 5,462 77 ms
4 Snowplow 2,492 77 ms
5 Baidu Analytics 7,041 78 ms
6 Crazy Egg 455 89 ms
7 Hotjar 91,036 92 ms
8 Adobe Analytics 32,173 183 ms
9 Segment 6,998 201 ms
10 Tealium 14,422 207 ms
11 Optimizely 13,482 232 ms
12 Salesforce 40,868 270 ms
13 Yandex Metrica 221,577 356 ms
14 Histats 14,706 390 ms
15 Lucky Orange 6,113 834 ms

Social

These scripts enable social features.

Rank Name Usage Average Impact
1 VK 6,342 65 ms
2 Pinterest 14,331 87 ms
3 Facebook 1,107,461 116 ms
4 Yandex Share 29,555 128 ms
5 LinkedIn 12,260 130 ms
6 Twitter 274,753 146 ms
7 ShareThis 32,318 229 ms
8 Shareaholic 13,268 236 ms
9 AddThis 170,036 245 ms
10 Tumblr 40,855 312 ms
11 Disqus 741 504 ms
12 PIXNET 54,969 605 ms

Video

These scripts enable video player and streaming functionality.

Rank Name Usage Average Impact
1 YouTube 22,093 107 ms
2 Wistia 20,633 257 ms
3 Brightcove 4,933 441 ms

Developer Utilities

These scripts are developer utilities (API clients, site monitoring, fraud detection, etc.).

Rank Name Usage Average Impact
1 New Relic 2,334 54 ms
2 Stripe 4,751 70 ms
3 OneSignal 37,165 83 ms
4 Google APIs/SDK 829,509 114 ms
5 App Dynamics 1,929 124 ms
6 Cloudflare 5,190 191 ms
7 PayPal 6,467 229 ms
8 Yandex APIs 57,870 362 ms
9 Distil Networks 11,313 376 ms
10 Sentry 15,981 686 ms

Hosting Platforms

These scripts are from web hosting platforms (WordPress, Wix, Squarespace, etc.). Note that in this category, the platform's scripts can sometimes be the entirety of the script on the page, so the "impact" rank might be misleading. In the case of WordPress, the data only covers the libraries hosted and served by WordPress, not all sites using self-hosted WordPress.

Rank Name Usage Average Impact
1 Blogger 17,943 47 ms
2 Dealer 23,885 90 ms
3 WordPress 126,052 122 ms
4 Shopify 220,676 158 ms
5 Weebly 35,097 230 ms
6 Hatena Blog 51,333 484 ms
7 Squarespace 87,878 491 ms
8 Wix 192,121 1040 ms

Marketing

These scripts are from marketing tools that add popups/newsletters/etc.

Rank Name Usage Average Impact
1 RD Station 2,517 70 ms
2 Hubspot 14,148 91 ms
3 Listrak 963 128 ms
4 OptinMonster 1,129 132 ms
5 Beeketing 61,179 138 ms
6 Drift 4,073 141 ms
7 Mailchimp 22,992 146 ms
8 Sumo 35,677 385 ms
9 Albacross 1,382 727 ms

Customer Success

These scripts are from customer support/marketing providers that offer chat and contact solutions. These scripts are generally heavier in weight.

Rank Name Usage Average Impact
1 LiveChat 20,433 87 ms
2 Freshdesk 909 140 ms
3 Help Scout 627 164 ms
4 Jivochat 23,628 215 ms
5 Olark 12,258 318 ms
6 Intercom 16,809 334 ms
7 Tawk.to 40,598 345 ms
8 ZenDesk 32,852 421 ms
9 Zopim 53,503 607 ms

Content & Publishing

These scripts are from content providers or publishing-specific affiliate tracking.

Rank Name Usage Average Impact
1 AMP 61,086 199 ms
2 Vox Media 704 456 ms
3 Hotmart 854 785 ms

Libraries

These are mostly open source libraries (e.g. jQuery) served over different public CDNs. This category is unique in that the origin may have no responsibility for the performance of what's being served. Note that rank here does not imply one CDN is better than another. It simply indicates that the libraries served from that origin are lighter or heavier than the ones served by another.

Rank Name Usage Average Impact
1 Bootstrap CDN 1,383 48 ms
2 FontAwesome CDN 15,661 102 ms
3 Yandex CDN 2,020 123 ms
4 Adobe TypeKit 4,519 131 ms
5 jQuery CDN 142,889 170 ms
6 Google CDN 744,534 177 ms
7 Cloudflare CDN 101,203 193 ms
8 JSDelivr CDN 24,627 285 ms
9 CreateJS CDN 1,757 3056 ms

Mixed / Other

These are miscellaneous scripts delivered via a shared origin with no precise category or attribution. Help us out by identifying more origins!

Rank Name Usage Average Impact
1 Amazon S3 32,205 156 ms
2 All Other 3rd Parties 1,344,782 204 ms
3 Google Tag Manager 1,098,396 431 ms
4 Parking Crew 4,542 461 ms

Third Parties by Total Impact

This section highlights the entities responsible for the most script execution across the web. This helps inform which improvements would have the largest total impact.

Name Usage Total Impact Average Impact
Google Tag Manager 1,098,396 473,333 s 431 ms
All Other 3rd Parties 1,344,782 274,947 s 204 ms
Google/Doubleclick Ads 1,206,843 259,963 s 215 ms
Wix 192,121 199,834 s 1040 ms
Google CDN 744,534 131,849 s 177 ms
Facebook 1,107,461 128,923 s 116 ms
Google APIs/SDK 829,509 94,149 s 114 ms
Google Analytics 1,163,249 89,009 s 77 ms
Yandex Metrica 221,577 78,814 s 356 ms
Squarespace 87,878 43,179 s 491 ms
AddThis 170,036 41,730 s 245 ms
Twitter 274,753 40,120 s 146 ms
Shopify 220,676 34,854 s 158 ms
PIXNET 54,969 33,257 s 605 ms
Zopim 53,503 32,501 s 607 ms
Hatena Blog 51,333 24,848 s 484 ms
jQuery CDN 142,889 24,222 s 170 ms
Yandex APIs 57,870 20,926 s 362 ms
Cloudflare CDN 101,203 19,548 s 193 ms
33 Across 20,137 17,375 s 863 ms
WordPress 126,052 15,390 s 122 ms
Tawk.to 40,598 14,007 s 345 ms
ZenDesk 32,852 13,839 s 421 ms
Sumo 35,677 13,749 s 385 ms
Tumblr 40,855 12,755 s 312 ms
AMP 61,086 12,136 s 199 ms
Salesforce 40,868 11,025 s 270 ms
Sentry 15,981 10,966 s 686 ms
Yandex Ads 39,330 10,689 s 272 ms
Moat 14,337 10,154 s 708 ms
OpenX 10,729 8,974 s 836 ms
Beeketing 61,179 8,473 s 138 ms
Hotjar 91,036 8,395 s 92 ms
Weebly 35,097 8,062 s 230 ms
Criteo 64,547 7,496 s 116 ms
Integral Ads 24,532 7,477 s 305 ms
ShareThis 32,318 7,405 s 229 ms
JSDelivr CDN 24,627 7,007 s 285 ms
MediaVine 9,801 6,915 s 706 ms
WordAds 32,295 6,844 s 212 ms
Popads 5,009 6,451 s 1288 ms
Adobe Analytics 32,173 5,885 s 183 ms
Histats 14,706 5,739 s 390 ms
Intercom 16,809 5,614 s 334 ms
CreateJS CDN 1,757 5,370 s 3056 ms
Wistia 20,633 5,294 s 257 ms
Lucky Orange 6,113 5,098 s 834 ms
Jivochat 23,628 5,084 s 215 ms
Amazon S3 32,205 5,008 s 156 ms
Distil Networks 11,313 4,254 s 376 ms
Taboola 23,853 4,190 s 176 ms
Olark 12,258 3,902 s 318 ms
AppNexus 14,694 3,888 s 265 ms
Yandex Share 29,555 3,772 s 128 ms
Mailchimp 22,992 3,357 s 146 ms
Shareaholic 13,268 3,135 s 236 ms
Optimizely 13,482 3,135 s 232 ms
OneSignal 37,165 3,075 s 83 ms
Tealium 14,422 2,990 s 207 ms
YouTube 22,093 2,370 s 107 ms
Brightcove 4,933 2,173 s 441 ms
Yahoo Ads 9,578 2,158 s 225 ms
Dealer 23,885 2,158 s 90 ms
Parking Crew 4,542 2,093 s 461 ms
Amazon Ads 22,090 2,079 s 94 ms
LiveChat 20,433 1,786 s 87 ms
FontAwesome CDN 15,661 1,599 s 102 ms
LinkedIn 12,260 1,594 s 130 ms
Sizmek 4,011 1,501 s 374 ms
PayPal 6,467 1,478 s 229 ms
Segment 6,998 1,406 s 201 ms
Hubspot 14,148 1,287 s 91 ms
Pinterest 14,331 1,245 s 87 ms
DoubleVerify 1,988 1,193 s 600 ms
MGID 10,317 1,174 s 114 ms
Albacross 1,382 1,004 s 727 ms
Cloudflare 5,190 989 s 191 ms
Blogger 17,943 839 s 47 ms
Pubmatic 3,140 707 s 225 ms
Hotmart 854 670 s 785 ms
Market GID 3,873 592 s 153 ms
Adobe TypeKit 4,519 590 s 131 ms
Drift 4,073 575 s 141 ms
Baidu Analytics 7,041 550 s 78 ms
Mixpanel 5,462 420 s 77 ms
VK 6,342 414 s 65 ms
Rubicon Project 3,905 413 s 106 ms
Disqus 741 374 s 504 ms
Scorecard Research 3,578 369 s 103 ms
Stripe 4,751 334 s 70 ms
Vox Media 704 321 s 456 ms
Adroll 3,198 301 s 94 ms
Yandex CDN 2,020 249 s 123 ms
App Dynamics 1,929 240 s 124 ms
Snowplow 2,492 193 s 77 ms
RD Station 2,517 176 s 70 ms
OptinMonster 1,129 149 s 132 ms
Freshdesk 909 127 s 140 ms
New Relic 2,334 126 s 54 ms
Listrak 963 123 s 128 ms
Help Scout 627 103 s 164 ms
Bootstrap CDN 1,383 67 s 48 ms
Alexa 1,265 63 s 50 ms
Media Math 662 45 s 68 ms
Crazy Egg 455 41 s 89 ms

Future Work

  1. Introduce URL-level data for more fine-grained analysis, e.g. which libraries from Cloudflare/Google CDNs are most expensive.
  2. Expand the scope, i.e. include more third parties and have greater entity/category coverage.

FAQs

I don't see entity X in the list. What's up with that?

This can happen for one of two reasons:

  1. The entity does not have at least 100 references to its origins in the dataset.
  2. The entity's origins have not yet been identified. See How can I contribute?
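The first rule is a simple threshold. Below is a minimal sketch. It assumes hypothetical `{ name, usage }` rows and treats Usage as the count of references to an entity's origins.

```javascript
// Minimal sketch of the inclusion rule: entities with fewer than 100 references
// to their origins are left out. MIN_OCCURRENCES and the row shape are assumptions.
const MIN_OCCURRENCES = 100;

function reportableEntities(rows, minOccurrences = MIN_OCCURRENCES) {
  return rows.filter((row) => row.usage >= minOccurrences);
}
```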

How is the "Average Impact" determined?

The HTTP Archive dataset includes Lighthouse reports for each URL on mobile. Lighthouse has an audit called "bootup-time" that summarizes the amount of time that each script spent on the main thread. The "Average Impact" for an entity is the total execution time of scripts whose domain matches one of the entity's domains, divided by the total number of occurrences of those scripts.

Average Impact = Total Execution Time / Total Occurrences
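As a worked example, here is the formula applied to the Total Impact and Usage figures published for Wix and Google Tag Manager in the Third Parties by Total Impact table:

```latex
\begin{aligned}
\text{Average Impact}_{\text{Wix}} &= \frac{199{,}834\ \text{s}}{192{,}121} \approx 1.040\ \text{s} = 1040\ \text{ms} \\
\text{Average Impact}_{\text{Google Tag Manager}} &= \frac{473{,}333\ \text{s}}{1{,}098{,}396} \approx 0.431\ \text{s} = 431\ \text{ms}
\end{aligned}
```

Both results match the Average Impact column.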

How does Lighthouse determine the execution time of each script?

Lighthouse's bootup time audit attempts to attribute all top-level main-thread tasks to a URL. A main-thread task is attributed to the first script URL found in its stack. If you're interested in helping us improve this logic, see Contributing for details.
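The simplified sketch below illustrates the attribution rule. It is not Lighthouse's implementation. It assumes each top-level task is a hypothetical `{ durationMs, stackUrls }` object whose `stackUrls` are already in the order Lighthouse would search them. It treats only http(s) URLs as script URLs.

```javascript
// Simplified sketch of "attribute each task to the first script URL in its stack".
// NOT Lighthouse's code; the { durationMs, stackUrls } task shape is hypothetical.
function attributeTasks(tasks) {
  const byUrl = new Map();
  for (const { durationMs, stackUrls } of tasks) {
    // First entry that looks like a script URL wins; otherwise the task is unattributed.
    const url = stackUrls.find((u) => /^https?:\/\//.test(u)) || "(unattributable)";
    byUrl.set(url, (byUrl.get(url) || 0) + durationMs);
  }
  return byUrl;
}
```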

The data for entity X seems wrong. How can it be corrected?

Verify that the origins in data/entities.json are correct. Most issues will simply be the result of mislabelling of shared origins. If everything checks out, there is likely no further action needed and the data is valid. If you still believe there are errors, file an issue to discuss further.
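One quick way to spot mislabelled shared origins is to look for domains listed more than once in data/entities.json. Below is a minimal sketch. It assumes the file parses to an array of `{ name, domains }` objects like the Facebook example under Contributing.

```javascript
// Minimal sketch: report every domain listed more than once, which usually means
// two entities claim the same shared origin. Returns [domain, [entity names]] pairs.
function findSharedDomains(entities) {
  const owners = new Map(); // domain -> names of the entities listing it
  for (const { name, domains } of entities) {
    for (const domain of domains) {
      if (!owners.has(domain)) owners.set(domain, []);
      owners.get(domain).push(name);
    }
  }
  return [...owners].filter(([, names]) => names.length > 1);
}
```

To run it on the real file, pass it `JSON.parse(require("fs").readFileSync("data/entities.json", "utf8"))`.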

How can I contribute?

Only about 90% of the third party script execution has been assigned to an entity. We could use your help identifying the rest! See Contributing for details.

Contributing

Updating the Entities

The domain->entity mapping can be found in data/entities.json. Adding a new entity is as simple as adding a new array item with the following form.

{
    "name": "Facebook",
    "homepage": "https://www.facebook.com",
    "categories": ["social"],
    "domains": [
        "www.facebook.com",
        "connect.facebook.net",
        "staticxx.facebook.com",
        "static.xx.fbcdn.net",
        "m.facebook.com"
    ]
}
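A quick local check of a new entry can catch typos before you open a pull request. This is a hypothetical helper based only on the fields shown above. It is not a validator shipped with the project.

```javascript
// Hypothetical pre-submission check for a new entity entry; the rules below are
// assumptions derived from the example entry, not the project's own validation.
function validateEntity(entity) {
  const errors = [];
  if (typeof entity.name !== "string" || entity.name.length === 0) {
    errors.push("name must be a non-empty string");
  }
  if (typeof entity.homepage !== "string" || !/^https?:\/\//.test(entity.homepage)) {
    errors.push("homepage must be an http(s) URL");
  }
  if (!Array.isArray(entity.categories) || entity.categories.length === 0) {
    errors.push("categories must be a non-empty array");
  }
  if (!Array.isArray(entity.domains) || entity.domains.length === 0) {
    errors.push("domains must be a non-empty array");
  } else {
    for (const d of entity.domains) {
      // Domains are bare hostnames, so a slash suggests a full URL was pasted in.
      if (typeof d !== "string" || d.includes("/")) errors.push(`invalid domain: ${d}`);
    }
  }
  return errors;
}
```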

Updating Attribution Logic

The logic for attribution to individual script URLs can be found in the Lighthouse repo. File an issue over there to discuss further.

Updating the Data

The query used to compute the origin-level data is in sql/origin-execution-time-query.sql. Running it against the latest Lighthouse HTTP Archive dataset should give you a JSON export of the latest data, which can be checked in at data/YYYY-MM-DD-origin-scripting.json.

Updating this README

This README is auto-generated from the templates in lib/ and the computed data. To update the charts, you'll need cairo installed locally in addition to running yarn install.

# Install `cairo` and dependencies for node-canvas
brew install pkg-config cairo pango libpng jpeg giflib