
WebVitals metrics measurements are incorrect for "slow" loading sites #960

Closed

ka3de opened this issue Jun 30, 2023 · 4 comments


ka3de commented Jun 30, 2023

Brief summary

After modifications made in #949 and #943, and comparisons done in #950, we concluded that, in some cases, k6 browser reports incorrect Web Vitals metrics measurements. This happens especially for sites that are more dynamic and take longer to load, as in these cases k6 browser seems to report "intermediate" values corresponding to samples taken before the page is fully loaded, an event for which k6 browser does not seem to wait long enough. This particularly affects the FCP and LCP metrics.

xk6-browser version

commit: c959b57

OS

Ubuntu 20.04.5 LTS

Chrome version

113.0.5672.126 (Official Build) (64-bit)

Docker version and image (if applicable)

No response

Steps to reproduce the problem

Run the following test:

import { browser } from 'k6/x/browser';

export const options = {
  scenarios: {
    ui: {
      executor: 'shared-iterations',
      options: {
        browser: {
          type: 'chromium',
        },
      },
    },
  },
};

export default async function () {
  const page = browser.newPage();

  try {
    await page.goto('https://grafana.com');
  } finally {
    page.close();
  }
}

Expected behaviour

Web Vitals metrics should be similar to the ones reported by Google Lighthouse:
[Screenshot: Google Lighthouse Web Vitals results for grafana.com]

Actual behaviour

Web Vitals metrics reported by k6 browser:
[Screenshot: k6 browser Web Vitals results for grafana.com]
Notice the differences in FCP and LCP.

@ka3de ka3de added the bug Something isn't working label Jun 30, 2023
@ka3de ka3de added this to the v0.11.0 milestone Jun 30, 2023
@ka3de ka3de self-assigned this Jul 6, 2023

ka3de commented Jul 10, 2023

I've been investigating this issue further. My main concern was to clarify whether the k6 browser implementation was incorrect at any point in the Web Vitals metrics pipeline: gathering, parsing, or reporting; from the tooling script injected into the website, to the samples reported to k6 browser through a dedicated binding, to the values finally pushed to k6 core.

For this I defined a set of hypotheses that might explain these metric mismatches and explored them individually. This was done by collecting timestamped events for the moments at which (a sketch of the in-page half of this instrumentation follows the list):

  • Web Vitals JS library is loaded in the web site (from the web site context)
  • Web Vitals JS library reports a sample (from the web site context)
  • k6 browser receives a sample through the binding call
  • k6 browser pushes a sample to the VU samples channel (noting whether the channel was still open)
  • k6 browser removes the Web Vitals binding as part of page close
  • k6 browser finishes an iteration
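
As an illustration, the in-page half of this instrumentation looks roughly like the sketch below. The binding name k6browserSendWebVitalMetric is an assumption made for this example, not taken from the k6 browser source:

import { onFCP, onLCP } from 'web-vitals';

function report(metric) {
  // Timestamp each sample in the web site context before handing it to
  // the k6 browser binding, so both sides of the pipeline can be compared.
  window.k6browserSendWebVitalMetric(
    JSON.stringify({ ts: Date.now(), name: metric.name, value: metric.value })
  );
}

onFCP(report);
onLCP(report);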

First, a clarification on the reportAllChanges flag for the Web Vitals library

In our effort to achieve more consistent Web Vitals reporting, a change was introduced in #949 to force the Web Vitals library to report metrics on every value change. Early on during the tests performed for this investigation, we observed that this was an incorrect configuration, especially for the LCP metric. LCP measures the time it takes to render the largest image or text block visible within the viewport, relative to when the page first started loading. Because there can only be one "largest" element in a page, LCP should only provide one sample per page load. With the reportAllChanges flag, however, LCP is reported multiple times, once for each intermediate element that is, up to that point, the largest rendered in the page. This makes k6 browser record incorrect values for LCP on every new page.
Therefore, we should not use the reportAllChanges flag for the Web Vitals library. Instead, and in order to keep the reporting consistency that this flag was providing, the change implemented in #953 proved to be a better fit and produces an effect similar to the recommendations given in the Web Vitals library docs:

Note that some of these metrics will not report until the user has interacted with the page, switched tabs, or the page starts to unload. If you don't see the values logged to the console immediately, try reloading the page (with preserve log enabled) or switching tabs and then switching back.

For this reason, all tests executed as part of the investigation for this issue were done with #949 reverted and the changes implemented in #953 included. More on this in the conclusions and next steps.
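
For reference, this is how the flag is passed to the Web Vitals library (v3 onLCP API); the difference in callback behaviour is what produced the incorrect aggregated values:

import { onLCP } from 'web-vitals';

// Default: the callback fires once, when the LCP value is final
// (e.g. after user interaction or when the page is being hidden).
onLCP((metric) => console.log('LCP (final):', metric.value));

// With reportAllChanges, the callback fires for every LCP candidate,
// reporting intermediate "largest" elements as the page renders.
onLCP((metric) => console.log('LCP (candidate):', metric.value), {
  reportAllChanges: true,
});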

Issue hypotheses

Metric samples are lost due to premature iteration end
As per our tests, no metric samples were lost due to a premature iteration end. This hypothesis tried to clarify whether there were samples reported by the Web Vitals script in the web site context that would not be caught by the k6 browser implementation. This seemed particularly plausible in iterations that only perform a page.goto action, which, by default, only guarantees waiting until the page load event is fired; that might not have been long enough to receive the Web Vitals samples reported from the JS script. In practice this was not an issue.
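
For context, the reproduction script above relies on page.goto's default wait condition; the waitUntil option can be made explicit (a sketch using the k6 browser goto options):

// By default page.goto resolves on the 'load' event, which can fire
// before the Web Vitals script has reported all of its samples.
await page.goto('https://grafana.com', { waitUntil: 'load' });

// 'networkidle' keeps the iteration alive for longer, although it still
// does not guarantee that every Web Vitals sample has been reported.
await page.goto('https://grafana.com', { waitUntil: 'networkidle' });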

Mismatched time measurements due to injecting Web Vitals script too late
This hypothesis tried to clarify whether the way the Web Vitals script is injected into every page through CDP might affect time measurements, for example because it was injected too late in the page load process.
In practice this was not an issue, as the Web Vitals library is injected before any other script in the page. Moreover, even when a website is instrumented through "standard means" (not through a browser automation tool), the library is included as a script in the page, so this should not be the critical factor behind the observed metric mismatches.
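
A minimal sketch of this kind of early injection over raw CDP, using the chrome-remote-interface npm package purely for illustration (k6 browser does the equivalent in Go):

// Sketch only: mirrors the injection approach, not k6 browser's code.
const CDP = require('chrome-remote-interface');

(async () => {
  // Assumes a browser already running with --remote-debugging-port=9222.
  const client = await CDP();
  await client.Page.enable();
  // The source runs before any of the page's own scripts on every new
  // document, so the Web Vitals observers are registered from the start.
  await client.Page.addScriptToEvaluateOnNewDocument({
    source: '/* bundled web-vitals library + reporting glue */',
  });
  await client.Page.navigate({ url: 'https://grafana.com' });
})();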

Mismatched measurements due to browser flags
k6 browser uses a set of default flags when launching the Chrome/Chromium browser in order to minimize undesired "background processes" and favor test automation consistency. This hypothesis tried to clarify whether these flags could produce significant differences in Web Vitals measurements versus a "default browser execution".
In practice, no consistently significant differences were observed between the two "execution modes".
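
For reference, recent k6 browser versions let you append extra Chromium flags to these defaults through the K6_BROWSER_ARGS environment variable, which makes this kind of comparison easy to reproduce:

# Appends extra Chromium flags (comma-separated, without leading dashes)
# to the browser module's defaults.
K6_BROWSER_ARGS='disable-gpu' k6 run script.js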

Conclusions

Overall this issue has been useful to rule out any mishandling of Web Vitals metrics in the k6 browser implementation. What we have observed is that the metrics reported by the Web Vitals library script are exactly the ones that end up being processed and reported by k6 browser, which means that any mismatched measurements should, in theory, be produced by the library itself.
In general I think we have to accept the particularities of the k6 browser execution "context" and the effect these might have on the measurements. The differences are not consistent across web sites and, in practice, are not significantly larger than the variations observed between different web performance measuring tools (see the tests section). Also, if we put these differences, which mainly affect the FCP and LCP metrics, in perspective with the thresholds defined for those metrics, they very rarely (at least per our tests) produce a change of rating. That is, the observed differences do not end up impacting the rating for these metrics, as the values are still lower than the "good" threshold for FCP (1.8s) and LCP (2.5s).

Other takeaways
reportAllChanges flag should not be used for Web Vitals library
As explained earlier, this flag produces incorrect aggregated metrics for LCP in our tests and should not be used.

Use pagehide event on page close
As also mentioned earlier, the changes implemented in #953 have proven to be a good fix to achieve the consistency provided by the reportAllChanges flag without the undesired drawbacks. They also address premature iteration ends in tests that only perform a page.goto. A rough sketch of the idea follows.
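
This is the idea only, not the exact #953 implementation (which lives inside k6 browser itself): dispatching pagehide in the page before closing it makes the Web Vitals library flush its pending metrics:

export default async function () {
  const page = browser.newPage();
  try {
    await page.goto('https://grafana.com');
  } finally {
    // Firing pagehide prompts the Web Vitals library to report its final
    // values before the page (and the iteration) goes away.
    page.evaluate(() => window.dispatchEvent(new Event('pagehide')));
    page.close();
  }
}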

Headless flag for k6 browser has an impact on page rendering
In some cases we observed that executing the browser in headless vs. headful mode produced significant changes in the FCP and LCP metrics (see the zara test case). This again shows that there are many particularities in the execution context that have an impact on Web Vitals measurements, and that these should most probably be accepted. It remains to be seen whether the new headless mode in Chrome changes these differences (see #948).
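
For reproduction, recent k6 versions can toggle this with an environment variable, so the same script can be compared in both modes:

# Headless is the default; run headful to compare rendering behaviour.
K6_BROWSER_HEADLESS=false k6 run script.js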

Tests

See the tests performed. Each test was run through N executions, and the values reported in the tables correspond to one execution whose measurements were representative of the most consistent results across all executions. That is, we verified that the measurements in these tables do not come from an outlier execution that produced significantly different results from the majority.

| grafana.com | k6 Browser | headful | Lighthouse | PageSpeed | WebPageTest |
| --- | --- | --- | --- | --- | --- |
| TTFB | 105.1ms | 100.7ms | - | 500ms | 722ms |
| FCP | 317.6ms | 353.9ms | 800ms | 1800ms | 2074ms |
| LCP | 521.3ms | 554ms | 1600ms | 2100ms | 3989ms |
| CLS | 0.001913 | 0.001959 | 0.003 | 0.02 | 0.029 |

| slack.com | k6 Browser | headful | Lighthouse | PageSpeed | WebPageTest |
| --- | --- | --- | --- | --- | --- |
| TTFB | 143.84ms | 248.59ms | - | 700ms | 899ms |
| FCP | 573.6ms | 629.19ms | 900ms | 1500ms | 2821ms |
| LCP | 603.1ms | 608.19ms | 1600ms | 2100ms | 2821ms |
| CLS | 0.057071 | 0.061046 | 0.056 | 0.15 | 0.236 |

Headless/headful mode produced huge differences between runs for slack.com: from 600ms up to 5.38s for LCP, and up to 2.5s for FCP, in a few headful executions.

| github.com | k6 Browser | headful | Lighthouse | PageSpeed | WebPageTest |
| --- | --- | --- | --- | --- | --- |
| TTFB | 133.8ms | 142.09ms | - | 1000ms | 845ms |
| FCP | 359.6ms | 450.39ms | 700ms | 1900ms | 3216ms |
| LCP | 375ms | 406.19ms | 900ms | 2700ms | 3843ms |
| CLS | 0.002217 | 0.001416 | 0 | 0.11 | 0 |

| news.ycombinator.com | k6 Browser | headful | Lighthouse | PageSpeed | WebPageTest |
| --- | --- | --- | --- | --- | --- |
| TTFB | 515.39ms | 509.7ms | - | 600ms | 895ms |
| FCP | 721.59ms | 735.1ms | 600ms | 600ms | 1464ms |
| LCP | 721.59ms | 735.1ms | 800ms | 600ms | 1464ms |
| CLS | 0 | 0 | 0 | 0 | 0 |

| youtube.com | k6 Browser | headful | Lighthouse | PageSpeed | WebPageTest |
| --- | --- | --- | --- | --- | --- |
| TTFB | 250.35ms | 209.89ms | - | 700ms | 2093ms |
| FCP | 439.1ms | 412.3ms | 700ms | 2600ms | 4234ms |
| LCP | 1650ms | 1640ms | 2600ms | 4000ms | 6187ms |
| CLS | 0.000816 | 0.000737 | 0.001 | 0.31 | 0 |

| www.zara.com | k6 Browser | headful | Lighthouse | PageSpeed | WebPageTest |
| --- | --- | --- | --- | --- | --- |
| TTFB | 148.1ms | 171.5ms | - | 400ms | 950ms |
| FCP | 174.89ms | 431.6ms | 600ms | 1100ms | 3052ms |
| LCP | 174.89ms | 464.9ms | 700ms | 3000ms | 9160ms |
| CLS | 0 | 0 | 0.002 | 0.14 | 0 |

See also charts.

Next steps


ankur22 commented Jul 10, 2023

Great bit of analysis 👏 🎉

It feels like there's not much we can do at the moment (apart from what is listed in the next steps, which I agree with); we are bound by the Web Vitals library and the behaviours of all the different factors (the browser module, the website under test, the OS, the compute resources, the network, etc.).

I think what will be important is how consistent the measurements are from one test run to another -- a test today against grafana.com should produce a similar result tomorrow, with a small margin of error (or delta), assuming the factors remain the same.
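
If that consistency holds, thresholds on the browser Web Vitals metrics stay meaningful across runs; a hedged sketch using the metric names exposed by the browser module:

export const options = {
  scenarios: {
    ui: {
      executor: 'shared-iterations',
      options: { browser: { type: 'chromium' } },
    },
  },
  // Thresholds set at the "good" boundaries for FCP (1.8s) and LCP (2.5s);
  // a small run-to-run delta keeps the pass/fail outcome stable.
  thresholds: {
    browser_web_vital_fcp: ['p(75) < 1800'],
    browser_web_vital_lcp: ['p(75) < 2500'],
  },
};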


ka3de commented Jul 11, 2023

See comparison charts for the FCP and LCP metrics for the sites under test:
(Note that PageSpeed caches results, hence its consistent measurements.)

grafana.com

[Charts: FCP and LCP comparison for grafana.com]

slack.com

[Charts: FCP and LCP comparison for slack.com]

github.com

[Charts: FCP and LCP comparison for github.com]

news.ycombinator.com

[Charts: FCP and LCP comparison for news.ycombinator.com]

youtube.com

[Charts: FCP and LCP comparison for youtube.com]

zara.com

[Charts: FCP and LCP comparison for zara.com]

We can see that the trends between k6 browser and Lighthouse usually match, and that the differences between them in absolute values are not as significant as the differences with the other tools.


ka3de commented Jul 14, 2023

An interesting test comparison between k6 Browser and Grafana Faro:

This test was done by instrumenting the k6-docs site with Grafana Faro, running the site locally, and executing k6 browser tests against it. The interesting point about this test is that it lets us obtain metrics calculated by k6 browser and by Grafana Faro under the same "environment conditions", including hardware, network, etc. Moreover, in this test both the k6 browser and Grafana Faro measurements for each sample come from the same page request/page load: the k6 browser script only executes a page.goto() action, from which it obtains the Web Vitals metrics, and that same action triggers the page load, and therefore the metric measurements, on the Faro side, since the website is instrumented.

FCP

[Chart: FCP comparison between k6 Browser and Faro]

| Sample | k6 Browser (ms) | Faro (ms) |
| --- | --- | --- |
| 1 | 1070 | 1080 |
| 2 | 999.4 | 1060 |
| 3 | 1010 | 1010 |
| 4 | 962.7 | 1010 |
| 5 | 1000 | 1010 |
| 6 | 937.2 | 986 |
| 7 | 1030 | 1020 |
| 8 | 951.19 | 992 |
| 9 | 953.9 | 974 |
| 10 | 999.5 | 1010 |

LCP

[Chart: LCP comparison between k6 Browser and Faro]

| Sample | k6 Browser (ms) | Faro (ms) |
| --- | --- | --- |
| 1 | 1070 | 1080 |
| 2 | 999.4 | 1060 |
| 3 | 1010 | 1010 |
| 4 | 1040 | 1030 |
| 5 | 1000 | 1030 |
| 6 | 937.2 | 1030 |
| 7 | 1030 | 1020 |
| 8 | 951.19 | 992 |
| 9 | 953.9 | 974 |
| 10 | 999.5 | 1010 |

Try it yourself

k6-docs
Clone the k6-docs repo. In our case we were using commit e8c334f6.

git clone git@github.com:grafana/k6-docs.git
cd k6-docs

Instrument the web app with Faro:

npm i -S @grafana/faro-web-sdk

Apply the following diff:
diff --git a/src/components/pages/doc-welcome/faro/faro.js b/src/components/pages/doc-welcome/faro/faro.js
new file mode 100644
index 00000000..0ee82df8
--- /dev/null
+++ b/src/components/pages/doc-welcome/faro/faro.js
@@ -0,0 +1,16 @@
+import { initializeFaro as coreInit } from '@grafana/faro-web-sdk';
+
+export function initializeFaro() {
+  const faro = coreInit({
+    url: 'http://localhost:8027/collect',
+    apiKey: 'api_key',
+    app: {
+      name: 'k6-docs-test',
+      version: '1.0.0',
+    },
+  });
+
+  faro.api.pushLog(['Faro was initialized']);
+}
diff --git a/src/components/pages/doc-welcome/faro/index.js b/src/components/pages/doc-welcome/faro/index.js
new file mode 100644
index 00000000..00cebe8e
--- /dev/null
+++ b/src/components/pages/doc-welcome/faro/index.js
@@ -0,0 +1 @@
+export * from './faro';
diff --git a/src/components/pages/doc-welcome/index.js b/src/components/pages/doc-welcome/index.js
index 6fbdb036..06203d92 100644
--- a/src/components/pages/doc-welcome/index.js
+++ b/src/components/pages/doc-welcome/index.js
@@ -1,3 +1,5 @@
+import { initializeFaro } from './faro';
+
 export * from './cloud';
 export * from './features';
 export * from './help';
@@ -6,3 +8,6 @@ export * from './showcase';
 export * from './what-is';
 export * from './whats-new';
 export * from './manifesto';
+export * from './faro';
+
+initializeFaro();

Run it:

docker-compose up --build

Faro
We can use the faro-web-sdk demo in order to deploy the Faro components locally. In our case we were using commit 54618b1.

git clone git@github.com:grafana/faro-web-sdk.git
cd faro-web-sdk
docker compose --profile demo up

The website will be available at http://localhost:8100 and Grafana at http://localhost:3000. To see the Web Vitals metrics go to Dashboards -> Grafana Agent -> Frontend.

Test
We were using xk6-browser at commit a8ebe8c, which you can build with xk6:

K6_VERSION=master xk6 build --with github.com/grafana/xk6-browser@a8ebe8c

And run the following script:

Simple k6 browser test being used:

import { browser } from 'k6/x/browser';

export const options = {
  scenarios: {
    ui: {
      executor: 'shared-iterations',
      options: {
        browser: {
          type: 'chromium',
        },
      },
    },
  },
};

export default async function () {
  const page = browser.newPage();

  try {
    await page.goto('http://localhost:8100/');
  } finally {
    page.close();
  }
}

Conclusion

All in all, it is good to verify that both k6 browser and Grafana Faro produce almost identical results when executed under the same conditions against a "real world" website (not a simple test site), even if they ultimately serve different purposes. Still, as mentioned in the previous comment, it is important to stress again the impact that the "execution environment" can have on Web Vitals measurements: hardware and network conditions, but also headless/headful mode, viewport and screen resolution, and user interaction with the page.

@ka3de ka3de closed this as completed Jul 18, 2023