Merge pull request #5553 from khickey-newrelic/develop: Update to kpi thresholds (bradleycamacho, Jan 7, 2022; commit 1232991, parents 1f8feec + e301a45)

Quality Foundation measures the following KPIs:

<CollapserGroup>
<Collapser
id="availability-kpi"
title="Availability"
>
This KPI measures whether or not your application or its pages can be accessed by your users.

**Goal:**

* Improve uptime and availability.

**Thresholds:**

* Warning: `<` 99%
* Critical: `<` 95%

99% or "2 9's" is a good minimum standard of availability, even for employee applications or sub-pages. We configure these default thresholds into the dashboards. You can easily change them to better suit expectations for your application.
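As an illustration of how the default thresholds classify a measured availability figure (a hypothetical helper for reasoning about the numbers, not part of the dashboards themselves):

```python
def availability_status(successful_checks: int, total_checks: int) -> str:
    """Classify measured availability against the default thresholds.

    Illustrative only; the dashboards configure these thresholds directly.
    """
    availability = 100.0 * successful_checks / total_checks
    if availability < 95.0:
        return "critical"
    if availability < 99.0:
        return "warning"
    return "ok"
```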
</Collapser>

<Collapser
id="core-web-lcp-kpi"
title="Largest contentful paint (LCP)"
>
Part of [Core Web Vitals](https://web.dev/vitals/). Largest Contentful Paint (LCP) measures the time it takes to render the largest image or text block visible in the viewport after a user has navigated to a new page.

**Goal:**

* Reduce LCP to 2.5 seconds or better at the 75th percentile for all pages, or at least the most critical pages.

**Thresholds:**

* Warning: `>` 2.5 seconds
* Critical: `>` 4.0 seconds

LCP thresholds are defined by the team at Google. The thresholds and the supporting logic behind them can be found [here](https://web.dev/defining-core-web-vitals-thresholds/).
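The 75th-percentile evaluation Google describes can be sketched as follows (a simplified nearest-rank percentile; the helper names are illustrative, not the dashboard implementation):

```python
import math

def percentile_75(samples: list[float]) -> float:
    """Nearest-rank 75th percentile of a list of timing samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))  # nearest-rank method
    return ordered[rank - 1]

def lcp_status(lcp_seconds: list[float]) -> str:
    """Classify LCP samples against the Google-defined thresholds."""
    p75 = percentile_75(lcp_seconds)
    if p75 > 4.0:
        return "critical"
    if p75 > 2.5:
        return "warning"
    return "good"
```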

</Collapser>


<Collapser
id="core-web-kpi"
title="First input delay (FID)"
>
Part of [Core Web Vitals](https://web.dev/vitals/). Measures the interactivity of a page by tracking the time between a user interaction (such as clicking a link or entering text) and when the browser begins processing the event.

**Goal:**

Reduce FID to 100 milliseconds or better at the 75th percentile for all pages, or at least the most critical pages.

**Thresholds:**

* Warning: `>` 100 milliseconds
* Critical: `>` 300 milliseconds

FID thresholds are defined by the team at Google. The thresholds and the supporting logic behind them can be found [here](https://web.dev/defining-core-web-vitals-thresholds/).


</Collapser>

<Collapser
id="layout-shift-kpi"
title="Cumulative layout shift (CLS)"
>
Part of [Core Web Vitals](https://web.dev/vitals/). Measures how much the page layout shifts during render.

**Goal:**

Maintain a score of 0.1 or less at the 75th percentile for all pages, or at least the most critical pages.

**Thresholds:**

* Warning: `>` 0.1 score
* Critical: `>` 0.25 score

CLS thresholds are defined by the team at Google. The thresholds and the supporting logic behind them can be found [here](https://web.dev/defining-core-web-vitals-thresholds/).

</Collapser>

<Collapser
id="ttfb-kpi"
title="Time to first byte (TTFB)"
>
This KPI measures the time from navigation start (a user clicking a link) to the browser receiving the first byte of the response from the server. Google considers TTFB secondary to Core Web Vitals, but we recommend measuring it for a more complete picture. It can be revealing when you see a change in LCP, because it tells you whether the change occurred server side or client side.

**Goal:**

Reduce the time to first byte by improving CDN, network, and service performance.

**Thresholds:**
* Warning `>` 0.5 seconds
* Critical `>` 1.0 seconds

According to Google and Search Engine People, 500 milliseconds is a decent TTFB for pages with dynamic content. You can find mention of these recommendations [here](https://www.searchenginepeople.com/blog/16081-time-to-first-byte-seo.html).

</Collapser>

<Collapser
id="ajax-response-times-kpi"
title="Ajax response times"
>
Slow ajax calls can make the user feel as though nothing is happening or the page is broken. If the response time is slow enough, users may even abandon the journey.

**Goal:**

Measure and improve ajax response times.

**Thresholds:**
* Warning `>` 2 seconds
* Critical `>` 2.5 seconds

These thresholds come from experience with customers across a variety of industries.

</Collapser>

<Collapser
id="http-errors-kpi"
title="HTTP error rate"
>
HTTP errors (or HTTP `4xx` and `5xx` responses) happen when calls to the backend are not successful.

**Goal:**

Measure and reduce the HTTP error rate to ensure your customers are able to do what they came to your site to do.

**Thresholds:**
* Warning `<` 99% of requests are successful
* Critical `<` 97% of requests are successful

These thresholds come from experience with customers across a variety of industries.
We assume that every ajax request is associated with something the user is trying to achieve, and we treat it accordingly. Because users will often retry failed actions, we allowed space between the warning and critical thresholds.

* If the ajax requests being measured are an important part of the user journey, we recommend aiming for higher success rates, such as 99.5% or 99.9%.
* If the ajax requests are tied to login requests, separate 4xx response codes from 5xx response codes and set a much lower threshold for the 4xx responses. You can look to historical response code rates to determine a reasonable threshold.
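The separation of client and server errors described above can be sketched like this (illustrative only; in practice you would measure this in New Relic against your request data):

```python
from collections import Counter

def response_breakdown(status_codes: list[int]) -> Counter:
    """Bucket HTTP responses into success, client error (4xx), and server error (5xx)."""
    buckets: Counter = Counter()
    for code in status_codes:
        if 400 <= code < 500:
            buckets["4xx"] += 1
        elif code >= 500:
            buckets["5xx"] += 1
        else:
            buckets["success"] += 1
    return buckets

def success_rate(status_codes: list[int]) -> float:
    """Percentage of requests that did not return a 4xx or 5xx response."""
    buckets = response_breakdown(status_codes)
    return 100.0 * buckets["success"] / len(status_codes)
```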
</Collapser>

<Collapser
id="js-errors-kpi"
title="JavaScript error rate"
>
This KPI measures the number of JavaScript errors per page view.

**Goal:**

* Remove irrelevant JavaScript errors being tracked either by tuning ingest or using filtering.
* Reduce JavaScript errors that impact customer performance.

**Thresholds:**

* Warning: `>` 5% errors per page view
* Critical: `>` 10% errors per page view

These thresholds come from experience with customers across a variety of industries.
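As a sketch of the goal above, irrelevant errors can be filtered out before computing the per-page-view rate (the ignore list below is a hypothetical example; tune it for your own application):

```python
IGNORED_MESSAGES = {"Script error."}  # hypothetical noise filter; tune for your app

def js_error_rate(error_messages: list[str], page_views: int) -> float:
    """JavaScript errors per 100 page views, after dropping known-noise messages."""
    relevant = [msg for msg in error_messages if msg not in IGNORED_MESSAGES]
    return 100.0 * len(relevant) / page_views
```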


</Collapser>

</CollapserGroup>

For each KPI, we defined two thresholds: one for warning and another for critical. You might ask where these values come from, or how you can be sure they apply to your application. Our thresholds are either those recommended by Google (as with Core Web Vitals) or our own, based on experience across a large number of customers and applications. If you feel strongly that they should be different, you can adjust them, but you should do this at the organizational level rather than on an application-by-application basis.

Quality Foundation helps you identify where in your application you need to make improvements that will optimize user retention, conversion, and satisfaction. It is less about where things stand today and more about where to get to.

It also shows you what you should be measuring going forward. You can use this to define SLOs (in a service level dashboard) and alert on them.
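An SLO check over any of these KPIs might look like the following sketch (the target and sample data are assumptions for illustration, not a prescribed implementation):

```python
def slo_met(samples: list[float], threshold: float, target_pct: float) -> bool:
    """True when the share of samples at or under the threshold meets the SLO target."""
    within = sum(1 for value in samples if value <= threshold)
    compliance = 100.0 * within / len(samples)
    return compliance >= target_pct
```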


## Prerequisites

### Required knowledge
