Merge pull request #5553 from khickey-newrelic/develop: Update to kpi thresholds (bradleycamacho, Jan 7, 2022; commit 1232991, parents 1f8feec + e301a45)

Quality Foundation measures the following KPIs:

<CollapserGroup>
<Collapser
id="availability-kpi"
title="Availability"
>
This KPI measures whether or not your application or its pages can be accessed by your users.

**Goal:**

* Improve uptime and availability.

**Thresholds:**

* Warning: `<` 99%
* Critical: `<` 95%

99% or "2 9's" is a good minimum standard of availability, even for employee applications or sub-pages. We configure these default thresholds into the dashboards. You can easily change them to better suit expectations for your application.
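As an illustration of how the default thresholds classify a measured availability figure (a hypothetical helper for reasoning about the numbers, not part of the dashboards themselves):

```python
def availability_status(successful_checks: int, total_checks: int) -> str:
    """Classify measured availability against the default thresholds.

    Illustrative only; the dashboards configure these thresholds directly.
    """
    availability = 100.0 * successful_checks / total_checks
    if availability < 95.0:
        return "critical"
    if availability < 99.0:
        return "warning"
    return "ok"
```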
</Collapser>

<Collapser
id="core-web-lcp-kpi"
title="Largest contentful paint (LCP)"
>
Part of [Core Web Vitals](https://web.dev/vitals/). Largest Contentful Paint (LCP) measures the time it takes to render the largest image or text block visible in the viewport after a user has navigated to a new page.

**Goal:**

* Reduce LCP to 2.5 seconds or better at the 75th percentile for all pages, or at least the most critical pages.

**Thresholds:**

* Warning: `>` 2.5 seconds
* Critical: `>` 4.0 seconds

LCP thresholds are defined by the team at Google. The thresholds and the supporting logic behind them can be found [here](https://web.dev/defining-core-web-vitals-thresholds/).
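The 75th-percentile evaluation Google describes can be sketched as follows (a simplified nearest-rank percentile; the helper names are illustrative, not the dashboard implementation):

```python
import math

def percentile_75(samples: list[float]) -> float:
    """Nearest-rank 75th percentile of a list of timing samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))  # nearest-rank method
    return ordered[rank - 1]

def lcp_status(lcp_seconds: list[float]) -> str:
    """Classify LCP samples against the Google-defined thresholds."""
    p75 = percentile_75(lcp_seconds)
    if p75 > 4.0:
        return "critical"
    if p75 > 2.5:
        return "warning"
    return "good"
```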

</Collapser>


<Collapser
id="core-web-kpi"
title="First input delay (FID)"
>
Part of [Core Web Vitals](https://web.dev/vitals/). Measures the interactivity of a page by tracking the time between a user interaction (such as clicking a link or entering text) and when the browser begins processing the event.

**Goal:**

Reduce FID to 100 milliseconds or better at the 75th percentile for all pages, or at least the most critical pages.

**Thresholds:**

* Warning: `>` 100 milliseconds
* Critical: `>` 300 milliseconds

FID thresholds are defined by the team at Google. The thresholds and the supporting logic behind them can be found [here](https://web.dev/defining-core-web-vitals-thresholds/).


</Collapser>

<Collapser
id="layout-shift-kpi"
title="Cumulative layout shift (CLS)"
>
Part of [Core Web Vitals](https://web.dev/vitals/). Measures how much the page layout shifts during render.

**Goal:**

Maintain a score of 0.1 or less at the 75th percentile for all pages, or at least the most critical pages.

**Thresholds:**

* Warning: `>` 0.1 score
* Critical: `>` 0.25 score

CLS thresholds are defined by the team at Google. The thresholds and the supporting logic behind them can be found [here](https://web.dev/defining-core-web-vitals-thresholds/).

</Collapser>

<Collapser
id="ttfb-kpi"
title="Time to first byte (TTFB)"
>
This KPI measures the time from navigation start (a user clicking a link) to the browser receiving the first byte of the response from the server. Google considers TTFB secondary to Core Web Vitals, but we recommend measuring it for a more complete picture. It can be revealing when you see a change in LCP, because it tells you whether the change occurred server side or client side.

**Goal:**

Reduce the time to first byte by improving CDN, network, and service performance.

**Thresholds:**
* Warning `>` 0.5 seconds
* Critical `>` 1.0 seconds

According to Google and Search Engine People, 500 milliseconds is a decent TTFB for pages with dynamic content. You can find mention of these recommendations [here](https://www.searchenginepeople.com/blog/16081-time-to-first-byte-seo.html).

</Collapser>

<Collapser
id="ajax-response-times-kpi"
title="Ajax response times"
>
Slow ajax calls can make the user feel as though nothing is happening or the page is broken. If the response time is slow enough, users may even abandon the journey.

**Goal:**

Measure and improve ajax response times.

**Thresholds:**
* Warning `>` 2 seconds
* Critical `>` 2.5 seconds

These thresholds come from experience with customers across a variety of industries.

</Collapser>

<Collapser
id="http-errors-kpi"
title="HTTP error rate"
>
HTTP errors (or HTTP `4xx` and `5xx` responses) happen when calls to the backend are not successful.

**Goal:**

Measure and reduce the HTTP error rate to ensure your customers are able to do what they came to your site to do.

**Thresholds:**
* Warning `<` 99% of requests are successful
* Critical `<` 97% of requests are successful

These thresholds come from experience with customers across a variety of industries.
We assume that every ajax request is associated with something the user is trying to achieve, and we treat it accordingly. Because users will often retry failed actions, we allowed space between the warning and critical thresholds.

* If the ajax requests being measured are an important part of the user journey, we recommend aiming for higher success rates, such as 99.5% or 99.9%.
* If the ajax requests are tied to login requests, separate 4xx response codes from 5xx response codes and set a much lower threshold for the 4xx responses. You can look to historical response code rates to determine a reasonable threshold.
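The separation of client and server errors described above can be sketched like this (illustrative only; in practice you would measure this in New Relic against your request data):

```python
from collections import Counter

def response_breakdown(status_codes: list[int]) -> Counter:
    """Bucket HTTP responses into success, client error (4xx), and server error (5xx)."""
    buckets: Counter = Counter()
    for code in status_codes:
        if 400 <= code < 500:
            buckets["4xx"] += 1
        elif code >= 500:
            buckets["5xx"] += 1
        else:
            buckets["success"] += 1
    return buckets

def success_rate(status_codes: list[int]) -> float:
    """Percentage of requests that did not return a 4xx or 5xx response."""
    buckets = response_breakdown(status_codes)
    return 100.0 * buckets["success"] / len(status_codes)
```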
</Collapser>

<Collapser
id="js-errors-kpi"
title="JavaScript error rate"
>
This KPI measures the number of JavaScript errors per page view.

**Goal:**

* Remove irrelevant JavaScript errors being tracked either by tuning ingest or using filtering.
* Reduce JavaScript errors that impact customer performance.

**Thresholds:**

* Warning: `>` 5% errors per page view
* Critical: `>` 10% errors per page view

These thresholds come from experience with customers across a variety of industries.
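As a sketch of the goal above, irrelevant errors can be filtered out before computing the per-page-view rate (the ignore list below is a hypothetical example; tune it for your own application):

```python
IGNORED_MESSAGES = {"Script error."}  # hypothetical noise filter; tune for your app

def js_error_rate(error_messages: list[str], page_views: int) -> float:
    """JavaScript errors per 100 page views, after dropping known-noise messages."""
    relevant = [msg for msg in error_messages if msg not in IGNORED_MESSAGES]
    return 100.0 * len(relevant) / page_views
```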


</Collapser>

</CollapserGroup>

For each KPI, we defined two thresholds: one for warning and another for critical. You might ask where these values come from, or how you can be sure they apply to your application. Our thresholds are either those recommended by Google (as with Core Web Vitals) or our own, based on experience across a large number of customers and applications. If you feel strongly that they should be different, you can adjust them, but you should do this at the organizational level rather than on an application-by-application basis.

Quality Foundation helps you identify where in your application you need to make improvements that will optimize user retention, conversion, and satisfaction. It is less about where things stand today and more about where to get to.

It also shows you what you should be measuring going forward. You can use this to define SLOs (in a service level dashboard) and alert on them.
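An SLO check over any of these KPIs might look like the following sketch (the target and sample data are assumptions for illustration, not a prescribed implementation):

```python
def slo_met(samples: list[float], threshold: float, target_pct: float) -> bool:
    """True when the share of samples at or under the threshold meets the SLO target."""
    within = sum(1 for value in samples if value <= threshold)
    compliance = 100.0 * within / len(samples)
    return compliance >= target_pct
```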


## Prerequisites

### Required knowledge
