This repository has been archived by the owner on Feb 29, 2020. It is now read-only.

Figure out perceived performance benchmarks #2005

Closed
Mardak opened this issue Jan 18, 2017 · 8 comments

Comments

@Mardak
Member

Mardak commented Jan 18, 2017

Split from #1977. Determine "perceived performance" benchmarks (talk to Philipp Sackl). Mainly page load to interactivity of highlights/top sites. Probably good to have something around search.

@dmose
Member

dmose commented Jan 23, 2017

https://www.instartlogic.com/blog/perceptual-speed-index-psi-measuring-above-fold-visual-performance-web-pages has relevant info, including a link to the VisualMetrics github repo, which has code to compute PSI from a video.

@digitarald might also have thoughts here...

@digitarald

WebPageTest (we have a private instance) gives you Speed Index but can also show user timings.

The gold standard for timings depends on knowing how important each aspect of the content is. I would recommend adding timings for important content manually; Steve Souders has some good references for this. Combine this with keeping the site interactive while streaming in the elements (WebPageTest shows CPU usage and when interactivity starts), as @Mardak said.

Happy to chat more about specifics if needed :)

@Mardak Mardak moved this from Unassigned to Milestone 1 in Land in Nightly / Graduate Mar 15, 2017
@tspurway tspurway added the P1 label Apr 17, 2017
@tspurway tspurway added this to the Xchamsiks (April 30) milestone Apr 17, 2017
@dmose dmose self-assigned this Apr 20, 2017
@dmose
Member

dmose commented Apr 26, 2017

In addition to the useful reading that @digitarald suggested, I also got a bunch out of reading LinkedIn's blog post.

After a bunch of research, ruminating, poking around, and discussion with Kate, I'm going to propose a set of user timings that I think are highest priority. In an ideal world, we'd do a bunch of user research around our existing implementation, but the SDK one is probably not the right starting point for performance stuff like this. So these measurements are in large part informed by what's easy to bootstrap/measure.

In addition to replacing about:newtab, activity stream also replaces about:home.
There are really three key performance contexts here:

about:newtab: always preloaded in a hidden context, and then made visible in a fresh tab.

about:home: by default, is the first page that comes up at browser start in a new tab. This is effectively a normal render.

about:home from the toolbar button (not shown by default): will make the current tab load activity stream. Tab-blanking followed by a normal render.

After that, I'll also include an outline of stuff that we are likely to want to look at as time goes on.

I'll break down the user-timings session into graduation issues soon, and then resolve this bug.

Here are the first user timings that I propose we include; they should be useful in development/profiling, synthetic regression testing, and Real User Monitoring (aka telemetry).

There is some perceptual stuff that I hope that we can get out of WebPageTest/Talos as well as performance.timing eventually, but I don't think that's the highest priority, since so much of that is focused on loading stuff over the network, and that's a set of issues we just don't have.

maybe/later

The workaround is having a "heartbeat" (a continuous 50 ms timer, which doesn't have much perf impact). Using the skew between expected time and actual time, you can tell whether the main thread was blocked. Make sure to also include page visibility as a factor, as that can throttle timers. Some sites have started using this technique + meaningful paint to guess TTI.

  • tti
    • performance.timing.timeToInteractive
      • syn/RUM: once bug 1299118 lands
  • painting/display
    • first meaningful paint: "the user feels that the primary content of the page is visible" / "biggest layout chunk painted"
      • syn/RUM: use performance.timing.firstMeaningfulPaint
        • set pref: dom.performance.first-meaningful-paint.enabled
        • after bug 1299117 lands
    • SI (progress of above-fold loading)
    • PSI (above-fold loading, but notices visual jitter/layout thrashing)
      • syn: WPT
      • rum: (none)
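The user timings listed above can be captured with the standard User Timing API (performance.mark / performance.measure). A minimal sketch; the mark and measure names below are hypothetical examples, not actual Activity Stream telemetry names:

```javascript
// Minimal user-timing helpers; the names used here are illustrative.
function markMilestone(name) {
  performance.mark(name);
}

// Records a measure from `startMark` to now and returns its duration in ms.
function measureSince(startMark, measureName) {
  performance.measure(measureName, startMark);
  const entries = performance.getEntriesByName(measureName, "measure");
  return entries[entries.length - 1].duration;
}

// e.g. markMilestone("as_render_start");
//      ...render...
//      measureSince("as_render_start", "as_display_done");
```

Measures recorded this way show up in the profiler and WebPageTest, and could be batched into telemetry pings for RUM.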

@dmose
Member

dmose commented Apr 27, 2017

I'm going to edit the above list in place as a checklist.

@digitarald

for each of (about:newtab, about:home)

I am unsure what the aforementioned about:newtab/about:home entries measure. Is it the delay to first paint after input (probably tab opening or tab blanking, depending on the interaction)? Maybe those can be tracked according to what kind of response is expected:

  • Time to tab blank for navigation within same tab
  • Time to tab opened and page starting to initialize

For page load metrics, the good old page load event will not provide much detail for SPA apps like AS. It is best to annotate when the components in the viewport load. To break it down even further, I would recommend these stages:

  • First non-blank paint: only if first paint is not already content but some placeholder
  • Hero Element: the first element to be painted that users interact with most, like the Top Sites panel
  • Visual Complete: all above-the-fold content is loaded, including images
  • Display Done: assuming anything loads below the fold, this marks when all components are fully rendered; otherwise Meaningful Paint is Display Done.

This assumes the page is usable after Hero Element, meaning that no lazy-loaded component causes noticeable jank (rule of thumb: frames longer than 50 ms):

  • Slice any processing you can into smaller chunks and process it using requestIdleCallback
  • Never have any processing that takes longer than 10 ms in response to an event; delegate the work of any input handler to requestIdleCallback.
  • Process data off the main thread using workers when possible
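The chunking advice above could be sketched roughly like this (processInChunks is a hypothetical helper; it falls back to setTimeout where requestIdleCallback is unavailable):

```javascript
// Fallback shim so the sketch also runs outside a browser (e.g. in Node).
const ric = typeof requestIdleCallback === "function"
  ? requestIdleCallback
  : (cb) => setTimeout(() => cb({ timeRemaining: () => 16 }), 0);

// Processes `items` in small chunks during idle time, yielding the main
// thread back between chunks so the page stays responsive.
function processInChunks(items, processItem, onDone) {
  let i = 0;
  function work(deadline) {
    // Only keep working while the browser reports idle time left.
    while (i < items.length && deadline.timeRemaining() > 0) {
      processItem(items[i++]);
    }
    if (i < items.length) {
      ric(work); // resume in the next idle period
    } else {
      onDone();
    }
  }
  ric(work);
}
```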

@dmose
Member

dmose commented Apr 28, 2017

In addition to replacing about:newtab, activity stream also replaces about:home.
To be clearer, there are really three key performance contexts here:

about:newtab: always preloaded in a hidden context, and then made visible in a fresh tab.

about:home: by default, is the first page that comes up at browser start in a new tab. This is effectively a normal render.

about:home from the toolbar button (not shown by default): will make the current tab load activity stream. Tab-blanking followed by a normal render.

The first two cases are clearly the most important. One thing all of these cases have in common is that none of them (in the system-addon implementation) will load any content directly from the network.

for each of (about:newtab, about:home)

I am unsure about what the aforementioned about:newtab/about:home measure,

Nothing; that was a header indicating that I want simple page-load proxy measures for each of the two loading contexts (I just realized the third one while writing this up, so I'll modify the plan :-).

For page load metrics, the good old page load event will not provide much detail for SPA apps like AS.

Agreed; I mostly wanted to collect that to see where it shows up in the sequence compared to other events. That's probably not a good enough reason to collect it, especially given that it'll show up in the profiler when we profile.

It is best to annotate when the components in the viewport load.

As currently written, I was intending to just look at when the top-level React component had rendered and painted (equivalent to Display Done). My suspicion is that given that none of this stuff will be network loaded, and we're offloading more things to other threads, we'll be fast enough for most users that Display Done may show us that we don't need to break it down further.

To break it down even further I would recommend 3 stages:

  • First non-blank paint: only if first paint is not already content but some placeholder

We'll add that if we end up needing a placeholder in the system add-on version.

  • Hero Element: the first element to be painted that users interact with most, like the Top Sites panel

Yeah, I just checked our data, and topsites is the one. I'm thinking the best way to do it is using http://stackoverflow.com/a/34999925 .
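The Stack Overflow technique referenced above boils down to scheduling a callback from inside requestAnimationFrame, so it fires after the frame containing the render has actually been painted. A rough sketch (markAfterNextPaint and the mark name are hypothetical):

```javascript
// Hypothetical helper: records a user-timing mark once the next frame has
// been painted. A setTimeout(0) scheduled from inside a requestAnimationFrame
// callback runs after that frame's paint completes.
function markAfterNextPaint(markName) {
  requestAnimationFrame(() => {
    setTimeout(() => performance.mark(markName), 0);
  });
}

// e.g. call markAfterNextPaint("topsites_first_painted") right after the
// Top Sites component renders.
```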

  • Visual Complete: All above the fold content is loaded, including images

We don't know exactly what content is going to be above the fold. For that reason, this seems like enough work that I'm inclined to put it off until we see the numbers we get back from Display Done and Hero Element (Top Sites Painted).

@digitarald: I presume it's hard to quantitatively monitor jank until we implement PerformanceFrameTiming. Or is there some other way to do that?

  • Never have any processing that can take longer than 10ms in response to an event. Delegate work of any input handler to requestIdleCallback.

So it sounds like we want all event handlers to have performance.mark bookending?

@digitarald

Yeah, I just checked our data, and topsites is the one. I'm thinking the best way to do it is using http://stackoverflow.com/a/34999925 .

A double rAF (requestAnimationFrame) might also be an option here and is used by FB et al.

@digitarald I presumably it's hard to quantitatively monitor jank until we implement PerformanceFrameTiming. Or is there some other way to do that?

The workaround is having a "heartbeat" (a continuous 50 ms timer, which doesn't have much perf impact). Using the skew between expected time and actual time, you can tell whether the main thread was blocked. Make sure to also include page visibility as a factor, as that can throttle timers. Some sites have started using this technique + meaningful paint to guess TTI.
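That heartbeat workaround could be sketched as follows (startHeartbeat is a hypothetical helper; a real version should also factor in document.visibilityState, since background tabs throttle timers):

```javascript
const HEARTBEAT_MS = 50;       // expected interval between ticks
const JANK_THRESHOLD_MS = 50;  // skew beyond this counts as a blocked frame

// Starts a 50 ms heartbeat; when a tick fires much later than expected,
// the main thread was blocked for roughly that skew. Returns a stop function.
function startHeartbeat(onJank) {
  let expected = performance.now() + HEARTBEAT_MS;
  const id = setInterval(() => {
    const now = performance.now();
    const skew = now - expected;
    if (skew > JANK_THRESHOLD_MS) {
      onJank(skew); // main thread was blocked for roughly `skew` ms
    }
    expected = now + HEARTBEAT_MS;
  }, HEARTBEAT_MS);
  return () => clearInterval(id);
}
```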

So it sounds like we want all event handlers to have performance.mark bookending?

That would be ideal. For more complex interactions you could even break it down into a R(A)IL measure to know how fast the first response paint happens and when the final state is painted.
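Bookending could look roughly like this (a hypothetical wrapper; the "as_handler_" naming scheme is made up for illustration):

```javascript
// Wraps an event handler so its run time shows up as a user-timing measure.
function instrumented(name, handler) {
  return function (...args) {
    const startMark = `as_handler_${name}_start`;
    performance.mark(startMark);
    try {
      return handler.apply(this, args);
    } finally {
      // Recorded even if the handler throws.
      performance.measure(`as_handler_${name}`, startMark);
    }
  };
}

// e.g. element.addEventListener("click", instrumented("topsite_click", onClick));
```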

@dmose
Member

dmose commented Jun 5, 2017

I've spun off a new bug to make plans for synthetic perf tests which references the discussion in this bug.

I believe all the critical stuff from this bug has now been filed (and annotated in #2005 (comment) ), so I'm closing this one.
