This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Add telemetry probe for dex load time #8803

Closed
ecsmyth opened this issue Dec 17, 2019 · 7 comments
Labels: needs:triage (Issue needs triage), performance (Possible performance wins)

Comments

ecsmyth commented Dec 17, 2019

Edit: this issue was repurposed to instrument only dex load time instead of startup time, but the description of the startup probes remains below.

Why/User Benefit/User Problem

Startup time is an oft-cited reason for friction and churn. We need to understand how long it takes Fenix to start up for our users so that we can prioritize work to address regressions and outliers that result in decreased engagement or churn. Local and CI-based testing provide an incomplete picture of how Fenix performs at startup.

Impact

Improved understanding of how Fenix performs on startup for our users.

Acceptance Criteria (how do I know when I’m done?)

We should make a best effort to incrementally instrument as much of startup as possible, until we hit diminishing returns, because it's impossible to instrument everything. Unfortunately, it is difficult to define in advance what this looks like, so take this as a guiding principle. Message mcomella if this does not make sense.

Some incremental steps we should try to implement for GA:

  • Add a metric that tracks the start time of GeckoView (GV)
  • Add a metric that tracks the duration of FenixApplication.onCreate (this may or may not be helpful but we want to add it to see if it is in practice)
  • Add a metric that tracks FenixApplication.onCreate until first frame drawn
  • Distinguish between first run and other cold starts (don't let this impact performance!)
  • (is this reasonable?) Distinguish between which page was shown, i.e. which startup path was used
  • Time from first frame drawn to reportFullyDrawn? Essentially, capture loading top sites & open tabs because this is dependent on content quantity
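As an illustration only, the duration metrics in the list above could be collected with a small span recorder like the one below. This is a sketch under stated assumptions, not the code that landed in Fenix: the class name `StartupSpans`, the span names, and the injected clock are all hypothetical. On Android the clock would typically be a monotonic source such as `System.nanoTime()`, and the recorded values would be handed to whatever telemetry API is in use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;
import java.util.function.LongSupplier;

// Hypothetical helper: records named startup spans against a monotonic clock.
final class StartupSpans {
    private final LongSupplier nanoClock;
    private final Map<String, Long> startNanos = new HashMap<>();
    private final Map<String, Long> durationNanos = new HashMap<>();

    // The clock is injected so the class can be exercised without a device,
    // e.g. new StartupSpans(System::nanoTime) in production.
    StartupSpans(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    // Marks the beginning of a span; a second start for the same name is ignored.
    void start(String name) {
        startNanos.putIfAbsent(name, nanoClock.getAsLong());
    }

    // Marks the end of a span; ignored if the span never started or already ended.
    void stop(String name) {
        Long start = startNanos.get(name);
        if (start == null || durationNanos.containsKey(name)) {
            return;
        }
        durationNanos.put(name, nanoClock.getAsLong() - start);
    }

    // Returns the recorded duration in milliseconds, or empty if the span is incomplete.
    OptionalLong durationMillis(String name) {
        Long nanos = durationNanos.get(name);
        return nanos == null ? OptionalLong.empty() : OptionalLong.of(nanos / 1_000_000L);
    }
}
```

With this shape, a span such as "application_on_create" would be started at the top of `FenixApplication.onCreate` and stopped at the end, while a second span could start at the same point and stop when the first frame is drawn. Ignoring mismatched start and stop calls keeps a measurement bug from crashing startup, at the cost of silently dropping that sample.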

Some incremental steps we probably can't implement by GA but we should verify:

  • Add a probe that tracks FenixApplication.onCreate until visual completeness
  • Add warm/hot start times, distinguish from cold start
  • Distinguish between cold starts after upgrading and other cold starts

Here is ecsmyth's ideal startup time probe that inspired these incremental steps:

A Glean telemetry probe, landed in a Fenix release, that measures the time, as closely as is practical, from when the user initiates startup until the app is started and the initial screen is visually complete. The probe should differentiate between hot, cold, and warm startup scenarios and account for differences in the first page rendered (e.g., how desktop measures first paint and first paint of about:home).
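To make the hot/cold/warm distinction concrete, here is a minimal sketch that assumes the commonly used Android definitions: a cold start creates the process, a warm start reuses the process but recreates the activity, and a hot start only brings an existing activity to the foreground. The startup definition doc referenced in the comments may draw the lines differently, and the names `StartupType` and `classify` are hypothetical.

```java
// Hypothetical classifier, assuming the common Android startup definitions.
final class StartupType {
    enum Kind { COLD, WARM, HOT }

    // processCreated: the app process was created for this start.
    // activityCreated: the activity's onCreate ran during this start.
    static Kind classify(boolean processCreated, boolean activityCreated) {
        if (processCreated) {
            return Kind.COLD;
        }
        if (activityCreated) {
            return Kind.WARM;
        }
        return Kind.HOT;
    }
}
```

A probe could attach the resulting kind as a label on each timing sample so that the three scenarios are analyzed separately instead of being mixed into one distribution.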

┆Issue is synchronized with this Jira Task

@mcomella
Contributor

It may be valuable to revisit the cold/warm/hot startup definition doc here.

@mcomella
Contributor

mcomella commented Feb 5, 2020

Alessio suggested we can use Glean performance metrics to capture these effectively.

@mcomella mcomella transferred this issue from mozilla-mobile/perf-frontend-issues Feb 27, 2020
@mcomella mcomella added the performance Possible performance wins label Feb 27, 2020
@github-actions github-actions bot added the needs:triage Issue needs triage label Feb 27, 2020
@mcomella mcomella self-assigned this Feb 27, 2020
@mcomella mcomella moved this from Needs prioritization to In progress in Performance, front-end roadmap Feb 27, 2020
mcomella added a commit to mcomella/fenix that referenced this issue Feb 28, 2020
mcomella added a commit to mcomella/fenix that referenced this issue Mar 13, 2020
mcomella added a commit to mcomella/fenix that referenced this issue Mar 13, 2020
mcomella added a commit to mcomella/fenix that referenced this issue Mar 13, 2020
mcomella added a commit to mcomella/fenix that referenced this issue Mar 13, 2020
mcomella added a commit to mcomella/fenix that referenced this issue Mar 18, 2020
mcomella added a commit to mcomella/fenix that referenced this issue Mar 18, 2020
mcomella added a commit to mcomella/fenix that referenced this issue Mar 18, 2020
mcomella added a commit to mcomella/fenix that referenced this issue Mar 18, 2020
mcomella added a commit to mcomella/fenix that referenced this issue Mar 19, 2020
During glean review, keeping as a separate commit to easily see the diff.
mcomella added a commit to mcomella/fenix that referenced this issue Mar 19, 2020
During glean review, keeping as a separate commit to easily see the diff.
mcomella added a commit to mcomella/fenix that referenced this issue Mar 19, 2020
During glean review, keeping as a separate commit to easily see the diff.
@mcomella mcomella moved this from In progress to Waiting in Performance, front-end roadmap Mar 25, 2020
@mcomella
Contributor

Waiting on ecsmyth to determine if time for GeckoRuntime.init (i.e. the time Gecko starts on the main thread before continuing work on a background thread) is a valuable probe.

mcomella added a commit to mcomella/fenix that referenced this issue Mar 31, 2020
This wraps a Glean TimespanMetricType to make it safer to measure
duration.
mcomella added a commit to mcomella/fenix that referenced this issue Apr 9, 2020
mcomella added a commit to mcomella/fenix that referenced this issue Apr 9, 2020
We need to access the data in stat to get the process start time, so we
can calculate the time from process start until application.init for the
frameworkStart probe.
mcomella added a commit to mcomella/fenix that referenced this issue Apr 9, 2020
This class controls the central logic around the metrics we want to
record.
mcomella added a commit to mcomella/fenix that referenced this issue Apr 9, 2020
mcomella added a commit to mcomella/fenix that referenced this issue Apr 15, 2020
We primarily want to determine if this is a problem area for us to
investigate rather than a long term measurement to keep so we should set
the expiration date accordingly. Furthermore, this code executes before
crash reporting is init so it's ideal to remove it sooner rather than
later.
mcomella added a commit to mcomella/fenix that referenced this issue Apr 15, 2020
mcomella added a commit to mcomella/fenix that referenced this issue Apr 16, 2020
mcomella added a commit to mcomella/fenix that referenced this issue Apr 16, 2020
We need to access the data in stat to get the process start time, so we
can calculate the time from process start until application.init for the
frameworkStart probe.
mcomella added a commit to mcomella/fenix that referenced this issue Apr 16, 2020
This class controls the central logic around the metrics we want to
record.
mcomella added a commit to mcomella/fenix that referenced this issue Apr 16, 2020
mcomella added a commit to mcomella/fenix that referenced this issue Apr 16, 2020
We primarily want to determine if this is a problem area for us to
investigate rather than a long term measurement to keep so we should set
the expiration date accordingly. Furthermore, this code executes before
crash reporting is init so it's ideal to remove it sooner rather than
later.
mcomella added a commit to mcomella/fenix that referenced this issue Apr 16, 2020
mcomella added a commit that referenced this issue Apr 17, 2020
We need to access the data in stat to get the process start time, so we
can calculate the time from process start until application.init for the
frameworkStart probe.
mcomella added a commit that referenced this issue Apr 17, 2020
This class controls the central logic around the metrics we want to
record.
mcomella added a commit that referenced this issue Apr 17, 2020
We primarily want to determine if this is a problem area for us to
investigate rather than a long term measurement to keep so we should set
the expiration date accordingly. Furthermore, this code executes before
crash reporting is init so it's ideal to remove it sooner rather than
later.
@mcomella
Contributor

I added the "dex launch time" probe: the time between process start and Application.<init>. I'll leave this issue open to remember to investigate the results and potentially file a follow-up for a more involved analysis.
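For readers unfamiliar with the approach, the commit messages above mention reading the process start time from stat. The following is a rough sketch of how that calculation can work on Linux, not the code that landed in Fenix. It assumes the caller has already read the contents of `/proc/self/stat`, knows the kernel clock tick rate (on Android this can be queried with `Os.sysconf(OsConstants._SC_CLK_TCK)`), and has a "milliseconds since boot" timestamp taken at `Application.<init>`. The class and method names are hypothetical.

```java
// Hypothetical helper: derives time since process start from a /proc/<pid>/stat line.
final class ProcessStartTime {
    // Per proc(5), starttime is field 22 of the stat line, in clock ticks since boot.
    private static final int STARTTIME_FIELD = 22;
    // The first field after the parenthesized comm field is field 3 (state).
    private static final int FIRST_FIELD_AFTER_COMM = 3;

    // Extracts the starttime field. The comm field may itself contain spaces
    // and parentheses, so parsing resumes after the last ')'.
    static long parseStartTimeTicks(String statLine) {
        int commEnd = statLine.lastIndexOf(')');
        if (commEnd < 0) {
            throw new IllegalArgumentException("stat line has no comm field");
        }
        String[] rest = statLine.substring(commEnd + 1).trim().split("\\s+");
        int index = STARTTIME_FIELD - FIRST_FIELD_AFTER_COMM;
        if (rest.length <= index) {
            throw new IllegalArgumentException("stat line has too few fields");
        }
        return Long.parseLong(rest[index]);
    }

    // Converts clock ticks to milliseconds for the given tick rate.
    static long ticksToMillis(long ticks, long ticksPerSecond) {
        return ticks * 1000L / ticksPerSecond;
    }

    // Time elapsed between process start and "now", both measured since boot.
    static long millisSinceProcessStart(String statLine, long ticksPerSecond,
                                        long nowSinceBootMillis) {
        long startMillis = ticksToMillis(parseStartTimeTicks(statLine), ticksPerSecond);
        return nowSinceBootMillis - startMillis;
    }
}
```

The result is only as precise as the tick rate allows (10 ms per tick at 100 ticks per second), and it assumes the "since boot" timestamp uses the same time base as the kernel's starttime value, which is worth verifying on the target devices.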

@mcomella mcomella moved this from In progress to Waiting in Performance, front-end roadmap Apr 20, 2020
@mcomella mcomella changed the title Add telemetry probe for startup time Add telemetry probe for dex load time Apr 20, 2020
@mcomella mcomella moved this from Waiting to Done in Performance, front-end roadmap Apr 23, 2020
@mcomella
Contributor

I'm going to close as fixed and create a new issue for the analysis.

@mcomella
Contributor

Filed #10161 for the analysis.

@liuche liuche mentioned this issue Apr 28, 2020
32 tasks

2 participants