Skip to content
This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Nimbus initialization should be triggered and complete before first page load starts #26320

Closed
csadilek opened this issue Aug 4, 2022 · 0 comments · Fixed by #27650, #27927, fork-house/fenix#14, nathanmkaya/fenix#108 or Leland-Takamine/fenix#159
Assignees
Labels
eng:perf-impact This issue may have an impact on performance: the perf team will investigate Nimbus
Milestone

Comments

@csadilek
Copy link
Contributor

csadilek commented Aug 4, 2022

This is a follow-up and summary of discussions between @jonalmeida, @jhugman and myself when working through #26228.

We want to improve and simplify Nimbus startup and application initialization ordering.

Problem
Nimbus initialization is currently started after engine initialization, and carried out asynchronously. This is by design to limit its impact on startup performance. However, the consequence is a race condition when an experiment and feature configuration is targeted at the engine, e.g., a tracking protection or site isolation setting that's controlled by a Nimbus feature. This means that the first page load after cold start may be carried out with the default settings (the feature's default value), as opposed to the values based on the experiment branch, if Nimbus initialization has not completed. Since cold starts are very common on mobile, and users tend to have short sessions opening only a single link before switching back to another application, this can (more significantly) impact experiment results.

Another problem with late Nimbus initialization is that it's hard to reason about the state of the application after startup. To compensate for the fact that Nimbus initialization may complete at a later point in time, we currently update the corresponding features when Nimbus is ready. We use onUpdatesApplied for this, which will eventually become a large subset of seemingly unrelated settings and feature updates, and ultimately a maintenance burden. While this could be addressed with separate observers, we think the following proposal could solve both problems.

Proposal
To prevent this entire set of problems, we propose changing initialization ordering so that Nimbus initializes and completes1 right after Glean, but before everything else including the engine. As a side note, engine initialization is currently accidentally triggered by initializeGlean when the store is accessed for startup metrics. This should be fixed as well as part of this, and rely on the explicit warmup call again.

1 This does not include waiting for the latest network fetch, just the local initialization and reading from storage.

Performance Impact
Delaying engine startup until Nimbus is complete will have a performance impact we need to measure and land telemetry for. We need to understand the startup cost of Nimbus. This is also required to understand the severity of the current race condition, i.e., how many users are affected from the first page load happening before Nimbus is ready. (Even if that impact is low, it's still worth fixing this to gain simplicity and deterministic initialization ordering.)

I think we should accept some cost to startup time for gaining simplicity and stability with this change.

Known issues
We built a quick prototype that confirmed this can work, but noticed that there's a risk for a cyclical dependency and infinite loop, if Nimbus uses the engine during initialization. This will happen because of its use of our concept.fetch client, or any accidental reference of the engine in createNimbus such as in strictMode.resetAfter. It's important we investigate a way to guard against this, e.g., by making sure Nimbus doesn't use the engine at all.

We will use this ticket to prototype and likely turn it into a meta, as work will be required in both Nimbus and Fenix/A-C to complete this. This should also serve as a guideline and solution for Focus.

┆Issue is synchronized with this Jira Task

@csadilek csadilek added eng:perf-impact This issue may have an impact on performance: the perf team will investigate Nimbus labels Aug 4, 2022
@github-actions github-actions bot added the needs:triage Issue needs triage label Aug 4, 2022
@csadilek csadilek removed the needs:triage Issue needs triage label Aug 4, 2022
jhugman added a commit that referenced this issue Nov 2, 2022
@github-actions github-actions bot added the eng:reopen-for-qa Reopens and tags the issue for QA needed when the issue is merged label Nov 2, 2022
jhugman added a commit that referenced this issue Nov 3, 2022
jhugman added a commit that referenced this issue Nov 7, 2022
jhugman added a commit that referenced this issue Nov 7, 2022
jhugman added a commit that referenced this issue Nov 9, 2022
jhugman added a commit that referenced this issue Nov 11, 2022
jhugman added a commit that referenced this issue Nov 14, 2022
jhugman added a commit that referenced this issue Nov 16, 2022
jhugman added a commit that referenced this issue Nov 17, 2022
@mergify mergify bot closed this as completed in #27650 Nov 17, 2022
mergify bot pushed a commit that referenced this issue Nov 17, 2022
@github-actions github-actions bot reopened this Nov 17, 2022
@github-actions github-actions bot removed the eng:reopen-for-qa Reopens and tags the issue for QA needed when the issue is merged label Nov 17, 2022
@github-actions github-actions bot added this to the 109 milestone Nov 17, 2022
@github-actions github-actions bot added the eng:qa:needed QA Needed label Nov 17, 2022
@jonalmeida jonalmeida removed the eng:qa:needed QA Needed label Nov 18, 2022
@jonalmeida jonalmeida reopened this Nov 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.