Mitigate randomized experiment selection, feature flags for performance analysis #45

Closed · mcomella opened this issue Dec 20, 2019 · 15 comments

@mcomella (Contributor) commented Dec 20, 2019

Why/User Benefit/User Problem

In our analysis of debug vs. release builds (mozilla-mobile/fenix#6931), we decided to use a release-like build. However, in release-like builds, experimentation is enabled and users may be randomly opted into various experiments: this bug is to understand the implications and mitigate them.

Furthermore, some features are behind "feature flags" and are enabled only in Nightly and debug builds: we should figure out how to address these too (should this be a separate bug?).

Impact

Without this, performance results will not be reliable: random experiment enrollment can change the code paths under test and skew our measurements.

Acceptance Criteria (how do I know when I’m done?)

  • Mitigate random experiment opt-in in performance testing (one possible approach is sketched after this list)
    • Note: I've heard "mako is the future" and that fretboard is deprecated for experiments: please verify the behavior here
  • (separate issue?) Mitigate feature flags (e.g. nightly or debug) in performance testing
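
For illustration, here is a minimal Kotlin sketch of the first criterion, assuming a hypothetical harness-controlled flag. `PerfTestFlags`, `ExperimentsClient`, and the property key are illustrative names, not Fenix or Nimbus APIs.

```kotlin
// Hypothetical sketch: gate experiment enrollment behind a flag that the
// perf harness sets, so release-like perf builds never enroll randomly.
object PerfTestFlags {
    // The harness would set this JVM system property (or an equivalent
    // mechanism) before the app initializes experiments.
    val isPerformanceTest: Boolean
        get() = System.getProperty("perftest.disable.experiments") == "true"
}

class ExperimentsClient {
    fun enrollIfEligible() {
        if (PerfTestFlags.isPerformanceTest) {
            // Skip random enrollment entirely so every perf run exercises
            // the same baseline code paths.
            return
        }
        // ...normal randomized enrollment would happen here...
    }
}
```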
@mcomella mcomella added this to Needs prioritization in Performance, front-end roadmap via automation Dec 20, 2019
@mcomella mcomella moved this from Needs prioritization to Backlog (prioritized) in Performance, front-end roadmap Dec 30, 2019
@mcomella (Contributor, Author)

Triage: we have no known experiments in Fenix GA, so this is low priority. In fact, Colin is looking at taking experiments out of GA to improve startup perf.

@mcomella mcomella moved this from Backlog (prioritized) to Low impact (unprioritized) in Performance, front-end roadmap Feb 17, 2020
@mcomella mcomella moved this from Low impact (unprioritized) to Needs prioritization in Performance, front-end roadmap May 14, 2020
@mcomella (Contributor, Author)

Maybe mozilla-mobile/fenix#6278 will help here.

@mcomella (Contributor, Author) commented May 21, 2020

Triage: the P1 ask is to opt out of all experiments, especially on CI. We can file a follow-up to opt in to specific experiments.

This hinges on when experiments will be reintroduced so I'll contact the folks involved.
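
As a sketch of what the CI opt-out could look like, assuming the harness passes a hypothetical boolean intent extra at launch; the extra name and the enrollment hook are illustrative, not real Fenix behavior.

```kotlin
import android.app.Activity
import android.os.Bundle

class MainActivity : Activity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // A CI harness could launch the app with something like:
        //   adb shell am start ... --ez disable_experiments true
        val experimentsDisabled =
            intent?.getBooleanExtra("disable_experiments", false) ?: false
        if (!experimentsDisabled) {
            // Normal randomized experiment enrollment would run here;
            // on CI the flag short-circuits it so runs stay comparable.
        }
    }
}
```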

@mcomella mcomella moved this from Needs prioritization to Needs prioritization: waiting in Performance, front-end roadmap Jun 10, 2020
@mcomella mcomella moved this from Needs prioritization: waiting to Needs prioritization in Performance, front-end roadmap Jul 8, 2020
@mcomella (Contributor, Author) commented Jul 8, 2020

Triage: we want to wait for Eric's opinion, but we suspect this could be high priority, just below startup work.

@MarcLeclair MarcLeclair moved this from Needs prioritization to Backlog (prioritized) in Performance, front-end roadmap Jul 15, 2020
@MarcLeclair MarcLeclair moved this from Backlog (prioritized) to Waiting in Performance, front-end roadmap Jul 15, 2020
@MarcLeclair MarcLeclair moved this from Waiting to Needs prioritization: waiting in Performance, front-end roadmap Jul 15, 2020
@mcomella mcomella moved this from Needs prioritization: waiting to Needs prioritization in Performance, front-end roadmap Dec 7, 2020
@mcomella (Contributor, Author) commented Dec 7, 2020

We now have experiment selection code on startup: mozilla-mobile/fenix#16901. Moving to triage.

@mcomella (Contributor, Author)

Triage: the secure storage experiment is landing, and csadilek does not expect it to have a perf impact on startup or page load (where our tests are). We could double-check the PR, though: mozilla-mobile/fenix#18333

@mcomella (Contributor, Author)

> Triage: the secure storage experiment is landing, and csadilek does not expect it to have a perf impact on startup or page load (where our tests are). We could double-check the PR, though: mozilla-mobile/fenix#18333

This shouldn't affect the measurements we take for startup or page load, afaict.

@eliserichards commented Mar 26, 2021

Those three experiments (one Leanplum and two Nimbus) are going to be implemented after the MR1 release (mid-April). They will be running simultaneously.

@mcomella (Contributor, Author)

The top two issues don't seem like they'll affect startup. The last one, mozilla-mobile/fenix#18375, might affect FNPRMS, where the message will presumably pop up on the homescreen after the third run (since we don't have conditioned profiles).

csadilek also mentions there's a secure storage experiment, but it probably doesn't impact startup either.

csadilek also mentions that we have a menu in the secret settings that displays all the known experiments; this could be useful for figuring out when to work on this issue.

@mcomella (Contributor, Author) commented Jun 3, 2021

Triage: no new experiments off the top of our heads.

@MarcLeclair MarcLeclair moved this from Needs prioritization to Low impact (unprioritized) in Performance, front-end roadmap Jun 9, 2021
@mcomella mcomella moved this from Triaged (unordered) to Needs triage in Performance, front-end roadmap Jan 26, 2022
@mcomella (Contributor, Author)

Triage: even if this doesn't affect perf tests, it affects experiments.

@mcomella (Contributor, Author) commented Feb 2, 2022

Triage: desktop runs into the same problem; let's ask Bas for their opinion. From our discussion, csadilek wants to test what we're shipping in Nightly (i.e., no preferences or overrides), while mcomella wants to pin to one set of experiments for each test so our tests are more controlled (a rough sketch of the pinning idea follows below).
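
A rough sketch of the pinning idea, with entirely hypothetical types (not a real Fenix or Nimbus API): the test supplies a fixed experiment-to-branch mapping instead of relying on randomized enrollment, so every run of a given test measures the same configuration.

```kotlin
// Hypothetical types; not a real Fenix/Nimbus API.
interface ExperimentSource {
    fun branchFor(experiment: String): String?
}

// What Nightly ships: randomized enrollment (details elided).
class RandomizedSource : ExperimentSource {
    override fun branchFor(experiment: String): String? = null // stub
}

// What a pinned perf test would use: a fixed, explicit assignment.
class PinnedSource(private val branches: Map<String, String>) : ExperimentSource {
    override fun branchFor(experiment: String): String? = branches[experiment]
}

fun main() {
    val source = PinnedSource(mapOf("secure-storage" to "treatment"))
    println(source.branchFor("secure-storage")) // prints: treatment
}
```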

@mcomella (Contributor, Author) commented Feb 17, 2022

Triage: if Bas remembers correctly, we opt out of all experiments for perf tests (though we should verify this with sparky; edit: Sparky thinks this is set here for Normandy). To ensure we don't regress performance for code behind experiment flags:

  1. we expect developers to regularly run their patches against perf tests on try,
  2. when we run an experiment in production, we can compare perf telemetry against the baseline (example), and
  3. when the experiments are finally released, we test them against CI and users.

This ensures experiments are tested in isolation but not in combination (particularly because cohorts allocated to one experiment cannot be put into other experiments). So it's possible, though unlikely, that two experiments which show no perf issues in isolation could regress performance when enabled simultaneously; we haven't tested that case.
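
To make the combination gap concrete, here is a small sketch of how a CI could enumerate pairs of active experiments and run the perf suite once per pair; the experiment names and the runner are hypothetical.

```kotlin
// Hypothetical helper: experiments validated one at a time can still
// regress when enabled together, so enumerate the pairs we never test.
fun pairwiseCombinations(experiments: List<String>): List<Pair<String, String>> {
    val pairs = mutableListOf<Pair<String, String>>()
    for (i in experiments.indices) {
        for (j in i + 1 until experiments.size) {
            pairs += experiments[i] to experiments[j]
        }
    }
    return pairs
}

fun main() {
    // Illustrative experiment names, not a real active set.
    val active = listOf("secure-storage", "homescreen-message", "tabs-tray")
    for ((a, b) in pairwiseCombinations(active)) {
        println("run perf suite with $a + $b enabled")
    }
}
```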

To get ourselves closer to desktop, we have some follow-ups to do:

@MarcLeclair

Will close since this has become a meta issue.

Performance, front-end roadmap automation moved this from Needs triage to Done Feb 23, 2022