Skip to content
This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Fixes #23492: Perf regression of calling isFirefoxDefault from main thread #23556

Merged
merged 4 commits into from
Feb 8, 2022

Conversation

jhugman
Copy link
Contributor

@jhugman jhugman commented Feb 3, 2022

Fixes #23492
Fixes #23618

Perf regression calling slow isFirefoxDefault from the main thread.

Prior #23400, the default browser message was only displayed when part of an experiment. The check that isFirefoxDefault was only done as a final check before the message is displayed.

#23400 changed the order of this, so the slow check was done each time the toolbar was created.

Fixes #23492; this PR does three things:

Now, the default browser message is displayable when an experiment toggles it on.

The isFirefoxDefault is done when the message is toggled on, but:

  1. There are no plans to toggle this on in this release
  2. The messaging system is being re-written to avoid these problems.

Pull Request checklist

  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Screenshots: This PR includes screenshots or GIFs of the changes made or an explanation of why it does not
  • Accessibility: The code in this PR follows accessibility best practices or does not include any user facing features. In addition, it includes a screenshot of a successful accessibility scan to ensure no new defects are added to the product.

To download an APK when reviewing a PR:

  1. click on Show All Checks,
  2. click Details next to "Taskcluster (pull_request)" after it appears and then finishes with a green checkmark,
  3. click on the "Fenix - assemble" task, then click "Run Artifacts".
  4. the APK links should be on the left side of the screen, named for each CPU architecture

@jhugman jhugman requested review from a team as code owners February 3, 2022 17:41
@Amejia481 Amejia481 closed this Feb 3, 2022
@Amejia481 Amejia481 reopened this Feb 3, 2022
val config = FxNimbus.features.defaultBrowserMessage.value()
return if (
config.messageLocation == MessageSurfaceId.APP_MENU_ITEM &&
!browsers.isFirefoxDefaultBrowser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how BrowsersCache.all() could impact performace, but maybe we could make it a bit faster if we inline the BrowsersCache.all(context) call with the condition check, this way we will only call BrowsersCache.all( ) when it's the right message? . What do you think?

 config.messageLocation == MessageSurfaceId.APP_MENU_ITEM &&
 !BrowsersCache.all(context).isFirefoxDefaultBrowser

Copy link
Contributor

@Amejia481 Amejia481 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jhugman jhugman added the pr:needs-landing-squashed PRs that are ready to land (squashed) [Will be merged by Mergify] label Feb 3, 2022
@gabrielluong
Copy link
Member

Decision Task broke, restarting

@gabrielluong gabrielluong reopened this Feb 3, 2022
@mcomella
Copy link
Contributor

mcomella commented Feb 3, 2022

I measured the performance difference. Results from original bisection:

  • before regression: 1373ms median
  • regression: 1501ms

Results from this PR:

  • before PR: 1436ms
  • after PR: 1402ms

It seems something may have changed after the initial regression causing the overall regression to be 63ms rather than ~100ms. One notable visual difference is that in the regressing builds there is a new UI element, "Set as default browser". The home screen races various UI elements against each other (when it should inflate from the top) so that may contribute to the different results. This PR improves the regression by another 34ms so we've still regressed ~29ms from the original state. Ideally we can try to address that (e.g. if we regress 30ms 2 more times, we're back to where we started) but I didn't see anything obviously actionable in the profiles. I'd be curious to see what our Nightly results say about this PR rather than my local measurements.

@jhugman jhugman force-pushed the jhugman/23492-perf-regression-on-isFirefoxDefault branch from 5832ed9 to 014f6bb Compare February 7, 2022 11:54
Copy link
Contributor Author

@jhugman jhugman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added more aggressive caching of calling into Rust and checking the browser default.

The FML builds on top of the getVariables() -> Variables API and generates straightline kotlin to access those Variables.

The rest of the PR #23400 converts existing Fenix code to use the generated code, so this Pr concentrates on that.

@mcomella please could you re-profile? (are there profiles on CI for PRs with the Performance label?)

nimbusValidation.settingsIcon,
"drawable",
context.packageName
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling getIdentifier is entirely avoidable, for the sake of a validation.

} else {
false
}
} ?: false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines do several things:

  • we put all the checking logic into one place.
  • we now only call into Rust once for the default browser feature.
  • we now only check isFirefoxDefaultBrowser after we check the surface id. There are no experiments about this currently, so this never happens now.

@Amejia481
Copy link
Contributor

One notable visual difference is that in the regressing builds there is a new UI element, "Set as default browser".

@jhugman this looks similar to what @mcomella described on #23556 (comment) see #23618 (comment)

@Amejia481
Copy link
Contributor

Amejia481 commented Feb 7, 2022

One notable visual difference is that in the regressing builds there is a new UI element, "Set as default browser".

@jhugman this looks similar to what @mcomella described on #23556 (comment) see #23618 (comment)

I think what is causing the regression is what @mcomella was describing above the "Set as default browser" card is showing all the time. Investigating #23618 I found that
val isExperimentBranch = feature.messageLocation == MessageSurfaceId.HOMESCREEN_BANNER it's always true after 82a6f8c, it looks like this was not the case before. See #23618 (comment) for more context.

@mcomella
Copy link
Contributor

mcomella commented Feb 8, 2022

@mcomella please could you re-profile? (are there profiles on CI for PRs with the Performance label?)

I tried to check out and run this PR but it crashes on start up with:

02-07 16:39:03.891 19530 19563 E AndroidRuntime: FATAL EXCEPTION: DefaultDispatcher-worker-5
02-07 16:39:03.891 19530 19563 E AndroidRuntime: Process: org.mozilla.fenix, PID: 19530
02-07 16:39:03.891 19530 19563 E AndroidRuntime: org.mozilla.experiments.nimbus.internal.NimbusFeatureException: A Context is needed but not available. Consider passing in a context to the value() method when close to startup
02-07 16:39:03.891 19530 19563 E AndroidRuntime:        at org.mozilla.experiments.nimbus.internal.FeatureHolder.value$default(FeatureHolder.kt:7)
02-07 16:39:03.891 19530 19563 E AndroidRuntime:        at org.mozilla.fenix.utils.Settings$searchTermTabGroupsAreEnabled$2.invoke(Settings.kt:4)
02-07 16:39:03.891 19530 19563 E AndroidRuntime:        at org.mozilla.fenix.components.settings.LazyPreference$property$2.invoke(FeatureFlagPreference.kt:4)
02-07 16:39:03.891 19530 19563 E AndroidRuntime:        at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:5)
02-07 16:39:03.891 19530 19563 E AndroidRuntime:        at org.mozilla.fenix.components.settings.LazyPreference.getValue(FeatureFlagPreference.kt:3)
02-07 16:39:03.891 19530 19563 E AndroidRuntime:        at org.mozilla.fenix.utils.Settings.getSearchTermTabGroupsAreEnabled(Settings.kt:1)
02-07 16:39:03.891 19530 19563 E AndroidRuntime:        at org.mozilla.fenix.FenixApplication$initializeGlean$2.invokeSuspend(FenixApplication.kt:23)
02-07 16:39:03.891 19530 19563 E AndroidRuntime:        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:3)
02-07 16:39:03.891 19530 19563 E AndroidRuntime:        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:18)
02-07 16:39:03.891 19530 19563 E AndroidRuntime:        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:1)
02-07 16:39:03.891 19530 19563 E AndroidRuntime:        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:10)

@jhugman btw, did you mean a benchmark or a profile?

(are there profiles on CI for PRs with the Performance label?)

Unfortunately, no – we haven't had the resources to implement something like that (e.g. our daily graphs from nightly are actually manually run with a script on a device on my desk).

For benchmarks, you can run it locally if you have a low-end phone (or a high-end phone you can reproduce the regression on). See https://wiki.mozilla.org/Performance/Fenix/Performance_reviews#Benchmark_locally. This regression was to the cold_main_first_frame benchmark.

@jhugman jhugman force-pushed the jhugman/23492-perf-regression-on-isFirefoxDefault branch from 5428f35 to d96e51b Compare February 8, 2022 11:32
@jhugman
Copy link
Contributor Author

jhugman commented Feb 8, 2022

Ah. Ok yes, I keep forgetting to say: The default browser message was shown as part of the #23400 , and this PR fixes that.

I've added

Fixes #23556

to comment 0.

Fixed this early startup crash, with the suggested fix.

 org.mozilla.experiments.nimbus.internal.NimbusFeatureException: A Context is needed but not available. Consider passing in a context to the value() method when close to startup
        …
        at org.mozilla.fenix.utils.Settings$searchTermTabGroupsAreEnabled$2.invoke(Settings.kt:4)

@jhugman jhugman force-pushed the jhugman/23492-perf-regression-on-isFirefoxDefault branch from d96e51b to 4a22af0 Compare February 8, 2022 11:38
@jhugman
Copy link
Contributor Author

jhugman commented Feb 8, 2022

@jhugman btw, did you mean a benchmark or a profile?

I'm not sure, whichever you needed to find that 102ms regression.

Also question from the curious: given the precision of ms you're giving, I assumed that the error bars were small, however, knowing you're benchmarking on a local machine, I have no idea now. What sort of variation/error bars do you see in timings?

@mergify mergify bot merged commit b230c39 into main Feb 8, 2022
@bors bors bot deleted the jhugman/23492-perf-regression-on-isFirefoxDefault branch February 8, 2022 12:44
@mcomella
Copy link
Contributor

mcomella commented Feb 8, 2022

@jhugman btw, did you mean a benchmark or a profile?

I'm not sure, whichever you needed to find that 102ms regression.

Sure. Results from original bisection:

  • before regression: 1373ms median
  • regression: 1501ms

Results from this PR, taken with our cold_main_first_frame benchmark run on a Moto G5 (i.e. ./perf-tools/measure_start_up.py nightly cold_main_first_frame results.txt):

  • before PR: 1443ms
  • after PR: 1406ms

These results are similar to when I last measured this PR.

Also question from the curious: given the precision of ms you're giving, I assumed that the error bars were small, however, knowing you're benchmarking on a local machine, I have no idea now. What sort of variation/error bars do you see in timings?

I don't know statistics well so I'm not sure how well I can answer this question 😓 (and yes, our results could probably be improved by someone more familiar with statistics). Through trial-and-error we decided to execute 25 iterations of each test and to take the median values of the results to find the duration of a benchmark for a build. When we originally set these values, this had a maximum variation of ~20ms for an A/A test. In the worst case, I think this would let us catch regressions of ~40ms+ (based on this thinking I did). However, the average variation is usually smaller and we can look at the results across multiple days (so hundreds of iterations instead of 25) so in practice we've caught smaller regressions. The variation seems larger these days though:  maybe ~30ms. Our nightly benchmark results dashboards (populated once a week on mondays) is here if you're curious.

For context, we use local machines right now rather than CI due to limited resourcing: CI would be ideal because it's more accessible but it takes more time to implement and debug when issues occur.

pkirakosyan pushed a commit to gexsi/user-agent-android that referenced this pull request Mar 10, 2022
…lt from main thread (mozilla-mobile#23556)

* Fixes mozilla-mobile#23492 — Fixup perf regression of calling isFirefoxDefault from the main thread

* Tightening of near defunct default browser message

* Fixup early crash in debug build

* ktlint
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr:needs-landing-squashed PRs that are ready to land (squashed) [Will be merged by Mergify]
Projects
None yet
4 participants