Skip to content
This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Issue #19147: Move set startup metrics off main thread #19397

Merged
merged 1 commit into from
May 12, 2021

Conversation

rocketsroger
Copy link
Contributor

Pull Request checklist

  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Screenshots: This PR includes screenshots or GIFs of the changes made or an explanation of why it does not
  • Accessibility: The code in this PR follows accessibility best practices or does not include any user facing features. In addition, it includes a screenshot of a successful accessibility scan to ensure no new defects are added to the product.

To download an APK when reviewing a PR:

  1. click on Show All Checks,
  2. click Details next to "Taskcluster (pull_request)" after it appears and then finishes with a green checkmark,
  3. click on the "Fenix - assemble" task, then click "Run Artifacts".
  4. the APK links should be on the left side of the screen, named for each CPU architecture

@codecov-commenter
Copy link

codecov-commenter commented May 5, 2021

Codecov Report

Merging #19397 (d31644c) into master (f0d352f) will increase coverage by 0.01%.
The diff coverage is 66.66%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #19397      +/-   ##
============================================
+ Coverage     35.08%   35.09%   +0.01%     
  Complexity     1626     1626              
============================================
  Files           542      542              
  Lines         21861    21866       +5     
  Branches       3243     3243              
============================================
+ Hits           7670     7674       +4     
- Misses        13300    13302       +2     
+ Partials        891      890       -1     
Impacted Files Coverage Δ Complexity Δ
...rc/main/java/org/mozilla/fenix/FenixApplication.kt 25.59% <25.00%> (-0.35%) 10.00 <0.00> (ø)
...org/mozilla/fenix/components/BackgroundServices.kt 20.83% <100.00%> (+2.26%) 0.00 <0.00> (ø)
.../src/main/java/org/mozilla/fenix/utils/Settings.kt 72.83% <100.00%> (+0.15%) 36.00 <0.00> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f0d352f...d31644c. Read the comment docs.

@rocketsroger rocketsroger added the needs:review PRs that need to be reviewed label May 5, 2021
@rocketsroger
Copy link
Contributor Author

As discussed offline, I'll get this merged before performance team can do another round of tests on the build. Removing @mcomella as assigned reviewer. Thanks

@rocketsroger rocketsroger removed the request for review from mcomella May 6, 2021 18:06
setStartupMetrics(components.core.store, settings())
// We avoid blocking the main thread on startup by setting startup metrics on the background thread.
val isLoggedIn =
components.backgroundServices.accountManager.accountProfile() != null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving the account manager to this place would also early initialize it during the Application creation which we would want to avoid.

I think using a pref to store the loggedIn state when the account manager is done initializing might be a quick fix for this. It's something we're doing for other settings that we want to submit to the ping as well. All future reads of this pref should give you the right value.

An alternative, is if observe the account manager and submit all these metrics when everything is done. I'm not sure how Glean SDK is required to work to know if this will cause faults in our data though, we're already trying to solve this issue right now after all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I'll look into solving this issue.

@jonalmeida
Copy link
Contributor

jonalmeida commented May 7, 2021

I discussed this with Roger offline since I'm also working on a similar problem with Glean in Focus as well. One suggestion that might work is if we can set all our values first, before we submit them. Something like the code below:

val readAndSetMetrics = CoroutineScope(Dispatchers.IO).async {
  Metrics.distributionId.set(value)
  Metrics.defaultBrowser.set(otherValue)
}

MainScope().launch {
  // wait for this task to be complete first.
  readAndSetMetrics.await() 

  // send everything to Glean
  activationPing.checkAndSend()
  installationPing.checkAndSend()
}

@rocketsroger rocketsroger changed the title Closes #19147: Move set startup metrics off main thread Issue #19147: Move set startup metrics off main thread May 7, 2021
@rocketsroger rocketsroger reopened this May 7, 2021
@rocketsroger rocketsroger reopened this May 7, 2021
@mdboom
Copy link
Contributor

mdboom commented May 7, 2021

I'm just a manager and don't have the detailed context to really know one way or the other, but... Would it be worth blocking on a review from @badboy just to confirm this isn't re-introducing problems that #19252 was meant to solve?

@rocketsroger
Copy link
Contributor Author

rocketsroger commented May 7, 2021

I'm just a manager and don't have the detailed context to really know one way or the other, but... Would it be worth blocking on a review from @badboy just to confirm this isn't re-introducing problems that #19252 was meant to solve?

Yes, this change could re-introduce the problems that #19252 that we're trying to solve. However, we have not confirmed #19252 does fix the problem. A performance slowdown of over 8% on all user is in my opinion not acceptable for a workaround. My thought is we first try this change in the background thread and if it doesn't work we can then explore other options or wait for a completed investigation of https://bugzilla.mozilla.org/show_bug.cgi?id=1706637.

I'm hoping for a good compromise between performance and telemetry. Thanks,

@mdboom
Copy link
Contributor

mdboom commented May 7, 2021

That sounds like a good plan, thanks!

@mcomella
Copy link
Contributor

mcomella commented May 7, 2021

However, we have not confirmed #19252 does fix the problem.

Since #19252 is in Nightly, I wonder if we can look at the data returned in Nightly to determine if the current fix solves the issue. Then we can land this to-the-background-thread change and see how it changes the result.

@rocketsroger
Copy link
Contributor Author

rocketsroger commented May 7, 2021

However, we have not confirmed #19252 does fix the problem.

Since #19252 is in Nightly, I wonder if we can look at the data returned in Nightly to determine if the current fix solves the issue. Then we can land this to-the-background-thread change and see how it changes the result.

Did a quick query limiting results from the last two days. There are still about 72% of values are null. However, versus last 28 days of 80% it's an improvement. Here's the query: https://sql.telemetry.mozilla.org/queries/79931

@rocketsroger
Copy link
Contributor Author

As discussion offline with @mcomella I'll hold of merging this change until next week. This gives us more data to confirm that the workaround does work when it's done on the main thread.

@badboy
Copy link
Member

badboy commented May 10, 2021

However, we have not confirmed #19252 does fix the problem.

Since #19252 is in Nightly, I wonder if we can look at the data returned in Nightly to determine if the current fix solves the issue. Then we can land this to-the-background-thread change and see how it changes the result.

Did a quick query limiting results from the last two days. There are still about 72% of values are null. However, versus last 28 days of 80% it's an improvement. Here's the query.
SELECT metrics.metrics.string.metrics_tab_view_setting AS TYPE, count(*) AS number FROM org_mozilla_fenix.metrics WHERE DATE(submission_timestamp) > DATE_SUB(CURRENT_DATE, INTERVAL 2 DAY) GROUP BY 1

If you filter by those client that might have the fix (based on the app build & commit date of the fix, not sure if there's a better way to match into what build a certain commit went), then only 7% still sent NULL values for metrics_tab_view_setting.
So the fix definitely works as expected.

I agree that an 8% slowdown is not acceptable right now. I'll prioritize the investigation on our end this week.

@mcomella mcomella removed the request for review from a team May 10, 2021 23:03
setStartupMetrics(components.core.store, settings())
// We avoid blocking the main thread on startup by setting startup metrics on the background thread.
val store = components.core.store
GlobalScope.launch(Dispatchers.Default) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This should probably be the IO dispatcher since we're reading prefs from disk for this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'll change it to Dispatchers.IO.

@rocketsroger
Copy link
Contributor Author

rocketsroger commented May 11, 2021

We can go ahead and merge this since we've now confirmed that the original solution of moving startup metrics to after Glean initialization works.

Copy link
Contributor

@jonalmeida jonalmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this patch through the profiler and confirmed with a comparison (without the changes) that we get back our gains in pref.

We can verify the Glean bits work after this has landed in that case.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
needs:review PRs that need to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants