This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

For #8803: add frameworkStart telemetry #9788

Merged: 7 commits, Apr 17, 2020

@mcomella mcomella commented Apr 7, 2020

This is intended to measure the "dex" time before we can execute code. This builds upon my learning of how we want to capture telemetry metrics in #9153.

@Dexterp37 Please review this from a Glean perspective.

@boek or @liuche Please review as a data steward.

@MarcLeclair Please review as the primary code reviewer!

Review requests:

  • Data review
  • Review from the Glean team
  • Code review

Pull Request checklist

  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Screenshots: This PR includes screenshots or GIFs of the changes made or an explanation of why it does not
  • Accessibility: The code in this PR follows accessibility best practices or does not include any user facing features. In addition, it includes a screenshot of a successful accessibility scan to ensure no new defects are added to the product.

After merge

  • Milestone: Make sure issues finished by this pull request are added to the milestone of the version currently in development.

To download an APK when reviewing a PR:

  1. Click on Show All Checks.
  2. Click Details next to "Taskcluster (pull_request)" after it appears and then finishes with a green checkmark.
  3. Click on the "Fenix - assemble" task, then click "Run Artifacts".
  4. The APK links should be on the left side of the screen, named for each CPU architecture.


mcomella commented Apr 7, 2020

Request for data collection review form

All questions are mandatory. You must receive review from a data steward peer on your responses to these questions before shipping new data collection.

  1. What questions will you answer with this data?
  • How long does the Android framework block for on various devices before giving us the ability to execute code?
  2. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements? Some example responses:
  • Understand how significant the time before we can execute code is so that we can choose whether or not we want to invest in optimizing the behavior.
  3. What alternative methods did you consider to answer these questions? Why were they not sufficient?
  • We could measure locally on our reference devices but we wouldn't be able to determine 1) the variance across a wide range of devices or 2) how frequent and impactful outliers to local testing may be.
  4. Can current instrumentation answer these questions?

No.

  5. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.

Note that the data steward reviewing your request will characterize your data collection based on the highest (and most sensitive) category.

| Measurement Description | Data Collection Category | Tracking Bug # |
| --- | --- | --- |
| framework_start: timespan | Category 1 | #8803 |
| framework_start_error: event | Category 1 | #8803 |
| clock_ticks_per_second: counter | Category 1 | #8803 |

  6. How long will this data be collected? Choose one of the following:
  • I want this data to be collected for 6 months initially (potentially renewable).
  7. What populations will you measure?

All.

  8. If this data collection is default on, what is the opt-out mechanism for users?

Standard telemetry opt-out.

  9. Please provide a general description of how you will analyze this data.
  • Look at frameworkStart to understand how long it takes for the framework to start across all devices
  • Watch frameworkStartError to determine if there are any implementation errors.
  • Look at clockTicksPerSecond to see if this value changes across devices and, if so, potentially segment frameworkStart data to group data to the same significant figures
  10. Where do you intend to share the results of your analysis?

Internally.

  11. Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection? If so:

No.
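
The segmentation idea in the analysis answer (grouping frameworkStart values by clockTicksPerSecond so that values with the same resolution are compared together) could look roughly like the sketch below. It is only an illustration: `FrameworkStartSample`, `tickResolutionNanos` and `groupByTickResolution` are hypothetical names, and the real analysis would presumably run in telemetry tooling rather than in Kotlin.

```kotlin
// Hypothetical record pairing one frameworkStart value with the device's tick rate.
data class FrameworkStartSample(
    val frameworkStartNanos: Long,
    val clockTicksPerSecond: Long
)

// Smallest duration one clock tick can represent, in nanoseconds.
// With 100 ticks per second, a tick is 10 ms, so finer detail is not meaningful.
fun tickResolutionNanos(clockTicksPerSecond: Long): Long {
    require(clockTicksPerSecond > 0) { "clockTicksPerSecond must be positive" }
    return 1_000_000_000L / clockTicksPerSecond
}

// Groups the measured durations by the resolution of the clock that produced them,
// so each group only contains values with comparable precision.
fun groupByTickResolution(samples: List<FrameworkStartSample>): Map<Long, List<Long>> =
    samples.groupBy(
        { tickResolutionNanos(it.clockTicksPerSecond) },
        { it.frameworkStartNanos }
    )
```

If clockTicksPerSecond turns out to be the same on every device, the map has a single key and no segmentation is needed.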

@mcomella mcomella requested review from Dexterp37 and boek and removed request for boek April 7, 2020 22:22

codecov-io commented Apr 7, 2020

Codecov Report

Merging #9788 into master will increase coverage by 0.14%.
The diff coverage is 87.17%.

@@             Coverage Diff              @@
##             master    #9788      +/-   ##
============================================
+ Coverage     19.17%   19.31%   +0.14%     
- Complexity      521      536      +15     
============================================
  Files           336      339       +3     
  Lines         13729    13701      -28     
  Branches       1842     1830      -12     
============================================
+ Hits           2632     2647      +15     
+ Misses        10857    10816      -41     
+ Partials        240      238       -2     
| Impacted Files | Coverage Δ | Complexity Δ |
| --- | --- | --- |
| ...pp/src/main/java/org/mozilla/fenix/HomeActivity.kt | 10.06% <0.00%> (-0.07%) | 10.00 <0.00> (ø) |
| app/src/main/java/org/mozilla/fenix/perf/Stat.kt | 87.50% <87.50%> (ø) | 5.00 <5.00> (?) |
| ...ain/java/org/mozilla/fenix/perf/StartupTimeline.kt | 62.50% <88.88%> (+62.50%) | 2.00 <1.00> (+2.00) |
| ...rc/main/java/org/mozilla/fenix/FenixApplication.kt | 12.41% <100.00%> (+1.22%) | 4.00 <1.00> (+1.00) |
| ...lla/fenix/perf/StartupFrameworkStartMeasurement.kt | 100.00% <100.00%> (ø) | 9.00 <9.00> (?) |
| .../fenix/settings/advanced/LocaleManagerExtension.kt | 62.50% <0.00%> (-8.34%) | 0.00% <0.00%> (ø%) |
| ...nix/components/toolbar/BrowserToolbarController.kt | 61.04% <0.00%> (-3.66%) | 0.00% <0.00%> (ø%) |
| .../src/main/java/org/mozilla/fenix/utils/Settings.kt | 76.35% <0.00%> (-0.24%) | 30.00% <0.00%> (ø%) |
| ...lla/fenix/components/toolbar/DefaultToolbarMenu.kt | 45.49% <0.00%> (-0.04%) | 11.00% <0.00%> (-2.00%) |

... and 39 more

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aef827e...b51198c. Read the comment docs.

@Dexterp37
Contributor

> This is intended to measure the "dex" time before we can execute code.

I can tell you that time! (sorry, had to make the joke)

app/metrics.yaml Outdated
bugs:
- https://github.com/mozilla-mobile/fenix/issues/8803
data_reviews:
- https://github.com/mozilla-mobile/fenix/pull/9788#issuecomment-610648980
Contributor

Note that you need to link to the answers of the data-review by the data stewards.

Contributor Author

I'll apply these changes after data review. :)

app/metrics.yaml Outdated
bugs:
- https://github.com/mozilla-mobile/fenix/issues/8803
data_reviews:
- https://github.com/mozilla-mobile/fenix/pull/9788#issuecomment-610648980
Contributor

Same here

app/metrics.yaml Outdated
bugs:
- https://github.com/mozilla-mobile/fenix/issues/8803
data_reviews:
- https://github.com/mozilla-mobile/fenix/pull/9788#issuecomment-610648980
Contributor

And here.

app/pings.yaml Outdated
bugs:
- https://github.com/mozilla-mobile/fenix/issues/8803
data_reviews:
- https://github.com/mozilla-mobile/fenix/pull/9788#issuecomment-610648980
Contributor

And here :)

Contributor

@Dexterp37 Dexterp37 left a comment

r+ With the data-review request fields fixed :)

app/metrics.yaml Outdated (resolved)
protected fun recordOnInit() {
    // This gets called by more than one process. Ideally we'd only run this in the main process
    // but the code to check which process we're in crashes because the Context isn't valid yet.
    StartupTimeline.onApplicationInit()
Contributor

nit: The comment in init warns that nothing should happen before this. However, recordOnInit() looks fairly harmless, and I wonder whether someone might add something in front of the call inside this method. Is there any reason we can't inline the call to StartupTimeline?

Contributor Author

Good point. I wanted to:

  1. avoid duplicating the comment, "this gets called by more than one process..."
  2. (I care about this less) make it easy for additional Application classes to call this method, if necessary

@pocmo Is MigratingFenixApplication going away soon (merged into FenixApplication)? If so, I'll inline with duplicated comment. If not, I'll add a comment to this method not to add anything above that line.
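
As a rough sketch of the inlined option discussed in this thread: the timeline call sits directly in the `init` block as its first statement, so there is no helper method in which someone could place work ahead of it. `SketchStartupTimeline` and `SketchApplication` are hypothetical stand-ins (not the Fenix classes) so that the snippet runs without Android.

```kotlin
// Hypothetical stand-in for StartupTimeline: it only records the order of calls.
object SketchStartupTimeline {
    val calls = mutableListOf<String>()

    fun onApplicationInit() {
        calls += "onApplicationInit"
    }
}

// Hypothetical Application-like class; the real class would extend android.app.Application.
open class SketchApplication {
    init {
        // Inlined: this must stay the first statement of init so that as little
        // code as possible runs before the timestamp is taken.
        // (In the PR, this is also where the "called by more than one process"
        // comment would have to be duplicated for each Application class.)
        SketchStartupTimeline.onApplicationInit()

        // Any other initialization work comes afterwards.
        SketchStartupTimeline.calls += "otherInitWork"
    }
}
```

The trade-off named in the reply still applies: inlining duplicates the explanatory comment in every Application class, while a shared method keeps one comment but needs a warning not to add anything above the call.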

Contributor

@MarcLeclair MarcLeclair left a comment

The math checks out, and so does the calculation method. I think using this for Telemetry is the best way we can go about gathering data about the Dex size, since we can't control every user's environment to be able to track the app creation.

@mcomella mcomella requested a review from liuche April 16, 2020 00:36
Contributor

@liuche liuche left a comment

data-review+

Data Review Form (to be filled by Data Stewards)

  1. Is there or will there be documentation that describes the schema for the ultimate data set in a public, complete, and accurate way?

Yes, metrics.yaml gets generated into metrics.md

  2. Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, controlled by Fenix telemetry controls in settings

  3. If the request is for permanent data collection, is there someone who will monitor the data over time?

6mo

  4. Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Type 1, technical data collected about startup time and errors.

  5. Is the data collection request for default-on or default-off?

Default on

  6. Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?

No

  7. Is the data collection covered by the existing Firefox privacy notice?

Yes

  8. Does there need to be a check-in in the future to determine whether to renew the data? (Yes/No) (If yes, set a todo reminder or file a bug if appropriate)

Has 6mo expiry

  9. Does the data collection use a third-party collection tool? If yes, escalate to legal.

No

@mcomella
Contributor Author

This is the failure:

[task 2020-04-16T00:50:17.726Z] SUITE: org.mozilla.fenix.perf.StartupHomeActivityLifecycleObserverTest
[task 2020-04-16T00:50:17.726Z]   TEST: WHEN onStop is called THEN the metrics are set and the ping is submitted
[task 2020-04-16T00:50:17.726Z]   FAILURE
[task 2020-04-16T00:50:17.726Z] 
[task 2020-04-16T00:50:17.726Z] java.lang.AssertionError: Verification failed: number of calls happened not matching exact number of verification sequence
[task 2020-04-16T00:50:17.726Z] 
[task 2020-04-16T00:50:17.726Z] Matchers: 
[task 2020-04-16T00:50:17.726Z] StartupFrameworkStartMeasurement(frameworkStartMeasurement#1678).setExpensiveMetric())
[task 2020-04-16T00:50:17.726Z] PingType(startupTimeline#1679).submit(null()))
[task 2020-04-16T00:50:17.726Z] 
[task 2020-04-16T00:50:17.726Z] Calls:
[task 2020-04-16T00:50:17.726Z] 1) StartupFrameworkStartMeasurement(frameworkStartMeasurement#1678).setExpensiveMetric()

I'll rebase and see if it goes away.

We need to access the data in stat to get the process start time, so we
can calculate the time from process start until application.init for the
frameworkStart probe.

This class controls the central logic around the metrics we want to
record.

We primarily want to determine if this is a problem area for us to
investigate rather than a long-term measurement to keep, so we should set
the expiration date accordingly. Furthermore, this code executes before
crash reporting is initialized, so it's ideal to remove it sooner rather
than later.
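
The calculation described in these commit messages can be sketched roughly as follows. This is a minimal illustration, not the merged code: the function names are hypothetical, the stat line is passed in as a string instead of being read from `/proc/self/stat`, and the tick rate and the timestamp taken at application init are plain parameters. On Android those two values would plausibly come from `Os.sysconf(OsConstants._SC_CLK_TCK)` and `SystemClock.elapsedRealtimeNanos()`, but that is an assumption rather than something stated in this conversation. The field position follows the Linux proc(5) layout, where field 22 of the stat line is `starttime`, the process start time in clock ticks since boot.

```kotlin
// Extracts field 22 (starttime, in clock ticks since boot) from a /proc/<pid>/stat line.
fun parseProcessStartTimeTicks(statLine: String): Long {
    // Field 2 (the command name) is wrapped in parentheses and may contain spaces,
    // so only the text after the last ')' is split on spaces.
    val fieldsAfterComm = statLine.substringAfterLast(')').trim().split(' ')
    // fieldsAfterComm[0] is field 3 (state), so field 22 sits at index 22 - 3 = 19.
    return fieldsAfterComm[19].toLong()
}

// Converts clock ticks to nanoseconds for a given tick rate.
fun ticksToNanos(ticks: Long, clockTicksPerSecond: Long): Long {
    require(clockTicksPerSecond > 0) { "clockTicksPerSecond must be positive" }
    val nanosPerTick = 1_000_000_000L / clockTicksPerSecond
    return ticks * nanosPerTick
}

// Time from process start until application init. Both values must be measured
// from the same origin (time since boot) for the subtraction to be meaningful.
fun frameworkStartNanos(
    statLine: String,
    clockTicksPerSecond: Long,
    applicationInitNanosSinceBoot: Long
): Long {
    val processStartNanos =
        ticksToNanos(parseProcessStartTimeTicks(statLine), clockTicksPerSecond)
    return applicationInitNanosSinceBoot - processStartNanos
}
```

Because the start time is only known to the nearest tick, the result is no more precise than one tick, which is why the PR also records clock_ticks_per_second alongside the timespan.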
@mcomella mcomella merged commit 909ee73 into mozilla-mobile:master Apr 17, 2020
@mcomella mcomella deleted the 8803-telemetry-dex-ticks branch April 17, 2020 16:12
@mcomella mcomella added the Feature:Performance label (used for data reviews to label metrics related to performance) Jun 21, 2021