Data review: Places maintenance metrics #5122

Closed
bendk opened this issue Sep 8, 2022 · 3 comments


@bendk
Contributor

bendk commented Sep 8, 2022

  1. What questions will you answer with this data?
  • Is the places maintenance operation taking too long to run, degrading performance and/or draining users' batteries?
  • Is the places maintenance operation effectively pruning users' databases to the target size?
  1. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements? Some example responses:
  • The operation locks the database, so if it's running for too long it will impact any other operation that accesses the places DB.
  • We want to fine-tune the operation so that it effectively prunes the database without taking too much time. The main parameters we want to tune are the number of visits to delete in one pass and the frequency of the passes.
  1. What alternative methods did you consider to answer these questions? Why were they not sufficient?

The SQLite documentation describes the amount of time the operation should take in rough terms, but there are no hard limits or upper bounds. We believe that it shouldn't negatively impact users, but we can't be sure without metrics.

We couldn't think of any ways to determine if the pruning was successful other than metrics.

  1. Can current instrumentation answer these questions?

Not that we know of.

  1. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.
  1. Please provide a link to the documentation for this data collection which describes the ultimate data set in a public, complete, and accurate way.
    This collection is documented in the Glean Dictionary at https://dictionary.telemetry.mozilla.org/
  1. How long will this data be collected? Choose one of the following:

This is scoped to a time-limited experiment/project until April 1, 2023

  1. What populations will you measure?

All channels in all countries in all locales, for Android

  1. If this data collection is default on, what is the opt-out mechanism for users?

Standard Firefox telemetry controls

  1. Please provide a general description of how you will analyze this data.

We will examine a histogram of the time taken. If we see too many examples of extremely long maintenance operations, we will rework the code to run faster and/or be interruptible. We will also examine a histogram of places DB size. If there are many databases that are significantly over the target size, we will increase either the pruning frequency or the number of visits pruned per run.

  1. Where do you intend to share the results of your analysis?

The SACI and Android teams

  1. Is there a third-party tool (i.e. not Glean or Telemetry) that you are proposing to use for this data collection? If so:

No
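
As a rough illustration of the analysis described above, here is a minimal Python sketch that buckets maintenance durations into a histogram and computes the share of runs above a cutoff. The sample values, the bucket edges, and the 1000 ms cutoff are all made-up assumptions for illustration; they are not thresholds from this request, and the real analysis would use whatever distribution the telemetry tooling reports.

```python
from bisect import bisect_right


def histogram(samples, edges):
    """Count samples per bucket.

    Bucket i covers [edges[i], edges[i + 1]); the last bucket is open-ended.
    Samples below edges[0] are counted in the first bucket.
    """
    counts = [0] * len(edges)
    for sample in samples:
        index = bisect_right(edges, sample) - 1
        counts[max(index, 0)] += 1
    return counts


def fraction_over(samples, threshold):
    """Return the share of samples strictly above the threshold."""
    return sum(1 for sample in samples if sample > threshold) / len(samples)


# Hypothetical maintenance durations in milliseconds (not real data).
durations_ms = [12, 40, 95, 130, 2500, 60, 8000]
# Hypothetical bucket lower bounds in milliseconds.
edges_ms = [0, 50, 100, 1000, 5000]

counts = histogram(durations_ms, edges_ms)
slow_fraction = fraction_over(durations_ms, 1000)

for lower, count in zip(edges_ms, counts):
    print(f">= {lower} ms: {count}")
print(f"share of runs over 1000 ms: {slow_fraction:.2f}")
```

The same two helpers could be applied to database sizes, with the target size as the threshold, to estimate how many databases remain over target after pruning.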


@eliserichards
Contributor

I would say that this data is Category 1 (technical data) instead of interaction data (see https://wiki.mozilla.org/Data_Collection)

@bendk
Contributor Author

bendk commented Sep 10, 2022

I would say that this data is Category 1 (technical data) instead of interaction data

Yes that seems better, updated the description.

@eliserichards
Contributor

  1. What questions will you answer with this data?
* Is the places maintenance operation taking too long to run, degrading performance and/or draining users' batteries?

* Is the places maintenance operation effectively pruning users' databases to the target size?
  1. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements? Some example responses:
* The operation locks the database, so if it's running for too long it will impact any other operation that accesses the places DB.

* We want to fine-tune the operation so that it effectively prunes the database without taking too much time. The main parameters we want to tune are the number of visits to delete in one pass and the frequency of the passes.
  1. What alternative methods did you consider to answer these questions? Why were they not sufficient?

The SQLite documentation describes the amount of time the operation should take in rough terms, but there are no hard limits or upper bounds. We believe that it shouldn't negatively impact users, but we can't be sure without metrics.

We couldn't think of any ways to determine if the pruning was successful other than metrics.

  1. Can current instrumentation answer these questions?

Not that we know of.

  1. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.
* Measurement Description: Time spent on the `runMaintanence()` function.
  Data Collection Category: Category 1 “Technical data”
  Tracking bug: [Rework places maintenance code #5115](https://github.com/mozilla/application-services/issues/5115)

* Measurement Description: DB size after the `runMaintanence()` function.
  Data Collection Category: Category 1 “Technical data”
  Tracking bug: [Rework places maintenance code #5115](https://github.com/mozilla/application-services/issues/5115)
  1. Please provide a link to the documentation for this data collection which describes the ultimate data set in a public, complete, and accurate way.
    This collection is documented in the Glean Dictionary at https://dictionary.telemetry.mozilla.org/
  1. How long will this data be collected? Choose one of the following:

This is scoped to a time-limited experiment/project until April 1, 2023

  1. What populations will you measure?

All channels in all countries in all locales, for Android

  1. If this data collection is default on, what is the opt-out mechanism for users?

Standard Firefox telemetry controls

  1. Please provide a general description of how you will analyze this data.

We will examine a histogram of the time taken. If we see too many examples of extremely long maintenance operations, we will rework the code to run faster and/or be interruptible. We will also examine a histogram of places DB size. If there are many databases that are significantly over the target size, we will increase either the pruning frequency or the number of visits pruned per run.

  1. Where do you intend to share the results of your analysis?

The SACI and Android teams

  1. Is there a third-party tool (i.e. not Glean or Telemetry) that you are proposing to use for this data collection? If so:

No
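
For readers who want a concrete picture of the two measurements listed above (time spent in the maintenance function and DB size afterwards), here is a minimal sketch using Python's standard `sqlite3` module. It is not the application-services implementation: the `visits` table, the `run_maintenance` helper, and the `max_visits_to_delete` parameter are hypothetical stand-ins. Size is computed from `PRAGMA page_count` and `PRAGMA page_size`, which counts freed pages too unless the database is vacuumed.

```python
import sqlite3
import time


def run_maintenance(conn, max_visits_to_delete):
    """Hypothetical maintenance pass: prune the oldest visits, then report
    the elapsed time in milliseconds and the database size in bytes."""
    start = time.perf_counter()
    conn.execute(
        "DELETE FROM visits WHERE id IN "
        "(SELECT id FROM visits ORDER BY visit_date ASC LIMIT ?)",
        (max_visits_to_delete,),
    )
    conn.commit()
    elapsed_ms = (time.perf_counter() - start) * 1000.0

    # Database size = number of pages * bytes per page.
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return elapsed_ms, page_count * page_size


# Build a small in-memory database with an assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id INTEGER PRIMARY KEY, visit_date INTEGER)")
conn.executemany(
    "INSERT INTO visits (visit_date) VALUES (?)", [(i,) for i in range(1000)]
)
conn.commit()

elapsed_ms, db_size_bytes = run_maintenance(conn, max_visits_to_delete=100)
remaining = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print(f"maintenance took {elapsed_ms:.3f} ms, DB size {db_size_bytes} bytes, "
      f"{remaining} visits remain")
```

In a real client these two values would be recorded through the telemetry SDK rather than printed. The number of visits deleted per pass is one of the two tuning parameters mentioned in the request; the other, pass frequency, would be controlled by the caller.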


Data Review Form (to be filled by Data Stewards)

  1. Is there or will there be documentation that describes the schema for the ultimate data set in a public, complete, and accurate way?

  2. Is there a control mechanism that allows the user to turn the data collection on and off? (Note, for data collection not needed for security purposes, Mozilla provides such a control mechanism) Provide details as to the control mechanism available.

    • Yes, through the "Send Usage Data" preference in the application settings for Fenix.
  3. If the request is for permanent data collection, is there someone who will monitor the data over time?

    • No, the data will be collected until April 1, 2023 with the option to remove/renew.
  4. Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

    • Category 1 - technical data
  5. Is the data collection request for default-on or default-off?

    • Default-on
  6. Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?

    • No
  7. Is the data collection covered by the existing Firefox privacy notice?

    • Yes
  8. Does the data collection use a third-party collection tool? If yes, escalate to legal.

    • No

Result

data-review+
