Generate the activation ping and send it with Glean #1707
4294bab to 1c33297
Does the hashing algorithm require more CPU or disk I/O time? The IO dispatcher is more appropriate for disk. The CommonPool dispatcher, which is the default, can be undersized on older devices with fewer cores and may deadlock. I don't know if they fixed this yet.
I suspect this is CPU-bound, there should be no I/O involved in this (other than the
`launch`, if you never `join()` on the job it returns, is fire-and-forget. It ignores exceptions and doesn't return a value. This could be what you intend, but it's important to be aware of it.
Good point, yes, I think this matches the behaviour I want, which is the following:
I really intend a "fire and forget off-the-main thread" behaviour, while attempting to catch weird behaviours.
This is failing due to lint style checks:
Yeah, sorry about that, this is still a draft as I'm waiting for some additional implementation details to come first. I'll complete it tomorrow/next week.
Oops, this is a PR so I'm moving this comment to the Issue:
1c33297 to 5c13a65
LGTM.
My only concern (which you also sort of hinted at), is that we don't have a good way of checking if the ping was actually sent. Would it be worth adding such a thing to the Glean API?
app/metrics.yaml
Outdated
@@ -539,3 +539,36 @@ custom_tab: | |||
notification_emails: | |||
- fenix-core@mozilla.com | |||
expires: "2020-03-01" | |||
|
|||
activation.ping: |
Is the `.ping` strictly necessary? Feels a bit redundant, but it might just be me...
You're right. Removing that.
description: > | ||
An hashed and salted version of the Google Advertising ID from the device. | ||
send_in_pings: | ||
- activation |
über-nit: Inconsistent indentation between this and the metric below.
|
||
Logger.info("ActivationPing - generating ping (has `identifier`: ${hashedId != null})") | ||
// TODO: change-me | ||
Glean.sendPings(listOf("activation")) //, sendClientId=false) |
You know this already, but this will change once mozilla-mobile/android-components#2792 lands
@mdboom "On this side of the barricade", we're the application: we shouldn't really care about the transport itself, we should trust Glean to do the right thing for us. We should also be explicit, especially if we write the docs, about the expectations :) Calling
Yeah, thinking about this further, I think it's fine as long as Glean can guarantee that if a ping is queued it gets sent someday (and that's kind of on Glean to ensure). My concern was that this has code to check the ping is only sent once. If that first attempt fails somehow, it will never get sent, which would be a bad outcome.
5c13a65 to 9d6343e
9d6343e to f9c124c
@liuche I'd appreciate if you could review the documentation here (the
@colintheshots are you the right person to review this fully? If not, any chance you could kindly redirect the review? This is introducing a new custom ping for Fenix, which should be generated only once on new installs and never ever again. See the included docs.
@mdboom could you please review my usage of the Glean APIs (and the docs too, for clarity :) )?
app/metrics.yaml
Outdated
bugs: | ||
- 1538011 | ||
data_reviews: | ||
- TODO |
Note: this needs to be updated with a link to the review.
app/metrics.yaml
Outdated
bugs: | ||
- 1538011 | ||
data_reviews: | ||
- TODO |
@liuche how can I make the data review available? Should it be copied over to the related Fenix issue or to my implementation bug on Bugzilla?
@Dexterp37 you can copy the data request here, and I can re-review now that the implementation is also here! I'll flag myself for review here since you mentioned you don't have perms.
🚢
* Checks whether or not the activation ping was already | ||
* triggered by the application. | ||
* | ||
* Note that this only tells us that Fenix did trigger the |
* Note that this only tells us that Fenix did trigger the | |
* Note that this only tells us that Fenix triggered the |
|
||
Logger.info("ActivationPing - generating ping (has `identifier`: ${hashedId != null})") | ||
Pings.activation.send() | ||
markAsTriggered() |
Maybe move this to the top of the launch block to avoid the (unlikely) race condition where calling `checkAndSend` twice quickly could send two pings?
I'd prefer leaving this as it is: if, for some weird reason, we crash/get killed after that line gets executed, we'd never re-attempt to send the ping. We can easily detect dupes via the hash, on the other hand. So two pings shouldn't be a big deal in this specific case.
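The trade-off in this reply can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `TriggerStore` and `ActivationPingSketch` are hypothetical names, the `send` lambda stands in for `Pings.activation.send()`, and the in-memory flag stands in for whatever persistent storage `markAsTriggered()` writes to.

```kotlin
// Hypothetical stand-in for the persisted "already triggered" flag.
class TriggerStore {
    @Volatile
    var triggered: Boolean = false
}

// Hypothetical sketch of the send-then-mark ordering discussed above.
class ActivationPingSketch(
    private val store: TriggerStore,
    private val send: () -> Unit // stand-in for Pings.activation.send()
) {
    fun checkAndSend() {
        // Skip if a previous run already got past the send step.
        if (store.triggered) return

        send()

        // Mark only after send() has returned. If the process is killed or
        // send() throws before this line, the flag stays unset and the next
        // startup attempts the ping again.
        store.triggered = true
    }
}
```

Calling `checkAndSend()` twice in a row on the same instance invokes `send` once, and a call whose `send` throws leaves the flag unset so a later call retries. Two overlapping calls could still both reach `send`, which is the duplicate the reply accepts because it can be detected via the hash.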
// Apply hashing. | ||
try { | ||
val saltedID = unhashedID + salt | ||
val digest = MessageDigest.getInstance("SHA-256") |
Is there any chance we could use something other than this? Is there anything else available for us to use (we really want `bcrypt` here) in the Fenix repo or Android API?
As promised in chat, here's a quick review of the documentation only.
(I did not look at any code).
docs/activation.md
Outdated
This ping is intended to provide a measure of the activation of mobile products. | ||
|
||
## Scheduling | ||
The `activation` ping is automatically sent at startup, after Glean is initialized, and contains |
The `activation` ping is automatically sent at startup, after Glean is initialized, and contains | |
The `activation` ping is automatically sent at first startup, after Glean is initialized, and contains |
We should make it clear that this ping is sent only once, during the very first startup (or during subsequent startups if it hasn't been sent yet?).
docs/activation.md
Outdated
# The `activation` ping | ||
|
||
## Description | ||
This ping is intended to provide a measure of the activation of mobile products. |
This ping is intended to provide a measure of the activation of mobile products. | |
This ping provides a measure of the activation of mobile products. |
"intended to" sounds so passive. It is sent and it is as reliable as other pings.
docs/activation.md
Outdated
|
||
## Scheduling | ||
The `activation` ping is automatically sent at startup, after Glean is initialized, and contains | ||
the following fields: |
Don't mention this in the scheduling section? It is its own section anyway.
docs/activation.md
Outdated
| `identifier` | String | An hashed and salted version of the Google Advertising ID from the device. | | ||
| `activation_id` | UUID | An alternate identifier, not correlated with the client_id, generated once and only sent with the activation ping. | | ||
|
||
The `activation` ping also includes the common [ping sections]https://github.com/mozilla-mobile/android-components/blob/master/components/service/glean/docs/pings/pings.md#ping-sections) |
The `activation` ping also includes the common [ping sections]https://github.com/mozilla-mobile/android-components/blob/master/components/service/glean/docs/pings/pings.md#ping-sections) | |
The `activation` ping also includes the common [ping sections](https://github.com/mozilla-mobile/android-components/blob/master/components/service/glean/docs/pings/pings.md#ping-sections) |
docs/telemetry.md
Outdated
@@ -14,6 +14,10 @@ Fenix creates and tries to send a "baseline" ping. It is defined inside the [`me | |||
|
|||
Fenix sends event pings that allows us to measure feature performance. These are defined inside the [`metrics.yaml`](https://github.com/mozilla-mobile/fenix/blob/master/app/metrics.yaml) file. | |||
|
|||
## Activation | |||
|
|||
Fenix sends an activation ping once, at startup. The ping is documented [`here`](activation.md) file. |
Fenix sends an activation ping once, at startup. The ping is documented [`here`](activation.md) file. | |
Fenix sends an activation ping once, at startup. Further documentation can be found in [Activation Ping](activation.md). |
Documentation looks good to me with one nit.
Can you post the data request here and I'll fill it out again here so it's public and associated with the code.
type: string | ||
lifetime: ping | ||
description: > | ||
An hashed and salted version of the Google Advertising ID from the device. |
Please also include here that this is never sent with the client id. Also document the case where this is the client id.
I've addressed the first request. We are no longer sending the `client_id`, not even when the GAID is not available. In that case, we're generating a random UUID (the `activation_id`), which will be stable throughout the life of the application, and exclusively sent with this ping.
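The resulting payload shape can be sketched roughly as below. This is an illustration only: `ActivationPayload` and `buildActivationPayload` are hypothetical names, and the `storedActivationId` parameter is an assumed stand-in for however the generated UUID is persisted. The point is that the payload has no `client_id` field at all, the hashed identifier is optional, and the `activation_id` is reused once it exists.

```kotlin
import java.util.UUID

// Hypothetical model of what the activation ping carries. There is
// deliberately no client_id field.
data class ActivationPayload(
    val identifier: String?, // hashed and salted GAID, null when unavailable
    val activationId: UUID   // random id, only ever sent with this ping
)

// Hypothetical helper: reuse the stored activation_id if one exists so the
// value stays stable, otherwise generate a fresh random UUID.
fun buildActivationPayload(
    hashedGaid: String?,
    storedActivationId: UUID?
): ActivationPayload {
    val activationId = storedActivationId ?: UUID.randomUUID()
    return ActivationPayload(identifier = hashedGaid, activationId = activationId)
}
```

When the GAID is unavailable, `buildActivationPayload(null, stored)` still yields a usable payload, identified only by the `activation_id`.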
data-review? @liuche

1) What questions will you answer with this data?
We plan to count the exact number of activations, per distribution, per manufacturer.

2) Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements? Some example responses:
The BD team needs an exact method for counting these activations across manufacturers. The activation data will also help us evaluate the difference in activations from the other data sources which are currently being used for activation counts (e.g. adjust)

3) What alternative methods did you consider to answer these questions? Why were they not sufficient?
We tried many alternatives. For example, client-ids gave more clients than device shipments (see bug 1481215). There can be a variety of reasons for this, outlined in the bug. We don't include distribution information in other data sources, like adjust, so we couldn't count using those.

4) Can current instrumentation answer these questions?

5) List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories on the Mozilla wiki.
Glean pings also include metadata which is being sent with this ping as well, with the exclusion of the
Additional documentation for this ping is provided as part of this PR.

6) How long will this data be collected? Choose one of the following:

7) What populations will you measure?

8) If this data collection is default on, what is the opt-out mechanism for users?

9) Please provide a general description of how you will analyze this data.

10) Where do you intend to share the results of your analysis?
e12ce0b to 44558d1
Data Review Form (to be filled by Data Stewards)
- Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
  Documentation in metrics.yaml, and an additional activation.md file describing what is being collected in this activation ping, and under what circumstances.
- Is there a control mechanism that allows the user to turn the data collection on and off?
  No, this ping is sent at activation. This is separate from any client_id, and is used to measure activations only for partnership distribution versions in order to count activations through a distribution, which is necessary for BD.
  If the answer to either of the first two questions is no, reviewers give an r-. Incremental changes to measurements or systems that have previously gone through analysis review may not require additional review.
  This is only for partner distributions, and we've gone through an additional approval process for this specific use case - this ping is totally isolated from client ID and is necessary for counting Fenix activations in distributions.
- If the request is for permanent data collection, is there someone who will monitor the data over time?
  6 months, arana will monitor this from the BD side.
- Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
  Type 4 - Google Ad ID
- Is the data collection request for default-on or default-off?
  Default on
- Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?
  Yes, the hashed + salted GAID, and a randomly generated id sent only once for disambiguation
- Is the data collection covered by the existing Firefox privacy notice? If unsure: escalate to legal if:
  - The data includes new identifiers; OR
  - The data falls within the Web activity category AND is default-on.
  Yes, we have discussed this with Marshall and this data is approved to be collected in this specific use case of partner distributions of Fenix.
- Does there need to be a check-in in the future to determine whether to renew the data? (Yes/No) (If yes, set a todo reminder or file a bug if appropriate)
  Yes, this probe will expire
- Does the data collection use a third-party collection tool? If yes, escalate to legal.
  No
I really like how the pings.yaml has turned out for documentation purposes.
getAdvertisingID()?.let { unhashedID -> | ||
// Add some salt to the ID, before hashing. For this specific use-case, it's ok | ||
// to use the same salt value for all the hashes. We want hashes to be stable | ||
// within a single product, but we don't want hashes to be the same across different |
FYI: The packageName is not stable between Fenix Beta and Fenix Nightly, etc. They're treated as different products.
Good point. @fbertsch, is this ok? Or should we use a Fenix-specific salt (regardless of the channel), e.g. `org.mozilla.fenix`?
We probably want the same salt across all the Fenix products, to reduce double counting.
Thanks, fixed this by providing a static salt.
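For readers following the salt discussion, here is a minimal sketch of salting with a static value before SHA-256 hashing, mirroring the `unhashedID + salt` and `MessageDigest.getInstance("SHA-256")` lines in the diff. The salt literal, the function name `hashAdvertisingId`, and the lowercase hex encoding of the digest are all assumptions made for illustration; the thread does not show the actual salt or output encoding.

```kotlin
import java.security.MessageDigest

// Hypothetical static salt, identical for every Fenix channel. The real
// value used by the PR is not shown in this thread.
const val ACTIVATION_SALT = "org.mozilla.fenix-salt"

// Appends the salt to the raw ID, hashes with SHA-256, and returns the
// digest as lowercase hex (the hex encoding is an assumption).
fun hashAdvertisingId(unhashedId: String, salt: String = ACTIVATION_SALT): String {
    val saltedId = unhashedId + salt
    val digest = MessageDigest.getInstance("SHA-256")
    val bytes = digest.digest(saltedId.toByteArray(Charsets.UTF_8))
    return bytes.joinToString("") { "%02x".format(it) }
}
```

Because the salt does not depend on the package name, the same advertising ID hashes to the same value on Fenix Beta and Fenix Nightly, which is what allows double counting to be reduced. A different product using a different salt would produce an unrelated hash for the same ID.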
// Generate the activation_id. | ||
Activation.activationId.generateAndSet() | ||
|
||
CoroutineScope(Dispatchers.Default).launch { |
Note that calling launch without joining on its job will fire-and-forget. Exceptions will be swallowed and there's no guarantee it ever completes. This might be your intent, but you should be aware.
Thanks, yes, this is intended behaviour: we should be fine to swallow exceptions here and re-attempt, if something went wrong, next time we start.
@@ -195,9 +198,9 @@ class GleanMetricsService(private val context: Context) : MetricsService { | |||
code.set(defaultEngine.identifier) | |||
name.set(defaultEngine.name) | |||
submissionUrl.set(defaultEngine.buildSearchUrl("")) | |||
|
|||
Glean.setUploadEnabled(true) |
@hawkinsw I noticed that `Glean.setUploadEnabled` was moved here instead of being called before `Glean.initialize` in #2088. I'm moving it back in this PR, since this is the right usage of the API. Was there a specific reason to move it here? Maybe some bug on the Glean end we need to fix?
Nope! Thanks for chatting about this offline and thank you for fixing it!
This fetches the Google Advertising ID, salts it and then applies hashing before sending a ping with it, at startup. Hashing and salting are used to prevent us from correlating advertising IDs from the same user running different products we own on a single device. We will never send the client_id and the Google Advertising ID in the same ping.
This fetches the Google Advertising ID, salts it and then applies hashing before sending a ping with it,
at startup. Hashing and salting are used to prevent us from correlating advertising IDs
from the same user running different products we own on a single device. We will never send the
client_id and the Google Advertising ID in the same ping: we only send the client_id as a backup in case
the Google Advertising ID is not available.
Pull Request checklist