Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rolling PR for telemetry #1112

Merged
merged 13 commits into from May 31, 2019
Merged

Rolling PR for telemetry #1112

merged 13 commits into from May 31, 2019

Conversation

@linacambridge
Copy link
Contributor

@linacambridge linacambridge commented May 7, 2019

Depends on mozilla/dogear#41.

Pull Request checklist

  • Quality: This PR builds and tests run cleanly
    • cargo test --all produces no test failures
    • cargo clippy --all --all-targets --all-features runs without emitting any warnings
    • cargo fmt does not produce any changes to the code
    • ./gradlew ktlint detekt runs without emitting any warnings
    • swiftformat --swiftversion 4 megazords components/*/ios && swiftlint runs without emitting any warnings or producing changes
    • Note: For changes that need extra cross-platform testing, consider adding [ci full] to the PR title.
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Changelog: This PR includes a changelog entry or an explanation of why it does not need one
    • Any breaking changes to Swift or Kotlin binding APIs are noted explicitly
  • Dependencies: This PR follows our dependency management guidelines
    • Any new dependencies are accompanied by a summary of the due dilligence applied in selecting them.
@linacambridge linacambridge requested review from mhammond and thomcc May 7, 2019
Copy link
Member

@mhammond mhammond left a comment

This looks great and goes a lot further than I expected :) I've a few small queries, but wouldn't mind looking at it again and at the glean integration, but nice work!

@@ -1,4 +1,4 @@
libraryVersion: 0.27.1
libraryVersion: 0.27.1-lina5

This comment has been minimized.

@mhammond

mhammond May 8, 2019
Member

I assume this should be reverted ;)

components/places/src/bookmark_sync/mod.rs Outdated Show resolved Hide resolved
components/places/sql/create_shared_schema.sql Outdated Show resolved Hide resolved
components/places/src/api/places_api.rs Outdated Show resolved Hide resolved
components/places/src/bookmark_sync/store.rs Outdated Show resolved Hide resolved
let mut reuploads = Vec::new();
let mut stmt = self.db.prepare(&format!(
"WITH
ranks(rank, at, guid, action) AS (

This comment has been minimized.

@mhammond

mhammond May 8, 2019
Member

/me googles sqlite rank.
/me goes "Nice!"

IIUC, this will rely on a single timestamp for all entries written here, which I believe you do correctly in store.rs, but I wonder if a comment there indicating how important the single timestamp is is worthwhile?

This comment has been minimized.

@linacambridge

linacambridge May 10, 2019
Author Contributor

It definitely is. I'd also like to test this out for large logs—what happens if we have 10k events? 100k?

components/places/src/bookmark_sync/store.rs Outdated Show resolved Hide resolved
@@ -0,0 +1,248 @@
/* This Source Code Form is subject to the terms of the Mozilla Public

This comment has been minimized.

@mhammond

mhammond May 8, 2019
Member

how do you picture the glean integration working? I guess I was kinda assuming that the glean integration part might just do the string -> json -> glean in the same module and avoid exposing this to the app layer, but that's probably because I don't understand how that integration will shape up.

This comment has been minimized.

@linacambridge

linacambridge May 11, 2019
Author Contributor

IIUC, the integration needs to happen at the app layer, that's part of the problem. 😕 Fenix defines the schema for the pings (YAML files that are code-generated into Kotlin bindings), so the ping needs to get passed through Android Components. a-c can't directly translate this into Glean, because it provides Glean as a "service", not a "feature", and it doesn't define the actual metric names.

Edit: I don't think that last part is true, Glean defines its own metrics.yaml and pings.yaml files. Maybe we can push the engine ping definitions into a-c?

We could have Android Components pass JSON strings around, and have Fenix deserialize them, but I don't think they'd be too keen on that. Their wrappers around our APIs (Connection::syncHistory and Connection::syncBookmarks) would need to return strings, we'd need to pass those opaque strings up through the different layers to Fenix, and Fenix would need to know that it's a JSON telemetry ping that it should unpack and send.

So the way this works is:

  • We deserialize the ping from Rust into a mozilla.appservices.support.SyncTelemetryPing.
  • Android Components passes that up through all its layers to a new onStoreSynced() observer, which takes the ping as the argument. (There's some additional serialization we have to do here, since it also runs the sync in a worker. See mozilla-mobile/android-components#2971).
  • Fenix unpacks the SyncTelemetryPing, and calls the Glean submission methods like this.

@grigoryk, does that make sense to you? Are there ways we can simplify?

This comment has been minimized.

@linacambridge

linacambridge May 11, 2019
Author Contributor

(Also, I'm planning to add a comment to this file explaining how this all works, and why! 😄)

Copy link
Contributor

@thomcc thomcc left a comment

Did a quick skim (I'm assuming this is still not ready for full review?), only real comment is about us needing to free the strings returned from syncing on iOS to avoid leaking a decently large JSON string on each sync.

components/sync15/src/telemetry.rs Outdated Show resolved Hide resolved
components/places/ios/Places/RustPlacesAPI.h Outdated Show resolved Hide resolved
@linacambridge linacambridge force-pushed the telemetry branch from 7e5e832 to ecc9df6 May 29, 2019
.buildconfig-android.yml Outdated Show resolved Hide resolved
components/places/Cargo.toml Outdated Show resolved Hide resolved
/// Fetches a list of items that have been reuploaded with new structure for
/// the last 5 syncs. This is captured in logs and telemetry, and suggests
/// that we might be in a sync loop with another client (bug 1530145).
fn fetch_consecutive_reuploads(&self) -> Result<Vec<ConsecutiveReupload>> {

This comment has been minimized.

@linacambridge

linacambridge May 30, 2019
Author Contributor

I'm going to punt on this for later. There's probably a better way to collect this data, anyway—for example, tracking the number of bookmarks with BookmarkValidityState::Reupload and BookmarkValidityState::Replace in validation telemetry, along with the other problems.

By themselves, consecutive uploads for the same record aren't bad—they're a symptom of other problems that we can already track—and validation and structure change data seems sufficient for now. Also, this query is pretty big, complicated, scans the entire moz_bookmarks table and the WITH views for each sync, and doesn't capture everything...an invalid remote item that's unchanged locally will have a Local merge state, not RemoteWithNewStructure.

@linacambridge
Copy link
Contributor Author

@linacambridge linacambridge commented May 30, 2019

@mhammond, @thomcc OK, I think this is ready! I got rid of consecutive reupload tracking, and just report validation for bookmarks—and, of course, the regular engine data.

Copy link
Member

@mhammond mhammond left a comment

Awesome Lina, great work!

@@ -116,6 +119,18 @@ internal inline fun <U> rustCall(syncOn: Any, callback: (RustError.ByReference)
}
}

@Suppress("TooGenericExceptionThrown")

This comment has been minimized.

@mhammond

mhammond May 31, 2019
Member

It's unfortunate this is duplicated from push - do you see any scope to share this in a followup?

This comment has been minimized.

@thomcc

thomcc May 31, 2019
Contributor

Can't prior to #1103, since places_destroy_string != push_destroy_string. After that we can find a place for shared dtors and such, potentially.

@@ -254,6 +254,8 @@ pub enum SyncFailure {

pub fn sync_failure_from_error(e: &Error) -> SyncFailure {
SyncFailure::Unexpected {
// TODO: Distinguish between error types, truncate, and anonymize

This comment has been minimized.

@mhammond

mhammond May 31, 2019
Member

We should have a followup to do this - hopefully the SyncStatus/SyncResult stuff we landed recently makes that a little easier.

@mhammond
Copy link
Member

@mhammond mhammond commented May 31, 2019

(although the taskcluster failure looks like a real issue)

This commit reverts the protobuf schema, in favor of passing the
serialized ping directly. On iOS, we can pass this ping directly
through to the existing ping sender. On Android, we'll need to
unpack the ping into a Kotlin structure, and pass that to Fenix
via Android Components. Fenix can then assemble a Glean ping from
the Kotlin `SyncTelemetryPing`.
This commit reports Dogear's problem counts, along with potential sync
loops caused by reuploading new structure for the same record in five
consecutive syncs, in validation results.
* Move `SyncTelemetryPing.kt` into a new `sync15` library. This library
  will be consumed directly by a-c, and exposed to Fenix.
* Free the ping JSON string in Swift.
* Pass `Engine`, not `EngineIncoming` and `Validation`, to
  `Store::apply_incoming`.
* Implement `SyncTelemetryPing::toJSON()` for serializing pings to JSON on
  the Kotlin side. This will be used to pass the ping payload in an
  `androidx.work.Data` wrapper in a-c.
@linacambridge linacambridge force-pushed the telemetry branch from 683ef0b to 861328c May 31, 2019
@linacambridge linacambridge force-pushed the telemetry branch from 861328c to f6ca45a May 31, 2019
@linacambridge linacambridge merged commit f6ca45a into master May 31, 2019
11 of 12 checks passed
11 of 12 checks passed
ci/circleci: Sync integration tests Your tests failed on CircleCI
Details
Taskcluster (pull_request) TaskGroup: success
Details
ci/circleci: Check Rust dependencies Your tests passed on CircleCI!
Details
ci/circleci: Check Rust formatting Your tests passed on CircleCI!
Details
ci/circleci: Check Swift formatting Your tests passed on CircleCI!
Details
ci/circleci: Deploy website Your tests passed on CircleCI!
Details
ci/circleci: Lint Bash scripts Your tests passed on CircleCI!
Details
ci/circleci: Lint Rust with clippy Your tests passed on CircleCI!
Details
ci/circleci: Rust benchmarks Your tests passed on CircleCI!
Details
ci/circleci: Rust tests - beta Your tests passed on CircleCI!
Details
ci/circleci: Rust tests - stable Your tests passed on CircleCI!
Details
ci/circleci: iOS build and test Your tests passed on CircleCI!
Details
@linacambridge linacambridge deleted the telemetry branch May 31, 2019
linacambridge added a commit that referenced this pull request Jun 6, 2019
Follow-up to #1112.
linacambridge added a commit that referenced this pull request Jun 6, 2019
linacambridge added a commit that referenced this pull request Jun 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants