Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rolling PR for telemetry #1112

Merged
merged 13 commits into from May 31, 2019

Conversation

Projects
None yet
3 participants
@linacambridge
Copy link
Member

commented May 7, 2019

Depends on mozilla/dogear#41.

Pull Request checklist

  • Quality: This PR builds and tests run cleanly
    • cargo test --all produces no test failures
    • cargo clippy --all --all-targets --all-features runs without emitting any warnings
    • cargo fmt does not produce any changes to the code
    • ./gradlew ktlint detekt runs without emitting any warnings
    • swiftformat --swiftversion 4 megazords components/*/ios && swiftlint runs without emitting any warnings or producing changes
    • Note: For changes that need extra cross-platform testing, consider adding [ci full] to the PR title.
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Changelog: This PR includes a changelog entry or an explanation of why it does not need one
    • Any breaking changes to Swift or Kotlin binding APIs are noted explicitly
  • Dependencies: This PR follows our dependency management guidelines
    • Any new dependencies are accompanied by a summary of the due dilligence applied in selecting them.

@linacambridge linacambridge requested review from mhammond and thomcc May 7, 2019

@mhammond
Copy link
Member

left a comment

This looks great and goes a lot further than I expected :) I've a few small queries, but wouldn't mind looking at it again and at the glean integration, but nice work!

@@ -1,4 +1,4 @@
libraryVersion: 0.27.1
libraryVersion: 0.27.1-lina5

This comment has been minimized.

Copy link
@mhammond

mhammond May 8, 2019

Member

I assume this should be reverted ;)

Show resolved Hide resolved components/places/src/bookmark_sync/mod.rs Outdated
Show resolved Hide resolved components/places/sql/create_shared_schema.sql Outdated
Show resolved Hide resolved components/places/src/api/places_api.rs Outdated
Show resolved Hide resolved components/places/src/bookmark_sync/store.rs Outdated
let mut reuploads = Vec::new();
let mut stmt = self.db.prepare(&format!(
"WITH
ranks(rank, at, guid, action) AS (

This comment has been minimized.

Copy link
@mhammond

mhammond May 8, 2019

Member

/me googles sqlite rank.
/me goes "Nice!"

IIUC, this will rely on a single timestamp for all entries written here, which I believe you do correctly in store.rs, but I wonder if a comment there indicating how important the single timestamp is is worthwhile?

This comment has been minimized.

Copy link
@linacambridge

linacambridge May 10, 2019

Author Member

It definitely is. I'd also like to test this out for large logs—what happens if we have 10k events? 100k?

Show resolved Hide resolved components/places/src/bookmark_sync/store.rs Outdated
@@ -0,0 +1,248 @@
/* This Source Code Form is subject to the terms of the Mozilla Public

This comment has been minimized.

Copy link
@mhammond

mhammond May 8, 2019

Member

how do you picture the glean integration working? I guess I was kinda assuming that the glean integration part might just do the string -> json -> glean in the same module and avoid exposing this to the app layer, but that's probably because I don't understand how that integration will shape up.

This comment has been minimized.

Copy link
@linacambridge

linacambridge May 11, 2019

Author Member

IIUC, the integration needs to happen at the app layer, that's part of the problem. 😕 Fenix defines the schema for the pings (YAML files that are code-generated into Kotlin bindings), so the ping needs to get passed through Android Components. a-c can't directly translate this into Glean, because it provides Glean as a "service", not a "feature", and it doesn't define the actual metric names.

Edit: I don't think that last part is true, Glean defines its own metrics.yaml and pings.yaml files. Maybe we can push the engine ping definitions into a-c?

We could have Android Components pass JSON strings around, and have Fenix deserialize them, but I don't think they'd be too keen on that. Their wrappers around our APIs (Connection::syncHistory and Connection::syncBookmarks) would need to return strings, we'd need to pass those opaque strings up through the different layers to Fenix, and Fenix would need to know that it's a JSON telemetry ping that it should unpack and send.

So the way this works is:

  • We deserialize the ping from Rust into a mozilla.appservices.support.SyncTelemetryPing.
  • Android Components passes that up through all its layers to a new onStoreSynced() observer, which takes the ping as the argument. (There's some additional serialization we have to do here, since it also runs the sync in a worker. See mozilla-mobile/android-components#2971).
  • Fenix unpacks the SyncTelemetryPing, and calls the Glean submission methods like this.

@grigoryk, does that make sense to you? Are there ways we can simplify?

This comment has been minimized.

Copy link
@linacambridge

linacambridge May 11, 2019

Author Member

(Also, I'm planning to add a comment to this file explaining how this all works, and why! 😄)

@thomcc
Copy link
Contributor

left a comment

Did a quick skim (I'm assuming this is still not ready for full review?), only real comment is about us needing to free the strings returned from syncing on iOS to avoid leaking a decently large JSON string on each sync.

Show resolved Hide resolved ...t/android/src/main/java/mozilla/appservices/support/SyncTelemetryPing.kt Outdated
Show resolved Hide resolved components/sync15/src/telemetry.rs Outdated
Show resolved Hide resolved components/places/ios/Places/RustPlacesAPI.h Outdated

@linacambridge linacambridge force-pushed the telemetry branch from 7e5e832 to ecc9df6 May 29, 2019

Show resolved Hide resolved .buildconfig-android.yml Outdated
Show resolved Hide resolved components/places/Cargo.toml Outdated
/// Fetches a list of items that have been reuploaded with new structure for
/// the last 5 syncs. This is captured in logs and telemetry, and suggests
/// that we might be in a sync loop with another client (bug 1530145).
fn fetch_consecutive_reuploads(&self) -> Result<Vec<ConsecutiveReupload>> {

This comment has been minimized.

Copy link
@linacambridge

linacambridge May 30, 2019

Author Member

I'm going to punt on this for later. There's probably a better way to collect this data, anyway—for example, tracking the number of bookmarks with BookmarkValidityState::Reupload and BookmarkValidityState::Replace in validation telemetry, along with the other problems.

By themselves, consecutive uploads for the same record aren't bad—they're a symptom of other problems that we can already track—and validation and structure change data seems sufficient for now. Also, this query is pretty big, complicated, scans the entire moz_bookmarks table and the WITH views for each sync, and doesn't capture everything...an invalid remote item that's unchanged locally will have a Local merge state, not RemoteWithNewStructure.

@linacambridge

This comment has been minimized.

Copy link
Member Author

commented May 30, 2019

@mhammond, @thomcc OK, I think this is ready! I got rid of consecutive reupload tracking, and just report validation for bookmarks—and, of course, the regular engine data.

@mhammond
Copy link
Member

left a comment

Awesome Lina, great work!

@@ -116,6 +119,18 @@ internal inline fun <U> rustCall(syncOn: Any, callback: (RustError.ByReference)
}
}

@Suppress("TooGenericExceptionThrown")

This comment has been minimized.

Copy link
@mhammond

mhammond May 31, 2019

Member

It's unfortunate this is duplicated from push - do you see any scope to share this in a followup?

This comment has been minimized.

Copy link
@thomcc

thomcc May 31, 2019

Contributor

Can't prior to #1103, since places_destroy_string != push_destroy_string. After that we can find a place for shared dtors and such, potentially.

@@ -254,6 +254,8 @@ pub enum SyncFailure {

pub fn sync_failure_from_error(e: &Error) -> SyncFailure {
SyncFailure::Unexpected {
// TODO: Distinguish between error types, truncate, and anonymize

This comment has been minimized.

Copy link
@mhammond

mhammond May 31, 2019

Member

We should have a followup to do this - hopefully the SyncStatus/SyncResult stuff we landed recently makes that a little easier.

@mhammond

This comment has been minimized.

Copy link
Member

commented May 31, 2019

(although the taskcluster failure looks like a real issue)

linacambridge added some commits Apr 17, 2019

Use JSON, not protobufs for pings.
This commit reverts the protobuf schema, in favor of passing the
serialized ping directly. On iOS, we can pass this ping directly
through to the existing ping sender. On Android, we'll need to
unpack the ping into a Kotlin structure, and pass that to Fenix
via Android Components. Fenix can then assemble a Glean ping from
the Kotlin `SyncTelemetryPing`.
Report bookmark tree structure problems.
This commit reports Dogear's problem counts, along with potential sync
loops caused by reuploading new structure for the same record in five
consecutive syncs, in validation results.
Address review feedback.
* Move `SyncTelemetryPing.kt` into a new `sync15` library. This library
  will be consumed directly by a-c, and exposed to Fenix.
* Free the ping JSON string in Swift.
* Pass `Engine`, not `EngineIncoming` and `Validation`, to
  `Store::apply_incoming`.
* Implement `SyncTelemetryPing::toJSON()` for serializing pings to JSON on
  the Kotlin side. This will be used to pass the ping payload in an
  `androidx.work.Data` wrapper in a-c.

@linacambridge linacambridge force-pushed the telemetry branch from 683ef0b to 861328c May 31, 2019

linacambridge added some commits May 29, 2019

@linacambridge linacambridge force-pushed the telemetry branch from 861328c to f6ca45a May 31, 2019

@linacambridge linacambridge merged commit f6ca45a into master May 31, 2019

11 of 12 checks passed

ci/circleci: Sync integration tests Your tests failed on CircleCI
Details
Taskcluster (pull_request) TaskGroup: success
Details
ci/circleci: Check Rust dependencies Your tests passed on CircleCI!
Details
ci/circleci: Check Rust formatting Your tests passed on CircleCI!
Details
ci/circleci: Check Swift formatting Your tests passed on CircleCI!
Details
ci/circleci: Deploy website Your tests passed on CircleCI!
Details
ci/circleci: Lint Bash scripts Your tests passed on CircleCI!
Details
ci/circleci: Lint Rust with clippy Your tests passed on CircleCI!
Details
ci/circleci: Rust benchmarks Your tests passed on CircleCI!
Details
ci/circleci: Rust tests - beta Your tests passed on CircleCI!
Details
ci/circleci: Rust tests - stable Your tests passed on CircleCI!
Details
ci/circleci: iOS build and test Your tests passed on CircleCI!
Details

@linacambridge linacambridge deleted the telemetry branch May 31, 2019

linacambridge added a commit that referenced this pull request Jun 6, 2019

Merge #1226.
Follow-up to #1112.

linacambridge added a commit that referenced this pull request Jun 6, 2019

linacambridge added a commit that referenced this pull request Jun 6, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.