Skip to content
This repository was archived by the owner on Nov 1, 2022. It is now read-only.

[glean] Check generated ping content against their json schema#1417

Merged
pocmo merged 2 commits into
mozilla-mobile:masterfrom
mdboom:check-against-schema
Nov 26, 2018
Merged

[glean] Check generated ping content against their json schema#1417
pocmo merged 2 commits into
mozilla-mobile:masterfrom
mdboom:check-against-schema

Conversation

@mdboom
Copy link
Copy Markdown
Contributor

@mdboom mdboom commented Nov 19, 2018

The purpose of this change is to be able to write unit tests that validate the generated pings (in JSON) against the JSON schema that will be validating the pings on the backend in production (and thus catch any schema violations early, prior to merging).

@Dexterp37 did some initial investigation doing this with Java/Kotlin. tl;dr: there are two candidate libraries for JSON schema available, one doesn't work with Android due to differences in the org.json API, and one is basically a dormant project that doesn't support recent versions of JSON schema.

The approach here is to use Python (which we are already using for code generation at build time) to do the schema validation. This has the added benefit that we can use exactly the same schema validation library that we use in production, so we don't have to "hope" that it has the same corner case bugs as some other JSON schema validation library.

Prior to this change, the Python environment was installed to the application build directory (samples/glean/build). In order to use it for unit testing, we need to be able to use it in the from the library, so it now installs the Python environment to the library's build directory (components/services/glean/build). The same Python environment is thus used for both purposes: doing JSON schema validation as part of the library's unit tests and build-time code generation for the application. Additionally, the same Python library/script (glean_parser) is used for both purposes.

One wrinkle is that the library needs to know how to find the Python executable within the library build directory, even when run as part of the application's build process. There doesn't seem to be a way to write a Gradle plugin that looks for things relative to the plugin's location, only to the build or project root. Therefore, using Glean's custom gradle plugin is a two step process: import the plugin using apply from: and then calling a function setupGlean and passing in the path to the library relative to the project root. That definitely should be documented, but I thought I would solicit feedback on the approach first.

Requires:

@mdboom mdboom requested a review from a team as a code owner November 19, 2018 21:49
Copy link
Copy Markdown
Contributor

@Dexterp37 Dexterp37 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I like the idea of using python to work around the problem of not having an updated java/kotlin library to do validation. However, I also think we should not mess with the scripts used by consumers of our library to do that. I left a few comments below, for a first pass.

Comment thread components/service/glean/build.gradle Outdated
apply plugin: 'kotlin-android'

// Configure the Python environments
envs {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even after discussing this with you I admit I'm still a bit confused.

We're moving the setup code from sdk_generator.gradle to the library's build.gradle. I think that doing this would prevent 3rd party applications to process YAML files: if you create an application external to android-components, include glean and trigger the build process, it would now probably fail: I'm not sure we'd be able to call the tasks from the glean's gradle.

I claim we should think the sample application as an external entity even though it lives in this repo: it should have no way to access things outside its direct reach, other than through gradle inclusion.

In short, I think it's ok for us to download the environment both in the app and when building the library, and have it in both places.


fun checkPingSchema(content: JSONObject) {
val proc = ProcessBuilder(
listOf("./build/bootstrap/Miniconda3/bin/python", "-m", "glean_parser", "check")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's a better way to do this, without hardcoding paths. Maybe adding this to the resources? @pocmo , thoughts?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With those paths being used in the gradle scripts and the tests I can't think of a shared place where both world have access. From the gradle script you could inject it into a constant; but that constant would be available to all glean code and not limited to tests.

Comment thread samples/glean/build.gradle Outdated

apply from: '../../components/service/glean/scripts/sdk_generator.gradle'

ext.setupGlean('./components/service/glean')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated above, this woldn't be possible if the sample app lived outside of this repo. Let's keep the scripts as they are, even if we end up with two conda environments. That's ok, as the library and the app are two distinct things, as if they were different repos.

@pocmo is there something we could do to cache the conda installer/packages to be nicer to them?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use the buildDir of rootProject to share the Python environment? I'm not sure whether this could cause weird side effects when building parallel though (although some parts of samples-glean needs to wait for services-glean to complete).

Having the environment twice is not the best for saving disk space; however that's how the setup with real apps will be too. So i'm not too concerned.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use the buildDir of rootProject to share the Python environment? I'm not sure whether this could cause weird side effects when building parallel though (although some parts of samples-glean needs to wait for services-glean to complete).

Mh, good point. @pocmo, is the following reasoning correct:

  • rootProject.buildDir == buildDir for most, common apps
  • rootProject.buildDir != buildDir if glean is used by a library/component in an app with multiple components

I guess that could be ok. Moreover, this saves us some trouble when we'll start collecting telemetry from libraries, preventing them from requiring an environment each when building (think of components in android-components, maybe some of them want to be instrumented and we don't really want to have an environment for each of them). Right?

Having the environment twice is not the best for saving disk space; however that's how the setup with real apps will be too. So i'm not too concerned.

Agreed! However, in general, one wouldn't really need to install the glean development environment. App developer would just need to have one environment. I think the sample within this repo is a special case, but a corner case as well :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this comment the first time around. As it stands, this PR is creating a Python environment for the library itself, and each application that uses it. This could be unnecessarily large when we start using this in libraries, as @Dexterp37 points out. But I would argue for merging this as-is with a follow-on bug to address that when we get there.

@pocmo pocmo added the 🕵️‍♀️ needs review PRs that need to be reviewed label Nov 20, 2018
@pocmo pocmo self-requested a review November 20, 2018 10:43
@pocmo pocmo self-assigned this Nov 20, 2018
@mdboom mdboom force-pushed the check-against-schema branch from ec3efde to 75dd3ac Compare November 20, 2018 18:11
@codecov
Copy link
Copy Markdown

codecov Bot commented Nov 20, 2018

Codecov Report

Merging #1417 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #1417   +/-   ##
=========================================
  Coverage     83.51%   83.51%           
  Complexity     1804     1804           
=========================================
  Files           228      228           
  Lines          7007     7007           
  Branches       1121     1121           
=========================================
  Hits           5852     5852           
  Misses          635      635           
  Partials        520      520

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 079b2ea...d2431ba. Read the comment docs.

@mdboom mdboom force-pushed the check-against-schema branch from 75dd3ac to 98cb241 Compare November 20, 2018 18:50
@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Nov 20, 2018

Thanks. You are right -- gradle clearly doesn't work how I assumed, and if using an external dependency that may get shipped as a binary and thus wouldn't have the Python environment in the build products.

I've updated this so there is a separate Python environment for the library and the application. It seems to work well at the cost of some disk space. Each conda environment takes about 305MB. Using a standard upstream Python environment is a little smaller at 220MB. Since I don't see any need for conda for this project, we might as well save the space.

@Dexterp37: Of particular note, we should probably test this on Windows if you can prior to merging. It's not expected to work until mozilla/glean_parser#15 is merged and a new release tagged, however.

}}

if (android.hasProperty('applicationVariants')) {
ext.gleanGenerateMetricsAPI(android.applicationVariants.all)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the library-local metrics.yaml for the baseline telemetry will go (in a future PR).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being pedantic about this 🤦‍♂️ But why is this change needed? (along with all the changes in this file)

For installing the environment, we just need to include the plugins and sdk_generator.gradle in our gradle file, right? If these changes are not strictly required for this PR, I'd be for dropping them: it would make it easier to reason about them in the follow-up PR related to the library metrics.yaml.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using this plugin in the context of the library, there is to android.applicationVariants (makes sense, because we aren't building an application in that context), so line 77 of the original gives an "undefined variable" exception. This turns the chunk that runs the generator into a function so it can be run only when building the application, and in a future PR, it will be called differently to handle the needs of the library. And it allows the rest of the file (the setting up of the Python environment) to be shared in both the library and application contexts.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha! Thanks for the clarification, it makes sense!

Copy link
Copy Markdown
Contributor

@Dexterp37 Dexterp37 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're getting there but, due to my pedantic nature, I have some more questions below. Sorry about that 🙉

Comment thread components/service/glean/scripts/sdk_generator.gradle Outdated
}}

if (android.hasProperty('applicationVariants')) {
ext.gleanGenerateMetricsAPI(android.applicationVariants.all)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being pedantic about this 🤦‍♂️ But why is this change needed? (along with all the changes in this file)

For installing the environment, we just need to include the plugins and sdk_generator.gradle in our gradle file, right? If these changes are not strictly required for this PR, I'd be for dropping them: it would make it easier to reason about them in the follow-up PR related to the library metrics.yaml.

@mdboom mdboom changed the title WIP: [glean] Check generated ping content against their json schema [glean] Check generated ping content against their json schema Nov 21, 2018
@mdboom mdboom force-pushed the check-against-schema branch from 6a8f813 to 75cf258 Compare November 21, 2018 16:13
@mdboom mdboom force-pushed the check-against-schema branch from f62e041 to 0a9cd23 Compare November 21, 2018 20:25
Comment thread components/service/glean/build.gradle Outdated
@pocmo pocmo removed the 🕵️‍♀️ needs review PRs that need to be reviewed label Nov 22, 2018
@mdboom mdboom force-pushed the check-against-schema branch from 0a9cd23 to 672c498 Compare November 26, 2018 15:03
@mdboom mdboom force-pushed the check-against-schema branch from 672c498 to 07b6c3a Compare November 26, 2018 15:08
@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Nov 26, 2018

This is passing CI, and I think is ready for a final look and merge. :whew:

@pocmo pocmo merged commit eb52542 into mozilla-mobile:master Nov 26, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants