[glean] Check generated ping content against their json schema#1417
Conversation
Dexterp37
left a comment
There was a problem hiding this comment.
Generally, I like the idea of using python to work around the problem of not having an updated java/kotlin library to do validation. However, I also think we should not mess with the scripts used by consumers of our library to do that. I left a few comments below, for a first pass.
| apply plugin: 'kotlin-android' | ||
|
|
||
| // Configure the Python environments | ||
| envs { |
There was a problem hiding this comment.
Even after discussing this with you I admit I'm still a bit confused.
We're moving the setup code from sdk_generator.gradle to the library's build.gradle. I think that doing this would prevent 3rd party applications to process YAML files: if you create an application external to android-components, include glean and trigger the build process, it would now probably fail: I'm not sure we'd be able to call the tasks from the glean's gradle.
I claim we should think the sample application as an external entity even though it lives in this repo: it should have no way to access things outside its direct reach, other than through gradle inclusion.
In short, I think it's ok for us to download the environment both in the app and when building the library, and have it in both places.
|
|
||
| fun checkPingSchema(content: JSONObject) { | ||
| val proc = ProcessBuilder( | ||
| listOf("./build/bootstrap/Miniconda3/bin/python", "-m", "glean_parser", "check") |
There was a problem hiding this comment.
I wonder if there's a better way to do this, without hardcoding paths. Maybe adding this to the resources? @pocmo , thoughts?
There was a problem hiding this comment.
With those paths being used in the gradle scripts and the tests I can't think of a shared place where both world have access. From the gradle script you could inject it into a constant; but that constant would be available to all glean code and not limited to tests.
|
|
||
| apply from: '../../components/service/glean/scripts/sdk_generator.gradle' | ||
|
|
||
| ext.setupGlean('./components/service/glean') |
There was a problem hiding this comment.
As stated above, this woldn't be possible if the sample app lived outside of this repo. Let's keep the scripts as they are, even if we end up with two conda environments. That's ok, as the library and the app are two distinct things, as if they were different repos.
@pocmo is there something we could do to cache the conda installer/packages to be nicer to them?
There was a problem hiding this comment.
Would it be possible to use the buildDir of rootProject to share the Python environment? I'm not sure whether this could cause weird side effects when building parallel though (although some parts of samples-glean needs to wait for services-glean to complete).
Having the environment twice is not the best for saving disk space; however that's how the setup with real apps will be too. So i'm not too concerned.
There was a problem hiding this comment.
Would it be possible to use the
buildDirofrootProjectto share the Python environment? I'm not sure whether this could cause weird side effects when building parallel though (although some parts of samples-glean needs to wait for services-glean to complete).
Mh, good point. @pocmo, is the following reasoning correct:
rootProject.buildDir == buildDirfor most, common appsrootProject.buildDir != buildDirif glean is used by a library/component in an app with multiple components
I guess that could be ok. Moreover, this saves us some trouble when we'll start collecting telemetry from libraries, preventing them from requiring an environment each when building (think of components in android-components, maybe some of them want to be instrumented and we don't really want to have an environment for each of them). Right?
Having the environment twice is not the best for saving disk space; however that's how the setup with real apps will be too. So i'm not too concerned.
Agreed! However, in general, one wouldn't really need to install the glean development environment. App developer would just need to have one environment. I think the sample within this repo is a special case, but a corner case as well :)
There was a problem hiding this comment.
Sorry I missed this comment the first time around. As it stands, this PR is creating a Python environment for the library itself, and each application that uses it. This could be unnecessarily large when we start using this in libraries, as @Dexterp37 points out. But I would argue for merging this as-is with a follow-on bug to address that when we get there.
ec3efde to
75dd3ac
Compare
Codecov Report
@@ Coverage Diff @@
## master #1417 +/- ##
=========================================
Coverage 83.51% 83.51%
Complexity 1804 1804
=========================================
Files 228 228
Lines 7007 7007
Branches 1121 1121
=========================================
Hits 5852 5852
Misses 635 635
Partials 520 520Continue to review full report at Codecov.
|
75dd3ac to
98cb241
Compare
|
Thanks. You are right -- gradle clearly doesn't work how I assumed, and if using an external dependency that may get shipped as a binary and thus wouldn't have the Python environment in the build products. I've updated this so there is a separate Python environment for the library and the application. It seems to work well at the cost of some disk space. Each conda environment takes about 305MB. Using a standard upstream Python environment is a little smaller at 220MB. Since I don't see any need for conda for this project, we might as well save the space. @Dexterp37: Of particular note, we should probably test this on Windows if you can prior to merging. It's not expected to work until mozilla/glean_parser#15 is merged and a new release tagged, however. |
| }} | ||
|
|
||
| if (android.hasProperty('applicationVariants')) { | ||
| ext.gleanGenerateMetricsAPI(android.applicationVariants.all) |
There was a problem hiding this comment.
This is where the library-local metrics.yaml for the baseline telemetry will go (in a future PR).
There was a problem hiding this comment.
Sorry for being pedantic about this 🤦♂️ But why is this change needed? (along with all the changes in this file)
For installing the environment, we just need to include the plugins and sdk_generator.gradle in our gradle file, right? If these changes are not strictly required for this PR, I'd be for dropping them: it would make it easier to reason about them in the follow-up PR related to the library metrics.yaml.
There was a problem hiding this comment.
When using this plugin in the context of the library, there is to android.applicationVariants (makes sense, because we aren't building an application in that context), so line 77 of the original gives an "undefined variable" exception. This turns the chunk that runs the generator into a function so it can be run only when building the application, and in a future PR, it will be called differently to handle the needs of the library. And it allows the rest of the file (the setting up of the Python environment) to be shared in both the library and application contexts.
There was a problem hiding this comment.
Ah, gotcha! Thanks for the clarification, it makes sense!
Dexterp37
left a comment
There was a problem hiding this comment.
I think we're getting there but, due to my pedantic nature, I have some more questions below. Sorry about that 🙉
| }} | ||
|
|
||
| if (android.hasProperty('applicationVariants')) { | ||
| ext.gleanGenerateMetricsAPI(android.applicationVariants.all) |
There was a problem hiding this comment.
Sorry for being pedantic about this 🤦♂️ But why is this change needed? (along with all the changes in this file)
For installing the environment, we just need to include the plugins and sdk_generator.gradle in our gradle file, right? If these changes are not strictly required for this PR, I'd be for dropping them: it would make it easier to reason about them in the follow-up PR related to the library metrics.yaml.
6a8f813 to
75cf258
Compare
f62e041 to
0a9cd23
Compare
0a9cd23 to
672c498
Compare
672c498 to
07b6c3a
Compare
|
This is passing CI, and I think is ready for a final look and merge. :whew: |
The purpose of this change is to be able to write unit tests that validate the generated pings (in JSON) against the JSON schema that will be validating the pings on the backend in production (and thus catch any schema violations early, prior to merging).
@Dexterp37 did some initial investigation doing this with Java/Kotlin. tl;dr: there are two candidate libraries for JSON schema available, one doesn't work with Android due to differences in the
org.jsonAPI, and one is basically a dormant project that doesn't support recent versions of JSON schema.The approach here is to use Python (which we are already using for code generation at build time) to do the schema validation. This has the added benefit that we can use exactly the same schema validation library that we use in production, so we don't have to "hope" that it has the same corner case bugs as some other JSON schema validation library.
Prior to this change, the Python environment was installed to the application build directory (
samples/glean/build). In order to use it for unit testing, we need to be able to use it in the from the library, so it now installs the Python environment to the library's build directory (components/services/glean/build). The same Python environment is thus used for both purposes: doing JSON schema validation as part of the library's unit tests and build-time code generation for the application. Additionally, the same Python library/script (glean_parser) is used for both purposes.One wrinkle is that the library needs to know how to find the Python executable within the library build directory, even when run as part of the application's build process. There doesn't seem to be a way to write a Gradle plugin that looks for things relative to the plugin's location, only to the build or project root. Therefore, using Glean's custom gradle plugin is a two step process: import the plugin using
apply from:and then calling a functionsetupGleanand passing in the path to the library relative to the project root. That definitely should be documented, but I thought I would solicit feedback on the approach first.Requires: