fix #8842: add jetstream/analysis schemas #8847

Merged
29 commits merged into mozilla:main from 8842-jetstream-schema
Jun 9, 2023

Conversation

Contributor

@mikewilli mikewilli commented May 17, 2023

Because

  • We want to have a better data contract between Jetstream and Experimenter

This commit

  • Creates schemas for the data products that Jetstream produces
  • Adds a general python package structure with basic schema tests

Collaborator

@jaredlockhart jaredlockhart left a comment

Whoaaaa neeeaaattt it's beautiful 🥹

Okay, so we should add some stuff like formatting/linting/tests etc. Do you want to do some/any of that in this PR, or just get this landed and hook more of that stuff up later?

Also, I guess we should set up infrastructure to publish this to PyPI and then consume it in experimenter/jetstream? That we can definitely do as follow-ups.

But this is exactly what I was hoping for it looks great 🎉 🎉 🎉

@mikewilli
Contributor Author

Yea, so some of that I was considering for future PRs (publish to pypi, consumption in jetstream/experimenter).

For formatting/linting, I guess I need to ensure that our existing stuff runs against this new dir, and that makes sense to set up here/now.

For tests, do you want to see some tests specific to the schemas themselves, or would it be better to wait until we're using the schemas and then write tests of the functionality they enable alongside that?

@jaredlockhart
Collaborator

I think a minimal test would be nice here: just an example JSON that stresses the whole schema, run through it. But since this lives at the top level, you'll probably need to set up something around it to get pytest going, maybe just a poetry project? Maybe wrap it in a Docker container to ensure there's a stable Python environment for it?
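
A minimal sketch of the kind of test suggested here, assuming the schemas are Pydantic models. The `AnalysisError` model, its fields, and the example payload are hypothetical stand-ins for illustration, not the schemas from this PR:

```python
import json
from typing import Optional

from pydantic import BaseModel


class AnalysisError(BaseModel):
    # Hypothetical fields for illustration; the real schema may differ.
    experiment: str
    message: str
    metric: Optional[str] = None
    analysis_basis: Optional[str] = None


# Example payload that sets every field, so the whole schema is exercised.
EXAMPLE_JSON = """
{
  "experiment": "test-experiment",
  "message": "something went wrong",
  "metric": "test-metric",
  "analysis_basis": "enrollments"
}
"""


def test_example_json_validates():
    # Parse with the standard library, then let the model validate each field.
    data = json.loads(EXAMPLE_JSON)
    ae = AnalysisError(**data)
    assert ae.metric == "test-metric"
    assert ae.analysis_basis == "enrollments"
```

With pytest installed in the project environment, a function named `test_*` like this would be collected automatically.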

Contributor

@scholtzan scholtzan left a comment

Some attributes are set to Optional in Jetstream, so I think they should be optional here too.

Other than that, looks great.
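
To illustrate the point about optional attributes, here is a small sketch assuming Pydantic models. `Metadata`, `name`, and `description` are hypothetical names, not taken from Jetstream. Giving the field an explicit `= None` default keeps it omittable regardless of Pydantic version:

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Metadata(BaseModel):
    # Hypothetical model for illustration only.
    name: str                          # required
    description: Optional[str] = None  # may be omitted or null


# Omitting the optional field is accepted and yields None.
m = Metadata(name="example")
assert m.description is None

# Omitting the required field is rejected with a ValidationError.
try:
    Metadata()
except ValidationError:
    missing_required_rejected = True
else:
    missing_required_rejected = False
```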

schemas/jetstream/analysis_errors.py (outdated, resolved)
schemas/jetstream/metadata.py (outdated, resolved)
schemas/jetstream/metadata.py (outdated, resolved)
Makefile (outdated)
Comment on lines 247 to 257
schemas_black_check:
	(cd schemas && poetry run black --check --diff .)

schemas_ruff_check:
	(cd schemas && poetry run ruff .)

schemas_check: schemas_black_check schemas_ruff_check
	(cd schemas && poetry run pytest)

schemas_code_format:
	(cd schemas && poetry run black . && poetry run ruff --fix .)
Collaborator

Tried this on my local and got:

E   ModuleNotFoundError: No module named 'mozilla_nimbus_schemas'
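
An error like this generally means the package cannot be found in the environment pytest runs in. As a sketch using only the standard library, one way to check is below (the `is_importable` helper is a made-up name, and the fix actually applied in this PR is not shown here):

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the top-level module can be found on sys.path."""
    return importlib.util.find_spec(module_name) is not None


# Run inside the same environment as the tests, e.g. via `poetry run python`.
print(is_importable("mozilla_nimbus_schemas"))
```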

Contributor Author

Pretty sure I fixed this in the latest commit, but test this out again if you get a chance.

assert ae.metric == "test-metric"
assert ae.analysis_basis == "enrollments"
ae_json = ae.json()
print(ae_json)
Contributor Author

Nope, good catch. Thanks!

Contributor

@b4handjr b4handjr left a comment

I think the majority of the tests here aren't really needed. Pydantic has great coverage for most of what you are testing. I think the json parts of the tests would be good to keep, as you did write a custom json_loads function.

@mikewilli
Contributor Author

> I think the majority of the tests here aren't really needed. Pydantic has great coverage for most of what you are testing. I think the json parts of the tests would be good to keep, as you did write a custom json_loads function.

Yea, makes sense to me. I wrote some of them partly to test how it worked, but you're right! I'll take a sweep and see what I can remove.

@jaredlockhart jaredlockhart self-requested a review June 9, 2023 16:19
Collaborator

@jaredlockhart jaredlockhart left a comment

Reviewed again and everything looks great 🎉

So as a follow-up, are we gonna publish this to PyPI? 😱

@mikewilli mikewilli enabled auto-merge June 9, 2023 20:03
@mikewilli mikewilli added this pull request to the merge queue Jun 9, 2023
Merged via the queue into mozilla:main with commit a2de3ae Jun 9, 2023
@mikewilli mikewilli deleted the 8842-jetstream-schema branch June 9, 2023 20:25
5 participants