Bug 1013413 - Allow updating validator hashes without redeployment, and publish known libraries in readable format. #445
Conversation
Force-pushed from 5685f6e to 9312f26
@@ -40,15 +40,25 @@ def process_request(self, request):
        if not hasattr(request, 'authed_from'):
            request.authed_from = []

        three_legged = False
Mind putting this in settings instead?
Many test failures looking for libraries.json. I'm assuming that's because this isn't pointing at the right amo-validator version.
Yeah. I ran the tests against the correct validator version locally, but since we moved it to PyPI I can't update the version in this repo until that gets published.
I'm not crazy about the API being one endpoint to set all of the data. I think it would be a lot nicer to have a full DB model. Pulling the data back out of the DB in the right format might add a few seconds of processing time, but we can deal with that if it is too slow.
Can you update requirements/prod.txt to have …
I thought about going the full DB model route, and there were a few reasons I chose not to.
I also designed the metadata format to be fairly flexible. It currently allows several other properties to be edited, such as the list of files in an XPI that belong to the framework, and I'd like to be able to extend it as necessary without significant updates to Olympia.

As for the validation code, it could probably be made simpler, and it could arguably be removed. There should only ever be two or three people who even have access to the API, and the data should always be generated by code in the validator repository, which can be expected to be reliable.
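To make the trade-off concrete, a "couple of dozen lines of basic JSON schema validation" over a nested checksum structure might look something like the sketch below. The shape (libraries keyed by name, with per-version hash lists) and all field names are assumptions for illustration, not the actual metadata format from this PR:

```python
# Hypothetical sketch of lightweight, ORM-free validation of nested
# checksum metadata. The structure and key names ("versions", hash
# lists) are assumptions, not the real format used in this PR.
def validate_checksums(data):
    """Return a list of error strings; an empty list means valid."""
    errors = []
    if not isinstance(data, dict):
        return ['Top-level value must be an object.']
    for name, library in data.items():
        if not isinstance(library, dict):
            errors.append('Library %r must be an object.' % name)
            continue
        versions = library.get('versions', {})
        if not isinstance(versions, dict):
            errors.append('%r: "versions" must be an object.' % name)
            continue
        for version, hashes in versions.items():
            # Each version maps to a flat list of hash strings.
            if not (isinstance(hashes, list) and
                    all(isinstance(h, str) for h in hashes)):
                errors.append('%r %r: hashes must be a list of strings.'
                              % (name, version))
    return errors
```

The point of the sketch is that the whole check stays self-contained and portable, so tools outside the Django codebase can reuse it.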
Force-pushed from 9312f26 to 29069df
Updated amo-validator requirement for tests.
Force-pushed from 29069df to 6ae36d2
Fixed flake8 errors, but I disagree with …
Force-pushed from 6ae36d2 to bc074d7
        required=True)

    def clean_checksum_json(self):
        # Do basic schema validation.
Should that be a docstring?
Maybe... It's a field cleaning function, so its purpose is fairly clear. I'm not sure there's any benefit to exposing this particular detail in a docstring.
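For context, a `clean_<fieldname>` hook on a Django form typically parses and rejects bad input for that one field. The sketch below shows the general shape of such a hook as a plain function, raising `ValueError` instead of `django.forms.ValidationError` so it stands alone; the error messages and the object-only restriction are assumptions, not the PR's actual code:

```python
import json

# Hedged sketch of a field-cleaning hook like clean_checksum_json().
# In a real Django form this would be a method raising
# django.forms.ValidationError; here it is a plain function raising
# ValueError so the example is self-contained.
def clean_checksum_json(raw):
    """Do basic schema validation on submitted checksum JSON."""
    try:
        data = json.loads(raw)
    except ValueError as exc:
        # json.JSONDecodeError subclasses ValueError.
        raise ValueError('Invalid JSON: %s' % exc)
    if not isinstance(data, dict):
        raise ValueError('Checksum data must be a JSON object.')
    return data
```

Django calls such hooks automatically during form validation, which is why their purpose tends to be clear without a docstring.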
I believe this PR is linked to / needs mozilla/amo-validator#271? However, I don't think it contains code to post to the API in here, does it?

Actually, I'd like to avoid the whole API thing, and instead have a task on olympia that will:

Also, I second @mstriemer: storing the data in a full model is the way to go. We don't want to have to do type checks ourselves; it's error-prone. And if we need to extend the model from time to time, we'll use migrations, as we do for all the other models we already have. We do not want custom/home-made solutions when there is a good and idiomatic way to do it.

Regarding the unnecessary changes, or the changes that are not linked to this bug, please remove them or move them to other PRs.

I really like the idea of having a public page listing everything we support. This is an awesome idea. Passing the apps and library metadata straight to the validator is also excellent; I want that. Thanks!
It needs it for the validator schema changes, yes. The code to post to the API is in a different repo for the moment.
We need more control over these than a cronjob would allow. Many of the checksums are manually added only after review, and it's important that we be able to change what libraries/versions we support without having to deploy new code to production. This has been a huge problem going back several years, which is why we designed this to allow us this kind of flexibility.

For the moment, I think it would be OK to split the API changes into a separate PR, and rely on a management command to update the data for the initial deployment. I don't want to move the data updates out of direct control of a human, though.

Once we've been using the library fetcher for a while, and we're satisfied with it, it might be a good idea to add a server-side cronjob, just as long as manual updates remain possible.
The problem with converting this into models is that this is a deep and complex data structure, which is mainly used by code outside of this codebase, without access to SQL or Django's ORM. Just constructing the models is going to be a lot of work, and serializing them to JSON, and ensuring that doing so does not cause problems, is going to be even more work, and much more error-prone.

It would also mean that the library fetcher code would always have to run on a server with direct access to the AMO DB, which is something I definitely do not want, since it makes updates a much heavier-weight process, when the entire purpose of this change is to make them lighter-weight. It also makes it much more difficult for people running local copies of the validator to maintain their own checksum DBs, since running updates would now require an initialized database and a full Django ORM to make any changes and convert them back to JSON.

I don't think we gain anything significant from building out an entire ORM model tree just to avoid a couple of dozen lines of basic JSON schema validation. I'd be happy to move the validation code to the validator itself, since it is essentially the owner of the data structure.
Here's the rough version of the client code for updating hashes via the API: https://github.com/kmaglione/amo-oauth

The last commit is all that differs from Andy's client.
Closing for now, since it can't be merged cleanly. If it's going to be worked on, please re-open.
This depends on the related validator pull request which allows it to accept these checksums.
Some notes on particular changes:
When I tried to test these changes with DRF enabled, I came across a lot of brokenness. I think I wound up fixing a lot of code that only applies to three-legged auth and isn't actually used, and then fixed a few things that applied to three-legged auth and broke during my testing. I also left in some code for switching between three-legged and two-legged auth that can be removed if that's not something we'll ever be doing.
In any case, since DRF is currently switched off, if those changes significantly complicate matters and might delay the review, I'll move most of them to a separate PR.
I also made some changes in api/views.py, namely `APIView.__call__`, that were necessary for an earlier implementation of my code, but aren't now. I'd be happy to remove them if they're objectionable, but I think they're an improvement, so I've left them.

I think the rest of this is pretty straightforward. The main changes are:
App and checksum metadata are now passed to the validator directly as python objects, rather than being written to intermediate files.
Checksum data now has a public page letting developers know what is and isn't supported.
Checksum data can be fetched and updated via the API. I decided to go this route, rather than an admin page, mainly so we can update libraries routinely without requiring much manual process, but also because Marketplace has been going down the route of converting admin pages to API calls. I'm also considering adding a utility to the validator to fetch production checksums from AMO for local use.
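A fetch-for-local-use utility along those lines could be quite small. The sketch below is an assumption, not code from this PR: the `CHECKSUM_URL` placeholder stands in for whatever endpoint the API exposes, and the transport is injectable so the function works without a fixed HTTP stack:

```python
import json

# Hypothetical sketch of a "fetch production checksums for local use"
# helper. CHECKSUM_URL is a placeholder; the real endpoint is not
# specified in this discussion.
CHECKSUM_URL = 'https://example.invalid/api/checksums'

def fetch_checksums(fetch=None):
    """Fetch checksum JSON and return it as a Python dict.

    `fetch` is an injectable callable taking a URL and returning the
    response body, so the function can be exercised without network
    access; by default it uses urllib.request.urlopen.
    """
    if fetch is None:
        from urllib.request import urlopen
        fetch = lambda url: urlopen(url).read()
    data = json.loads(fetch(CHECKSUM_URL))
    if not isinstance(data, dict):
        raise ValueError('Unexpected checksum payload.')
    return data
```

Keeping the transport pluggable also makes it easy to point the same helper at a local mirror when maintaining a private checksum DB, which matches the local-use goal described above.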