
Bug 1013413 - Allow updating validator hashes without redeployment, and ... #445

Closed

@kmaglione (Contributor)

This depends on the related validator pull request which allows it to accept these checksums.

Some notes on particular changes:

When I tried to test these changes with DRF enabled, I came across a lot of brokenness. I think I wound up fixing a lot of code that only applies to three-legged auth and isn't actually used, and then fixed a few things that applied to three-legged auth and broke during my testing. I also left in some code for switching between three-legged and two-legged auth that can be removed if that's not something we'll ever be doing.

In any case, since DRF is currently switched off, if those changes significantly complicate matters and might delay the review, I'll move most of them to a separate PR.

I also made some changes in api/views.py, namely APIView.__call__ that were necessary for an earlier implementation of my code, but aren't now. I'd be happy to remove them, if they're objectionable, but I think they're an improvement, so I've left them.

I think the rest of this is pretty straightforward. The main changes are:

  1. App and checksum metadata are now passed to the validator directly as python objects, rather than being written to intermediate files.

  2. Checksum data now has a public page letting developers know what is and isn't supported.

  3. Checksum data can be fetched and updated via the API. I decided to go this route, rather than an admin page, mainly so we can update libraries routinely without requiring much manual process, but also because Marketplace has been going down the route of converting admin pages to API calls. I'm also considering adding a utility to the validator to fetch production checksums from AMO for local use.
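The first change above can be sketched as a before/after. All names here (`validate`, `metadata_path`, `metadata`) are hypothetical stand-ins for the real validator entry point, which this PR and the related amo-validator PR define:

```python
import json
import tempfile


def run_validator_via_file(validate, metadata):
    """Old pattern: serialize metadata to an intermediate JSON file and
    hand the validator a path."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(metadata, f)
    return validate(metadata_path=f.name)


def run_validator_direct(validate, metadata):
    """New pattern: pass the Python objects to the validator directly,
    with no intermediate file."""
    return validate(metadata=metadata)
```

The direct version avoids temp-file cleanup and a serialize/deserialize round trip entirely.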

@@ -40,15 +40,25 @@ def process_request(self, request):
if not hasattr(request, 'authed_from'):
request.authed_from = []

three_legged = False

Mind putting this in settings instead?

@mstriemer (Contributor)

Many test failures looking for libraries.json. I'm assuming because this isn't pointing at the right amo-validator version.

@kmaglione (Contributor, Author)

Yeah. I ran the tests against the correct validator version locally, but since we moved it to PyPI I can't update the version in this repo until that gets published.

@mstriemer (Contributor)

I'm not crazy about the API being one endpoint to set all of the data. I think it would be a lot nicer to have a FileHash model or something that has id, framework, version, filename, banned, hash columns or whatever seems useful. Using a model and accepting one row at a time should remove the complexity of the ChecksumsForm class which is doing a lot of manual type checking.

Pulling the data back out of the db in the right format might add a few seconds of processing time but we can deal with that if it is too slow.
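The row shape @mstriemer proposes could look roughly like the following. It is sketched as a plain dataclass purely for illustration; in olympia this would be a Django model with CharField/BooleanField columns, and every field name here is taken from the comment above, not from any actual implementation:

```python
from dataclasses import dataclass


@dataclass
class FileHash:
    """One known library file: hypothetical sketch of the proposed model."""
    id: int
    framework: str   # e.g. "jquery"
    version: str     # e.g. "1.11.0"
    filename: str    # e.g. "jquery-1.11.0.min.js"
    banned: bool     # whether this version is disallowed
    hash: str        # hex digest of the file contents
```

With one row per file, per-row validation falls out of the model's field types instead of manual type checking in a form.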

@mstriemer (Contributor)

Can you update requirements/prod.txt to have -e git+https://github.com/kmaglione/amo-validator@framework-stuff#egg=amo-validator instead of amo-validator==<version> for now so we can see the tests pass?

@kmaglione (Contributor, Author)

I thought about going the full DB model route, and there were a few reasons I chose not to:

  1. I want most of this process to be able to happen with as little user-interaction as possible, and without requiring a front-end. This could still be done with a more complicated API, but I think it would become significantly more complicated.

  2. The automatic library fetch/update code I wrote for the validator needs to have direct access to the current metadata. We could always fetch the Olympia-generated JSON before running it, but I'm not especially happy about the idea.

  3. I want the update process to be able to update local metadata without regard for the server, so that we can test changes, for instance, or add local hashes for special cases. Reliably syncing these changes to Olympia after they've been tested, or using separate code paths for testing, is going to get extremely complicated.

I also designed the metadata format to be fairly flexible. It currently allows several other properties, such as the list of files in an XPI that belong to the framework, to be edited, and I'd like to be able to extend it as necessary without significant updates to Olympia.
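For illustration only, a metadata entry with the properties the discussion mentions (per-version hashes, the list of files belonging to the framework, a banned flag) might be shaped like this; every key name is a guess, not the real schema:

```python
PLACEHOLDER_DIGEST = "0" * 64  # stand-in, not a real checksum

# Hypothetical shape for one library's metadata; keys are guesses based
# on the properties described in this thread, not the actual format.
checksum_metadata = {
    "jquery": {
        "versions": {
            "1.11.0": {
                "hashes": [PLACEHOLDER_DIGEST],
                "framework_files": ["jquery-1.11.0.min.js"],
                "banned": False,
            },
        },
    },
}


def is_known_hash(metadata, library, version, digest):
    """Look up a digest in the nested structure above."""
    entry = metadata.get(library, {}).get("versions", {}).get(version)
    return bool(entry) and digest in entry["hashes"]
```

A nested dict like this serializes to JSON directly, which is the portability argument made below against a full ORM model tree.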

As for the validation code, it could probably be made simpler, and it could arguably be removed. There should only ever be two or three people who even have access to the API, and the data should always be generated by code in the validator repository which can be expected to be reliable.

@kmaglione (Contributor, Author)

Updated amo-validator requirement for tests.

@kmaglione (Contributor, Author)

Fixed flake8 errors, but I disagree with ./apps/api/forms.py:24:53: E226 missing whitespace around arithmetic operator. PEP 8 suggests leaving no whitespace around the higher-precedence operators in expressions with mixed precedence.
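The disagreement is about spacing like the following (a made-up expression, not the line from forms.py):

```python
def hypotenuse_squared(a, b):
    # Mixed-precedence expression: * binds tighter than +. PEP 8 permits
    # the tighter spacing below to show grouping, but flake8's E226 check
    # flags it and wants "a * a + b * b".
    return a*a + b*b
```

Since E226 is off by default in pycodestyle precisely because of this PEP 8 allowance, projects commonly either add the spaces or drop E226 from their flake8 config.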

required=True)

def clean_checksum_json(self):
# Do basic schema validation.
Should that be a docstring?

@kmaglione (Contributor, Author)
Maybe... It's a field cleaning function, so its purpose is fairly clear. I'm not sure there's any benefit to exposing this particular detail in a docstring.

@magopian (Contributor)

I believe this PR is linked/needs mozilla/amo-validator#271 ? However, I don't think it contains code to post to the API in here, does it?

Actually, I'd like to avoid the whole API thing, and instead have a task on olympia that will:
1/ call the lib updater (taken out from the validator, and released separately?) by providing it the current library metadata from the database
2/ get the updated metadata from the lib updater, and update the database with it
This way, the validator is just a service that will take the needed input (the apps and library metadata) and return the validation. No more back and forth via local files or API.
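The two-step task proposed here can be sketched as one function over stand-in callables; none of these names correspond to real olympia or lib-updater APIs:

```python
def update_library_metadata(load_from_db, run_lib_updater, save_to_db):
    """Hedged sketch of the proposed olympia task.

    All three parameters are hypothetical callables: one that reads the
    current library metadata from the database, the separately released
    lib updater, and one that writes the result back.
    """
    current = load_from_db()            # 1/ current metadata from the DB
    updated = run_lib_updater(current)  # hand it to the lib updater
    save_to_db(updated)                 # 2/ persist the updated metadata
    return updated
```

Under this design the validator itself stays stateless: it only ever receives metadata as input and returns a validation result.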

Also, I second @mstriemer: storing the data in a full model is the way to go. We don't want to have to do type checks ourselves, it's error prone. And if we need to extend the model from time to time, we'll use migrations as we do for all the other models we already have. We do not want custom/home-made solutions when there is a good and idiomatic way to do it.

Regarding the unnecessary changes, or the changes that are not linked to this bug, please remove them or move them to other PRs.

I really like the idea of having a public page listing everything we support. This is an awesome idea. Passing the apps and library metadata straight to the validator is also excellent, I want that. Thanks!

@kmaglione (Contributor, Author)

I believe this PR is linked/needs mozilla/amo-validator#271 ? However, I don't think it contains code to post to the API in here, does it?

It needs it for the validator schema changes, yes. The code to post to the API is in a different repo for the moment.

Actually, I'd like to avoid the whole API thing, and instead have a task on olympia that will:
1/ call the lib updater (taken out from the validator, and released separately?) by providing it the current library metadata from the database
2/ get the updated metadata fromt he lib updater, and update the database with it

We need more control over these than a cronjob would allow. Many of the checksums are manually added only after review, and it's important that we be able to change what libraries/versions we support without having to deploy new code to production. This has been a huge problem going back several years, which is why we designed this to allow us this kind of flexibility.

For the moment, I think it would be OK to split the API changes into a separate PR, and rely on a management command to update the data for the initial deployment. I don't want to move the data updates out of direct control of a human, though. Once we've been using the library fetcher for a while, and we're satisfied with it, it might be a good idea to add a server-side cronjob, just as long as manual updates remain possible.

Also, I second @mstriemer: storing the data in a full model is the way to go. We don't want to have to do type checks ourselves, it's error prone. And if we need to extend the model from time to time, we'll use migrations as we do for all the other models we already have. We do not want custom/home-made solutions when there is a good and idiomatic way to do it.

The problem with converting this into models is that this is a deep and complex data structure, which is mainly used by code outside of this codebase, without access to SQL or Django's ORM. Just constructing the models is going to be a lot of work, and serializing them back to JSON, and ensuring that doing so does not cause problems, is going to be even more work, and much more error-prone.

It would also mean that the library fetcher code would always have to run on a server with direct access to the AMO DB, which is something I definitely do not want, since it makes updates a much heavier-weight process, when the entire purpose of this change is to make it lighter-weight. It also makes it much more difficult for people running local copies of the validator to maintain their own checksums DBs, since running updates would now require an initialized database and a full django ORM to make any changes and convert them back to JSON.

I don't think we gain anything significant from building out an entire ORM model tree just to avoid a couple of dozen lines of basic JSON schema validation. I'd be happy to move the validation code to the validator itself, since it is essentially the owner of the data structure.
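For a sense of scale, the "couple of dozen lines of basic JSON schema validation" could look roughly like this; the key names are hypothetical and the real ChecksumsForm checks differ:

```python
def validate_checksums(data):
    """Hand-rolled schema check over a nested metadata dict.

    Sketch only: key names ("versions", "hashes") are assumptions from
    this thread, not the schema the PR actually validates.
    """
    if not isinstance(data, dict):
        return ["top level must be an object"]
    errors = []
    for library, entry in data.items():
        versions = entry.get("versions") if isinstance(entry, dict) else None
        if not isinstance(versions, dict):
            errors.append("%s: 'versions' must be an object" % library)
            continue
        for version, info in versions.items():
            hashes = info.get("hashes") if isinstance(info, dict) else None
            if not (isinstance(hashes, list) and
                    all(isinstance(h, str) for h in hashes)):
                errors.append("%s %s: 'hashes' must be a list of strings"
                              % (library, version))
    return errors
```

This is the trade being argued: a flat function like this against a model tree plus migrations plus a JSON serialization layer.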

@kmaglione (Contributor, Author)

Here's the rough version of the client code for updating hashes via the API: https://github.com/kmaglione/amo-oauth

The last commit is all that differs from Andy's client.

@andymckay

Closing for now, since it can't be merged cleanly. If it's going to be worked on, please re-open.

@andymckay closed this Oct 23, 2015