Skip to content
This repository has been archived by the owner on Feb 1, 2019. It is now read-only.

Refactor framework and library detection code. #271

Closed
wants to merge 4 commits into from

Conversation

kmaglione
Copy link
Contributor

Sorry. This is kind of large.

Thoughts for easier review:

The JetpackCFX.validate method is mostly code that was in 'jetpack.js', but with a lot of dead code removed.

The 'test_libraries.py` file is mostly code from the very strangely misnamed 'test_libraryblacklist.py' file, but git's rename tracking is not smart enough to notice.

The code in 'metadata/libfetcher.py' and 'metadata/update.py' can be skipped, if you prefer. It won't be run on servers, and we'll probably be changing it locally without submitting PRs immediately for a lot of runs.

There's a related update to Olympia coming to handle out-of-band updates of the metadata JSON and add some integration with the file viewer.

@kmaglione
Copy link
Contributor Author

Oh, and fortunately most of the "files changed" are just deletions of old cruft and metadata files.

@magopian
Copy link
Contributor

magopian commented Feb 4, 2015

This PR is not mergeable anymore, could you please rebase it?

When git isn't able to detect the renaming/moving of files, you can help it, I believe, by using git mv. Maybe this would make this PR more readable?

I'm afraid I'm not familiar enough with the codebase (yet?) to review this PR, maybe @mstriemer will have some time to give it a look?

@diox
Copy link
Member

diox commented Feb 4, 2015

Bug number ? What is the plan exactly here, if you are removing the old lists of files ? "we'll probably be changing it locally without submitting PRs immediately for a lot of runs" is scary.

@mstriemer
Copy link
Contributor

I didn't get to look at all the code but the cleanup looks a lot nicer than the old code, thanks.

I'm assuming the out-of-band changes PR in olympia will allow for libraries.json to be updated outside of git. Will this update the libraries.json file on the server or is there some other mechanism to get those changes into the validator?

How will the out-of-band changes interact with the checked-in version?

I'm just wondering what the game plan is here so I know what to expect in all of these changes.

@kmaglione
Copy link
Contributor Author

Huh. I rebased before pushing. I guess I'll try again.

I did use git mv, but unfortunately it doesn't make a difference.

@kmaglione
Copy link
Contributor Author

Related bugs: https://bugzil.la/1069570 https://bugzil.la/1013413

Yes, the out-of-band changes will be to update the libraries.json data. The file itself won't be changed. AMO will pass the data directly to the validate method, much like it currently does for the application metadata.

@magopian
Copy link
Contributor

@kmaglione could you please rebase (again :/)?

@kmaglione
Copy link
Contributor Author

Rebased and fixed some conflicts.

VALID_VERSION = re.compile(r"^\d[\d.]+$")


def filter_re(expr):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstring?

@@ -30,16 +28,16 @@ def _test_rdf(err, install):
"""Wrapper for install.rdf testing to make unit testing so much
easier."""

shouldnt_exist = ("hidden", )
shouldnt_exist = "hidden",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I'd rather ["hidden"] for perfect readability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to remove these changes for now. I'm not sure how they wound up in this PR.

@magopian
Copy link
Contributor

Thank you for your work, it is impressive.

In the future, please consider:
1/ multiple atomic commits
2/ multiple PRs if it's fixing unrelated things (eg: rewriting or improving some code to clean it is ok, completely rewriting huge blobs isn't)
3/ committing "file moves" and "code modifications" separately, to be able to only review the latter
4/ add comments and docstring as much as possible, this really helps understanding the code, reviewing it, and maintaining it in the future
5/ document the rationale, the changes... in the PR description. The discussion we had in person was very helpful for that (understanding the flow of lib updates, storing the hashes in olympia's database...)
6/ code simplicity. This is very important for maintainability. Code must be as clear, readable and straightforward as possible. Avoid complex and convoluted solutions at all costs, even if they are more concise or "more clever". In fact, avoid it in particular if they are.

Regarding the PR itself, as discussed, it would make sense to distribute the lib fetcher separately in the future, maybe even release it in a generic enough way so it may be useful to people outside of Mozilla?
Also, I don't think the code to update olympia's codebase with the hashes from the latest versions of the libs is in this PR, so if possible, i'd like:
1/ a separate bug/PR for the new lib updater code (not necessarily as a separate repo yet)
2/ a separate bug/PR to update olympia's database (using the web API if I remember correctly what you wanted to do from our discussion at Whistler?)
3/ maybe do not implement 2/ for now, as that would:
- avoid having to wait for the fix of the oauth code on olympia
- still let us run a cron job to take a local file and populate it in the DB (that would even be easier to implement)

@@ -31,7 +31,9 @@ def validate(path, format="json",
The format to return the results in. Defaults to "json". Currently, any
other format will simply return the error bundle.
`approved_applications`:
Path to the list of approved application versions
Path to the list of approved application versions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document that it can be a dict (same for the library_metadata below).

Should we remove the app_versions.json file from the repository now that it's not used anymore? That would be convenient, as we won't have to maintain it manually anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still used for now, when the validator is run standalone. Ideally, I'd like to be able to fetch it on demand from AMO, in the future, but at the moment we still need it.

@kmaglione
Copy link
Contributor Author

2/ multiple PRs if it's fixing unrelated things (eg: rewriting or improving some code to clean it is ok, completely rewriting huge blobs isn't)

I think the only unrelated things that snuck in here were in installrdf.py. I'm going to pull those changes out and put them into a separate PR.

3/ committing "file moves" and "code modifications" separately, to be able to only review the latter

I think there was only one file that actually moved here (test_libraryblacklist.py). The code from jetpack.py that I didn't delete actually got merged into frameworks.py in a way that I don't think rename tracking would have helped. I'll try to see if I can fix the former with a rebase.

6/ code simplicity. This is very important for maintainability. Code must be as clear, readable and straightforward as possible. Avoid complex and convoluted solutions at all costs, even if they are more concise or "more clever". In fact, avoid it in particular if they are.

I think you're going to point these out when you see them. I try to keep most of my code as simple as possible, so we probably have different definitions.

1/ a separate bug/PR for the new lib updater code (not necessarily as a separate repo yet)

I think I'll move them to a separate repo.

2/ a separate bug/PR to update olympia's database (using the web API if I remember correctly what you wanted to do from our discussion at Whistler?)

I have a separate repo for this, based on andymckay/amo-oauth. I need to do some cleanup, then I'll put it up somewhere for review.

3/ maybe do not implement 2/ for now, as that would:

  • avoid having to wait for the fix of the oauth code on olympia
  • still let us run a cron job to take a local file and populate it in the DB (that would even be easier to implement)

I'm not sure a cronjob be sufficient here, since we need to add manual hashes fairly often. It might be sufficient for the short term to split out the oauth changes and add a management command so someone from ops can update the hashes in the short term, though. I'd really rather not wait too long.

@kmaglione
Copy link
Contributor Author

I think this is everything.

I also removed the metadata updater and some unrelated changes that originally got sucked in, moved the metadata validation code that was into Olympia into libraries.py and test_libraries.py, and factored out some common code from a few test files.

@kmaglione
Copy link
Contributor Author

It turns out I forgot a couple of things, actually. I have an update which changes "banned": (bool|string) to "deprecated": int, and updates the README. It's marooned on my office computer at the moment, since it seems to have gotten kicked off the network. Will update later today.

@kmaglione
Copy link
Contributor Author

Looks like I forgot to push this last week. This is now mergeable again, and I think it addresses everything.

@magopian
Copy link
Contributor

A few comments aren't answered yet (eg #271 (comment) and #271 (comment)).

Regarding your last comment:
1/ you said "I think I'll move them to a separate repo.": is that done? If so, shouldn't the documentation (in the README, about the new data structure for libs/frameworks hashes) be in this new repo instead?
2/ your answer about the cronjob not being sufficient: I have a bad feeling about this, it really looks over-engineered.

I understand your need to have a simpler (not "blocked" by the filesystem/repo) way to update the hashes. In that case, the "usual" solution would be to create an admin model for that. Or a management command, yes.

This PR is really large, I can't remember what's in there, and what isn't, can you please sum it up? Is it "only" the code to use the new "lib/framework hashes data structure"? Does it still have the code related to the lib updating? Or the "posting to AMO to update its data"?

@kmaglione
Copy link
Contributor Author

1/ you said "I think I'll move them to a separate repo.": is that done?

Yes, but I haven't made it public yet.

If so, shouldn't the documentation (in the README, about the new data structure for libs/frameworks hashes) be in this new repo instead?

I think it makes sense to have it in this repo, since the validator deals with it extensively.

2/ your answer about the cronjob not being sufficient: I have a bad feeling about this, it really looks over-engineered.

I understand your need to have a simpler (not "blocked" by the filesystem/repo) way to update the hashes. In that case, the "usual" solution would be to create an admin model for that. Or a management command, yes.

That would be the usual solution, yes. The problem with that solution, though, is that all of the admin pages we currently have are kludgy and poorly-maintained. This one would need to deal with managing much more complex data, would probably have to be updated more often, would have to be updated in sync with the validator, would most likely have to include an incredible amount of logic for mapping DB models to data structures the validator deals with, would require a local tool anyway for hashing libraries and frameworks that need to be manually approved, would then require either uploading those new hashes to a running server and re-downloading the libraries database to test, or otherwise re-implementing the same logic again in a local client...

And, as for a management command, that would mean needlessly relying on someone from ops every time we need to add a hash, or change a flag.

In either case, we'd still need to implement all of the same logic. And on the server side, I think it would require much more complexity, not less.

At any rate, I'm not ruling out moving more of the logic to Olympia, but I'd prefer to go live with the current solution first, and talk about what needs to change if it doesn't appear to be working well.

This PR is really large, I can't remember what's in there, and what isn't, can you please sum it up?

The current incarnation has:

  • Code for detecting the following frameworks, and what files belong to the framework:
    • BestToolbars
    • Conduit
    • Crossrider
    • Jetpack (CFX)
    • Jetpack (JPM)
    • OpenForge
    • Kango
    • Scriptify
  • New code for detecting known libraries with the added ability to:
    • Mark libraries, frameworks, or versions as banned.
    • Add warnings for particular libraries ("Prototype.js may not be used in browser overlays.").
    • Warn when a file which should belong to a framework is unknown.

The current version of the patch I have for AMO leaves out the API code for the moment, and just has an admin page with a file upload field. I think that should be sufficient in the short term.

In the medium term, we're going to need the API fixes in my original PR anyway if we're actually going to deliver on our promise of a signing API.

@magopian
Copy link
Contributor

Ok, here's the deal. This PR has grown so huge that I can't (and don't want to) go over it again. It includes so many different changes, some unrelated, that I just can't grok it.

On the other hand, I don't want to reject it "just because I'm too lazy to go over it (again)", and I know now (after some discussion with you on irc) that at least some of this work is needed for the future of the validator (and dealing with the chrome-style add-ons API). Also, you're the only current active maintainer, and you have agreed on a plan with management, if I understand things correctly.

So unless you find someone who's ready to review it, I'll just say "merge it while I'm looking the other way".

To be honest, I'm pretty annoyed at this whole thing, and would like this to not happen again in the future. If there's anything I can do to help this not happening again, please let me know.

@muffinresearch
Copy link
Contributor

I'm thinking if it would be useful to focus on capturing these changes on the addons-validator project - we're going to need to cover the same processes, but as there's clearly improvements to the amo-validator here it would be nice to get them rather than us just port what's there.

A starting point might be to add some issues breaking down what's in this patch on the addons-validator project so we can address them as we bring across the JS lib/framework detection?

@EnTeQuAk
Copy link
Contributor

EnTeQuAk commented Jun 3, 2016

I'm going to close this for cleaning up reasons, not sure how relevant this PR still is. Feel free to reopen anytime though :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants