-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Basic colander based query validation added #102
Conversation
64a7b2c
to
54a8bf8
Compare
a871ee4
to
04fd2f2
Compare
* added TRAVIS_PYTHON_VERSION detection to run flake8 only in py3.5 enviroment as version breakage in py2.7 is too hard to manage * added basic colander schema validation
04fd2f2
to
4de99b5
Compare
I also started to pin the versions of all packages with explicit hashes in the requirements.txt file. We had a mix of libraries that were being installed at different version numbers between py2.7 for the EMR enviroment and py3.5 for the webheads. |
@birdsarah If you have a chance to review this amidst the pycon program that would be great. Otherwise, @Dexterp37 could you please do a fly-by? I'm out for the rest fo July. |
Reviewing |
Leaving this to @birdsarah :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR completely changes the environments that are being tested. The result is that only python 3.5 with the requirements.txt
file is being tested.
If you want to do that, then clean this up completely - remove the environment_emr.yaml, and all the emr and conda related code in .travis.yaml.
Does any of the code in this repo run on / need to run on an EMR environment? If not, then there's no need to have it all hanging around, but what's in this PR is a mess - it makes it look like 2.7 and emr are running, when they're not.
tests/conftest.py is empty - should be removed.
.travis.yml
Outdated
|
||
after_success: | ||
- coveralls | ||
- covhttp://docs.travis-ci.com/user/ci-environment/eralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is broken.
Makefile
Outdated
@@ -9,5 +9,7 @@ test: | |||
python setup.py develop | |||
python setup.py test | |||
flake8 taar tests | |||
|
|||
|
|||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this extra comment line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, vim artifact. Removing it now.
bin/hashfreeze
Outdated
@@ -0,0 +1,8 @@ | |||
#/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be #!
bin/hashfreeze
Outdated
@@ -0,0 +1,8 @@ | |||
#/bin/bash | |||
touch requirements.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is guaranteed to be in the correct directory.
for package_ver in `pip freeze |grep -v hashin` | ||
do | ||
echo "Processing "$package_ver | ||
hashin $package_ver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hashin is not necessarily installed on people's machines and you haven't added a README to explain how to use this - or explaining how and when to use make freeze
.
taar/schema.py
Outdated
import colander | ||
|
||
|
||
class RecommendationManagerQuery(colander.MappingSchema): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be helpful if this were named RecommendationManagerQuerySchema
so that it was readily readable as a schema when used elsewhere.
taar/schema.py
Outdated
""" | ||
client_id = colander.SchemaNode(colander.String()) | ||
limit = colander.SchemaNode(colander.Int()) | ||
extra_data = colander.SchemaNode(colander.Mapping()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the following code:
taar/recommenders/recommendation_manager.py
branch_selector = extra_data.get('branch', 'control')
if branch_selector not in ('control', 'linear', 'ensemble'):
return []
To me this is exactly the kind of thing that colander could be setup for validating and then removing this validation code from the python.
Also, Why not just have an optional branch
argument instead of the extra_data catch all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More generally, I would expect to do validation very early - when data is entering the system. In our case that would be in the taar-api
- by this point, I don't really feel like we're getting much protection value from the validation .e.g. we're logging in the passed in client id all over the place.
If we think that splitting taar-api as a seperate module / package / repo is still the right call then it seems to me like we would want a third repo with a schema in that both call from and that taar-api uses and that we can then do more stingeng validation on locale, platform etc.
Also, looking at the extra_data
branch logic in the taar-api
code we can see that branch always gets passed through and set as control, so I think we could be a little less redundant if this was all a bit more joined up.
taar/schema.py
Outdated
Mostly useful for evoloving unittests and APIs in a stable way. | ||
""" | ||
client_id = colander.SchemaNode(colander.String()) | ||
limit = colander.SchemaNode(colander.Int()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to set a min of 0 and max of ??.
taar/schema.py
Outdated
|
||
Mostly useful for evoloving unittests and APIs in a stable way. | ||
""" | ||
client_id = colander.SchemaNode(colander.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to be more constrained in what are acceptable client_id's - at least a max length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A max length check has been added, but I'd rather not put any other kind of constraint on this field. It's driven by the AMO team and adding more constraints would end up making us tightly coupled with AMO changes.
@@ -107,7 +107,7 @@ def get(self, client_id): | |||
ctx['clock'] = Clock() | |||
ctx['cache'] = JSONCache(ctx) | |||
manager = RecommendationManager(ctx.child()) | |||
recommendation_list = manager.recommend({'client_id': 'some_ignored_id'}, | |||
recommendation_list = manager.recommend('some_ignored_id', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't really tested that our new schema assertions are working, but I'm okay with that assuming that we trust our schema validation library - because I don't believe in testing configuration.
However, we should have a simple test that asserts that we are validating against our schema - so that code can't get inadvertently bypassed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reworked the way the schema validator is applied against the recommend method. It's now a decorator to clean up the awkward code at the beginning of the recommend method. The argument specification is cached on first run so this shouldn't add too much over head on the function calls.
We should also fix the |
…ile squashing flake8 problems
76ec789
to
f0898e5
Compare
@birdsarah I think I've addressed everything except for the removal of the python 2.7 test enviroment. That was removed because we never use Python 2.7 in production except in the EMR enviroment (which was kept as a test env). |
Can you clarify - do we want to test the emr environment? |
@birdsarah yes - we do want to test in the EMR enviroment. EMR tests can be seen running over here: https://travis-ci.org/mozilla/taar/jobs/404706772 |
I would say this isn't testing the EMR environment because:
|
This adds the minimal code to use colander to validate arguments being passed into the
recommend
method of the RecommendationManager class.This should probably be refactored into a decorator that extracts the function definition using inspect.argspec to capture and cache the function specification and run the schema validator on the arguments.
This closes off #96