Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic colander based query validation added #102

Closed
wants to merge 8 commits into from

Conversation

crankycoder
Copy link
Contributor

This adds the minimal code to use colander to validate arguments being passed into therecommend method of the RecommendationManager class.

This should probably be refactored into a decorator that extracts the function definition using inspect.argspec to capture and cache the function specification and run the schema validator on the arguments.

This closes off #96

@crankycoder crankycoder force-pushed the features/96-schema branch 2 times, most recently from 64a7b2c to 54a8bf8 Compare July 12, 2018 19:34
@coveralls
Copy link

coveralls commented Jul 13, 2018

Coverage Status

Coverage increased (+0.01%) to 90.744% when pulling 4e8e55a on features/96-schema into e0d3970 on master.

* added TRAVIS_PYTHON_VERSION detection to run flake8 only in py3.5
  enviroment as version breakage in py2.7 is too hard to manage
* added basic colander schema validation
@crankycoder
Copy link
Contributor Author

I also started to pin the versions of all packages with explicit hashes in the requirements.txt file. We had a mix of libraries that were being installed at different version numbers between py2.7 for the EMR enviroment and py3.5 for the webheads.

@mlopatka
Copy link
Contributor

@birdsarah If you have a chance to review this amidst the pycon program that would be great. Otherwise, @Dexterp37 could you please do a fly-by? I'm out for the rest fo July.

@birdsarah
Copy link
Contributor

Reviewing

@Dexterp37
Copy link
Contributor

Leaving this to @birdsarah :)

Copy link
Contributor

@birdsarah birdsarah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR completely changes the environments that are being tested. The result is that only python 3.5 with the requirements.txt file is being tested.

If you want to do that, then clean this up completely - remove the environment_emr.yaml, and all the emr and conda related code in .travis.yaml.

Does any of the code in this repo run on / need to run on an EMR environment? If not, then there's no need to have it all hanging around, but what's in this PR is a mess - it makes it look like 2.7 and emr are running, when they're not.


tests/conftest.py is empty - should be removed.

.travis.yml Outdated

after_success:
- coveralls
- covhttp://docs.travis-ci.com/user/ci-environment/eralls
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is broken.

Makefile Outdated
@@ -9,5 +9,7 @@ test:
python setup.py develop
python setup.py test
flake8 taar tests


#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this extra comment line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, vim artifact. Removing it now.

bin/hashfreeze Outdated
@@ -0,0 +1,8 @@
#/bin/bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be #!

bin/hashfreeze Outdated
@@ -0,0 +1,8 @@
#/bin/bash
touch requirements.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is guaranteed to be in the correct directory.

for package_ver in `pip freeze |grep -v hashin`
do
echo "Processing "$package_ver
hashin $package_ver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashin is not necessarily installed on people's machines and you haven't added a README to explain how to use this - or explaining how and when to use make freeze.

taar/schema.py Outdated
import colander


class RecommendationManagerQuery(colander.MappingSchema):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful if this were named RecommendationManagerQuerySchema so that it was readily readable as a schema when used elsewhere.

taar/schema.py Outdated
"""
client_id = colander.SchemaNode(colander.String())
limit = colander.SchemaNode(colander.Int())
extra_data = colander.SchemaNode(colander.Mapping())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the following code:

taar/recommenders/recommendation_manager.py

         branch_selector = extra_data.get('branch', 'control')
         if branch_selector not in ('control', 'linear', 'ensemble'):
             return []

To me this is exactly the kind of thing that colander could be setup for validating and then removing this validation code from the python.

Also, Why not just have an optional branch argument instead of the extra_data catch all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More generally, I would expect to do validation very early - when data is entering the system. In our case that would be in the taar-api - by this point, I don't really feel like we're getting much protection value from the validation .e.g. we're logging in the passed in client id all over the place.

If we think that splitting taar-api as a seperate module / package / repo is still the right call then it seems to me like we would want a third repo with a schema in that both call from and that taar-api uses and that we can then do more stingeng validation on locale, platform etc.

Also, looking at the extra_data branch logic in the taar-api code we can see that branch always gets passed through and set as control, so I think we could be a little less redundant if this was all a bit more joined up.

taar/schema.py Outdated
Mostly useful for evoloving unittests and APIs in a stable way.
"""
client_id = colander.SchemaNode(colander.String())
limit = colander.SchemaNode(colander.Int())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to set a min of 0 and max of ??.

taar/schema.py Outdated

Mostly useful for evoloving unittests and APIs in a stable way.
"""
client_id = colander.SchemaNode(colander.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to be more constrained in what are acceptable client_id's - at least a max length?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A max length check has been added, but I'd rather not put any other kind of constraint on this field. It's driven by the AMO team and adding more constraints would end up making us tightly coupled with AMO changes.

@@ -107,7 +107,7 @@ def get(self, client_id):
ctx['clock'] = Clock()
ctx['cache'] = JSONCache(ctx)
manager = RecommendationManager(ctx.child())
recommendation_list = manager.recommend({'client_id': 'some_ignored_id'},
recommendation_list = manager.recommend('some_ignored_id',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't really tested that our new schema assertions are working, but I'm okay with that assuming that we trust our schema validation library - because I don't believe in testing configuration.

However, we should have a simple test that asserts that we are validating against our schema - so that code can't get inadvertently bypassed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworked the way the schema validator is applied against the recommend method. It's now a decorator to clean up the awkward code at the beginning of the recommend method. The argument specification is cached on first run so this shouldn't add too much over head on the function calls.

@birdsarah
Copy link
Contributor

We should also fix the .travis.yaml file so that it's not showing pass test runs when they have multiple failing lines, which is currently the case.

@crankycoder
Copy link
Contributor Author

@birdsarah I think I've addressed everything except for the removal of the python 2.7 test enviroment. That was removed because we never use Python 2.7 in production except in the EMR enviroment (which was kept as a test env).

@birdsarah
Copy link
Contributor

Can you clarify - do we want to test the emr environment?

@crankycoder
Copy link
Contributor Author

@birdsarah yes - we do want to test in the EMR enviroment. EMR tests can be seen running over here: https://travis-ci.org/mozilla/taar/jobs/404706772

@birdsarah
Copy link
Contributor

@birdsarah yes - we do want to test in the EMR enviroment. EMR tests can be seen running over here: https://travis-ci.org/mozilla/taar/jobs/404706772

I would say this isn't testing the EMR environment because:

  1. The EMR environment isn't installed https://travis-ci.org/mozilla/taar/jobs/404706772#L682 (see my comment https://github.com/mozilla/taar/pull/102/files#r202635694)
  2. Even if it were installed, this PR then pip installs a different set of packages over the top of the environment. https://travis-ci.org/mozilla/taar/jobs/404706772#L731 (see my comment https://github.com/mozilla/taar/pull/102/files#r202636103 - which I see now was terse and not making this point clearly - my apologies)

@crankycoder crankycoder mentioned this pull request Jul 25, 2018
@crankycoder crankycoder deleted the features/96-schema branch August 13, 2018 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants