Basic colander based query validation added #102

crankycoder · 2018-07-12T18:12:32Z

This adds the minimal code to use colander to validate arguments being passed into the recommend method of the RecommendationManager class.

This should probably be refactored into a decorator that extracts the function definition using inspect.getargspec, captures and caches the function specification, and runs the schema validator on the arguments.

This closes off #96

coveralls · 2018-07-13T12:53:49Z

Coverage increased (+0.01%) to 90.744% when pulling 4e8e55a on features/96-schema into e0d3970 on master.

* added TRAVIS_PYTHON_VERSION detection to run flake8 only in py3.5 environment as version breakage in py2.7 is too hard to manage
* added basic colander schema validation

crankycoder · 2018-07-13T20:11:54Z

I also started to pin the versions of all packages with explicit hashes in the requirements.txt file. We had a mix of libraries that were being installed at different version numbers between py2.7 for the EMR environment and py3.5 for the webheads.

mlopatka · 2018-07-15T08:20:56Z

@birdsarah If you have a chance to review this amidst the PyCon program, that would be great. Otherwise, @Dexterp37 could you please do a fly-by? I'm out for the rest of July.

birdsarah · 2018-07-16T09:52:41Z

Reviewing

Dexterp37 · 2018-07-16T10:04:40Z

Leaving this to @birdsarah :)

birdsarah

This PR completely changes the environments that are being tested. The result is that only python 3.5 with the requirements.txt file is being tested.

If you want to do that, then clean this up completely - remove the environment_emr.yaml, and all the emr and conda related code in .travis.yml.

Does any of the code in this repo run on / need to run on an EMR environment? If not, then there's no need to have it all hanging around, but what's in this PR is a mess - it makes it look like 2.7 and emr are running, when they're not.

tests/conftest.py is empty - should be removed.

birdsarah · 2018-07-16T10:02:26Z

.travis.yml


 after_success:
- coveralls
+- covhttp://docs.travis-ci.com/user/ci-environment/eralls


This is broken.

birdsarah · 2018-07-16T10:04:53Z

Makefile

@@ -9,5 +9,7 @@ test:
 	python setup.py develop
 	python setup.py test
 	flake8 taar tests
-
-
+	#


What is this extra comment line?

Ah, vim artifact. Removing it now.

birdsarah · 2018-07-16T10:05:06Z

bin/hashfreeze

@@ -0,0 +1,8 @@
+#/bin/bash


This should be #!

birdsarah · 2018-07-16T10:08:07Z

bin/hashfreeze

@@ -0,0 +1,8 @@
+#/bin/bash
+touch requirements.txt


I don't think this is guaranteed to be in the correct directory.

birdsarah · 2018-07-16T10:09:05Z

bin/hashfreeze

+for package_ver in `pip freeze |grep -v hashin`
+do 
+    echo "Processing "$package_ver
+    hashin $package_ver


hashin is not necessarily installed on people's machines, and you haven't added a README that explains how to use this script or how and when to use make freeze.
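The two concerns raised about bin/hashfreeze (the working directory and the missing hashin check) could be handled roughly as follows. This is only a sketch in Python rather than the bash of the original script; the helper names `find_repo_root`, `pinned_packages` and `main` are hypothetical, and it assumes the script lives in `<repo>/bin/` and that `hashin` accepts `-r` to name the requirements file.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def find_repo_root(script_path):
    """Return the directory above bin/, assuming the script lives in <repo>/bin/."""
    return Path(script_path).resolve().parent.parent


def pinned_packages(freeze_output):
    """Keep only 'name==version' lines from pip freeze output, skipping hashin itself."""
    return [
        line.strip()
        for line in freeze_output.splitlines()
        if "==" in line and not line.lower().startswith("hashin==")
    ]


def main(script_path):
    # Fail early with a readable message instead of a shell "command not found".
    if shutil.which("hashin") is None:
        sys.exit("hashin is not installed; try: pip install hashin")

    # Resolve requirements.txt relative to the script, not the caller's cwd.
    requirements = find_repo_root(script_path) / "requirements.txt"
    requirements.touch()

    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        check=True, capture_output=True, text=True,
    ).stdout
    for package in pinned_packages(frozen):
        print("Processing", package)
        subprocess.run(["hashin", package, "-r", str(requirements)], check=True)
```

A script entry point would call `main(__file__)`; the same two checks translate directly to bash if the script stays a shell script.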

birdsarah · 2018-07-16T10:35:05Z

taar/schema.py

+import colander
+
+
+class RecommendationManagerQuery(colander.MappingSchema):


I think it would be helpful if this were named RecommendationManagerQuerySchema so that it was readily readable as a schema when used elsewhere.

birdsarah · 2018-07-16T10:39:49Z

taar/schema.py

+    """
+    client_id = colander.SchemaNode(colander.String())
+    limit = colander.SchemaNode(colander.Int())
+    extra_data = colander.SchemaNode(colander.Mapping())


We have the following code:

taar/recommenders/recommendation_manager.py

    branch_selector = extra_data.get('branch', 'control')
    if branch_selector not in ('control', 'linear', 'ensemble'):
        return []

To me this is exactly the kind of thing that colander could be set up to validate, and then this validation code could be removed from the python.

Also, why not just have an optional branch argument instead of the extra_data catch-all?

More generally, I would expect to do validation very early - when data is entering the system. In our case that would be in the taar-api. By this point, I don't really feel like we're getting much protection value from the validation, e.g. we're logging the passed-in client id all over the place.

If we think that splitting taar-api out as a separate module / package / repo is still the right call, then it seems to me we would want a third repo holding a schema that both call into and that taar-api uses, so that we can then do more stringent validation on locale, platform etc.

Also, looking at the extra_data branch logic in the taar-api code we can see that branch always gets passed through and set as control, so I think we could be a little less redundant if this was all a bit more joined up.

birdsarah · 2018-07-16T11:06:33Z

taar/schema.py

+    Mostly useful for evoloving unittests and APIs in a stable way.
+    """
+    client_id = colander.SchemaNode(colander.String())
+    limit = colander.SchemaNode(colander.Int())


Do we want to set a min of 0 and a max of ??

birdsarah · 2018-07-16T11:06:50Z

taar/schema.py

+
+    Mostly useful for evoloving unittests and APIs in a stable way.
+    """
+    client_id = colander.SchemaNode(colander.String())


Do we want to be more constrained in what are acceptable client_ids - at least a max length?

A max length check has been added, but I'd rather not put any other kind of constraint on this field. It's driven by the AMO team and adding more constraints would end up making us tightly coupled with AMO changes.

birdsarah · 2018-07-16T11:09:00Z

tests/test_recommendation_manager.py

@@ -107,7 +107,7 @@ def get(self, client_id):
    ctx['clock'] = Clock()
    ctx['cache'] = JSONCache(ctx)
    manager = RecommendationManager(ctx.child())
-    recommendation_list = manager.recommend({'client_id': 'some_ignored_id'},
+    recommendation_list = manager.recommend('some_ignored_id',


Haven't really tested that our new schema assertions are working, but I'm okay with that assuming that we trust our schema validation library - because I don't believe in testing configuration.

However, we should have a simple test that asserts that we are validating against our schema - so that code can't get inadvertently bypassed.

I've reworked the way the schema validator is applied against the recommend method. It's now a decorator, which cleans up the awkward code at the beginning of the recommend method. The argument specification is cached on first run, so this shouldn't add too much overhead on the function calls.
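A decorator of the kind described here might look roughly like the sketch below. It is not the code from this PR: `validate_with` and `StubSchema` are hypothetical names, the stub stands in for a colander schema (it only mirrors the `deserialize()` entry point), and the signature is captured once at decoration time rather than on first call. On Python 2.7 the PR uses funcsigs in place of `inspect.signature`.

```python
import functools
import inspect


def validate_with(schema):
    """Decorator factory: run schema.deserialize() on the call's arguments."""
    def decorator(func):
        # Inspect the function once so each call only pays for bind().
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            # Drop 'self' so only the query arguments reach the schema.
            payload = {k: v for k, v in bound.arguments.items() if k != "self"}
            schema.deserialize(payload)  # expected to raise on invalid input
            return func(*args, **kwargs)
        return wrapper
    return decorator


class StubSchema:
    """Hypothetical stand-in with the same deserialize() entry point as a colander schema."""
    def deserialize(self, payload):
        if not isinstance(payload.get("client_id"), str):
            raise ValueError("client_id must be a string")
        if not isinstance(payload.get("limit"), int):
            raise ValueError("limit must be an integer")
        return payload


class RecommendationManager:
    @validate_with(StubSchema())
    def recommend(self, client_id, limit, extra_data=None):
        return []
```

A small test along the lines requested above would call `recommend` with the old dict-style client id and assert that it raises, which guards against the decorator being removed or bypassed by accident.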

birdsarah · 2018-07-16T11:28:18Z

We should also fix the .travis.yml file so that it's not showing passing test runs when they have multiple failing lines, which is currently the case.

crankycoder · 2018-07-17T02:45:49Z

@birdsarah I think I've addressed everything except for the removal of the python 2.7 test environment. That was removed because we never use Python 2.7 in production except in the EMR environment (which was kept as a test env).

birdsarah · 2018-07-17T10:42:18Z

Can you clarify - do we want to test the emr environment?

crankycoder · 2018-07-17T16:40:24Z

@birdsarah yes - we do want to test in the EMR environment. EMR tests can be seen running over here: https://travis-ci.org/mozilla/taar/jobs/404706772

birdsarah · 2018-07-18T16:20:56Z

> @birdsarah yes - we do want to test in the EMR environment. EMR tests can be seen running over here: https://travis-ci.org/mozilla/taar/jobs/404706772

I would say this isn't testing the EMR environment because:

1. The EMR environment isn't installed: https://travis-ci.org/mozilla/taar/jobs/404706772#L682 (see my comment https://github.com/mozilla/taar/pull/102/files#r202635694)
2. Even if it were installed, this PR then pip installs a different set of packages over the top of the environment: https://travis-ci.org/mozilla/taar/jobs/404706772#L731 (see my comment https://github.com/mozilla/taar/pull/102/files#r202636103 - which I see now was terse and did not make this point clearly - my apologies)

crankycoder force-pushed the features/96-schema branch 2 times, most recently from 64a7b2c to 54a8bf8 Compare July 12, 2018 19:34

crankycoder force-pushed the features/96-schema branch from a871ee4 to 04fd2f2 Compare July 13, 2018 19:52

* updated bin/hashfreeze to use explicit versions

4de99b5

* added TRAVIS_PYTHON_VERSION detection to run flake8 only in py3.5 environment as version breakage in py2.7 is too hard to manage
* added basic colander schema validation

crankycoder force-pushed the features/96-schema branch from 04fd2f2 to 4de99b5 Compare July 13, 2018 20:06

crankycoder requested review from birdsarah and mlopatka July 13, 2018 20:12

birdsarah suggested changes Jul 16, 2018

View reviewed changes

crankycoder added 6 commits July 16, 2018 13:26

re-enable test cases for python2.7+EMR. Accidentally disabled them while squashing flake8 problems

bbf62a4

fixed accidental breakage with coveralls

b32175c

removed vim artifact in Makefile and fixed hashbang in bin/hashfreeze

905a91d

updated README to include instructions on rebuilding dependency hashes

78eba3e

reworked the schema validator so that it's applied as a decorator

3580bb9

fixed inspect.signature to use funcsigs instead for py2.7 compatibility

f0898e5

crankycoder force-pushed the features/96-schema branch from 76ec789 to f0898e5 Compare July 17, 2018 02:36

put a max length of 200 on the client_id

4e8e55a

crankycoder mentioned this pull request Jul 25, 2018

Features/38 plugins #106

Closed

crankycoder closed this Aug 13, 2018

crankycoder deleted the features/96-schema branch August 13, 2018 14:30
