Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type coercion validator #71

Closed
wants to merge 10 commits into from
Closed

Conversation

brettatoms
Copy link

By adding a coerce: <some callable> validator to a schema the return value of the callable will be used as the value to validate against in the document. This overwrite the value in the document so requires that you make a copy of the document before passing it to the validator if you don't want your original document changed.

This isn't fool proof as it only catches TypeError and ValueError in coercion validator but it is incredibly handy to have for sanitizing and validating HTTP request data.

@nicolaiarocci
Copy link
Member

This seems very similar to what is already achievable with function based custom validation.

@brettatoms
Copy link
Author

The difference is that the coercion always runs first and the value returned by the coerce function is set in the model/document for that key. That means the other validators always check against the coerced value in the model rather than the original value. It's helpful for things like http query parameters which always come in as strings and so you can sanitize and validate them in one step.

@brettatoms
Copy link
Author

I use it like this so when I get the friend_ids from the request.values its already converted to a list.

schema = {
    'friend_ids': {'type': 'table_id_list', 'coerce': utils.parse_csv}
}
values = validate(request.values.to_dict(), schema)
friend_ids = values.get('friend_ids', [])

@nicolaiarocci
Copy link
Member

I think that Cerberus should focus on validation. Transformation is a different thing, and we want to follow the "separation of concern" principle in this case.

@brettatoms
Copy link
Author

It's a pretty common feature to similar frameworks:

https://github.com/alecthomas/voluptuous#validation-functions
https://github.com/podio/valideer#adaptation
http://docs.pylonsproject.org/projects/colander/en/latest/basics.html#deserialization
https://github.com/keleshev/schema#validatables

This is extemely convenient:

schema = {"num": "2", "type": "number"}  // broken
schema = {"num": "2", "type": "number", "coerce": int}  // yay!

@nicolaiarocci
Copy link
Member

Yup there has been someone else proposing something like this in the past.Let's see if more people would like to add transformation features.

@nicolaiarocci
Copy link
Member

@aleksey-kutepov
Copy link

I think that Cerberus should focus on validation

To my mind such type coercion has the same semantics as schema validation. And I'd like to see such errors along with schema errors. So as for me I see no reasons to separate these concerns. In addition it is possible to schema = {'a_list': {'type': 'list', 'schema': {'type': 'integer'}}} which is essentially the same but more limited.

@nicolaiarocci
Copy link
Member

Alright folks. Looks like the feedback we've been waiting for has arrived, and I stand corrected 😄.

@brettatoms would you also update the documentation?

@nicolaiarocci nicolaiarocci reopened this May 7, 2015
@brettatoms
Copy link
Author

@nicolaiarocci Thanks! Should be good to go.

@aleksey-kutepov
Copy link

@nicolaiarocci @brettatoms Thanks you very much

@funkyfuture
Copy link
Member

some thoughts concerning the design:

schema = {'property': {'type': 'integer', 'coerce': int}}

seems redundant as it will be of type integer implicitly. this should be enough:

schema = {'property': {'coerce': int}}

i don't like the idea what i'd like better is something like this, so doc isn't touched necessarily:

coerced_doc = v.validate(doc)  # breaks compatibility

or

coerced_doc = v.coerce_values(doc)  # which implies v.validate()
# or even
c = cerbereus.Coercer(…)  # that instance will have an own Validator-instance as property
coerced_doc = c.validate(doc)

or

v.validate(doc)
coerced_doc = v.coerced_doc

what about defining coerce similar like custome type-checks? that would require a sub-class to use it, but the design is more stringent.

there should be some tests.

@@ -327,6 +333,13 @@ def validate_schema(self, schema):
raise SchemaError(errors.ERROR_UNKNOWN_RULE % (
constraint, field))

def _validate_coerce(self, coerce, field, value):
try:
value = coerce(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is coerce implemented? it should propably be a method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The coerce argument is the callable that you define in the schema. See https://github.com/nicolaiarocci/cerberus/pull/71/files#diff-aebc53cd4926f3a579adb7ba188de369R214

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yeah. with a definition of methods like for checking types, these functions / then methods can be designed to consider context. i guess sooner or later someone will need that.

@funkyfuture
Copy link
Member

oh, can schema-validation improve fool-proofness?

@brettatoms
Copy link
Author

  1. You can certainly omit 'type': 'integer' if your doing a type coercion. I also use the coercion to do something like parse a string as a comma separated list and then use the type validator to verify each element in the list matches a regex.

  2. I also don't like that the coercion is destructive but it was way simpler to implement it this way without breaking anything else. I tend to do make a copy of the dict before passing it in if I want to keep the original around: validate(request.values.to_dict(), schema)

  3. I tend to validate my schemas like this to get back the document and avoid breaking compatibility:

    class Validator(cerberus.Validator):
        def __call__(self, document, *args, **kwargs):
            self.validate(document, *args, **kwargs)
            return self.document
    
    
    @contextmanager
    def validator():
        v = Validator()
        try:        
            yield v
        finally:
            if len(v.errors) > 0:
                raise ValidationError(v.errors)
    
    
    with validator() as validate:
        values = validate(doc, schema)
  4. Yes there should be tests.

@funkyfuture
Copy link
Member

  1. then i propose you add such example to the documentation. this better shows the possibilities and makes sense. examples that include redundancy often lead to misunderstandings. in that case it can be interpreted that 'coerse': int somehow correlates directly to 'type': 'integer'.
  2. i think it's reasonable that the library ensures a non-destructive way. i also showed three approaches on that.
  3. with schema-validation i meant the part where Validator checks the provided schema. so it should check the callable's existence, 'callability' and signature.
  4. i like the idea to implement __call__ and propose to upstream it to Validator. already present.
  5. as pointed out in the line-comment: to define coersion-routines in the class offers the possibility to contextualize.

@nicolaiarocci
Copy link
Member

Only concern I have on this is that it transforms the input. Of course there's a note in the docs, and that's probably enough. Also, in several scenarios this might actually the desired behaviour, which leads me back to my original concern that we this update we are, in fact, transitioning from a pure validation tool to a validation and transformation utility.

I wonder if we should make transformation an explicit opt-in, like setting up a transformation propriety which is False by default. When this property is False then a coerce in the schema will raise a schema error. This property could also be allowed as a initialization argument. Thoughts?

@funkyfuture
Copy link
Member

transitioning from a pure validation tool to a validation and transformation utility.

in short: i have my doubts that just calling a function is enough to achieve that properly and to fit more complex use-cases. i pointed at my concerns before. if anybody wants, i can elaborate more on these which are unclear.

@brettatoms
Copy link
Author

@nicolaiarocci How about doing a copy.copy() on the document before setting it to to self.document in Validator._validate() so that its not destructive to the original input?

@funkyfuture
Copy link
Member

since referenced objects are potentially changed one should use copy.deepcopy.

@nicolaiarocci
Copy link
Member

@brettatoms yes, absolutely.

I initially ruled it out because I thought that people might actually want to apply transformations (we also had requests like that in the past). Also agree with @funkyfuture that deepcopy is better. Would you update the PR and docs?

@martijnvermaat
Copy link
Contributor

For reference, an older similar request was #3

@aleksey-kutepov
Copy link

make transformation an explicit opt-in

How about subclassing cerberus.Validator with something like cerberus.CoerceValidator and do coercion related stuff there. We'd get backward compatibility along with explicit declaration.
Though this requires some refactoring of Validator._validate

@funkyfuture
Copy link
Member

if there was an extra-class, wouldn't it be simpler to leave the Validator-class as it is and let a CoerceValidator use it? as far as i understood, the coercion would happen first anyway.

    class CoerceValidator():
        def __init__(self, *args, **kwargs):
            self.validator = Validator(*args, **kwargs)

        def validate(…):
            do_coersion()
            self.validator.validate(…)

but i don't see much harm in this now. however, it'd be easy to deepcopy and write changed value-objects to another class-property than document. for example, validated_document.

and i still think that it's also a good idea to implement checks in validate_schema to avoid later runtime-errors. e.g. an object without __call__-method is referenced in a schema.

also, i can anticipate that if i wanted to use that feature (and atm there is actually one project where i would), i would have to rely on context (e.g. a basepath to resolve a path), which is hardly accessible from a function and sould be available as properties of a class-instance. so i'm absolutely pro an extendibility like for type-checking. but that can be added later too.

nicolaiarocci added a commit that referenced this pull request May 13, 2015
@nicolaiarocci
Copy link
Member

Thanks, this has been merged in 8fc11a1. Feel free to update both AUTHORS and CHANGES with your full name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants