Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validates schema field errors #442

Conversation

maximkulkin
Copy link
Contributor

Improves validation error reporting functionality enabling reporting field errors from whole-schema (validates_schema) validator.

Introduces new public function marshmallow.utils.merge_errors which allows deep merging of errors. It can be useful to simplify accumulating errors in users' validators.

Updates validation error documentation to clarify error messages format.

Fixes #441

@maximkulkin maximkulkin force-pushed the validates-schema-field-errors branch 2 times, most recently from 7212047 to 399b9ef Compare April 26, 2016 19:47
@maximkulkin
Copy link
Contributor Author

Hey guys, any update on this ?

@maximkulkin maximkulkin force-pushed the validates-schema-field-errors branch from 399b9ef to d327e97 Compare May 18, 2016 16:53
@sloria
Copy link
Member

sloria commented May 22, 2016

Apologies for the delay. I am hesitant to merge this, as it adds significant complexity to the codebase. I would like more feedback on the #441 before merging this.

@maximkulkin
Copy link
Contributor Author

@sloria What parts you think are still complex? IMO having a well tested function that does heavy lifting of unifying error format actually simplifies things. The error merging function is exposed as public API so users can also benefit from it. As of existing library code, it became more compact than before.

@deckar01
Copy link
Member

deckar01 commented May 25, 2016

It looks like this PR is adding about 60 lines of code to do what was being done in about 20 before. The increase in complexity seems to be that it is combining 2 collections of arbitrary type, instead of adding a collection of error messages to the error dictionary as before.

I would not oppose this as a public utility if there was a strong use case, but it seems like a public utility is a little out of scope for #441.

@maximkulkin
Copy link
Contributor Author

maximkulkin commented May 25, 2016

@deckar01 Good point on counting lines. Those additional 40 lines is for value added:

  1. Even before Marshmallow supported different forms of errors (string, list of strings, dictionary of errors), but this functionality was rather limited (e.g. you couldn't report field errors from whole-schema validation function, although it seem to be possible by raising ValidationError with dictionary of errors. That didn't work).
  2. The public function is to simplify building errors inside whole-schema validation function: you just build error for your particular field and merge_errors will handle the rest:
@marshmallow.validates_schema
def validate_all(self, data):
    errors = {}
    if data['foo']['bar'] >= data['baz']['bam']:
        errors = marshmallow.merge_errors(
            errors, {'foo': {'bar': 'Should be less than bam'}}
        )
    if data['foo']['quux'] >= data['baz']['bam']:
        errors = marshmallow.merge_errors(
            errors, {'foo': {'quux': 'Should be less than bam'}}
        )
    ...
    if errors:
        raise ValidationError(errors)

You could accumulate errors without that function by carefully coding which goes where and building all those dictionaries if you like, but it could be tricky. merge_errors is a convenient utility for this job.

@deckar01
Copy link
Member

deckar01 commented May 26, 2016

  • The ValidationError seems to only support a single error, but supports associating the error with multiple fields.
  • The @validate_schema decorator can only handle a single exception that gets raised.
  • The fields parameter for ValidationError probably accepts fields regardless of how deeply they are nested, but field_names has no way to express nested field names.

A common design pattern is for a validator to return a list of errors. Nested field names are starting to be expressed using dotted nesting syntax.

@marshmallow.validates_schema
def validate_all(self, data):
    errors = []
    if data['foo']['bar'] < data['baz']['bam']:
        errors.append(ValidationError('Should be less than baz.bam', ['foo.bar']))
    if data['foo']['quux'] < data['baz']['bam']:
        errors.append(ValidationError('Should be less than baz.bam', ['foo.quux']))
    ...
    return errors

@maximkulkin
Copy link
Contributor Author

maximkulkin commented May 26, 2016

@deckar01

The ValidationError seems to only support a single error, but supports associating the error with multiple fields.

If you look into documentation, it says, that first argument to ValidationError is String, list, or dictionary of error messages. If a dict, the keys will be field names and the values will be lists of messages. Obviously, it is by design to allow reporting errors for multiple fields.

The @validate_schema decorator can only handle a single exception that gets raised.

Completely agree, couldn't have said better than this. Although I'm not sure why you have mentioned that in this topic's regard.

The fields parameter for ValidationError probably accepts fields regardless of how deeply they are nested, but field_names has no way to express nested field names.

Actually, this whole design of ValidationError arguments (apart from the first one) is not clear to me. What is the use case for reporting the same error for multiple fields (which will be copied to all fields). Although in my PR I tried to maintain the same behavior, I think that those arguments are leftovers of some old error reporting design and are not actually used that much.

That code snippet that you've presented is not supported as of now as far as I know. Although I agree that common design pattern for validator is to return list of errors, this is not how it is done right now: value of validation function is not used anywhere, you need to raise ValidationError instead. Reporting errors as an exception has some advantages, e.g. you can mix deserialization with validation and stop deserialization when you encounter error you can not proceed with.

Dotted syntax seem to be the case for specifying attributes for "only" and "exclude" schema parameters, I haven't seen mention of plans to use it for error reporting. Moreover, having errors for particular sub-object grouped together is more convenient than filtering strings by prefixes. Compare

{
  'foo.bar.baz': 'Error 1',
  'foo.bar.bam': 'Error 2',
  'foo.quux': 'Error 3',
}

to

{
  'foo': {
    'bar': {
      'baz': 'Error 1',
      'bam': 'Error 2',
    },
    'quux': 'Error 3',
  }
}

If you want to parse errors and display them in place with edit controls, the second variant is far easier to work with.

@deckar01
Copy link
Member

Obviously, it is by design to allow reporting errors for multiple fields.

Actually, this whole design of ValidationError arguments (apart from the first one) is not clear to me. What is the use case for reporting the same error for multiple fields (which will be copied to all fields).

There don't seem to be any complex examples in the docs or the tests, but here is a possible use case for a single error on multiple fields:

class PriceSchema(Schema):
    amount = fields.Decimal()
    currency = fields.Str()

class CheapProductSchema(Schema):
    price = fields.Nested(PriceSchema)
    shipping = fields.Nested(PriceSchema)
    tax = fields.Nested(PriceSchema)

    @validates_schema
    def validate_product_is_cheap(self, data):
        if data['price']['amount'] + data['shipping']['amount'] + data['tax']['amount'] > 10:
            raise ValidationError('Product is not cheap', ['price', 'shipping', 'tax'])

    @validates_schema
    def validate_product_currencies_match(self, data):
        if len(set([data['price']['currency'], data['shipping']['currency'], data['tax']['currency']])) > 1:
            raise ValidationError('Product prices must use the same currency', ['price', 'shipping', 'tax'])

Notice how all 3 fields share responsibility for the error and the error could be fixed by changing any of the values.

That code snippet that you've presented is not supported ...
... value of validation function is not used anywhere ...

It isn't currently, but I think it would be a developer friendly interface to support the use case you have proposed.

I haven't seen mention of plans to use [dotted syntax] for error reporting.

I am proposing it. When the discussion about how to represent nested field names came up for only and exclude, dotted syntax was preferred over a nested collection.

@maximkulkin
Copy link
Contributor Author

That's an interesting example of using those extra fields in ValidationError, but in the result you will have the same error reported multiple times (on all fields) which in some cases could be confusing for users (e.g. in your example incorrect currency in one price results errors reported for all prices). Also, in case of currency validation, it would be sometimes desirable to report that error for "currency" component of prices, not for price as a whole. Current implementation with fields does not allow that, although with this patch you can still do that if you like:

    @validates_schema
    def validate_product_currencies_match(self, data):
        errors = {}
        if data['price']['currency'] != data['shipping']['currency']:
            errors = merge_errors(errors, {
                'shipping': {'currency': 'Should be the same as in price'}
            })
        if data['price']['currency'] != data['tax']['currency']:
            errors = merge_errors(errors, {
                'tax': {'currency': 'Should be the same as in price'}
            })
        if errors:
            raise ValidationError(errors)

I agree that not using exceptions in validation has some pros, returning errors as a list of ValidationError instances looks awkward (because e.g. you still won't be able to report nested errors). Even in that case, it would be better to return string/list/dictionary of errors and in that case merge_errors utility will come in handy.

As of dotted syntax, I like that proposal and I wish you luck in settling it down, but it is completely orthogonal to this case.

@deckar01
Copy link
Member

deckar01 commented May 26, 2016

@sloria The merge_errors utility is introduced to merge the validation error messages of unrelated errors together so they can be raised as single error after the entire schema has been validated. I interpreted the docs as ValidationError is meant to express one error involving potentially multiple fields. What do you think about using it for multiple errors? What do you think about using a nested dictionary to express nested fields?

@maximkulkin
Copy link
Contributor Author

maximkulkin commented May 26, 2016

@deckar01 You interpret documentation wrong, ValidationError was used to report nested errors long before. Here is code sample:

from marshmallow import Schema, fields

class FooSchema(Schema):
    bar = fields.Integer(required=True)

class BarSchema(Schema):
    foo = fields.Nested(FooSchema, required=True)

BarSchema(strict=True).validate({'foo': {'bar': 'abc'}})
# => marshmallow.exceptions.ValidationError: {'foo': {'bar': [u'Not a valid integer.']}}

You get ValidationError raised that contains multiple errors (including errors for nested fields).

Before this patch, errors for each particular field could be reported only from one place - marshmallow.Field subclass instance, and there was no need to merge them. But now each field can get errors from either field validator or schema validator(s), that's why merge_errors utility was added.

@maximkulkin
Copy link
Contributor Author

@sloria Looks like no more objections, right?

@deckar01
Copy link
Member

My remaining objections:

  • merge_errors() does not belong in utils, because it is so tightly coupled with ValidationError.
  • Requiring a utility function to build input for the ValidationError constructor is a poor interface.

ValidationError was used to report nested errors

I will concede that ValidationError is in fact being used to represent multiple unrelated field errors to provide a single schema level validation error object. It would have been better if this behavior was given its own class like SchemaError, but it didn't so I think the use case is justified.

What do you think about moving the merge_errors() logic into the error class and exposing it as an instance method like add_fields()?

I am imagining an interface like:

@marshmallow.validates_schema
def validate_all(self, data):
    errors = ValidationError()
    if data['foo']['bar'] >= data['baz']['bam']:
        error.add_fields({'foo': {'bar': 'Should be less than bam'}})
    if data['foo']['quux'] >= data['baz']['bam']:
        error.add_fields({'foo': {'quux': 'Should be less than bam'}})
    ...
    if not errors.is_empty():
        raise errors

I think that would be easier to use as a developer and clearer to document.

@maximkulkin
Copy link
Contributor Author

I agree that some kind of error builder object would be nice (in our code we use such object). As of usage of ValidationError to be that builder - I'm not sure.

Also, in our builder we use a string attribute path to simplify reporting nested errors:

  errors_builder = ErrorsBuilder()
  errors_builder.add_error('foo/bar/baz', 'Some message')
  print errors_builder.errors
  # => {'foo': {'bar': {'baz': 'Some message'}}}

@maximkulkin maximkulkin force-pushed the validates-schema-field-errors branch from 8fbf6cd to 1176c52 Compare June 17, 2016 21:40
@maximkulkin
Copy link
Contributor Author

@sloria @deckar01 Looks like there is a problem with Travis CI running tests. Is there a known way to fix that?

@deckar01
Copy link
Member

It seems to have been caused by a breaking change in invoke from v0.12.2 to v0.13.0. I opened an issue to see if they can provide some insight. pyinvoke/invoke#362

A temporary fix would be to pin the version in dev-requirements.txt to invoke==0.12.2.

@maximkulkin
Copy link
Contributor Author

@deckar01 I guess it's not important to have latest version of invoke for this project, so can you put a PR for that so everybody else can rebase?

@deckar01
Copy link
Member

Invoke fix PR: #475

@maximkulkin maximkulkin force-pushed the validates-schema-field-errors branch 2 times, most recently from 28580c5 to 5061015 Compare June 21, 2016 05:18
@maximkulkin
Copy link
Contributor Author

Ok, I have addressed comments. Can we finish with this?

raise ValidationError(errors)

schema = MySchema()
errors = schema.validate({'foo': 2, 'bar': {'baz': 5}, 'bam': 6})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR errors would have been

{'_schema': [{'bam': 'Value should be less than foo', 'bar': {'baz': 'Non-matching value'}}]}

but now is

{'bam': 'Value should be less than foo', 'bar': {'baz': 'Non-matching value'}}

The _schema key is documented in the Schema Level Validation section of the Extending page.

Assigning schema validation errors to specific fields is documented in the Storing Errors on Specific Fields section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly: before this PR, reported errors were not compliant to schema which says that inside lists you could have only strings (not dictionaries). This makes not much sense to report field errors inside "_schema". It looks inconsistent.
This PR fixes that. It still allows reporting errors for "_schema": just raise validation error with string or list of strings and they will be reported as "_schema" errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are confusing the documentation for the message parameter of ValidationError with the documentation for the return value of Schema.load(). The errors value returned by load() is a named tuple, and its value is documented in the sections on Schema Level Validation and Storing Errors on Specific Fields.

It looks inconsistent. This PR fixes that.

The "inconsistency" between the two interfaces is fully documented. This PR is a breaking change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not confusing message parameter. What you're saying ("The errors value returned by load() is a named tuple") is incorrect. The result of load() is named tuple out of which errors field is a dictionary of errors:

class FooSchema(marshmallow.Schema):
    bar = marshmallow.fields.String(required=True)

FooSchema().load({'bar': 123})
# => UnmarshalResult(data={}, errors={'bar': [u'Not a valid string.']})

So UnmarshalResult is your named tuple while errors is a dictionary of errors.

https://marshmallow.readthedocs.io/en/dev/extending.html#schema-level-validation describes what will happen if you report a string error inside function decorated with validates_schema. This stays the same with this PR. Storing errors on specific fields also works as before.

| The "inconsistency" between the two interfaces is fully documented. This PR is a breaking change.
Point me into documentation where it shows raising dicts in ValidationError and expecting it to not be treated as nested field errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see the issue. You pointed out that the messages attribute says:

if [it is] a dict, the keys will be field names and the values will be lists of messages.

This documentation is misleading, because it is seems to actually be referring to the way Unmarshaller.run_validator() internally builds errors by copying the message for each field name, not how you have to build your messages dictionary or how it is going to transform your messages dictionary. I think this comment probably should have been added to Schema.load() considering the implementation.

The implementation of Unmarshaller.run_validator() explicitly allows an arbitrary message dictionary argument to pass through as an "error message" value.

Point me into documentation where it shows raising dicts in ValidationError and expecting it to not be treated as nested field errors.

Is it documented that passing a dictionary to the message argument of ValidationError is supposed to treat the keys as field names? I don't think that the messages attribute of ValidationError was actually meant to suggest that behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that the messages attribute of ValidationError was actually meant to suggest that behavior.

But I do. I don't understand why I need to persuade you to use this feature. If your use cases are fine without it, don't use it. My use cases require that. I do not see a reason to forbid doing that (reporting multiple different errors for different (nested) fields from whole-schema validator).

I haven't seen a better suggestions to do that. The proposed interface and the way resulting errors are reported look consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why I need to persuade you to use this feature.

If this were purely a new feature it would not be an issue, but you had to modify existing test cases, which makes this PR backwards incompatible.

You are arguing that the documentation suggests there is a feature you want, but it is not implemented, so it is a bug that needs to be fixed. I am arguing that your interpretation of the documentation conflicts with the documentation for the message argument, the ValidationError implementation, and the decorator tests. One ambiguous footnote is not a strong argument for introducing breaking changes in v2.

If your use cases are fine without it, don't use it.

The test cases you changed represent a use case that is not "fine without it". Those test cases are supported by a reasonable interpretation of the existing documentation.

I haven't seen a better suggestions to [report] multiple different errors for different (nested) fields from whole-schema validator

You could break your whole-schema validator up into several schema validators, so that you can raise one ValidationError per error message. If you want to be able to assign error messages deeply nested fields, I would be interested in reviewing a PR that adds the dotted nesting syntax to the fields argument which would be a backwards compatible change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned before (in original issue description), the test case that I modified look like that feature (using non-string structure as error message) was accidental. The test itself aims to verify completely different aspect of the library (passing original data to validator). If there would be a special test that checked that it is possible to use non-string error messages than fine. But there is no such test case.

As of breaking validator into separate validators, this is not acceptable for me:

  1. You still can not report errors for nested fields (only fields of top level object)
  2. You can not report errors for list elements (since you can not make library re-run validator multiple times, reporting one error at a time).

@maximkulkin
Copy link
Contributor Author

Of course @deckar01! Why bother when you can just put labels the way you like!! EZ

@deckar01
Copy link
Member

deckar01 commented Jul 4, 2016

@maximkulkin I apologize for for any frustration I have caused. I value your opinion and want to see marshmallow satisfy your use case.

I did some detective work 🔍 and found that #110 explicitly added support for arbitrary dictionaries in 1.2.0.

Fix error handling when passing a dict or list to a ValidationError #110. Thanks @ksesong for reporting.

  1. These two commits contain explicit tests for the behavior in question.
  2. Instead of directly testing the changes in marshalling.py, the test cases were added for Schema.validator.
  3. When the @validates_schema decorator was introduced it did not tests this behavior.
  4. When the deprecated Schema.validator API was removed in Remove legacy API #73, the tests were deleted even though the behavior of marshalling.py never changed.

@maximkulkin
Copy link
Contributor Author

Ok, now #110 shows that people experienced problem with not being able to report multiple errors long ago (since 1.2.0), but haven't enough knowledge how to design it elegantly. The situation improved since then: with my #382 you can actually have errors for different items in list, so you do not need to implement custom schemas for error messages just to point on particular error location.

That proves the point that this feature is in demand.

@maximkulkin maximkulkin force-pushed the validates-schema-field-errors branch from 5061015 to 0909283 Compare July 7, 2016 01:06
@deckar01
Copy link
Member

deckar01 commented Jul 7, 2016

I appreciate that you improved how marshmallow handles the specific use case presented in #110 with #382, but I can't advocate breaking a feature in v2 that has been present since v1.2.0.

This PR would force errors to be strings, but it is very useful for errors to be dictionaries that contain machine readable metadata about validation. Taking away this expressivity is a step backwards.

I would encourage you to close this PR and move the discussion to your original issue. I'm positive we can find an interface that satisfies your use case while preserving the functionality other developers rely on.

@maximkulkin
Copy link
Contributor Author

maximkulkin commented Jul 8, 2016

The problem with reporting non-string errors is that it makes resulting errors ambiguous: was at the message itself or just a nested field errors. This is result of solution sloppiness in 1.2.

If you want to report rich error messages, you can wrap them in your own types (other than lists and dicts) and you'll be fine:

import marshmallow
from collections import namedtuple

MyError = namedtuple('MyError', ['code', 'message'])

class FooSchema(marshmallow.Schema):
    foo = marshmallow.fields.String(required=True)

    @marshmallow.validates_schema
    def validate_schema(self, data):
        raise marshmallow.ValidationError({'foo': MyError(code=123, message='blah')})

print FooSchema().validate({})
# => {'foo': [u'Missing data for required field.', MyError(code=123, message='blah')]}

or do this

class FooSchema(marshmallow.Schema):
    foo = marshmallow.fields.String(required=True)

    @marshmallow.validates_schema
    def validate_schema(self, data):
        raise marshmallow.ValidationError(MyError(code=123, message='blah'))

print FooSchema().validate({})
# => {'foo': [u'Missing data for required field.'], u'_schema': [MyError(code=123, message='blah')]}

I propose moving this to next major release for making incompatible changes. I do not see why we need to keep accumulating mistakes instead of fixing them.

@maximkulkin maximkulkin force-pushed the validates-schema-field-errors branch 2 times, most recently from 084eea5 to 5011560 Compare July 8, 2016 19:56
@sloria
Copy link
Member

sloria commented Mar 4, 2017

Closing this for now, as it has become stale. May reopen this after we evaluate #441 and #581 further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants