New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect implementation of List field deserialization #345

Closed
maximkulkin opened this Issue Nov 30, 2015 · 8 comments

Comments

Projects
None yet
6 participants
@maximkulkin
Contributor

maximkulkin commented Nov 30, 2015

Problem explanation:

from marshmallow import Schema, fields, validate, validates, validates_schema, ValidationError


class IntRangeSchema(Schema):
    first = fields.Integer(required=True)
    last = fields.Integer(required=True)


class PoolSchema(Schema):
    name = fields.String(required=True)
    ranges = fields.List(fields.Nested(IntRangeSchema))

    @validates('ranges')
    def validate_ranges(self, ranges):
        print 'validate_ranges %s' % repr(ranges)

When I load valid data, the validate_ranges() is called with a list of dictionaries (which is expected):

schema = PoolSchema(strict=True)
data = {
    'name': 'pool1',
    'ranges': [
        {'first': 1, 'last': 10},
        {'first': 11, 'last': 20},
    ],
}
print repr(schema.load(data).data)
# => validate_ranges [{'last': 10, 'first': 1}, {'last': 20, 'first': 11}]
# => {'ranges': [{'last': 10, 'first': 1}, {'last': 20, 'first': 11}], 'name': u'pool1'}

But when I load an invalid data, then the validates_ranges() receives only a non-valid dictionary, not a list:

data = {
    'name': 'pool2',
    'ranges': [
        {'first': 1, 'last': 10},
        {'last': 10},
    ],
}            
print repr(schema.load(data).data)
# => validate_ranges {'last': 10}
# => ... ValidationError

Moreover, if I add complete schema validation (via validates_schema), this is what passed to it:

{'ranges': {'last': 10}, 'name': u'pool2'}

I see multiple problems here:

  1. naive deserialization in List field type
  2. running validations for fields that have already failed by other means.

The second issue is addressed in #323, so I'll focus on the first one.

Let's take List field as example. Deserialization of list types is basically:

def _deserialize(self, value, attr, data):
    return [self.container.deserialize(each) for each in value]

By definition container is a field type which will report deserialization/validation errors by raise ValidationError exceptions. That means that you will get an error for the first invalid item in the list and the rest of items won't be validated at all. The proper way would be to accumulate all errors:

def _deserialize(self, value, attr, data):
    if not utils.is_collection(value):
        self.fail('invalid')
    result = []
    errors = {}
    for idx, each in enumerate(value):
        try:
            result.append(self.container.deserialize(each))
        except ValidationError as e:
            result.append(e.data)
            errors.update({str(idx): e.message})

    if errors:
        raise ValidationError(errors, data=result)

    return result
@touilleMan

This comment has been minimized.

touilleMan commented Dec 9, 2015

+1
I was about to submit the same issue

@immerrr

This comment has been minimized.

Contributor

immerrr commented Dec 22, 2015

+1 on this one

@chekunkov

This comment has been minimized.

chekunkov commented Dec 23, 2015

Hi @maximkulkin

why do you use

ranges = fields.List(fields.Nested(IntRangeSchema))

instead of the recommended (see note)

ranges = fields.Nested(IntRangeSchema, many=True)

?

See also #319 which discusses fields.Nested(..., many=True) validation drawbacks.

@maximkulkin

This comment has been minimized.

Contributor

maximkulkin commented Dec 23, 2015

  1. (although I didn't know that you could do that) it is very non-intuitive. What is intuitive is having a field type for each of your container types which actually do the job correctly.
  2. This works only for creating fields that are collections of other objects (which are described with another Schema class derivatives). If you want to create a list of simple, non-object types (e.g. IPAddress field which actually is just a customized string), you can not (or at least, should not) use fields.Nested.
@justanr

This comment has been minimized.

Contributor

justanr commented Dec 24, 2015

The reason for the nested field accepting many is because it's nesting a full schema inside of another schema, not just a single field.

It's like saying SomeSchema(many=True).dump(list_of_stuff)

@justanr

This comment has been minimized.

Contributor

justanr commented Dec 24, 2015

I will add I'm +1 on collecting all the errors in List though.

@maximkulkin

This comment has been minimized.

Contributor

maximkulkin commented Dec 24, 2015

@justanr Thank you for pointing that out

@sloria

This comment has been minimized.

Member

sloria commented Dec 24, 2015

@maximkulkin Thank you for the detailed report.

I think collecting errors is the desired and expected behavior. I would gladly review and merge a PR for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment