New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested field with many=True and required=True gives unexpected error for missing data #319

Closed
chekunkov opened this Issue Nov 3, 2015 · 16 comments

Comments

Projects
None yet
8 participants
@chekunkov

chekunkov commented Nov 3, 2015

Say I have following schema definition

from marshmallow import Schema, fields


class ObjectSchema(Schema):
    id = fields.Integer(required=True)
    name = fields.String(required=True)


class CollectionSchema(Schema):
    data = fields.Nested(ObjectSchema, many=True, required=True)

Here's how this schema works for several invalid inputs

In [6]: CollectionSchema().load({})
Out[6]: UnmarshalResult(data={}, errors={'data': {0: {'name': ['Missing data for required field.'], 'id': ['Missing data for required field.']}}})

In [7]: CollectionSchema().load({'data': []})
Out[7]: UnmarshalResult(data={'data': []}, errors={})

In [8]: CollectionSchema().load({'data': [{}]})
Out[8]: UnmarshalResult(data={'data': [{}]}, errors={'data': {0: {'name': ['Missing data for required field.'], 'id': ['Missing data for required field.']}}})

IMO Out[8] is the only correct validation error from the examples - it correctly describes the problem - all required fields are missing in nested item. For other examples I actually expect something like:

In [6]: CollectionSchema().load({})
Out[6]: UnmarshalResult(data={}, errors={'data': ['Missing data for required field.']})

shows that 'data' field is required and missing. I'm missing this one badly.

In [7]: CollectionSchema().load({'data': []})
Out[7]: UnmarshalResult(data={'data': []}, errors={'data': ['Field may not be empty list.']})

shows that field contains empty list - for my usecase it's the same as missing field. Well, this one is arguable, maybe Nested field requires another parameter to enforce validation errors in case of empty nested value.

So currently marshmallow cannot distinguish between these three cases which complicates automated error handling in JSON api I'm working on now - as I said errors do not describe the problem properly. What do you think - is it an issue or I'm missing something and there's better approach to this problem?

@sloria

This comment has been minimized.

Member

sloria commented Nov 9, 2015

Thanks for the thorough report, @chekunkov .

For case 2: required validation applies only to missing inputs; empty lists, empty strings, zero, etc. are still considered input. If you want to disallow empty lists, you could do:

from marshmallow import Schema, fields, validate

class ObjectSchema(Schema):
    id = fields.Integer(required=True)
    name = fields.String(required=True)


class CollectionSchema(Schema):
    data = fields.Nested(ObjectSchema,
                         validate=validate.Length(min=1,
                                                  error='Field may not be an empty list'),
                         many=True,
                         required=True)

print(CollectionSchema().load({'data': []}).errors)
# {'data': ['Field may not be an empty list']}

Case 1 is less clearly cut. #235 was implemented such that nested required fields would be validated if the outer field was missing. @max-orhai, can you comment on the rationale for this decision?

For the time being, you could work around the issue with a custom Nested field:

from marshmallow import Schema, fields, validate, missing

class Nested(fields.Nested):
    def _validate_missing(self, value):
        if value is missing and getattr(self, 'required', False):
            self.fail('required')
        return super()._validate_missing(value)

class ObjectSchema(Schema):
    id = fields.Integer(required=True)
    name = fields.String(required=True)

class CollectionSchema(Schema):
    data = Nested(ObjectSchema,
                    validate=validate.Length(min=1,
                                            error='Field may not be an empty list'),
                    many=True,
                    required=True)

print(CollectionSchema().load({}).errors)
# {'data': ['Missing data for required field.']}
@max-orhai

This comment has been minimized.

max-orhai commented Nov 12, 2015

The rationale for _validate_missing is as follows:
Consider a web API using a nested schema in which both inner and outer fields are required, so valid JSON data looks like (e.g.) {outerRequired: {innerRequired: true}}.

Now suppose an API client submits a request containing valid data for
the outer schema, but missing the key for the inner schema, like
{outerRequired: {extra: true}}. Clearly, the client should be informed
that the inner schema was missing a required field. That was how
Marshmallow worked before #235.

But what if the client submits a request, say {extra: true}, that
doesn't have a key for the outer field? Before #235, the client would
have to submit an additional request just to discover the inner
required field. After #235, the client can be informed in a single
response of everything missing from the request data, now matter how
many levels deep.

In short, _validate_missing is an aid to discoverability. I hope that
clears things up for you.

On Sun, Nov 8, 2015, at 05:18 PM, Steven Loria wrote:

Thanks for the thorough report, @chekunkov[1] .

For case 2: required validation applies only to missing inputs; empty
lists, empty strings, zero, etc. are still considered input. If you
want to disallow empty lists, you could do:

from marshmallow import Schema, fields, validate

class ObjectSchema(Schema):
id = fields.Integer(required=True) name =
fields.String(required=True)

class CollectionSchema(Schema):
data = fields.Nested(ObjectSchema,
validate=validate.Length(min=1,
error='Field may not be an empty list'),
many=True, required=True)

print(CollectionSchema().load({'data': []}).errors)

{'data': ['Field may not be an empty list']}

Case 1 is less clearly cut. #235[2] was implemented such that nested
required fields would be validated if the outer field was missing. @max-
orhai[3], can you comment on the rationale for this decision?

For the time being, you could work around the issue with a custom
Nested field:

from marshmallow import Schema, fields, validate, missing

class Nested(fields.Nested):
def _validate_missing(self, value): if value is missing and
getattr(self, 'required', False): self.fail('required')
return super()._validate_missing(value)

class ObjectSchema(Schema):
id = fields.Integer(required=True) name =
fields.String(required=True)

class CollectionSchema(Schema):
data = Nested(ObjectSchema,
validate=validate.Length(min=1,
error='Field may not be an empty list'), many=True,
required=True)

print(CollectionSchema().load({}).errors)

{'data': ['Missing data for required field.']}

— Reply to this email directly or view it on GitHub[4].

Links:

  1. https://github.com/chekunkov
  2. #235
  3. https://github.com/max-orhai
  4. #319 (comment)
@chekunkov

This comment has been minimized.

chekunkov commented Dec 18, 2015

thanks for perfect examples and workarounds @sloria !

But what if the client submits a request, say {extra: true}, that
doesn't have a key for the outer field?

@max-orhai honestly I don't think it's a good idea to overcomplicate "A lightweight library for converting complex objects to and from simple Python datatypes" to handle edge cases like this. maybe it works for your example, when nested field contains required single nested object, but above I provided an example of the case where this advanced deep validation leads to a confusion. by saving some time for client with some particular schema this validation gives ambiguous error responses for others.

what if nested fields is a list, what if it can be empty {outer: []}? in this case the only valid response for submitted request {extra: True} would be

{'outer': ['Missing data for required field.'], 
 'extra': ['Rogue field]}

not

{'outer': {0: {'innerRequiredField': ['Missing data for required field.']}}, 
 'extra': ['Rogue field]}

or what if outer should be of length 10? following your explanation current validation logic should returns something like

{'outer': 
    {0: {'innerRequiredField': ['Missing data for required field.']}},
    {1: {'innerRequiredField': ['Missing data for required field.']}},
    {2: {'innerRequiredField': ['Missing data for required field.']}},
    # ...
    {9: {'innerRequiredField': ['Missing data for required field.']}},
}

but instead it returns {'outer': ['Length must be between 10 and 10.']} which doesn't look consistent with that error example you provided - which once again leads to a confusion.

@max-orhai

This comment has been minimized.

max-orhai commented Dec 23, 2015

@chekunkov Thanks for sharing your concerns. This feature is helpful to my project's clients, which is why I submitted #235. I don't think it is overcomplicated, but it should be easy to disable: I'd be happy to help review a pull request that made _validate_missing an optional parameter to the Nested field constructor, if you submit one.

I agree that the interaction between missing data for Nested, many=True fields and the Length validator could be enhanced, but I don't see any ambiguity beyond what's already present without _validate_missing, assuming that a list of required data should never be empty. Suppose we have the following schema:

class Inner(Schema):
    num = fields.Number
    url = fields.URL

class Outer(Schema):
    inners = fields.Nested(Inner, many=True, required=True, validate=Length(min=2, max=2))

Now, when deserializing {} through an Outer, we get back the error {'inners': ['Length must be between 2 and 2.']} -- which is valid, although not quite as helpful as we might like, since Length doesn't know anything about the Inner schema. If it did, we could have the more helpful

{'inners': [
    {'num': ['Missing data for required field.', 'Must be a number'],
     'url': ['Missing data for required field.', 'Must be a URL']},
    {'num': ['Missing data for required field.', 'Must be a number'],
     'url': ['Missing data for required field.', 'Must be a URL']}
]}

Note that the list of errors for the 'inners' field contains exactly two dicts, each in the format described by the Inner schema, just like valid data for that field would. If inner was constructed without the Length validator, _validate_missing places a single dict in that list, again under the assumption that a list of required dicts should not be empty. If an empty list should be acceptable data there, say to let a client explicitly declare the absence of something, then it would indeed be useful to disable _validate_missing. If that is actually a common case, then maybe this is a feature that should have to be explicitly enabled.

For what it's worth, I think that much of the confusion around this issue originates in the many=True convention for Nested fields. I'd prefer to be able to write fields.List(fields.Nested(inner_schema)) instead: in this case, it would make it clearer that the outer list itself is required, apart from any validation of its length or contents.

@chekunkov

This comment has been minimized.

chekunkov commented Dec 23, 2015

@max-orhai Thanks for the detailed explanation.

I don't think it is overcomplicated, but it should be easy to disable: I'd be happy to help review a pull request that made _validate_missing an optional parameter to the Nested field constructor, if you submit one.

Sounds fair. Unfortunately I've chosen another library for my project, so I'm not sure I'll be able to submit a PR in the nearest future - it requires a bit deeper diving into the sources and I don't have much time to do that now. But maybe I'll try to use Marshmallow again for the next project and chances are I'll have a chance to spend more time contributing to it.

@deckar01

This comment has been minimized.

Member

deckar01 commented Jun 27, 2016

I think the intention behind the deeply nested error messages is well founded. It does not provide so much additional information that it hinders usability.

I would suggest using a custom validator that creates less verbose errors instead of changing the required errors.

@deckar01

This comment has been minimized.

Member

deckar01 commented Jun 28, 2016

The validation for required is handled separately because it must precede execution of the validators callables. If a required value is not passed, validation should fail fast (as there is no value to pass to the rest of the validators).

After reading #334 (comment) I realized that the required validator provides a shortcut behavior that a custom validator cannot. It seems like the custom error string returned by a required callable in #334 would override the nested error dictionary, which would resolve this issue.

@maximkulkin

This comment has been minimized.

Contributor

maximkulkin commented Jul 13, 2016

@max-orhai @deckar01 This functionality is all ways broken.

Consider the case when you have deeply nested JSON structure:
{'foo': {'bar': {'baz': {'bam': {'quux': true}}}}}
User sends request
{'foo': {'bap': {'baz': {'bam': {'quux': true}}}}}
Imagine his confusion when he gets error like that ?!
{'foo': {'bar': {'baz': {'bam': {'quux': ['Missing required field']}}}}}
Then user starts complaining about API being insane and all ways broken, because the value for 'quux' is just there.

What he really should get is
{'foo': {'bar': ['Missing required field']}}

As with required fields, you could have a bunch of validators that will trigger only if some fields are present (e.g. validating string with regex). Even if user will know after sending request once which fields are required, he will still need to do multiple iterations to figure out what type should particular field be and what are other constraints for it exist.

More of this in #495

@deckar01

This comment has been minimized.

Member

deckar01 commented Jul 13, 2016

It might be a good idea to fall back to the original behavior by default and require an explicit parameter to enable more verbose error messages. I think it is important for error messages to be clear and concise by default.

@maximkulkin

This comment has been minimized.

Contributor

maximkulkin commented Jul 13, 2016

I would suggest to disable it completely because:

  1. It looks questionable
  2. It complicates code
  3. It does not work with Nested schemas inside other containers (e.g. fields.List)
@martinstein

This comment has been minimized.

martinstein commented Jun 20, 2017

Quick comment from my side, since this issue currently has the status "Feedback welcome": I agree with @chekunkov and @maximkulkin that the current functionality is very surprising/weird. I was expecting mm.fields.Nested(SomeSchema, many=True, required=True) to give me strict feedback when that field was missing completely. Instead I get error messages about missing data within the list, even though nothing was provided at all for that field.

I would strongly prefer if the change from #235 was reverted back to the old, simple behavior (or at least that should be the default behavior). Marshmallow shouldn't try to be too clever in what it does. If people want this advanced functionality, it should be on an opt-in basis.

@martinstein

This comment has been minimized.

martinstein commented Jun 20, 2017

In general, I think a good approach for the API-design of a validation-library should be to favor strictness and simplicity over convenience. Don't do any nested validation of data that doesn't even exist.

@okke-formsma

This comment has been minimized.

okke-formsma commented Nov 28, 2017

I'm running into a few issues with Marshmallow, as I don't see any way to make a field required, even if it's length can be zero.

I have a bulk request, and a valid bulk request is a request with no operations. However, there is no way I can express this in Marshmallow.

As a workaround I've created a new Nested class:

class NestedAllowEmpty(fields.Nested):
    def _validate_missing(self, value):
        if value is missing and getattr(self, 'required', False):
            self.fail('required')
        if len(value) > 0:
            return super()._validate_missing(value)

Please reconsider the default behavior.

@lafrech

This comment has been minimized.

Member

lafrech commented Jan 9, 2018

I was just caught by this while working on #378 and I also think current behaviour is surprising and this change should be reverted.

I disagree with the rationale presented in #319 (comment). There are ways to document an API, for instance using apispec to generate an OpenAPI doc. I can't really picture my clients using this trial and error approach to learn what data is required. (OK, in fact, I'm pretty sure that's what they do, we all do... But this is wrong.) The error message is not meant to document the API. We don't expect the clients to send garbage just to get an error message specifying what the server expects. Besides, for those using the trial and error approach, reverting this change will just make it a few more steps.

I'd like to change this in v3.0. Any objection?

To be explicit, I'm talking about case 1 in OP. Regarding case 2, I agree with @sloria's answer, this is the right behaviour.

@lafrech

This comment has been minimized.

Member

lafrech commented Mar 18, 2018

I just proposed a PR that removes the feature: #754.

@sloria

This comment has been minimized.

Member

sloria commented Mar 18, 2018

I agree with @lafrech 's analysis above. Thanks for the PR. Let's make it so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment