Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong message on type validation errors #1303

Closed
cristi23 opened this issue Jul 17, 2019 · 6 comments

Comments

@cristi23
Copy link
Contributor

commented Jul 17, 2019

Before going too deep

I'm using marshmallow==2.19.5 and I noticed that there's a release candidate for marshmallow 3 so if this issue has somehow been solved, please ignore the rest. However, the only issue I noticed which seemed like it might have some relevance is this: #852, but the PR for this doesn't seem to address the problem that I'm having. (PR: https://github.com/marshmallow-code/marshmallow/pull/1067/files).

Reproducing the issue

Let's consider the following code with 3 schema definitions:

from marshmallow import Schema, fields

class TwoLevelsNestedResource(Schema):
    some_irrelevant_field = fields.String(required=True)


class OneLevelNestedResource(Schema):
    field_a = fields.String(required=True)
    field_b = fields.Integer(required=True)
    field_c = fields.Nested(TwoLevelsNestedResource, required=True)


class MainResource(Schema):
    main_field_a = fields.String(required=True)
    main_field_b = fields.Integer(required=True)
    nested_field = fields.Nested(OneLevelNestedResource, required=True, many=False)

Running the following function would result in the data being validated without any errors which is what we want:

def load_correct_data():
    print(MainResource().load({
        'main_field_a': '',
        'main_field_b': 0,
        'nested_field': {
            'field_a': '',
            'field_b': 0,
            'field_c': {
                'some_irrelevant_field': ''
            }
        }
    }).errors)

load_correct_data()

Running the next function should print a validation error because the data which is inputted is not correct.

def load_wrong_data_a():
    print(MainResource().load('not what we want').errors)

load_wrong_data_a()

As the root level object we should send a dictionary, not a string. Therefore, we should get a type validation error. The problem is that we should get whatever is defined as the default error message for the fields.Nested type which is "Invalid type.". However, any of the following 2 error messages might be returned at different calls of the same function.

{'_schema': ['Invalid input type.']}
{'_schema': ['Invalid type.']}

Furthermore, the following code has the same issue, one level deeper in the schema:

def load_wrong_data_b():
    print(MainResource().load(dict(
        main_field_a='',
        main_field_b=0,
        nested_field='again, not what we want',
    )).errors)

load_wrong_data_b()

The two possible outputs, alternatively are:

{'nested_field': {'_schema': ['Invalid input type.']}}
{'nested_field': {'_schema': ['Invalid type.']}}

Again, just like in the previous case, since nested_field expects a nested object, a dict, the error we're expecting here is "Invalid type.". Unfortunately "Invalid input type." will be returned from time to time.

Possible causes

As far as I could tell from debugging the code in marshmallow.marshalling, for each schema definition we're going through the list of defined fields (https://github.com/marshmallow-code/marshmallow/blob/2.x-line/src/marshmallow/marshalling.py#L248) and then attempt to fetch the value of that field (https://github.com/marshmallow-code/marshmallow/blob/2.x-line/src/marshmallow/marshalling.py#L252).
In case data is a string instead of a dict like in the situation I presented above, an AttributeError will be raised and then an error for the type validation will be added (https://github.com/marshmallow-code/marshmallow/blob/2.x-line/src/marshmallow/marshalling.py#L255-L257).

The problem here seems to be the fact that we're in fact using .error_messages['type'] on the field_obj we're actually using first from the list of fields defined in a schema. This means that we're not actually looking at the error messages defined for the field (or schema root) which is should be a dict in case of fields.Nested but instead at the error messages for whatever child field which is picked up first by the iterator.

This can be easily observed if we alter the initial schema definitions like this:

class TwoLevelsNestedResource(Schema):
    some_irrelevant_field = fields.String(required=True)


class OneLevelNestedResource(Schema):
    field_a = fields.String(required=True, error_messages={
        'type': 'field_a must be string'
    })
    field_b = fields.Integer(required=True, error_messages={
        'type': 'field_b must be int'
    })
    field_c = fields.Nested(TwoLevelsNestedResource, required=True, error_messages={
        'type': 'field_c must be dict'
    })


class MainResource(Schema):
    main_field_a = fields.String(required=True, error_messages={
        'type': 'main_field_a must be string'
    })
    main_field_b = fields.Integer(required=True, error_messages={
        'type': 'main_field_b must be int'
    })
    nested_field = fields.Nested(OneLevelNestedResource, required=True, many=False, error_messages={
        'type': 'nested_field must be dict'
    })

Then, when running load_wrong_data_a(), we can get one of the following:

{'_schema': ['nested_field must be dict']}
{'_schema': ['main_field_b must be int']}
{'_schema': ['main_field_a must be string']}

As we can see, all of these are actually custom error messages for the fields of the root schema definition.
When running load_wrong_data_b() we can confirm this behaviour:

{'nested_field': {'_schema': ['field_a must be string']}}
{'nested_field': {'_schema': ['field_b must be int']}}
{'nested_field': {'_schema': ['field_c must be dict']}}

Workaround

Until this issue is fixed, configuring the schema to preserve the order of the fields helps with getting a consistent error message, but the underlying issue of the message being wrong is still there.

from marshmallow import Schema, fields


class TwoLevelsNestedResource(Schema):
    some_irrelevant_field = fields.String(required=True)


class OneLevelNestedResource(Schema):
    class Meta:
        ordered = True

    field_a = fields.String(required=True, error_messages={
        'type': 'field_a must be string'
    })
    field_b = fields.Integer(required=True, error_messages={
        'type': 'field_b must be int'
    })
    field_c = fields.Nested(TwoLevelsNestedResource, required=True, error_messages={
        'type': 'field_c must be dict'
    })


class MainResource(Schema):
    class Meta:
        ordered = True

    main_field_a = fields.String(required=True, error_messages={
        'type': 'main_field_a must be string'
    })
    main_field_b = fields.Integer(required=True, error_messages={
        'type': 'main_field_b must be int'
    })
    nested_field = fields.Nested(OneLevelNestedResource, required=True, many=False, error_messages={
        'type': 'nested_field must be dict'
    })

Because adding Meta.ordered = True would determine marshmallow to use ordered dicts (this is redundant in python3.7) and ordered sets for manipulating the fields, we would then get only one possible error when running load_wrong_data_b():

{'nested_field': {'_schema': ['field_a must be string']}}
@deckar01

This comment has been minimized.

Copy link
Member

commented Jul 17, 2019

It looks like this was fixed in v3 by #857. The original issue was about unknown, but the fix corrected an underlying issue that should probably be ported to v2.

Here is a minimal repro:

from marshmallow import Schema, fields

class Test(Schema):
    foo = fields.String(required=True, error_messages={'type': 'foo must be string'})

Test().load('')
# UnmarshalResult(data={}, errors={u'_schema': ['foo must be string']})

@deckar01 deckar01 added the bug label Jul 17, 2019

@cristi23

This comment has been minimized.

Copy link
Contributor Author

commented Jul 17, 2019

OK. Thanks for the quick update. It does indeed seem like there's some potential that the fix you indicated might be a solution in this situation, too.

What are the next steps here? Should we wait until there's a release for version 3 or are other 2.x releases planned which could include a fix for this issue as well?

@lafrech

This comment has been minimized.

Copy link
Member

commented Jul 17, 2019

Thanks for the detailed bug report.

2.x is still maintained and should be fixed. A PR would be absolutely welcome.

@cristi23

This comment has been minimized.

Copy link
Contributor Author

commented Jul 18, 2019

Ok. I will try to see if I can adapt the existing fix you made from v3 to v2.x and then create a PR. Hopefully I will be able to get it done by the end of the week.

@sloria

This comment has been minimized.

Copy link
Member

commented Jul 21, 2019

Thanks for offering to tack this @cristi23 . If you end up not being able to work on this, unassign yourself and let us know.

@cristi23

This comment has been minimized.

Copy link
Contributor Author

commented Jul 22, 2019

I created a PR - #1323

Let me know if it's all good. From my tests it should be working fine.

The only small thing I still don't like is the fact that there's a small inconsistency where sometimes "Invalid input type." (for most cases) will be returned in some cases, but also "Invalid type." (when sending wrong data to Nested fields) in others. However, this is just nitpicking. I made the changes such that all the tests are still green.

@sloria sloria closed this Aug 11, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.