Add option to throw validation error on extra keys #524

tuukkamustonen · 2016-09-09T12:11:24Z

I'm not sure if I'm missing something, but I would like to throw error if extra keys are provided, as in:

>>> class AlbumSchema(Schema):
...     title = fields.Str()

>>> AlbumSchema(strict=True).load({'extra': 2}, allow_extra=False)
Traceback (most recent call last):
...
marshmallow.exceptions.ValidationError: {'_schema': ["Extra arguments passed: ['extra']"]}

I've been using the following implementation (taken from marshmallow-code/webargs#87 (comment)):

class BaseSchema(Schema):

    @pre_load
    def validate_extra(self, in_data):
        if not isinstance(in_data, dict):
            return

        extra_args = [key for key in in_data.keys() if key not in self.fields]
        if extra_args:
            raise ValidationError('Extra arguments passed: {}'.format(extra_args))

I would expect this to be a common need, however, so could it be supported by the library out-of-the-box?

lafrech · 2016-09-09T13:23:36Z

FYI, there's another example of how to do this in the docs:

http://marshmallow.readthedocs.io/en/latest/extending.html#validating-original-input-data

tuukkamustonen · 2016-09-09T14:20:15Z

@lafrech Oh, right. That's practically the same, just using @validates_schema instead. I suppose that would indeed be a better decorator to use as the check throws ValidationError.

As it's in the docs, I assume maintainers have a negative stand on supporting this through a parameter?

lafrech · 2016-10-02T20:33:04Z

Duplicate of #515.

sloria · 2016-10-05T02:33:36Z

We do not currently have plans to add this feature. I recommend using the recipe in the docs for now.

I am not completely opposed to adding this in the future, given a strong case and enough user support.

Closing this for now, but will reopen if there is need for further discussion.

barakalon · 2017-04-22T15:25:01Z

+1 for adding built in support for this.

tuukkamustonen · 2017-05-03T06:07:56Z

I now need to support this together with many=True.

With something like:

class Sch(Schema):
    foo = fields.Int()

    @validates_schema(pass_original=True)
    def v(self, data, orig_data):
        ...
        raise ValidationError('Extra arguments passed.', ...)

With Sch(many=True).load([{'foo': 1}, {'bar': 1}]), @validates_schema(pass_original=True) gets passed the array.

I can iterate the array and find the extra keys per index, but I cannot sensibly report the errors per index (because raising ValidationError does not support assigning to indexes with field_names... or does, but value would be ['error'] and not {'_schema': ['error']} if I got it right). I have to report the extra keys per _schema, which is not nice.

Any workarounds? Also see #621 as related.

bivald · 2018-03-20T15:46:36Z

+1

In my case it's to avoid ambiguity. The API expects a set of keys to be present per method and only those keys. We use Marshmallow to power an API and you really want the boundaries of each API method to be crisp. I don't want anyone to be able to "think"/"guess" that a key is valid because the API accepts it without sending an error.

A different use case is what I got from one of our developers today:

"I'm creating a page with a title and a slug but the API doesn't accept the slug. It gives no error, it just creates a slug based on the title instead."

Turns out the user sent the following JSON payload:

{
   "page_title": "My Page Title",
   "slug": "a-specific-slug"
}

But our API wants page_title and page_slug. However, supplying the slug is optional. So it doesn't complain if the slug isn't present, it just defaults to creating a slug from the title.

The backup option for us is to force page_slug key, but letting it be False if you want to auto-generate it. Perhaps thats what we'll go with.

So that's my use case.

ramnes · 2018-05-28T14:33:09Z

Somehow related to #595, which is the exact opposite. Just in case this issue is reopened one day and we can come with a solution that covers both needs.

deckar01 · 2018-05-28T16:37:56Z

There has been some recent discussion about how only and exclude handle unknown field names. In 2.x they are silently ignored, but in #636 raising errors to help developers catch bugs was considered as a stronger primary use case than dynamically generated data that contains unnecessary fields.

lafrech · 2018-05-28T19:30:44Z

Aren't these independent things?

This and Implement tolerant deserialization to allow unknown fields #595 is about dealing with unknown keys (no field for that key) at serialization/deserialization time.
only and exclude act at instantiation time.

Just because only can't add fields at instantiation time, doesn't mean the Schema can't pass transparently data with unknown keys.

Or maybe you meant it is technically independent, but it would be more consistent to apply a similar logic here. Or maybe I'm just confused.

deckar01 · 2018-05-28T22:42:38Z

I thought this could use a second look since we are starting to favor strictness by default when it helps detect common developer mistakes. Ignoring extra data is a valid use case, but it can lead to silent data loss.

As general rules, I think:

We should warn developers when it looks like they have made a mistake
Developers should have to opt into destructive operations explicitly

lafrech · 2018-05-29T07:19:42Z

OK, I get your point. I don't mind reopening this.

We should warn developers when it looks like they have made a mistake

Yes, as much as possible.

Developers should have to opt into destructive operations explicitly

Alright. Regarding this specific case, I think there's no safe side of the fence. Passing transparently unknown fields car also be a security flaw. I rely on my schemas to guard the entrance of my database. I'd hate to realize that I've been accepting data from field I did not explicitly declare, that might even override DB attribute that are meant to be dealt with internally. In fact, I prefer to reject too much than to accept too much, but that's a matter of use case.

The feature in #595 could be nice, but I don't think I'd make it default. Especially since it provides no validation. (Well, unless we enhance it by adding the possibility to specify a Default field for unknown keys).

deckar01 · 2018-05-29T16:33:40Z

I feel like I have seen this in another serialization library before. I can't seem to find a reference at the moment.

extra
- "warn" (default)
- "ignore"
- "allow"

Edit: Found it. https://github.com/alecthomas/voluptuous#extra-dictionary-keys

sloria · 2018-05-30T03:49:51Z

Colander has similar options:

        ``unknown`` controls the behavior of this type when an unknown
        key is encountered in the cstruct passed to the
        ``deserialize`` method of this instance.  All the potential
        values of ``unknown`` are strings.  They are:
        - ``ignore`` means that keys that are not present in the schema
          associated with this type will be ignored during
          deserialization.
        - ``raise`` will cause a :exc:`colander.Invalid` exception to
          be raised when unknown keys are present in the cstruct
          during deserialization.
        - ``preserve`` will preserve the 'raw' unknown keys and values
          in the appstruct returned by deserialization.
        Default: ``ignore``.

https://docs.pylonsproject.org/projects/colander/en/latest/api.html#colander.Mapping

lafrech · 2018-05-30T07:22:43Z

This API makes sense. Currently, Marshmallow only provides ignore option. We could provide those three options.

We could even go further and let the user set a default Field when using preserve/allow, but I'm not sure it is so useful, and it is never too late to add it in a later step.

nicktimko · 2018-05-31T16:13:10Z

The recipe provided also falls apart when using the load_from option in the fields. I'm also wondering if it would then be the responsibility of @pre_loaders to drop keys if transforming them into other fields.

sloria · 2018-06-01T03:05:58Z

Here's a more thorough review of this behavior in other validation libraries: https://gist.github.com/sloria/2fac357710f13e855a5b9cf8f05a4da0

ptdel · 2018-06-09T01:28:55Z

just bumping this to show interest, this would be nice functionality to have. In our usage of marshmallow, we have shared schemas that are used to both validate serialized data coming too and from our database, but also for the url parameters of the api fronting this data to ensure request arguments coming in for those columns are valid query arguments. For my data tier I am happy to have unexpected things be dropped, however when validating query parameters at the web request level I would like to be able to tell people interacting with the API what invalid arguments they supplied.

sloria · 2018-06-09T16:09:33Z

Yeah, I think we should move forward with this.

We will need to decide on a default:

raise: Raise an error for unknown fields (like voluptuous, schema, cerberus, and schematics)
remove: Don't allow unknown fields through, but don't error (like DRF, colander). This is the current behavior.
allow: Allow unknown fields through

We also need to decide on an API. A class Meta option probably makes sense.

from marshmallow import RAISE, REMOVE, ALLOW

class UserSchema(Schema):
    class Meta:
        unknown = ALLOW

Thoughts?

ramnes · 2018-06-09T19:10:28Z

As we discussed in my PR, I really lean toward specifying that kind of behavior at instanciation time:

UserSchema(unknown=ALLOW)

It allows one schema to be used in different contexts, where you might want different behaviors on unknown keys. Also, it seems to me that the implementation would be at least as simple.

ptdel · 2018-06-09T22:54:31Z

I agree with @ramnes that the ability to designate behavior per-context would be very useful, and schema re-use would be very straightforward. It more or less fits my use-case exactly 💸 .

ramnes · 2018-06-11T16:34:55Z

There you go: #838

😉

sloria · 2018-06-14T04:19:19Z

We still need to decide on a default. I think we should rule out "allow"--it is too permissive for the common use case. That leaves "ignore" and "raise".

Let's take a poll:

Add an emoji reaction to this comment with your vote.

😄 = "ignore" - Discard unknown fields but don't error. This is the current behavior. DRF, Colander, and Valideer have this as the default.
🎉 = "raise" - Raise an error for unknown fields. Voluptuous, Cerberus, schema, and Schematics have this as the default.

nicktimko · 2018-06-14T04:47:29Z

If you do anything other than keep the current behavior, I'd imagine that would delay the release to a minor or possibly a major version. Is that the plan regardless?

If backwards compatible changes can provide the features faster in a future patch, I would much prefer that, then maybe upset the status quo in a major.

lafrech · 2018-06-14T06:47:11Z

This change will be applied to Marshmallow v3 (major version) in any case.

It could be backported to Marshmallow v2 with "ignore" as default, but I don't think this is planned.

sloria · 2018-06-14T16:02:57Z

Another thought: REMOVE might be a more clear name than IGNORE. Users might be confused about the difference between ALLOW and IGNORE.

nicktimko · 2018-06-14T18:14:40Z

🚲🏠
"Do what with unknown fields?"

IGNORE unknown fields, REMOVE unknown fields, DROP unknown fields
RAISE an error, FAIL on them,
ALLOW them (and do what?), ADD them (where...), COLLECT.../shrug

ramnes · 2018-06-14T18:19:05Z

Yeah, to be honest I don't give much importance to those three values naming, it's not like people are going to use them all the time; they'll just check the doc or use their autocompletion. Whatever you guys prefer will do it just fine.

lafrech · 2018-06-14T19:34:34Z

OK, my turn. I like DROP, RAISE, and what about PASS?

BTW, there could be an evolution in the future allowing to specify a default Field to use for unknown keys. Just a suggestion I made once, although I have no interest in it today. I thought I'd mention that possibility in case some potential names would fit better than others with it.

sloria · 2018-06-14T20:37:52Z

I like DROP, RAISE, and what about PASS?

I like those.

BTW, there could be an evolution in the future allowing to specify a default Field to use for unknown keys.

Sure, we could even add support for unknown=fields.Str() in the future. I don't think it's something we should add now, though.

tuukkamustonen · 2018-06-15T07:18:21Z

REMOVE only if the library actually manipulates data and removes the unspecified fields. If it simply ignores those, but doesn't actually remove them, then IGNORE is better.

lafrech · 2018-06-15T07:24:01Z

Input data is not modified, so unknown keys are not removed from it. IGNORE might be ambiguous as it could be understood as "don't validate", which is PASS. Hence, DROP, as data is being passed but unknown keys are dropped in the process.

tuukkamustonen · 2018-06-15T07:58:34Z

Doesn't kwarg name unknown=... already tell what we are ignoring (ie. not validating) - just the unknown keys? Other keys are processed and validated, sure.

ptdel · 2018-06-15T19:59:23Z

would it be a high level of effort to make it an addition to the typical MarshalResult() to be handled .unknown or something of that likeness? I would take PASS and RAISE and opt to use @pre_* and @post_* functions to handle them (possibly for further validation to try and rectify typos, etc with then intention of extending into .data).

sloria · 2018-06-16T10:32:51Z

Doesn't kwarg name unknown=... already tell what we are ignoring (ie. not validating)

@tuukkamustonen Yes, but "ignoring" could still be conflated with "ignore validation without removing", which is why DROP is clearer.

would it be a high level of effort to make it an addition to the typical MarshalResult() to be handled .unknown or something of that likeness?

@ptdel MarshalResult no longer exists in marshmallow 3 because dump returns the serialized data dictionary. Would @lafrech 's proposal above re: specifying a field for unknown fields meet your use case?

ptdel · 2018-06-17T15:16:10Z

@sloria yeah actually I like @lafrech 's idea. if it's some sort of catch-all dictionary that contains any unexpected key:value pairs then it's easy to interact with through @pre_ and @post_ decorators.

sloria · 2018-06-23T13:32:17Z

This is closed by #838 . Thanks everyone for your input. I've created follow-up issues for the changes/enhancements discussed in this issue and the PR.

Change default for unknown to RAISE #851 - Change default to RAISE
Make the error messages for "unknown fields" and "invalid data type" configurable #852 - Make the error message configurable
Allow specifying output field when using unknown=INCLUDE #853 - Specify output field

BOPOHOB · 2018-08-22T11:09:21Z

Product version of our project was broken because of this commit))

lafrech · 2018-08-22T11:51:29Z

@BOPOHOB, not sure what you mean, here.

This feature was included in a 3.0 beta version.

For production purpose, the 2.x branch is recommended.

If you're using 3.x beta, you should expect things to break from times to times.

Also, this feature itself shouldn't be breaking. The breaking change is RAISE becoming default (#851).

BOPOHOB · 2018-08-22T12:06:35Z

@lafrech Yes, it was my bad) But we did not expect such changes in 3.0.0b subminor

deckar01 · 2018-08-22T19:40:00Z

@nicktimko Keep the conversation constructive please.

https://marshmallow.readthedocs.io/en/dev/code_of_conduct.html

If someone discovers marshmallow via the github page, the readme instructs them to install the prerelease version without any warning that it is a beta version. We might want to consider making it a little clearer.

https://github.com/marshmallow-code/marshmallow#get-it-now

Siva-Karthi · 2019-03-27T10:23:46Z

in marshmallow V3 you can do it using metsa class like below
from marshmallow import Schema, INCLUDE

class UserSchema(Schema):
class Meta:
unknown = RAISE
https://marshmallow.readthedocs.io/en/3.0/quickstart.html#handling-unknown-fields

jfaleiro · 2019-05-24T14:00:19Z

Any specific reason on why this feature is available on load but not on validate?

def test_validate(self):
        class ModelSchema(Schema):
            class Meta:
                unknown = RAISE
            name = fields.String()

        schema = ModelSchema()
        data = dict(name='jfaleiro', xyz=2)
        schema.validate(data) # OK
        schema.load(data) # fail

sloria · 2019-05-29T02:17:06Z

@jfaleiro This feature is avaialable on validate. The validate method returns the error dictionary rather than raising an exception, so if you print the return value of schema.validate(data) in your example, you will see

{'xyz': ['Unknown field.']}

mirko · 2019-11-08T00:06:48Z

Any specific reason on why this feature is available on load but not on validate?

I guess I'm having somehow the same question: I'd just like to validate, but ignore unknown fields.
So I figured validate(.., unknown=EXCLUDE) would do it - but unknown doesn't seem to be a supported argument on validate().
Since I try to follow the python way of not using Exceptions for program flow, I figured that'd be the way to go. Am I mistaken? Shall I use load() and catch ValidationError instead?

lafrech mentioned this issue Sep 30, 2016

"load_only" fields #87

Closed

sloria mentioned this issue Oct 5, 2016

Add option to fail on unexpected fields #515

Closed

sloria closed this as completed Oct 5, 2016

lafrech mentioned this issue Mar 13, 2017

Turn off automatic skipping of unknown fields #127

Closed

sebpiq mentioned this issue Jun 9, 2017

validates_schema to reject unknown fields with pass_many=True #643

Closed

lafrech mentioned this issue Apr 3, 2018

How is deserialization affected by only and partial? #535

Open

deckar01 reopened this May 28, 2018

sloria added this to the 3.0 milestone May 28, 2018

sloria added the feedback welcome label Jun 9, 2018

ramnes mentioned this issue Jun 11, 2018

Handle unknown fields with EXCLUDE, INCLUDE or RAISE #838

Merged

This was referenced Jun 23, 2018

Change default for unknown to RAISE #851

Closed

Allow specifying output field when using unknown=INCLUDE #853

Open

sloria closed this as completed Jun 23, 2018

sloria mentioned this issue Jul 5, 2018

New feature for iterating over fields_dict in marshalling/unmarshalling #718

Closed

csirmazbendeguz mentioned this issue Oct 17, 2019

ValidationError is not raised on unknown query parameters marshmallow-code/webargs#435

Closed

Add option to throw validation error on extra keys #524

Add option to throw validation error on extra keys #524

Comments

tuukkamustonen commented Sep 9, 2016

lafrech commented Sep 9, 2016

tuukkamustonen commented Sep 9, 2016

lafrech commented Oct 2, 2016

sloria commented Oct 5, 2016

barakalon commented Apr 22, 2017

tuukkamustonen commented May 3, 2017 • edited Loading

bivald commented Mar 20, 2018

ramnes commented May 28, 2018

deckar01 commented May 28, 2018

lafrech commented May 28, 2018

deckar01 commented May 28, 2018

lafrech commented May 29, 2018

deckar01 commented May 29, 2018 • edited Loading

sloria commented May 30, 2018

lafrech commented May 30, 2018

nicktimko commented May 31, 2018

sloria commented Jun 1, 2018

ptdel commented Jun 9, 2018 • edited Loading

sloria commented Jun 9, 2018

ramnes commented Jun 9, 2018

ptdel commented Jun 9, 2018

ramnes commented Jun 11, 2018

sloria commented Jun 14, 2018

nicktimko commented Jun 14, 2018

lafrech commented Jun 14, 2018

sloria commented Jun 14, 2018

nicktimko commented Jun 14, 2018

ramnes commented Jun 14, 2018 • edited Loading

lafrech commented Jun 14, 2018

sloria commented Jun 14, 2018

tuukkamustonen commented Jun 15, 2018

lafrech commented Jun 15, 2018

tuukkamustonen commented Jun 15, 2018

ptdel commented Jun 15, 2018

sloria commented Jun 16, 2018 • edited Loading

ptdel commented Jun 17, 2018

sloria commented Jun 23, 2018

BOPOHOB commented Aug 22, 2018

lafrech commented Aug 22, 2018 • edited Loading

BOPOHOB commented Aug 22, 2018

deckar01 commented Aug 22, 2018

Siva-Karthi commented Mar 27, 2019 • edited Loading

jfaleiro commented May 24, 2019 • edited Loading

sloria commented May 29, 2019

mirko commented Nov 8, 2019

tuukkamustonen commented May 3, 2017 •

edited

Loading

deckar01 commented May 29, 2018 •

edited

Loading

ptdel commented Jun 9, 2018 •

edited

Loading

ramnes commented Jun 14, 2018 •

edited

Loading

sloria commented Jun 16, 2018 •

edited

Loading

lafrech commented Aug 22, 2018 •

edited

Loading

Siva-Karthi commented Mar 27, 2019 •

edited

Loading

jfaleiro commented May 24, 2019 •

edited

Loading