Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Return deserialized data when strict=True #598

Closed
sloria opened this issue Mar 10, 2017 · 10 comments
Closed

RFC: Return deserialized data when strict=True #598

sloria opened this issue Mar 10, 2017 · 10 comments

Comments

@sloria
Copy link
Member

sloria commented Mar 10, 2017

Status quo

load currently returns a namedtuple (UnmarshalResult) containing both the deserialized data and error messages.

from marshmallow import Schema, fields, ValidationError

class UserSchema(Schema):
    id = fields.Int(required=True)

    class Meta:
        strict = True


schema = UserSchema()
try:
    result = schema.load({'id': 42})
except ValidationError as e:
    print(e.messages)
assert result.data['id'] == 42

Proposed API

When strict=True, return only the deserialized data. Messages can already be accessed from ValidationError.messages.

from marshmallow import Schema, fields, ValidationError

class UserSchema(Schema):
    id = fields.Int(required=True)

    class Meta:
        strict = True


schema = UserSchema()
try:
    data = schema.load({'id': 42})
except ValidationError as e:
    print(e.messages)
assert data['id'] == 42
@douglas-treadwell
Copy link

I am in favor of strict=True by default, but I think the API for load() returning different KINDS of data depending on configuration might be confusing.

@douglas-treadwell
Copy link

douglas-treadwell commented Mar 10, 2017

I think it's better if methods always return the same "kind" or "shape" of data. However, here's an alternative...

What about a second method such as .loaddata() which only retrieves the data and ignores the errors? This way there's a shorthand for people who don't care about the errors. Obviously also the corresponding .loadsdata(). I suppose this isn't much shorter though than .load().data... maybe your current API here is plenty good!

@maximkulkin
Copy link
Contributor

maximkulkin commented Mar 10, 2017

@douglas-treadwell I vote not to do that. Once Marshmallow goes this route, there are plenty of option combinations to introduce as a new methods: loaddata(), loadmany(), loadstrictmany()...

@sloria I would rather opt to deprecate strict option at all and just do validation error exceptions all the time. Then load() could return just data in an nonconfusing way

@douglas-treadwell
Copy link

@maximkulkin I agree and withdraw my suggestion about loaddata(). It's perfectly fine to do load().data.

@sloria
Copy link
Member Author

sloria commented Mar 14, 2017

I like @maximkulkin 's suggestion. So we would

  1. Implement Strict Schema Validation should be enabled by default #377 and make strict mode the default
  2. Modify load and dump return (de)serialized data rather than a (Un)MarshalResult (as in the proposed API above
  3. Raise a DeprecationWarning if strict=False is passed to either the Schema constructor or as a class Meta option

@sloria
Copy link
Member Author

sloria commented Mar 22, 2017

One use case that would be missed if we deprecate strict=False would be accessing partially valid data. You can currently do the following:

from marshmallow import Schema, fields

class MySchema(Schema):
    always_valid = fields.Field()
    always_invalid = fields.Field(validate=[lambda val: False])


sch = MySchema()

data = {'always_valid': 42, 'always_invalid': 24}

res = sch.load(data)

# Partially valid data
print(res.data)  # => {'always_valid': 42}
# Error messages
print(res.errors)  # => {'always_invalid': ['Invalid value.']}

A simple solution would be to store partially valid data on ValidationError, so that you could do:

try:
    data = schema.load(data)
except ValidationError as err:
    # Partially valid data
    valid_data = err.valid_data
    # Error messages
    errors = err.messages
    # Raw input data
    raw_data = err.data

@sloria
Copy link
Member Author

sloria commented Mar 23, 2017

I've gone ahead and implemented the valid_data attribute as proposed in my last comment, so the "partially valid data" use case is addressed whether or not we go ahead with the original proposal.

@lafrech
Copy link
Member

lafrech commented Jan 4, 2018

To anyone interested in this, I've been working on it in #711. I believe it is almost ready.

The docs still need to be updated.

I'm unsure about how validate should behave.

Currently, by default, I kept current behaviour: it will raise ValidationError iff strict is True. And since strict is now always True, it always raises. Basically, it is now pretty much the same as load, but without the post_load methods:

    def load(self, data, many=None, partial=None):
        return self._do_load(data, many, partial=partial, postprocess=True)

   def validate(self, data, many=None, partial=None):
        self._do_load(data, many, partial=partial, postprocess=False)

Maybe it would be more useful if it caught ValidationError and returned an error dict (which could be interpreted as a boolean by the caller).

You could do

    if MySchema().validate(input_data)

or

    errors = MySchema().validate(input_data)

For instance, you'd do:

    errors = MySchema().validate(input_data)
    if errors:
        ... # Manage errors
    else:
        ...  # Proceed

rather than

    try:
        MySchema().validate(input_data)
    except ValidationError as exc:
        errors = exc.errors
        ...  # Manage errors
    else:
        ...  # Proceed

I favor try/except logics when the "normal" workflow does not trigger an exception, but if the user explicitly calls validate to check before acting, it makes sense to return a value rathen than raising an exception.

@sloria
Copy link
Member Author

sloria commented Jan 4, 2018

I favor try/except logics when the "normal" workflow does not trigger an exception, but if the user explicitly calls validate to check before acting, it makes sense to return a value rathen than raising an exception.

I agree; it's intuitive for validate() to return errors rather than raise an error.

@lafrech
Copy link
Member

lafrech commented Jan 4, 2018

I agree; it's intuitive for validate() to return errors rather than raise an error.

Great. Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants