New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Return deserialized data when strict=True #598

Closed
sloria opened this Issue Mar 10, 2017 · 10 comments

Comments

Projects
None yet
4 participants
@sloria
Member

sloria commented Mar 10, 2017

Status quo

load currently returns a namedtuple (UnmarshalResult) containing both the deserialized data and error messages.

from marshmallow import Schema, fields, ValidationError

class UserSchema(Schema):
    id = fields.Int(required=True)

    class Meta:
        strict = True


schema = UserSchema()
try:
    result = schema.load({'id': 42})
except ValidationError as e:
    print(e.messages)
assert result.data['id'] == 42

Proposed API

When strict=True, return only the deserialized data. Messages can already be accessed from ValidationError.messages.

from marshmallow import Schema, fields, ValidationError

class UserSchema(Schema):
    id = fields.Int(required=True)

    class Meta:
        strict = True


schema = UserSchema()
try:
    data = schema.load({'id': 42})
except ValidationError as e:
    print(e.messages)
assert data['id'] == 42
@douglas-treadwell

This comment has been minimized.

douglas-treadwell commented Mar 10, 2017

I am in favor of strict=True by default, but I think the API for load() returning different KINDS of data depending on configuration might be confusing.

@douglas-treadwell

This comment has been minimized.

douglas-treadwell commented Mar 10, 2017

I think it's better if methods always return the same "kind" or "shape" of data. However, here's an alternative...

What about a second method such as .loaddata() which only retrieves the data and ignores the errors? This way there's a shorthand for people who don't care about the errors. Obviously also the corresponding .loadsdata(). I suppose this isn't much shorter though than .load().data... maybe your current API here is plenty good!

@maximkulkin

This comment has been minimized.

Contributor

maximkulkin commented Mar 10, 2017

@douglas-treadwell I vote not to do that. Once Marshmallow goes this route, there are plenty of option combinations to introduce as a new methods: loaddata(), loadmany(), loadstrictmany()...

@sloria I would rather opt to deprecate strict option at all and just do validation error exceptions all the time. Then load() could return just data in an nonconfusing way

@douglas-treadwell

This comment has been minimized.

douglas-treadwell commented Mar 10, 2017

@maximkulkin I agree and withdraw my suggestion about loaddata(). It's perfectly fine to do load().data.

@sloria

This comment has been minimized.

Member

sloria commented Mar 14, 2017

I like @maximkulkin 's suggestion. So we would

  1. Implement #377 and make strict mode the default
  2. Modify load and dump return (de)serialized data rather than a (Un)MarshalResult (as in the proposed API above
  3. Raise a DeprecationWarning if strict=False is passed to either the Schema constructor or as a class Meta option
@sloria

This comment has been minimized.

Member

sloria commented Mar 22, 2017

One use case that would be missed if we deprecate strict=False would be accessing partially valid data. You can currently do the following:

from marshmallow import Schema, fields

class MySchema(Schema):
    always_valid = fields.Field()
    always_invalid = fields.Field(validate=[lambda val: False])


sch = MySchema()

data = {'always_valid': 42, 'always_invalid': 24}

res = sch.load(data)

# Partially valid data
print(res.data)  # => {'always_valid': 42}
# Error messages
print(res.errors)  # => {'always_invalid': ['Invalid value.']}

A simple solution would be to store partially valid data on ValidationError, so that you could do:

try:
    data = schema.load(data)
except ValidationError as err:
    # Partially valid data
    valid_data = err.valid_data
    # Error messages
    errors = err.messages
    # Raw input data
    raw_data = err.data

sloria added a commit that referenced this issue Mar 23, 2017

@sloria

This comment has been minimized.

Member

sloria commented Mar 23, 2017

I've gone ahead and implemented the valid_data attribute as proposed in my last comment, so the "partially valid data" use case is addressed whether or not we go ahead with the original proposal.

rowillia added a commit to rowillia/marshmallow that referenced this issue Aug 3, 2017

lafrech added a commit to Nobatek/marshmallow that referenced this issue Jan 3, 2018

Return deserialized data rather than data + errors
If errors are detected, ValidationError is raised which contains errors
in its 'messages' attribute.
Remove MarshalResult and UnmarshalResult
Closes marshmallow-code#598

lafrech added a commit to Nobatek/marshmallow that referenced this issue Jan 4, 2018

Return deserialized data rather than data + errors
If errors are detected, ValidationError is raised which contains errors
in its 'messages' attribute.
Remove MarshalResult and UnmarshalResult
Closes marshmallow-code#598
@lafrech

This comment has been minimized.

Member

lafrech commented Jan 4, 2018

To anyone interested in this, I've been working on it in #711. I believe it is almost ready.

The docs still need to be updated.

I'm unsure about how validate should behave.

Currently, by default, I kept current behaviour: it will raise ValidationError iff strict is True. And since strict is now always True, it always raises. Basically, it is now pretty much the same as load, but without the post_load methods:

    def load(self, data, many=None, partial=None):
        return self._do_load(data, many, partial=partial, postprocess=True)

   def validate(self, data, many=None, partial=None):
        self._do_load(data, many, partial=partial, postprocess=False)

Maybe it would be more useful if it caught ValidationError and returned an error dict (which could be interpreted as a boolean by the caller).

You could do

    if MySchema().validate(input_data)

or

    errors = MySchema().validate(input_data)

For instance, you'd do:

    errors = MySchema().validate(input_data)
    if errors:
        ... # Manage errors
    else:
        ...  # Proceed

rather than

    try:
        MySchema().validate(input_data)
    except ValidationError as exc:
        errors = exc.errors
        ...  # Manage errors
    else:
        ...  # Proceed

I favor try/except logics when the "normal" workflow does not trigger an exception, but if the user explicitly calls validate to check before acting, it makes sense to return a value rathen than raising an exception.

@sloria

This comment has been minimized.

Member

sloria commented Jan 4, 2018

I favor try/except logics when the "normal" workflow does not trigger an exception, but if the user explicitly calls validate to check before acting, it makes sense to return a value rathen than raising an exception.

I agree; it's intuitive for validate() to return errors rather than raise an error.

@lafrech

This comment has been minimized.

Member

lafrech commented Jan 4, 2018

I agree; it's intuitive for validate() to return errors rather than raise an error.

Great. Done.

@sloria sloria closed this in 389a38d Jan 12, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment