New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make fields deserialization more strict ? #231

Closed
touilleMan opened this Issue Jun 29, 2015 · 9 comments

Comments

Projects
None yet
5 participants
@touilleMan

touilleMan commented Jun 29, 2015

Working with the library, it appears to me that the deserialization module has a lack of strictness from my point of view:

class DocSchema(Schema):
    str = fields.String()
   bool = fields.Boolean()

# Ok, None is not implicitly acceptable
DocSchema().load({'str': None, 'bool': None})
# UnmarshalResult(data={}, errors={'str': ['Field may not be null.'], 'bool': ['Field may not be null.']})

# But it's not the case for numbers
DocSchema().load({'str': 42, 'bool': 42})
# UnmarshalResult(data={'str': '42', 'bool': True}, errors={})

# Or for any complex structure (list, dict)
DocSchema().load({'str': {}, 'bool': {}})
# UnmarshalResult(data={'str': '{}', 'bool': False}, errors={})

Is there a way to enforce real strict validation ? Passing a dict as argument of a string field without getting any error sounds like a silent fail for me...

@zachguo

This comment has been minimized.

zachguo commented Jul 20, 2015

👍 I'm having same issues.

@density

This comment has been minimized.

density commented Aug 9, 2015

I think you can fix this by creating your own Field subclass and implementing a stricter _deserialize method. But I agree that the Fields should not be so forgiving. Would love to hear @sloria's view on this -- happy to help implement it if it's what the people want :)

@sloria sloria added this to the 2.0.0 (final) milestone Aug 25, 2015

@sloria

This comment has been minimized.

Member

sloria commented Sep 11, 2015

I agree with the premise of this issue.

For frame of reference, I wrote a script to compare the outputs of marshmallow, colander, and django-rest-framework:

DATA = {'foo': 42, 'bar': 24}

### marshmallow ###

from marshmallow import Schema, fields

class MySchema(Schema):
    foo = fields.Str()
    bar = fields.Bool()

data, errors = MySchema().load(DATA)

print('--- marshmallow ---')
print(errors)

### colander ###

import colander as col

class MySchema(col.MappingSchema):
    foo = col.SchemaNode(col.String())
    bar = col.SchemaNode(col.Boolean())

print('--- colander ---')
try:
    MySchema().deserialize(DATA)
except col.Invalid as err:
    print(err.asdict())
except:
    pass

### DRF ###

from django.conf import settings
settings.configure(
    INSTALLED_APPS=(
        'rest_framework',
    )
)
import django
django.setup()

from rest_framework import serializers as ser


class MySer(ser.Serializer):

    foo = ser.CharField()
    bar = ser.BooleanField()


ser = MySer(data=DATA)
ser.is_valid()
print('--- DRF ---')
print(ser.errors)

Output

--- marshmallow ---
{}
--- colander ---
{'foo': "42 is not a string: {'foo': ''}"}
--- DRF ---
{'bar': ['"24" is not a valid boolean.']}

Strangely enough, colander considers numbers invalid input for String but valid input for Boolean.
DRF is the exact opposite: numbers are valid String input but invalid Boolean input.

When the inputs are changed to dicts or collections, colander and DRF both crash with TypeError: unhashable type.

As suggested by OP, numbers, dictionaries, and collections should probably not be considered valid input for String and Boolean fields.

I've started work on this in the strictness branch. With these changes, the output of the script becomes:

--- marshmallow ---
{'foo': ['Not a valid string.'], 'bar': ['Not a valid boolean.']}
--- colander ---
{'foo': "42 is not a string: {'foo': ''}"}
--- DRF ---
{'bar': ['"24" is not a valid boolean.']}

I welcome any further feedback.

@density

This comment has been minimized.

density commented Sep 11, 2015

Sounds good to me. Interesting that the others got it wrong as well. I'm assuming we won't be accepting strings where a number or boolean is expected either right?

@sloria

This comment has been minimized.

Member

sloria commented Sep 12, 2015

@density Correct.

I've also made error messages more consistent:

from collections import OrderedDict

DATA = OrderedDict([
    ('foo', 42),  # invalid string
    ('bar', 24),  # invalid bool
    ('baz', 'invalid-integer'),
    ('qux', 'invalid-float'),
    ('spam', 'invalid-decimal'),
    ('eggs', 'invalid-datetime'),
])

def print_errors(err_dict):
    for field, msg in DATA.items():
        if field in err_dict:
            print('{field}:   {msg}'.format(field=field, msg=err_dict[field]))

### marshmallow ###

from marshmallow import Schema, fields

class MSchema(Schema):
    foo = fields.Str()
    bar = fields.Bool()
    baz = fields.Int()
    qux = fields.Float()
    spam = fields.Decimal(2, 2)
    eggs = fields.DateTime()

_, errors = MSchema().load(DATA)

print('--- marshmallow ---')
print_errors(errors)

### colander ###

import colander as col

class CSchema(col.MappingSchema):
    foo = col.SchemaNode(col.String())
    bar = col.SchemaNode(col.Boolean())
    baz = col.SchemaNode(col.Integer())
    qux = col.SchemaNode(col.Float())
    spam = col.SchemaNode(col.Decimal(2, 2))
    eggs = col.SchemaNode(col.DateTime())

print('--- colander ---')
try:
    CSchema().deserialize(DATA)
except col.Invalid as err:
    print_errors(err.asdict())
except:
    pass

### DRF ###

from django.conf import settings
settings.configure(INSTALLED_APPS=['rest_framework'])
import django
django.setup()

from rest_framework import serializers as ser

class MySer(ser.Serializer):
    foo = ser.CharField()
    bar = ser.BooleanField()
    baz = ser.IntegerField()
    qux = ser.FloatField()
    spam = ser.DecimalField(2, 2)
    eggs = ser.DateTimeField()

ser = MySer(data=DATA)
ser.is_valid()
print('--- DRF ---')
print_errors(ser.errors)

Output:

--- marshmallow ---
foo:   ['Not a valid string.']
bar:   ['Not a valid boolean.']
baz:   ['Not a valid integer.']
qux:   ['Not a valid number.']
spam:   ['Not a valid number.']
eggs:   ['Not a valid datetime.']
--- colander ---
foo:   42 is not a string: {'foo': ''}
baz:   "invalid-integer" is not a number
qux:   "invalid-float" is not a number
spam:   "invalid-decimal" is not a number
eggs:   Invalid date
--- DRF ---
bar:   ['"24" is not a valid boolean.']
baz:   ['A valid integer is required.']
qux:   ['A valid number is required.']
spam:   ['A valid number is required.']
eggs:   ['Datetime has wrong format. Use one of these formats instead: YYYY-MM-DDThh:mm[:ss[.uuuuuu]][+HH:MM|-HH:MM|Z].']
@sloria

This comment has been minimized.

Member

sloria commented Sep 14, 2015

The strictness branch is now merged.

@sloria sloria closed this Sep 14, 2015

@density

This comment has been minimized.

density commented Sep 22, 2015

@sloria I think we should make fields.List strict as well. The following example throws no errors but silently converts the string into a list:

from marshmallow import Schema, fields

class TestSchema(Schema):
    l = fields.List(fields.Number())

TestSchema(strict=True).load({'l':'1234333'})
@sloria

This comment has been minimized.

Member

sloria commented Sep 22, 2015

Good catch @density . Will fix shortly.

sloria added a commit that referenced this issue Sep 22, 2015

Non-collection values are invalid `List` values
See #231 (comment)

Improves implementation of utils.is_collection
@stefanoborini

This comment has been minimized.

stefanoborini commented Feb 5, 2018

We are facing a similar lack of strictness with a String field that, when presented with a tuple, converts it into a string and stores it as such. We would like to guarantee that no repr() conversion takes place, and only legitimate strings go in the field.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment