New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

only is not bound by declared and additional fields on serializaton #636

Closed
jan-23 opened this Issue May 30, 2017 · 13 comments

Comments

Projects
None yet
5 participants
@jan-23

jan-23 commented May 30, 2017

The only parameter is correctly bound to the fields that are defined in Meta.fields (#183):

class MySchema(Schema):
    class Meta:
        fields = ('a', )

MySchema(only=('b', )).dump({'a': 1, 'b': 2}).data == {}

Strangely it is not bound to the declared and additional fields:

class MySchema(Schema):
    a = fields.Str()
    class Meta:
        additional = ('b', )

MySchema(only=('c', )).dump({'a': 1, 'b': 2, 'c': 3}).data == {'c': 3}

It would be more consistent if the second example also returns an empty dict.

@lafrech lafrech added the bug label May 14, 2018

@lafrech

This comment has been minimized.

Member

lafrech commented May 14, 2018

I can reproduce with latest 3.x revision.

@sloria

This comment has been minimized.

Member

sloria commented May 20, 2018

I was also able to reproduce this on the 2.x-line branch.

@sloria

This comment has been minimized.

Member

sloria commented May 20, 2018

That said, it looks like the fix for this in #824 is backwards-incompatible, so this might need to go in 3.x without a backport. See the removed tests in #824. Instead of erroring upon serialization when an invalid field is passed to only, it passes silently.

@deckar01

This comment has been minimized.

Member

deckar01 commented May 20, 2018

I'm not sure passing silently is a desirable behavior. If a developer has a typo in one of their only parameters, it is useful to communicate this to them with an error instead of hoping they notice it and requiring them debug the source of the data loss. That is how it currently works as long as the "invalid" key doesn't happen to match some data that is being dumped.

I think the bug here is that the second example isn't errorring. There is nothing in the documentation to indicate that only parameters that are not in the schema should act like additional fields and pass through without validation. I think we should proactively validate the only values against the fields instead of relying on missing data to indirectly cause an error.

@ikilledthecat

This comment has been minimized.

Contributor

ikilledthecat commented May 21, 2018

I agree throwing an error is definitely better. But another test test_only_bounded_by_fields will start failing in #824.

def test_only_bounded_by_fields():
    class MySchema(Schema):

        class Meta:
            fields = ('foo', )

    sch = MySchema(only=('baz', ))
    assert sch.dump({'foo': 42}) == {}

I tried to minimize the number of tests I had to remove/change.

@deckar01

This comment has been minimized.

Member

deckar01 commented May 21, 2018

fields is part of the declared fields, so it or additional would need to be considered when validating only.

@ikilledthecat

This comment has been minimized.

Contributor

ikilledthecat commented May 21, 2018

@deckar01 sorry I couldn't quite get the point you're making.

Just to clarify and expand on what I was trying to point out ealier, in the test, MySchema has a single field foo. The only parameter passed in is baz. The question is if we should throw an error here. I had allowed only params to be dropped in #824 so I wouldn't have to drop this test. I think this would be big change. If everyone agrees that we should raise an error here then i can push in another commit.

@deckar01

This comment has been minimized.

Member

deckar01 commented May 22, 2018

I think it is more important for breaking changes to improve the developer experience than minimize the number of test that have to be changed.

Here is some insight into the discussion that bound only to fields:

I agree that only should not cause the Schema to access attributes that are not defined by the fields option.

What is the desired behavior if passing a field name to only that isn't defined in fields? Should an error be raised? Perhaps a ValueError or a RuntimeError? No error?


The suggested behavior is implemented: only is bounded by the fields class Meta option.

If you need to raise an error when invalid fields fare passed, you can do something like: #273 (comment)

#183 (comment)

@sloria

This comment has been minimized.

Member

sloria commented May 22, 2018

I agree with @deckar01 . I was under the impression that this could be fixed in a compatible way on 2.x . But since this requires a breaking change, we should do it the right way, which would be to raise an error if an invalid field is passed.

@deckar01

This comment has been minimized.

Member

deckar01 commented May 22, 2018

How should the error be worded? "foo" is not a valid field for MySchema maybe? Should we warn about multiple invalid fields at once? "foo", "bar", and "baz" are invalid fields for MySchema?

@ikilledthecat

This comment has been minimized.

Contributor

ikilledthecat commented May 22, 2018

I think its best to warn about multiple invalid fields at once. I have made similar errors in the past and its always frustrating to see one typo at a time.

@deckar01

This comment has been minimized.

Member

deckar01 commented May 22, 2018

Would it make sense to validate exclude the same way?

@lafrech

This comment has been minimized.

Member

lafrech commented May 22, 2018

Related to #698.

There seems to be an agreement on validating only, raising errors if names are passed that don't match a field (be it declared explicitly or added with fields or additional).

To be more explicit, both examples in OP would fail with "b is not a valid field for MySchema" or "c is not...".

I think exclude should be treated the same way.

(The only use case I can think of that would be broken is when using programmatically defined schemas where it might be easier for the caller to pass a list of fields names to exclude and let the schema just ignore fields that don't exist. For instance, the caller could pass exclude=('blacklisted_field', ) to all his schemas without being sure the field actually exists in all of them. But this sounds like an edge case and I'm not convinced it is worth consideration. Besides, it might be doable with a Schema subclass. So I'm happy with an exception being raised.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment