New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested single string 'only' behavior edge case #800

Closed
timc13 opened this Issue May 8, 2018 · 9 comments

Comments

Projects
None yet
5 participants
@timc13

timc13 commented May 8, 2018

When using Nested fields, passing a single string field to only does not work as expected when applying a transform with on_bind_field

from marshmallow import fields, Schema


def test_nested_only_on_bind_field():

    def to_camel_case(snake_str):
        components = snake_str.split('_')
        return components[0] + ''.join(x.title() for x in components[1:])

    class CamelCaseSchema(Schema):
        def on_bind_field(self, field_name, field_obj):
            field_obj.data_key = to_camel_case(field_name)

    class UserRoleSchema(CamelCaseSchema):
        user_id = fields.Number()
        role_name = fields.String()

    class UserSchema(CamelCaseSchema):
        id = fields.Number()
        roles = fields.Nested(UserRoleSchema, many=True, only='role_name')

    user = {
        'id': 1,
        'roles': [
            {'user_id': 1, 'role_name': 'admin'},
            {'user_id': 1, 'role_name': 'test'},
        ]
    }

    serialized_user = UserSchema().dumps(user)

    assert serialized_user == {"id": 1, "roles": ["admin", "test"]}

This results in the following error:

/env/lib/python3.6/site-packages/marshmallow/schema.py:476: in dumps
    serialized = self.dump(obj, many=many, update_fields=update_fields)
/env/lib/python3.6/site-packages/marshmallow/schema.py:428: in dump
    **kwargs
/env/lib/python3.6/site-packages/marshmallow/marshalling.py:147: in serialize
    index=(index if index_errors else None)
/env/lib/python3.6/site-packages/marshmallow/marshalling.py:68: in call_and_store
    value = getter_func(data)
/env/lib/python3.6/site-packages/marshmallow/marshalling.py:141: in <lambda>
    getter = lambda d: field_obj.serialize(attr_name, d, accessor=accessor)
/env/lib/python3.6/site-packages/marshmallow/fields.py:250: in serialize
    return self._serialize(value, attr, obj)
/env/lib/python3.6/site-packages/marshmallow/fields.py:455: in _serialize
    return utils.pluck(ret, key=self.only)
/env/lib/python3.6/site-packages/marshmallow/utils.py:318: in pluck
    return [d[key] for d in dictlist]
KeyError: 'role_name'

This seems to be only affecting the single string case, as passing only=['role_name'] works as expected.

@timc13 timc13 changed the title from Nested only behavior edge case to Nested single string 'only' behavior edge case May 8, 2018

@taion

This comment has been minimized.

Contributor

taion commented May 13, 2018

Would it make sense to remove this support entirely for v3? ref #802

For example, only on Schema does not support a string at all.

@deckar01

This comment has been minimized.

Member

deckar01 commented May 13, 2018

I was able to reproduce this error in v3.0.0b10. I reproduced the same issue in v2.15.2 using dump_to.

Here is a simplified repro case (for v3.0.0b10):

from marshmallow import fields, Schema

class UserRoleSchema(Schema):
    user_id = fields.Number()
    role_name = fields.String(data_key='roleName')

class UserSchema(Schema):
    id = fields.Number()
    roles = fields.Nested(UserRoleSchema, many=True, only='role_name')

user = {
    'id': 1,
    'roles': [
        {'user_id': 1, 'role_name': 'admin'},
        {'user_id': 1, 'role_name': 'test'},
    ]
}

serialized_user = UserSchema().dumps(user)

The issue appears to be that only is being used by Schema.__filter_fields() as a key into the unserialized data and by Nested._serialize() as a key into the serialized data. Nested_serialize() needs to use the same data_key logic as Marshaller.serialize().

@deckar01

This comment has been minimized.

Member

deckar01 commented May 13, 2018

Would it make sense to remove this support entirely for v3?

I think modulating the Nested functionality this much based on the type of a parameter is confusing. The feature is useful though.

What about a dedicated field type to encapsulate this behavior?

class UserSchema(Schema):
    id = fields.Number()
    roles = fields.Pluck(UserRoleSchema, 'role_name', many=True)

I think that interface would communicate the behavior without needing to dig into the docs.

@taion

This comment has been minimized.

Contributor

taion commented May 14, 2018

Sorry, I don't mean removing only, I mean removing supporting a string for only, as this isn't supported on Schema. In other words, instead of supporting:

child = fields.Nested(ChildSchema, only='field_name')

Support only:

child = fields.Nested(ChildSchema, only=('field_name',))
@deckar01

This comment has been minimized.

Member

deckar01 commented May 14, 2018

@taion That is how I interpreted your original comment. Using only as a string isn't a shortcut for keeping one field, it actually replaces the object with a field value. I agree with getting rid of the string option for only, but I think we should add an interface to replace and continue supporting the pluck use case.

@lafrech

This comment has been minimized.

Member

lafrech commented May 14, 2018

It's confusing that only can be a list of strings or a string, and it is even more confusing that a string is not just a shortcut for a list containing a single string.

Plucking a single field is a different feature than embedding a schema, and using the same Field is strange. If that use case must be supported, a different Field like Pluck type sounds better suited.

@sloria

This comment has been minimized.

Member

sloria commented May 14, 2018

I think modulating the Nested functionality this much based on the type of a parameter is confusing. The feature is useful though.

It's confusing that only can be a list of strings or a string, and it is even more confusing that a string is not just a shortcut for a list containing a single string...Plucking a single field is a different feature than embedding a schema, and using the same Field is strange.

Valid points. In retrospect, the current API is awkward.

I think @deckar01 's fields.Pluck proposal makes sense. Let's go with it.

@sloria sloria added this to the 3.0 milestone May 14, 2018

@taion

This comment has been minimized.

Contributor

taion commented May 14, 2018

Oh, I totally misread. Sorry!

@deckar01 deckar01 self-assigned this May 14, 2018

@deckar01 deckar01 added the bug label May 14, 2018

@taion

This comment has been minimized.

Contributor

taion commented May 15, 2018

As a side note, if we go ahead with #810, then __filter_fields is gone anyway. Though I agree that fields.Pluck makes more sense regardless.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment