Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: replace load_from and dump_to with a single parameter #717

Closed
lafrech opened this issue Jan 5, 2018 · 10 comments
Closed

RFC: replace load_from and dump_to with a single parameter #717

lafrech opened this issue Jan 5, 2018 · 10 comments

Comments

@lafrech
Copy link
Member

lafrech commented Jan 5, 2018

See discussion in #714.

In the general case, dump/load is symmetric and if you have a model field that is exposed with another name, you use the API name in the schema, and specify the model name as attribute.

         class MySchema(Schema):
             ApiName = fields.String(attribute='model_name')

dump_to and load_from introduce asymmetry, allowing to specify a different key for dump and load:

         class MySchema(Schema):
             model_name = fields.String(load_from='api_load_name', dump_to='api_dump_name')

If you want to reproduce use case 1 using these, you need to specify both load_from and dump_to and make sure they match:

         class MySchema(Schema):
             model_name = fields.String(load_from='api_name', dump_to='api_name')

The flexibility brought by having both load_from and dump_to comes at a price: complexity for the user but also in the code (marshmallow and related libs like webargs, apispec...), potential undefined cases for some weird combinations...

Assuming symmetry is not required, a single parameter should be enough. There is however a limitation with attribute. It can't be used for APIs where the keys are invalid Python variable names:

class MySchema(Schema):
    ApiKey = fields.String(attribute='apikey')
    weird-key = fields.String(attribute='weird_key')  # syntax error

Currently, load_from and dump_to can be used for this as shown above.

The proposal here is to introduce a new parameter to unite them all. Let's call it key for now, short of a better name:

class MySchema(Schema):
    apikey = fields.Int(key='ApiKey')
    weird_key = fields.Int(key='weird-key')

If we do that, then there is no point in attribute anymore (except backward compatibility). So we'd remove attribute, dump_to and load_from.

Any objection to this?

Any suggestion for a better name?

@tinproject
Copy link

tinproject commented Jan 14, 2018

I was reading the discussion on PR #174. Just to add some context and see if I understand well (the doc is not always very clear):

We have a Python Object that by means of an Schema instance we serialize (dump) into a Serialized Representation, that we can take the backwards process and deserialize (load) into a Python Object.

A Schema have Fields that knows how the data can be de/serialized. That Fields can take an attribute from a Python Object and serialize it into a key/name in a map of a Serialized Representation, and back.

Some comments answering the questions:

  • attribute should be kept to decouple the schemas from your Python models/objects. The attribute parameter represents the name of the attribute of the Python Object where the data should be serialized from / deserialize to. If not present (None) Marshmallow assumes that the object attribute have the same name as the schema field. Same current behavior.

  • load_from, dump_to arguments should be consolidated into one that represents the key/name on the Serialized Representation, with the mirror behavior of attribute but for the Serialized Representation side: if is not present (None) Marshmallow should assume that the key/name is he same as the schema field name.

  • If you still need different values in load_from and dump_to what you need is two different schemas

  • A key name could be too terse without a clear doc on what it represents. Other names could be: key_name, name (JSON), field (OpenAPI), field_name, field_key, etc.

@lafrech
Copy link
Member Author

lafrech commented Jan 14, 2018

@tinproject, thanks for stating this clearly, in a perhaps less convoluted way than I did.

I agree with everything you said.

The only I part wasn't totally convinced about is why we need to decouple the schema from the model. I use the schema to decouple API I/O from model objects, but I don't see the need for a schema field with a name that differs from both the model and the API. From an architectural point of view, I mean. From a practical perspective, I just identified a use case. setattr allows you to set attributes with names that are not legal Python variable names (such as 'my-attribute'). I don't see a reason for doing so, but assuming you're dealing with such an object, having the attribute parameter comes in handy. For this, plus compatibility with existing code, I agree we might as well keep attribute.

The new parameter sort of overlaps with attribute, as

my_field = Field(attribute='field')

would be equivalent to

field = Field(key='my_field')

but it does not matter IMHO. Keeping both shouldn't be an issue in terms of code maintenance. I'm more worried about the load_from / dump_to parameters. They lead to confusion, and undefined cases (see an example here: marshmallow-code/apispec#178), so these are the one I'd like to get rid of.

Regarding the name of the new attribute, I think name is as unintuitive as key. I like field. My only concern: the elements of the schemas are also called "fields", and this parameter does not expect a schema field name but the name of a field in the serialiazed representation. Yet, I don't think this is blocking and it doesn't take 20 lines of documentation to make it clear, so field would be fine for me.

@tinproject
Copy link

@lafrech

One good case to have the option to decouple the model from the schema is when refactor the model, not having to change the schema and the related tests, only the attribute names. Also if you have models with names coming directly from a database, and you want to have different names (more clear) in your schema to use in validations for example.

I believe that flexibility, and symmetry on design is well worth.

To simplify the code both attribute and key/name can be initialized with the name of the field at __init__ and then used directly in de/serialization without looking to the field name.

@lafrech
Copy link
Member Author

lafrech commented Jan 18, 2018

Hmmm, at __init__, the field does not know its name, does it?

And I suppose people with enough imagination could come up with use cases where a field's name is changed during execution, or a Field instance is used in several places of a Schema.

I'm afraid this simplification is not possible. Too bad, because it sounded nice.

Or am I missing something?

@lafrech
Copy link
Member Author

lafrech commented Jan 18, 2018

I just pushed a draft (#724).

I'm still unsure about the name. field is a bit disturbing as it is the name of the Marhmallow field, so you'd be writing field.field = ....

I would have liked to avoid key_name of field_name or ser_key (ser as "serialized") to keep symmetry with attribute.

Maybe key would be fine, after all. Any idea anyone?

@lafrech lafrech changed the title RFC: replace attribute, load_from and dump_to with a single parameter RFC: replace load_from and dump_to with a single parameter Jan 24, 2018
@taion
Copy link
Contributor

taion commented Jan 24, 2018

This is a great RFC. I really like the idea.

I would suggest data_key as a name. It's relatively terse, but is still more informative than just key, and is consistent with the argument names in the Schema method signatures:

    def dump(self, obj, many=None, update_fields=True, **kwargs):
    def load(self, data, many=None, partial=None):

I do think it makes sense to drop attribute, but maybe this is opinionated. In my view, the schema should match either the Python object or the serialized representation. It wouldn't really make much sense for the schema not to match one or the other. Given that, for my purposes at least, I've generally found it much easier to make the schema match the Python object.

As such, while something like this "key" parameter is very useful, attribute just makes it hard to do things like "find the field for a given attribute on my object", which is something that comes up more in my own code than the reverse of "find the attribute for a given data key" (which instead just happens via the deserialization process itself).

As an example, suppose I'm building a CRUD HTTP endpoint, and I want to enable filtering (?size=3) on one of the columns in the backing model. It's very convenient in that case to be able to map together model columns and schema fields. Explicitly mapping data keys to schema fields happens much less often.

@dsully
Copy link

dsully commented Mar 4, 2018

+1 - I could really use this while dealing with a legacy CamelCase schema, in a DRY manner.

@lafrech
Copy link
Member Author

lafrech commented Mar 18, 2018

I just rebased #724.

@sloria, any thoughts about the new parameter name?

@sloria
Copy link
Member

sloria commented Mar 18, 2018

@lafrech Thanks for keeping that up to date. I think data_key is a fine name. Let's go with that.

@sloria sloria closed this as completed Mar 24, 2018
bstinsonmhk added a commit to bstinsonmhk/duffy that referenced this issue Apr 25, 2018
We use `dump_to` in our schemas, they're deprecating this keyword in
favor of another one:
marshmallow-code/marshmallow#717

This keeps duffy deployable for now.
bstinsonmhk added a commit to CentOS/duffy that referenced this issue Aug 21, 2018
We use `dump_to` in our schemas, they're deprecating this keyword in
favor of another one:
marshmallow-code/marshmallow#717

This keeps duffy deployable for now.
@vgavro
Copy link
Contributor

vgavro commented Oct 11, 2018

For anyone interested in different keys for serialization/deserialization - here is hack for marshmallow>=3.0.0b8
https://github.com/vgavro/requests-client/blob/0d88c8f907ae2b8e9f77ae2c7144741032acc0b8/requests_client/schemas.py#L90

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants