RFC: replace load_from and dump_to with a single parameter #717

lafrech · 2018-01-05T09:12:11Z

See discussion in #714.

In the general case, dump/load is symmetric: if you have a model field that is exposed under another name, you use the API name in the schema and specify the model name as attribute.

    from marshmallow import Schema, fields

    class MySchema(Schema):
        ApiName = fields.String(attribute='model_name')

dump_to and load_from introduce asymmetry, allowing you to specify a different key for dump and load:

    class MySchema(Schema):
        model_name = fields.String(load_from='api_load_name', dump_to='api_dump_name')

If you want to reproduce the first use case with these, you need to specify both load_from and dump_to and make sure they match:

    class MySchema(Schema):
        model_name = fields.String(load_from='api_name', dump_to='api_name')

The flexibility brought by having both load_from and dump_to comes at a price: complexity for the user, but also in the code (marshmallow and related libs like webargs, apispec...), and potentially undefined behavior for some unusual combinations.
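To make the cost concrete, here is a deliberately simplified, stdlib-only model of how a field carrying both parameters resolves its keys. LegacyField, load_key and dump_key are hypothetical names, and the fallback rule is an assumption for illustration, not marshmallow's actual lookup logic. The point is only that each direction falls back on its own, so setting one parameter without the other quietly makes the field asymmetric.

```python
class LegacyField:
    """Hypothetical, simplified stand-in for a field with load_from/dump_to."""

    def __init__(self, load_from=None, dump_to=None):
        self.load_from = load_from
        self.dump_to = dump_to

    def load_key(self, name):
        # Key read from input data; assumed to fall back to the schema field name.
        return self.load_from or name

    def dump_key(self, name):
        # Key written to output data; falls back independently of load_key.
        return self.dump_to or name


# Only load_from is set, so the field reads one key and writes another.
one_sided = LegacyField(load_from="api_name")
print(one_sided.load_key("model_name"), one_sided.dump_key("model_name"))
# api_name model_name

# Symmetry requires repeating the same value in both parameters.
both = LegacyField(load_from="api_name", dump_to="api_name")
print(both.load_key("model_name"), both.dump_key("model_name"))
# api_name api_name
```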

Assuming asymmetry is not required, a single parameter should be enough. There is, however, a limitation with attribute: it can't be used for APIs where the keys are invalid Python variable names:

    class MySchema(Schema):
        ApiKey = fields.String(attribute='apikey')
        weird-key = fields.String(attribute='weird_key')  # syntax error

Currently, load_from and dump_to can be used for this as shown above.

The proposal here is to introduce a new parameter to unite them all. Let's call it key for now, for lack of a better name:

    class MySchema(Schema):
        apikey = fields.Int(key='ApiKey')
        weird_key = fields.Int(key='weird-key')
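Here is a minimal sketch of how a single key could drive both directions. Field, dump and load are hypothetical stand-ins (a plain dict of fields instead of a Schema class), not marshmallow's API; since this proposal drops attribute, the schema field name is assumed to also be the object attribute name.

```python
from types import SimpleNamespace


class Field:
    """Hypothetical field carrying one serialized key for both directions."""

    def __init__(self, key=None):
        self.key = key


def dump(fields, obj):
    # The schema field name doubles as the object attribute name;
    # `key` (if set) names the entry in the serialized dict.
    return {f.key or name: getattr(obj, name) for name, f in fields.items()}


def load(fields, data):
    # The same key is read back, so dump and load stay symmetric.
    return {name: data[f.key or name] for name, f in fields.items()}


schema_fields = {"apikey": Field(key="ApiKey"), "weird_key": Field(key="weird-key")}
obj = SimpleNamespace(apikey=42, weird_key=7)
data = dump(schema_fields, obj)
print(data)  # {'ApiKey': 42, 'weird-key': 7}
print(load(schema_fields, data))  # {'apikey': 42, 'weird_key': 7}
```

Note that 'weird-key' only ever appears as a string value, so the invalid-identifier problem from the attribute example disappears.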

If we do that, then there is no point in keeping attribute anymore (except for backward compatibility). So we'd remove attribute, dump_to and load_from.

Any objection to this?

Any suggestion for a better name?


tinproject · 2018-01-14T18:47:32Z

I was reading the discussion on PR #174. Just to add some context and see if I understand correctly (the docs are not always very clear):

We have a Python Object that we serialize (dump), by means of a Schema instance, into a Serialized Representation; we can also go the other way and deserialize (load) that representation back into a Python Object.

A Schema has Fields that know how the data can be (de)serialized. A Field can take an attribute from a Python Object and serialize it into a key/name in a map of the Serialized Representation, and back.

Some comments answering the questions:

- attribute should be kept to decouple the schemas from your Python models/objects. The attribute parameter represents the name of the attribute of the Python Object where the data should be serialized from / deserialized to. If not present (None), Marshmallow assumes that the object attribute has the same name as the schema field. This is the current behavior.
- load_from and dump_to should be consolidated into one argument that represents the key/name in the Serialized Representation, mirroring the behavior of attribute on the Serialized Representation side: if it is not present (None), Marshmallow should assume that the key/name is the same as the schema field name.
- If you still need different values for load_from and dump_to, what you need is two different schemas.
- The name key could be too terse without clear documentation of what it represents. Other names could be: key_name, name (JSON), field (OpenAPI), field_name, field_key, etc.
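A minimal sketch of these semantics, assuming a hypothetical Field with attribute and key parameters that both default to the schema field name. Plain functions stand in for Schema.dump/Schema.load, and none of this is marshmallow's code; username, login and userName are made-up names showing three different names for one value.

```python
from types import SimpleNamespace


class Field:
    """Hypothetical field: `attribute` names the object side, `key` the serialized side."""

    def __init__(self, attribute=None, key=None):
        self.attribute = attribute
        self.key = key

    def resolve(self, name):
        # Each side independently defaults to the schema field name.
        return self.attribute or name, self.key or name


def dump(fields, obj):
    out = {}
    for name, field in fields.items():
        attribute, key = field.resolve(name)
        out[key] = getattr(obj, attribute)
    return out


def load(fields, data):
    out = {}
    for name, field in fields.items():
        attribute, key = field.resolve(name)
        out[attribute] = data[key]
    return out


schema_fields = {
    "username": Field(attribute="login", key="userName"),
    "email": Field(),  # object attribute, schema field and key all coincide
}
user = SimpleNamespace(login="ada", email="ada@example.com")
data = dump(schema_fields, user)
print(data)  # {'userName': 'ada', 'email': 'ada@example.com'}
print(load(schema_fields, data))  # {'login': 'ada', 'email': 'ada@example.com'}
```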

lafrech · 2018-01-14T22:20:52Z

@tinproject, thanks for stating this clearly, in a perhaps less convoluted way than I did.

I agree with everything you said.

The only part I wasn't totally convinced about is why we need to decouple the schema from the model. I use the schema to decouple API I/O from model objects, but I don't see the need for a schema field with a name that differs from both the model and the API, from an architectural point of view, I mean. From a practical perspective, I just identified a use case: setattr allows you to set attributes with names that are not legal Python variable names (such as 'my-attribute'). I don't see a reason for doing so, but assuming you're dealing with such an object, the attribute parameter comes in handy. For this, plus compatibility with existing code, I agree we might as well keep attribute.
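A quick check of the setattr case, using types.SimpleNamespace as an arbitrary object (the attribute name is just the example above):

```python
from types import SimpleNamespace

obj = SimpleNamespace()
# `obj.my-attribute = 1` would be a SyntaxError, since 'my-attribute' is not
# a valid identifier, but setattr()/getattr() accept any string name.
setattr(obj, "my-attribute", 1)
print(getattr(obj, "my-attribute"))  # 1
```

Only string-based access such as getattr can read this value back, which is what a field's attribute parameter would rely on.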

The new parameter sort of overlaps with attribute, as

    my_field = Field(attribute='field')

would be equivalent to

    field = Field(key='my_field')
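A tiny sketch checking that the two declarations above serialize identically, under the assumption that both attribute and key default to the schema field name. Field and dump are hypothetical stand-ins, not marshmallow code.

```python
from types import SimpleNamespace


class Field:
    """Hypothetical field with an object-side and a serialized-side name."""

    def __init__(self, attribute=None, key=None):
        self.attribute = attribute
        self.key = key


def dump(fields, obj):
    # Both names default to the schema field name when not given.
    return {
        (f.key or name): getattr(obj, f.attribute or name)
        for name, f in fields.items()
    }


obj = SimpleNamespace(field=1)
via_attribute = dump({"my_field": Field(attribute="field")}, obj)
via_key = dump({"field": Field(key="my_field")}, obj)
print(via_attribute, via_key)  # {'my_field': 1} {'my_field': 1}
```

The only visible difference is the schema field name itself, which matters for code that looks fields up by name.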

but it does not matter IMHO. Keeping both shouldn't be an issue in terms of code maintenance. I'm more worried about the load_from / dump_to parameters. They lead to confusion and undefined cases (see an example here: marshmallow-code/apispec#178), so these are the ones I'd like to get rid of.

Regarding the name of the new parameter, I think name is as unintuitive as key. I like field. My only concern: the elements of the schemas are also called "fields", and this parameter does not expect a schema field name but the name of a field in the serialized representation. Yet I don't think this is blocking, and it doesn't take 20 lines of documentation to make it clear, so field would be fine with me.

tinproject · 2018-01-15T09:59:34Z

@lafrech

One good case for having the option to decouple the model from the schema is refactoring the model: you don't have to change the schema and the related tests, only the attribute names. Another is when your models have names coming directly from a database and you want different (clearer) names in your schema, for example to use in validations.

I believe that flexibility and symmetry in the design are well worth it.

To simplify the code, both attribute and key/name could be initialized with the name of the field at __init__ and then used directly in (de)serialization without looking up the field name.

lafrech · 2018-01-18T08:11:18Z

Hmmm, at __init__, the field does not know its name, does it?

And I suppose people with enough imagination could come up with use cases where a field's name is changed during execution, or a Field instance is used in several places of a Schema.

I'm afraid this simplification is not possible. Too bad, because it sounded nice.

Or am I missing something?
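One way around the __init__ problem is to resolve the names only when the schema collects its fields, copying each declared field first so a shared instance is never mutated. The sketch below assumes a hypothetical bind() hook and uses __init_subclass__ for collection; it is not how marshmallow implements this.

```python
import copy


class Field:
    """Hypothetical field whose names are resolved only once it is bound."""

    def __init__(self, attribute=None, key=None):
        # At __init__ the field has no idea which schema name it will get.
        self.attribute = attribute
        self.key = key
        self.name = None

    def bind(self, name):
        # Hypothetical hook, called once the declared name is known.
        self.name = name
        self.attribute = self.attribute or name
        self.key = self.key or name


class Schema:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.bound_fields = {}
        for name, value in vars(cls).items():
            if isinstance(value, Field):
                # Copy first so a Field instance reused under two names
                # does not end up with both names written into it.
                bound = copy.copy(value)
                bound.bind(name)
                cls.bound_fields[name] = bound


shared = Field()


class UserSchema(Schema):
    login = shared
    nickname = shared  # same instance declared twice
    email = Field(key="mail")


print({n: (f.attribute, f.key) for n, f in UserSchema.bound_fields.items()})
# {'login': ('login', 'login'), 'nickname': ('nickname', 'nickname'), 'email': ('email', 'mail')}
```

Once bound, (de)serialization can use attribute and key directly, which keeps most of the simplification while handling reused Field instances.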

lafrech · 2018-01-18T08:49:37Z

I just pushed a draft (#724).

I'm still unsure about the name. field is a bit disturbing as it is the name of the Marshmallow field, so you'd be writing field.field = ....

I would have liked to avoid key_name, field_name or ser_key (ser as in "serialized") to keep symmetry with attribute.

Maybe key would be fine, after all. Any idea anyone?

taion · 2018-01-24T18:37:06Z

This is a great RFC. I really like the idea.

I would suggest data_key as a name. It's relatively terse, but is still more informative than just key, and is consistent with the argument names in the Schema method signatures:

    def dump(self, obj, many=None, update_fields=True, **kwargs):
    def load(self, data, many=None, partial=None):

I do think it makes sense to drop attribute, but maybe this is opinionated. In my view, the schema should match either the Python object or the serialized representation. It wouldn't really make much sense for the schema not to match one or the other. Given that, for my purposes at least, I've generally found it much easier to make the schema match the Python object.

As such, while something like this "key" parameter is very useful, attribute just makes it hard to do things like "find the field for a given attribute on my object", which is something that comes up more in my own code than the reverse of "find the attribute for a given data key" (which instead just happens via the deserialization process itself).

As an example, suppose I'm building a CRUD HTTP endpoint, and I want to enable filtering (?size=3) on one of the columns in the backing model. It's very convenient in that case to be able to map together model columns and schema fields. Explicitly mapping data keys to schema fields happens much less often.
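A sketch of the lookup taion describes, assuming a hypothetical Field with an optional attribute and a made-up label_text model column. When attribute is in play, finding the schema field for a model column means scanning the fields and inverting the mapping; if schema field names always match the model, it is a plain dict access.

```python
class Field:
    """Hypothetical field; `attribute` optionally renames the object side."""

    def __init__(self, attribute=None):
        self.attribute = attribute


def field_name_for_attribute(fields, attribute):
    # A schema field name may differ from the model attribute, so every
    # field has to be scanned to invert the mapping.
    for name, field in fields.items():
        if (field.attribute or name) == attribute:
            return name
    raise KeyError(attribute)


# A filter configured on the model column "label_text" needs the schema
# field that reads from it, e.g. to deserialize the filter value.
with_attribute = {"size": Field(), "label": Field(attribute="label_text")}
print(field_name_for_attribute(with_attribute, "label_text"))  # label

# If schema field names always equal model attributes (renaming happens only
# on the serialized side), the lookup is just a dict access.
without_attribute = {"size": Field(), "label_text": Field()}
print("label_text" in without_attribute)  # True
```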

dsully · 2018-03-04T21:57:39Z

+1 - I could really use this while dealing with a legacy CamelCase schema, in a DRY manner.

lafrech · 2018-03-18T14:31:13Z

I just rebased #724.

@sloria, any thoughts about the new parameter name?

sloria · 2018-03-18T17:54:32Z

@lafrech Thanks for keeping that up to date. I think data_key is a fine name. Let's go with that.


vgavro · 2018-10-11T09:50:22Z

For anyone interested in different keys for serialization/deserialization, here is a hack for marshmallow>=3.0.0b8:
https://github.com/vgavro/requests-client/blob/0d88c8f907ae2b8e9f77ae2c7144741032acc0b8/requests_client/schemas.py#L90

sloria added the feedback welcome label Jan 5, 2018

This was referenced Jan 9, 2018

Inspect attribute and/or load_from/dump_to properties for marshmallow extension marshmallow-code/apispec#118

Closed

Marshmallow ext: Name should be set to dump_to even if load_from is not specified marshmallow-code/apispec#178

Closed

lafrech added the backwards incompat label Jan 12, 2018

lafrech added this to the 3.0 milestone Jan 12, 2018

lafrech mentioned this issue Jan 18, 2018

Merge load_from and dump_to into data_key #724

Merged


taion mentioned this issue Jan 24, 2018

[WIP] Respect field attribute in ColumnFilter 4Catalyzer/flask-resty#161

Closed

lafrech changed the title ~~RFC: replace attribute, load_from and dump_to with a single parameter~~ RFC: replace load_from and dump_to with a single parameter Jan 24, 2018

lafrech mentioned this issue Jan 28, 2018

Swagger: use dump_to even if load_from does not match marshmallow-code/apispec#183

Merged

sloria closed this as completed Mar 24, 2018

bstinsonmhk mentioned this issue Apr 25, 2018

Pin the version of marshmallow to 3.0.0b6 CentOS/duffy#7

Merged

dequis mentioned this issue Jun 9, 2018

The removal of load_from/dump_to in 3.x is not documented #837

Closed

bstinsonmhk added a commit to CentOS/duffy that referenced this issue Aug 21, 2018

Pin the version of marshmallow to 3.0.0b6

9b2b08b

We use `dump_to` in our schemas, they're deprecating this keyword in favor of another one: marshmallow-code/marshmallow#717 This keeps duffy deployable for now.

deckar01 mentioned this issue Mar 7, 2022

Use "data_key" only for loading #1955

Closed

lafrech mentioned this issue Dec 5, 2022

RFC alternative key in serialized data #2062

Closed
