New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dump_only whitelist/inverse / load_only whilst including when dumping #453

Closed
nickw444 opened this Issue May 14, 2016 · 9 comments

Comments

Projects
None yet
5 participants
@nickw444

nickw444 commented May 14, 2016

I feel like this kind of touches on #375, but that didn't seem to align with the behaviour that I think many would find useful. Also highly related to #61

It would be useful to use the meta class (or constructor) to be able to specify a set of fields that are whitelisted for deserialisation. If the field doesn't exist in the iterable, it cannot be deserialised.

If a user of marshmallow had a large schema class, with many fields, but only wanted 1 of those fields to be deserializable/writable, from my understanding they could either:

  • name every field they want as read only in dump_only attribute (lots of fields, prone to errors, ie missing a field)
  • When loading data, specify only=('fieldname',) upon instantiating the schema. (prone to errors, forgetting to put this in when instantiating)

Am I missing something, or is this a shortcoming that could be addressed?

Also to note, the naming of load_only feels like a misnomer without reading the docs in detail - If you specify a field in load_only, it will only be available for loading, not dumping - but taking another perspective to this could imply that it's an iterable of fields that are whitelisted to be loaded, but can still be dumped.

@deckar01

This comment has been minimized.

Member

deckar01 commented May 25, 2016

... the naming of load_only feels like a misnomer ...

I see what you mean with load_only. I assumed the wrong behavior until I looked at the docs.

load_only: A list or tuple of fields to skip during serialization
dump_only: A list or tuple of fields to skip during deserialization, read-only fields

A good way to illustrate why this is a strange interface is to create a schema with a field that is both load_only and dump_only.

class TestSchema(Schema):
    field1 = fields.Field()
    field2 = fields.Field()

    class Meta:
        load_only=('field1',)
        dump_only=('field1',)

field1 will not be serialized nor deserialized in this example. load_only actually means "don't dump" and dump_only means "don't load".

load dump
True
False dump_only load_only

A clearer terminology might be "force" and "skip" which have clear connotations and no implicit behaviors.

load dump
True force_load force_dump
False skip_load skip_dump

The only ambiguity is priority, but I believe that in the rest of the library exclusion has priority over inclusion and instances have priority over classes.

@nickw444

This comment has been minimized.

nickw444 commented May 30, 2016

I see exactly what you're saying, and I understand the functionality of using load_only and dump_only.

Whilst better names such as force_load, etc do improve the usability of the library, it still wouldn't expose API to whitelist fields from being loaded/dumped. (Unless specifying force_load or force_dump implies that any unlisted fields shouldn't be loaded or dumped)

@deckar01

This comment has been minimized.

Member

deckar01 commented May 31, 2016

Unless specifying force_load or force_dump implies that any unlisted fields shouldn't be loaded or dumped.

No, those options wouldn't solve the issue of whitelisting deserialized fields.

The option currently used to whitelist a set of fields on a schema is fields, but it only applies to serialization, not deserialization, which seems a bit asymmetrical considering there is nothing about the name "fields" that implies serialization exclusively.

a user of marshmallow had a large schema class, with many fields, but only wanted 1 of those fields to be deserializable

It looks like partial loading (the partial option for Schema.load()) might be an existing solution. The "partial" white list has to be supplied on each load(), which doesn't address the concern of "forgetting to put [it] in", but the set of fields could be assigned to a variable for reuse.

deserializable_fields = ('fieldname',)
...
data, errors = schema.load(partial=deserializable_fields)

If the white list really needs to be declared on a schema class or instance, maybe a separate schema class should be declared.

@lafrech

This comment has been minimized.

Member

lafrech commented Nov 10, 2016

@deckar01 Interesting comment. I feel the same about the load_only + dump_only case.

Looks like we're bearing the weight of history. Starting from scratch, we'd use better terms, like dont_load or skip_load or anything less ambiguous. But here we are.

These are features so common that there must be a lot of those in the codes using Marshmallow, and the benefit for changing this is minimal. You get struck by this once, then you learn and you move on.

Yet, is there any reason (apart from the time it takes to do it...) not to propose new keywords, and slowly deprecate the old ones?

@lafrech

This comment has been minimized.

Member

lafrech commented Feb 19, 2017

@deckar01 wrote:

A clearer terminology might be "force" and "skip" which have clear connotations and no implicit behaviors.

@sloria, do you think Marshmallow 3.0 could introduce skip_load/skip_dump and deprecate load_only/dump_only?

@maximkulkin

This comment has been minimized.

Contributor

maximkulkin commented Feb 19, 2017

What does "force" actually mean? If the field is not marked with "required", will "force" flag force to be required?

I don't get why people are so concerned about naming and claim that it is used a lot. It should not. Most common - "password" is load_only, "id" - dump_only. If you have use cases that require you defining complex dependencies and load/dump schemas, you should really consider splitting them into a separate schemas (maybe with reusing common parts through base schema). That's it.

PS I think load_only / dump_only semantics are OK.

@lafrech

This comment has been minimized.

Member

lafrech commented Feb 19, 2017

Just to clarify, there are two points at stake here:

  • the whitelist feature, which is what the OP is about,
  • the naming of load_only/dump_only (side note in the OP)

Regarding the second point, there are two potentially surprising things about those names:

As @nickw444 said in OP

Also to note, the naming of load_only feels like a misnomer without reading the docs in detail - If you specify a field in load_only, it will only be available for loading, not dumping - but taking another perspective to this could imply that it's an iterable of fields that are whitelisted to be loaded, but can still be dumped.

And as @deckar01 said

A good way to illustrate why this is a strange interface is to create a schema with a field that is both load_only and dump_only

I found this a bit surprising and so did my colleagues.

As I said, it is no big deal. You get hit once, and then you understand and you know. It might not be worth the pain of changing load_only/dump_only in the whole library (and all the codes using it). But since it is a breaking change (mitigated by keeping old terms working although considered deprecated), the release of v3 would be a good time to do it.

I don't mean to argue about this. I don't really mind, that was just a suggestion. And the "not worth the pain" argument is perfectly relevant.

@maximkulkin

This comment has been minimized.

Contributor

maximkulkin commented Feb 19, 2017

I agree with one thing: "load_only"/"dump_only" feilds in Meta class IS indeed confusing. It would be much easier to specify those flags directly in a field definition:

class UserSchema(m.Schema):
    id = mf.String(dump_only=True)
    email = mf.String(required=True)
    password = mf.String(load_only=True)

So, in my understanding the Meta fields are mostly used when you're integrating with third-party schema system (e.g. SqlAlchemy), you want to automatically import all their stuff and then customize it further. So, I would say that this support should have never existed in the core library in the first place and let integration libraries handle all those customizations.

@sloria

This comment has been minimized.

Member

sloria commented Mar 19, 2017

I don't plan on adding a whitelist feature because it would be redundant and potentially confusing API. I think the OP's use case can be solved with a base class that uses the on_bind_field hook. Something like

from marshmallow import Schema, fields

class ReadOnlySchema(Schema):
    """Schema that every field dump-only if it doesn't explicitly set load_only=True.
    """
    def on_bind_field(self, field_name, field_obj):
        # Any field that is not load-only will be set as dump-only
        if not field_obj.load_only:
            field_obj.dump_only = True


class MySchema(ReadOnlySchema):
    id = fields.Str()
    password = fields.Str(load_only=True)

schema = MySchema()

assert schema.fields['id'].dump_only is True
assert schema.fields['password'].load_only is True
assert schema.load({'id': 'ignored', 'password': 'secret'}).data == {'password': 'secret'}
assert schema.dump({'id': 'abc12', 'password': 'ignored'}).data == {'id': 'abc12'}

As for the names dump_only and load_only--these make sense in the context of the dump_only and load_only field constructor parameters. @maximkulkin does have a point that the class Meta options are used primarily by libraries that dynamically generate fields and therefore should be implemented in those libraries. That said, I think we'll keep those for the time being, if only to maintain compatibility.

Also, I generally try to avoid parameter names that have a negative word in them, e.g. "skip", because double negatives (skip_load=False) add cognitive load for the user.

Closing for now. We can reopen if further discussion is necessary.

@sloria sloria closed this Mar 19, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment