New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make model dump/load symmetric for aliased fields #160

Merged
merged 7 commits into from May 24, 2018

Conversation

3 participants
@bendemaree
Contributor

bendemaree commented Apr 19, 2018

When using field aliases, I expected to be able to dump a model to a dict and then load that result back into the same model, but field aliases aren't considered when populating a model. This changes that behavior to allow this; in other words, MyModel(**MyModel(a="foo").dict()) now works if a has an alias.

The field alias is considered first to maintain as much of the existing behavior, but this change does have the potential to break user code, as data that previously failed validation could now pass.

Thanks for this library, @samuelcolvin! It's a very nice tool to have in the toolbox. 馃槃

@codecov

This comment has been minimized.

codecov bot commented Apr 19, 2018

Codecov Report

Merging #160 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #160      +/-   ##
==========================================
+ Coverage   99.89%   99.89%   +<.01%     
==========================================
  Files           9        9              
  Lines         992      995       +3     
  Branches      205      206       +1     
==========================================
+ Hits          991      994       +3     
  Partials        1        1
@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented Apr 19, 2018

I can't accept a change like this as it alters current behaviour too much.

Better to add a kwarg to dict use_aliases which defaults to false, but when true uses the aliases as keys.

@bendemaree

This comment has been minimized.

Contributor

bendemaree commented Apr 19, 2018

Understood; I鈥檒l get this updated as an opt-in behavior. Thanks!

@bendemaree bendemaree force-pushed the bendemaree:symmetric-dump-load branch from 0e0e298 to 5289bc6 Apr 20, 2018

@bendemaree

This comment has been minimized.

Contributor

bendemaree commented Apr 20, 2018

WDYT of this approach? I'd rather be able to opt-in on a model-by-model basis, rather than having to hunt down every call site for dict(), which seems error-prone.

If this looks reasonable to you, I'll update the documentation as well. Thanks!

@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented Apr 24, 2018

see #164, I think this is a cleaner way of doing it with fewer potential unexpected consequences.

If you agree I'll close this and merge #164.

@bendemaree

This comment has been minimized.

Contributor

bendemaree commented Apr 26, 2018

#164 looks like the implementation of your initial feedback, I think. I understood what you were suggesting, but I'm not convinced it's cleaner. I'll share a bit more of my use case in hopes it might highlight why I think it might be cleaner to define this in the model config instead.

We've implemented common models for a large number of objects that we are ingesting from a third-party API. That API provides a reasonable structure for the data, but in numerous instances, we've needed to alias field names so that we can use these objects in downstream services in a more developer-friendly way. These objects, after being loaded from the third-party JSON and validated, are then serialized and placed on a message queue, where other consumers are able to receive the serialized data and load it back into the exact same pydantic model (with the model distributed as shared code). In many cases, those consumers then re-serialize all or part of the models they've loaded from messages they've received, and they send them to other components.

Thus, as I mentioned, we have a growing number of dict() calls already. Not only would updating them be a hassle, but later on, it would be an easy mistake for a developer to forget use_aliases when calling dict()--a mistake they may only learn about when another system consumes their serialized message and the message fails validation.

By having the model itself configured to allow aliases, this problem is avoided. The model remains the canonical definition of how it will serialize/deserialize itself--this is already the case for other aspects of serialization/deserialization; in my mind, this is a very close sibling to allow_extra and ignore_extra.

I hope this provides some useful context for why I added the option to the model config. I'm also curious what the unexpected consequences you have in mind are; I'm probably blind to them, having not seen the library evolve, and I definitely don't want to cause any!

Thanks for your consideration. 馃槃

@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented Apr 28, 2018

#164 looks like the implementation of your initial feedback, I think.

It is, in the end implementing another PR to show what I meant was easier than a comment with loads of code.

I've come round to your suggestion, if you still think this is required once you've read through my explanation below I'll merge this. (documentation and history will need to be updated first)

I'm also curious what the unexpected consequences you have in mind are

Let's say we have

class Foobar(BaseModel):
    card_number: str
    expiry: date = date(2020, 1, 1)

    class Config:
        allow_population_by_alias = True
        fields = {
            'card_number': {
                'alias': 'cardNumber'
            }
        }

That model expects something like {"cardNumber": "4242..."} but you've added allow_population_by_alias = True so your model works with some backend. Now there's a weird case where someone can also submit {"card_number": "4242..."} which you didn't intend for the public API but will work. This will be fine with pydantic but could mess up other components.

Thus, as I mentioned, we have a growing number of dict() calls already.

You could just have a base model you use everywhere like

class OurBaseModel(BaseModel):
    def dict(self, *kwargs):
        kwargs.setdefault('use_aliases', True)
        return super().dict(**kwargs)

or use_aliases default value could be added to Config.

are then serialized and placed on a message queue, where other consumers are able to receive the serialized data and load it back into the exact same pydantic model

This would be much better accumplished using pickle. Pickling and unpickling models will be much more performant as validation doesn't have to be rerun upon every deserialisation, it also gets round this very problem.

I setup pydantic to play well with pickle for this very case, see here.

@samuelcolvin samuelcolvin force-pushed the samuelcolvin:master branch from 29c562e to 36a2061 Apr 28, 2018

@bendemaree

This comment has been minimized.

Contributor

bendemaree commented May 1, 2018

Thanks for going over this--I appreciate your thorough response. 馃槃

I agree with what you're saying w.r.t. similar, but ultimately undesired, values being considered valid. I think this concern can/should be strongly stated in the documentation for this new config field, for the reasons you've laid out. I'll have an update to the docs and history shortly.

Perhaps the root of my confusion is in my interpretation of "alias;" I'd understood it as setting up an equivalence between the field names, rather than a one-way transformation, and it was that confusion that makes me think having this noted and configurable could improve the DX for others. OTOH, sometimes I can't words well, so it might just be me!

This would be much better accomplished using pickle.

It would definitely be faster, but I don't mind re-validating when the payload is coming off the wire...and I've had some bad experiences with pickle that still sting a bit. Appreciate the suggestion, though; it was a selling point when looking at pydantic that you'd ensured compatibility with pickle. Not every author can be bothered to do so, I think. 馃榿

@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented May 2, 2018

I agree that alias isn't a perfect term for this, but then again I can't think of a better one, input_field_name is uuuuuugly. Let's just document it better and stick to "alias"

@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented May 2, 2018

It's also worth noting that inheritance should "work" (aka, do something magic and cause massive confusion) between different models.

So you could have

class FrontendFoobar(BaseModel):
    ...
    class Config:
        foobar = 'xxx'
        fields = {'card_number': 'cardNumber', 'expiry_date': 'expiryDate'}

class BackendFoobar(FrontendFoobar):
    class Config:
        fields = {}

Here BackendFoobar.Config.foobar == 'xxx' because of magic config inheritance, but also `data = frontend_foobar.dict() backend_foobar(**data)``` should work because you've got rid of the aliases. Perhaps this is an uglier sollution than all the others, not sure.

@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented May 16, 2018

@bendemaree what's your intention on this?

@dsully

This comment has been minimized.

dsully commented May 20, 2018

+1 I could use this.

@bendemaree

This comment has been minimized.

Contributor

bendemaree commented May 20, 2018

@samuelcolvin I will have the remaining changes up tomorrow. My apologies; I forgot that I was going on vacation for a few weeks a day or two after committing to this. 鈽癸笍

@bendemaree

This comment has been minimized.

Contributor

bendemaree commented May 23, 2018

@samuelcolvin I've updated the documentation; hopefully it's to your standard. I also took the liberty of opening the changelog for a new patch version. Apologies for the additional delay on this!

@samuelcolvin

otherwise LGTM.

population by alias disabled (the default), trying to parse an object with only the key ``cardNumber`` will fail.
However, if you enable population by alias, this previously-invalid object will now populate the ``card_number``
field from ``cardNumber``. This may be desired for some use cases, but in others (like the one given here, perhaps),
relaxing strictness could introduce bugs.

This comment has been minimized.

@samuelcolvin

samuelcolvin May 24, 2018

Owner

wait this is the wrong way around I think, shouldn't it be

As an example, say you have a field named ``card_number`` with the alias ``cardNumber``. With
``allow_population_by_alias = False`` disabled, trying to parse an object with only the key 
``card_number`` will fail (the ``cardNumber`` field is required). However, if you enable population 
by alias, this previously-invalid object will now populate the ``card_number`` field from 
``card_number``. This may be desired or acceptable for some use cases, but in others it could 
introduce bugs.

?

This comment has been minimized.

@bendemaree

bendemaree May 24, 2018

Contributor

Oh my...thank you for catching that! 馃槼

@samuelcolvin samuelcolvin merged commit 62d39d9 into samuelcolvin:master May 24, 2018

3 checks passed

codecov/project 99.89% (+<.01%) compared to 31683f8
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
deploy/netlify Deploy preview ready!
Details
@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented May 24, 2018

Great, thank you very much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment