Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Json Encoders are ignored in nested structures #2277

Closed
3 tasks done
mvanderlee opened this issue Jan 20, 2021 · 21 comments · Fixed by #3941
Closed
3 tasks done

Json Encoders are ignored in nested structures #2277

mvanderlee opened this issue Jan 20, 2021 · 21 comments · Fixed by #3941
Labels
bug V1 Bug related to Pydantic V1.X

Comments

@mvanderlee
Copy link

Checks

  • I added a descriptive title to this issue
  • I have searched (google, github) for similar issues and couldn't find anything
  • I have read and followed the docs and still think this is a bug

Bug

When nesting a model with customer encoders, they are ignored by the parent.

from datetime import datetime, timedelta 
from pydantic import BaseModel 
from pydantic.json import timedelta_isoformat 
 
 
class WithCustomEncoders(BaseModel): 
    dt: datetime 
    diff: timedelta 
 
    class Config: 
        json_encoders = { 
            datetime: lambda v: v.timestamp(), 
            timedelta: timedelta_isoformat, 
        } 
 
class ParentWithoutEncoders(BaseModel): 
    child: WithCustomEncoders
 
m = WithCustomEncoders(dt=datetime(2032, 6, 1), diff=timedelta(hours=100)) 
print(m.json())

p = ParentWithoutEncoders(child=m)
print(p.json())


#> {"dt": 1969671600.0, "diff": "P4DT4H0M0.000000S"}
#> {"child": {"dt": "2032-06-01T00:00:00", "diff": 360000.0}}

Expected output:

#> {"dt": 1969671600.0, "diff": "P4DT4H0M0.000000S"}
#> {"child": {"dt": 1969671600.0, "diff": "P4DT4H0M0.000000S"}}
@mvanderlee mvanderlee added the bug V1 Bug related to Pydantic V1.X label Jan 20, 2021
@PrettyWood
Copy link
Member

Hello @mvanderlee
The config depends on the model not the type of the field, even if this one is also a model.
So you'll need to duplicate your config or add it in your own custom BaseModel.

@mvanderlee
Copy link
Author

mvanderlee commented Jan 20, 2021

Duplicating the config is a non-starter for us.
So you're saying that there is currently no way to ensure a Model will always be serialized a certain way? Other than manually defining a json encoder and always specifying it on json.dump. Which again, would be a non-started for us.

I need to allow Models to be 2-way serializable.

You mentioned a custom BaseModel, do you have an example?

@PrettyWood
Copy link
Member

PrettyWood commented Jan 20, 2021

Sure thing!

from datetime import datetime, timedelta
from pydantic import BaseModel as PydanticBaseModel
from pydantic.json import timedelta_isoformat


class BaseModel(PydanticBaseModel):
    """
    All the instances of BaseModel should serialize
    those types the same way
    """
    class Config:
        json_encoders = {
            datetime: lambda v: v.timestamp(),
            timedelta: timedelta_isoformat,
        }


class WithCustomEncoders(BaseModel):
    dt: datetime
    diff: timedelta


class ParentWithoutEncoders(BaseModel):
    child: WithCustomEncoders


m = WithCustomEncoders(dt=datetime(2032, 6, 1), diff=timedelta(hours=100))
print(m.json())
# {"dt": 1969653600.0, "diff": "P4DT4H0M0.000000S"}

p = ParentWithoutEncoders(child=m)
print(p.json())
# {"child": {"dt": 1969653600.0, "diff": "P4DT4H0M0.000000S"}}

The magic is possible because Config of a subclass inherits from the Config of its parent!

You can also change default value of BaseConfig, which all Config classes inherit from

from datetime import datetime, timedelta
from pydantic import BaseConfig, BaseModel
from pydantic.json import timedelta_isoformat

# All the instances of BaseModel should serialize those types the same way
BaseConfig.json_encoders = {
    datetime: lambda v: v.timestamp(),
    timedelta: timedelta_isoformat,
}


class WithCustomEncoders(BaseModel):
    dt: datetime
    diff: timedelta


class ParentWithoutEncoders(BaseModel):
    child: WithCustomEncoders


m = WithCustomEncoders(dt=datetime(2032, 6, 1), diff=timedelta(hours=100))
print(m.json())
# {"dt": 1969653600.0, "diff": "P4DT4H0M0.000000S"}

p = ParentWithoutEncoders(child=m)
print(p.json())
# {"child": {"dt": 1969653600.0, "diff": "P4DT4H0M0.000000S"}}

Hope it helps :)

@mvanderlee
Copy link
Author

mvanderlee commented Jan 21, 2021

Right, but now if I add a datetime field to the parent. Then both would use the same format.
I'm looking to be able to define how a particular Model serializes, not all data types within the context.

We can validate, and deserialize per model, but not serialize per model as far I can tell.

from datetime import datetime, timedelta
from pydantic import BaseModel, validator
from pydantic.json import timedelta_isoformat


class WithCustomEncoders(BaseModel):
    dt: datetime
    diff: timedelta

    @validator('dt', pre=True)
    def validate_dt(cls, v):
        try:
            return datetime.fromtimestamp(v)
        except Exception as e:
            raise ValueError('must be a valid timestamp', e)

    class Config:
        json_encoders = {
            datetime: lambda v: v.timestamp(),
            timedelta: timedelta_isoformat,
        }


class ParentWithoutEncoders(BaseModel):
    child: WithCustomEncoders
    p_dt: datetime

    @validator('p_dt', pre=True)
    def validate_p_dt(cls, v):
        try:
            return datetime.fromisoformat(v)
        except Exception as e:
            raise ValueError('must be valid iso string', e)

    class Config:
        json_encoders = {
            datetime: lambda v: v.isoformat()
        }

raw_m = '{"dt": 1969671600.0, "diff": "P4DT4H0M0.000000S"}'
raw_p = '{"child": {"dt": 1969671600.0, "diff": "P4DT4H0M0.000000S"}, "p_dt": "2032-06-01T00:00:00"}'

m = WithCustomEncoders.parse_raw(raw_m)
p = ParentWithoutEncoders.parse_raw(raw_p)

print(m.json())
print(p.json())

assert m.json() == raw_m
assert p.json() == raw_p

# {"dt": 1969671600.0, "diff": "P4DT4H0M0.000000S"}
# {"child": {"dt": "2032-06-01T00:00:00", "diff": 360000.0}, "p_dt": "2032-06-01T00:00:00"}

# Traceback (most recent call last):
#   File "<stdin>", line 50, in <module>
# AssertionError

The only way I currently see is to override the _iter function, i.e.:

class WithCustomEncoders(BaseModel): 
    dt: datetime 
    diff: timedelta 
 
    @validator('dt', pre=True) 
    def validate_dt(cls, v): 
        try: 
            return datetime.fromtimestamp(v) 
        except Exception as e: 
            raise ValueError('must be a valid timestamp', e) 
 
    class Config: 
        json_encoders = { 
            datetime: lambda v: v.timestamp(), 
            timedelta: timedelta_isoformat, 
        } 
         
    def _iter(self, *args, **kwargs): 
        for key, v in super()._iter(*args, **kwargs): 
            if key == 'dt': 
                yield key, v.timestamp() 
            elif key == 'diff': 
                yield key, timedelta_isoformat(v) 
            else: 
                yield key, v

Edit: _iter(...) if it should also apply to dict(...) otherwise overriding json(...) in a similar fashion would work as well if it should only apply to json dumping.

@shyamd
Copy link

shyamd commented Jan 21, 2021

I agree with @mvanderlee that it's not clear at first that the parent model that is converting to JSON establishes all the encoders and the nested models are converted using those. This appears to be because the whole object is converted into a dict and then the JSON encoder of the parent model is applied to the dictionary. An alternative might be to recursively call the JSON encoder on any pydantic BaseModel children and incorporate along the way.

@mvanderlee if you're interested in a PR, you might look at modifying https://github.com/samuelcolvin/pydantic/blob/13a5c7d676167b415080de5e6e6a74bea095b239/pydantic/main.py#L474-L509

This is likely to be much slower than the single call to JSON encoder, so I'd recommend making it an option or even a new method?

@goroderickgo
Copy link

goroderickgo commented Mar 13, 2021

Agreed with the other folks here that I can see why the current behavior is implemented the way it is, but there may be use-cases where you might want to serialize different components of the same type differently.

Perhaps I hadn't been looking at the right parts of the documentation, but I didn't really understand the Config inheritance until I stumbled upon this issue and saw @PrettyWood's explanation. Would it be possible to add some of this discussion to the documentation if it's not already there?

In the interim, based on @PrettyWood's example, what I came up with was just to create subclasses of the nested object (maybe this is overkill, would appreciate any suggestions!). Using a similar example:

class DatetimeA(datetime.datetime):
    """Datetime that will be encoded as a string in pydantic Config."""
    def __new__(cls, *args, **kwargs):
        return datetime.datetime.__new__(cls, *args, **kwargs)

class DatetimeB(datetime.datetime):
    """Datetime that will be encoded as a timestamp in pydantic Config."""
    def __new__(cls, *args, **kwargs):
        return datetime.datetime.__new__(cls, *args, **kwargs)

class MyBaseModel(BaseModel):
    class Config:
        json_encoders = {
            DatetimeA: lambda v: str(v),
            DatetimeB: lambda v: v.timestamp()
    }

class TestModel(MyBaseModel):
    datetime_a: DatetimeA
    datetime_b: DatetimeB

class ParentModel(MyBaseModel):
    test_model: TestModel

test_model = TestModel(
    datetime_a=DatetimeA(2020, 1, 1),
    datetime_b=DatetimeB(2015, 1, 1)
)

parent_model = ParentModel(
    test_model=test_model
)

print(test_model.json())
print(parent_model.json())

Output:

# '{"datetime_a": "2020-01-01 00:00:00", "datetime_b": 1420088400.0}'
# '{"test_model": {"datetime_a": "2020-01-01 00:00:00", "datetime_b": 1420088400.0}}'

@aiguofer
Copy link

I just ran into this issue... after a while of debugging I realized that my Config wasn't respected in nested models.

Although for my project it's OK to simply set a "global" encoder, I can see the value in defining how to encode a specific model within that model's Config and having that respected even when that model is nested. This is especially true when a module/library exposes public pydantic models for others to use.

@lilyminium
Copy link
Contributor

I opened a PR at #3941 addressing this with a keyword argument (use_nested_encoders). It's a fairly simple band-aid onto the issue and probably the top flaw is that it does not work when models_as_dict=False, because that just passes raw data into json.dumps with a global encoder. Would love to get feedback, or continue discussing here!

@samuelcolvin samuelcolvin added this to the Version 2 milestone Apr 2, 2022
@samuelcolvin
Copy link
Member

I can see the argument for this, but it will have to wait until v2.

@lilyminium
Copy link
Contributor

Thanks -- can you let me know how that affects #3941? Should it (continue being) on hold?

@samuelcolvin
Copy link
Member

I've changed my mind, @lilyminium you PR looks great, needs some fixes, but I'd be happy to accept it for v1.10.

@ryzhakar
Copy link

ryzhakar commented Sep 4, 2022

Just to clarify for a passing-by reader: this doesn't seem to be solved. The PR above was initially accepted and then reverted. Now the problems waits till the V2 release.

@samuelcolvin
Copy link
Member

samuelcolvin commented Sep 4, 2022

Correct, sorry for confusion.

See #4456 for discussion of strategies in v2.

@aguckenber-chwy
Copy link

aguckenber-chwy commented Nov 17, 2022

When will this be released? We are really in need of it but it was not released in 1.10.2.

@samuelcolvin
Copy link
Member

V2 should be released some time in Q1 2023. I had hoped to have it out this year, but I haven't had as much help as I'd hoped for.

@aguckenber-chwy
Copy link

aguckenber-chwy commented Nov 17, 2022

I've changed my mind, @lilyminium you PR looks great, needs some fixes, but I'd be happy to accept it for v1.10.

A sorry wrong issue I ended up leaving the comment on. Anyway this part can be part of V1? #3941

@samuelcolvin
Copy link
Member

No, v1 develop has stopped except for bug fixes.

@aguckenber-chwy
Copy link

So much for your comment above, I guess you had a change of heart. Will wait for V2 and fork it to fix it.

@samuelcolvin
Copy link
Member

Well I think it was more a omission, the pr wasn't completed in time for v1.10, it's too big to be included in a patch release and there won't be a v1.11. Thus by default the change has to wait for V2.

Remember pydantic isn't (yet) properly funded and it's developed and maintained by volunteers.

Thanks for your understanding. I'm working flat out on V2, no one wants it released more than me.

@aguckenber-chwy
Copy link

aguckenber-chwy commented Nov 17, 2022

Well I think it was more a omission, the pr wasn't completed in time for v1.10, it's too big to be included in a patch release and there won't be a v1.11. Thus by default the change has to wait for V2.

Remember pydantic isn't (yet) properly funded and it's developed and maintained by volunteers.

Thanks for your understanding. I'm working flat out on V2, no one wants it released more than me.

I really do appreciate the work by the way, and the library. I was just disappointed to google my issue, find a branch that was merged 1 month prior to v1.10 that fixed it, but wasn't released in 1.10, wasting 6 hours trying to find a workaround because Pydantic doesn't listen to nested encoders and dealing with circular references has been a pain, but I really appreciate the work you do! Keep it up! Sorry about the tone. Just upset that it didn't make it in V1.10.

@dmontagu
Copy link
Contributor

dmontagu commented Apr 27, 2023

It seems to me that the specific problems brought up in this issue can mostly be resolved now through the use of the new @model_serializer and @field_serializer in v2. We have actually removed the Config.json_encoders thing in v2 in favor of this approach, but if you run into problems caused by the removal of json_encoders, please open a new issue about it and we'll be happy to attempt to resolve your issues and/or come up with a way to further ease migration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug V1 Bug related to Pydantic V1.X
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants