Config / runtime option for SerializeAsAny #6423

Closed
4 of 13 tasks
Tracked by #7928 ...
davidhewitt opened this issue Jul 4, 2023 · 14 comments · Fixed by #8830
Labels
feature request help wanted Pull Request welcome

Comments

@davidhewitt
Contributor

davidhewitt commented Jul 4, 2023

Initial Checks

  • I have searched Google & GitHub for similar requests and couldn't find anything
  • I have read and followed the docs and still think this feature is missing

Description

(Inspired by the problem in #6403.)

The idea is to add serialize_as_any as a config (or runtime) option.

As a config option, the model would be serialized in a duck-typed fashion (as V1 would behave), but this would not apply to fields of the model:

class GenericBase(BaseModel):
    model_config = ConfigDict(serialize_as_any=True)

As a runtime option, all fields of the model would be serialized in a duck-typed fashion, recursively:

GenericBase().model_dump_json(serialize_as_any=True)

The name serialize_as_any is open to bikeshedding - I've suggested it to match the SerializeAsAny type annotation. One possibly compelling alternative name is strict (maybe ser_strict as the config option to separate from strict validation).


Context

In Pydantic V2 we changed serialization from duck-typed to serializing exactly the model named in the annotation. That is to say, given the following models:

class GenericBase(BaseModel):
    pass

class ListType(GenericBase):
    type: str = "list"
    element_type: GenericBase = Field()

class StringType(RootModel, GenericBase):
    root: str = "string"

Then in V2 the ListType.element_type field will always serialize as {} (as that's the contents of GenericBase), but the equivalent in V1 would serialize as the concrete subclass.

Concretely:

# V2 behaviour
ListType(element_type=StringType()).model_dump_json()
'{"type":"list","element_type":{}}'

# V1 behaviour
ListType(element_type=StringType()).json()
'{"type":"list","element_type":"string"}'
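
To make the difference concrete without depending on Pydantic internals, here is a toy sketch using plain dataclasses. It is not how Pydantic implements serialization; `dump_declared` and `dump_duck_typed` are hypothetical helper names, and `StringType` is simplified to a one-field dataclass. It only illustrates why the annotated type yields `{}` while the runtime type yields the subclass fields.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class GenericBase:
    pass


@dataclass
class StringType(GenericBase):
    value: str = "string"


def dump_declared(obj: Any, declared_type: type) -> Dict[str, Any]:
    """Annotation-driven (V2-like): emit only the fields the declared type knows about."""
    return {f.name: getattr(obj, f.name) for f in fields(declared_type)}


def dump_duck_typed(obj: Any) -> Dict[str, Any]:
    """Duck-typed (V1-like): emit the fields of the object's runtime type."""
    return {f.name: getattr(obj, f.name) for f in fields(type(obj))}


element = StringType()
declared_result = dump_declared(element, GenericBase)  # field annotated as GenericBase
duck_result = dump_duck_typed(element)

print(declared_result)  # {}
print(duck_result)      # {'value': 'string'}
```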

The way to get the V1 behaviour back in V2 (duck-typed serialization) is by wrapping the annotation in SerializeAsAny.

class ListType(GenericBase):
    type: str = "list"
    element_type: SerializeAsAny[GenericBase] = Field()

This will now fix serialization for ListType, but other uses of GenericBase will all need to similarly be wrapped in SerializeAsAny.
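
A minimal runnable sketch of the three behaviours discussed so far. Two assumptions: `StringType` is written as a regular one-field model rather than a `RootModel` to keep the output easy to read, and the last call assumes a Pydantic release in which `model_dump` accepts a `serialize_as_any` argument (2.7 or later, to my knowledge). `PlainList` and `DuckList` are illustrative names.

```python
from pydantic import BaseModel, SerializeAsAny


class GenericBase(BaseModel):
    pass


class StringType(GenericBase):
    value: str = "string"


class PlainList(GenericBase):
    element_type: GenericBase  # serialized as the annotated type


class DuckList(GenericBase):
    element_type: SerializeAsAny[GenericBase]  # serialized as the runtime type


plain = PlainList(element_type=StringType())
duck = DuckList(element_type=StringType())

print(plain.model_dump())  # {'element_type': {}}
print(duck.model_dump())   # {'element_type': {'value': 'string'}}

# Assumption: Pydantic >= 2.7, where the runtime flag exists.
print(plain.model_dump(serialize_as_any=True))  # {'element_type': {'value': 'string'}}
```

The field annotation has to be repeated wherever `GenericBase` is used, whereas the runtime flag is applied once at the call site.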


Alternatives

Some possible alternatives to the model config:

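# Alternative 1: rebind the model name to an annotated alias
class MyBase(BaseModel):
    pass

MyBase = Annotated[MyBase, SerializeAsAny]

# Alternative 2: a class decorator
@serialize_as_any
class MyBase(BaseModel):
    pass

# Alternative 3: a class keyword argument
class MyBase(BaseModel, serialize_as_any=True):
    pass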

I'm unaware of an alternative strategy for the runtime option at this time.

Affected Components

Selected Assignee: @adriangb

@pydantic-hooky pydantic-hooky bot added the unconfirmed Bug not yet confirmed as valid/applicable label Jul 4, 2023
@adriangb
Member

adriangb commented Jul 4, 2023

Related: pydantic/pydantic-core#740

@vkozmik

vkozmik commented Jul 28, 2023

Hi, I would really vote for this feature, as upgrading a project by adding SerializeAsAny in multiple places is very tiresome and introduces multiple errors.

@dmontagu
Contributor

dmontagu commented Jul 28, 2023

@adriangb where do things stand with the more global ways of enabling duck-typing serialization? Not sure if that was abandoned or paused or what. Seems like it might be worth a config setting unless I'm misremembering challenges with that.

I know you had the PR for the runtime flag but not sure if that addresses the "make it work everywhere by default" use case.

@adriangb
Member

I think I ran into technical limitations getting it working, but I don’t fully remember. We can try again.

@vkozmik

vkozmik commented Aug 1, 2023

Thanks, that would be appreciated! We can probably work around this with our own schema generation, but it is not nice.

@pb376

pb376 commented Oct 24, 2023

Is this still something planned for release? If there are specific tasks remaining I would be happy to help. This change has proven to be quite annoying compared to more typical ways to deal with this issue like annotating fields as sensitive/hidden.

@stephenmarkacs

We have code that uses pydantic and have been unable to upgrade due to the serialization change as we'd need to make changes in many places in the code. We wanted to share that this config option would be a very helpful feature for us.

@dmontagu
Contributor

dmontagu commented Nov 7, 2023

I still think this is a good idea and would be preferred/helpful for a lot of people. @sydney-runkle I think this is worth adding to the priorities list. I also personally think we should prioritize getting something out to address this well before any kind of v3 push (just don't want this to get lumped into the "we'll do it in v3" pile).

@sydney-runkle sydney-runkle added this to the v2.6.0 milestone Nov 10, 2023
@sydney-runkle sydney-runkle added help wanted Pull Request welcome and removed unconfirmed Bug not yet confirmed as valid/applicable labels Dec 4, 2023
@samuelcolvin
Member

I think we should call this "seared duck" as it's duck typing serialisation.

@sydney-runkle sydney-runkle modified the milestones: v2.6.0, v2.7.0 Jan 17, 2024
@courtland

FWIW this is also a blocker for upgrading our large pydantic-based project to v2. We have a custom BaseModel subclass used throughout, with lots of nested models inheriting from it. Not keen on adding SerializeAsAny everywhere.

This new default serialization behavior in v2 is also a bit surprising. We use SecretStr to prevent secrets from being leaked in serialization/dumps.

@rmorshea

rmorshea commented Feb 27, 2024

Is there any reasonable workaround for this, perhaps by overriding __get_pydantic_core_schema__? I made a naive attempt to copy what SerializeAsAny does:

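from typing import Any
from pydantic import BaseModel, GetCoreSchemaHandler, SerializeAsAny
from pydantic_core import core_schema

class MyModel(BaseModel):
    @classmethod
    def __get_pydantic_core_schema__(cls, source: type[Any], handler: GetCoreSchemaHandler) -> core_schema.CoreSchema:
        return SerializeAsAny().__get_pydantic_core_schema__(source, handler)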

But this results in pydantic_core._pydantic_core.PydanticSerializationError: Error calling function <lambda>: ValueError: Circular reference detected (id repeated)

I'm building tools used by scientists who are unlikely to understand or remember that they need to use the SerializeAsAny type annotation.

@ptomecek

> Is there any reasonable workaround for this, perhaps by overriding __get_pydantic_core_schema__? I made a naive attempt to copy what SerializeAsAny does:
>
> class MyModel(BaseModel):
>     @classmethod
>     def __get_pydantic_core_schema__(cls, source: type[Any], handler: GetCoreSchemaHandler) -> core_schema.CoreSchema:
>         return SerializeAsAny().__get_pydantic_core_schema__(source, handler)
>
> But this results in pydantic_core._pydantic_core.PydanticSerializationError: Error calling function <lambda>: ValueError: Circular reference detected (id repeated)
>
> I'm building tools used by scientists who are unlikely to understand or remember that they need to use the SerializeAsAny type annotation.

@rmorshea This is how I have worked around it for now (though I'm excited for a better solution). Many thanks to #6381 for the inspiration.

from typing import Any, Dict, Tuple

from pydantic import BaseModel, SerializeAsAny
# Note: ModelMetaclass lives in a private module and may change between releases.
from pydantic._internal._model_construction import ModelMetaclass

# See https://github.com/pydantic/pydantic/issues/6381
class _SerializeAsAnyMeta(ModelMetaclass):
    def __new__(mcs, name: str, bases: Tuple[type, ...], namespaces: Dict[str, Any], **kwargs):
        annotations: dict = namespaces.get("__annotations__", {})

        # Pull in any annotations declared directly on BaseModel.
        for base in bases:
            for base_ in base.__mro__:
                if base_ is BaseModel:
                    annotations.update(base_.__annotations__)

        # Wrap every non-dunder field annotation in SerializeAsAny.
        for field, annotation in annotations.items():
            if not field.startswith("__"):
                annotations[field] = SerializeAsAny[annotation]

        namespaces["__annotations__"] = annotations

        return super().__new__(mcs, name, bases, namespaces, **kwargs)

class MyBaseModel(BaseModel, metaclass=_SerializeAsAnyMeta):
    pass

class A(BaseModel):
    x: int = 0

# Stock BaseModel: field `a` is serialized as the annotated type.
class B_old(BaseModel):
    a: BaseModel

# Metaclass-based model: field `a` is wrapped in SerializeAsAny.
class B_new(MyBaseModel):
    a: BaseModel

print(B_old(a=A()).model_dump())
print(B_new(a=A()).model_dump())

@vkozmik

vkozmik commented Feb 28, 2024

You can probably do something like this, even though I think it's a bit slow:

# Force SerializeAsAny to all fields of the model
@classmethod
def __get_pydantic_core_schema__(cls, source_type: Any, handler: GetCoreSchemaHandler) -> CoreSchema:
    schema = handler(source_type)
    schema_to_update = schema

    while "schema" in schema_to_update and schema_to_update["type"] != "model-fields":
        schema_to_update = schema_to_update["schema"]

    fields_to_update = schema_to_update["fields"]
    for field, field_schema in fields_to_update.items():
        while field_schema["type"] in ["definitions", "function-before", "default", "model-field"]:
            field_schema = field_schema["schema"]
        field_schema["serialization"] = core_schema.wrap_serializer_function_ser_schema(
            lambda x, h: h(x), schema=core_schema.any_schema()
        )
    return schema

@rmorshea

rmorshea commented Mar 2, 2024

Unfortunately both solutions appear to break if you use any custom types with specialized serialization logic. For example, I have a custom type annotation that serializes Pandas DataFrames into the Apache Parquet file format. Adding SerializeAsAny causes Pydantic to complain Unable to serialize unknown type: <class 'pandas.core.frame.DataFrame'> because the Any annotation from SerializeAsAny is taking precedence over my custom type.

Put another way, the workarounds above are not synonymous with SerializeAsAny[MyModel]. Rather, they implicitly wrap every field of MyModel in SerializeAsAny, which, as in my case, isn't always desirable.

If there is no other option but to modify the fields of MyModel (rather than wrapping MyModel itself) the only real solution seems to be recursively drilling down into each annotation to wrap only subclasses of BaseModel in SerializeAsAny.
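
A minimal sketch of that recursive drill-down, written against the standard `typing` helpers only so it can run without Pydantic. `wrap_subclasses`, `Base`, `Child`, and `mark` are hypothetical names introduced for illustration; the string metadata stands in for real annotations such as a custom serializer. It handles common generics, unions, and `Annotated`, and leaves string forward references untouched.

```python
import types
from typing import Annotated, Any, Callable, Optional, Union, get_args, get_origin

# types.UnionType (the `X | Y` form) only exists on Python 3.10+.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def wrap_subclasses(annotation: Any, base: type, wrap: Callable[[type], Any]) -> Any:
    """Return `annotation` with each subclass of `base` replaced by `wrap(cls)`."""
    origin = get_origin(annotation)
    if origin is None:
        # A bare class, Any, a TypeVar, a Literal value, Ellipsis, ...
        if isinstance(annotation, type) and issubclass(annotation, base):
            return wrap(annotation)
        return annotation

    args = get_args(annotation)
    if origin is Annotated:
        # Keep existing metadata (e.g. a custom serializer); recurse into the inner type only.
        inner, *metadata = args
        return Annotated[(wrap_subclasses(inner, base, wrap), *metadata)]

    new_args = tuple(wrap_subclasses(arg, base, wrap) for arg in args)
    if origin in _UNION_ORIGINS:
        return Union[new_args]
    return origin[new_args]


# Stand-ins for BaseModel and a model subclass.
class Base:
    pass


class Child(Base):
    pass


def mark(cls: type) -> Any:
    """Stand-in for wrapping a class in SerializeAsAny."""
    return Annotated[cls, "serialize-as-any"]


wrapped = wrap_subclasses(dict[str, Optional[Child]], Base, mark)
untouched = wrap_subclasses(Annotated[bytes, "parquet"], Base, mark)
```

With Pydantic, one would presumably pass `BaseModel` as `base` and `lambda cls: SerializeAsAny[cls]` as `wrap`, then apply the function to each field annotation; that combination is an untested assumption here. Fields whose types are not model subclasses, such as the annotated DataFrame case above, come back unchanged.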
