Support for value-based polymorphism #503

Closed
ralbertazzi opened this issue May 3, 2019 · 13 comments

@ralbertazzi

ralbertazzi commented May 3, 2019

Hi guys! I'd love to use pydantic but I'm finding it hard to understand how I could use polymorphic types. Say that I have these classes:

class BaseItem:
    pass

class ConcreteItemA(BaseItem):
    a: str

class ConcreteItemB(BaseItem):
    b: int

and their corresponding JSON representations, where the type becomes a JSON field:

{
    "type": "item-a",
    "a": "some-string"
},
{
    "type": "item-b",
    "b": 10
}

I'd like to have a model (possibly BaseItem) that is capable of doing this kind of multiplexing, both in serialization and deserialization (i.e. I want to load a ConcreteItem, but I don't know which one until I read the JSON). Just to add more complexity, the hierarchy could be deeper and some items might need self-referencing (i.e. an item that has a List[BaseItem]).

Is there anything built-in in pydantic? Any hint on how this could be achieved?

Thanks!

@samuelcolvin
Member

I'm currently unavailable.

@samuelcolvin
Member

Sorry, that was a bad joke about the issue id.

This is possible, but without knowing everything you're doing it's hard to give a full solution; still, here's a broad outline:

from typing import Union, List

from pydantic import BaseModel, validator

class ConcreteItemA(BaseModel):
    type: str
    a: str

    @validator('type')
    def check_type(cls, v):
        if v != 'item-a':
            raise ValueError('not item-a')
        return v

class ConcreteItemB(BaseModel):
    type: str
    b: int

    @validator('type')
    def check_type(cls, v):
        if v != 'item-b':
            raise ValueError('not item-b')
        return v

class BaseItem(BaseModel):
    root: List[Union[ConcreteItemA, ConcreteItemB]]

m = BaseItem(root=[
    {
        'type': 'item-a',
        'a': 'some-string'
    },
    {
        'type': 'item-b',
        'b': 10,
    }
])
print(m.root)
print(m.dict())

Gives:

[<ConcreteItemA type='item-a' a='some-string'>, <ConcreteItemB type='item-b' b=10>]
{'root': [{'type': 'item-a', 'a': 'some-string'}, {'type': 'item-b', 'b': 10}]}

There are a couple of warts on this approach I'm afraid:

  • you have to use a validator to force type; that should be fixed by #469 (Implement const keyword in Schema), or you could use a single-element enum, but that's just as ugly
  • unfortunately the polymorphism only works here on the root field, not on the base model itself, e.g. you can't do Union[ConcreteItemA, ConcreteItemB].parse_obj(...) or something. I'm afraid I don't know a good way around this except to do something like:
error = None
for model_cls in [ConcreteItemA, ConcreteItemB]:
    try:
        return model_cls(**data)
    except ValidationError as e:
        error = e
raise error

Which is effectively what pydantic is doing when it sees root: Union[ConcreteItemA, ConcreteItemB] anyway.

Hope that helps, let me know if you need more info.

@likern

likern commented May 3, 2019

@samuelcolvin
Right now I'm doing the same loop to identify the most suitable model type for the data:

error = None
for model_cls in [ConcreteItemA, ConcreteItemB]:
    try:
        return model_cls(**data)
    except ValidationError as e:
        error = e
raise error

But, as my validation is done in a backend service and I have to check quite large and nested JSON payloads, this seems quite inefficient to me.

What about introducing a new class, say Resolver, with which you can register all available models:

  • Resolver.add_model(model)

And later you can call Resolver.validate(json) (like Model.parse_obj()) which will return:

  • The best suitable (and most specific) model, or
  • None if it couldn't find any appropriate model (because errors are swallowed in the for loop), or raise an exception describing why none of the models were applicable (but the latter might be difficult to implement)

So Resolver.validate(json) will never raise an exception.
But the more important property of Resolver is that it will be able to detect the correct model. And, because Resolver knows about all the models, it can do that in the most efficient way.

For example, Resolver.validate(json) can start by checking unique required fields:

from typing import Optional
from pydantic import BaseModel

class ModelA(BaseModel):
    type: str
    message: str

class ModelB(BaseModel):
    type: Optional[str]
    message: str = None  # not required, default None

If the JSON doesn't contain the field type, we can discard ModelA without checking other fields. The same technique could be applied for const fields.
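
A minimal sketch of the Resolver interface I have in mind (hypothetical API, not part of pydantic; this naive version just loops over the registered models and leaves out the required-field optimisation, reusing ModelA/ModelB from above):

from typing import Any, List, Optional, Type

from pydantic import BaseModel, ValidationError

class Resolver:
    def __init__(self) -> None:
        self._models: List[Type[BaseModel]] = []

    def add_model(self, model: Type[BaseModel]) -> None:
        self._models.append(model)

    def validate(self, data: Any) -> Optional[BaseModel]:
        # never raises: return the first registered model that validates, or None
        for model in self._models:
            try:
                return model.parse_obj(data)
            except ValidationError:
                continue
        return None

resolver = Resolver()
resolver.add_model(ModelA)
resolver.add_model(ModelB)
print(resolver.validate({'type': 'x', 'message': 'hello'}))  # a ModelA instance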

@samuelcolvin
Member

I have to check quite large and nested json payloads for me it seems quite inefficient

You can do the deserializing once, regardless of how you later try to validate the data.

If you really care about performance, you could do something like

model_lookup = {'item-a': ConcreteItemA, 'item-b': ConcreteItemB, ...}
data = ujson.loads(raw_data)
if not isinstance(data, dict):
    raise ValueError('not a dict')
try:
    model_cls = model_lookup[data['type']]
except KeyError:
    raise ...

m = model_cls(**data)

The point is that by this stage you're into the specifics of your application, which don't belong in pydantic.

But the more important property of Resolver is that it will be able to detect the correct model. And, because Resolver knows about all the models, it can do that in the most efficient way.

I don't see how Resolver can be significantly more efficient than the loop approach above without significantly rewriting pydantic. The loop approach is what we currently do for Union and it works well.

Best suitable (and most specific) model

I think this sounds a bit magic, either data is valid for a model or it's not - some floating measure of "specificity" sounds like an unnecessary complexity. If it is needed, again it's probably application specific.

I don't personally think Resolver is much use, but if you wanted a utility function for trying to validate data against multiple models, that could be done as part of #481; it could work on both dataclasses and models, which would be useful.

@likern

likern commented May 3, 2019

I think this sounds a bit magic, either data is valid for a model or it's not - some floating measure of "specificity" sounds like an unnecessary complexity

I think it's already working that way. If we have two models that both successfully validate the JSON, the model which is returned depends on the order in the for loop (whichever model comes first).

What to do if the data is valid for two models?
In my case (where models are used for routing):

from enum import Enum
from typing import Final

from pydantic import BaseModel

# IssuePayload and webhook are defined elsewhere in my application
class IssueAction(str, Enum):
    opened = 'opened'

class IssueEvent(BaseModel):
    action: IssueAction
    issue: IssuePayload

class IssueOpened(IssueEvent):
    action: Final[IssueAction] = IssueAction.opened

@webhook.handler(IssueEvent)
async def issue_event(issue: IssueEvent):
    print("[EVENT] Some general event")

@webhook.handler(IssueOpened)
async def issue_opened(event: IssueOpened):
    print("[EVENT] Issue was opened")

For incoming json payload:

{
 "action": "opened",
 "issue": { "id": 5 }
}

it will be valid for both cases, and which handler will be called depends on the order of registering handlers. That's not what I would want.

The point is that by this stage you're into the specifics of your application, which don't belong in pydantic.

That's the point of my idea - don't tie it to application specifics, and don't write code like this (these checks should already be done in pydantic validation sooner or later):

if data['type'] == "issue":
 if data["status"] == "opened":
   process_opened_issue(data)
  elif data["status"] == "closed":
   process_closed_issue(data)
elif data['type'] == "pull request":
  ...

Because:

  1. this is already validation, so we get double validation (partial manual validation first, then complete automatic validation with pydantic)
  2. that validation / routing might be incorrect / incomplete
  3. it's not always so simple to detect the type based on one special field; depending on the protocol it might be a combination of several fields and the presence (or even absence) of specific fields (even nested ones)

If you really care about performance, you could do something like

I'm thinking about a general solution where you describe the protocol (models) and subscribe to the interesting events (a model applicable to the JSON), and this works without any additional partial data parsing or custom routing logic.

I don't personally think Resolver is much use

If implemented for the general case it could be useful for any webhook implementation (and I think this is a common case where a service wants to integrate with another third-party service; it's also the case if you want real-time notifications instead of constant polling), because almost always you will receive all events (different JSON types) in the webhook call.

That's why I propose Resolver somewhere in pydantic. It's different from the general BaseModel interface (where you know which Model you want to validate the data with), whereas with Resolver you have to determine the model and validate the data against it simultaneously (because determining the model is also validation). And the for loop approach (as shown above) doesn't work in that case, when we are dealing with general and more specific inherited models (IssueEvent and IssueOpened).

@petermorrowdev

How about using a custom data type that grabs the type name from values in the validator?

from enum import Enum
from pydantic import BaseModel

class ItemA(BaseModel):
    x: int
    y: int

class ItemB(BaseModel):
    i: str
    j: str

class ItemType(Enum):
    A = 'item-a'
    B = 'item-b'

class PolyItem:
    type_map = {
        ItemType.A: ItemA,
        ItemType.B: ItemB,
    }

    @classmethod
    def __get_validators__(cls):
        yield cls.validate
    
    @classmethod
    def validate(cls, v, values):
        item_type = values['type']
        ItemModel = cls.type_map[item_type]
        return ItemModel(**v)

class Record(BaseModel):
    type: ItemType
    item: PolyItem

>>> Record(type='item-a', item=dict(x=1, y=1))
Record(type=<ItemType.A: 'item-a'>, item=ItemA(x=1, y=1))

>>> Record(type='item-b', item=dict(x=1, y=1))
ValidationError: 2 validation errors for Record
item -> i
  field required (type=value_error.missing)
item -> j
  field required (type=value_error.missing)

The PolyItem custom data type implicitly requires a type field in values and that each ItemType has a model in the type_map, so the validate method will throw KeyErrors in those scenarios. It also does not currently validate that v is a dict/mapping.
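
For illustration, a more defensive validate (to drop into PolyItem) could look something like this - just a sketch, not part of the snippet above:

    @classmethod
    def validate(cls, v, values):
        # reject non-mapping values instead of letting ItemModel(**v) blow up
        if not isinstance(v, dict):
            raise ValueError('item must be a dict')
        # 'type' is absent from values if it was missing or failed its own validation
        if 'type' not in values:
            raise ValueError('a valid type field is required')
        try:
            ItemModel = cls.type_map[values['type']]
        except KeyError:
            raise ValueError(f'no model registered for {values["type"]!r}')
        return ItemModel(**v)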

@outergod

I was also running into this until I'd finally realized I could do this:

from typing import Union

from pydantic import BaseModel, Field

class ConcreteItemA(BaseModel):
    a: str
    type: str = Field("item-a", const=True)

class ConcreteItemB(BaseModel):
    b: int
    type: str = Field("item-b", const=True)

class BaseItem(BaseModel):
    __root__: Union[ConcreteItemA, ConcreteItemB]

Hope this helps anyone who also comes across the same issue and ends up here.

kervel pushed a commit to Kapernikov/linkml that referenced this issue Dec 20, 2022
This makes the following modifications:
* The type designators are const=True now in pydantic so that pydantic can use them to determine which class needs to be instantiated
* We specify a union of all possible types (using class_descendants) for the range of an slot that refers to a class having a type designator

References:
* pydantic/pydantic#503
* linkml#1099
kervel pushed a commit to linkml/linkml that referenced this issue Jan 21, 2023
This makes the following modifications:
* The type designators are const=True now in pydantic so that pydantic can use them to determine which class needs to be instantiated
* We specify a union of all possible types (using class_descendants) for the range of an slot that refers to a class having a type designator

References:
* pydantic/pydantic#503
* #1099
@nhairs

nhairs commented May 2, 2023

I have also ended up here.

I was also running into this until I'd finally realized I could do this:

class ConcreteItemA(BaseModel):
    a: str
    type: str = Field("item-a", const=True)

class ConcreteItemB(BaseModel):
    b: int
    type: str = Field("item-b", const=True)

class BaseItem(BaseModel):
    __root__: Union[ConcreteItemA, ConcreteItemB]

Hope this helps anyone who also comes across the same issue and ends up here.

This is useful to know about.

It also looks like you'd want to combine it with Discriminated Unions

You can also use typing.Literal (which also works with Enums) to force the type field to be a certain value

So combining it all together:

from enum import StrEnum, auto
from typing import Literal, Union
from pydantic import BaseModel, Field
from pydantic.tools import parse_obj_as

class ItemType(StrEnum):
    A = auto()
    B = auto()

class ConcreteItemA(BaseModel):
    a: str
    type: Literal[ItemType.A] = ItemType.A

class ConcreteItemB(BaseModel):
    b: int
    type: Literal[ItemType.B] = ItemType.B

class BaseItem(BaseModel):
    __root__: Union[ConcreteItemA, ConcreteItemB] = Field(..., discriminator="type")

# Test
# ------------------------------------
test_a = {"__root__": {"type": ItemType.A, "a": "foo"}}
test_b = {"__root__": {"type": ItemType.B, "b": 9000}}

a = parse_obj_as(BaseItem, test_a)
print(repr(a))

b = parse_obj_as(BaseItem, test_b)
print(repr(b))

@dmontagu
Contributor

dmontagu commented May 8, 2023

Another option:

class BaseItem(BaseModel):
    type: Literal['item-a', 'item-b']

class ConcreteItemA(BaseItem):
    type: Literal['item-a']
    a: str

class ConcreteItemB(BaseItem):
    type: Literal['item-b']
    b: int

or

class ConcreteItemA(BaseModel):
    type: Literal['item-a']
    a: str

class ConcreteItemB(BaseModel):
    type: Literal['item-b']
    b: int

BaseItem = Union[ConcreteItemA, ConcreteItemB]

Depending on what you are trying to do this may require some tweaks, but hopefully one of the above serves as a good starting point. I agree with @nhairs that Discriminated Union is the right concept here.
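
For example, here's a minimal sketch of the second variant wired up as a discriminated union (pydantic v1.9+ syntax; Container is just an illustrative wrapper model):

from typing import Literal, Union

from pydantic import BaseModel, Field

class ConcreteItemA(BaseModel):
    type: Literal['item-a']
    a: str

class ConcreteItemB(BaseModel):
    type: Literal['item-b']
    b: int

class Container(BaseModel):
    # dispatch on the 'type' field instead of trying each union member in turn
    item: Union[ConcreteItemA, ConcreteItemB] = Field(..., discriminator='type')

print(Container.parse_obj({'item': {'type': 'item-b', 'b': 10}}).item)
# ConcreteItemB(type='item-b', b=10)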

@AndreiPashkin

class BaseItem(BaseModel):
    type: Literal['item-a', 'item-b']

Could be also:

import enum
from typing import Literal

from pydantic import BaseModel


class ItemType(enum.Enum):
    A = 'item-a'
    B = 'item-b'


class BaseItem(BaseModel):
    type: ItemType


class ConcreteItemA(BaseItem):
    type: Literal[ItemType.A]
    a: str


class ConcreteItemB(BaseItem):
    type: Literal[ItemType.B]
    b: int

@TheKevJames

For anyone running into this thread now that we've got pydantic v2, __root__ has been removed and replaced with pydantic.RootModel inheritance and const has been replaced with typing.Literal.

If you are doing:

class ConcreteItemA(BaseModel):
    a: str
    type: str = Field("item-a", const=True)

class ConcreteItemB(BaseModel):
    b: int
    type: str = Field("item-b", const=True)

class BaseItem(BaseModel):
    __root__: Union[ConcreteItemA, ConcreteItemB]

# >>> BaseItem(type='item-a', a='foo').__root__
# ConcreteItemA(a='foo', type='item-a')

you should now be doing:

from typing import Annotated, Literal

import pydantic

class ConcreteItemA(pydantic.BaseModel):
    a: str
    type: Literal['item-a']

class ConcreteItemB(pydantic.BaseModel):
    b: int
    type: Literal['item-b']

class BaseItem(pydantic.RootModel):
    root: Annotated[ConcreteItemA | ConcreteItemB,
                    pydantic.Field(discriminator='type')]

# >>> BaseItem(type='item-a', a='foo').root
# ConcreteItemA(a='foo', type='item-a')

If you used to have a "catch-all" class, such as:

class ConcreteItemA(BaseModel):
    a: str
    type: str = Field("item-a", const=True)

class ConcreteItemB(BaseModel):
    b: int
    type: str = Field("item-b", const=True)

class Unhandled(BaseModel):
    pass

class BaseItem(BaseModel):
    __root__: Union[ConcreteItemA, ConcreteItemB, Unhandled]

then as far as I can tell this is no longer possible to handle via RootModels. If anyone has a way to solve that, I'd be grateful to hear about it!

@jgarvin

jgarvin commented Aug 29, 2023

@TheKevJames is there no way to do this that doesn't require listing every possible class up front? It would be nice if classes could be registered at run/import time. In my case I want to save configuration objects, where each configurable class has its own configuration class, and there is an ever-growing number of these.

@nhairs

nhairs commented Aug 30, 2023

is there no way to do this that doesn't require listing every possible class up front?

AFAICT there is no way at the moment, or at least not through the APIs I explored.

In my (the?) ideal case, I'd expect to be able to automatically support all sub-classes of a class like so:

import enum
from typing import Literal, Annotated
from pydantic import BaseModel, Field

class ItemType(enum.IntEnum):
    FOO = 1
    BAR = 2


class BaseItem(BaseModel):
    item_type: ItemType


class FooItem(BaseItem):
    item_type: Literal[ItemType.FOO] = ItemType.FOO
    foo: int = 9000


class BarItem(BaseItem):
    item_type: Literal[ItemType.BAR] = ItemType.BAR
    bar: str = "baz"


class Sale(BaseModel):
    price: int = 100
    item: Annotated[BaseItem, Field(discriminator="item_type")]


test_foo = {"price": 1, "item": {"item_type": 1, "foo": 42}}
foo_sale = Sale.model_validate(test_foo)
print(foo_sale)

test_bar = {"price": 2, "item": {"item_type": 2, "bar": "rab"}}
bar_sale = Sale.model_validate(test_bar)
print(bar_sale)

However, as you will see if you run this code, pydantic requires that the discriminated field be a Literal (or Union).

Traceback (most recent call last):
  File "pydantic_unions.py", line 24, in <module>
    class Sale(BaseModel):
< trimmed >
pydantic.errors.PydanticUserError: Model 'BaseItem' needs field 'item_type' to be of type `Literal`
For further information visit https://errors.pydantic.dev/2.3/u/discriminator-needs-literal

However, it feels like it should be possible to use BaseItem.__subclasses__() instead of a Literal. If we wanted to do so, I'd expect that we'd need to modify pydantic._internal._discriminated_union._ApplyInferredDiscriminator._infer_discriminator_values_for_inner_schema. Perhaps we should raise this as a feature request instead?
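
In the meantime, one possible workaround (only a sketch, reusing the BaseItem/FooItem/BarItem models above and assuming at least two such subclasses are defined by the time it runs) is to build the discriminated union from BaseItem.__subclasses__() yourself:

from typing import Annotated, Union

from pydantic import Field, TypeAdapter

def base_item_adapter() -> TypeAdapter:
    # only subclasses defined by the time this runs are included; classes
    # registered later just need a fresh adapter to be built
    subclasses = tuple(BaseItem.__subclasses__())
    union = Union[subclasses]  # e.g. Union[FooItem, BarItem]
    return TypeAdapter(Annotated[union, Field(discriminator="item_type")])

item = base_item_adapter().validate_python({"item_type": 1, "foo": 42})
print(item)  # FooItem(item_type=<ItemType.FOO: 1>, foo=42)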
