V2 Generic Improvements #5351

samuelcolvin · 2023-04-03T14:38:22Z

generic dataclasses
a TypedDict to gather these
as @dmontagu mentioned, generic tests looks slow - is this indicative of generics being slow? Can we sort it by caching core-schema if there are no generic arguments?

The text was updated successfully, but these errors were encountered:

dmontagu · 2023-04-04T18:45:30Z

Generic tests benchmarking:

This script was written based on tests.test_generics.test_generics_work_with_many_parametrized_base_models:

import time
from typing import TypeVar, Generic
from pydantic import BaseModel


def run(n_models: int):
    t0 = time.time()

    T = TypeVar('T')
    C = TypeVar('C')

    class A(BaseModel, Generic[T, C]):
        x: T
        y: C

    class B(A[int, C], BaseModel, Generic[C]):
        pass

    models = []
    for i in range(n_models):
        class M(BaseModel):
            pass

        M.__name__ = f'M{i}'
        models.append(M)

    generics = []
    for m in models:
        Working = B[m]
        generics.append(Working)

    t1 = time.time()
    print(f"{t1 - t0:.2f} seconds elapsed")


run(3000)

On my computer, on commit af73124 (immediately before the schema building refactor), the runtime is ~1.38 seconds.

On the current main branch, I get ~2.40 seconds.

(I will do the same check for validation time shortly, though I suspect the impact is smaller.)

dmontagu · 2023-04-04T18:59:22Z

Benchmarking script for validating with generic models:

import time
from pathlib import Path
from typing import Generic, TypeVar, Union

from pydantic import BaseModel

T = TypeVar('T')

# Based on tests.test_generics.test_generic_recursive_models

class GenericModel1(BaseModel, Generic[T]):
    ref: 'GenericModel2[T]'

    model_config = dict(undefined_types_warning=False)


class GenericModel2(BaseModel, Generic[T]):
    ref: Union[T, GenericModel1[T]]

    model_config = dict(undefined_types_warning=False)


GenericModel1.model_rebuild()
GenericModelStr = GenericModel1[str]

class RecursiveModelStr1(BaseModel):
    ref: 'RecursiveModelStr2'

    model_config = dict(undefined_types_warning=False)


class RecursiveModelStr2(BaseModel):
    ref: Union[str, RecursiveModelStr1]

    model_config = dict(undefined_types_warning=False)

RecursiveModelStr1.model_rebuild()
RecursiveModelStr = RecursiveModelStr1

def run1(n_models: int):
    t0 = time.time()
    for _ in range(n_models):
        # RecursiveModelStr.model_validate(dict(ref=dict(ref=dict(ref=dict(ref='123')))))
        # GenericModelStr.model_validate(dict(ref=dict(ref=dict(ref=dict(ref='123')))))
        GenericModel1[str].model_validate(dict(ref=dict(ref=dict(ref=dict(ref='123')))))
    t1 = time.time()
    print(f"{t1 - t0:.2f} seconds elapsed")


run1(1000000)

# ~1.89s if you use a non-generic recursive model
# ~1.90s if you use a generic model and cache (i.e., don't re-call __class_getitem__)
# ~2.70s if you use a generic model and don't cache (i.e., re-call __class_getitem__)

# ~1.88s on pre-refactor (commit af73124) if you cache generic
# 3.25s on pre-refactor (commit af73124) if you don't cache generic

Good news is that it seems to be highly performant. The __class_getitem__ has also been sped up after the refactor, though it is still pretty slow relative to validation (yay pydantic-core performance). I wonder if there's more low-hanging fruit there, though obviously would advise people not be calling __class_getitem__ all the time.

dmontagu · 2023-04-04T19:11:39Z

I'll note that we can get an ~10% speedup in the previous test (where you don't "cache" the parametrized generic type) if we change pydantic._internal._generics.get_cached_generic_type_early to not compute the early_cache_key every time, and have a code path that caches for the exact types passed. This seems pretty significant given how much else is happening besides __class_getitem__, though admittedly it would require you to be using generics in a pretty hot path for it to have a macroscopic impact.

Putting an @lru_cache on the __class_getitem__ (requires some minor tweaks to get it to work, namely, making a module-level function that gets the decorator and is immediately called by __class_getitem__) gets another 10% on top of that, bringing the runtime down to ~2.1s vs. ~1.9s when using the cached type (and not even calling __class_getitem__ at all). Ideally we'd be able to get closer to the @lru_cache runtime, I'll see if there's a way to do this with weak keys.

Edit: after more investigation, I don't think there's going to be an "easy" way to create a WeakKey version of @lru_cache. It does seem there is room for performance improvements here, but I think it's probably not worth worrying about given you can always cache the created type, and the cache lookups are still pretty fast.

I will note that we can get a ~2% speedup on __class_getitem__ by creating aliases for _GENERIC_TYPE_CACHES.get etc. (reducing the number of getattr calls I guess). But I'm not sure the 2% speed improvement is worth it anyway considering how unlikely this is to ever be a hot path, and the "better" solution of caching the type avoids it anyway.

dmontagu · 2023-04-04T19:45:13Z

I found a way to speed things up to only ~10-15% slower than the pre-refactor for building generic models, and all tests pass, but I'm a bit suspicious. @samuelcolvin I think you'll probably know whether the change is actually safe, or maybe we can create a test that fails with the change.

I think if it's a safe change though, I'd be happy to consider the performance regression "resolved". There may well still be room for further improvements but I didn't see anything super obvious digging into flamegraphs etc; whereas the regression was obvious (double the number of calls of model_rebuild).

caniko · 2023-04-24T13:28:45Z

I am getting a feeling that you are providing minimal support for generics. Why is that? Generics are such an important part of OOP, and my best OOP experience in Python is with pydantic. This is such a shame...

My feeling comes from this in the V2 alpha post:
While it does not raise an error at runtime yet, subclass checks for parametrized generics should no longer be used. These will result in TypeErrors and we can't promise they will work forever. However, it will be okay to do subclass checks against non-parametrized generic models

Why? Any chances for you to change your mind on this topic? To me it is a no-brainer to just write a test and make sure this never breaks. Could you at least provide the reason for making this decision? Clearly many use generics with pydantic; this seems like an odd feature to exclude.

Disclaimer
Some of my projects that use Pydantic use this functionality. This is detrimental to my biggest pydantic package that should be released next year where I do in fact check if certain models that are parametrized are subclasses of another. The use of generics in these models improve the developer and UI implementation experience by several orders of magnitude.

I hope you will thoroughly reconsider, and thank you nonetheless.

Any news on variadic generics?

dmontagu · 2023-04-24T16:33:57Z

Hi @caniko,

I am getting a feeling that you are providing minimal support for generics.

I am disappointed to hear you feel that way, I agree generics are important and have put in a lot of effort to try to improve the experience of using them in v2. My goal is that they feel more reliable/robust in v2, and are even more encouraged for use than they were in v1.

While it does not raise an error at runtime yet, subclass checks for parametrized generics should no longer be used. These will result in TypeErrors and we can't promise they will work forever. However, it will be okay to do subclass checks against non-parametrized generic models

Why? Any chances for you to change your mind on this topic? To me it is a no-brainer to just write a test and make sure this never breaks.

There are a handful of problems with this:

Right now, we return a concrete subclass from __class_getitem__. This isn't the behavior of standard typing generics, and causes problems with some interactions with standard typing generics. For example, this currently isn't working: List[GenericPydanticModel[T]][int]. More generally, we have copies of the typing module functions in many places with minor logic changes to handle pydantic types specially, for maintainability reasons I would really like to eliminate this (for it to work perfectly with typing it will require some changes to typing, but the typing module maintainers have indicated a willingness to make the changes we need).
Currently in v2, I am careful to keep the class hierarchy from growing when re-parametrizing models. As a result, reparametrizing doesn't preserve subclass relationships. So:
```
from typing import Generic, TypeVar

from pydantic import BaseModel

T = TypeVar("T")
S = TypeVar("S")

class MyGeneric(BaseModel, Generic[T, S]):
    x: T
    y: S


MyPartialGeneric = MyGeneric[int, S]
MyConcreteGeneric = MyPartialGeneric[str]
print(issubclass(MyConcreteGeneric, MyPartialGeneric))
#> False
```
Also, IDEs, etc. complain about trying to do this. Really, I don't want to be committed to a specific behavior here because I think there will be ways to improve the ergonomics and consistency with standard typing module by making changes that would break this behavior.
Checking subclass relationships for concrete generic models is fairly difficult to do, especially in a way that is consistent with type-checkers. In particular, consider the following:
```
class CovariantModel(BaseModel, Generic[T_co]):
    x: T_co

class InvariantModel(BaseModel, Generic[T]):
    x: list[T]

class ContravariantModel(BaseModel, Generic[T_contra]):
    x: Callable[[T_contra], None]

class MultiTypevarModel(BaseModel, Generic[T, S, R]):
    x: T
    y: list[S]
    z: Callable[[R], None]
```
What should be the result of the following calls?
- issubclass(CovariantModel[Any], CovariantModel[int])
- issubclass(CovariantModel[int], CovariantModel[Any])
- issubclass(CovariantModel[int], CovariantModel[float])
- issubclass(CovariantModel[datetime.datetime], CovariantModel[datetime.date])
- Each of the above for InvariantModel, and ContravariantModel
- issubclass(MultiTypevarModel[int, S, R], MultiTypevarModel[T, int, R])
- issubclass(MultiTypevarModel[int, Any, Any], MultiTypevarModel[Any, str, Any])
- (various other permutations of the MultiTypevarModel)
The typing module has largely eluded the need to make a decision on this by just disallowing checks for subclass relationships between different parametrized models; I think if we tried to make these subclass checks, there is no specific choice of behavior we could make that wouldn't have reasonable arguments against it.

To be clear, we intend to always support issubclass(MyGenericModel[int, str], MyGenericModel), and that will always work.

If you need to check subclass relationships between two concrete generics, I would suggest checking if their __origin__s are subclasses, and if so, you can implement whatever logic you like for comparing their __args__. I think right now this data lives on the __pydantic_generic_metadata__ attribute, but I think we may change this before the official v2 release to be more consistent with the standard typing module.

dmontagu · 2023-04-24T16:40:45Z

Also, re: variadic generics, it would be helpful if you share a unit test (ideally representative of real-world usage) that you'd like to pass, that will make it easier to work on. Might make sense to make a different issue for it and/or open a PR with just the failing unit test.

caniko · 2023-04-25T08:58:30Z

Thank you @dmontagu, this was very informative.

To be clear, we intend to always support issubclass(MyGenericModel[int, str], MyGenericModel), and that will always work.

I can relax now.

Might make sense to make a different issue for it and/or open a PR with just the failing unit test.

I can make a PR for it when I have downtime.

dmontagu · 2023-05-19T18:40:15Z

I'm going to close this issue and open new issues for handling:

variadic generics
generic dataclasses

samuelcolvin added the Performance label Apr 3, 2023

samuelcolvin assigned dmontagu Apr 3, 2023

This was referenced Apr 3, 2023

Move generic metadata into a typeddict #5361

Merged

V2: Generics #4665

Closed

dmontagu mentioned this issue Apr 4, 2023

Remove apparently-unnecessary model_rebuild #5371

Merged

dmontagu changed the title ~~V2 Generic Improovements~~ V2 Generic Improvements Apr 6, 2023

Kludex mentioned this issue Apr 25, 2023

Generic dataclass support #1256

Closed

caniko mentioned this issue Apr 28, 2023

Test: Add test for variadic generic #5625

Merged

5 tasks

dmontagu added the v2-reviewed label May 8, 2023

dmontagu closed this as completed May 19, 2023

This was referenced May 19, 2023

Support generic dataclasses #5803

Open

Support variadic generics #5804

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

V2 Generic Improvements #5351

V2 Generic Improvements #5351

samuelcolvin commented Apr 3, 2023 •

edited by dmontagu

dmontagu commented Apr 4, 2023

dmontagu commented Apr 4, 2023 •

edited

dmontagu commented Apr 4, 2023 •

edited

dmontagu commented Apr 4, 2023

caniko commented Apr 24, 2023 •

edited

dmontagu commented Apr 24, 2023 •

edited

dmontagu commented Apr 24, 2023 •

edited

caniko commented Apr 25, 2023

dmontagu commented May 19, 2023

V2 Generic Improvements #5351

V2 Generic Improvements #5351

Comments

samuelcolvin commented Apr 3, 2023 • edited by dmontagu

dmontagu commented Apr 4, 2023

dmontagu commented Apr 4, 2023 • edited

dmontagu commented Apr 4, 2023 • edited

dmontagu commented Apr 4, 2023

caniko commented Apr 24, 2023 • edited

dmontagu commented Apr 24, 2023 • edited

dmontagu commented Apr 24, 2023 • edited

caniko commented Apr 25, 2023

dmontagu commented May 19, 2023

samuelcolvin commented Apr 3, 2023 •

edited by dmontagu

dmontagu commented Apr 4, 2023 •

edited

dmontagu commented Apr 4, 2023 •

edited

caniko commented Apr 24, 2023 •

edited

dmontagu commented Apr 24, 2023 •

edited

dmontagu commented Apr 24, 2023 •

edited