10x speedup over V1 - use TypedDict instead of BaseModel #1

Merged
merged 1 commit into from Jul 3, 2023

Conversation

samuelcolvin
Contributor

Hi, I saw your LinkedIn post and thought I'd have a look at why the speedup is only 5x.

It looks like the reasons are:

  • most of the time is spent in "serialization", namely converting the model to a dict
  • the model is quite simple in this case: just str and float types, with no nested models

Since in this case you're not actually using the pydantic model, but converting it straight to a dict, I changed the code here to use a TypedDict, and to exclude None values during validation. This means there's no need for a formal "serialization" step - the dict that validation produces is all you need.
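
To make the idea concrete, here is a minimal plain-Python sketch of the two routes. It is a stand-in for the concept only and does not use Pydantic's API: the `WineModel` dataclass, the field names, the `CONVERTERS` table and both helper functions are hypothetical and not taken from this PR.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional

# Hypothetical raw record; the field names are assumptions for illustration.
RAW = {"id": "1", "title": "Example Red", "price": None, "points": "87"}


@dataclass
class WineModel:
    """Stand-in for a model class that must be serialized afterwards."""
    id: int
    title: str
    points: int
    price: Optional[float] = None


def via_model(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Two steps: build a model object, then serialize it and drop None values."""
    price = raw.get("price")
    model = WineModel(
        id=int(raw["id"]),
        title=str(raw["title"]),
        points=int(raw["points"]),
        price=None if price is None else float(price),
    )
    return {key: value for key, value in asdict(model).items() if value is not None}


# Hypothetical per-field converters playing the role of validators.
CONVERTERS = {"id": int, "title": str, "points": int, "price": float}


def straight_to_dict(raw: Dict[str, Any]) -> Dict[str, Any]:
    """One step: validate each field and write it directly into the output dict."""
    out: Dict[str, Any] = {}
    for key, convert in CONVERTERS.items():
        value = raw.get(key)
        if value is None:
            continue  # None is dropped during validation, so no later pass is needed
        out[key] = convert(value)
    return out


print(via_model(RAW))
print(straight_to_dict(RAW))
```

Both functions return the same dict, but the second never builds an intermediate object and never makes a second pass over the fields, which is the work that the TypedDict approach avoids.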

Hope this helps, let me know if you have any questions.

@prrao87 prrao87 merged commit 6ef33c7 into prrao87:main Jul 3, 2023
@prrao87
Owner

prrao87 commented Jul 3, 2023

Thanks a lot for the PR, this will help me and a lot of others learn some cool features of Pydantic 😇

@samuelcolvin
Contributor Author

FYI we also have profile-guided optimisation (PGO) of pydantic-core compilation (see pydantic/pydantic-core#686) coming soon, which should make pydantic-core another 10-30% faster.

@prrao87
Owner

prrao87 commented Jul 7, 2023

Hi @samuelcolvin, I'm filing a bug report with VS Code's Pylance language server (which depends on the Pyright type checker). The linter seems to think that specifying a custom method such as @field_validator under the TypedDict object is incorrect syntax, and Pyright says that it's invalid Python (Error: TypedDict classes can contain only type annotations) -- however, the code runs and there's clearly no syntax error. Just to clarify, the error is with Pylance/Pyright, and this issue is only present if you use VS Code as your IDE (which I do), hence the bug report on their repo and not Pydantic's.

I've linked the issue above -- it looks like they're saying they're not sure if it's "correct" to inherit from TypedDict directly, and that inheriting from BaseModel is the right way to do it. What are your thoughts? I realize it's still early days and that the Pydantic v2 docs are still catching up, but is there anywhere you've documented how the inheritance from TypedDict works exactly? Any tips you could offer in understanding how Pydantic is able to handle the specification of custom methods under TypedDict objects even though the object was originally intended to just house struct-like fields?

My concern is that there has been an issue filed on the mypy repo, related to allowing custom methods within TypedDict that was closed as out of scope for this object type. I'm unsure if Pyright followed the same approach. So, if this is the case, how has Pydantic reconciled this and how does it pass type checkers if specifying @field_validator alongside TypedDicts?

Update

I just checked PEP 589 and it's clearly mentioned that methods are not allowed within TypedDict:

Methods are not allowed, since the runtime type of a TypedDict object will always be just dict (it is never a subclass of dict).

TypedDict isn’t extensible, and it addresses only a specific use case. TypedDict objects are regular dictionaries at runtime, and TypedDict cannot be used with other dictionary-like or mapping-like classes, including subclasses of dict. There is no way to add methods to TypedDict types. The motivation here is simplicity.
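
The runtime behaviour described in these two passages can be checked with the standard library alone. This is a minimal sketch; the `Movie` class and its fields are hypothetical and only serve as an illustration.

```python
from typing import TypedDict


class Movie(TypedDict):
    # Hypothetical fields, used only to illustrate the runtime behaviour.
    title: str
    year: int


movie = Movie(title="Blade Runner", year=1982)

# "Calling" the TypedDict class just builds a regular dict.
runtime_type = type(movie)
print(runtime_type)

# Instance checks against a TypedDict class are rejected with a TypeError,
# because no distinct runtime type exists to check against.
try:
    isinstance(movie, Movie)
    instance_check_allowed = True
except TypeError:
    instance_check_allowed = False
print(instance_check_allowed)
```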

So I suppose the bigger questions are:

  1. Why does the code here in this repo work? Is this because TypedDict is purely for type hinting and the underlying dict is all that is seen at runtime?
  2. If the approach used in this PR, i.e., defining a TypedDict subclass, is non-standard and doesn't respect type hinting guidelines, should this approach even be used in a real Pydantic code base?
    • If not, how would one go about minimizing serialization burden while also using field validators? I don't see any other way to do this other than subclassing BaseModel, which is what's currently shown in the Pydantic docs
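
On the first question, a small standard-library experiment suggests why such code can run at all. In the CPython versions this was written against, a function defined in a TypedDict class body stays on the class object, where a library could look it up, while instances remain plain dicts. The `tag` decorator and the `Wine` class are hypothetical; this is not Pydantic's mechanism, only a sketch of the kind of class inspection a library might perform, and it relies on implementation behaviour that PEP 589 does not guarantee.

```python
from typing import TypedDict


def tag(func):
    """Hypothetical decorator that marks a function, loosely like a validator decorator."""
    func.is_validator = True
    return func


class Wine(TypedDict):
    title: str

    # Static type checkers reject this method, but the class body still executes.
    @tag
    def check_title(value):
        return value.strip()


# The tagged function can be found by inspecting the class namespace.
tagged = [
    name for name, attr in vars(Wine).items()
    if getattr(attr, "is_validator", False)
]
print(tagged)

# Instances are plain dicts and do not carry the method.
wine = Wine(title=" Example Red ")
print(type(wine), hasattr(wine, "check_title"))
```

So the type-checker error and the working runtime are not in conflict: the checker enforces PEP 589, while the interpreter simply does not.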

@prrao87
Owner

prrao87 commented Jul 7, 2023

I've started a discussion in the Pydantic repo regarding this, hope we can continue the discussion there!
pydantic/pydantic#6517
