Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove flattening of schemas before inlining and add schema walking caches #7473

Closed
wants to merge 9 commits into from

Conversation

adriangb
Copy link
Member

@adriangb adriangb commented Sep 17, 2023

Needs pydantic/pydantic-core#965

Selected Reviewer: @dmontagu

@adriangb
Copy link
Member Author

adriangb commented Sep 17, 2023

This brings tests/benchmarks/test_fastapi_startup.py from 10.66s on my laptop to 5.10s, which is already half of where we started. If we can skip the discriminated union stuff for cases that don't need it (most models and sub schemas) the benchmark gets down to 3.69s. According to #6768 v2 is ~4x slower than v1, so if we can get this down to ~2.5s we'll be back to v1 performance.

@cloudflare-pages
Copy link

cloudflare-pages bot commented Sep 17, 2023

Deploying with  Cloudflare Pages  Cloudflare Pages

Latest commit: 776b629
Status: ✅  Deploy successful!
Preview URL: https://71d85316.pydantic-docs2.pages.dev
Branch Preview URL: https://remove-flattening-of-schemas.pydantic-docs2.pages.dev

View logs

@adriangb
Copy link
Member Author

This required some changes to FastAPI's assertions about OpenAPI schemas, we should review them and confirm they're good before moving forward: https://github.com/tiangolo/fastapi/pull/10272/files

@adriangb

This comment was marked as outdated.

@adriangb
Copy link
Member Author

please review

@adriangb
Copy link
Member Author

This required some changes to FastAPI's assertions about OpenAPI schemas, we should review them and confirm they're good before moving forward: https://github.com/tiangolo/fastapi/pull/10272/files

Ended up not, not sure what CI is running (locally I still get failures without these changes) but this PR is the same as main.

@adriangb
Copy link
Member Author

adriangb commented Sep 20, 2023

This is now running in 3.31s vs. 10.49s for main on back-to-back runs on my laptop. So we're pretty close to v1 performance!

This is the current flamegraph:

image

The vast majority of the time is spent in Rust (@davidhewitt could this be made faster by using pythonize to parse the schema or something?), and then comes generating the schema (I still think there's a lot of improvement that can and should be done here), the ref stuff and discriminated union stuff is now a distant third.

@adriangb adriangb changed the title Remove flattening of schemas before inlining Remove flattening of schemas before inlining and add schema walking caches Sep 20, 2023
tests/test_dataclasses.py Show resolved Hide resolved
tests/test_internal.py Show resolved Hide resolved
tests/test_internal.py Show resolved Hide resolved
],
),
cs.list_schema(cs.list_schema(cs.int_schema())),
cs.list_schema(cs.list_schema(cs.int_schema(ref='int'), ref='list_of_ints')),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're now preserving the ref even when we inline.

Comment on lines +41 to +45
IntModel = Model[int]

assert IntModel.model_validate({'a': '1'}).a == 1

sig = signature(IntModel)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really weird! If I don't call model_validate the signature is different. It seems like some weirdness with generics. @dmontagu any thoughts?

tests/test_root_model.py Show resolved Hide resolved
Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is quite complex so I would like to move the removal of copying to another PR if possible. Some questions...

@@ -189,13 +204,13 @@ def _build_schema_type_to_method(self) -> dict[core_schema.CoreSchemaType, Recur
return mapping

def walk(self, schema: core_schema.CoreSchema, f: Walk) -> core_schema.CoreSchema:
return f(schema.copy(), self._walk)
return f(schema, self._walk)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have behavioural changes (e.g. walking schema now mutates inplace rather than copying)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, I think we don’t mutate in place anyway

pydantic/_internal/_core_utils.py Outdated Show resolved Hide resolved
pydantic/_internal/_core_utils.py Show resolved Hide resolved
Comment on lines +450 to +455
state = _DefinitionsState(
definitions={},
ref_counts=defaultdict(int),
involved_in_recursion={},
current_recursion_ref_count=defaultdict(int),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for moving into a state? I guess it makes closures smaller, but causes lots of dictionary lookups?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that it can be easily stored in a single metadata key

Comment on lines +531 to +532
if 'serialization' in s:
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for this check? Seems somewhat surprising to me that serialization override prevents inlining.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use this to modify serialization on a type without globally modifying a type. That is, you can wrap a type with a serializer

Co-authored-by: David Hewitt <1939362+davidhewitt@users.noreply.github.com>
@adriangb
Copy link
Member Author

I’ll be splitting this PR up into multiple smaller PRs:

  • Update pydantic-core
  • Remove ref flattening step
  • Caches
  • Discriminated union improvements
  • Remove unused copies

@adriangb
Copy link
Member Author

Supplanted by #7565, #7536, #7535, #7529, #7528, #7527, #7524, #7523 and #7522

@adriangb adriangb closed this Sep 22, 2023
@samuelcolvin samuelcolvin deleted the remove-flattening-of-schemas branch November 14, 2023 11:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants