fix: resolve PydanticUserError with defer_build=True in multiprocess forkserver workers (pydantic >= 2.12) #3132
…server workers
When defer_build=True (the default), pydantic wraps __pydantic_core_schema__ in a MockCoreSchema. Accessing schema["type"] triggers a lazy rebuild that uses a hardcoded stack depth (5 frames) to resolve forward-reference namespaces. In multiprocess forkserver workers the call stack is different, so the heuristic fails and pydantic >= 2.12 raises PydanticUserError.
_get_extra_fields_type now detects an unbuilt model via __pydantic_complete__ and calls model_rebuild() with the class's own module namespace before touching the schema. This bypasses the fragile stack-depth heuristic and avoids the error entirely. A second guard wraps the schema access itself so that any residual failure degrades gracefully instead of crashing.
Fixes openai#3079
Pull request overview
Fixes an intermittent PydanticUserError triggered by Pydantic v2.12’s deferred schema rebuild (defer_build=True) in multiprocess forkserver workers by proactively rebuilding models using the model’s module namespace and by guarding schema access.
Changes:
- Add proactive model_rebuild() when __pydantic_complete__ is false, using the class's module namespace from sys.modules.
- Wrap __pydantic_core_schema__ access in error handling to avoid propagating rebuild failures during construction.
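The first change listed above can be sketched roughly as follows. This is an illustration of the described flow, not the patch itself: ensure_built is a hypothetical helper name, and FakeModel is a stand-in that exists only so the snippet runs without pydantic installed. The attribute __pydantic_complete__ and the private _types_namespace keyword of model_rebuild() are the names the PR relies on.

```python
import sys


def ensure_built(cls):
    """Rebuild a deferred model using its defining module's namespace.

    Returns True if the model is built afterwards, False if the rebuild failed.
    """
    if getattr(cls, "__pydantic_complete__", True):
        return True  # already built, nothing to do
    module = sys.modules.get(cls.__module__)
    namespace = vars(module) if module is not None else None
    try:
        # _types_namespace is a private pydantic keyword argument.
        cls.model_rebuild(_types_namespace=namespace)
    except Exception:
        return False
    return True


class FakeModel:
    """Stand-in for a deferred pydantic model (assumption, illustration only)."""

    __pydantic_complete__ = False
    seen_namespace = None

    @classmethod
    def model_rebuild(cls, *, _types_namespace=None):
        # Record what was passed and mark the model as built.
        cls.seen_namespace = _types_namespace
        cls.__pydantic_complete__ = True
```

Calling ensure_built(FakeModel) hands the stand-in the namespace of its own module, which is the point of the change: the namespace is looked up by module name rather than inferred from the call stack.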
    # When defer_build=True, __pydantic_core_schema__ may be a MockCoreSchema whose
    # __getitem__ triggers a lazy rebuild. That lazy rebuild uses a hardcoded stack
    # depth (5 frames) to locate forward-reference namespaces, which is unreliable in
    # multiprocess forkserver workers where the call stack is shallower. In pydantic
    # >= 2.12 this manifests as a PydanticUserError at runtime.
This change targets a forkserver + deferred-build regression in pydantic >=2.12, but there doesn’t appear to be test coverage that exercises the deferred rebuild (MockCoreSchema) path in a forkserver worker. Adding a regression test (skipping platforms without forkserver) would help prevent reintroducing this crash.
    try:
        cls.model_rebuild(_types_namespace=vars(module) if module is not None else None)
    except Exception:
        return None
cls.model_rebuild(_types_namespace=...) relies on a private Pydantic kwarg; if that kwarg is absent/renamed in some supported pydantic v2 versions, this will raise (e.g. TypeError). Returning None immediately in that case skips the later schema-based path that might still succeed. Consider detecting whether _types_namespace is supported (e.g. via inspect.signature) and/or falling back to calling model_rebuild() without it, and avoid returning early so behavior degrades minimally across versions.
- try:
-     cls.model_rebuild(_types_namespace=vars(module) if module is not None else None)
- except Exception:
-     return None
+ model_rebuild = cls.model_rebuild
+ try:
+     supports_types_namespace = "_types_namespace" in inspect.signature(model_rebuild).parameters
+ except (TypeError, ValueError):
+     supports_types_namespace = False
+ try:
+     if supports_types_namespace:
+         model_rebuild(_types_namespace=vars(module) if module is not None else None)
+     else:
+         model_rebuild()
+ except TypeError:
+     try:
+         model_rebuild()
+     except Exception:
+         pass
+ except Exception:
+     pass
    except Exception:
        # Accessing the schema failed (e.g. the deferred rebuild produced an error);
        # return None so construct() falls back to treating extra fields as untyped.
        return None
The broad except Exception around schema access will swallow unexpected errors (e.g. schema shape changes causing KeyError, or accidental AttributeError) and silently treat extras as untyped. To avoid masking real bugs, consider narrowing this to the specific failure you’re targeting (notably pydantic.errors.PydanticUserError from deferred rebuild) and letting truly unexpected exceptions surface.
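A minimal sketch of the narrower guard, using stand-ins so it runs without pydantic: RebuildFailed plays the role of pydantic.errors.PydanticUserError, and schema_type_or_none and the three model classes are hypothetical names for illustration. In real code the tuple passed as rebuild_errors would presumably be (PydanticUserError,).

```python
class RebuildFailed(Exception):
    """Stand-in for pydantic.errors.PydanticUserError (assumption)."""


def schema_type_or_none(model_cls, rebuild_errors=(RebuildFailed,)):
    """Return schema["type"]; map only the expected rebuild errors to None."""
    try:
        return model_cls.__pydantic_core_schema__["type"]
    except rebuild_errors:
        return None


class _RaisingSchema:
    """Mimics a mock schema whose item access raises after a failed rebuild."""

    def __init__(self, exc):
        self._exc = exc

    def __getitem__(self, key):
        raise self._exc


class Built:
    __pydantic_core_schema__ = {"type": "model"}


class Deferred:
    __pydantic_core_schema__ = _RaisingSchema(RebuildFailed("rebuild failed"))


class Malformed:
    # No "type" key: a genuine bug that should surface as KeyError.
    __pydantic_core_schema__ = {}
```

With this shape, Deferred degrades to None as intended, while Malformed still raises KeyError instead of being silently treated as having untyped extras.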
Summary
Fixes #3079
When defer_build=True (the default via DEFER_PYDANTIC_BUILD), pydantic wraps __pydantic_core_schema__ in a MockCoreSchema. Accessing schema["type"] inside _get_extra_fields_type triggers a lazy rebuild that uses a hardcoded stack depth (5 frames) to locate forward-reference namespaces. In multiprocess forkserver workers the call stack is shallower/different, so the heuristic cannot find the openai types namespace. Before pydantic 2.12 this silent failure was mostly harmless; starting in 2.12, MockCoreSchema._get_built() raises PydanticUserError when the rebuild fails, causing ~0.6% of requests to crash in large forkserver deployments.
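To see why a fixed stack depth is fragile, here is a toy model of the two lookup strategies. It is not pydantic's code: resolve_by_stack_depth and resolve_by_module are hypothetical helpers, and json.JSONDecoder merely stands in for a model class whose forward references live in its defining module.

```python
import json
import sys


def resolve_by_stack_depth(name, depth):
    """Toy heuristic: look up `name` in the globals of the frame `depth` levels up."""
    try:
        frame = sys._getframe(depth)
    except ValueError:
        # Raised when the call stack is shallower than `depth`.
        return None
    return frame.f_globals.get(name)


def resolve_by_module(name, owner):
    """Look up `name` in the namespace of the module that defines `owner`."""
    module = sys.modules.get(owner.__module__)
    return vars(module).get(name) if module is not None else None


# The depth-based lookup depends on the call site; the module-based one does not.
too_deep = resolve_by_stack_depth("JSONDecodeError", 10_000)  # stack is never this deep
stable = resolve_by_module("JSONDecodeError", json.JSONDecoder)
```

The depth-based lookup returns whatever namespace happens to sit at that frame, or nothing when the stack is too shallow, whereas the module-based lookup gives the same answer from any call site. That second strategy is what passing vars(module) to model_rebuild() amounts to.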
Changes
src/openai/_models.py, _get_extra_fields_type:
- Checks cls.__pydantic_complete__ before touching the schema; if the model is still deferred, calls cls.model_rebuild(_types_namespace=vars(module)) using the class's own module from sys.modules. This bypasses the fragile stack-depth heuristic and resolves forward references correctly regardless of call-stack depth.
- Wraps the schema access in a try/except so that any residual failure degrades gracefully (returning None) instead of propagating an unhandled exception.
Test plan
- Existing test suite (pytest tests/)
- multiprocess forkserver + pydantic 2.12: create a pool with the forkserver start method, import openai inside workers, deserialize a ChatCompletion response, and confirm no PydanticUserError is raised
- Both DEFER_PYDANTIC_BUILD=true (default) and DEFER_PYDANTIC_BUILD=false