[dynamo] Support copy.deepcopy via polyfill#179611
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/179611
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ❌ 1 New Failure, 6 Unrelated FailuresAs of commit 592c15f with merge base d57eed8 ( NEW FAILURE - The following job has failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo azahed98 [ghstack-poisoned]
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo azahed98 [ghstack-poisoned]
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo azahed98 [ghstack-poisoned]
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo azahed98 [ghstack-poisoned]
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo azahed98 [ghstack-poisoned]
|
@claude think through the implications and edge cases of the change in guarding behavior: if not hasattr(value, "__self__") or isinstance(
value.__self__, types.ModuleType
):
install_guard(source.make_guard(GuardBuilder.BUILTIN_MATCH))Are there unintended functions this might result skipping guarding that we need to guard on for correctness? |
|
Claude finished @Lucaskabela's task in 6m 18s —— View job
Analysis of the
|
| Category | Examples | Risk |
|---|---|---|
| Dunder methods on user objects | obj.__reduce_ex__, obj.__hash__, obj.__sizeof__, obj.__repr__ |
Low — these come from object and won't change |
| Dunder methods on C extension instances | tensor.__reduce_ex__, dict_instance.__reduce_ex__ |
Low — bound to the type, not mutable |
Methods on C types where __self__ is the type |
tuple.__new__ (but this is caught by BuiltinMethodVariable first at line 1428) |
N/A — doesn't reach this path |
| Module-level builtins | math.sqrt, os.getpid |
Still guarded — __self__ is a ModuleType |
What guards remain
When the BUILTIN_MATCH guard is skipped, the comment says "the type guard on the owner object is sufficient." Let me trace through what this means concretely:
For a bound method accessed via obj.attr, the source is typically an AttrSource(obj_source, "attr"). The base object (obj) already has its own guards installed (e.g., TYPE_MATCH or ID_MATCH depending on how the object was built). If the type of the owner object hasn't changed, the method resolution order guarantees the same builtin method will be found.
Edge cases and potential concerns
1. User class overriding a C builtin dunder — This is the most interesting case. Consider:
class Foo:
pass
class Bar:
def __reduce_ex__(self, protocol):
return (Bar, ())If we first trace with Foo() (gets object.__reduce_ex__, a C builtin), and then call with Bar() (has a Python __reduce_ex__), would we incorrectly skip the guard?
No — the TYPE_MATCH guard on the owner object would catch this. Foo and Bar are different types, so the guard on the base object would invalidate.
2. Monkey-patching __reduce_ex__ on a class after compilation:
class Foo:
pass
# First call — object.__reduce_ex__ is a C builtin
compiled_fn(Foo())
# Now monkey-patch
Foo.__reduce_ex__ = custom_reduce_ex
# Second call — should recompile, but we skipped BUILTIN_MATCH
compiled_fn(Foo())This is a real concern. If the source for the bound builtin is something like AttrSource(LocalSource("x"), "__reduce_ex__"), and the only guard is TYPE_MATCH on x, then monkey-patching Foo.__reduce_ex__ would NOT invalidate the type guard (the type is still Foo). The missing BUILTIN_MATCH/ID_MATCH guard means dynamo wouldn't detect that the method changed.
However, in practice this scenario is unlikely to hit because:
- For
UserDefinedObjectVariable.var_getattr,__reduce_ex__is intercepted at line 2022 and returns aUserMethodVariablewrapping the polyfill — it never reachesSkipFunctionVariable.create_with_sourceat all. - The
SkipFunctionVariable.create_with_sourcepath is only reached when the bound builtin goes throughVariableBuilderdirectly (e.g., a local variable holding a bound method).
3. Bound builtins on non-user-defined objects:
Consider some_random_c_extension_obj.__some_method__. If this C method is a builtin and the object is tracked, skipping the id-based guard relies entirely on whatever guard is on the base object. For most traced objects, TYPE_MATCH is present, which is sufficient.
4. Broader question — is this change too broad?
The change applies to all builtins reaching SkipFunctionVariable.create_with_source, not just __reduce_ex__. While the motivation is copy.deepcopy, any bound C builtin on a non-module instance that somehow reaches this code path would now lack the id-based guard.
In practice, this code path is narrow: it requires a bound C builtin to be loaded as a standalone value (not through call_method dispatch), fail the BuiltinMethodVariable.is_supported_builtin_method check, and go through is_function_or_wrapper → trace_rules.lookup → SkipFunctionVariable. The most common examples are dunder methods accessed via getattr(obj, "__reduce_ex__") or similar patterns.
Recommendation
The change is likely correct for the intended use case but could be tightened:
-
Consider guarding on the method name instead: Rather than dropping the guard entirely, you could install a lighter guard that checks the attribute name resolves to the same underlying function (e.g., via
CLOSURE_MATCHon the unbound function, or aHASATTRguard). This would catch monkey-patching. -
Alternatively, scope it more narrowly: Instead of blanket-skipping for all bound builtins, only skip for specific known-safe methods like
__reduce_ex__,__hash__,__sizeof__, etc. That said, the existing approach mirrors what's already done forGetSetDescriptorTypeandMethodWrapperType(both skip guards entirely), so there's precedent. -
The monkey-patching gap is a pre-existing issue: The same problem exists for
GetSetDescriptorTypeandMethodWrapperType(lines 1471-1484), which also skip all guards. This PR is following established patterns.
Overall, the correctness risk is low because: (a) the TYPE_MATCH guard on the owner object catches the most common invalidation scenarios (different types), and (b) the specific flow for copy.deepcopy intercepts __reduce_ex__ at the var_getattr level before this code path is even reached for user-defined objects. The main gap (monkey-patching a C builtin dunder on an existing class) is a narrow scenario that's unlikely in practice and consistent with existing guard-skipping patterns in the codebase.
Lucaskabela
left a comment
There was a problem hiding this comment.
Reading through some of the edge cases I am convinced this approach should be correct, with monkeypatching being a known issue we don't need to bend over for.
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo azahed98 [ghstack-poisoned]
|
@pytorchbot merge |
Merge failedReason: New commits were pushed while merging. Please rerun the merge command. Details for Dev Infra teamRaised by workflow job |
|
The merge job was canceled or timed out. This most often happen if two merge requests were issued for the same PR, or if merge job was waiting for more than 6 hours for tests to finish. In later case, please do not hesitate to reissue the merge command |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 1 jobs have failed, first few of them are: trunk / win-vs2022-cuda12.8-py3 / build Details for Dev Infra teamRaised by workflow job |
|
@pytorchbot merge -i |
Merge startedYour change will be merged while ignoring the following 3 checks: pull / linux-docs / build-docs-python-false, trunk / win-vs2022-cuda12.8-py3 / build, trunk / win-vs2022-cpu-py3 / build Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 1 jobs have failed, first few of them are: inductor / inductor-test / test (inductor_huggingface, 1, 1, linux.g5.4xlarge.nvidia.gpu) Details for Dev Infra teamRaised by workflow job |
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo azahed98 [ghstack-poisoned]
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud |
|
@pytorchbot merge -i |
Merge startedYour change will be merged while ignoring the following 7 checks: pull / linux-docs / build-docs-python-false, inductor / inductor-test / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu), trunk / linux-jammy-rocm-py3.10-mi355 / test (default, 2, 6, linux.rocm.gpu.gfx950.1, unstable), trunk / linux-jammy-rocm-py3.10-mi355 / test (default, 6, 6, linux.rocm.gpu.gfx950.1, unstable), trunk / linux-jammy-rocm-py3.10-mi355 / test (default, 4, 6, linux.rocm.gpu.gfx950.1, unstable), trunk / linux-jammy-rocm-py3.10-mi355 / test (default, 5, 6, linux.rocm.gpu.gfx950.1, unstable), trunk / linux-jammy-rocm-py3.10-mi355 / test (distributed, 3, 3, linux.rocm.gpu.gfx950.2, unstable) Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo @azahed98