[dynamo] Support copy.deepcopy via polyfill#179611

Closed
anijain2305 wants to merge 12 commits into gh/anijain2305/1144/base from gh/anijain2305/1144/head

Conversation

@anijain2305
Contributor

@anijain2305 anijain2305 commented Apr 7, 2026

@pytorch-bot

pytorch-bot Bot commented Apr 7, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/179611

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 6 Unrelated Failures

As of commit 592c15f with merge base d57eed8:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot Bot added the ciflow/inductor, ciflow/torchtitan (Run TorchTitan integration tests), and module: dynamo labels Apr 7, 2026
anijain2305 added a commit that referenced this pull request Apr 7, 2026
ghstack-source-id: 3407d17
Pull Request resolved: #179611
@pytorch-bot

pytorch-bot Bot commented Apr 7, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo azahed98

[ghstack-poisoned]
anijain2305 added a commit that referenced this pull request Apr 7, 2026
ghstack-source-id: aaf43a7
Pull Request resolved: #179611
@Lucaskabela
Contributor

@claude think through the implications and edge cases of the change in guarding behavior:

    if not hasattr(value, "__self__") or isinstance(
        value.__self__, types.ModuleType
    ):
        install_guard(source.make_guard(GuardBuilder.BUILTIN_MATCH))

Are there functions for which this might unintentionally skip a guard that we need for correctness?

@claude

claude Bot commented Apr 7, 2026

Claude finished @Lucaskabela's task in 6m 18s


  • Read the changed code in variables/builder.py and surrounding context
  • Understand the BUILTIN_MATCH guard semantics
  • Analyze what functions have __self__ that is NOT a types.ModuleType
  • Evaluate edge cases and correctness implications
  • Post analysis

Analysis of the BUILTIN_MATCH guard change

The change in question (variables/functions.py:2143-2151):

elif inspect.isbuiltin(value):
    if not hasattr(value, "__self__") or isinstance(
        value.__self__, types.ModuleType
    ):
        install_guard(source.make_guard(GuardBuilder.BUILTIN_MATCH))

BUILTIN_MATCH is an id()-based guard (id_match_unchecked at guards.py:2720). The change skips this guard when hasattr(value, "__self__") and not isinstance(value.__self__, types.ModuleType) — i.e., for bound builtin methods on non-module instances.
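To make the condition concrete, here is a minimal plain-Python sketch of the predicate, with no Dynamo imports. The helper name `wants_builtin_match` is hypothetical and only mirrors the boolean test from the snippet above; it evaluates the condition and does not install any guard.

```python
import inspect
import math
import types


def wants_builtin_match(value):
    """Hypothetical helper mirroring the condition under which the
    BUILTIN_MATCH guard is still installed. It only evaluates the predicate."""
    if not inspect.isbuiltin(value):
        raise TypeError("expected a builtin function or bound builtin method")
    return not hasattr(value, "__self__") or isinstance(
        value.__self__, types.ModuleType
    )


class Foo:
    pass


obj = Foo()

# Module-level builtins: __self__ is the defining module, so still guarded.
module_level = wants_builtin_match(math.sqrt)
builtins_level = wants_builtin_match(len)

# Bound builtin method on an instance: __self__ is the instance, so skipped.
bound_on_instance = wants_builtin_match(obj.__reduce_ex__)
```

In this sketch `math.sqrt` and `len` keep the guard because their `__self__` is a module, while `obj.__reduce_ex__` does not because its `__self__` is the `Foo` instance.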

Why the guard was problematic

The reasoning is sound: bound builtin methods (e.g., obj.__reduce_ex__, obj.__sizeof__) are created as fresh Python objects on every attribute access in CPython. Their id() is therefore unstable between compile time and runtime, so an id-based guard would immediately fail and trigger recompilation. This is the same fundamental issue that exists for GetSetDescriptor (lines 1471-1478) and MethodWrapper (lines 1479-1484) types, which also skip guards for the same reason. (Slot-based dunders such as obj.__hash__ and obj.__repr__ are method-wrapper objects, so inspect.isbuiltin is false for them and they do not reach this branch.)
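The identity instability described above can be observed directly in CPython. The following is a small sketch, not Dynamo code; the class `Foo` and the variable names are illustrative only, and the behavior shown is specific to CPython.

```python
import inspect
import math


class Foo:
    pass


obj = Foo()

# Keep both results alive at the same time so the identity check is meaningful.
first = obj.__reduce_ex__
second = obj.__reduce_ex__

is_builtin = inspect.isbuiltin(first)            # C-implemented bound method
same_object = first is second                    # two distinct wrapper objects
same_owner = first.__self__ is second.__self__   # both bound to the same instance

# A module-level builtin, by contrast, is one long-lived object.
module_builtin_stable = math.sqrt is math.sqrt
```

Because each attribute access produces a new wrapper, an id()-based check recorded at trace time cannot match the wrapper seen on a later call, whereas a module-level builtin such as `math.sqrt` keeps a stable identity.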

What functions are affected

The guard is now skipped for any builtin where __self__ is not a module. This includes:

| Category | Examples | Risk |
| --- | --- | --- |
| Dunder methods on user objects | obj.__reduce_ex__, obj.__sizeof__ | Low: these come from object and won't change |
| Dunder methods on C extension instances | tensor.__reduce_ex__, dict_instance.__reduce_ex__ | Low: bound to the type, not mutable |
| Methods on C types where __self__ is the type | tuple.__new__ (but this is caught by BuiltinMethodVariable first at line 1428) | N/A: doesn't reach this path |
| Module-level builtins | math.sqrt, os.getpid | Still guarded: __self__ is a ModuleType |

What guards remain

When the BUILTIN_MATCH guard is skipped, the comment says "the type guard on the owner object is sufficient." Let me trace through what this means concretely:

For a bound method accessed via obj.attr, the source is typically an AttrSource(obj_source, "attr"). The base object (obj) already has its own guards installed (e.g., TYPE_MATCH or ID_MATCH depending on how the object was built). If the type of the owner object hasn't changed, the method resolution order guarantees the same builtin method will be found.

Edge cases and potential concerns

1. User class overriding a C builtin dunder — This is the most interesting case. Consider:

class Foo:
    pass

class Bar:
    def __reduce_ex__(self, protocol):
        return (Bar, ())

If we first trace with Foo() (gets object.__reduce_ex__, a C builtin), and then call with Bar() (has a Python __reduce_ex__), would we incorrectly skip the guard?

No — the TYPE_MATCH guard on the owner object would catch this. Foo and Bar are different types, so the guard on the base object would invalidate.
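As a rough illustration of this first edge case, the sketch below uses a toy type check in place of Dynamo's TYPE_MATCH guard. The function `toy_type_match` is hypothetical and is not how Dynamo implements guards; it only shows that the two classes are distinguishable by type.

```python
import inspect


class Foo:
    pass


class Bar:
    def __reduce_ex__(self, protocol):
        return (Bar, ())


foo_method = Foo().__reduce_ex__
bar_method = Bar().__reduce_ex__

foo_is_builtin = inspect.isbuiltin(foo_method)  # inherited from object, implemented in C
bar_is_builtin = inspect.isbuiltin(bar_method)  # ordinary Python bound method


def toy_type_match(expected_type, value):
    """Hypothetical stand-in for a type guard on the owner object."""
    return type(value) is expected_type


guard_passes_for_foo = toy_type_match(Foo, Foo())
guard_passes_for_bar = toy_type_match(Foo, Bar())
```

A check recorded against `Foo` passes for another `Foo` instance and fails for a `Bar` instance, which is the situation in which a recompile would be expected.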

2. Monkey-patching __reduce_ex__ on a class after compilation:

class Foo:
    pass

# First call — object.__reduce_ex__ is a C builtin
compiled_fn(Foo())

# Now monkey-patch
Foo.__reduce_ex__ = custom_reduce_ex

# Second call — should recompile, but we skipped BUILTIN_MATCH
compiled_fn(Foo())

This is a real concern. If the source for the bound builtin is something like AttrSource(LocalSource("x"), "__reduce_ex__"), and the only guard is TYPE_MATCH on x, then monkey-patching Foo.__reduce_ex__ would NOT invalidate the type guard (the type is still Foo). The missing BUILTIN_MATCH/ID_MATCH guard means dynamo wouldn't detect that the method changed.

However, in practice this scenario is unlikely to hit because:

  • For UserDefinedObjectVariable.var_getattr, __reduce_ex__ is intercepted at line 2022 and returns a UserMethodVariable wrapping the polyfill — it never reaches SkipFunctionVariable.create_with_source at all.
  • The SkipFunctionVariable.create_with_source path is only reached when the bound builtin goes through VariableBuilder directly (e.g., a local variable holding a bound method).
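The monkey-patching gap from point 2 can be reproduced in plain Python. This is a sketch under the assumption that the only check on the owner is a type comparison; `custom_reduce_ex` and the variable names are hypothetical, and no Dynamo guard is actually exercised here.

```python
import inspect


class Foo:
    pass


def custom_reduce_ex(self, protocol):
    # Hypothetical replacement, used only for this demonstration.
    return (Foo, ())


x = Foo()
type_at_trace_time = type(x)
was_builtin = inspect.isbuiltin(x.__reduce_ex__)

# Monkey-patch the class after the "trace".
Foo.__reduce_ex__ = custom_reduce_ex

# A type-only check still passes: the type object itself is unchanged.
type_check_still_passes = type(x) is type_at_trace_time

# Attribute lookup, however, now resolves to the Python function.
is_builtin_now = inspect.isbuiltin(x.__reduce_ex__)
resolved = type(x).__reduce_ex__
```

The type comparison cannot see the change, so only a check on the resolved attribute itself would notice that `__reduce_ex__` is no longer the C builtin.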

3. Bound builtins on non-user-defined objects:

Consider some_random_c_extension_obj.__some_method__. If this C method is a builtin and the object is tracked, skipping the id-based guard relies entirely on whatever guard is on the base object. For most traced objects, TYPE_MATCH is present, which is sufficient.

4. Broader question — is this change too broad?

The change applies to all builtins reaching SkipFunctionVariable.create_with_source, not just __reduce_ex__. While the motivation is copy.deepcopy, any bound C builtin on a non-module instance that somehow reaches this code path would now lack the id-based guard.

In practice, this code path is narrow: it requires a bound C builtin to be loaded as a standalone value (not through call_method dispatch), fail the BuiltinMethodVariable.is_supported_builtin_method check, and go through is_function_or_wrapper → trace_rules.lookup → SkipFunctionVariable. The most common examples are dunder methods accessed via getattr(obj, "__reduce_ex__") or similar patterns.

Recommendation

The change is likely correct for the intended use case but could be tightened:

  1. Consider guarding on the method name instead: Rather than dropping the guard entirely, you could install a lighter guard that checks the attribute name resolves to the same underlying function (e.g., via CLOSURE_MATCH on the unbound function, or a HASATTR guard). This would catch monkey-patching.

  2. Alternatively, scope it more narrowly: Instead of blanket-skipping for all bound builtins, only skip for specific known-safe methods like __reduce_ex__, __hash__, __sizeof__, etc. That said, the existing approach mirrors what's already done for GetSetDescriptorType and MethodWrapperType (both skip guards entirely), so there's precedent.

  3. The monkey-patching gap is a pre-existing issue: The same problem exists for GetSetDescriptorType and MethodWrapperType (lines 1471-1484), which also skip all guards. This PR is following established patterns.

Overall, the correctness risk is low because: (a) the TYPE_MATCH guard on the owner object catches the most common invalidation scenarios (different types), and (b) the specific flow for copy.deepcopy intercepts __reduce_ex__ at the var_getattr level before this code path is even reached for user-defined objects. The main gap (monkey-patching a C builtin dunder on an existing class) is a narrow scenario that's unlikely in practice and consistent with existing guard-skipping patterns in the codebase.


Contributor

@Lucaskabela Lucaskabela left a comment

Reading through some of the edge cases, I am convinced this approach should be correct, with monkeypatching being a known issue we don't need to bend over backwards for.

@anijain2305
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

anijain2305 added a commit that referenced this pull request Apr 10, 2026
ghstack-source-id: dcd78b9
Pull Request resolved: #179611

Signed-off-by: Animesh Jain <anijain@umich.edu>
@anijain2305
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: New commits were pushed while merging. Please rerun the merge command.

Details for Dev Infra team Raised by workflow job

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / win-vs2022-cuda12.8-py3 / build

@anijain2305
Contributor Author

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 3 checks: pull / linux-docs / build-docs-python-false, trunk / win-vs2022-cuda12.8-py3 / build, trunk / win-vs2022-cpu-py3 / build


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: inductor / inductor-test / test (inductor_huggingface, 1, 1, linux.g5.4xlarge.nvidia.gpu)

anijain2305 added a commit that referenced this pull request Apr 11, 2026
ghstack-source-id: 743819c
Pull Request resolved: #179611

Signed-off-by: Animesh Jain <anijain@umich.edu>
@anijain2305
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Failing merge rule: Core Maintainers

@anijain2305
Contributor Author

@pytorchbot merge -i

@github-actions github-actions Bot deleted the gh/anijain2305/1144/head branch May 12, 2026 02:25
Labels

ci-no-td (Do not run TD on this PR), ciflow/inductor, ciflow/torchtitan (Run TorchTitan integration tests), ciflow/trunk (Trigger trunk jobs on your pull request), keep-going (Don't stop on first failure, keep running tests until the end), Merged, module: dynamo, Reverted, topic: not user facing (topic category)

5 participants