
Conversation


@tugsbayasgalan tugsbayasgalan commented Jan 24, 2025

Stack from ghstack (oldest at bottom):

Previously, in the non-strict path, we would always error when trying to in-place update a constant tensor, because those constant tensors are not actually wrapped in functional tensors. This is correct behaviour in torch.compile, because Dynamo turns all constant tensors into buffers and AOTDispatcher just lifts them and wraps them in functional tensors. However, in non-strict there is no such step that registers constants as buffers, so AOTDispatcher panics when it sees these dangling constant tensors while functionalizing.
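For concreteness, here is a minimal sketch (my own illustration, not taken from this PR or from #141336) of the kind of module this paragraph is about: a plain tensor attribute that is neither a parameter nor a registered buffer gets updated in place inside `forward`. Depending on the torch version, exporting or lowering a module like this is exactly where the unwrapped constant used to trip up functionalization.

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Plain tensor attribute: export treats this as a "constant",
        # not a parameter or a registered buffer.
        self.scale = torch.ones(3)

    def forward(self, x):
        self.scale.add_(1)  # in-place update of the constant tensor
        return x * self.scale

# Non-strict export / lowering to inference of a module like this is the
# scenario described above.
ep = torch.export.export(M(), (torch.randn(3),), strict=False)
```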

Due to a recent change in the IR, this is no longer an issue in the non-strict path because we don't call AOTDispatcher at the training-IR level, but it is now a problem for both strict and non-strict when we lower to inference (lowering to inference is very similar to non-strict tracing). As a result, we have at least one external issue (#141336) and internal issues reported due to this difference.

To fix this, there are two ways:

  1. Make functionalization aware of constant tensors and map them to functional tensors on the fly. This makes the functionalization invariant uglier and could potentially open the gate to more nasty bugs.
  2. Special-case this in export (see the rough sketch after this list). This seems more aligned with what Dynamo does today, so I think we should do it this way. The current state could benefit from further refactoring to make run_decompositions more similar to strict export (both of them now handle this constant-registering logic), but that is a bit complicated to do right now because the strict-export version of this logic is also incomplete (it doesn't take into account the export graph renaming pass, etc.). I will follow up with more refactors after this PR (T213466691) to unblock users faster.
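A very rough sketch of the idea behind option 2 (illustration only, not this PR's actual implementation; the helper name is made up, it only handles top-level attributes, and how constants are discovered is left out): temporarily register plain tensor constants as buffers before the lowering step so functionalization sees them as lifted inputs, then restore them afterwards.

```python
import contextlib
import torch

@contextlib.contextmanager
def _constants_as_buffers(mod: torch.nn.Module, constant_names):
    """Temporarily expose plain tensor attributes as (non-persistent) buffers."""
    for name in constant_names:
        t = getattr(mod, name)
        delattr(mod, name)  # drop the plain attribute first
        mod.register_buffer(name, t, persistent=False)
    try:
        yield mod
    finally:
        for name in constant_names:
            t = mod._buffers.pop(name)  # de-register the temporary buffer
            setattr(mod, name, t)       # restore the plain constant attribute
```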

For future reference:

Why are we not simply turning constants into non-persistent buffers and never de-registering them? Because some internal models rely on module.to reliably moving params/buffers to the correct device; as a result, buffers are moved while constants are not. In the composability meeting, we agreed that export won't do device-agnostic tracing going forward (it will provide a way to specify a FakeTensor on CPU that can be configured to run on GPU), so once that is done we can always turn constants into non-persistent buffers, which will simplify export's constant handling.
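To make the module.to point concrete, a small illustration (my example; assumes a CUDA device is available) of why registered buffers and plain constants behave differently under device movement:

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Non-persistent buffer: follows the module on .to()
        self.register_buffer("buf", torch.ones(3), persistent=False)
        # Plain tensor attribute ("constant"): not touched by .to()
        self.const = torch.ones(3)

m = M().to("cuda")
print(m.buf.device)    # cuda:0 -- buffers are moved
print(m.const.device)  # cpu    -- constants stay behind
```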

cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv

Differential Revision: D68610739


pytorch-bot bot commented Jan 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145593

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures, 2 Unrelated Failures

As of commit afd07cb with merge base c184055:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following jobs failed, but the failures were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

tugsbayasgalan added a commit that referenced this pull request Jan 24, 2025
ghstack-source-id: ffc8149
Pull Request resolved: #145593

@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 24, 2025
@justinchuby justinchuby requested a review from xadupre January 24, 2025 07:02
@avikchaudhuri
Contributor

Are the new buffers supposed to be persistent or non-persistent?


tugsbayasgalan added a commit that referenced this pull request Jan 27, 2025
ghstack-source-id: 1fad672
Pull Request resolved: #145593
tugsbayasgalan added a commit that referenced this pull request Jan 27, 2025
ghstack-source-id: 5916758
Pull Request resolved: #145593

@avikchaudhuri avikchaudhuri left a comment


Did you check whether this lets you simplify some constant handling elsewhere in the export path? Wondering if some steps become dead because of this.

if (node.target not in state_dict) and (
    node.target not in non_persistent_buffers
):
    torch.fx.graph_module._del_attr(mod, node.target)
Contributor

This is removing an attribute instead of manipulating nodes. There are existing methods in that module for getting attributes, so there is precedent.

return _get_attr_via_attr_list(model, attr_name.split("."))


def _del_attr(model: torch.nn.Module, attr_name: str):
Contributor

Could you not use _get_attr_via_attr_list on prefix to get the final t? That might be better for code reuse.
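A rough sketch of what that suggestion amounts to (my reading of the comment, not necessarily the code as merged), assuming it lives next to `_get_attr_via_attr_list` in `torch/fx/graph_module.py`:

```python
def _del_attr(model: torch.nn.Module, attr_name: str):
    # Walk the dotted prefix to the parent module `t`, then delete the leaf.
    *prefix, field = attr_name.split(".")
    t = _get_attr_via_attr_list(model, prefix)
    delattr(t, field)
```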

Contributor Author

Yep sounds good!

@avikchaudhuri
Contributor

On the failing tests: when we make constants into buffers, should we rename them with the buffer naming convention?


tugsbayasgalan added a commit that referenced this pull request Feb 4, 2025
ghstack-source-id: 97fe22e
Pull Request resolved: #145593

@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

    spec.kind == OutputKind.BUFFER_MUTATION
    and spec.target in temp_registered_constants
):
    raise RuntimeError(

@avikchaudhuri avikchaudhuri left a comment


latest round of changes lgtm


tugsbayasgalan added a commit that referenced this pull request Feb 4, 2025
ghstack-source-id: 8f976c9
Pull Request resolved: #145593

@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@pytorchbot merge -i

(Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 4 checks: pull / linux-focal-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge), inductor-rocm / rocm6.3-py3.10-inductor / test (inductor, 1, 2, linux.rocm.gpu.2), inductor-rocm / rocm6.3-py3.10-inductor / test (inductor, 2, 2, linux.rocm.gpu.2), trunk / linux-focal-rocm6.3-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.4)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@github-actions github-actions bot deleted the gh/tugsbayasgalan/287/head branch March 8, 2025 01:52
