
Conversation


@tugsbayasgalan tugsbayasgalan commented Jan 24, 2025

Stack from ghstack (oldest at bottom):

Previously, in the non-strict path, we would always error when trying to in-place update a constant tensor, because those constant tensors are not actually wrapped in functional tensors. This is correct behaviour in torch.compile, because Dynamo turns all constant tensors into buffers and AOTDispatcher just lifts them and wraps them in functional tensors. However, in non-strict there is no such step that registers constants as buffers, so AOTDispatcher panics when it sees these dangling constant tensors while functionalizing.
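For concreteness, here is a minimal sketch (my own illustration, not taken from this PR or from #141336) of the kind of module this paragraph is about: a plain tensor attribute that is neither a parameter nor a registered buffer gets updated in place inside `forward`. Depending on the torch version, exporting or lowering a module like this is exactly where the unwrapped constant used to trip up functionalization.

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Plain tensor attribute: export treats this as a "constant",
        # not a parameter or a registered buffer.
        self.scale = torch.ones(3)

    def forward(self, x):
        self.scale.add_(1)  # in-place update of the constant tensor
        return x * self.scale

# Non-strict export / lowering to inference of a module like this is the
# scenario described above.
ep = torch.export.export(M(), (torch.randn(3),), strict=False)
```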

Due to a recent change in the IR, this is no longer an issue in the non-strict path because we don't call AOTDispatcher at the training-IR level, but it is now a problem for both strict and non-strict when we lower to inference (lowering to inference is very similar to non-strict tracing). As a result, we have at least one external issue (#141336) and internal issues reported due to this difference.

To fix this, there are two ways:

  1. Make functionalization aware of constant tensors and map them to functional tensors on the fly. This makes the functionalization invariant uglier and could potentially open the gate to more nasty bugs.
  2. Special-case this in export (see the rough sketch after this list). This seems more aligned with what Dynamo does today, so I think we should do it this way. The current state could benefit from further refactoring to make run_decompositions more similar to strict export (both of them now handle this constant-registering logic), but that is a bit complicated to do right now because the strict-export version of this logic is also incomplete (it doesn't take into account the export graph renaming pass, etc.). I will follow up with more refactors after this PR (T213466691) to unblock users faster.
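A very rough sketch of the idea behind option 2 (illustration only, not this PR's actual implementation; the helper name is made up, it only handles top-level attributes, and how constants are discovered is left out): temporarily register plain tensor constants as buffers before the lowering step so functionalization sees them as lifted inputs, then restore them afterwards.

```python
import contextlib
import torch

@contextlib.contextmanager
def _constants_as_buffers(mod: torch.nn.Module, constant_names):
    """Temporarily expose plain tensor attributes as (non-persistent) buffers."""
    for name in constant_names:
        t = getattr(mod, name)
        delattr(mod, name)  # drop the plain attribute first
        mod.register_buffer(name, t, persistent=False)
    try:
        yield mod
    finally:
        for name in constant_names:
            t = mod._buffers.pop(name)  # de-register the temporary buffer
            setattr(mod, name, t)       # restore the plain constant attribute
```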

For future reference:

Why are we not simply turning constants into non-persistent buffers and never de-registering them? Because some internal models rely on module.to reliably moving params/buffers to the correct device; as a result, buffers are moved while constants are not. In the composability meeting, we agreed that export won't do device-agnostic tracing going forward (it will provide a way to specify a FakeTensor on CPU that can be configured to run on GPU), so once that is done we can always turn constants into non-persistent buffers, which will simplify export's constant handling.
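To make the module.to point concrete, a small illustration (my example; assumes a CUDA device is available) of why registered buffers and plain constants behave differently under device movement:

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Non-persistent buffer: follows the module on .to()
        self.register_buffer("buf", torch.ones(3), persistent=False)
        # Plain tensor attribute ("constant"): not touched by .to()
        self.const = torch.ones(3)

m = M().to("cuda")
print(m.buf.device)    # cuda:0 -- buffers are moved
print(m.const.device)  # cpu    -- constants stay behind
```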

cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv

Differential Revision: D68610739


pytorch-bot bot commented Jan 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145593

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures, 2 Unrelated Failures

As of commit afd07cb with merge base c184055:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following jobs failed, but the failures were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

tugsbayasgalan added a commit that referenced this pull request Jan 24, 2025
ghstack-source-id: ffc8149
Pull Request resolved: #145593

@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 24, 2025
@justinchuby justinchuby requested a review from xadupre January 24, 2025 07:02
@avikchaudhuri
Contributor

Are the new buffers supposed to be persistent or non-persistent?


tugsbayasgalan added a commit that referenced this pull request Jan 27, 2025
ghstack-source-id: 1fad672
Pull Request resolved: #145593
tugsbayasgalan added a commit that referenced this pull request Jan 27, 2025
ghstack-source-id: 5916758
Pull Request resolved: #145593

@avikchaudhuri avikchaudhuri left a comment


Did you check whether this lets you simplify some constant handling elsewhere in the export path? Wondering if some steps become dead because of this.

if (node.target not in state_dict) and (
    node.target not in non_persistent_buffers
):
    torch.fx.graph_module._del_attr(mod, node.target)
Contributor

This is removing an attribute instead of manipulating nodes. There are existing methods in that module for getting attributes, so there is precedent.

return _get_attr_via_attr_list(model, attr_name.split("."))


def _del_attr(model: torch.nn.Module, attr_name: str):
Contributor

Could you not use _get_attr_via_attr_list on prefix to get the final t? That might be better for code reuse.
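A rough sketch of what that suggestion amounts to (my reading of the comment, not necessarily the code as merged), assuming it lives next to `_get_attr_via_attr_list` in `torch/fx/graph_module.py`:

```python
def _del_attr(model: torch.nn.Module, attr_name: str):
    # Walk the dotted prefix to the parent module `t`, then delete the leaf.
    *prefix, field = attr_name.split(".")
    t = _get_attr_via_attr_list(model, prefix)
    delattr(t, field)
```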

Contributor Author

Yep sounds good!

@avikchaudhuri
Contributor

On the failing tests: when we make constants into buffers, should we rename them with the buffer naming convention?


tugsbayasgalan added a commit that referenced this pull request Feb 4, 2025
ghstack-source-id: 97fe22e
Pull Request resolved: #145593

@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

    spec.kind == OutputKind.BUFFER_MUTATION
    and spec.target in temp_registered_constants
):
    raise RuntimeError(

@avikchaudhuri avikchaudhuri left a comment


latest round of changes lgtm


tugsbayasgalan added a commit that referenced this pull request Feb 4, 2025
ghstack-source-id: 8f976c9
Pull Request resolved: #145593

@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@pytorchbot merge -i

(Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 4 checks: pull / linux-focal-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge), inductor-rocm / rocm6.3-py3.10-inductor / test (inductor, 1, 2, linux.rocm.gpu.2), inductor-rocm / rocm6.3-py3.10-inductor / test (inductor, 2, 2, linux.rocm.gpu.2), trunk / linux-focal-rocm6.3-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.4)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@github-actions github-actions bot deleted the gh/tugsbayasgalan/287/head branch March 8, 2025 01:52
