
Switch calling convention back to real tensors #99320

Closed
wants to merge 3 commits

Conversation

ezyang
Contributor

@ezyang ezyang commented Apr 17, 2023

Stack from ghstack (oldest at bottom):

Months ago, in order to get dynamic shapes working through to Dynamo backends, we changed the calling convention to pass fake tensors rather than real tensors as example inputs to backends. The motivation at the time was, well, backends shouldn't really be peeking at the real tensors when they are doing compilation, and so it would make more sense to hide the real tensors from backends. But there were a bunch of problems:

  • This interacted poorly with our accuracy minifier design: the accuracy minifier needs access to the real inputs in order to run the model and figure out what happens!
  • The TensorRT backend required real inputs and we never figured out how to fix it.
  • In practice, all the backends needed to detect if they were passed real tensors and fakeify them anyway (certainly AOTAutograd does this).
  • Parameters and inputs were treated non-uniformly: parameters had to be passed as real tensors, because CUDA graphs requires knowing what the actual tensors are.

Furthermore, there were some more problems discovered after the fact:

  • Backends may want to optimize based on aspects of tensors that you cannot tell without having real tensors, e.g., the alignment of the data pointer (see the sketch after this description).

So, this PR decides that changing the calling convention was a bad idea and switches back to passing real tensors. There is a problem, though: AOTAutograd will perform fakeification, which means that in practice backends are still going to end up with fake tensors anyway. I want to change this, but it will require some work with bdhirsh's upcoming AOTAutograd export refactor.
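To make the restored contract concrete, here is a minimal sketch (not taken from this PR) of a custom backend that receives real example inputs, probes a property only real tensors have, and then fakeifies the inputs itself. The backend name, the 16-byte alignment threshold, and the toy compiled function are all illustrative assumptions:

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode


def inspecting_backend(gm: torch.fx.GraphModule, example_inputs):
    # Hypothetical backend. Under the restored calling convention,
    # example_inputs are real tensors, so properties like the data pointer
    # are meaningful. The 16-byte threshold is purely illustrative.
    aligned = all(
        t.data_ptr() % 16 == 0
        for t in example_inputs
        if isinstance(t, torch.Tensor)
    )
    print(f"all tensor inputs 16-byte aligned: {aligned}")

    # A backend that wants to compile against fake tensors can still
    # fakeify the real inputs itself (AOTAutograd does something similar).
    fake_mode = FakeTensorMode()
    fake_inputs = [
        fake_mode.from_tensor(t) if isinstance(t, torch.Tensor) else t
        for t in example_inputs
    ]
    del fake_inputs  # this sketch performs no actual compilation

    # Fall back to running the captured graph eagerly.
    return gm.forward


fn = torch.compile(lambda x: x.sin() + 1.0, backend=inspecting_backend)
print(fn(torch.randn(8)))
```

Under the old convention, `example_inputs` would have arrived as fake tensors, so the alignment probe above would have had no real allocation to inspect.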

Signed-off-by: Edward Z. Yang ezyang@meta.com

cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Apr 17, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/99320

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 Failures

As of commit 6fc2d5a:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang added a commit that referenced this pull request Apr 17, 2023
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

ghstack-source-id: 60d5c7d462c0c0e45b0fe1084280573ed16d237e
Pull Request resolved: #99320
ezyang added a commit that referenced this pull request Apr 17, 2023
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

ghstack-source-id: 630eba64bace3331bdc6788f356604e74c7ba8eb
Pull Request resolved: #99320
@albanD albanD removed their request for review April 17, 2023 15:46
@ezyang ezyang requested review from Chillee and albanD April 17, 2023 15:56
ezyang added a commit that referenced this pull request Apr 17, 2023
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

ghstack-source-id: 79091172e92460935a9b66a08a3960615a4eb6ce
Pull Request resolved: #99320
@albanD albanD removed their request for review April 17, 2023 15:58
@ezyang ezyang added ciflow/trunk, topic: bc breaking, topic: bug fixes, and release notes: dynamo labels Apr 17, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

ezyang added a commit that referenced this pull request Apr 18, 2023
This bug was discovered by a stronger assert (which I will be posting
in a follow up PR.)

The explanation for this change is a bit long and windy, and I am not
sure I entirely understand the situation myself.  But here's what I
think is going on.

jansel's joint graph pattern matcher does something fairly unusual:
in order to initialize the pattern in question, it (lazily) runs
an aot_function invocation in order to trace out what the joint graph
of a given pattern looks like (we ought not use aot_function, but we
can't really do this until bdhirsh lands AOT Autograd export properly).
However, this lazy initialization occurs within the context of a
separate compilation, which has its own tracing context, and
importantly, fake tensor mode.

What we would like is for the pattern matcher's lazy-initialization fake
tensor mode to be unrelated to whatever ambient fake tensor mode belongs
to the graph we are actually compiling.  We want these to be independent,
because we don't really care what the current compiled graph is; this is
a lazy init function, it could have been initialized during any
compilation, and it just happens to be initialized on this one.

To prevent us from picking up the ambient fake mode, we have to do two
things: remove the tracing context (which stores a fake mode), and also
disable the ambiently active fake mode.

In #99377 eellison proposed an
alternative approach, where we reuse the fake mode.  While this probably
won't cause any errors, it's morally not the right thing to do, because
you'll end up polluting the enclosing fake tensor mode with tensors that
have nothing to do with the mode itself.

This might fix #99286
but it's also possible that #99320
fixed it already.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

ghstack-source-id: f572909369d2c03eeb031311724eb9c9cd91acbc
Pull Request resolved: #99391
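A minimal sketch of the isolation described above, assuming the `tracing` helper from `torch._guards` and `unset_fake_temporarily` from `torch._subclasses.fake_tensor` (helper names vary across PyTorch versions); `lazy_init_pattern` is a hypothetical stand-in for the pattern matcher's one-time initialization, not the actual code from #99391:

```python
import functools

import torch
from torch._guards import tracing
from torch._subclasses.fake_tensor import FakeTensorMode, unset_fake_temporarily


@functools.lru_cache(None)
def lazy_init_pattern():
    # Hypothetical one-time initializer. Whichever compilation happens to
    # trigger it, avoid tracing under that compilation's state:
    #   - tracing(None) detaches from the ambient tracing context
    #     (which stores a fake mode),
    #   - unset_fake_temporarily() disables the ambiently active fake mode,
    #   - a fresh FakeTensorMode keeps tensors created here out of the
    #     enclosing compilation's mode.
    with tracing(None), unset_fake_temporarily(), FakeTensorMode():
        example = torch.empty(2, 3)  # a FakeTensor owned by the fresh mode
        return tuple(example.shape)
```

Reusing the enclosing fake mode instead (the #99377 approach) avoids creating a second mode, but it pollutes that mode with tensors that have nothing to do with the graph being compiled, which is exactly what the message above objects to.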
ezyang added a commit that referenced this pull request Apr 18, 2023
pytorchmergebot pushed a commit that referenced this pull request Apr 18, 2023
Pull Request resolved: #99391
Approved by: https://github.com/bdhirsh
@DanilBaibak
Contributor

@pytorchbot revert -m "Break internal build" -c ghfirst

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@ezyang your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Apr 19, 2023
This reverts commit 780922c.

Reverted #99320 on behalf of https://github.com/DanilBaibak due to Break internal build
@DanilBaibak
Contributor

@ezyang sorry, I have to revert your PR. It leads to internal failures. Here you can find more details - D45103307.

@ezyang ezyang reopened this Apr 19, 2023
@ezyang
Contributor Author

ezyang commented Apr 19, 2023

@pytorchbot merge -f "force merging as need internal only changes"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@ezyang
Contributor Author

ezyang commented Apr 19, 2023

attn oncall, fbcode side changes are at https://www.internalfb.com/diff/D45103307

@DanilBaibak
Contributor

@pytorchbot revert -m "Break internal build" -c ghfirst

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

Reverting PR 99320 failed

Reason: Command git -C /home/runner/work/pytorch/pytorch revert --no-edit c67c16bcd2007cce35440a0c95b163ede2b73b61 returned non-zero exit code 1

Auto-merging torch/_dynamo/config.py
Auto-merging torch/_dynamo/debug_utils.py
CONFLICT (content): Merge conflict in torch/_dynamo/debug_utils.py
Auto-merging torch/_dynamo/eval_frame.py
Auto-merging torch/_inductor/pattern_matcher.py
error: could not revert c67c16bcd20... Switch calling convention back to real tensors (#99320)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git revert --continue".
hint: You can instead skip this commit with "git revert --skip".
hint: To abort and get back to the state before "git revert",
hint: run "git revert --abort".
Details for Dev Infra team (raised by workflow job)
