Conversation

@eellison
Contributor

@eellison eellison commented Apr 17, 2023

Stack from ghstack (oldest at bottom):

Fix for #99286. There were a couple of locations where we were instantiating new fake modes instead of grabbing the correct one from the current tracing context/inputs.
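In code terms, the change amounts to the following pattern. Here is a minimal, torch-free sketch; the names `FakeMode`, `FakeTensor`, `detect_fake_mode`, and `_MODE_STACK` are hypothetical stand-ins for the PyTorch machinery, not the real API:

```python
# Torch-free sketch of the fix: prefer the currently active fake mode (or one
# attached to the inputs) over constructing a brand-new one.

_MODE_STACK = []  # innermost active mode is last


class FakeMode:
    def __enter__(self):
        _MODE_STACK.append(self)
        return self

    def __exit__(self, *exc):
        _MODE_STACK.pop()


class FakeTensor:
    def __init__(self, mode):
        self.fake_mode = mode  # fake tensors remember their creating mode


def detect_fake_mode(inputs=()):
    """Return the ambient mode, else a mode found on an input, else None."""
    if _MODE_STACK:
        return _MODE_STACK[-1]
    for t in inputs:
        if isinstance(t, FakeTensor):
            return t.fake_mode
    return None


def get_mode_buggy(inputs):
    # The bug: always instantiates a fresh mode, even mid-trace.
    return FakeMode()


def get_mode_fixed(inputs):
    # The fix: reuse the current tracing mode; only mint one at top level.
    return detect_fake_mode(inputs) or FakeMode()
```

With an active mode, `get_mode_fixed` returns that same mode; only when nothing is ambient and no input carries a mode does it construct a fresh one.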

cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Apr 17, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/99377

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 Failures

As of commit 2e98e29:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base ab08284:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@eellison changed the title from "Fix hf bart inference failure" to "Grab Current Tracing Fake Mode in a couple spots" Apr 17, 2023
@eellison eellison requested review from ezyang and voznesenskym April 17, 2023 22:49
eellison added a commit that referenced this pull request Apr 17, 2023
ghstack-source-id: 078a15d
Pull Request resolved: #99377
@eellison
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased gh/eellison/432/orig onto refs/remotes/origin/viable/strict; please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/99377)

pytorchmergebot pushed a commit that referenced this pull request Apr 18, 2023
ghstack-source-id: 1470e6b
Pull Request resolved: #99377
from .fuse_attention import _sfdp_init

-with FakeTensorMode():
+with detect_fake_mode() or FakeTensorMode():
Contributor
This is wrong: the patterns here are shared globally, so you shouldn't arbitrarily pick the fake mode from a particular run. I actually have a more correct fix on a branch I am working on.

Contributor Author
Okay - I'm going to close and re-assign to you then.
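The reviewer's objection (globally shared patterns must not capture the fake mode of one particular run) can be made concrete with a small, torch-free sketch. All names here are hypothetical illustrations, not PyTorch API:

```python
# Hypothetical sketch of the cache-pollution problem: patterns are traced
# lazily and cached globally, so capturing the fake mode of whichever
# compilation happened to trigger the init ties the shared pattern to one
# unrelated run.

class FakeMode:
    def __init__(self, name):
        self.name = name
        self.created = []  # tensors this mode has produced / tracks

    def make_tensor(self):
        t = {"mode": self}
        self.created.append(t)
        return t


_PATTERN_CACHE = {}


def init_pattern(name, ambient_mode):
    # Lazy init: whichever compilation gets here first traces the pattern
    # under *its* mode, and the result is cached for everyone.
    if name not in _PATTERN_CACHE:
        _PATTERN_CACHE[name] = ambient_mode.make_tensor()
    return _PATTERN_CACHE[name]


run_a = FakeMode("run_a")
run_b = FakeMode("run_b")

p = init_pattern("sfdp", run_a)  # first compilation initializes the pattern
q = init_pattern("sfdp", run_b)  # a later compilation reuses the cache...
```

Here `q` is still tied to `run_a`'s mode, and `run_a`'s mode now tracks pattern tensors that have nothing to do with its own compilation.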

@eellison eellison closed this Apr 18, 2023
ezyang added a commit that referenced this pull request Apr 18, 2023
This bug was discovered by a stronger assert (which I will be posting
in a follow-up PR.)

The explanation for this change is a bit long and windy, and I am not
sure I entirely understand the situation myself.  But here's what I
think is going on.

jansel's joint graph pattern matcher does something fairly unusual:
in order to initialize the pattern in question, it lazily runs an
aot_function invocation to trace out what the joint graph of a given
pattern looks like (we ought not to use aot_function, but we can't
really do this until bdhirsh lands AOT Autograd export properly).
However, this lazy initialization occurs within the context of a
separate compilation, which has its own tracing context and,
importantly, its own fake tensor mode.

What we would like is for the pattern matcher's lazy-initialization
fake tensor mode to be unrelated to whatever the ambient fake tensor
mode of the graph we are actually compiling happens to be. We want
these to be independent, because we don't really care what the current
compiled graph is; this is a lazy init function, and it could have been
initialized during any compilation. It just happens to be initialized
on this one.

To prevent us from picking up the ambient fake mode, we have to do two
things: we have to remove the tracing context (which stores a fake
mode), and we have to also disable the ambiently active fake mode.

In #99377 eellison proposed an
alternative approach, where we reuse the fake mode.  While this probably
won't cause any errors, it's morally not the right thing to do, because
you'll end up polluting the enclosing fake tensor mode with tensors that
have nothing to do with the mode itself.

This might fix #99286
but it's also possible that #99320
fixed it already.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

ghstack-source-id: f572909
Pull Request resolved: #99391
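The two-step isolation described above (remove the tracing context, disable the ambient fake mode) can be sketched with a hypothetical mode stack. `no_ambient_mode`, `FakeMode`, and `_MODE_STACK` stand in for the PyTorch mechanisms and are not the real API:

```python
from contextlib import contextmanager

_MODE_STACK = []  # hypothetical stand-in for the ambient fake-mode stack


class FakeMode:
    def __enter__(self):
        _MODE_STACK.append(self)
        return self

    def __exit__(self, *exc):
        _MODE_STACK.pop()


@contextmanager
def no_ambient_mode():
    # Stand-in for "remove the tracing context and disable the active fake
    # mode": hide whatever is on the stack for the duration of the block.
    saved = _MODE_STACK[:]
    _MODE_STACK.clear()
    try:
        yield
    finally:
        _MODE_STACK[:] = saved


def current_mode():
    return _MODE_STACK[-1] if _MODE_STACK else None


def lazy_init_pattern():
    # Pattern init deliberately runs under a fresh, independent mode,
    # regardless of which compilation happened to trigger it.
    with no_ambient_mode():
        with FakeMode() as fresh:
            return fresh
```

Whatever compilation is ambient when `lazy_init_pattern` fires, the init sees only its own fresh mode, and the ambient mode is restored untouched afterwards.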
ezyang added a commit that referenced this pull request Apr 18, 2023
pytorchmergebot pushed a commit that referenced this pull request Apr 18, 2023
Pull Request resolved: #99391
Approved by: https://github.com/bdhirsh
@facebook-github-bot facebook-github-bot deleted the gh/eellison/432/head branch June 8, 2023 16:21