Commit 750b84b
Reset joint graph fake mode earlier, and more comprehensively
This bug was discovered by a stronger assert (which I will be posting in a follow-up PR).

The explanation for this change is a bit long and windy, and I am not sure I entirely understand the situation myself, but here is what I think is going on.

jansel's joint graph pattern matcher does something fairly unusual: in order to initialize the pattern in question, it (lazily) runs an aot_function invocation to trace out what the joint graph of a given pattern looks like. (We ought not to use aot_function, but we can't really avoid it until bdhirsh lands AOT Autograd export properly.) However, this lazy initialization occurs within the context of a separate compilation, which has its own tracing context and, importantly, its own fake tensor mode.

What we would like is for the pattern matcher's lazy-initialization fake tensor mode to be unrelated to whatever the ambient fake tensor mode of the graph we are actually compiling is. We want these to be independent, because we don't really care what the current compiled graph is; this is a lazy init function, and it could have been initialized during any compilation — it just happens to be initialized on this one. To prevent us from picking up the ambient fake mode, we have to do two things: we have to remove the tracing context (which stores a fake mode), and we also have to disable the ambiently active fake mode.

In #99377, eellison proposed an alternative approach where we reuse the fake mode. While this probably won't cause any errors, it is morally not the right thing to do, because you end up polluting the enclosing fake tensor mode with tensors that have nothing to do with the mode itself.

This might fix #99286, but it is also possible that #99320 fixed it already.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: f572909
Pull Request resolved: #99391
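The isolation argument above can be sketched without torch at all. In this illustrative model (all names here are hypothetical, not PyTorch APIs), a plain stack of strings stands in for the ambient fake-tensor-mode stack: a lazy initializer that runs inside someone else's compilation must hide the ambient mode and install a fresh one, then restore the ambient mode on exit.

```python
# Illustrative sketch (NOT PyTorch code): why lazy pattern initialization
# must not pick up the ambient mode of whichever compilation triggers it.
from contextlib import contextmanager

_mode_stack = []  # stands in for the ambient fake-tensor-mode stack


@contextmanager
def push_mode(mode):
    # analogue of entering a FakeTensorMode
    _mode_stack.append(mode)
    try:
        yield mode
    finally:
        _mode_stack.pop()


@contextmanager
def disable_ambient_modes():
    # analogue of torch._guards.tracing(None) plus
    # maybe_disable_fake_tensor_mode(): hide the modes the enclosing
    # compilation installed, restore them afterwards
    saved = _mode_stack[:]
    _mode_stack.clear()
    try:
        yield
    finally:
        _mode_stack[:] = saved


def current_mode():
    return _mode_stack[-1] if _mode_stack else None


def lazy_init_pattern():
    # Without isolation this would run under (and pollute) the enclosing
    # compilation's mode; with isolation it gets a fresh, independent one.
    with disable_ambient_modes(), push_mode("pattern-init-mode"):
        return current_mode()


with push_mode("compilation-mode"):
    assert current_mode() == "compilation-mode"
    assert lazy_init_pattern() == "pattern-init-mode"  # fresh mode
    assert current_mode() == "compilation-mode"        # ambient mode restored
```

The key property is the last assertion: the lazy initializer leaves the enclosing compilation's mode exactly as it found it, which is what the reuse-the-fake-mode alternative in #99377 would not guarantee.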
1 parent a763d94 commit 750b84b

File tree

2 files changed, +12 −9 lines changed

torch/_inductor/fx_passes/joint_graph.py (5 additions, 1 deletion)

@@ -2,6 +2,8 @@
 import logging

 import torch
+import torch._guards
+from torch.fx.experimental.proxy_tensor import maybe_disable_fake_tensor_mode
 from ..._subclasses import FakeTensorMode
 from .. import config
 from ..pattern_matcher import PatternMatcherPass
@@ -14,7 +16,9 @@
 def lazy_init():
     from .fuse_attention import _sfdp_init

-    with FakeTensorMode():
+    with torch._guards.tracing(
+        None
+    ), maybe_disable_fake_tensor_mode(), FakeTensorMode():
         _sfdp_init()

torch/_inductor/pattern_matcher.py (7 additions, 8 deletions)

@@ -664,14 +664,13 @@ def record_joint_graph(joint_graph, inputs, **kwargs):
         gm = clone_graph(joint_graph)
         return default_partition(joint_graph, inputs, **kwargs)

-    with torch._guards.tracing(None):
-        aot_function(
-            fn,
-            lambda g, i: make_boxed_func(g),
-            partition_fn=record_joint_graph,
-            decompositions=select_decomp_table(),
-            enable_log=False,
-        )(*args)
+    aot_function(
+        fn,
+        lambda g, i: make_boxed_func(g),
+        partition_fn=record_joint_graph,
+        decompositions=select_decomp_table(),
+        enable_log=False,
+    )(*args)

     # remove in/out specs
     gm.graph._codegen = torch.fx.graph.CodeGen()
