Update on "[reland][inductor] make thread order consistent with loop order"

This PR relands #106827, which was reverted because it caused a compilation error for some ads models. Yanbo provided a repro in one of the 14k models (`pytest ./generated/test_KaiyangZhou_deep_person_reid.py -k test_044`). This is also the model I used to confirm the fix and come up with a unit test. In this model, we call `triton_heuristics.triton_config` with size_hints [2048, 2]. Previously this would result in a Triton config with XBLOCK=2048 and YBLOCK=2. But since we changed the mapping between size_hints and the XYZ dimensions, we now generate a Triton config with XBLOCK=2 and YBLOCK=2048. This fails compilation since we set the maximum YBLOCK to 1024. My fix is to make sure we never generate a Triton config that exceeds the maximum block size. [ghstack-poisoned]
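The clamping idea can be illustrated with a small standalone sketch. This is not the actual inductor code: `MAX_BLOCK` and `clamp_blocks` are hypothetical names, and only the Y limit of 1024 comes from the description above; the X and Z limits are assumed for illustration.

```python
# Hypothetical per-dimension limits. Only Y=1024 is stated in the commit
# message; the X and Z values are assumptions made for this sketch.
MAX_BLOCK = {"X": 2048, "Y": 1024, "Z": 1024}


def clamp_blocks(blocks):
    """Return a copy of `blocks` with each block size capped at its limit."""
    return {dim: min(size, MAX_BLOCK[dim]) for dim, size in blocks.items()}


# Mapping before the thread-order change: size_hints [2048, 2] -> X=2048, Y=2.
old_mapping = {"X": 2048, "Y": 2}
# Mapping after the change: the same size_hints -> X=2, Y=2048.
new_mapping = {"X": 2, "Y": 2048}

print(clamp_blocks(old_mapping))  # already within the limits, unchanged
print(clamp_blocks(new_mapping))  # YBLOCK is capped at 1024
```

With such a cap in place, the swapped mapping no longer asks for a YBLOCK above the limit, which is the failure described in the repro.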
pytorch · Aug 25, 2023 · cca8afa
1 parent f6d509f
Showing 1 changed file with 5 additions and 1 deletion.
diff --git a/test/inductor/test_triton_heuristics.py b/test/inductor/test_triton_heuristics.py
@@ -3,6 +3,9 @@
 import sys
 import unittest
 
+from torch.testing._internal.common_utils import IS_LINUX
+from torch.testing._internal.inductor_utils import HAS_CUDA
+
 try:
     import triton  # noqa: F401
 except ImportError:
@@ -29,4 +32,5 @@ def test_triton_config(self):
 
 
 if __name__ == "__main__":
-    run_tests()
+    if IS_LINUX and HAS_CUDA:
+        run_tests()