Update on "[aoti] clear precomputed symbol replacements before cpp wrapper compilation"

After we codegen a triton kernel in the triton codegen backend, we cache the generated triton source code in the wrapper to avoid producing multiple triton kernels with the same content.

In the AOTI compilation flow, this caching mechanism imposes a strong requirement on the codegen: we must generate the same triton source code for the same schedule node in both the python and cpp codegen phases. Otherwise, we would end up with a mismatch between the kernel name formed in the cpp codegen and the cuda kernel key produced from the python codegen. Consequently, we would hit a missing-cuda-kernel error.

The precomputed symbol replacements saved in V.graph.sizevars can cause such source-code inconsistency in indexing code. For example, let's say that in the python codegen phase, we produce "ks2*48" as part of indexing an input for schedule node A while yielding a replacement pair "ks0 -> ks2*48" in the precomputed replacements. In the second cpp codegen phase, we would produce "ks0" for the same indexing code of schedule node A due to the "ks0 -> ks2*48" replacement pair.

This PR fixes the issue by clearing precomputed_replacements and inv_precomputed_replacements before cpp wrapper codegen.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
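The failure mode described above can be reproduced with a toy model. The following is only a minimal sketch, not Inductor code: `ToySizeVars`, `codegen_source`, and `run_two_pass` are hypothetical names, the kernel cache is a plain dict keyed by source string, and the replacement is recorded as a side effect of the first codegen call purely to mimic the "ks2*48" / "ks0" example. Only the attribute names `precomputed_replacements` and `inv_precomputed_replacements` come from the description.

```python
class ToySizeVars:
    """Hypothetical stand-in for the size-variable state shared by both passes."""

    def __init__(self):
        self.precomputed_replacements = {}      # expression -> symbol
        self.inv_precomputed_replacements = {}  # symbol -> expression

    def record(self, expr, symbol):
        # Remember that `expr` has been precomputed into `symbol`.
        self.precomputed_replacements.setdefault(expr, symbol)
        self.inv_precomputed_replacements.setdefault(symbol, expr)

    def index_code(self, expr):
        # Emit the symbol if an earlier pass already precomputed `expr`.
        return self.precomputed_replacements.get(expr, expr)


def codegen_source(sizevars, expr):
    """Emit toy kernel source for one schedule node (assumed shape, for illustration)."""
    source = f"tmp0 = tl.load(in_ptr0 + ({sizevars.index_code(expr)}))"
    # Side effect: this pass leaves a replacement pair behind.
    sizevars.record(expr, "ks0")
    return source


def run_two_pass(clear_before_cpp):
    sizevars = ToySizeVars()
    kernel_cache = {}  # source code -> kernel name, filled by the python pass

    py_src = codegen_source(sizevars, "ks2*48")
    kernel_cache[py_src] = "triton_kernel_0"

    if clear_before_cpp:
        # The fix: drop stale replacements so the cpp pass starts clean.
        sizevars.precomputed_replacements.clear()
        sizevars.inv_precomputed_replacements.clear()

    cpp_src = codegen_source(sizevars, "ks2*48")
    # None models the missing-cuda-kernel error: the source no longer matches.
    return py_src, cpp_src, kernel_cache.get(cpp_src)
```

Calling `run_two_pass(False)` yields two different source strings and a failed cache lookup, while `run_two_pass(True)` yields identical source in both passes, so the cpp pass finds the kernel cached by the python pass.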
pytorch · Mar 28, 2024 · d3fcb17
1 parent 78083a9
Showing 1 changed file with 1 addition and 0 deletions.
diff --git a/test/inductor/test_aot_inductor.py b/test/inductor/test_aot_inductor.py
@@ -2464,6 +2464,7 @@ def fail_non_abi_compatible_cuda(is_skip=False):
             "test_repeat_interleave": fail_minimal_arrayref_interface(is_skip=True),
             "test_return_constant": fail_minimal_arrayref_interface(is_skip=True),
             "test_reuse_kernel": fail_minimal_arrayref_interface(is_skip=True),
+            "test_reuse_kernel_dynamic": fail_minimal_arrayref_interface(is_skip=True),
             "test_simple": fail_minimal_arrayref_interface(is_skip=True),
             "test_small_constant": fail_minimal_arrayref_interface(is_skip=True),
             "test_with_no_triton_profiler": fail_minimal_arrayref_interface(