Introduce new template heuristic for triton autotune configs #144985
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/144985
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 8 Pending, 7 Unrelated Failures as of commit eb7843a with merge base 683178f.
NEW FAILURES - The following jobs have failed.
FLAKY - The following job failed but was likely due to flakiness present on trunk.
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This seems like an improvement, but we shouldn't assume a single device and we shouldn't read the config class in global scope so the user can override it without restarting the process.
torch/_inductor/choices.py
Outdated
```python
def __init__(self):
    self.config_heuristics = self._get_device_config_heuristic()

def _get_device_config_heuristic(self, device_type="cuda"):
    from torch._inductor.utils import get_gpu_type

    device_type = get_gpu_type()

    if device_type == "cuda":
        if torch.version.hip is None:
            return CUDAConfigHeuristic()
        else:
            return ROCmConfigHeuristic()
    elif device_type == "xpu":
        return XPUConfigHeuristic()
    elif torch.cuda.is_available():
        return BaseConfigHeuristic()
    else:
        return CPUConfigHeuristic()
```
This won't work for programs that use multiple devices (most commonly GPU+CPU). We should be selecting the heuristics based on the device of the op, not the hardware properties of the machine. Maybe:
```python
def get_device_config_heuristic(self, device):
    ...  # similar to the code you have
```
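For illustration, a minimal sketch of that per-device dispatch. The class names come from the snippet above; the import path is an assumption based on the PR's description of torch/_inductor/template_heuristics.py:

```python
import torch

# Assumed import surface; the PR states these subclasses live in
# torch/_inductor/template_heuristics.py.
from torch._inductor.template_heuristics import (
    BaseConfigHeuristic,
    CPUConfigHeuristic,
    CUDAConfigHeuristic,
    ROCmConfigHeuristic,
    XPUConfigHeuristic,
)


def get_device_config_heuristic(device: torch.device):
    # Dispatch on the device of the op being compiled, not on the
    # hardware of the compiling machine, so a GPU+CPU program picks
    # the right heuristic per kernel.
    if device.type == "cuda":
        # torch.version.hip is non-None on ROCm builds of PyTorch
        if torch.version.hip is not None:
            return ROCmConfigHeuristic()
        return CUDAConfigHeuristic()
    if device.type == "xpu":
        return XPUConfigHeuristic()
    if device.type == "cpu":
        return CPUConfigHeuristic()
    return BaseConfigHeuristic()
```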
torch/_inductor/kernel/conv.py
Outdated
```python
if torch._inductor.config.max_autotune_custom_heuristic is None:
    conv_heuristics = V.choices.config_heuristics
else:
    conv_heuristics = torch._inductor.config.max_autotune_custom_heuristic
```
Move this into V.choices.
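A hedged sketch of what that consolidation might look like, assuming the max_autotune_custom_heuristic option from the diff; the method name get_config_heuristics is hypothetical:

```python
from torch._inductor import config as inductor_config


class InductorChoices:
    def get_config_heuristics(self, device):
        # Hypothetical helper consolidating the None-check that
        # conv.py and mm_common.py currently duplicate. Reading the
        # override at call time (not import time) lets users change
        # it without restarting the process.
        if inductor_config.max_autotune_custom_heuristic is not None:
            return inductor_config.max_autotune_custom_heuristic
        # per-device dispatch as sketched in the earlier comment
        return get_device_config_heuristic(device)
```

Call sites would then reduce to something like conv_heuristics = V.choices.get_config_heuristics(device).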
torch/_inductor/kernel/mm_common.py
Outdated
```python
if inductor_config.max_autotune_custom_heuristic is None:
    mm_heuristics = V.choices.config_heuristics
else:
    mm_heuristics = inductor_config.max_autotune_custom_heuristic
```
Duplicate code; this should also move to V.choices.
torch/_inductor/kernel/mm_common.py
Outdated
```python
if inductor_config.max_autotune_gemm_search_space != "EXHAUSTIVE":
    mm_kernel_configs = mm_heuristics.get_mm_configs()
else:
    mm_kernel_configs = mm_heuristics.get_exhaustive_mm_configs()

extra_mm_kernel_configs = mm_heuristics.get_extra_mm_configs()
int8_mm_kernel_configs = mm_heuristics.get_int8_mm_configs()
mixed_mm_kernel_configs_small_m = mm_heuristics.get_mixed_mm_configs()
persistent_mm_kernel_configs = mm_heuristics.get_persistent_mm_configs()
scaled_mm_kernel_configs = mm_heuristics.get_scaled_mm_configs()
scaled_persistent_mm_kernel_configs = mm_heuristics.get_scaled_persistent_mm_configs()
```
This code should be moved inside functions so it can get overridden by the user (through the V.choices API). Evaluating in global scope prevents this.
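For illustration, the deferral being requested, with a module-level assignment wrapped in a function (the function name is hypothetical; the config option and heuristic method names come from the diff):

```python
from torch._inductor import config as inductor_config
from torch._inductor.virtualized import V


def get_mm_kernel_configs():
    # Resolve the heuristics through V.choices on each call so a
    # user-supplied override takes effect; evaluating this once in
    # global scope would freeze the choice at import time.
    mm_heuristics = V.choices.config_heuristics
    if inductor_config.max_autotune_gemm_search_space == "EXHAUSTIVE":
        return mm_heuristics.get_exhaustive_mm_configs()
    return mm_heuristics.get_mm_configs()
```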
torch/_inductor/kernel/conv.py
Outdated
```python
# On ROCm convert num_stages to 1 as pipelining provides no benefit
if torch.version.hip and torch.cuda.is_available():
    platform_configs = build_rocm_gemm_configs(platform_configs)
kernel_configs = conv_heuristics.get_conv_configs()
```
Move this inside the function so you can override the choices.
Thanks @jansel, I will address these comments and try to get CI green.
Force-pushed from 8479e33 to 06e9755.
Successfully rebased; force-pushed from f5395a9 to eb7843a.
@jansel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed: Meta Internal-Only Changes Check. Details for Dev Infra team: raised by workflow job.
…configs (pytorch#144985)"". This reverts commit 4434376.
@pytorchbot merge -f "landed in fbcode". I fixed the internal issue.
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
Thanks for all the help here @jansel
This pull request has been reverted by 465930e. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).
Initial PR to refactor the bulk of mm_common to allow for better device-specific specialisation; e.g. in #143286 we require extensive conditionalisation to get ROCm-specific optimisations in.
This PR introduces a new file, torch/_inductor/template_heuristics.py, which implements device-specific subclasses for autotune configs. These subclasses are integrated as part of the InductorChoices class, which will be the interface through which the kernel files access the configs. The mm_common, mm_plus_mm and conv configurations are implemented in this class; in the future we plan to bring in the flex attention configurations as well, so that all of the tuning-config logic for templated triton kernels is handled in this file.
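Roughly, the hierarchy described above could be pictured like this (a sketch: the class names appear in the review snippets, the method names follow the diff, and everything else is illustrative):

```python
# torch/_inductor/template_heuristics.py (sketch)
class BaseConfigHeuristic:
    # Each get_*_configs method returns the triton autotune configs
    # for one family of templated kernels; subclasses override the
    # ones that need device-specific tuning.
    def get_mm_configs(self):
        return []  # default config list elided in this sketch

    def get_conv_configs(self):
        return []  # default config list elided in this sketch


class CUDAConfigHeuristic(BaseConfigHeuristic):
    """NVIDIA-specific tuning choices."""


class ROCmConfigHeuristic(BaseConfigHeuristic):
    """AMD-specific tuning choices, e.g. num_stages=1 since pipelining
    provides no benefit on ROCm (per the conv.py comment above)."""


class XPUConfigHeuristic(BaseConfigHeuristic):
    """Intel GPU tuning choices."""


class CPUConfigHeuristic(BaseConfigHeuristic):
    """CPU tuning choices."""
```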
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov