
Conversation

jataylo
Collaborator

@jataylo jataylo commented Jan 16, 2025

Initial PR to refactor the bulk of mm_common to allow for better device-specific specialisation; e.g. in #143286 we require extensive conditional logic to get ROCm-specific optimisations in.

This PR introduces a new file, torch/_inductor/template_heuristics.py, which implements device-specific subclasses for autotune configs:

  • CPUConfigHeuristic()
  • CUDAConfigHeuristic()
  • ROCmConfigHeuristic()
  • XPUConfigHeuristic()

These subclasses are integrated into the InductorChoices class, which will be the interface through which the kernel files access the configs.

The mm_common, mm_plus_mm and conv configurations are implemented in this class. In the future we plan to bring in the flex attention configurations as well, so that all of the tuning config logic for templated Triton kernels is handled in this file.
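As a rough illustration of the subclass pattern described above (class names follow the PR; the base configs and the ROCm tweak here are purely hypothetical placeholders, not the PR's actual values):

```python
# Sketch of the device-specific heuristic hierarchy described above.
# Class names are from the PR; the example configs are hypothetical.

class BaseConfigHeuristic:
    def get_mm_configs(self):
        # Generic (block_m, block_n, block_k, num_stages, num_warps) tuples.
        return [(64, 64, 32, 2, 4), (128, 128, 32, 3, 8)]

class CPUConfigHeuristic(BaseConfigHeuristic):
    pass

class CUDAConfigHeuristic(BaseConfigHeuristic):
    pass

class ROCmConfigHeuristic(BaseConfigHeuristic):
    def get_mm_configs(self):
        # Illustrative ROCm-specific tweak: force num_stages to 1,
        # since software pipelining provides no benefit on ROCm.
        base = super().get_mm_configs()
        return [(m, n, k, 1, w) for (m, n, k, _, w) in base]

class XPUConfigHeuristic(BaseConfigHeuristic):
    pass
```

Kernel code would then ask for the heuristic object and call `get_mm_configs()` on it, so the device-specific logic lives entirely in the subclass rather than in conditionals at each call site.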

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@jataylo jataylo added the ciflow/trunk, ciflow/inductor, ciflow/rocm and ciflow/inductor-rocm labels Jan 16, 2025
@jataylo jataylo requested a review from jansel January 16, 2025 18:09

pytorch-bot bot commented Jan 16, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/144985

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 8 Pending, 7 Unrelated Failures

As of commit eb7843a with merge base 683178f:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jataylo jataylo changed the title Introduce new template heuristic for triton configs Introduce new template heuristic for triton autotune configs Jan 16, 2025
Contributor

@jansel jansel left a comment


This seems like an improvement, but we shouldn't assume a single device, and we shouldn't read the config class at global scope, so that the user can override it without restarting the process.

Comment on lines 50 to 68
def __init__(self):
    self.config_heuristics = self._get_device_config_heuristic()

def _get_device_config_heuristic(self, device_type="cuda"):
    from torch._inductor.utils import get_gpu_type

    device_type = get_gpu_type()

    if device_type == "cuda":
        if torch.version.hip is None:
            return CUDAConfigHeuristic()
        else:
            return ROCmConfigHeuristic()
    elif device_type == "xpu":
        return XPUConfigHeuristic()
    elif torch.cuda.is_available():
        return BaseConfigHeuristic()
    else:
        return CPUConfigHeuristic()
Contributor


This won't work for programs that use multiple devices (most commonly GPU+CPU). We should be selecting the heuristics based on the device of the op, not the hardware properties of the machine. Maybe:

def get_device_config_heuristic(self, device):
   ... similar to the code you have ...
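A minimal sketch of what such per-device selection might look like, keyed off the op's device rather than the machine (the heuristic classes stand in for the ones introduced in this PR; the string/flag parameters are illustrative simplifications):

```python
# Hypothetical per-device dispatch. The heuristic classes below are
# stand-ins for the ones this PR introduces; on a real build the
# ROCm check would use torch.version.hip.

class BaseConfigHeuristic: ...
class CPUConfigHeuristic(BaseConfigHeuristic): ...
class CUDAConfigHeuristic(BaseConfigHeuristic): ...
class ROCmConfigHeuristic(BaseConfigHeuristic): ...
class XPUConfigHeuristic(BaseConfigHeuristic): ...

def get_device_config_heuristic(device_type: str, is_hip: bool = False):
    # Selected from the device of the op being compiled, not from
    # global hardware properties, so GPU+CPU programs work correctly.
    if device_type == "cuda":
        return ROCmConfigHeuristic() if is_hip else CUDAConfigHeuristic()
    if device_type == "xpu":
        return XPUConfigHeuristic()
    if device_type == "cpu":
        return CPUConfigHeuristic()
    return BaseConfigHeuristic()
```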

Comment on lines 60 to 63
if torch._inductor.config.max_autotune_custom_heuristic is None:
    conv_heuristics = V.choices.config_heuristics
else:
    conv_heuristics = torch._inductor.config.max_autotune_custom_heuristic
Contributor


Move this into V.choices.

Comment on lines 139 to 142
if inductor_config.max_autotune_custom_heuristic is None:
    mm_heuristics = V.choices.config_heuristics
else:
    mm_heuristics = inductor_config.max_autotune_custom_heuristic
Contributor


Duplicated code; this should move into V.choices.
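One way the duplicated override check could be centralized (a sketch only; V.choices and max_autotune_custom_heuristic are names used in this PR, but the accessor method and constructor shown here are hypothetical):

```python
# Sketch: a single accessor on the choices object, so that the conv
# and mm call sites no longer repeat the custom-heuristic check.

class InductorChoices:
    def __init__(self, default_heuristic, custom_heuristic=None):
        self._default = default_heuristic
        # e.g. populated from config.max_autotune_custom_heuristic
        self._custom = custom_heuristic

    def get_config_heuristics(self):
        # A user-supplied heuristic wins; otherwise the device default.
        return self._custom if self._custom is not None else self._default
```

Call sites would then reduce to a single line, e.g. `heuristics = V.choices.get_config_heuristics()`.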

Comment on lines 144 to 154
if inductor_config.max_autotune_gemm_search_space != "EXHAUSTIVE":
    mm_kernel_configs = mm_heuristics.get_mm_configs()
else:
    mm_kernel_configs = mm_heuristics.get_exhaustive_mm_configs()

extra_mm_kernel_configs = mm_heuristics.get_extra_mm_configs()
int8_mm_kernel_configs = mm_heuristics.get_int8_mm_configs()
mixed_mm_kernel_configs_small_m = mm_heuristics.get_mixed_mm_configs()
persistent_mm_kernel_configs = mm_heuristics.get_persistent_mm_configs()
scaled_mm_kernel_configs = mm_heuristics.get_scaled_mm_configs()
scaled_persistent_mm_kernel_configs = mm_heuristics.get_scaled_persistent_mm_configs()
Contributor


This code should be moved inside functions so it can be overridden by the user (through the V.choices API). Evaluating it at global scope prevents this.
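The suggested fix amounts to deferring the lookup from import time to call time. A minimal sketch (function and class names here are illustrative, not the PR's actual API):

```python
# Module-level evaluation binds the configs once, at import time:
#     mm_kernel_configs = heuristics.get_mm_configs()  # too early to override
#
# Wrapping the lookup in a function defers it to call time, so a
# heuristic swapped in through the V.choices API takes effect without
# restarting the process.

def mm_kernel_configs(heuristics, search_space="DEFAULT"):
    # Re-evaluated on every call, so overrides are picked up here.
    if search_space == "EXHAUSTIVE":
        return heuristics.get_exhaustive_mm_configs()
    return heuristics.get_mm_configs()

# Stand-in heuristic for demonstration only.
class DemoHeuristic:
    def get_mm_configs(self):
        return ["default-configs"]

    def get_exhaustive_mm_configs(self):
        return ["exhaustive-configs"]
```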

# On ROCm convert num_stages to 1 as pipelining provides no benefit
if torch.version.hip and torch.cuda.is_available():
    platform_configs = build_rocm_gemm_configs(platform_configs)
kernel_configs = conv_heuristics.get_conv_configs()
Contributor


Move this inside the function so you can override the choices.

@jataylo
Collaborator Author

jataylo commented Jan 16, 2025

Thanks @jansel, I will address these comments and try to get CI green.

@jataylo jataylo force-pushed the template-heuristics-jack branch 2 times, most recently from 8479e33 to 06e9755 Compare January 21, 2025 14:16
@pytorchmergebot
Collaborator

Successfully rebased template-heuristics-jack onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout template-heuristics-jack && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the template-heuristics-jack branch from f5395a9 to eb7843a Compare February 13, 2025 17:00
@facebook-github-bot
Contributor

@jansel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 job has failed, first few of them are: Meta Internal-Only Changes Check

Details for Dev Infra team Raised by workflow job

jataylo added a commit to jataylo/pytorch that referenced this pull request Feb 14, 2025
@jansel
Contributor

jansel commented Feb 14, 2025

@pytorchbot merge -f "landed in fbcode"

I fixed the internal issue.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@jataylo
Collaborator Author

jataylo commented Feb 14, 2025

Thanks for all the help here @jansel

@facebook-github-bot
Contributor

This pull request has been reverted by 465930e. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).


Labels

  • ci-no-td (Do not run TD on this PR)
  • ciflow/inductor
  • ciflow/inductor-periodic
  • ciflow/inductor-rocm (Trigger "inductor" config CI on ROCm)
  • ciflow/rocm (Trigger "default" config CI on ROCm)
  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • Merged
  • module: inductor
  • open source
  • Reverted
  • topic: not user facing (topic category)


6 participants