
Conversation

Aidyn-A
Collaborator

@Aidyn-A Aidyn-A commented Jul 7, 2025

Per title: the profiler pattern matcher fails with the following error if "+PTX" was used in TORCH_CUDA_ARCH_LIST:

  File "/usr/local/lib/python3.12/dist-packages/torch/profiler/_pattern_matcher.py", line 313, in skip
    has_tf32 = all(int(arch[3:]) >= 80 for arch in torch.cuda.get_arch_list())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/profiler/_pattern_matcher.py", line 313, in <genexpr>
    has_tf32 = all(int(arch[3:]) >= 80 for arch in torch.cuda.get_arch_list())
                   ^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'pute_120'

This happens because slicing with arch[3:] does not leave only digits for the compute_120 element of torch.cuda.get_arch_list():

>>> torch.cuda.get_arch_list()
['sm_75', 'sm_80', 'sm_86', 'sm_90', 'sm_100', 'sm_120', 'compute_120'] 
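
To make the failure concrete: the fixed-width slice arch[3:] only strips an sm_ prefix, so a PTX entry keeps part of its name. A quick REPL check (illustrative only, not taken from the PR) shows where 'pute_120' comes from:

>>> 'sm_120'[3:]
'120'
>>> 'compute_120'[3:]
'pute_120'
>>> int('pute_120')
Traceback (most recent call last):
ValueError: invalid literal for int() with base 10: 'pute_120'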

cc @ptrblck @msaroufim @eqy @jerryzh168 @robieta @chaekit @guotuofeng @guyang3532 @dzhulgakov @davidberard98 @briancoutinho @sraikund16 @sanrise

@Aidyn-A Aidyn-A self-assigned this Jul 7, 2025
@Aidyn-A Aidyn-A requested a review from sraikund16 as a code owner July 7, 2025 11:55
@Aidyn-A Aidyn-A added the module: cuda, oncall: profiler, and topic: not user facing labels Jul 7, 2025

pytorch-bot bot commented Jul 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157711

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 42ce8d8 with merge base c5b46b5:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@sraikund16
Contributor

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Jul 7, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64 / test (mps, 1, 1, macos-m2-15)

Details for Dev Infra team (raised by workflow job)

@sraikund16
Contributor

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased fix_profiler_pattern_matcher onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fix_profiler_pattern_matcher && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the fix_profiler_pattern_matcher branch from 4fddd01 to c9722a8 on July 7, 2025 20:49
Contributor

@sraikund16 sraikund16 left a comment

@Aidyn-A do you mind adding a test before merging this?

@soulitzer soulitzer added the triaged label Jul 7, 2025
@Aidyn-A
Collaborator Author

Aidyn-A commented Jul 8, 2025

@Aidyn-A do you mind adding a test before merging this?

I do not think there is a need for, or a way of adding, a short unit test. A couple of existing tests already fail if PyTorch was built with TORCH_CUDA_ARCH_LIST="{any arches>=80} +PTX":

______________________________________________________________________________ TestExperimentalUtils.test_profiler_fp32_matmul_pattern ______________________________________________________________________________
Traceback (most recent call last):
  File "/usr/lib/python3.12/unittest/case.py", line 58, in testPartExecutor
    yield
  File "/usr/lib/python3.12/unittest/case.py", line 634, in run
    self._callTestMethod(testMethod)
  File "/usr/lib/python3.12/unittest/case.py", line 589, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/testing/_internal/common_utils.py", line 3146, in wrapper
    method(*args, **kwargs)
  File "/opt/pytorch/pytorch/test/profiler/test_profiler.py", line 2748, in test_profiler_fp32_matmul_pattern
    has_tf32 = 0 if pattern.skip else 1
                    ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/profiler/_pattern_matcher.py", line 313, in skip
    has_tf32 = all(int(arch[3:]) >= 80 for arch in torch.cuda.get_arch_list())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/profiler/_pattern_matcher.py", line 313, in <genexpr>
    has_tf32 = all(int(arch[3:]) >= 80 for arch in torch.cuda.get_arch_list())
                   ^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'pute_120'

To execute this test, run the following from the base repo dir:
    python test/profiler/test_profiler.py TestExperimentalUtils.test_profiler_fp32_matmul_pattern

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
__________________________________________________________________________ TestExperimentalUtils.test_profiler_pattern_matcher_json_report __________________________________________________________________________
Traceback (most recent call last):
  File "/usr/lib/python3.12/unittest/case.py", line 58, in testPartExecutor
    yield
  File "/usr/lib/python3.12/unittest/case.py", line 634, in run
    self._callTestMethod(testMethod)
  File "/usr/lib/python3.12/unittest/case.py", line 589, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/testing/_internal/common_utils.py", line 3146, in wrapper
    method(*args, **kwargs)
  File "/usr/local/lib/python3.12/dist-packages/torch/testing/_internal/common_utils.py", line 1617, in wrapper
    fn(*args, **kwargs)
  File "/opt/pytorch/pytorch/test/profiler/test_profiler.py", line 2885, in test_profiler_pattern_matcher_json_report
    report_all_anti_patterns(prof, json_report_dir=tmpdir, print_enable=False)
  File "/usr/local/lib/python3.12/dist-packages/torch/profiler/_pattern_matcher.py", line 629, in report_all_anti_patterns
    matched_events = anti_pattern.matched_events()
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/profiler/_pattern_matcher.py", line 98, in matched_events
    if self.skip:
       ^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/profiler/_pattern_matcher.py", line 313, in skip
    has_tf32 = all(int(arch[3:]) >= 80 for arch in torch.cuda.get_arch_list())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/profiler/_pattern_matcher.py", line 313, in <genexpr>
    has_tf32 = all(int(arch[3:]) >= 80 for arch in torch.cuda.get_arch_list())
                   ^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'pute_120'

To execute this test, run the following from the base repo dir:
    python test/profiler/test_profiler.py TestExperimentalUtils.test_profiler_pattern_matcher_json_report

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

The reason this error was not regularly observed is that Python short-circuits the all call at the first False value. So this call succeeds:

>>> all(int(arch[3:]) >= 80 for arch in ['sm_70', 'sm_80', 'sm_86', 'sm_90', 'sm_100', 'sm_120', 'compute_120'])
False

However, if all architectures are >=80, the Python interpreter evaluates the whole generator and fails on compute_120:

>>> all(int(arch[3:]) >= 80 for arch in ['sm_80', 'sm_86', 'sm_90', 'sm_100', 'sm_120', 'compute_120'])
Traceback (most recent call last):
ValueError: invalid literal for int() with base 10: 'pute_120'
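
For illustration only (this is a sketch, not necessarily the exact change made in this PR), one robust way to evaluate the condition is to parse the numeric suffix and consider only real sm_* architectures, so PTX entries such as compute_120 are skipped rather than fed to int():

>>> arch_list = ['sm_80', 'sm_86', 'sm_90', 'sm_100', 'sm_120', 'compute_120']
>>> all(int(arch.split('_')[-1]) >= 80 for arch in arch_list if arch.startswith('sm_'))
True

Any variant that avoids calling int() on compute_* entries resolves the ValueError seen in both failing tests above.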

@Aidyn-A Aidyn-A requested a review from sraikund16 July 8, 2025 08:53
@sraikund16
Contributor

@Aidyn-A ok sounds good. Can you fix the linter error?

@Aidyn-A
Collaborator Author

Aidyn-A commented Jul 9, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased fix_profiler_pattern_matcher onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fix_profiler_pattern_matcher && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the fix_profiler_pattern_matcher branch from c9722a8 to 42ce8d8 on July 9, 2025 06:04
@Aidyn-A
Collaborator Author

Aidyn-A commented Jul 9, 2025

The linter was failing due to broken CI.

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Labels: ciflow/trunk, Merged, module: cuda, oncall: profiler, open source, topic: not user facing, triaged
