
feat(hardware support): verify if hopper optimizations apply to ada lovelace (sm_89) #2192

Open · 1 task
jon-chuang opened this issue Aug 28, 2023 · 3 comments

@jon-chuang (Contributor) commented Aug 28, 2023

Although server-class GPUs (A100, H100) are the main target for production, some users may have their own server with a consumer GPU, or may want to develop optimizations on a local GPU. Hence, we should attempt to support sm_89 for Hopper-specific features; currently they might be ignored.

Actually, the NVIDIA docs seem pretty clear:

[image: NVIDIA compute-capability feature table]

It seems that the only feature Hopper and Ada Lovelace (or other sm_89 devices) share is FP8 tensor core support, which is not listed explicitly.
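For local development, a capability gate along these lines could decide which code paths to attempt (a minimal sketch using PyTorch's device query; the feature names and thresholds are my reading of the table above, not an existing API):

```python
import torch

def hopper_feature_set() -> dict:
    # Per the compute-capability table: sm_89 (Ada) shares only FP8
    # tensor cores with sm_90; TMA, wgmma, and thread block clusters
    # remain Hopper-only.
    cc = torch.cuda.get_device_capability()
    return {
        "fp8_tensor_cores": cc >= (8, 9),
        "tma": cc >= (9, 0),
        "wgmma": cc >= (9, 0),
        "thread_block_clusters": cc >= (9, 0),
    }
```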

jon-chuang changed the title from "feat(hardware support): check if hopper optimizations apply to ada lovelace (sm_89)" to "feat(hardware support): verify if hopper optimizations apply to ada lovelace (sm_89)" on Aug 28, 2023
@Jokeren (Contributor) commented Aug 28, 2023

wgmma currently only supports sm_90 in the public docs, so the comment is OK IMHO.

@jon-chuang (Contributor Author) commented:
Cool, let me see if TMA will improve sm_89 (Ada Lovelace) performance or result in any errors.

@jon-chuang (Contributor Author) commented Sep 5, 2023

I found some evidence on sm_89 (RTX 4070) that TMA and thread block clusters do not work:

ptxas /tmp/compile-ptx-src-959430, line 41; error   : Feature '%clusterid' requires .target sm_90 or higher
ptxas /tmp/compile-ptx-src-7ff6b4, line 76; error   : Feature '%cluster_ctaid' requires .target sm_90 or higher
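For reference, the %clusterid failure can be reproduced without Triton by feeding ptxas a tiny PTX kernel directly (a sketch; it assumes ptxas from CUDA 12.x is on PATH, and the kernel name probe_cluster is made up):

```python
import subprocess
import tempfile

# Minimal PTX that reads the %clusterid special register, which the
# PTX ISA documents as requiring .target sm_90 or higher.
PTX = """\
.version 8.0
.target sm_89
.address_size 64

.visible .entry probe_cluster()
{
    .reg .b32 %r<1>;
    mov.u32 %r0, %clusterid.x;
    ret;
}
"""

with tempfile.NamedTemporaryFile("w", suffix=".ptx") as f:
    f.write(PTX)
    f.flush()
    out = subprocess.run(
        ["ptxas", "-arch=sm_89", f.name, "-o", "/dev/null"],
        capture_output=True, text=True,
    )
    # Expected: "Feature '%clusterid' requires .target sm_90 or higher"
    print(out.stderr or "compiled OK")
```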

Use of TMA results in a Python abort, e.g.:

test/unit/hopper/test_persistent_warp_specialized_gemm.py::test_user_defined_persistent_warp_specialized_gemm[2048-2048-64-64-64-16-1-False-True-True] Fatal Python error: Aborted

Current thread 0x00007efe63f8b000 (most recent call first):
  File "/home/jonch/Desktop/Programming/mlsys/triton/python/triton/compiler/compiler.py", line 49 in ttir_compute_capability_rewrite
  File "/home/jonch/Desktop/Programming/mlsys/triton/python/triton/compiler/compiler.py", line 55 in optimize_ttir
  File "/home/jonch/Desktop/Programming/mlsys/triton/python/triton/compiler/compiler.py", line 382 in <lambda>
  File "/home/jonch/Desktop/Programming/mlsys/triton/python/triton/compiler/compiler.py", line 488 in compile
  File "<string>", line 74 in static_persistent_tma_warp_specialized_matmul_kernel
  File "/home/jonch/Desktop/Programming/mlsys/triton/python/test/unit/hopper/test_persistent_warp_specialized_gemm.py", line 438 in test_user_defined_persistent_warp_specialized_gemm
