
feat(hardware support): verify if hopper optimizations apply to ada lovelace (sm_89) #2192

Open · 1 task
jon-chuang opened this issue Aug 28, 2023 · 3 comments

@jon-chuang (Contributor) commented Aug 28, 2023

Although server-class GPUs (A100, H100) are the main target for production, some users may have their own server with a consumer GPU, or may want to develop optimizations on a local GPU. Hence, we should attempt to support sm_89 for Hopper-specific features; currently they might be ignored.

Actually, the NVIDIA docs seem pretty clear:

[image: NVIDIA compute-capability feature table]

It seems that the only feature Hopper and Ada Lovelace (or other sm_89 devices) share is FP8 tensor core support, which is not listed explicitly.
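For local development, a capability gate along these lines could decide which code paths to attempt (a minimal sketch using PyTorch's device query; the feature names and thresholds are my reading of the table above, not an existing API):

```python
import torch

def hopper_feature_set() -> dict:
    # Per the compute-capability table: sm_89 (Ada) shares only FP8
    # tensor cores with sm_90; TMA, wgmma, and thread block clusters
    # remain Hopper-only.
    cc = torch.cuda.get_device_capability()
    return {
        "fp8_tensor_cores": cc >= (8, 9),
        "tma": cc >= (9, 0),
        "wgmma": cc >= (9, 0),
        "thread_block_clusters": cc >= (9, 0),
    }
```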

jon-chuang changed the title from "feat(hardware support): check if hopper optimizations apply to ada lovelace (sm_89)" to "feat(hardware support): verify if hopper optimizations apply to ada lovelace (sm_89)" on Aug 28, 2023
@Jokeren (Contributor) commented Aug 28, 2023

wgmma currently only supports sm_90 in the public docs, so the comment is OK IMHO.

@jon-chuang (Contributor Author) commented:
Cool, let me see if TMA will improve sm_89 (Ada Lovelace) performance or result in any errors.

@jon-chuang (Contributor Author) commented Sep 5, 2023

I found some evidence on sm_89 (RTX 4070) that TMA and thread block clusters do not work:

ptxas /tmp/compile-ptx-src-959430, line 41; error   : Feature '%clusterid' requires .target sm_90 or higher
ptxas /tmp/compile-ptx-src-7ff6b4, line 76; error   : Feature '%cluster_ctaid' requires .target sm_90 or higher
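For reference, the %clusterid failure can be reproduced without Triton by feeding ptxas a tiny PTX kernel directly (a sketch; it assumes ptxas from CUDA 12.x is on PATH, and the kernel name probe_cluster is made up):

```python
import subprocess
import tempfile

# Minimal PTX that reads the %clusterid special register, which the
# PTX ISA documents as requiring .target sm_90 or higher.
PTX = """\
.version 8.0
.target sm_89
.address_size 64

.visible .entry probe_cluster()
{
    .reg .b32 %r<1>;
    mov.u32 %r0, %clusterid.x;
    ret;
}
"""

with tempfile.NamedTemporaryFile("w", suffix=".ptx") as f:
    f.write(PTX)
    f.flush()
    out = subprocess.run(
        ["ptxas", "-arch=sm_89", f.name, "-o", "/dev/null"],
        capture_output=True, text=True,
    )
    # Expected: "Feature '%clusterid' requires .target sm_90 or higher"
    print(out.stderr or "compiled OK")
```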

Use of TMA results in a Python abort, e.g.:

test/unit/hopper/test_persistent_warp_specialized_gemm.py::test_user_defined_persistent_warp_specialized_gemm[2048-2048-64-64-64-16-1-False-True-True] Fatal Python error: Aborted

Current thread 0x00007efe63f8b000 (most recent call first):
  File "/home/jonch/Desktop/Programming/mlsys/triton/python/triton/compiler/compiler.py", line 49 in ttir_compute_capability_rewrite
  File "/home/jonch/Desktop/Programming/mlsys/triton/python/triton/compiler/compiler.py", line 55 in optimize_ttir
  File "/home/jonch/Desktop/Programming/mlsys/triton/python/triton/compiler/compiler.py", line 382 in <lambda>
  File "/home/jonch/Desktop/Programming/mlsys/triton/python/triton/compiler/compiler.py", line 488 in compile
  File "<string>", line 74 in static_persistent_tma_warp_specialized_matmul_kernel
  File "/home/jonch/Desktop/Programming/mlsys/triton/python/test/unit/hopper/test_persistent_warp_specialized_gemm.py", line 438 in test_user_defined_persistent_warp_specialized_gemm
