Support Nvidia Hopper GPUs #27
base: main
Conversation
if Symbol(dtype) == :Float16
    # matrix dimensions 8x8x4, factor 2 for nflops in A*B+C
    # see e.g. https://peerj.com/articles/cs-330.pdf
Note: I replaced this link with the DOI because the link is now broken.
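For readers following along, here is a minimal sketch of how a multiplier like the one in this branch is typically derived under the A100-style model described in the linked paper. The variable names and device figures are illustrative (A100 values), not the package's actual code:

```julia
# One Float16 tensor-core MMA works on an 8x8x4 tile, and each fused
# multiply-add in A*B+C counts as 2 flops, so per tensor core and per cycle:
flops_per_tc_cycle = 2 * 8 * 8 * 4   # = 512 flops / cycle / tensor core

# The theoretical peak then scales with clock and tensor-core count
# (illustrative A100 figures, not queried from a device):
clock_hz        = 1.410e9            # A100 boost clock
num_tensorcores = 432                # A100: 108 SMs * 4 tensor cores per SM

max_peakflops = clock_hz * num_tensorcores * flops_per_tc_cycle  # ≈ 312 TFLOP/s
```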
Trying to measure peakflops, I get:
These results are quite far from the theoretical peaks, about 50% less. Is there anything to tweak in the kernels for a new architecture?
Interestingly enough, all theoretical tensor-core peakflops for Float32, Float16, and Int8 are wrong by about 8%:
julia> 535.3 / 494.7
1.0820699413786132
julia> 1070.5 / 989.4
1.0819688700222356
julia> (4282.1 / 2) / 1978.9
1.0819394613168933
but I have no clue where this factor comes from.
elseif Symbol(dtype) == :Float64
    max_peakflops *= 2 * 4 * 4 * 2
elseif Symbol(dtype) == :Int8
    max_peakflops *= 2 * 2 * 32 * 8 * 4 # XXX: Wrong result!
Maybe there's an extra factor of 2 in this formula, but I based this on the `Int8` calculation below.
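To make the suspicion concrete, this is the plain arithmetic behind the multipliers in the diff (not part of the PR itself); halving the `Int8` multiplier would match the `4282.1 / 2` used in the comparison above, up to the same ~8% factor:

```julia
2 * 4 * 4 * 2            # Float64 branch: 64
2 * 2 * 32 * 8 * 4       # Int8 branch ("XXX: Wrong result!"): 4096
(2 * 2 * 32 * 8 * 4) ÷ 2 # with one factor of 2 removed: 2048
```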
This is an initial attempt to support Nvidia Hopper GPUs, opening as a draft because lots of things still don't work. For example, the theoretical peakflops for tensor cores are wrong; it looks like the formula used for the A100 doesn't apply to Hopper. I tried to adapt it based on figures 10-11 of https://resources.nvidia.com/en-us-tensor-core/gtc22-whitepaper-hopper but with GH200 I get:
Values for `Float64` (with and without tensor cores) and `Float32` (without tensor cores) are good, but all the other tensor-core peakflops are wrong according to column "H100 SXM5" of table 2 of the document above: they should be 494.7 TFLOP/s for `Float32`, 989.4 for `Float16`, and 1978.9 for `Int8` (https://resources.nvidia.com/en-us-grace-cpu/grace-hopper-superchip, which is specific to GH200, also agrees with those numbers, but it has fewer significant digits; they rounded to integers).
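As a small cross-check while fixing the multipliers: the reference values quoted above keep a simple 2x relationship between precisions, so whatever per-dtype factors end up in the code should preserve these ratios. A sketch using just the whitepaper numbers (not code from this PR):

```julia
# Reference tensor-core peaks from table 2 of the Hopper whitepaper,
# "H100 SXM5" column, in TFLOP/s:
reference = Dict(:Float32 => 494.7, :Float16 => 989.4, :Int8 => 1978.9)

# Each step down in precision doubles the peak (up to rounding in the table):
@assert isapprox(reference[:Float16], 2 * reference[:Float32]; rtol = 1e-3)
@assert isapprox(reference[:Int8],    2 * reference[:Float16]; rtol = 1e-3)
```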