GPU auto-detect capability for kernel builds #341

rosslwheeler · 2024-05-03T06:54:16Z

Fixes to CI -should work in both environments

This is a proposal in case there is interest for kernel builds.

Usage:

Auto-detect GPU capability:

make
(e.g. if your GPU's compute capability is 80, then --generate-code=arch=compute_80,code=[compute_80,sm_80] is added to CFLAGS)

Do not specify capability:

make GPU_COMPUTE_CAPABILITY=
(CFLAGS = -O3 --use_fast_math)

Override capability:

make GPU_COMPUTE_CAPABILITY=86
(e.g. even if your GPU's compute capability is 80, --generate-code=arch=compute_86,code=[compute_86,sm_86] is added to CFLAGS)

Tested on Linux Ubuntu 22.04 only.
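
For readers who want to see the three cases side by side, here is a minimal shell sketch of the selection logic. It is an analogue of the behaviour described above, not the Makefile code from this PR. The function names (detect_capability, resolve_capability, cflags) are hypothetical, and the nvidia-smi compute_cap query field is an assumption that may not hold on older drivers.

```shell
#!/bin/sh
# Sketch only: shell analogue of the unset / empty / override cases.

# Assumption: this driver's nvidia-smi understands the compute_cap query field.
# Prints e.g. "80" for capability 8.0, or nothing if detection is not possible.
detect_capability() {
    command -v nvidia-smi >/dev/null 2>&1 || return 0
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null \
          | head -n 1 | tr -d '. ')
    case "$cap" in
        ''|*[!0-9]*) return 0 ;;   # nothing usable: emit no capability
    esac
    printf '%s' "$cap"
}

# Use GPU_COMPUTE_CAPABILITY when it is set (even to an empty string);
# auto-detect only when it is not set at all.
resolve_capability() {
    if [ "${GPU_COMPUTE_CAPABILITY+set}" = "set" ]; then
        printf '%s' "$GPU_COMPUTE_CAPABILITY"
    else
        detect_capability
    fi
}

# Print the compiler flags for the resolved capability.
cflags() {
    cap=$(resolve_capability)
    if [ -n "$cap" ]; then
        printf '%s\n' "-O3 --use_fast_math --generate-code=arch=compute_${cap},code=[compute_${cap},sm_${cap}]"
    else
        printf '%s\n' "-O3 --use_fast_math"
    fi
}
```

With GPU_COMPUTE_CAPABILITY=86 in the environment, cflags prints the compute_86 flags. With GPU_COMPUTE_CAPABILITY set to an empty value, it prints only -O3 --use_fast_math. With the variable unset, the result depends on what nvidia-smi reports.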


alecco · 2024-05-03T09:35:29Z

If it's for compiling for the local architecture, why not just use -arch=native?

Also, note the system could have more than one GPU and -arch=native will compile for all GPUs present:

https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#gpu-architecture-arch

When -arch=native is specified, nvcc detects the visible GPUs on the system and generates codes for them, no PTX program will be generated for this option. It is a warning if there are no visible supported GPU on the system, and the default architecture will be used.

ngc92 · 2024-05-03T16:34:39Z

I think -arch=native is a relatively new option.
The nvidia-cuda-toolkit package in Ubuntu 22.04 ships CUDA 11.5, which doesn't support this option yet.

karpathy · 2024-05-05T21:55:57Z

Any reason we don't do this in the main Makefile too?

karpathy · 2024-05-05T21:56:39Z

~/llm.c/dev/cuda$ make gelu_backward
/usr/bin/nvcc -O3 --use_fast_math --generate-code=arch=compute_80 ,code=[compute_80 ,sm_80 ] -lcublas -lcublasLt gelu_backward.cu -o gelu_backward
nvcc fatal   : Option '--generate-code arch=compute_80', missing code
make: *** [Makefile:27: gelu_backward] Error 1

huh

rosslwheeler · 2024-05-06T00:49:20Z

Okay, the extra space after the 80 is fixed. I also fixed the command line override. Tested all 3 cases on Ubuntu.
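
To illustrate why the trailing space mattered, here is a small shell sketch. The helper names (trim_capability, generate_code_flag, count_words) are hypothetical and this is not the code from the commit, which presumably strips the value inside the Makefile. The point is only that an untrimmed "80 " splits the flag into several command-line words, so nvcc sees --generate-code with an arch= part and no code= part.

```shell
#!/bin/sh
# Sketch only: effect of trailing whitespace in the capability string.

# Remove every whitespace character from the detected value ("80 " -> "80").
trim_capability() {
    printf '%s' "$1" | tr -d '[:space:]'
}

# Build the flag from a trimmed capability so it stays a single word.
generate_code_flag() {
    cap=$(trim_capability "$1")
    printf '%s' "--generate-code=arch=compute_${cap},code=[compute_${cap},sm_${cap}]"
}

# Count how many words a string becomes after shell word splitting.
count_words() {
    set -f        # disable globbing so the [..] part stays literal
    set -- $1     # intentionally unquoted: split on whitespace
    set +f
    printf '%s\n' "$#"
}
```

Passing the flag from the failing command above to count_words gives more than one word, while count_words "$(generate_code_flag '80 ')" gives exactly one.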

One strange thing: the = after generate-code appears to be superfluous. Leaving it in or removing it made no difference.

So both of the lines below appear to run fine, with or without the extra =. What's the right syntax?

NVCC_FLAGS = -O3 -t=0 --use_fast_math --generate-code=arch=compute_80,code=[compute_80,sm_80]
NVCC_FLAGS = -O3 -t=0 --use_fast_math --generate-code arch=compute_80,code=[compute_80,sm_80]

Added check for command line override

rosslwheeler · 2024-05-06T01:46:57Z

Main Makefile GPU auto-detect change is here: #371

GPU auto-detect capability

aa604a0

Fixes to CI -should work in both environments

rosslwheeler changed the title ~~GPU auto-detect capability~~ GPU auto-detect capability for kernel builds May 3, 2024

rosslwheeler marked this pull request as ready for review May 3, 2024 07:04

rosslwheeler mentioned this pull request May 5, 2024

Refactoring & Improvements to reduce LOC #355

Merged

rosslwheeler added 2 commits May 5, 2024 18:27

Strip space of tail of capability string

c7b8738

Added check for command line override

Change the VS Code tabs to spaces

3e7fbfd

rosslwheeler mentioned this pull request May 6, 2024

main Makefile auto-detect GPU capability and allow overrides #371

Merged

rosslwheeler added 11 commits May 6, 2024 22:47

Merge branch 'karpathy:master' into cuda-makefile-auto-detect-gpu

881c376

Merge branch 'karpathy:master' into cuda-makefile-auto-detect-gpu

6a7354a

Merge branch 'karpathy:master' into cuda-makefile-auto-detect-gpu

bea99a5

Merge branch 'karpathy:master' into cuda-makefile-auto-detect-gpu

9446089

Merge branch 'karpathy:master' into cuda-makefile-auto-detect-gpu

6601f70

Merge branch 'karpathy:master' into cuda-makefile-auto-detect-gpu

bdd6655

Merge branch 'karpathy:master' into cuda-makefile-auto-detect-gpu

1054f7e

Merge branch 'karpathy:master' into cuda-makefile-auto-detect-gpu

e74ba3e

Merge branch 'karpathy:master' into cuda-makefile-auto-detect-gpu

cef3c13

Merge branch 'karpathy:master' into cuda-makefile-auto-detect-gpu

49d96ca

Merge branch 'karpathy:master' into cuda-makefile-auto-detect-gpu

2bd8cc3
