
[CUDAExtension] support all visible cards when building a cudaextension #48891

Closed
wants to merge 8 commits

Conversation

stas00 (Contributor) commented Dec 5, 2020

Currently CUDAExtension assumes that all cards on a machine are of the same type and builds the extension with the compute capability of the 0th card only. This breaks later at runtime if the machine has cards of different types.

Specifically resulting in:

RuntimeError: CUDA error: no kernel image is available for execution on the device

when a card of a type that wasn't compiled for is used (and, to the uninitiated, the error message says very little about what the actual problem is).

My current setup is:

$ CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.get_device_capability())"
(8, 6)
$ CUDA_VISIBLE_DEVICES=1 python -c "import torch; print(torch.cuda.get_device_capability())"
(6, 1)

but the extension was getting built with -gencode=arch=compute_80,code=sm_80.
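
For reference, a convenience one-liner (not part of this PR) that lists the capabilities of all visible devices in one go; on the setup above it should print [(8, 6), (6, 1)]:

$ python -c "import torch; print([torch.cuda.get_device_capability(i) for i in range(torch.cuda.device_count())])"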

This PR:

  • introduces a loop over all devices visible at build time, to ensure the extension will run on all of them (the resulting list is sorted, so the output is easier to debug should a card with a lower capability come last; a rough sketch follows this list)
  • adds +PTX to the last entry of the compute capabilities derived from the local cards (the if not _arch_list: branch) to support other archs
  • adds a digest of my conversation with @ptrblck on Slack in the form of docs, which hopefully helps others figure out which archs to support, how to override the defaults, when and how to add PTX, etc.
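
A rough sketch of the idea behind the first two bullets (illustrative only, not the exact code in torch/utils/cpp_extension.py):

import torch

# Collect the compute capability of every device visible at build time
# (rather than just device 0), deduplicate, and sort so that a card with
# a lower capability cannot end up hidden at the end of the list.
caps = sorted(set(torch.cuda.get_device_capability(i)
                  for i in range(torch.cuda.device_count())))
arch_list = ['{}.{}'.format(*cap) for cap in caps]
# Request PTX for the highest arch as well, so archs not in the list
# can still be JIT-compiled at runtime.
arch_list[-1] += '+PTX'
print(arch_list)  # e.g. ['6.1', '8.6+PTX'] on the mixed setup above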

Please kindly review that my prose is clear and easy to understand.

@ptrblck

ezyang (Contributor) left a comment

Ha! Nice fix.

facebook-github-bot left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

dr-ci bot commented Dec 5, 2020

💊 CI failures summary and remediations

As of commit b4ecc0d (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch.

Since your merge base is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.


This comment was automatically generated by Dr. CI. Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 13 times.

stas00 (Contributor, Author) commented Dec 8, 2020

Also, I don't quite understand what this does:

num = arch[0] + arch[2]
flags.append(f'-gencode=arch=compute_{num},code=sm_{num}')
if arch.endswith('+PTX'):
    flags.append(f'-gencode=arch=compute_{num},code=compute_{num}')

Where did PTX go, and why do we have a near-duplicate of almost the same nvcc flag, only slightly different (the last line)?


Edit: @mcarilli pointed me to https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#just-in-time-compilation which explains that this is how PTX is encoded in nvcc flags. Quote:

By specifying a virtual code architecture instead of a real GPU, nvcc postpones the assembly of PTX code until application runtime, at which the target GPU is exactly known. For instance, the command below allows generation of exactly matching GPU binary code, when the application is launched on an sm_50 or later architecture.

nvcc x.cu --gpu-architecture=compute_50 --gpu-code=compute_50

So that last line, flags.append(f'-gencode=arch=compute_{num},code=compute_{num}'), is what the +PTX suffix turns into: it embeds PTX for that virtual arch so it can be JIT-compiled on the target GPU at runtime.
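
To make that concrete, here is a walk-through of the snippet above for a single hypothetical entry such as '8.6+PTX' (the capability of the 0th card in this setup, with PTX requested):

arch = '8.6+PTX'
num = arch[0] + arch[2]  # '8' + '6' -> '86'
# emitted nvcc flags:
#   -gencode=arch=compute_86,code=sm_86        (real SASS for sm_86 cards)
#   -gencode=arch=compute_86,code=compute_86   (embedded PTX, JIT-compiled on newer archs)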

facebook-github-bot left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

stas00 deleted the cuda-ext-gpu-mix branch December 8, 2020 23:10
facebook-github-bot (Contributor)
@ezyang merged this pull request in 02b6385.
