[CUDAExtension] support all visible cards when building a cudaextension #48891
Ha! Nice fix.
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
💊 CI failures summary and remediations
As of commit b4ecc0d (more details on the Dr. CI page): ✅ None of the CI failures appear to be your fault 💚
🚧 1 fixed upstream failure: These were probably caused by upstream breakages that were already fixed.
Please rebase on the
Also, I don't quite understand what this does: pytorch/torch/utils/cpp_extension.py, lines 1562 to 1565 in 2b14425
Where did PTX go, and why is there a near-duplicate of the same nvcc flag that differs only slightly (the last line)?
Edit: @mcarilli pointed me to https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#just-in-time-compilation which explains that this is how PTX is encoded in nvcc flags. Quote:
So that last line:
Currently, CUDAExtension assumes that all cards on a machine are of the same type and builds the extension for the compute capability of the 0th card only. This breaks later at runtime if the machine has cards of different types.
Specifically, it results in:
when cards of a type that wasn't compiled for are used (and the error gives the uninitiated little indication of what the actual problem is).
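To make the failure mode concrete, here is a minimal sketch of the compatibility rule as I understand it from the CUDA docs: a binary built for `sm_XY` runs only on devices with the same major version and an equal or higher minor version, while embedded PTX for `compute_XY` can be JIT-compiled on any device of capability X.Y or higher. The names `can_run` and `uncovered_devices` and the tuple representation are hypothetical and not part of `cpp_extension.py`; on a real machine the capabilities would come from `torch.cuda.get_device_capability(i)`.

```python
def can_run(device_cc, built_archs):
    """Return True if a device of capability `device_cc` can use the build.

    device_cc:   (major, minor) tuple, e.g. (8, 0)
    built_archs: arch strings such as "8.0" or "8.0+PTX" (hypothetical input format)
    """
    for arch in built_archs:
        has_ptx = arch.endswith("+PTX")
        major, minor = (int(p) for p in arch.replace("+PTX", "").split("."))
        # Compiled binary: same major version, device minor >= built minor.
        if device_cc[0] == major and device_cc[1] >= minor:
            return True
        # PTX: can be JIT-compiled for any equal or newer capability.
        if has_ptx and device_cc >= (major, minor):
            return True
    return False


def uncovered_devices(device_ccs, built_archs):
    """Return the indices of devices that the build does not cover."""
    return [i for i, cc in enumerate(device_ccs) if not can_run(cc, built_archs)]


# Illustrative capabilities only, not the setup from this PR.
cards = [(8, 0), (6, 1)]
print(uncovered_devices(cards, ["8.0"]))             # card 1 is left out
print(uncovered_devices(cards, ["6.1", "8.0+PTX"]))  # every card is covered
```

Building only for the 0th card leaves the second card without usable code, which is the situation described above.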
My current setup is:
but the extension was getting built with `-gencode=arch=compute_80,code=sm_80`.
This PR:
- adds `+PTX` to the last entry of `ccs` derived from local cards (`if not _arch_list:`) to support other archs

Please kindly review that my prose is clear and easy to understand.
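For reviewers, the `+PTX` change can be sketched roughly as follows. This is a simplified, standalone illustration and not the actual code in `cpp_extension.py`: the function names `arch_list_from_capabilities` and `gencode_flags` are hypothetical, and the capabilities are passed in as plain tuples so the snippet runs without a GPU.

```python
def arch_list_from_capabilities(device_ccs):
    """Build an arch list from the capabilities of all visible cards.

    device_ccs: iterable of (major, minor) tuples, one per card.
    Duplicates are dropped, entries are sorted, and "+PTX" is appended
    to the highest entry so that newer cards can JIT-compile from PTX.
    """
    names = [f"{major}.{minor}" for major, minor in sorted(set(device_ccs))]
    if names:
        names[-1] += "+PTX"
    return names


def gencode_flags(arch_list):
    """Translate arch strings such as "8.0+PTX" into nvcc -gencode flags."""
    flags = []
    for arch in arch_list:
        has_ptx = arch.endswith("+PTX")
        num = arch.replace("+PTX", "").replace(".", "")
        # Compiled binary for this exact architecture.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if has_ptx:
            # code=compute_XX (rather than sm_XX) is how nvcc embeds PTX.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


# Illustrative capabilities only, not the setup from this PR.
archs = arch_list_from_capabilities([(8, 0), (6, 1), (8, 0)])
for flag in gencode_flags(archs):
    print(flag)
```

The near-duplicate flag that ends in `code=compute_XX` is the PTX entry; it is emitted only for arch strings carrying the `+PTX` suffix.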
@ptrblck