
[NVPTX] bad binary since cuda-11.3 #54633

@ye-luo


My application produces wrong numerical results when built with a CUDA >= 11.3 toolchain. I checked versions up to 11.6.
Source and assembly code are attached: badcubin.zip

psi_list_ptr[iw] = psi_local;

Both sides are std::complex. In the bad binary, the imaginary part of the left-hand side ends up as 0.
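For readers without the attachment, the shape of the failing statement can be sketched as follows. This is only an illustration, not the kernel from badcubin.zip, and it is not confirmed to trigger the bug: the function names `store_per_walker` and `count_lost_imag`, the element type `std::complex<double>`, the `map` clause, and the test values are all assumptions. Only the assignment `psi_list_ptr[iw] = psi_local;` comes from the report.

```cpp
#include <cassert>
#include <complex>
#include <vector>

// Hypothetical reduction of the reported pattern: each iteration builds a
// local std::complex value and stores it through a pointer inside an
// OpenMP target region. Without offload support the pragma is ignored or
// falls back to the host, so the function still runs.
std::vector<std::complex<double>> store_per_walker(int nw) {
  assert(nw >= 0);
  std::vector<std::complex<double>> psi_list(nw);
  std::complex<double>* psi_list_ptr = psi_list.data();
#pragma omp target teams distribute parallel for map(from : psi_list_ptr[0 : nw])
  for (int iw = 0; iw < nw; ++iw) {
    // Assumed test values with a non-zero imaginary part.
    const std::complex<double> psi_local(iw + 1.0, -(iw + 1.0));
    psi_list_ptr[iw] = psi_local;  // the statement quoted in the report
  }
  return psi_list;
}

// Counts entries whose imaginary part is exactly 0, which is the symptom
// described above (every stored value here should have a non-zero one).
int count_lost_imag(const std::vector<std::complex<double>>& psi_list) {
  int bad = 0;
  for (const auto& psi : psi_list)
    if (psi.imag() == 0.0) ++bad;
  return bad;
}
```

A correct build should report `count_lost_imag(store_per_walker(n)) == 0` for any `n`; a non-zero count would match the symptom described here.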

The --save-temps assembly files from CUDA 11.2 and 11.3 differ only in the PTX version directive:

diff cuda11.3/MultiSlaterDetTableMethod-openmp-nvptx64-nvidia-cuda.s cuda11.2/
5c5
< .version 7.3
---
> .version 7.2

If I compile the whole application with the CUDA 11.3 toolchain, the test fails. Since my application uses OpenMP offload, the NVPTX toolchain invokes ptxas. If I use ptxas from CUDA 11.2 to generate the cubin for the failing file and build everything else with CUDA 11.3, the test passes.
So my guess is that the NVPTX backend and ptxas (PTX >= 7.3) have some incompatibility that causes the bad binary. I am leaving my analysis here in the hope that backend experts will have more ideas.

Q: Is there a way to force clang to generate assembly files with a different PTX version? Combined with --ptxas-path, that would let me use an alternative ptxas for the failing file while everything else stays on the primary CUDA toolkit I need to use.
