[CUDA] Failed to build crossEntropy with error in backend: Cannot select: t38: v2f16 = bitcast t37 #5969

Closed
skosnits opened this issue Apr 5, 2022 · 6 comments
Labels: bug (Something isn't working), compiler (Compiler related issue), cuda (CUDA back-end)

Comments


skosnits commented Apr 5, 2022

Describe the bug
[CUDA] Failed to build crossEntropy with error in backend: Cannot select: t38: v2f16 = bitcast t37

To Reproduce

  1. Get SYCL with CUDA support (see the Getting Started Guide)
  2. Build attached source file: crossEntropy_for_external.cpp.txt
clang++ -fsycl -fsycl-unnamed-lambda -fsycl-targets=nvptx64-nvidia-cuda-sycldevice crossEntropy_for_external.cpp -o crossEntropy.run  -lOpenCL

Error message:

clang++: warning: argument 'nvptx64-nvidia-cuda-sycldevice' is deprecated, use 'nvptx64-nvidia-cuda' instead [-Wdeprecated]
warning: linking module 'remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc': Linking two modules of different target triples: 'remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc' is 'nvptx64-unknown-nvidiacl' whereas 'crossEntropy_for_external.cpp' is 'nvptx64-nvidia-cuda'
 [-Wlinker-warnings]
1 warning generated.
fatal error: error in backend: Cannot select: t38: v2f16 = bitcast t37
  t37: f32,ch = CopyFromReg t0, Register:f32 %7
    t36: f32 = Register %7
In function: _ZZZ15loss_fwd_kernelIN2cl4sycl6detail9half_impl4halfEENS1_5eventERNS1_5queueEPT_PlSA_S9_S9_S9_S9_ENKUlRNS1_7handlerEE_clESC_ENKUlNS1_7nd_itemILi3EEEE_clESF_
llvm-foreach:
clang++: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 15.0.0 (https://github.com/intel/llvm.git 8ec97550b7a84e5d2483f4891059f80abd8db39c)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: <path to built sycl>
clang++: note: diagnostic msg: Error generating preprocessed source(s).

Environment

nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+

skosnits added the bug, cuda, and compiler labels Apr 5, 2022
skosnits added this to Needs triage in oneAPI DPC++ via automation Apr 5, 2022
skosnits changed the title from "[CUDA] Failed to build crossEntropy_for_external.cpp with error in backend: Cannot select: t38: v2f16 = bitcast t37" to "[CUDA] Failed to build crossEntropy with error in backend: Cannot select: t38: v2f16 = bitcast t37" Apr 5, 2022

Contributor

zjin-lcf commented Apr 6, 2022

I didn't observe any compile or run errors using the HIP backend.

Contributor

bader commented Apr 6, 2022

This looks like a bug in the NVPTX LLVM target back-end.

AerialMantis moved this from Needs triage to Selected in oneAPI DPC++ Apr 11, 2022
@jchlanda
Contributor

Wouldn't it be undefined behavior (UB) to type-pun between float and 2 x half, as the example here does?
That said, I think the compiler should do better than hard-crash on it. I submitted a patch against mainline here: https://reviews.llvm.org/D124171
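
For reference, the usual way to reinterpret the 32 bits of a float as two 16-bit lanes without running into the aliasing question is a byte-wise copy. The sketch below is not taken from the attached source file, which is not shown in this thread; the names `HalfBitsPair`, `float_to_half_bits`, and `half_bits_to_float` are made up for illustration, and `std::uint16_t` stands in for the raw storage of a half value so that the snippet compiles with plain C++17. In SYCL 2020 code, `sycl::bit_cast` (or `std::bit_cast` in C++20) would presumably express the same thing more directly.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the raw bits of two 16-bit half values.
using HalfBitsPair = std::array<std::uint16_t, 2>;

// A bit copy only makes sense if both types occupy the same number of bytes.
static_assert(sizeof(HalfBitsPair) == sizeof(float),
              "float and the 2 x 16-bit pair must have the same size");

// Copy the object representation of a float into two 16-bit lanes.
// std::memcpy is well defined here; casting a float* to another
// pointer type and dereferencing it is what raises the UB concern.
inline HalfBitsPair float_to_half_bits(float f) {
    HalfBitsPair out{};
    std::memcpy(out.data(), &f, sizeof f);
    return out;
}

// Inverse direction: rebuild the float from the two 16-bit lanes.
inline float half_bits_to_float(const HalfBitsPair& h) {
    float f;
    std::memcpy(&f, h.data(), sizeof f);
    return f;
}
```

Which lane receives the high 16 bits depends on the host's byte order, so code that relies on lane order should not assume one.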

@zjin-lcf
Contributor

The CUDA compiler supports the conversion.

Contributor

jchlanda commented Apr 21, 2022

I've just run it on a Titan machine with DPC++:

# ./crossEntropy.run
Matrix size(bs * W * H) = 128 * 81 * 8732
LossNLL_FWD CPU time(ms)=. 2.01997
LossNLL_FWD GPU time(ms)=. 2.00132
FWDBandWidth = 179.963433 (GB / s), 386722816.000000
LossNLL_BWD CPU time(ms)=. 1.82208
LossNLL_BWD GPU time(ms)=. 1.80568
BWDBandWidth = 214.169755 (GB / s), 386722816.000000

Artem-B pushed a commit to llvm/llvm-project that referenced this issue Apr 25, 2022
Make sure NVPTX backend can handle bitcasting between `float` and `<2 x half>` types.

This was discovered through: intel/llvm#5969
I'm not suggesting that such bitcasts make much sense, but it feels like the compiler should not hard crash on them.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D124171
@jchlanda
Contributor

This has been merged into the sycl branch in 76d1f5e

oneAPI DPC++ automation moved this from Selected to Closed May 16, 2022
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022