add support for cuda-nvcc and vexCL Library #221

Open
cppchedy opened this Issue Jan 2, 2017 · 4 comments

@cppchedy
cppchedy commented Jan 2, 2017

Add support for CUDA's nvcc and for OpenCL. Writing raw GPU code is verbose and not much fun, so I also recommend supporting the VexCL library.

@mattgodbolt
Owner

Thanks for filing this. Having no experience with either nvcc or OpenCL, how might this be integrated with Compiler Explorer? The output of both doesn't appear to be assembly instructions.

@cppchedy
cppchedy commented Jan 3, 2017

As can be seen from here, nvcc can produce an object file that is compatible with gcc (it gets linked with g++/ld, as done here). A naive idea: just grab that object file and disassemble it. I hope it works without problems (I honestly don't know whether it will).

Using OpenCL is as easy as using any other library. The only requirement is to link against libOpenCL.so on Linux or the OpenCL DLL on Windows (check Stackoverflow) using gcc. We don't need any special tool, so the same idea applies: we can either output the assembly code directly or disassemble the binary.

VexCL is a header-only library. You can check out the docs and this video.

Recap:

  • For CUDA's nvcc tools, with the proposal above, we can only output an assembly listing after getting the final binary or object file.
  • For OpenCL, I think we can handle both: outputting assembly directly from the compiler, or disassembling the binary.

I did not test any of these proposals but I don't see why they won't work.

Note: if you use Ubuntu and have an NVIDIA GPU, you can get OpenCL through sudo apt-get install nvidia-opencl-dev and play with it. To start, use this tutorial, opencl-tuto, and check its CMakeLists.txt and main.cpp to see the mechanics.

@haneefmubarak

Unfortunately, I don't think that will actually work out correctly. AFAIK, nvcc generates PTX bytecode for whichever CUDA target arch is set. At runtime, the PTX bytecode is handed to the running NVIDIA driver, which recompiles the PTX to the specific assembly language of the particular graphics card(s) that you have.

In other words, if you get this to work, you'll end up with some x86(_64) code for the host side, but the device-side code will be relatively obscure PTX. While PTX is readable, it isn't really designed to be analyzed at that stage (performance analysis given only PTX can be extremely unpredictable and counterintuitive), and it isn't designed to be written by hand either (reading the generated PTX would be unlikely to help you write high-performance PTX code).


That being said, generating assembly from OpenCL (if only targeted to run on the host) may be more viable, since [in theory] that should just generate normal assembly, albeit with a lot of SIMD instructions.
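To make the host-only case a bit more concrete, here is a minimal sketch in plain C++ rather than OpenCL (the function name saxpy and its signature are illustrative assumptions, not something taken from this thread). It has the same shape as a typical data-parallel kernel, with every iteration independent of the others, which is the kind of loop a host compiler can turn into SIMD instructions:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side analogue of a data-parallel kernel:
// computes y[i] = a * x[i] + y[i] for every element.
// Each iteration is independent of the others, which is what
// allows a compiler to process several elements per instruction.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    // Only touch the range that is valid in both vectors.
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Compiling something like this with g++ -O3 -S should show packed SIMD instructions in the loop body on x86-64, though the exact output depends on the compiler version and target flags.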

@cppchedy

Thanks for the clarification!
Can you give an example in which OpenCL will output something other than normal assembly?


I hope we at least get OpenCL support, then.
