Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for curand #43

Closed
mondus opened this issue Feb 25, 2020 · 12 comments
Closed

Support for curand #43

mondus opened this issue Feb 25, 2020 · 12 comments

Comments

@mondus
Copy link
Contributor

mondus commented Feb 25, 2020

I have been making lots of good progress with jitify. Thanks for the excellent tool. One issue which I am currently unable to resolve however is the use of curand.

If I include curand in my jitify kernel (e.g. #include <curand_kernel.h>) and correctly set the compiler to add the cuda include directory (from CUDA_PATH) then there are a whole bunch of errors from cuda.h relating to ambiguous definitions of size_t

e.g.

cuda.h(1773): error: "size_t" is ambiguous
@benbarsdell
Copy link
Member

benbarsdell commented Feb 26, 2020

Thanks for the feedback!

I tried to reproduce the problem but ran into a different issue related to missing headers (which I'll submit a patch for).

Could you tell me what CUDA version you're using and provide a minimal reproducer?
*EDIT: Could you also provide what platform you're running on.

@mondus
Copy link
Contributor Author

mondus commented Feb 26, 2020

Hi @benbarsdell. Thanks for getting back to me. I saw your talk with Kate at GTC when you first introduced jitify and knew I would need it at some point!

I am using CUDA 10.1 on windows using dlfcn-win32 (see #42). A minimal example is to add the line

"#include <curand_kernel.h>\n"

at line 159

Note: This actually works on linux no problem. To reproduce on windows requires building with dlfcn-win32, setting a pre-porcessor macro of NOMINMAX , updating the hard coded include directory and pre-processing (with stringify) the my_header3.cuh file to generate to stringified version. I also had to create a disk version of my_header4.cuh as the callback function didn't seem to work.

I have noticed that one of the later tests also fail on windows. E.g. line 266 fails. Looking at the get_constant_ptr and demangle functions this is because it is expecting the name mangling to be different in visual studio. My windows ptx has the same mangling as in linux however. E.g.

.const .align 4 .u32 a;
.const .align 4 .u32 _ZN1b1aE;
.const .align 4 .u32 _ZN1c1b1aE;

Hence the lookup fails.

@maddyscientist
Copy link
Collaborator

That's interesting that you are not seeing the expected name mangling on Windows. Which compiler are you using? I guess we need to fix up the demangle detection for this.

@mondus
Copy link
Contributor Author

mondus commented Feb 26, 2020

Hi @maddyscientist it is Visual Studio 2019 with CUDA 10.1. I noticed that the demangle detection is easy to fix by changing a macro guard but the actual demangling would require cxxabi which is not so easy to come by on windows...

@benbarsdell
Copy link
Member

I was able to reproduce your issues in Visual Studio 2019 with CUDA 10.2, and I have fixes for them. I'll put them into a PR probably tomorrow. I'll also add workarounds for the NOMINMAX and dlfcn problems to avoid those annoyances.

The size_t issue is because jitify provides a definition of it in a built-in header, but NVRTC already provides its own built-in definition, and on Windows these definitions conflict. The fix is to remove these lines:

jitify/jitify.hpp

Lines 1578 to 1579 in 8af928e

"typedef unsigned long size_t;\n"
"typedef signed long ptrdiff_t;\n"

The name mangling issue is because CUDA (PTX) always uses the Itanium mangling scheme even when compiling with Visual Studio. I was able to implement a simple demangler for variable names that avoids needing cxxabi.

With these fixes, all the tests in jitify_example.cpp pass. I didn't see anything related to the callbacks; maybe it was some kind of current-working-directory problem?

@mondus
Copy link
Contributor Author

mondus commented Feb 27, 2020

@benbarsdell Fantastic. Looking forward to looking at your Itanium demangler code! I will take another look at the callbacks issue. Probably user error.

@benbarsdell
Copy link
Member

The fixes are available in #45. Let me know if anything doesn't work for you.

@benbarsdell
Copy link
Member

@mondus @Robadob
I added the #include <cctype> fix to #45. Are you OK with it being merged?

@Robadob
Copy link
Contributor

Robadob commented Mar 4, 2020

Building with the current status of #45 and VS2015, it works out the box for me. These are the remaining compile warnings I get (I think they're significantly reduce from what Mondus showed me the other week, though I haven't tried it in VS2020).

I haven't tested with Curand, I don't think Mondus shared that code with me. I've sent him an email and will try and chase up if he's in the office today.

Might be worth addressing them. Though, if it blocks our CI I can probably just wrap the include to reduce warning level.

C:\Users\Robadob\Documents\Visual Studio 2015\Projects\Jitify>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\nvcc.exe" -gencode=arch=compute_35,code=\"sm_35,compute_35\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64" -x cu -rdc=true -I"C:\Program Files (x86)\dlfcn-win32\include" -I"C:\Program Files (x86)\Visual Leak Detector\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include"  -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile -cudart static  -g   -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Fdx64\Debug\vc140.pdb /FS /Zi /RTC1 /MDd " -o x64\Debug\kernel.cu.obj "C:\Users\Robadob\Documents\Visual Studio 2015\Projects\Jitify\kernel.cu"
1>c:\program files\nvidia gpu computing toolkit\cuda\v10.1\include\vector_types.h(423): warning : calling a __host__ function("std::_Iterator_base12::_Iterator_base12") from a __host__ __device__ function("std::_Iterator_base12::_Iterator_base12 [subobject]") is not allowed
1>c:\program files\nvidia gpu computing toolkit\cuda\v10.1\include\vector_types.h(423): warning : calling a __host__ function("std::_Iterator_base12::~_Iterator_base12") from a __host__ __device__ function("std::_Iterator_base12::~_Iterator_base12 [subobject]") is not allowed
1>  kernel.cu
1>c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(2259): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
1>c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(2087): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data
1>c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(2793): warning C4267: 'argument': conversion from 'size_t' to 'unsigned int', possible loss of data
1>c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(3564): warning C4267: 'argument': conversion from 'size_t' to 'unsigned int', possible loss of data
1>c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(217): warning C4800: 'unsigned __int64': forcing value to bool 'true' or 'false' (performance warning)
1>  c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(217): note: while compiling class template member function 'bool jitify::ObjectCache<unsigned __int64,jitify::ProgramConfig>::contains(const unsigned __int64 &) const'
1>  c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(2894): note: see reference to function template instantiation 'bool jitify::ObjectCache<unsigned __int64,jitify::ProgramConfig>::contains(const unsigned __int64 &) const' being compiled
1>  c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(2357): note: see reference to class template instantiation 'jitify::ObjectCache<unsigned __int64,jitify::ProgramConfig>' being compiled

@mondus
Copy link
Contributor Author

mondus commented Mar 4, 2020

@benbarsdell Sorry for the delay. Other academic duties have moved me away form coding this week. I am sure your fix is fine but I will test this early next week and confirm.

@benbarsdell
Copy link
Member

No worries, thanks for the feedback so far. I pushed some more fixes for conversion warnings. I'll merge #45 now (I have a follow-up PR to submit) and we can address any other issues that come up next week.

@mondus
Copy link
Contributor Author

mondus commented Apr 2, 2020

This works for me. Cheers.

@mondus mondus closed this as completed Apr 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants