Strange link error on Ubuntu 22.04 #191
Looks like
while it is OK with nvhpc.
It is also OK when moving to CUDA toolkit >= 12.
So, just double-checking on this: it looks like you built with one CUDA 11.8 install and then tried to run with another 11.8 install, and that was causing the issue? I.e., both would have worked by themselves, and only mixing them didn't? (Agreed that having two different distributions with the same version number is "funny", though :-) Just trying to make sure that it's not related to owl.)
Just to be clear: the problem was that libcudart_static.a shipped with CUDA 11.8 doesn't provide cudaGetDeviceProperties_v2. At least, it's working now. My next step is playing with owlExaBrick.
Huh; that is "slightly" concerning. Basically, what you're saying is that CUDA 11.8 is broken :-/. Now we have three options: a) try to fix the code so it works even with CUDA 11.8; b) go into the CMake file, detect the CUDA version, and at least throw an error; or c) ignore it and hope that people will use the newer CUDA 12 anyway...
I tried another machine running RedHat 8 with CUDA 11.8 (both toolkit and nvhpc): no problem there. When I look for symbols
cuda-11.8 only contains one of them, while cuda-12.0 contains both. But on RedHat, the owl sample apps link fine with cuda-11.8, even though
I finally found the problem on my Ubuntu machine. Even though CUDA toolkits 11.8 and 12.0 were both installed in completely separate directories, installing the newer toolkit makes the Ubuntu package create, by default, a symlink /usr/local/cuda -> /etc/alternatives/cuda -> /etc/alternatives/cuda-12.0. So what happened is that while I was compiling with nvcc 11.8, the CUDA headers were actually taken from 12.0, which left a mismatch between the header version and the runtime library version. So the question is why is

Finally, I think the problem is there: the path

But this path is really not needed when using an alias library like CUDA::cudart_static. A possible fix is to replace:

by

So that the path

I think that definitely closes the issue. I can provide a small if needed.
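A quick way to spot this kind of alias chain is to fully resolve the include prefix and compare it with the toolkit that the nvcc in use belongs to. The sketch below is a hypothetical helper (the names resolveToolkitDir and sameToolkit are made up for illustration, not part of owl or CMake); it uses std::filesystem::canonical, which follows every symlink in the chain.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Follows every symlink in `path` (e.g. /usr/local/cuda -> /etc/alternatives/cuda -> ...)
// and returns the final directory, or an empty path if it cannot be resolved.
fs::path resolveToolkitDir(const fs::path& path) {
  std::error_code ec;
  fs::path resolved = fs::canonical(path, ec);
  return ec ? fs::path{} : resolved;
}

// True when `includeRoot` (the prefix the headers come from) resolves to the same
// directory as `compilerRoot` (the toolkit that the nvcc being used belongs to).
bool sameToolkit(const fs::path& includeRoot, const fs::path& compilerRoot) {
  const fs::path a = resolveToolkitDir(includeRoot);
  const fs::path b = resolveToolkitDir(compilerRoot);
  return !a.empty() && a == b;
}
```

On the Ubuntu machine described above, one would compare /usr/local/cuda against the prefix of the nvcc actually being invoked (e.g. /usr/local/cuda-11.8); a false result means the headers and the compiler come from different toolkits.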
On a system where several nvcc toolkits are installed, this path is often an alias to the latest installed toolkit; hence, when trying to build with an older version of nvcc, you end up in a situation where the old nvcc compiler is using new headers. This situation may lead to errors at link time (undefined symbols). See issue owl-project#191 for discussion.
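A compile-time guard can catch the "old nvcc, new headers" situation before it turns into an undefined symbol at link time. This is a sketch, not something owl currently does: it assumes the check is placed in a .cu file after the CUDA runtime headers are included, and relies on nvcc's predefined __CUDACC_VER_MAJOR__ / __CUDACC_VER_MINOR__ macros and the CUDART_VERSION macro from cuda_runtime_api.h. The helper names are hypothetical.

```cpp
// CUDART_VERSION (from cuda_runtime_api.h) encodes the header version as
// 1000 * major + 10 * minor, e.g. 11080 for CUDA 11.8 and 12000 for CUDA 12.0.
constexpr int cudartMajor(int version) { return version / 1000; }
constexpr int cudartMinor(int version) { return (version % 1000) / 10; }

// True when the compiler version (nvcc major.minor) matches the headers it picked up.
constexpr bool compilerMatchesHeaders(int nvccMajor, int nvccMinor, int cudartVersion) {
  return cudartMajor(cudartVersion) == nvccMajor &&
         cudartMinor(cudartVersion) == nvccMinor;
}

// Only active when compiled by nvcc with the runtime headers included; a plain host
// compiler does not define __CUDACC_VER_MAJOR__, so the check is skipped there.
#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__) && defined(CUDART_VERSION)
static_assert(compilerMatchesHeaders(__CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__, CUDART_VERSION),
              "nvcc version does not match the CUDA headers found on the include path");
#endif
```

With nvcc 11.8 picking up 12.0 headers, the arguments would be (11, 8, 12000) and the assertion fails at compile time. Matching on the minor version is deliberately strict; relaxing it to the major version only is a judgment call.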
Hello,
When trying to build owl on Ubuntu 22.04, I noticed an error message at link time that I'm not able to fix.
The link error only happens when building with nvcc (from the toolkit), but disappears when building with nvc++ (from nvhpc).
Here is the error message I get when building with cuda-11.8 (toolkit), OptiX 7.5.0 and g++-11 (the default on Ubuntu 22.04); it complains about not finding cudaGetDeviceProperties_v2.
Several remarks:
- cudaGetDeviceProperties(&prop, getCudaDeviceID()) in DeviceContext.cpp, and just return an empty string, all the samples build, link, and run OK. BTW, shouldn't we use the macro OWL_CUDA_CALL here?
- On this same machine, I can build other CUDA apps with the nvcc toolkit without any problem.
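On the OWL_CUDA_CALL question: the point of such a macro is that a failing call gets reported instead of being silently ignored (it would not help with the link error itself). The sketch below only illustrates the general pattern; CHECK_STATUS, fakeQueryDevice and deviceName are hypothetical names, not owl's actual definitions, and a plain int status (0 == success) stands in for cudaError_t so the example compiles without the CUDA toolkit.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical check macro: evaluates a call returning an int status (0 == success)
// and throws with the call text, file and line when it fails.
#define CHECK_STATUS(call)                                        \
  do {                                                            \
    const int status_ = (call);                                   \
    if (status_ != 0) {                                           \
      std::ostringstream msg_;                                    \
      msg_ << #call << " failed with status " << status_          \
           << " at " << __FILE__ << ":" << __LINE__;              \
      throw std::runtime_error(msg_.str());                       \
    }                                                             \
  } while (0)

// Hypothetical stand-in for a device-properties query; nonzero means failure.
int fakeQueryDevice(int deviceID, std::string* name) {
  if (deviceID < 0) return 1;
  *name = "device-" + std::to_string(deviceID);
  return 0;
}

// Checked wrapper: either returns a real name or throws, never a silent empty result.
std::string deviceName(int deviceID) {
  std::string name;
  CHECK_STATUS(fakeQueryDevice(deviceID, &name));
  return name;
}
```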
So in the end, I have no idea what is the root cause of this link problem.
Any ideas ?