-
Notifications
You must be signed in to change notification settings - Fork 1.2k
CUDA: Lazily add libdevice to compilation units #6795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The nvvmLazyAddModuleToProgram function, which will be used in the following commit, was added in CUDA 9.2. This commit drops support for CUDA 9.0 and 9.1 so that the use of this function can be supported without maintaining two different paths.
Modules should be lazily loaded where possible, in order to permit certain optimizations that make compilation more efficient and improve runtime performance. This commit adds support for lazy loading and uses it for libdevice, since it is never the entry point to a compilation unit. This seems to have a noticeable effect on compilation speed. Running: ``` python -m numba.runtests numba.cuda.tests -m ``` prior to this commit gives: ``` Ran 1132 tests in 136.102s ``` on my system. With this commit, it gives: ``` Ran 1132 tests in 86.255s ```
stuartarchibald
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch, couple of minor things to resolve else looks good.
|
Oddly, my OSX CUDA9.2 doesn't have In [1]: p = "/Developer/NVIDIA/CUDA-9.2/nvvm/lib/libnvvm.3.2.0.dylib"
In [2]: import ctypes
In [3]: lib = ctypes.CDLL(p)
In [4]: lib.nvvmLazyAddModuleToProgram
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-4-7fa6d4c0d54f> in <module>
----> 1 lib.nvvmLazyAddModuleToProgram
~/dev/numba/.conda_env_py38/lib/python3.8/ctypes/__init__.py in __getattr__(self, name)
384 if name.startswith('__') and name.endswith('__'):
385 raise AttributeError(name)
--> 386 func = self.__getitem__(name)
387 setattr(self, name, func)
388 return func
~/dev/numba/.conda_env_py38/lib/python3.8/ctypes/__init__.py in __getitem__(self, name_or_ordinal)
389
390 def __getitem__(self, name_or_ordinal):
--> 391 func = self._FuncPtr((name_or_ordinal, self))
392 if not isinstance(name_or_ordinal, int):
393 func.__name__ = name_or_ordinal
AttributeError: dlsym(0x7ff34acdbb90, nvvmLazyAddModuleToProgram): symbol not found
In [5]: lib.nvvmAddModuleToProgram
Out[5]: <_FuncPtr object at 0x1109cc580> |
|
I have just reinstalled CUDA 9.2.148. here's running $ nm libnvvm.3.2.0.dylib | grep nvvm
0000000000012700 T ___nvvmHandle
000000000000f080 T _nvvmAddModuleToProgram
000000000000f290 T _nvvmCompileProgram
000000000000ed30 T _nvvmCreateProgram
000000000000ee90 T _nvvmDestroyProgram
0000000000012380 T _nvvmGetCompiledResult
00000000000122f0 T _nvvmGetCompiledResultSize
00000000000125b0 T _nvvmGetErrorString
00000000000124e0 T _nvvmGetProgramLog
0000000000012440 T _nvvmGetProgramLogSize
000000000000ec80 T _nvvmIRVersion
0000000000011f50 T _nvvmVerifyProgram
000000000000ebf0 T _nvvmVersionThe symbol is not defined. |
From OOB discussion:
So as I understand it, there's no resolution needed here. |
Assuming that this is resolved, this PR looks good. Please could you resolve the conflicts and then it can be tested on the build farm? Many Thanks. |
|
@stuartarchibald Thanks for the review - conflicts now resolved, and I understood from @sklam that he was happy with the MacOS situation - this should be good for a BuildFarm run now. |
|
Buildfarm ID: |
|
The CUDA 9.0 tests fail on this PR because CUDA 9.0 can not be installed: |
|
Following the OOB conversation - @esc Can we test this with 9.2 as it drops support for 9.0 and 9.1 please? |
|
Will upgrade the build farm to test against 9.2 instead of 9.0 and see if this helps. |
|
Using 9.2 instead yields the following: |
|
Using 9.2 on Windows I get: |
|
@gmarkall thank you for the fixes, |
stuartarchibald
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix looks good, thanks.
|
Thanks @stuartarchibald ... just for info, I'm thinking we could remove that fix for 0.55, as by the time it's released CUDA toolkit 10.0 will be over three years old, so it ought to be OK drop support for 9.2 at that point (on the basis of supporting toolkits at least three years back at the time of the release). |
No problem. RE 0.55 and dropping 9.2, perhaps add as an issue for https://github.com/numba/numba/milestone/56 ? Seems reasonable to drop it and also review any other existing CUDA <10.0 workarounds. |
Modules should be lazily loaded where possible, in order to permit certain optimizations that make compilation more efficient and improve runtime performance.
This commit adds support for lazy loading and uses it for libdevice, since it is never the entry point to a compilation unit. This seems to have a noticeable effect on compilation speed. Running:
prior to this PR gives:
on my system. With this PR, it gives:
This PR also drops support for CUDA 9.0 and 9.1 on the basis that they will be greater than three years old by the time 0.54 is released (9.1 released in December 2017), and
nvvmLazyAddModuleToProgramwas added in CUDA 9.2 - this avoids supporting two different paths through NVVM. Note that although examination of this PR suggests it would be straightforward to maintain both paths, I expect it to be problematic for future work based on #6769, in particular related to improving support for debug info.