Workaround for CuDNN-8.7+ load bug #98644
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/98644
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit e75d472.
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
I've tried to verify the fix using the built pip wheels from https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/4642010532/linux-bionic-cuda11.8-py3.10-gcc7-sm86/artifacts.zip by running
python3 -c "import torch; print(torch.__config__.show());conv=torch.nn.Conv2d(3,3,3).cuda(); out=conv(torch.rand(1, 3, 24, 24, device='cuda'))"
which now fails with:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 228, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 187, in _load_global_deps
    raise err
  File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 168, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libmpi_cxx.so.20: cannot open shared object file: No such file or directory
Indeed
but given the binaries are only ~267MB and other libs are missing, I think my workflow to verify the fix might be wrong or I'm using the wrong artifact binaries. |
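For anyone hitting a similar `OSError` while checking CI artifacts, here is a minimal sketch for finding which shared libraries the dynamic loader cannot resolve before importing `torch`. The helper name `probe_shared_libs` is hypothetical and the library names passed in are only examples taken from this thread. It is not part of PyTorch.

```python
import ctypes


def probe_shared_libs(names):
    """Try to dlopen each library; map name -> None on success, or the loader's error text."""
    results = {}
    for name in names:
        try:
            # Same call the traceback shows in torch/__init__.py: ctypes.CDLL with RTLD_GLOBAL.
            ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
            results[name] = None
        except OSError as err:
            results[name] = str(err)
    return results


if __name__ == "__main__":
    # Example names only; adjust to whatever the failing import reports.
    for name, err in probe_shared_libs(["libmpi_cxx.so.20", "libnvrtc.so"]).items():
        print(f"{name}: {'ok' if err is None else err}")
```

A non-`None` entry means that library, or one of its own dependencies, is missing from the loader search path. That usually points to the environment or the choice of artifact rather than to the change under test.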
Downloaded build artifact as
And tested using the standard script:
|
Thanks for explaining the gh cli workflow.
|
@pytorchbot merge |
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Preload `cudnn_cnn_infer` and consume `dlerror` to prevent spurious call to `abort()` from `libcudnn.so.8`, if `libnvrtc.so` is missing on the system. Fixes #97041 Pull Request resolved: #98644 Approved by: https://github.com/ngimel (cherry picked from commit c00fd71)
Preload `cudnn_cnn_infer` and consume `dlerror` to prevent spurious call to `abort()` from `libcudnn.so.8`, if `libnvrtc.so` is missing on the system. Fixes #97041 Pull Request resolved: #98644 Approved by: https://github.com/ngimel
Preload `cudnn_cnn_infer` and consume `dlerror` to prevent spurious call to `abort()` from `libcudnn.so.8`, if `libnvrtc.so` is missing on the system.

Fixes #97041
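The "preload, then consume `dlerror`" idea can be illustrated from Python with `ctypes`. This is only a sketch of the general technique and not the code merged in this PR. The helper name `preload_and_clear_dlerror` is hypothetical, the file name `libcudnn_cnn_infer.so.8` is an assumption about how `cudnn_cnn_infer` is packaged, and the sketch assumes a POSIX system where `dlerror` is visible in the process's global symbol table.

```python
import ctypes


def preload_and_clear_dlerror(lib_name):
    """Best-effort preload of lib_name, then reset the loader's pending error state.

    Returns True if the library was loaded, False otherwise.
    """
    loaded = True
    try:
        ctypes.CDLL(lib_name, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        loaded = False

    # dlerror() returns the most recent loader error (or NULL) and clears it,
    # so later code that checks dlerror() does not see a stale failure.
    process = ctypes.CDLL(None)  # handle to the already-loaded process image
    dlerror = process.dlerror
    dlerror.argtypes = []
    dlerror.restype = ctypes.c_char_p
    dlerror()
    return loaded


if __name__ == "__main__":
    # Assumed file name for the cuDNN 8 sub-library mentioned above.
    print(preload_and_clear_dlerror("libcudnn_cnn_infer.so.8"))
```

The point of the final `dlerror()` call is that a failed lookup elsewhere (for example a missing `libnvrtc.so`) can leave an error message pending. Reading it once resets the state so that it is not later misread as a fatal condition.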