[OpenCL] "Cannot allocate memory" issues on PPC64 #2703

Open
pierrepaleo opened this issue Aug 1, 2019 · 1 comment
@pierrepaleo
Contributor

Every OpenCL test run on our Power9 machine fails with the following error (in my environment):

```
PYOPENCL_CTX="0:1" ./run_tests.py silx.opencl.test.test_addition.suite
[...]
OSError: [Errno 12] Cannot allocate memory
```

The cause is related to scikit-cuda:

  1. On one hand, scikit-cuda creates a CUBLAS context when it is imported, in order to query the CUBLAS version number.
  2. On the other hand, silx creates an OpenCL context on every available device in order to pick the best one.

For reasons that are still unclear, doing (1) then (2) succeeds, but doing (2) then (1) fails on Power9.

The following fails:

```python
from silx.opencl.convolution import Convolution
from silx.math.fft.cufft import CUFFT
```

The following succeeds:

```python
from silx.math.fft.cufft import CUFFT
from silx.opencl.convolution import Convolution
```
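Because the outcome depends on which library initializes the GPU driver first within a process, each import order has to be tested in a fresh interpreter. The following is a minimal sketch of such a check, not part of silx: the helper name `imports_succeed` is hypothetical, and the script assumes silx (with both its OpenCL and CUDA backends) is installed in the interpreter that runs it.

```python
import subprocess
import sys

# The two import orders reported in this issue.
FAILING_ORDER = [
    "from silx.opencl.convolution import Convolution",
    "from silx.math.fft.cufft import CUFFT",
]
SUCCEEDING_ORDER = list(reversed(FAILING_ORDER))


def imports_succeed(statements):
    """Run the import statements, in order, in a fresh Python process.

    Hypothetical helper. A fresh process is needed because the bug depends on
    which library initializes the GPU first; re-importing a module in the
    current interpreter would not re-run its initialization.
    Returns True if the child process exits cleanly.
    """
    code = "\n".join(statements)
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Print the last traceback line, e.g. the OSError reported above.
        lines = result.stderr.strip().splitlines()
        print("failed:", lines[-1] if lines else "(no stderr output)")
    return result.returncode == 0


if __name__ == "__main__":
    for name, order in [("OpenCL first", FAILING_ORDER),
                        ("CUFFT first", SUCCEEDING_ORDER)]:
        print(name, "->", "ok" if imports_succeed(order) else "error")
```

Run it with the same `PYOPENCL_CTX` setting as the failing test; the child processes inherit the parent's environment.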

A workaround is to change the import order so that `silx.math.fft.cufft` is imported before any `silx.opencl` module.
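One way to apply this workaround consistently is to do the CUDA-side import in a single place that runs before anything touches OpenCL, for example at the top of a test runner. This is a sketch under assumptions: the helper `import_cuda_before_opencl` is hypothetical, and it assumes that importing `silx.math.fft.cufft` is enough to trigger scikit-cuda's CUBLAS initialization, as the import-order observation above suggests. It also warns if `pyopencl` is already loaded, since by then the OpenCL contexts may already exist.

```python
import importlib
import sys
import warnings


def import_cuda_before_opencl(cuda_module="silx.math.fft.cufft"):
    """Import the CUDA-side module before any OpenCL code runs.

    Hypothetical helper implementing the import-order workaround from this
    issue. Returns True if the module was imported, False if it is not
    available (e.g. no scikit-cuda on this machine).
    """
    if "pyopencl" in sys.modules:
        # Too late: OpenCL may already have created contexts on all devices.
        warnings.warn(
            "pyopencl was imported before %s; the import-order workaround "
            "may not take effect" % cuda_module
        )
    try:
        importlib.import_module(cuda_module)
    except ImportError:
        # CUDA backend unavailable: nothing to work around.
        return False
    return True


# Usage (at the very top of a script or test runner):
#     import_cuda_before_opencl()
#     from silx.opencl.convolution import Convolution
```

Since it returns False when the CUDA module cannot be imported, calling it on machines without scikit-cuda should be harmless.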

pierrepaleo added the bug label on Aug 1, 2019
pierrepaleo added this to In plan in OpenCL via automation on Aug 1, 2019
@pierrepaleo
Contributor Author

The OpenCL contexts on all visible devices seem to be created as soon as `pyopencl.get_platforms()` is called.
This happens on both our Power9 and DGX1 servers. It might be due to the `nvidia-persistenced` daemon.

For now I see no obvious fix other than being careful about the import order.
