Suppress warning about missing libcuda.so.1 #4405

Closed
morrisonlevi opened this issue Oct 26, 2017 · 7 comments

@morrisonlevi

morrisonlevi commented Oct 26, 2017

I've built OpenMPI v3.0.0 with CUDA support. We have some hardware which has GPUs and some that does not. Fortunately, if libcuda.so.1 cannot be found OpenMPI will still work. However, it prints this message:

The library attempted to open the following supporting CUDA libraries,
but each of them failed.  CUDA-aware support is disabled.
libcuda.so.1: cannot open shared object file: No such file or directory
libcuda.dylib: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.dylib: cannot open shared object file: No such file or directory
If you are not interested in CUDA-aware support, then run with
--mca mpi_cuda_support 0 to suppress this message.  If you are interested
in CUDA-aware support, then try setting LD_LIBRARY_PATH to the location
of libcuda.so.1 to get passed this issue.

I don't want to put mpi_cuda_support 0 in my global conf file because I do want CUDA support, but only when CUDA is available. At the same time, I don't want non-CUDA users to see this awful message.

Is there an environment variable that corresponds to mpi_cuda_support? Alternatively is there an option that simply suppresses the warning but not the behavior? If not it seems like we should add a knob somewhere for this.

@morrisonlevi
Author

morrisonlevi commented Oct 26, 2017

I just remembered there is a general OMPI_MCA_* pattern for environment variables. This means I should be able to set mpi_cuda_support=0 in the conf file and set OMPI_MCA_mpi_cuda_support to 1 when GPUs are available.
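A minimal sketch of the OMPI_MCA_* convention mentioned above: Open MPI reads an MCA parameter named `<name>` from an environment variable named `OMPI_MCA_<name>`, and a value set in the environment takes precedence over the one in a parameter file. The helper names `mca_env_name` and `with_mca_overrides` are hypothetical, for illustration only.

```python
import os
from typing import Dict, Mapping, Optional

MCA_ENV_PREFIX = "OMPI_MCA_"


def mca_env_name(param: str) -> str:
    """Return the environment variable name Open MPI reads for an MCA parameter."""
    return MCA_ENV_PREFIX + param


def with_mca_overrides(
    overrides: Mapping[str, object],
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy `base` (default: the current environment) and add OMPI_MCA_* entries.

    Each key in `overrides` is an MCA parameter name; values are stringified
    because environment variables are always strings.
    """
    env = dict(os.environ if base is None else base)
    for param, value in overrides.items():
        env[mca_env_name(param)] = str(value)
    return env


if __name__ == "__main__":
    env = with_mca_overrides({"mpi_cuda_support": 1}, base={})
    print(env)  # {'OMPI_MCA_mpi_cuda_support': '1'}
```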

@morrisonlevi
Author

The non-deprecated name for this setting is opal_cuda_support. Setting opal_cuda_support=0 in my etc/openmpi-mca-params.conf file and setting the environment variable OMPI_MCA_opal_cuda_support to 1 when CUDA is available worked.
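The workaround described above (`opal_cuda_support = 0` in `etc/openmpi-mca-params.conf`, overridden to 1 through the environment on GPU nodes) could be automated with a small wrapper. This is a sketch under assumptions: it treats "CUDA is available" as "the dynamic loader can open libcuda.so.1", and the function names are hypothetical rather than part of Open MPI.

```python
import ctypes
import os
from typing import Dict, Mapping, Optional


def libcuda_available(soname: str = "libcuda.so.1") -> bool:
    """Return True if the dynamic loader can open the given CUDA driver library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def cuda_launch_env(
    cuda_present: bool,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build an environment that re-enables CUDA support only where libcuda exists.

    Assumes the site-wide parameter file sets opal_cuda_support = 0, so nodes
    without GPUs need no override and never print the missing-libcuda warning.
    """
    env = dict(os.environ if base is None else base)
    if cuda_present:
        # An OMPI_MCA_* environment variable takes precedence over the conf file.
        env["OMPI_MCA_opal_cuda_support"] = "1"
    return env


if __name__ == "__main__":
    env = cuda_launch_env(libcuda_available())
    print("opal_cuda_support override:",
          env.get("OMPI_MCA_opal_cuda_support", "none (conf file value applies)"))
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching `mpirun`, so the decision is made per node without maintaining separate configuration files.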

Sorry for the noise. We probably ought to change the message to use the non-deprecated name, though. Should I open that as a new ticket or...?

@jsquyres
Member

@morrisonlevi Sorry for missing this issue for so long -- it got lost in the runup to the Supercomputing trade show (Oct and Nov are tremendously busy for all of us for that reason).

Yes, we should definitely update the help message to use the non-deprecated name. Would you mind filing a pull request? It should be a pretty trivial help string to update. Then you get all the glory of your name in the Open MPI git commit logs! 😄

@kcgthb

kcgthb commented May 22, 2018

I have the same issue, and using different configuration files on GPU vs non-GPU nodes is not very practical in my case. We provide Open MPI installations via module files (Lmod) and would like to keep configuration as consistent as possible.

Would it be possible to get a new MCA parameter to disable the missing CUDA library warning, but still keep the CUDA functionality enabled globally? Something along the lines of mpi_warn_on_fork, but called mpi_warn_on_missing_cuda maybe?

Thanks!

@jsquyres
Copy link
Member

@Akshay-Venkatesh Is this something NVIDIA can help with? (hint hint 😄) @kcgthb's suggestion is a good one, and would be pretty easy to implement.

@morrisonlevi
Author

I know we would definitely prefer the suggested option instead of juggling the environment as well.

@jsquyres
Member

@morrisonlevi @kcgthb NVIDIA has filed a PR for this -- #5188. It should be merged to master shortly (once CI testing completes).
