--with-cuda fails to find libcuda.so #12509

@BKitor

Description

Open MPI can fail to find libcuda.so and will build without the opal accelerator cuda component even when --with-cuda/--with-cuda-libdir is specified.
This was already reported as a bug in #12264 and fixed in #12382, but the bug persists.
I noticed it with a v5.0.3 tarball, and have been able to reproduce it on master.

Details of the problem

user@bigtwin1d:~/bkitor/bk_share/ompi_builds/ompi[master]$ ./build/bin/ompi_info | grep 'Configure command'
  Configure command line: '--prefix=/home/user/bkitor/bk_share/ompi_builds/ompi/build' '--with-cuda=/usr/local/cuda' '--with-ofi=/usr/local'
user@bigtwin1d:~/bkitor/bk_share/ompi_builds/ompi[master]$ ./build/bin/ompi_info | grep 'MCA accelerator'
         MCA accelerator: null (MCA v2.1.0, API v1.0.0, Component v5.1.0)
user@bigtwin1d:~/bkitor/bk_share/ompi_builds/ompi[master]$ ./build/bin/ompi_info | grep 'Configure command'
  Configure command line: '--prefix=/home/user/bkitor/bk_share/ompi_builds/ompi/build' '--with-cuda=/usr/local/cuda' '--with-cuda-libdir=/usr/local/cuda' '--with-ofi=/usr/local'
user@bigtwin1d:~/bkitor/bk_share/ompi_builds/ompi[master]$ ./build/bin/ompi_info | grep 'MCA accelerator'
         MCA accelerator: null (MCA v2.1.0, API v1.0.0, Component v5.1.0)

The crux of the issue is that /usr/local/cuda is a symlink, and the find command in opal_check_cuda.m4 won't follow a symlink given as its starting point by default.
Adding the -H flag, which makes find follow symlinks named on the command line, should fix the issue.
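
The behaviour can be reproduced outside of configure with a minimal sketch like the one below. It is not the actual command from opal_check_cuda.m4; the scratch directory layout (`cuda-real`, `lib64/stubs`) and the variable names are assumptions made up for illustration, standing in for a /usr/local/cuda symlink that points at a versioned install.

```shell
#!/bin/sh
# Hypothetical scratch layout: "cuda" is a symlink to "cuda-real",
# mimicking /usr/local/cuda pointing at a versioned CUDA directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/cuda-real/lib64/stubs"
touch "$tmp/cuda-real/lib64/stubs/libcuda.so"
ln -s "$tmp/cuda-real" "$tmp/cuda"

# Default behaviour (-P): a symlink used as the starting point is not
# followed, so find only examines the link itself.
without_H=$(find "$tmp/cuda" -name libcuda.so | wc -l | tr -d ' ')

# With -H: symlinks named on the command line are followed, so find
# descends into the real directory.
with_H=$(find -H "$tmp/cuda" -name libcuda.so | wc -l | tr -d ' ')

echo "matches without -H: $without_H"
echo "matches with -H:    $with_H"

# Clean up the scratch directory.
rm -rf "$tmp"
```

If the diagnosis above is right, the first count should be 0 and the second 1, which matches configure silently falling back to the null accelerator component.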
