
Fix OpenMPI CUDA support #4323

Merged
merged 3 commits into spack:develop from fixes/openmpi-cuda on Jun 16, 2017

Conversation

adamjstewart
Member

@ax3l Can you see if this fixes #4322?

@ax3l
Member

ax3l commented May 23, 2017

That definitely looks like the place, but are you sure it doesn't affect non-external CUDA builds? Those seemed to work. Could the error instead propagate from higher up, where the external dirs are set?

@adamjstewart
Member Author

Both internal and external builds should be treated the same; the only difference is the installation directory. libs.directories returns a list of all directories containing the libraries, so if you want a single directory, you have to take the first element of the list.
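
For illustration, a minimal sketch of the distinction, assuming Spack's spec/LibraryList API (the variable names are mine, not the diff itself):

```python
# Inside the package recipe: spec['cuda'] works the same for internal and
# external CUDA installs, only the prefix differs.
cuda = self.spec['cuda']
dirs = cuda.libs.directories       # a list, e.g. ['/usr/local/cuda/lib64']
libdir = cuda.libs.directories[0]  # the single directory a configure flag expects
```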

You said it was working for internal builds? What did the configure line look like? Do they still work with this change?

Could the error instead propagate from higher up, where the external dirs are set?

I'm not sure what you mean by this, can you elaborate?

@ax3l
Member

ax3l commented May 23, 2017

You said it was working for internal builds? What did the configure line look like? Do they still work with this change?

I did not re-test internal builds when I saw the error, but I am verifying now. Please stand by :)

I'm not sure what you mean by this, can you elaborate?

I was just wondering whether this only fixes the external build issue, but you are right: it's a list, and hopefully the first entry will do.

@ax3l
Member

ax3l commented May 23, 2017

Hm, the line you propose is valid, but the flag seems to be gone in OpenMPI 2.1.1 anyway. I can't nail down the other issue I reported yet... something in hwloc.

@adamjstewart
Member Author

adamjstewart commented May 30, 2017

Looking back, I don't see a single version of OpenMPI that accepts the --with-cuda-libdir flag. I'll just remove it altogether.
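
In the recipe that would leave just the documented option, something like this sketch (assuming the usual configure_args pattern, not the literal commit):

```python
# Sketch: pass only OpenMPI's documented --with-cuda=<prefix>;
# the --with-cuda-libdir flag is dropped entirely.
if '+cuda' in spec:
    args.append('--with-cuda={0}'.format(spec['cuda'].prefix))
```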

@ax3l
Member

ax3l commented May 30, 2017 via email

@adamjstewart
Member Author

@ax3l You're absolutely right, I'll fix that.

@adamjstewart
Member Author

@ax3l I tried to include all of the knowledge from https://www.open-mpi.org/faq/?category=buildcuda in the build recipe. Can you try this out and make sure it works?
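
Roughly, the FAQ-driven handling looks like this sketch (option names from the FAQ; the structure is illustrative, not a copy of the committed recipe):

```python
def configure_args(self):
    spec = self.spec
    args = []
    if '+cuda' in spec:
        # CUDA-aware support appeared in Open MPI 1.7; the FAQ also notes that
        # libcuda.so is loaded via dlopen, so don't configure --disable-dlopen.
        args.append('--with-cuda={0}'.format(spec['cuda'].prefix))
    else:
        args.append('--without-cuda')
    return args
```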

@adamjstewart
Member Author

Ping @ax3l

@ax3l
Member

ax3l commented Jun 16, 2017

@adamjstewart thank you! Run interactively, it seems to work for me! (tested with nvidia-docker as described in #4322, with CUDA 8.0.61 and OpenMPI 2.1.1)

Unfortunately, it still crashes for me when running non-interactively during the build stage of a Docker container.

$ spack install openmpi+cuda

==> Error: ProcessError: Command exited with status 1:
    '/home/src/spack/var/spack/stage/openmpi-2.1.1-gjy6oliid7io3s3fg27hkl5wy55j6vsq/openmpi-2.1.1/configure' '--prefix=/home/src/spack/opt/spack/linux-ubuntu16-x86_64/gcc-5.4.0/openmpi-2.1.1-gjy6oliid7io3s3fg27hkl5wy55j6vsq' '--enable-shared' '--enable-static' '--enable-mpi-cxx' '--without-psm' '--without-psm2' '--without-pmi' '--without-verbs' '--without-mxm' '--without-alps' '--without-lsf' '--without-tm' '--without-slurm' '--without-sge' '--without-loadleveler' '--with-hwloc=/home/src/spack/opt/spack/linux-ubuntu16-x86_64/gcc-5.4.0/hwloc-1.11.7-z3rifyeiamchoka6jsisk5legsysovjj' '--disable-java' '--disable-mpi-java' '--disable-mpi-thread-multiple' '--enable-dlopen' '--with-cuda=/usr/local/cuda'

# no further lines

I can't really debug it, since it passes as soon as I switch to interactive mode... but that should not block your fix for now; it could be something weird on my side.

@adamjstewart adamjstewart merged commit 60db73a into spack:develop Jun 16, 2017
@adamjstewart adamjstewart deleted the fixes/openmpi-cuda branch June 16, 2017 15:00
xavierandrade pushed a commit to xavierandrade/spack that referenced this pull request Jun 16, 2017
* Fix OpenMPI CUDA support

* Remove --with-cuda-libdir flag, not a real flag

* Fix PGI and CUDA 7 support
EmreAtes pushed a commit to EmreAtes/spack that referenced this pull request Jul 10, 2017
* Fix OpenMPI CUDA support

* Remove --with-cuda-libdir flag, not a real flag

* Fix PGI and CUDA 7 support
Successfully merging this pull request may close these issues.

Error installing OpenMPI+CUDA with external CUDA