How to use Infiniband with Singularity #876

Open
renganxu opened this issue Aug 17, 2017 · 11 comments

@renganxu commented Aug 17, 2017

I am running HPL over InfiniBand (IB verbs) in a Singularity container, but I found it is not easy to set up. I could not find any guide on how to use InfiniBand with Singularity. Could you provide any instructions or a guide? Thanks.

Version of Singularity:

2.3.1

Expected behavior

Using InfiniBand IB verbs with Singularity should be straightforward.

Actual behavior

I tried two ways to use InfiniBand:

  1. Use the existing OFED driver and IB verbs library on the host. With this approach, I have to bind mount all IB-related paths into the container. For instance, I have to use the following options when running my application with Singularity (see the command sketch below):
    -B /usr -B /lib -B /etc -B /sys
    Is this the correct way to use InfiniBand? In my tests, both the container and the host are RHEL 7.2. But if the container is Ubuntu and the host is RHEL, I don't think this solution will work.

  2. I tried to install the OFED driver inside the container. The installation was successful, but I cannot load the new driver. The installation output is as follows:
    Installation finished successfully.
    Preparing... ################################# [100%]
    Updating / installing...
    1:mlnx-fw-updater-3.3-1.0.0.0 ################################# [100%]

Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf

Attempting to perform Firmware update...
The firmware for this device is not distributed inside Mellanox driver: 06:00.0 (PSID: DEL2180110032)
To obtain firmware for this device, please contact your HW vendor.

Failed to update Firmware.
See /tmp/MLNX_OFED_LINUX-3.3-1.0.4.0.16950.logs/fw_update.log
To load the new driver, run:
/etc/init.d/openibd restart

Then I executed the last command, /etc/init.d/openibd restart, but one time it produced the following error:
Unloading HCA driver: [ OK ]
Loading HCA driver and Access Layer: [ OK ]
sed: couldn't open temporary file /etc/modprobe.d/sedFYTy2E: Read-only file system

and other times it just hung forever.
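
For reference, a minimal sketch of the approach-1 invocation described above (the image name and HPL binary here are placeholders, not the actual files used):

# bind the host's IB-related paths into the container (approach 1)
singularity exec -B /usr -B /lib -B /etc -B /sys hpl.img ./xhpl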

Steps to reproduce behavior

Both the container and the host are RHEL 7.2.

@AdamSimpson (Contributor) commented Aug 18, 2017

I have a similar issue on our Crays, which run Cray MPICH, as it can't be installed in the container. I ended up writing a small utility library, https://github.com/olcf/dl-intercept , which uses the runtime loader's LD_AUDIT feature to substitute libraries at runtime (our substitutions currently look like this: https://github.com/olcf/SingularityTools/blob/master/Titan/rtld.sub).

The general workflow is that users bootstrap using the distro-provided MPICH as they normally would, and then at runtime the container-provided MPICH libraries are switched out for the Cray library equivalents (which are bind mounted in). This works with OpenMPI as well, although the OpenMPI ABI seems less stable, so you have to be careful with version compatibility. I like this solution because it can be controlled from the environment (no destructive changes to the container are needed) and it works even if RPATHs have been set on the executables.

This is designed to be a center-wide solution covering many use cases, so it may be heavier weight than you need for a more focused purpose.
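
A rough sketch of the mechanism only, not the exact dl-intercept setup: LD_AUDIT is the glibc runtime-loader feature that lets an audit library observe and redirect every shared-object lookup. The paths, library name, and image name below are illustrative; the substitution map format is defined by dl-intercept itself (see the rtld.sub example linked above).

# load an audit library into every process so it can redirect library lookups
# (path and file name are illustrative, not the actual install location)
export LD_AUDIT=/sw/tools/dl-intercept/libdl-intercept.so
singularity exec -B /opt/cray my_container.img ./my_app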

@renganxu (Author) commented Aug 18, 2017

Hi @AdamSimpson, thanks for your comment. I think the difficult part of using InfiniBand is the OFED driver and the IB verbs library. @vsoch and @gmkurtzer, any suggestions on how to use InfiniBand? If we reuse the OFED driver and IB verbs library on the host, I doubt it will work when the container OS and host OS are different. If we install them inside the container, there is an error, at least in a RHEL 7.2 container. I didn't find any online documentation about how to use InfiniBand. Any help is appreciated.

@AdamSimpson (Contributor) commented Aug 19, 2017

Just to be clear, on our InfiniBand systems I bind mount in the appropriate libraries and use the linked dl-intercept utility to make sure the host system's libmpi.so and libibverbs.so libraries are used by the container. Any actual driver needs to stay on the host, as it does with CUDA. I don't have any issues passing the appropriate libraries from our RHEL system into Ubuntu containers; you just have to pay close attention to which libraries are used by the container.

@renganxu (Author) commented Aug 21, 2017

Hi @AdamSimpson, I just tried to use an Ubuntu container on a RHEL host and bind the IB-related paths with "-B /usr -B /etc", but I get the following errors:
nvcc: relocation error: /usr/lib64/libc.so.6: symbol _dl_starting_up, version GLIBC_PRIVATE not defined in file ld-linux-x86-64.so.2 with link time reference
mpirun: relocation error: /usr/lib64/libc.so.6: symbol _dl_starting_up, version GLIBC_PRIVATE not defined in file ld-linux-x86-64.so.2 with link time reference
ompi_info: relocation error: /usr/lib64/libc.so.6: symbol _dl_starting_up, version GLIBC_PRIVATE not defined in file ld-linux-x86-64.so.2 with link time reference

This is because libibverbs.so is in /usr/lib64, so I have to bind /usr. But the C library is also in /usr/lib64, and the toolchains don't match (GCC 5.3.1 in the Ubuntu container versus 4.8.5 on the RHEL host), so there is a conflict. It seems it is not easy to use InfiniBand with Singularity in a portable way. Have you seen these errors before?

@AdamSimpson (Contributor) commented Aug 22, 2017

You definitely don't want to just bind all of /usr and /etc into the container. In our case we use a somewhat specialized OpenMPI install, but it should be pretty similar to most installs barring a few directory-name differences. libmpi.so and the MCA components are spread across two directories:
/mpi_install/lib and /mpi_install/lib/openmpi.

I bind mount these into the container and prepend them both to the container's LD_LIBRARY_PATH to make sure they get picked up before whatever is in the base container. I use dl-intercept to make sure that, even for an executable with an RPATH set, the libmpi.so from the host is used.

On our system libibverbs.so is in /lib64 and has several dependency libraries that are needed but generally not present in the container. I bind mount /lib64 to /host_lib64 and append /host_lib64 to the container's LD_LIBRARY_PATH, since I only want those libraries used if they don't exist in the container. I do want libibverbs.so itself to come from the host, so I include it in my dl-intercept config.
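
A minimal sketch of that bind and search-path setup (the /mpi_install prefix is site-specific and the image name is a placeholder; the SINGULARITYENV_ prefix for setting variables inside the container is available in recent Singularity 2.x releases):

# expose the host MPI install and host /lib64 (as /host_lib64) inside the container,
# and set the container's search path: MPI dirs first, /host_lib64 as a last resort
# (a real setup would also splice in the container's own LD_LIBRARY_PATH)
export SINGULARITYENV_LD_LIBRARY_PATH="/mpi_install/lib:/mpi_install/lib/openmpi:/host_lib64"
singularity exec -B /mpi_install -B /lib64:/host_lib64 container.img ./a.out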

@htsst commented Aug 22, 2017

Hi. Our site uses the following technique to use InfiniBand with Singularity.

On Host

Add the following snippet to "${singularity/install/path}/etc/singularity/init" so that IB-related libraries on the host are exposed to the container.

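# Collect host IB-related libraries (libib*, libgpfs, libnuma, libmlx*, libnl*)
# and pass them to Singularity via SINGULARITY_CONTAINLIBS so they are made
# available inside containers.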
for i in `ldconfig -p | grep -E "/libib|/libgpfs|/libnuma|/libmlx|/libnl"`; do
    if [ -f "$i" ]; then
        message 2 "Found a library: $i\n"
        if [ -z "${SINGULARITY_CONTAINLIBS:-}" ]; then
            SINGULARITY_CONTAINLIBS="$i"
        else
            SINGULARITY_CONTAINLIBS="$SINGULARITY_CONTAINLIBS,$i"
        fi
    fi
done
if [ -z "${SINGULARITY_CONTAINLIBS:-}" ]; then
    message WARN  "Could not find any IB-related libraries on this host!\n";
else
    export SINGULARITY_CONTAINLIBS
fi

Inside Container

Install the IB-related libraries, such as libibverbs-dev, using apt (Ubuntu 16.04) inside the container.
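
A minimal sketch of that install step (stock Ubuntu 16.04 package names; your MPI build may need additional development packages):

apt-get update && apt-get install -y libibverbs1 libibverbs-dev ibverbs-utils
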
Then build and install MPI with IB-related options; for OpenMPI, we use the --with-verbs option:

./configure --prefix=/opt/openmpi/${OPENMPI_VERSION} \
    --enable-orterun-prefix-by-default \
    --enable-mpirun-prefix-by-default \
    --enable-static \
    --enable-shared \
    --with-verbs \
    --with-cuda && \
make install

Execution

Run the MPI program, compiled against the MPI installed in the container, through the container.
We need to bind mount /etc/libibverbs.d. We can also add the corresponding -B path in "${singularity/install/path}/etc/singularity/singularity.conf".

mpirun -np 4 singularity exec -B /etc/libibverbs.d  container.img ./a.out

It seems to work at our site.
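
For the singularity.conf route mentioned above, the entry would look roughly like this (a sketch; the exact file location depends on the install prefix):

# bind /etc/libibverbs.d into every container by default
bind path = /etc/libibverbs.d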

@mcuma commented Sep 8, 2017

Check out our approach at https://github.com/CHPC-UofU/Singularity-ubuntu-mpi

Essentially we install the Ubuntu Mellanox IB stack, which goes into /usr/lib/libibverbs in the container, and set LD_LIBRARY_PATH to point to it.

Then you can either build your own MPI distro in the container (MPICH or its derivatives, or OpenMPI) pointing at that IB stack.

Or, more simply in our case, use the existing MPI builds from our host OS, which are fairly OS-oblivious (e.g. Intel MPI), inside the container, and then use the same mpirun outside the container to run the container binary.

Things get trickier if the OS stock MPI is used, e.g. in https://github.com/CHPC-UofU/Singularity-meep-mpi, where we install the meep-mpich2 package, which depends on the OS-built mpich2. That mpich2 is only built with TCP support, so we need to adjust LD_LIBRARY_PATH to load libmpich.so from an MVAPICH2 build - which again happens to be one that we built on the host, but it could be one built in your container as well.

The nice thing about MPICH-based MPI distros is that they are ABI compatible, so you can, for example, run an MPICH-built binary with MVAPICH2 (or Intel MPI).
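
A minimal sketch of the LD_LIBRARY_PATH setup described above (the path is the one named in this comment; this could live in the image's %environment section or in a wrapper script):

# put the container's Mellanox user-space libraries first on the search path
export LD_LIBRARY_PATH=/usr/lib/libibverbs:$LD_LIBRARY_PATH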

HTH

@rgoldino commented Sep 27, 2017

Thanks for putting this together, @mcuma. I was quite disappointed when I came to understand how non-trivial it is to get Singularity containers working with IB. The documentation talks about how PMIx is used to facilitate MPI communication outside the container, but there is no mention of the fact that you can't actually use MPI over IB without jumping through a bunch of hoops.

Singularity recognized this issue with GPU support and has made that very easy now. Hopefully (hint @gmkurtzer) they will tackle IB next.

On a related note: will your solution only work with Mellanox IB? We have some systems with Intel OmniPath.

@mcuma commented Sep 27, 2017

@rgoldino, regarding OmniPath: we don't have it here, so I can't say for sure, but in principle it should be similar to the Mellanox IB stack.

Based on https://www.intel.com/content/dam/support/us/en/documents/network/omni-adptr/sb/Intel_OP_Fabric_Software_IG_H76467_v1.0.pdf, I'd install the prerequisite OS packages (this document only lists RHEL and SLES, so some googling for equivalent Ubuntu packages may be necessary), then install IntelOPA-Basic.DISTRO.VERSION.tgz. It looks like you could set this up to run unattended with the appropriate options as well.

@rdwrt commented Dec 20, 2017

Great stuff! These approaches seem viable enough to at least be referenced in the Singularity README?

@ebetica commented Jan 26, 2018

I had significant trouble getting IB and MPI working on a Mellanox software stack. Here are my steps for getting it to work on Ubuntu 16.04 (the corresponding commands are sketched after this list):

  1. Find the Mellanox OFED driver version of the host and install that same version in the container. I found it via the version number in dpkg -l | grep mlnx, which in my case reads '4.0-1.0.1.0' (mlnx-fw-updater usually carries the right version).
  2. Install the OFED distribution with the command-line arguments it was originally installed with on the host, which you can find via /etc/infiniband/info.
  3. Compile MPI with most of the command-line flags your host system compiled it with. These can be found with ompi_info --parsable --all | grep config:cli.
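
A sketch of the host-side inspection commands from the three steps above (output formats vary with the OFED and Open MPI versions installed):

# 1. the Mellanox OFED package versions installed on the host
dpkg -l | grep mlnx
# 2. the options the host OFED was originally installed with
cat /etc/infiniband/info
# 3. the configure flags the host Open MPI was built with
ompi_info --parsable --all | grep config:cli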

Unfortunately, this is not good for reproducible research, since containers built on one HPC system won't transfer easily to another. I would also suggest a --nv-like option for MPI and OFED, if possible.
