
Building and Running MPICH (CH4)


This page describes how to build and run MPICH (CH4 device) to test the libfabric GNI provider. It is assumed that you are building MPICH on a Cray XC system such as jupiter or edison/cori, and that you have already built and installed a copy of libfabric.

MPICH/CH4 can be built to use the Cray PMI.


Building and Installing MPICH

First, clone MPICH if you don't already have a copy:

% git clone --recurse-submodules git@github.com:pmodels/mpich.git

Make sure that your clone has PR 2557.
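
One way to check (assuming your origin remote points at the GitHub repository, which exposes pull request heads as refs) is to fetch the PR and compare it against your checkout:

% git fetch origin pull/2557/head:pr-2557
% git log --oneline HEAD..pr-2557

If the second command prints nothing, the PR's commits are already in your checkout.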

Next, configure, build, and install MPICH. Note that you will need automake 1.15+, autoconf 2.67+, and libtool 2.4.4+ to keep MPICH's configury happy.
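
You can quickly verify the installed tool versions before running autogen.sh:

% automake --version | head -1
% autoconf --version | head -1
% libtool --version | head -1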

Building using Cray PMI

If you intend to use Cray PMI, you may need to apply these two patches: patch0, patch1. If you are pulling in a fresh copy of MPICH after April 14, 2017, you do not need to apply these patches.

After applying the patches, the following steps can be used to configure MPICH CH4:

% module load PrgEnv-gnu
% ./autogen.sh
% ./configure CFLAGS="-DMPIDI_CH3_HAS_NO_DYNAMIC_PROCESS" LDFLAGS="-Wl,-rpath -Wl,<path-to-ofi-libfabric-install>/lib" \
    --with-pmi=cray --with-pm=none --prefix=<path-to-mpich-install> \
    --with-libfabric=<path-to-ofi-libfabric-install> --with-device=ch4:ofi
% make -j install

Note that if you want to run multi-threaded MPI tests that use MPI_THREAD_MULTIPLE, you will need to add --enable-threads-multiple to the configure line.
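
For reference, such a test is simply one that requests MPI_THREAD_MULTIPLE at initialization. A minimal sketch (just an illustration, not part of the MPICH test suite) is:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    /* Ask for full thread support; this only succeeds if MPICH was built with it enabled. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        printf("MPI_THREAD_MULTIPLE not available, got level %d\n", provided);
    MPI_Finalize();
    return 0;
}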

Building using SLURM PMI

There does not currently appear to be a way to build MPICH with CH4 support and SLURM PMI, or at least I have not been able to figure out how to do it.

Running MPICH with libfabric

First, don't forget the KVS-related workaround needed when using Cray PMI. You need to set an environment variable to raise the limit on KVS entries in order to keep Cray PMI happy:

export PMI_MAX_KVS_ENTRIES=1000000

Second, you will need to build an MPI app using MPICH's compiler wrapper:

% export PATH=<path-to-mpich-install>/bin:${PATH}
% mpicc -o my_app my_app.c
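
If you don't already have a test program, a minimal my_app.c (just a hello-world sketch for smoke testing, not part of MPICH) could look like:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    /* Each rank reports itself so you can confirm the launch worked. */
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}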

On Tiger and NERSC edison/cori, the application can be launched using srun:

% export MPIR_CVAR_OFI_USE_PROVIDER=gni
% srun -n 2 -N 2 ./my_app

If you'd like to double-check against the sockets provider, do the following:

% export MPIR_CVAR_OFI_USE_PROVIDER=sockets
% srun -n 2 -N 2 ./my_app

This forces the CH4 OFI netmod to use the sockets provider. Note that the default behavior of the CH4/OFI device appears to be to pick up the sockets provider.
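
If you want to confirm which provider was actually selected at runtime, one option (this relies on libfabric's standard logging environment variable, not anything MPICH-specific) is to raise the libfabric log level and look for provider selection messages on stderr:

% export FI_LOG_LEVEL=info
% srun -n 2 -N 2 ./my_app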

Building and Testing OSU MPI benchmarks

OSU provides a relatively simple set of MPI benchmark tests which are useful for testing the GNI libfabric provider.

% wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.0.tar.gz
% tar -zxvf osu-micro-benchmarks-5.0.tar.gz
% cd osu-micro-benchmarks-5.0
% ./configure CC=mpicc
% make

The mpi/pt2pt and mpi/collective subdirectories contain a number of tests. For example, to test MPICH send/recv message latency, osu_latency can be used:

% cd mpi/pt2pt
% srun -n 2 -N 2 ./osu_latency

Known Issues

MPICH CH4 OFI one-sided operations do not work with the OSU one-sided tests. This is because the OSU tests use MPI_Win_allocate, which currently does not work for providers that only support FI_MR_BASIC. At this writing, none of the OSU one-sided tests pass with the GNI provider.
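
For context, the failing path boils down to a window allocation like the following sketch (illustrative only, not the actual OSU code):

#include <mpi.h>

int main(int argc, char **argv)
{
    void *base;
    MPI_Win win;
    MPI_Init(&argc, &argv);
    /* MPI_Win_allocate is the call that currently fails for providers
       that only support FI_MR_BASIC (e.g. the GNI provider). */
    MPI_Win_allocate(4096, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &base, &win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}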

The osu_ibcast test fails when run with more than 4 processes. It fails with both the GNI and sockets providers, so it is most likely a bug higher up in MPICH's non-blocking collectives implementation.