An implementation of ARMCI using MPI one-sided communication (RMA)

                    ARMCI on MPI-RMA Implementation Notes
                       James Dinan <dinan@mcs.anl.gov>

===============================================================================
Introduction
===============================================================================

This project provides a full, high performance, portable implementation of the
ARMCI runtime system using MPI's remote memory access (RMA) functionality.

===============================================================================
Installing Only ARMCI-MPI
===============================================================================

ARMCI-MPI uses autoconf and must be configured before compiling:

 $ ./configure

Many configure options are provided; run "./configure --help" for details.
After configuring the source tree, build and install the code by running:

 $ make && make install

The quality of MPI-RMA implementations varies.  As of August 2011, the
following MPI implementations are known to work correctly with ARMCI-MPI:

 + MVAPICH2 1.6
 + MPICH2
 + Cray MPI on Cray XE6
 + IBM MPI on BG/P (set ARMCI_STRIDED_METHOD=IOV and ARMCI_IOV_METHOD=BATCHED)
 + OpenMPI 1.5.4 (set ARMCI_STRIDED_METHOD=IOV and ARMCI_IOV_METHOD=BATCHED)
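
For the two implementations above that need non-default settings, the
variables can be exported in the shell before launching.  This is only a
sketch: whether your MPI launcher forwards the environment to every process
is launcher-specific and is assumed here.

```shell
# Settings listed above for IBM MPI on BG/P and OpenMPI 1.5.4.
export ARMCI_STRIDED_METHOD=IOV
export ARMCI_IOV_METHOD=BATCHED
```

Any ARMCI-MPI program started from this shell afterwards should see both
values, provided the launcher passes the environment through.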

The following MPI implementations are known to fail with ARMCI-MPI:

 - MVAPICH2 prior to 1.6

===============================================================================
Installing Global Arrays with ARMCI-MPI
===============================================================================

ARMCI-MPI has been tested with GA 5.0.2.  To build GA with ARMCI-MPI, rename
this directory to "armci" and substitute it for the "armci" directory in the GA
distribution.  Configure and build GA as usual; no special flags are required.
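
The substitution step might be scripted roughly as follows.  The helper name
and both directory arguments are hypothetical placeholders, and keeping GA's
bundled ARMCI as "armci.orig" is a choice made for this sketch, not something
GA requires.

```shell
# Hypothetical helper: substitute_armci <armci-mpi-dir> <ga-dir>
substitute_armci() {
  armci_mpi_dir=$1   # ARMCI-MPI source tree (this directory)
  ga_dir=$2          # top level of the GA source distribution
  # Move GA's bundled ARMCI aside instead of deleting it.
  mv "$ga_dir/armci" "$ga_dir/armci.orig" || return 1
  # Copy ARMCI-MPI into place under the name GA expects.
  cp -R "$armci_mpi_dir" "$ga_dir/armci"
}
```

After calling the helper with your own paths, configure and build GA from its
top-level directory as usual.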

===============================================================================
The ARMCI-MPI Test Suite
===============================================================================

ARMCI-MPI includes a set of testing and benchmark programs located under tests/
and benchmarks/.  These programs can be compiled and run via:

 $ make check MPIEXEC="mpiexec -n 4"

The MPIEXEC variable is optional and overrides the default MPI launch command.
To build the test suite without running it, use the following target:

 $ make checkprogs

===============================================================================
ARMCI-MPI Errata
===============================================================================

Direct access to local buffers:

 * Because of MPI's semantics, you are not allowed to access shared memory
   directly; all access must go through put/get.  Alternatively, you can use
   the new ARMCI_Access_begin/end() functions.
   
Progress semantics:

 * On some MPI implementations and networks, you may need to enable implicit
   progress.  In many cases this is done through an environment variable:
   for MPICH2, set MPICH_ASYNC_PROGRESS; for MVAPICH2, recompile with
   --enable-async-progress and set MPICH_ASYNC_PROGRESS; for MPICH2-BG, set
   DCMF_INTERRUPTS=1; etc.
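
As a rough illustration, the progress variables could be set like this before
launching.  The value 1 for MPICH_ASYNC_PROGRESS is an assumption, since the
note above only says to set the variable; consult your MPI's documentation
for the values it accepts.

```shell
# MPICH2, or MVAPICH2 rebuilt with --enable-async-progress.
# The value 1 is assumed; the note above does not specify one.
export MPICH_ASYNC_PROGRESS=1

# MPICH2-BG uses a different variable; uncomment on that platform.
# export DCMF_INTERRUPTS=1
```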

===============================================================================
Environment Variables
===============================================================================

Boolean environment variables are enabled when set to a value beginning with
't', 'T', 'y', 'Y', or '1'; any other value is interpreted as false.
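
The rule can be mirrored in a small shell function, for example to check a
value before launching a job.  The function name is hypothetical and this is
not ARMCI-MPI's own parser; it only restates the rule described above.

```shell
# Hypothetical helper: succeeds (exit status 0) when the argument would be
# read as true under the rule above, and fails (status 1) otherwise.
armci_is_true() {
  case "$1" in
    [tTyY1]*) return 0 ;;   # begins with t, T, y, Y, or 1
    *)        return 1 ;;   # anything else, including the empty string
  esac
}
```

Under this rule, values such as "yes", "True", and "1" count as true, while
"on", "0", and "no" count as false.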

 -------------------
: Debugging Options :
 -------------------

ARMCI_VERBOSE (boolean)

  Enable extra status output from ARMCI-MPI.

ARMCI_DEBUG_ALLOC (boolean)

  Turn on extra shared allocation debugging.

ARMCI_FLUSH_BARRIERS (boolean)

  Enable/disable extra communication flushing in ARMCI_Barrier.  Extra flushes
  are present to help make unsafe direct local access (DLA) safer.

 ---------------------
: Performance Options :
 ---------------------

ARMCI_CACHE_RANK_TRANSLATION (boolean)

  Create a table to more quickly translate between absolute and group ranks.

 ----------------------
: Noncollective Groups :
 ----------------------

ARMCI_NONCOLLECTIVE_GROUPS (boolean)

  Enable noncollective ARMCI group formation; group creation is collective on
  the output group rather than the parent group.

 --------------------------
: Shared Buffer Protection :
 --------------------------

ARMCI_SHR_BUF_METHOD = { COPY (default), NOGUARD }

  ARMCI policy for managing shared origin buffers in communication operations:
  lock the buffer (unsafe, but fast), copy the buffer (safe), or don't guard
  the buffer, which assumes that the system is cache coherent and that MPI
  supports unlocked load/store.

 --------------------
: I/O Vector Options :
 --------------------

ARMCI_IOV_METHOD = { AUTO (default), CONSRV, BATCHED, DIRECT }

  Select the I/O vector communication strategy: AUTO chooses automatically;
  CONSRV is a "conservative" implementation that does lock/unlock around each
  operation; BATCHED issues batches of operations within a single lock/unlock
  epoch; and DIRECT generates datatypes for the origin and target and issues a
  single operation using them.

ARMCI_IOV_CHECKS (boolean)

  Enable (expensive) IOV safety/debugging checks (not recommended for
  performance runs).

ARMCI_IOV_BATCHED_LIMIT = { 0 (default), 1, ... }

  Set the maximum number of one-sided operations per epoch for the BATCHED IOV
  method.  Zero (default) is unlimited.
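
For instance, the BATCHED method might be capped as follows.  The limit of 64
is an arbitrary illustrative value, not a recommendation from these notes.

```shell
# Use the BATCHED IOV method and cap each epoch at 64 one-sided operations.
# 64 is an arbitrary example value; 0 (the default) means unlimited.
export ARMCI_IOV_METHOD=BATCHED
export ARMCI_IOV_BATCHED_LIMIT=64
```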
  
 -----------------
: Strided Options :
 -----------------

ARMCI_STRIDED_METHOD = { DIRECT (default), IOV }

  Select the method for processing strided operations.