Skip to content

Sandia OpenSHMEM is an implementation of the OpenSHMEM specification over multiple Networking APIs, including Portals 4 and the Open Fabric Interface (OFI). Please click on the Wiki tab for help with building and using SOS.

License

Notifications You must be signed in to change notification settings

jandres742/SOS

 
 

Repository files navigation

Sandia OpenSHMEM
----------------

* About

Sandia OpenSHMEM is an implementation of the OpenSHMEM specification over
Portals 4.0, the Open Fabrics Interface (OFI), and XPMEM.

* Building

The Sandia OpenSHMEM implementation utilizes the Autoconf/Automake/Libtool
build system.  The standard GNU configure script and make system is used, as
follows:

  % ./configure <options>
  % make
  % make check
  % make install

The "make check" step is not strictly necessary, but is a good idea.  Make
check utilizes the TEST_RUNNER and NPROCS make variables, which can be used to
override defaults, e.g. "make check NPROCS=4" or "make check
TEST_RUNNER='mpiexec -n 2 -ppn 1 -hosts compute1,compute2'".

Sandia OpenSHMEM must be configured to use either the Portals 4 or OFI network
transport, but not both.  It can optionally be configured to use XPMEM or CMA
to optimize communication between PEs within the same shared memory domain.

Options to configure include:

  --prefix=<DIR>          Install implementation in <DIR>, default: /usr/local
  --with-portals4=<DIR>   Find the Portals 4 library in <DIR>
  --with-ofi=<DIR>        Find the libfabric library in <DIR>
  --with-xpmem=<DIR>      Find the XPMEM library in <DIR>
  --with-cma              Use cross-memory attach for on-node communication
  --with-pmi=DIR          Location of PMI installation.  Configure will 
                          automatically look for the PMI runtime provided by
                          the Portals 4 reference implementation
  --enable-pmi-simple     Include support for interfacing with a PMI 1.0
                          launcher.  The launcher must be provided by a
                          separate package, such as MPICH, Hydra, or SLURM.
  --enable-error-checking Enable error checking in SHMEM calls.  This will
                          increase the overhead of communication operations.
  --enable-hard-polling   When using only the network transport, the
                          implementation will use counting events to
                          block the implementation when waiting for 
                          local memory changes.  On some implementations,
                          enabling hard polling may increase target side
                          message rate.
  --enable-remote-virtual-addressing
                          Enable optimizations assuming the symmetric heap is
                          always symmetric with regards to virtual address.
                          This may cause applications to abort during
                          shmem_init() if such a symmetric heap can not be
                          created, but will reduce the instruction count for
                          some operations. This optimization also requires
                          that the Portals 4 implementation support
                          BIND_INACCESSIBLE on LEs.  This optimization will
                          reduce the overhead of communication calls.
  --disable-fortran       Disable the Fortran bindings.  This may be useful
                          if the machine has a Fortran compiler which does
                          not support ISO_C_BINDING.
  --enable-nonblocking-fence
                          By default, shmem_fence() is equivalent to
                          shmem_quiet(), which can be a lengthy
                          operation.  Enabling this feature results in
                          the ordering point being moved from the
                          shmem_fence() to the next put-like call,
                          which can help improve overlap in some
                          cases.
  --with-total-data-ordering=<yes|no|check>
                          If a network supports total data ordering
                          (that is, ordering guarantees to two
                          different addresses on the same target
                          node), this option can remove the
                          shmem_quiet() from shmem_fence() calls when
                          sending short messages.  The option does,
                          however, force ordering requirements on the
                          network, so experimentation may be necessary
                          to determine the best configuration.  Yes
                          means always assume total data ordering is
                          available and abort a job if that's not the
                          case.  No means never use total data
                          ordering optimizations.  Check will result
                          in slightly higher overhead than "yes", but
                          will provide a fallback if the network
                          doesn't provide total data ordering.


There are many other options to configure to influence performance and
behavior.  See 'configure --help' for documentation on available
options.

* SHMEM Runtime Support

  Environment variables:

    SHMEM_VERSION: if defined, print SHMEM version during start_pes().

    SHMEM_INFO: if defined, print (stdout) SHMEM environment variables.

    SHMEM_SYMMETRIC_SIZE (default: 64 MiB)
        The allocated size of the symmetric heap which shmalloc() and shfree()
        operates on. The size value can be scaled with a suffix of
            'K' for kilobytes (B * 1024),
            'M' for Megabytes (KiB * 1024)
            'G' for Gigabytes (MiB * 1024)

    SHMEM_BOUNCE_SIZE (default: 2 KiB)
        The maximum size of a bounce buffer for put messages.
        Messages greater than the immediate send value for the
        underlying network but greater than this threshold will be
        copied into a bounce buffer and then sent.

    SHMEM_MAX_BOUNCE_BUFFERS (default: 128)
        The maximum number of bounce buffers that can be created per context.

    SHMEM_COLL_CROSSOVER (default: 4)
        For num_pes < SHMEM_COLL_CROSSOVER, collective algorithms are
        serial instead of tree based.

    SHMEM_COLL_RADIX (default: 4)
        Controls the width of the n-ary tree for collectives, such that each
        node will fanout-send to a max of approximately SHMEM_COLL_RADIX

    SHMEM_SYMMETRIC_HEAP_USE_MALLOC (default: 0)
        If set to a non-zero integer, will use malloc() instead of
        mmap() to allocate the symmetric heap.  This option may result in
        incorrect behavior when remote virtual addressing is enabled.

    SHMEM_BARRIER_ALGORITHM (default: auto)
        Algorithm to use for barriers.  Default is to auto-select (which
        may result in different algorithms being used for different 
        PE sets).  Options are: auto, linear, tree, dissem.

    SHMEM_BCAST_ALGORITHM (default: auto)
        Algorithm to use for broadcasts.  Default is to auto-select (which
        may result in different algorithms being used for different 
        PE sets).  Options are: auto, linear, tree.

    SHMEM_REDUCE_ALGORITHM (default: auto)
        Algorithm to use for reductions.  Default is to auto-select (which
        may result in different algorithms being used for different 
        PE sets).  Options are: auto, linear, tree, recdbl.

    SHMEM_COLLECT_ALGORITHM (default: auto)
        Algorithm to use for allgathers.  Default is to auto-select (which
        may result in different algorithms being used for different 
        PE sets).  Options are: auto, linear.

    SHMEM_FCOLLECT_ALGORITHM (default: auto)
        Algorithm to use for allgathers with fixed contribution amounts.
        Default is to auto-select (which may result in different 
        algorithms being used for different PE sets).  
        Options are: auto, linear, ring, recdbl.  Note that recursive
        doubling (recdbl) will fall back to ring if the PE set is not a
        power of two in size.

    SHMEM_CMA_PUT_MAX (default: 8192)
        '--with-cma', shmem put lengths <= CMA_PUT_MAX use process_vm_writev();
        otherwise use Portals4 transport put.

    SHMEM_CMA_GET_MAX (default: 16384)
        '--with-cma', shmem get lengths <= CMA_GET_MAX use process_vm_readv();
        otherwise use Portals4 transport get.

    SHMEM_SYMMETRIC_HEAP_USE_HUGE_PAGES (default: off)
        If defined, large pages will be used to back the symmetric heap.  This
        feature is only available on Linux.

    SHMEM_SYMMETRIC_HEAP_PAGE_SIZE (default: 2MB)
        Used to specify a large page size when using large pages to back the
        symmetric heap.  Ignored if SHMEM_SYMMETRIC_HEAP_USE_HUGE_PAGES is not
        set.  Refer to SHMEM_SYMMETRIC_SIZE for input syntax.

    SHMEM_DISABLE_ASLR_CHECK (default: on)
        Disable runtime checks for address space layout randomization (ASLR).

  OFI Transport Environment variables:

    SHMEM_OFI_PROVIDER (default: auto)
        The name of the provider that should be used by the OFI transport.
        Shell-style wildcards, including * and ?, are allowed.  The fi_info
        utility included with libfabric can be used for assistance with
        identifying the desired provider.

    SHMEM_OFI_FABRIC (default: auto)
        The name of the fabric that should be used by the OFI transport.
        Shell-style wildcards, including * and ?, are allowed.  The fi_info
        utility included with libfabric can be used for assistance with
        identifying the desired fabric.

    SHMEM_OFI_DOMAIN (default: auto)
        The name of the fabric domain that should be used by the OFI transport.
        Shell-style wildcards, including * and ?, are allowed.  The fi_info
        utility included with libfabric can be used for assistance with
        identifying the desired fabric domain.

    SHMEM_OFI_ATOMIC_CHECKS_WARN (default: off)
        If defined, OFI will not abort if fabric provider doesn't support every
        data type x op combination, instead it will print a warning.

    SHMEM_OFI_TX_POLL_LIMIT (default: 0)
        Sets the maximum number of iterations for the transmit polling loop
        (for put/quiet operations).  Setting this to -1 enables continuous
        completion polling (i.e. there is no polling limit).  The default
        behavior is to call fi_cntr_wait without polling.

    SHMEM_OFI_RX_POLL_LIMIT (default: 0)
        Sets the maximum number of iterations for the receive polling loop (for
        get/wait operations).  Setting this to -1 enables continuous completion
        polling (i.e. there is no polling limit).  The default behavior is to
        call fi_cntr_wait without polling.

    SHMEM_OFI_STX_MAX (default: 1)
        Sets the maximum number of sharable transmit contexts (STXs) per PE.
        STXs are the underlying transmit resources that are allocated to
        OpenSHMEM contexts and they are allocated using the algorithm specified
        by the SHMEM_OFI_STX_ALLOCATOR parameter.

    SHMEM_OFI_STX_ALLOCATOR (default: round-robin)
        Algorithm for allocating STX resources to OpenSHMEM contexts.  In
        particular, the algorithm determines how resources are shared by
        contexts once all STXs have been allocated.  Options are: round-robin,
        random.

    SHMEM_OFI_STX_THRESHOLD (default: 1)
       Number of contexts that must be allocated to all shared STXs before
       another shared STX can be allocated.  This threshold can be increased to
       reduce the number of shared STXs and increase the number of STXs
       available for private use (i.e., with contexts that enable the
       SHMEM_CTX_PRIVATE option).

    SHMEM_OFI_STX_DISABLE_PRIVATE (default: off)
       Disable STX privatization. Enabling this may improve load balance across
       transmit resources, especially in scenarios where the number of contexts
       exceeds the number of STXs.

  Debugging Environment variables:

    SHMEM_DEBUG (default: off)
        If defined enables debugging messages from OpenSHMEM runtime. 

    SHMEM_TRAP_ON_ABORT (default: off)
        If defined, generate a trap when aborting an OpenSHMEM program.  This
        can be used to interface with a debugger or generate core files.

About

Sandia OpenSHMEM is an implementation of the OpenSHMEM specification over multiple Networking APIs, including Portals 4 and the Open Fabric Interface (OFI). Please click on the Wiki tab for help with building and using SOS.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C 77.6%
  • Shell 7.0%
  • M4 5.6%
  • C++ 4.5%
  • Python 2.8%
  • Makefile 1.1%
  • Other 1.4%