Jeff Squyres edited this page Aug 13, 2015 · 25 revisions

Notes for the v2.X series

Estimated timeline

  • June '15: master branched to 2.0.0

Must-have Features for 2.0.0

  • Thread safety (MPI_THREAD_MULTIPLE) support
    • need to verify which BTLs are thread safe (by testing, not just by assertion)
    • need more testing (non-blocking collectives, one-sided, MPI I/O, etc.)
    • need to document what is not thread safe
    • performance improvements when using MPI_THREAD_MULTIPLE (i.e., TEST/WAIT improvements) - may wait for a publication before committing
  • MPI-3.1 Compliance
    • Ticket #349 (MPI_Aint_add) - 2.0.X candidate
    • Ticket #369 (same_disp_info key for MPI_Win_create) - 2.0.X candidate, maybe
    • Ticket #273 (non-blocking collective I/O, non-trivial). This depends on moving the libnbc core out of the libnbc component.
    • Ticket #404 (MPI_Aint_diff) - 2.0.X candidate
    • Ticket #357 (MPI_Initialized, MPI_Query_thread, MPI_Thread_is_main are always thread safe). Probably only needs a test to verify that this already holds for the OMPI thread models.
  • MPI-3 Errata Items
  • Coverity cleanup (IN PROGRESS, down to ~260)
  • Scalable startup work (smarter add_proc in the OB1 PML), needs more work
    • Sparse groups
    • Additional PMIx features (issue 394)
  • ROMIO refresh - need to be using a released ROMIO package (DONE)
  • Fix Java bindings garbage collection issues
  • Hwloc 1.11 final
  • CUDA extension (to add MPIX_CUDA_IS_AWESOME to mpi.h) and MPI_T Cvar for run-time query of whether CUDA is supported in this OMPI

Must-have Features for the 2.X Series

  • PMIx integration
  • pending PRs (Nathan's free list work) (DONE)
  • Multi-rail performance in OB1? What happened? (WON'T FIX)
  • TCP latency went up. What happened? Maybe in 2.x series sometime... (IBM)
  • Support for thread-based asynchronous progress for BTLs (anyone working on this now?)
  • Improved story on out-of-the-box performance, particularly for collectives. Ideally some kind of auto-tune type of mechanism. (otopo project)
  • Consolidation of event-based progress threads to opal_progress_thread

Desirable-to-have Features for 2.X (vendor specific)

  • Rationalized configuration for Cray XE/XC (DONE)
  • switch to using libfabric MTL on Cray XC
  • usNIC stuff
    • conversion to libfabric (DONE)
  • simplified verbs BTL for iWarp? (NOT GOING TO HAPPEN)
  • Mellanox features
  • ...whatever else others add here...

Nice-to-have Features

  • OMPI commands (mpirun, orte_info, etc.): deprecate all single-dash options except for the sacrosanct ones (-np, etc.). Print a stderr warning for all the deprecated options.
    • Note that the mpiexec text in MPI-3.1 section 8.8 mentions: -soft, -host, -arch, -wdir, -path, -file
  • Score-P integration (won't hit 2.0.0, but will get in 2.x)
  • libfabric support (Intel MTL, Cisco BTL, others) (DONE in 1.10)
  • Memkind support for both MPI_Alloc_mem and Open MPI internal use
    • No current owner at Intel for Memkind
  • Nathan Hjelm's BTL 3.0 changes (DONE)
  • MPI-4 features (maybe as extensions?)
    • endpoints proposal
    • ULFM (as of June 2015, Ralph/George are coordinating so that ORTE can give ULFM what it needs)
    • MPI_T extensions
  • Add MPI-3 features to Java bindings
    • Some have been done; Howard/LANL is adding the rest
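
The single-dash deprecation item in the list above could be sketched roughly as follows. This is only an illustration, not code from the OMPI tree: the function names, the keep-list contents (only -np is named in these notes), the choice to leave single-letter options alone, and the suggested double-dash replacement are all assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical keep-list: these notes name only -np; the "etc." is left open. */
static const char *const kept_options[] = { "-np" };

/* Returns 1 if arg is a single-dash, multi-letter option that is not on the
 * keep-list. Assumption: single-letter options such as "-n" are not deprecated. */
static int is_deprecated_single_dash(const char *arg)
{
    size_t i;

    if (arg == NULL || arg[0] != '-') {
        return 0;                       /* not an option at all */
    }
    if (arg[1] == '-') {
        return 0;                       /* already in double-dash form */
    }
    if (strlen(arg) <= 2) {
        return 0;                       /* "-" alone or a single-letter option */
    }
    for (i = 0; i < sizeof(kept_options) / sizeof(kept_options[0]); i++) {
        if (strcmp(arg, kept_options[i]) == 0) {
            return 0;                   /* explicitly kept */
        }
    }
    return 1;
}

/* Scans argv[1..argc-1], prints one stderr warning per deprecated option,
 * and returns how many warnings were printed. */
static int warn_deprecated_options(int argc, char **argv)
{
    int i, count = 0;

    for (i = 1; i < argc; i++) {
        if (is_deprecated_single_dash(argv[i])) {
            /* Assumption: the replacement is the same name with two dashes. */
            fprintf(stderr,
                    "warning: option \"%s\" is deprecated; use \"-%s\" instead\n",
                    argv[i], argv[i]);
            count++;
        }
    }
    return count;
}
```

A real implementation would also have to stop scanning at the application name, so that the application's own arguments are not flagged, and would need to skip option values such as negative numbers; this sketch does neither.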

Features that are already in master

  • Switch to using OMPI I/O as default
  • Switch to vader as default for shared memory BTL
  • PSM2 MTL

Terminating support

  • Cray XT legacy items (ESS alps component, etc.) (DONE - although new ess/alps for Cray XE/XC)
  • MX BTL
  • What other BTLs to delete? SCIF?
  • Clean up README (DONE)
  • Delete coll hierarch component
  • coll ML disabled
  • Delete VampirTrace interface
  • Deprecate mpif77/mpif90: print a stderr warning

Testing

  • What do we want to test?
    • More thread safety tests - non-blocking collectives, etc.
    • OMPI I/O tests, refresh from HDF group? (DONE)

Stale code check - Opal

Framework Component Owner Status
shmem sysv LANL maintenance
shmem mmap LANL maintenance
shmem posix LANL maintenance
shmem base LANL maintenance
backtrace none SNL maintenance
backtrace execinfo SNL maintenance
backtrace printstack SNL maintenance
backtrace base project maintenance
crs none UTK maintenance
crs criu CISCO maintenance
crs self UTK maintenance
crs dmtcp UBrit.Columbia unmaintained
crs base project maintenance
pstat linux INTEL maintenance
pstat test INTEL maintenance
if bsdx_ipv4 INTEL maintenance
if bsdx_ipv6 INTEL maintenance
if linux_ipv6 INTEL maintenance
if solaris_ipv6 nobody maintenance
if posix_ipv4 INTEL maintenance
if base project active
pmix s1 INTEL active
pmix s2 INTEL active
pmix cray LANL active
installdirs env SNL maintenance
installdirs config SNL maintenance
installdirs base project active
hwloc external CISCO maintenance
hwloc hwloc1110 INTEL maintenance
hwloc base project maintenance
reachable netlink INTEL unmaintained
reachable weighted INTEL unmaintained
reachable base INTEL unmaintained
event external CISCO maintenance
event libevent2022 INTEL active
event base project maintenance
allocator basic NVIDIA maintenance
allocator bucket NVIDIA maintenance
allocator base project maintenance
timer solaris nobody unmaintained
timer linux SNL maintenance
timer darwin SNL unmaintained
timer aix IBM? unmaintained
timer altix SNL? unmaintained
timer base SNL maintenance
compress gzip project maintenance
compress bzip project maintenance
compress base project maintenance
rcache vma LANL maintenance
memcpy base project maintenance
common sm UTK maintenance
common verbs MELLANOX maintenance
common ugni LANL active
common cuda NVIDIA active
common libfabric INTEL active
memchecker valgrind HLRS? unmaintained
memchecker base project unmaintained
dstore hash project active
dstore base project active
sec basic INTEL maintenance
sec munge INTEL active
sec keystone INTEL maintenance
sec base INTEL active
btl tcp UTK active
btl sm UTK active
btl usnic CISCO active
btl template project active
btl portals4 SNL active?
btl scif LANL maintenance
btl self UTK active
btl ugni LANL active
btl smcuda NVIDIA active
btl vader LANL active
btl openib Chelsio maintenance
btl base btlowners active
mpool sm LANL maintenance
mpool gpusm NVIDIA maintenance
mpool rgpusm NVIDIA maintenance
mpool grdma LANL maintenance
mpool udreg LANL maintenance
mpool base project maintenance
memory linux MELLANOX,CISCO maintenance
memory malloc_solaris nobody unmaintained
memory base project maintenance

Stale code check - ORTE

Framework Component Owner Status
oob tcp INTEL maintenance
oob alps LANL active
oob usock INTEL maintenance
oob ud MELLANOX maintenance
oob base project maintenance
rtc hwloc INTEL maintenance
rtc freq INTEL maintenance
rtc omp INTEL active
rtc base INTEL maintenance
schizo ompi INTEL active
schizo base INTEL active
filem raw INTEL maintenance
filem base INTEL maintenance
dfs app INTEL maintenance
dfs orted INTEL maintenance
dfs base INTEL maintenance
rmaps rank_file INTEL maintenance
rmaps lama CISCO maintenance
rmaps round_robin INTEL maintenance
rmaps seq INTEL maintenance
rmaps staged INTEL maintenance
rmaps resilient INTEL maintenance
rmaps mindist MELLANOX maintenance
rmaps ppr INTEL maintenance
rmaps base INTEL maintenance
routed binomial INTEL maintenance
routed debruijn LANL? unmaintained
routed direct INTEL active
routed base INTEL maintenance
errmgr default_app INTEL maintenance
errmgr default_orted INTEL maintenance
errmgr default_tool INTEL maintenance
errmgr default_hnp INTEL maintenance
errmgr base INTEL maintenance
plm isolated INTEL maintenance
plm tm INTEL maintenance
plm rsh INTEL maintenance
plm alps LANL maintenance
plm slurm INTEL maintenance
plm lsf INTEL maintenance
plm base project maintenance
ess tm INTEL maintenance
ess tool INTEL maintenance
ess pmi INTEL maintenance
ess hnp INTEL maintenance
ess singleton INTEL maintenance
ess alps LANL maintenance
ess env INTEL maintenance
ess slurm INTEL maintenance
ess lsf INTEL maintenance
ess base project maintenance
rml oob INTEL maintenance
rml ftrm ? unmaintained
rml base INTEL maintenance
snapc full nobody unmaintained
snapc base nobody unmaintained
sstore central nobody unmaintained
sstore stage nobody unmaintained
sstore base nobody unmaintained
odls alps LANL active
odls default INTEL maintenance
odls base project maintenance
state app INTEL active
state novm INTEL active
state staged_orted INTEL active
state tool INTEL active
state dvm INTEL active
state hnp INTEL active
state orted INTEL active
state staged_hnp INTEL active
state base INTEL active
grpcomm rcd INTEL maintenance
grpcomm direct INTEL maintenance
grpcomm brks INTEL maintenance
grpcomm base INTEL maintenance
ras tm INTEL maintenance
ras loadleveler IBM maintenance
ras alps LANL active
ras simulator INTEL maintenance
ras slurm INTEL maintenance
ras lsf INTEL maintenance
ras gridengine INTEL unmaintained
ras base INTEL maintenance
common alps LANL active
iof mr_hnp INTEL maintenance
iof tool INTEL maintenance
iof hnp INTEL maintenance
iof mr_orted INTEL maintenance
iof orted INTEL maintenance
iof base INTEL maintenance

Stale code check - OMPI

Framework Component Owner Status
fbtl pvfs2 UH active
fbtl posix UH active
fbtl plfs UH active
fbtl base UH active
pubsub orte INTEL maintenance
pubsub pmi INTEL maintenance
pubsub base INTEL maintenance
osc sm LANL maintenance
osc rdma LANL active?
osc portals4 SNL active
osc pt2pt LANL active
osc base LANL active
fs pvfs2 UH active
fs lustre UH active
fs ufs UH active
fs plfs UH active
fs base UH active
pml ob1 LANL active
pml yalla MELLANOX active
pml v UTK maintenance
pml bfo NVIDIA unmaintained
pml cm SNL maintenance
pml crcpw nobody unmaintained
pml base project active
dpm orte INTEL maintenance
dpm base INTEL maintenance
topo basic UTK maintenance
topo example UTK maintenance
topo base UTK maintenance
vprotocol pessimist UTK maintenance
vprotocol example UTK maintenance
vprotocol base UTK maintenance
coll libnbc project active
coll inter UH maintenance
coll sm nobody unmaintained
coll fca MELLANOX active
coll hcoll MELLANOX active
coll portals4 SNL active
coll self CISCO maintenance
coll cuda NVIDIA maintenance
coll basic UH maintenance
coll ml ORNL? unmaintained
coll demo project maintenance
coll tuned UTK maintenance
coll base project maintenance
bml r2 SNL maintenance
bml base project maintenance
io romio314 LANL/RIST active
io ompio UH active
io base project maintenance
bcol iboffload ORNL unmaintained
bcol basesmuma ORNL unmaintained
bcol ptpcoll ORNL unmaintained
bcol base ORNL unmaintained
sharedfp sm UH maintenance
sharedfp lockedfile UH maintenance
sharedfp individual UH maintenance
sharedfp addproc UH maintenance
sharedfp base UH maintenance
sbgp p2p ORNL unmaintained
sbgp basesmuma ORNL unmaintained
sbgp basesmsocket ORNL unmaintained
sbgp ibnet ORNL unmaintained
sbgp base ORNL unmaintained
op x86 INTEL maintenance
op example project maintenance
op base project unmaintained
mtl mxm MELLANOX active
mtl psm INTEL active
mtl portals4 SNL active
mtl ofi INTEL active
mtl base project active
fcoll dynamic UH active
fcoll static UH active
fcoll individual UH active
fcoll two_phase UH active
fcoll base UH active
crcp bkmrk nobody unmaintained
crcp base nobody unmaintained