Problem: a simple MPI hello world compiled with the TOSS 4 (non-TCE) openmpi 4.1 build prints this annoying but seemingly harmless message:
$ module purge
$ module use /opt/toss/modules/modulefiles
$ module load openmpi-gnu
$ flux run ./hello
Failed to open drm root directory /sys/class/drm.: No such file or directory
fdPyPwZT2RV: completed MPI_Init in 1.112s. There are 1 tasks
fdPyPwZT2RV: completed first barrier in 0.000s
fdPyPwZT2RV: completed MPI_Finalize in 0.002s
OpenMPI calls hwloc, and hwloc loads a plugin that calls rsmi (the ROCm SMI library), which prints this message to stderr.
Solution: the environment can control which hwloc components are loaded, as described here; a component name prefixed with `-` in HWLOC_COMPONENTS is disabled:
$ flux run --env=HWLOC_COMPONENTS=-rsmi ./hello
fdPyTZiwDSf: completed MPI_Init in 1.657s. There are 1 tasks
fdPyTZiwDSf: completed first barrier in 0.000s
fdPyTZiwDSf: completed MPI_Finalize in 0.002s
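To avoid typing `--env` on every run, one option is to export HWLOC_COMPONENTS in the submitting shell, since `flux run` copies the current environment into the job by default. The sketch below is a hypothetical helper (`hwloc_exclude` is an illustrative name, not part of hwloc or Flux) that adds `-rsmi` to any existing comma-separated HWLOC_COMPONENTS list instead of overwriting it, and skips duplicates. Putting the exclusion at the front of the list is an assumption made so it precedes any entries the user already has.

```shell
# Hypothetical helper (not part of hwloc or Flux): disable an hwloc
# component by prepending "-NAME" to the comma-separated HWLOC_COMPONENTS
# list, keeping existing entries and skipping the change if already present.
hwloc_exclude() {
    entry="-$1"
    case ",${HWLOC_COMPONENTS}," in
        *",${entry},"*) ;;  # already excluded; leave the list unchanged
        *) HWLOC_COMPONENTS="${entry}${HWLOC_COMPONENTS:+,$HWLOC_COMPONENTS}" ;;
    esac
    export HWLOC_COMPONENTS
}

# Exclude the rsmi component for everything launched from this shell,
# including jobs submitted with a plain `flux run ./hello`.
hwloc_exclude rsmi
echo "HWLOC_COMPONENTS=${HWLOC_COMPONENTS}"
```

Note that exporting it this way may also hide AMD GPU devices that hwloc discovers through rsmi from every other hwloc user started from that shell, so passing `--env` per job, as above, keeps the change scoped to the MPI program.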
Other potentially useful runes for that MPI build are:
# Avoid broken openib btl (use tcp/shmem)
--env=OMPI_MCA_btl=^openib
# If UCX is used, avoid a deadlock in MPI_Init()
-o pmi=pmix
# It is compiled for Slurm, so make sure it finds Flux's `libpmi2.so` before Slurm's (not needed in flux-core 0.59.0 and later)
--env=LD_LIBRARY_PATH=$(dirname $(flux config builtin pmi_library_path)):$LD_LIBRARY_PATH
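The runes above can be combined into a small shell wrapper. This is a sketch under stated assumptions: `ompi_flux_run` and the DRY_RUN, USE_PMIX and OLD_FLUX knobs are hypothetical names chosen for illustration, `-o pmi=pmix` assumes the flux-pmix plugin is installed, and the LD_LIBRARY_PATH tweak is applied only on request since it is unnecessary from flux-core 0.59.0 on.

```shell
# Hypothetical wrapper for the TOSS 4 openmpi 4.1 build. ompi_flux_run,
# DRY_RUN, USE_PMIX and OLD_FLUX are illustrative names, not part of
# Flux or Open MPI.
#   USE_PMIX=1  add "-o pmi=pmix" (assumes the flux-pmix plugin is installed)
#   OLD_FLUX=1  put Flux's libpmi2.so directory first (flux-core < 0.59.0)
#   DRY_RUN=1   print the command, one word per line, instead of running it
ompi_flux_run() {
    # Each `set --` prepends to the caller's arguments, so the options
    # added last end up first on the command line.
    if [ "${OLD_FLUX:-0}" = 1 ]; then
        pmidir=$(dirname "$(flux config builtin pmi_library_path)")
        set -- "--env=LD_LIBRARY_PATH=${pmidir}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
    fi
    if [ "${USE_PMIX:-0}" = 1 ]; then
        set -- -o pmi=pmix "$@"
    fi
    # Always silence the rsmi message and avoid the broken openib btl.
    set -- --env=HWLOC_COMPONENTS=-rsmi '--env=OMPI_MCA_btl=^openib' "$@"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' flux run "$@"
    else
        flux run "$@"
    fi
}
```

For example, `USE_PMIX=1 ompi_flux_run -n 4 ./hello` would run four tasks with every workaround except the LD_LIBRARY_PATH one, and `DRY_RUN=1 ompi_flux_run ./hello` shows the resulting command line without submitting anything.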
Using the new -o hwloc.xmlfile shell option would resolve the rsmi errors, since Flux doesn't use hwloc_topology_load(3) for jobs; it fetches the topology XML from the enclosing instance so that the topology is not rediscovered unnecessarily. It would also likely make MPI_Init much faster. (I guess the same effect could be had with -o pmi=pmix and a recent flux-pmix as well.)