Closed
Labels
RTE (issue likely is in RTE or PMIx areas), Target: v3.0.x, Target: v3.1.x, Target: v4.0.x, bug
Description
A customer recently encountered a problem in a Slurm/PMIx environment.
For unrelated reasons, PMIx_Fence was cancelled by the RM on timeout. However, because OMPI doesn't check the return status of the operation, all processes proceeded anyway:
- modex: https://github.com/open-mpi/ompi/blob/v3.0.x/ompi/runtime/ompi_mpi_init.c#L663
- barrier: https://github.com/open-mpi/ompi/blob/v3.0.x/ompi/runtime/ompi_mpi_init.c#L840
I think this is not correct behavior; processes should take an error code path in this case.
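To illustrate the kind of check I have in mind, here is a minimal, self-contained sketch. It does not use the real PMIx or OMPI code: `fence_fn`, `sketch_init`, `init_step_with_fence`, and the `SKETCH_*` status codes are hypothetical stand-ins for the fence call and its status values, and the actual fix in ompi_mpi_init.c may well look different.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes standing in for the real fence status values. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_TIMEOUT = -1 };

/* Hypothetical signature for a blocking fence; stands in for the real call. */
typedef int (*fence_fn)(void);

/* Two stub fences: one that completes, one that the RM cancelled on timeout. */
static int fence_ok(void)        { return SKETCH_SUCCESS; }
static int fence_timed_out(void) { return SKETCH_ERR_TIMEOUT; }

/* Run one fence-dependent init step and propagate its status to the caller
 * instead of silently ignoring it. */
static int init_step_with_fence(fence_fn fence, const char *what)
{
    assert(fence != NULL);
    int rc = fence();
    if (rc != SKETCH_SUCCESS) {
        fprintf(stderr, "%s: fence failed with status %d\n", what, rc);
    }
    return rc;
}

/* Simplified init sequence: modex fence first, then the barrier fence.
 * Any failure stops the sequence and is returned as the error code. */
static int sketch_init(fence_fn modex_fence, fence_fn barrier_fence)
{
    int rc = init_step_with_fence(modex_fence, "modex");
    if (rc != SKETCH_SUCCESS) {
        return rc; /* take the error path; do not continue to the barrier */
    }
    return init_step_with_fence(barrier_fence, "barrier");
}
```

With this shape, `sketch_init(fence_timed_out, fence_ok)` returns the timeout status right after the modex step, so no process continues as if the fence had succeeded.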