Conversation

@jsquyres (Member)

This is the v5.0.x PR corresponding to main PR #12416.

We were initializing the mutex in file_open, but that didn't handle
MPI_FILE_NULL.  So we move it to the constructor, and therefore it's
always initialized for all file handles -- even MPI_FILE_NULL.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Jeff Squyres <jeff@squyres.com>
(cherry picked from commit 2118615)
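For context, a minimal sketch of the pattern described above, using Open MPI's OBJ class system. The specific names here (file_constructor, f_lock) are illustrative assumptions rather than a quote of the patch; see main PR #12416 for the actual change.

/* Before: the mutex was set up only in file_open(), so the
 * predefined MPI_FILE_NULL handle -- which never passes through
 * file_open() -- was left with an uninitialized lock. */
static int file_open(/* ... */ ompi_file_t *file)
{
    OBJ_CONSTRUCT(&file->f_lock, opal_mutex_t);
    /* ... rest of the open path ... */
    return OMPI_SUCCESS;
}

/* After: initialization moves into the class constructor, which
 * runs for every ompi_file_t instance, including the statically
 * allocated MPI_FILE_NULL, so the lock is always valid. */
static void file_constructor(ompi_file_t *file)
{
    OBJ_CONSTRUCT(&file->f_lock, opal_mutex_t);
    /* ... other field initialization ... */
}

Teardown would presumably move symmetrically into the class destructor, so the lock's lifetime matches the handle's.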
@jsquyres added the bug label on Mar 20, 2024
@github-actions (bot) added this to the v5.0.3 milestone on Mar 20, 2024
@wenduwan (Contributor)

mpi4py failed for (apparently) the same reason as before:

[fv-az1530-442:142574] [[61101,1],1] selected pml ob1, but peer [[61101,1],0] on unknown selected pml
[fv-az1530-442:142574] OPAL ERROR: Unreachable in file communicator/comm.c at line 2385
[fv-az1530-442:142574] 0: Error in ompi_get_rprocs
[fv-az1530-442:142575] [[61101,1],2] selected pml ob1, but peer [[61101,1],0] on unknown selected pml
[fv-az1530-442:142575] OPAL ERROR: Unreachable in file communicator/comm.c at line 2385
[fv-az1530-442:142575] 1: Error in ompi_get_rprocs
setUpClass (test_ulfm.TestULFMInter) ... ERROR
setUpClass (test_ulfm.TestULFMInter) ... ERROR

======================================================================
ERROR: setUpClass (test_ulfm.TestULFMInter)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/ompi/ompi/test/test_ulfm.py", line 196, in setUpClass
    INTERCOMM = MPI.Intracomm.Create_intercomm(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/mpi4py/MPI/Comm.pyx", line 2336, in mpi4py.MPI.Intracomm.Create_intercomm
    with nogil: CHKERR( MPI_Intercomm_create(
mpi4py.MPI.Exception: MPI_ERR_INTERN: internal error

----------------------------------------------------------------------
Ran 1667 tests in 78.168s

FAILED (errors=1, skipped=78)

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
  Proc: [[61101,1],1]
  Errorcode: 1

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.

@wenduwan merged commit 75686f9 into open-mpi:v5.0.x on Mar 20, 2024
@jsquyres deleted the pr/v5.0.x/mpi-file-null-fix branch on March 20, 2024 14:10