Description
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
main
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
git clone
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
08e41ed5629b51832f5708181af6d89218c7a74e 3rd-party/openpmix (v1.1.3-4067-g08e41ed5)
30cadc6746ebddd69ea42ca78b964398f782e4e3 3rd-party/prrte (psrvr-v2.0.0rc1-4839-g30cadc6746)
6032f68dd9636b48977f59e986acc01a746593a6 3rd-party/pympistandard (remotes/origin/main-23-g6032f68)
dfff67569fb72dbf8d73a1dcf74d091dad93f71b config/oac (dfff675)
Please describe the system on which you are running
- Operating system/version: rockylinux:10.0
- Computer hardware: Ampere Altra MAX or NVIDIA Grace
- Network type: intra-node
Details of the problem
Problem 1
MPI_Waitall() and MPI_Testall() do not return MPI_ERR_IN_STATUS when all of the following conditions hold:
(1) either MPI_Testall() or MPI_Waitall() is called as the completing procedure,
(2) a communicator whose error handler is MPI_ERRORS_RETURN is specified as the comm argument to the completing procedure,
(3) the request handle associated with a persistent communication request is specified in the array_of_requests argument to the completing procedure, and
(4) the persistent request completes with an error (.req_status.MPI_ERROR != MPI_SUCCESS).
Table 1: Multiple completion functions that return multiple statuses
| | array_of_statuses == MPI_STATUSES_IGNORE | array_of_statuses != MPI_STATUSES_IGNORE |
|---|---|---|
| MPI_Waitall | valid | invalid |
| MPI_Waitsome | valid | valid |
| MPI_Testall | invalid | invalid |
| MPI_Testsome | valid | valid |
"invalid" means that under this condition the completing procedure does not return MPI_ERR_IN_STATUS.
Reason
For example, ompi_request_default_wait_all() in ompi/request/req_wait.c appears to execute the continue statement before assigning MPI_ERR_IN_STATUS to the mpi_error return-code variable when the request is a persistent communication request.
if( request->req_persistent ) {
    request->req_state = OMPI_REQUEST_INACTIVE;
    continue;
}
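The ordering issue described above can be illustrated with a small stand-alone model. This is only a sketch: `model_request_t`, the `MODEL_*` constants, and both functions are hypothetical names invented for illustration, not Open MPI types or code, and the real loop in req_wait.c does considerably more. The first function mirrors the ordering described in this report; the second shows one possible reordering that records the error before deactivating the persistent request.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not Open MPI's real types. */
enum { MODEL_SUCCESS = 0, MODEL_ERR_OTHER = 1, MODEL_ERR_IN_STATUS = 2 };
enum model_state { MODEL_INACTIVE, MODEL_ACTIVE };

typedef struct {
    int persistent;          /* non-zero for a persistent request */
    int error;               /* completion error code of the request */
    enum model_state state;  /* active until a completing call handles it */
} model_request_t;

/* Ordering described in the report: the persistent branch reaches
 * `continue` before the error code is inspected, so a failed persistent
 * request never changes the return code. */
int model_wait_all_reported(model_request_t *reqs, size_t n)
{
    int rc = MODEL_SUCCESS;
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].persistent) {
            reqs[i].state = MODEL_INACTIVE;
            continue;                      /* skips the error check below */
        }
        if (reqs[i].error != MODEL_SUCCESS) {
            rc = MODEL_ERR_IN_STATUS;
        }
    }
    return rc;
}

/* One possible reordering (an assumption, not a proposed patch):
 * record the error first, then deactivate the persistent request. */
int model_wait_all_reordered(model_request_t *reqs, size_t n)
{
    int rc = MODEL_SUCCESS;
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].error != MODEL_SUCCESS) {
            rc = MODEL_ERR_IN_STATUS;
        }
        if (reqs[i].persistent) {
            reqs[i].state = MODEL_INACTIVE;
        }
    }
    return rc;
}
```

With a single failed persistent request, the first function returns the success code while the second returns the in-status error code; both leave the request inactive.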
Problem 2
MPI_Testany() does not return the communication error when all of the following conditions hold:
(1) MPI_Testany() is called as the completing procedure,
(2) a communicator whose error handler is MPI_ERRORS_RETURN is specified as the comm argument to the completing procedure,
(3) the request handle associated with a persistent communication request is specified in the array_of_requests argument to the completing procedure, and
(4) the persistent request completes with an error (.req_status.MPI_ERROR != MPI_SUCCESS).
Reason
It seems that ompi_request_default_test_any() in ompi/request/req_test.c returns MPI_SUCCESS unconditionally if the request is a persistent communication request.
if( request->req_persistent ) {
    request->req_state = OMPI_REQUEST_INACTIVE;
    return OMPI_SUCCESS;
}
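The effect of the early return can be sketched with a similar stand-alone model. As before, `model_request_t`, the `MODEL_*` constants, and both functions are hypothetical names for illustration only. The model assumes that the non-persistent path returns the request's own error code, which is an assumption about the surrounding code rather than something shown in the excerpt above.

```c
/* Hypothetical stand-ins for illustration; not Open MPI's real types. */
enum { MODEL_SUCCESS = 0, MODEL_ERR_OTHER = 1 };
enum model_state { MODEL_INACTIVE, MODEL_ACTIVE };

typedef struct {
    int persistent;          /* non-zero for a persistent request */
    int error;               /* completion error code of the request */
    enum model_state state;  /* active until a completing call handles it */
} model_request_t;

/* Behaviour described in the report: a completed persistent request
 * always yields success, so req->error is dropped. */
int model_test_any_reported(model_request_t *req)
{
    if (req->persistent) {
        req->state = MODEL_INACTIVE;
        return MODEL_SUCCESS;      /* error code is never consulted */
    }
    return req->error;             /* assumed non-persistent behaviour */
}

/* One possible alternative (an assumption, not a proposed patch):
 * capture the error code first, then deactivate and return it. */
int model_test_any_propagating(model_request_t *req)
{
    int rc = req->error;
    if (req->persistent) {
        req->state = MODEL_INACTIVE;
    }
    return rc;
}
```

For a persistent request that completed with an error, the first function returns the success code and the second returns the request's error code; in both cases the request ends up inactive.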
Problem 3
MPI completing procedures such as MPI_Wait() and MPI_Test() may free the request handle of a persistent communication request when all of the following conditions hold:
(1) any of the completing procedures is called,
(2) the request handle associated with a persistent communication request is specified in the request or array_of_requests argument to the completing procedure, and
(3) the persistent request completes with an error (.req_status.MPI_ERROR != MPI_SUCCESS).
Reason
When .req_status.MPI_ERROR != MPI_SUCCESS, ompi_errhandler_request_invoke() in ompi/errhandler/errhandler_invoke.c appears to free the request unconditionally by calling ompi_request_free(), with a single fault-tolerance (FT) exception, even if the request is a persistent request.
if (MPI_REQUEST_NULL != requests[i] &&
    MPI_SUCCESS != requests[i]->req_status.MPI_ERROR) {
#if OPAL_ENABLE_FT_MPI
    /* Special case for MPI_ANY_SOURCE when marked as
     * MPI_ERR_PROC_FAILED_PENDING,
     * This request should not be freed since it is still active. */
    if( MPI_ERR_PROC_FAILED_PENDING != requests[i]->req_status.MPI_ERROR ) {
        ompi_request_free(&(requests[i]));
    }