MPI_Waitall doesn't return MPI_ERR_IN_STATUS for persistent request #13432

@mentOS31

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

main

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

git clone

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

 08e41ed5629b51832f5708181af6d89218c7a74e 3rd-party/openpmix (v1.1.3-4067-g08e41ed5)
 30cadc6746ebddd69ea42ca78b964398f782e4e3 3rd-party/prrte (psrvr-v2.0.0rc1-4839-g30cadc6746)
 6032f68dd9636b48977f59e986acc01a746593a6 3rd-party/pympistandard (remotes/origin/main-23-g6032f68)
 dfff67569fb72dbf8d73a1dcf74d091dad93f71b config/oac (dfff675)

Please describe the system on which you are running

  • Operating system/version: rockylinux:10.0
  • Computer hardware: Ampere Altra MAX or NVIDIA Grace
  • Network type: intra-node

Details of the problem

Problem 1

MPI_Waitall() and MPI_Testall() don't return MPI_ERR_IN_STATUS under the following conditions:

(1) MPI_Testall() or MPI_Waitall() is called as the completing procedure,
(2) the communicator associated with the request uses the MPI_ERRORS_RETURN error handler,
(3) array_of_requests contains the handle of a persistent communication request, and
(4) that persistent request completes with an error (req_status.MPI_ERROR != MPI_SUCCESS).

Table 1: Multiple completion functions that return multiple statuses

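| Procedure    | array_of_statuses == MPI_STATUSES_IGNORE | array_of_statuses != MPI_STATUSES_IGNORE |
|--------------|------------------------------------------|------------------------------------------|
| MPI_Waitall  | valid                                    | invalid                                  |
| MPI_Waitsome | valid                                    | valid                                    |
| MPI_Testall  | invalid                                  | invalid                                  |
| MPI_Testsome | valid                                    | valid                                    |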

"Invalid" means that, under that condition, the completing procedure does not return MPI_ERR_IN_STATUS; "valid" means that it does.

Reason

For example, ompi_request_default_wait_all() in ompi/request/req_wait.c appears to execute the continue statement for a persistent communication request before it assigns MPI_ERR_IN_STATUS to the mpi_error return-code variable, so the error on that request never reaches the caller.

code

            if( request->req_persistent ) {
                request->req_state = OMPI_REQUEST_INACTIVE;
                continue;
            }
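The ordering problem can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: the type wa_request_t, the WA_* constants, and both functions are hypothetical stand-ins invented for illustration. They are not Open MPI's real types, values, or code, and the reordered variant is one possible shape of a fix, not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placeholder codes; the numeric values are arbitrary. */
enum { WA_SUCCESS = 0, WA_ERR_SAMPLE = 1, WA_ERR_IN_STATUS = 2 };

/* Hypothetical model of the request fields involved in the report. */
typedef struct {
    bool persistent; /* models request->req_persistent */
    bool active;     /* models req_state != OMPI_REQUEST_INACTIVE */
    int  error;      /* models req_status.MPI_ERROR */
} wa_request_t;

/* Reported ordering: the persistent shortcut runs before the error check,
 * so a failed persistent request never influences the return code. */
int wa_wait_all_reported(wa_request_t *reqs, size_t count)
{
    int rc = WA_SUCCESS;
    assert(reqs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (reqs[i].persistent) {
            reqs[i].active = false;
            continue; /* the error check below is skipped */
        }
        if (reqs[i].error != WA_SUCCESS) {
            rc = WA_ERR_IN_STATUS;
        }
    }
    return rc;
}

/* One possible reordering: record the error first, then deactivate. */
int wa_wait_all_reordered(wa_request_t *reqs, size_t count)
{
    int rc = WA_SUCCESS;
    assert(reqs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (reqs[i].error != WA_SUCCESS) {
            rc = WA_ERR_IN_STATUS;
        }
        if (reqs[i].persistent) {
            reqs[i].active = false; /* persistent requests stay allocated */
        }
    }
    return rc;
}
```

With an array whose only failed entry is persistent, the first function returns WA_SUCCESS and the second returns WA_ERR_IN_STATUS; both leave the persistent entry inactive.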

Problem 2

MPI_Testany() doesn't return the communication error under the following conditions:

(1) MPI_Testany() is called as the completing procedure,
(2) the communicator associated with the request uses the MPI_ERRORS_RETURN error handler,
(3) array_of_requests contains the handle of a persistent communication request, and
(4) that persistent request completes with an error (req_status.MPI_ERROR != MPI_SUCCESS).

Reason

It seems that ompi_request_default_test_any() in ompi/request/req_test.c returns OMPI_SUCCESS unconditionally when the completed request is a persistent communication request, without checking req_status.MPI_ERROR.

code

            if( request->req_persistent ) {
                request->req_state = OMPI_REQUEST_INACTIVE;
                return OMPI_SUCCESS;
            }
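The early return can be modelled the same way. The sketch below is an assumption-laden simplification: ta_request_t, the TA_* constants, and both functions are hypothetical names, and the handling of the case where no request is active is deliberately simplified compared with the real MPI_Testany() semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placeholder codes; the numeric values are arbitrary. */
enum { TA_SUCCESS = 0, TA_ERR_SAMPLE = 1 };

/* Hypothetical model of the request fields involved in the report. */
typedef struct {
    bool persistent; /* models request->req_persistent */
    bool active;     /* models req_state != OMPI_REQUEST_INACTIVE */
    bool complete;   /* models REQUEST_COMPLETE(request) */
    int  error;      /* models req_status.MPI_ERROR */
} ta_request_t;

/* Reported behaviour: a completed persistent request always yields
 * success, whatever its error field holds. */
int ta_test_any_reported(ta_request_t *reqs, size_t count,
                         int *index, bool *completed)
{
    assert(index != NULL && completed != NULL);
    *index = -1;
    *completed = false;
    for (size_t i = 0; i < count; i++) {
        if (!reqs[i].active || !reqs[i].complete) {
            continue;
        }
        *index = (int)i;
        *completed = true;
        if (reqs[i].persistent) {
            reqs[i].active = false;
            return TA_SUCCESS; /* the request's error is dropped here */
        }
        return reqs[i].error;
    }
    return TA_SUCCESS;
}

/* One possible alternative: read the error before deactivating. */
int ta_test_any_propagating(ta_request_t *reqs, size_t count,
                            int *index, bool *completed)
{
    assert(index != NULL && completed != NULL);
    *index = -1;
    *completed = false;
    for (size_t i = 0; i < count; i++) {
        if (!reqs[i].active || !reqs[i].complete) {
            continue;
        }
        *index = (int)i;
        *completed = true;
        int rc = reqs[i].error;
        if (reqs[i].persistent) {
            reqs[i].active = false;
        }
        return rc;
    }
    return TA_SUCCESS;
}
```

For a completed persistent request whose error field is TA_ERR_SAMPLE, the first function reports TA_SUCCESS while the second reports TA_ERR_SAMPLE; the index and completion flag are identical in both cases.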

Problem 3

MPI completing procedures such as MPI_Wait() and MPI_Test() may free the request handle for a persistent communication request under the following conditions:

(1) any completing procedure is called,
(2) the request or array_of_requests argument to the completing procedure contains the handle of a persistent communication request, and
(3) that persistent request completes with an error (req_status.MPI_ERROR != MPI_SUCCESS).

Reason

When req_status.MPI_ERROR != MPI_SUCCESS, ompi_errhandler_request_invoke() in ompi/errhandler/errhandler_invoke.c appears to free the request by calling ompi_request_free() even if it is a persistent request. The only exception is the fault-tolerance (FT) special case shown below.

code

        if (MPI_REQUEST_NULL != requests[i] &&
            MPI_SUCCESS != requests[i]->req_status.MPI_ERROR) {
#if OPAL_ENABLE_FT_MPI
            /* Special case for MPI_ANY_SOURCE when marked as
             * MPI_ERR_PROC_FAILED_PENDING,
             * This request should not be freed since it is still active. */
            if( MPI_ERR_PROC_FAILED_PENDING != requests[i]->req_status.MPI_ERROR ) {
                ompi_request_free(&(requests[i]));
            }
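The effect on the caller's handle array can be sketched with a stand-alone model. Everything here is hypothetical: eh_request_t, the EH_* constants, and both functions are illustrative stand-ins, a NULL slot stands in for MPI_REQUEST_NULL, and the FT special case is left out. The guarded variant shows one conceivable check, not the project's chosen fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placeholder codes; the numeric values are arbitrary. */
enum { EH_SUCCESS = 0, EH_ERR_SAMPLE = 1 };

/* Hypothetical model of the request fields involved in the report. */
typedef struct {
    bool persistent; /* models request->req_persistent */
    int  error;      /* models req_status.MPI_ERROR */
} eh_request_t;

/* Reported behaviour: every failed request is released, persistent or
 * not. Setting a slot to NULL models the handle being freed and reset.
 * Returns the number of slots released. */
size_t eh_release_failed_reported(eh_request_t **reqs, size_t count)
{
    size_t released = 0;
    assert(reqs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (reqs[i] != NULL && reqs[i]->error != EH_SUCCESS) {
            reqs[i] = NULL;
            released++;
        }
    }
    return released;
}

/* One conceivable guard: leave failed persistent requests in place so
 * the caller can still start or explicitly free them. */
size_t eh_release_failed_guarded(eh_request_t **reqs, size_t count)
{
    size_t released = 0;
    assert(reqs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (reqs[i] != NULL && reqs[i]->error != EH_SUCCESS &&
            !reqs[i]->persistent) {
            reqs[i] = NULL;
            released++;
        }
    }
    return released;
}
```

Given one failed persistent request and one failed non-persistent request, the first function clears both slots, while the guarded one clears only the non-persistent slot.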
