
libompitrace does not check for MPI_DATATYPE_NULL before calling PMPI_Type_get_name, causing MPI_ERR_TYPE #10028

@minsii

Description


Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

We use v4.0.2, but the same issue exists in the master branch.

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Built from source using the v4.0.2 release tarball.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

(We confirmed that the issue does not depend on a specific OS.)

  • Operating system/version:
  • Computer hardware:
  • Network type:

Details of the problem

We use libompitrace with NCCL-tests, which internally passes MPI_DATATYPE_NULL as the send datatype to MPI_Allgather (see https://github.com/NVIDIA/nccl-tests/blob/master/src/common.cu#L1022). This is a legal use of MPI_DATATYPE_NULL. However, when we enable libompitrace with Open MPI, it reports the error below:

*** An error occurred in MPI_Type_get_name
*** reported by process [1728118785,0]
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_TYPE: invalid datatype
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)

Inspecting libompitrace/allgather.c, we found that it does not check whether the datatype is MPI_DATATYPE_NULL before calling PMPI_Type_get_name, which causes this error.
