Crash or a hang (it depends on OpenMPI version) when receiver and sender use different datatypes #3937

@densamoilov

Hi all,

I'm a developer of the Intel Math Kernel Library, and for our cluster components we provide OpenMPI support for our customers.
Recently we ran into an issue with sending/receiving when the sender and receiver use different datatypes.

I tried several OpenMPI versions: 1.6.1 (hang), 1.8.1 and 2.1.1 (crash).
These versions were downloaded from the OpenMPI site and built from source.

Information about system:
OS: RHEL 7.2
CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Network type: N/A, within 1 node.

You can find the reproducer below. It works fine with other MPI implementations, such as Intel MPI and MPICH. I run it on 2 processes and observe the following error message:

[mkl:147075] *** An error occurred in MPI_Bcast
[mkl:147075] *** reported by process [3517710337,1]
[mkl:147075] *** on communicator MPI COMMUNICATOR 6 SPLIT FROM 3
[mkl:147075] *** MPI_ERR_TRUNCATE: message truncated
[mkl:147075] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[mkl:147075] ***    and potentially your MPI job)

Reproducer:

#include <mpi.h>
#include <stdlib.h>

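/* Rank 1 describes the data as m * n contiguous elements of `type`;
 * every other rank (including the root) uses a single vector type of
 * n blocks of m elements with stride lda. */
MPI_Datatype get_type(int m, int n, int lda, MPI_Datatype type, int rank, int *count)
{
    if (rank == 1) {
        (*count) = m * n;
        return type;
    }

    MPI_Datatype new_type;

    MPI_Type_vector(n, m, lda, type, &new_type);
    MPI_Type_commit(&new_type);
    (*count) = 1;

    return new_type;
}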


int main(int argc, char** argv)
{
    int m = 1000;
    int n = 1000;
    int lda = 1000;

    int size, rank;
    int count = -1;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float* buffer = (float*) malloc(sizeof(float) * m * n);

    MPI_Datatype my_type = get_type(m, n, lda, MPI_FLOAT, rank, &count);

    MPI_Bcast((void*)buffer, count, my_type, 0, MPI_COMM_WORLD);

    free(buffer);
    /* MPI_FLOAT is a predefined datatype and must not be freed. */
    if (my_type != MPI_FLOAT) {
        MPI_Type_free(&my_type);
    }

    MPI_Finalize();

    return 0;
}
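
For reference, here is a minimal sketch in plain C (no MPI calls) of why the two descriptions in the reproducer should be compatible. For collectives such as MPI_Bcast, the MPI standard requires the type signatures to match across ranks, while the type maps (memory layouts) may differ. The helper names below are hypothetical and only model the semantics of MPI_Type_vector(count, blocklength, stride, oldtype), assuming a non-negative stride. They are not part of any MPI library.

```c
#include <assert.h>
#include <stddef.h>

/* Number of base elements in the type signature of a vector type:
 * count blocks of blocklength elements each. The stride changes only
 * the memory layout, not the signature. */
size_t vector_signature_elems(int count, int blocklength)
{
    assert(count >= 0 && blocklength >= 0);
    return (size_t)count * (size_t)blocklength;
}

/* Offset, in base elements, of element j of block i in the vector type.
 * Assumes a non-negative stride (an assumption of this sketch). */
size_t vector_elem_offset(int i, int j, int stride)
{
    assert(i >= 0 && j >= 0 && stride >= 0);
    return (size_t)i * (size_t)stride + (size_t)j;
}

/* Returns 1 if the vector layout visits offsets 0, 1, 2, ... in order,
 * i.e. it describes the same memory as a contiguous buffer; 0 otherwise. */
int vector_is_contiguous(int count, int blocklength, int stride)
{
    size_t expected = 0;
    for (int i = 0; i < count; ++i) {
        for (int j = 0; j < blocklength; ++j) {
            if (vector_elem_offset(i, j, stride) != expected) {
                return 0;
            }
            ++expected;
        }
    }
    return 1;
}
```

With the reproducer's values (n = 1000, m = 1000, lda = 1000), vector_signature_elems(1000, 1000) gives 1000000, which equals the m * n floats that rank 1 passes, and vector_is_contiguous(1000, 1000, 1000) returns 1. So under these assumptions both ranks describe the same signature and even the same layout.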
