MPI collective writes causing MPI_Finalize to segfault in mca_btl_vader_endpoint_xpmem_rcache_cleanup() #6524

Closed
cniethammer opened this issue Mar 26, 2019 · 7 comments

@cniethammer
Contributor

Background information

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

openmpi-4.0.1rc3

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Installation from tarball with

./configure --enable-shared --enable-static --enable-mpi-thread-multiple

Please describe the system on which you are running

Cray XC40, PrgEnv-gnu/6.0.4
gcc 8.2.0
xpmem/2.2.4-6


Details of the problem

Using MPI collective writes (MPI_File_write_all, MPI_File_write_at_all) on an NFS shared file system causes MPI_Finalize() to segfault for a total write of >= 6 MB of data:

ranks   failing BUFFER_SIZE (ints per rank)   working BUFFER_SIZE (ints per rank)
24      256*1024                              255*1024
12      512*1024                              511*1024

A debug build with --enable-debug reports

mpirun -n 24 ./test
.../openmpi-4.0.1rc3/opal/mca/btl/vader/btl_vader_xpmem.c:160: mca_btl_vader_endpoint_xpmem_rcache_cleanup: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *) (reg))->obj_magic_id' failed.

Note: The same test on a Lustre file system works fine.

The test program used is:

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include "mpi.h"

int main(int argc, char *argv[]) {
  int rank, size;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  
  int *buffer;
  long BUFFER_SIZE = (512 * 1024); /* # elements in buffer for each MPI process */
  if (argc > 1) {
    BUFFER_SIZE = atol(argv[1]);
  }
  int i;
  buffer = (int *)malloc(BUFFER_SIZE * sizeof(int));
  for (i = 0; i < BUFFER_SIZE; i++) {
    buffer[i] = rank * BUFFER_SIZE + i;
  }

  MPI_File fh;
  MPI_File_open(MPI_COMM_WORLD, "data.bin", MPI_MODE_CREATE | MPI_MODE_RDWR,
                MPI_INFO_NULL, &fh);

  int count = 1;
  int blocklength = BUFFER_SIZE;
  int stride = BUFFER_SIZE;
  MPI_Datatype etype = MPI_INT;
  MPI_Datatype filetype;
  MPI_Type_vector(count, blocklength, stride, etype, &filetype);
  MPI_Type_commit(&filetype);

  MPI_Status status;
/* Both methods end in the same error; changing to the non-collective versions
 * works fine. */
#if 0
    MPI_Offset disp = (MPI_Offset)rank * BUFFER_SIZE * sizeof(int);
    MPI_File_set_view(fh, disp, etype, filetype, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, (void *)buffer, BUFFER_SIZE, etype, &status);
#else
  MPI_File_write_at_all(fh, BUFFER_SIZE * rank, (void *)buffer, BUFFER_SIZE,
                        etype, &status);
#endif

  MPI_File_close(&fh);
  free(buffer);

  MPI_Finalize();

  return 0;
}
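
For completeness, one way to build and run this reproducer (assuming the source is saved as test.c and mpicc is the MPI compiler wrapper in use; the optional argument overrides BUFFER_SIZE, the number of ints written per rank):

mpicc -o test test.c
mpirun -n 24 ./test 262144    # 256*1024 ints per rank -- fails as described above
mpirun -n 24 ./test 261120    # 255*1024 ints per rank -- works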
@ggouaillardet
Contributor

@cniethammer how many nodes are you running on?

@cniethammer
Contributor Author

The test was run on a single node.

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Mar 27, 2019
free the component mpool in mca_btl_vader_component_close()
and after freeing some objects that depend on it such as
mca_btl_vader_component.vader_frags_user

Thanks Christoph Niethammer for reporting this.

Refs. open-mpi#6524

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
@ggouaillardet
Contributor

@cniethammer can you please manually download and apply the patch at https://github.com/open-mpi/ompi/pull/6526.patch ?

@cniethammer
Contributor Author

Unfortunately the patch did not solve the problem.

@ggouaillardet
Contributor

@cniethammer I found another issue that is specific to xpmem (a double-free issue); unfortunately, I do not know when I will find the time to investigate it.

Meanwhile, disabling xpmem should do the trick:
mpirun --mca btl_vader_single_copy_mechanism cma ...

If cma is not available on the Cray, then you can use:
mpirun --mca btl_vader_single_copy_mechanism none ...
(that will very likely impact performance)
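
An alternative way to set the same MCA parameter (using Open MPI's standard OMPI_MCA_ environment-variable prefix; shown here as an equivalent option, not something tested in this thread) would be:

export OMPI_MCA_btl_vader_single_copy_mechanism=cma
mpirun -n 24 ./test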

@cniethammer
Contributor Author

Strike! Changing the vader single copy mechanism seems to solve the issue.

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Mar 28, 2019
Refs. open-mpi#6524

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue May 11, 2019
free the component mpool in mca_btl_vader_component_close()
and after freeing some objects that depend on it such as
mca_btl_vader_component.vader_frags_user

Thanks Christoph Niethammer for reporting this.

Refs. open-mpi#6524

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@77060ca)
orivej-nixos pushed a commit to NixOS/nixpkgs that referenced this issue Jun 28, 2019
The fix is scheduled for release in OpenMPI 4.0.2.

Upstream issue: open-mpi/ompi#6524
hjelmn added a commit to hjelmn/ompi that referenced this issue Jan 8, 2020
This commit fixes an issue discovered in the XPMEM registration cache. It
was possible for a registration to be invalidated by multiple threads
leading to a double-free situation or re-use of an invalidated registration.

This commit fixes the issue by setting the INVALID flag on a registration
when it will be deleted. The flag is set while iterating over the tree
to take advantage of the fact that a registration can not be removed
from the VMA tree by a thread while another thread is traversing the VMA
tree.

References open-mpi#6524
References open-mpi#7030
Closes open-mpi#6534

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
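
For readers unfamiliar with the pattern described in this commit message, the following is a minimal, illustrative C sketch of flag-based invalidation: whichever thread atomically sets the INVALID flag first becomes the only one allowed to free the cache entry, so concurrent cleanup paths cannot double-free or reuse it. The names here (cached_reg_t, REG_FLAG_INVALID, try_invalidate, cleanup_entry) are invented for illustration and are not Open MPI's actual rcache/btl code.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define REG_FLAG_INVALID 0x1

/* Illustrative stand-in for a registration-cache entry. */
typedef struct cached_reg {
    atomic_int flags;   /* bit 0 = REG_FLAG_INVALID */
} cached_reg_t;

/* Returns true only for the single thread that transitions the entry from
 * valid to invalid; that thread becomes responsible for freeing it. */
static bool try_invalidate(cached_reg_t *reg)
{
    int old = atomic_fetch_or(&reg->flags, REG_FLAG_INVALID);
    return (old & REG_FLAG_INVALID) == 0;
}

/* Cleanup path, e.g. invoked while iterating a tree whose entries cannot be
 * removed during traversal: flagging an entry here is safe, and only the
 * winner of try_invalidate() actually frees it. */
static void cleanup_entry(cached_reg_t *reg)
{
    if (try_invalidate(reg)) {
        free(reg);
    }
    /* otherwise another thread already invalidated it; do not touch it */
}

int main(void)
{
    cached_reg_t *reg = malloc(sizeof(*reg));
    if (NULL == reg) {
        return 1;
    }
    atomic_init(&reg->flags, 0);
    cleanup_entry(reg);   /* the first (and here only) invalidation wins and frees */
    return 0;
}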
hjelmn added a commit to hjelmn/ompi that referenced this issue Jan 8, 2020
This commit fixes an issue discovered in the XPMEM registration cache. It
was possible for a registration to be invalidated by multiple threads
leading to a double-free situation or re-use of an invalidated registration.

This commit fixes the issue by setting the INVALID flag on a registration
when it will be deleted. The flag is set while iterating over the tree
to take advantage of the fact that a registration can not be removed
from the VMA tree by a thread while another thread is traversing the VMA
tree.

References open-mpi#6524
References open-mpi#7030
Closes open-mpi#6534

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
hppritcha pushed a commit to hppritcha/ompi that referenced this issue Jan 19, 2020
This commit fixes an issue discovered in the XPMEM registration cache. It
was possible for a registration to be invalidated by multiple threads
leading to a double-free situation or re-use of an invalidated registration.

This commit fixes the issue by setting the INVALID flag on a registration
when it will be deleted. The flag is set while iterating over the tree
to take advantage of the fact that a registration can not be removed
from the VMA tree by a thread while another thread is traversing the VMA
tree.

References open-mpi#6524
References open-mpi#7030
Closes open-mpi#6534

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit f86f805)
cniethammer pushed a commit to cniethammer/ompi that referenced this issue May 10, 2020
This commit fixes an issue discovered in the XPMEM registration cache. It
was possible for a registration to be invalidated by multiple threads
leading to a double-free situation or re-use of an invalidated registration.

This commit fixes the issue by setting the INVALID flag on a registration
when it will be deleted. The flag is set while iterating over the tree
to take advantage of the fact that a registration can not be removed
from the VMA tree by a thread while another thread is traversing the VMA
tree.

References open-mpi#6524
References open-mpi#7030
Closes open-mpi#6534

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit f86f805)
@hppritcha hppritcha self-assigned this Mar 26, 2021
@hppritcha
Member

This problem appears to be resolved in v4.0.x at 803bdd6 on a Cray XC running CLE 7.0UP02 and xpmem 2.2.20.
