
ch4/ofi: Segmentation fault inside MPI_Alltoallw call #6341

Closed
dqwu opened this issue Dec 21, 2022 · 10 comments
dqwu commented Dec 21, 2022

We have a test case that calls MPI_Alltoallw a few times; the last MPI_Alltoallw call fails with a segmentation fault (signal 11).

MPICH versions: 4.0, 4.0.1, 4.0.2, 4.0.3, 4.1a1 (not reproducible with 4.1b1)
MPICH Device: ch4:ofi (not reproducible with ch3:nemesis)
Environment: ANL GCE Linux workstations (Ubuntu 20, GCC 9.4), ANL JLSE nodes (Intel OneAPI, aurora mpich module based on MPICH 4.1a1)

Call stack generated with Address Sanitizer (MPICH 4.1a1):

==3185512==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000190 (pc 0x7f15cbffcbde bp 0x000000000000 sp 0x7ffca7f3cd20 T0)
==3185512==The signal is caused by a READ memory access.
==3185512==Hint: address points to the zero page.
    #0 0x7f15cbffcbdd in MPIDI_POSIX_progress_recv src/mpid/ch4/shm/src/../posix/posix_progress.h:73
    #1 0x7f15cbffcbdd in MPIDI_POSIX_progress src/mpid/ch4/shm/src/../posix/posix_progress.h:141
    #2 0x7f15cc001a0f in MPIDI_SHM_progress src/mpid/ch4/shm/src/shm_progress.h:18
    #3 0x7f15cc001a0f in MPIDI_progress_test src/mpid/ch4/src/ch4_progress.h:170
    #4 0x7f15cc001a0f in MPID_Progress_wait src/mpid/ch4/src/ch4_progress.h:324
    #5 0x7f15cc008244 in MPIR_Waitall_state src/mpi/request/request_impl.c:973
    #6 0x7f15cc008777 in MPID_Waitall src/mpid/ch4/src/ch4_wait.h:138
    #7 0x7f15cc008777 in MPIR_Waitall src/mpi/request/request_impl.c:1096
    #8 0x7f15cbf8634e in MPIC_Waitall src/mpi/coll/helper_fns.c:627
    #9 0x7f15cbec4eaf in MPIR_Alltoallw_intra_scattered src/mpi/coll/alltoallw/alltoallw_intra_scattered.c:98
    #10 0x7f15cbf58b7d in MPIR_Alltoallw_allcomm_auto src/mpi/coll/mpir_coll.c:3799
    #11 0x7f15cbf58e27 in MPIR_Alltoallw_impl src/mpi/coll/mpir_coll.c:3856
    #12 0x7f15cbf592fb in MPIDI_NM_mpi_alltoallw src/mpid/ch4/netmod/include/../ofi/ofi_coll.h:270
    #13 0x7f15cbf592fb in MPIDI_Alltoallw_intra_composition_alpha src/mpid/ch4/src/ch4_coll_impl.h:903
    #14 0x7f15cbf592fb in MPID_Alltoallw src/mpid/ch4/src/ch4_coll.h:1044
    #15 0x7f15cbf592fb in MPIR_Alltoallw src/mpi/coll/mpir_coll.c:3908
    #16 0x7f15cbbfa8f6 in internal_Alltoallw src/binding/c/coll/alltoallw.c:146
    #17 0x7f15cbbfa8f6 in PMPI_Alltoallw src/binding/c/coll/alltoallw.c:206
    #18 0x562ec44de62f in my_rearrange_io2comp /scratch/wuda/TMP/scorpio/examples/c/example1.c:118
    #19 0x562ec44e0a6f in main /scratch/wuda/TMP/scorpio/examples/c/example1.c:320
    #20 0x7f15cb7c6082 in __libc_start_main ../csu/libc-start.c:308
    #21 0x562ec44dcd3d in _start (scorpio/build/examples/c/example1+0x2bd3d)

@hzhou Is this a known issue to you? It seems this issue might have been fixed in the latest MPICH 4.1b1.
If necessary, I can provide detailed steps for you to run this test case on ANL GCE workstations.


hzhou commented Dec 21, 2022

Yes, please share the steps to reproduce the bug.


dqwu commented Dec 21, 2022

> Yes, please share the steps to reproduce the bug.

@hzhou Since it might have been fixed by MPICH 4.1b1, do you know which PR has the fix?

Below are my steps to reproduce the segmentation fault on ANL GCE nodes (Ubuntu 20 with default GCC 9.4.0).

[Build MPICH 4.1a1 with -g flag and ch4:ofi device]

wget https://www.mpich.org/static/downloads/4.1a1/mpich-4.1a1.tar.gz
tar zxf mpich-4.1a1.tar.gz
cd mpich-4.1a1
CC=gcc CXX=g++ FC=gfortran CFLAGS="-g" ./configure --prefix=/path/to/mpich/installation --with-device=ch4:ofi
make -j4
make install

[Use custom MPICH build]
export PATH=/path/to/mpich/installation/bin:$PATH

[Build PnetCDF lib required by SCORPIO]

wget https://parallel-netcdf.github.io/Release/pnetcdf-1.12.3.tar.gz
tar zxf pnetcdf-1.12.3.tar.gz
cd pnetcdf-1.12.3
CC=mpicc CXX=mpicxx FC=mpifort ./configure --prefix=/path/to/pnetcdf/installation
make -j4
make install

[Build and run a C example of SCORPIO]

git clone --branch dqwu/test_mpich_ch4 https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build
cd build

CFLAGS="-fsanitize=address -fno-omit-frame-pointer -g" \
CC=mpicc CXX=mpicxx FC=mpifort cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/path/to/pnetcdf/installation \
-DPIO_USE_MALLOC=ON \
-DPIO_ENABLE_EXAMPLES=ON \
..

make -j4

cd examples/c
cp ../../../datasets/piodecomp*.dat ./
export ASAN_OPTIONS=detect_leaks=0
mpiexec -n 48 ./example1


k202077 commented Jan 2, 2023

@skosukhin found a similar issue using v4.0.3 with yaksa (not with dataloop). It also causes a segmentation fault in the routine MPIDI_POSIX_progress_recv. The call to MPIDIG_recv_copy_seg seems to be the problem: when this routine is called, payload_left is 0 and rreq is NULL. At the start of the routine rreq is dereferenced, which generates the segmentation fault.
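Below is a minimal sketch of that failure pattern (illustrative only, not the actual MPICH source; the struct and function names merely mirror the trace). The request is dereferenced before the zero-payload case is handled, so payload_left == 0 with rreq == NULL crashes:

#include <stddef.h>
#include <string.h>

struct request { char *user_buf; size_t bytes_copied; };

/* Stand-in for MPIDIG_recv_copy_seg: dereferences rreq up front. */
static void recv_copy_seg(const void *payload, size_t payload_left,
                          struct request *rreq)
{
    char *dst = rreq->user_buf + rreq->bytes_copied; /* SEGV when rreq == NULL */
    if (payload_left == 0)                           /* guard arrives too late */
        return;
    memcpy(dst, payload, payload_left);
    rreq->bytes_copied += payload_left;
}

int main(void)
{
    /* Mirrors the reported call: payload_left == 0 and rreq == NULL. */
    recv_copy_seg(NULL, 0, NULL);
    return 0;
}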

Unfortunately, I was not able to generate a simple reproducer...

Do you perhaps have such a reproducer that we could use in our configure script to check for a usable MPI?


hzhou commented Jan 2, 2023

@k202077 Interesting. The last commit that fixes such a segfault is c0234e4. Could you try manually patching the code and see whether that fixes your segfault?

Is the segfault related to an anysource receive?


k202077 commented Jan 3, 2023

Commit c0234e4 did not fix the issue for me. However, interestingly, a6d3064 did the job... (I am using a derived datatype that contains three contiguous doubles.)


hzhou commented Jan 3, 2023

> Commit c0234e4 did not fix the issue for me. However, interestingly, a6d3064 did the job... (I am using a derived datatype that contains three contiguous doubles.)

@k202077 Hmm, the mentioned patch restores a shortcut for a contiguous datatype. Thus, the potential bug may still exist. Could you describe how to reproduce your original segfault? If the reproducer is not simple, we may not be able to get to it right away, but it is still useful for reference.


hzhou commented Jan 3, 2023

I have reproduced the bug following #6341 (comment) and confirmed the same patch b0cef14 fixes the issue. I'll dig a bit deeper and update.


hzhou commented Jan 3, 2023

The bug was in https://github.com/hzhou/mpich/blob/84902a51df5330d979668900e9f3ab7b567cfe1b/src/mpi/datatype/typerep/src/typerep_yaksa_pack.c#L147 and was caused by a typo: datatype should have been dtp->basic_type. The bug is triggered when a process creates more than 256 * 2 = 512 derived datatypes, which makes element_size on the buggy line come out as >= 2, and when the real send size is not a multiple of that false element_size. This is very likely once element_size reaches 3 (i.e. more than 256 * 3 = 768 derived datatypes have been created) and the datatype is contiguous. The sender then repeatedly sends a 0-byte payload segment, due to the restriction that we can only send multiples of element_size.
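Here is a minimal arithmetic sketch (hypothetical sizes, not MPICH code) of how the bogus element_size stalls the sender: each segment is rounded down to a multiple of element_size, so once the remaining payload is nonzero but smaller than element_size, every further segment is 0 bytes.

#include <stdio.h>

int main(void)
{
    size_t payload_left = 4096; /* hypothetical send size; 4096 % 3 != 0 */
    size_t element_size = 3;    /* the bogus value produced by the typo  */
    size_t max_segment = 1024;  /* hypothetical per-segment limit        */

    while (payload_left > 0) {
        size_t seg = payload_left < max_segment ? payload_left : max_segment;
        seg -= seg % element_size; /* only multiples of element_size go out */
        printf("segment: %zu bytes (%zu left)\n", seg, payload_left - seg);
        if (seg == 0) {
            puts("stuck: sender emits 0-byte segments forever");
            break; /* the real code path has no such escape */
        }
        payload_left -= seg;
    }
    return 0;
}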

The bug was introduced in a338d6a (which fixes a separate previous bug, but unfortunately with a typo), and it was fixed in b0cef14. Thus the affected MPICH releases range from 4.0rc2 to 4.0.3. @k202077 I think the easiest configure check is to run mpichversion and exclude the mpich-4.0.x series.
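For reference, a sketch of such a check using MPI_Get_library_version (the exact layout of MPICH's version string, and the substring matching below, are assumptions; pre-releases such as 4.1a1, which dqwu also hit, would need extra handling):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int len, affected = 0;

    MPI_Init(&argc, &argv);
    MPI_Get_library_version(version, &len);
    MPI_Finalize();

    /* MPICH's version string starts with a line like "MPICH Version: 4.0.2". */
    const char *p = strstr(version, "MPICH Version:");
    if (p != NULL) {
        p += strlen("MPICH Version:");
        while (*p == ' ' || *p == '\t')
            p++;
        affected = strncmp(p, "4.0", 3) == 0; /* matches 4.0rc2 .. 4.0.3 */
    }
    printf("%s\n", affected ? "affected mpich-4.0.x series" : "not mpich-4.0.x");
    return affected ? EXIT_FAILURE : EXIT_SUCCESS;
}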


hzhou commented Jan 3, 2023

I am closing this issue since it has been fully investigated. If you need assistance with patching or the configure check, please feel free to reopen and comment.

hzhou closed this as completed Jan 3, 2023

tjahns commented Jan 5, 2023

@k202077 and I came up with a reproducer which we will use to detect defective versions. I'm sharing it here so others can quickly detect if their MPI installation is affected:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  int size, rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  enum {
    DDT_COUNT = 1024,
    EXCH_COUNT = 128,
    BASE_COUNT = 4,
    RANK_COUNT = 2};

  if (size != RANK_COUNT) {
    fprintf(stderr, "wrong number of processes (has to be %d)\n", RANK_COUNT);
    exit(EXIT_FAILURE);
  }

  /* Create enough derived datatypes to push the handle count past
     256 * 3 = 768; per the analysis above, this makes the buggy code
     compute a bogus element_size of 3. */
  MPI_Datatype * ddts = malloc(DDT_COUNT * sizeof(*ddts));
  for (int i = 0; i < DDT_COUNT; ++i) {
    MPI_Type_contiguous(i + 1, MPI_DOUBLE, ddts + i);
    MPI_Type_commit(ddts + i);
  }

  {
    MPI_Datatype ddt;
    /* Contiguous datatype of BASE_COUNT doubles: each message is
       EXCH_COUNT * BASE_COUNT * 8 = 4096 bytes, which is not a
       multiple of the bogus element_size of 3. */
    MPI_Type_contiguous(BASE_COUNT, MPI_DOUBLE, &ddt);
    MPI_Type_commit(&ddt);

    double * send_buffer =
      malloc(EXCH_COUNT * BASE_COUNT * sizeof(*send_buffer));
    double * recv_buffer =
      malloc(RANK_COUNT * EXCH_COUNT * BASE_COUNT * sizeof(*recv_buffer));

    for (int i = 0; i < EXCH_COUNT; ++i)
      for (int j = 0; j < BASE_COUNT; ++j)
        send_buffer[i * BASE_COUNT + j] = (double)rank;
    for (int i = 0; i < RANK_COUNT; ++i)
      for (int j = 0; j < EXCH_COUNT; ++j)
        for (int k = 0; k < BASE_COUNT; ++k)
          recv_buffer[i * EXCH_COUNT * BASE_COUNT +
                      j * BASE_COUNT + k] = -1.0;

    MPI_Request requests[2][RANK_COUNT];

    for (int i = 0; i < RANK_COUNT; ++i)
      MPI_Irecv(
        recv_buffer + i * EXCH_COUNT * BASE_COUNT, EXCH_COUNT,
        ddt, i, 0, MPI_COMM_WORLD, requests[0] + i);
    for (int i = 0; i < RANK_COUNT; ++i)
      MPI_Isend(
        send_buffer, EXCH_COUNT, ddt, i, 0, MPI_COMM_WORLD,
        requests[1] + i);
    MPI_Waitall(2 * RANK_COUNT, requests[0], MPI_STATUSES_IGNORE);

    free(recv_buffer);
    free(send_buffer);
    MPI_Type_free(&ddt);
  }

  for (int i = 0; i < DDT_COUNT; ++i)
    MPI_Type_free(ddts + i);
  free(ddts);

  MPI_Finalize();
  return EXIT_SUCCESS;
}
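On an affected installation this should reproduce the segfault described above; on a fixed build it completes silently. Assuming the file is saved as reproducer.c, a plausible invocation (the test requires exactly two ranks):

mpicc reproducer.c -o reproducer
mpiexec -n 2 ./reproducer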
