Random errors on MPI_COMPARE_AND_SWAP with pt2pt OSC of Open MPI master #933

@kawashima-fj

Description

Random errors occur on MPI_COMPARE_AND_SWAP when using pt2pt OSC.

Run my cswap.c (available as a Gist; a rough sketch of its structure appears after the error output below) with:

mpiexec -n 2 --mca osc pt2pt --mca btl self,vader ./cswap

You'll see one of the following errors (or a similar one) on rank 0.

cswap: ompi-src/ompi/mca/pml/ob1/pml_ob1_sendreq.h:251: send_request_pml_complete: Assertion `0 == sendreq->req_send.req_base.req_pml_complete' failed.
[mymachine:22183] *** Process received signal ***
[mymachine:22183] Signal: Aborted (6)
[mymachine:22183] Signal code:  (-6)
[mymachine:22183] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7f66eeb2b8d0]
[mymachine:22183] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f66ee7a8107]
[mymachine:22183] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f66ee7a94e8]
[mymachine:22183] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x2e226)[0x7f66ee7a1226]
[mymachine:22183] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x2e2d2)[0x7f66ee7a12d2]
[mymachine:22183] [ 5] ompi/lib/openmpi/mca_pml_ob1.so(+0x19dcf)[0x7f66e5faddcf]
[mymachine:22183] [ 6] ompi/lib/openmpi/mca_pml_ob1.so(+0x1a7d4)[0x7f66e5fae7d4]
[mymachine:22183] [ 7] ompi/lib/openmpi/mca_pml_ob1.so(+0x1a8df)[0x7f66e5fae8df]
[mymachine:22183] [ 8] ompi/lib/openmpi/mca_btl_vader.so(+0x3ec2)[0x7f66e65d2ec2]
[mymachine:22183] [ 9] ompi/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x68)[0x7f66e65d537d]
[mymachine:22183] [10] ompi/lib/openmpi/mca_btl_vader.so(+0x6598)[0x7f66e65d5598]
[mymachine:22183] [11] ompi/lib/openmpi/mca_btl_vader.so(+0x6753)[0x7f66e65d5753]
[mymachine:22183] [12] ompi/lib/libopen-pal.so.0(opal_progress+0xa9)[0x7f66ee18a0eb]
[mymachine:22183] [13] ompi/lib/openmpi/mca_pml_ob1.so(+0xd3ca)[0x7f66e5fa13ca]
[mymachine:22183] [14] ompi/lib/openmpi/mca_pml_ob1.so(+0xd5ab)[0x7f66e5fa15ab]
[mymachine:22183] [15] ompi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x4ee)[0x7f66e5fa3554]
[mymachine:22183] [16] ompi/lib/libmpi.so.0(PMPI_Send+0x2a7)[0x7f66eeddc039]
[mymachine:22183] [17] ./cswap[0x400ad2]
[mymachine:22183] [18] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f66ee794b45]
[mymachine:22183] [19] ./cswap[0x400929]
[mymachine:22183] *** End of error message ***

[warn] opal_libevent2022_event_base_loop: reentrant invocation.  Only one event_base_loop can run on each event_base at once.
*** Error in `./cswap': free(): invalid pointer: 0x00007fb506e2a240 ***
[mymachine:20230] *** Process received signal ***
[mymachine:20230] Signal: Aborted (6)
[mymachine:20230] Signal code:  (-6)
[mymachine:20230] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7fb5068b68d0]
[mymachine:20230] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7fb506533107]
[mymachine:20230] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fb5065344e8]
[mymachine:20230] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x73204)[0x7fb506571204]
[mymachine:20230] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x789de)[0x7fb5065769de]
[mymachine:20230] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0x796e6)[0x7fb5065776e6]
[mymachine:20230] [ 6] ompi/lib/openmpi/mca_pml_ob1.so(+0xe260)[0x7fb5011c4260]
[mymachine:20230] [ 7] ompi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x504)[0x7fb5011c556a]
[mymachine:20230] [ 8] ompi/lib/libmpi.so.0(PMPI_Send+0x2a7)[0x7fb506b67039]
[mymachine:20230] [ 9] ./cswap[0x400ad2]
[mymachine:20230] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fb50651fb45]
[mymachine:20230] [11] ./cswap[0x400929]
[mymachine:20230] *** End of error message ***
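
For reference, the core of the test is roughly the following pattern. This is a hedged sketch reconstructed from the description and the stack traces above, not the actual cswap.c from the Gist; the message size, tag, and synchronization calls are my assumptions. Rank 1 issues MPI_Compare_and_swap toward rank 0 while rank 0 is blocked in a large MPI_Send, so the CSWAP control message has to be handled from inside rank 0's progress loop.

/* Hedged sketch of the kind of test described above; NOT the actual cswap.c
 * from the Gist.  Rank 1 issues MPI_Compare_and_swap against a window on
 * rank 0 while rank 0 is blocked in a large MPI_Send, so the CSWAP control
 * message has to be handled from within rank 0's progress loop.  The message
 * size, tag, and synchronization calls are assumptions. */
#include <mpi.h>

#define MSG_SIZE (1 << 20)   /* assumed large enough to avoid the eager path */

int main(int argc, char **argv)
{
    int rank;
    long winbuf = 0;
    static char payload[MSG_SIZE];
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Win_create(&winbuf, sizeof(winbuf), sizeof(winbuf),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_lock_all(0, win);

    if (rank == 0) {
        /* Rank 0 blocks here while the CSWAP message from rank 1 arrives,
         * so the OSC callback runs from inside the blocking send. */
        MPI_Send(payload, MSG_SIZE, MPI_CHAR, 1, 42, MPI_COMM_WORLD);
    } else if (rank == 1) {
        long origin = 1, compare = 0, result = -1;
        MPI_Compare_and_swap(&origin, &compare, &result, MPI_LONG, 0, 0, win);
        MPI_Win_flush(0, win);
        MPI_Recv(payload, MSG_SIZE, MPI_CHAR, 0, 42, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}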

These errors occur on both Open MPI master (pt2pt OSC) and the v1.8 branch (rdma OSC). Though I haven't confirmed it, the v2.x branch (pt2pt OSC) and the v1.10 branch (rdma OSC) probably have the same problem.
(The rdma OSC component was renamed to pt2pt OSC on master and v2.0, while a new rdma OSC component was introduced on master.)

The cause is related to the ob1 PML blocking send optimization and the recursive send operation via the request completion callback (ompi_request_t::req_complete_cb).

In my cswap.c, the following steps are taken.

  1. On rank 0 (and rank 1), MPI_Win_create is called and the callback function ompi_osc_pt2pt_callback is registered for a request returned by the mca_pml_ob1_irecv_init function, which is called from the ompi_osc_pt2pt_frag_start_receive function.
  2. On rank 1, MPI_Compare_and_swap is called and sends a control message of type OMPI_OSC_PT2PT_HDR_TYPE_CSWAP to rank 0.
  3. On rank 0, MPI_Send is called and the special request mca_pml_ob1_sendreq is used for this call in the mca_pml_ob1_send function.
  4. On rank 0, the ompi_request_wait_completion function is called for the request if the mca_pml_ob1_send_inline function cannot send the message immediately. This function blocks until the send operation completes.
  5. On rank 0, the ompi_osc_pt2pt_callback function registered in step 1 is invoked when the control message from step 2 arrives.
  6. On rank 0, the mca_pml_ob1_send function is called again (recursively) from the ompi_osc_pt2pt_cswap_start function to send back a control message.
  7. On rank 0, the special request mca_pml_ob1_sendreq is used again even though it is still in use (see the toy sketch after this list).
  8. On rank 0, something bad happens, such as the assertion failure or the invalid free shown above.
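
The hazard in steps 3 to 7 can be illustrated with a small self-contained toy model. This is not Open MPI code: blocking_send, progress, osc_callback, and the_sendreq are hypothetical stand-ins for mca_pml_ob1_send, opal_progress, ompi_osc_pt2pt_callback, and the preallocated ob1 send request. Running it aborts on the assertion, mirroring the req_pml_complete assertion failure shown above.

/* Toy model of the re-entrancy hazard; not Open MPI code.  blocking_send(),
 * progress(), osc_callback(), and the_sendreq are hypothetical stand-ins
 * for mca_pml_ob1_send(), opal_progress(), ompi_osc_pt2pt_callback(), and
 * the preallocated ob1 send request. */
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool in_use;
    bool complete;
} sendreq_t;

static sendreq_t the_sendreq;        /* single fast-path request, reused per call */

static void blocking_send(const char *what);

static void osc_callback(void)
{
    /* Step 6: the CSWAP handler sends back a reply with another blocking send. */
    blocking_send("cswap reply");
}

static void progress(void)
{
    static bool cswap_arrived = false;
    if (!cswap_arrived) {
        /* Step 5: the CSWAP control message is handled from inside the
         * progress loop while the outer blocking send is still waiting. */
        cswap_arrived = true;
        osc_callback();
    }
}

static void blocking_send(const char *what)
{
    (void) what;
    /* Step 7: nothing prevents a nested call from grabbing the same request
     * object; this assertion fires, much like the req_pml_complete assertion
     * in the first error output above. */
    assert(!the_sendreq.in_use);
    the_sendreq.in_use   = true;
    the_sendreq.complete = false;

    while (!the_sendreq.complete) {
        progress();                  /* step 4: spin on progress while waiting */
    }
    the_sendreq.in_use = false;
}

int main(void)
{
    blocking_send("user MPI_Send"); /* step 3 */
    return 0;
}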

A stack trace at step 7 looks something like this:

MPI_Send
mca_pml_ob1_send
ompi_request_wait_completion // wait for the send operation
opal_condition_wait
opal_progress
(BTL progress function)
mca_pml_ob1_recv_frag_callback_match // control message for CSWAP
recv_request_pml_complete // completion of irecv_init
ompi_request_complete
ompi_osc_pt2pt_callback // callback of irecv_init
process_frag
process_cswap
ompi_osc_pt2pt_cswap_start
mca_pml_ob1_send // recursive send operation

I confirmed that the error doesn't occur if I replace MCA_PML_CALL(send(...)) with MCA_PML_CALL(isend(...)) in the ompi_osc_pt2pt_cswap_start function, but I don't think that is the real fix. If we allow recursive calls of the mca_pml_ob1_send function, we should change how mca_pml_ob1_sendreq is managed; for example, mca_pml_ob1_send could check the state of mca_pml_ob1_sendreq and avoid using it when it is already in use.
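
A minimal sketch of that second idea, again as a toy model rather than real ob1 code (fast_sendreq, pending, blocking_send, progress, and osc_callback are hypothetical): when the preallocated request is already busy, the blocking send falls back to a separate request, so the recursive call no longer clobbers the request the outer call is still waiting on.

/* Hedged sketch of the "check whether the request is in use" idea, as a toy
 * model rather than real ob1 code.  fast_sendreq, pending[], blocking_send(),
 * progress(), and osc_callback() are hypothetical stand-ins. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool in_use;
    bool complete;
} sendreq_t;

static sendreq_t  fast_sendreq;      /* shared fast-path request */
static sendreq_t *pending[8];        /* toy "network": sends in flight */
static int        npending;

static void blocking_send(const char *what);

static void osc_callback(void)       /* stands in for ompi_osc_pt2pt_callback */
{
    blocking_send("cswap reply");    /* the nested send now uses the fallback */
}

static void progress(void)
{
    static bool cswap_arrived = false;
    /* Deliver the CSWAP control message once; its handler sends a reply. */
    if (!cswap_arrived) {
        cswap_arrived = true;
        osc_callback();
    }
    /* Complete the most recently started send, if any. */
    if (npending > 0) {
        pending[--npending]->complete = true;
    }
}

static void blocking_send(const char *what)
{
    sendreq_t fallback = { false, false };
    /* The guard: if the fast-path request is already busy, we must have been
     * entered recursively from a completion callback, so use another request
     * instead of clobbering the busy one. */
    sendreq_t *req = fast_sendreq.in_use ? &fallback : &fast_sendreq;

    req->in_use   = true;
    req->complete = false;
    pending[npending++] = req;       /* "start" the send */

    while (!req->complete) {
        progress();                  /* may re-enter blocking_send() safely */
    }
    req->in_use = false;
    printf("completed send of %s\n", what);
}

int main(void)
{
    blocking_send("user MPI_Send");
    return 0;
}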

Though I used the vader BTL above, this error is not specific to that BTL. It occurs very often with the vader BTL on my machine, and also occasionally with the openib BTL.
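
For example, the same reproducer can be run over the openib BTL (assuming an InfiniBand-capable machine) with:

mpiexec -n 2 --mca osc pt2pt --mca btl self,openib ./cswap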

@hjelmn @bosilca What do you think?
