-
Notifications
You must be signed in to change notification settings - Fork 68
v2.x osc/pt2pt fixes #947
v2.x osc/pt2pt fixes #947
Conversation
(cherry picked from open-mpi/ompi@06ecdb6) Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes several bugs identified by a new multi-threaded RMA benchmarking suite. The following bugs have been identified and fixed: - The code that signaled the actual start of an access epoch changed the eager_send_active flag on a synchronization object without holding the object's lock. This could cause another thread waiting on eager sends to block indefinitely because the entirety of ompi_osc_pt2pt_sync_expected could exectute between the check of eager_send_active and the conditon wait of ompi_osc_pt2pt_sync_wait. - The bookkeeping of fragments could get screwed up when performing long put/accumulate operations from different threads. This was caused by the fragment flush code at the end of both put and accumulate. This code was put in place to avoid sending a large number of unexpected messages to a peer. To fix the bookkeeping issue we now 1) wait for eager sends to be active before stating any large isend's, and 2) keep track of the number of large isends associated with a fragment. If the number of large isends reaches 32 the active fragment is flushed. - Use atomics to update the large receive/send tag counters. This prevents duplicate tags from being used. The tag space has also been updated to use the entire 16-bits of the tag space. These changes should also fix open-mpi/ompi#1299. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> (cherry picked from open-mpi/ompi@d7264aa) Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes open-mpi/ompi#1299. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> (cherry picked from open-mpi/ompi@519fffb) Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
|
@regrant Should mention that these bugs and some ugni btl bugs were found by the threaded rma benchmarks. |
|
Test PASSed. |
|
@hjelmn Glad to hear that the RMA-MT benchmarks are doing their job. I can take a look at this, but it won't be until early next week that I'll have time to really go over this thoroughly. I'm assuming that since this is not a blocker it can wait for a couple of days? |
|
@regrant Would be nice to get this into the 2.0.0 final release. If you can't get to it by then that is fine. We will at least have osc/pt2pt working multithreaded in 2.0.1. |
|
@hjelmn i found a quite big issue with this commit on master here is a snapshot of my patch diff --git a/ompi/mca/osc/pt2pt/osc_pt2pt_comm.c b/ompi/mca/osc/pt2pt/osc_pt2pt_comm.c
index a1dcfd7..5a6572f 100644
--- a/ompi/mca/osc/pt2pt/osc_pt2pt_comm.c
+++ b/ompi/mca/osc/pt2pt/osc_pt2pt_comm.c
@@ -34,14 +34,8 @@
#include <stdio.h>
/* progress an OSC request */
-static int ompi_osc_pt2pt_comm_complete (ompi_request_t *request)
+static inline int ompi_osc_pt2pt_comm_complete_internal (ompi_osc_pt2pt_module_t *module, ompi_request_t *request)
{
- ompi_osc_pt2pt_module_t *module =
- (ompi_osc_pt2pt_module_t*) request->req_complete_cb_data;
-
- OPAL_OUTPUT_VERBOSE((10, ompi_osc_base_framework.framework_output,
- "isend_completion_cb called"));
-
mark_outgoing_completion(module);
/* put this request on the garbage colletion list */
@@ -50,6 +44,17 @@ static int ompi_osc_pt2pt_comm_complete (ompi_request_t *request)
return OMPI_SUCCESS;
}
+static int ompi_osc_pt2pt_comm_complete (ompi_request_t *request)
+{
+ ompi_osc_pt2pt_module_t *module =
+ (ompi_osc_pt2pt_module_t*) request->req_complete_cb_data;
+
+ OPAL_OUTPUT_VERBOSE((10, ompi_osc_base_framework.framework_output,
+ "ompi_osc_pt2pt_comm_complete called"));
+
+ return ompi_osc_pt2pt_comm_complete_internal(module, request);
+}
+
static int ompi_osc_pt2pt_req_comm_complete (ompi_request_t *request)
{
ompi_osc_pt2pt_request_t *pt2pt_request = (ompi_osc_pt2pt_request_t *) request->req_complete_cb_data;
@@ -62,7 +67,8 @@ static int ompi_osc_pt2pt_req_comm_complete (ompi_request_t *request)
ompi_osc_pt2pt_request_complete (pt2pt_request, request->req_status.MPI_ERROR);
}
- return ompi_osc_pt2pt_comm_complete (request);
+
+ return ompi_osc_pt2pt_comm_complete_internal (pt2pt_request->module, request);
}
static inline int ompi_osc_pt2pt_data_isend (ompi_osc_pt2pt_module_t *module, const void *buf, |
|
@hjelmn, i also had to restore the previous i noticed diff --git a/ompi/mca/osc/pt2pt/osc_pt2pt.h b/ompi/mca/osc/pt2pt/osc_pt2pt.h
index 409011d..68ca022 100644
--- a/ompi/mca/osc/pt2pt/osc_pt2pt.h
+++ b/ompi/mca/osc/pt2pt/osc_pt2pt.h
@@ -631,8 +631,8 @@ static inline void osc_pt2pt_add_pending (ompi_osc_pt2pt_pending_t *pending)
opal_list_append (&mca_osc_pt2pt_component.pending_operations, &pending->super));
}
-#define OSC_PT2PT_FRAG_TAG 0x80000
-#define OSC_PT2PT_FRAG_MASK 0x7ffff
+#define OSC_PT2PT_FRAG_TAG 0x10000
+#define OSC_PT2PT_FRAG_MASK 0x0ffff
/**
* get_tag:now investigating |
|
Both these bugs look like I missed something in the commit. Weird that it didn't show up in MTT... Will fix on master and update this PR. |
|
it did show up, see http://mtt.open-mpi.org/index.php?do_redir=2267 about the last hang, it occurs after task 1 MPI_Rget_accumulate to task 0. |
|
Huh, weird. I checked yesterday and didn't see any failures. In fact it was really clean for once.. Not sure why. Will fix today and see what tonight's MTTs show. |
|
no go on this stuff till we figure out what's going on with master |
|
master has been fixed by open-mpi/ompi#1341, and this PR should be updated to cherry pick the latest fixes |
|
i'd like to see at least one night of MTT for open-mpi/ompi#1341. If MTT runs are clean, then this PR can be merged, per @ggouaillardet recommendation, after including open-mpi/ompi#1341 with this PR. |
|
Agreed. Was hoping the PR would make it into the tarball but just missed it. |
This commit fixes several bugs identified by @ggouaillardet and MTT: - Fix SEGV in long send completion caused by missing update to the request callback data. - Add an MPI_Barrier to the fence short-cut. This fixes potential semantic issues where messages may be received before fence is reached. - Ensure fragments are flushed when using request-based RMA. This allows MPI_Test/MPI_Wait/etc to work as expected. - Restore the tag space back to 16-bits. It was intended that the space be expanded to 32-bits but the required change to the fragment headers was not committed. The tag space may be expanded in a later commit. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> (cherry picked from commit open-mpi/ompi@5b9c82a) Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
|
Test PASSed. |
|
Looks like the overnite MTT results for ivy cluster (which seemed to be showing most of the problems with all these osc bug fixes) ran clean overnite. |
:bot🏷️bug
:bot:milestone:v2.0.0
:bot:assign: @regrant
@ggouaillardet Please take a look as well. I added one of the patches you committed to master but made some modifications. Let me know if there are any regressions.