Skip to content
This repository was archived by the owner on Sep 30, 2022. It is now read-only.

Conversation

@hjelmn
Copy link
Member

@hjelmn hjelmn commented Feb 3, 2016

:bot🏷️bug
:bot:milestone:v2.0.0
:bot:assign: @regrant

@ggouaillardet Please take a look as well. I added one of the patches you committed to master but made some modifications. Let me know if there are any regressions.

ggouaillardet and others added 3 commits February 3, 2016 08:52
(cherry picked from open-mpi/ompi@06ecdb6)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes several bugs identified by a new multi-threaded RMA
benchmarking suite. The following bugs have been identified and fixed:

 - The code that signaled the actual start of an access epoch changed
   the eager_send_active flag on a synchronization object without
   holding the object's lock. This could cause another thread waiting
   on eager sends to block indefinitely because the entirety of
   ompi_osc_pt2pt_sync_expected could exectute between the check of
   eager_send_active and the conditon wait of
   ompi_osc_pt2pt_sync_wait.

 - The bookkeeping of fragments could get screwed up when performing
   long put/accumulate operations from different threads. This was
   caused by the fragment flush code at the end of both put and
   accumulate. This code was put in place to avoid sending a large
   number of unexpected messages to a peer. To fix the bookkeeping
   issue we now 1) wait for eager sends to be active before stating
   any large isend's, and 2) keep track of the number of large isends
   associated with a fragment. If the number of large isends reaches
   32 the active fragment is flushed.

 - Use atomics to update the large receive/send tag counters. This
   prevents duplicate tags from being used. The tag space has also
   been updated to use the entire 16-bits of the tag space.

These changes should also fix open-mpi/ompi#1299.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

(cherry picked from open-mpi/ompi@d7264aa)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes open-mpi/ompi#1299.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

(cherry picked from open-mpi/ompi@519fffb)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
@ompiteam-bot ompiteam-bot added this to the v2.0.0 milestone Feb 3, 2016
@hjelmn
Copy link
Member Author

hjelmn commented Feb 3, 2016

@regrant Should mention that these bugs and some ugni btl bugs were found by the threaded rma benchmarks.

@mellanox-github
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1313/ for details.

@regrant
Copy link
Contributor

regrant commented Feb 3, 2016

@hjelmn Glad to hear that the RMA-MT benchmarks are doing their job. I can take a look at this, but it won't be until early next week that I'll have time to really go over this thoroughly. I'm assuming that since this is not a blocker it can wait for a couple of days?

@hjelmn
Copy link
Member Author

hjelmn commented Feb 4, 2016

@regrant Would be nice to get this into the 2.0.0 final release. If you can't get to it by then that is fine. We will at least have osc/pt2pt working multithreaded in 2.0.1.

@ggouaillardet
Copy link
Contributor

@hjelmn i found a quite big issue with this commit on master
issues can be evidenced with the ibm/onesided test suite (c_get is crashing for example)
(do not forget to disable the rdma osc component if your interconnect is rdma capable)

here is a snapshot of my patch

diff --git a/ompi/mca/osc/pt2pt/osc_pt2pt_comm.c b/ompi/mca/osc/pt2pt/osc_pt2pt_comm.c
index a1dcfd7..5a6572f 100644
--- a/ompi/mca/osc/pt2pt/osc_pt2pt_comm.c
+++ b/ompi/mca/osc/pt2pt/osc_pt2pt_comm.c
@@ -34,14 +34,8 @@
 #include <stdio.h>

 /* progress an OSC request */
-static int ompi_osc_pt2pt_comm_complete (ompi_request_t *request)
+static inline int ompi_osc_pt2pt_comm_complete_internal (ompi_osc_pt2pt_module_t *module, ompi_request_t *request)
 {
-    ompi_osc_pt2pt_module_t *module =
-        (ompi_osc_pt2pt_module_t*) request->req_complete_cb_data;
-
-    OPAL_OUTPUT_VERBOSE((10, ompi_osc_base_framework.framework_output,
-                         "isend_completion_cb called"));
-
     mark_outgoing_completion(module);

     /* put this request on the garbage colletion list */
@@ -50,6 +44,17 @@ static int ompi_osc_pt2pt_comm_complete (ompi_request_t *request)
     return OMPI_SUCCESS;
 }

+static int ompi_osc_pt2pt_comm_complete (ompi_request_t *request)
+{
+    ompi_osc_pt2pt_module_t *module =
+        (ompi_osc_pt2pt_module_t*) request->req_complete_cb_data;
+
+    OPAL_OUTPUT_VERBOSE((10, ompi_osc_base_framework.framework_output,
+                         "ompi_osc_pt2pt_comm_complete called"));
+
+    return ompi_osc_pt2pt_comm_complete_internal(module, request);
+}
+
 static int ompi_osc_pt2pt_req_comm_complete (ompi_request_t *request)
 {
     ompi_osc_pt2pt_request_t *pt2pt_request = (ompi_osc_pt2pt_request_t *) request->req_complete_cb_data;
@@ -62,7 +67,8 @@ static int ompi_osc_pt2pt_req_comm_complete (ompi_request_t *request)
         ompi_osc_pt2pt_request_complete (pt2pt_request, request->req_status.MPI_ERROR);
     }

-    return ompi_osc_pt2pt_comm_complete (request);
+    
+    return ompi_osc_pt2pt_comm_complete_internal (pt2pt_request->module, request);
 }

 static inline int ompi_osc_pt2pt_data_isend (ompi_osc_pt2pt_module_t *module, const void *buf,

@ggouaillardet
Copy link
Contributor

@hjelmn, i also had to restore the previous OSC_PT2PT_FRAG_TAG and OSC_PT2PT_FRAG_MASK values, otherwise c_post_start hangs

i noticed osc_pt2pt headers uses an uint16_t tag (and everything is well aligned with that), and that could explain why tags must fit into 16 bits

diff --git a/ompi/mca/osc/pt2pt/osc_pt2pt.h b/ompi/mca/osc/pt2pt/osc_pt2pt.h
index 409011d..68ca022 100644
--- a/ompi/mca/osc/pt2pt/osc_pt2pt.h
+++ b/ompi/mca/osc/pt2pt/osc_pt2pt.h
@@ -631,8 +631,8 @@ static inline void osc_pt2pt_add_pending (ompi_osc_pt2pt_pending_t *pending)
                             opal_list_append (&mca_osc_pt2pt_component.pending_operations, &pending->super));
 }

-#define OSC_PT2PT_FRAG_TAG   0x80000
-#define OSC_PT2PT_FRAG_MASK  0x7ffff
+#define OSC_PT2PT_FRAG_TAG   0x10000
+#define OSC_PT2PT_FRAG_MASK  0x0ffff

 /**
  * get_tag:

now investigating c_reqops hang ...

@hjelmn
Copy link
Member Author

hjelmn commented Feb 4, 2016

Both these bugs look like I missed something in the commit. Weird that it didn't show up in MTT... Will fix on master and update this PR.

@ggouaillardet
Copy link
Contributor

it did show up, see http://mtt.open-mpi.org/index.php?do_redir=2267

about the last hang, it occurs after task 1 MPI_Rget_accumulate to task 0.
it seems the request is never sent from task 1 to task 0

@hjelmn
Copy link
Member Author

hjelmn commented Feb 4, 2016

Huh, weird. I checked yesterday and didn't see any failures. In fact it was really clean for once.. Not sure why. Will fix today and see what tonight's MTTs show.

@hppritcha
Copy link
Member

no go on this stuff till we figure out what's going on with master
open-mpi/ompi#1342 (comment)
yep again, this is getting annoying.
osc is really not that important to 99% of mpi apps.

@ggouaillardet
Copy link
Contributor

master has been fixed by open-mpi/ompi#1341, and this PR should be updated to cherry pick the latest fixes

@hppritcha
Copy link
Member

i'd like to see at least one night of MTT for open-mpi/ompi#1341. If MTT runs are clean, then this PR can be merged, per @ggouaillardet recommendation, after including open-mpi/ompi#1341 with this PR.

@hjelmn
Copy link
Member Author

hjelmn commented Feb 5, 2016

Agreed. Was hoping the PR would make it into the tarball but just missed it.

This commit fixes several bugs identified by @ggouaillardet and MTT:

 - Fix SEGV in long send completion caused by missing update to the
   request callback data.

 - Add an MPI_Barrier to the fence short-cut. This fixes potential
   semantic issues where messages may be received before fence is
   reached.

 - Ensure fragments are flushed when using request-based RMA. This
   allows MPI_Test/MPI_Wait/etc to work as expected.

 - Restore the tag space back to 16-bits. It was intended that the
   space be expanded to 32-bits but the required change to the
   fragment headers was not committed. The tag space may be expanded
   in a later commit.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

(cherry picked from commit open-mpi/ompi@5b9c82a)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
@mellanox-github
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1314/ for details.

@hppritcha
Copy link
Member

Looks like the overnite MTT results for ivy cluster (which seemed to be showing most of the problems with all these osc bug fixes) ran clean overnite.
@jsquyres I think this is good to go.

jsquyres added a commit that referenced this pull request Feb 7, 2016
@jsquyres jsquyres merged commit efeac60 into open-mpi:v2.x Feb 7, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants