ch4/rma: enable native RMA for dynamic window #4804

minsii · 2020-09-23T01:08:08Z

Pull Request Description

For dynamic windows, the current main branch always falls back to the ch4 active message (AM) path. However, native RMA can be used in certain situations. This PR implements that optimization and extends the MPICH test suite with more dynamic window tests.

This patch enables native RMA through three steps:

1. Instead of always skipping fi_mr_reg when base is NULL at window creation, try to register if FI_MR_ALLOCATED is not set. If registration succeeds, native RMA can be enabled because the registration covers the entire virtual address space.
2. If FI_MR_ALLOCATED is set, only the allocated buffers can be registered. For dynamic windows, we can register each attached buffer and collectively exchange the mr_keys if the coll_attach hint is set.
3. On the RMA issuing path, we check whether a remote mr_key can be found that covers the entire range of the target buffer (disp:disp+extent). If so, we can use native RMA/atomics; otherwise, we fall back to the AM path. This ensures correctness for operations that access multiple attached segments.
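The first two steps amount to a per-window decision made at creation time. The following is a minimal sketch of that decision, not the code in this PR: the enum, the function name, and the parameters are hypothetical, and it assumes that a failed whole-address-space registration simply leaves the window on the AM path.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; these are not MPICH identifiers. */
typedef enum {
    DWIN_PATH_AM_ONLY,           /* every operation uses active messages */
    DWIN_PATH_NATIVE_WHOLE_VA,   /* one MR covers the whole virtual address space */
    DWIN_PATH_NATIVE_PER_ATTACH  /* one MR per attached buffer, keys exchanged collectively */
} dwin_path_t;

/*
 * mr_allocated_required: the provider sets FI_MR_ALLOCATED in its MR mode.
 * whole_va_reg_ok:       registration with base == NULL succeeded (step 1).
 * coll_attach:           the window carries the coll_attach hint (step 2).
 */
dwin_path_t choose_dwin_path(bool mr_allocated_required, bool whole_va_reg_ok,
                             bool coll_attach)
{
    if (!mr_allocated_required) {
        /* Step 1: try to register everything once at window creation.
         * Assumption: if that registration fails, stay on the AM path. */
        return whole_va_reg_ok ? DWIN_PATH_NATIVE_WHOLE_VA : DWIN_PATH_AM_ONLY;
    }

    /* Step 2: only attached buffers can be registered, and the keys can
     * only be exchanged if attach/detach calls are collective. */
    return coll_attach ? DWIN_PATH_NATIVE_PER_ATTACH : DWIN_PATH_AM_ONLY;
}
```

Even when this sketch returns `DWIN_PATH_NATIVE_PER_ATTACH`, an individual operation may still take the AM path if no single remote key covers its target range (step 3).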

Expected Impact

Author Checklist

minsii · 2020-09-23T04:28:54Z

test:mpich/ch4/most

minsii · 2020-09-25T20:54:37Z

test:mpich/ch4/most

hzhou

First round of review, looks good so far.

src/mpid/ch4/src/ch4_impl.h

src/mpid/ch4/include/mpidpre.h

src/mpid/ch4/netmod/ofi/ofi_rma.c

src/mpid/ch4/netmod/ofi/ofi_rma.h

src/mpid/ch4/netmod/ofi/ofi_win.c

test/mpi/rma/Makefile.am

hzhou · 2020-09-25T22:35:57Z

src/mpid/ch4/netmod/ofi/ofi_rma.h

+    MPL_DBG_MSG_FMT(MPIDI_CH4_DBG_GENERAL, VERBOSE,
+                    (MPL_DBG_FDEST, "Query atomics support: max_count 0x%lx, dtsize 0x%lx", *count,
+                     *dtsize));
+


I am against permanently adding too many debug messages. It is relatively simple to manually insert a log call or printf during debugging when needed. The example here will add a log entry for every acc atomic operation, right? I think it can easily drown out other debug messages if an application is atomic-heavy.

It generates this line only when logging is on. For each acc path, it already generates many lines to track the call stack (e.g., MPIR_FUNC_VERBOSE_ENTER, MPIR_FUNC_VERBOSE_EXIT). Thus, I don't think adding this message will pollute the log. But it provides very useful info for figuring out why native atomics are not used (at least, it took me a long time to figure that out when testing on a new platform).

The log functionality will not be needed if we always manually add such prints only at debugging time :-)

> The log functionality will not be needed if we always manually add such prints only at debugging time :-)

That is my opinion. :) I think we should delete all debug logging from the repository unless the logging output makes sense to a non-MPICH user.

The log functionality is needed so we can manually add it (printf is sometimes too slow or too intrusive). But with permanently embedded debug output, when we do need to debug, we'll have to parse a huge log and may get confused by unrelated output. In theory, the class can be used to filter output, but that is just added complexity. In the time I would need to understand which class to filter and how to filter it, I could just manually insert my desired debug output and be done with it.

I do believe the log can be useful. One example: a user reported to me that he cannot get native atomics on his platform for an unknown reason. I can then easily figure out the cause if the user sends me the log.

Anyway, this is really a personal preference. I hope you will not block this PR because of the log preference :-)

> Anyway, this is really a personal preference. I hope you will not block this PR because of the log preference :-)

Of course not :). I am just curious that you insisted.
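For readers unfamiliar with the class-based filtering discussed in this thread, here is a minimal, generic sketch of the idea: a message is formatted only if its class is active and its level is within the configured verbosity. All names below (`DBG_MSG`, `dbg_configure`, `dbg_enabled`, the class and level constants) are hypothetical and do not reproduce the MPL_DBG API.

```c
#include <stdio.h>

/* Hypothetical classes and levels; not the MPL_DBG definitions. */
enum { DBG_CLASS_GENERAL = 1 << 0, DBG_CLASS_RMA = 1 << 1 };
enum { DBG_TERSE = 0, DBG_TYPICAL = 1, DBG_VERBOSE = 2 };

static unsigned dbg_active_classes = 0;   /* bitmask of enabled classes */
static int dbg_max_level = DBG_TERSE;     /* most verbose level that is printed */

/* Select which classes are logged and how verbose the log may be. */
void dbg_configure(unsigned classes, int max_level)
{
    dbg_active_classes = classes;
    dbg_max_level = max_level;
}

/* A message is emitted only if its class is active and its level is allowed. */
int dbg_enabled(unsigned cls, int level)
{
    return (dbg_active_classes & cls) != 0 && level <= dbg_max_level;
}

/* Formatting cost is paid only when the message passes the filter. */
#define DBG_MSG(cls, level, ...)               \
    do {                                       \
        if (dbg_enabled((cls), (level))) {     \
            fprintf(stderr, __VA_ARGS__);      \
            fputc('\n', stderr);               \
        }                                      \
    } while (0)
```

With `dbg_configure(DBG_CLASS_RMA, DBG_VERBOSE)`, a call such as `DBG_MSG(DBG_CLASS_RMA, DBG_VERBOSE, "max_count 0x%lx", 16UL)` prints one line to stderr, while messages of other classes stay silent.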

minsii · 2020-09-26T05:35:38Z

test:mpich/ch4/most

minsii · 2020-09-28T04:54:07Z

test:mpich/ch4/most

minsii · 2020-09-28T04:55:09Z

@hzhou I have addressed all comments, except the log preference one (please see my response under the comment). Can you please check again?

hzhou

Looks good overall. Here is my final review with some nitpicks.

src/mpid/ch4/netmod/ofi/ofi_impl.h

src/mpid/ch4/netmod/ofi/ofi_pre.h

src/mpid/ch4/netmod/ofi/ofi_types.h

src/mpid/ch4/netmod/ofi/ofi_rma.c

hzhou · 2020-09-28T14:21:44Z

src/mpid/ch4/netmod/ofi/ofi_win.c

+                                        0ULL,   /* In:  flags               */
+                                        &MPIDI_OFI_WIN(win).mr, /* Out: memregion object    */
+                                        NULL), rc);     /* In:  context             */
+    } else if (win->create_flavor == MPI_WIN_FLAVOR_DYNAMIC)


It is visually disturbing when some branches of an if/else chain omit braces.

src/mpid/ch4/netmod/ofi/ofi_win.c

src/mpid/ch4/src/ch4_rma.h

minsii · 2020-09-28T16:52:21Z

test:mpich/ch4/most

hzhou

Looks good to me.

Previous code always skipped memory registration for dynamic windows and disabled native RMA for them. This patch allows native RMA in such cases through three steps:

1. Instead of always skipping fi_mr_reg when base is NULL at window creation, try to register if FI_MR_ALLOCATED is not set. If registration succeeds, native RMA can be enabled because the registration covers the entire virtual address space.
2. If FI_MR_ALLOCATED is set, only the allocated buffers can be registered. For dynamic windows, we can register each attached buffer and collectively exchange the mr_keys if the coll_attach hint is set.
3. On the RMA issuing path, we check whether a remote mr_key can be found that covers the entire range of the target buffer (disp:disp+extent). If so, we can use native RMA/atomics; otherwise, we fall back to the AM path. This ensures correctness for operations that access multiple attached segments.

Two additional AVL trees are needed per window:

- dwin_mrs: stores local MRs. Used to close the MR at win_detach.
- dwin_target_mems: stores remote <addr, mr_key> pairs. Used to prepare parameters for native calls.

Co-Author: Hui Zhou <hzhou321@anl.gov>
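The range check in step 3 can be illustrated with a small lookup over the remote <addr, mr_key> records. This is a sketch under stated assumptions, not the PR's implementation: it uses a sorted array with binary search instead of an AVL tree, and the `target_mem_t` type and `find_covering_key` function are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one buffer attached at the target. */
typedef struct {
    uintptr_t base;    /* start address of the attached buffer */
    size_t len;        /* length of the buffer in bytes */
    uint64_t mr_key;   /* key received from the target for this buffer */
} target_mem_t;

/*
 * Returns true and stores the key if a single segment covers
 * [disp, disp + extent). `segs` must be sorted by base and non-overlapping.
 * A false result means the caller should fall back to the AM path.
 */
bool find_covering_key(const target_mem_t *segs, size_t nsegs,
                       uintptr_t disp, size_t extent, uint64_t *mr_key)
{
    size_t lo = 0, hi = nsegs;

    /* Binary search: after the loop, lo counts the segments with base <= disp. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (segs[mid].base <= disp) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo == 0) {
        return false;   /* disp lies before every attached segment */
    }

    const target_mem_t *s = &segs[lo - 1];
    size_t off = (size_t) (disp - s->base);

    /* The whole range must fit inside this one segment. */
    if (off > s->len || extent > s->len - off) {
        return false;
    }
    *mr_key = s->mr_key;
    return true;
}
```

Note that a range spanning two adjacent segments is rejected even if the memory is contiguous, because each segment has its own key; that is the case the AM fallback handles.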

The single test file generates three types of tests:

1. A separate test for each of put, acc, get_acc, fop, and cas accessing a single attached region of a dynamic window (arg=-count=4096 or 1).
2. A separate test for each of put, acc, and get_acc accessing across two attached regions (arg=-count=9216).
3. The same set of tests as above, but with the coll_attach hint.

minsii force-pushed the pr/dwin-coll-attach branch 2 times, most recently from 6ffd67c to 36d91d8 Compare September 23, 2020 04:12

minsii force-pushed the pr/dwin-coll-attach branch 8 times, most recently from 70ee44b to bd9bdeb Compare September 25, 2020 20:47

minsii requested a review from hzhou September 25, 2020 20:54

hzhou requested changes Sep 25, 2020

minsii force-pushed the pr/dwin-coll-attach branch from bd9bdeb to 72bd21d Compare September 26, 2020 05:20

hzhou mentioned this pull request Sep 26, 2020

ch4/ofi: always try fi_mr_reg and then decide whether to fallback #4771

Closed

10 tasks

minsii force-pushed the pr/dwin-coll-attach branch 2 times, most recently from af7a4cb to 402d0ef Compare September 28, 2020 04:49

hzhou requested changes Sep 28, 2020

minsii force-pushed the pr/dwin-coll-attach branch 3 times, most recently from 738cdcc to 342930e Compare September 28, 2020 16:42

hzhou approved these changes Sep 28, 2020

minsii added 3 commits September 28, 2020 14:12

ch4/win: introduce info coll_attach for dynamic win

48158f7

It indicates that all win_attach|detach calls for this window are collective over all processes in the communicator. Valid only for dynamic windows. False by default.

ch4: define macro to get dtype contig, size, extent and lb

aeaf0cc

minsii added 7 commits September 28, 2020 14:12

test/rma: extent dynamic win tests to add coll_attach

fd1f397

test/rma: add test calling multiple attachs with dynamic win

42fb24b

ch4/rma: add winattr dump debug in each operation

5a7eb2c

ch4/rma: do not update winattr bits after set info

d2a017d

Revert bd5cfdb

ch4/rma: automatically set ACCU_NO_SHM winattr

9d73896

If every node hosts only a single local process, we can enable ACCU_NO_SHM. The netmod can then control whether all atomics use AM or all use native calls.
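The condition described in this commit message can be sketched as a check that no two ranks share a node. The function name and the `node_ids` array are hypothetical; how MPICH actually detects this is not shown in the conversation.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * node_ids[r] is the id of the node hosting rank r (a hypothetical input).
 * Returns true if every node hosts exactly one of the given ranks, which is
 * the situation in which no shared-memory path is needed for atomics.
 */
bool all_nodes_single_process(const int *node_ids, size_t nranks)
{
    /* Quadratic duplicate scan: simple and adequate for a sketch. */
    for (size_t i = 0; i < nranks; i++) {
        for (size_t j = i + 1; j < nranks; j++) {
            if (node_ids[i] == node_ids[j]) {
                return false;   /* two ranks share a node */
            }
        }
    }
    return true;
}
```

For example, ranks placed on nodes `{0, 1, 2, 3}` satisfy the condition, while `{0, 1, 1, 2}` does not because two ranks share node 1.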

ch4/ofi: add dbg log print for atomics condition check

8e6a4b7

minsii force-pushed the pr/dwin-coll-attach branch from 342930e to 8e6a4b7 Compare September 28, 2020 19:12

minsii merged commit bd5ba1e into pmodels:main Sep 28, 2020

minsii deleted the pr/dwin-coll-attach branch September 28, 2020 19:33

minsii mentioned this pull request Sep 28, 2020

Using single dynamic window for symm heap and symm data in OSHMPI pmodels/oshmpi#63

Closed

3 tasks
