New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
shm: Intra-node topology aware trees #3727
Conversation
This PR also includes the patches from PR #3490. So ideally that PR should be reviewed first. |
fb4c278
to
f48be9a
Compare
test:jenkins/ch4/most |
f48be9a
to
63c2008
Compare
e8ed3e0
to
601d3dc
Compare
@yfguo This is the follow up PR of intra-node collectives. It has been rebased. Code wise it should be smaller PR to review :) |
I will review it. |
test:jenkins/ch4/ofi |
test:jenkins/ch3/ofi |
@@ -1351,6 +1351,8 @@ HWLOC_DO_AM_CONDITIONALS | |||
AM_CONDITIONAL([have_hwloc], [test "${have_hwloc}" = "yes"]) | |||
AM_CONDITIONAL([use_embedded_hwloc], [test "${use_embedded_hwloc}" = "yes"]) | |||
|
|||
AM_CONDITIONAL([HAVE_HWLOC], [test x$have_hwloc = xyes]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant since HAVE_HWLOC
is defined in line 1346. Please remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this change I was not able to use “HAVE_HWLOC” in the Makefile. I can try the lower case “have_hwloc” to see if that works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. HYDRA_HAVE_HWLOC
is defined in hydra's configure.ac
so we should not rely on that for MPID code. If HAVE_HWLOC
is not working I need to look into this problem and fix it along with other changes in PR #3766.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor changes are needed.
601d3dc
to
4818cea
Compare
test:jenkins/ch4/ofi |
Signed-off-by: Yanfei Guo <yguo@anl.gov>
This is required to selectively exclude hwloc related files from a Makefile Signed-off-by: Yanfei Guo <yguo@anl.gov>
When SHM collectives and NM based collectives are both enabled, break the circular calls between calling NM/SHM bcast from shm allocate and calling shm allocate from SHM (release_gather) based bcast. Hence, mpi level Bcast is called. Signed-off-by: Yanfei Guo <yguo@anl.gov>
Break the strange recursion happening when this Bcast gets called which triggers the SHM Bcast. Use MPIR_Bcast_impl instead. Signed-off-by: Yanfei Guo <yguo@anl.gov>
Creates collective specific trees which leverage the memory hierarchy of a node. On a sample machine with multiple sockets and multiple cores per socket, where each rank is bound to a core, a rank per socket is chosen as the leader rank. For a bcast friendly tree, a socket leader interacts with other socket leaders first, then with the ranks mapped on the socket. For a reduce friendly tree, a socket leader interacts with ranks within the same socket, the with other socket leaders. A left skewed tree is created for bcast and right skewed tree (children are added in the reverse order) for reduce. Impact and benefits of these optimizations have been studied in the Supercomputing 2018 paper - "Framework for scalable intra-node collective operations using shared memory" by Jain et.al. https://dl.acm.org/citation.cfm?id=3291695 Co-Authored-by: Surabhi Jain <surabhi.jain@intel.com> Signed-off-by: Yanfei Guo <yguo@anl.gov>
4818cea
to
baad115
Compare
test:jenkins/ch4/ofi |
test final before merging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After testing without the hacks that enable izem, I think this one commit should be excluded from the PR as release_gather is not working without izem.
And I can do that.
@yfguo Yes. You are right. The last commit "posix: Enable shm based intra node colls by default" should be removed. |
baad115
to
7aa5ec7
Compare
Thanks. I just did a another push. Once the test is clean, I will merge it. |
test:jenkins/ch4/ofi |
Topology aware trees take machine topology into account to optimize data movement between ranks mapped on the same node. It deliver significant performance benefit when used along with shared memory based intra-node collectives.