
shm: Intra-node topology aware trees #3727

Merged
merged 5 commits into pmodels:master on May 1, 2019

Conversation

jain-surabhi-23

Topology aware trees take machine topology into account to optimize data movement between ranks mapped on the same node. They deliver significant performance benefits when used along with shared-memory-based intra-node collectives.

@jain-surabhi-23
Author

This PR also includes the patches from PR #3490. So ideally that PR should be reviewed first.

@wesbland
Contributor

test:jenkins/ch4/most

@wesbland
Contributor

wesbland commented Apr 16, 2019

@raffenet / @yfguo - How do you want to proceed with this PR and #3490? Do you want to keep both of them, or just this one, since it improves the performance of the other one?

Also, who should be the reviewer on this one? I assume not @minsii.

@jain-surabhi-23
Author

@yfguo This is the follow-up PR to the intra-node collectives work. It has been rebased. Code-wise it should be a smaller PR to review :)

@yfguo
Contributor

yfguo commented Apr 24, 2019

I will review it.

@yfguo
Contributor

yfguo commented Apr 24, 2019

test:jenkins/ch4/ofi

@yfguo
Contributor

yfguo commented Apr 24, 2019

test:jenkins/ch3/ofi

@@ -1351,6 +1351,8 @@ HWLOC_DO_AM_CONDITIONALS
AM_CONDITIONAL([have_hwloc], [test "${have_hwloc}" = "yes"])
AM_CONDITIONAL([use_embedded_hwloc], [test "${use_embedded_hwloc}" = "yes"])

AM_CONDITIONAL([HAVE_HWLOC], [test x$have_hwloc = xyes])
Contributor


This is redundant since HAVE_HWLOC is defined in line 1346. Please remove this.

Author


Without this change I was not able to use “HAVE_HWLOC” in the Makefile. I can try the lower case “have_hwloc” to see if that works.

Author


@yfguo I was looking at your PR #3766. So in this PR I can actually use "HYDRA_HAVE_HWLOC" (assuming that PR gets merged first). Right?

Contributor


No. HYDRA_HAVE_HWLOC is defined in Hydra's configure.ac, so we should not rely on it for MPID code. If HAVE_HWLOC is not working, I need to look into the problem and fix it along with the other changes in PR #3766.

src/mpid/ch4/shm/posix/posix_coll_init.c (review thread outdated, resolved)
test/mpi/coll/testlist.def (review thread outdated, resolved)
Contributor

@yfguo left a comment


Some minor changes are needed.

@jain-surabhi-23
Author

test:jenkins/ch4/ofi

jainsura-intel and others added 5 commits May 1, 2019 11:34
Signed-off-by: Yanfei Guo <yguo@anl.gov>
This is required to selectively exclude hwloc-related files from a Makefile.

Signed-off-by: Yanfei Guo <yguo@anl.gov>
When SHM collectives and NM-based collectives are both enabled, break the circular calls between invoking the NM/SHM bcast from shm allocate and invoking shm allocate from the SHM (release_gather) based bcast. Hence, the MPI-level Bcast is called instead.

Signed-off-by: Yanfei Guo <yguo@anl.gov>
Break the strange recursion that happens when this Bcast gets called and in turn triggers the SHM Bcast. Use MPIR_Bcast_impl instead (a sketch of this fallback follows below).

Signed-off-by: Yanfei Guo <yguo@anl.gov>
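
The two commits above describe a cycle: allocating the shared-memory region needs a bcast, while the SHM (release_gather) bcast needs the shared-memory region. As a rough illustration, here is a minimal standalone C sketch of that dependency and of the fallback to an MPI-level bcast. All names in it (shm_allocate, shm_release_gather_bcast, mpi_level_bcast) are hypothetical stand-ins rather than MPICH internals; mpi_level_bcast merely plays the role that MPIR_Bcast_impl plays in the actual change, and the real code paths are considerably more involved.

```c
/* Minimal standalone model of the circular dependency; every name here is a
 * hypothetical stand-in, not MPICH code. */
#include <stdbool.h>
#include <stdio.h>

static bool shm_region_ready = false;

/* Stand-in for a topology-agnostic, MPI-level bcast (the role of MPIR_Bcast_impl). */
static void mpi_level_bcast(int *val, int root)
{
    (void) root;
    printf("MPI-level bcast of %d\n", *val);
}

/* Allocating the shared-memory region needs a bcast so all local ranks agree
 * on the handle. Calling the SHM bcast here would recurse back into this
 * function, so the MPI-level bcast is used instead. */
static void shm_allocate(void)
{
    int handle = 42;                /* pretend handle created by the node root */
    mpi_level_bcast(&handle, 0);    /* safe: does not depend on the SHM region */
    shm_region_ready = true;
    printf("shm region allocated, handle = %d\n", handle);
}

/* The SHM (release_gather style) bcast lazily sets up the shm region. If
 * shm_allocate called this bcast, the two would call each other forever. */
static void shm_release_gather_bcast(int *val, int root)
{
    if (!shm_region_ready)
        shm_allocate();
    (void) root;
    printf("SHM bcast of %d\n", *val);
}

int main(void)
{
    int x = 7;
    shm_release_gather_bcast(&x, 0);
    return 0;
}
```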
Creates collective-specific trees which leverage the memory hierarchy of a node. On a sample machine with multiple sockets and multiple cores per socket, where each rank is bound to a core, one rank per socket is chosen as the leader rank. For a bcast-friendly tree, a socket leader interacts with the other socket leaders first, then with the ranks mapped on its socket. For a reduce-friendly tree, a socket leader interacts with the ranks within the same socket first, then with the other socket leaders. A left-skewed tree is created for bcast and a right-skewed tree (children are added in the reverse order) for reduce; a minimal sketch of the two orderings follows below.
Impact and benefits of these optimizations have been studied in the Supercomputing 2018 paper "Framework for scalable intra-node collective operations using shared memory" by Jain et al.
https://dl.acm.org/citation.cfm?id=3291695

Co-Authored-by: Surabhi Jain <surabhi.jain@intel.com>
Signed-off-by: Yanfei Guo <yguo@anl.gov>
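
As a rough illustration of the child ordering described in the last commit, the following standalone C sketch prints the children of the root's socket leader for both tree shapes. The machine layout (2 sockets, 4 ranks per socket, rank i bound to socket i/4, the lowest rank on each socket acting as leader, rank 0 as the tree root) and every name in the code are illustrative assumptions, not the actual MPICH tree-construction code.

```c
/* Standalone sketch of the bcast-friendly vs. reduce-friendly child ordering. */
#include <stdio.h>

#define NUM_SOCKETS      2
#define RANKS_PER_SOCKET 4

int main(void)
{
    int root_leader = 0;

    /* Bcast-friendly (left-skewed) tree: the root's socket leader talks to
     * the other socket leaders first, then to the ranks on its own socket. */
    printf("bcast children of rank %d:", root_leader);
    for (int s = 1; s < NUM_SOCKETS; s++)
        printf(" %d", s * RANKS_PER_SOCKET);      /* other socket leaders */
    for (int r = 1; r < RANKS_PER_SOCKET; r++)
        printf(" %d", root_leader + r);           /* ranks on its own socket */
    printf("\n");

    /* Reduce-friendly (right-skewed) tree: children are added in the reverse
     * order, so on-socket ranks come before the other socket leaders. */
    printf("reduce children of rank %d:", root_leader);
    for (int r = RANKS_PER_SOCKET - 1; r >= 1; r--)
        printf(" %d", root_leader + r);           /* own socket, reversed */
    for (int s = NUM_SOCKETS - 1; s >= 1; s--)
        printf(" %d", s * RANKS_PER_SOCKET);      /* other leaders, reversed */
    printf("\n");

    /* Non-root socket leaders would have only their on-socket ranks as
     * children, and non-leader ranks would be leaves. */
    return 0;
}
```

With these parameters the bcast-friendly tree lists the other socket leader (rank 4) before the on-socket ranks 1, 2, 3, while the reduce-friendly tree lists 3, 2, 1 before 4, matching the left-skewed and right-skewed shapes described in the commit message.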
@yfguo
Contributor

yfguo commented May 1, 2019

test:jenkins/ch4/ofi

@yfguo
Contributor

yfguo commented May 1, 2019

Final test before merging.

Contributor

@yfguo left a comment


After testing without the hacks that enable izem, I think this one commit should be excluded from the PR, as release_gather does not work without izem.

And I can do that.

@jain-surabhi-23
Author

@yfguo Yes. You are right. The last commit "posix: Enable shm based intra node colls by default" should be removed.

@yfguo
Contributor

yfguo commented May 1, 2019

Thanks. I just did another push. Once the test is clean, I will merge it.

@yfguo
Contributor

yfguo commented May 1, 2019

test:jenkins/ch4/ofi

@yfguo merged commit 35d2fbf into pmodels:master on May 1, 2019