Use NEVER_ALONE for chained MPI calls #184

Merged: 3 commits merged into master on Nov 29, 2021

Conversation

@csegarragonz (Collaborator, Author) commented on Nov 25, 2021

In this PR I set MPI functions to use the NEVER_ALONE scheduling topology hint by default.

Preliminary results show that this policy reduces the number of cross-host messages by 20% in our baseline experiment.
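For context, here is a minimal standalone sketch of what the NEVER_ALONE placement rule amounts to; the `Host` struct, the `scheduleRanks` function and the slot layout are made up for illustration and are not Faabric's actual scheduler code:

```cpp
// Standalone sketch (not Faabric's scheduler; names are illustrative only).
// The NEVER_ALONE hint means an MPI rank is never scheduled on its own on a
// host: if spilling over to the next host would leave a single rank there,
// that rank is kept with the previous group instead, so more pairs of ranks
// share a host and fewer point-to-point messages have to cross hosts.
#include <algorithm>
#include <string>
#include <vector>

struct Host
{
    std::string name;
    int slots;
};

std::vector<std::string> scheduleRanks(int nRanks,
                                       const std::vector<Host>& hosts,
                                       bool neverAlone)
{
    std::vector<std::string> placement;

    for (const Host& h : hosts) {
        int remaining = nRanks - (int)placement.size();
        if (remaining <= 0) {
            break;
        }

        int group = std::min(remaining, h.slots);

        // With NEVER_ALONE, refuse to start a new host with a single rank;
        // over-subscribe the previous host by one instead.
        if (neverAlone && group == 1 && !placement.empty()) {
            placement.push_back(placement.back());
            break;
        }

        for (int i = 0; i < group; i++) {
            placement.push_back(h.name);
        }
    }

    return placement;
}

// e.g. scheduleRanks(5, {{"hostA", 4}, {"hostB", 4}}, true) keeps all five
// ranks on hostA, whereas the default policy would place a lone rank on
// hostB, making every message to or from it a cross-host message.
```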

I have had to update some tests in test_remote_mpi_worlds.cpp, as they assumed a specific scheduling behaviour.

@csegarragonz added the mpi (Related to the MPI implementation) and scheduler labels on Nov 25, 2021
@csegarragonz self-assigned this on Nov 25, 2021
tests/test/scheduler/test_remote_mpi_worlds.cpp
@@ -201,61 +202,14 @@ TEST_CASE_METHOD(RemoteMpiTestFixture,
thisWorld.destroy();
}

TEST_CASE_METHOD(RemoteMpiTestFixture, "Test barrier across hosts", "[mpi]")
@csegarragonz (Collaborator, Author) commented on Nov 26, 2021

With the new scheduling policy, we will always have at least two remote processes. Thus, we can't test a barrier across hosts without using a distributed setting (both remote ranks must call barrier to unlock).

I managed to port the other tests in this file.
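As a standalone illustration of that point (plain C++ with `std::barrier` rather than the Faabric MpiWorld API, so the names below are not from the test file), both participants of a two-rank barrier have to call it concurrently before either is released:

```cpp
// Minimal illustration: a barrier over two "remote" ranks only releases once
// both ranks have called it, so each rank needs its own thread of execution.
// A single-threaded local fake of the remote host cannot drive this.
#include <barrier>
#include <cstdio>
#include <thread>

int main()
{
    std::barrier worldBarrier(2); // two remote ranks in the world

    auto rankFn = [&](int rank) {
        std::printf("rank %d entering barrier\n", rank);
        worldBarrier.arrive_and_wait(); // blocks until the other rank arrives
        std::printf("rank %d released\n", rank);
    };

    // Calling rankFn(0) and then rankFn(1) from one thread would block
    // forever inside the first call; two concurrent callers are required.
    std::thread t0(rankFn, 0);
    std::thread t1(rankFn, 1);
    t0.join();
    t1.join();
}
```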

A reviewer (Collaborator) commented:

Ok nice, we have quite a lot of this kind of test where we're trying to fake a distributed setting locally, all of which date from before the distributed tests.

Is there a dist test that covers this now? I can't remember. If not, could we add one? Happy to discuss specifics of this offline as I'm not sure we have any Faabric dist tests that cover the MPI implementation.

In future it might be worth trying to port things to dist tests rather than keep the hacky local versions, but in this instance it seems to work ok.

@csegarragonz (Collaborator, Author) replied:

I am introducing distributed tests for MPI (inheriting from mpi-native in #186). The order in which we merge this PR and the other one does not really matter.


Two further review comments on tests/test/scheduler/test_remote_mpi_worlds.cpp were marked outdated and resolved.
@csegarragonz merged commit 657a506 into master on Nov 29, 2021
@csegarragonz deleted the mpi-topo-hint branch on Nov 29, 2021 at 19:13
Labels: mpi (Related to the MPI implementation), scheduler
Projects: none yet
Development: no linked issues
Participants: 2