-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Queue-less remote send/recv #105
Conversation
6c5b169
to
a283e83
Compare
src/transport/MpiMessageEndpoint.cpp
Outdated
const std::shared_ptr<faabric::MPIMessage>& msg) | ||
{ | ||
// TODO - is this lazy init very expensive? | ||
if (sendMessageEndpoint.socket == nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In normal circumstances, I would open both sendMessageEndpoint
and recvMessageEndpoint
in the constructor. However, this is very problematic for testing. That's because:
mpiMessageEndpoints
arestatic thread local
, thus we need to create different worlds in separate threads.- If both
send
andrecv
sockets are opened in the constructor, bothrecv
sockets will try to bind to the same adress + port, as they are on the same network namespace, yielding an error (even though we are only using one, in the simplest scenario).
Not that I believe branching could be the bottleneck here, but this is a case where testing influences implementation. Is this what TDD looks like in practice? 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the implementation relies on TLS then unfortunately the tests should exercise that TLS. As you say, this means we'd need to create worlds in different threads.
Re. the ports and testing, the solution is probably to make the base port configurable, then change the value in tests. In this case, each world could take a port offset as a constructor argument, set to the value we currently use by default, but overridden in tests.
Making this configurable arguably makes these classes more flexible and reusable so I'd say it's a +1 for testing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, we'll first try with binding to a specific (protocol, address, port) tuple, and will leave port management until the multi-tenancy PR is under-way
149105c
to
cfaa4bc
Compare
63706a4
to
cdaf39e
Compare
cdaf39e
to
421234d
Compare
// Initialise the endpoint vector if not initialised | ||
if (mpiMessageEndpoints.size() == 0) { | ||
for (int i = 0; i < size * size; i++) { | ||
mpiMessageEndpoints.emplace_back(nullptr); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see you've ported these changes into the other PR which I reviewed accidentally (#111), so the comments apply to both.
Relevant comments from the other PR:
a) Rather than spread it around, I think lazy initialisation for the rank should all happen in one place.
b) I think you can do mpiMessageEndpoints = std::vector(size * size, nullptr)
and save a few lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a) Indeed.
b) mpiMessageEndpoints
is a vector of std::unique_ptr
so it has a deleted =
operator. Thus I need to emplace nullptr
s back. Note that this also answers your concern in #111 wrt mpiMessageEndpoint[index
being actually initialised to nullptr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's annoying, makes sense though.
src/scheduler/MpiWorld.cpp
Outdated
// Get host for recv rank | ||
std::string host = getHostForRank(recvRank); | ||
assert(!host.empty()); | ||
assert(host != thisHost); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment re. assertions and invalid ranks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the first assert
can go as we actually check this in getHostForRank
but the latter has to do with logic particular to this method. I.e. if we send to a remote rank, the sending rank, must actually be in a remote host.
src/transport/MpiMessageEndpoint.cpp
Outdated
const std::shared_ptr<faabric::MPIMessage>& msg) | ||
{ | ||
// TODO - is this lazy init very expensive? | ||
if (sendMessageEndpoint.socket == nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the implementation relies on TLS then unfortunately the tests should exercise that TLS. As you say, this means we'd need to create worlds in different threads.
Re. the ports and testing, the solution is probably to make the base port configurable, then change the value in tests. In this case, each world could take a port offset as a constructor argument, set to the value we currently use by default, but overridden in tests.
Making this configurable arguably makes these classes more flexible and reusable so I'd say it's a +1 for testing!
} | ||
|
||
TEST_CASE_METHOD(RemoteMpiTestFixture, | ||
"Test collective messaging across hosts", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests weren't using two different threads (not sure why they were passing). Given that I had to change them, I also separated them into smaller tests, easier to identify which fails.
359127a
to
527541c
Compare
527541c
to
886d0f5
Compare
@@ -117,6 +118,19 @@ class ConfTestFixture | |||
faabric::util::SystemConfig& conf; | |||
}; | |||
|
|||
class MessageContextFixture : public SchedulerTestFixture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to move this here to use it in the MpiMessageEndpoint
tests.
In this PR I introduce rank-to-rank messaging bypassing the central message endpoint server.
Sending and receiving messages is in the hot path for most workloads we run, thus we want to introduce the least contention. A low-hanging-fruit is allowing ranks to open messaging endpoints themselves.
As discussed offline, this exposes a bug not covered in the tests, for which I provide a solution in #111.