-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dirty page checking and snapshot diffs #96
Conversation
@@ -9,6 +9,8 @@ namespace faabric::scheduler { | |||
// ----------------------------------- | |||
// Mocking | |||
// ----------------------------------- | |||
std::mutex mockMutex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was seeing race conditions on the mock methods when running in threaded tests. This mocking stuff is getting a little excessive to be held in the main source for the clients. A nicer way would be to use the polymorphism of messages to have a single buffer where we keep all the messages a client has sent, but this would require some nasty casting in tests that want to access those messages.
src/scheduler/Scheduler.cpp
Outdated
SnapshotClient& Scheduler::getSnapshotClient(const std::string& otherHost) | ||
{ | ||
// Note, our keys here have to include the tid as the clients can only be | ||
// used within the same thread |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@csegarragonz I don't know how this isn't failing for the function call clients, as the functionCallClients
map is shared by all threads and keyed on the other hostname, so I would imagine the same client is getting used by different threads. However, it seems there aren't any assertions failing, so I guess that isn't true...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not failing as we do the assertions, but I'm indeed surprised that it is not. Is your approach thread-safe either? I see that different threads would only ever access different keys, but I am wondering if this won't leave the unerlying data structure in an undefined state.
I guess I assumed the scheduler was single-threaded. In MPI we use TLS for the clients, but it's painful to clean up (as it must be explicitely cleaned, from the same thread which opened it). Happy to discuss this offline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scheduler is not single-threaded, it's a singleton object that gets called by lots of threads, so thread safety is still very important. The confusion may have arisen because the scheduler used to manage its own thread pool but now doesn't actually spawn any threads (the executors do that), this does not mean it's only ever run in a single thread though.
You're right on the map accesses, there should be a lock around this as the map itself isn't thread safe.
tests/utils/fixtures.h
Outdated
#include <faabric/util/testing.h> | ||
|
||
namespace tests { | ||
class BaseTestFixture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This BaseTestFixture
is here to capture the stuff I found we were repeating in all the existing features. We have to be careful not to make this a dumping ground, and only included the tidy-up that's actually going to get repeated elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally doubt that this is useful/desirable. The main point of fixtures was to remove the dumping ground cleanFaabric()
had turned into. However, I feel we have just moved the burden of it to BaseTestFixture
.
For instance, why would the transport tests redis.flushAll()
? I think there's value in knowing what you are modifying in your tests, and resetting just that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I did think that when creating this, the thing is with Redis and the scheduler for example, the snippets of clean-up would have to be sprinkled around so many places that it makes sense to avoid repetition.
The solution I think should be a fixture per feature and have multiple inheritance in the tests, i.e. we have a RedisFixture
, a StateFixture
, a MockClientsFixture
etc. I'll see if that works here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments, overall LGTM, and the dirty page tracking looks great and neat. I know you started this before we agreed on the new PR+commit structure, but for the future, 15 hundred lines added with four commits is very hard to review in a per-commit fashion (which btw you can do using n
and p
in the browser).
Wrt the PR size/topics I feel that all the fixture changes had really nothing to do with dirty page checking, so this could definately be in a different PR.
snapshotDiffs.size(), | ||
funcStr, | ||
h); | ||
SnapshotClient& c = getSnapshotClient(h); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite stylistic and a matter of preference, but do you think we could one-line the getting + push? This is how its done with getFunctionCallClient
so we'd be consistent throughout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem like a big deal to me, although they are both functions to get clients they're not actually related in any other way, so I'm not sure consistency matters. I guess we could come up with a rule that if we call a "get something" on one line, then it's only used to call a single method on the next line, that the "get something" should be inlined with the method call on the next line? There's probably not much consistency on that throughout the codebase.
src/scheduler/SnapshotClient.cpp
Outdated
} else { | ||
// Send the header first | ||
sendHeader(faabric::scheduler::SnapshotCalls::PushSnapshot); | ||
|
||
faabric::util::SystemConfig& conf = faabric::util::getSystemConfig(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe mark as const
as requesting a reference seems counter-intuitive (as we don't modify anything).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep good point on the const
, but remember that even without the const
, requesting a reference avoids a copy so I wouldn't say it's counter-intuitive.
mb.Finish(requestOffset); | ||
uint8_t* msg = mb.GetBufferPointer(); | ||
int size = mb.GetSize(); | ||
send(msg, size); | ||
|
||
// Await a response as this call must be synchronous | ||
awaitResponse(SNAPSHOT_PORT + REPLY_PORT_OFFSET); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the MessageEndpointServer
we automatically include the REPLY_PORT_OFFSET
programatically. I am wondering if we could do so as well in awaitResponse
, as its quite an implementation detail that we may hide from the interface.
I guess the original motivation not to do so was to be able to awaitResponse
on an arbitrary port, but we use it nowhere (i.e. without the OFFSET
).
Feel free to ignore and I will do this in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah i think doing it automatically inside the method makes sense. The only downside is that it adds a bit of magic that may not be obvious to the caller, but I think it's worth it. Let's do in another PR though to try and keep a lid on the size of this one.
src/scheduler/SnapshotClient.cpp
Outdated
|
||
// Send the data | ||
mb.Finish(requestOffset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is now used sufficiently many times to be a candidate for being macro-ed. I suggest something like:
#define SEND_FLATBUFFER(header, mb, offset, waitResponse) \
sendHeader(header); \
mb.Finish(offset); \
uint8_t* msg = mb.GetBufferPointer(); \
int size = mb.GetSize(); \
if (waitResponse) { \
awaitResponse(SNAPSHOT_PORT + REPLY_PORT_OFFSET); \
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep good spot.
src/scheduler/SnapshotServer.cpp
Outdated
const SnapshotDiffPushRequest* r = | ||
flatbuffers::GetMutableRoot<SnapshotDiffPushRequest>(msg.udata()); | ||
|
||
faabric::util::getLogger()->info("Receiving {} diffs to snapshot {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for info
here
char value[] = "4"; | ||
size_t nWritten = fwrite(value, sizeof(char), 1, fd); | ||
if (nWritten != 1) { | ||
throw std::runtime_error("Failed to write to clear_refs"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this happen in threads? If so could we logger->error(...) + throw
. I'm afraid these exceptions will fly under the radar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the exceptions in here will generally only happen if there's a permissions error on the file itself (or if the kernel doesn't support soft dirty PTEs which is unlikely as they were added in 3.x), so it will only happen if we've misconfigured an environment. In that case, this exception will be triggered from both the main thread and child threads.
I've added a lot more logging now which should make things a little clearer.
std::vector<bool> dirtyFlags = faabric::util::getDirtyPages(data, nPages); | ||
|
||
// Convert to snapshot diffs | ||
// TODO - reduce number of diffs by merging adjacent dirty pages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we tackle the TODO
in this PR? I feel it should not be very complex to implement. I was thinking of something like:
auto next1 = std::find(dirtyFlags.begin(), dirtyFlags.end(), 1);
assert(next != dirtyFlags.end());
auto next0 = [&](){ return std::find(next1, dirtyFlags.end(), 0); };
while (next0() != dirtyFlags.end()) {
// May have to +/- 1
auto size = next0() - next1;
uint32_t offset = next1 * faabric::util::HOST_PAGE_SIZE;
diffs.emplace_back(offset, data + offset, size * faabric::util::HOST_PAGE_SIZE;
next1 = std::find(next0, dirtyFlags.end(), 1);
}
// handle the last chunk (i.e. when the vector ends with ones)
there's provably a nicer way, and the code above will be full of bugs, but just an idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may be premature optimisation which is why I left it as a TODO. I'm not sure how often this will be beneficial and am reluctant to add more complexity. I'll revisit once I've done some benchmarking.
tests/utils/fixtures.h
Outdated
#include <faabric/util/testing.h> | ||
|
||
namespace tests { | ||
class BaseTestFixture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally doubt that this is useful/desirable. The main point of fixtures was to remove the dumping ground cleanFaabric()
had turned into. However, I feel we have just moved the burden of it to BaseTestFixture
.
For instance, why would the transport tests redis.flushAll()
? I think there's value in knowing what you are modifying in your tests, and resetting just that.
src/scheduler/Scheduler.cpp
Outdated
SnapshotClient& Scheduler::getSnapshotClient(const std::string& otherHost) | ||
{ | ||
// Note, our keys here have to include the tid as the clients can only be | ||
// used within the same thread |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not failing as we do the assertions, but I'm indeed surprised that it is not. Is your approach thread-safe either? I see that different threads would only ever access different keys, but I am wondering if this won't leave the unerlying data structure in an undefined state.
I guess I assumed the scheduler was single-threaded. In MPI we use TLS for the clients, but it's painful to clean up (as it must be explicitely cleaned, from the same thread which opened it). Happy to discuss this offline.
Yes let's not be too strict on this one, I think we can aim for the
Yes, again in an ideal world, but let's not let great be the enemy of the good. "nothing to do" is a little strong, it avoided repeating code in the new tests I was writing. |
@csegarragonz the latest commit is to address the same issue we saw before in Faasm, where there's a race condition when we're getting an object from a map, intialising it if it isn't there already. Without the shared locking it's possible for another thread to come in while the object is being initialised and see (a) partially initialised object, or (b) the map will be in an inconsistent state. I've added the fix around getting loggers, getting function call clients and getting snapshot clients in the scheduler. |
…xture to use shared superclasses
…on dist test executor
@@ -1,7 +1,5 @@ | |||
#!/bin/bash | |||
|
|||
set -e | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To handle failures in the worker and still be able to print logs we have to switch off set -e
{ | ||
faabric::util::SharedLock lock(snapshotClientsMx); | ||
return snapshotClients.at(key); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were the changes to getSnapshotClient
and getFunctionCallClient
that address the issues of cross-thread clients and race conditions.
{ | ||
faabric::util::SharedLock lock(loggerMx); | ||
return loggers[name]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same race condition issue mentioned previously. Given the proposed move to using spdlog macros we don't need to worry too much about the performance hit this will have (which will probably be negligible anyway unless called in a very tight loop)
faabric::util::SnapshotData snapshot() override; | ||
|
||
uint8_t* snapshotMemory = nullptr; | ||
size_t snapshotSize = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally I hadn't set defaults on the snapshotMemory
and snapshotSize
, which worked on my machine as the size was getting set to zero, but in Github Actions it was getting set to a non-zero value which was breaking everything. The take-away is that we must always set default values on class members. I looked for a clang-tidy rule to enforce this but couldn't find it.
The aim of this PR is:
Changes:
snapshot()
method to executor subclasses so that Faabric can request a pointer to their memory (which it then uses to work out dirty pages).logger
an instance variable where possible to avoid callinggetLogger
all over the place.Because the diffs from child threads have to be applied before other threads can continue, we have to make sure that they are applied before the thread result is returned (as the thread result is the thing that other threads wait on). Therefore we have to piggyback the diffs on the existing thread result message.
I've ported the thread result message to use flatbuffers which is where all the other snapshot stuff lives. For now I've put it in with the existing
SnapshotClient
andSnapshotServer
in an attempt to minimise the number of changes in this PR, but eventually we'll need to commit to either protobuf or flatbuffers. Flatbuffers seem potentially more efficient, but the awkwardness of passing them around and accessing the data within is a major downside.