Dirty page checking and snapshot diffs #96

Merged · 16 commits · Jun 1, 2021
Conversation

@Shillaker (Collaborator) commented May 21, 2021

The aims of this PR are:

  • Return diffs from the workers back to the master after executing a batch of threads. This means tracking dirty pages on worker hosts when executing a batch of threads, then returning the resulting memory diff back to the master when finished.
  • Push diffs from the master host to workers when executing subsequent batches of threads (i.e. push the whole snapshot to the host when executing the first batch, then only push diffs after that). This means tracking dirty pages on the master between executing batches of threads.

Changes:

  • Add pushing diffs between hosts. This is either pushing diffs on their own (master -> worker), or pushing diffs along with a thread result (worker -> master). See notes below on this.
  • Add dirty page checking based on soft-dirty PTEs (see the sketch after this list).
  • Add a snapshot() method to executor subclasses so that Faabric can request a pointer to their memory (which it then uses to work out dirty pages).
  • Convert both snapshot push operations (full and diffs) to be synchronous, i.e. await a response, as the master host needs to know the snapshot data has been pushed and set up before sending function messages.
  • Add test fixtures where possible to reduce the verbosity of tests.
  • Make the logger an instance variable where possible to avoid calling getLogger all over the place.
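
For context on the soft-dirty PTE approach mentioned above, here is a minimal, self-contained sketch of the kernel interface it relies on: writing "4" to /proc/self/clear_refs clears the soft-dirty bits, after which bit 55 of each 64-bit entry in /proc/self/pagemap reports whether that page has been written. Function names here are illustrative, not the actual faabric::util API (which exposes getDirtyPages):

#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <unistd.h>
#include <vector>

// Clear the soft-dirty bits for this process; the kernel then re-marks a
// page as soft-dirty on its next write
void resetDirtyTracking()
{
    FILE* fd = fopen("/proc/self/clear_refs", "w");
    if (fd == nullptr) {
        throw std::runtime_error("Failed to open clear_refs");
    }

    char value[] = "4";
    if (fwrite(value, sizeof(char), 1, fd) != 1) {
        fclose(fd);
        throw std::runtime_error("Failed to write to clear_refs");
    }

    fclose(fd);
}

// Return one dirty flag per page in the given region; bit 55 of each
// 64-bit pagemap entry is the soft-dirty bit
std::vector<bool> readDirtyFlags(const uint8_t* ptr, size_t nPages)
{
    long pageSize = sysconf(_SC_PAGESIZE);

    FILE* fd = fopen("/proc/self/pagemap", "rb");
    if (fd == nullptr) {
        throw std::runtime_error("Failed to open pagemap");
    }

    // Pagemap holds one 64-bit entry per virtual page, indexed by page number
    uintptr_t firstPage = reinterpret_cast<uintptr_t>(ptr) / pageSize;
    fseek(fd, static_cast<long>(firstPage * sizeof(uint64_t)), SEEK_SET);

    std::vector<uint64_t> entries(nPages);
    if (fread(entries.data(), sizeof(uint64_t), nPages, fd) != nPages) {
        fclose(fd);
        throw std::runtime_error("Failed to read pagemap");
    }
    fclose(fd);

    std::vector<bool> flags(nPages, false);
    for (size_t i = 0; i < nPages; i++) {
        flags[i] = (entries[i] >> 55) & 1;
    }

    return flags;
}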

Because the diffs from child threads have to be applied before other threads can continue, we have to make sure that they are applied before the thread result is returned (as the thread result is the thing that other threads wait on). Therefore we have to piggyback the diffs on the existing thread result message.

I've ported the thread result message to use flatbuffers which is where all the other snapshot stuff lives. For now I've put it in with the existing SnapshotClient and SnapshotServer in an attempt to minimise the number of changes in this PR, but eventually we'll need to commit to either protobuf or flatbuffers. Flatbuffers seem potentially more efficient, but the awkwardness of passing them around and accessing the data within is a major downside.
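
As a concrete illustration of that awkwardness, a minimal sketch of the build/read round trip. The schema fields, the generated CreateSnapshotDiffPushRequest call, and the free variables (snapshotKey, diffData, buffer) are hypothetical stand-ins, not the actual faabric definitions:

// Building: everything is created through the builder, then finalised
flatbuffers::FlatBufferBuilder mb;
auto keyOffset = mb.CreateString(snapshotKey);
auto dataOffset = mb.CreateVector(diffData.data(), diffData.size());
auto requestOffset = CreateSnapshotDiffPushRequest(mb, keyOffset, dataOffset);
mb.Finish(requestOffset);
send(mb.GetBufferPointer(), mb.GetSize());

// Reading: fields come back as flatbuffers types that need converting
// before they can be passed around as normal C++ objects
const auto* r = flatbuffers::GetRoot<SnapshotDiffPushRequest>(buffer);
std::string key = r->key()->str();
const uint8_t* diffPtr = r->data()->Data();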

@Shillaker Shillaker self-assigned this May 21, 2021
@Shillaker Shillaker changed the title from "Pushing snapshot diffs for updating thread shared memory" to "Dirty page checking" May 24, 2021
@Shillaker Shillaker changed the title from "Dirty page checking" to "Dirty page checking and snapshot diffs" May 24, 2021
@@ -9,6 +9,8 @@ namespace faabric::scheduler {
// -----------------------------------
// Mocking
// -----------------------------------
std::mutex mockMutex;
@Shillaker (Collaborator, author) commented:
I was seeing race conditions on the mock methods when running in threaded tests. This mocking stuff is getting a little excessive to be held in the main source for the clients. A nicer way would be to use the polymorphism of messages to have a single buffer where we keep all the messages a client has sent, but this would require some nasty casting in tests that want to access those messages.
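
For reference, the locking added around mock recording amounts to something like the following (the container and recording helper are illustrative, not the exact faabric code):

#include <faabric/proto/faabric.pb.h>

#include <mutex>
#include <string>
#include <utility>
#include <vector>

std::mutex mockMutex;
static std::vector<std::pair<std::string, faabric::Message>> sentMessages;

void recordSentMessage(const std::string& host, const faabric::Message& msg)
{
    // Threaded tests hit this concurrently, hence the lock
    std::unique_lock<std::mutex> lock(mockMutex);
    sentMessages.emplace_back(host, msg);
}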

@Shillaker Shillaker marked this pull request as ready for review May 26, 2021 09:22
@Shillaker Shillaker marked this pull request as draft May 26, 2021 15:24
@Shillaker Shillaker marked this pull request as ready for review May 27, 2021 11:00
SnapshotClient& Scheduler::getSnapshotClient(const std::string& otherHost)
{
// Note, our keys here have to include the tid as the clients can only be
// used within the same thread
@Shillaker (Collaborator, author) commented:
@csegarragonz I don't know how this isn't failing for the function call clients, as the functionCallClients map is shared by all threads and keyed on the other hostname, so I would imagine the same client is getting used by different threads. However, it seems there aren't any assertions failing, so I guess that isn't true...?

@csegarragonz (Collaborator) commented:
It is not failing, as we do the assertions, but I'm indeed surprised that it is not. Is your approach thread-safe, though? I see that different threads would only ever access different keys, but I am wondering if this won't leave the underlying data structure in an undefined state.

I guess I assumed the scheduler was single-threaded. In MPI we use TLS for the clients, but it's painful to clean up (as it must be explicitly cleaned, from the same thread that opened it). Happy to discuss this offline.

@Shillaker (Collaborator, author) commented:
The scheduler is not single-threaded; it's a singleton object that gets called by lots of threads, so thread safety is still very important. The confusion may have arisen because the scheduler used to manage its own thread pool but now doesn't actually spawn any threads (the executors do that). That doesn't mean it's only ever run in a single thread though.

You're right on the map accesses; there should be a lock around this as the map itself isn't thread-safe.

#include <faabric/util/testing.h>

namespace tests {
class BaseTestFixture
@Shillaker (Collaborator, author) commented:
This BaseTestFixture is here to capture the stuff I found we were repeating in all the existing tests. We have to be careful not to make this a dumping ground, and only include the tidy-up that's actually going to get repeated elsewhere.

@csegarragonz (Collaborator) commented:
I personally doubt that this is useful/desirable. The main point of fixtures was to remove the dumping ground that cleanFaabric() had turned into. However, I feel we have just moved that burden to BaseTestFixture.

For instance, why would the transport tests need to call redis.flushAll()? I think there's value in knowing what you are modifying in your tests, and resetting just that.

@Shillaker (Collaborator, author) commented:
Yes, I did think that when creating this. The thing is, with Redis and the scheduler for example, the snippets of clean-up would have to be sprinkled around so many places that it makes sense to avoid the repetition.

I think the solution is a fixture per feature, with multiple inheritance in the tests, i.e. we have a RedisFixture, a StateFixture, a MockClientsFixture etc. I'll see if that works here (a sketch below)...
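
A rough sketch of what that could look like, assuming per-feature clean-up helpers along these lines exist (class and method names are illustrative, not the final faabric test API):

namespace tests {

class RedisTestFixture
{
  public:
    RedisTestFixture()
      : redis(faabric::redis::Redis::getQueue())
    {
        // Each fixture resets only the feature it owns
        redis.flushAll();
    }

  protected:
    faabric::redis::Redis& redis;
};

class SchedulerTestFixture
{
  public:
    SchedulerTestFixture() { faabric::scheduler::getScheduler().reset(); }
};

// Tests needing several features inherit the relevant fixtures, so each
// test resets only what it actually touches
class SchedulerRedisTestFixture
  : public RedisTestFixture
  , public SchedulerTestFixture
{};
}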

@csegarragonz (Collaborator) left a review:
Left some comments; overall LGTM, and the dirty page tracking looks great and neat. I know you started this before we agreed on the new PR+commit structure but, for the future, 1500 lines added across four commits is very hard to review in a per-commit fashion (which, by the way, you can do using n and p in the browser).

Wrt the PR size/topics, I feel that all the fixture changes had really nothing to do with dirty page checking, so they could definitely be in a different PR.

include/faabric/scheduler/SnapshotClient.h (outdated review comment, resolved)
snapshotDiffs.size(),
funcStr,
h);
SnapshotClient& c = getSnapshotClient(h);
@csegarragonz (Collaborator) commented:
This is quite stylistic and a matter of preference, but do you think we could one-line the get + push? This is how it's done with getFunctionCallClient, so we'd be consistent throughout.

@Shillaker (Collaborator, author) commented May 28, 2021:
This doesn't seem like a big deal to me. Although they are both functions that get clients, they're not actually related in any other way, so I'm not sure consistency matters. I guess we could come up with a rule that if a "get something" call is only used to invoke a single method on the next line, the "get something" should be inlined with that method call. There's probably not much consistency on that throughout the codebase though.

} else {
// Send the header first
sendHeader(faabric::scheduler::SnapshotCalls::PushSnapshot);

faabric::util::SystemConfig& conf = faabric::util::getSystemConfig();
@csegarragonz (Collaborator) commented:
Maybe mark this as const, as requesting a non-const reference seems counter-intuitive (we don't modify anything).

@Shillaker (Collaborator, author) commented May 28, 2021:
Yep, good point on the const, but remember that even without the const, requesting a reference avoids a copy, so I wouldn't say it's counter-intuitive.

mb.Finish(requestOffset);
uint8_t* msg = mb.GetBufferPointer();
int size = mb.GetSize();
send(msg, size);

// Await a response as this call must be synchronous
awaitResponse(SNAPSHOT_PORT + REPLY_PORT_OFFSET);
@csegarragonz (Collaborator) commented:
In the MessageEndpointServer we include the REPLY_PORT_OFFSET programmatically. I am wondering if we could do so as well in awaitResponse, as it's quite an implementation detail that we could hide from the interface.

I guess the original motivation not to do so was to be able to awaitResponse on an arbitrary port, but we use that nowhere (i.e. we never call it without the offset).

Feel free to ignore and I will do this in a separate PR.

@Shillaker (Collaborator, author) commented:
Yeah, I think doing it automatically inside the method makes sense (see the sketch below). The only downside is that it adds a bit of magic that may not be obvious to the caller, but I think it's worth it. Let's do it in another PR though, to try and keep a lid on the size of this one.
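
For clarity, the call-site change being discussed would be roughly:

// Now: each caller must know the reply port convention
awaitResponse(SNAPSHOT_PORT + REPLY_PORT_OFFSET);

// Proposed: the endpoint applies the offset internally, mirroring what
// MessageEndpointServer already does (signature assumed, not final)
awaitResponse();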


// Send the data
mb.Finish(requestOffset);
@csegarragonz (Collaborator) commented:
I think this is now repeated enough times to be a candidate for a macro. I suggest something like:

#define SEND_FLATBUFFER(header, mb, offset, waitResponse) \
    sendHeader(header); \
    mb.Finish(offset); \
    uint8_t* msg = mb.GetBufferPointer(); \
    int size = mb.GetSize(); \
    send(msg, size); \
    if (waitResponse) { \
        awaitResponse(SNAPSHOT_PORT + REPLY_PORT_OFFSET); \
    }

@Shillaker (Collaborator, author) commented:
Yep good spot.

const SnapshotDiffPushRequest* r =
flatbuffers::GetMutableRoot<SnapshotDiffPushRequest>(msg.udata());

faabric::util::getLogger()->info("Receiving {} diffs to snapshot {}",
@csegarragonz (Collaborator) commented:
Same for info here

char value[] = "4";
size_t nWritten = fwrite(value, sizeof(char), 1, fd);
if (nWritten != 1) {
throw std::runtime_error("Failed to write to clear_refs");
@csegarragonz (Collaborator) commented:
Will this happen in threads? If so, could we logger->error(...) and then throw? I'm afraid these exceptions will fly under the radar otherwise.

@Shillaker (Collaborator, author) commented May 28, 2021:

All the exceptions in here will generally only happen if there's a permissions error on the file itself (or if the kernel doesn't support soft-dirty PTEs, which is unlikely as they were added in 3.x), so they will only occur if we've misconfigured an environment. In that case, the exception will be triggered from both the main thread and child threads.

I've added a lot more logging now which should make things a little clearer.

std::vector<bool> dirtyFlags = faabric::util::getDirtyPages(data, nPages);

// Convert to snapshot diffs
// TODO - reduce number of diffs by merging adjacent dirty pages
@csegarragonz (Collaborator) commented:
Could we tackle the TODO in this PR? I feel it should not be very complex to implement. I was thinking of something like:

auto next1 = std::find(dirtyFlags.begin(), dirtyFlags.end(), true);
while (next1 != dirtyFlags.end()) {
    // Find the end of this dirty run; if the vector ends with dirty pages,
    // next0 == end() and the final chunk is handled like any other
    auto next0 = std::find(next1, dirtyFlags.end(), false);
    size_t pageIdx = std::distance(dirtyFlags.begin(), next1);
    size_t nPages = std::distance(next1, next0);
    uint32_t offset = pageIdx * faabric::util::HOST_PAGE_SIZE;
    diffs.emplace_back(
      offset, data + offset, nPages * faabric::util::HOST_PAGE_SIZE);
    next1 = std::find(next0, dirtyFlags.end(), true);
}

there's probably a nicer way, but just an idea.

@Shillaker (Collaborator, author) commented May 28, 2021:

I think this may be premature optimisation, which is why I left it as a TODO. I'm not sure how often this will be beneficial, and I'm reluctant to add more complexity. I'll revisit once I've done some benchmarking.

@Shillaker (Collaborator, author) commented:
> Left some comments; overall LGTM, and the dirty page tracking looks great and neat. I know you started this before we agreed on the new PR+commit structure but, for the future, 1500 lines added across four commits is very hard to review in a per-commit fashion (which, by the way, you can do using n and p in the browser).

Yes, let's not be too strict on this one. I think we can aim for the n/p reviewing in an ideal world, but not worry too much if we don't achieve it every time. In this case it was a choice between 50 commits with things like "fixing tests" repeated over and over, or one big one, neither of which is useful. I could potentially have rebased into something slightly more meaningful, but I think you have to start the work in that mindset for it to really work.

> Wrt the PR size/topics, I feel that all the fixture changes had really nothing to do with dirty page checking, so they could definitely be in a different PR.

Yes, again in an ideal world, but let's not let the great be the enemy of the good. "Nothing to do" is a little strong though; it avoided repeating code in the new tests I was writing.

@Shillaker (Collaborator, author) commented:
@csegarragonz the latest commit addresses the same issue we saw before in Faasm, where there's a race condition when getting an object from a map and initialising it if it isn't there already. Without the shared locking it's possible for another thread to come in while the object is being initialised and either (a) see a partially initialised object, or (b) leave the map itself in an inconsistent state.

I've added the fix around getting loggers, getting function call clients and getting snapshot clients in the scheduler.
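
The resulting pattern is a double-checked lookup under a shared mutex. A minimal sketch using the std types directly (faabric wraps these in its own lock helpers such as faabric::util::SharedLock, and the SnapshotClient constructor arguments here are assumed):

#include <shared_mutex>
#include <string>
#include <unordered_map>

std::shared_mutex snapshotClientsMx;
std::unordered_map<std::string, faabric::scheduler::SnapshotClient>
  snapshotClients;

faabric::scheduler::SnapshotClient& getSnapshotClient(const std::string& key)
{
    // Fast path: a shared lock is enough when the client already exists
    {
        std::shared_lock<std::shared_mutex> lock(snapshotClientsMx);
        auto it = snapshotClients.find(key);
        if (it != snapshotClients.end()) {
            return it->second;
        }
    }

    // Slow path: take an exclusive lock to construct, re-checking in case
    // another thread inserted while we were unlocked
    std::unique_lock<std::shared_mutex> lock(snapshotClientsMx);
    auto it = snapshotClients.find(key);
    if (it == snapshotClients.end()) {
        it = snapshotClients.try_emplace(key, key).first;
    }
    return it->second;
}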

@@ -1,7 +1,5 @@
#!/bin/bash

set -e

@Shillaker (Collaborator, author) commented:
To handle failures in the worker and still be able to print logs, we have to switch off set -e.

{
faabric::util::SharedLock lock(snapshotClientsMx);
return snapshotClients.at(key);
}
@Shillaker (Collaborator, author) commented:
These were the changes to getSnapshotClient and getFunctionCallClient that address the issues of cross-thread clients and race conditions.

{
faabric::util::SharedLock lock(loggerMx);
return loggers[name];
}
@Shillaker (Collaborator, author) commented:
Same race condition issue as mentioned previously. Given the proposed move to the spdlog macros, we don't need to worry too much about the performance hit this will have (which will probably be negligible anyway unless it's called in a very tight loop).

faabric::util::SnapshotData snapshot() override;

uint8_t* snapshotMemory = nullptr;
size_t snapshotSize = 0;
@Shillaker (Collaborator, author) commented:
Originally I hadn't set defaults on snapshotMemory and snapshotSize, which worked on my machine as the size happened to be zero, but in GitHub Actions it was getting set to a non-zero value, which was breaking everything. The take-away is that we must always set default values on class members. I looked for a clang-tidy rule to enforce this but couldn't find one.
