
Scheduler hints and fixes for distributed coordination bugs #169

Merged
Shillaker merged 31 commits into master from dist-lock-fixes on Nov 10, 2021

Conversation

@Shillaker (Collaborator) commented Nov 1, 2021

This is a mixed bag: all the changes required to get a larger OpenMP application working, plus some tidying up along the way.

Scheduler hints

At the moment "hints" are taken at face value by the scheduler (so they're more like scheduler plans). This is useful for a multithreaded application that repeatedly performs an operation with the same number of threads and doesn't need to keep repeating the scheduling decision. How applications decide to cache the scheduling decision is up to them. The change is relatively simple, and just involves adding one more scheduler method:

faabric::util::SchedulingDecision callFunctions(
    std::shared_ptr<faabric::BatchExecuteRequest> req,
    const faabric::util::SchedulingDecision& hint);

Here hint is the proposed scheduling decision. To implement this I just refactored the scheduling logic into two parts: (i) calculating the decision (only performed when no hint is passed); and (ii) executing a decision (performed whether or not a hint is passed).
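As a rough illustration of how a caller might use this, the sketch below computes a decision once and re-uses it as a hint for a later batch of the same shape. The batchExecFactory helper, getScheduler accessor and the "demo"/"func" names are assumptions, so treat this as a sketch rather than the exact API:

// Sketch only: assumes the relevant faabric scheduler/util headers are
// included; batchExecFactory and getScheduler are assumed helpers.
void runWithCachedDecision(int nThreads)
{
    auto& sch = faabric::scheduler::getScheduler();

    // First batch: the scheduler computes the decision itself
    auto reqA = faabric::util::batchExecFactory("demo", "func", nThreads);
    faabric::util::SchedulingDecision decision = sch.callFunctions(reqA);

    // Later batch with the same thread count: pass the cached decision
    // as a hint, skipping the decision-making step
    auto reqB = faabric::util::batchExecFactory("demo", "func", nThreads);
    sch.callFunctions(reqB, decision);
}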

Snapshot diff whitelisting

Snapshot merge regions are now specified as a whitelist rather than a blacklist. This makes more sense: we were adding a lot of logic around ignoring regions and performing default diffs, when in fact applications know which regions they want to merge. The "worst case" is a generic threaded application with no consistency guarantees that wants to merge the whole heap; such an application can specify a single merge region covering the heap.

This rendered Ignore regions obsolete, so I removed them.
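To make the whitelist model concrete, here is a sketch of how an application might register merge regions. The addMergeRegion signature, the registry accessor and the enum names are assumptions based on the faabric snapshot API and may differ:

// Sketch only: addMergeRegion, the registry accessor and the enum names
// are assumptions; offsets and sizes are illustrative variables.
auto& reg = faabric::snapshot::getSnapshotRegistry();
faabric::util::SnapshotData& snap = reg.getSnapshot(snapKey);

// Merge a shared counter by summing the original and updated values
snap.addMergeRegion(counterOffset,
                    sizeof(int32_t),
                    faabric::util::SnapshotDataType::Int,
                    faabric::util::SnapshotMergeOperation::Sum);

// "Worst case": a single overwrite region covering the whole heap
snap.addMergeRegion(heapOffset,
                    heapSize,
                    faabric::util::SnapshotDataType::Raw,
                    faabric::util::SnapshotMergeOperation::Overwrite);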

Bug fixes

Deadlocks and inconsistencies when waiting for point-to-point groups to be enabled - added more checks in the PointToPointBroker, and a FlagWaiter class, which allows an arbitrary number of threads to wait on a boolean flag set by another thread (we can't use a barrier, as we don't know in advance how many threads will need to wait).
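A minimal sketch of the FlagWaiter idea, assuming a plain mutex and condition variable underneath (the real class may add timeouts and other details):

#include <condition_variable>
#include <mutex>

class FlagWaiter
{
  public:
    // Block until another thread sets the flag; any number of threads
    // can wait, and the count need not be known in advance
    void waitOnFlag()
    {
        std::unique_lock<std::mutex> lock(mx);
        cv.wait(lock, [this] { return flag; });
    }

    // Set the flag and wake all current waiters
    void setFlag()
    {
        {
            std::lock_guard<std::mutex> lock(mx);
            flag = true;
        }
        cv.notify_all();
    }

  private:
    std::mutex mx;
    std::condition_variable cv;
    bool flag = false;
};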

Nested thread groups deadlocking when scheduled on the same thread pool thread as their parent - if thread A spawns a nested group of threads B1 and B2, we must ensure that neither B1 nor B2 is queued on the same thread pool thread as A (A depends on the completion of B1 and B2, but they wouldn't execute until A is finished). Previously the thread pool scheduling could allow this even when the Executor was not overloaded. This PR changes the thread pool allocation to give one task to every free thread pool thread before we start overloading them; when we do overload, we avoid indexes 0 and 1, as these are likely to contain blocking threads. This doesn't completely remove the problem, but ensures it won't happen under normal (i.e. un-overloaded) execution.
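A sketch of the index-selection rule with hypothetical names (the real allocation logic lives in the Executor and differs in detail):

#include <set>

// Sketch only: hypothetical helper illustrating the allocation rule.
int chooseThreadPoolIdx(const std::set<int>& freeIdxs,
                        int poolSize,
                        int overloadCounter)
{
    // Give one task to every free thread pool thread first
    if (!freeIdxs.empty()) {
        return *freeIdxs.begin();
    }

    // Overloaded: skip indexes 0 and 1, which are likely to hold parent
    // threads blocked waiting on their children
    if (poolSize <= 2) {
        return overloadCounter % poolSize;
    }
    return 2 + (overloadCounter % (poolSize - 2));
}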

Locking on non-existent groups on snapshot pushes - locking in the SnapshotServer only needs to happen on the master host when snapshot diffs are being returned. However, previously the SnapshotServer tried to lock on the group on any snapshot or snapshot diff push, causing an error when the group didn't exist (i.e. on non-master hosts). It now only happens on snapshot diff pushes when the group exists on that host.

Snapshot diffing inefficiency - previously this was going through all the dirty pages byte-by-byte, checking them against merge regions, and adding a default overwrite mode. Instead, it's much more efficient to only do merging in specified merge regions, and by default do no merging. This makes the code cleaner, and the diffing process more efficient.
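A rough sketch of the region-scoped approach (illustrative types and names, not the actual faabric diffing code):

#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative only: a merge region doubles as a recorded diff here
struct MergeRegion
{
    uint32_t offset;
    size_t size;
};

// Compare original and updated snapshot data, but only inside the
// registered merge regions; memory outside them produces no diffs
std::vector<MergeRegion> diffMergeRegions(
  const uint8_t* original,
  const uint8_t* updated,
  const std::vector<MergeRegion>& mergeRegions)
{
    std::vector<MergeRegion> diffs;
    for (const auto& region : mergeRegions) {
        if (std::memcmp(original + region.offset,
                        updated + region.offset,
                        region.size) != 0) {
            diffs.push_back(region);
        }
    }
    return diffs;
}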

Snapshot skipping missing diffs - previously, executors would only restore a snapshot once; however, this actually needs to happen on every execution, as the snapshot may have changed between executions. Ideally we'd apply the diffs directly to all the mapped regions rather than re-restore from the original snapshot, but this is left as a potential future optimisation.

@Shillaker Shillaker self-assigned this Nov 1, 2021
@Shillaker Shillaker changed the title Fixes for distributed locks Fixes for distributed coordination bugs Nov 2, 2021
void addDiffs(std::vector<SnapshotDiff>& diffs,
              const uint8_t* original,
              const uint8_t* updated);
};
@Shillaker (Collaborator, Author):

Moved this class down the file, as it now depends on the SnapshotDiff class declared above.

@Shillaker Shillaker changed the title Fixes for distributed coordination bugs Scheduler hints and fixes for distributed coordination bugs Nov 8, 2021
@@ -11,7 +11,7 @@ class PointToPointServer final : public MessageEndpointServer
     PointToPointServer();

   private:
-    PointToPointBroker& reg;
+    PointToPointBroker& broker;
@Shillaker (Collaborator, Author):

This used to be called a "registry", hence the reg name, but that was a bit confusing.

@@ -224,228 +223,339 @@ faabric::util::SchedulingDecision Scheduler::callFunctions(
        throw std::runtime_error("Message with no master host");
    }

    // Set up scheduling decision
@Shillaker (Collaborator, Author):

The diff of the callFunctions methods isn't particularly easy to parse; the full version is here.

@Shillaker Shillaker marked this pull request as ready for review November 9, 2021 15:43
@@ -500,84 +477,9 @@ TEST_CASE_METHOD(SnapshotMergeTestFixture,
     deallocatePages(snap.data, snapPages);
 }

-TEST_CASE_METHOD(SnapshotMergeTestFixture, "Test cross-page ignores", "[util]")
@Shillaker (Collaborator, Author):

Ignores have been removed.

@@ -430,94 +448,6 @@ TEST_CASE_METHOD(TestExecutorFixture,
     }
 }

-TEST_CASE_METHOD(TestExecutorFixture,
-                 "Test executing remote chained threads",
-                 "[executor]")
@Shillaker (Collaborator, Author):

This test is fiddly and fragile, as it tries to recreate a remote execution. Everything it checks is now covered by the distributed tests, which are much cleaner, so I've removed it.

@csegarragonz (Collaborator) left a comment:

LGTM, just a couple minor points

// If any tasks are blocking we risk a deadlock, and can no longer
// guarantee the application will finish.
// In general if we're on the master host and this is a thread, we
// should avoid the zeroth and first pool threads as they are likely
@csegarragonz (Collaborator):

I have read about this in the PR description, and this seems to me a risky game to be playing.

How likely, i.e. when, would such an overload happen?

@Shillaker (Collaborator, Author) replied Nov 10, 2021:

Overloads like this would only happen when there aren't enough resources available, so they're unlikely. When the system is overloaded it can either keep accepting functions and do its best to execute them, or start rejecting requests with an error. Faabric does the former and starts queueing, as this would then trigger the underlying cluster to scale out in a "real" deployment.

This specific bit of code relates only to threading, i.e. when an application spawns more threads than there are cores in the system. Well-written multi-threaded applications ought to request the level of parallelism available in the environment, at which point the system can specify an appropriate limit that avoids this scenario (as is the case with OpenMP).

This behaviour is covered in a few tests, so although the code shouldn't be triggered in a real deployment, it is still tested.

src/scheduler/Scheduler.cpp: review comment resolved (outdated)
src/transport/PointToPointBroker.cpp: review comment resolved
@Shillaker Shillaker merged commit 990c640 into master Nov 10, 2021
@Shillaker Shillaker deleted the dist-lock-fixes branch November 10, 2021 16:58