Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge regions overhaul #201

Merged
merged 25 commits into from
Dec 29, 2021
Merged

Merge regions overhaul #201

merged 25 commits into from
Dec 29, 2021

Conversation

Shillaker
Copy link
Collaborator

@Shillaker Shillaker commented Dec 20, 2021

This PR addresses the following problems:

  • Merge regions were not being transferred across hosts (see below).
  • Whitelisting merge regions for big applications became unfeasible (impossible to specify all places where the memory might change).
  • Merge region code only supported Raw and Int types (needed more types including Bool, Long, Float and Double).

When it comes to transferring merge regions across hosts, we have to consider three scenarios where snapshot data is being transferred:

  1. First time executing a thread on a remote host, in which case the full snapshot is pushed.
  2. Subsequent executions of threads on a remote host (in which case just an update to the snapshot is sent)
  3. Remote hosts returning snapshot diffs after executing a thread.

In the first two, we are not currently including merge regions, which means, if the main thread on the main host sets up some merge regions, they are not respected by remote hosts.

Changes:

  • Distinguish between a snapshot update (2) and simple snapshot diffs (3).
  • Include snapshot merge regions in (1) and (2).
  • Reintroduce Ignore merge regions to blacklist regions of memory (e.g. guard pages), then automatically fill the rest of the snapshot memory with Overwrite regions.
  • Add double, float, bool and long support.

@Shillaker
Copy link
Collaborator Author

@KubaSz I added you as a reviewer given that you looked at the previous snapshot stuff, but feel free to ignore if you don't have scope to keep checking this stuff 😄

@eigenraven
Copy link
Collaborator

eigenraven commented Dec 21, 2021

I don't have much time to review this, but from a quick glance I can offer two quick comments:

  • const std::vector<T>& -> std::span<const T> in arguments
  • when pushing/emplacing back to a vector in a loop, make sure there's a reserve before the loop with the number of iterations

@Shillaker
Copy link
Collaborator Author

Ok nice, cheers and have a good Xmas!

@Shillaker Shillaker removed the request for review from eigenraven December 21, 2021 11:20
@Shillaker Shillaker marked this pull request as draft December 21, 2021 11:44
@Shillaker Shillaker removed the request for review from csegarragonz December 21, 2021 11:44
@Shillaker Shillaker changed the title Push snapshot merge regions with snapshots and snapshot diffs Merge regions overhaul Dec 21, 2021
@Shillaker Shillaker marked this pull request as ready for review December 27, 2021 14:03
@Shillaker Shillaker self-assigned this Dec 27, 2021
@Shillaker Shillaker requested review from csegarragonz and removed request for csegarragonz December 28, 2021 13:32
Copy link
Collaborator

@csegarragonz csegarragonz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just pointed out two very minor issues, feel free to ignore them.

I have deliberately marked all occurings I have found to aid in the re-factor if you end up going for it.

};

template<typename T>
inline bool calculateDiffValue(const uint8_t* original,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not immediate to tell the difference between calculateDiffValue and applyDiffValue, could we add a comment explaining what each one does, or how they differ?

Also, just to double check, can the uint8_t pointers here can't be converted to std::spans?

@@ -25,6 +25,8 @@ void FaabricMain::startBackground()
// Crash handler
faabric::util::setUpCrashHandler();

PROF_BEGIN
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same point made in the Faasm PR re. comitting profiling macros. I would personally remove them, but feel free to ignore.


#include <sys/mman.h>

namespace faabric::snapshot {
std::shared_ptr<faabric::util::SnapshotData> SnapshotRegistry::getSnapshot(
const std::string& key)
{
PROF_START(GetSnapshot)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same point re. profiling (feel free to ignore).

{
static ClearRefsWrapper wrap;

PROF_START(ResetDirty)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here re. profiling (feel free to ignore).

{
// Note we only need a shared lock here as we are not modifying data and the
// OS will handle synchronisation of the mapping itself
PROF_START(MapSnapshot)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same point re. profiling

@@ -337,6 +400,7 @@ MemoryView::MemoryView(std::span<const uint8_t> dataIn)

std::vector<SnapshotDiff> MemoryView::getDirtyRegions()
{
PROF_START(GetDirtyRegions)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Profiling

@@ -36,9 +36,13 @@ TEST_CASE_METHOD(SnapshotTestFixture,

int sharedMemPages = 8;
size_t sharedMemSize = sharedMemPages * HOST_PAGE_SIZE;
MemoryRegion sharedMem = allocateSharedMemory(sharedMemSize);
MemoryRegion sharedMem = allocatePrivateMemory(sharedMemSize);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to double check, the variable is called sharedMem and is defined by calling allocatePrivateMemory which feels odd.

@@ -433,7 +433,7 @@ TEST_CASE("Test mapping memory", "[util]")
TEST_CASE("Test mapping memory fails with invalid fd", "[util]")
{
size_t memSize = 10 * HOST_PAGE_SIZE;
MemoryRegion sharedMem = allocateSharedMemory(memSize);
MemoryRegion sharedMem = allocatePrivateMemory(memSize);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same point re. variable wording.

@@ -179,12 +179,12 @@ TEST_CASE_METHOD(SnapshotMergeTestFixture,
std::vector<uint8_t> expectedSnapMem = snap->getDataCopy();

// Set up two shared mem regions
MemoryRegion sharedMemA = allocateSharedMemory(snapSize);
MemoryRegion sharedMemB = allocateSharedMemory(snapSize);
MemoryRegion sharedMemA = allocatePrivateMemory(snapSize);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same point re. variable naming throughout this test.

@@ -590,6 +592,214 @@ TEST_CASE_METHOD(SnapshotMergeTestFixture,
expectedData = faabric::util::valueToBytes<int>(diffValue);
}

SECTION("Long")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice nesting of tests here 👍

@Shillaker
Copy link
Collaborator Author

I'll merge this in the interest of moving things along and fixing downstream, then I'll put the refactoring into #210

@Shillaker Shillaker merged commit 0e7a32c into master Dec 29, 2021
@Shillaker Shillaker deleted the mrs-with-snaps branch December 29, 2021 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants