[common] Add encrypted-file-node-size slab into slab allocator #1763

Merged
merged 1 commit into gramineproject:master from kailun-qin/slab-4k on Feb 14, 2024

Conversation

Contributor

@kailun-qin kailun-qin commented Feb 6, 2024

Description of the changes

Allocating the file nodes (file_node_t) of Gramine's encrypted files can be one source of overhead for certain workloads (e.g., RocksDB compaction). To address this performance bottleneck, this commit introduces a new slab size (8288B, considering the alignment) tailored for the size of the file node (8235B) and therefore a new free list for 8256B blocks managed by the slab allocator.
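
For illustration, a minimal C sketch of the size arithmetic (ALIGN_UP, FILE_NODE_SIZE and main() are hypothetical illustration helpers; the 8235B/32B/8288B/8256B figures come from the description and the review discussion below):

    #include <stddef.h>
    #include <stdio.h>

    #define SLAB_HDR_SIZE  32    /* per-object slab header (32B, per the review below) */
    #define FILE_NODE_SIZE 8235  /* sizeof(file_node_t) of encrypted files */

    /* Round x up to the next multiple of a. */
    #define ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

    int main(void) {
        size_t obj  = ALIGN_UP(FILE_NODE_SIZE + SLAB_HDR_SIZE, 32); /* 8267 -> 8288 */
        size_t user = obj - SLAB_HDR_SIZE;                          /* 8288 - 32 = 8256 */
        printf("slab object: %zuB, user block: %zuB\n", obj, user);
        return 0;
    }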

Note that while this change can improve performance for certain memory allocation patterns, it introduces internal fragmentation and may also increase memory overhead and external fragmentation if the application rarely makes allocations of the file node size.

In addition, this commit fixes an issue where in-memory files are not zeroed out when resized.
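
A minimal sketch of what such a zero-on-resize fix could look like (hypothetical; the field names follow the mem->buf/mem->size snippets quoted later in this review, and the actual change lives in libos_fs_mem.c):

    /* When growing an in-memory file, zero the newly exposed tail so that
     * readers (e.g., through an emulated mmap) don't see stale bytes left
     * over from a previous allocation of the buffer. */
    if (new_size > mem->size)
        memset(mem->buf + mem->size, 0, new_size - mem->size);
    mem->size = new_size;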

Fixes #1714.
Closes #1723.

How to test this PR?

CI


This change is Reviewable

@kailun-qin kailun-qin marked this pull request as draft February 6, 2024 17:32
@kailun-qin kailun-qin changed the title [common] Add 4KB slab size to slab allocator [common] Add encrypted-file-node-size slab into slab allocator Feb 6, 2024
Contributor

@dimakuv dimakuv left a comment

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)


-- commits line 9 at r2:
I obviously don't like this, as this is special-casing the encrypted file logic into the generic slab allocator.

Moreover, objects in Gramine that take e.g. 2KB or 4KB will occupy slab slots of ~8KB, so the fragmentation of memory will be terrible.

Contributor

@dimakuv dimakuv left a comment

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)

a discussion (no related file):
CI fails on LibOS regression tests:

__________________ TC_30_Syscall.test_055_mmap_emulated_tmpfs __________________
...
E   subprocess.CalledProcessError: Command '['/home/jenkins/workspace/graphene-20.04/usr/lib/x86_64-linux-gnu/gramine/direct/loader', '/home/jenkins/workspace/graphene-20.04/usr/lib/x86_64-linux-gnu/gramine/direct/libpal.so', 'init', 'mmap_file_emulated', '/mnt/tmpfs/test_mmap']' returned non-zero exit status 1.
----------------------------- Captured stdout call -----------------------------
[0.002] CREATE OK
[0.002] mmap_file_emulated: unexpected non-zero byte at addr_shared[16]

@kailun-qin Could you take a look?


@kailun-qin kailun-qin force-pushed the kailun-qin/slab-4k branch 2 times, most recently from ec02135 to 0800855, on February 8, 2024 14:22
Contributor

@dimakuv dimakuv left a comment

Reviewed 1 of 1 files at r4.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)


libos/src/fs/libos_fs_mem.c line 21 at r4 (raw file):

    size_t min_size = MIN(buf_size, (size_t)mem->size);
    memcpy(buf, mem->buf, MIN(buf_size, min_size));

Why do you have this MIN() here? This should be just min_size
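
For reference, with the suggested change the copy would simply reuse the already-computed bound:

    size_t min_size = MIN(buf_size, (size_t)mem->size);
    memcpy(buf, mem->buf, min_size); /* min_size already bounds both buffers */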

Contributor

@dimakuv dimakuv left a comment

Reviewed 1 of 1 files at r5.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)


-- commits line 9 at r5:
I'm confused by all these numbers. How did you arrive at 8256 bytes?


common/include/slabmgr.h line 97 at r5 (raw file):

#define SLAB_LEVEL_SIZES                                                       \
    16, 32, 64, 128 - SLAB_HDR_SIZE, 256 - SLAB_HDR_SIZE, 512 - SLAB_HDR_SIZE, \
        1024 - SLAB_HDR_SIZE, 2048 - SLAB_HDR_SIZE, 8288 - SLAB_HDR_SIZE

We may want to add one more level, that will cover 4KB objects, otherwise we have a huge gap between ~2KB and ~8KB, and memory fragmentation can be significant.
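
For illustration, a sketch of what such an extra intermediate level could look like, following the existing `- SLAB_HDR_SIZE` pattern (the exact value that was eventually merged may differ; the conversation below settles on ~4KB):

#define SLAB_LEVEL_SIZES                                                       \
    16, 32, 64, 128 - SLAB_HDR_SIZE, 256 - SLAB_HDR_SIZE, 512 - SLAB_HDR_SIZE, \
        1024 - SLAB_HDR_SIZE, 2048 - SLAB_HDR_SIZE, 4096 - SLAB_HDR_SIZE,      \
        8288 - SLAB_HDR_SIZE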

Contributor Author

@kailun-qin kailun-qin left a comment

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


-- commits line 9 at r5:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I'm confused by all these numbers. How did you arrive at 8256 bytes?

hmm... the size of file_node_t is 8235B and SLAB_HDR_SIZE is 32B, so we need at least 8267B for a slab object in our case. While drafting this code, I tried to pick a size for the slab object that is aligned to 32B (a multiple of 32), which is where 8288B came from. And 8256B (8288 - 32) is thus the user buffer size.

Contributor

@dimakuv dimakuv left a comment

Reviewed all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

CI fails on LibOS regression tests:

__________________ TC_30_Syscall.test_055_mmap_emulated_tmpfs __________________
...
E   subprocess.CalledProcessError: Command '['/home/jenkins/workspace/graphene-20.04/usr/lib/x86_64-linux-gnu/gramine/direct/loader', '/home/jenkins/workspace/graphene-20.04/usr/lib/x86_64-linux-gnu/gramine/direct/libpal.so', 'init', 'mmap_file_emulated', '/mnt/tmpfs/test_mmap']' returned non-zero exit status 1.
----------------------------- Captured stdout call -----------------------------
[0.002] CREATE OK
[0.002] mmap_file_emulated: unexpected non-zero byte at addr_shared[16]

@kailun-qin Could you take a look?

This was fixed, see the change in the libos_fs_mem.c file. (I would prefer to move it to a separate commit or even a separate PR, but we can decide later, once we have enough approvals.)



-- commits line 9 at r2:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I obviously don't like this, as this is special-casing the encrypted file logic into the generic slab allocator.

Moreover, objects in Gramine that take e.g. 2KB or 4KB will occupy slab slots of ~8KB, so the fragmentation of memory will be terrible.

The other reviewers are leaning towards this PR instead of the previous attempt in #1723.

Since both patches are supposed to be only a temporary solution (the final solution will be a new, proper memory allocator), I'm fine with merging this PR.


-- commits line 9 at r5:

Previously, kailun-qin (Kailun Qin) wrote…

hmm... the size of file_node_t is 8235B and SLAB_HDR_SIZE is 32B, so we need at least 8267B for a slab object in our case. While drafting this code, I tried to pick a size for the slab object that is aligned to 32B (a multiple of 32), which is where 8288B came from. And 8256B (8288 - 32) is thus the user buffer size.

Thanks, I understand the reasoning now.


common/include/slabmgr.h line 97 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We may want to add one more level, that will cover 4KB objects, otherwise we have a huge gap between ~2KB and ~8KB, and memory fragmentation can be significant.

We should probably do a couple of experiments to decide whether we want to add yet another slab level.

I'm not blocking here though.

Contributor

@dimakuv dimakuv left a comment

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)

a discussion (no related file):
This PR was tested independently by two Intel teams: the validation team and the team that found this perf bug and requested this change.

The PR improves performance of Encrypted Files and is even better than #1723 in terms of resulting latency/throughput.

The PR also doesn't result in any significant performance changes (neither perf degradation nor perf improvement) in our other CI workloads.

The PR looks good to go.



common/include/slabmgr.h line 97 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We should probably do a couple of experiments to decide whether we want to add yet another slab level.

I'm not blocking here though.

@mkow What do you think? We can merge the PR as-is, as it was validated by two Intel teams, and no perf issues were detected. See my other comment.

@dimakuv dimakuv marked this pull request as ready for review February 12, 2024 10:36
@dimakuv dimakuv self-assigned this Feb 12, 2024
@mkow mkow requested a review from dimakuv February 12, 2024 15:42
Member

@mkow mkow left a comment

Reviewed 1 of 1 files at r2, 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


common/include/slabmgr.h line 97 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@mkow What do you think? We can merge the PR as-is, as it was validated by two Intel teams, and no perf issues were detected. See my other comment.

// we talked about this in private and Dmitrii will do some stats to check whether this may be worth adding.


libos/src/fs/libos_fs_mem.c line 20 at r5 (raw file):

        return -ENOMEM;

    memcpy(buf, mem->buf, MIN(buf_size, (size_t)mem->size));

ouch 😬

Contributor

@dimakuv dimakuv left a comment

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


common/include/slabmgr.h line 97 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

// we talked about this in private and Dmitrii will do some stats to check whether this may be worth adding.

Ok, based on results reported in #1768, we are adding one additional (intermediate) slab level -- around 4KB.

Allocating the file nodes (`file_node_t`) of Gramine's encrypted files
can be one source of overhead for certain workloads (e.g., RocksDB
compaction). To address this performance bottleneck, this commit
introduces a new slab size (8288B, considering the alignment) tailored
for the size of the file node (8235B) and therefore a new free list for
8256B blocks managed by the slab allocator. Additionally, one more slab
size (~4KB) is added, to have proper powers-of-2 slab levels.

Note that while this change can improve performance for certain memory
allocation patterns, it introduces internal fragmentation and may also
increase memory overhead and external fragmentation if the application
rarely makes allocations of the file node size. However, our limited
experiments did not show any issues.

In addition, this commit fixes an issue where in-memory files are not
zeroed out when resized.

Signed-off-by: Kailun Qin <kailun.qin@intel.com>
Contributor

@dimakuv dimakuv left a comment

Reviewed 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: Intel)


common/include/slabmgr.h line 97 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Ok, based on results reported in #1768, we are adding one additional (intermediate) slab level -- around 4KB.

Done, also adjusted the commit message, please take a look.

Member

@mkow mkow left a comment

Reviewed 1 of 1 files at r6, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

Member

@mkow mkow left a comment

Reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

Contributor

@dimakuv dimakuv left a comment

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @kailun-qin)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This PR was tested independently by two Intel teams: the validation team and the team that found this perf bug and requested this change.

The PR improves performance of Encrypted Files and is even better than #1723 in terms of resulting latency/throughput.

The PR also doesn't result in any significant performance changes (neither perf degradation nor perf improvement) in our other CI workloads.

The PR looks good to go.

I'm blocking for now. I asked the two Intel teams to redo the validation (on the new version of the PR -- finalized and rebased). Let's wait for their replies, just to be sure we didn't run into any perf regressions.


Contributor Author

@kailun-qin kailun-qin left a comment

Reviewed 1 of 1 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion

Contributor

@dimakuv dimakuv left a comment

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I'm blocking for now. I asked the two Intel teams to redo the validation (on the new version of the PR -- finalized and rebased). Let's wait for their replies, just to be sure we didn't run into any perf regressions.

Done. Both teams reported no issues with the latest revision of the PR.


@dimakuv dimakuv merged commit 9288b49 into gramineproject:master Feb 14, 2024
18 checks passed
Successfully merging this pull request may close these issues.

[Encrypted FS] Use pre-allocated free list instead of calloc/free for file nodes