[common] Add encrypted-file-node-size slab into slab allocator #1763

Merged
merged 1 commit into gramineproject:master from kailun-qin/slab-4k on Feb 14, 2024

Conversation

Contributor

@kailun-qin kailun-qin commented Feb 6, 2024

Description of the changes

Allocating the file nodes (file_node_t) of Gramine's encrypted files can be one source of overhead for certain workloads (e.g., RocksDB compaction). To address this performance bottleneck, this commit introduces a new slab size (8288B, considering the alignment) tailored for the size of the file node (8235B) and therefore a new free list for 8256B blocks managed by the slab allocator.
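
For illustration, a minimal C sketch of the size arithmetic (ALIGN_UP, FILE_NODE_SIZE and main() are hypothetical illustration helpers; the 8235B/32B/8288B/8256B figures come from the description and the review discussion below):

    #include <stddef.h>
    #include <stdio.h>

    #define SLAB_HDR_SIZE  32    /* per-object slab header (32B, per the review below) */
    #define FILE_NODE_SIZE 8235  /* sizeof(file_node_t) of encrypted files */

    /* Round x up to the next multiple of a. */
    #define ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

    int main(void) {
        size_t obj  = ALIGN_UP(FILE_NODE_SIZE + SLAB_HDR_SIZE, 32); /* 8267 -> 8288 */
        size_t user = obj - SLAB_HDR_SIZE;                          /* 8288 - 32 = 8256 */
        printf("slab object: %zuB, user block: %zuB\n", obj, user);
        return 0;
    }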

Note that while this change can improve performance for certain memory allocation patterns, it introduces internal fragmentation and may also increase memory overhead and external fragmentation if the application rarely makes allocations of the file node size.

In addition, this commit fixes an issue where in-memory files are not zeroed out when resized.
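
A minimal sketch of what such a zero-on-resize fix could look like (hypothetical; the field names follow the mem->buf/mem->size snippets quoted later in this review, and the actual change lives in libos_fs_mem.c):

    /* When growing an in-memory file, zero the newly exposed tail so that
     * readers (e.g., through an emulated mmap) don't see stale bytes left
     * over from a previous allocation of the buffer. */
    if (new_size > mem->size)
        memset(mem->buf + mem->size, 0, new_size - mem->size);
    mem->size = new_size;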

Fixes #1714.
Closes #1723.

How to test this PR?

CI


This change is Reviewable

@kailun-qin kailun-qin marked this pull request as draft February 6, 2024 17:32
@kailun-qin kailun-qin changed the title [common] Add 4KB slab size to slab allocator [common] Add encrypted-file-node-size slab into slab allocator Feb 6, 2024
Contributor

@dimakuv dimakuv left a comment

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)


-- commits line 9 at r2:
I obviously don't like this, as this is special-casing the encrypted file logic into the generic slab allocator.

Moreover, objects in Gramine that take e.g. 2KB or 4KB will occupy slab slots of ~8KB, so the fragmentation of memory will be terrible.

Contributor

@dimakuv dimakuv left a comment

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)

a discussion (no related file):
CI fails on LibOS regression tests:

__________________ TC_30_Syscall.test_055_mmap_emulated_tmpfs __________________
...
E   subprocess.CalledProcessError: Command '['/home/jenkins/workspace/graphene-20.04/usr/lib/x86_64-linux-gnu/gramine/direct/loader', '/home/jenkins/workspace/graphene-20.04/usr/lib/x86_64-linux-gnu/gramine/direct/libpal.so', 'init', 'mmap_file_emulated', '/mnt/tmpfs/test_mmap']' returned non-zero exit status 1.
----------------------------- Captured stdout call -----------------------------
[0.002] CREATE OK
[0.002] mmap_file_emulated: unexpected non-zero byte at addr_shared[16]

@kailun-qin Could you take a look?


@kailun-qin kailun-qin force-pushed the kailun-qin/slab-4k branch 2 times, most recently from ec02135 to 0800855, on February 8, 2024 14:22
Contributor

@dimakuv dimakuv left a comment

Reviewed 1 of 1 files at r4.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)


libos/src/fs/libos_fs_mem.c line 21 at r4 (raw file):

    size_t min_size = MIN(buf_size, (size_t)mem->size);
    memcpy(buf, mem->buf, MIN(buf_size, min_size));

Why do you have this MIN() here? This should be just min_size
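
For reference, with the suggested change the copy would simply reuse the already-computed bound:

    size_t min_size = MIN(buf_size, (size_t)mem->size);
    memcpy(buf, mem->buf, min_size); /* min_size already bounds both buffers */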

Contributor

@dimakuv dimakuv left a comment

Reviewed 1 of 1 files at r5.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)


-- commits line 9 at r5:
I'm confused by all these numbers. How did you arrive at 8256 bytes?


common/include/slabmgr.h line 97 at r5 (raw file):

#define SLAB_LEVEL_SIZES                                                       \
    16, 32, 64, 128 - SLAB_HDR_SIZE, 256 - SLAB_HDR_SIZE, 512 - SLAB_HDR_SIZE, \
        1024 - SLAB_HDR_SIZE, 2048 - SLAB_HDR_SIZE, 8288 - SLAB_HDR_SIZE

We may want to add one more level, that will cover 4KB objects, otherwise we have a huge gap between ~2KB and ~8KB, and memory fragmentation can be significant.
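
For illustration, a sketch of what such an extra intermediate level could look like, following the existing `- SLAB_HDR_SIZE` pattern (the exact value that was eventually merged may differ; the conversation below settles on ~4KB):

#define SLAB_LEVEL_SIZES                                                       \
    16, 32, 64, 128 - SLAB_HDR_SIZE, 256 - SLAB_HDR_SIZE, 512 - SLAB_HDR_SIZE, \
        1024 - SLAB_HDR_SIZE, 2048 - SLAB_HDR_SIZE, 4096 - SLAB_HDR_SIZE,      \
        8288 - SLAB_HDR_SIZE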

Contributor Author

@kailun-qin kailun-qin left a comment

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


-- commits line 9 at r5:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I'm confused by all these numbers. How did you arrive at 8256 bytes?

hmm... the size of file_node_t is 8235B and SLAB_HDR_SIZE is 32B, so we need at least 8267B for a slab object in our case. While drafting this code, I tried to pick a size for the slab object that is aligned to 32B (a multiple of 32), which is where 8288B came from. And 8256B (8288 - 32) is thus the user buffer size.

Contributor

@dimakuv dimakuv left a comment

Reviewed all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

CI fails on LibOS regression tests:

__________________ TC_30_Syscall.test_055_mmap_emulated_tmpfs __________________
...
E   subprocess.CalledProcessError: Command '['/home/jenkins/workspace/graphene-20.04/usr/lib/x86_64-linux-gnu/gramine/direct/loader', '/home/jenkins/workspace/graphene-20.04/usr/lib/x86_64-linux-gnu/gramine/direct/libpal.so', 'init', 'mmap_file_emulated', '/mnt/tmpfs/test_mmap']' returned non-zero exit status 1.
----------------------------- Captured stdout call -----------------------------
[0.002] CREATE OK
[0.002] mmap_file_emulated: unexpected non-zero byte at addr_shared[16]

@kailun-qin Could you take a look?

This was fixed, see the change in the libos_fs_mem.c file. (I would prefer to move it to a separate commit or even a separate PR, but we can decide later, once we have enough approvals.)



-- commits line 9 at r2:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I obviously don't like this, as this is special-casing the encrypted file logic into the generic slab allocator.

Moreover, objects in Gramine that take e.g. 2KB or 4KB will occupy slab slots of ~8KB, so the fragmentation of memory will be terrible.

The other reviewers are leaning towards this PR instead of the previous attempt in #1723.

Since both patches are supposed to be only a temporary solution (the final solution will be a new, proper memory allocator), I'm fine with merging this PR.


-- commits line 9 at r5:

Previously, kailun-qin (Kailun Qin) wrote…

hmm... the size of file_node_t is 8235B and SLAB_HDR_SIZE is 32B, so we need at least 8267B for a slab object in our case. While drafting this code, I tried to pick a size for the slab object that is aligned to 32B (a multiple of 32), which is where 8288B came from. And 8256B (8288 - 32) is thus the user buffer size.

Thanks, I understand the reasoning now.


common/include/slabmgr.h line 97 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We may want to add one more level, that will cover 4KB objects, otherwise we have a huge gap between ~2KB and ~8KB, and memory fragmentation can be significant.

We should probably do a couple of experiments to decide whether we want to add yet another slab level.

I'm not blocking here though.

Contributor

@dimakuv dimakuv left a comment

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)

a discussion (no related file):
This PR was tested independently by two Intel teams: the validation team and the team that found this perf bug and requested this change.

The PR improves performance of Encrypted Files and is even better than #1723 in terms of resulting latency/throughput.

The PR also doesn't result in any significant performance changes (neither perf degradation nor perf improvement) in our other CI workloads.

The PR looks good to go.



common/include/slabmgr.h line 97 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We should probably do a couple of experiments to decide whether we want to add yet another slab level.

I'm not blocking here though.

@mkow What do you think? We can merge the PR as-is, as it was validated by two Intel teams, and no perf issues were detected. See my other comment.

@dimakuv dimakuv marked this pull request as ready for review February 12, 2024 10:36
@dimakuv dimakuv self-assigned this Feb 12, 2024
@mkow mkow requested a review from dimakuv February 12, 2024 15:42
Member

@mkow mkow left a comment

Reviewed 1 of 1 files at r2, 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


common/include/slabmgr.h line 97 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@mkow What do you think? We can merge the PR as-is, as it was validated by two Intel teams, and no perf issues were detected. See my other comment.

// we talked about this in private and Dmitrii will do some stats to check whether this may be worth adding.


libos/src/fs/libos_fs_mem.c line 20 at r5 (raw file):

        return -ENOMEM;

    memcpy(buf, mem->buf, MIN(buf_size, (size_t)mem->size));

ouch 😬

Contributor

@dimakuv dimakuv left a comment

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


common/include/slabmgr.h line 97 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

// we talked about this in private and Dmitrii will do some stats to check whether this may be worth adding.

Ok, based on results reported in #1768, we are adding one additional (intermediate) slab level -- around 4KB.

Allocating the file nodes (`file_node_t`) of Gramine's encrypted files
can be one source of overhead for certain workloads (e.g., RocksDB
compaction). To address this performance bottleneck, this commit
introduces a new slab size (8288B, considering the alignment) tailored
for the size of the file node (8235B) and therefore a new free list for
8256B blocks managed by the slab allocator. Additionally, one more slab
size (~4KB) is added, to have proper powers-of-2 slab levels.

Note that while this change can improve performance for certain memory
allocation patterns, it introduces internal fragmentation and may also
increase memory overhead and external fragmentation if the application
rarely makes allocations of the file node size. However, our limited
experiments did not show any issues.

In addition, this commit fixes an issue where in-memory files are not
zeroed out when resized.

Signed-off-by: Kailun Qin <kailun.qin@intel.com>
Contributor

@dimakuv dimakuv left a comment

Reviewed 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: Intel)


common/include/slabmgr.h line 97 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Ok, based on results reported in #1768, we are adding one additional (intermediate) slab level -- around 4KB.

Done, also adjusted the commit message, please take a look.

Member

@mkow mkow left a comment

Reviewed 1 of 1 files at r6, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

Member

@mkow mkow left a comment

Reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

Contributor

@dimakuv dimakuv left a comment

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @kailun-qin)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This PR was tested independently by two Intel teams: the validation team and the team that found this perf bug and requested this change.

The PR improves performance of Encrypted Files and is even better than #1723 in terms of resulting latency/throughput.

The PR also doesn't result in any significant performance changes (neither perf degradation nor perf improvement) in our other CI workloads.

The PR looks good to go.

I'm blocking for now. I asked the two Intel teams to redo the validation (on the new version of the PR -- finalized and rebased). Let's wait for their replies, just to be sure we didn't run into any perf regressions.


Contributor Author

@kailun-qin kailun-qin left a comment

Reviewed 1 of 1 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion

Contributor

@dimakuv dimakuv left a comment

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I'm blocking for now. I asked the two Intel teams to redo the validation (on the new version of the PR -- finalized and rebased). Let's wait for their replies, just to be sure we didn't run into any perf regressions.

Done. Both teams reported no issues with the latest revision of the PR.


@dimakuv dimakuv merged commit 9288b49 into gramineproject:master Feb 14, 2024
18 checks passed
Successfully merging this pull request may close these issues.

[Encrypted FS] Use pre-allocated free list instead of calloc/free for file nodes