Measure enclave multi-thread overhead #4906

Open
fantas1a opened this issue Jan 22, 2024 · 8 comments

Comments

@fantas1a

Hi,

I recently ran several fully parallel SGX programs on 32 physical cores within a single NUMA node, but none of them scales well beyond 8 or 16 threads. The speedup is about 7x with 16 threads and stops improving afterward (it can even drop with more threads). NumTCS is set to 32; the EPC is 192 GB, and less than 60% of it is used.

Q1: Are there any suggestions for this multi-thread scaling issue?
Q2: Are there any tools in Open Enclave that can help me locate the problem (e.g., sgx_emmt, thread profiling, memory measurement, stack size usage, E/OCall measurement, mutex, paging)?
Q3: I see tcmalloc is suggested for SGX multi-threading. Does Open Enclave support tcmalloc?

Thank you very much!

@anakrish
Contributor

> I see tcmalloc is suggested for SGX multi-thread

We don't support tcmalloc yet.

Can you try using snmalloc as shown in this example? https://github.com/openenclave/openenclave/tree/master/samples/pluggable_allocator

@fantas1a
Author

Hi @anakrish, I just tried snmalloc. It did not give me more multi-thread speedup, maybe because the program is not allocator-intensive. Thank you.

@anakrish
Contributor

@fantas1a Thanks for quickly trying out snmalloc.

Can you describe the behavior characteristics of your application a bit more?

> but all the SGX programs do not scale well after 8 or 16 threads. Their speedups are about 7x for 16 threads and stop gaining afterward

What does scaling refer to here?

@fantas1a
Author

@anakrish I tested several open-source, fully parallel sorting algorithms and other data-processing primitives, such as bitonic sort and array compaction (i.e., moving the input elements flagged 1 to the front of the input array). All data is initialized inside the enclave before my timer starts, and there is little new memory allocation during the test. I think that is why snmalloc does not improve the speedup.

With 8, 16, and 32 threads in one enclave, the speedups over the single-thread time are only about 4x, 7x, and 8x. The SGX programs gain little from multi-threading beyond 16 threads, and sometimes they even run slower with more than 16 threads. That is the scaling issue I mean above.

I checked several factors that could affect the multi-thread speedup. With the Intel SGX SDK, the programs also gain little beyond 16 threads (so this is not a problem specific to Open Enclave). All my enclave threads busy-wait before my timer starts; there is no heavy mutex use and, as far as I can see, no obvious O/ECalls. The EPC size is 180 GB, and less than 60% of it is used.
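
The reported speedups (about 4x, 7x, and 8x at 8, 16, and 32 threads) can be turned into two standard scaling figures: parallel efficiency (speedup divided by thread count) and the Karp-Flatt metric, an experimentally determined serial fraction. The following is a minimal sketch, not part of the original report; the function names are chosen here for illustration, and the inputs are the rounded speedups quoted above.

```python
def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / threads


def karp_flatt(speedup, threads):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    if threads < 2:
        raise ValueError("the metric is only defined for 2 or more threads")
    return (1.0 / speedup - 1.0 / threads) / (1.0 - 1.0 / threads)


# Rounded speedups quoted in the comment above: threads -> speedup.
reported = {8: 4.0, 16: 7.0, 32: 8.0}

for threads, speedup in reported.items():
    eff = parallel_efficiency(speedup, threads)
    serial = karp_flatt(speedup, threads)
    print(f"{threads:2d} threads: efficiency {eff:.2f}, "
          f"estimated serial fraction {serial:.3f}")
```

If the estimated serial fraction stays roughly constant as threads are added, the limit is usually attributed to inherently serial work; if it grows with the thread count, that is commonly read as a sign of growing parallel overhead such as contention or synchronization. This reading is only a heuristic and does not identify the cause.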

@fantas1a
Author

Hello, is there a way to detect EPC paging or to inspect thread status? Thanks.

@anakrish
Contributor

> Hello, may I know if there is a way to detect EPC paging, or thread status? Thanks.

There is no way to detect EPC paging. By thread status, what are you looking for?

> no intensive mutex, and no obvious O/ECalls as I can see. EPC size is 180 GB and consumes less than 60%.

How many cores are there in the VM/machine? When there are more threads than cores, I suspect the performance would take a hit, since EEXIT/EENTER must be performed to suspend/resume a thread.

@fantas1a
Author

Thank you very much.

  1. By thread status, I mean profiling tools for SGX programs: I want to know whether any of the threads are idle, so that I can locate the bottleneck.

  2. There are 32 physical cores on the machine. I set NumTCS = 32 and use only 32 threads.
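
In the absence of a dedicated profiler, one rough way to check whether threads sit idle is to have the application record how long each worker is busy inside the timed region and then compare those durations. The sketch below only does the comparison; how the per-thread durations are collected is left to the application and is an assumption here, and the function name and sample numbers are made up for illustration, not measurements from this issue.

```python
from statistics import mean


def load_imbalance(busy_seconds):
    """Summarize per-thread busy times.

    Returns (imbalance, idle_fraction):
      imbalance     = longest busy time / average busy time (1.0 is perfect)
      idle_fraction = average share of the timed region a thread spends
                      waiting for the slowest thread to finish
    """
    if not busy_seconds:
        raise ValueError("need at least one per-thread measurement")
    longest = max(busy_seconds)
    average = mean(busy_seconds)
    return longest / average, 1.0 - average / longest


# Made-up example: three threads finish in 1.0 s, one straggler takes 2.0 s.
imbalance, idle = load_imbalance([1.0, 1.0, 1.0, 2.0])
print(f"imbalance {imbalance:.2f}, average idle fraction {idle:.3f}")
```

An imbalance close to 1.0 suggests that the work is evenly spread and the bottleneck lies elsewhere; a large value points to stragglers or uneven partitioning.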

@achamayou
Contributor

@fantas1a There is an oe_allocator_mallinfo() interface. With snmalloc the stats will be a little coarse (to the closest MB, if I remember correctly), but by calling it at the points in your benchmark where you think you might exceed the EPC, you can get a sense of whether you use substantially more than the EPC or not: https://github.com/openenclave/openenclave/blob/273c422d2663be4e4bd0e61daca1de8167cc3f41/docs/DesignDocs/Mallinfo.md

This of course assumes that you have no other enclaves running on the machine.

There are some external tools available too, such as https://github.com/smherwig/phoenix-spf and https://github.com/ibr-ds/sgx-perf, but they tend to require the Intel SDK, and it is not clear if they would work with recent versions.
