Measure enclave multi-thread overhead #4906
Comments
We don't support tcmalloc yet. Can you try using snmalloc as shown in this example? https://github.com/openenclave/openenclave/tree/master/samples/pluggable_allocator
Hi @anakrish, I just tried snmalloc. Perhaps because the program is not allocator-intensive, it did not give me more multithreaded speedup. Thank you.
@fantas1a Thanks for quickly trying out snmalloc. Can you describe the behavior characteristics of your application a bit more?
What does |
@anakrish I tested several open-source, fully parallel sorting algorithms and other data-processing primitives, such as bitonic sort and array compaction (i.e., moving the input elements whose flag is 1 to the front of the input array). All data are initialized inside the enclave before my timer starts, and there is little new memory allocation during the test, which I think is why snmalloc does not improve the speedup. With 8, 16, and 32 threads in one enclave, the speedups over the single-threaded time are only about 4x, 7x, and 8x. The SGX programs gain little beyond 16 threads and sometimes even run slower with more than 16 threads. That is the scaling issue I mentioned above. I checked several factors that could affect multithreaded speedup. The Intel SGX SDK also gains little beyond 16 threads, so this is not an Open Enclave problem. All my enclave threads busy-wait before my timer starts, there is no intensive mutex use, and there are no oblivious O/ECalls as far as I can see. The EPC size is 180 GB and less than 60% of it is used.
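To make the scaling numbers above easier to compare, here is a minimal sketch that turns the reported speedups (4x, 7x, 8x at 8, 16, 32 threads) into parallel efficiency and the Karp-Flatt experimentally determined serial fraction. The function names are illustrative and not part of Open Enclave or the Intel SGX SDK, and the only inputs are the approximate figures quoted in this comment.

```python
def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / threads


def karp_flatt(speedup, threads):
    """Karp-Flatt metric: e = (1/S - 1/p) / (1 - 1/p), defined for p >= 2."""
    if threads < 2:
        raise ValueError("Karp-Flatt metric needs at least 2 threads")
    return (1.0 / speedup - 1.0 / threads) / (1.0 - 1.0 / threads)


# Approximate speedups reported in this thread (threads -> speedup over 1 thread).
reported = {8: 4.0, 16: 7.0, 32: 8.0}

for threads, speedup in sorted(reported.items()):
    eff = parallel_efficiency(speedup, threads)
    e = karp_flatt(speedup, threads)
    print(f"{threads:>2} threads: speedup {speedup:.1f}x, "
          f"efficiency {eff:.2f}, Karp-Flatt serial fraction {e:.3f}")
```

If the serial fraction stayed roughly constant across thread counts, a fixed sequential portion would be the likely limit. A fraction that grows with the thread count usually points to overhead that scales with parallelism (for example contention or memory bandwidth), though these three data points are too few to be conclusive.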
Hello, may I know if there is a way to detect EPC paging or thread status? Thanks.
There is no way to detect EPC paging. By
How many cores are there in the VM/machine? When there are more threads than cores, I suspect the performance would take a hit, since EEXIT/EENTER must be performed to suspend/resume a thread.
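As a rough host-side check of the threads-versus-cores point, the sketch below flags which thread counts in a benchmark sweep exceed the CPUs available to the process. It is an illustration only: the helper names are hypothetical, it runs outside the enclave, and it counts logical CPUs, which can be larger than the number of physical cores when hyper-threading is enabled.

```python
import os


def usable_cpus():
    """Logical CPUs this process may run on (affinity-aware where supported)."""
    try:
        # Available on Linux; respects taskset/cgroup CPU affinity.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Fallback for platforms without sched_getaffinity.
        return os.cpu_count() or 1


def classify_thread_counts(thread_counts, cpus):
    """Label each thread count as fitting on the CPUs or oversubscribing them."""
    return {n: ("ok" if n <= cpus else "oversubscribed") for n in thread_counts}


if __name__ == "__main__":
    cpus = usable_cpus()
    print(f"usable logical CPUs: {cpus}")
    for n, status in classify_thread_counts([1, 8, 16, 32], cpus).items():
        print(f"{n:>2} threads: {status}")
```

Thread counts marked as oversubscribed are the ones where the operating system has to time-slice enclave threads, which is the situation described above.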
Thank you very much.
@fantas1a There is an oe_allocator_mallinfo() interface. In the case of snmalloc the stats will be a little coarse (to the closest MB, if I remember correctly), but you can get a sense of whether you use substantially more than the EPC by calling it at the points in your benchmark where you think that might happen: https://github.com/openenclave/openenclave/blob/273c422d2663be4e4bd0e61daca1de8167cc3f41/docs/DesignDocs/Mallinfo.md This of course assumes that no other enclaves are running on the machine. There are some external tools available too, such as https://github.com/smherwig/phoenix-spf and https://github.com/ibr-ds/sgx-perf, but they tend to require the Intel SDK, and it is not clear whether they would work with recent versions.
Hi,
I recently ran several fully parallel SGX programs on a machine with 32 physical cores on a single NUMA node, but none of them scale well beyond 8 or 16 threads. Their speedups are about 7x for 16 threads and stop improving afterward (they can even become slower with more threads). The TCS count is set to 32, and the EPC is 192 GB with less than 60% of it in use.
Q1: Are there any suggestions for my multithreading issue?
Q2: Are there any tools from Open Enclave that can help me locate the problem (e.g., sgx_emmt, thread profiling, memory measurement, stack size usage, or measuring E/OCalls, mutexes, and paging)?
Q3: I see that tcmalloc is suggested for SGX multithreading. May I know if Open Enclave supports tcmalloc?
Thank you very much!