-
Hi fellow concurrency performance hunters! I'm trying to profile a performance regression using the profilers of Firefox and Chrome. They showed that some multi-threaded code spends about 65% of its time (7 s) with

The code that spawns threads looks like this:

    std::vector<std::thread> threadPool;
    threadPool.reserve(NUM_THREADS); // NUM_THREADS is navigator.hardwareConcurrency
    for (size_t threadIndex = 0; threadIndex < NUM_THREADS; ++threadIndex) {
      threadPool.emplace_back(
          [&](/* Subset Indices per Thread */) {
            for (/* Subset Indices per Thread */) {
              lambda(i); // Only captures by reference.
            }
          },
          /* Subset Indices per Thread */);
    }
    for (auto &thread : threadPool) {
      thread.join();
    }

Whereas we link with
Has anybody seen something similar? Thanks and greetings from rainy Berlin
Replies: 3 comments 5 replies
-
I wonder if this is the same issue as #18628?
-
Thank you for your insights. I investigated the code a bit further and I guess we start too many new threads. Never more than the worker pool size at a time, but I guess a thread pool on our C++ side would already help here. It seems there was no change regarding Emscripten. I guess I was misled a bit by Linux and Windows not showing that performance issue. Sorry for the noise :) But I'm still confused why the
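For anyone landing here later, a thread pool on the C++ side could look roughly like the sketch below. It is only an illustration, not the code from this project: `ThreadPool` and `parallelFor` are hypothetical names, the index range is split into contiguous chunks (an assumption about how the "subset indices per thread" are chosen), and exceptions thrown by the loop body are not handled. The point is that the threads are created once and reused, so the per-thread setup cost is paid only at pool construction.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical helper: a fixed set of worker threads, created once and reused.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t numThreads) {
        // Always keep at least one worker so queued work can make progress.
        numThreads = std::max<std::size_t>(1, numThreads);
        workers_.reserve(numThreads);
        for (std::size_t i = 0; i < numThreads; ++i) {
            workers_.emplace_back([this] { workerLoop(); });
        }
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wakeWorkers_.notify_all();
        for (auto &worker : workers_) {
            worker.join();
        }
    }

    ThreadPool(const ThreadPool &) = delete;
    ThreadPool &operator=(const ThreadPool &) = delete;

    // Calls body(i) for every i in [0, count) and blocks until all calls finished.
    void parallelFor(std::size_t count,
                     const std::function<void(std::size_t)> &body) {
        const std::size_t chunks = std::min(count, workers_.size());
        if (chunks == 0) return;

        std::size_t remaining = chunks;  // guarded by mutex_
        std::condition_variable done;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            for (std::size_t c = 0; c < chunks; ++c) {
                // Contiguous chunk [begin, end) handled by one task.
                const std::size_t begin = count * c / chunks;
                const std::size_t end = count * (c + 1) / chunks;
                tasks_.push([this, &body, &remaining, &done, begin, end] {
                    for (std::size_t i = begin; i < end; ++i) body(i);
                    std::lock_guard<std::mutex> taskLock(mutex_);
                    if (--remaining == 0) done.notify_one();
                });
            }
        }
        wakeWorkers_.notify_all();

        std::unique_lock<std::mutex> lock(mutex_);
        done.wait(lock, [&remaining] { return remaining == 0; });
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wakeWorkers_.wait(
                    lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // only reachable when stopping_
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }

    std::mutex mutex_;
    std::condition_variable wakeWorkers_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};
```

With something like this, the spawn-and-join loop becomes a single `pool.parallelFor(n, lambda)` call on a pool that lives as long as the application. Note that `parallelFor` still blocks the calling thread until the work is done, just like the `join()` loop does.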
pthread_create will memset to 0 the memory the thread needs, which includes the stack and TLS. Your stack size is 5MB, so reducing that could help significantly. Hopefully you don't need that much? The default is 64K these days, which is enough for most applications.
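As a rough illustration of requesting a smaller stack per thread: `std::thread` has no stack-size parameter, so the sketch below goes through the POSIX attribute API instead. `createThreadWithStack` is a hypothetical helper name, and the size you pass is something you would have to measure for your own workload. Emscripten also has link-time settings for stack sizes (I believe `STACK_SIZE` and `DEFAULT_PTHREAD_STACK_SIZE`, but please check the settings reference for your version), which is probably the easier route if you want to keep using `std::thread`.

```cpp
#include <pthread.h>

#include <cassert>
#include <cstddef>

// Hypothetical helper: start fn(arg) on a new thread with an explicit stack size.
// Returns 0 on success, otherwise the error code from the failing pthread call.
int createThreadWithStack(pthread_t *thread, std::size_t stackBytes,
                          void *(*fn)(void *), void *arg) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;

    // Ask for a stack of stackBytes instead of the platform default.
    rc = pthread_attr_setstacksize(&attr, stackBytes);
    if (rc == 0) {
        rc = pthread_create(thread, &attr, fn, arg);
    }

    // The attribute object is no longer needed once the thread is created.
    pthread_attr_destroy(&attr);
    return rc;
}
```

A call such as `createThreadWithStack(&t, 128 * 1024, worker, &data)` followed by `pthread_join(t, nullptr)` would then run `worker` on a 128 KiB stack, so far less memory has to be zeroed per thread than with 5MB.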