exceptions are not scalable #73
Looking at gcc's source code, …

Some very partial progress on the situation: …
In continuation-passing style, IMHO with promise-future, the continuation is still explicitly expressed (it might be called 'continuation-chaining style'?). We could avoid using C++ …
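For illustration only, here is a hypothetical sketch of what "passing the error along the continuation chain as a value, instead of throwing" could look like; the `result`/`and_then` names are made up for this example and are not Seastar API:

```cpp
#include <string>
#include <variant>

// Illustrative error-as-value type: holds either a T or an error message.
template <typename T>
using result = std::variant<T, std::string>;

// Run the explicit continuation f on success; short-circuit on error.
// No C++ exception machinery (and hence no unwinder locks) is involved.
template <typename T, typename F>
auto and_then(result<T> r, F f) -> decltype(f(std::get<T>(r))) {
    if (auto* err = std::get_if<std::string>(&r)) {
        return *err;              // propagate the error along the chain
    }
    return f(std::get<T>(r));     // invoke the explicit continuation
}
```

For example, `and_then(parse(input), [](int v) -> result<int> { return v * 2; })` passes either the doubled value or the original error to the next link in the chain.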
I created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744
Thanks to @gleb-cloudius, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68297 was solved in gcc 7, and std::make_exception_ptr() no longer involves throwing an actual exception. Gleb, can you please summarize the state of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744, i.e., the attempt to allow concurrent exception throwing? I see there was a lot of activity on that issue, but I don't understand what the conclusion was. Note that even though the above fix, and several others mentioned in the comments above, reduced the amount of exception throwing, some still remains, so it would be nice to make that scalable as well.
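For context, a small sketch of the difference (the second function approximates what older libstdc++ did internally; the exact internals may differ):

```cpp
#include <exception>
#include <stdexcept>

// With the gcc 7 fix for bug 68297, libstdc++ constructs the
// exception_ptr directly, with no hidden throw/catch round-trip:
std::exception_ptr make_direct() {
    return std::make_exception_ptr(std::runtime_error("timed out"));
}

// The pre-fix behaviour was roughly equivalent to this: a real throw,
// a real unwind, and hence a trip through the (locked) unwinder machinery,
// even though the exception is caught immediately.
std::exception_ptr make_via_throw() {
    try {
        throw std::runtime_error("timed out");
    } catch (...) {
        return std::current_exception();
    }
}
```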
There is no conclusion. I think there is an understanding of the problem and willingness to address it, but not at all costs. It involves coordination between gcc and glibc and hence needs much more time dedicated to it. I proposed a solution that requires adding a new ABI function in glibc that has to be used by newer gcc during unwind. There are comments on the function implementation itself (how it achieves parallelism) and on adding a new ABI function as opposed to doing something with symbol versioning (not sure how the difference in locking behaviour can be addressed by symbol versioning, but then I have not looked into it enough). That's more or less the state.

--
Gleb.
In #399, @gleb-cloudius explains the current state of this issue.
Basically, Seastar no longer has this bug: one lock was eliminated by gcc 7 (so switch to gcc 7!), and the second lock we work around by reimplementing dl_iterate_phdr() ourselves in core/exception_hacks.cc.
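For illustration, a minimal sketch of the caching trick, assuming the glibc dl_iterate_phdr interface from <link.h>; the names and error handling here are simplified compared to the real core/exception_hacks.cc (compile with -ldl):

```cpp
#include <link.h>
#include <dlfcn.h>
#include <vector>

namespace {

using phdr_fn = int (*)(int (*)(dl_phdr_info*, size_t, void*), void*);

std::vector<dl_phdr_info> phdr_cache;

// Snapshot the shared-object list once, before any thread starts throwing.
__attribute__((constructor))
void build_phdr_cache() {
    // Look up the real glibc implementation, which we are about to shadow.
    auto real = reinterpret_cast<phdr_fn>(dlsym(RTLD_NEXT, "dl_iterate_phdr"));
    real([] (dl_phdr_info* info, size_t, void*) {
        phdr_cache.push_back(*info);
        return 0;
    }, nullptr);
}

} // anonymous namespace

// Our definition shadows glibc's; the unwinder now iterates the cached
// list without taking the loader lock.
extern "C" int dl_iterate_phdr(int (*cb)(dl_phdr_info*, size_t, void*),
                               void* data) {
    for (auto& info : phdr_cache) {
        if (int r = cb(&info, sizeof(info), data)) {
            return r;
        }
    }
    return 0;
}
```

The obvious trade-off is that the cache goes stale if anything calls dlopen()/dlclose() after it is built, which is exactly the restriction the patch quoted below also documents.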
Relates: https://st.yandex-team.ru/

Multiple threads on multiple cores should be able to throw exceptions concurrently without bothering one another. Unfortunately, in the current implementation of libstdc++ and/or glibc, the stack unwinding process takes a global lock (while getting the list of shared objects, and perhaps for other things), which serializes parallel exception throwing and can dramatically slow down the program.

Some might dismiss this inefficiency with the standard "exceptions should be rare" excuse. They should be rare. But sometimes they are not, leading to a catastrophic collapse in performance. We saw an illustrative example of an "exception storm" in an application of ours. This application can handle lots and lots of requests per second on many cores. Some unexpected circumstance caused the application to slow down somewhat, which led to some of the requests timing out. The timeout was implemented as an exception, so now we had thousands of exceptions being thrown on all cores in parallel. This led to the application's threads starting to hang, once in a while, on the lock(s) inside "throw". This in turn made the application even slower and created even more timeouts, which in turn resulted in even more exceptions. In this way the number of exceptions per second escalated, until most of the work the application was doing was fighting over the "throw" locks, and no useful work was being done.

This patch eliminates the "throw" lock by supplying our own "dl_iterate_phdr" function which operates over a cached list of shared objects. This should mitigate the blocking behavior in an exception-storm scenario, but as a trade-off it disables dynamic loading/unloading during the component system lifetime: there is no thread-safe and robust way to synchronize that with the cache we've got. If one really needs dlopen/dlclose outside of a component constructor/destructor, this optimization can be disabled via the USERVER_DISABLE_PHDR_CACHE cmake option.

In benchmarks which just throw+catch in parallel we see, as expected, an improvement by a factor of X (the number of threads): it took some 20+ seconds for 8 threads to throw a million exceptions each in parallel, and now it takes a mere ~2s. This also improves RPS by a factor of 2+ when an endpoint under load just throws std::runtime_error.

Some references:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744
scylladb/seastar@464f5e3
scylladb/seastar#73
https://stackoverflow.com/questions/26257343/does-stack-unwinding-really-require-locks
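For reference, a standalone microbenchmark in the spirit of the one described above; the thread and iteration counts mirror the quoted numbers, and the timings will of course vary by machine and toolchain:

```cpp
#include <chrono>
#include <cstdio>
#include <stdexcept>
#include <thread>
#include <vector>

int main() {
    constexpr int n_threads = 8;
    constexpr int n_throws  = 1'000'000;
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back([] {
            for (int i = 0; i < n_throws; ++i) {
                try {
                    throw std::runtime_error("timeout");
                } catch (const std::exception&) {
                    // swallow: we only measure throw/unwind cost
                }
            }
        });
    }
    for (auto& th : threads) th.join();
    std::chrono::duration<double> secs =
        std::chrono::steady_clock::now() - start;
    std::printf("%d threads x %d throws: %.2f s\n",
                n_threads, n_throws, secs.count());
}
```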
This is an issue to think about - it's not urgent, and I don't have any idea how we can solve it.
Seastar goes out of its way not to use any locks or even atomic operations, because these do not scale as the number of cores grows. In particular, we have our own single-thread versions of std::shared_ptr and std::string, because the standard ones use atomic operations so that they can be shared across threads.
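A minimal sketch of the idea behind such a single-thread smart pointer, assuming a plain non-atomic reference count; this is an illustration, not Seastar's actual lw_shared_ptr:

```cpp
#include <utility>

// A shared pointer whose refcount is a plain long: no atomic ops, no
// locks. Safe only because each object is confined to a single core.
template <typename T>
class local_shared_ptr {
    struct block { T value; long refs; };
    block* _b = nullptr;
public:
    explicit local_shared_ptr(T v) : _b(new block{std::move(v), 1}) {}
    local_shared_ptr(const local_shared_ptr& o) : _b(o._b) {
        if (_b) ++_b->refs;                       // non-atomic increment
    }
    ~local_shared_ptr() {
        if (_b && --_b->refs == 0) delete _b;     // non-atomic decrement
    }
    local_shared_ptr& operator=(const local_shared_ptr&) = delete; // omitted for brevity
    T& operator*() const { return _b->value; }
    T* operator->() const { return &_b->value; }
};
```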
One unscalable thing we're left with is exception handling: std::exception_ptr uses atomic operations (just like std::shared_ptr). But more worryingly, throwing an exception appears to take global locks during stack unwinding (see for example http://stackoverflow.com/questions/26257343/does-stack-unwinding-really-require-locks), which means one thread throwing an exception can block another thread that is also trying to throw an exception. And blocking is really bad in Seastar's single-thread-per-core design.
Obviously, the best solution is to use exceptions as little as possible. But when your server is handling 1 million requests per second, you need to be really careful to avoid any possibility of exceptions in the course of request handling. Note that exceptions are known to be slow; that is fine. What is not fine is that an exception on one thread can block other threads on a machine with many cores.
I don't know if we can ever solve this issue without modifying/overriding libgcc, but the minimum we should do is to document this issue and warn against using exceptions too much in Seastar.
Another idea worth looking into is whether we can implement a future's exception state without actually throwing exceptions: in a lot of Seastar code, we do not throw an exception, but rather return a make_exception_future<...>(). Commit 44e35a4 prevents a bunch of wasteful rethrows of this stored exception, but we still have two problems: 1) make_exception_future internally throws an exception to build a std::exception_ptr, and 2) code which uses then_wrapped() usually rethrows the exception when calling get(). Is there a way to support exceptional futures without the overheads of actual exception handling?
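A sketch of what consuming an exceptional future without a rethrow could look like, assuming Seastar's then_wrapped()/failed()/get_exception() API (the header path follows the current Seastar layout, and get0()/get() naming varies across versions); this addresses problem 2 above rather than problem 1:

```cpp
#include <exception>
#include <stdexcept>
#include <seastar/core/future.hh>

// Stand-in for some operation that may fail. Building the exceptional
// future from make_exception_ptr avoids a throw here (post gcc 7).
seastar::future<int> do_request() {
    return seastar::make_exception_future<int>(
        std::make_exception_ptr(std::runtime_error("timed out")));
}

seastar::future<int> handle() {
    return do_request().then_wrapped([] (seastar::future<int> f) {
        if (f.failed()) {
            // Extract the error as an exception_ptr: no rethrow, no unwind.
            std::exception_ptr ep = f.get_exception();
            return seastar::make_exception_future<int>(std::move(ep));
        }
        return seastar::make_ready_future<int>(f.get0());
    });
}
```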