Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exceptions are not scalable #73

Open
nyh opened this issue Nov 11, 2015 · 7 comments
Open

exceptions are not scalable #73

nyh opened this issue Nov 11, 2015 · 7 comments

Comments

@nyh
Copy link
Contributor

nyh commented Nov 11, 2015

This is an issue to think about - it's not urgent, and I don't any idea how we can solve it.

Seastar goes out of its way not to use any locks or even atomic operations, because these are not scalable as the number of cores grow. In particularly, we have our own single-thread version of std::shared_ptr and std::string because the standard ones use atomic operations because they can be shared across threads.

One unscalable thing we're left with is exception handling: std::exception_ptr uses atomic operations (just like std::shared_ptr). But more worryingly, throwing an exception appears to be taking global locks while doing stack unwinding (see for example http://stackoverflow.com/questions/26257343/does-stack-unwinding-really-require-locks) which means one thread throwing an exception can block another thread which is also trying to throw an exception. And blocking is really bad on Seastar's single-thread-per-core design.

Obviously, the best solution is to use exceptions as little as possible. But when your sever is handling 1 million requests per second, you need to be really careful to avoid any possibility of exceptions in the course of request handling. Note that exceptions are known to be slow - that is fine. What is not fine is that an exception on one thread can block other threads on a machine with many cores.

I don't know if we can ever solve this issue without modifying/overriding libgcc, but the minimum we should do is to document this issue and warn against using exceptions too much in Seastar.

Another idea worth looking into is whether we can implement a future's exception state without actually throwing exceptions: In a lot of Seastar code, we do not throw an exception, but rather return a make_exception_future<...>(). Commit 44e35a4 prevent a bunch of wasteful rethrows of this store exception, but we still have two problems: 1) make_exception_future internally throws an exception to build a std::exception_ptr, and 2) code which uses then_wrapped() usually rethrows the exception when calling get(). Is there a way to support exceptional futures without the overheads of actual exception handling?

@nyh
Copy link
Contributor Author

nyh commented Nov 11, 2015

Looking at gcc's source code, libstdc++-v3/libsupc++/*, it seems not difficult in theory to take a abject, and instead of throwing it (which calls the __cxa_throw() code) and then catching it and calling std::current_exception() - is should be feasable to create an exception_ptr object directly. This might, however, need some modifications to libgcc because some things (like the std::exception_ptr::exception_ptr(void*) constructor) are private.

@nyh
Copy link
Contributor Author

nyh commented Nov 12, 2015

Some very partial progress in the situation:

  1. In commit 20bf03b, Gleb found and fixed additional cases where we catch-and-rethrow an exception, where all we really needed to do was to pass along the existing exception_ptr.
  2. In https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68297 I started a discussion on making gcc's implementation of std::make_exception() (used by our make_exception_future()) work without throwing an exception at all. It appears possible to do, but will require some work inside libstdc++.

@unknownzerx
Copy link

In continuation-passing style, try catch can be manually implemented by maintaining two continuations -- one for normal flow and one for exceptional flow.

IMHO with promise-future, the continuation is still explicitly expressed (it might be called 'continuation-chaining style'?). We could avoid using C++ try catch at all.

@nyh
Copy link
Contributor Author

nyh commented Jul 3, 2016

I created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744
I'm not yet sure if the fix will involve only gcc, or also (or only) glibc (e.g., might require changes to dl_iterate_phdr?). So maybe I put this report in the wrong bug tracker.

@nyh
Copy link
Contributor Author

nyh commented Jan 1, 2017

Thanks to @gleb-cloudius, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68297 was solved in gcc 7, and std::make_exception_ptr() will not involve throwing an actual exception.

Gleb, can you please summarize the state of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744, i.e., the attempt to allow concurrent exception throwing? I see there was a lot of activity on that issue, but don't understand what was the conclusion. Note that even though the above fix, and several others mentioned in comments above, reduced the amount of exception throwing - some still remains so it would be nice to make that scalable as well.

@gleb-cloudius
Copy link
Contributor

gleb-cloudius commented Jan 1, 2017 via email

@nyh
Copy link
Contributor Author

nyh commented Feb 12, 2018

In #399, @gleb-cloudius explains the current state of this issue.

"There are two locks that are taken on exception path. One was eliminated in gcc7 (by https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=240193) for another we have a local workaround protected by NO_EXCEPTION_HACK. NO_EXCEPTION_HACK should not be set unless you try to compile libstdc++ or libgcc statically, so workaround should be enabled by default. 464f5e3 has much more detailed explanation of the whole ordeal. Note that the workaround assumes that no libraries are loaded dynamically after application starts (it is possible to support that too, just requires more coding)."

Basically Seastar does not have this bug any longer because one lock was eliminated by gcc 7 (so switch to gcc 7!) and a second lock we work around by reimplementing dl_iterate_phdr() ourselves in core/exception_hacks.cc.

robot-piglet pushed a commit to userver-framework/userver that referenced this issue May 12, 2023
Relates: https://st.yandex-team.ru/

Multiple threads on multiple cores should be able to concurrently throw exceptions without bothering one another. But unfortunately, it appears that in the current implementation of libstdc++ and/or glibc, the stack unwinding process takes a global lock (while getting the list of shared objects, and perhaps other things) which serializes these parallel exception-throwing and can dramatically slow down the program.

Some might dismiss this inefficiency with the standard "exceptions should be rare" excuse. They should be rare. But sometimes they are not, leading do a catastrophic collapse in performance.

We saw an illustrative example of an "exception storm" in an application of ours. This application can handle lots and lots of requests per second on many cores. Some unexpected circumstance caused the application to slow down somewhat, which led to some of the requests timing out. The timeout was implemented as an exception, so now we had thousands of exceptions being thrown in all cores in parallel. This led to the applications threads starting to hang, once in a while, on the lock(s) inside "throw". This in turn made the application even slower, and created even more timeouts, which in turn resulted in even more exceptions. In this way the number of exceptions per second escalated, until the point where most of the work the application was doing was fighting over the "throw" locks, and no useful work was being done.

This patch eliminates the "throw" lock by supplying our own "dl_iterate_phdr" function which operates over a cached list of shared objects, which should mitigate blocking behavior in exception storm scenario, but as a tradeoff disables dynamic loading/unloading during component system lifetime -- there is no thread-safe and robust way to synchronize that with the cache we've got.
If one really needs dlopen/dlclose outside of component constructor/destructor, this optimization can be disabled via USERVER_DISABLE_PHDR_CACHE cmake option.

In the benchmarks which just throw+catch in parallel we are expectedly seeing X (number of threads) improvements: it tooks some 20s+ seconds for 8 threads to throw a million of exceptions each in parallel, and now it takes mere ~2s. This also improves an RPS by a factor of 2+ when an endpoint under load just throws std::runtime_error.

Some references:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744
scylladb/seastar@464f5e3
scylladb/seastar#73
https://stackoverflow.com/questions/26257343/does-stack-unwinding-really-require-locks
Riferus pushed a commit to Riferus/userver that referenced this issue Aug 24, 2023
Relates: https://st.yandex-team.ru/

Multiple threads on multiple cores should be able to concurrently throw exceptions without bothering one another. But unfortunately, it appears that in the current implementation of libstdc++ and/or glibc, the stack unwinding process takes a global lock (while getting the list of shared objects, and perhaps other things) which serializes these parallel exception-throwing and can dramatically slow down the program.

Some might dismiss this inefficiency with the standard "exceptions should be rare" excuse. They should be rare. But sometimes they are not, leading do a catastrophic collapse in performance.

We saw an illustrative example of an "exception storm" in an application of ours. This application can handle lots and lots of requests per second on many cores. Some unexpected circumstance caused the application to slow down somewhat, which led to some of the requests timing out. The timeout was implemented as an exception, so now we had thousands of exceptions being thrown in all cores in parallel. This led to the applications threads starting to hang, once in a while, on the lock(s) inside "throw". This in turn made the application even slower, and created even more timeouts, which in turn resulted in even more exceptions. In this way the number of exceptions per second escalated, until the point where most of the work the application was doing was fighting over the "throw" locks, and no useful work was being done.

This patch eliminates the "throw" lock by supplying our own "dl_iterate_phdr" function which operates over a cached list of shared objects, which should mitigate blocking behavior in exception storm scenario, but as a tradeoff disables dynamic loading/unloading during component system lifetime -- there is no thread-safe and robust way to synchronize that with the cache we've got.
If one really needs dlopen/dlclose outside of component constructor/destructor, this optimization can be disabled via USERVER_DISABLE_PHDR_CACHE cmake option.

In the benchmarks which just throw+catch in parallel we are expectedly seeing X (number of threads) improvements: it tooks some 20s+ seconds for 8 threads to throw a million of exceptions each in parallel, and now it takes mere ~2s. This also improves an RPS by a factor of 2+ when an endpoint under load just throws std::runtime_error.

Some references:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744
scylladb/seastar@464f5e3
scylladb/seastar#73
https://stackoverflow.com/questions/26257343/does-stack-unwinding-really-require-locks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants