Treat as errors the uncaught exceptions in finalisers #8873
Did you consider simply not letting uncaught exceptions in finalizers subvert the control flow of the program, while still letting the program be made aware that something bad happened? I always thought this kind of problem would be better solved by a trap (again, sorry...) rather than effectively letting them turn any kind of exception into an asynchronous one (before this PR) or introducing a new, but single, one to represent them (this PR). Basically the idea would be to either use or extend (my preference) the uncaught exception handler and simply pass these uncaught exceptions there, thereby letting the program decide what action to perform according to where they occur. Something like:
type ctx = [ `Signal_handler | `Program | `Finalizer | `Self (* the uncaught handler itself raised *) ]
val set_uncaught_exception_handler : (ctx -> exn -> raw_backtrace -> unit) -> unit
val handle_uncaught_exception : ctx -> exn -> raw_backtrace -> unit |
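For readers who want to see the shape of this proposal, here is a minimal user-level sketch of the signatures above. It is only an illustration under stated assumptions: none of these definitions exist in the standard library, the real `Printexc.set_uncaught_exception_handler` takes no context argument, and `trap_finaliser` is a hypothetical helper that wraps a finaliser by hand rather than hooking into the runtime.

```ocaml
(* Hypothetical sketch of the proposed API; not part of the stdlib. *)
type ctx = [ `Signal_handler | `Program | `Finalizer | `Self ]

(* Default behaviour in this sketch: log and carry on. *)
let default_handler (ctx : ctx) exn _bt =
  let where =
    match ctx with
    | `Signal_handler -> "signal handler"
    | `Program -> "program"
    | `Finalizer -> "finalizer"
    | `Self -> "uncaught exception handler"
  in
  prerr_endline
    ("uncaught exception in " ^ where ^ ": " ^ Printexc.to_string exn)

let handler : (ctx -> exn -> Printexc.raw_backtrace -> unit) ref =
  ref default_handler

let set_uncaught_exception_handler h = handler := h

let handle_uncaught_exception ctx exn bt =
  try !handler ctx exn bt
  with e ->
    let bt' = Printexc.get_raw_backtrace () in
    (* The handler itself raised: report it once under `Self, then give up. *)
    (try !handler `Self e bt' with _ -> ())

(* Hypothetical helper: wrap a finaliser so that whatever it raises is
   trapped and routed to the handler instead of escaping into the program. *)
let trap_finaliser f = fun x ->
  try f x
  with e ->
    handle_uncaught_exception `Finalizer e (Printexc.get_raw_backtrace ())
```

With such a wrapper, something like `Gc.finalise (trap_finaliser close_resource) r` (where `close_resource` and `r` are placeholders) would never let an exception escape at an allocation point; the program decides what to do in the handler.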
Yes, I explained in detail the challenges of this alternative approach. The point is that this simple fix is available right now, does not raise as many questions, and does not preclude the solution you suggest from being implemented once OCaml is ready for it.
(No need to apologize for being insistent :)
This is what I meant with a dbuenzli-hook. In the handler, there is indeed not much else you can do besides aborting or swallowing the exception (and logging). Maybe you were thinking of being able to do something else? |
Right but somehow I don't see it fixing much. I mean the only thing you get is more information, right? That is: rather than a puzzling exception out of the blue you still get it, but wrapped in another one. Agreed it's better, but still... I'm not sure the problem is common enough to warrant introducing a half-fix in the stdlib which will have to be deprecated. Unless there's something I deeply misunderstand I don't really buy the "backwards-compatible" argument you are making in the first point above: these errors are non-deterministic and if they happen to occur outside an exn handler your program is already abortive now. So why not introduce the trap immediately? You could even elect not to abort on
No that pretty much sums it... |
Okay sorry of course not if you have a global |
No, you get the (theoretical) assurance that when you catch for instance Not_found, it means what you expect. And it does not claim to achieve anything else.
I agree that it is quite theoretical, but it's pretty bad and gets mentioned often when discussing asynchronous exceptions. I do not think it is a half-fix, but a proper fix for OCaml as it currently stands (where you've currently got OOM, Sys.Break, and where you have to be careful about breaking changes).
I'd be in favour of an alternative PR which introduces a trap right away, but note that this currently needs to at the very least implement masking first, and to investigate the consequences of the change.
In general I am not sure that ignoring bugs amounts to a healthy default, which is part of the complications. But I agree it might end up the best choice, i.e. not necessarily consider uncaught exceptions as a bug, e.g. if you have a mental model where a finalizer runs in an isolated thread. Note that once you make ignoring the default, then it's hard to go back. And if you use libraries written when ignoring is the default, they might not be well-tested against changing this behaviour. With the current PR we have the time to wait and see where multicore is headed before settling on a design.
Yes, I used to think the same, which is why I did not think of opening a PR until recently, when I was encouraged to do so. Developers have been speaking about sanitizing uncaught exceptions in finalisers for a while now, everybody agrees it's bad, but I have not seen much progress. Sorry if it looks like a lot of discussion around tiny details, but it's important. |
I did not see your edit.
Not all exceptions out of finalisers are programming errors (currently). |
That's a convincing point.
|
Also, note the proposition of introducing a common exception Continuing with the idea of I am curious to audit the uses of Sys.Break in the opam repository. Has anybody some guidance on where to start? There's a challenging problem for backwards compatibility. Maybe it is necessary to introduce deprecation warnings for patterns like I'll be around at the OCaml workshop in the morning if people want to discuss this. |
That could be a start:
|
You mention in This PR does not go in a direction I'm interested in, but it doesn't seem to break anything I rely on. |
@lthls Thanks for your input. One aspect is that you do not want to catch without a catch-all, but you are allowed to match for display or logging purposes for instance (hence the "general rule"). This is in theory. I am now closing this PR as I am going to explain. @lpw25 @stedolan @gasche (and other people who have shown interest or displayed opinions about asynchronous exceptions), this is for the record. I did an audit of the use of asynchronous exceptions in OCaml packages published on opam. I performed the following searches.
Then I curated the results by hand by assessing whether they 1) purposefully raise exceptions in signal handlers or finalisers, 2) whether they explicitly match on this exception. Note that the goal was not to be exhaustive but to receive some concrete usage examples, hence the coarse search patterns and the search limited to opam. Nobody has the time nor the possibility to do an exhaustive audit. Here is a list of packages that rely on asynchronous exceptions by criterion 1):
To the best of my understanding, here is the sublist of packages that subscribe to the discipline of not discriminating the asynchronous exception from other kinds of unexpected exceptions (criterion 2):
The most prevalent use of async exceptions is raising an exception on SIGINT in interactive programs. Quite importantly, in many cases a specific action is performed, from a specific message ("Interrupted"), to a specific user interaction ("press Ctrl-C two more times to exit"), even to a distinct behaviour (taking the decision to respawn a task instead of aborting). It appears important to be able to discriminate
As for finalisers, the first remark is that
The prize for the most innovative use of asynchronous exceptions goes to
Note again the empirical evidence in favour of the need to compute until some space or time bound is reached.
So, contrary to what I understood of the maintainer folklore, the established OCaml codebase is not ready even for the mildest proposed change (wrap exceptions in
As for multicore, I see new challenges. I clearly see a design for asynchronous exceptions in multicore, but there are two new questions: 1) is it possible to discriminate signals from other unexpected exceptions? (there are a few obvious ideas to explore), 2) for any such design, is there a clear evolution path from the established code base to the new design? (that sounds more challenging to me).
Hope to see you in Berlin! |
I understand you went quickly over this but I'd just like to mention that:
does not as far as I understand your definitions belong to 1: it only matches over known asynchronous OCaml runtime system exceptions to let them flow up and traps any other one. It does not raise exceptions from signal handlers or use finalizers. So I think it's rather in 2. |
Ah, yes. I remember it as a special case because I remember even looking up details on opam and seeing you were the author. There are a few other packages I marked as "?" because they matched on
In doubt, in case I missed something, I registered the intent to match on Sys.Break. For completeness, here's the list of ones I have marked as "bad" (1 but not 2 nor above):
Edit: Here are my raw notes, for the record: https://gitlab.com/gadmm/stdlib-experiment/blob/master/other/async_audit/break_or_catch_break |
Before this patch, a Memprof callback can potentially raise any exception, including for instance Not_found out of thin air. This means that when doing "try ... with Not_found ->...", this might not always mean what the programmer intends. This is a well-known theoretical bug affecting the raising behaviour of finalisers (ocaml#8873). In general, any exception raised by Memprof is an asynchronous exception subject to the discipline of ML interrupts (see e.g. http://isabelle.in.tum.de/dist/library/Doc/Implementation/ML.html): one must not discriminate interrupts, and an interrupt must always be re-raised promptly except at isolation boundaries. This patch avoids this sort of bugs, and it encourages and facilitates following the discipline of interrupts for catching exceptions arising from Memprof callbacks. Note that some code in the wild exists of the form `try ... with Sys.Break -> ...`. Such code does not follow the interrupt discipline as it discriminates on interrupts. With this patch this practice becomes incompatible with Memprof (similarly to what it is already with Fun.protect). Users of such code who want to use Memprof must fix it to properly catch all exceptions, and can record the cancellation state into a boolean flag inside the signal handler of SIGINT for the current thread.
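The migration suggested at the end of this commit message (catch all exceptions, and record the cancellation in a flag set by the SIGINT handler) could look roughly like the sketch below. It is an illustration under assumptions: `interrupted`, `install_sigint_handler` and `run_isolated` are hypothetical names, the flag is a single global `ref` (so this assumes a single-threaded program rather than the per-thread state the message mentions), and `run_isolated` plays the role of an isolation boundary.

```ocaml
(* Set by the SIGINT handler; hypothetical name, single-threaded assumption. *)
let interrupted = ref false

(* Record the cancellation, then interrupt the running computation. *)
let install_sigint_handler () =
  Sys.set_signal Sys.sigint
    (Sys.Signal_handle (fun _signal -> interrupted := true; raise Sys.Break))

(* Isolation boundary: catch everything without discriminating on the
   exception, and consult the flag to learn whether this was a user
   interrupt. Anything else is re-raised promptly with its backtrace. *)
let run_isolated (task : unit -> 'a) : ('a, [ `Interrupted ]) result =
  interrupted := false;
  match task () with
  | v -> Ok v
  | exception e ->
    let bt = Printexc.get_raw_backtrace () in
    if !interrupted then Error `Interrupted
    else Printexc.raise_with_backtrace e bt
```

The point of the design is that the handler never needs to pattern-match on `Sys.Break`: the decision to print "Interrupted" or to respawn a task is taken from the flag, so it keeps working even if the exception arrives wrapped in another one.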
This old patch fixes the bug whereby catchable exceptions (e.g. Not_found) can in theory appear out of thin air, due to exceptions being raised at allocation points from finalisers. After discussion with @lpw25 and @stedolan, I decided to resurrect it and propose it as a PR.
Warning: this was my first modification to the runtime, initially meant as an exercise to find my way around the compiler. I cannot guarantee that I understand everything I wrote, so please review carefully. And no hard feelings if rejected!
The strategy is the same as for Fun.protect: we promote the uncaught exception into a serious exception by wrapping it into a new exception Finaliser_raised of exn. This uncaught exception is documented as a programming error or unexpected exception (OOM, Sys.Break...), exactly like Fun.Finally_raised, so one should not catch Finaliser_raised other than with a catch-all.
Right now, there is a consensus that uncaught exceptions in finalisers should be ignored or terminate the program (probably with a customisable dbuenzli-hook). I too think this is a good general direction, and the current PR does not aim to change what this ideal solution should be. There are a few comparison points where one might still want the current PR as a solution available right here and now:
The current PR is backwards-compatible: it is guaranteed to only reveal bugs (catching an exception out of a finaliser, other than with a catch-all that re-raises or at an isolation boundary, has to be a programming error). By contrast, changing the behaviour to abort on uncaught exceptions in finalisers can turn buggy but hardened programs into abortive ones. So, one might prefer a smoother transition where one has an additional chance to detect these bugs in the meantime.
Besides programming errors, these uncaught exceptions in finalisers can currently be OOM or Sys.Break. In that case, it is better to ignore than to abort, but you at least need to mask signals while finalisers run. And if you want to allow aborting, you first need to sanitize OOM exceptions. This may be a more involved change than it sounds, one that awaits a more general clarification of the exception safety model, which is bound to happen with multicore. (It does make sense, for instance, to ask that finalisers are written in the same dialect as other resource cleanup functions where signals are masked and all exceptions should be handled.) In short, the alternative proposition is not obvious to design, and has no ETA.
The current PR fixes the bug now, and up until our multicore people ask that finalisers run in their own threads.
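To make the wrapping strategy concrete, here is a small user-level simulation. It is a sketch under assumptions, not the runtime patch itself: `Finaliser_raised` is declared locally as a stand-in for the exception proposed in this PR, `run_finaliser` is a hypothetical helper, and the finaliser is called by hand instead of being triggered by the GC at an allocation point. The last function uses the real `Fun.protect` (OCaml 4.08 or later), which already wraps exceptions raised by its `~finally` function into `Fun.Finally_raised`.

```ocaml
(* Local stand-in for the exception proposed in the PR; not in the stdlib. *)
exception Finaliser_raised of exn

(* Hypothetical helper: run a finaliser, promoting anything it raises. *)
let run_finaliser (f : unit -> unit) : unit =
  try f () with e -> raise (Finaliser_raised e)

let buggy_finaliser () = raise Not_found

(* Simulates a finaliser firing in the middle of a table lookup. *)
let lookup ~wrapped tbl key =
  try
    if wrapped then run_finaliser buggy_finaliser else buggy_finaliser ();
    Some (Hashtbl.find tbl key)
  with Not_found -> None

(* Catch-all at the boundary; matching here is for display only. *)
let describe f =
  match f () with
  | Some _ -> "found"
  | None -> "treated as a missing key"
  | exception e ->
    (match e with
     | Finaliser_raised _ -> "reported as a finaliser failure"
     | _ -> "other exception")

(* Fun.protect behaves the same way for its ~finally function.
   The explicit match is only there to show which exception comes out. *)
let protect_demo () =
  try Fun.protect ~finally:(fun () -> raise Not_found) (fun () -> "done")
  with Fun.Finally_raised Not_found -> "Finally_raised Not_found"
```

Without wrapping, the `Not_found` from the finaliser is silently mistaken for a missing key even though the key is present; with wrapping, it escapes the `with Not_found` handler and surfaces as an unexpected exception.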
As an alternative design to the current Finaliser_raised, which struck me due to the way the code is structured: it also makes sense to have a single Asynchronous_exception of exn to ensure that all asynchronous exceptions are promoted to a serious exception (that is: uncaught exceptions in finalisers, exceptions raised from signal handlers, and uncaught exceptions from memprof callbacks). Programs that catch Sys.Break explicitly might break, but one is already not supposed to catch Sys.Break without a catch-all.
Failing tests: currently, I have the following kind of test failure, e.g. typing-sigsubst/sigsubst.ml:
Is this a bug? Can you give me clues to fix it?