runtime: ensure that finalizers are not called in [noexc] mode #11868
This commit implements no functional change; it merely propagates a 'noexc' parameter from the shared-heap entry points to the "sweep" functions, which are in charge of potentially running finalizers. These 'noexc' parameters are currently unused.
Is there a reason we can't just reuse the 4.x logic?
We could, but this also requires a small but invasive change, and its performance implications would be hard to predict, even harder than for the current approach, which preserves the current behavior in the absence of custom blocks with finalisers.
Note: @xavierleroy pointed out to me that there is a difference between "C finalisers", which are added by some custom blocks, and "OCaml finalisers", which are added by …
On the multicore side, I think the relevant people to ping for feedback would be @kayceesrk and @ctk21.
@damiendoligez pointed out (from just an oral description of the PR) that it is not correct to simply "skip" a garbage value during sweeping. I tried to change the code to "stop sweeping" at the first custom value with a finalizer, but this proved a bit too hard for someone unfamiliar with the sweeping logic. (I have the impression that …) What I did instead is to update this PR to avoid sweeping completely in … The old implementation remains available on my remote fork: … The new version still behaves badly on the "compaction corner case": the previous version was 200x slower than 5.00 on this code and the new version is only 100x slower, but something is still clearly wrong.
The slowdown in compaction_corner_case is due to a quadratic behavior in the runtime when sweeping is disabled during minor collections. The corner-case code is very simple:

```ocaml
let c = ref []
let () =
  for i = 0 to 1000 do
    c := 0 :: !c;
    Gc.compact ()
  done
```

(In 5.x, …) So: a loop with an allocation followed by a major GC. The quadratic behavior when sweeping is disabled during the minor GC is as follows:
With the trunk code, a single pool moves from unswept to available and back to unswept in each iteration, each time a minor GC runs. With the PR code, each minor GC allocates a new pool that immediately becomes "unswept" and is never available for further allocations. Each major GC then has one more pool to sweep, resulting in quadratic behavior. With an iteration count of 1000, trunk creates 23 pools in total, while the PR creates 1027.
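The pool dynamics described above can be reproduced with a deliberately tiny OCaml model. Everything in it is a hypothetical simplification, not the shared_heap.c code: each iteration promotes one value during a minor GC and then runs one forced major cycle, which visits every pool and leaves all of them unswept. The model ignores everything else the runtime allocates, so its absolute pool counts will not match the 23 and 1027 measured above; only the shape (constant versus linear pool growth, hence linear versus quadratic sweeping work) is meant to carry over.

```ocaml
(* Toy model of pool recycling under repeated forced major cycles.
   Hypothetical simplification for illustration; not shared_heap.c. *)

type heap = {
  mutable available : int;  (* pools with free slots, usable right away *)
  mutable unswept : int;    (* pools whose dead slots are not yet reclaimed *)
  mutable allocated : int;  (* pools ever created (heap growth) *)
  mutable sweep_work : int; (* pools visited by major cycles, summed *)
}

let new_heap () =
  { available = 0; unswept = 0; allocated = 0; sweep_work = 0 }

(* Promote one value during a minor GC: it needs a pool with a free slot. *)
let promote h ~sweep_in_minor =
  if h.available > 0 then ()
  else if sweep_in_minor && h.unswept > 0 then begin
    (* Sweep one pool to reclaim its dead slots (trunk behaviour). *)
    h.unswept <- h.unswept - 1;
    h.available <- h.available + 1
  end else begin
    (* No usable pool: grow the heap by one pool (PR behaviour). *)
    h.allocated <- h.allocated + 1;
    h.available <- h.available + 1
  end

(* A forced major cycle visits every pool and hands none back to the
   allocator: afterwards every pool is unswept. *)
let major_cycle h =
  let pools = h.available + h.unswept in
  h.sweep_work <- h.sweep_work + pools;
  h.available <- 0;
  h.unswept <- pools

(* One allocation followed by a forced major GC, repeated [n] times. *)
let run ~sweep_in_minor n =
  let h = new_heap () in
  for _i = 1 to n do
    promote h ~sweep_in_minor;
    major_cycle h
  done;
  h

let () =
  List.iter
    (fun sweep_in_minor ->
      let h = run ~sweep_in_minor 1000 in
      Printf.printf "sweep_in_minor=%b: %d pools, sweep work %d\n"
        sweep_in_minor h.allocated h.sweep_work)
    [ true; false ]
```

In this model, doubling the iteration count doubles the sweep work when the minor GC may sweep, and roughly quadruples it when it may not.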
Currently we are in an odd situation where major GC cycles never make pools available to the allocator; only major slices and the minor GC do. If we disable sweeping during minor GCs and the user continually forces major cycles, we lose. Fixing this issue is out of my league. I would expect a full major cycle (or whatever it is that …
I'm not sure I understand why we can't sweep during opportunistic major slices. These are done before and after a domain's minor collection. I don't think there's a circumstance in which you could do an opportunistic major slice while iterating over global roots?
I'm not fully sure, but my understanding is that during an opportunistic major slice, other domains may still be in their own minor GC. For example, one domain may be in the middle of scanning the global roots. Note that restoring sweeping in opportunistic major slices would not prevent the quadratic behavior in this test, which is single-domain and thus never performs opportunistic slices.
I'll take a view contrary to the description. C (custom) finalizers can run outside of safe points, unlike OCaml finalizers. In exchange, custom finalizers should not "access the runtime" (they do not possess the runtime capability, in ownership parlance). They are meant to deal with external data and resources. In the original bug report, removing a global root is an access to the runtime (e.g., in the Rust-OCaml FFI, it borrows the runtime capability). From this point of view, the original code has a programming error. This restriction was undocumented AFAIR, but it was already clear from reading the runtime code in OCaml 4 (for the few people doing so; I am not blaming the user). The question is what to do about it; from this angle, the choice is between keeping the restriction and trying to relax it. I do not yet understand what state of affairs this PR tries to achieve. What would be a high-level description of the new situation with this PR, in terms that can be explained to users? I suspect that if you want to allow runtime access inside custom finalizers, there is more work to do. Possible alternative solutions:
Solution 2 sounds sensible to me, because it is largely accidental that the original code worked in OCaml 4. There are very few things one can do inside custom finalizers, and remove_global_root merely happened to work in this situation. There are probably few functions that work accidentally like this, and I am not aware of other runtime functions that could similarly be relaxed to no longer require the runtime capability. In addition, it is desirable to be able to call these functions from custom finalizers. Solution 2 is therefore a reasonable way to manage backwards-compatibility expectations.
As for the documentation that could be clearer:
This is a very informal way of saying that the finalizer does not have the runtime capability. As we noticed in our work on the OCaml-Rust FFI, there is a benefit to adding "runtime capability" or "domain capability" to the vocabulary: with it, we see that the remove_global_root functions require this capability and that custom finalizers do not have it. There are probably other things one already could not do inside custom finalizers in OCaml 4 that are not listed in the documentation above. (It is also unclear that the argument to …
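To make the capability vocabulary concrete, here is a hypothetical OCaml sketch in which runtime access is represented by an abstract token. None of these names exist in the OCaml runtime or in any FFI library. In this sketch, with_runtime stands for code running at a safe point, which is where OCaml finalisers run; nothing here actually stops arbitrary code from calling it, and a real design would have to restrict where the token can be obtained. The point illustrated is only that a custom finaliser typed unit -> unit has no token, so it cannot call remove_global_root.

```ocaml
(* Hypothetical model of a runtime capability: an abstract token that
   only code allowed to touch the runtime can obtain. Not a real API. *)
module Runtime : sig
  type cap
  (* Run [f] in a context that owns the runtime, e.g. at a safe point. *)
  val with_runtime : (cap -> 'a) -> 'a
  val add_global_root : cap -> int -> unit
  val remove_global_root : cap -> int -> unit
  val roots : unit -> int list
end = struct
  type cap = unit
  let table : int list ref = ref []
  let with_runtime f = f ()
  let add_global_root () r = table := r :: !table
  let remove_global_root () r = table := List.filter (fun x -> x <> r) !table
  let roots () = !table
end

(* Custom-block finalisers may run outside safe points, so they get no
   token: with this type, calling Runtime.remove_global_root inside one
   does not typecheck. *)
type custom_finaliser = unit -> unit

(* OCaml finalisers run at safe points and can be handed the token. *)
type ocaml_finaliser = Runtime.cap -> unit

(* Allowed: an OCaml finaliser that removes a root. *)
let drop_root : ocaml_finaliser = fun cap -> Runtime.remove_global_root cap 42

(* Rejected by the type checker if uncommented, since no cap is in scope:
   let bad : custom_finaliser = fun () -> Runtime.remove_global_root cap 42 *)
```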
I have sympathy for this viewpoint. Maybe the appropriate response to #11865 is to better document the restrictions on custom block operations, and to mention OCaml finalizers as a more flexible alternative to custom block finalizers.
In our work on Sundials/ML, the permissibility of calling |
Excellent points, thanks @gadmm. My answers:
I can see a use-case for calling root-registration functions from a C finaliser: when we have a C value that we want to pass around as an OCaml value, and that may in turn refer internally to OCaml values. It seems relatively natural in this case to store the C value in an OCaml custom block, and to register its OCaml children as roots, to be deleted by the custom finalization function.
@tbrk Nice find. What I do not understand, though, is why the thread says that one should probably not do so: the discussion still reports a good use-case and the existence of programs relying on it (which did work in practice).
In relation to boxroot, yes, one should ideally not require the runtime capability to remove a root, for various reasons (discussed in the paper). I used to believe that OCaml 5 improved on this point thanks to the mutex. But as the issue shows, OCaml 5 has not entirely lifted the restriction, so it would be nice to go the (easy) extra step. This would be a win-win. For a bigger change such as delaying custom finalizers to a safe point, what is the use-case for defining custom values from OCaml? Please also take into account whether you have sufficient maintenance workforce to implement the change and carefully evaluate its effects. I might be able to review such a change, but please prioritize the collective time.
I read this as Guillaume-speak for "there is no point in working on this now". I agree that it's not a priority, and even my intermediate fix here (which does not provide all the guarantees we want, but at least fixes the bug) is running into unexpected snags that I don't have the expertise to fix by myself. Closing. I will restart the discussion about lifting the mutex restriction in the relevant issue thread, #11865. To summarize the understanding of the runtime that I acquired while working on this PR:
(The sweep-then-mark design is clearly explained in the "Retrofitting" paper, but I had since forgotten about it.)
@gadmm: My reading of the thread is that calls to
to
This is not a big deal, but it is slightly more complicated. The finalizer can no longer be a static function hidden in the C file; it has to be mentioned in both the C and OCaml files, the call to Gc.finalise must not be forgotten, and care must be taken with exceptions between a …
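The OCaml-side workaround can be sketched with Gc.finalise, which is the real Stdlib function of type ('a -> unit) -> 'a -> unit. The registry below is a hypothetical stand-in for the roots that C stubs would register with caml_register_global_root and remove with caml_remove_global_root; in a real binding, release would call an external C stub. The sketch shows the two constraints mentioned above: Gc.finalise is attached immediately after the handle is allocated, with nothing else in between, and the finaliser is ordinary OCaml code that runs outside the sweeper.

```ocaml
(* Hypothetical stand-in for roots registered from C: id -> payload. *)
let registry : (int, string) Hashtbl.t = Hashtbl.create 16
let next_id = ref 0

(* An OCaml-side handle for a C value that references OCaml data. *)
type handle = { id : int }

(* OCaml finaliser: it runs outside the sweeper, so touching the registry
   (or, in a real binding, calling a C stub that removes roots) is fine. *)
let release h = Hashtbl.remove registry h.id

let create payload =
  let id = !next_id in
  incr next_id;
  Hashtbl.replace registry id payload;
  let h = { id } in
  (* Attach the finaliser right after allocating the handle, so that no
     other code runs while the registry entry has no finaliser. *)
  Gc.finalise release h;
  h
```

At some point after a handle becomes unreachable, the GC calls release and the registry entry disappears. Calling release early is also harmless, since Hashtbl.remove does nothing when the key is absent.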
Slightly orthogonal, but this reminds me that we desperately need design documents IMO. Published papers that don't stay up to date with the code do not cut it, I'm afraid. There's too much institutional knowledge kept in the heads of specific people. This would never be acceptable in a modern company setting.
Note: I parked my last iteration on this PR in a branch of its own: … The previous iteration: …
Yes, it has happened before that the documentation was incomplete and left users to make assumptions based on observed behaviour.
No, it transparently said what it said (and asked a question).
This is an attempt at fixing #11865. It is a small but invasive change to the runtime code, and we would need feedback from people familiar with shared_heap.c to tell whether this may work.

The problem
#11865 is a deadlock caused by the execution of a finalizer during the scanning of minor GC roots. The reason that a finalizer might run during minor GC root scanning is that moving live roots to the major heap will try to find "free slots" in the major heap, which in some cases will try to sweep some dead values to get such free slots. When the values swept are custom blocks with a finalizer, sweeping calls the finalizer.
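The chain of events can be modelled in a few lines of OCaml. This is a hypothetical sketch, not runtime code: a flag-based non-reentrant lock stands for the mutex protecting the global roots, which (as we read the discussion) is held while the roots are scanned. Where a real mutex would block forever, the sketch raises Would_deadlock so that the problem is observable.

```ocaml
(* Hypothetical model of the #11865 deadlock; names are illustrative. *)
exception Would_deadlock

(* A non-reentrant lock: taking it twice from the same context is a
   self-deadlock. A real mutex would block here; we raise instead. *)
type lock = { mutable held : bool }

let with_lock l f =
  if l.held then raise Would_deadlock;
  l.held <- true;
  Fun.protect ~finally:(fun () -> l.held <- false) f

let roots_lock = { held = false }

(* Stand-in for removing a global root: it needs the roots lock. *)
let remove_global_root () = with_lock roots_lock (fun () -> ())

(* A custom-block finaliser that removes a root, as in the bug report. *)
let custom_finaliser () = remove_global_root ()

(* Promoting a live root may need a free slot, which may sweep a dead
   custom block, which runs its finaliser. *)
let promote_root () = custom_finaliser ()

(* Root scanning holds the roots lock while promoting. *)
let scan_global_roots () = with_lock roots_lock promote_root

let () =
  match scan_global_roots () with
  | () -> print_endline "no deadlock"
  | exception Would_deadlock -> print_endline "deadlock: lock re-entered"
```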
I suspect that calling finalizers from the oldification code of the minor GC is a defect of the current runtime, which is bound to create many more issues than #11865 (in particular, issues that are not specific to global roots, which are involved in the #11865 deadlock). For example, it appears that the intern_alloc_obj function, which also calls caml_shared_try_alloc, could currently call finalizers as well. (A look at the 4.x runtime sources suggests that the 4.x runtime would not try to sweep existing dead blocks in these situations; it would extend the major heap right away.)
The proposed fix
The fix proposed here is to propagate a noexc boolean to all runtime functions that may eventually call a sweep function. (noexc is an existing runtime convention to say that a function is called in a setting where raising exceptions is not safe. In particular, if one may not raise exceptions, then one cannot call finalizers.) When sweeping functions find a dead custom block with a finalizer in noexc mode, the value is left as-is instead of being freed. Those values can only be freed by major-GC invocations that are not in noexc mode.

I have observed that this fixes the issue in #11865.
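The sweeping rule described above can be sketched as follows. The block states and the sweep signature are hypothetical simplifications rather than the shared_heap.c data structures; the sketch only shows the policy of leaving dead custom blocks that carry a finaliser in place when noexc is set, and reclaiming them (running the finaliser) on a later sweep outside noexc mode. As pointed out in the conversation, simply skipping a garbage value during sweeping turned out not to be correct in the real runtime, so this illustrates the proposal as first described, not a validated design.

```ocaml
(* Hypothetical model of the proposed noexc-aware sweep; not shared_heap.c. *)
type block =
  | Live
  | Dead of { finaliser : (unit -> unit) option }
  | Free

(* Sweep [pool] in place and return the number of slots freed.
   With ~noexc:true, dead blocks that carry a finaliser are left as-is,
   so no finaliser (which might raise or take locks) runs. *)
let sweep ~noexc pool =
  let freed = ref 0 in
  Array.iteri
    (fun i b ->
      match b with
      | Dead { finaliser = Some f } when not noexc ->
          f ();
          pool.(i) <- Free;
          incr freed
      | Dead { finaliser = Some _ } -> () (* noexc: reclaim later *)
      | Dead { finaliser = None } ->
          pool.(i) <- Free;
          incr freed
      | Live | Free -> ())
    pool;
  !freed

let () =
  let ran = ref false in
  let pool =
    [| Live; Dead { finaliser = Some (fun () -> ran := true) };
       Dead { finaliser = None } |]
  in
  let n1 = sweep ~noexc:true pool in
  Printf.printf "noexc sweep: freed %d, finaliser ran: %b\n" n1 !ran;
  let n2 = sweep ~noexc:false pool in
  Printf.printf "normal sweep: freed %d, finaliser ran: %b\n" n2 !ran
```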
Remaining issues
This change may affect the dynamics of the major GC in non-trivial ways, and we should test the performance impact carefully.
In particular, the test compaction_corner_case.ml seems to slow down massively due to this change (to the point that it appears never to terminate), and I currently have no idea why. (The rest of the testsuite passes.)
I have marked the PR as "draft" until I find the cause of the compaction_corner_case slowdown, but design comments and reviews are still warmly welcome in the meantime.