Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API to prevent automatic Gc collection #8794

Open
toots opened this issue Jul 6, 2019 · 15 comments

Comments

@toots
Copy link
Contributor

commented Jul 6, 2019

Hi,

I apologize if this has already been discussed but we've been running into a pretty intractable issue with the Gc recently which I believe could be solved elegantly with an API to temporarily disable automatic Gc collection.

Here's a naive example pretty close to our case:

let logs = Queue.create ()
let m = Mutex.create ()

(* Queue log *)
let log msg =
  Mutex.lock m;
  Queue.add msg logs;
  Mutex.unlock m

(* Print logs *)
let print () =
  Mutex.lock m;
  Queue.iter print_log logs;
  Mutex.unlock m

let create_big_object id =
  let finalise () =
    log ("Object  " ^ id ^ " cleaned up!")
  in
  let s =  (...) in
  Gc.finalise_last finalise s;
  s 

What can happen here is a deadlock inside a thread calling any of the log or print functions. Indeed, if the Gc starts a collection on a value instantiated via create_big_object then we may run into a double-locking of the same mutex.

This example is naive but, in a large scale application, tracking down every call that may be executed inside a finalise callback can be pretty difficult.

It is possible to write complex wrappers about those operations to make it work but I believe that this kind of issue is bound to happen in many other cases where the user writes some blocking code not knowing that the Gc might interrupt it and run into a double locking situation.

On the other hand, being able to assure that the Gc will not interrupt the execution would guarantee an elegant and easy to check solution. Something like:

let log = Gc.suspend (fun msg ->
  Mutex.lock m;
  Queue.add msg logs;
  Mutex.unlock m)

(* Print logs *)
let print = Gc.suspend (fun () ->
  Mutex.lock m;
  Queue.iter print_log logs;
  Mutex.unlock m)
@pmetzger

This comment has been minimized.

Copy link
Member

commented Jul 6, 2019

Rather than calling it suspend how about with_gc_suspended or some such?

(I'm not attached to that name, either; my issue with suspend is that it seems to me like it would mean something that statefully suspend gc until a resume was called, while you're trying to present an API in which a thunk will be called with gc suspended for the duration of the execution of the thunk.)

@bobot

This comment has been minimized.

Copy link
Contributor

commented Jul 7, 2019

Blocking only the execution of finalizers would be sufficient, and far easier to achieve. However other asynchrony (timer, signals, finalizer of custom block) will still be a problem, so looking at lock-less queuing (which are easier in the monocore case) seems more future proof.

@toots

This comment has been minimized.

Copy link
Contributor Author

commented Jul 7, 2019

Blocking only the execution of finalizers would be sufficient, and far easier to achieve. However other asynchrony (timer, signals, finalizer of custom block) will still be a problem, so looking at lock-less queuing (which are easier in the monocore case) seems more future proof.

This is correct for the current example but lock-less data structures are inherently hard and you would still have the possibility of running into this situation with any other concurrent lock implementation.

@pmetzger

This comment has been minimized.

Copy link
Member

commented Jul 7, 2019

Yah, naively, this does really seem like an area where what you want is the ability to stop gc briefly in a critical section.

(That said, in a safe language, the existence of critical sections is probably a misfeature, but on the other hand, fixing that is not a simple matter at all!)

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 8, 2019

The idea of "stopping the Gc" doesn't fly, because if there is no Gc you cannot allocate memory, and so there is essentially nothing at all you can do. (In this specific example the problem comes precisely from the fact that there are allocations, which you want to be able to do, in the critical section.)

So what could be discussed here is a way to temporarily delay the execution of asynchronous code (most of which is scheduled by Gc-related operations currently, but in most case that is arguably an implentation detail): finalisers, signal handlers, memprof handlers, etc.

Instead of a with_asynchrony_delayed higher-order function that you would have to remember to call in all critical sections, I would rather recommend considering extra logic to be called from the asynchrony handlers when they run in the risk of deadlocks. We could have a builtin exception Delay that, say, finalisers can raise, and means "I cannot be finalized now, remember me and try again later". (Many asynchronous mechanisms already support some fort of delaying for events that occur while running C instead of OCaml code.) Then the finalizer would look like

let try_log msg =
  if not Mutex.try_lock m then false
  else begin
    Queue.add msg logs;
    Mutex.unlock m
  end

  let finalise () =
    if not try_log ("Object  " ^ id ^ " cleaned up!")
    then raise Delay
  in

This API has simpler types and it only needs to be used carefully from asynchronous handlers, instead of polluting normal (critical) code paths.

@toots

This comment has been minimized.

Copy link
Contributor Author

commented Jul 8, 2019

This looks interesting to me but part of the problem is that it can be tricky to track down the call chain from the initial registration of a finalise callback to the actual deadlocking call, specially in complex stacks. This API would work nicely if there was a way to detect wether or not a given part of the code is being executed from within a Gc callback. For instance:

let log msg =
  if Gc.callback_context() then raise Gc.Delay;
  Mutex.lock m;
  Queue.add msg logs;
  Mutex.unlock m

  let finalise () =
    some_function_that_later_calls_log ()
  in
@gadmm

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2019

It makes a lot of sense to have a higher-order function that temporarily disables all asynchronous operations for the duration of a scope. That should solve your issue in the short run by allowing you to do what you described initially. If I followed correctly, that should be simpler to implement following recent related work by @jhjourdan for the FFI (?). I plan to ask Leo and Stephen their thoughts next week, since this is useful for resource management in general. Depending on the difficulty, I'm happy to get involved.

In the long run, maybe the issue is that you are trying to do some serious effects in your finaliser, which you do not know when it runs. Gabriel proposes to extend the language with some facilities that would let the programmer try to detect whether now is a good time to run the finaliser from the finaliser itself. Instead of doing things in a roundabout way, one could also try to use deterministic resource management facilities to perform serious clean-up operations only at times the programmer can predict, and see where those facilities are currently lacking.

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 18, 2019

I agree with the idea of being able to temporarily prevent asynchronous operations, but I think of it as a low-level facility to avoid races in very very specific resource-handling code (like the unwind_protect implementation you demonstrated previously), to be used only for very short-lived computations. Intuitively my feeling is that mutual-exclusion sections in a concurrent program have a much coarser granularity than what is desirable for such a mechanism.

That said, the solution that I propose also has clear downsides -- it is invasive in term of affecting the reasoning when writing all the code that may be called from an asynchronous operation.

@gadmm

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2019

I agree with you that ideally the end programmer should not have to sprinkle the code with masking functions. Masking's purpose should remain low-level. But although I'd like to be able to say confidently that Romain's example is really a use-case for deterministic resource management rather than masking, I do not think we have enough information on the use-case to know whether the actual solution would be to do things differently. On the other hand, the application does not look outlandish to me if meant as a quick alternative to more involved synchronisation mechanisms: depending on the use-case, the desired solution can be entirely reasonable (the sections are small, etc.).

@gadmm

This comment has been minimized.

Copy link
Contributor

commented Jul 26, 2019

I have seen @lpw25 who advised that you run your finaliser in a thread of its own.

Would that work for you?

@bobot

This comment has been minimized.

Copy link
Contributor

commented Jul 26, 2019

Since the GC will not be activated if there is no allocation, we could limit the allocation or be sure that they will succeed in the minor heap.

People wanted to be able to ask the compiler to check that some code doesn't allocate, for example the body of a very hot loop. For example an annotation like [@ no_allocation] would be added to a function, and the compiler would stop with an error if it can't ensure that during the call of this function no allocation is done (it would perhaps also prevent inlining of the function).

You could use another implementation of your critical section were you can allocate all the needed datastructure before entering the critical section, and check that the critical section doesn't allocate.

We can also ask the compiler to do all the allocation: [@ pre_allocate] would check that all the allocations of the functions will be done in the minor heap and at the start of the functions it calls the gc until the minor_heap as enough free space for all the allocations.

@toots

This comment has been minimized.

Copy link
Contributor Author

commented Jul 28, 2019

I think @bobot propositions look great. Solves the issue without changing the language..

@gadmm My use case is a logging mechanism. I think it fits a couple of real checkboxes:

  • It needs to be coordinated, otherwise log lines get mixed up on the screen
  • It happens everywhere in the application, potentially at fast pace and during low-level operations (in my case logging the collection of an object)

I'm aware of the existence of the Message module. I tried and failed to implement the functionality using this module, but that's not the topic here.

The reason I reported the issue is that, I believe that it is a fundamental concern with the language's semantics that ought at least to be discussed and, if possible addressed.

@gadmm

This comment has been minimized.

Copy link
Contributor

commented Jul 29, 2019

@bobot: Regarding [@ no_allocation], this could be a role for the effect system. Regarding [@ pre_allocate], 1) if a bound on the required memory is known, it can already be done by hand with Gc.get_minor_free/minor/stack_size, 2) if not, what is the strategy to compute a bound during compilation? I am sceptical regarding [@ pre_allocate] because it is not clear how to reliably program/reason with it, and it trashes the performance guarantees of the GC to collect the minor heap too often.

@toots: I am curious about what makes it important for you to observe finalisation. What sort of work does your finaliser do apart from logging?

@bobot

This comment has been minimized.

Copy link
Contributor

commented Jul 29, 2019

  1. if a bound on the required memory is known, it can already be done by hand with Gc.get_minor_free

Oh, yes 😲 ! while Gc.get_minor_free () < $bound; do GC.minor (); done. However computing $bound is hard outside the compiler. And it would be nice that inside the preallocated section we don't have to test the size of the free space in the minor gc.

  1. if not, what is the strategy to compute a bound during compilation?

Perhaps naïvely, like for [@ no_allocation], two possibilities:

  1. each function could indicate in the .cmx the maximum quantity of allocation done
  2. inline all the sub-functions and then compute the maximum number of allocation.

I am sceptical regarding [@ pre_allocate] because it is not clear how to reliably program/reason with it,

Indeed whenever you have a loop that can't be bounded, or recursive function the compiler will fail. Which could lead to maintenance nightmare. However this kind of code should use only functions you have complete control on. The semantic of the program is the same as before, except without implicit execution of the runtime.

and it trashes the performance guarantees of the GC to collect the minor heap too often.

I don't see why it must be. If you have enough free space in the minor heap, no need to run a minor gc. Of course for complexe code the WCMA (worst-case minor allocation) could be a huge over-approximation and in this case more minor gc will be done. But usually such critical sections stay simple.

@bobot

This comment has been minimized.

Copy link
Contributor

commented Aug 21, 2019

Since the GC will not be activated if there is no allocation

The MR #8716 showed me that my affirmation is surely false because caml_array_blit and now caml_array_fill could call the GC through caml_check_urgent_gc.

  /* Many {caml_modify, caml_initialize} in a row can create a lot of old-to-young refs.
     Give the minor GC a chance to run if it needs to. */
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.