Skip to content

Conversation

@gadmm
Copy link
Contributor

@gadmm gadmm commented Jul 18, 2024

The PR #12558 introduced the following regression: GC alarms did not run any more. (Reported at https://discuss.ocaml.org/t/changes-in-handling-of-gc-parameters-and-alarms-in-5-2-0/14986/2)

The arec variable was referenced in the closure for at_exit, which prevented the value from being finalised.

I wondered why it was not caught, and in fact the testsuite had no test for alarms. So naturally I add a regression test, based on the provided reproducer.

@lthls
Copy link
Contributor

lthls commented Jul 18, 2024

create_alarm should probably be marked with [@inline never] to be reliable. Otherwise, the arec record could end up statically allocated which would prevent it from ever being collected.
You might be able to observe this with your test case if you compile with flambda and -O3.

@gasche
Copy link
Member

gasche commented Jul 18, 2024

Instead of playing with [@inline never], we could register the finalizer on ref arec -- a mutable value that cannot be shared. This would also have prevented the initial bug.

@gadmm
Copy link
Contributor Author

gadmm commented Jul 18, 2024

Nice, I just applied the suggestion. I chose inline never of which I understand the consequences better.

@gadmm
Copy link
Contributor Author

gadmm commented Jul 18, 2024

@Octachron This is going to be a candidate for backporting

@Octachron
Copy link
Member

Yes, I agree.

@lthls
Copy link
Contributor

lthls commented Jul 18, 2024

Instead of playing with [@inline never], we could register the finalizer on ref arec -- a mutable value that cannot be shared. This would also have prevented the initial bug.

That would be a decent solution, but making the alarm_rec type mutable would have the same effect without the extra allocation.

@gasche
Copy link
Member

gasche commented Jul 18, 2024

@lthls This was my initial idea, but using ref arec is better because it is syntactically obvious that this value is already dead, and in particular isn't captured by a closure. (I find that it also clarifies what goes on when re-installing the alarm later on -- we set the finaliser on a new value, instead of setting it on an older value that we have to trust remains dead.)

@lthls
Copy link
Contributor

lthls commented Jul 18, 2024

@lthls This was my initial idea, but using ref arec is better because it is syntactically obvious that this value is already dead, and in particular isn't captured by a closure.

I don't follow. With the new code, arec is used only once, so wouldn't inlining its definition at the use do the same thing ?

(I find that it also clarifies what goes on when re-installing the alarm later on -- we set the finaliser on a new value, instead of setting it on an older value that we have to trust remains dead.)

We could have this property by implementing call_alarm like this:

let rec call_alarm arec =
  if Atomic.get arec.active then begin
    let finally () = finalise call_alarm { arec with f = arec.f } in
    Fun.protect ~finally arec.f
  end

@gadmm
Copy link
Contributor Author

gadmm commented Jul 19, 2024

It is not clear from the discussion that there is anything left to fix. Is there something left to change?

@gasche
Copy link
Member

gasche commented Jul 19, 2024

I don't like the use of [@inline never] to reason about program semantics / optimization boundaries, to me inlining is a performance optimization that should not affect program behavior. Whenever possible, I prefer in-language mechanisms to enable a certain program reasonable, for example having a record with mutable fields to ensure that their identity cannot be shared across callers. That does not mean that your version is wrong, or that I am vetoing the PR, or demanding changes -- I recognize the expertise of @lthls and believe that you @gadmm also probably have good reasons to prefer this approach -- but I'm not going to be the approving&merging maintainer for this version that runs against my taste and intuition.

@gadmm
Copy link
Contributor Author

gadmm commented Jul 19, 2024

I understand the meaning of ref in forcing values to have an identity, but I don't typically relate this to being able to reason about the lifetime. It took me thinking about the compilation to understand why this would prevent making them static, and I am not sure I see how this is a consequence of preventing sharing.

In both cases one reasons on consequences of implementation details rather than more conceptual program explanations, so both are equivalent to me in this regards. I'll defer to @lthls for what is the best solution.

@lthls
Copy link
Contributor

lthls commented Jul 19, 2024

I'm a bit worried about the test. How long does it take to run ? If it is more than one second I think it should be changed to something less expensive (like a sequence of Gc.full_major ()).

@gadmm
Copy link
Contributor Author

gadmm commented Jul 22, 2024

Good question, it takes 400ms to run here, which is also how long using Gc.full_major takes.

gadmm added 2 commits July 22, 2024 18:15
The arec variable was referenced in the closure for at_exit, which
prevented the value from being finalised.

Added a regression test.
@gadmm
Copy link
Contributor Author

gadmm commented Jul 22, 2024

In case of failure of the test, the change I just pushed reduces the duration from 39s to 4s.

@lthls lthls merged commit 9ca0ce5 into ocaml:trunk Jul 22, 2024
@lthls
Copy link
Contributor

lthls commented Jul 22, 2024

I don't like the use of [@inline never] to reason about program semantics / optimization boundaries, to me inlining is a performance optimization that should not affect program behavior.

Performance optimisations affect any program behaviour that depends on the GC, and this is one of them. We do not provide any guarantee about the time at which a value will be collected (some values will still be alive when the program ends), so whether a GC alarm will trigger in a given program can depend on GC parameters, compiler optimisations, and more. The [@inline never] trick is a small change that will have no visible impact, and prevent a few (but maybe not all) cases where the compiler could have prevented the alarm from triggering. The ref trick is in the same category, but with a (very small) performance cost, and it does not really prevent the compiler from keeping the value alive (yes, even anonymous references could be kept alive indefinitely in some cases; it is not particularly useful, but preventing it is more work than allowing it).

@gadmm
Copy link
Contributor Author

gadmm commented Jul 24, 2024

As an aside,

We do not provide any guarantee about the time at which a value will be collected

Is there no effort at all in ensuring safety for space for flambda?

@lthls
Copy link
Contributor

lthls commented Jul 24, 2024

Is there no effort at all in ensuring safety for space for flambda?

I'm not sure what you mean by "no effort". OCaml's compilation scheme for closures ensures that the SSC property from the paper holds for all variables that are not statically allocated. Or formulated differently, the SSC property always holds in OCaml, but whether a variable is local or global depends on context and optimisations.

@gadmm
Copy link
Contributor Author

gadmm commented Jul 24, 2024

Thanks @lthls, this is the clarification I was hoping for

@lthls
Copy link
Contributor

lthls commented Jul 24, 2024

Thinking a bit more, I think that the SSC property might not hold with recursive definitions:

let f x y =
  let rec a () = x and b () = y in
  ignore (b ());
  a

I think that on the above example, both x and y will be kept alive because the closure for a is shared with the one for b.
Flambda2 should get rid of this because we separate the recursive definitions into strongly connected components, but I don't think this is done for the official releases (flambda or not, native or bytecode).

Octachron pushed a commit that referenced this pull request Jul 25, 2024
Fix GC alarm regression

(cherry picked from commit 9ca0ce5)
@Octachron
Copy link
Member

Cherry-picked to 5.2 as 13ee4da .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants