Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not fail too hard in domain_thread_func #12855

Merged
merged 1 commit into from Jan 17, 2024

Conversation

dustanddreams
Copy link
Contributor

@dustanddreams dustanddreams commented Dec 21, 2023

As noticed by @jmid, commit 3838bbf by @NickBarnes adds a hard failure in domain_thread_func if domain_create fails, although the remainder of this routine actually is designed to cope correctly with allocation failure.

Remove this and keep the graceful failure behaviour.

Probably not worth a changes entry, this regression was only introduced on december 8th.

…fails.

The logic in this routine already nicely deals with failure.
@gasche
Copy link
Member

gasche commented Dec 21, 2023

I am not sure that your proposed change is correct. My guess would that @NickBarnes added this ( see the discussion at the time around #12382 (comment) ) because with his statmemprof changes, domain_create initializes new state that is not properly released by the pre-existing failure path. Let's wait to hear what @NickBarnes thinks.

@jmid
Copy link
Contributor

jmid commented Dec 21, 2023

For everyone's viewing benefit, consider the following program which I previously shared elsewhere to illustrate the difference:

let size = 256

let rec myspawn i () =
  if i = 0
  then ()
  else
    let d = Domain.spawn (myspawn (i-1)) in
    Domain.join d

let _ =
  try
    myspawn size ()
  with Failure s ->
    begin
      assert (s = "failed to allocate domain");
      Printf.printf "Oh well, too ambitious Domain count\n"
    end

On 5.0.0, 5.1.0, and 5.1.1 this behaves consistently:

$ ocamlopt -o /tmp/fatal.exe /tmp/fatal.ml 
$ /tmp/fatal.exe
Oh well, too ambitious Domain count

On trunk (commit ba71f2e from this morning):

$ /tmp/fatal.exe 
Fatal error: Failed to create domain
Aborted (core dumped)

Perhaps this kind of 'negative test' can be included in compiler test suite? (feel free to use this one!)

@gasche
Copy link
Member

gasche commented Dec 21, 2023

My intuition would be that the behavior could be improved here, so it is sensible to report this as a regression, but that it may need a different, more complex fix. (But this may be wrong and in fact the proposed fix is fine?) I don't know if the people working on that aspect of the runtime are interested in implementing this more complex fix, or if they think that the current hard-failure behavior is perfectly fine, in which case your test/model should be updated.

@jmid
Copy link
Contributor

jmid commented Dec 21, 2023

FYI, I've also given this PR a succesful spin on multicoretests 👍
Please note, this only checks "non-memprof usage" of the runtime.

@NickBarnes
Copy link
Contributor

Yes, I'm pretty sure this is right. I introduced this problem while figuring out failure recovery, as @gasche suggests, and never cleaned it up before making this PR. I observe that there is a still a niggle in domain-creation failure (some failures still leave caml_num_domains_running incremented), but that should be a separate PR.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved on behalf of @NickBarnes and also myself. Thanks!

Could you add a Changes entry?

@dustanddreams
Copy link
Contributor Author

Could you add a Changes entry?

Do you really want one for such a minor fix? I wrote "probably not worth a changes entry, this regression was only introduced on december 8th."

@gasche
Copy link
Member

gasche commented Jan 17, 2024

Fair enough, I'll go ahead and merge. (I understand that the change we are fixing is not included in the 5.2 branch, so this does not need backporting either.)

@gasche gasche merged commit 55c3487 into ocaml:trunk Jan 17, 2024
11 of 13 checks passed
@dustanddreams dustanddreams deleted the overzealous_domain_failure branch January 17, 2024 13:54
@jmid
Copy link
Contributor

jmid commented Jan 17, 2024

(I understand that the change we are fixing is not included in the 5.2 branch, so this does not need backporting either.)

The bug is also present on the 5.2 branch, so a cherry-pick would be appreciated!
I just confirmed with a repro of the above with a 5.2 built on Monday:

$ ocamlopt -o /tmp/fatal.exe /tmp/fatal.ml 
$ /tmp/fatal.exe
Fatal error: Failed to create domain
Aborted (core dumped)

(I suspect 5.2 was branched shortly after this was opened).

@gasche
Copy link
Member

gasche commented Jan 17, 2024

Thanks for checking that! (Somehow github displayed the wrong branch information on 3838bbf ). Let's backport.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants