
JDK-8274320: os::fork_and_exec() should be using posix_spawn #5698


tstuefe
Member

@tstuefe tstuefe commented Sep 25, 2021

Hi, may I have reviews for this small patch please?

os::fork_and_exec(), which HotSpot uses to spawn child programs (scripts, etc.) in error situations, should be using posix_spawn().

Currently it uses either fork() or vfork(). vfork() has been deprecated on macOS, and we get build errors (JDK-8274293), even though it would be perfectly fine to use in this case. That leaves fork() on macOS, with its known problems for parents with a large memory footprint. This matters especially here, since we also use os::fork_and_exec() to implement -XX:OnError for OOM situations.

We have used posix_spawn() by default for Runtime.exec() since JDK 15, and it is available on all our Unices. We should use it here as well.

I kept the function's name (fork_and_exec) since people know it, even though it is now even less accurate than before.
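For illustration, here is a minimal sketch of spawning a command with posix_spawn() and waiting for it. It assumes the command is run via /bin/sh -c and that the child inherits the parent's environment; the helper name spawn_and_wait() is hypothetical, and this is not the actual code of the patch. It passes no file actions (the functions that build them may call malloc) and no spawn attributes.

```cpp
#include <cerrno>
#include <spawn.h>
#include <sys/wait.h>
#include <unistd.h>

extern char** environ;  // the current process environment (POSIX)

// Hypothetical helper: runs `cmd` via /bin/sh -c and waits for it.
// Returns the child's exit status, or -1 on failure.
int spawn_and_wait(const char* cmd) {
  // posix_spawn() takes char* const argv[], so cast away const on the literals.
  char* const argv[] = {const_cast<char*>("sh"), const_cast<char*>("-c"),
                        const_cast<char*>(cmd), nullptr};
  pid_t pid;
  // No file actions, no attributes: the child simply inherits our fds and env.
  int rc = posix_spawn(&pid, "/bin/sh", nullptr, nullptr, argv, environ);
  if (rc != 0) {
    return -1;  // posix_spawn() returns an error number; it does not set errno
  }
  int status;
  while (waitpid(pid, &status, 0) == -1) {
    if (errno != EINTR) return -1;  // retry only if interrupted by a signal
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this shape the caller gets the child's exit status back, and spawn failures surface through posix_spawn()'s return value rather than through errno.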

Tests:

  • manual tests using -XX:OnError with various scripts, including checking that env variables are passed correctly
  • manually ran runtime/ErrorHandling tests
  • GHAs

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8274320: os::fork_and_exec() should be using posix_spawn

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5698/head:pull/5698
$ git checkout pull/5698

Update a local copy of the PR:
$ git checkout pull/5698
$ git pull https://git.openjdk.java.net/jdk pull/5698/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5698

View PR using the GUI difftool:
$ git pr show -t 5698

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5698.diff

@bridgekeeper

@bridgekeeper bridgekeeper bot commented Sep 25, 2021

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

@openjdk openjdk bot commented Sep 25, 2021

@tstuefe The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot label Sep 25, 2021
@tstuefe tstuefe force-pushed the JDK-8274320-os-fork-and-exec-should-be-using-posix-spawn branch from 103ddd6 to 859c893 Sep 25, 2021
@tstuefe tstuefe force-pushed the JDK-8274320-os-fork-and-exec-should-be-using-posix-spawn branch from 859c893 to 42e4120 Sep 26, 2021
@tstuefe tstuefe marked this pull request as ready for review Sep 26, 2021
@tstuefe
Member Author

@tstuefe tstuefe commented Sep 26, 2021

/label hotspot-runtime

@tstuefe
Member Author

@tstuefe tstuefe commented Sep 26, 2021

/label remove hotspot

@tstuefe
Member Author

@tstuefe tstuefe commented Sep 26, 2021

/label hotspot-runtime

@openjdk

@openjdk openjdk bot commented Sep 26, 2021

@tstuefe this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8274320-os-fork-and-exec-should-be-using-posix-spawn
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict rfr labels Sep 26, 2021
@mlbridge

@mlbridge mlbridge bot commented Sep 26, 2021

Webrevs

@openjdk openjdk bot added the hotspot-runtime label Sep 26, 2021
@openjdk

@openjdk openjdk bot commented Sep 26, 2021

@tstuefe
The hotspot-runtime label was successfully added.

@openjdk openjdk bot removed merge-conflict hotspot labels Sep 27, 2021
@openjdk

@openjdk openjdk bot commented Sep 27, 2021

@tstuefe
The hotspot label was successfully removed.

@openjdk

@openjdk openjdk bot commented Sep 27, 2021

@tstuefe The hotspot-runtime label was already applied.

@dholmes-ora
Member

@dholmes-ora dholmes-ora commented Sep 29, 2021

Hi Thomas,

There is a crucial difference between the requirements of os::fork_and_exec versus the Java Runtime.exec call - in the VM we need to use an async-signal-safe function where possible. fork() is async-signal-safe but posix_spawn is not.

Cheers,
David

@tstuefe
Member Author

@tstuefe tstuefe commented Oct 6, 2021


Hi David,

fork() is not async-signal-safe either, since fork() can run at-fork handlers.

https://man7.org/linux/man-pages/man7/signal-safety.7.html

       *  POSIX.1-2001 TC1 clarified that if an application calls
          fork(2) from a signal handler and any of the fork handlers
          registered by pthread_atfork(3) calls a function that is not
          async-signal-safe, the behavior is undefined. A future
          revision of the standard is likely to remove fork(2) from the
          list of async-signal-safe functions.

Therefore I think we don't lose anything by moving to posix_spawn(). But we gain reliability in high-footprint scenarios.
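To make the at-fork point concrete, here is a small sketch (all helper names are hypothetical) that registers a counting handler with pthread_atfork() and compares fork() with posix_spawn(). fork() runs the registered handlers. Clone-based posix_spawn() implementations, such as glibc 2.24 and later and current musl, are to my understanding not expected to run them; older implementations built on fork() may.

```cpp
#include <pthread.h>
#include <spawn.h>
#include <sys/wait.h>
#include <unistd.h>

extern char** environ;

static int g_prepare_calls = 0;                    // how often "prepare" ran
static void count_prepare() { ++g_prepare_calls; }

// Registers the counting handler; returns 0 on success.
int install_counting_atfork_handler() {
  return pthread_atfork(count_prepare, nullptr, nullptr);
}

// Number of prepare-handler calls caused by one fork().
int prepare_calls_during_fork() {
  int before = g_prepare_calls;
  pid_t pid = fork();        // runs the registered at-fork handlers
  if (pid == 0) _exit(0);    // child: leave immediately
  if (pid > 0) waitpid(pid, nullptr, 0);
  return g_prepare_calls - before;
}

// Number of prepare-handler calls caused by one posix_spawn().
int prepare_calls_during_posix_spawn() {
  int before = g_prepare_calls;
  char* const argv[] = {const_cast<char*>("sh"), const_cast<char*>("-c"),
                        const_cast<char*>("exit 0"), nullptr};
  pid_t pid;
  if (posix_spawn(&pid, "/bin/sh", nullptr, nullptr, argv, environ) == 0) {
    waitpid(pid, nullptr, 0);
  }
  return g_prepare_calls - before;
}
```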

For me, this is an important point. Analysis options such as OnError should be very reliable. They are often used in support situations where time is short and customer patience is limited. A failed analysis costs extra iterations or may even make analysis impossible.

Cheers, Thomas

@mlbridge

@mlbridge mlbridge bot commented Oct 6, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:


Sorry but I have to disagree. fork() is async-signal-safe, but if an
at-fork handler is not, then all bets are off - that is fine, it is the
best we can do. But posix_spawn makes no claim to any kind of
async-signal safety so we very much do lose something by switching to it
IMO.

David
-----

@tstuefe
Member Author

@tstuefe tstuefe commented Oct 7, 2021


Hi David,

I looked a bit closer, since I wanted to figure out whether the async-signal-unsafety of calls in error reporting actually matters. The problem is that these functions are not reentrant, right? But we already have an intricate mechanism in place to guard against reentrance errors: our secondary signal handling.

So the first time we enter error handling, we mark this thread as the reporting one and install the secondary signal handler. All subsequent invocations open a new frame, and we skip the error reporting steps that caused the last error. So for re-entrance problems like this:

  • main()
  • signal !
  • posix_spawn()
  • signal !
  • posix_spawn()

we are covered: in VMError::report_and_die(), most steps are guarded against re-execution by the step-counting logic inside VMError::report() and by boolean flags inside VMError::report_and_die() ("log_done" and such).
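The step-counting idea can be modeled in a few lines. This is a much-simplified, hypothetical sketch, not VMError's actual code: a step index that survives re-entry is advanced before each step runs, so when a step crashes and error reporting is re-entered, the new invocation resumes after the failing step instead of repeating it. The crash is simulated by calling report() again from inside step_b().

```cpp
#include <string>

static int g_next_step = 0;   // next step to attempt; survives re-entry
static std::string g_trace;   // which steps actually ran, in order

void report();                // forward declaration for the simulated re-entry

static void step_a() { g_trace += 'A'; }
static void step_b() {
  g_trace += 'B';
  report();                   // simulates a secondary crash re-entering error reporting
}
static void step_c() { g_trace += 'C'; }

// Hypothetical model of resumable, re-entrant error reporting.
void report() {
  static void (*const steps[])() = {step_a, step_b, step_c};
  const int n = static_cast<int>(sizeof(steps) / sizeof(steps[0]));
  while (g_next_step < n) {
    const int i = g_next_step++;  // mark the step as attempted *before* running it
    steps[i]();
  }
}

std::string run_report_demo() {
  report();
  return g_trace;
}
```

Each step runs exactly once and the trace reads "ABC". In the real VM a crashing step never returns; here the nested call simply returns to it.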

(Note that this mechanism seems poorly understood and is bit-rotting: recent additions like Jfr::on_vm_shutdown bypass this logic, so that step would be executed over and over again. That is a separate issue and should be fixed.)

OnError is guarded against multiple executions via skip_OnError:

  static bool skip_OnError = false;
  if (!skip_OnError && OnError && OnError[0]) {
    skip_OnError = true;

So it won't be re-executed if a secondary signal happens inside error handling itself.
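The same run-at-most-once pattern as a self-contained sketch; run_on_error_once() and fake_spawn() are hypothetical stand-ins, not HotSpot code. The important detail is that the flag is set before the command is spawned, so a secondary signal while spawning cannot trigger a second run.

```cpp
// Hypothetical stand-in for the real spawning code; just counts invocations.
static int g_spawn_calls = 0;
static void fake_spawn(const char* /*cmd*/) { ++g_spawn_calls; }

// Runs the OnError command at most once per process, even if error
// handling is re-entered while (or after) the command is being spawned.
bool run_on_error_once(const char* on_error, void (*spawn)(const char*)) {
  static bool skip_on_error = false;
  if (!skip_on_error && on_error != nullptr && on_error[0] != '\0') {
    skip_on_error = true;   // set first, so re-entry inside spawn() skips this
    spawn(on_error);
    return true;
  }
  return false;             // empty command, or already run
}
```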

The only caveat is that this does not protect us if the non-reentrant function we call during signal handling is already below us on the stack:

  • main()
  • posix_spawn()
  • signal !
  • posix_spawn()

But for this to happen, the signal must originate from posix_spawn itself and be a synchronous error signal that causes us to invoke error handling. In other words, posix_spawn() would have to crash in the first place. I'd argue that the chances of that are very slim, unless the libc itself is broken.


fork() is async-signal-safe only if no atfork handlers are registered. We cannot know that, since we share the process with other components, including the system libraries themselves. I dimly remember reading that glibc itself uses atfork handlers for internal cleanup, but I cannot find proof. In any case, atfork handlers are a common technique libraries use to keep their mutexes consistent across fork(). So the current fork() has never been completely async-signal-safe either.
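For reference, this is the usual shape of that library idiom, as a hypothetical sketch: the prepare handler takes the library's lock and both post-fork handlers release it, so the child never inherits a lock held by some other thread. It also shows why fork() from a signal handler is risky: if the signal interrupted code that already holds g_lib_lock, lib_prepare() tries to take it again in the same thread. Relocking a default mutex is undefined by POSIX and in practice typically deadlocks.

```cpp
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical library-internal lock, kept consistent across fork().
static pthread_mutex_t g_lib_lock = PTHREAD_MUTEX_INITIALIZER;

static void lib_prepare() { pthread_mutex_lock(&g_lib_lock); }    // parent, before fork
static void lib_parent() { pthread_mutex_unlock(&g_lib_lock); }   // parent, after fork
static void lib_child() { pthread_mutex_unlock(&g_lib_lock); }    // child, after fork

// Called once when the library initializes; returns 0 on success.
int lib_init() { return pthread_atfork(lib_prepare, lib_parent, lib_child); }

// Forks a child that checks the lock is free; returns its exit status or -1.
int fork_and_use_lib() {
  pid_t pid = fork();
  if (pid == 0) {
    // Only the forking thread exists in the child; lib_child() released the lock.
    _exit(pthread_mutex_trylock(&g_lib_lock) == 0 ? 0 : 1);
  }
  int status = 0;
  if (pid < 0 || waitpid(pid, &status, 0) < 0) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```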


posix_spawn has the advantage that it lets us avoid a very common problem with forking in low-memory situations. It is like vfork(), but with less risk involved. We analyzed this [1] when @dmlloyd proposed replacing vfork with posix_spawn in Runtime.exec(), and he convinced me that this is a good idea. posix_spawn(), at least in glibc and musl, uses clone(CLONE_VM | CLONE_VFORK) and mitigates the vfork problems by running the child on its own stack.
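A Linux-only sketch of the clone(CLONE_VM | CLONE_VFORK) technique, under stated assumptions: the child gets its own mmap()ed stack, shares the parent's memory (so no page tables are copied), and the parent is suspended until the child calls execve() or exits. The function name vfork_style_spawn() is hypothetical. Unlike glibc's posix_spawn(), this sketch omits blocking signals and resetting signal handlers in the child, so it is not suitable as-is for signal-handling contexts.

```cpp
#include <cstddef>
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

extern char** environ;

namespace {
struct ChildArgs {        // what the child needs; lives in the parent's frame
  const char* path;
  char* const* argv;
};

int child_main(void* p) { // runs on its own stack, sharing the parent's memory
  auto* a = static_cast<ChildArgs*>(p);
  execve(a->path, a->argv, environ);
  _exit(127);             // only reached if execve() failed
}
}  // namespace

// Hypothetical: runs `/bin/sh -c cmd`; returns its exit status or -1.
int vfork_style_spawn(const char* cmd) {
  const std::size_t kStackSize = 64 * 1024;
  void* stack = mmap(nullptr, kStackSize, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
  if (stack == MAP_FAILED) return -1;
  char* const argv[] = {const_cast<char*>("sh"), const_cast<char*>("-c"),
                        const_cast<char*>(cmd), nullptr};
  ChildArgs args{"/bin/sh", argv};
  // Stacks grow downwards on common Linux ABIs, so pass the top of the mapping.
  char* stack_top = static_cast<char*>(stack) + kStackSize;
  // CLONE_VM: share memory. CLONE_VFORK: parent sleeps until execve()/exit.
  // SIGCHLD: ordinary child-termination notification, so waitpid() works.
  pid_t pid = clone(child_main, stack_top, CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
  munmap(stack, kStackSize);  // the child has exec'd or exited by now
  if (pid == -1) return -1;
  int status;
  if (waitpid(pid, &status, 0) == -1) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```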

So, in exchange for a theoretical problem that I think is very narrow, we would gain reliability in common situations (a VM with a high footprint). I think that tradeoff makes sense.

Cheers, Thomas

[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/056158.html

@mlbridge

@mlbridge mlbridge bot commented Oct 7, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

Hi Thomas,


It's not the reentrancy problem I'm concerned with but simply the
ability to "safely" call fork() from within a signal-handling context
because it is marked as async-signal-safe (notwithstanding that an
at-fork handler may not be).

My worry is that we may hit cases where posix_spawn just causes the VM
to hang.

I know we already take many risks in the error reporting code with
regard to the ability to actually execute stuff from a signal handler,
but I still feel it necessary to raise this every time something new in
error reporting is proposed. This is stuff that only goes wrong live in
the field after we have shipped a release.

I understand the desire to have a more reliable forking mechanism with
respect to memory management, but that has to be weighed against other
reliability factors.

I'm not aware of our existing use of fork() being flagged as causing
major problems in that regard. So in my mind this change increases the
risk of a hang, whilst "fixing" a problem that hasn't AFAIK really been
raised as a problem.

I'm happy to hear other opinions on this.

Cheers,
David
-----

@tstuefe
Member Author

@tstuefe tstuefe commented Oct 7, 2021


Hi David,

I see what you mean. Okay, hanging would be bad. Let's wait for other opinions. @fweimer, do you have an opinion on whether posix_spawn is less async-signal-safe than fork, at least for glibc?

..Thomas

@navyxliu
Member

@navyxliu navyxliu commented Oct 14, 2021

When it comes to Linux/glibc, the Linux kernel implements fork() with copy-on-write, doesn't it? So this change can't improve things much further on Linux.

On the other hand, Darwin can benefit from changing from fork() to posix_spawn(). If so, why not just use posix_spawn() under #ifdef __APPLE__ on Darwin? VMError already uses that macro for the vfork issue.

@tstuefe
Member Author

@tstuefe tstuefe commented Oct 15, 2021

Hi Xin, thanks for giving this an eye.

When it comes to Linux/glibc, the Linux kernel implements fork() with copy-on-write, doesn't it? So this change can't improve things much further on Linux.

Not entirely true, since AFAIU fork() still has to copy at least the page tables. posix_spawn (if it uses clone(CLONE_VM | CLONE_VFORK) under the hood) should not need to copy the page tables.
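One hedged way to observe the page-table cost is to give the parent a large, fully touched anonymous mapping and then time fork() + execve() against posix_spawn(). The helper names and the "exit 0" child command below are illustrative assumptions, and no measured numbers are claimed.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <spawn.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

extern char** environ;

namespace {
char* const kArgv[] = {const_cast<char*>("sh"), const_cast<char*>("-c"),
                       const_cast<char*>("exit 0"), nullptr};

long long elapsed_us(std::chrono::steady_clock::time_point t0) {
  using namespace std::chrono;
  return duration_cast<microseconds>(steady_clock::now() - t0).count();
}
}  // namespace

// Maps and touches `bytes` of memory so the parent has many populated
// page-table entries. The mapping is intentionally never released.
bool inflate_footprint(std::size_t bytes) {
  void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return false;
  std::memset(p, 1, bytes);  // fault in every page
  return true;
}

// fork() + execve() + waitpid(); returns elapsed microseconds or -1.
long long time_fork_exec_us() {
  auto t0 = std::chrono::steady_clock::now();
  pid_t pid = fork();
  if (pid == 0) { execve("/bin/sh", kArgv, environ); _exit(127); }
  if (pid < 0 || waitpid(pid, nullptr, 0) < 0) return -1;
  return elapsed_us(t0);
}

// posix_spawn() + waitpid(); returns elapsed microseconds or -1.
long long time_posix_spawn_us() {
  auto t0 = std::chrono::steady_clock::now();
  pid_t pid;
  if (posix_spawn(&pid, "/bin/sh", nullptr, nullptr, kArgv, environ) != 0) return -1;
  if (waitpid(pid, nullptr, 0) < 0) return -1;
  return elapsed_us(t0);
}
```

With a footprint of many gigabytes, the fork() variant is expected to slow down noticeably, because the kernel copies the page tables even though the data pages are shared copy-on-write; a clone(CLONE_VM | CLONE_VFORK)-based posix_spawn() should not pay that cost.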

On the other hand, Darwin can benefit from changing from fork() to posix_spawn(). If so, why not just use posix_spawn() under #ifdef __APPLE__ on Darwin? VMError already uses that macro for the vfork issue.

My aim was to simplify the code, not make it more complex. If the argument is "posix_spawn is dangerous during error reporting" we should not use it at all. If that argument does not hold, we can use it on all platforms.

Thanks, Thomas

@mlbridge

@mlbridge mlbridge bot commented Oct 21, 2021

Mailing list message from Florian Weimer on hotspot-runtime-dev:


Sorry, I didn't see the thread until now.

The glibc implementation of fork is not async-signal-safe even if the
process has not installed any fork handlers. Our (downstream)
perspective is captured here:

Using the fork function in signal handlers
<https://access.redhat.com/articles/2921161>

The current implementation of posix_spawn in glibc is async-signal-safe,
I think. I would have to ask on libc-alpha if we can make this official
in any way. The current musl implementation seems to be safe as well.

The glibc implementation of posix_spawn has changed substantially over
the years, and I can dig through the history to make sure it has not
changed materially. What's the oldest glibc release you still need to
support?

Other functions in the posix_spawn corner (for maintaining file actions)
are definitely not safe because they call malloc internally, but the
current patch does not use them.

When used carefully, vfork can be made async-signal-safe. But you
really have to block signals before calling it, and in the subprocess,
restore the signal handler disposition to SIG_DFL, and then unblock the
signals. Otherwise some signal handler might run with a
slightly-incorrect TCB. (Historic posix_spawn implementations did not
do the signal handlers dance.) At least vfork does not run fork
handlers.
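A sketch of the careful vfork() sequence described above, assuming Linux/glibc (careful_vfork_spawn() is a hypothetical name): block all signals in the parent, reset caught signals to SIG_DFL in the child, restore the original mask, then exec. POSIX only promises defined behavior for a vfork() child that calls _exit() or an exec function, so this relies on implementation behavior, much as clone-based posix_spawn() implementations do internally. vfork() is also deprecated on macOS, so this is Linux-oriented.

```cpp
#include <pthread.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

extern char** environ;

// Hypothetical: runs `/bin/sh -c cmd` via vfork(); returns exit status or -1.
int careful_vfork_spawn(const char* cmd) {
  char* const argv[] = {const_cast<char*>("sh"), const_cast<char*>("-c"),
                        const_cast<char*>(cmd), nullptr};
  sigset_t all, old;
  sigfillset(&all);
  pthread_sigmask(SIG_BLOCK, &all, &old);  // 1. no handler may run in the child
  pid_t pid = vfork();
  if (pid == 0) {
    // 2. Child: reset every caught signal to SIG_DFL; ignored ones stay ignored.
    struct sigaction dfl = {};
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    for (int sig = 1; sig < NSIG; ++sig) {
      struct sigaction cur;
      if (sigaction(sig, nullptr, &cur) == 0 && cur.sa_handler != SIG_IGN &&
          cur.sa_handler != SIG_DFL) {
        sigaction(sig, &dfl, nullptr);
      }
    }
    // 3. Restore the original mask, then exec (or leave without cleanup).
    pthread_sigmask(SIG_SETMASK, &old, nullptr);
    execve("/bin/sh", argv, environ);
    _exit(127);
  }
  pthread_sigmask(SIG_SETMASK, &old, nullptr);  // parent: restore its own mask
  if (pid < 0) return -1;
  int status;
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```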

VMError::report_and_die() seems to call fopen for the replay data file.
There are probably more issues like that.

Thanks,
Florian

@mlbridge

@mlbridge mlbridge bot commented Oct 22, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

Hi Florian,


I'm surprised then that we have not encountered any such reported
deadlocks in recent years. I found this issue also somewhat illuminating:

https://sourceware.org/bugzilla/show_bug.cgi?id=4737

especially the report that fork() is no longer required to be
async-signal-safe, but IIUC neither is posix_spawn, so we're left with
no way to implement this functionality in a sound way and must hope for
the best from the implementation. That's not very satisfactory. But in
light of this I can't really reject the change to use posix_spawn on the
grounds that fork() is safer.

Cheers,
David
-----


@mlbridge

@mlbridge mlbridge bot commented Oct 22, 2021

Mailing list message from Thomas Stüfe on hotspot-runtime-dev:

Hi Florian, David,

@Florian: thanks a lot for digging this up!

The oldest glibc release we need is difficult to estimate, it depends on
what individual downstream vendors want to do with the JVM. If we don't
downport this patch I guess glibc 2.24 would be a safe bet. From what I can
see this is when posix_spawn on Linux started using clone(). The only still
supported commercial distro with an older glibc I am aware of is Ubuntu
16.04.

---

So IIUC we could deadlock today with fork() too, if we crash inside malloc.
I'd say posix_spawn sounds good then, since according to Florian
it's async-signal-safe, and it works better under memory pressure. We still
don't know what other libcs do.

The worst thing which can happen is that we hang. David is right, that
would be bad. But not super-bad, we still have the global
`ErrorLogTimeout`. That one kicks in after 2 minutes and _exit()s the VM.
But we lose the core then.

As a future improvement, it may make sense to extend the
secondary-signal-and-timeout-capture-feature in VMError::report() (the STEP
macro and friends) to encompass the caller function
VMError::report_and_die(). In other words, all the steps in
VMError::report_and_die(), including the handling of '-XX:OnError', should
run with individual timeouts too and not endanger the follow-up steps. That
way, if we spawn a tool with -XX:OnError which hangs, we would not wait for
the ErrorLogTimeout to _exit() but would cancel this individual step and
continue with the next step. And we would still abort() at the end and get
a core.

Cheers, Thomas


@mlbridge

@mlbridge mlbridge bot commented Oct 22, 2021

Mailing list message from Florian Weimer on hotspot-runtime-dev:


Red Hat Enterprise Linux 7 is based on glibc 2.17. Its posix_spawn
implementation uses vfork, but it's doing so incorrectly because it does
not block signals (signals delivered to the new subprocess before execve
may corrupt the parent process because of the shared TCB). I don't know
if this is a problem in practice. In the end, on el7 it's trading one set
of potential issues for another; on el8 and later, it's strictly an
improvement.

Adoptium builds still target el7 and include OpenJDK 17. I don't know
if they plan to ship OpenJDK 18 builds.


There's one caveat: posix_spawn is not *guaranteed* to be
async-signal-safe, it's just how the current implementation behaves.
Maybe we could promise that it remains async-signal-safe as a glibc
extension, or even update POSIX accordingly. Most of the work
posix_spawn does happens in the new subprocess by its nature, and with
the (very desirable) vfork optimization, the implementation on a
traditional kernel has to stick to a subset of the async-signal-safe
functions anyway.

Thanks,
Florian

Contributor

@TheRealMDoerr TheRealMDoerr left a comment

Change looks technically good to me, but I cannot judge whether we can rely on the async-signal-safety of posix_spawn. I guess we have to wait until the glibc folks have discussed that.

@openjdk

@openjdk openjdk bot commented Oct 29, 2021

@tstuefe This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8274320: os::fork_and_exec() should be using posix_spawn

Reviewed-by: mdoerr, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 432 new commits pushed to the master branch:

  • bf2e9ee: 8275080: G1CollectedHeap::expand() returns the wrong value
  • b7104ba: 8196017: java/awt/Mouse/GetMousePositionTest/GetMousePositionWithPopup.java fails
  • 6875678: 8273831: PrintServiceLookup spawns 2 threads in the current classloader, getting orphaned
  • 5bbc8d3: 8274621: NullPointerException because listenAddress[0] is null
  • 5021a12: 8274855: vectorapi tests failing with assert(!vbox->is_Phi()) failed
  • fe6a202: 8271356: Modify jdb to treat an empty command as a repeat of the previous command
  • cef9db9: 8276039: Remove unnecessary qualifications of java_lang_Class::
  • 13265f9: 8274750: java/io/File/GetXSpace.java failed: '/dev': 191488 != 190976
  • 5facaa2: 8276128: (bf) Remove unused constant ARRAY_BASE_OFFSET from Direct-X-Buffer
  • d6d82f5: 8275608: runtime/Metaspace/elastic/TestMetaspaceAllocationMT2 too slow
  • ... and 422 more: https://git.openjdk.java.net/jdk/compare/252aaa9249d8979366b37d59487b5b039d923e35...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Oct 29, 2021
@tstuefe
Member Author

@tstuefe tstuefe commented Oct 29, 2021


Thank you Martin.

@dholmes-ora: do you have an idea how we could resolve this deadlock? Even with input from the glibc folks, a number of questions will remain unanswered (e.g., what about the other libcs?).

Still, I think my patch would improve the situation, which is murky UB now and will remain somewhat murky. But I also think this is not worth many more brain cycles from either one of us, so if you prefer I can withdraw this PR. I am fine either way.

Cheers, Thomas

@mlbridge

@mlbridge mlbridge bot commented Oct 29, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

Hi Thomas,


I've withdrawn my objection to this patch (see response to Florian).
Given the advantage of posix_spawn in terms of memory management, and
the fact no solution is guaranteed async-signal-safe, you may as well
give this a go.

Cheers,
David

@tstuefe
Member Author

@tstuefe tstuefe commented Oct 30, 2021


Thank you David! Would you then approve it or should I look for a second Reviewer?

@dholmes-ora
Member

@dholmes-ora dholmes-ora commented Oct 31, 2021

Hi Thomas,

I've approved it so we can start baking it. I hope it will get sufficient test coverage.

Thanks,
David

@tstuefe
Member Author

@tstuefe tstuefe commented Nov 1, 2021

Thanks @dholmes-ora. We should extend the OnError tests somewhat.

/integrate

@openjdk

@openjdk openjdk bot commented Nov 1, 2021

Going to push as commit 158831e.
Since your change was applied there have been 432 commits pushed to the master branch.

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Nov 1, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Nov 1, 2021
@openjdk

@openjdk openjdk bot commented Nov 1, 2021

@tstuefe Pushed as commit 158831e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tstuefe tstuefe deleted the JDK-8274320-os-fork-and-exec-should-be-using-posix-spawn branch Nov 11, 2021
Labels
hotspot-runtime integrated