Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to 9.2.15.0 causes Kernel#system to hang. #6579

Closed
jonathanswenson opened this issue Feb 25, 2021 · 6 comments · Fixed by #6582
Closed

Upgrade to 9.2.15.0 causes Kernel#system to hang. #6579

jonathanswenson opened this issue Feb 25, 2021 · 6 comments · Fixed by #6582
Labels

Comments

@jonathanswenson
Copy link

Environment Information

Provide at least:

  • JRuby version (jruby -v) and command line (flags, JRUBY_OPTS, etc)
    jruby 9.2.15.0 (2.5.7) 2021-02-24 aa05fda562 OpenJDK 64-Bit Server VM 11.0.9+0-adhoc..source on 11.0.9+0-adhoc..source +jit [linux-x86_64]

  • Operating system and platform (e.g. uname -a)
    Linux hostname 5.7.17-1rodete5-amd64 #1 SMP Debian 5.7.17-1rodete5 (2021-01-08) x86_64 GNU/Linux

Other relevant info you may wish to add:
I’m using nix to manage my env, but others in my org who aren’t using nix are running into the same issue.

Expected Behavior

We have a bit of a complex build step that uses a bash script to call a rake task, uses system to call a few ruby scripts, which uses the system method to launch a gradle wrapper script ultimately running a java gradle process. Under 9.2.13.0 (and 9.2.14.0), this would launch successfully, waiting for the gradle task to complete and return.

Actual Behavior

However, with 9.2.15.0, the gradle process hangs in the TL process state forever. It is unresponsive to jstacks, ultimately causing the parent build process to hang.

Seems as though the system command is launching differently from 9.2.14.0 due to this commit: 0cb9170

Was unable to reproduce with a simple target script (instead of gradle) thus far. Trying to use a custom build from @headius to validate a potential fix.

@headius
Copy link
Member

headius commented Feb 25, 2021

This is likely due to the introduction of exec into the sh command line we run to intercept the chdir before running the expected command. I do not know why this would cause it to hang versus not using exec, but a PR is coming to try removing it.

@headius
Copy link
Member

headius commented Feb 25, 2021

Attempting to simulate this locally... have not been able to trigger a hang outside of JRuby using a similar command structure:

$ sh -c 'cd lib; exec "$@"' sh gradle

> Task :help

Welcome to Gradle 6.8.3.

To run a build, run gradle <task> ...

To see a list of available tasks, run gradle tasks

To see a list of command-line options, run gradle --help

To see more detail about a task, run gradle help --task <task>

For troubleshooting, visit https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/6.8.3/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 671ms
1 actionable task: 1 executed

Gradle does interact with the TTY to some degree, rewriting lines and using color text, but removing the exec does not change how it behaves (i.e. the exec does not seem to give it more or less access to the TTY):

$ sh -c 'cd lib; "$@"' sh gradle | cat

> Task :help

Welcome to Gradle 6.8.3.

To run a build, run gradle <task> ...

To see a list of available tasks, run gradle tasks

To see a list of command-line options, run gradle --help

To see more detail about a task, run gradle help --task <task>

For troubleshooting, visit https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/6.8.3/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 745ms
1 actionable task: 1 executed

Running the same command structure piped to cat (to hide the TTY) eliminates the color text and rewriting but otherwise does not change behavior.

There are workarounds for this:

  • Don't use the chdir: option with system. Change dir within the requested command or construct the sh command line yourself.
  • Use backquotes with the same sh command and chdir call.
  • Disable native execution. I believe this will fall back on JVM process launching and will not build the sh command line the same way.

@headius
Copy link
Member

headius commented Feb 26, 2021

The exec change appears to have been a red herring. Changing it as in #6580 does not change the behavior.

The following command lines all hang for me locally and indicate that system + chdir is not quite working:

  • jruby -e 'system "gradle", chdir: "lib"'
  • jruby -e 'Dir.chdir "lib"; system "gradle"; Dir.chdir ".."'

Removing the chdir bits from these commands allows them to complete successfully.

Interestingly, replacing system with backquotes works fine, even though it uses native process launching too.

The workarounds above are probably still valid, but reverting 378f691 on the branch may also work since it routes system-within-chdir back to the old logic.

@headius
Copy link
Member

headius commented Feb 26, 2021

Additional discoveries...

I suspect this has something to do with the way Gradle interacts with the terminal and the daemon process. Running without a daemon still hangs, but it appears to run to completion before it does so. It clearly manipulates the parent terminal when called via system.

Perhaps when it gets stuck it is waiting for some input for another command to run?

Another workaround is to disable native popen using -Xnative.popen=false which will also disable native system without disabling the rest of the native subsystem.

headius added a commit to headius/jruby that referenced this issue Feb 26, 2021
In jruby#5383 we added logic to create a new process group when doing
chdir in spawn, to avoid orphaning the eventual process spawned
by the intermediate sh. Unfortunately this has a side effect of
hanging on waitpid if the process spawns a long-running daemon.

This patch removes the process group and instead uses exec in the
generated sh script so that the sh process is replaced. The forked
daemon lives on, as it should, but only the main process is
waited on.

Fixes jruby#6579
@headius
Copy link
Member

headius commented Feb 26, 2021

Success! I have pushed a fix in #6582.

This turns out to be a very unlucky confluence of events.

In #5383 we added logic to create a new process group when doing chdir for a single-string subprocess launch like via system, to avoid orphaning the eventual command we intend to launch. The issue in that situation was that killing the resulting pid would only kill the top sh process, and creating a new process group ensured that the command we launched also terminates predictably.

Unfortunately this has the side effect of causing waitpid to also wait for any long-running daemon processes launched by the target command. That is the case here, since gradle by default launches a background process to do the heavy lifting.

We can remove the process group logic without regressing by using exec, which is exactly how @mrnoname1000's sh form from #6226 handles this. However the fix made there only affects subprocess launches with more than a single string argument (for reasons).

This regressed in 9.2.15.0 because we started using native process logic for all system calls, even if they needed chdir. Since no system tests try to spawn long-running daemon processes, this issue was not discovered until after release.

headius added a commit to headius/jruby that referenced this issue Feb 26, 2021
In jruby#5383 we added logic to create a new process group when doing
chdir in spawn, to avoid orphaning the eventual process spawned
by the intermediate sh. Unfortunately this has a side effect of
hanging on waitpid if the process spawns a long-running daemon.

This patch removes the process group and instead uses exec in the
generated sh script so that the sh process is replaced. The forked
daemon lives on, as it should, but only the main process is
waited on.

Fixes jruby#6579
headius added a commit to headius/jruby that referenced this issue Feb 26, 2021
In jruby#5383 we added logic to create a new process group when doing
chdir in spawn, to avoid orphaning the eventual process spawned
by the intermediate sh. Unfortunately this has a side effect of
hanging on waitpid if the process spawns a long-running daemon.

This patch removes the process group and instead uses exec in the
generated sh script so that the sh process is replaced. The forked
daemon lives on, as it should, but only the main process is
waited on.

Fixes jruby#6579
@headius headius added the core label Feb 28, 2021
@headius
Copy link
Member

headius commented Feb 28, 2021

This is now fixed for 9.2.16, but I have been unable to devise a simple test case. I will continue to look for options there but I am marking this resolved now.

@headius headius closed this as completed Feb 28, 2021
@enebo enebo added this to the JRuby 9.2.16.0 milestone Mar 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants