New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade to 9.2.15.0 causes Kernel#system to hang. #6579
Comments
This is likely due to the introduction of |
Attempting to simulate this locally... have not been able to trigger a hang outside of JRuby using a similar command structure:
Gradle does interact with the TTY to some degree, rewriting lines and using color text, but removing the exec does not change how it behaves (i.e. the exec does not seem to give it more or less access to the TTY):
Running the same command structure piped to There are workarounds for this:
|
The The following command lines all hang for me locally and indicate that system + chdir is not quite working:
Removing the Interestingly, replacing The workarounds above are probably still valid, but reverting 378f691 on the branch may also work since it routes system-within-chdir back to the old logic. |
Additional discoveries... I suspect this has something to do with the way Gradle interacts with the terminal and the daemon process. Running without a daemon still hangs, but it appears to run to completion before it does so. It clearly manipulates the parent terminal when called via Perhaps when it gets stuck it is waiting for some input for another command to run? Another workaround is to disable native popen using |
In jruby#5383 we added logic to create a new process group when doing chdir in spawn, to avoid orphaning the eventual process spawned by the intermediate sh. Unfortunately this has a side effect of hanging on waitpid if the process spawns a long-running daemon. This patch removes the process group and instead uses exec in the generated sh script so that the sh process is replaced. The forked daemon lives on, as it should, but only the main process is waited on. Fixes jruby#6579
Success! I have pushed a fix in #6582. This turns out to be a very unlucky confluence of events. In #5383 we added logic to create a new process group when doing chdir for a single-string subprocess launch like via Unfortunately this has the side effect of causing We can remove the process group logic without regressing by using exec, which is exactly how @mrnoname1000's sh form from #6226 handles this. However the fix made there only affects subprocess launches with more than a single string argument (for reasons). This regressed in 9.2.15.0 because we started using native process logic for all system calls, even if they needed chdir. Since no |
In jruby#5383 we added logic to create a new process group when doing chdir in spawn, to avoid orphaning the eventual process spawned by the intermediate sh. Unfortunately this has a side effect of hanging on waitpid if the process spawns a long-running daemon. This patch removes the process group and instead uses exec in the generated sh script so that the sh process is replaced. The forked daemon lives on, as it should, but only the main process is waited on. Fixes jruby#6579
In jruby#5383 we added logic to create a new process group when doing chdir in spawn, to avoid orphaning the eventual process spawned by the intermediate sh. Unfortunately this has a side effect of hanging on waitpid if the process spawns a long-running daemon. This patch removes the process group and instead uses exec in the generated sh script so that the sh process is replaced. The forked daemon lives on, as it should, but only the main process is waited on. Fixes jruby#6579
This is now fixed for 9.2.16, but I have been unable to devise a simple test case. I will continue to look for options there but I am marking this resolved now. |
Environment Information
Provide at least:
JRuby version (
jruby -v
) and command line (flags, JRUBY_OPTS, etc)jruby 9.2.15.0 (2.5.7) 2021-02-24 aa05fda562 OpenJDK 64-Bit Server VM 11.0.9+0-adhoc..source on 11.0.9+0-adhoc..source +jit [linux-x86_64]
Operating system and platform (e.g.
uname -a
)Linux hostname 5.7.17-1rodete5-amd64 #1 SMP Debian 5.7.17-1rodete5 (2021-01-08) x86_64 GNU/Linux
Other relevant info you may wish to add:
I’m using nix to manage my env, but others in my org who aren’t using nix are running into the same issue.
Expected Behavior
We have a bit of a complex build step that uses a bash script to call a rake task, uses system to call a few ruby scripts, which uses the
system
method to launch a gradle wrapper script ultimately running a java gradle process. Under 9.2.13.0 (and 9.2.14.0), this would launch successfully, waiting for the gradle task to complete and return.Actual Behavior
However, with 9.2.15.0, the gradle process hangs in the TL process state forever. It is unresponsive to jstacks, ultimately causing the parent build process to hang.
Seems as though the system command is launching differently from 9.2.14.0 due to this commit: 0cb9170
Was unable to reproduce with a simple target script (instead of gradle) thus far. Trying to use a custom build from @headius to validate a potential fix.
The text was updated successfully, but these errors were encountered: