New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to get output from PTY.spawn periodically hangs #3020

Closed
camlow325 opened this Issue Jun 4, 2015 · 15 comments

Comments

Projects
None yet
4 participants
@camlow325
Contributor

camlow325 commented Jun 4, 2015

We're experiencing periodic hangs during attempts to read the output from or wait on a process created by a call to PTY.spawn. Here's a snippet of code which demonstrates the problem:

require 'pty'

itr = 0
while true
  itr = itr + 1
  puts "iteration - #{itr}"
  PTY.spawn('/bin/echo blah') do |cmd_out, cmd_in, pid|
    begin
      puts "pid - #{pid}"
      one_char = cmd_out.getc.chr
      puts "first char - #{one_char}"
      rest = cmd_out.gets
      puts "rest chars - #{rest}"
    ensure
      puts "closing cmd_out..."
      cmd_out.close
      puts "closing cmd_in..."
      cmd_in.close
      puts "wait for #{pid}"
      Process.wait(pid)
      puts "done waiting for #{pid}"
    end
  end
end

From the command line, we just run this as jruby ./thefile.rb. We've observed this code fail in multiple ways:

  1. Most often, the process hangs on the cmd_out.getc.chr call.

  2. Sometimes, the process hangs on the Process.wait (pid) call.

  3. Much less frequently - the data that the cmd_out.getc.chr and cmd_out.gets calls read back appears to be the path to the jruby jar + an error message. For example:

first char = /
rest chars = .../jruby-1.7.20/lib/jruby.jar: invalid LOC header (bad signature)

It is pretty uncommon to get through more than about about 150 iterations or so before one of the failures above happens. The third failure, when it occurs, is usually on the first PTY.spawn call.

I've been able to reproduce this failure on multiple versions of JRuby - including 1.7.19, 1.7.20, 9.0.0.0.pre2. I haven't seen the program run indefinitely without failing on any version of JRuby, so I suspect that this may never have worked.

Is there something obviously wrong about the way we're using PTY.spawn here, or is PTY.spawn basically not functional at this point?

@reidmorrison

This comment has been minimized.

Show comment
Hide comment
@reidmorrison

reidmorrison Feb 9, 2017

Trying PTY.spawn on jruby-9.1.7.0.

When the process hangs on Process.wait it is most likely when PTY.spawn randomly fails to launch the session. Adding the following option stops the hang:

Process.wait(the_pid, Process::WNOHANG)

However, what causes the spawn to sometimes fail is not known.

I also had to break apart the arguments on the spawn otherwise it would not work:

PTY.spawn('/bin/echo', 'blah') do ...

After much trial and error I was able to get it to work on JRuby with the following code that includes an automatic retry for when the spawn sometimes fails:
https://gist.github.com/reidmorrison/de31442153c16980bc21b7f19d940849

Note that calling reader.eof? or writer.eof? immediately after the spawn also hangs on JRuby, but not on MRI. Was hoping that would be a faster way of determining if the spawn failed, as opposed to the 5 second read timeout.

reidmorrison commented Feb 9, 2017

Trying PTY.spawn on jruby-9.1.7.0.

When the process hangs on Process.wait it is most likely when PTY.spawn randomly fails to launch the session. Adding the following option stops the hang:

Process.wait(the_pid, Process::WNOHANG)

However, what causes the spawn to sometimes fail is not known.

I also had to break apart the arguments on the spawn otherwise it would not work:

PTY.spawn('/bin/echo', 'blah') do ...

After much trial and error I was able to get it to work on JRuby with the following code that includes an automatic retry for when the spawn sometimes fails:
https://gist.github.com/reidmorrison/de31442153c16980bc21b7f19d940849

Note that calling reader.eof? or writer.eof? immediately after the spawn also hangs on JRuby, but not on MRI. Was hoping that would be a faster way of determining if the spawn failed, as opposed to the 5 second read timeout.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 9, 2017

Member

The subprocess support (spawn, popen, etc) on JRuby 1.7 is fairly limited, since it still uses JDK libraries for all process launching. Those libraries are known to have many problems, especially as relates to interactive IO with the subprocess. I doubt we'll be able to do much better there.

However I'd expect 9k to be better here, and it sounds like it's almost there.

@reidmorrison I guess you found this bug report looking for info about your 9k issue?

For now we can keep using this bug, but I suspect the reasons for 1.7 failing are different from 9k. If you can provide a script I can run that reproduces the intermittent spawn failure, I can have a look. I don't know why it would work sometimes and not others.

For reference, JRuby 9k uses posix_spawn for most subprocess logic, including Kernel#spawn, IO.popen, and so on. This has an unfortunate side effect that failed subprocess launches don't report errors well, but I'd expect them to fail consistently. Failing to spawn intermittently is a new issue to me.

Member

headius commented Feb 9, 2017

The subprocess support (spawn, popen, etc) on JRuby 1.7 is fairly limited, since it still uses JDK libraries for all process launching. Those libraries are known to have many problems, especially as relates to interactive IO with the subprocess. I doubt we'll be able to do much better there.

However I'd expect 9k to be better here, and it sounds like it's almost there.

@reidmorrison I guess you found this bug report looking for info about your 9k issue?

For now we can keep using this bug, but I suspect the reasons for 1.7 failing are different from 9k. If you can provide a script I can run that reproduces the intermittent spawn failure, I can have a look. I don't know why it would work sometimes and not others.

For reference, JRuby 9k uses posix_spawn for most subprocess logic, including Kernel#spawn, IO.popen, and so on. This has an unfortunate side effect that failed subprocess launches don't report errors well, but I'd expect them to fail consistently. Failing to spawn intermittently is a new issue to me.

@reidmorrison

This comment has been minimized.

Show comment
Hide comment
@reidmorrison

reidmorrison Feb 9, 2017

This standalone script reproduces the intermittent failed spawn with JRuby 9.1.7.0 on Mac OS X 10.12.3

require 'pty'
require 'expect'

itr = 0
while true
  itr = itr + 1
  PTY.spawn('/bin/echo', 'blah') do |reader, writer, pid|
    begin
      if reader.expect(/blah.*/, 5)
        puts "#{itr} - #{pid} - Successful"
      else
        puts "#{itr} - #{pid} - !!!! FAILED to spawn !!!!!"
      end
    ensure
      reader.close
      writer.close
      Process.wait(pid, Process::WNOHANG)
    end
  end
end

I also tried this script with similar results with JRuby 9.1.5.0 on Linux:
Linux 2.6.32-504.3.3.el6.x86_64 #1 SMP Wed Dec 17 01:55:02 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

reidmorrison commented Feb 9, 2017

This standalone script reproduces the intermittent failed spawn with JRuby 9.1.7.0 on Mac OS X 10.12.3

require 'pty'
require 'expect'

itr = 0
while true
  itr = itr + 1
  PTY.spawn('/bin/echo', 'blah') do |reader, writer, pid|
    begin
      if reader.expect(/blah.*/, 5)
        puts "#{itr} - #{pid} - Successful"
      else
        puts "#{itr} - #{pid} - !!!! FAILED to spawn !!!!!"
      end
    ensure
      reader.close
      writer.close
      Process.wait(pid, Process::WNOHANG)
    end
  end
end

I also tried this script with similar results with JRuby 9.1.5.0 on Linux:
Linux 2.6.32-504.3.3.el6.x86_64 #1 SMP Wed Dec 17 01:55:02 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 13, 2017

Member

Reproduced!

Member

headius commented Feb 13, 2017

Reproduced!

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 13, 2017

Member

Wow...ok, this is a problem. Our pty is currently implemented using forkpty, which like fork is going to be totally unreliable on JVM. This is a bigger issue than I thought; we're going to have to see if we can emulate the PTY library some other way.

Member

headius commented Feb 13, 2017

Wow...ok, this is a problem. Our pty is currently implemented using forkpty, which like fork is going to be totally unreliable on JVM. This is a bigger issue than I thought; we're going to have to see if we can emulate the PTY library some other way.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 13, 2017

Member

For those of you using PTY.spawn, can you tell me why you're not just using Kernel.spawn? We have implemented that nearly 100% using the posix_spawn C call, which (semi-) atomically forks and execs in a single call, avoiding the reliability problems with separate fork and exec calls. It's what I'll likely look into using for a new PTY.

Member

headius commented Feb 13, 2017

For those of you using PTY.spawn, can you tell me why you're not just using Kernel.spawn? We have implemented that nearly 100% using the posix_spawn C call, which (semi-) atomically forks and execs in a single call, avoiding the reliability problems with separate fork and exec calls. It's what I'll likely look into using for a new PTY.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 13, 2017

Member

Ok, I spent a few minutes poking around pty.c from MRI. It doesn't look great.

In general it appears to use most of the same internals as Kernel.spawn, which we've largely ported over to Java for JRuby. However it does NOT appear to support running on platforms that don't support fork. There's a guard in the extconf.rb to prevent pty from building at all on Windows platforms.

So this exact implementation isn't going to work for us.

Now, a bit of good news. So far I've seen nothing that we shouldn't be able to do with posix_spawn, and it's possible that the MRI folks have just never bothered to make pty work without fork.

I still believe it should be possible to implement the library, in pure-Ruby, using just Kernel.spawn.

Member

headius commented Feb 13, 2017

Ok, I spent a few minutes poking around pty.c from MRI. It doesn't look great.

In general it appears to use most of the same internals as Kernel.spawn, which we've largely ported over to Java for JRuby. However it does NOT appear to support running on platforms that don't support fork. There's a guard in the extconf.rb to prevent pty from building at all on Windows platforms.

So this exact implementation isn't going to work for us.

Now, a bit of good news. So far I've seen nothing that we shouldn't be able to do with posix_spawn, and it's possible that the MRI folks have just never bothered to make pty work without fork.

I still believe it should be possible to implement the library, in pure-Ruby, using just Kernel.spawn.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 14, 2017

Member

Another option for us would be to finally ship a small native library that can do forkpty and related calls all in one JNI downcall. This is how the JVM itself launches subprocesses, via a JNI call that eventually does both fork and exec in rapid succession. Because the entire JNI call is JVM safepoint-aware, it should avoid issues with JVM threads and signals firing inbetween the fork and exec.

I'm still trying to wrap my head around what forkpty actually does, though, to see if we can't emulate it with a simple spawn.

Member

headius commented Feb 14, 2017

Another option for us would be to finally ship a small native library that can do forkpty and related calls all in one JNI downcall. This is how the JVM itself launches subprocesses, via a JNI call that eventually does both fork and exec in rapid succession. Because the entire JNI call is JVM safepoint-aware, it should avoid issues with JVM threads and signals firing inbetween the fork and exec.

I'm still trying to wrap my head around what forkpty actually does, though, to see if we can't emulate it with a simple spawn.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 14, 2017

Member

For those following along...this isn't likely to be fixed any time soon, due to the discovery of fork. If you are able to use something known to work, like Kernel.spawn, you're going to be in better shape.

Member

headius commented Feb 14, 2017

For those following along...this isn't likely to be fixed any time soon, due to the discovery of fork. If you are able to use something known to work, like Kernel.spawn, you're going to be in better shape.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 14, 2017

Member

@reidmorrison @camlow325 Can you guys tell us why you need a PTY here? The only examples given so far are /bin/echo which doesn't really need a full PTY.

Member

headius commented Feb 14, 2017

@reidmorrison @camlow325 Can you guys tell us why you need a PTY here? The only examples given so far are /bin/echo which doesn't really need a full PTY.

@camlow325

This comment has been minimized.

Show comment
Hide comment
@camlow325

camlow325 Feb 14, 2017

Contributor

@headius Thanks for looking into this. TBH, it's been long enough now that I don't remember the exact details for our original use case. I believe one of my colleagues was wanting to programmatically invoke a command line tool which prompts for a passphrase on stdin. He wanted to be able to feed data into stdin and capture the output that the tool produced on stdout and thought to use PTY to do that. He gave up on that approach when he saw that PTY.spawn seemed to be producing inconsistent results. I believe he refactored his solution to eventually avoid the need to spawn a process altogether, which was a more robust solution in any event. Hoping someone else might be able to give you a more current, relevant example here.

Contributor

camlow325 commented Feb 14, 2017

@headius Thanks for looking into this. TBH, it's been long enough now that I don't remember the exact details for our original use case. I believe one of my colleagues was wanting to programmatically invoke a command line tool which prompts for a passphrase on stdin. He wanted to be able to feed data into stdin and capture the output that the tool produced on stdout and thought to use PTY to do that. He gave up on that approach when he saw that PTY.spawn seemed to be producing inconsistent results. I believe he refactored his solution to eventually avoid the need to spawn a process altogether, which was a more robust solution in any event. Hoping someone else might be able to give you a more current, relevant example here.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 14, 2017

Member

@camlow325 Ok thanks. Yeah anything that's using e.g. stty or that requires terminal characteristics like window size would need a pty, but simple subcommands should work fine with a plain spawn.

Member

headius commented Feb 14, 2017

@camlow325 Ok thanks. Yeah anything that's using e.g. stty or that requires terminal characteristics like window size would need a pty, but simple subcommands should work fine with a plain spawn.

@reidmorrison

This comment has been minimized.

Show comment
Hide comment
@reidmorrison

reidmorrison Feb 14, 2017

The reason I am using the PTY.spawn is to workaround a limitation of the linux sftp program. It tries to be clever and prevent a program from passing a password into it via stdin.

Using Kernel.spawn or any of the Open3 apis, sftp is able to intercept the stdin and forces a manual entry of the password. After researching sshpass we found that it was just using a new pty to fake out sftp.

Unfortunately the sftp ruby gem is extremely slow and does not support uploading files over 2gb, so using sftp at the command line appears to be the only option.

I only submitted the above script to reproduce the issue, to help out where I can. But I totally agree that in every other case it is a complete overkill and I prefer to use one of the Open3 apis that work extremely well. We use it heavily for streaming large file encryption with pgp. See iostreams

Adding a simple retry for now in this very rare case of requiring a new pty to workaround a sftp limitation is quite acceptable. As used in: https://gist.github.com/reidmorrison/de31442153c16980bc21b7f19d940849

Compared to other things like performance optimizations for keyword arguments, this would be very low on the priority list, almost a nice-to-have.

reidmorrison commented Feb 14, 2017

The reason I am using the PTY.spawn is to workaround a limitation of the linux sftp program. It tries to be clever and prevent a program from passing a password into it via stdin.

Using Kernel.spawn or any of the Open3 apis, sftp is able to intercept the stdin and forces a manual entry of the password. After researching sshpass we found that it was just using a new pty to fake out sftp.

Unfortunately the sftp ruby gem is extremely slow and does not support uploading files over 2gb, so using sftp at the command line appears to be the only option.

I only submitted the above script to reproduce the issue, to help out where I can. But I totally agree that in every other case it is a complete overkill and I prefer to use one of the Open3 apis that work extremely well. We use it heavily for streaming large file encryption with pgp. See iostreams

Adding a simple retry for now in this very rare case of requiring a new pty to workaround a sftp limitation is quite acceptable. As used in: https://gist.github.com/reidmorrison/de31442153c16980bc21b7f19d940849

Compared to other things like performance optimizations for keyword arguments, this would be very low on the priority list, almost a nice-to-have.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 15, 2017

Member

I have good news!!!

I discovered we have another forgotten bug, #3185, in which @Freaky provides a rewritten pty.rb that does not use fork! I'm testing it locally.

Unfortunately there's a small bug in JRuby that prevents it working properly on 9k, so there still needs to be a patch. Specifically, when opening an IO from a TTY file descriptor, it mistakenly thinks the descriptor is closed and errors out. However, with that patched...

~/projects/jruby $ jruby blah.rb
1 - 17683 - Successful
2 - 17685 - Successful
3 - 17686 - Successful
4 - 17687 - Successful
...
644 - 18328 - Successful
645 - 18329 - Successful
646 - 18330 - Successful
647 - 18331 - Successful
...

It's all good!

I'm going to push @Freaky's pty.rb and let you give it a shot, @reidmorrison. He claims it even passes MRI tests, so we'll just go for it.

Member

headius commented Feb 15, 2017

I have good news!!!

I discovered we have another forgotten bug, #3185, in which @Freaky provides a rewritten pty.rb that does not use fork! I'm testing it locally.

Unfortunately there's a small bug in JRuby that prevents it working properly on 9k, so there still needs to be a patch. Specifically, when opening an IO from a TTY file descriptor, it mistakenly thinks the descriptor is closed and errors out. However, with that patched...

~/projects/jruby $ jruby blah.rb
1 - 17683 - Successful
2 - 17685 - Successful
3 - 17686 - Successful
4 - 17687 - Successful
...
644 - 18328 - Successful
645 - 18329 - Successful
646 - 18330 - Successful
647 - 18331 - Successful
...

It's all good!

I'm going to push @Freaky's pty.rb and let you give it a shot, @reidmorrison. He claims it even passes MRI tests, so we'll just go for it.

@reidmorrison

This comment has been minimized.

Show comment
Hide comment
@reidmorrison

reidmorrison Feb 16, 2017

Excellent news, Thank you 👍

reidmorrison commented Feb 16, 2017

Excellent news, Thank you 👍

@enebo enebo added this to the JRuby 9.1.8.0 milestone Mar 6, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment