created process can't be terminated #112

Closed
gsauthof opened this issue Nov 9, 2017 · 5 comments

@gsauthof (Contributor) commented Nov 9, 2017

Consider this small example:

#!/usr/bin/env python3

import asyncio
import asyncssh

hostname = 'example.org'
user = 'juser'

async def f():
    async with asyncssh.connect(hostname, username=user) as c:
        async with c.create_process('sleep 300') as p:
            print('sleep')
            await asyncio.sleep(3)

            print('terminate')
            p.terminate()

            print('wait')
            await p.wait()

            print('done')

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(f())
    loop.close()

Expected output:

sleep
terminate
wait
done
(program terminates - total runtime: 3 seconds or so)

Actual output:

sleep
terminate
wait
(program is still running)

So it seems the TERM signal isn't delivered.

(When I log into the remote system, I still see the process running in the pstree output. After I kill the test program, I also see the orphaned sleep process via pgrep -u juser -P 1 -l -f.)

I tested this against an sshd on a remote CentOS 7, an sshd on a remote Solaris 10 and a local sshd on Fedora 25 (localhost). Same results each time.

I also replaced the 'sleep' command with '/usr/bin/sleep', and the p.terminate() call with p.kill() and with p.send_break(1000). Neither makes a difference.

Please advise if I'm using asyncssh incorrectly or missing something obvious.

@ronf (Owner) commented Nov 9, 2017

This came up recently on the asyncssh-users mailing list. Are you using OpenSSH as your SSH server? If so, the problem is that OpenSSH actually doesn't implement the "signal" channel request. As a result, even though AsyncSSH is sending the proper message listed in the SSH RFC, nothing happens when the send_signal(), terminate(), and kill() methods are called. You can find more info at https://bugzilla.mindrot.org/show_bug.cgi?id=1424. Unfortunately, this is a very old bug and as far as I know the patch listed there has never made it into a standard release.
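
Because the request is silently ignored, a client that calls terminate() and then waits can hang indefinitely. A minimal sketch of bounding that wait is shown here; `terminate_and_wait` is a hypothetical helper, and it only assumes that `proc` has the `terminate()` method and awaitable `wait()` used in the example above.

```python
import asyncio


async def terminate_and_wait(proc, timeout=5.0):
    """Ask the remote process to terminate, then wait at most `timeout` seconds.

    Returns True if the process exited in time, False if the wait timed out
    (for example because the server ignored the "signal" request).
    """
    proc.terminate()  # sends the "signal" channel request; no reply is expected
    try:
        await asyncio.wait_for(proc.wait(), timeout)
    except asyncio.TimeoutError:
        return False
    return True
```

A False result does not prove the server lacks signal support; it only says the process was still running when the timeout expired, at which point the caller can fall back to another approach.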

It's a bit of a hack, but depending on the signal you want to send it might be possible to do it by writing to stdin instead. If you set up the command to run in a PTY (by setting term_type), you can then write something like '\x03' (Ctrl-C) to stdin to get the equivalent of a SIGINT, or '\x1c' to get the equivalent of a SIGQUIT. I don't know of anything you can write to get other signals like SIGTERM, SIGHUP, or SIGKILL, though.

Here’s an example of a program which uses the stdin trick I mentioned:

import asyncio, asyncssh, sys

async def run_client(loop):
    async with asyncssh.connect('localhost') as conn:
        # term_type requests a PTY, so control characters can trigger signals
        proc = await conn.create_process('sleep 5', term_type='ansi')
        # After 1 second, write Ctrl-C to the remote process's stdin
        loop.call_later(1, proc.stdin.write, '\x03')
        await proc.wait_closed()
        # Print the signal that caused the remote process to exit
        print(proc.exit_signal)

try:
    loop = asyncio.get_event_loop()
    loop.run_until_complete(run_client(loop))
except (OSError, asyncssh.Error) as exc:
    sys.exit('SSH connection failed: ' + str(exc))

This executes a remote command of 'sleep 5' but then arranges to abort it after 1 second by sending a Ctrl-C. It prints the signal which caused the process to exit. Try replacing '\x03' with '\x1c' to see the difference between exiting with an INT signal vs. a QUIT signal.

Note the setting of term_type there to 'ansi' to make sure a PTY is requested on the remote system. Without that, I don't believe these control characters will trigger a signal being sent to the process being executed.
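
The two control characters mentioned above could be wrapped in a small lookup so that callers name the signal instead of remembering the byte. This is only a sketch: `PTY_SIGNAL_CHARS` and `send_pty_signal` are hypothetical names, the mapping assumes default terminal settings on the remote side, and `proc` is assumed to have been created with term_type set so that a PTY exists.

```python
# Control characters that a PTY turns into signals under default terminal
# settings. Only the two named in this thread are listed.
PTY_SIGNAL_CHARS = {
    'INT': '\x03',   # Ctrl-C
    'QUIT': '\x1c',  # Ctrl-\
}


def send_pty_signal(proc, name):
    """Write the control character for signal `name` to the process's stdin.

    Raises ValueError for signals that have no control-character equivalent
    (TERM, HUP, KILL, ...).
    """
    try:
        char = PTY_SIGNAL_CHARS[name]
    except KeyError:
        raise ValueError(f'no control character for SIG{name}') from None
    proc.stdin.write(char)
```

Raising on unsupported names keeps the limitation visible, rather than silently doing nothing the way the ignored "signal" request does.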

@gsauthof (Contributor, Author) commented Nov 10, 2017

Yes, I'm using the system's sshd (i.e. some version of OpenSSH's sshd) on each system.

Sending Ctrl+C (like in your example) works for me, also when using proc.wait() instead of proc.wait_closed().

It's true that with ttys/ptys one can only trigger SIGINT and SIGQUIT via the keyboard. Sending SIGTERM could be achieved via some shell one-liner, e.g.:

  async with c.create_process('/usr/bin/sleep 300 & read; kill $!') as p:
    print('sleep')
    await asyncio.sleep(3)
    print('write')
    p.stdin.write('\n')
    print('wait')
    await p.wait()

That means you don't need to set up a terminal, but you do use an additional shell process. Even more appropriate might be kill -- -$! instead of kill $!. Edit 2017-11-19: Or rather kill 0.
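
The one-liner can be generated rather than typed by hand. The sketch below is an illustration under a few assumptions: `killable` and `run_then_stop` are hypothetical names, `conn` is an already-open asyncssh connection, and `read _` is used instead of a bare `read` because some shells (dash, for example) reject `read` without a variable name. The command is not quoted or escaped, so it must be a trusted string.

```python
import asyncio


def killable(command, whole_group=False):
    """Wrap `command` so that one line on stdin sends it SIGTERM.

    With whole_group=True the wrapper uses `kill 0`, which signals every
    process in the shell's process group, including the shell itself.
    """
    target = '0' if whole_group else '$!'
    return f'{command} & read _; kill {target}'


async def run_then_stop(conn, command, delay):
    """Start `command` remotely and stop it after `delay` seconds."""
    async with conn.create_process(killable(command)) as p:
        await asyncio.sleep(delay)
        p.stdin.write('\n')  # unblocks `read`, so the shell runs `kill`
        await p.wait()
```

For example, `killable('/usr/bin/sleep 300')` yields `/usr/bin/sleep 300 & read _; kill $!`. The trick only works when stdin is not otherwise needed by the remote command.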

Reading through the OpenSSH bug I am wondering: would it be possible for asyncssh to retrieve the SSH_MSG_CHANNEL_FAILURE message and raise an exception then?

In any case, since OpenSSH is so popular, perhaps it would make sense to add a note to the API documentation of terminate() and kill() mentioning that these methods don't work with OpenSSH's sshd.

@ronf (Owner) commented Nov 11, 2017

Unfortunately, the "signal" channel message is defined in RFC 4254 as a message with no reply:

  byte      SSH_MSG_CHANNEL_REQUEST
  uint32    recipient channel
  string    "signal"
  boolean   FALSE
  string    signal name (without the "SIG" prefix)

Note the FALSE value there. This says that the server should not send a success/failure response, so we don't get any indication back whether it succeeded or was just ignored. While I could try changing FALSE to TRUE there, that wouldn't match what was in the RFC, and I'm not sure I could count on all SSH implementations actually sending a proper reply in that case.
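
To make the layout concrete, here is a rough sketch of how those fields serialize into a message payload. It uses message number 98 for SSH_MSG_CHANNEL_REQUEST (RFC 4254) and the RFC 4251 encodings: uint32 as 4 big-endian bytes, string as a uint32 length followed by the bytes, and boolean as a single byte. `encode_signal_request` is a hypothetical helper for illustration only, not AsyncSSH's actual code.

```python
import struct

SSH_MSG_CHANNEL_REQUEST = 98  # message number assigned in RFC 4254


def _ssh_string(data):
    """Encode bytes as an SSH string: uint32 length, then the bytes."""
    return struct.pack('>I', len(data)) + data


def encode_signal_request(recipient_channel, signal_name, want_reply=False):
    """Build the payload of a "signal" channel request.

    `signal_name` is given without the "SIG" prefix, e.g. 'TERM'.
    The RFC specifies want_reply as FALSE for this request.
    """
    return (
        bytes([SSH_MSG_CHANNEL_REQUEST])
        + struct.pack('>I', recipient_channel)
        + _ssh_string(b'signal')
        + (b'\x01' if want_reply else b'\x00')
        + _ssh_string(signal_name.encode('ascii'))
    )
```

With want_reply left at its default, the boolean byte is zero, which is why the client gets no success or failure response to act on.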

That said, I agree that it's probably worth documenting that this message is not currently supported by OpenSSH, since the question has come up a couple of times now. I'll add that to my TODO list.

Thanks for the report, and for the suggested workaround -- that shell trick is perhaps a better (or at least more flexible) way of working around the fact that OpenSSH doesn't currently support this signal message, at least in cases where you aren't sending other data via stdin.

@ronf (Owner) commented Nov 11, 2017

The warning for this is now checked into the "develop" branch and will be rolled into the next release.

@ronf (Owner) commented Nov 16, 2017

This doc update is now released in AsyncSSH 1.11.1.

@ronf closed this Nov 16, 2017
mrocklin added a commit to mrocklin/distributed that referenced this issue Aug 4, 2019
It turns out that OpenSSH doesn't pass through terminate/kill signals,
so we had some zombie processes hanging around sending signals around where
they shouldn't.

Now we place idle and death timeouts on the launched processes to keep them in
check.

See ronf/asyncssh#112 for more information on the
underlying issue.
mrocklin added a commit to dask/distributed that referenced this issue Aug 4, 2019