
Add support for talking to our stdin/stdout/stderr as streams #174

Open
njsmith opened this Issue May 27, 2017 · 21 comments

@njsmith
Member

njsmith commented May 27, 2017

There should be a convenient and standard way to read and write from the trio process's stdin/stdout/stderr streams. (Note that this is different from talking to the stdin/stdout/stderr of child processes, which is part of #4.) Probably this should use our standard stream abstraction.

Complications to consider:

  • Normally Python's I/O stack does a bunch of work here: text/binary conversion, newline conversion, buffering, convenience parsing things like readline and line iteration. (I think that's it – anything else?) We have to decide whether we want to re-use it (basically doing run_in_worker_thread for everything) or reimplement the parts we want. {Send,Receive,}TextStream classes seem like a reasonable thing to provide in general, and they should probably implement universal newline support too, why not. (Not sure we even need ABCs for these – they could just be concrete classes? though I suppose someone might eventually come up with a situation where they have an object that natively acts like this without any underlying binary stream, and want to explicitly declare that the interface is the same.) Buffering I'm somewhat dubious of – when applied to stdin/stdout/stderr it often causes user-visible problems (delayed output), it's redundant with buffering done by the kernel (as usual), and we try to minimize it in general. It's particularly bad if you want to speak some automated interactive protocol over stdin/stdout, which seems like a case that might come up in trio relatively often. And convenience parsing (readline etc.) might be better handled using sans-IO style protocol objects?

  • It might even make sense to do both; #20 might mean that we have a 3-line solution for the "wrap an io.TextIOWrapper object" approach if that's what you want, and then also provide a lower-level, more direct stream-based API.

  • On Windows, the only reliable way to do non-blocking I/O to the standard streams is via threads. In particular, it's the only thing that works if we're connected to a regular console. Everywhere else, non-blocking I/O is possible (and the sensible thing if we do decide to cut out Python's io stack). Edit: See update below.

  • On Windows, you often have to make separate console control calls for things like cursor movement and coloring text, and these need to be synchronized with the output stream. (In the very very latest Win 10 update they finally added VT100 support to the console, but it will be a while before anyone can count on that.) I believe that the output is still binary (UTF-16) rather than using some kind of first-class text read/write API.

  • I know prompt_toolkit has an async API and they support a lot of fancy terminal stuff in pure Python - we should check what they need to make sure whatever we come up with matches.

@njsmith

Member

njsmith commented May 27, 2017

io.IncrementalNewlineDecoder might be useful if we need to implement our own universal newline support. It's not documented, unfortunately.
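
A minimal sketch of how that might look (nothing trio-specific; just the stdlib decoder chained after a UTF-8 incremental decoder, so a \r\n pair split across two chunks still comes out as a single \n):

import codecs
import io

decoder = io.IncrementalNewlineDecoder(
    codecs.getincrementaldecoder("utf-8")(), True)   # True -> translate newlines

chunks = [b"hello\r", b"\nworld\r", b"!\n"]   # the \r\n straddles two chunks
text = "".join(decoder.decode(chunk) for chunk in chunks)
text += decoder.decode(b"", final=True)       # flush any pending \r
print(repr(text))   # -> 'hello\nworld\n!\n'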

@buhman buhman self-assigned this Jun 13, 2017

@buhman

Member

buhman commented Jun 13, 2017

Related to #4, in previous projects I've played with feeding ptys to subprocesses instead of pipes (not sure about the correctness of the below):

import asyncio
from asyncio.base_subprocess import ReadSubprocessPipeProto
import os
import pty


async def subprocess_exec_pty(protocol_factory, *args, **kwargs):
    """Like loop.subprocess_exec, but give the child ptys for stdout/stderr."""
    loop = asyncio.get_event_loop()

    # One pty pair per output stream: the child writes to the slave end,
    # we read from the master end.
    stdout_master, stdout_slave = pty.openpty()
    stderr_master, stderr_slave = pty.openpty()

    transport, protocol = await loop.subprocess_exec(
        protocol_factory, *args,
        stdout=stdout_slave, stderr=stderr_slave, **kwargs)

    # Hook the master ends back into the subprocess transport, so the
    # protocol still sees pipe_data_received(1, ...) / (2, ...) as usual.
    _, pipe = await loop.connect_read_pipe(
        lambda: ReadSubprocessPipeProto(transport, 1),
        os.fdopen(stdout_master, 'rb', 0))
    transport._pipes[1] = pipe

    _, pipe = await loop.connect_read_pipe(
        lambda: ReadSubprocessPipeProto(transport, 2),
        os.fdopen(stderr_master, 'rb', 0))
    transport._pipes[2] = pipe

    return transport, protocol

separate console control calls for things like cursor movement

Unless we're reimplementing prompt_toolkit, is this required to provide a valid {Send,Receive,}TextStream? Or, the inverse of the above: what if we just presume our stdin/stdout is never a console?

I think it would be convenient if the API mirrored the stdlib a little: trio.stdout or something might even be ok.

Is the implementation here specifically that we set the sys.{stdin,stdout,stderr} file descriptors to non-blocking, and provide our own *Stream classes that expose specialized asynchronous file object interfaces? Is it considered invalid user behavior to use print() afterwards? Shouldn't we provide some await print() too? What happens when someone wants to use a stdout logging handler?

@buhman

Member

buhman commented Jun 13, 2017

For comparison, Twisted's implementation: https://github.com/twisted/twisted/blob/twisted-17.5.0/src/twisted/internet/stdio.py

asyncio doesn't support this directly yet (nice).

@njsmith

Member

njsmith commented Jun 13, 2017

Oh wow yeah this is way nastier than I had realized.

So the absolute simplest solution would be to suggest people use wrap_file(sys.stdin) etc. (Or some equivalent convenience API.) That's effectively the only thing that works on Windows, and it's the only thing that works on unixes when an fd has been redirected to a file, and it's by far the simplest thing that avoids the nasty issue with different processes fighting over non-blockingness state. For those who want to speak some protocol over stdin/stdout we can make simple implementations of ReceiveStream/SendStream that delegate to a wrapped unbuffered binary file object. That all would work. And it works today, which is nice.
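
A sketch of what that could look like, assuming a trio.wrap_file helper along the lines of #20 (every method call on the wrapper is punted to a worker thread, so it works even when the fds can't be made non-blocking):

import sys
import trio

async def main():
    stdin = trio.wrap_file(sys.stdin.buffer)
    stdout = trio.wrap_file(sys.stdout.buffer)

    line = await stdin.readline()            # blocking readline, in a thread
    await stdout.write(b"you said: " + line)
    await stdout.flush()

trio.run(main)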

It has the downside that it's probably pretty slow compared to doing real non-blocking I/O in the cases where that's possible. So there's one specific use case where this might be inadequate: pushing bulk data through the standard descriptors, probably when two programs are talking to each other. So one question is whether and how we can do better for this case. Can we detect when the fd supports non-blocking operation? (Apparently from the twisted discussion it sounds like epoll will refuse to work on such fds, so that's one indication if nothing else. Not sure if kqueue works the same way. I guess just setting and then checking the non-blocking flag might work.) If we can detect that, then we can potentially offer two modes: the "always works" mode and the "always works as long as no one else minds us setting things to non-blocking" mode, and people who need speed and don't mind taking a risk can use the latter.
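
A rough illustration of the "set the flag and check it" probe (only a hint, though – a regular file will happily accept O_NONBLOCK and then ignore it):

import fcntl
import os

def probably_supports_nonblocking(fd):
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    try:
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
        return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)
    finally:
        fcntl.fcntl(fd, fcntl.F_SETFL, flags)   # restore the original flags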

I don't know how important this feature is in practice. It might not be worth the complexity.

@njsmith

Member

njsmith commented Jun 14, 2017

Oh, here's another fun issue to keep in mind: TextIOWrapper objects are not thread safe. This means that if we naively wrap_file(sys.std...), then the resulting object is unsafe to call from multiple tasks simultaneously. Which is worrisome because it's a global object. Perhaps some locking is in order.

The globalness of the standard descriptors causes several problems, actually. If we set them non-blocking, then it's not just other processes that get messed up, it's also naive calls to print or similar within our process. Obviously print is not a great thing to be calling all over the place in a robust app that wants to make sure it never blocks, but for things like debugging or test output it's pretty useful. pdb.set_trace is another example of why we might want to keep stdin/stdout working in blocking mode.

... And actually this is also trickier than it might seem, because the thread safety issue also applies between the main thread and worker threads, i.e. even if trio.stdout has some locking inside it so that all calls that go through it are serialized, they can still race with direct accesses to sys.stdout. It's possible that we could avoid this by using two different TextIOWrapper objects pointed at the same underlying BufferedIO object, which creates different possibilities for corrupt output when both are used at the same time, but at least the internal data structures would survive.
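
For concreteness, the "second wrapper over the same buffer" idea would be something like this (interleaved output can still come out garbled, and closing the second wrapper would also close the shared buffer, so a real version needs more care):

import io
import sys

# A private text wrapper over the same underlying binary buffer as sys.stdout.
trio_stdout_text = io.TextIOWrapper(
    sys.stdout.buffer,
    encoding=sys.stdout.encoding,
    line_buffering=True,
    write_through=True,     # don't add yet another layer of text buffering
)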

Anyway, one thing this makes clear is that the decision to use the standard fds for programmatic purposes is really not something to take lightly – if you're going to do it then the whole program needs to agree on how.

Oh, I just remembered another fun thing about stdin: trying to read from it can cause your whole program to get suspended (SIGTSTP).

@njsmith

Member

njsmith commented Jun 14, 2017

Whoops, I don't mean SIGTSTP, I mean SIGTTIN and SIGTTOU. And apparently both writing and reading can trigger suspension.

@buhman

Member

buhman commented Jun 14, 2017

unsafe to call from multiple tasks simultaneously

What about task-local storage?

but for things like debugging or test output it's pretty useful

That's why I was saying an 'await print()' helper would be useful.

@njsmith

Member

njsmith commented Jun 14, 2017

Hmm, here's another trick, but it might not be widely applicable enough to be worthwhile: the send and recv syscalls accept a flags argument, and one of the flags you can pass is MSG_DONTWAIT, which makes a socket effectively non-blocking just for this call, without affecting any global state.

But... AFAICT this is supported only on Linux, not Windows or MacOS. On MacOS, the MSG_DONTWAIT constant appears to be defined, but it's not mentioned in the send man page, and it doesn't seem to work:

# MacOS
In [1]: import socket

In [2]: socket.MSG_DONTWAIT
Out[2]: 128

In [3]: a, b = socket.socketpair()

In [4]: while True:
   ...:     print("sending")
   ...:     res = a.send(b"x" * 2 ** 16, socket.MSG_DONTWAIT)
   ...:     print("sent", res)
   ...:     
sending
[...freezes...]

And on Windows it doesn't appear to be either documented or defined.

And, even on Linux, it only works on sockets. If I try using nasty tricks to call send on a pty, I get:

# Linux
In [3]: s = socket.fromfd(1, socket.AF_INET, socket.SOCK_STREAM)

In [4]: s.send(b"x")
OSError: [Errno 88] Socket operation on non-socket

and similarly on a pipe:

# Linux
In [5]: p1, p2 = os.pipe()

In [6]: s = socket.fromfd(p2, socket.AF_INET, socket.SOCK_STREAM)

In [7]: s.send(b"x")
OSError: [Errno 88] Socket operation on non-socket

This has me wondering though if there's any other way to get a similar effect. There was a Linux patch submitted in 2007 to make Linux native AIO work on pipes and sockets; I don't know if it was merged, but in principle it might be usable to accomplish a similar effect.

On pipes, if no-one else is reading from the pipe, then the FIONREAD ioctl can be used to find out how many bytes are ready to be read, so reading that much won't block. Of course, someone else might be reading from the pipe at the same time, steal them out from under you, and then you get blocked for an arbitrary amount of time, whoops. And for writing, there doesn't seem to be any similar trick (you can use F_GETPIPE_SZ to find out how big the pipe buffer is, but not how full it is; possibly there's some undocumented IOCTL somewhere that I'm missing). So maybe this is useless.
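
For reference, the FIONREAD part looks something like this (Unix; and as noted, it only tells you what was true at the moment you asked):

import array
import fcntl
import os
import termios

def bytes_ready(fd):
    buf = array.array("i", [0])
    fcntl.ioctl(fd, termios.FIONREAD, buf)   # kernel fills in the byte count
    return buf[0]

r, w = os.pipe()
os.write(w, b"hello")
print(bytes_ready(r))   # -> 5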

Maybe we should focus on making threaded I/O as fast as possible :-)


Unrelated issue: there's also some question about how a hypothetical trio.stdout should respond if someone replaces sys.stdout. This is a fairly common and supported thing. If we just do trio.stdout = trio.wrap_file(sys.stdout), and then someone does sys.stdout = ..., then trio.stdout will keep pointing to the old stdout. OTOH, if we make trio.stdout a special object that always looks up sys.stdout on every call, then... it won't work, because of locking issues. Le sigh.

@njsmith

Member

njsmith commented Jun 14, 2017

What about task-local storage?

Task-local storage would be useful if there were some way to give each task its own private stdin, stdout, etc., but.... I'm not sure what that would mean? :-) Those are kind of inherently process-global resources.

That's why I was saying an 'await print()' helper would be useful.

await trio.print might well be useful (isn't it lucky that in Python print is a regular function, not a piece of special syntax?), but it doesn't help for sticking a quick debug print in a sync function, or for the pdb.set_trace() case.

@njsmith

Member

njsmith commented Jun 22, 2017

Update: apparently I was wrong! On Windows, it is possible to read/write to the console without doing blocking read/write in threads. Which is good, because ReadFile and WriteFile on the console can't be cancelled, and we'd really like to be able to cancel these operations (e.g. because the user hits control-C).

This stackoverflow question seems to have reasonable info (once you filter through all the partial answers). AFAICT, the basic idea is that you call GetStdHandle to get a HANDLE pointing to the console, which can be passed to one of the WaitFor functions, and once that returns then ReadConsoleInput can be called to pull out "events", which might be keypresses or might be other things like mouse movements. We need to support WaitFor anyway (#233), so this is all pretty reasonable. And for output, I guess you just use WriteConsole and friends (SetConsoleTextAttribute etc.), and since these can only be used to write to the console, they might be slow (and you might want to push them off into a worker thread), but they shouldn't block indefinitely.
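
A minimal ctypes sketch of the "wait, then read console events" step (Windows only, error handling omitted; the ReadConsoleInput call itself is not shown):

import ctypes
import ctypes.wintypes as wt

kernel32 = ctypes.windll.kernel32
kernel32.GetStdHandle.restype = wt.HANDLE
kernel32.WaitForSingleObject.argtypes = [wt.HANDLE, wt.DWORD]

STD_INPUT_HANDLE = -10
WAIT_OBJECT_0 = 0

hstdin = kernel32.GetStdHandle(STD_INPUT_HANDLE)
# The console input handle is signaled when at least one input event
# (keypress, mouse, resize, ...) is pending.
if kernel32.WaitForSingleObject(hstdin, 1000) == WAIT_OBJECT_0:   # 1s timeout
    print("input pending; ReadConsoleInput would not block now")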

Now, all the APIs mentioned above assume that your program is attached to a regular console (like a TTY on Unix). And you can always get access to whatever console you're running under (if any) by opening CONIN$ or CONOUT$, sort of like opening /dev/tty on Unix, which might be useful sometimes. But for most purposes, we want to also do something sensible when stdin/stdout/stderr are redirected, and in this case all of the above APIs will just error out, and we need to fall back on some other strategy. There are five cases that I know of:

  • standard stream is a console object
  • standard stream is a socket with OVERLAPPED support disabled (if a socket has OVERLAPPED support enabled then it can't be used as a standard stream, because... Windows)
  • standard stream is a named pipe (I think OVERLAPPED might or might not be enabled)
  • standard stream is an anonymous pipe
  • standard stream is an actual on-disk file

The first case (magic console objects) is described above.

Socket without OVERLAPPED support: well, we can use select and non-blocking I/O, though this might be tricky if we end up switching trio.socket to using IOCP (#52). I guess blocking I/O in a thread + CancelSynchronousIo might work? It might be possible to enable OVERLAPPED I/O via ReOpenFile? (It also has poorly-documented limitations.)

Named pipe: can't assume OVERLAPPED is available; maybe ReOpenFile works, maybe not. Anonymous pipe: these are basically named pipes, except with a bonus limitation: "Asynchronous (overlapped) read and write operations are not supported by anonymous pipes" (ref). So we'd need some strategy that doesn't use IOCP, I think.

On-disk files: well, here just plain old threads are OK, because reading/writing to a file might be slow but it shouldn't block indefinitely.

So tentatively I'm thinking:

  • We probably want some lowish-level Windows-specific API for talking to the console, that wraps ReadConsoleInput and WriteConsole+friends in a thin layer of trio compatibility code.

  • We should experiment with CancelSynchronousIo to figure out whether it can be used together with blocking ReadFile/WriteFile to handle the other cases here. [Edit: alternatively, we should experiment with WaitForSingleObject to see if it can be used to tell whether a subsequent call to ReadFile/WriteFile will block, similar to what python-prompt-toolkit does on Unix (see comment below). Though this discussion isn't promising....]

  • If both of these work out, then we can provide a layer on top that figures out which kind of console stream we have, and then uses the appropriate lower-level API to expose a standard Stream-based interface.

@njsmith

Member

njsmith commented Jun 22, 2017

Also, note for reference: looking at the python-prompt-toolkit code, it appears that the way they do async interactive applications on Unix is to select to see if a standard stream is readable/writable, and then issue a blocking read/write, i.e. they leave the streams in blocking mode and then cross their fingers that this won't bite them. And I guess they get away with it, because I don't see any bug reports related to this...
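
The core of that approach is tiny – something like this (Unix, stdin left in blocking mode):

import os
import select
import sys

fd = sys.stdin.fileno()
readable, _, _ = select.select([fd], [], [], 5.0)   # wait up to 5 seconds
if readable:
    data = os.read(fd, 1024)   # "should" not block, fingers crossed
    print("got", data)
else:
    print("nothing to read")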

@njsmith

Member

njsmith commented Jun 29, 2017

Further Windows update: while I still can't find any references to CancelSynchronousIo working on console reads through web search, @eryksun claims in this message that it does with some caveats. (Eryk, if you happen to have any thoughts on this thread in general that'd be very welcome... The topic is, how can one reliably read/write to stdin/stdout without blocking the main thread, and so that all operations that might block indefinitely are cancelable.)

Another note: GetFileType may also be useful here.

@eryksun

eryksun commented Jun 30, 2017

Unfortunately canceling a console read via CancelSynchronousIo doesn't work prior to Windows 8. I haven't seriously used Windows 7 in a long time, so I forget about its limitations until I go out of my way to test on it. I should have known better. The console has only had a real device driver since Windows 8. Older versions use an LPC port to communicate between a client and the console. In this case console buffer handles are allocated by the console itself. These pseudohandles are flagged with the lower 2 bits set (e.g. 3, 7, 11), so regular I/O functions know to redirect to console functions (e.g. CloseHandle -> CloseConsoleHandle). Without an actual I/O request, there's nothing for CancelSynchronousIo to cancel.

@njsmith

Member

njsmith commented Jul 6, 2017

libuv has a clever trick! If you want to set stdin/stdout/stderr non-blocking, and it's a tty, then you can use os.ttyname to get the device node for this tty and open a second copy of it. And this is by far the main case where we might have other programs confused by sharing stdin/stdout/stderr. (Read the patch and probably check the current code, there are a number of subtleties.)
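
The trick itself is only a few lines – something like this (with O_NOCTTY so the new fd can't accidentally become our controlling terminal):

import os
import sys

fd = sys.stdin.fileno()
if os.isatty(fd):
    tty_path = os.ttyname(fd)   # e.g. "/dev/pts/3"
    # A private fd for the same terminal: we can make *this* one non-blocking
    # without touching the flags on the fd we share with other processes.
    private_fd = os.open(tty_path, os.O_RDONLY | os.O_NONBLOCK | os.O_NOCTTY)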

That blog post also mentions that kqueue on MacOS doesn't work on ttys, which would be super annoying, but apparently this got fixed in 10.7 (Lion). I don't think we need to care about supporting anything older than 10.7. Apparently even 10.9 is already out of security-bugfix-land. (ref)

@njsmith

Member

njsmith commented Jul 13, 2017

@remleduff has made a remarkable discovery: on Linux, libuv's clever trick of re-opening the file can actually be done on anonymous pipes too, by opening /proc/self/fd/NUM. I guess this makes some sense if you recall that FIFOs can be opened multiple times for read and/or write, and anonymous pipes and FIFOs are the same thing, but I was still shocked. (On MacOS there's /dev/fd/NUM, but opening this unfortunately seems to just dup the existing fd rather than actually re-opening it.)
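
On Linux it really is this simple (sketch):

import os

r, w = os.pipe()
# Re-opening via /proc/self/fd gives a brand new open file description for
# the same pipe, so O_NONBLOCK here doesn't affect anyone sharing the
# original fd.
private_r = os.open("/proc/self/fd/{}".format(r), os.O_RDONLY | os.O_NONBLOCK)
os.write(w, b"hi")
print(os.read(private_r, 10))   # -> b'hi'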

So this means that technically on Linux I think we actually can handle every common case:

  • tty or pipes: re-open
  • files on disk: use threads, like any other on disk file
  • character devices like /dev/zero or /dev/urandom: re-open probably works? Are character devices ever stateful in such a way that re-opening them gives an fd that acts differently from the original? For example, do character devices ever support seeking? It looks like they do on FreeBSD. But it's easy to test whether an fd is seekable with os.lseek(fd, 0, os.SEEK_CUR). [Edit: well, modulo files that claim to be seekable but are lying, if those really exist.] Are there any non-seekable files that nonetheless maintain per-open state?
  • sockets: re-open fails, but MSG_DONTWAIT works

The first three cases cover the vast vast vast majority of stdin/stdout/stderr configurations that actually occur in practice. I'm not sure sockets are common enough to justify a whole extra set of code paths, but maybe.

@njsmith

Member

njsmith commented Jul 13, 2017

I also spent some time trying to figure out if there was a way to make blocking I/O cancellable. I don't think there is... or maybe there is?

The first idea I considered is: start a thread that will sit blocked in read or write, and then if we want to cancel it, use pthread_kill to send a signal to trigger an EINTR. The problem is that this is inherently racy, for the same reason that pthread cancellation requires special kernel help – the signal might arrive just before entering the syscall, or it might arrive just after successfully exiting, and you want to treat these differently (in the first case return failure, in the second case return success), but there's absolutely no way to tell the difference between them except by examining the instruction pointer, which requires you to write your own asm for doing syscalls. So that's out.

The second idea I considered is: dup the fd, issue a read or write on the dup, and then if you want to cancel the read or write early, close the dup. (We can't close the original, because we still need it, but we can close the dup.) Unfortunately, on Linux at least this doesn't work: read on a pipe doesn't actually return until the write side of the pipe has been fully closed (i.e., there are no remaining fds pointing to it). If the fd it's actually using disappears out from under it then oh well, it doesn't care. ...And even if this worked, there'd still be a race condition if we closed the fd just before entering read, because it could be re-opened by another thread in the mean time. I guess we could fix the race condition by dup2ing a bad fd on top of the original fd, but that still doesn't help with the part where you can't wake it up.

OH WAIT THOUGH. What if we combine these. Option 3: dup the fd. Dispatch a blocking read or write to a thread using the dup. On cancellation, use dup2 to atomically overwrite the original fd with one for which we know read/write will fail (e.g. an unconnected socket). Then use pthread_kill to send a no-op signal to the worker thread.

If the dup2 happens before we enter read/write, then they'll fail immediately, like we want.

Otherwise, it means the dup2 happens after we enter read/write, which implies that the signal does as well. So one possibility is that the signal arrives while we're still in read/write. In this case it returns early with EINTR, CPython attempts to re-issue the call, and then the new calls fails with EBADF because this is after the dup2. Alternatively, the signal arrives after the read/write have completed, in which case it does nothing, which is again what we want.

This still has the problems that we have to claim a signal, and if we're running outside the main thread then Python doesn't provide an API for registering a signal handler (and I'm pretty sure that to get EINTR we need to have a C-level signal handler registered, even though we want it to just be a no-op). But we could potentially grab, like, SIGURG which hopefully no-one actually uses and is ignored by default, and use ctypes to call sigaction.

This is kind of a terrible idea, but I do think it would work reliably and portably on all Unixes for all fd types.
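
For the record, here's roughly what that sequence looks like in Python (Unix only; purely an illustration – it claims SIGURG, and as noted above a real implementation would want to think harder about the C-level handler and about who owns the signal):

import os
import signal
import socket
import threading

signal.signal(signal.SIGURG, lambda signum, frame: None)   # no-op handler


def start_cancellable_read(fd, nbytes, result):
    # Work on a private dup of the caller's fd, so we own an fd number that
    # we can safely clobber with dup2 later.
    dup_fd = os.dup(fd)

    def reader():
        try:
            result["data"] = os.read(dup_fd, nbytes)
        except OSError as exc:
            result["error"] = exc

    thread = threading.Thread(target=reader)
    thread.start()
    return thread, dup_fd


def cancel_read(thread, dup_fd):
    # Atomically point the worker's fd at something whose reads fail right
    # away (an unconnected TCP socket), then poke the worker so that a read
    # already blocked in the kernel returns with EINTR and the retry hits the
    # bad fd instead.
    bad = socket.socket()
    os.dup2(bad.fileno(), dup_fd)
    signal.pthread_kill(thread.ident, signal.SIGURG)
    thread.join()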

@njsmith

Member

njsmith commented Oct 8, 2017

DJB has some commentary on how properly written kernels should do things, which is completely correct and yet useless in practice, alas: https://cr.yp.to/unix/nonblock.html

@njsmith

Member

njsmith commented May 17, 2018

I guess this is some kind of argument for... something: https://gist.github.com/njsmith/235d0355f0e3d647beb858765c5b63b3

(It exploits the fact that setuid(getuid()) is a no-op, except that limitations of the Linux syscall interface mean that libc's setuid wrapper in a multi-threaded program has to seize control of all the other threads – which it does by sending them a signal – so calling it forces all other threads to restart whatever syscalls they were doing.)

njsmith added a commit to njsmith/trio that referenced this issue May 21, 2018

Add two more notes-to-self files
blocking-read-hack.py: This demonstrates a really weird approach to
solving python-triogh-174. See:
  python-trio#174 (comment)

ntp-example.py: A fully-worked example of using UDP from Trio,
inspired by
  python-trio#472 (comment)
This should move into the tutorial eventually.
@njsmith

Member

njsmith commented Oct 2, 2018

Here's the discussion about this in mio: carllerche/mio#321

It looks like libuv has an amazing thing where their tty layer on windows actually implements a vt100 emulator in-process on top of the windows console APIs: https://github.com/libuv/libuv/blob/master/src/win/tty.c

I looked at SIGTTIN/SIGTTOU again. This is a useful article. It sounds like for SIGTTIN, you can detect when you've been blocked from reading (ignore SIGTTIN, and then read gives EIO), but when writing you just get a signal + EINTR, which is pretty awkward given that Python signal delivery happens after some delay and that the os.write handler unconditionally retries on EINTR. Also, in both cases, AFAICT there's no way to get a notification when you can retry; you just have to poll. Maybe we should just ignore this issue and document it as a limitation – sometimes your process will get put to sleep, deal with it. (I doubt it's an issue for most of the cases where people want to speak protocols on stdin/stdout.)
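
The SIGTTIN half of that detection trick is straightforward, for what it's worth (Unix only):

import errno
import os
import signal
import sys

# With SIGTTIN ignored, a background process that reads from its controlling
# terminal gets EIO instead of being stopped.
signal.signal(signal.SIGTTIN, signal.SIG_IGN)
try:
    data = os.read(sys.stdin.fileno(), 1024)
except OSError as exc:
    if exc.errno == errno.EIO:
        print("we're in the background; the terminal won't let us read")
    else:
        raise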

@remleduff

Contributor

remleduff commented Oct 2, 2018

You've probably seen this already, but Windows 10 has been making large changes (improvements) to console handling. Is it better to have a wait-and-see attitude on this one, and just try to make it work really well starting with Windows 10?

https://blogs.msdn.microsoft.com/commandline/2018/06/20/windows-command-line-backgrounder/

@njsmith

Member

njsmith commented Oct 3, 2018

The console changes are great, but unfortunately, as far as I know none of them change the basic API that apps use to talk to their stdin/stdout when it's a console.

That API did get some work in Windows 8 – in particular some possibly useful cancellation support – but there's still no real async API AFAIK.
