This repository has been archived by the owner on Nov 23, 2017. It is now read-only.

Add create_read_pipe_protocol/create_write_pipe_protocol to SubprocessProtocol #115

Closed
GoogleCodeExporter opened this issue Apr 10, 2015 · 44 comments

Comments


I don't like the current API for subprocesses: I don't want to use a low-level protocol, and I would prefer something like StreamReader and StreamWriter to get asynchronous read and write operations.

The attached patch is a first step in this direction, allowing some methods to be overridden: it makes it possible to create a custom protocol for the read and write pipes (stdin, stdout and stderr).

The patch adds create_read_pipe_protocol() and create_write_pipe_protocol() to SubprocessProtocol; the methods take two parameters (transport, fd).

See my subproces-stream branch in the Tulip repository if you want to see the whole code implementing stream-like protocols for subprocess pipes.
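
A minimal sketch of the two proposed hooks, assuming the Read/WriteSubprocessPipeProto classes from base_subprocess.py; the default bodies here are illustrative, not the actual patch:

    from asyncio import protocols
    from asyncio.base_subprocess import (ReadSubprocessPipeProto,
                                         WriteSubprocessPipeProto)

    class SubprocessProtocol(protocols.BaseProtocol):
        """Sketch: only the two proposed hooks are shown."""

        def create_read_pipe_protocol(self, transport, fd):
            # Called for the stdout/stderr pipes (fd 1 and 2); a subclass
            # can return a StreamReader-based protocol instead.
            return ReadSubprocessPipeProto(transport, fd)

        def create_write_pipe_protocol(self, transport, fd):
            # Called for the stdin pipe (fd 0).
            return WriteSubprocessPipeProto(transport, fd)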

Original issue reported on code.google.com by victor.s...@gmail.com on 26 Jan 2014 at 11:37

Attachments:


I have a different suggestion.

Instead of allowing the pipe protocol classes to be changed, write a subprocess protocol that implements pipe_data_received().

Attached are two separate examples, one showing how to do the reading, the other how to do the writing.  Only a few lines in the asyncio package need to be changed (to allow StreamWriter to be instantiated with protocol and reader set to None, as discussed).

Of course this is still too complex, but I plan to work next on a version where the subprocess protocol class is incorporated in streams.py, along with the helper or helpers (I haven't decided yet) for starting a subprocess.  This follows the existing pattern in streams.py: add helper functions that return a stream reader and writer instead of a transport and protocol; these helpers have the same signature as the event loop methods they wrap, except that the protocol factory argument is removed.
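
Concretely, a minimal sketch of such a helper, in the 3.4-era yield from style; to keep it short, this sketch borrows the SubprocessStreamProtocol class that eventually shipped in asyncio.subprocess in place of the example's MyProtocol:

    import asyncio
    from asyncio.subprocess import SubprocessStreamProtocol

    @asyncio.coroutine
    def shell(cmd, **kwds):
        # Same signature as loop.subprocess_shell(), minus the protocol
        # factory; returns streams instead of (transport, protocol).
        loop = asyncio.get_event_loop()
        transport, protocol = yield from loop.subprocess_shell(
            lambda: SubprocessStreamProtocol(limit=2 ** 16, loop=loop),
            cmd, **kwds)
        return protocol.stdout, protocol.stdin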

We separately need to implement flow control for pipes; you made a good start 
with that in your branch (especially taking the FIXMEs into account). Can you 
develop that further? Once pipe flow control exists I hope that there is an 
obvious refactoring of "MyProtocol" from the examples (unifying the read and 
write versions) and StreamReaderProtocol to share more code. (It's also 
possible that I will end up moving some of the logic into StreamReader as part 
of the refactoring.)

(PS. While trying to minimize the changes for this patch I noticed that there's a dummy eof_received() implementation at the end of WriteSubprocessPipeProto. I think it can go -- the base protocol class already defines an empty eof_received() method, and the write protocol class doesn't need it.)

Original comment by gvanrossum@gmail.com on 27 Jan 2014 at 1:17

Attachments:


A separate remark.

There are certain caveats if you connect both a reader and a writer to the same 
subprocess and do the reading/writing in the same task; that can cause a 
classic deadlock if a buffer fills up (even if we implement flow control).

The convenience API should still *support* requesting both a reader and a 
writer, because one could easily handle them in separate tasks.

But perhaps the *most* convenient API should mimic the behavior of popen, and 
return either a reader or a writer?
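
A sketch of the separate-tasks pattern, using the stream API that eventually landed (3.4-era syntax): the writer and the reader run as independent tasks, so a full pipe buffer on one side cannot stall the other.

    import asyncio
    from asyncio.subprocess import PIPE

    @asyncio.coroutine
    def feed(stdin):
        for chunk in (b'one\n', b'two\n', b'three\n'):
            stdin.write(chunk)
            yield from stdin.drain()   # respect write flow control
        stdin.close()

    @asyncio.coroutine
    def consume(stdout):
        while True:
            line = yield from stdout.readline()
            if not line:               # EOF: the pipe was closed
                break
            print('received', line)

    @asyncio.coroutine
    def main():
        proc = yield from asyncio.create_subprocess_exec(
            'cat', stdin=PIPE, stdout=PIPE)
        # Independent tasks: neither side can deadlock the other.
        yield from asyncio.gather(feed(proc.stdin), consume(proc.stdout))
        yield from proc.wait()

    asyncio.get_event_loop().run_until_complete(main())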

Original comment by gvanrossum@gmail.com on 27 Jan 2014 at 1:27


Here is a more versatile version of the example, subprocess_shell.py.  (No 
additional changes are required.)  Note the use of stderr=subprocess.STDOUT to 
capture the stderr output in the same reader as the stdout output.

Expected output:

--------------------
sending b'one\n'
sending b'two\n'
sending b'three\n'
received b'one\n'
received b'two\n'
received b'three\n'
received b''
--------------------
sending b'three\n'
sending b'four\n'
ERROR b'three\n'
ERROR b'four\n'
received b''
--------------------
No stdin
received b'Foo\n'
received b''
--------------------
No stdin
received b'Foo\n'
ERROR b'Bar\n'
received b''
--------------------
No stdin
No stderr
received b'Foo\n'
received b'Bar\n'
received b''
--------------------
No stdin
No stderr
Bar
received b'Foo\n'
received b''
--------------------
No stdin
No stdout
Foo
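
For reference, the same stderr=subprocess.STDOUT trick expressed with the helper API that eventually landed (a sketch in 3.4-era syntax; at the time of this comment the example still used the lower-level loop methods):

    import asyncio
    from asyncio.subprocess import PIPE, STDOUT

    @asyncio.coroutine
    def merged():
        proc = yield from asyncio.create_subprocess_shell(
            'echo Foo; echo Bar >&2', stdout=PIPE, stderr=STDOUT)
        out, _ = yield from proc.communicate()
        print('received', out)   # both Foo and Bar arrive on stdout

    asyncio.get_event_loop().run_until_complete(merged())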

Original comment by gvanrossum@gmail.com on 27 Jan 2014 at 4:57

Attachments:


Note that the code up to and including the shell() coroutine is the code that I 
propose to add to streams.py.  The rest is examples of use.

Original comment by gvanrossum@gmail.com on 27 Jan 2014 at 5:00


Unfortunately there's a flaw in my plan.

Adding flow control and supporting drain() will require adding pause_writing() 
and resume_writing() methods to class WriteSubprocessPipeProto (in 
base_subprocess.py), and then we would have to add 
_pipe_{pause,resume}_writing(fd) to class BaseSubprocessTransport, and those in 
turn would have to call new protocol methods pipe_{pause,resume}_writing(fd) on 
the SubprocessProtocol class, and then we would have to implement *those* in 
MyProtocol (in my new subprocess_shell.py example) to do the same thing that 
pause_writing() and resume_writing() do in StreamReaderProtocol (in streams.py).

Fortunately a simplification is possible: this only applies to the stdin pipe, 
so there is no reason to implement APIs that take an fd. So the 
SubprocessProtocol class (and MyProtocol) would only need to add the existing 
pause_writing() and resume_writing() (which technically are already defined 
because SubprocessProtocol inherits from BaseProtocol, and that's where they 
are defined).

So in the end maybe all we need to do is add this to WriteSubprocessPipeProto:

    def pause_writing(self):
        self.proc._protocol.pause_writing()

    def resume_writing(self):
        self.proc._protocol.resume_writing()

-- plus of course the flow control implementation (starting with your code in _UnixWritePipeTransport in unix_events.py in the subproces-stream branch); but we'd need that regardless.

Then MyProtocol can share the pause logic with StreamReaderProtocol, after a 
suitable refactoring of the latter.

All in all I still think I like my version better. It keeps the 
{Read,Write}SubprocessPipeProto implementation classes fixed, putting all 
flexibility in the SubprocessProtocol interface instead.

Thinking more about it, I think I like it better because it doesn't require 
exposing and subclassing base_subprocess.WriteSubprocessPipeProto. So issue 19 
isn't dead yet! I still believe that we should not encourage subclassing 
implementation classes by library users.

Original comment by gvanrossum@gmail.com on 27 Jan 2014 at 5:56


Here's another refactoring. I added primitive flow control to the low-level subprocess code and implemented drain() by refactoring the flow-control part of StreamReaderProtocol into a separate base/mixin class.
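
The mixin that came out of this refactoring looks roughly like the following condensed sketch (the real class in asyncio.streams also handles connection_lost() and exceptions):

    import asyncio

    class FlowControlMixin(asyncio.Protocol):
        """Reusable write flow control for stream protocols (sketch)."""

        def __init__(self, loop=None):
            self._loop = loop if loop is not None else asyncio.get_event_loop()
            self._paused = False
            self._drain_waiter = None

        def pause_writing(self):
            # The transport buffer went over the high-water mark.
            self._paused = True

        def resume_writing(self):
            # The buffer drained below the low-water mark: wake up drain().
            self._paused = False
            waiter = self._drain_waiter
            if waiter is not None and not waiter.done():
                self._drain_waiter = None
                waiter.set_result(None)

        @asyncio.coroutine
        def _drain_helper(self):
            # Awaited by StreamWriter.drain(); blocks while writing is paused.
            if self._paused:
                self._drain_waiter = asyncio.Future(loop=self._loop)
                yield from self._drain_waiter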

I have plans for another refactoring: the flow control code from socket 
transports needs to be made reusable so that I can inherit this in the pipe 
transport. But that is completely optional -- the current flow control works, 
it is just somewhat inefficient (it pauses the task as soon as the kernel 
buffer fills up).

Victor, I've seen your changes in the subproces-stream branch -- can I get some 
comments from you on what I'm doing here?

Original comment by gvanrossum@gmail.com on 27 Jan 2014 at 5:48

Attachments:


Here's the refactoring I promised.

Original comment by gvanrossum@gmail.com on 27 Jan 2014 at 6:23

Attachments:


my_subprocess_3.patch implements what I asked for: streams for stdout and stderr, and a stream for stdin with a working drain() method. Thanks.

my_subprocess_3.patch looks simpler than my subproces-stream branch. I tried to leave base_subprocess unchanged, in case it's too late to add a stream subprocess API to Python 3.4. But even if I move my code from the child class to the parent class, my code is still more complex.

I don't understand why WriteSubprocessPipeProto is so tightly coupled to BaseSubprocessTransport. A pipe transport is created with a waiter which indirectly waits until the protocol's connection_made() method is called, so the WriteSubprocessPipeProto code can be simplified. I wrote a new patch based on yours which includes this cleanup.

A wait() method on the protocol, similar to subprocess.Popen.wait(), would be convenient and may avoid zombie processes. I added such a method using the code I wrote in my branch.

Your example is complex; I prefer to move the class and add subprocess_exec/subprocess_shell helpers directly into asyncio. The helpers do the "yield from protocol.waiter" trick for you, and are similar to what open_connection() provides (streams). I also prefer to return the transport, to get transport methods like get_pid(), and to return the whole protocol, to get all methods including wait().

I'm not sure that it is correct to link stdin to stdout in StreamWriter via the reader parameter. It is possible that the stdout file descriptor is closed in the child process while stdin is not. For a socket it's different, because the read and write ends share the same file descriptor. By the way, I don't understand why the writer raises if the reader has an exception: is it an optimization to fail earlier when a socket is closed?

In my_subprocess_3.patch, ReadSubprocessPipeProto has pause/resume_writing methods, which is wrong. I split the code using a mixin to avoid that.

SubprocessStreamProtocol with the subprocess_shell/subprocess_exec helpers can be committed in a second changeset if you prefer. But as I already wrote, the current API is very hard to use.

I also modified your example to use wait(), so it's easier to cancel all tasks at once.

Original comment by victor.s...@gmail.com on 27 Jan 2014 at 9:50

Attachments:


I wanted to test my patch on Windows, but the subprocess transport is not implemented on Windows yet!

Original comment by victor.s...@gmail.com on 27 Jan 2014 at 10:00


I'll respond to your longer message after I've reviewed your code. (I'm 
confident that we can get this into the 3.4 release candidate even though we 
missed the beta.)

As to Windows, there is subprocess code in windows_events.py, but only in the 
ProactorEventLoop, so you'll have to use that.

Original comment by gvanrossum@gmail.com on 27 Jan 2014 at 11:50


>I'll respond to your longer message after I've reviewed your code.

Instead of a huge patch, I just created a new "subprocess_stream" branch with shorter commits. It should be easier to review.

> (I'm confident that we can get this into the 3.4 release candidate even 
though we missed the beta.)

Nice.

> As to Windows, there is subprocess code in windows_events.py, but only
> in the ProactorEventLoop, so you'll have to use that.

Ah yes. I updated asyncio documentation to explain that.

I found a bug: a broken pipe error was not handled correctly. SubprocessStreamProtocol.pipe_connection_lost() doesn't wake up the drain waiter; it should pass the exception along.

I fixed this bug in my branch with a new FlowControlMixin._wakeup_drain_waiter() method (I'm not good at naming methods, don't hesitate to rename it). I also modified the pipe protocols to not log broken pipe and connection reset errors.

My branch works on Windows. I just had to adapt the example: use the IOCP event loop and use subprocess_exec() with a Python script (python -c ...) instead of a shell script.

Original comment by victor.s...@gmail.com on 28 Jan 2014 at 1:32


I just got out of class and will soon have to catch the train home.  I still haven't seen your code in detail, so I'm responding to your messages here (in the wrong order, sorry).

Thanks for the new branch, I am sure it will be helpful.

I always meant to move the protocol class and the shell method from the example to streams.py; I just hadn't gotten to it.  We need to talk about the names of the convenience functions that appear at the top level -- I want them to be different from, but reminiscent of, the loop methods they wrap.  Maybe run_shell() and run_exec()?

I'm fine with adding get_pid() and a wait() helper, but where should they go? It's possible that any (or all!) of the streams are not present.

pause/resume_writing() are defined in BaseProtocol so it should be fine (and 
harmless) if ReadSubprocessPipeProto implements them.

You make a good point about linking stdin to stdout -- they should probably not 
be linked at all.

You may consider WriteSubprocessPipeProto an implementation detail of 
SubprocessTransport. The user should only have to define the methods on 
SubprocessProtocol that it calls. (That's how my patch works.)

Gotta run.

Original comment by gvanrossum@gmail.com on 28 Jan 2014 at 2:10


The new subprocess_stream branch is definitely a move in the right direction. 
Great job reducing things to their essence!

I think the changes to the following files are good to go, and you're free to 
commit these to the default branch.

- selector_events.py (refactor write flow control)
- unix_events.py (add write flow control for pipes, skip logging 
BrokenPipeError/ConnectionResetError)
- proactor_events.py (skip logging BrokenPipeError/ConnectionResetError)
- *part of* base_subprocess.py (killing eof_received(), adding 
pause_reading/writing) 

Maybe the correct grouping is to commit the 
BrokenPipeError/ConnectionResetError changes first (and together).

The pipe write flow control needs a unit test.

These are nearly there:

- streams.py (refactor write flow control -- the docstring is XXX)
- the rest of base_subprocess.py (getting rid of _try_connected() -- I have to 
research why it was originally not done that way)

I still have some reservations about:

- subprocess_stream.py (I want different names for the helper functions, there 
are some attribute naming inconsistencies, an odd docstring formatting, and I 
wonder if it should just be merged into streams.py -- otherwise should probably 
be named subprocess_stream*s*.py)
- __init__.py (depends on what we do for subprocess_stream.py)

The example also needs some cleaning up.

Original comment by gvanrossum@gmail.com on 28 Jan 2014 at 4:50


> I want them to be different from but reminiscent of the loop methods
> they wrap. Maybe run_shell() and run_exec()?

I propose run_shell() (for subprocess_shell) and run_program() (for subprocess_exec). I dislike the name "run_exec"; it's not directly related to subprocess.

> I'm fine with adding get_pid() and wait() helper, but where should they go?
> It's possible that any (or all!) of the streams are not present.

I proposed that run_shell() and run_program() return a tuple (transport, protocol), so you can pick what you need (see the sketch after the list):

- transport.close()
- transport.get_pid()
- transport.get_extra_info('subprocess')   # subprocess.Popen object
- protocol.wait(): future waiting for the completion of the process
- protocol.stdin, protocol.stdout, protocol.stderr (each can be None)
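
Usage under this proposal would look roughly like the sketch below; run_shell() and the protocol attributes are the names proposed in this thread, not a shipped API:

    # Hypothetical usage of the proposed helper (3.4-era syntax).
    @asyncio.coroutine
    def example():
        transport, protocol = yield from run_shell('ls -l')
        print('pid:', transport.get_pid())
        output = yield from protocol.stdout.read()   # read until EOF
        exitcode = yield from protocol.wait()
        transport.close()
        return exitcode, output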

By the way, I'm not sure that creating 3 pipes for stdin, stdout and stderr *by default* is a safe choice. As you mentioned before, it's easy to block a program if the pipes are not consumed correctly. subprocess doesn't create any pipe by default; why not make the same choice?

I prefer an explicit run_shell(stdout=subprocess.PIPE) over the opposite, run_shell(stdin=None, stderr=None).

With subprocesses, it's easy to leave pipes open or end up with zombie processes. What is the right pattern to avoid such issues? Always call transport.close()?

SubprocessStreamProtocol.wait() should maybe wait until all pipes are closed. I see that BaseSubprocessTransport waits until all pipes are closed before calling connection_lost(), but I chose to use process_exited() to wake up the wait() waiter. The documentation of SubprocessProtocol.connection_lost() and SubprocessProtocol.process_exited() should be more explicit about the state of the process: has the process exited? Are the pipes closed?

If you only return (stdin, stdout, stderr), it becomes more tricky to ensure that all pipes are closed and the process status was read (to avoid zombies).

> pause/resume_writing() are defined in BaseProtocol so it should be fine (and
> harmless) if ReadSubprocessPipeProto implements them.

Oh, I missed that; I reverted my changeset. That's the advantage of small changesets ;-)

> You make a good point about linking stdin to stdout -- they should probably
> not be linked at all.

Ok, fixed.

> I think the changes to the following files are good to go, and you're free
> to commit these to the default branch.

I prefer to wait until the whole branch is ready before starting to merge; I would prefer to merge the whole branch at once. Do you prefer to cherry-pick commits and reorganize them?

> The pipe write flow control needs a unit test.

Ok, it can be done later :)

> I still have some reservations about:
>
> - subprocess_stream.py (I want different names for the helper functions,
> there are some attribute naming inconsistencies, an odd docstring
> formatting, and I wonder if it should just be merged into streams.py --
> otherwise should probably be named subprocess_stream*s*.py)

I renamed the file to add the trailing S.

> The example also needs some cleaning up.

I added a new examples/shell.py to show more (simple) examples.

I pushed my changes directly in the subprocess_stream branch.

Original comment by victor.s...@gmail.com on 28 Jan 2014 at 9:48


Thanks! I think the big point left is the convenience API. You make a bunch of 
good points but I am still hesitant:

- I don't like that the shell/program names don't match the shell/exec in the 
loop API, nor the subprocess.py API, nor the os.system/popen/spawn*/exec* APIs. 
 We should be able to model this after *something* else rather than inventing a 
new API every time.

- It seems pretty arbitrary that wait() and std{in,out,err} are protocol 
methods/attributes but close(), get_pid() and get_extra_info() are transport 
methods.  It may be better to have a single object that implements all of this 
functionality and delegates to the appropriate parts.

- Agreed on concerns about creating three pipes by default.

Here is how I propose to proceed:

- I will cherry-pick the refactorings and other small improvements to Tulip's 
default branch, and also merge them into the CPython repo.

- I will ask for API design help on python-dev.

- I don't want to wait until we've reached agreement on the API (in the worst case scenario, the discussion takes us beyond RC1, and then the backup plan should be that subprocess_streams.py becomes a helper module distributed via PyPI (you can include a copy in Trollius) until 3.5 comes along).

Original comment by gvanrossum@gmail.com on 28 Jan 2014 at 4:10


> - I don't like that the shell/program names don't match the shell/exec in
> the loop API, nor the subprocess.py API, nor the
> os.system/popen/spawn*/exec* APIs.  We should be able to model this after
> *something* else rather than inventing a new API every time.

What's the problem with the original names subprocess_exec() and 
subprocess_shell()?

> - It seems pretty arbitrary that wait() and std{in,out,err} are protocol
> methods/attributes but close(), get_pid() and get_extra_info() are transport
> methods.  It may be better to have a single object that implements all of
> this functionality and delegates to the appropriate parts.

Agreed. I added get_pid(), get_subprocess() and close() methods to the subprocess (stream) protocol. So the helper functions now return only the protocol; the transport is dropped (just as open_connection() drops it).

> - Agreed on concerns about creating three pipes by default.

Cool :-)

> Here is how I propose to proceed:
>
> - I will cherry-pick the refactorings and other small improvements to
> Tulip's default branch, and also merge them into the CPython repo.
>
> - I will ask for API design help on python-dev.

Ok.

> - I don't want to wait until we've reached agreement on the API (in the
> worst case scenario, the discussion takes us beyond RC1 and then the backup
> plan should be that subprocess_streams.py can be a helper module distributed
> via PyPI (you can include a copy in Trollius) until 3.5 comes along.

I don't like the backup plan, I would prefer to avoid it :-)

Victor

Original comment by victor.s...@gmail.com on 28 Jan 2014 at 4:30


> > - Agreed on concerns about creating three pipes by default.
> Cool :-)

I changed that in my branch.

I also simplified parameter passing between the subprocess functions. All these methods and classes are private, so I don't think it matters. The extra parameter was unused, so I removed it.

By the way, is it safe to use buffered pipes, i.e. bufsize != 0? In my opinion it's not safe, and an error should be raised if bufsize is set to a value different from zero.
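
i.e. something along these lines in the transport setup (a hypothetical guard, not actual asyncio code):

    if bufsize != 0:
        raise ValueError("bufsize must be 0 (unbuffered pipes): got %r"
                         % (bufsize,))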

@Guido: Since you plan to cherry-pick, you can decide if you want these cleanup 
changes or not ;-)

Original comment by victor.s...@gmail.com on 28 Jan 2014 at 5:17


IMO SubprocessStreamProtocol should also be almost a drop-in replacement for Popen. So I added send_signal(), kill() and terminate() methods (which are present on the transport) and an optional timeout parameter to wait().

Ok, the code is now ready for a review ;-)

(By the way, Popen has no get_pid() but a standard Popen.pid attribute.)

Original comment by victor.s...@gmail.com on 28 Jan 2014 at 5:32


Oh, I see that Guido merged most of the changes to support the subprocess stream protocol. Here is a patch implementing the last piece of the puzzle: create_subprocess() and create_shell_subprocess() functions. The functions return a SubprocessStreamProtocol instance whose API is close to the subprocess.Popen API.

The patch is a little different from the file in my subprocess_stream branch: the helper functions get different names, get_pid() is replaced with a pid attribute, and a docstring is reformatted.

The patch also adds two new examples:

- examples/shell.py
- examples/subprocess_shell.py

The examples were written to test the patch; they can be simplified or simply dropped if you prefer.

Original comment by victor.s...@gmail.com on 28 Jan 2014 at 10:25

Attachments:


Getting better all the time!  (Time to kill the subprocess_stream branch and 
instead just swap patches here?)

I noticed shell.py crashes, undoubtedly because of issue 118.  Why not set self._transport = None in connection_lost() and check for self._transport in send_signal() etc.?

There's still a lack of symmetry in the toplevel helper names -- 
create_shell_subprocess() and create_subprocess(). Why not call them 
create_subprocess_shell() and create_subprocess_exec()?

Specifying which pipes you want and then getting them feels too verbose 
(especially since it requires you to import the subprocess module just to use 
the PIPE constant).

Can you unify the examples?  Note that some of the commented-out examples in my 
original example were actually useful -- I just commented them out while I was 
focusing on the flow control example.  In particular there were various 
combinations of the subprocess writing to stdout and/or stderr and the parent 
pulling those out separately, or not at all, or having stdout and stderr 
combined in a single pipe (the subprocess.STDOUT special constant).

You might also want to patch the docstring in child_process.py explaining that 
it is a demo for loop.connect_{read,write}_pipe(), but that for a demo of the 
subprocess management capabilities there's a separate example.

Original comment by gvanrossum@gmail.com on 28 Jan 2014 at 11:06


> I noticed shell.py crashes, undoubtedly because of issue 118.

I just fixed issue 118.

> Why not set self._transport = None in connection_closed()
> and check for self._transport in send_signal() etc.?

I prefer not to duplicate the state of the child process: Popen already has its own state, and so does the transport. For example, I would like to be able to call transport.get_returncode() even after the process has exited. That's a common case when wait() is called.

Oh, I just noticed that the transport clears its _proc attribute, but not the "subprocess" extra info. But I think that protocol.get_subprocess() should raise an exception if the process exited. I don't see why you would need the Popen object after the process has exited.

> There's still a lack of symmetry in the toplevel helper names --
> create_shell_subprocess() and create_subprocess(). Why not call them
> create_subprocess_shell() and create_subprocess_exec()?

Ah yes, it sounds better.

> Specifying which pipes you want and then getting them feels too verbose
> (especially since it requires you to import the subprocess module just to
> use the PIPE constant).

You mean that you prefer to create a pipe for stdin, stdout and stderr by 
default?

I prefer not to create any pipe by default, to avoid the common issue of a process blocking because a pipe is full. I rarely need to feed a process's stdin, and usually I don't care about stderr.

I'm not shocked by having to import subprocess to get the PIPE constant. In fact, there are at least two other useful constants: STDOUT (stderr=STDOUT) and DEVNULL. In my opinion, it's more common to redirect all standard streams to /dev/null than to create a new pipe for every standard stream. Using asyncio doesn't mean I always want to retrieve the output; sometimes waiting for the process exit is enough.

> Can you unify the examples?

With my patch, there are 3 examples. But each example is different:
- child_process.py shows the low-level transport/protocol API
- shell.py shows the common pattern: feed stdin, read stdout
- subprocess_shell.py is the most advanced example because it handles stdin, 
stdout and stderr at the same time

subprocess_shell.py should maybe just use python as child_process.py does.

I prefer to improve the examples later and focus on the asyncio module itself right now :)

> Note that some of the commented-out examples in
> my original example were actually useful -- I just commented them out while
> I was focusing on the flow control example.  In particular there were
> various combinations of the subprocess writing to stdout and/or stderr and
> the parent pulling those out separately, or not at all, or having stdout and
> stderr combined in a single pipe (the subprocess.STDOUT special constant).

Oh, don't hesitate to restore your full example.

> You might also want to patch the docstring in child_process.py explaining
> that it is a demo for loop.connect_{read,write}_pipe(), but that for a demo
> of the subprocess management capabilities there's a separate example.

I tried to give a shorter description of what each example does/shows.

All these examples should be rewritten later to make them simpler.

I attach a new version of my patch. I renamed the helper methods, updated the docstrings of the examples, and made the protocol raise ProcessLookupError from get_subprocess(), send_signal(), terminate() and kill() if the process has exited.

Original comment by victor.s...@gmail.com on 28 Jan 2014 at 11:35

Attachments:


> (Time to kill the subprocess_stream branch and instead just swap patches 
here?)

I'm not using the branch anymore, but I prefer to close it once all the code has been "merged" (manually). I will just close the branch without merging.

Original comment by victor.s...@gmail.com on 28 Jan 2014 at 11:36


For the names: I would prefer to rename the BaseEventLoop methods to add a "create_" prefix, which would be more consistent with the other methods (create_connection, create_server), and drop the prefix from the helper functions.

Original comment by victor.s...@gmail.com on 28 Jan 2014 at 11:40


The following paragraph is still unclear to me. Does wait() only wait for the process exit, or also until all pipes are closed? Who is responsible for closing all the pipes?

"SubprocessStreamProtocol.wait() should maybe wait until all pipes are closed. 
I see that BaseSubprocessTransport waits until all pipes are closed before 
calling connection_lost(), but I chose to use process_exit() to wake up the 
wait() waiter. The documentation of SubprocessProtocol.connection_lost() and 
Subprocess.process_exit() should be more explicit about the state of the 
process: process exited? pipes closed?"

Original comment by victor.s...@gmail.com on 29 Jan 2014 at 12:01


While in common cases the pipes are closed when the process exits, there are 
various scenarios where this is not the case:

- When the subprocess creates a sub-subprocess that inherits the pipes and then 
the subprocess exits, the parent can reap the subprocess's exit status but the 
pipes remain open until the last sub-subprocess closes them.

- OTOH, the subprocess could close all pipes but keep running.

Of course, it is also possible that different pipes are closed at different 
times.  So there are four different things you might conceivably wait for: the 
process exit, and the three pipe closures. For the stdin pipe, you will get a 
broken pipe exception, while for the stdout and stderr pipes you will read EOF 
(i.e. an empty string).

This is why we have the process_exited() callback on the subprocess protocol; the connection_lost() callback waits for all the conditions together (so after it is called you know you are really done with all four items).
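
In stream terms, the two conditions are observed separately: EOF on the pipe, then the exit status. A short sketch with the API that eventually landed (3.4-era syntax):

    import asyncio
    from asyncio.subprocess import PIPE

    @asyncio.coroutine
    def run_and_reap(*args):
        proc = yield from asyncio.create_subprocess_exec(*args, stdout=PIPE)
        out = yield from proc.stdout.read()   # b'' once the stdout pipe closes
        exitcode = yield from proc.wait()     # reap the exit status separately
        return exitcode, out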

(I haven't looked at your new patch yet.)

Original comment by gvanrossum@gmail.com on 29 Jan 2014 at 12:11


Oh, the helper functions should allow specifying the limit of the StreamReader. open_connection() has an optional limit parameter for that; I think we should do the same here.

Original comment by victor.s...@gmail.com on 29 Jan 2014 at 12:14


I don't want to change the base loop method names for subprocesses; it would be much harder to defend an exemption from the feature freeze.

Original comment by gvanrossum@gmail.com on 29 Jan 2014 at 12:14


pipe_connection_lost() should not set stdin/stdout/stderr to None; otherwise it is not possible to use these streams after the pipes are closed, which is a mess.

Original comment by victor.s...@gmail.com on 29 Jan 2014 at 12:49


> Oh, the helper functions should allow specifying the limit of the StreamReader. open_connection() has an optional limit parameter for that; I think we should do the same here.

> pipe_connection_lost() should not set stdin/stdout/stderr to None; otherwise it is not possible to use these streams after the pipes are closed, which is a mess.

Fixed in patch 3.

Original comment by victor.s...@gmail.com on 29 Jan 2014 at 12:57

Attachments:


Odd. When running shell.py I get a traceback in ls():

python3 <examples/shell.py
pid: 20123
cat write: 'Hello World!'
cat read: 'Hello World!'
(exit code 0)
Traceback (most recent call last):
  File "<stdin>", line 54, in <module>
  File "/Users/guido/tulip/asyncio/base_events.py", line 178, in run_until_complete
    return future.result()
  File "/Users/guido/tulip/asyncio/futures.py", line 236, in result
    raise self._exception
  File "/Users/guido/tulip/asyncio/tasks.py", line 281, in _step
    result = next(coro)
  File "<stdin>", line 29, in ls
  File "/Users/guido/tulip/asyncio/subprocess_streams.py", line 133, in create_subprocess_exec
    stderr=stderr, **kwds)
  File "/Users/guido/tulip/asyncio/base_events.py", line 572, in subprocess_exec
    protocol, args, False, stdin, stdout, stderr, bufsize, **kwargs)
  File "/Users/guido/tulip/asyncio/unix_events.py", line 162, in _make_subprocess_transport
    extra=extra, **kwargs)
  File "/Users/guido/tulip/asyncio/base_subprocess.py", line 34, in __init__
    stderr=stderr, bufsize=bufsize, **kwargs)
  File "/Users/guido/tulip/asyncio/unix_events.py", line 409, in _start
    universal_newlines=False, bufsize=bufsize, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'limit'

Original comment by gvanrossum@gmail.com on 29 Jan 2014 at 1:09


I think you have most faithfully recreated the API of the Popen object. That 
may actually be a good plan. We could even add poll(), communicate() and 
returncode.

Perhaps we should go further and add all the convenience methods and constants 
from the subprocess module too?  E.g.

call()
check_call()
check_output()
DEVNULL
STDOUT
PIPE

If we go with this idea we should probably also switch the "main" call to a 
single function with a shell=True/False (and defaulting to False) -- although I 
can't get myself to name it Popen, unless we actually make it a class.

Which reminds me -- I do like the idea that the returned object is not itself a Protocol object but just holds a reference to one -- so clueless users can't try to call protocol methods like connection_lost().

And do we need close()?  (Maybe -- what's the use case?  Popen doesn't have it.)

If we go this route (emulate the subprocess module) I have one additional 
concern -- that's a lot of API surface to export as part of the toplevel 
asyncio module.  Maybe these should be in a submodule named "subprocess" 
(always imported) so that you can write e.g.

proc = asyncio.subprocess.call("ls -l", shell=True)

?
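
For comparison, a sketch of such a call() on top of the submodule layout that eventually shipped (asyncio.subprocess exports PIPE, STDOUT and DEVNULL); call() itself is hypothetical and never landed:

    import asyncio
    from asyncio import subprocess

    @asyncio.coroutine
    def call(cmd):
        # Rough analogue of subprocess.call(): run and return the exit code.
        proc = yield from asyncio.create_subprocess_shell(
            cmd, stdout=subprocess.DEVNULL)
        return (yield from proc.wait())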

Original comment by gvanrossum@gmail.com on 29 Jan 2014 at 4:48


I very much like the idea of emulating Popen (in fact, my proposal a few days ago was to just use Popen and then connect pipes as needed).  It makes porting synchronous code that uses subprocess to asyncio much easier.  This does not mean that you have to support 100% of the Popen methods in the emulated version; it is perfectly reasonable to emulate only a subset.

Original comment by gjcarne...@gmail.com on 29 Jan 2014 at 10:48


Guido started a thread on the python-tulip Google Group; please continue the discussion there so it doesn't get split.

Original comment by victor.s...@gmail.com on 29 Jan 2014 at 11:04


I updated my subprocess_stream branch; it is now based on subprocess_streams-3.patch with the following changes:

- fix the "limit" bug
- add a new communicate() method
- add a new returncode attribute: communicate() waits for the process exit, so you can just read returncode after calling communicate():

    stdout, stderr = yield from proc.communicate(b'data')
    returncode = proc.returncode

Having to call wait() as a separate coroutine just to get the exit code seems like overkill for an example like this.
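
A complete, runnable version of that snippet (3.4-era syntax, assuming a Unix "cat" is available):

    import asyncio
    from asyncio.subprocess import PIPE

    @asyncio.coroutine
    def demo():
        proc = yield from asyncio.create_subprocess_exec(
            'cat', stdin=PIPE, stdout=PIPE)
        stdout, stderr = yield from proc.communicate(b'data')
        # communicate() waits for the process to exit, so returncode is set.
        print(proc.returncode, stdout)

    asyncio.get_event_loop().run_until_complete(demo())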

Original comment by victor.s...@gmail.com on 29 Jan 2014 at 11:49


This looks pretty good! I still have nagging concerns about the namespace, but it seems your proposal is that create_subprocess_shell/exec are available at the asyncio top level while the rest is only available in the asyncio.subprocess_streams submodule. I might want to rename that to just "subprocess", but otherwise it seems like a good plan.

I have some detailed comments; I'll try to give a line-by-line review in the repo.

Original comment by gvanrossum@gmail.com on 29 Jan 2014 at 4:10


I've now added my review:
http://code.google.com/p/tulip/source/detail?r=d82f91239ad521ee9fdab51bf70343d02c2bcdf8

Original comment by gvanrossum@gmail.com on 29 Jan 2014 at 5:48


FYI, I've pushed the foundational changes for all this (and issue 119) to the 
CPython repo (which is now once again up to date with the Tulip repo default 
branch). (Victor, I've marked your changes as coming from you using -u to give 
credit where due.) Hopefully the buildbots won't give us trouble. :-)

Original comment by gvanrossum@gmail.com on 29 Jan 2014 at 11:06


"FYI, I've pushed the foundational changes for all this (and  issue 119 ) to 
the CPython repo (which is now once again up to date with the Tulip repo 
default branch). (Victor, I've marked your changes as coming from you using -u 
to give credit where due.) Hopefully the buildbots won't give us trouble. :-)"

Nice, thanks. I also updated the documentation. Hmm, in fact I documented wait_for(); it was not documented before!

Original comment by victor.s...@gmail.com on 29 Jan 2014 at 11:22


Ok, here is a review for my work-in-progress patch (which is the 
subprocess_stream branch):
https://codereview.appspot.com/52320045

Oops, I forgot to merge with default to pick up the fix for Future.set_exception. I will do it in the next patch, but I don't think it's required for a first review.

Original comment by victor.s...@gmail.com on 30 Jan 2014 at 6:16


A lot has happened in the repos, and generally I am happy with the API (and with the green buildbots :-). I have to think about the issue you brought up on the list.

There's still one issue that bugs me: there are now three places where a 
constant STDOUT is defined -- not all with the same value, and two of these 
appear in __all__ variables.  There are also multiple STDIN and STDERR 
definitions.  This is all very confusing.  My proposal is to get rid of the 
ones that just define the standard (0, 1, 2) values, or give those more obscure 
names (e.g. FD_STDIN) and remove them from the __all__ exports (subprocess.py 
doesn't define these, so I think we don't have to either -- the actual constant 
values are universal).

Original comment by gvanrossum@gmail.com on 1 Feb 2014 at 6:02


Another subtle thing related to exported symbols: *most* submodules have an __all__ that indicates the things that the asyncio package should export.  But a few submodules have an __all__ even though the package doesn't reference it.  I think this is confusing and I hope we can get rid of those.  Let's make a new rule: if it's in __all__, it is for consumption by customers of the package; if it's just for use by sibling submodules, it needn't be mentioned in __all__ -- the siblings are all considered "friends" (in the C++ sense).  If it's not for siblings, it should start with an underscore.  (But don't take this too literally -- this rule does not apply to imported items.  Those are never for export; if you want to export something you imported, use an explicit assignment, as you already do for subprocess.PIPE etc.)

Original comment by gvanrossum@gmail.com on 1 Feb 2014 at 6:17


Sorry, don't commit the proactor changes yet; I have a few review issues in Rietveld that I'm about to send.

Original comment by gvanrossum@gmail.com on 1 Feb 2014 at 6:27


Ok, the issue was fixed by changeset 94debf321122. I merged my subprocess_stream branch (manually) into default. Thanks for your feedback and reviews, Guido. I also merged the changes into Python.

I ran runtests.py (Tulip) and python -m test test_asyncio (Python) with success 
on Linux and Windows.

Original comment by victor.s...@gmail.com on 1 Feb 2014 at 9:56

  • Changed state: Fixed


Awesome job, Victor. Thank you! And congrats. This was quite a complicated piece of precision surgery, but well worth the effort.

Original comment by gvanrossum@gmail.com on 1 Feb 2014 at 10:04
