Fine-tuning channels #719

Open
njsmith opened this Issue Oct 5, 2018 · 9 comments

njsmith commented Oct 5, 2018

Channels (#586, #497) are a pretty complicated design problem, and fairly central to user experience, so while I think our first cut is pretty decent, we'll likely want to fine-tune it as we get experience using them.

Here are my initial notes (basically questions that I decided to defer while implementing version 1):

In open_memory_channel, should we make max_buffer_size=0 the default? Is this really a good choice across a broad range of settings? The docs recommend it right now, but that's not really based on lots of practical experience or anything. And it's way easier to add a default later than it is to remove or change one.

For channels that allow buffering, it's theoretically possible for values to get "lost" without there ever being an exception (send puts it in the buffer, then receiver never takes it out). This even happens if both sides are being careful, and closing their endpoint objects when they're done with them. We could potentially make it so that ReceiveChannel.aclose raises an exception (I guess BrokenResourceError) if the channel wasn't already cleanly closed by the sender. This would be inconsistent with the precedent set by ReceiveStream (which inherited it from BSD sockets), but OTOH, ReceiveStream is a much lower-level API – to use a ReceiveStream you need some non-trivial engineering work to implement a protocol on top, while ReceiveChannel is supposed to be a simple high-level protocol that's usable out-of-the-box. Practically speaking, though, it would be pretty annoying if some code inside a consumer crashes, and then ReceiveChannel.__aexit__ throws away the actual exception and replaces it with BrokenResourceError.

Should memory channels have a sync close() method? They easily could; the only reason not to is that omitting it keeps life simpler for hypothetical IPC channels. I have no idea yet how common those will be, or whether it's actually important for in-process channels and cross-process channels to have compatible APIs.

I have mixed feelings about open_memory_channel returning a bare tuple (send_channel, receive_channel). It's simple, and allows for unpacking. But it requires everyone to memorize which order they go in, which is annoying, and it forces you to juggle two objects, as compared to the traditional Queue design that only has one object. I'm convinced that having two objects is important for closure-tracking and IPC, but there is an option that's a kind of hybrid: it could return a ChannelPair object that's a simple container for a .send_channel and a .receive_channel, and, if we want, the ChannelPair could have send and receive methods that simply delegate to its constituent objects. Then people who hate having two objects could treat ChannelPair like a Queue, while still breaking it into pieces if desired, or doing closure tracking like async with channel_pair.send_channel: .... But... I played with this a bit, and found it annoying to type out channel_pair.send_channel all the time. In particular, if you do want to split the objects up, then you lose the option of destructuring assignment, so you have to do cumbersome attribute access instead. (Or we could also support destructuring assignment by implementing __iter__, but then that's yet another confusing option...) For now my intuition is that closure tracking is so broadly useful that everyone will want to use it, and that's easiest with destructuring assignment. But experience might easily prove that intuition wrong.

Is clone confusing? Is there anything we can do to make it less confusing? Maybe a better name? So far people have flagged this multiple times, but we didn't have docs yet, so it could just be that...

And btw, the Channel vs Stream distinction is much less obvious than I'd like. No-one would look at those and immediately guess that ah, a Channel carries objects while a Stream carries bytes. It is learnable, and has some weak justification, and I haven't thought of anything better yet, but if someone thinks of something brilliant please speak up.

Should the *_nowait methods be in the ABC, or just in the memory implementation? They don't make a lot of sense for things like inter-process channels, or like, say... websockets. (A websocket can implement the SendChannel and ReceiveChannel interfaces, just with the proviso that objects being sent have to be type str or bytes.) The risk of harm is not large, you can always implement these as unconditional raise WouldBlock, but maybe it will turn out to be more ergonomic to move them out of the ABC.

We should probably have a trio.testing.check_channel, like we do for streams.

njsmith added a commit to njsmith/trio that referenced this issue Oct 5, 2018

njsmith commented Oct 5, 2018

Would it be helpful to have a way to explicitly mark a channel as broken, e.g. send_channel.poison()? (This name is taken from the CSP tradition – here's the earliest reference I can find in a quick look. Their version of "poison" also encompasses our send_channel.aclose, though.) Anyway, the point is, if a producer crashes, maybe you want to tell the other side that this wasn't a clean shutdown, so it can do... something.

While I'm here, also check out this mailing list post from 2002 with subject "Concurrency, Exceptions, and Poison", by the same author. It's basically about how exceptions and nurseries should interact.

smurfix commented Oct 5, 2018

Yes, poison as an alternative to closing that raises an exception on the other side would be useful.
(Maybe just add an appropriate argument to .close?)

ziirish commented Nov 3, 2018

Hello,

Would it be possible to document how to replace Queue with memory channels?
Maybe my usage of Queues is wrong, but I don't yet understand how to use memory channels instead.

I'm currently using a Queue as a pool of objects. These objects are basically wrappers around subprocess.Popen, so the idea is to have a pre-determined number of processes to parallelize operations, up to the queue size.
I know there is CapacityLimiter, which achieves something similar, but it only works for actual Python code. Here I want to limit external process operations.

The pseudo code looks like:

import trio
from contextlib import asynccontextmanager
from itertools import count

CONNECTION_COUNTER = count(1)  # per-connection id generator

class MonitorPool:
    pool_size = 5
    pool = trio.Queue(pool_size)

    async def handle(self, server_stream: trio.StapledStream):
        # do something
        ident = next(CONNECTION_COUNTER)
        data = await server_stream.receive_some(8)
        async with self.get_mon(ident) as mon:
            response = mon.status(data)
        # do other things

    async def launch_monitor(self, id):
        # Monitor wraps a subprocess.Popen handle
        mon = Monitor(ident=id)
        await self.pool.put(mon)

    async def cleanup_monitor(self):
        # drain the pool so the Monitor objects can be collected
        while not self.pool.empty():
            mon = await self.pool.get()  # noqa
            del mon

    @asynccontextmanager
    async def get_mon(self, ident) -> Monitor:
        # borrow a monitor from the pool, return it when done
        mon = await self.pool.get()  # type: Monitor
        yield mon
        await self.pool.put(mon)

    async def run(self):
        async with trio.open_nursery() as nursery:
            for i in range(self.pool_size):
                nursery.start_soon(self.launch_monitor, i + 1)
        try:
            await trio.serve_tcp(self.handle, self.port, host=self.bind)
        except KeyboardInterrupt:
            pass

        async with trio.open_nursery() as nursery:
            nursery.start_soon(self.cleanup_monitor)
njsmith commented Nov 5, 2018

@ziirish Oo, that's clever! If I were implementing a pool I probably would have reached for a Semaphore plus a set, but your way is probably simpler... I wonder if we should have a first-class pool object, since this pattern shows up in different places? I don't think I understand the variations well enough to design such a thing right now though, so that's just an idea for later.

Generally, to replace a Queue with a memory channel, you replace queue.put with send_channel.send, and queue.get with receive_channel.receive. It does make this use case a little more cumbersome, because now you have two objects to keep track of instead of just one, but once you do that then most things transfer over directly. So something like:

class MonitorPool:
    def __init__(self, pool_size=5):
        self.pool_size = pool_size
        self.send_channel, self.receive_channel = trio.open_memory_channel(pool_size)

    async def launch_monitor(self, id):
        mon = Monitor(ident=id)
        await self.send_channel.send(mon)

    @asynccontextmanager
    async def get_mon(self, ident) -> Monitor:
        mon = await self.receive_channel.receive()  # type: Monitor
        yield mon
        await self.send_channel.send(mon)

The one thing that doesn't transfer over directly is cleanup_monitor, since channels don't have a .empty() method. But, I'm not quite sure how to translate this code, because I'm not sure what you're trying to do :-). Remember that Python is a garbage collected language; the way you free objects is by simply stopping referring to them. So when your MonitorPool object goes out of scope, then any Monitor objects inside it will be automatically freed. Your cleanup_monitor method does speed this up a bit by emptying out the queue, but the del mon line is effectively a no-op: all it does is undefine the local variable mon, but that will happen anyway as soon as the function exits. So if you do want to empty out the pool immediately, you don't really need to iterate over the individual objects; you just need to clear the internal buffer. And for memory channels, the easy way to do this is just to close the receive channel; this tells Trio that you're never going to call receive again, so it immediately discards all the data in the internal buffer.

Also, this code:

        async with trio.open_nursery() as nursery:
            nursery.start_soon(self.cleanup_monitor)

is basically a more complicated way of writing:

    await self.cleanup_monitor()

And it's generally nice to run cleanup code even if there's an exception... with blocks are good for this. In particular, you can use async with receive_channel to automatically call receive_channel.aclose() when exiting a block. So I'd probably drop the cleanup_monitor method and rewrite run as:

    async def run(self):
        async with self.receive_channel:
            async with trio.open_nursery() as nursery:
                for i in range(self.pool_size):
                    nursery.start_soon(self.launch_monitor, i + 1)
            try:
                await trio.serve_tcp(self.handle, self.port, host=self.bind)
            except KeyboardInterrupt:
                pass

BTW, the type annotation for server_stream should probably be trio.abc.Stream?

njsmith commented Nov 5, 2018

Speaking of type annotations, I wonder if we should make the channel ABCs into parametrized types, like:

from typing import Any, Generic, Tuple, TypeVar

T_cov = TypeVar("T_cov", covariant=True)
T_contra = TypeVar("T_contra", contravariant=True)

class ReceiveChannel(Generic[T_cov]):
    async def receive(self) -> T_cov:
        ...

class SendChannel(Generic[T_contra]):
    async def send(self, obj: T_contra) -> None:
        ...

def open_memory_channel(max_buffer_size) -> Tuple[SendChannel[Any], ReceiveChannel[Any]]:
    ...

(For the variance stuff, see: https://mypy.readthedocs.io/en/latest/generics.html#variance-of-generic-types. I always get confused by this, so I might have it wrong...)

It might even be nice to be able to request a type-restricted memory channel. E.g. @ziirish might want to do something like:

s, r = open_memory_channel[Monitor](pool_size)
# or maybe:
s, r = open_memory_channel(pool_size, restrict_type=Monitor)

Ideally this would both be enforced at runtime (the channels would check isinstance(obj, Monitor) and raise an error if it didn't match), and statically (mypy infers that the return type is Tuple[SendChannel[Monitor], ReceiveChannel[Monitor]]). ...I think?

Complications:

Runtime and static types are kind of different things. They overlap a lot, but I don't actually know if there's any way to take an arbitrary static type and check it at runtime? And PEP 563 probably affects this somehow too... It's possible we might want both a way to pass in a runtime type (and have the static type system automatically pick this up when possible), and also a purely static way to tell the static type system what type we want it to enforce for these streams without any runtime overhead?

Even if we ignore the runtime-checking part, I don't actually know whether there's any standard syntax for a function like this in python's static type system. Maybe it would need a mypy plugin regardless? (Of course, we're already talking about maintaining a mypy plugin for Trio, see python/mypy#5650, so this may not be a big deal.)

Of course for a pure static check you can write something ugly like:

s, r = cast(Tuple[SendChannel[Monitor], ReceiveChannel[Monitor]], open_memory_channel(pool_size))

but that's awful.

njsmith commented Nov 5, 2018

Actually, it is possible to do something like restrict_type=<Python class object> and have mypy understand it. It's a bit awkward:

from typing import Any, Tuple, Type, TypeVar, overload

T = TypeVar("T")

@overload
def open_memory_channel() -> Tuple[SendChannel[Any], ReceiveChannel[Any]]:
    ...

@overload
def open_memory_channel(*, restrict_type: Type[T]) -> Tuple[SendChannel[T], ReceiveChannel[T]]:
    ...

def open_memory_channel(*, restrict_type=object):
    ...

reveal_type(open_memory_channel())  # Tuple[SendChannel[Any], ReceiveChannel[Any]]
reveal_type(open_memory_channel(restrict_type=int))  # Tuple[SendChannel[int], ReceiveChannel[int]]

You would think you could write:

def open_memory_channel(*, restrict_type: Type[T]=object):
    ...

but if you try then mypy gets confused because it checks the default value against the type annotation before doing generic inferencing (see python/mypy#4236).

One annoying issue with this is that it doesn't support restrict_type=(A, B), which is the standard way to do runtime union type checks.

Also, you can't express types like List[str] or Tuple[int, int, str], or do zero-runtime-overhead type checks. It actually is possible to make both of these work:

open_memory_channel(...)
open_memory_channel[int](...)

BUT AFAICT you can't make them work at the same time.

The way you make open_memory_channel[int](...) work is a gross hack. What it does is exploit the fact that there is a special fn[type](...) syntax specifically for calling the constructors of generic classes. So we can do:

class open_memory_channel(Tuple[SendChannel[T], ReceiveChannel[T]]):
    def __new__(cls, max_buffer_size):
        return (SendChannel[T](), ReceiveChannel[T]())

    # Never called, but must be provided, with the same signature as __new__
    def __init__(self, max_buffer_size):
        assert False

# This is basically a Tuple[SendChannel[int], ReceiveChannel[int]]
reveal_type(open_memory_channel[int](0))
# This is Tuple[SendChannel[<nothing>], ReceiveChannel[<nothing>]], and you have to start throwing
# around manual type annotations to get anything sensible
reveal_type(open_memory_channel(0))

Did I mention it was gross? It's pretty gross.

Now... in this version, a bare open_memory_channel(0) does work fine at runtime, it's just that if you're using mypy, that bare open_memory_channel(0) isn't properly inferred to be a channel that can send any type of object, and you're forced to write open_memory_channel[Any](0) if that's what you want. I guess that might not be too terrible, for folks who are taking the trouble to use mypy?

ziirish commented Nov 5, 2018

Thanks @njsmith for the explanations.
I'll rework my Pool then, and I could send you a PR implementing such a primitive based on memory channels.

But the question then will be about this:

For channels that allow buffering, it's theoretically possible for values to get "lost" without there ever being an exception (send puts it in the buffer, then receiver never takes it out).

The purpose of the pool is to be always filled with objects and then to block/pause the execution of the job when it is empty. So at the end of your job, the pool should be "full" resulting in "lost" data.
I guess the Pool primitive should just ignore BrokenResourceError then.

belm0 commented Nov 7, 2018

I see that type annotations are used in a few spots already. @njsmith would you accept a PR to add type annotation to open_memory_channel() return value?

./trio/_subprocess/linux_waitpid.py:async def _task(state: WaitpidState) -> None:
./trio/_subprocess/linux_waitpid.py:async def waitpid(pid: int) -> Any:
./trio/_subprocess/unix_pipes.py:    def __init__(self, pipefd: int) -> None:
./trio/_subprocess/unix_pipes.py:    def fileno(self) -> int:
./trio/_subprocess/unix_pipes.py:    async def wait_send_all_might_not_block(self) -> None:
./trio/_subprocess/unix_pipes.py:    async def receive_some(self, max_bytes: int) -> bytes:
./trio/_subprocess/unix_pipes.py:async def make_pipe() -> Tuple[PipeSendStream, PipeReceiveStream]:
./trio/_util.py:def fspath(path) -> t.Union[str, bytes]:
njsmith commented Nov 7, 2018

@belm0 What annotation are you planning to use? :-)
