Channels, a replacement for Queues #586

Merged: 23 commits into python-trio:master from njsmith:channels, Oct 5, 2018

Conversation

@njsmith (Member) commented Jul 30, 2018

(not ready to merge, but posting it now for discussion)

This is a possible answer to #497. Essentially, it rethinks how Queue
might work, based on Trio idioms instead of whatever someone stuck
into the stdlib 15 years ago. The main difference is that you call
open_channel and get back a put_channel and a get_channel, which
act like the two ends of a stream, and have their own closure states.
Also, you can clone() an endpoint to extend the closure tracking to
fan-in/fan-out scenarios. If you don't care about any of this, you can
use it exactly like our current Queue, except that you pass around
the put and get ends separately. But it also allows for fancier things
like this fan-in example:

import trio

async def producer(put_channel):
    # We close our handle when we're done with it
    with put_channel:
        for i in range(3):
            await put_channel.put(i)

async def main():
    put_channel, get_channel = trio.open_channel(0)
    async with trio.open_nursery() as nursery:
        # We hand out clones to all the new producers, and then close the
        # original.
        with put_channel:
            for _ in range(10):
                nursery.start_soon(producer, put_channel.clone())
        # Prints the numbers [0, 1, 2], ten times each, in some order, and
        # then exits.
        async for value in get_channel:
            print(value)

trio.run(main)

Decisions in this first draft (not necessarily good ones; a sketch of the closure semantics follows this list):

  • Semantics generally modelled after SendStream/ReceiveStream
    • When put handle is closed:
      • pending and future put() calls on the same handle immediately
        raise ClosedResourceError
      • if this was the last put handle:
        • further calls to get continue to drain any queued data
        • and then eventually start raising EndOfChannel
        • __anext__ raises StopAsyncIteration instead of EndOfChannel
    • When get handle is closed:
      • pending and future get() calls on the same handle immediately
        raise ClosedResourceError
      • if this was the last get handle:
        • pending and future puts immediately raise BrokenChannelError
  • Our handles do not automatically call close() on __del__
    • don't want people to accidentally depend on GC
    • issuing ResourceWarnings would be annoying for users who don't
      care about the close functionality
  • closing all get handles while there is still data queued up silently
    discards the data -- but maybe it should cause the last
    get_channel.close() to raise BrokenChannelError instead?
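
For concreteness, a sketch of these closure semantics, using send/receive-style names rather than the put/get spelling in this draft (so treat the exact names as illustrative):

import trio

async def main():
    send_channel, receive_channel = trio.open_memory_channel(2)
    await send_channel.send("a")
    await send_channel.send("b")
    await send_channel.aclose()              # the last (and only) send handle is now closed

    # Queued data still drains...
    print(await receive_channel.receive())   # "a"
    print(await receive_channel.receive())   # "b"

    # ...and then the receive side reports end-of-channel.
    try:
        await receive_channel.receive()
    except trio.EndOfChannel:
        print("no more data")

    # Meanwhile, using the closed send handle itself fails immediately.
    try:
        await send_channel.send("c")
    except trio.ClosedResourceError:
        print("send handle is closed")

trio.run(main)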

This "always_abort=" thing is kind of half-baked and needs more
review... there are more things that mess with _abort_func than I
realized, and I kind of hacked at them until they worked. For example,
it's kind of weird and non-obvious that if you use always_abort=True,
then

@codecov (bot) commented Jul 30, 2018

Codecov Report

Merging #586 into master will increase coverage by 15.34%.
The diff coverage is 100%.

@@             Coverage Diff             @@
##           master     #586       +/-   ##
===========================================
+ Coverage   83.97%   99.31%   +15.34%     
===========================================
  Files          94       96        +2     
  Lines       13001    11464     -1537     
  Branches      783      820       +37     
===========================================
+ Hits        10918    11386      +468     
+ Misses       2061       58     -2003     
+ Partials       22       20        -2
Impacted Files Coverage Δ
trio/_core/__init__.py 100% <ø> (ø) ⬆️
trio/tests/test_highlevel_serve_listeners.py 100% <100%> (+8.57%) ⬆️
trio/_sync.py 100% <100%> (+3.87%) ⬆️
trio/_core/tests/test_windows.py 100% <100%> (+6.25%) ⬆️
trio/_core/_unbounded_queue.py 100% <100%> (+2%) ⬆️
trio/tests/test_channel.py 100% <100%> (ø)
trio/__init__.py 100% <100%> (+54.92%) ⬆️
trio/tests/test_sync.py 100% <100%> (+6.07%) ⬆️
trio/_abc.py 100% <100%> (+10.9%) ⬆️
trio/_core/_exceptions.py 100% <100%> (+9.52%) ⬆️
... and 73 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@njsmith (Member, Author) commented Jul 30, 2018

Two things I'm having doubts about:

  • A chunk of the complexity here – and in particular, the whole motivation for the always_abort thing – is that we keep track of which pending calls came in through which cloned handle. An alternative would be to not do this, so that if you're blocked in put, and all put handles are closed, then you get woken up, but if just your put handle is closed, you don't notice. (If you don't use clone, the two options are identical.)

  • The fan-in example to me is a pretty compelling argument for supporting PutChannel.clone or something like it. The cases for GetChannel.clone and GetChannel.close are much weaker. I'm not sure supporting them is actually useful.

@njsmith (Member, Author) commented Aug 12, 2018

I'm leaning against supporting GetChannel.clone for now... in addition to being poorly motivated, there's a possible source of confusion: if I have two clones of the same GetChannel, and put one item into the channel, then does it get sent to both clones (fork) or just one of them (round-robin)?

I'm also tempted to make GetChannel.close raise an error if it's closed while there are still pending items, and GetChannel.__del__ issue a warning in the same case. Maybe if we do this then we'd let people do get_channel.close(ignore_pending=True) when they want to intentionally throw away pending data and don't want to bother putting a try/except around things.

I'm also wondering if we should try making this API start out relatively minimal. I don't really want to add qsize and full and all that stuff that Queue has, it's mostly useless and encourages bad habits. I'm tempted to even remove put_nowait and get_nowait until someone has a use case.

Still not sure what to do about the always_abort thing. I don't like the way it works here. Some options:

  • Move the flag to reschedule instead of wait_task_rescheduled?
  • Give up on tracking the relationship between calls to put and PutHandle objects
  • Hold our noses and put up with the hacky version I started with, where the PutHandle._tasks set could be slightly out-of-sync with reality and PutHandle.close had to catch this and work around it.

And finally I am wondering about how to roll this out. The idea would be to eventually replace trio.Queue, which is heavily used. There are a lot of subtle design questions here (as you can see above), so it makes me nervous that this PR hasn't attracted any feedback yet :-). Maybe the way to do it is to have a release where we advertise channels as provisional, without yet deprecating trio.Queue, and then hassle folks to try it out and give feedback? (And then if it passes, the next release could deprecate trio.Queue in favor of channels.)

@smurfix (Contributor) commented Aug 12, 2018

put_nowait is very useful when called from an asyncio callback.
get_nowait also has its use; during connection shutdown one might want to drain the queue and log the pending messages without introducing cancel points.

@njsmith (Member, Author) commented Aug 12, 2018

> put_nowait is very useful when called from an asyncio callback.

Do you mean the queue that trio-asyncio uses to send requests back to the main asyncio loop dispatcher? I'm not 100% sure whether that's in scope or not, since it really should be using an unbounded queue... but then I've also been wondering whether we should go ahead and allow for unbounded channels, e.g. by explicitly passing capacity=inf. (The example that's got me thinking this: consider a web crawler that keeps a queue of URLs to crawl. Each worker takes a URL off the queue, fetches it, parses the page, and pushes the resulting URLs back onto the queue to get to later. We put a limit on how many workers run at once. That's enough to produce a deadlock with any finite queue capacity: if the queue fills up, no worker can push new URLs onto it, so the workers stop pulling items off the queue...) So fair enough.
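
A toy sketch of that crawler shape, using the memory-channel API as it eventually landed (fetch_links, the worker count, and the depth cutoff are all made up for illustration); the unbounded buffer is what prevents the "every worker blocked on put, so nobody gets" deadlock:

import math
import trio

async def fetch_links(url):
    await trio.sleep(0.01)                   # stand-in for fetching and parsing the page
    return [f"{url}/{i}" for i in range(2)] if url.count("/") < 3 else []

async def worker(send_channel, receive_channel):
    async for url in receive_channel:
        print("crawled", url)
        for link in await fetch_links(url):
            # With a finite buffer, this send could block while every other
            # worker is blocked here too, and the crawl would deadlock.
            await send_channel.send(link)

async def main():
    send_channel, receive_channel = trio.open_memory_channel(math.inf)
    await send_channel.send("example.com")
    async with trio.open_nursery() as nursery:
        for _ in range(3):
            nursery.start_soon(worker, send_channel, receive_channel)
        await trio.sleep(1)
        nursery.cancel_scope.cancel()        # toy shutdown; a real crawler needs a "done" signal

trio.run(main)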

> get_nowait also has its use; during connection shutdown one might want to drain the queue and log the pending messages without introducing cancel points.

Yeah, mayyybe... this kind of consideration is why I'm more likely to keep get_nowait than full or task_done or whatever. But I'm not convinced yet that what you said is something that really happens – like if you're doing a graceful shutdown, you want to signal that to the producer and then let it finish draining the queue until it gets EndOfQueue. "All the items that are currently in the queue" is not really a reliable concept because of race conditions. And if you've decided to cancel everything and abandon the queue without waiting for EndOfQueue, then there's not much point in worrying about this.

I guess get_nowait is potentially useful for trio-asyncio too, if you want to make sure you do one tick of the asyncio loop per tick of the trio loop...

@smurfix (Contributor) commented Aug 12, 2018

> Do you mean the queue that trio-asyncio uses to send requests back to the main asyncio loop dispatcher?

No. Just your garden variety asyncio receive-data callback when you're hooking into a transport+protocol.

That send-back queue is a prime candidate for capacity=inf though.

@njsmith (Member, Author) commented Aug 18, 2018

Some discussion of what the always_abort thing is trying to do here: https://gitter.im/python-trio/general?at=5b77863e802bc42c5f356669

Interesting realization: the fundamental problem is that when there are multiple ways that a task can wake back up, it can be difficult for one path to know how to clean up the state used by the other paths. In this case, put_handle.put can be woken by either get_handle.get or put_handle.close. Other examples where I've brushed against this are similar: Condition.wait / ParkingLot.repark involves a task that goes to sleep on one queue and then later gets moved to another, so it's challenging for its abort_fn to keep track of which queue it should be aborted from. #242 discusses the case where we sleep on an arbitrary collection of different events, and then of course we need something clever indeed to keep track of all the different wakeup paths.

I don't think always_abort is really working. In that discussion, though, I realized that a much simpler hack, which would work both for this case and for ParkingLot.repark, is to add a bit of state attached to the task object that the sleeper and waker are free to use to pass data back and forth. Maybe task.sleep_state, which as far as the trio core is concerned is meaningless except that it gets reset to None every time a task is rescheduled. Of course this is also potentially an argument for a more structured solution like #242, but we don't need to solve all that right now...
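
A minimal sketch of that scratch-slot idea (the attribute trio eventually grew for this is Task.custom_sleep_data; the toy wait_queue and helper names below are invented): the sleeper leaves a note saying where it parked itself, so whichever path wakes it can clean up that bookkeeping before rescheduling, without the abort_fn having to know about every wakeup path.

import trio
from trio import lowlevel                    # spelled trio.hazmat when this PR was written
from trio.testing import wait_all_tasks_blocked

async def main():
    wait_queue = set()                       # stand-in for, e.g., a channel's set of blocked tasks

    async def sleeper():
        task = lowlevel.current_task()
        wait_queue.add(task)
        task.custom_sleep_data = wait_queue  # "here's where I'm parked"

        def abort_fn(_raise_cancel):
            wait_queue.discard(task)         # the cancellation path cleans up too
            return lowlevel.Abort.SUCCEEDED

        await lowlevel.wait_task_rescheduled(abort_fn)
        print("woken up")

    async with trio.open_nursery() as nursery:
        nursery.start_soon(sleeper)
        await wait_all_tasks_blocked()
        # Waker path: read the sleeper's note, remove it from whatever queue it
        # parked itself in, then reschedule it (the core resets the slot to None).
        task = next(iter(wait_queue))
        task.custom_sleep_data.discard(task)
        lowlevel.reschedule(task)

trio.run(main)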

njsmith added a commit to njsmith/trio that referenced this pull request Aug 21, 2018
@njsmith (Member, Author) commented Aug 22, 2018

Okay, gave up on always_abort, rebased, and redid the wakeup stuff based on #616. This is much simpler.

Also removed GetChannel.clone, though the infrastructure to support it is still there. Should refactor to simplify this.

Added some initial tests, including a version of the fan-in example at the top of this thread. Weirdly enough, they seem to be passing.

Still need a bunch more tests, docs, statistics and repr.

@mehaase commented Sep 18, 2018

> There are a lot of subtle design questions here (as you can see above), so it makes me nervous that this PR hasn't attracted any feedback yet :-).

I just experimented with the channels implementation (from commit abca8bd) as a closeable message queue for python-trio/trio-websocket#11. My use case is pretty simple, and this API is perfect for it. (I'm afraid I don't really understand the always_abort, cloning, or other design discussions.)

@njsmith (Member, Author) commented Sep 28, 2018

I played around a bit with what it would take to implement my prototype channel interfaces across a process boundary, on top of a Stream. It's actually very simple (e.g. send(obj) pickles the object, adds some framing, and sends it down the pipe, receive() reverses that; it's even quite easy to support clone()).
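
A rough sketch of that shape (not the prototype itself; the class names and the 4-byte length framing are invented for illustration): send() pickles the object and writes it to a trio Stream with a length prefix, and receive() reverses it.

import pickle
import struct

import trio

class StreamSendChannel:
    def __init__(self, stream):
        self._stream = stream

    async def send(self, obj):
        payload = pickle.dumps(obj)
        # 4-byte big-endian length prefix, then the pickled bytes.
        await self._stream.send_all(struct.pack("!I", len(payload)) + payload)

class StreamReceiveChannel:
    def __init__(self, stream):
        self._stream = stream
        self._buffer = bytearray()

    async def _receive_exactly(self, size):
        while len(self._buffer) < size:
            chunk = await self._stream.receive_some(4096)
            if not chunk:                    # stream closed cleanly
                raise trio.EndOfChannel
            self._buffer += chunk
        data = bytes(self._buffer[:size])
        del self._buffer[:size]
        return data

    async def receive(self):
        (length,) = struct.unpack("!I", await self._receive_exactly(4))
        return pickle.loads(await self._receive_exactly(length))

Hooked up to, say, the two halves of trio.testing.memory_stream_pair(), this round-trips arbitrary picklable objects; the cancellation-atomicity and close() questions below are exactly where it diverges from the in-memory version.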

This is much simpler if we standardize on a single BrokenResourceError type instead of having separate BrokenStreamError and BrokenChannelError types (#620).

The places where the APIs don't quite line up are:

  • We can't reasonably implement nowait methods on top of a generic Stream which might have internal buffering etc. But this doesn't seem like a big deal; either the IPC channel just doesn't have those methods, or we can implement versions that just raise WouldBlock.

  • Making send atomic-wrt-cancellation: this is natural for in-memory channels (either send completes or it doesn't). For inter-process channels, it's much trickier, because the cancellation might arrive at a moment when you're only half-way through transferring the object through the pipe, and now what? Basically you don't have any option other than to abort the send unfinished, and now your pipe has lost synchronization and is unusable. (I guess if you can afford a background task to pump the pipe, then you could buffer one object, and make it atomic-wrt-cancellation again. But that's a pretty substantial API change. OTOH I guess if you have a full-fledged multiprocessing-esque library with process pools and cleverness to let you pass channel endpoints across processes and all that, then it might be fine.)

  • For an in-process channel, close is naturally synchronous. For an inter-process channel, it has to be async.

The first two don't seem like dealbreakers, or things we necessarily need to figure out right now. But that's what makes the last one tricky :-).

We could make close synchronous for now, and then if later we decide to make abstract trio.abc.{Send,Receive}Channel interfaces we could add an aclose method too at that point. The trio.testing in-process Stream classes have a synchronous close method in addition to the async aclose that's required by the Stream interface; you just use aclose in generic code and close in code that knows it has a concrete class.

Or, if our guess is that inter-process channels will be important, then we could make it aclose from the start, even for in-memory streams. Again, if we decide later that this was a mistake, we can add a synchronous close method. Or even deprecate aclose, if we do it soon enough... I guess if we're wrong about inter-process channels being important, then we'll know that before 1.0.

Any thoughts? As a user, would it annoy you to have to do await channel.aclose() when you know you're working with an in-process synchronization primitive whose close method could just as well be synchronous, and is just inserting a superfluous checkpoint? Would you value being able to take code written to use channels within a process and switch it to speaking across processes with the same API?

Discovered while writing the docs that not having it is really
confusing and hard to justify.
@njsmith changed the title from "[rfc, wip] Channels, a potential replacement for Queues" to "Channels, a replacement for Queues" on Oct 4, 2018
@njsmith (Member, Author) commented Oct 4, 2018

Done, maybe

I think the code, tests, docs, and newsfragments are all ready here. Anyone want to give reviewing it a try? The diff is large, but even just reading through the new docs and docstrings would be helpful!

To summarize (a minimal usage sketch follows this list):

  • New abstract base classes trio.abc.SendChannel and trio.abc.ReceiveChannel. The former has send and send_nowait methods; the latter has receive and receive_nowait and __aiter__. They both have clone and aclose.

  • New concrete constructor trio.open_memory_channel(max_buffer_size), which returns a (SendChannel, ReceiveChannel) pair.

  • Detailed examples in the docs to illustrate why closing is useful, why cloning is useful, and how buffering works.

  • Deprecated trio.Queue and trio.hazmat.UnboundedQueue.

  • I have a list of possible follow-up tweaks to think about, currently stashed in a comment at the top of trio/_channel.py. These should move into a new follow-up issue before merging.
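
A minimal usage sketch of the API summarized above (the buffer size and payloads are arbitrary):

import trio

async def main():
    send_channel, receive_channel = trio.open_memory_channel(max_buffer_size=1)

    send_channel.send_nowait("hello")         # fits in the buffer
    try:
        send_channel.send_nowait("world")     # buffer full, no receiver waiting
    except trio.WouldBlock:
        pass

    print(receive_channel.receive_nowait())   # "hello"

    await send_channel.aclose()
    async for value in receive_channel:       # sender closed, nothing buffered,
        print(value)                          # so the loop body never runs
    await receive_channel.aclose()

trio.run(main)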

@Fuyukai (Member) commented Oct 4, 2018

From the code it seems clear that sending does a round-robin send to all the tasks, but the docs don't seem to specify this. Seems like it would be useful to include.
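
A hedged demo of that behavior (consumer names are arbitrary): with two clones of the receive end, each sent value goes to exactly one receiver rather than being broadcast to both.

import trio

async def consumer(name, receive_channel):
    async with receive_channel:               # close our clone when we're done
        async for value in receive_channel:
            print(name, "got", value)

async def main():
    send_channel, receive_channel = trio.open_memory_channel(0)
    async with trio.open_nursery() as nursery:
        nursery.start_soon(consumer, "A", receive_channel.clone())
        nursery.start_soon(consumer, "B", receive_channel.clone())
        await receive_channel.aclose()        # the clones stay open
        async with send_channel:
            for i in range(4):
                await send_channel.send(i)
    # Each of 0..3 is printed exactly once, divided between A and B.

trio.run(main)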

@oremanj (Member) left a review:

This looks great; just some suggestions for documentation cleanup.

Inline review comment on trio/_channel.py, on these lines:

# underlying stream when all SendChannels are closed.

# to think about later:
# - max_buffer_size=0 default?

Reviewer (Member): I'd support this - it seems to go hand-in-hand with the docs saying "if in doubt, use 0".

(Other review threads, all since resolved, on docs/source/reference-core.rst, newsfragments/497.feature.rst, trio/_abc.py, trio/_channel.py, trio/_core/_unbounded_queue.py, trio/_sync.py, and trio/tests/test_channel.py.)
@njsmith (Member, Author) commented Oct 5, 2018

Thanks everyone for the detailed comments, you caught a ton of stuff I missed :-).

I think they've all been addressed now.

@Fuyukai I tried to clarify this in a few places, but not quite sure how to do it best... would you mind taking another look, and if you think it could be clearer let me know where you expected to see it?

@pquentin (Member) commented Oct 5, 2018

Do you want to move the follow-up tweaks into a new issue before I hit merge?

@njsmith mentioned this pull request on Oct 5, 2018
@njsmith (Member, Author) commented Oct 5, 2018

> Do you want to move the follow-up tweaks into a new issue before I hit merge?

Done (see #719), plus a few more tweaks I noticed.

I'd still like to hear @Fuyukai (or anyone's :-)) thoughts on whether/how to make it clear that receivers alternate rather than each getting their own copy of incoming objects, but I suppose that can always be handled in a follow-up PR if necessary.

@pquentin (Member) commented Oct 5, 2018

I think that the new version is clear enough because you 1/ switched from fan-in/fan-out to producers/consumers and 2/ called this out explicitly in the docs.

But yeah, I think further improvements can go in another PR.

@pquentin (Member) commented Oct 5, 2018

Okay, let's do this! ✨

@pquentin merged commit a1d2dbd into python-trio:master on Oct 5, 2018
@njsmith deleted the channels branch on October 5, 2018 at 10:09
@njsmith (Member, Author) commented Oct 5, 2018 via email
