
ParkingLot and Queue are embarrassingly slow #272

Closed
njsmith opened this Issue Aug 8, 2017 · 9 comments

njsmith commented Aug 8, 2017

@sorcio sent a small benchmark that I don't think I can share publicly, but that implements a simple message broker: clients can connect and say "send this blob of data to channel X" or "please send me the data from channel X", and it brokers between them. His initial version used a trio.Queue for message passing and a trio.hazmat.ParkingLot to implement a little Event-like object. It turned out that a ridiculous proportion of runtime was going into these: when I dropped in some simple non-fair versions that use a set to track sleeping tasks, overall execution got 2x faster.

Looking at ParkingLot, I think at least for now we should be able to make it much faster without losing any features, by replacing the SortedDict-of-tickets with an OrderedDict-of-tasks. The trick to getting rid of the tickets is that when we requeue, we can go modify task._abort_fn directly. (Hey, we're in _core, might as well make use of it!) This means that requeuing tasks from one lot to another would put them behind the tasks that were waiting there rather than preserving the global order, but I think that's fine, maybe even better than the current system. This also means if we wanted to add tags to sleeping tasks to implement task-fair RWLocks, we could do that for "free", by storing it in the value entry in the OrderedDict (otherwise the value would just be None).
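
As a rough sketch of the OrderedDict-of-tasks idea (all names and methods here are illustrative, not trio's actual _core API):

```python
from collections import OrderedDict

class ParkingLotSketch:
    # Hypothetical sketch: FIFO lot keyed directly by task, no ticket objects.
    def __init__(self):
        # task -> tag; insertion order gives us FIFO, and the value slot is
        # free for things like RWLock read/write tags (None when unused)
        self._parked = OrderedDict()

    def park(self, task, tag=None):
        self._parked[task] = tag

    def unpark(self, count=1):
        # wake the oldest sleepers first; real code would reschedule them
        woken = []
        while self._parked and len(woken) < count:
            task, _tag = self._parked.popitem(last=False)
            woken.append(task)
        return woken

    def cancel(self, task):
        # O(1) removal of an arbitrary sleeping task (e.g. on cancellation)
        del self._parked[task]
```

Requeuing from one lot to another would then just be a pop from one OrderedDict plus an insert into the other, which is exactly why requeued tasks land behind the waiters already there.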

I'm not sure if switching to a shared global ParkingLot API would be better or not. It would allow us to skip allocating a ParkingLot entirely for uncontended objects – like a Queue(1) that just gets used once wouldn't need a put lot at all. OTOH with a global object we'd have to jump through several dict lookups each time to find it. It likely doesn't make a big difference either way, but it's probably worth benchmarking at some point.

I'm not sure if the Queue issue is the same or different – Queue obviously depends on ParkingLot, but has a somewhat overcomplicated implementation on top of that using a bunch of semaphores and stuff.

We could redo Queue as:

  • put: if there's a get waiting, hand it the object; otherwise, immediately append to deque, and if over capacity, go to sleep.
  • get: if there's data in the queue, take it and wake a put; otherwise, go to sleep and wait to be handed some data.
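
The two bullets above can be modeled roughly like this (a synchronous sketch: Waiter objects stand in for sleeping tasks, and all names are made up for illustration):

```python
from collections import deque

class Waiter:
    # stand-in for a sleeping task (hypothetical, not trio's API)
    def __init__(self):
        self.woken = False
        self.value = None

class HandoffQueueSketch:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = deque()
        self.get_waiters = deque()  # getters waiting to be handed a value
        self.put_waiters = deque()  # putters asleep because we're over capacity

    def put(self, waiter, value):
        # if a get is waiting, hand it the object directly
        if self.get_waiters:
            getter = self.get_waiters.popleft()
            getter.value = value
            getter.woken = True
            return True          # put completes immediately
        # otherwise append, then sleep if that put us over capacity
        self.data.append(value)
        if len(self.data) > self.capacity:
            self.put_waiters.append(waiter)
            return False         # putter goes to sleep
        return True

    def get(self, waiter):
        # if there's data, take it and wake one put; otherwise sleep
        if self.data:
            value = self.data.popleft()
            if self.put_waiters:
                self.put_waiters.popleft().woken = True
            return value
        self.get_waiters.append(waiter)
        return None              # getter goes to sleep
```

Note how Queue(0) falls out naturally: a put with no getter waiting immediately parks, and the next get takes the value and wakes it.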

Downside: this requires that ParkingLot grow back the ability to hand off an object to the woken task. I think this is OK? Upside: in this model, Queue(0) automatically works correctly. Neutral: this actually gives even stricter FIFO fairness than we have currently -- right now if G1 and G2 block in get, and then P1 and P2 call put in quick succession so that G1 and G2 are both woken up on the same tick, then it's possible for G1 to get P2's object and G2 to get P1's. This is only possible when G1 and G2 wake up on the same tick, though, so I don't know that it matters – what we have now is not strictly FIFO, but strict FIFO isn't the same thing as fairness, and it'd be hard to argue that what we have isn't fair. Actually, if we're OK with this kind of fairness, then maybe we can drop the requirement of handing off data through ParkingLot.unpark?

Maybe drop the whole join API too because ugh, it's this second specialized synchronization primitive bolted onto Queue that's rarely useful.

njsmith commented Aug 8, 2017

OrderedDict trade-offs:

  • on cpython 3.5, collections.OrderedDict is the thing, and it's implemented in C, so that's cool
  • on cpython 3.6, collections.OrderedDict is the same as it was in 3.5, but now the regular dict is ordered too. The one problem with using a regular dict, though, is that it doesn't have popitem(last=False) like OrderedDict does. You can fake it, but is it worth it?
  • pypy, AFAICT from a quick look (should verify), is like cpython 3.6: dict is ordered, and OrderedDict is something else entirely
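
For reference, faking popitem(last=False) on a plain dict takes an extra lookup (and on 3.6, plain-dict ordering is still officially an implementation detail):

```python
def popitem_first(d):
    # a plain dict can only pop from the end, so grab the oldest key manually
    key = next(iter(d))
    return key, d.pop(key)
```
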
njsmith commented Aug 15, 2017

An advantage of doing immediate hand-off in Queue is that it would fix #63.

njsmith commented Aug 15, 2017

...though we could also fix #63 by keeping a count of how many tasks have been unparked from get but haven't yet dequeued their value.

njsmith commented Aug 15, 2017

The Queue algorithm above requires that the put lot use strict FIFO, to avoid weirdness where we wake up the wrong put task. (Like, if you have a capacity 0 queue, it should never be the case that a task is still asleep after its particular value has been returned by get, or that a task can wake up before its particular value was returned by get.) Though I guess we could also skip the lot and have a deque of put tasks.

Oh ugh, it also makes put cancellation like... impossible to support.

I guess we could use an ordered mapping from tasks to the queued values, though this is problematic for the tasks that have woken up (and particularly if they want to queue another item!). So I guess it would have to be

self._queued = deque()
self._put_waiting = OrderedDict()

and get tries to pull from _queued, and if that works it refills from _put_waiting. Otherwise it tries to pull from _put_waiting directly, to support capacity=0, then if that doesn't work it sleeps.

Or... we can tweak it. get tries to move one item from _put_waiting to _queued, ignoring the capacity limit in _queued (because it will always try to remove an item from _queued, so even if it adds an item this can't cause its size to increase). Then it tries to remove an item from _queued. If that fails, it sleeps. When it wakes up, it decrements the _get_waking counter, and repeats the dequeue dance (which this time must succeed)?
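
A sketch of that tweak, with hypothetical names, and with get_nowait raising instead of actually sleeping (the real version would park and use the wake-counter dance described above):

```python
from collections import OrderedDict, deque

class RefillSketch:
    def __init__(self, capacity):
        self.capacity = capacity
        self._queued = deque()
        self._put_waiting = OrderedDict()  # task -> value; O(1) cancel

    def put_nowait_or_park(self, task, value):
        if not self._put_waiting and len(self._queued) < self.capacity:
            self._queued.append(value)
            return True   # fits under capacity, put completes immediately
        self._put_waiting[task] = value  # park the putter with its value
        return False

    def cancel_put(self, task):
        del self._put_waiting[task]  # cancellation stays O(1)

    def get_nowait(self):
        # first migrate one parked put into _queued, ignoring the capacity
        # limit: we're about to remove an item, so the size can't grow
        if self._put_waiting:
            task, value = self._put_waiting.popitem(last=False)
            self._queued.append(value)  # (real code would also wake `task`)
        if self._queued:
            return self._queued.popleft()
        raise IndexError("real code would sleep here")
```
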

Dunno, need to think about this when I'm more awake.

njsmith added a commit to njsmith/trio that referenced this issue Aug 15, 2017

Simplify and speed up ParkingLot
Use OrderedDict instead of SortedDict, and don't make park() pay for
repark().

See python-triogh-272 for context. This doesn't make ParkingLot as fast as a
version that just holds a set of tasks and does

  def unpark_all(self):
      while self._waiters:
          reschedule(self._waiters.pop())

  def unpark(self):
      if self._waiters:
          reschedule(self._waiters.pop())

but it's about half-way in between the old version and this minimal
version on the broker microbenchmark.
njsmith commented Sep 5, 2017

This was written for other reasons, but it includes a sketch of how a more efficient Queue could work: https://gist.github.com/njsmith/40b7b7f65e5f433789153c7b668ce643

njsmith commented Sep 8, 2017

See #321 for dropping Queue.join

sorcio added a commit to sorcio/trio that referenced this issue Mar 17, 2018

@sorcio sorcio referenced this issue Mar 17, 2018

Merged

New faster Queue with capacity >= 0 #473

sorcio added a commit to sorcio/trio that referenced this issue Mar 17, 2018

belm0 commented Aug 23, 2018

I haven't looked at the use case here deeply, but I often find that heapq is a better fit for jobs that initially reach for OrderedDict.

njsmith commented Aug 23, 2018

@belm0 in this case we need a simple FIFO queue, except with the ability to quickly delete arbitrary items (e.g. when a put gets cancelled). So OrderedDict's FIFO-plus-O(1)-random-access is kind of perfect in principle :-).
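
A tiny illustration of why that combination matters here (a heap gives you ordering, but deleting an arbitrary entry from it means a linear scan or tombstones; task names below are illustrative):

```python
from collections import OrderedDict

# FIFO wait queue where any waiter can also be removed in O(1),
# e.g. because its put got cancelled
waiters = OrderedDict.fromkeys(["task_a", "task_b", "task_c"])

del waiters["task_b"]                    # cancelled: O(1) removal
first, _ = waiters.popitem(last=False)   # wake in FIFO order -> "task_a"
```
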

Both Queue and ParkingLot have been rewritten since the last comment here... maybe we can close it? @sorcio, do you think this is resolved, or is there more work to do...?

sorcio commented Aug 23, 2018

Oh, right. Yes, this is resolved on my end. I've been using the new queue implementation for a while. We might find room for improvement going forward, but we're well out of “embarrassingly slow” territory :)

@sorcio sorcio closed this Aug 23, 2018
