More informative Queue class: new method that returns number of unfinished tasks #73789

slytomcat · 2017-02-20T09:12:24Z

BPO	29603
Nosy	@rhettinger, @serhiy-storchaka, @slytomcat
PRs	bpo-29603 New Queue.unfinished() method #192

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2017-02-22.17:54:15.248>
created_at = <Date 2017-02-20.09:12:24.190>
labels = ['3.7', 'type-feature', 'library']
title = 'More informative Queue class: new method that returns number of unfinished tasks'
updated_at = <Date 2017-02-22.17:54:15.247>
user = 'https://github.com/slytomcat'

bugs.python.org fields:

activity = <Date 2017-02-22.17:54:15.247>
actor = 'slytomcat'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2017-02-22.17:54:15.248>
closer = 'slytomcat'
components = ['Library (Lib)']
creation = <Date 2017-02-20.09:12:24.190>
creator = 'slytomcat'
dependencies = []
files = []
hgrepos = []
issue_num = 29603
keywords = []
message_count = 11.0
messages = ['288189', '288261', '288269', '288277', '288294', '288305', '288306', '288307', '288357', '288376', '288377']
nosy_count = 3.0
nosy_names = ['rhettinger', 'serhiy.storchaka', 'slytomcat']
pr_nums = ['192']
priority = 'normal'
resolution = None
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue29603'
versions = ['Python 3.7']

slytomcat · 2017-02-20T09:12:23Z

Class queue.Queue control the number of unfinished tasks via method task_done(). But it is only possible to get the information about all task done (via join() method).
I'm sure that exposing the number of unfinished tasks (unfinished_tasks class variable) can be very useful in many situations when you need more control over the process.

But it is not good idea to provide write access to this internal variable (as it controls internal queue class status). Better way - provide RO access via class method like qsize or empty. It can be look like this:

    def unfinished(self):
        return self.unfinished_tasks

One example of this method usage: there is not optimal function _adjust_thread_count in concurrent.futures.ThreadPoolExecutor with following comment:
# TODO(bquinlan): Should avoid creating new threads if there are more
# idle threads than items in the work queue.

It can be easily done with following condition:

    if self._work_queue.unfinished() <= len(self._threads):
        return

rhettinger · 2017-02-21T03:56:52Z

I'm sure that exposing the number of unfinished tasks
(unfinished_tasks class variable) can be very useful in many situations

Sorry, but I don't share your certainty of the usefulness of this method. Since the task_done/join API was added many years ago, I've not seen any use cases arise where we needed to quantify the number of unfinished tasks. Have you seen an actual need in real code or is this PR more of an intuitive guess that the extra method might be useful?

In the examples of task_done/join that I've seen, the number of unfinished tasks is typically in the range: qsize() plus between zero and the number of parallel consumers. Where the actual value it falls this range doesn't seem very useful. One the queue is empty, join() just waits to the parallel consumers to complete their one final task that is already under way.

Also, I'm reluctant to expand the API for several reasons. Keeping it small makes it more intelligible (giving users additional options for things theh rarely need makes it more difficult to make correct decisions in the common cases). Also, it would be nice to avoid a ripple effect into the other APIs such as multiprocessing which are supersets of this API. And lastly, I'm disinclined to have another informational method like empty, full, and qsize which have to be documented as potentially introducing unexpected race conditions in user code (all three methods return information that may be completely out-of-date or wrong by the time the caller sees the result).

serhiy-storchaka · 2017-02-21T05:41:23Z

I concur with Raymond.

slytomcat · 2017-02-21T08:25:26Z

Raymond, Serhiy, thanks for your opinions.

I agree that this method like empty, full, and qsize returns information that may be out-of-date in time of its usage.

But like those empty, full, and qsize it provides information that helps to make some decisions.

What I disagree that this information can be correctly estimated based on qsize and number of parallel consumers as there is no information what exactly consumers do at the moment. Such estimation will be absolutely useless for a decision making.

In the example (see https://github.com/slytomcat/cpython/commit/ea313986d86c083e81624fbc8a7ab6a75784e15d) we need to estimate the number of unfinished tasks and comparing it to the number of active threads we can decide: should we create a new thread or not. There is no reason to wait until all tasks done (via join) we need to make decision in working conditions.

Actually I've implemented this solution (Queue.unfinished + modified concurrent.futures.ThreadPoolExecutor) in my project (see https://github.com/slytomcat/yandex-disk-client) and it works perfectly: the number of working threads is increasing only on massive flow of tasks, while weak flow of tasks keeps low number of working threads.
And even when Queue.unfinished gives wrong (outdated) information, it can lead to creation one new thread that is not really required at the moment. That is not a big mistake, I suppose (comparing to creation all N threads during placing first N tasks in queue without considering the volume of concurrent tasks).

rhettinger · 2017-02-21T13:09:24Z

Should avoid creating new threads if there are more
# idle threads than items in the work queue.

Shouldn't this just check to see if qsize() is greater than zero? That would mean that there are no idle threads. I'm not seeing why there would be any idle threads if there is work in the queue (the queue unblocks idle threads).

I don't think "unfinished() - num_threads" makes sense. The meaning of "unfinished() - qsize()" is the number of non-idle worker threads (work is taken out of the queue but has not yet reported that the task is done).

slytomcat · 2017-02-21T15:38:22Z

Not exactly.... there are 3 cases:

If qsize <> 0 it means there is no idle consumers threads, all of them must be busy: we need to create one more. No doubt.

If qsize = 0 it means one of two cases:

all consumers threads are busy: we need to create one more
some or all consumers threads are idle: no need create new one.

But there is no way to distinguish two last cases.

That's why I consider that (unfinished() <= num_threads) gives clear key for decision.

slytomcat · 2017-02-21T15:55:12Z

One more problem that adjusting of number of threads is performed exactly after placing the new task in the queue. In in some cases we can find that qsuze <> 0 but it doesn't mean that there is no idle threads. It can be equal to 1 (just queued task) as no threads manage to get it yet (don't forgot about GIL).

So even qsuze <> 0 dosn't mean that we need to create a new thread.

But (unfinished() <= num_threads) works perfect in this case also.

slytomcat · 2017-02-21T15:59:50Z

num_threads - unfinished() = the estimation for number of idle threads.

rhettinger · 2017-02-22T14:01:17Z

And even when Queue.unfinished gives wrong (outdated) information,
it can lead to creation one new thread that is not really required
at the moment. That is not a big mistake, I suppose (comparing to
creation all N threads during placing first N tasks in queue
without considering the volume of concurrent tasks).

IMO, this isn't worth expanding the Queue API. The thread pool concept explicitly allows up to poolsize threads and occasionally having one extra thread within the allowed poolsize isn't important. Also, it is unclear why a new thread is being launched if there in nothing in the work queue (unfinished tasks is only interesting when qsize==0).

For other users, the new method is problematic because it is distracting and prone to misuse (potentially out-of-date upon return and potentially meaningless if task_done isn't being used). I believe the new method will cause more problems than it fixes.

Also, I really don't like the mission creep. Queue objects are primarily about managing a queue of inputs. Monitoring and managing consumers is another (mostly orthogonal) realm.

slytomcat · 2017-02-22T17:51:42Z

I can't fully agree with this:

Queue objects are primarily about managing a queue of inputs.
Monitoring and managing consumers is another (mostly orthogonal) realm.

Monitoring of consumers is already added via task_done() and join() methods. At least this two methods allows to understand that all consumers are in idle state.
The unfinished() - is just functional extension to this existing functionality.

For other users, the new method is problematic because it is distracting
and prone to misuse (potentially out-of-date upon return and potentially
meaningless if task_done isn't being used). I believe the new method
will cause more problems than it fixes.

This sounds reasonable because I also understand that this method is not usable in all cases as in most cases task_done is not used.

Probably my idea for method unfinished is really not so good...

Actually I've manage to find unpublished Queue.unfinished_tasks variable just in several minutes when I tried to find solution for threaded PoolExecutor. Hope that any curious developer can find it too.

I don't mind if you close this CR.

Thanks for pleasant discussion.

slytomcat · 2017-02-22T17:54:15Z

I've closed pull request on GitHub.

slytomcat mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Feb 20, 2017

rhettinger self-assigned this Feb 21, 2017

slytomcat mannequin closed this as completed Feb 22, 2017

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More informative Queue class: new method that returns number of unfinished tasks #73789

More informative Queue class: new method that returns number of unfinished tasks #73789

slytomcat mannequin commented Feb 20, 2017

slytomcat mannequin commented Feb 20, 2017

rhettinger commented Feb 21, 2017

serhiy-storchaka commented Feb 21, 2017

slytomcat mannequin commented Feb 21, 2017

rhettinger commented Feb 21, 2017

slytomcat mannequin commented Feb 21, 2017

slytomcat mannequin commented Feb 21, 2017

slytomcat mannequin commented Feb 21, 2017

rhettinger commented Feb 22, 2017

slytomcat mannequin commented Feb 22, 2017

slytomcat mannequin commented Feb 22, 2017

More informative Queue class: new method that returns number of unfinished tasks #73789

More informative Queue class: new method that returns number of unfinished tasks #73789

Comments

slytomcat mannequin commented Feb 20, 2017

slytomcat mannequin commented Feb 20, 2017

rhettinger commented Feb 21, 2017

serhiy-storchaka commented Feb 21, 2017

slytomcat mannequin commented Feb 21, 2017

rhettinger commented Feb 21, 2017

slytomcat mannequin commented Feb 21, 2017

slytomcat mannequin commented Feb 21, 2017

slytomcat mannequin commented Feb 21, 2017

rhettinger commented Feb 22, 2017

slytomcat mannequin commented Feb 22, 2017

slytomcat mannequin commented Feb 22, 2017