Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unclear documentation on Queue.qsize() #71811

Closed
DougHoskisson mannequin opened this issue Jul 26, 2016 · 16 comments
Closed

unclear documentation on Queue.qsize() #71811

DougHoskisson mannequin opened this issue Jul 26, 2016 · 16 comments
Assignees
Labels
docs Documentation in the Doc dir

Comments

@DougHoskisson
Copy link
Mannequin

DougHoskisson mannequin commented Jul 26, 2016

BPO 27624
Nosy @rhettinger, @vstinner, @bitdancer

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2016-07-29.06:22:11.042>
created_at = <Date 2016-07-26.13:00:53.364>
labels = ['docs']
title = 'unclear documentation on Queue.qsize()'
updated_at = <Date 2016-07-29.14:13:06.791>
user = 'https://bugs.python.org/DougHoskisson'

bugs.python.org fields:

activity = <Date 2016-07-29.14:13:06.791>
actor = 'r.david.murray'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2016-07-29.06:22:11.042>
closer = 'rhettinger'
components = ['Documentation']
creation = <Date 2016-07-26.13:00:53.364>
creator = 'Doug Hoskisson'
dependencies = []
files = []
hgrepos = []
issue_num = 27624
keywords = []
message_count = 16.0
messages = ['271362', '271386', '271404', '271405', '271409', '271415', '271416', '271459', '271460', '271467', '271468', '271469', '271470', '271471', '271610', '271626']
nosy_count = 5.0
nosy_names = ['rhettinger', 'vstinner', 'r.david.murray', 'docs@python', 'Doug Hoskisson']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue27624'
versions = ['Python 3.5']

@DougHoskisson
Copy link
Mannequin Author

DougHoskisson mannequin commented Jul 26, 2016

The documentation for Queue.qsize():

"Return the approximate size of the queue."

"approximate" is unclear. It might suggest some strategy used for approximating, or it might be the exact size at an arbitrary time.
It should be made more clear.

@DougHoskisson DougHoskisson mannequin added the docs Documentation in the Doc dir label Jul 26, 2016
@bitdancer
Copy link
Member

Since we're talking about multi-threaded operations, the concept of "exact size at an arbitrary time" isn't operationally different from "a strategy used for approximating". The subsequent text clarifies what "approximately" means operationally. Specifying it further would be, I think, overspecification.

@DougHoskisson
Copy link
Mannequin Author

DougHoskisson mannequin commented Jul 26, 2016

Some strategies for approximating might report a size the the queue has never been and never will be. For example, a strategy could gather data and find the size is increasing at some rate, and approximate based on that rate, but then the rate of increase changes before it reaches the approximated size. That's the kind of thing that "approximate" would suggest to some people.

@bitdancer
Copy link
Member

What if we just replaced the period with a colon? That is, the definition of "approximate" is the two rules in the second sentence.

@DougHoskisson
Copy link
Mannequin Author

DougHoskisson mannequin commented Jul 26, 2016

The way that this whole page of documentation is written does not suggest that this class is ONLY for use in a multi-threaded setting.

This class can be used without multi-threading, right?

Wouldn't it be useful to know whether this function does give the exact size of the queue in a single-threaded setting?

Right now, it doesn't contain that information.

@DougHoskisson
Copy link
Mannequin Author

DougHoskisson mannequin commented Jul 26, 2016

One thing that is important to recognize in considering this, is which information is specific to what is being documented, and which information is more general.

Some people may think that documentation should only give information specific to what is being documented. Others may think it is useful to also include general information that can help people learn.

I don't know whether the writers of Python documentation lean to one of these or the other, but this contains a significant amount of information that has nothing to do with Python specifically, nothing to do with this class specifically, and nothing to do with this function specifically. (Again, I'm not saying this is bad. I just think it's important for people to recognize it.)

It's just general multi-threading knowledge. Anyone who knows about multi-threading (in any language) knows that the queue could change between two function calls.

But despite that extra general information, there is some specific information missing. Does it return the size of the queue (at the time the memory is accessed by the function call)? or does it use a more complex strategy for approximating the size of the queue? The reason this information is important is that if it is the former, that would be useful in single-threaded situations.

I am guessing that it is the former, but I don't know because not enough information is given.

Assuming that guess, I think following the model I see in the documentation of the next 2 functions on the page (Queue.empty() and Queue.full()) would be a good idea. That is, that the first sentence should only contain information specific to what is being documented, and more general information (about multi-threading) can be given afterward.

The fact that the size returned is approximate would have nothing to do with this function specifically, and it is just general information about how multi-threading works.

My suggestion for this documentation (again, assuming that my guess of the missing information is correct) I will put in a separate comment because this comment will be TLDR for many.

If my guess is incorrect, then something should be clarified to lessen people guessing thus. (Maybe this is just projecting, but I think most people would make the same guess that I am making.)

@DougHoskisson
Copy link
Mannequin Author

DougHoskisson mannequin commented Jul 26, 2016

My suggestion for this documentation:

"""
Return the number of items in the queue. Note, in multi-threading this mostly just serves as an approximation, and information from this doesn’t guarantee that a subsequent get() or put() will not block.
"""

@vstinner
Copy link
Member

"""
Return the number of items in the queue. Note, in multi-threading this mostly just serves as an approximation, and information from this doesn’t guarantee that a subsequent get() or put() will not block.
"""

I dislike this description. If I understand correctly, the issue is that someone must not rely on the size to check if the queue is empty or not. If I'm right, the doc must be more explicit. Something like:

"The size must not be used to check if get() or put() will block. Use get_nowait() and put_nowait(), or get() and put() in non-blocking mode (block=False)."

There is even a Wikipedia article on the bug :-)

https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use

(I'm not sure that it's exactly the same class of bug.)

@DougHoskisson
Copy link
Mannequin Author

DougHoskisson mannequin commented Jul 27, 2016

More explicit is ok, if that's what people want, but just not in the first sentence, because that stuff has nothing to do with what is being documented specifically (as evidenced by referencing a wikipedia article that doesn't even mention python).

I don't think more explicit is necessary, but if that's what others want, it's not bad.

How much of the python documentation should be dedicated to teaching people stuff that has nothing to do with python specifically?

@bitdancer
Copy link
Member

The current wording is, IMO, better than the suggested wording, especially if you don't want to be "teaching stuff". The current wording is a specification of the method's behavior. I really don't know what you could replace "approximate" with that would improve it without having to get into a description of the behavior of a threaded program.

It seems like you are wanting us to document that the function will return an accurate size if the program is single threaded, but I don't think we want to do that, because that is not part of the specification of the method.

@DougHoskisson
Copy link
Mannequin Author

DougHoskisson mannequin commented Jul 27, 2016

It is inconsistent with other documentation right next to it.

Should the documentation for empty() say "Return True if the queue is approximately empty, False otherwise."?
Should the documentation for full() say "Return True if the queue is approximately full, False otherwise."?

@DougHoskisson
Copy link
Mannequin Author

DougHoskisson mannequin commented Jul 27, 2016

If the specification of the empty method is to return whether the queue is empty, then the programmers have failed to meet that specification, because by the time you get that return value, it might not be empty anymore.

@bitdancer
Copy link
Member

The subsequent text specifies the behavior. You *could* delete the 'approximately' from the qsize documentation to be parallel, but I think that would be a disservice to the reader. You could also use the phrase "at the moment of the call" in all three, which might be acceptable. Let's see what Raymond has to say, when he has time to respond.

@DougHoskisson
Copy link
Mannequin Author

DougHoskisson mannequin commented Jul 27, 2016

My suggestion was not to delete the "approximate" entirely. Just move it out of the first sentence to make it more consistent with the other documentation.

This is the model I'm seeing in empty() and full():

The first sentence is something simple and direct (without nebulous words that will make people wonder what it means), and then after that, comment on the race-condition stuff.

I think that documentation is good, and it would be good to follow the same model for qsize().

@rhettinger rhettinger assigned rhettinger and unassigned docspython Jul 27, 2016
@rhettinger
Copy link
Contributor

Sorry Doug, I don't find any of the suggestions to be an improvement and I concur with David Murray that the docstring for qsize() isn't the place for a tutorial on race conditions and LBYL vs EAPF which are general threading topics rather than a queue specific topics.

Also, I'm reluctant to change Guido's original wording which has served well for a decade. While I'm sure you can invent ways to misread the word "approximate", it does communicate that this method cannot relied upon to return the exact size. If we were seeing recurring source of confusion, there might be a basis for a change, but that is not the case.

Sorry, but I'm going to close this one. Every person might find a different way to wordsmith this one, but I think we should favor Guido's choice.

@bitdancer
Copy link
Member

Doug: side note. Raymond teaches Python, and makes a study of what works and doesn't work in communicating it to students, so he isn't rejecting this lightly.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir
Projects
None yet
Development

No branches or pull requests

3 participants