Clarify map API in concurrent.futures #76487
The docstring for The lower-level multiprocessing module also describes I can think of several ways the situation could be improved, listed here from most conservative to most progressive:
If the latter options seem much too radical, please consider at least something along the lines of #1 above. I think it would help people correct their expectations when they first encounter the API :)
Hi David,
The current implementation of the Executor.map() generator is:

    def result_iterator():
        try:
            # reverse to keep finishing order
            fs.reverse()
            while fs:
                # Careful not to keep a reference to the popped future
                if timeout is None:
                    yield fs.pop().result()
                else:
                    yield fs.pop().result(end_time - time.time())
        finally:
            for future in fs:
                future.cancel()

So it seems to me that results are yielded as soon as they arrive (provided they arrive in the right order).
Hi Antoine, Thanks for the response! :) I think the problem lies in the line immediately preceding the code you've posted:
In other words, all the jobs are first submitted and their futures stored in a list, which is then iterated over. This approach obviously breaks down when there is a great number of jobs, or when it's part of a pipeline meant for processing jobs continuously as they come.
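For illustration, here is a minimal sketch of the alternative being asked for: submitting jobs a few at a time instead of all up front. The helper `lazy_map`, its `window` parameter, and the `tracked` and `square` functions are hypothetical names invented for this example; none of them are part of `concurrent.futures`, and a `ThreadPoolExecutor` is assumed as the executor.

```python
import collections
import itertools
from concurrent.futures import ThreadPoolExecutor


def lazy_map(executor, fn, iterable, window=4):
    """Hypothetical helper: yield results in input order while keeping
    at most `window` submitted-but-unyielded tasks at any time."""
    it = iter(iterable)
    # Prime the window with the first `window` arguments only.
    pending = collections.deque(
        executor.submit(fn, arg) for arg in itertools.islice(it, window)
    )
    while pending:
        result = pending.popleft().result()
        # Pull at most one more argument to refill the window.
        for arg in itertools.islice(it, 1):
            pending.append(executor.submit(fn, arg))
        yield result


def tracked(n, consumed):
    """Yield 0..n-1, recording each item in `consumed` as it is pulled."""
    for i in range(n):
        consumed.append(i)
        yield i


def square(x):
    return x * x


consumed = []
with ThreadPoolExecutor(max_workers=2) as executor:
    gen = lazy_map(executor, square, tracked(6, consumed), window=2)
    consumed_before = list(consumed)      # generator not started yet
    first = next(gen)
    consumed_after_one = list(consumed)   # window of 2, plus one refill
    rest = list(gen)
```

Because the input is pulled only as results are handed out, the sketch never holds more than `window` futures, at the cost of limiting how far ahead the workers can run.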
I see. So the problem you are pointing out is that the tasks' *arguments* are consumed eagerly. I agree that may be a problem in some cases, though I think in most cases people are concerned with the *results*. (Note that multiprocessing.Pool() has an imap() method which does what you would like.)
Yes, sorry for not being quite clear the first time around :) I eventually found out about Pool.imap (see item 3 on list in OP) and indeed it fits my use case very nicely, but my point was that the documentation is somewhat misleading with respect to the semantics of built-in map(). Specifically, I would argue that it is unexpected for a function which claims to be "Equivalent to map(func, *iterables)" to require allocating a list the length of the shortest iterable. Maybe a code example will make this clearer for potential newcomers to the discussion. This is what I would expect to happen (= the behavior of built-in map()):
This is what happens instead with Executor.map():
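A minimal sketch of the contrast, assuming a `ThreadPoolExecutor` as the executor. The `tracked` generator and the `square` function are hypothetical names made up for this illustration; `tracked` simply records each item at the moment it is pulled from the input.

```python
from concurrent.futures import ThreadPoolExecutor


def tracked(n, consumed):
    """Yield 0..n-1, recording each item in `consumed` as it is pulled."""
    for i in range(n):
        consumed.append(i)
        yield i


def square(x):
    return x * x


# Built-in map(): nothing is pulled from the input until a result is requested.
lazy_consumed = []
lazy = map(square, tracked(5, lazy_consumed))
lazy_before = list(lazy_consumed)     # snapshot before any result is requested
first = next(lazy)
lazy_after_one = list(lazy_consumed)  # snapshot after exactly one result

# Executor.map(): the whole input is drained and submitted at call time,
# before the returned iterator is touched.
eager_consumed = []
with ThreadPoolExecutor(max_workers=2) as executor:
    results = executor.map(square, tracked(5, eager_consumed))
    eager_before = list(eager_consumed)  # snapshot before iterating results
    eager_results = list(results)
```

With the built-in, `lazy_before` is empty and `lazy_after_one` holds a single item; with the executor, `eager_before` already holds every input item even though no result has been requested.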
I think the documentation is now clearer. Closing!
Perfect, thanks!