return_generator={True,False} -> return_as={'list','generator'} #1458

Merged
merged 10 commits into from Jun 28, 2023

fcharras
Contributor

Change the boolean return_generator keyword to return_as, which is expected to take values in {'list','submitted'}, in anticipation of a future 'completed' value when implementing #1449.

@codecov

codecov bot commented Jun 22, 2023

Codecov Report

Patch coverage: 92.30% and project coverage change: -0.02 ⚠️

Comparison is base (5d88860) 94.88% compared to head (8862e02) 94.87%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1458      +/-   ##
==========================================
- Coverage   94.88%   94.87%   -0.02%     
==========================================
  Files          45       45              
  Lines        7471     7474       +3     
==========================================
+ Hits         7089     7091       +2     
- Misses        382      383       +1     
Impacted Files Coverage Δ
joblib/_parallel_backends.py 93.47% <ø> (+1.08%) ⬆️
joblib/parallel.py 96.90% <83.33%> (+0.01%) ⬆️
joblib/test/test_parallel.py 96.11% <100.00%> (ø)

... and 2 files with indirect coverage changes


Contributor

@tomMoral tomMoral left a comment

LGTM, a few nitpicks. Happy to have the opinion of @GaelVaroquaux on this one.

Resolved, outdated review threads on: CHANGES.rst, doc/parallel.rst, examples/parallel_generator.py (3 threads), joblib/parallel.py
@betatim

betatim commented Jun 23, 2023

Came here via the sprint/Franck at the sprint.

How about Parallel(..., results=...), with possible values: immediate for "return results in any order as soon as they are ready", ordered for "return results in the order in which tasks were submitted" and complete for "return all results in the order in which tasks were submitted when all tasks are ready". Could also use "unordered", "ordered" and "all" (or something?). I think my main comment is that I prefer results= over return_as=.

@rth
Contributor

rth commented Jun 23, 2023

As @adrinjalali mentioned, it could be worth at least considering adding an extra method for a different output type. Something like this (but the name can certainly be better):

joblib.Parallel(...).call_as_generator(delayed(sqrt)(i**2) for i in range(10))

It also feels like an API that returns a different type depending on an input parameter is not great, and it would confuse type checkers. Between a list and an iterator of results it could still work. But if you want to add an iterator of future results later that's a completely different return type.
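To make the type-checker concern concrete, here is a minimal sketch of how a string-valued parameter can still be typed with `typing.overload` and `Literal`. The function `run_tasks` is a hypothetical sequential stand-in invented for illustration; it is not joblib's API and says nothing about how joblib annotates `Parallel`.

```python
from typing import Callable, Iterable, Iterator, List, Literal, TypeVar, overload

T = TypeVar("T")


# Hypothetical helper (an assumption for illustration, not joblib code):
# each overload ties one literal value of `return_as` to one return type.
@overload
def run_tasks(
    tasks: Iterable[Callable[[], T]], return_as: Literal["list"] = ...
) -> List[T]: ...


@overload
def run_tasks(
    tasks: Iterable[Callable[[], T]], return_as: Literal["generator"]
) -> Iterator[T]: ...


def run_tasks(tasks, return_as="list"):
    # Validate eagerly so a bad value fails at call time, not on first iteration.
    if return_as not in ("list", "generator"):
        raise ValueError(f"unknown return_as value: {return_as!r}")
    # Tasks run sequentially here; only the shape of the return value matters.
    results = (task() for task in tasks)
    return list(results) if return_as == "list" else results
```

With overloads like these a checker can infer `List[T]` or `Iterator[T]` from a literal argument, which covers the list-versus-iterator case. An iterator of future objects would need a further overload with a different element type.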

@ogrisel
Contributor

ogrisel commented Jun 23, 2023

  • return_type="list"/"generator" (or return_as?)
  • return_order="as_submitted"/"as_completed" (or collection_order?)

raising a ValueError when return_as="list" and collection_order="as_completed", because that combination is useless.

At least it's very explicit. It's a bit verbose but I think I prefer this side of the tradeoff.
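A minimal sketch of the validation this two-parameter proposal implies, using the first-listed names `return_type` and `return_order`. The helper `check_return_options` is hypothetical and only illustrates the rejected combination; it is not part of joblib.

```python
# Hypothetical validator for the two-parameter proposal (not joblib code).
ALLOWED_TYPES = ("list", "generator")
ALLOWED_ORDERS = ("as_submitted", "as_completed")


def check_return_options(return_type="list", return_order="as_submitted"):
    if return_type not in ALLOWED_TYPES:
        raise ValueError(f"return_type must be one of {ALLOWED_TYPES}")
    if return_order not in ALLOWED_ORDERS:
        raise ValueError(f"return_order must be one of {ALLOWED_ORDERS}")
    # A list is only available once every task is done, so completion
    # order brings no benefit there and the combination is rejected.
    if return_type == "list" and return_order == "as_completed":
        raise ValueError(
            'return_order="as_completed" is useless with return_type="list"'
        )
    return return_type, return_order
```

Three of the four combinations are accepted, which is the verbosity cost mentioned above: two parameters express only three useful modes.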

@ogrisel
Contributor

ogrisel commented Jun 23, 2023

> But if you want to add an iterator of future results later that's a completely different return type.

If we ever go into returning "future" or "promise" objects I think we should introduce a completely new API (and probably mimic that of concurrent.futures / dask).

@fcharras
Contributor Author

fcharras commented Jun 23, 2023

After some more online discussion, we converged on return_as=list/generator/unordered_generator. I'll update the PR in this direction.

(About futures/promises: returning such objects is definitely not in the scope of what Parallel offers. The misunderstanding comes from my bad wording early on, sorry about that.)

@ogrisel
Contributor

ogrisel commented Jun 23, 2023

After discussion at the scikit-learn sprint, I also prefer the following:

  • return_as="list"/"generator"/"unordered_generator"

This is explicit enough, technically correct and concise enough.
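The three modes can be sketched with the standard library alone. The code below is an illustration of the semantics discussed in this thread, not joblib's implementation: `collect` and `slow_identity` are hypothetical names, and a `ThreadPoolExecutor` stands in for joblib's backends.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

MODES = ("list", "generator", "unordered_generator")


def collect(func, args, return_as="list", max_workers=4):
    """Hypothetical helper mimicking the three proposed return_as modes."""
    if return_as not in MODES:
        raise ValueError(f"return_as must be one of {MODES}")

    def _iterate():
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(func, arg) for arg in args]
            if return_as == "unordered_generator":
                # Yield each result as soon as its task finishes.
                pending = as_completed(futures)
            else:
                # Yield results in the order the tasks were submitted.
                pending = futures
            for future in pending:
                yield future.result()

    results = _iterate()
    # "list" waits for everything; the generator modes stay lazy.
    return list(results) if return_as == "list" else results


def slow_identity(delay):
    # Toy task: sleep for `delay` seconds, then return it.
    time.sleep(delay)
    return delay


delays = [0.3, 0.0, 0.1]
as_list = collect(slow_identity, delays, return_as="list")
in_order = list(collect(slow_identity, delays, return_as="generator"))
any_order = list(collect(slow_identity, delays, return_as="unordered_generator"))
```

Here `as_list` and `in_order` both follow submission order, while `any_order` holds the same values in whatever order the tasks happened to finish.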

Contributor

@tomMoral tomMoral left a comment

LGTM, just one nitpick

Resolved, outdated review thread on: joblib/_parallel_backends.py
@fcharras fcharras changed the title from "return_generator={True,False} -> return_as={'list','submitted'}" to "return_generator={True,False} -> return_as={'list','generator'}" Jun 23, 2023
Co-authored-by: Thomas Moreau <thomas.moreau.2010@gmail.com>
@tomMoral tomMoral merged commit 83f9169 into joblib:master Jun 28, 2023
13 of 16 checks passed
@tomMoral
Contributor

Merging as the failure on sklearn is not related to this PR.

5 participants