
Some TODOs for joblib 0.12.2 #11771

Closed
qinhanmin2014 opened this issue Aug 7, 2018 · 8 comments
qinhanmin2014 (Member) commented Aug 7, 2018

We introduced joblib 0.12.2 in #11741, but there are still some TODOs (opening an issue for further discussion):
(1) Circle CI is failing

Unexpected failing examples:
/home/circleci/project/examples/cluster/plot_feature_agglomeration_vs_univariate_selection.py failed leaving traceback:
Traceback (most recent call last):
  File "/home/circleci/project/examples/cluster/plot_feature_agglomeration_vs_univariate_selection.py", line 75, in <module>
    clf.fit(X, y)  # set the best parameters
  File "/home/circleci/project/sklearn/model_selection/_search.py", line 663, in fit
    cv.split(X, y, groups)))
  File "/home/circleci/project/sklearn/externals/joblib/parallel.py", line 981, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/circleci/project/sklearn/externals/joblib/parallel.py", line 818, in dispatch_one_batch
    self._pickle_cache)
  File "/home/circleci/project/sklearn/externals/joblib/parallel.py", line 253, in __init__
    self.items = list(iterator_slice)
  File "/home/circleci/project/sklearn/model_selection/_search.py", line 662, in <genexpr>
    for parameters, (train, test) in product(candidate_params,
  File "/home/circleci/project/sklearn/base.py", line 62, in clone
    new_object_params[name] = clone(param, safe=False)
  File "/home/circleci/project/sklearn/base.py", line 50, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/home/circleci/project/sklearn/base.py", line 50, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/home/circleci/project/sklearn/base.py", line 50, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/home/circleci/project/sklearn/base.py", line 50, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/home/circleci/project/sklearn/base.py", line 62, in clone
    new_object_params[name] = clone(param, safe=False)
  File "/home/circleci/project/sklearn/base.py", line 53, in clone
    return copy.deepcopy(estimator)
  File "/home/circleci/miniconda/envs/testenv/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/circleci/miniconda/envs/testenv/lib/python3.6/copy.py", line 274, in _reconstruct
    y = func(*args)
  File "/home/circleci/project/sklearn/externals/joblib/memory.py", line 830, in __init__
    location, cachedir))
ValueError: You set both "location='/tmp/tmp1v2mgts1' and "cachedir=False". 'cachedir' has been deprecated in version 0.12 and will be removed in version 0.14.
Please only set "location='/tmp/tmp1v2mgts1'"


/home/circleci/project/examples/compose/plot_compare_reduction.py failed leaving traceback:
Traceback (most recent call last):
  File "/home/circleci/project/examples/compose/plot_compare_reduction.py", line 119, in <module>
    grid.fit(digits.data, digits.target)
  File "/home/circleci/project/sklearn/model_selection/_search.py", line 663, in fit
    cv.split(X, y, groups)))
  File "/home/circleci/project/sklearn/externals/joblib/parallel.py", line 981, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/circleci/project/sklearn/externals/joblib/parallel.py", line 818, in dispatch_one_batch
    self._pickle_cache)
  File "/home/circleci/project/sklearn/externals/joblib/parallel.py", line 253, in __init__
    self.items = list(iterator_slice)
  File "/home/circleci/project/sklearn/model_selection/_search.py", line 662, in <genexpr>
    for parameters, (train, test) in product(candidate_params,
  File "/home/circleci/project/sklearn/base.py", line 62, in clone
    new_object_params[name] = clone(param, safe=False)
  File "/home/circleci/project/sklearn/base.py", line 53, in clone
    return copy.deepcopy(estimator)
  File "/home/circleci/miniconda/envs/testenv/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/circleci/miniconda/envs/testenv/lib/python3.6/copy.py", line 274, in _reconstruct
    y = func(*args)
  File "/home/circleci/project/sklearn/externals/joblib/memory.py", line 830, in __init__
    location, cachedir))
ValueError: You set both "location='/tmp/tmpa408j1cz' and "cachedir=False". 'cachedir' has been deprecated in version 0.12 and will be removed in version 0.14.
Please only set "location='/tmp/tmpa408j1cz'"

(2) We get many extra warnings in the examples
e.g., UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = None.
(3) In the doc, we still have things like n_jobs : int, optional (default=1)
(4) Maybe add some explanations about the difference between n_jobs=1 and n_jobs=None?
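The Circle CI failures in (1) can be pictured with a minimal sketch (illustrative names, not joblib's actual code): deep-copying an estimator via `clone()` reconstructs its `Memory` object forwarding every stored attribute, including the deprecated `cachedir` one, which trips the mutual-exclusion check even though the user never set `cachedir` themselves.

```python
_UNSET = object()  # sentinel meaning "argument was not passed at all"

def resolve_cache_location(location=None, cachedir=_UNSET):
    """Return the cache directory, rejecting ambiguous combinations."""
    if cachedir is not _UNSET:
        if location is not None:
            raise ValueError(
                "You set both location={!r} and cachedir={!r}. 'cachedir' "
                "has been deprecated in version 0.12 and will be removed "
                "in version 0.14. Please only set location={!r}".format(
                    location, cachedir, location))
        return cachedir  # honor the deprecated argument for now
    return location

# A reconstruction path like copy.deepcopy forwards every attribute, so
# something equivalent to the call below raises, as in the tracebacks:
# resolve_cache_location(location='/tmp/tmp...', cachedir=False)
```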

jnothman (Member) commented Aug 7, 2018 via email

rth (Member) commented Aug 13, 2018

(3) In the doc, we still have things like n_jobs : int, optional (default=1)
(4) Maybe add some explanations about the difference between n_jobs=1 and n_jobs=None?

I think the relevant joblib lines are:

https://github.com/joblib/joblib/blob/7a17c36c83326503762a9b23d02407652ece8e3f/joblib/parallel.py#L655-L663

so this would still use n_jobs=1 by default but, I suppose, allow better interaction with a custom parallel backend (e.g. dask.distributed). We should include some docs about it (possibly pointing to the joblib docs?).
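The distinction those joblib lines make could be sketched roughly as follows (assumed names, not joblib's actual API): None means "no explicit request", so an active backend may substitute its own default, while an explicit 1 always forces sequential execution.

```python
# Illustrative sketch of n_jobs=None vs n_jobs=1:
# None defers to the currently active backend's default (e.g. one set by a
# dask.distributed backend); an explicit integer always wins.

def resolve_n_jobs(n_jobs=None, backend_default=1):
    if n_jobs is None:
        # no explicit request: defer to whatever backend is active
        return backend_default
    # an explicit value always takes precedence
    return n_jobs
```

With the default backend the two are indistinguishable (both resolve to 1), which is why the docs question in (3)/(4) arises at all.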

I think leaving it as 1 in the docs might make sense: it is consistent with how we handle deprecated/changed parameters, where the value in the docs (the actual value) differs from the one in the code (which is changed to trigger the future warning, etc.).

We might still want to add an API change entry about this.

jnothman (Member) commented Aug 14, 2018 via email

qinhanmin2014 (Member, Author) commented:
So the final decision for (3) and (4) is to update the glossary and keep the docs as they are? I won't argue with the decision, but honestly I don't like it. Users will see n_jobs=None at the top of the docstring and n_jobs : int, optional (default=1) in the parameters section, which might be confusing.

jnothman (Member) commented Aug 14, 2018 via email

rth (Member) commented Aug 14, 2018

So the final decision for (3) and (4) is to update the glossary and keep the docs as they are? I won't argue with the decision, but honestly I don't like it. Users will see n_jobs=None at the top of the docstring and n_jobs : int, optional (default=1) in the parameters section, which might be confusing.

Well, as far as I can see, the other alternatives would be:

  • set n_jobs=None also in the docs, and specify that it corresponds to 1 job in each docstring. We can do it, but it's a lot of changes, and I wonder if it's really worth the effort, particularly now that we are a bit resource-limited for the release/RC.
  • revert to n_jobs=1 in the code and the docs, but that would not work with custom joblib backends (the initial motivation for this change), so I don't think it's a solution.

qinhanmin2014 (Member, Author) commented Aug 14, 2018

revert to n_jobs=1 in the code and the docs, but that would not work with custom joblib backends (the initial motivation for this change), so I don't think it's a solution.

+1

set n_jobs=None also in the docs, and specify that it corresponds to 1 job in each docstring.

We've updated the glossary, so we can refer to it the way we do for random_state.

I prefer to update the doc accordingly, but I agree that this should not block the release.
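For illustration, a glossary-style cross-reference in a numpydoc docstring might look roughly like this (hypothetical wording; the exact glossary term and phrasing are assumptions, mirroring the random_state pattern):

```python
def fit(X, y, n_jobs=None):
    """Fit the model.

    Parameters
    ----------
    n_jobs : int or None, optional (default=None)
        Number of jobs to run in parallel. ``None`` means 1 unless in a
        :obj:`joblib.parallel_backend` context. ``-1`` means using all
        processors. See :term:`Glossary <n_jobs>` for more details.
    """
```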

qinhanmin2014 (Member, Author) commented:

Closing given joblib 0.12.3.
