
Some TODOs for joblib 0.12.2 #11771

Closed
qinhanmin2014 opened this issue Aug 7, 2018 · 8 comments
qinhanmin2014 (Member) commented Aug 7, 2018

We introduced joblib 0.12.2 in #11741, but there are still some TODOs (opening an issue for further discussion):
(1) Circle CI is failing

Unexpected failing examples:
/home/circleci/project/examples/cluster/plot_feature_agglomeration_vs_univariate_selection.py failed leaving traceback:
Traceback (most recent call last):
  File "/home/circleci/project/examples/cluster/plot_feature_agglomeration_vs_univariate_selection.py", line 75, in <module>
    clf.fit(X, y)  # set the best parameters
  File "/home/circleci/project/sklearn/model_selection/_search.py", line 663, in fit
    cv.split(X, y, groups)))
  File "/home/circleci/project/sklearn/externals/joblib/parallel.py", line 981, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/circleci/project/sklearn/externals/joblib/parallel.py", line 818, in dispatch_one_batch
    self._pickle_cache)
  File "/home/circleci/project/sklearn/externals/joblib/parallel.py", line 253, in __init__
    self.items = list(iterator_slice)
  File "/home/circleci/project/sklearn/model_selection/_search.py", line 662, in <genexpr>
    for parameters, (train, test) in product(candidate_params,
  File "/home/circleci/project/sklearn/base.py", line 62, in clone
    new_object_params[name] = clone(param, safe=False)
  File "/home/circleci/project/sklearn/base.py", line 50, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/home/circleci/project/sklearn/base.py", line 50, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/home/circleci/project/sklearn/base.py", line 50, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/home/circleci/project/sklearn/base.py", line 50, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/home/circleci/project/sklearn/base.py", line 62, in clone
    new_object_params[name] = clone(param, safe=False)
  File "/home/circleci/project/sklearn/base.py", line 53, in clone
    return copy.deepcopy(estimator)
  File "/home/circleci/miniconda/envs/testenv/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/circleci/miniconda/envs/testenv/lib/python3.6/copy.py", line 274, in _reconstruct
    y = func(*args)
  File "/home/circleci/project/sklearn/externals/joblib/memory.py", line 830, in __init__
    location, cachedir))
ValueError: You set both "location='/tmp/tmp1v2mgts1' and "cachedir=False". 'cachedir' has been deprecated in version 0.12 and will be removed in version 0.14.
Please only set "location='/tmp/tmp1v2mgts1'"


/home/circleci/project/examples/compose/plot_compare_reduction.py failed leaving traceback:
Traceback (most recent call last):
  File "/home/circleci/project/examples/compose/plot_compare_reduction.py", line 119, in <module>
    grid.fit(digits.data, digits.target)
  File "/home/circleci/project/sklearn/model_selection/_search.py", line 663, in fit
    cv.split(X, y, groups)))
  File "/home/circleci/project/sklearn/externals/joblib/parallel.py", line 981, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/circleci/project/sklearn/externals/joblib/parallel.py", line 818, in dispatch_one_batch
    self._pickle_cache)
  File "/home/circleci/project/sklearn/externals/joblib/parallel.py", line 253, in __init__
    self.items = list(iterator_slice)
  File "/home/circleci/project/sklearn/model_selection/_search.py", line 662, in <genexpr>
    for parameters, (train, test) in product(candidate_params,
  File "/home/circleci/project/sklearn/base.py", line 62, in clone
    new_object_params[name] = clone(param, safe=False)
  File "/home/circleci/project/sklearn/base.py", line 53, in clone
    return copy.deepcopy(estimator)
  File "/home/circleci/miniconda/envs/testenv/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/circleci/miniconda/envs/testenv/lib/python3.6/copy.py", line 274, in _reconstruct
    y = func(*args)
  File "/home/circleci/project/sklearn/externals/joblib/memory.py", line 830, in __init__
    location, cachedir))
ValueError: You set both "location='/tmp/tmpa408j1cz' and "cachedir=False". 'cachedir' has been deprecated in version 0.12 and will be removed in version 0.14.
Please only set "location='/tmp/tmpa408j1cz'"

(2) We get many extra warnings in the examples
e.g., UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = None.
(3) In the doc, we still have things like n_jobs : int, optional (default=1)
(4) Maybe add some explanations about the difference between n_jobs=1 and n_jobs=None?
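The Circle CI failures in (1) can be pictured with a minimal sketch (illustrative names, not joblib's actual code): deep-copying an estimator via `clone()` reconstructs its `Memory` object forwarding every stored attribute, including the deprecated `cachedir` one, which trips the mutual-exclusion check even though the user never set `cachedir` themselves.

```python
_UNSET = object()  # sentinel meaning "argument was not passed at all"

def resolve_cache_location(location=None, cachedir=_UNSET):
    """Return the cache directory, rejecting ambiguous combinations."""
    if cachedir is not _UNSET:
        if location is not None:
            raise ValueError(
                "You set both location={!r} and cachedir={!r}. 'cachedir' "
                "has been deprecated in version 0.12 and will be removed "
                "in version 0.14. Please only set location={!r}".format(
                    location, cachedir, location))
        return cachedir  # honor the deprecated argument for now
    return location

# A reconstruction path like copy.deepcopy forwards every attribute, so
# something equivalent to the call below raises, as in the tracebacks:
# resolve_cache_location(location='/tmp/tmp...', cachedir=False)
```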

jnothman (Member) commented Aug 7, 2018 via email

rth (Member) commented Aug 13, 2018

(3) In the doc, we still have things like n_jobs : int, optional (default=1)
(4) Maybe add some explanations about the difference between n_jobs=1 and n_jobs=None?

I think the relevant joblib lines are:

https://github.com/joblib/joblib/blob/7a17c36c83326503762a9b23d02407652ece8e3f/joblib/parallel.py#L655-L663

so this would still use n_jobs=1 by default but, I suppose, allow better interaction with a custom parallel backend (e.g. dask.distributed). We should include some docs about it (possibly pointing to the joblib docs?).
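The distinction those joblib lines make could be sketched roughly as follows (assumed names, not joblib's actual API): None means "no explicit request", so an active backend may substitute its own default, while an explicit 1 always forces sequential execution.

```python
# Illustrative sketch of n_jobs=None vs n_jobs=1:
# None defers to the currently active backend's default (e.g. one set by a
# dask.distributed backend); an explicit integer always wins.

def resolve_n_jobs(n_jobs=None, backend_default=1):
    if n_jobs is None:
        # no explicit request: defer to whatever backend is active
        return backend_default
    # an explicit value always takes precedence
    return n_jobs
```

With the default backend the two are indistinguishable (both resolve to 1), which is why the docs question in (3)/(4) arises at all.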

I think leaving it as 1 in the docs might make sense: it is consistent with how we handle deprecated/changed parameters, where the value in the docs (the actual value) differs from the one in the code (which is changed to trigger the future warning, etc.).

We might still want to add an API change entry about this.

jnothman (Member) commented Aug 14, 2018 via email

qinhanmin2014 (Member, Author) commented:
So the final decision for (3) and (4) is to update the glossary and keep the docs as they are? I won't argue with the decision, but honestly I don't like it. Users will see n_jobs=None at the top of the docstring and n_jobs : int, optional (default=1) in the parameters section, which might be confusing.

jnothman (Member) commented Aug 14, 2018 via email

rth (Member) commented Aug 14, 2018

So the final decision for (3) and (4) is to update the glossary and keep the docs as they are? I won't argue with the decision, but honestly I don't like it. Users will see n_jobs=None at the top of the docstring and n_jobs : int, optional (default=1) in the parameters section, which might be confusing.

Well, as far as I can see, the other alternatives would be:

  • set n_jobs=None also in the docs, and specify that it corresponds to 1 job in each docstring. We can do it, but it's a lot of changes, and I wonder if it's really worth the effort, particularly now that we are a bit resource-limited for the release/RC.
  • revert to n_jobs=1 in the code and the docs, but that would not work with custom joblib backends (the initial motivation for this change), so I don't think it's a solution.

qinhanmin2014 (Member, Author) commented Aug 14, 2018

revert to n_jobs=1 in the code and the docs, but that would not work with custom joblib backends (the initial motivation for this change), so I don't think it's a solution.

+1

set n_jobs=None also in the docs, and specify that it corresponds to 1 job in each docstring.

We've updated the glossary, so we can refer to it the way we do for random_state.

I prefer to update the doc accordingly, but I agree that this should not block the release.
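For illustration, a glossary-style cross-reference in a numpydoc docstring might look roughly like this (hypothetical wording; the exact glossary term and phrasing are assumptions, mirroring the random_state pattern):

```python
def fit(X, y, n_jobs=None):
    """Fit the model.

    Parameters
    ----------
    n_jobs : int or None, optional (default=None)
        Number of jobs to run in parallel. ``None`` means 1 unless in a
        :obj:`joblib.parallel_backend` context. ``-1`` means using all
        processors. See :term:`Glossary <n_jobs>` for more details.
    """
```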

qinhanmin2014 (Member, Author) commented:

Closing given joblib 0.12.3.
