n_jobs in GridSearchCV issue #6147

Closed
armgilles opened this Issue Jan 9, 2016 · 12 comments

9 participants
armgilles commented Jan 9, 2016

Hi,

First, thanks for your awesome work!

I have an issue with GridSearchCV and n_jobs for an ExtraTreesClassifier model.

  • platform.platform() : Linux-3.13.0-74-generic-x86_64-with-debian-jessie-sid
  • cpu_count() : 8
  • RAM : 32 GB (usage never exceeds 6 GB during execution)
  • Python 2.7.11 :: Anaconda 2.4.1 (64-bit)
  • sklearn.__version__ : '0.17'
  • numpy.__version__ : '1.10.4'
  • scipy.__version__ : '0.16.1'
  • pandas.__version__ : '0.17.1'
  • joblib.__version__ : '0.9.3'

Failing code:

model = ExtraTreesClassifier(class_weight='balanced')
# 'max_depth' was listed twice; only the last value, [3, 10, 20], took effect.
parameters = {'criterion': ['gini', 'entropy'],
              'max_depth': [3, 10, 20],
              'min_samples_split': [2, 4, 8]}

clf = GridSearchCV(model, parameters, verbose=3, scoring='roc_auc',
                        cv=StratifiedKFold(y_train, n_folds=5, shuffle=True),  
                        n_jobs=4)

clf.fit(X_train.values, y_train.values)
Traceback (most recent call last):
  File "create_extratrees.py", line 305, in <module>
    clf.fit(X_train.values, y_train.values)
  File "/home/gillesa/anaconda2/lib/python2.7/site-packages/sklearn/grid_search.py", line 804, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "/home/gillesa/anaconda2/lib/python2.7/site-packages/sklearn/grid_search.py", line 553, in _fit
    for parameters in parameter_iterable
  File "/home/gillesa/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 812, in __call__
    self.retrieve()
  File "/home/gillesa/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 762, in retrieve
    raise exception
sklearn.externals.joblib.my_exceptions.JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/home/gillesa/github/mailling/create_extratrees.py in <module>()
    300                 'max_depth' : [3, 10, 20]}
    301
    302 clf = GridSearchCV(model, parameters,
    303                    cv=StratifiedKFold(y_train, n_folds=5, shuffle=True), verbose=3, scoring='roc_auc', n_jobs=4)
    304
--> 305 clf.fit(X_train.values, y_train.values)
    306
    307 best_parameters, score, _ = max(clf.grid_scores_, key=lambda x: x[1])
    308 print(clf.scoring  + ' score : ', score)
    309 for param_name in sorted(best_parameters.keys()):

...........................................................................
/home/gillesa/anaconda2/lib/python2.7/site-packages/sklearn/grid_search.py in fit(self=GridSearchCV(cv=sklearn.cross_validation.Stratif..._jobs', refit=True, scoring='roc_auc', verbose=3), X=array([[  0.,   9.,  56., ...,   1.,   0.,   0.]...      [  0.,   7.,  68., ...,   0.,   0.,   0.]]), y=array([0, 0, 0, ..., 1, 0, 0]))
    799         y : array-like, shape = [n_samples] or [n_samples, n_output], optional
    800             Target relative to X for classification or regression;
    801             None for unsupervised learning.
    802
    803         """
--> 804         return self._fit(X, y, ParameterGrid(self.param_grid))
        self._fit = <bound method GridSearchCV._fit of GridSearchCV(...jobs', refit=True, scoring='roc_auc', verbose=3)>
        X = array([[  0.,   9.,  56., ...,   1.,   0.,   0.]...      [  0.,   7.,  68., ...,   0.,   0.,   0.]])
        y = array([0, 0, 0, ..., 1, 0, 0])
        self.param_grid = {'criterion': ['gini', 'entropy'], 'max_depth': [3, 10, 20], 'min_samples_split': [2, 4, 8]}
    805
    806
    807 class RandomizedSearchCV(BaseSearchCV):
    808     """Randomized search on hyper parameters.

...........................................................................
/home/gillesa/anaconda2/lib/python2.7/site-packages/sklearn/grid_search.py in _fit(self=GridSearchCV(cv=sklearn.cross_validation.Stratif..._jobs', refit=True, scoring='roc_auc', verbose=3), X=array([[  0.,   9.,  56., ...,   1.,   0.,   0.]...      [  0.,   7.,  68., ...,   0.,   0.,   0.]]), y=array([0, 0, 0, ..., 1, 0, 0]), parameter_iterable=<sklearn.grid_search.ParameterGrid object>)
    548         )(
    549             delayed(_fit_and_score)(clone(base_estimator), X, y, self.scorer_,
    550                                     train, test, self.verbose, parameters,
    551                                     self.fit_params, return_parameters=True,
    552                                     error_score=self.error_score)
--> 553                 for parameters in parameter_iterable
        parameters = undefined
        parameter_iterable = <sklearn.grid_search.ParameterGrid object>
    554                 for train, test in cv)
    555
    556         # Out is a list of triplet: score, estimator, n_test_samples
    557         n_fits = len(out)

...........................................................................
/home/gillesa/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=4), iterable=<generator object <genexpr>>)
    807             if pre_dispatch == "all" or n_jobs == 1:
    808                 # The iterable was consumed all at once by the above for loop.
    809                 # No need to wait for async callbacks to trigger to
    810                 # consumption.
    811                 self._iterating = False
--> 812             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=4)>
    813             # Make sure that we get a last message telling us we are done
    814             elapsed_time = time.time() - self._start_time
    815             self._print('Done %3i out of %3i | elapsed: %s finished',
    816                         (len(self._output), len(self._output),

---------------------------------------------------------------------------

Sub-process traceback:

---------------------------------------------------------------------------
ValueError                                         Sat Jan  9 16:42:09 2016
PID: 18076                Python 2.7.11: /home/gillesa/anaconda2/bin/python
...........................................................................
/home/gillesa/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
     67     def __init__(self, iterator_slice):
     68         self.items = list(iterator_slice)
     69         self._size = len(self.items)
     70
     71     def __call__(self):
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
     73
     74     def __len__(self):
     75         return self._size
     76

...........................................................................
/home/gillesa/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _fit_and_score(estimator=ExtraTreesClassifier(bootstrap=False, class_weig..., random_state=None, verbose=0, warm_start=False), X=memmap([[  0.,   9.,  56., ...,   1.,   0.,   0....      [  0.,   7.,  68., ...,   0.,   0.,   0.]]), y=memmap([0, 0, 0, ..., 1, 0, 0]), scorer=make_scorer(roc_auc_score, needs_threshold=True), train=memmap([      0,       1,       2, ..., 1217841, 1217842, 1217843]), test=memmap([      0,       2,       3, ..., 1217824, 1217825, 1217833]), verbose=3, parameters={'criterion': 'gini', 'max_depth': 3, 'min_samples_split': 4}, fit_params={}, return_train_score=False, return_parameters=True, error_score='raise')
   1545                              " numeric value. (Hint: if using 'raise', please"
   1546                              " make sure that it has been spelled correctly.)"
   1547                              )
   1548
   1549     else:
-> 1550         test_score = _score(estimator, X_test, y_test, scorer)
   1551         if return_train_score:
   1552             train_score = _score(estimator, X_train, y_train, scorer)
   1553
   1554     scoring_time = time.time() - start_time

...........................................................................
/home/gillesa/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _score(estimator=ExtraTreesClassifier(bootstrap=False, class_weig..., random_state=None, verbose=0, warm_start=False), X_test=memmap([[  0.,   9.,  56., ...,   1.,   0.,   0....      [  0.,   6.,  57., ...,   1.,   0.,   0.]]), y_test=memmap([0, 0, 0, ..., 0, 0, 0]), scorer=make_scorer(roc_auc_score, needs_threshold=True))
   1604         score = scorer(estimator, X_test)
   1605     else:
   1606         score = scorer(estimator, X_test, y_test)
   1607     if not isinstance(score, numbers.Number):
   1608         raise ValueError("scoring must return a number, got %s (%s) instead."
-> 1609                          % (str(score), type(score)))
   1610     return score
   1611
   1612
   1613 def _permutation_test_score(estimator, X, y, cv, scorer):

ValueError: scoring must return a number, got 0.671095795498 (<class 'numpy.core.memmap.memmap'>) instead.

If I set n_jobs=8 on the model and n_jobs=1 on GridSearchCV, it works:

model = ExtraTreesClassifier(class_weight='balanced', n_jobs=8)
# 'max_depth' was listed twice; only the last value, [3, 10, 20], took effect.
parameters = {'criterion': ['gini', 'entropy'],
              'max_depth': [3, 10, 20],
              'min_samples_split': [2, 4, 8]}

clf = GridSearchCV(model, parameters, verbose=3, scoring='roc_auc',
                        cv=StratifiedKFold(y_train, n_folds=5, shuffle=True),  
                        n_jobs=1)

clf.fit(X_train.values, y_train.values)

I tried different setups, but whenever GridSearchCV has n_jobs > 1 it fails.

I would like to make the best use of my CPU, and I think n_jobs > 1 on GridSearchCV is better than n_jobs on the model. Does anyone have feedback?
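
To make the parallelism question concrete, here is a rough count of the independent cross-validation fits the grid above produces. It is only a sketch using the standard library, not scikit-learn's own ParameterGrid, and it ignores the final refit on the full training set.

```python
from itertools import product

# Parameter grid from the report. The duplicated 'max_depth' key collapses
# to its last value, [3, 10, 20], as the traceback's param_grid shows.
param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [3, 10, 20],
    'min_samples_split': [2, 4, 8],
}
n_folds = 5  # StratifiedKFold(y_train, n_folds=5, ...)

# Every combination of parameter values is one candidate.
candidates = list(product(*param_grid.values()))

# Each candidate is fitted once per fold.
n_fits = len(candidates) * n_folds
print(len(candidates), n_fits)  # 18 90
```

With 90 independent fits there is enough work to keep 4 or 8 workers busy at the grid-search level. Whether that beats parallelising inside the model depends on the data and the estimator.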

Possibly related to #6023.

Member

ogrisel commented Jan 18, 2016

I can reproduce the problem on my machine. Here is the code I used:

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import StratifiedKFold
from sklearn.datasets import make_classification


X_train, y_train = make_classification(n_samples=int(1e6), n_features=5, random_state=0)



model = ExtraTreesClassifier(class_weight='balanced')
# 'max_depth' was listed twice; only the last value, [3, 10, 20], took effect.
parameters = {'criterion': ['gini', 'entropy'],
              'max_depth': [3, 10, 20],
              'min_samples_split': [2, 4, 8]}

clf = GridSearchCV(model, parameters, verbose=3, scoring='roc_auc',
                        cv=StratifiedKFold(y_train, n_folds=5, shuffle=True),  
                        n_jobs=4)

The data needs to be big enough to trigger the memmapping.
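
As a back-of-the-envelope check on "big enough", the sketch below estimates the size of the X_train array in the reproduction and compares it with a memmapping threshold. The threshold is an assumption: it uses the '1M' default documented for the max_nbytes parameter of current joblib Parallel, which may differ from the value in the joblib 0.9.3 bundled with scikit-learn 0.17.

```python
# Shape used in the reproduction: make_classification(n_samples=int(1e6), n_features=5)
n_samples, n_features = int(1e6), 5
bytes_per_value = 8  # make_classification returns float64 features

x_nbytes = n_samples * n_features * bytes_per_value

# Assumed threshold: joblib's documented default max_nbytes='1M' (1024**2 bytes).
assumed_threshold = 1 * 1024 ** 2

print(x_nbytes, x_nbytes > assumed_threshold)  # 40000000 True
```

At roughly 40 MB the array is far above that assumed threshold, which fits with the memmap objects that appear in the sub-process traceback.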

@ogrisel ogrisel added the Bug label Jan 18, 2016

hershaw commented Jan 21, 2016

@ogrisel FYI, the call to fit is missing at the bottom of your reproduction code.

We are also having the same problem with random forest and logistic regression classifiers within our application.

Inside cross_validation.py, we added the following hacky if statement just before the check; it casts the score to a float if it is an instance of NumPy's memmap class:

# We added this
if isinstance(score, np.memmap):
    score = float(score)
if not isinstance(score, numbers.Number):
    raise ValueError("scoring must return a number, got %s (%s) instead."
                     % (str(score), type(score)))
Another bit of info: downgrading scikit-learn to 0.16 fixes the problem.

We are deciding whether to deploy the hack: our application depends on 0.17 features, but because of this bug we can't handle large datasets.

For what it's worth, we forked it and ran the unit tests locally and they passed. See the fork below by @nfcampos.

nfcampos pushed a commit to nfcampos/scikit-learn that referenced this issue Jan 21, 2016

@amueller amueller added this to the 0.17.1 milestone Jan 21, 2016

Member

amueller commented Jan 21, 2016

@ogrisel this one also for 0.17.1?

Member

ogrisel commented Jan 25, 2016

+1 working on it.

ogrisel added a commit to ogrisel/scikit-learn that referenced this issue Jan 25, 2016

ogrisel added a commit to ogrisel/scikit-learn that referenced this issue Jan 26, 2016

ogrisel added a commit to ogrisel/scikit-learn that referenced this issue Jan 26, 2016

ogrisel added a commit to ogrisel/scikit-learn that referenced this issue Jan 27, 2016

ogrisel added a commit to ogrisel/scikit-learn that referenced this issue Jan 27, 2016

ogrisel added a commit to ogrisel/scikit-learn that referenced this issue Jan 28, 2016

ogrisel added a commit to ogrisel/scikit-learn that referenced this issue Jan 28, 2016

ogrisel added a commit that referenced this issue Jan 29, 2016

Member

ogrisel commented Jan 29, 2016

Fixed in #6225, should be part of 0.17.1.

@ogrisel ogrisel closed this Jan 29, 2016

glemaitre added a commit to glemaitre/scikit-learn that referenced this issue Feb 13, 2016

glemaitre added a commit to glemaitre/scikit-learn that referenced this issue Feb 13, 2016

yarikoptic added a commit to yarikoptic/scikit-learn that referenced this issue Feb 19, 2016

Merge tag '0.17.1' into releases
* tag '0.17.1': (29 commits)
  Release 0.17.1
  MAINT remove non-existing cache folder in 0.17.X branch
  FIX cythonize TSNE
  MAINT simplify freeing logic for Barnes-Hut SNE memory leak fix
  Fix memory leak in Barnes-Hut SNE
  FIX check_build_doc.py false positive detections
  MAINT more informative output to circle/check_build_doc.py
  FIX fetch_california_housing
  FIX in randomized_svd flip sign
  Updated examples and tests that use scipy's lena
  DOC whats_new entry for scikit-learn#6258
  fix joblib error in LatentDirichletAllocation
  MAINT fix / speedup travis on 0.17.X
  MAINT Upgrade pip in appveyor and display version
  DOC missing changelog entry for scikit-learn#5857
  DOC add fix for scikit-learn#6147 to the changelog
  FIX 6147: ensure that AUC is always a float
  TST non-regression test for scikit-learn#6147, roc_auc on memmap data
  Added changelog entry about scikit-learn#6196
  Fix reading of bunch pickles
  ...

yarikoptic added a commit to yarikoptic/scikit-learn that referenced this issue Feb 19, 2016

Merge branch 'releases' into dfsg
* releases: (29 commits)

yarikoptic added a commit to yarikoptic/scikit-learn that referenced this issue Feb 19, 2016

Merge branch 'dfsg' into debian
* dfsg: (29 commits)

mrconway commented Apr 14, 2016

Issues with cross_val_score too

ValueError: scoring must return a number, got 0.9762644725410562 (<class 'numpy.core.memmap.memmap'>) instead.

sklearn.externals.joblib.my_exceptions.JoblibValueError: JoblibValueError

Member

lesteve commented Apr 15, 2016

@MarkRConway what's your scikit-learn version? It should be fixed in 0.17.1.

mrconway commented Apr 15, 2016

Yes, it is fixed. Thank you.

mannby pushed a commit to mannby/scikit-learn that referenced this issue Apr 22, 2016

mannby pushed a commit to mannby/scikit-learn that referenced this issue Apr 22, 2016

bjlkeng added a commit to bjlkeng/scikit-learn that referenced this issue Sep 12, 2016

apetresc added a commit to rubikloud/scikit-learn that referenced this issue Sep 20, 2016

apetresc added a commit to rubikloud/scikit-learn that referenced this issue Sep 20, 2016

TomDLT added a commit to TomDLT/scikit-learn that referenced this issue Oct 3, 2016

TomDLT added a commit to TomDLT/scikit-learn that referenced this issue Oct 3, 2016

eetuko commented Jun 20, 2017

This problem is back in 0.18; downgrading to 0.17.1 resolves it.
I don't know whether this is expected?

Member

amueller commented Jun 20, 2017

@eetuko can you please open a new issue with code to reproduce? And did you use 0.18 or 0.18.1?

ZeerakW commented Jul 3, 2017

@amueller I couldn't find any issue opened by eetuko, so I opened a new one: #9264

ShichengChen commented Sep 14, 2017

scikit-learn==0.19 works well
