
GridSearchCV with xgboost estimator hangs when n_jobs!=1 #6627

Closed
vzocca opened this Issue Apr 5, 2016 · 16 comments

5 participants

vzocca commented Apr 5, 2016

I don't know if this is related to #6147. I am using scikit-learn 0.18.dev0, and I get no exception, so this seems to be a different issue.

In any case, this is my code (the data comes from the Santander Kaggle competition and is too big to attach):

from sklearn.grid_search import GridSearchCV
from xgboost import XGBClassifier

# train_data, predictors and target come from the Santander train.csv (not attached)
alg = XGBClassifier(max_depth=4, min_child_weight=1, n_estimators=1000, learning_rate=0.0202, gamma=0,
                    nthread=4, subsample=0.6815, colsample_bytree=0.701, seed=1, silent=False)

param_test1 = {
    'max_depth': range(3, 10, 2),
    'min_child_weight': range(1, 10, 2)
}

gsearch1 = GridSearchCV(estimator=alg, param_grid=param_test1, scoring='roc_auc', iid=False, n_jobs=4, cv=5)
gsearch1.fit(train_data[predictors].as_matrix(), train_data[target].as_matrix())

The program does not crash and does not throw an exception, but it does not do anything either (Activity Monitor shows no activity). Quick debugging shows that the program enters _fit in grid_search.py but never reaches line 564; I did not debug further. A quick search brought me to issue #6147, so I tried removing the n_jobs parameter.

Removing n_jobs from the GridSearchCV call solves the issue.

Member

lesteve commented Apr 6, 2016

> I don't know if this is related to #6147. I am using "The scikit-learn version is 0.18.dev0" and I have no exception though, so this is different.

This is very likely a completely different issue indeed.

Could you post the full code that you are using? It looks like the Santander dataset from Kaggle is not that big, so we could download it and see whether we can reproduce the problem. I am assuming this is the dataset you are talking about: https://www.kaggle.com/c/santander-customer-satisfaction/data.

vzocca commented Apr 6, 2016

Yes, that is the dataset.
There is not much more code beyond what I posted. I do some data manipulation (removing constant columns and duplicates), but nothing more, and the issue remains even if you skip it.

So:

import pandas
from sklearn.grid_search import GridSearchCV
from xgboost import XGBClassifier

# load data
target = "TARGET"
train_data = pandas.read_csv("Data/train.csv")
test_data = pandas.read_csv("Data/test.csv")

predictors = test_data.columns.values.tolist()
predictors.remove("ID")

# define alg
alg = XGBClassifier(max_depth=4, min_child_weight=1, n_estimators=1000, learning_rate=0.0202,
                    gamma=0, nthread=4, subsample=0.6815, colsample_bytree=0.701, seed=1)

# define the params range
param_test1 = {
    'max_depth': range(3, 10, 2),
    'min_child_weight': range(1, 10, 2)
}

# define the GridSearchCV using the alg defined above
gsearch1 = GridSearchCV(estimator=alg, param_grid=param_test1, scoring='roc_auc', n_jobs=4, iid=False, cv=5)

# fit using the train_data on the target values
gsearch1.fit(train_data[predictors].as_matrix(), train_data[target].as_matrix())
print(gsearch1.grid_scores_)
print(gsearch1.best_params_)
print(gsearch1.best_score_)

As I mentioned, I am using scikit-learn 0.18.dev0; I don't know whether the same thing happens with previous versions.

Member

lesteve commented Apr 6, 2016

Great, thanks! General comment: the easier you make it to reproduce your problem, the better the feedback you'll get.

Minor comment: you can add "python" after the opening triple backquotes to get syntax highlighting in your snippet; see this link.

Member

lesteve commented Apr 6, 2016

OK, I am guessing you are using xgboost, and there may be a bad interaction between the xgboost thread pool and multiprocessing from the Python stdlib. You can find a few more details here: https://pythonhosted.org/joblib/parallel.html#bad-interaction-of-multiprocessing-and-third-party-libraries

A few things you could try to see whether the issue goes away:

  • use nthread=1 in XGBClassifier
  • use Python 3 and set the joblib start method to forkserver as mentioned in the previous link
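To make the suspected mechanism more concrete, here is a minimal stdlib-only sketch (POSIX only, since it calls os.fork). It is an analogy, not xgboost's actual code: a Python threading.Lock stands in for the internal state of a native thread pool such as the OpenMP one xgboost is built with. If a process forks while another thread holds a lock, the child gets a copy of the locked lock but not the thread that would release it, so anything in the child that waits on it blocks forever. The helper name child_can_acquire is hypothetical, and the child uses a timeout so the demo reports the problem instead of hanging.

```python
import os
import threading


def child_can_acquire(lock_held_by_other_thread, timeout=0.5):
    """Fork, then report whether the child process can acquire a shared lock.

    Hypothetical demo helper (not part of any library); POSIX only.
    """
    lock = threading.Lock()
    held = threading.Event()     # set once the helper thread owns the lock
    release = threading.Event()  # tells the helper thread to let go
    helper = None

    if lock_held_by_other_thread:
        def hold_lock():
            with lock:
                held.set()
                release.wait()

        helper = threading.Thread(target=hold_lock)
        helper.start()
        held.wait()  # guarantee the lock is held at the moment of fork()

    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() exists here. If the lock was
        # copied in the locked state, no thread in this process can release it.
        acquired = lock.acquire(timeout=timeout)
        os._exit(0 if acquired else 1)

    _, status = os.waitpid(pid, 0)
    if helper is not None:
        release.set()
        helper.join()
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    # Roughly like nthread=1: no other thread is busy when fork() happens.
    print("single-threaded parent:", child_can_acquire(False))
    # Roughly like a busy native thread pool at fork time: the child is stuck.
    print("lock held by another thread:", child_can_acquire(True))
```

The first call mirrors the nthread=1 work-around (nothing else is running when the worker processes are forked); the second mirrors a thread pool that is active at fork time. Which native locks or threads xgboost's OpenMP runtime actually depends on is not something this sketch shows.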
vzocca commented Apr 6, 2016

Thank you, lesteve. Removing n_jobs from the GridSearchCV call fixed the issue. I might try your suggestions in the future, but for now I am happy with this. I just wanted to make you aware of the issue and possibly help others who run into the same problem by suggesting a temporary workaround. Thank you.

Member

lesteve commented Apr 6, 2016

Would you be kind enough to try whether setting nthread=1 in XGBClassifier fixes it as well?

This would make us more confident that we understand the source of the problem. Unfortunately I don't have xgboost installed, and I would like to avoid spending too much time on this.

Member

lesteve commented Apr 6, 2016

Probably relevant, from https://github.com/dmlc/xgboost/tree/master/python-package#note:

> If you want to run XGBoost process in parallel using the fork backend for joblib/multiprocessing, you must build XGBoost without support for OpenMP by make no_omp=1. Otherwise, use the forkserver (in Python 3.4) or spawn backend. See the sklearn_parallel.py demo.
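The note's suggestion to use the forkserver or spawn start method can be illustrated with the standard library alone. This is a sketch, not joblib's configuration API (how to select the start method in joblib depends on the joblib version; see the joblib link above). It picks forkserver when the platform offers it, falls back to spawn, and runs a small pool whose workers do not start as a plain fork of the possibly multi-threaded parent. The helper name safe_context is hypothetical.

```python
import multiprocessing


def safe_context():
    """Return a multiprocessing context that avoids forking the current process.

    Hypothetical helper: prefers "forkserver" (Python 3.4+, POSIX) and
    falls back to "spawn", which is available on every platform.
    """
    methods = multiprocessing.get_all_start_methods()
    method = "forkserver" if "forkserver" in methods else "spawn"
    return multiprocessing.get_context(method)


if __name__ == "__main__":
    ctx = safe_context()
    print("start method:", ctx.get_start_method())
    # The __main__ guard matters: with spawn/forkserver the main module can be
    # re-imported in the new processes, so unguarded top-level code runs again.
    with ctx.Pool(processes=2) as pool:
        print(pool.map(abs, [-1, -2, 3]))
```

Note that Python 2.7, the reporter's setup, has no start-method selection: multiprocessing always forks on POSIX there, which is why the work-arounds in this thread (n_jobs=1 or nthread=1) are the practical options on that version.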

vzocca commented Apr 6, 2016

Of course, I'd be happy to help. Yes, setting nthread=1 solves the issue as well. I don't have Python 3, and at this point I'd rather not mess with my Python installation. By the way, I am using macOS and found issue #5115. I don't know enough about the system; could it be related?

Member

lesteve commented Apr 6, 2016

Good to know that setting nthread=1 fixes it for you.

Don't worry about testing the solution proposed for Python 3. From the xgboost note mentioned above, I am reasonably confident that it would work.

I am afraid you'll have to live with the work-around for now (either setting n_jobs=1 or nthread=1). From what I understand, this is a fundamental limitation of multiprocessing in Python.

vzocca commented Apr 6, 2016

I am happy with the work-around. I just think it is useful to document it. Thank you for your time.

Member

raghavrv commented Apr 7, 2016

@lesteve should we close this? @TomDLT

Member

lesteve commented Apr 8, 2016

> @lesteve should we close this? @TomDLT

I think so.

Member

lesteve commented Apr 8, 2016

If someone could edit the title to something like "GridSearchCV with xgboost estimator hangs when n_jobs!=1", that would be even better.

vzocca commented Apr 8, 2016

Is there a list of known issues? Should this be added there, or should a note be added somewhere explaining why it cannot be solved?

TomDLT changed the title from "Issue with n_jobs in GridSearchCV" to "GridSearchCV with xgboost estimator hangs when n_jobs!=1" on Apr 8, 2016

Member

amueller commented Oct 5, 2016

We sometimes add things like this to the FAQ. Or we could actually have a list of known problems.

QtRoS commented Mar 8, 2017

I can confirm this!
Thanks for the workaround.
