Too much memory consumption by xgboost #13
Ran into this with the GPU implementation of XGBoost: consider deleting previous booster objects. They tend to keep data around.
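The suggestion above can be sketched as a search loop that keeps only the small run results and explicitly releases each booster before the next one is trained. This is a minimal, dependency-free sketch: `train_model` and its `Booster` class are hypothetical stand-ins for a real `xgb.train` call, not the actual XGBoost API.

```python
import gc

def train_model(params):
    # Hypothetical stand-in for xgb.train(); returns an object that,
    # like a real booster, holds on to a large chunk of memory.
    class Booster:
        def __init__(self):
            self.cached_data = [0.0] * 1_000_000  # simulates retained training data
        def best_score(self):
            return params["eta"]  # dummy metric for the sketch
    return Booster()

def search(param_grid):
    results = []
    for params in param_grid:
        booster = train_model(params)
        # Keep only the small run result, not the booster itself.
        results.append((params, booster.best_score()))
        # Drop the booster and force a collection so the retained
        # data does not accumulate across runs.
        del booster
        gc.collect()
    return results

scores = search([{"eta": 0.1}, {"eta": 0.3}])
```

The key point is that nothing in `results` references the booster, so each one becomes collectable as soon as its run finishes.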
@tRosenflanz thank you for the suggestion. I had a hard time today figuring out how to fix it. I will try this and let you know how it goes.
Having the same issue. Would a dump of the booster model to disk, while keeping just the run results, work? With a 900 MB dataset, running 10 models for max 300 seconds each consumes 400 GB of RAM. Thanks for a great tool.
@tmontana would you send me some more details so I can reproduce the issue?
Rows: 391,032

Arguments used:

```python
%%time
model_types = ['Xgboost']
automl._validation = {"validation_type": "kfold", "k_folds": 15, "shuffle": False, "stratify": True}
```

Result on a machine with 384 GB of RAM:

```
ERROR ~/anaconda3/envs/mlj_shap/lib/python3.6/site-packages/xgboost/core.py in _maybe_pandas_data(data, feature_names, feature_types)
MemoryError:
```
And this fails after 22 models on a machine with 784 GB of RAM:

```python
automl = AutoML(
    total_time_limit=None,
    learner_time_limit=30,
    algorithms=model_types,
    train_ensemble=True,
    start_random_models=30,
    hill_climbing_steps=5,
    top_models_to_improve=3,
)
```
Forgot to mention, but there is no preprocessing: all features are numeric and there are no missing values.
It is a serious problem. I'm thinking about a major change in the code so that it does not keep all models in RAM. It will take me some time to rewrite the package, though. In the meantime, I can offer you help through my web service: if you want, you can try to tune models at https://mljar.com. Please set up an account and I will give you as many free credits as needed to tune models on your dataset. Apologies for the problems!
Hi Piotr: no worries - I'm already a client of the web platform. Thanks |
Please send me your username or email at contact@mljar.com so I can find your account and grant credits for computation.
I've investigated the memory consumption by xgboost. You can find a notebook with an example here. I've submitted a ticket to xgboost (dmlc/xgboost#5474) asking for ways to limit memory usage. For now, the workaround is to save the model to the hard drive and then load it back.
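The save-then-reload workaround described above can be sketched as follows. With real XGBoost one would use `booster.save_model(path)` and `xgb.Booster()` plus `load_model(path)`; to keep the sketch dependency-free, `FakeBooster` below is a hypothetical stand-in that drops its cached training data on serialization, roughly mimicking the fact that a model file does not carry the in-memory data copies.

```python
import gc
import os
import pickle
import tempfile

class FakeBooster:
    # Hypothetical stand-in for an xgboost Booster that retains training data.
    def __init__(self, score):
        self.score = score
        self.cached_data = [0.0] * 1_000_000  # simulates retained data

    def __getstate__(self):
        # Serialize only the small model state, not the cached data.
        return {"score": self.score}

    def __setstate__(self, state):
        self.score = state["score"]
        self.cached_data = None  # not restored on load

def park_on_disk(booster, path):
    # Write the model to disk, then drop this reference and collect;
    # the caller must also drop its own reference for memory to be freed.
    with open(path, "wb") as f:
        pickle.dump(booster, f)
    del booster
    gc.collect()

def reload_from_disk(path):
    with open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
park_on_disk(FakeBooster(0.91), path)
restored = reload_from_disk(path)
```

The reloaded model keeps its scores and learned state but no longer pins the large training buffers in RAM, which is why this pattern helps when many models are trained in sequence.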
It seems like this is an ongoing issue for xgboost. From what I can read, they have solved it for GPUs.
@tmontana I've made a few improvements in the package. There are still a lot of things to be added, but I hope memory consumption is no longer so huge.
When running several xgboost algorithms in a row with a dataset larger than 100 MB, RAM consumption grows very fast. Looks like a bug.