[c++ vs python] performance comparisons between different language packages #207

Closed · wxchan opened this issue Jan 13, 2017 · 11 comments
@wxchan (Contributor) commented Jan 13, 2017

CPU: Intel(R) Core(TM) i7-4720HQ @ 2.60GHz
Memory: 8 GB
OS: Ubuntu 16.04
Python version: 2.7

The preprocessing stage is the same as in https://github.com/guolinke/boosting_tree_benchmarks.

Each dataset is run three times; all numbers in the tables are seconds for 500 iterations, where time = load-data time + train time (a rough timing sketch follows the tables below).

higgs: full log and config

| run | c++ (load + train) | python (load + train) | python-load-data-from-file |
|-----|--------------------|-----------------------|----------------------------|
| 1   | 47.07 + 951.39     | 387.13 + 980.36       | 1032.72                    |
| 2   | 67.58 + 982.06     | 398.33 + 945.63       | 1048.18                    |
| 3   | 46.47 + 982.29     | 403.52 + 999.51       | 1089.37                    |

yahoo: full log and config

| run | c++ (load + train) | python (load + train) | python-load-data-from-file |
|-----|--------------------|-----------------------|----------------------------|
| 1   | 47.07 + 542.64     | 177.97 + 530.40       | 509.66                     |
| 2   | 67.58 + 523.18     | 200.66 + 542.34       | 555.82                     |
| 3   | 46.47 + 553.55     | 212.78 + 531.90       | 574.10                     |
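For reference, here is a minimal sketch of how the Python timings above could be measured; `higgs.train` is a placeholder path for a LibSVM-format training file, and the params are illustrative, not the exact benchmark config:

```python
import time

import lightgbm as lgb
from sklearn.datasets import load_svmlight_file

# Load the LibSVM-format training file via sklearn; 'higgs.train'
# is a placeholder path, not the benchmark's actual file name.
t0 = time.time()
X, y = load_svmlight_file('higgs.train')
train_data = lgb.Dataset(X, label=y)
load_time = time.time() - t0

# Train for 500 iterations to match the tables above; the params
# are illustrative, not the exact benchmark config.
params = {'objective': 'binary', 'metric': 'auc'}
t0 = time.time()
booster = lgb.train(params, train_data, num_boost_round=500)
train_time = time.time() - t0

print('load: %.2fs, train: %.2fs' % (load_time, train_time))
```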

Looks like Python speed is as good as the C++ version. Is there anything wrong (especially with the number of threads; I use the default in all cases)?

BTW, the Python result is a little different from C++ because load_svmlight_file loses precision.

@guolinke (Collaborator)
@wxchan can you try loading from file in Python? I think the different result may be caused by this.
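For clarity, "loading from file" in the Python package just means passing a path string to `lgb.Dataset`, so LightGBM parses the file with its own C++ reader instead of going through `load_svmlight_file`. A minimal sketch, with `higgs.train` as a placeholder path and an illustrative objective:

```python
import lightgbm as lgb

# Passing a path string instead of an in-memory matrix lets
# LightGBM parse the file with its own C++ reader, bypassing
# load_svmlight_file entirely; 'higgs.train' is a placeholder path.
train_data = lgb.Dataset('higgs.train')
booster = lgb.train({'objective': 'binary'}, train_data,
                    num_boost_round=500)
```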

@wxchan (Contributor, Author) commented Jan 13, 2017

Do you mean loading data directly from file? It produces a different result.

log and script are here: https://github.com/wxchan/LightGBM/tree/cmp/performance/higgs

@guolinke (Collaborator)
@wxchan Did you use the latest code to rebuild the Python package? Maybe this time/result difference is caused by the float/double type.

@wxchan (Contributor, Author) commented Jan 13, 2017

I think I am using the latest code. Let me check it again.

@guolinke (Collaborator)
@wxchan Your accuracy result in the sh version is the same as in my benchmarks from about 2 months ago, so I think it may not be the latest code.

@wxchan (Contributor, Author) commented Jan 13, 2017

Right, 0.844997 should be the current result; I will update the log later.

wxchan changed the title from "performance comparison" to "performance comparisons between different language packages" Jan 13, 2017
@wxchan (Contributor, Author) commented Jan 14, 2017

@guolinke updated. The results of c++ and python-load-data-from-file are the same.

BTW, xgboost added a new tree-building algorithm similar to LightGBM's (dmlc/xgboost#1940). Would you like to test it? (There is not enough memory on my machine.)
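For context, the xgboost feature referenced here is the histogram-based tree method from dmlc/xgboost#1940. A minimal sketch of enabling it; `higgs.train` is a placeholder path and the other params are illustrative:

```python
import xgboost as xgb

# DMatrix can read a LibSVM-format text file directly;
# 'higgs.train' is a placeholder path.
dtrain = xgb.DMatrix('higgs.train')

# tree_method='hist' selects the histogram-based algorithm added
# in dmlc/xgboost#1940; the other params are illustrative only.
params = {'objective': 'binary:logistic', 'tree_method': 'hist'}
booster = xgb.train(params, dtrain, num_boost_round=500)
```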

@guolinke (Collaborator) commented Jan 14, 2017

@wxchan the new result seems much more reasonable now.
So the Python package is almost as fast as the raw C++ version; happy to know this.
Thanks for your benchmark!

I just compared it with the new xgboost_hist; refer to #211.

BTW, do you mean xgboost_hist still costs a lot in memory usage?

wxchan changed the title from "performance comparisons between different language packages" to "[c++ vs python] performance comparisons between different language packages" Jan 14, 2017
@Allardvm (Contributor)
I can write a Julia version of the data-loading and training scripts, if you're willing to add an unofficial language package to a comparison in the official repo. You'd have to run the script on your computer to make the results comparable, but I'd be happy to help out if you encounter any issues.

@wxchan (Contributor, Author) commented Jan 14, 2017

@Allardvm I have never used Julia before. It would be great if you could provide a script that can be run with a couple of commands. Or you can run the C++ version again on your machine for comparison; it is already provided in https://github.com/guolinke/boosting_tree_benchmarks.

@guolinke I tested the Python version. xgboost_hist still costs a lot in memory; it causes a memory error on my 8 GB machine with the higgs dataset.

wxchan closed this as completed Jan 22, 2017