[c++ vs python] performance comparisons between different language packages #207
Comments
@wxchan can you try loading from file in python? I think the different result may be caused by this.
do you mean loading data directly from file? it produces a different result. log and script are here: https://github.com/wxchan/LightGBM/tree/cmp/performance/higgs
@wxchan Did you use the latest code to rebuild the python package? Maybe this time/result difference is caused by the float/double type.
I think I am using the latest code. Let me check it again.
@wxchan your accuracy result in the sh version is the same as my benchmark from about 2 months ago, so I think it may not be the latest code.
right, 0.844997 should be the current result. I will update the log later.
@guolinke updated. The result of c++ and python-load-data-from-file is the same. btw, xgboost added a new tree building algorithm similar to lightgbm: dmlc/xgboost#1940. would you like to test it? (The memory is not enough on my machine.)
I can write a Julia version of the data-loading and training scripts, if you're willing to add an unofficial language package to a comparison on the official git. You'd have to run the script on your computer to make the results comparable, but I'd be happy to help out with that if you encounter any issues. |
@Allardvm I never used Julia before. It would be great if you could provide a script that can be run within a couple of commands. Or you can run the c++ version again on your machine for comparison. It is already provided in https://github.com/guolinke/boosting_tree_benchmarks. @guolinke I tested the python version. xgboost_hist still costs a lot of memory; it causes a memory error on my 8G machine with the higgs dataset.
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
CPU: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
Memory: 8G
OS: ubuntu 16.04
python version: 2.7
the preprocessing stage is the same as https://github.com/guolinke/boosting_tree_benchmarks
I ran each dataset three times; all numbers in the table represent seconds for 500 iterations; time = load data time + train time
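The timing split above (time = load data time + train time) can be sketched with a small stdlib helper. This is only an illustration of the measurement method: `load_data` and `train` are hypothetical placeholders, not the actual benchmark code.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical stand-ins for the real benchmark steps.
def load_data(path):
    return list(range(1000))  # placeholder for parsing the dataset file


def train(data, num_iterations=500):
    return sum(data) * num_iterations  # placeholder for 500 boosting rounds


data, load_time = timed(load_data, "higgs.train")
_, train_time = timed(train, data)
total_time = load_time + train_time  # the figure reported in the table
```

Summing the two phases separately makes it possible to tell whether a slowdown comes from data loading or from training itself.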
higgs: full log and config
yahoo: full log and config
Looks like python speed is as good as the c++ version. Is there anything wrong (especially with the number of threads; I use the default in all cases)?
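For reference, the thread count can be pinned explicitly rather than left at the default; a minimal excerpt of a LightGBM config file (the values here are illustrative, not the ones used in this benchmark) might look like:

```
task = train
objective = binary
num_iterations = 500
num_threads = 8
```

Setting `num_threads` the same way in both the c++ and python runs rules out threading as a source of timing differences.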
btw, the python result is a little different from c++ because load_svmlight_file loses precision.
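As a rough illustration of how that kind of precision loss can arise (a stdlib sketch of a double-to-single-precision round-trip, not the actual `load_svmlight_file` code path):

```python
import struct


def round_trip_float32(x):
    """Pack a Python float (double precision) into an IEEE-754
    single-precision float and back, discarding the extra bits."""
    return struct.unpack("f", struct.pack("f", x))[0]


value = 0.1  # not exactly representable in binary floating point
narrowed = round_trip_float32(value)
# narrowed differs from value around the 8th decimal place; differences
# of this size can shift split thresholds and nudge the final metric
```

Even a per-feature error of ~1e-8 is enough to change which side of a split threshold a sample falls on, which explains small metric differences between the two loaders.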