[c++ vs python] performance comparisons between different language packages #207

Closed · wxchan opened this issue Jan 13, 2017 · 11 comments
@wxchan (Contributor) commented Jan 13, 2017

CPU: Intel(R) Core(TM) i7-4720HQ @ 2.60GHz
Memory: 8 GB
OS: Ubuntu 16.04
Python version: 2.7

The preprocessing stage is the same as in https://github.com/guolinke/boosting_tree_benchmarks.

Each dataset is run three times; all numbers in the tables are seconds for 500 iterations, where time = load-data time + train time (a rough timing sketch follows the tables below).

higgs: full log and config

| run | c++ (load + train) | python (load + train) | python-load-data-from-file |
|-----|--------------------|-----------------------|----------------------------|
| 1   | 47.07 + 951.39     | 387.13 + 980.36       | 1032.72                    |
| 2   | 67.58 + 982.06     | 398.33 + 945.63       | 1048.18                    |
| 3   | 46.47 + 982.29     | 403.52 + 999.51       | 1089.37                    |

yahoo: full log and config

| run | c++ (load + train) | python (load + train) | python-load-data-from-file |
|-----|--------------------|-----------------------|----------------------------|
| 1   | 47.07 + 542.64     | 177.97 + 530.40       | 509.66                     |
| 2   | 67.58 + 523.18     | 200.66 + 542.34       | 555.82                     |
| 3   | 46.47 + 553.55     | 212.78 + 531.90       | 574.10                     |
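For reference, here is a minimal sketch of how the Python timings above could be measured; `higgs.train` is a placeholder path for a LibSVM-format training file, and the params are illustrative, not the exact benchmark config:

```python
import time

import lightgbm as lgb
from sklearn.datasets import load_svmlight_file

# Load the LibSVM-format training file via sklearn; 'higgs.train'
# is a placeholder path, not the benchmark's actual file name.
t0 = time.time()
X, y = load_svmlight_file('higgs.train')
train_data = lgb.Dataset(X, label=y)
load_time = time.time() - t0

# Train for 500 iterations to match the tables above; the params
# are illustrative, not the exact benchmark config.
params = {'objective': 'binary', 'metric': 'auc'}
t0 = time.time()
booster = lgb.train(params, train_data, num_boost_round=500)
train_time = time.time() - t0

print('load: %.2fs, train: %.2fs' % (load_time, train_time))
```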

Looks like Python speed is as good as the C++ version. Is there anything wrong (especially with the number of threads; I use the default in all cases)?

BTW, the Python result is a little different from C++ because load_svmlight_file loses precision.

@guolinke (Collaborator)
@wxchan can you try loading from file in Python? I think the different result may be caused by this.
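For clarity, "loading from file" in the Python package just means passing a path string to `lgb.Dataset`, so LightGBM parses the file with its own C++ reader instead of going through `load_svmlight_file`. A minimal sketch, with `higgs.train` as a placeholder path and an illustrative objective:

```python
import lightgbm as lgb

# Passing a path string instead of an in-memory matrix lets
# LightGBM parse the file with its own C++ reader, bypassing
# load_svmlight_file entirely; 'higgs.train' is a placeholder path.
train_data = lgb.Dataset('higgs.train')
booster = lgb.train({'objective': 'binary'}, train_data,
                    num_boost_round=500)
```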

@wxchan (Contributor, Author) commented Jan 13, 2017

Do you mean loading data directly from file? It produces a different result.

log and script are here: https://github.com/wxchan/LightGBM/tree/cmp/performance/higgs

@guolinke (Collaborator)
@wxchan Did you use the latest code to rebuild the Python package? Maybe this time/result difference is caused by the float/double type.

@wxchan (Contributor, Author) commented Jan 13, 2017

I think I am using the latest code. Let me check it again.

@guolinke (Collaborator)
@wxchan Your accuracy result in the sh version is the same as in my benchmarks from about 2 months ago, so I think it may not be the latest code.

@wxchan (Contributor, Author) commented Jan 13, 2017

Right, 0.844997 should be the current result; I will update the log later.

wxchan changed the title from "performance comparison" to "performance comparisons between different language packages" Jan 13, 2017
@wxchan (Contributor, Author) commented Jan 14, 2017

@guolinke updated. The results of c++ and python-load-data-from-file are the same.

BTW, xgboost added a new tree-building algorithm similar to LightGBM's (dmlc/xgboost#1940). Would you like to test it? (There is not enough memory on my machine.)
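For context, the xgboost feature referenced here is the histogram-based tree method from dmlc/xgboost#1940. A minimal sketch of enabling it; `higgs.train` is a placeholder path and the other params are illustrative:

```python
import xgboost as xgb

# DMatrix can read a LibSVM-format text file directly;
# 'higgs.train' is a placeholder path.
dtrain = xgb.DMatrix('higgs.train')

# tree_method='hist' selects the histogram-based algorithm added
# in dmlc/xgboost#1940; the other params are illustrative only.
params = {'objective': 'binary:logistic', 'tree_method': 'hist'}
booster = xgb.train(params, dtrain, num_boost_round=500)
```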

@guolinke (Collaborator) commented Jan 14, 2017

@wxchan the new result seems much more reasonable now.
So the Python package is almost as fast as the raw C++ version; happy to know this.
Thanks for your benchmark!

I just compared it with the new xgboost_hist; refer to #211.

BTW, do you mean xgboost_hist still costs a lot in memory usage?

wxchan changed the title from "performance comparisons between different language packages" to "[c++ vs python] performance comparisons between different language packages" Jan 14, 2017
@Allardvm (Contributor)
I can write a Julia version of the data-loading and training scripts, if you're willing to add an unofficial language package to a comparison in the official repo. You'd have to run the script on your computer to make the results comparable, but I'd be happy to help out if you encounter any issues.

@wxchan (Contributor, Author) commented Jan 14, 2017

@Allardvm I have never used Julia before. It would be great if you could provide a script that can be run with a couple of commands. Or you can run the C++ version again on your machine for comparison; it is already provided in https://github.com/guolinke/boosting_tree_benchmarks.

@guolinke I tested the Python version. xgboost_hist still costs a lot in memory; it causes a memory error on my 8 GB machine with the higgs dataset.

wxchan closed this as completed Jan 22, 2017