compared with xgboost new histogram based algorithm #211

Closed
guolinke opened this issue Jan 14, 2017 · 22 comments

@guolinke
Collaborator

guolinke commented Jan 14, 2017

xgboost just adapted the histogram-based idea from LightGBM (dmlc/xgboost#1950).
It is much faster now, so I ran a new comparison experiment: https://github.com/guolinke/boosting_tree_benchmarks

Environment

CPU: E5-2670 v3 * 2
Memory: 256GB DDR4 2133 MHz

Speed

Data      | xgboost   | xgboost_approx | xgboost_hist | LightGBM
----------|-----------|----------------|--------------|-------------
Higgs     | 3856.06 s | 2256.88 s      | 582.192 s    | 262.364174 s
Yahoo LTR | 671.803 s | 479.533 s      | 265.916 s    | 167.210277 s
MS LTR    | 1257.1 s  | 1078.87 s      | 384.832 s    | 255.209744 s

The gap is much smaller now; LightGBM is about 1x faster than xgboost_hist (about 2x in total time).

Accuracy

Higgs's AUC:

Metric | xgboost  | xgboost_approx | xgboost_hist | LightGBM
-------|----------|----------------|--------------|---------
AUC    | 0.839593 | 0.840521       | 0.845605     | 0.84611

ndcg at Yahoo LTR:

Metric  | xgboost  | xgboost_approx | xgboost_hist | LightGBM
--------|----------|----------------|--------------|---------
ndcg@1  | 0.719748 | 0.717839       | 0.720223     | 0.731098
ndcg@3  | 0.717813 | 0.718188       | 0.721519     | 0.736522
ndcg@5  | 0.737849 | 0.738389       | 0.739904     | 0.754748
ndcg@10 | 0.78089  | 0.780146       | 0.783013     | 0.796101

ndcg at MS LTR:

Metric  | xgboost  | xgboost_approx | xgboost_hist | LightGBM
--------|----------|----------------|--------------|---------
ndcg@1  | 0.483956 | 0.487025       | 0.488649     | 0.524228
ndcg@3  | 0.467951 | 0.468897       | 0.473184     | 0.50649
ndcg@5  | 0.472476 | 0.473395       | 0.477438     | 0.510901
ndcg@10 | 0.492429 | 0.49303        | 0.496967     | 0.527279
@Laurae2
Contributor

Laurae2 commented Jan 14, 2017

xgboost's approx / hist methods currently scale very poorly when it comes to multithreading. This is one of my test benches using Bosch (I can test other datasets); I can't even keep the CPU busier than 60% (a sparse 1M x 1K dataset is too small in this case):

image

The approx method is worse; it can't even use 25%.

Did you check the CPU usage while running with the approx and hist methods? For single-threading, I found xgboost (fast method) to be faster than LightGBM. But for multithreading, LightGBM always wins because xgboost doesn't scale linearly with histograms.
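
For reference, this is roughly how I time the scaling (a minimal sketch, not the exact benchmark code: it assumes the Python APIs of both libraries and a small synthetic dense dataset instead of Bosch):

```python
# Rough thread-scaling check: time 50 boosting rounds at several thread counts.
import time

import lightgbm as lgb
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200000, n_features=100, random_state=1)

for n_threads in (1, 6, 12):
    # LightGBM: leaf-wise growth, 255 leaves, histogram-based by default.
    t0 = time.perf_counter()
    lgb.train({"objective": "binary", "num_leaves": 255,
               "num_threads": n_threads, "verbose": -1},
              lgb.Dataset(X, label=y), num_boost_round=50)
    t_lgb = time.perf_counter() - t0

    # xgboost: tree_method=hist with lossguide growth to mirror LightGBM.
    t0 = time.perf_counter()
    xgb.train({"objective": "binary:logistic", "tree_method": "hist",
               "grow_policy": "lossguide", "max_leaves": 255, "max_depth": 0,
               "nthread": n_threads, "verbosity": 0},
              xgb.DMatrix(X, label=y), num_boost_round=50)
    t_xgb = time.perf_counter() - t0

    print(f"{n_threads:>2} threads  LightGBM {t_lgb:7.1f}s  xgboost hist {t_xgb:7.1f}s")
```

CPU usage during each run is then simply watched with a system monitor.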

@guolinke
Collaborator Author

@Laurae2 can you give more details about your test dataset? e.g. #data, #features, sparse / one-hot coding, and so on.

@wxchan
Contributor

wxchan commented Jan 14, 2017

I think it is https://www.kaggle.com/c/bosch-production-line-performance/data?

@guolinke did you try grow_policy=depthwise?

@Laurae2
Contributor

Laurae2 commented Jan 14, 2017

@guolinke test data: train_numeric.csv

  • 1,183,747 observations
  • 969 features, converted to sparse format for R
  • 81% sparse (217,925,677 elements)

I can't seem to find a way to download the Yahoo and MS 30K sets (both behind an email wall).

@guolinke
Collaborator Author

@Laurae2
I just ran some benchmarks with 50 iterations on one thread.

Higgs:

LightGBM: 306s
xgboost: 455s

lightgbm_higgs_speed.log.txt
xgboost_hist_higgs_speed.log.txt

I think your dataset is sparse, so the difference may be caused by the sparse data. I will investigate it.

@guolinke
Collaborator Author

@wxchan Try depth-wise? Why?

@Laurae2
Contributor

Laurae2 commented Jan 14, 2017

@guolinke xgboost uses only 30% CPU on Higgs.

I'll run 12 threads, 6 threads, and 1 thread to compare all this on Higgs.

image

It seems to run much faster than your benchmark (for 12 threads, other results incoming soon). Here is a sample for Higgs:

image

Total time: update end, 479.402 sec in all. I'll re-run to check whether I did something wrong. I followed all the instructions in your repo.

Which CPU do you use? (I use i7-3930K in my case)

This is yours (did I make a mistake in the file name?):

[10:41:23] boosting round 351, 590.215 sec elapsed
[10:41:25] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=16
[10:41:25] [351]
[10:41:25] boosting round 352, 592.28 sec elapsed
[10:41:26] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=19
[10:41:27] [352]
[10:41:27] boosting round 353, 593.606 sec elapsed
[10:41:27] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=21
[10:41:27] [353]
[10:41:27] boosting round 354, 594.461 sec elapsed
[10:41:28] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=22
[10:41:28] [354]
[10:41:28] boosting round 355, 595.474 sec elapsed
[10:41:29] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=18
[10:41:29] [355]
[10:41:29] boosting round 356, 596.287 sec elapsed
[10:41:30] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=18
[10:41:30] [356]
[10:41:30] boosting round 357, 597.343 sec elapsed
[10:41:31] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=25
[10:41:31] [357]
[10:41:31] boosting round 358, 597.991 sec elapsed
[10:41:31] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=29
[10:41:32] [358]

@guolinke
Collaborator Author

guolinke commented Jan 14, 2017

@Laurae2

2 x E5-2680 v2, DDR3 1600 MHz, 256GB (memory speed affects training speed as well),
run with 16 threads.

You can also run LightGBM with the same settings and see what happens.

@Laurae2
Contributor

Laurae2 commented Jan 14, 2017

@guolinke I also use DDR3 1600 MHz (64GB in my case).

My benchmarks on Higgs:

  • xgboost fast, 1 thread: 1136.12s
  • xgboost fast, 6 threads: 488.22s
  • xgboost fast, 12 threads: 477.86s
  • lightgbm, 1 thread: 2437.73s
  • lightgbm, 6 threads: 592.61s
  • lightgbm, 12 threads: 546.24s

Something I don't understand when I use xgboost is this:

3086062x28 matrix with 86409724 entries loaded from ../data/higgs.train

While you have:

10500000x28 matrix with 294000000 entries loaded from ../data/higgs.train

I'm using Python 3.5, so the original Python script to create the libsvm files does not work. Instead I'm using this:

input_filename = "HIGGS.csv"
output_train = "higgs.train"
output_test = "higgs.test"

num_train = 10500000  # first 10.5M rows go to the training file, the rest to test

read_num = 0

input_file = open(input_filename, "r")
train = open(output_train, "w")
test = open(output_test, "w")

def WriteOneLine(tokens, output):
    # tokens[0] is the label; the remaining columns become 0-based libsvm index:value pairs.
    label = int(float(tokens[0]))
    output.write(str(label))
    for i in range(1, len(tokens)):
        feature_value = float(tokens[i])
        output.write(' ' + str(i - 1) + ':' + str(feature_value))
    output.write('\n')

line = input_file.readline()

while line:
    tokens = line.split(',')
    if read_num < num_train:
        WriteOneLine(tokens, train)
    else:
        WriteOneLine(tokens, test)
    read_num += 1
    if read_num % 1000 == 0:
        print(read_num)
    line = input_file.readline()

input_file.close()
train.close()
test.close()

It does go through all 11M lines:

$ wc -l HIGGS.csv
11000000 HIGGS.csv
$ wc -l HIGGS.train
10500000 HIGGS.train

Is this normal behavior? My higgs.train is 6,082,744,083 bytes and HIGGS.csv is 8,035,497,980 bytes. I downloaded and created the libsvm files 3 times to triple-check, with the same result.

higgs.train SHA-256:

image

First line of my HIGGS.train and HIGGS.csv:

$ head -n 1 HIGGS.train
1 0:0.869293212890625 1:-0.6350818276405334 2:0.22569026052951813 3:0.327470064163208 4:-0.6899932026863098 5:0.7542022466659546 6:-0.24857313930988312 7:-1.0920639038085938 8:0.0 9:1.3749921321868896 10:-0.6536741852760315 11:0.9303491115570068 12:1.1074360609054565 13:1.138904333114624 14:-1.5781983137130737 15:-1.046985387802124 16:0.0 17:0.657929539680481 18:-0.010454569943249226 19:-0.0457671694457531 20:3.101961374282837 21:1.353760004043579 22:0.9795631170272827 23:0.978076159954071 24:0.9200048446655273 25:0.7216574549674988 26:0.9887509346008301 27:0.8766783475875854
$ head -n 1 HIGGS.csv
1.000000000000000000e+00,8.692932128906250000e-01,-6.350818276405334473e-01,2.256902605295181274e-01,3.274700641632080078e-01,-6.899932026863098145e-01,7.542022466659545898e-01,-2.485731393098831177e-01,-1.092063903808593750e+00,0.000000000000000000e+00,1.374992132186889648e+00,-6.536741852760314941e-01,9.303491115570068359e-01,1.107436060905456543e+00,1.138904333114624023e+00,-1.578198313713073730e+00,-1.046985387802124023e+00,0.000000000000000000e+00,6.579295396804809570e-01,-1.045456994324922562e-02,-4.576716944575309753e-02,3.101961374282836914e+00,1.353760004043579102e+00,9.795631170272827148e-01,9.780761599540710449e-01,9.200048446655273438e-01,7.216574549674987793e-01,9.887509346008300781e-01,8.766783475875854492e-01
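
One extra sanity check would be to parse the generated file with an independent libsvm reader and compare its shape against wc -l (a minimal sketch using scikit-learn; the file name and n_features=28 are just this benchmark's values):

```python
# Parse higgs.train independently and compare with what xgboost reports at load time.
from sklearn.datasets import load_svmlight_file

# zero_based=True because the script above writes feature indices starting at 0.
X, y = load_svmlight_file("higgs.train", n_features=28, zero_based=True)

print("rows:", X.shape[0])       # expected 10500000, not 3086062
print("features:", X.shape[1])   # expected 28
print("entries:", X.nnz)         # roughly 28 per row if explicit zeros are kept
```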

@guolinke
Collaborator Author

@Laurae2 what is the data information output by your LightGBM? https://github.com/guolinke/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_higgs_speed.log#L4
If it is 10500000 * 28, this file is correct.
If not, maybe you are using a wrong parameter (e.g. default bagging?) for xgboost.

@Laurae2
Contributor

Laurae2 commented Jan 14, 2017

@guolinke I have exactly the same line as yours.

https://github.com/Laurae2/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_higgs_speed_01.log#L4

My params are also identical to yours, except I changed:

  • xgboost file path
  • number of threads

Default bagging in xgboost is 1.00 (use all data). I compiled xgboost from source. When I run your sh code as-is (after removing everything I can't run), I get the same issue.

My lines 3086061 to 3086063 of higgs.train do not seem malformed; I don't get why xgboost does not want to go any further. It's really strange:

$ sed "3086061q;d" higgs.train
1 0:0.718676745891571 1:-1.681124210357666 2:0.8171724081039429 3:0.7752423286437988 4:0.7791789174079895 5:1.2763633728027344 6:-1.0219426155090332 7:-1.028310775756836 8:2.1730761528015137 9:1.1680066585540771 10:-0.8333783149719238 11:0.6767918467521667 12:2.214872121810913 13:1.0481081008911133 14:-0.4995536506175995 15:-1.4125559329986572 16:0.0 17:0.9711945652961731 18:0.25355038046836853 19:0.16131795942783356 20:0.0 21:1.2038894891738892 22:1.117095708847046 23:0.9722089767456055 24:1.2790273427963257 25:1.8023028373718262 26:1.0590590238571167 27:0.9368829131126404
$ sed "3086062q;d" higgs.train
1 0:0.47252947092056274 1:-0.3263337016105652 2:-0.512377917766571 3:1.7292355298995972 4:-0.6069088578224182 5:1.1556251049041748 6:-2.0052411556243896 7:1.2037097215652466 8:0.0 9:0.585577130317688 10:-0.7906379103660583 11:-1.1281750202178955 12:2.214872121810913 13:0.3222651481628418 14:-1.3442645072937012 15:1.2070095539093018 16:2.548224449157715 17:1.0775635242462158 18:-0.7874786257743835 19:0.2550981044769287 20:0.0 21:1.6085612773895264 22:0.8385911583900452 23:0.9832923412322998 24:0.9576324820518494 25:0.42687857151031494 26:0.7787665128707886 27:1.3860973119735718
$ sed "3086063q;d" higgs.train
0 0:0.7380757331848145 1:-0.917532742023468 2:-0.3181764781475067 3:1.739646553993225 4:-0.41811928153038025 5:0.7752718329429626 6:0.032652150839567184 7:0.9170976877212524 8:0.0 9:1.2407790422439575 10:-0.7255558371543884 11:1.4668693542480469 12:1.1074360609054565 13:1.0872716903686523 14:0.4216180145740509 15:-1.2772005796432495 16:1.2741122245788574 17:0.49582037329673767 18:2.2756452560424805 19:0.9787034392356873 20:0.0 21:1.703389286994934 22:1.3028173446655273 23:1.0018014907836914 24:1.3815209865570068 25:1.1019604206085205 26:1.150844931602478 27:1.1618263721466064

@Laurae2
Contributor

Laurae2 commented Jan 14, 2017

I went and created the matrix (binary format) for xgboost using R. All my attempts using the libsvm format ended with nearly the same issue (only the row count read by xgboost changes; I tried Python 2.7 and 3.5...).

Now xgboost works properly and matches your runs for AUC. I still have to test for speed.
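
(For completeness, the same idea in Python, since the conversion script above is Python: read the CSV directly and save xgboost's binary buffer, skipping the libsvm text path entirely. Just a sketch; np.loadtxt is slow on an 8 GB CSV, but it shows the route.)

```python
import numpy as np
import xgboost as xgb

# First column of HIGGS.csv is the label, the remaining 28 columns are features.
data = np.loadtxt("HIGGS.csv", delimiter=",")
n_train = 10500000

dtrain = xgb.DMatrix(data[:n_train, 1:], label=data[:n_train, 0])
dtest = xgb.DMatrix(data[n_train:, 1:], label=data[n_train:, 0])

# xgboost can reload these binary buffers directly, with no text parsing involved.
dtrain.save_binary("higgs.train.buffer")
dtest.save_binary("higgs.test.buffer")
```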


My new results.

Setup:

  • Intel i7-3930K
  • 64GB RAM, 1600 MHz

Speed:

  • xgboost fast, 1 thread: 3600.66s
  • xgboost fast, 6 threads: 1147.86s
  • xgboost fast, 12 threads: 916.89s
  • lightgbm, 1 thread: 2437.73s
  • lightgbm, 6 threads: 592.61s
  • lightgbm, 12 threads: 546.24s

AUC:

  • xgboost fast: 0.845617
  • lightgbm: 0.845319

Running depthwise soon.

@wxchan
Contributor

wxchan commented Jan 15, 2017

@guolinke xgboost depthwise is faster; we can compare the best performance of xgboost to LightGBM.

Anyway, is there any conclusion here? According to the newly updated graphs in dmlc/xgboost#1950, xgboost has really good performance on the allstate dataset.

@guolinke
Collaborator Author

@wxchan
depth-wise is always faster, because it passes over much less data when growing the same number of leaves.

xgboost's algorithm is better for sparse data, and LightGBM is better for dense data.
However, xgboost's algorithm needs a lot of temporary space as #threads grows, which limits its speed-up in multi-threading.
As for the allstate dataset, it is all one-hot features, so LightGBM can actually use its categorical feature support to achieve a speed-up.

I will try to reduce the time cost of sparse features in LightGBM as well.
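
For readers following along, these are roughly the knobs being compared (a sketch of the relevant parameters only; the exact values used in the benchmark are in the linked repo's configs):

```python
# xgboost, histogram algorithm with leaf-wise (LightGBM-like) growth.
xgb_hist_lossguide = {
    "tree_method": "hist",
    "grow_policy": "lossguide",
    "max_leaves": 255,
    "max_depth": 0,          # no depth limit when growing leaf-wise
}

# xgboost, histogram algorithm with level-by-level (depth-wise) growth.
xgb_hist_depthwise = {
    "tree_method": "hist",
    "grow_policy": "depthwise",
    "max_depth": 8,          # at most 2^8 = 256 leaves, a comparable budget
}

# LightGBM grows leaf-wise by default; num_leaves plays the role of max_leaves.
lightgbm_params = {
    "num_leaves": 255,
    "max_bin": 255,          # histogram resolution
}
```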

@guolinke
Collaborator Author

@Laurae2
can you try #216 on your Bosch dataset?

@Laurae2
Contributor

Laurae2 commented Jan 15, 2017

@guolinke Will test for Bosch dataset.

@wxchan I got reversed results for depthwise (I could test on Bosch too if needed). See the tables below for Higgs:


Speed:

Model    | Mode             | Threads | Speed (s)
---------|------------------|---------|----------
xgboost  | Fast + Lossguide | 1       | 3600.66
xgboost  | Fast + Lossguide | 6       | 1147.86
xgboost  | Fast + Lossguide | 12      | 916.89
xgboost  | Fast + Depthwise | 1       | 3891.74
xgboost  | Fast + Depthwise | 6       | 1312.95
xgboost  | Fast + Depthwise | 12      | 1038.30
LightGBM | None             | 1       | 2437.73
LightGBM | None             | 6       | 592.61
LightGBM | None             | 12      | 546.24

AUC:

Model    | Mode             | AUC
---------|------------------|---------
xgboost  | Fast + Lossguide | 0.845617
xgboost  | Fast + Depthwise | 0.843080
LightGBM | None             | 0.845319

@wxchan
Contributor

wxchan commented Jan 15, 2017

@Laurae2 your result is actually the same. I read your log: for 100 iterations, depthwise takes 227.867 s and lossguide takes 278.813 s. As that thread said, it was tested only on the first dozen iterations. (dmlc/xgboost#1950 (comment))
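
(To compare runs over the same number of iterations, the per-round timings can be pulled out of the "boosting round N, T sec elapsed" lines in the logs with something like this sketch; the file name is just the log attached earlier.)

```python
import re

pattern = re.compile(r"boosting round (\d+), ([\d.]+) sec elapsed")

elapsed = {}  # round number -> cumulative seconds elapsed
with open("xgboost_hist_higgs_speed.log.txt") as log:
    for line in log:
        match = pattern.search(line)
        if match:
            elapsed[int(match.group(1))] = float(match.group(2))

if 100 in elapsed:
    print("after 100 rounds:", elapsed[100], "sec,",
          elapsed[100] / 100, "sec/round on average")
```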

@Allardvm
Contributor

Just chiming in to note that, although comparing performance with XGBoost set to tree_method = hist and grow_policy = lossguide seems fair, we should keep in mind that XGBoost does some additional processing to handle missing values (see #122). This can have a rather large impact on performance, so we shouldn't expect XGBoost to come out on top in terms of speed.

@guolinke
Collaborator Author

guolinke commented Jan 23, 2017

@Allardvm The most time-consuming part of the histogram algorithm is building the histograms.
Best-split finding over a histogram actually costs very little (about O(#bin), with #bin often equal to 255).
So even with two rounds of best-split finding, it doesn't really impact the speed.
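
To make the two costs concrete, here is a rough sketch of histogram-based split finding for a single feature (a simplified gradient-boosting gain formula, not LightGBM's actual code):

```python
import numpy as np

def build_histogram(bin_idx, grad, hess, n_bins=255):
    # The expensive part: O(#data) -- one pass over the rows in the node,
    # accumulating gradient/hessian sums per bin.
    g_hist = np.zeros(n_bins)
    h_hist = np.zeros(n_bins)
    np.add.at(g_hist, bin_idx, grad)
    np.add.at(h_hist, bin_idx, hess)
    return g_hist, h_hist

def best_split(g_hist, h_hist, lam=1.0):
    # The cheap part: O(#bin) -- cumulative sums give left/right statistics
    # for every candidate threshold in one scan.
    g_total, h_total = g_hist.sum(), h_hist.sum()
    g_left, h_left = np.cumsum(g_hist), np.cumsum(h_hist)
    g_right, h_right = g_total - g_left, h_total - h_left
    gain = (g_left ** 2 / (h_left + lam)
            + g_right ** 2 / (h_right + lam)
            - g_total ** 2 / (h_total + lam))
    best_bin = int(np.argmax(gain[:-1]))  # last bin would put all data on one side
    return best_bin, float(gain[best_bin])
```

Running the O(#bin) scan twice (for example once per missing-value direction) only doubles the cheap part, which is why it barely shows up in the timings.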

@Laurae2
Contributor

Laurae2 commented Jan 23, 2017

@Allardvm for a feature, xgboost only tests the aggregated NA values against the lowest and highest values. That cost is negligible, even with 99% sparsity.

@guolinke
Collaborator Author

Closing now. I will give a new comparison based on LightGBM v2.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023