compared with xgboost new histogram based algorithm #211

Closed
guolinke opened this issue Jan 14, 2017 · 22 comments

@guolinke
Collaborator

guolinke commented Jan 14, 2017

xgboost just adapted the histogram-based idea from LightGBM (dmlc/xgboost#1950).
It is much faster now, so I ran a new comparison experiment: https://github.com/guolinke/boosting_tree_benchmarks

Environment

CPU: E5-2670 v3 * 2
Memory: 256GB DDR4 2133 MHz

Speed

Data      | xgboost   | xgboost_approx | xgboost_hist | LightGBM
----------|-----------|----------------|--------------|-------------
Higgs     | 3856.06 s | 2256.88 s      | 582.192 s    | 262.364174 s
Yahoo LTR | 671.803 s | 479.533 s      | 265.916 s    | 167.210277 s
MS LTR    | 1257.1 s  | 1078.87 s      | 384.832 s    | 255.209744 s

The gap is much smaller now; LightGBM is about 1x faster than xgboost_hist (about 2x in total time).

Accuracy

Higgs's AUC:

Metric | xgboost  | xgboost_approx | xgboost_hist | LightGBM
-------|----------|----------------|--------------|---------
AUC    | 0.839593 | 0.840521       | 0.845605     | 0.84611

ndcg at Yahoo LTR:

Metric  | xgboost  | xgboost_approx | xgboost_hist | LightGBM
--------|----------|----------------|--------------|---------
ndcg@1  | 0.719748 | 0.717839       | 0.720223     | 0.731098
ndcg@3  | 0.717813 | 0.718188       | 0.721519     | 0.736522
ndcg@5  | 0.737849 | 0.738389       | 0.739904     | 0.754748
ndcg@10 | 0.78089  | 0.780146       | 0.783013     | 0.796101

ndcg at MS LTR:

Metric  | xgboost  | xgboost_approx | xgboost_hist | LightGBM
--------|----------|----------------|--------------|---------
ndcg@1  | 0.483956 | 0.487025       | 0.488649     | 0.524228
ndcg@3  | 0.467951 | 0.468897       | 0.473184     | 0.50649
ndcg@5  | 0.472476 | 0.473395       | 0.477438     | 0.510901
ndcg@10 | 0.492429 | 0.49303        | 0.496967     | 0.527279
@Laurae2
Contributor

Laurae2 commented Jan 14, 2017

xgboost's approx / hist methods currently scale very poorly when it comes to multithreading. This is one of my test benches using Bosch (I can test other datasets); I can't even keep the CPU busier than 60% (a sparse 1M x 1K dataset is too small in this case):

image

The approx method is worse; it can't even use 25%.

Did you check the CPU usage while running with the approx and hist methods? For single-threading, I found xgboost (fast method) to be faster than LightGBM. But for multithreading, LightGBM always wins because xgboost doesn't scale linearly with histograms.
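
For reference, this is roughly how I time the scaling (a minimal sketch, not the exact benchmark code: it assumes the Python APIs of both libraries and a small synthetic dense dataset instead of Bosch):

```python
# Rough thread-scaling check: time 50 boosting rounds at several thread counts.
import time

import lightgbm as lgb
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200000, n_features=100, random_state=1)

for n_threads in (1, 6, 12):
    # LightGBM: leaf-wise growth, 255 leaves, histogram-based by default.
    t0 = time.perf_counter()
    lgb.train({"objective": "binary", "num_leaves": 255,
               "num_threads": n_threads, "verbose": -1},
              lgb.Dataset(X, label=y), num_boost_round=50)
    t_lgb = time.perf_counter() - t0

    # xgboost: tree_method=hist with lossguide growth to mirror LightGBM.
    t0 = time.perf_counter()
    xgb.train({"objective": "binary:logistic", "tree_method": "hist",
               "grow_policy": "lossguide", "max_leaves": 255, "max_depth": 0,
               "nthread": n_threads, "verbosity": 0},
              xgb.DMatrix(X, label=y), num_boost_round=50)
    t_xgb = time.perf_counter() - t0

    print(f"{n_threads:>2} threads  LightGBM {t_lgb:7.1f}s  xgboost hist {t_xgb:7.1f}s")
```

CPU usage during each run is then simply watched with a system monitor.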

@guolinke
Collaborator Author

@Laurae2 can you give more details about your test dataset? e.g. #data, #features, sparse / one-hot coding, and so on.

@wxchan
Contributor

wxchan commented Jan 14, 2017

I think it is https://www.kaggle.com/c/bosch-production-line-performance/data?

@guolinke did you try grow_policy=depthwise?

@Laurae2
Contributor

Laurae2 commented Jan 14, 2017

@guolinke test data: train_numeric.csv

  • 1,183,747 observations
  • 969 features, converted to sparse format for R
  • 81% sparse (217,925,677 elements)

I can't seem to find a way to download the Yahoo and MS 30K sets (both behind an email wall).

@guolinke
Collaborator Author

@Laurae2
I just ran some benchmarks with 50 iterations on one thread.

Higgs:

LightGBM: 306s
xgboost: 455s

lightgbm_higgs_speed.log.txt
xgboost_hist_higgs_speed.log.txt

I think your dataset is sparse, so the difference may be caused by the sparse data. I will investigate it.

@guolinke
Collaborator Author

@wxchan Try depth-wise? Why?

@Laurae2
Contributor

Laurae2 commented Jan 14, 2017

@guolinke xgboost uses only 30% CPU on Higgs.

I'll run 12 threads, 6 threads, and 1 thread to compare all this on Higgs.

image

It seems to run much faster than your benchmark (for 12 threads, other results incoming soon). Here is a sample for Higgs:

image

Total time: update end, 479.402 sec in all. I'll re-run to check whether I did something wrong. I followed all the instructions in your repo.

Which CPU do you use? (I use i7-3930K in my case)

This is yours (did I make a mistake in the file name?):

[10:41:23] boosting round 351, 590.215 sec elapsed
[10:41:25] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=16
[10:41:25] [351]
[10:41:25] boosting round 352, 592.28 sec elapsed
[10:41:26] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=19
[10:41:27] [352]
[10:41:27] boosting round 353, 593.606 sec elapsed
[10:41:27] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=21
[10:41:27] [353]
[10:41:27] boosting round 354, 594.461 sec elapsed
[10:41:28] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=22
[10:41:28] [354]
[10:41:28] boosting round 355, 595.474 sec elapsed
[10:41:29] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=18
[10:41:29] [355]
[10:41:29] boosting round 356, 596.287 sec elapsed
[10:41:30] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=18
[10:41:30] [356]
[10:41:30] boosting round 357, 597.343 sec elapsed
[10:41:31] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=25
[10:41:31] [357]
[10:41:31] boosting round 358, 597.991 sec elapsed
[10:41:31] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 508 extra nodes, 0 pruned nodes, max_depth=29
[10:41:32] [358]

@guolinke
Collaborator Author

guolinke commented Jan 14, 2017

@Laurae2

2 x E5-2680 v2, DDR3 1600 MHz, 256GB (memory speed affects training speed as well),
run with 16 threads.

You can also run LightGBM with the same settings and see what happens.

@Laurae2
Contributor

Laurae2 commented Jan 14, 2017

@guolinke I also use DDR3 1600 MHz (64GB in my case).

My benchmarks on Higgs:

  • xgboost fast, 1 thread: 1136.12s
  • xgboost fast, 6 threads: 488.22s
  • xgboost fast, 12 threads: 477.86s
  • lightgbm, 1 thread: 2437.73s
  • lightgbm, 6 threads: 592.61s
  • lightgbm, 12 threads: 546.24s

Something I don't understand when I use xgboost is this:

3086062x28 matrix with 86409724 entries loaded from ../data/higgs.train

While you have:

10500000x28 matrix with 294000000 entries loaded from ../data/higgs.train

I'm using Python 3.5, so the original Python script to create the libsvm files does not work. Instead I'm using this:

input_filename = "HIGGS.csv"
output_train = "higgs.train"
output_test = "higgs.test"

num_train = 10500000  # first 10.5M rows go to the training file, the rest to test

read_num = 0

input_file = open(input_filename, "r")
train = open(output_train, "w")
test = open(output_test, "w")

def WriteOneLine(tokens, output):
    # tokens[0] is the label; the remaining columns become 0-based libsvm index:value pairs.
    label = int(float(tokens[0]))
    output.write(str(label))
    for i in range(1, len(tokens)):
        feature_value = float(tokens[i])
        output.write(' ' + str(i - 1) + ':' + str(feature_value))
    output.write('\n')

line = input_file.readline()

while line:
    tokens = line.split(',')
    if read_num < num_train:
        WriteOneLine(tokens, train)
    else:
        WriteOneLine(tokens, test)
    read_num += 1
    if read_num % 1000 == 0:
        print(read_num)
    line = input_file.readline()

input_file.close()
train.close()
test.close()

It does go through all 11M lines:

$ wc -l HIGGS.csv
11000000 HIGGS.csv
$ wc -l HIGGS.train
10500000 HIGGS.train

Is this normal behavior? My higgs.train is 6,082,744,083 bytes and HIGGS.csv is 8,035,497,980 bytes. I downloaded and created the libsvm files 3 times to triple-check, with the same result.

higgs.train SHA-256:

image

First line of my HIGGS.train and HIGGS.csv:

$ head -n 1 HIGGS.train
1 0:0.869293212890625 1:-0.6350818276405334 2:0.22569026052951813 3:0.327470064163208 4:-0.6899932026863098 5:0.7542022466659546 6:-0.24857313930988312 7:-1.0920639038085938 8:0.0 9:1.3749921321868896 10:-0.6536741852760315 11:0.9303491115570068 12:1.1074360609054565 13:1.138904333114624 14:-1.5781983137130737 15:-1.046985387802124 16:0.0 17:0.657929539680481 18:-0.010454569943249226 19:-0.0457671694457531 20:3.101961374282837 21:1.353760004043579 22:0.9795631170272827 23:0.978076159954071 24:0.9200048446655273 25:0.7216574549674988 26:0.9887509346008301 27:0.8766783475875854
$ head -n 1 HIGGS.csv
1.000000000000000000e+00,8.692932128906250000e-01,-6.350818276405334473e-01,2.256902605295181274e-01,3.274700641632080078e-01,-6.899932026863098145e-01,7.542022466659545898e-01,-2.485731393098831177e-01,-1.092063903808593750e+00,0.000000000000000000e+00,1.374992132186889648e+00,-6.536741852760314941e-01,9.303491115570068359e-01,1.107436060905456543e+00,1.138904333114624023e+00,-1.578198313713073730e+00,-1.046985387802124023e+00,0.000000000000000000e+00,6.579295396804809570e-01,-1.045456994324922562e-02,-4.576716944575309753e-02,3.101961374282836914e+00,1.353760004043579102e+00,9.795631170272827148e-01,9.780761599540710449e-01,9.200048446655273438e-01,7.216574549674987793e-01,9.887509346008300781e-01,8.766783475875854492e-01
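
One extra sanity check would be to parse the generated file with an independent libsvm reader and compare its shape against wc -l (a minimal sketch using scikit-learn; the file name and n_features=28 are just this benchmark's values):

```python
# Parse higgs.train independently and compare with what xgboost reports at load time.
from sklearn.datasets import load_svmlight_file

# zero_based=True because the script above writes feature indices starting at 0.
X, y = load_svmlight_file("higgs.train", n_features=28, zero_based=True)

print("rows:", X.shape[0])       # expected 10500000, not 3086062
print("features:", X.shape[1])   # expected 28
print("entries:", X.nnz)         # roughly 28 per row if explicit zeros are kept
```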

@guolinke
Collaborator Author

@Laurae2 what is the data information output by your LightGBM? https://github.com/guolinke/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_higgs_speed.log#L4
If it is 10500000 * 28, this file is correct.
If not, maybe you are using a wrong parameter (e.g. default bagging?) for xgboost.

@Laurae2
Contributor

Laurae2 commented Jan 14, 2017

@guolinke I have exactly the same line as yours.

https://github.com/Laurae2/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_higgs_speed_01.log#L4

My params are also identical to yours, except I changed:

  • xgboost file path
  • number of threads

Default bagging in xgboost is 1.00 (use all data). I compiled xgboost from source. When I run your sh code as-is (after removing everything I can't run), I get the same issue.

My lines 3086061 to 3086063 of higgs.train do not seem malformed; I don't get why xgboost does not want to go any further. It's really strange:

$ sed "3086061q;d" higgs.train
1 0:0.718676745891571 1:-1.681124210357666 2:0.8171724081039429 3:0.7752423286437988 4:0.7791789174079895 5:1.2763633728027344 6:-1.0219426155090332 7:-1.028310775756836 8:2.1730761528015137 9:1.1680066585540771 10:-0.8333783149719238 11:0.6767918467521667 12:2.214872121810913 13:1.0481081008911133 14:-0.4995536506175995 15:-1.4125559329986572 16:0.0 17:0.9711945652961731 18:0.25355038046836853 19:0.16131795942783356 20:0.0 21:1.2038894891738892 22:1.117095708847046 23:0.9722089767456055 24:1.2790273427963257 25:1.8023028373718262 26:1.0590590238571167 27:0.9368829131126404
$ sed "3086062q;d" higgs.train
1 0:0.47252947092056274 1:-0.3263337016105652 2:-0.512377917766571 3:1.7292355298995972 4:-0.6069088578224182 5:1.1556251049041748 6:-2.0052411556243896 7:1.2037097215652466 8:0.0 9:0.585577130317688 10:-0.7906379103660583 11:-1.1281750202178955 12:2.214872121810913 13:0.3222651481628418 14:-1.3442645072937012 15:1.2070095539093018 16:2.548224449157715 17:1.0775635242462158 18:-0.7874786257743835 19:0.2550981044769287 20:0.0 21:1.6085612773895264 22:0.8385911583900452 23:0.9832923412322998 24:0.9576324820518494 25:0.42687857151031494 26:0.7787665128707886 27:1.3860973119735718
$ sed "3086063q;d" higgs.train
0 0:0.7380757331848145 1:-0.917532742023468 2:-0.3181764781475067 3:1.739646553993225 4:-0.41811928153038025 5:0.7752718329429626 6:0.032652150839567184 7:0.9170976877212524 8:0.0 9:1.2407790422439575 10:-0.7255558371543884 11:1.4668693542480469 12:1.1074360609054565 13:1.0872716903686523 14:0.4216180145740509 15:-1.2772005796432495 16:1.2741122245788574 17:0.49582037329673767 18:2.2756452560424805 19:0.9787034392356873 20:0.0 21:1.703389286994934 22:1.3028173446655273 23:1.0018014907836914 24:1.3815209865570068 25:1.1019604206085205 26:1.150844931602478 27:1.1618263721466064

@Laurae2
Contributor

Laurae2 commented Jan 14, 2017

I went and created the matrix (binary format) for xgboost using R. All my attempts using the libsvm format ended with nearly the same issue (only the row count read by xgboost changes; I tried Python 2.7 and 3.5...).

Now xgboost works properly and matches your runs for AUC. I still have to test for speed.
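
(For completeness, the same idea in Python, since the conversion script above is Python: read the CSV directly and save xgboost's binary buffer, skipping the libsvm text path entirely. Just a sketch; np.loadtxt is slow on an 8 GB CSV, but it shows the route.)

```python
import numpy as np
import xgboost as xgb

# First column of HIGGS.csv is the label, the remaining 28 columns are features.
data = np.loadtxt("HIGGS.csv", delimiter=",")
n_train = 10500000

dtrain = xgb.DMatrix(data[:n_train, 1:], label=data[:n_train, 0])
dtest = xgb.DMatrix(data[n_train:, 1:], label=data[n_train:, 0])

# xgboost can reload these binary buffers directly, with no text parsing involved.
dtrain.save_binary("higgs.train.buffer")
dtest.save_binary("higgs.test.buffer")
```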


My new results.

Setup:

  • Intel i7-3930K
  • 64GB RAM, 1600 MHz

Speed:

  • xgboost fast, 1 thread: 3600.66s
  • xgboost fast, 6 threads: 1147.86s
  • xgboost fast, 12 threads: 916.89s
  • lightgbm, 1 thread: 2437.73s
  • lightgbm, 6 threads: 592.61s
  • lightgbm, 12 threads: 546.24s

AUC:

  • xgboost fast: 0.845617
  • lightgbm: 0.845319

Running depthwise soon.

@wxchan
Contributor

wxchan commented Jan 15, 2017

@guolinke xgboost depthwise is faster; we can compare the best performance of xgboost to LightGBM.

Anyway, is there any conclusion here? According to the newly updated graphs in dmlc/xgboost#1950, xgboost has really good performance on the allstate dataset.

@guolinke
Collaborator Author

@wxchan
depth-wise is always faster, because it passes over much less data when growing the same number of leaves.

xgboost's algorithm is better for sparse data, and LightGBM is better for dense data.
However, xgboost's algorithm needs a lot of temporary space as #threads grows, which limits its speed-up in multi-threading.
As for the allstate dataset, it is all one-hot features, so LightGBM can actually use its categorical feature support to achieve a speed-up.

I will try to reduce the time cost of sparse features in LightGBM as well.
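
For readers following along, these are roughly the knobs being compared (a sketch of the relevant parameters only; the exact values used in the benchmark are in the linked repo's configs):

```python
# xgboost, histogram algorithm with leaf-wise (LightGBM-like) growth.
xgb_hist_lossguide = {
    "tree_method": "hist",
    "grow_policy": "lossguide",
    "max_leaves": 255,
    "max_depth": 0,          # no depth limit when growing leaf-wise
}

# xgboost, histogram algorithm with level-by-level (depth-wise) growth.
xgb_hist_depthwise = {
    "tree_method": "hist",
    "grow_policy": "depthwise",
    "max_depth": 8,          # at most 2^8 = 256 leaves, a comparable budget
}

# LightGBM grows leaf-wise by default; num_leaves plays the role of max_leaves.
lightgbm_params = {
    "num_leaves": 255,
    "max_bin": 255,          # histogram resolution
}
```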

@guolinke
Collaborator Author

@Laurae2
can you try #216 on your Bosch dataset?

@Laurae2
Contributor

Laurae2 commented Jan 15, 2017

@guolinke Will test for Bosch dataset.

@wxchan I got reversed results for depthwise (I could test on Bosch too if needed). See the tables below for Higgs:


Speed:

Model    | Mode             | Threads | Speed (s)
---------|------------------|---------|----------
xgboost  | Fast + Lossguide | 1       | 3600.66
xgboost  | Fast + Lossguide | 6       | 1147.86
xgboost  | Fast + Lossguide | 12      | 916.89
xgboost  | Fast + Depthwise | 1       | 3891.74
xgboost  | Fast + Depthwise | 6       | 1312.95
xgboost  | Fast + Depthwise | 12      | 1038.30
LightGBM | None             | 1       | 2437.73
LightGBM | None             | 6       | 592.61
LightGBM | None             | 12      | 546.24

AUC:

Model    | Mode             | AUC
---------|------------------|---------
xgboost  | Fast + Lossguide | 0.845617
xgboost  | Fast + Depthwise | 0.843080
LightGBM | None             | 0.845319

@wxchan
Contributor

wxchan commented Jan 15, 2017

@Laurae2 your result is actually the same. I read your log: for 100 iterations, depthwise takes 227.867 s and lossguide takes 278.813 s. As that thread said, it was tested only on the first dozen iterations. (dmlc/xgboost#1950 (comment))
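
(To compare runs over the same number of iterations, the per-round timings can be pulled out of the "boosting round N, T sec elapsed" lines in the logs with something like this sketch; the file name is just the log attached earlier.)

```python
import re

pattern = re.compile(r"boosting round (\d+), ([\d.]+) sec elapsed")

elapsed = {}  # round number -> cumulative seconds elapsed
with open("xgboost_hist_higgs_speed.log.txt") as log:
    for line in log:
        match = pattern.search(line)
        if match:
            elapsed[int(match.group(1))] = float(match.group(2))

if 100 in elapsed:
    print("after 100 rounds:", elapsed[100], "sec,",
          elapsed[100] / 100, "sec/round on average")
```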

@Allardvm
Contributor

Just chiming in to note that, although comparing performance with XGBoost set to tree_method = hist and grow_policy = lossguide seems fair, we should keep in mind that XGBoost does some additional processing to handle missing values (see #122). This can have a rather large impact on performance, so we shouldn't expect XGBoost to come out on top in terms of speed.

@guolinke
Collaborator Author

guolinke commented Jan 23, 2017

@Allardvm The most time-consuming part of the histogram algorithm is building the histograms.
Best-split finding over a histogram actually costs very little (about O(#bin), with #bin often equal to 255).
So even with two rounds of best-split finding, it doesn't really impact the speed.
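
To make the two costs concrete, here is a rough sketch of histogram-based split finding for a single feature (a simplified gradient-boosting gain formula, not LightGBM's actual code):

```python
import numpy as np

def build_histogram(bin_idx, grad, hess, n_bins=255):
    # The expensive part: O(#data) -- one pass over the rows in the node,
    # accumulating gradient/hessian sums per bin.
    g_hist = np.zeros(n_bins)
    h_hist = np.zeros(n_bins)
    np.add.at(g_hist, bin_idx, grad)
    np.add.at(h_hist, bin_idx, hess)
    return g_hist, h_hist

def best_split(g_hist, h_hist, lam=1.0):
    # The cheap part: O(#bin) -- cumulative sums give left/right statistics
    # for every candidate threshold in one scan.
    g_total, h_total = g_hist.sum(), h_hist.sum()
    g_left, h_left = np.cumsum(g_hist), np.cumsum(h_hist)
    g_right, h_right = g_total - g_left, h_total - h_left
    gain = (g_left ** 2 / (h_left + lam)
            + g_right ** 2 / (h_right + lam)
            - g_total ** 2 / (h_total + lam))
    best_bin = int(np.argmax(gain[:-1]))  # last bin would put all data on one side
    return best_bin, float(gain[best_bin])
```

Running the O(#bin) scan twice (for example once per missing-value direction) only doubles the cheap part, which is why it barely shows up in the timings.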

@Laurae2
Contributor

Laurae2 commented Jan 23, 2017

@Allardvm for a feature, xgboost only tests the aggregated NA values against the lowest and highest values. That cost is negligible, even with 99% sparsity.

@guolinke
Collaborator Author

Closing now. I will give a new comparison based on LightGBM v2.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023