100 million data results #14

Laurae2 opened this issue May 1, 2019 · 10 comments

Laurae2 commented May 1, 2019

Using hardware from here: #12

Using dmlc/xgboost@84d992b and microsoft/LightGBM@5ece53b

100M obtained using 10x 10m data.
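
For reference, a minimal sketch (using data.table::rbindlist, the same approach used in the CPU scripts further down in this thread) of how the 100M rows can be obtained from the 10M file:

library(data.table)

d_10m  <- fread("train-10m.csv", showProgress = FALSE)
## stack 10 copies of the 10M-row set to get 100M rows
d_100m <- rbindlist(rep(list(d_10m), 10))
nrow(d_100m)   ## 100,000,000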

CPU:

library   size    speed [s]   AUC
xgb       0.1m      4.181     0.7324224
xgb       1m       15.978     0.7494959
xgb       10m     104.598     0.7551197
xgb       100m    673.861     irrelevant
lgb       0.1m      1.763     0.7298355
lgb       1m        4.253     0.7636987
lgb       10m      38.197     0.7742033
lgb       100m    599.396     irrelevant

1x Quadro P1000:

library   size    speed [s]   AUC
xgb       0.1m     17.529     0.7328954
xgb       1m       38.528     0.7499591
xgb       10m     103.154     0.7564821
xgb       100m    CRASH       irrelevant
lgb       0.1m     18.345     0.7298129
lgb       1m       22.179     0.7640155
lgb       10m      62.929     0.774168
lgb       100m    396.233     irrelevant

4x Quadro P1000:

library   size    speed [s]   AUC
xgb       0.1m     18.838     0.7324756
xgb       1m       36.877     0.749169
xgb       10m      64.994     0.7564492
xgb       100m    232.947     irrelevant

RAM usage:

LightGBM: 2739 MB on GPU
xgboost 1 GPU: CRASH
xgboost 4 GPUs: 2077 MB on each GPU

@RAMitchell

How many columns are in your data set? Is it sparse? If so, what is the percentage of missing values?


szilard commented May 2, 2019

Subsample of the airline dataset, 8 columns (though after one-hot-encoding ~700 sparse), no missing values.
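
A minimal sketch of that encoding step, using Matrix::sparse.model.matrix as in the CPU script below (the file name is the 10M training file used elsewhere in this thread):

library(data.table)
library(Matrix)

d <- fread("train-10m.csv", showProgress = FALSE)
## one-hot encode the categorical predictors into a sparse matrix;
## the target dep_delayed_15min is kept out of the predictor matrix
X <- sparse.model.matrix(dep_delayed_15min ~ . - 1, data = d)
dim(X)   ## ~10M rows x ~700 (mostly sparse) columns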


szilard commented May 3, 2019

Also ran h2o and catboost (CPU) on r4.8xlarge (32 cores, 1 NUMA node, pinned to physical cores only/no HT):


docker run -it gbmperf_cpu taskset -c 0-15 R


--------

d_train <- fread("train-10m.csv", showProgress=FALSE)
d_test <- fread("test.csv", showProgress=FALSE)
## 1.5G

d_train <- rbindlist(list(d_train, d_train, d_train, d_train, d_train, d_train, d_train, d_train, d_train, d_train))
## 7.5G / gc 7G

X_train_test <- sparse.model.matrix(dep_delayed_15min ~ .-1, data = rbindlist(list(d_train, d_test)))
## slow***
## 47G / gc 18G

n1 <- nrow(d_train)
n2 <- nrow(d_test)
X_train <- X_train_test[1:n1,]
X_test <- X_train_test[(n1+1):(n1+n2),]
## 29G

dxgb_train <- xgb.DMatrix(data = X_train, label = ifelse(d_train$dep_delayed_15min=='Y',1,0))
## 46G / gc 35G

rm(d_train, X_train_test, X_train)
## gc 10G


  md <- xgb.train(data = dxgb_train, 
## max 25G / gc 25G

683.275 >

0.756058
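
The xgb.train call above is truncated; a hedged sketch of what the full call might look like, assuming 100 trees with the depth-10/eta-0.1 settings visible in the GPU run below and the CPU hist method (the exact parameters are not shown in this snippet):

library(xgboost)

## dxgb_train is the xgb.DMatrix built above; parameter values are assumptions
md <- xgb.train(data = dxgb_train,
                objective = "binary:logistic",
                nround = 100, max_depth = 10, eta = 0.1,
                tree_method = "hist")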

--------

dlgb_train <- lgb.Dataset(data = X_train, label = ifelse(d_train$dep_delayed_15min=='Y',1,0))
rm(d_train, X_train_test, X_train)
## gc 13G

  md <- lgb.train(data = dlgb_train, 
## max 18G / gc 9GB

416.943 >

0.7742657
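
Likewise, a hedged sketch of the truncated lgb.train call; the number of rounds, num_leaves and learning rate here are assumptions, not values copied from the run:

library(lightgbm)

## dlgb_train is the lgb.Dataset built above; parameter values are assumptions
md <- lgb.train(data = dlgb_train,
                objective = "binary",
                nrounds = 100, num_leaves = 512, learning_rate = 0.1)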

--------

dx_train0 <- h2o.importFile("train-10m.csv")
dx_train <- h2o.rbind(dx_train0, dx_train0, dx_train0, dx_train0, dx_train0, dx_train0, dx_train0, dx_train0, dx_train0, dx_train0)
dx_test <- h2o.importFile("test.csv")
## 4G (h2o compresses the data)

 md <- h2o.gbm(x = Xnames, y = "dep_delayed_15min", training_frame = dx_train, 
## max 12G

882.199 >

0.7747782
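
A hedged sketch of the truncated h2o.gbm call; Xnames is not defined in the snippet above, so it is reconstructed here, and the tree settings are assumptions:

library(h2o)

## dx_train is the H2OFrame built above via h2o.importFile/h2o.rbind
Xnames <- setdiff(colnames(dx_train), "dep_delayed_15min")
md <- h2o.gbm(x = Xnames, y = "dep_delayed_15min", training_frame = dx_train,
              distribution = "bernoulli",
              ntrees = 100, max_depth = 10, learn_rate = 0.1)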

-------

d_train <- d_train_test[(1:nrow(d_train)),]
d_test <-  d_train_test[(nrow(d_train)+1):(nrow(d_train)+nrow(d_test)),]
## 12G / gc 8G

dx_train <- catboost.load_pool(d_train[,1:p], label = d_train$dep_delayed_15min)
dx_test  <- catboost.load_pool(d_test[,1:p])
## 29G / gc 15G

rm(d_train, d_train_test)
## 8G

 md <- catboost.train(learn_pool = dx_train, test_pool = NULL, params = params)
## max 150GB
## 18G / gc 18G

5420.555 >

0.7229581  ??!!
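
The params list passed to catboost.train above is not shown; a hedged sketch of what it could contain, with values assumed to mirror the other libraries' settings:

library(catboost)

## dx_train is the catboost pool built above; parameter values are assumptions
params <- list(loss_function = "Logloss",
               iterations = 100, depth = 10, learning_rate = 0.1)
md <- catboost.train(learn_pool = dx_train, test_pool = NULL, params = params)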

UPDATE 2020-09-08 catboost: run time 930 sec, RAM data ~5GB, RAM train max ~50GB, RAM after ~5GB, AUC 0.7358616

CPU:

library    time [s]   AUC       RAM data [GB]   RAM train max [GB]   RAM train end/gc [GB]
h2o           880     0.775       4               12                   12
xgboost       680     0.751      10               25                   25/25
lightgbm      420     0.774      13               18                   18/9 (<13 ?!)
catboost     5400     0.723?!     8              150                   18/18

RAM train end/gc is when training ends and then after calling gc() in R


szilard commented May 3, 2019

GPU on p3.8xlarge (4 GPUs, but only 1 GPU used!). A p3.8xlarge was needed instead of the single-GPU p3.2xlarge because data reading/prep did not fit in the latter's 60GB of RAM.


nvidia-docker run --rm -it gbmperf_gpu R

--------

xgboost

warmup:
[0] Tesla V100-SXM2-16GB | 38'C,   0 % |  1042 / 16160 MB | root(1032M)
[0] Tesla V100-SXM2-16GB | 38'C,   0 % |  1042 / 16160 MB | root(1032M)


[0] Tesla V100-SXM2-16GB | 39'C,  82 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  78 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  71 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  76 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  83 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  79 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  73 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  81 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  76 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  82 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  74 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  81 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  83 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  79 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  73 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  75 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  80 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  82 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  76 % |  4768 / 16160 MB | root(4758M)
[0] Tesla V100-SXM2-16GB | 39'C,  76 % |  7462 / 16160 MB | root(7452M)
[0] Tesla V100-SXM2-16GB | 42'C,  99 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 43'C,  94 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 43'C,  98 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 43'C,  98 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 44'C,  98 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 44'C,  94 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 44'C,  94 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 45'C,  94 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 45'C,  96 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 44'C,  98 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 45'C,  95 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 45'C,  99 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 46'C,  96 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 46'C,  95 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 46'C,  95 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 46'C,  95 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 46'C,  96 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 46'C,  94 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 46'C,  95 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 46'C,  97 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 47'C,  95 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 47'C,  95 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 44'C,   0 % |  6672 / 16160 MB | root(6662M)
[0] Tesla V100-SXM2-16GB | 44'C,   0 % |  6672 / 16160 MB | root(6662M)

83.678 >

0.7556472

-----------

h2o

warmup:
[0] Tesla V100-SXM2-16GB | 38'C,   0 % |   444 / 16160 MB | root(434M)
[0] Tesla V100-SXM2-16GB | 38'C,   0 % |   444 / 16160 MB | root(434M)


[0] Tesla V100-SXM2-16GB | 40'C,   0 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 42'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 43'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 44'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 45'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 45'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 45'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 46'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 46'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 46'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 45'C,  68 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 44'C,  11 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 44'C,  31 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 44'C,  93 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 44'C,  62 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 44'C,  95 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 44'C,  35 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 46'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 47'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 46'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 47'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 47'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 47'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 46'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 45'C,   0 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 45'C,  60 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 45'C,  46 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 45'C,  98 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 47'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 47'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 47'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 48'C, 100 % |  4246 / 16160 MB | root(4236M)
[0] Tesla V100-SXM2-16GB | 48'C, 100 % |  4246 / 16160 MB | root(4236M)

System RAM usage, though, increases from 6GB to 36GB.

270.292 >

0.7546421

-----------------

lightgbm

[0] Tesla V100-SXM2-16GB | 38'C,   7 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 38'C,   6 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 38'C,   4 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 38'C,   3 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 38'C,   3 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 38'C,   0 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 38'C,   5 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 38'C,   1 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 38'C,   4 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 38'C,  41 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 38'C,   8 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 38'C,   7 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 38'C,   1 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 39'C,   0 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 39'C,  20 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 39'C,   7 % |  3070 / 16160 MB | root(3058M)
[0] Tesla V100-SXM2-16GB | 39'C,   3 % |  3070 / 16160 MB | root(3058M)

System RAM usage, though, increases from 18G to 24G; it also uses all CPU cores at 100%.

404.215 >

0.7737832


-----------

catboost

warmup:
[0] Tesla V100-SXM2-16GB | 36'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 36'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 36'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 37'C,   1 % | 15376 / 16160 MB | root(15366M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % |   452 / 16160 MB | root(442M)
[0] Tesla V100-SXM2-16GB | 38'C,   0 % |   452 / 16160 MB | root(442M)
[0] Tesla V100-SXM2-16GB | 38'C,   0 % |   452 / 16160 MB | root(442M)


[0] Tesla V100-SXM2-16GB | 39'C,   0 % |   452 / 16160 MB | root(442M)
[0] Tesla V100-SXM2-16GB | 39'C,   0 % |   452 / 16160 MB | root(442M)
[0] Tesla V100-SXM2-16GB | 39'C,   0 % |   452 / 16160 MB | root(442M)
[0] Tesla V100-SXM2-16GB | 39'C,   0 % |   452 / 16160 MB | root(442M)
[0] Tesla V100-SXM2-16GB | 39'C,   0 % |   452 / 16160 MB | root(442M)
[0] Tesla V100-SXM2-16GB | 39'C,   0 % | 15376 / 16160 MB | root(15366M)
[0] Tesla V100-SXM2-16GB | 39'C,  27 % | 15376 / 16160 MB | root(15366M)
[0] Tesla V100-SXM2-16GB | 39'C,  22 % | 15376 / 16160 MB | root(15366M)
[0] Tesla V100-SXM2-16GB | 39'C,  23 % | 15376 / 16160 MB | root(15366M)
[0] Tesla V100-SXM2-16GB | 39'C,  51 % | 15376 / 16160 MB | root(15366M)
[0] Tesla V100-SXM2-16GB | 39'C,   0 % | 15376 / 16160 MB | root(15366M)
[0] Tesla V100-SXM2-16GB | 39'C,   0 % | 15376 / 16160 MB | root(15366M)
[0] Tesla V100-SXM2-16GB | 39'C,   0 % | 15376 / 16160 MB | root(15366M)
[0] Tesla V100-SXM2-16GB | 39'C,   0 % | 15376 / 16160 MB | root(15366M)
[0] Tesla V100-SXM2-16GB | 39'C,   0 % | 15376 / 16160 MB | root(15366M)
[0] Tesla V100-SXM2-16GB | 39'C,  19 % | 15376 / 16160 MB | root(15366M)
[0] Tesla V100-SXM2-16GB | 39'C,   0 % | 15376 / 16160 MB | root(15366M)

System RAM usage, though, increases from 8G to 22G.


Application terminated with error: ??+0 (0x7F7F3B86FD32)
??+0 (0x7F7F3B86DDBE)
??+0 (0x7F7F3B86CAB5)
??+0 (0x7F7F3B86D4C8)
??+0 (0x7F7F3AAC2823)
??+0 (0x7F7F3AAC26D7)
??+0 (0x7F7F44EA56BA)
clone+109 (0x7F7F44BDB41D)

(NCudaLib::TOutOfMemoryError) catboost/cuda/cuda_lib/memory_pool/stack_like_memory_pool.h:302: Error: Out of memory. Requested 381.4697266 MB; Free 283.5054197 MB
uncaught exception:
    address -> 0x1b842d008
    what() -> "catboost/cuda/cuda_lib/memory_pool/stack_like_memory_pool.h:302: Error: Out of memory. Requested 381.4697266 MB; Free 283.5054197 MB"
    type -> NCudaLib::TOutOfMemoryError

 *** caught segfault ***
address (nil), cause 'unknown'

GPU:

library        time [s]   AUC     GPU mem max [GB]   extra RAM [GB]
h2o xgboost      270      0.755    4                  30
xgboost           80      0.756    6                   0
lightgbm         400      0.774    3                   6
catboost         crash (OOM)


szilard commented May 3, 2019

4x GPU on p3.8xlarge:


xgboost

> cat(system.time({
+   md <- xgb.train(data = dxgb_train_WUP,
+             objective = "binary:logistic",
+             nround = 1, max_depth = 10, eta = 0.1,
+             tree_method = "gpu_hist", n_gpus = 4)
+ })[[3]]," ",sep="")
Error in xgb.iter.update(bst$handle, dtrain, iteration - 1, obj) :
  [16:24:41] /xgboost/src/tree/updater_gpu_hist.cu:1407: Exception in gpu_hist: [16:24:41] /xgboost/src/tree/../common/device_helpers.cuh:864: Check failed: device_ordinals.size() == 1 (4 vs. 1) : XGBoost must be compiled with NCCL to use more than one GPU.
Stack trace:
  [bt] (0) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7f9c50c968c2]
  [bt] (1) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(dh::AllReducer::Init(std::vector<int, std::allocator<int> > const&)+0x159) [0x7f9c50ed7699]
  [bt] (2) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::InitDataOnce(xgboost::DMatrix*)+0x10c) [0x7f9c50eeb0bc]
  [bt] (3) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<f


----------------


catboost

ip-172-31-34-40  Fri May  3 15:51:10 2019
[0] Tesla V100-SXM2-16GB | 39'C,  45 % | 15376 / 16160 MB | root(15366M)
[1] Tesla V100-SXM2-16GB | 40'C,  11 % | 15376 / 16160 MB | root(15366M)
[2] Tesla V100-SXM2-16GB | 38'C,  45 % | 15376 / 16160 MB | root(15366M)
[3] Tesla V100-SXM2-16GB | 39'C,  45 % | 15376 / 16160 MB | root(15366M)
ip-172-31-34-40  Fri May  3 15:51:11 2019
[0] Tesla V100-SXM2-16GB | 40'C,  63 % | 15376 / 16160 MB | root(15366M)
[1] Tesla V100-SXM2-16GB | 40'C,  44 % | 15376 / 16160 MB | root(15366M)
[2] Tesla V100-SXM2-16GB | 38'C,  51 % | 15376 / 16160 MB | root(15366M)
[3] Tesla V100-SXM2-16GB | 39'C,   6 % | 15376 / 16160 MB | root(15366M)

Application terminated with error: ??+0 (0x7FF2319C2D32)
??+0 (0x7FF2319C0DBE)
??+0 (0x7FF2319BFAB5)
??+0 (0x7FF2319C04C8)
??+0 (0x7FF230C15823)
??+0 (0x7FF230C156D7)
??+0 (0x7FF23AFF86BA)
clone+109 (0x7FF23AD2E41D)

(NCudaLib::TOutOfMemoryError) catboost/cuda/cuda_lib/memory_pool/stack_like_memory_pool.h:302: Error: Out of memory. Requested 381.4697266 MB; Free 266.8567867 MB
uncaught exception:
    address -> 0x1b8443808
    what() -> "catboost/cuda/cuda_lib/memory_pool/stack_like_memory_pool.h:302: Error: Out of memory. Requested 381.4697266 MB; Free 266.8567867 MB"
    type -> NCudaLib::TOutOfMemoryError


TODO: XGBoost must be compiled with NCCL to use more than one GPU.

catboost crashes (OOM) even on 4 GPUs.

Note: lightgbm and h2o xgboost don't support multiple GPUs currently.


szilard commented May 3, 2019

All my previous results in one place:


100M records

CPU (r4.8xlarge):

library    time [s]   AUC       RAM train (-data) [GB]
h2o           880     0.775       8
xgboost       680     0.751      15
lightgbm      420     0.774       5
catboost     5400     0.723?!   140

GPU (Tesla V100):

library        time [s]   AUC     GPU mem [GB]   extra RAM [GB]
h2o xgboost      270      0.755    4              30
xgboost           80      0.756    6               0
lightgbm         400      0.774    3               6
catboost         crash (OOM)


szilard commented May 3, 2019

CPU on m5 (faster than r4 above). m5 is probably the fastest CPU option on EC2 for this, because for larger data more cores matter more than higher CPU frequency (m5 > c5, see #13 (comment)).



docker run --rm -ti gbmperf_cpu taskset -c 0-23 bash


R --vanilla < 1-h2o.R

524.348 >

0.7747512



R --vanilla < 2-xgboost.R

509.165 >

0.756058



R --vanilla < 3-lightgbm.R

312.752 >
>
0.7742657



R --vanilla < 4-catboost.R


3357.599 >
>
0.7229581







szilard commented May 3, 2019

100M records

CPU (m5.12xlarge):

library    time [s]   AUC        RAM train (-data) [GB]
h2o           520     0.775        8
xgboost       510     0.751       15
lightgbm      310     0.774        5
catboost     3360     0.723 ?!   140

GPU (Tesla V100):

library        time [s]   AUC     GPU mem [GB]   extra RAM [GB]
h2o xgboost      270      0.755     4             30
xgboost           80      0.756     6              0
lightgbm         400      0.774     3              6
catboost         crash (OOM)      >16             14


szilard commented May 3, 2019

catboost crashes for larger data sizes on the GPU (16GB of GPU memory):

size [M]   time [s]
 10          135
 30          420
 40          670
 50          crash (OOM)
100          crash (OOM)


szilard commented Sep 9, 2020

2020-09-08 update:

catboost GPU still crashes (runs out of GPU memory):

[0] Tesla V100-SXM2-16GB | 40'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 40'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 40'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 40'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 40'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 40'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 40'C,   0 % |     2 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 41'C,   6 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 41'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   3 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   1 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 41'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   6 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   9 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,  10 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,  19 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,  27 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   6 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,  10 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   7 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   0 % | 15369 / 16160 MB | root(15367M)
[0] Tesla V100-SXM2-16GB | 42'C,   0 % | 15369 / 16160 MB | root(15367M)
07:28:48     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:28:50     all   19.14    0.00    5.13    0.00    0.00    0.00    0.00    0.00    0.00   75.73
07:28:52     all   21.02    0.00    5.77    0.00    0.00    0.00    0.00    0.00    0.00   73.21
07:28:54     all   19.91    0.00    4.57    0.00    0.00    0.00    0.00    0.00    0.00   75.52
07:28:56     all   20.03    0.00    4.88    0.00    0.00    0.00    0.00    0.00    0.00   75.09
07:28:58     all   19.07    0.00    4.63    0.00    0.00    0.00    0.00    0.00    0.00   76.30
07:29:00     all   18.77    0.00    4.82    0.00    0.00    0.00    0.06    0.00    0.00   76.35
07:29:02     all   18.12    0.00    4.26    0.00    0.00    0.00    0.00    0.00    0.00   77.62
07:29:04     all   20.68    0.00    5.89    0.00    0.00    0.00    0.00    0.00    0.00   73.43
07:29:06     all   19.79    0.00    4.20    0.00    0.00    0.00    0.00    0.00    0.00   76.02
07:29:08     all   19.30    0.00    4.87    0.00    0.00    0.00    0.06    0.00    0.00   75.77
07:29:10     all   19.84    0.00    4.88    0.00    0.00    0.00    0.06    0.00    0.00   75.22
07:29:12     all   18.91    0.00    5.01    0.00    0.00    0.00    0.00    0.00    0.00   76.08
07:29:14     all   18.82    0.00    5.14    0.00    0.00    0.00    0.00    0.00    0.00   76.04
07:29:16     all   20.29    0.00    4.95    0.25    0.00    0.00    0.00    0.00    0.00   74.51
07:29:18     all   19.71    0.00    4.38    0.00    0.00    0.00    0.00    0.00    0.00   75.91
07:29:20     all   19.01    0.00    4.94    0.00    0.00    0.00    0.00    0.00    0.00   76.05
07:29:22     all   20.11    0.00    4.64    0.00    0.00    0.00    0.00    0.00    0.00   75.25
07:29:24     all   17.48    0.00    4.70    0.00    0.00    0.00    0.00    0.00    0.00   77.82
07:29:26     all   20.09    0.00    3.63    0.00    0.00    0.00    0.00    0.00    0.00   76.28
07:29:28     all   19.46    0.00    4.51    0.00    0.00    0.00    0.00    0.00    0.00   76.03
07:29:30     all   19.27    0.00    5.19    0.00    0.00    0.00    0.00    0.00    0.00   75.53
07:29:32     all   19.55    0.00    4.45    0.00    0.00    0.00    0.00    0.00    0.00   76.00
07:29:34     all   19.11    0.00    4.89    0.00    0.00    0.00    0.00    0.00    0.00   76.00
07:29:36     all   18.72    0.00    5.32    0.00    0.00    0.00    0.00    0.00    0.00   75.95
07:29:38     all   16.38    0.00    2.38    0.00    0.00    0.00    0.06    0.00    0.00   81.19
Application terminated with error: ??+0 (0x7F5E87E21D31)
??+0 (0x7F5E87E1FC75)
??+0 (0x7F5E87E1E657)
??+0 (0x7F5E87E1F3F8)
??+0 (0x7F5E86CDE6B3)
??+0 (0x7F5E86CDE567)
??+0 (0x7F5E8B811609)
clone+67 (0x7F5E8C172103)

(NCudaLib::TOutOfMemoryError) catboost/cuda/cuda_lib/memory_pool/stack_like_memory_pool.h:303: Error: Out of memory. Requested 381.4697266 MB; Free 161.3072262 MB
uncaught exception:
    address -> 0x106916800
    what() -> "catboost/cuda/cuda_lib/memory_pool/stack_like_memory_pool.h:303: Error: Out of memory. Requested 381.4697266 MB; Free 161.3072262 MB"
    type -> NCudaLib::TOutOfMemoryError

 *** caught segfault ***
address (nil), cause 'unknown'
