lgbm.cv example #4

Closed
zachmayer opened this issue Dec 10, 2016 · 8 comments

@zachmayer

I'm having trouble figuring out the example in ?lgbm.cv. It looks like it's on the housing price dataset, but I'm not 100% sure. When I try to run it, I get the following error:

Error in outputs[["Models"]][[i]][["Validation"]] : 
  subscript out of bounds

Do you have a working example that runs on your machine I could try out, to make sure my installation is working?

@Laurae2
Owner

Laurae2 commented Dec 10, 2016

I think the example is outdated, but I am not sure; I have to check. I am going to add the code below as the example in my next push.

Try this (change the working directory in the setwd() call and the LightGBM executable path in lgbm_path):

library(Laurae)
library(stringi)
library(Matrix)
library(sparsity)
library(data.table)

remove(list = ls()) # WARNING: CLEANS EVERYTHING IN THE ENVIRONMENT
setwd("D:/Data Science/HousePrices") # CHANGE THIS TO THE DIRECTORY WHERE YOU WANT TEMPORARY FILES

DT <- data.table(Split1 = c(rep(0, 50), rep(1, 50)), Split2 = rep(c(rep(0, 25), rep(0.5, 25)), 2))
DT$Split3 <- rep(c(rep(0, 10), rep(0.25, 15)), 4)
DT$Split4 <- rep(c(rep(0, 5), rep(0.1, 5), rep(0, 5), rep(0.1, 10)), 4)
DT$Split5 <- rep(c(rep(0, 5), rep(0.05, 5), rep(0, 10), rep(0.05, 5)), 4)
label <- c(rep(0, 25), rep(1, 25), rep(0, 25), rep(1, 25))
label <- as.numeric((DT$Split2 == 0) & (DT$Split1 == 0) & (DT$Split3 == 0))
label <- as.numeric((DT$Split2 == 0) & (DT$Split1 == 0) & (DT$Split3 == 0) & (DT$Split4 == 0) | ((DT$Split2 == 0.5) & (DT$Split1 == 1) & (DT$Split3 == 0.25) & (DT$Split4 == 0.1) & (DT$Split5 == 0)) | ((DT$Split1 == 0) & (DT$Split2 == 0.5)))

trained <- lgbm.cv(y_train = label,
                   x_train = DT,
                   bias_train = NA,
                   folds = 5,
                   unicity = TRUE,
                   application = "binary",
                   num_iterations = 1,
                   early_stopping_rounds = 1,
                   learning_rate = 5,
                   num_leaves = 16,
                   min_data_in_leaf = 1,
                   min_sum_hessian_in_leaf = 1,
                   tree_learner = "serial",
                   num_threads = 1,
                   lgbm_path = "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe",
                   workingdir = file.path(getwd()),
                   validation = FALSE,
                   files_exist = FALSE,
                   verbose = TRUE,
                   is_training_metric = TRUE,
                   save_binary = TRUE,
                   metric = "binary_logloss")

str(trained)

I am getting this output:

***************  
Fold no:  1 / 5  
***************  
Using LightGBM path: C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe  
Working directory of LightGBM: D:/Data Science/HousePrices/temp  
Training configuration file saved to: D:/Data Science/HousePrices/temp/lgbm_train.conf  
Saving train data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_train.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 80 rows in 1 batches of 80 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Saving validation data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_val.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 20 rows in 1 batches of 20 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Starting to work on model as of Sat Dec 10 2016 10:25:44 PM  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] Loading data set from binary file
[LightGBM] [Info] Finish loading data, use 0.000138 seconds
[LightGBM] [Info] Number of postive:27,  number of negative:53
[LightGBM] [Info] Number of data:80, Number of features:5
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = 0.000000 , current #leaves=8
[LightGBM] [Info] Iteration:1, training's log loss: 0.000045
[LightGBM] [Info] 0.000052 seconds elapsed, finished 1 iteration
[LightGBM] [Info] Finish train
Model completed, results saved in D:/Data Science/HousePrices/temp  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] 1 models has been loaded

[LightGBM] [Info] Finish predict initilization.
[LightGBM] [Info] Start prediction for data D:/Data Science/HousePrices/temp/lgbm_val.csv without label
[LightGBM] [Info] Finish predict.
Ended to work on model as of Sat Dec 10 2016 10:25:45 PM  
  
  
***************  
Fold no:  2 / 5  
***************  
Using LightGBM path: C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe  
Working directory of LightGBM: D:/Data Science/HousePrices/temp  
Training configuration file saved to: D:/Data Science/HousePrices/temp/lgbm_train.conf  
Saving train data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_train.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 80 rows in 1 batches of 80 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Saving validation data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_val.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 20 rows in 1 batches of 20 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Starting to work on model as of Sat Dec 10 2016 10:25:45 PM  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] Loading data set from binary file
[LightGBM] [Info] Finish loading data, use 0.000140 seconds
[LightGBM] [Info] Number of postive:27,  number of negative:53
[LightGBM] [Info] Number of data:80, Number of features:5
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = 0.000000 , current #leaves=8
[LightGBM] [Info] Iteration:1, training's log loss: 0.000045
[LightGBM] [Info] 0.000076 seconds elapsed, finished 1 iteration
[LightGBM] [Info] Finish train
Model completed, results saved in D:/Data Science/HousePrices/temp  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] 1 models has been loaded

[LightGBM] [Info] Finish predict initilization.
[LightGBM] [Info] Start prediction for data D:/Data Science/HousePrices/temp/lgbm_val.csv without label
[LightGBM] [Info] Finish predict.
Ended to work on model as of Sat Dec 10 2016 10:25:46 PM  
  
  
***************  
Fold no:  3 / 5  
***************  
Using LightGBM path: C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe  
Working directory of LightGBM: D:/Data Science/HousePrices/temp  
Training configuration file saved to: D:/Data Science/HousePrices/temp/lgbm_train.conf  
Saving train data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_train.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 80 rows in 1 batches of 80 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Saving validation data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_val.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 20 rows in 1 batches of 20 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Starting to work on model as of Sat Dec 10 2016 10:25:47 PM  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] Loading data set from binary file
[LightGBM] [Info] Finish loading data, use 0.000151 seconds
[LightGBM] [Info] Number of postive:27,  number of negative:53
[LightGBM] [Info] Number of data:80, Number of features:5
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = 0.000000 , current #leaves=8
[LightGBM] [Info] Iteration:1, training's log loss: 0.000045
[LightGBM] [Info] 0.000050 seconds elapsed, finished 1 iteration
[LightGBM] [Info] Finish train
Model completed, results saved in D:/Data Science/HousePrices/temp  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] 1 models has been loaded

[LightGBM] [Info] Finish predict initilization.
[LightGBM] [Info] Start prediction for data D:/Data Science/HousePrices/temp/lgbm_val.csv without label
[LightGBM] [Info] Finish predict.
Ended to work on model as of Sat Dec 10 2016 10:25:48 PM  
  
  
***************  
Fold no:  4 / 5  
***************  
Using LightGBM path: C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe  
Working directory of LightGBM: D:/Data Science/HousePrices/temp  
Training configuration file saved to: D:/Data Science/HousePrices/temp/lgbm_train.conf  
Saving train data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_train.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 80 rows in 1 batches of 80 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Saving validation data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_val.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 20 rows in 1 batches of 20 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Starting to work on model as of Sat Dec 10 2016 10:25:48 PM  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] Loading data set from binary file
[LightGBM] [Info] Finish loading data, use 0.000135 seconds
[LightGBM] [Info] Number of postive:27,  number of negative:53
[LightGBM] [Info] Number of data:80, Number of features:5
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = 0.000000 , current #leaves=8
[LightGBM] [Info] Iteration:1, training's log loss: 0.000045
[LightGBM] [Info] 0.000070 seconds elapsed, finished 1 iteration
[LightGBM] [Info] Finish train
Model completed, results saved in D:/Data Science/HousePrices/temp  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] 1 models has been loaded

[LightGBM] [Info] Finish predict initilization.
[LightGBM] [Info] Start prediction for data D:/Data Science/HousePrices/temp/lgbm_val.csv without label
[LightGBM] [Info] Finish predict.
Ended to work on model as of Sat Dec 10 2016 10:25:49 PM  
  
  
***************  
Fold no:  5 / 5  
***************  
Using LightGBM path: C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe  
Working directory of LightGBM: D:/Data Science/HousePrices/temp  
Training configuration file saved to: D:/Data Science/HousePrices/temp/lgbm_train.conf  
Saving train data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_train.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 80 rows in 1 batches of 80 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Saving validation data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_val.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 20 rows in 1 batches of 20 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Starting to work on model as of Sat Dec 10 2016 10:25:49 PM  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] Loading data set from binary file
[LightGBM] [Info] Finish loading data, use 0.000138 seconds
[LightGBM] [Info] Number of postive:27,  number of negative:53
[LightGBM] [Info] Number of data:80, Number of features:5
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = 0.000000 , current #leaves=8
[LightGBM] [Info] Iteration:1, training's log loss: 0.000045
[LightGBM] [Info] 0.000055 seconds elapsed, finished 1 iteration
[LightGBM] [Info] Finish train
Model completed, results saved in D:/Data Science/HousePrices/temp  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] 1 models has been loaded

[LightGBM] [Info] Finish predict initilization.
[LightGBM] [Info] Start prediction for data D:/Data Science/HousePrices/temp/lgbm_val.csv without label
[LightGBM] [Info] Finish predict.
Ended to work on model as of Sat Dec 10 2016 10:25:50 PM

and this from str(trained):

List of 3
 $ Models    :List of 5
  ..$ 1:List of 8
  .. ..$ Model     : chr [1:14] "max_feature_idx=-1" "sigmoid=1" "" "Tree=0" ...
  .. ..$ Path      : chr "D:/Data Science/HousePrices/temp"
  .. ..$ Name      : chr "lgbm_model.txt"
  .. ..$ lgbm      : chr "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe"
  .. ..$ Train     : chr "lgbm_train.csv"
  .. ..$ Valid     : chr "lgbm_val.csv"
  .. ..$ Test      : logi NA
  .. ..$ Validation: num [1:20] 1 1 1 1 1 ...
  ..$ 2:List of 8
  .. ..$ Model     : chr [1:14] "max_feature_idx=-1" "sigmoid=1" "" "Tree=0" ...
  .. ..$ Path      : chr "D:/Data Science/HousePrices/temp"
  .. ..$ Name      : chr "lgbm_model.txt"
  .. ..$ lgbm      : chr "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe"
  .. ..$ Train     : chr "lgbm_train.csv"
  .. ..$ Valid     : chr "lgbm_val.csv"
  .. ..$ Test      : logi NA
  .. ..$ Validation: num [1:20] 1 1 1 1 1 ...
  ..$ 3:List of 8
  .. ..$ Model     : chr [1:14] "max_feature_idx=-1" "sigmoid=1" "" "Tree=0" ...
  .. ..$ Path      : chr "D:/Data Science/HousePrices/temp"
  .. ..$ Name      : chr "lgbm_model.txt"
  .. ..$ lgbm      : chr "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe"
  .. ..$ Train     : chr "lgbm_train.csv"
  .. ..$ Valid     : chr "lgbm_val.csv"
  .. ..$ Test      : logi NA
  .. ..$ Validation: num [1:20] 1 1 1 1 1 ...
  ..$ 4:List of 8
  .. ..$ Model     : chr [1:14] "max_feature_idx=-1" "sigmoid=1" "" "Tree=0" ...
  .. ..$ Path      : chr "D:/Data Science/HousePrices/temp"
  .. ..$ Name      : chr "lgbm_model.txt"
  .. ..$ lgbm      : chr "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe"
  .. ..$ Train     : chr "lgbm_train.csv"
  .. ..$ Valid     : chr "lgbm_val.csv"
  .. ..$ Test      : logi NA
  .. ..$ Validation: num [1:20] 1 1 1 1 1 ...
  ..$ 5:List of 8
  .. ..$ Model     : chr [1:14] "max_feature_idx=-1" "sigmoid=1" "" "Tree=0" ...
  .. ..$ Path      : chr "D:/Data Science/HousePrices/temp"
  .. ..$ Name      : chr "lgbm_model.txt"
  .. ..$ lgbm      : chr "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe"
  .. ..$ Train     : chr "lgbm_train.csv"
  .. ..$ Valid     : chr "lgbm_val.csv"
  .. ..$ Test      : logi NA
  .. ..$ Validation: num [1:20] 1 1 1 1 1 ...
 $ Validation:List of 2
  ..$ : num [1:100] 1 1 1 1 1 ...
  ..$ :List of 5
  .. ..$ : num [1:20] 1 1 1 1 1 ...
  .. ..$ : num [1:20] 1 1 1 1 1 ...
  .. ..$ : num [1:20] 1 1 1 1 1 ...
  .. ..$ : num [1:20] 1 1 1 1 1 ...
  .. ..$ : num [1:20] 1 1 1 1 1 ...
 $ Weights   : num [1:5] 0.2 0.2 0.2 0.2 0.2

@zachmayer
Author

Thanks!

zachmayer reopened this Dec 10, 2016
@zachmayer
Author

(You can close this if you want or leave it open)

@zachmayer
Author

Another (potentially silly) question: If I followed the installation guide in the readme for linux, what might my lightgbm path be?

@Laurae2
Owner

Laurae2 commented Dec 10, 2016

I fixed the LightGBM functions' documentation in commit 4fe8e2b35acabbe8979cd3181dca8f004a03ee38.

> Another (potentially silly) question: If I followed the installation guide in the readme for linux, what might my lightgbm path be?

Your LightGBM executable should be in the same directory as your LightGBM download.

You can find out where it has been compiled by running this in your LightGBM directory:

ls -d */

If you installed it in a folder named "(...)/LightGBM", then lgbm_path should be "(...)/LightGBM/lightgbm" (unless my memory is wrong: it must create the executable in the root directory of that folder). You do not need to specify an extension; the shell automatically takes care of it.
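Note that ls -d */ lists only subdirectories, so the lightgbm executable itself (a regular file) will not show up in its output. As a sketch, assuming the build puts the CLI binary in the root of the LightGBM folder as described above, the hypothetical shell function find_lgbm_exe below (not part of Laurae or LightGBM) prints any owner-executable file named lightgbm directly inside a given folder; a printed path is a candidate value for lgbm_path.

```shell
#!/bin/sh
# Hypothetical helper (not part of Laurae or LightGBM): list executable
# regular files named "lightgbm" directly inside a LightGBM folder.
# Assumes the build placed the CLI binary in the folder's root directory.
find_lgbm_exe() {
  # -maxdepth 1: only look in the folder itself, not in subdirectories
  # -type f:     regular files only (skips directories such as build/)
  # -perm -u+x:  the owner-executable bit must be set
  find "$1" -maxdepth 1 -type f -name 'lightgbm' -perm -u+x
}

# Usage (the folder location is an assumption; adjust it to your clone):
# find_lgbm_exe "$HOME/LightGBM"
```

If nothing is printed, the CLI has probably not been built in that folder.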

@zachmayer
Author

I didn't even have LightGBM installed! lol. So, for future reference, this error means LightGBM isn't installed or lgbm_path points to the wrong location:

Error in outputs[["Models"]][[i]][["Validation"]] : 
  subscript out of bounds
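This error appears to surface only when lgbm.cv tries to collect each fold's validation predictions, well after the real problem occurred, so checking lgbm_path up front gives a clearer message. A minimal shell sketch, assuming (as this thread indicates) that lgbm_path must name the LightGBM command-line executable itself; check_lgbm_path is a hypothetical helper, not part of Laurae:

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of Laurae): verify that the value
# you plan to pass as lgbm_path is an existing, executable regular file.
# Prints a one-line verdict and returns non-zero on failure.
check_lgbm_path() {
  if [ ! -e "$1" ]; then
    echo "missing: $1"          # nothing at that path (not installed / typo)
    return 1
  fi
  if [ -d "$1" ]; then
    echo "directory: $1"        # a folder, not the lightgbm binary
    return 1
  fi
  if [ ! -x "$1" ]; then
    echo "not executable: $1"   # file exists but lacks execute permission
    return 1
  fi
  echo "ok: $1"
}
```

Run it on the exact string you intend to pass as lgbm_path before calling lgbm.cv.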

Laurae2 added a commit that referenced this issue Dec 13, 2016
@mik3hall

I also got this by omitting the path.

***************  
Fold no:  1 / 5  
***************  
Error in outputs[["Models"]][[i]][["Validation"]] : 
  subscript out of bounds

I installed on OS X as described in "cannot install lightgbm in R with devtools on macOS", doing the R install shown there with:
R CMD INSTALL --build . --no-multiarch
I believe this installs to the default R package location, as shown by:

> .libPaths()
[1] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library"
system("ls -l /Library/Frameworks/R.framework/Versions/3.4/Resources/library/lightgbm")
total 32
-rw-rw-r--   1 mjh  admin  2027 Jun 23 17:48 DESCRIPTION
-rw-rw-r--   1 mjh  admin  2044 Jun 23 17:50 INDEX

Might it be possible to make the .libPaths() location the default path?
I just tried:
lgbm_path = '/Library/Frameworks/R.framework/Versions/3.4/Resources/library',
and got:

***************  
Fold no:  1 / 5  
***************  
done (actual nth=1, anyBufferGrown=no, maxBuffUsed=35%)                                                                              
Saving validation data (data.table) file to: /Users/mjh/ml/kaggle/HomeCredit/code/lgbm_val_1.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
Column writers: 3 12 12 12 12 3 5 5 5 5 12 12 12 12 12 5 3 5 5 3 5 3 3 12 5 3 3 12 3 12 ... 5 5 5 5 3 5 5 5 5 5 
maxLineLen=1559 from sample. Found in 0.016s
Writing column names ... done in 0.000s
Writing 61502 rows in 23 batches of 2690 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=35%)
Starting to work on model as of Tue Jun 26 2018 08:46:11  
/bin/sh: /Library/Frameworks/R.framework/Versions/3.4/Resources/library: is a directory
Model completed, results saved in /Users/mjh/ml/kaggle/HomeCredit/code  
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
  cannot open file '/Users/mjh/ml/kaggle/HomeCredit/code/lgbm_model_1.txt': No such file or directory

It successfully wrote the .conf, train_1.csv, and val_1.csv files. I'm not sure what the other errors are about: it appears to look for a /bin/sh-type executable, and then the connection fails because there is no model_1.txt.
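The "is a directory" message from /bin/sh suggests that the value of lgbm_path was run as a shell command, so it has to be the LightGBM command-line executable built from source, not the R library folder where R CMD INSTALL placed the lightgbm R package. As a hedged sketch, the hypothetical helper resolve_lgbm_path below accepts either an executable or a folder; for a folder it looks for an executable named lightgbm directly inside it, and otherwise fails. Given the R library folder above, it would fail, because library/lightgbm there is the R package's directory rather than an executable file.

```shell
#!/bin/sh
# Hypothetical helper: turn a candidate lgbm_path (file or folder) into the
# path of an executable "lightgbm" file, or fail with a message on stderr.
resolve_lgbm_path() {
  if [ -f "$1" ] && [ -x "$1" ]; then
    # Already an executable file: use it as-is.
    printf '%s\n' "$1"
  elif [ -d "$1" ]; then
    dir=${1%/}                  # drop one trailing slash, if any
    if [ -f "$dir/lightgbm" ] && [ -x "$dir/lightgbm" ]; then
      # A folder was given: use the executable sitting directly inside it.
      printf '%s\n' "$dir/lightgbm"
    else
      echo "no LightGBM executable inside: $1" >&2
      return 1
    fi
  else
    echo "not a LightGBM executable: $1" >&2
    return 1
  fi
}
```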

@tarunparmar

tarunparmar commented Dec 24, 2018

On a Mac, lgbm_path was the location of the Unix executable that you build from source.
In my case it was in my Downloads folder, so the lgbm_path value would be something like "/Downloads/LightGBM/lightgbm".
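A path beginning with /Downloads refers to a folder at the filesystem root, whereas the per-user Downloads folder on macOS normally lives under the home directory (~/Downloads). It is not confirmed here whether lgbm.cv expands a leading ~, so a fully expanded absolute path is the safer value; from R, path.expand("~/Downloads/LightGBM/lightgbm") performs that expansion. A shell sketch, assuming the built CLI sits at ~/Downloads/LightGBM/lightgbm; downloads_lgbm_path is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print the absolute lgbm_path for a LightGBM CLI built
# in <home>/Downloads/LightGBM (an assumed location), or fail on stderr.
# The home directory defaults to $HOME; it can also be passed explicitly.
downloads_lgbm_path() {
  exe="${1:-$HOME}/Downloads/LightGBM/lightgbm"
  if [ -f "$exe" ] && [ -x "$exe" ]; then
    printf '%s\n' "$exe"
  else
    echo "not found or not executable: $exe" >&2
    return 1
  fi
}

# Usage: paste the printed value into lgbm_path = "..."
# downloads_lgbm_path
```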
