lgb.cv is constantly crashing in R #2357
Comments
It seems the memory is not enough.
@guolinke could you elaborate? I looked at the code around https://github.com/microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L676 but can't figure out how it's possible to violate that check. As a reminder (it's been almost 3 years since you wrote this code in #86, according to the blame 😂), it has this comment in class
Refer to: LightGBM/src/treelearner/serial_tree_learner.cpp Lines 49 to 72 in 5f5dcff
I think it should not fail (unless the number of leaves = 1)
but num_leaves should LightGBM/src/io/config_auto.cpp Lines 298 to 299 in 0dfda82
I was able to reproduce this. The bug is triggered by the
Here is the Java program I used to reproduce:

import com.microsoft.ml.lightgbm.*;

public class issue_2357 {
    private static void validate(int result, String component) throws RuntimeException {
        if (result == -1) {
            throw new RuntimeException(component + " call failed in LightGBM with error: " + lightgbmlib.LGBM_GetLastError());
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            String osName = System.getProperty("os.name").toLowerCase();
            if (osName.startsWith("mac os x")) {
                String prefix = System.getProperty("user.home") + "/src/LightGBM/";
                System.load(prefix + "lib_lightgbm.dylib");
                System.load(prefix + "lib_lightgbm_swig.jnilib");
            } else {
                System.load("/src/LightGBM/lib_lightgbm.so");
                System.load("/src/LightGBM/lib_lightgbm_swig.so");
            }
        } catch (UnsatisfiedLinkError e) {
            System.err.println(e.getMessage());
            e.printStackTrace();
            return;
        }

        int numRow = 1000;
        int numCols = 79;
        System.out.println("allocating data");
        // Roughly 20% of the cells get a random value; the rest stay 0.0.
        double[][] data = new double[numRow][numCols];
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < data[i].length; j++) {
                if (Math.random() < .2) {
                    data[i][j] = Math.random();
                }
            }
        }

        System.out.println("generating dataset");
        SWIGTYPE_p_p_void dataset = generateDenseDataset(numRow, data);
        SWIGTYPE_p_void dataset_handle = lightgbmlib.voidpp_value(dataset);

        // Try every max_depth from 0 to 999 and report the values that fail.
        for (int i = 0; i < 1000; i++) {
            try {
                System.out.println("CREATE BOOSTER");
                SWIGTYPE_p_p_void boosterOutPtr = lightgbmlib.voidpp_handle();
                validate(lightgbmlib.LGBM_BoosterCreate(
                        dataset_handle,
                        "max_depth=" + i,
                        boosterOutPtr),
                    "Booster LGBM_BoosterCreate");
            } catch (Exception e) {
                System.err.println("Failed on " + i + " (" + String.format("%08x", i) + ")");
            }
        }
        System.out.println("Done");
    }

    // Copies the 2D array into a flat, row-major native double array.
    private static SWIGTYPE_p_double generateData(int numRows, double[][] rowsAsDoubleArray) {
        int numCols = rowsAsDoubleArray[0].length;
        SWIGTYPE_p_double data = lightgbmlib.new_doubleArray(numCols * numRows);
        for (int i = 0; i < numRows; i++) {
            for (int j = 0; j < rowsAsDoubleArray[i].length; j++) {
                lightgbmlib.doubleArray_setitem(data, i * numCols + j, rowsAsDoubleArray[i][j]);
            }
        }
        return data;
    }

    private static SWIGTYPE_p_p_void generateDenseDataset(int numRows, double[][] rowsAsDoubleArray) throws RuntimeException {
        int numCols = 79;
        int isRowMajor = 1;
        SWIGTYPE_p_p_void datasetOutPtr = lightgbmlib.voidpp_handle();
        String datasetParams = "max_bin=255";
        int data64bitType = lightgbmlibConstants.C_API_DTYPE_FLOAT64;
        SWIGTYPE_p_double data = generateData(numRows, rowsAsDoubleArray);
        validate(lightgbmlib.LGBM_DatasetCreateFromMat(
                lightgbmlib.double_to_voidp_ptr(data),
                data64bitType,
                numRows,
                numCols,
                isRowMajor, datasetParams, null, datasetOutPtr),
            "Dataset create");
        lightgbmlib.delete_doubleArray(data);
        return datasetOutPtr;
    }
}
There are two integer overflows in config.cpp.
The static_cast is overflowing because it's casting a double with
Line 310 in b310fb4
And the bitshift is wrapping around the integer multiple times for max_depth > 29
Line 315 in b310fb4
For the first one, we can leave
For the second one, we might need to check if an overflow would be caused.
There is also an inconsistency in the num_leaves calculation. I'm not sure if this is intentional or not.
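To make the two overflows concrete, here is a small Java illustration of the same arithmetic on a signed 32-bit int. It is only an analogy: the exact expressions in config.cpp are not reproduced in this thread, so the forms `2 << maxDepth` and a cast of `pow(2, maxDepth)` are assumptions, and the class and method names are hypothetical. Java also defines behavior that C++ leaves undefined (it masks the shift distance to 5 bits and saturates an out-of-range double-to-int cast), so the C++ results may differ by platform.

```java
final class LeafBoundOverflowDemo {
    // Assumed shift-based bound. For int operands Java uses only the low
    // 5 bits of the shift distance, so the result wraps for large depths.
    static int boundByShift(int maxDepth) {
        return 2 << maxDepth;
    }

    // Assumed cast-based bound. Java clamps an out-of-range double to
    // Integer.MAX_VALUE; in C++ the equivalent static_cast is undefined.
    static int boundByPow(int maxDepth) {
        return (int) Math.pow(2, maxDepth);
    }

    public static void main(String[] args) {
        for (int depth : new int[] {29, 30, 31, 32, 33}) {
            System.out.println("max_depth=" + depth
                    + " shift=" + boundByShift(depth)
                    + " pow=" + boundByPow(depth));
        }
    }
}
```

In this sketch the shift form stops being a valid positive bound at max_depth = 30, which matches the "max_depth > 29" observation. The two forms also differ by a factor of two for small depths (2 << d is 2^(d+1), while pow(2, d) is 2^d), which is one way the reported inconsistency could arise.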
Thanks @chris-smith-zocdoc
@guolinke Where do you think we should add validation? Should I add it to this function?
The regression was introduced in #2216 to fix #2215. I can submit a PR if you would like.
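One possible shape for such validation is sketched below. It is written in Java to match the reproduction program rather than the C++ of config.cpp, and it is not the fix that was merged. The class, constant, and method names are hypothetical, and treating 2^max_depth as the leaf bound for a tree of depth max_depth is an assumption.

```java
final class NumLeavesValidation {
    // Largest depth for which 2^depth still fits in a signed 32-bit int.
    static final int MAX_SAFE_DEPTH = 30;

    // Caps numLeaves at 2^maxDepth without ever computing a value that
    // overflows an int. A non-positive maxDepth means "no depth limit".
    static int capNumLeaves(int numLeaves, int maxDepth) {
        if (numLeaves < 2) {
            throw new IllegalArgumentException("num_leaves must be > 1, got " + numLeaves);
        }
        if (maxDepth <= 0 || maxDepth > MAX_SAFE_DEPTH) {
            // Either unlimited depth, or 2^maxDepth exceeds every int,
            // so the depth bound cannot reduce numLeaves.
            return numLeaves;
        }
        return Math.min(numLeaves, 1 << maxDepth);
    }

    public static void main(String[] args) {
        System.out.println(capNumLeaves(31, 3));
        System.out.println(capNumLeaves(31, 40));
        System.out.println(capNumLeaves(31, -1));
    }
}
```

The design choice here is to skip the power-of-two computation whenever the depth is too large to matter, instead of computing it and checking for overflow afterwards.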
Unfortunately, my R session is always aborted after a few seconds when running lgb.cv. I call lgb.cv several times since it's part of a scoring function I use for Bayesian optimization to find the optimal hyperparameters. However, the error does seem to come from lightgbm.
In RStudio, I get the following error:
[LightGBM] [Fatal] Check failed: cache_size >= 2 at /private/var/folders/6q/.../T/RtmpGrFClE/R.INSTALL10ba5306a71e/lightgbm/src/src/treelearner/feature_histogram.hpp, line 676.
If you have any idea what this is supposed to mean and how I could solve the issue, I would highly appreciate your help.