
Does this cause a core dump? #16

Closed
chenglin opened this issue Mar 28, 2022 · 3 comments

Comments

@chenglin

Recently I found that one of my models causes a core dump when I use lleaves for prediction.

I am confused about the two functions below.

In codegen.py, a tree function's parameter type can be int* if the feature is categorical:

def make_tree(tree):
    # declare the function for this tree
    func_dtypes = (INT_CAT if f.is_categorical else DOUBLE for f in tree.features)
    scalar_func_t = ir.FunctionType(DOUBLE, func_dtypes)
    tree_func = ir.Function(module, scalar_func_t, name=str(tree))
    tree_func.linkage = "private"
    # populate function with IR
    gen_tree(tree, tree_func)
    return LTree(llvm_function=tree_func, class_id=tree.class_id)

But in data_processing.py, which predict uses, all feature parameters are converted to double*:

def ndarray_to_ptr(data: np.ndarray):
    """
    Takes a 2D numpy array, converts to float64 if necessary and returns a pointer

    :param data: 2D numpy array. Copying is avoided if possible.
    :return: pointer to 1D array of dtype float64.
    """
    # ravel makes sure we get a contiguous array in memory and not some strided View
    data = data.astype(np.float64, copy=False, casting="same_kind").ravel()
    ptr = data.ctypes.data_as(POINTER(c_double))
    return ptr
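For what it's worth, the copy behaviour of that conversion can be checked directly. This is a small standalone sketch, not lleaves code:

```python
import numpy as np

# Sketch: astype(np.float64, copy=False, casting="same_kind") passes a
# float64 array through without copying; ravel() on a C-contiguous
# array is likewise a view, so no data is duplicated in the common case.
a = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float64)
b = a.astype(np.float64, copy=False, casting="same_kind").ravel()

assert np.shares_memory(a, b)  # float64 input: no copy was made

# An int32 input is a "safe" cast (a subset of "same_kind"), so it is
# converted -- and therefore copied -- to float64.
c = np.array([[1, 2], [3, 4]], dtype=np.int32)
d = c.astype(np.float64, copy=False, casting="same_kind").ravel()

assert d.dtype == np.float64
assert not np.shares_memory(c, d)
```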

Is this effectively like the following C?

double predict(int* a, double* b);  /* the categorical parameter is declared int* */
double a = 1.1;
double b = 2.2;
predict(&a, &b);  /* but &a is a double*, not an int* */

Could this type mismatch happen in lleaves?

@siboehm
Owner

siboehm commented Mar 29, 2022

TLDR: It's possible that there's a bug that causes a segfault, though it's unlikely that this is happening in the parts of the code you're pointing to.

For diagnosing the segfault: could you run a minimal reproducing example under gdb to see which instruction triggers the segfault? There used to be an issue with overflows for very large datasets, but I fixed that a few months ago. If there's any way you can send me a self-contained, minimal reproducible sample (email is fine), I'd love to help you out.

Regarding the categorical data: The relevant function is actually this one:

def gen_forest(forest, module, fblocksize):

This is the function in the binary that lleaves calls from Python (using two double pointers). The categorical features are then cast to ints in the core loop here:
args.append(builder.fptosi(el, INT_CAT))

Most of the processing of the Pandas dataframes follows LightGBM very closely. This double-to-int casting is a bit strange, but I wanted to follow LightGBM as closely as possible. It works because LightGBM doesn't allow categorical values greater than 2^31 - 1 (max int32), while a double can represent any integer up to 2^53 without loss of precision.
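The 2^53 bound is easy to verify in plain Python (just float arithmetic, nothing lleaves-specific):

```python
# float64 has a 53-bit significand, so every integer with absolute
# value up to 2**53 is representable exactly. LightGBM caps categorical
# values at int32 max, which is far below that bound.
max_int32 = 2**31 - 1
assert int(float(max_int32)) == max_int32  # int -> double -> int is lossless

# Above 2**53, consecutive integers start to collide:
assert float(2**53) == 2**53
assert float(2**53 + 1) == float(2**53)  # 2**53 + 1 rounds to 2**53
```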

@chenglin
Author

I found that if the categorical features are numerical values, we can skip df[categorical_feature] = df[categorical_feature].astype('category') when preparing the training data, and instead just call the LightGBM train function with the parameter categorical_feature=categorical_feature. In a model file trained like this, pandas_categorical is null. Could this issue be related to that?

When I retrained a model whose pandas_categorical is not null, the core dump disappeared.

PR: return empty list if pandas_categorical is null in model file
BTW, I think we should keep pandas_categorical = None when pandas_categorical: null in the model file.
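A minimal sketch of the guard being discussed (the dict and field name mirror the model file's JSON metadata; the exact handling is illustrative, not lleaves' actual code):

```python
import json

# Hypothetical metadata from a model trained without pandas categorical
# dtypes: the pandas_categorical field is null in the model file.
meta = json.loads('{"pandas_categorical": null}')

# The PR's behaviour: fall back to an empty list so downstream code can
# iterate safely; the alternative suggested above is to keep it as None.
pandas_categorical = meta["pandas_categorical"]
if pandas_categorical is None:
    pandas_categorical = []

assert pandas_categorical == []
```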

@siboehm
Owner

siboehm commented Apr 3, 2022

I'm having trouble understanding this issue. Could you write up a minimally reproducible example of the core dump / send me the model.txt that causes it?

@siboehm siboehm closed this as completed Aug 21, 2022