Category features support #64

Open
UTimeStrange opened this issue Nov 7, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@UTimeStrange

UTimeStrange commented Nov 7, 2022

The "fit" method only accepts a DataFrame, so I cannot pass it a DMatrix built with enable_categorical=True:

xgbse_model = XGBSEKaplanTree(PARAMS_TREE)
# X = xgb.DMatrix(X, enable_categorical=True)
xgbse_model.fit(X, y)

However, enable_categorical is never set to True in the xgbse source code, so fitting a DataFrame with categorical columns raises this error:
"ValueError: DataFrame.dtypes for data must be int, float, bool or category. When
categorical type is supplied, DMatrix parameter enable_categorical must
be set to True"

def build_xgb_cox_dmatrix(X, T, E):
    """Builds a XGB DMatrix using specified Data Frame of features (X)
        arrays of times (T) and censors/events (E).

    Args:
        X ([pd.DataFrame, np.array]): Data Frame to be converted to XGBDMatrix format.
        T ([np.array, pd.Series]): Array of times.
        E ([np.array, pd.Series]): Array of censors(False) / events(True).

    Returns:
        (DMatrix): A XGB DMatrix is returned including features and target.
    """

    target = np.where(E, T, -T)

    return xgb.DMatrix(X, label=target)

The last line here does not pass enable_categorical=True. Are categorical features unsupported, or do I just need to change this code myself?

UTimeStrange added the enhancement label Nov 7, 2022
@dhuang-apex

It would be huge if we could get categorical feature support, please.

@frank1010111

frank1010111 commented Apr 6, 2023

It's a little more complicated because there are other instances of xgb.DMatrix being called by the estimators. I'm working on finding and changing them for a pull request, but for now I would recommend using pandas.get_dummies to convert categorical columns to one-hot encoding.
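A minimal sketch of the pandas.get_dummies workaround, assuming the categorical columns use the pandas "category" dtype. The toy DataFrame and the variable names categorical_columns and X_encoded are made up for illustration.

```python
import pandas as pd

# Toy feature frame with one categorical column, invented for this sketch.
X = pd.DataFrame(
    {
        "age": [61.0, 45.0, 70.0, 52.0],
        "stage": pd.Categorical(["I", "II", "II", "III"]),
    }
)

# Find the columns with the pandas "category" dtype.
categorical_columns = list(X.select_dtypes(include="category").columns)

# Replace each categorical column with one 0/1 column per level.
# dtype=float keeps every resulting column numeric.
X_encoded = pd.get_dummies(X, columns=categorical_columns, dtype=float)

print(X_encoded.columns.tolist())
```

The encoded frame contains only numeric columns, so it should be usable in place of X in the fit(X, y) call from the first comment without needing enable_categorical.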


3 participants