Tomas Fryda commented: This seems to be expected behavior. For more information, see the XGBoost part of https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/categorical_encoding.html.
Hi,
The "Enum" and "Eigen" categorical encodings do not work with "xgboost".
It can be reproduced with this:
#--------------
library(h2o)
h2o.init()
# import the airlines dataset:
# This dataset is used to classify whether a flight will be delayed ("YES") or not ("NO")
# original data can be found at http://www.transtats.bts.gov/
airlines <- h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")
# convert columns to factors
airlines["Year"] <- as.factor(airlines["Year"])
airlines["Month"] <- as.factor(airlines["Month"])
airlines["DayOfWeek"] <- as.factor(airlines["DayOfWeek"])
airlines["Cancelled"] <- as.factor(airlines["Cancelled"])
airlines['FlightNum'] <- as.factor(airlines['FlightNum'])
# set the predictor names and the response column name
predictors <- c("Origin", "Dest", "Year", "UniqueCarrier", "DayOfWeek", "Month", "Distance", "FlightNum")
response <- "IsDepDelayed"
# split into train and validation
airlines_splits <- h2o.splitFrame(data = airlines, ratios = 0.8, seed = 1234)
train <- airlines_splits[[1]]
valid <- airlines_splits[[2]]
#---- Grid Search
hyper_params <- list(
  categorical_encoding = c(
    'OneHotExplicit', 'OneHotInternal',
    'Binary', 'Eigen', 'SortByResponse',
    'EnumLimited', 'Enum', 'LabelEncoder'
  )
)
# this example uses cartesian grid search because the search space is small
# and we want to see the performance of all models. For a larger search space use
# random grid search instead: list(strategy = "RandomDiscrete")
# this XGBoost grid uses early stopping once the validation AUC doesn't improve by at least 0.01% for
# 5 consecutive scoring events
grid <- h2o.grid(
  x = predictors, y = response, training_frame = train, validation_frame = valid,
  algorithm = "xgboost",
  grid_id = "air_xgboost", hyper_params = hyper_params,
  stopping_rounds = 5, stopping_tolerance = 1e-4, stopping_metric = "AUC",
  search_criteria = list(strategy = "Cartesian"), parallelism = 0,
  seed = 1234)
# Sort the grid models by AUC
sorted_grid <- h2o.getGrid("air_xgboost", sort_by = "auc", decreasing = TRUE)
sorted_grid
#--------------
Which produces this output:
Grid ID: air_xgboost
Used hyper parameters:
Number of models: 6
Number of failed models: 2
Hyper-Parameter Search Summary: ordered by decreasing auc
  categorical_encoding                        model_ids     auc
1               Binary XGBoost_model_1647599361285_7139 0.74622
2         LabelEncoder XGBoost_model_1647599361285_7144 0.74464
3       OneHotExplicit XGBoost_model_1647599361285_7137 0.74074
4       OneHotInternal XGBoost_model_1647599361285_7138 0.74024
5          EnumLimited XGBoost_model_1647599361285_7142 0.62515
6       SortByResponse XGBoost_model_1647599361285_7141 0.49867
Failed models
categorical_encoding  status_failed  msgs_failed
Eigen                 FAIL           "Override toEigenVec for this Algo!"
Enum                  FAIL           "Illegal argument(s) for XGBoost model: XGBoost_model_1647599361285_7143. Details: ERRR on field: _categorical_encoding: Enum encoding is not supported for XGBoost in current H2O.\n"
#----------- END OF OUTPUT -------
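Until the options are clarified, one possible workaround is to drop the two failing encodings from the grid before calling h2o.grid. The sketch below is base R only and rebuilds hyper_params from the list used above. The name failed_for_xgboost is hypothetical, and its contents are taken only from the failures reported in this output, so they may differ in other H2O versions.

```r
# Encodings attempted in the grid above.
all_encodings <- c(
  "OneHotExplicit", "OneHotInternal",
  "Binary", "Eigen", "SortByResponse",
  "EnumLimited", "Enum", "LabelEncoder"
)

# Encodings reported as FAIL in the output above.
# Assumption: this reflects only the H2O version used in this report.
failed_for_xgboost <- c("Eigen", "Enum")

# Keep only the encodings that produced a model; setdiff preserves the
# order of its first argument.
hyper_params <- list(
  categorical_encoding = setdiff(all_encodings, failed_for_xgboost)
)

print(hyper_params$categorical_encoding)
```

The resulting hyper_params can be passed to the same h2o.grid call as before, and the grid should then report no failed models for these two reasons.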
If this is the expected behaviour, the list of options for the categorical_encoding parameter should be updated.
Thanks!
Carlos Ortega (Spain)