GLM with interactions and Lambda = 0 produces "Categorical value out of bounds" error #8050

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 6 comments

Comments

@exalate-issue-sync

To reproduce, run this Python code:

{code:python}import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

h2o.init()

data = h2o.import_file("https://raw.githubusercontent.com/guru99-edu/R-Programming/master/adult.csv")
# Response: hours worked per week divided by age
data['y'] = data['hours-per-week'] / data['age']

model_cust = H2OGeneralizedLinearEstimator(interactions = ["income", "gender"],
                                           family = "tweedie",
                                           tweedie_variance_power = 1.7, tweedie_link_power = 0,
                                           Lambda = 0,  # no regularization (p-values need an unregularized model)
                                           intercept = True,
                                           compute_p_values = True,
                                           remove_collinear_columns = True,
                                           standardize = True, weights_column = "age", solver="IRLSM")
model_cust.train(x = ["income", "gender"], y = "y", training_frame = data){code}

No error occurs if you comment out {{interactions}}, or if you comment out both {{Lambda}} and {{compute_p_values}}.

Stacktrace:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-31-4ef7a3b3682b> in <module>
11                 remove_collinear_columns = True,
12                 standardize = True, weights_column = "age", solver="IRLSM")
---> 13 model_cust.train(x = ["income", "gender"], y = "y", training_frame = data_demo)

~/anaconda3/envs/py_36_new/lib/python3.6/site-packages/h2o/estimators/estimator_base.py in train(self, x, y, training_frame, offset_column, fold_column, weights_column, validation_frame, max_runtime_secs, ignored_columns, model_id, verbose)
113                                  validation_frame=validation_frame, max_runtime_secs=max_runtime_secs,
114                                  ignored_columns=ignored_columns, model_id=model_id, verbose=verbose)
--> 115         self._train(parms, verbose=verbose)
116 
117     def train_segments(self, x=None, y=None, training_frame=None, offset_column=None, fold_column=None,

~/anaconda3/envs/py_36_new/lib/python3.6/site-packages/h2o/estimators/estimator_base.py in _train(self, parms, verbose)
200             return
201 
--> 202         job.poll(poll_updates=self._print_model_scoring_history if verbose else None)
203         model_json = h2o.api("GET /%d/Models/%s" % (rest_ver, job.dest_key))["models"][0]
204         self._resolve_model(job.dest_key, model_json)

~/anaconda3/envs/py_36_new/lib/python3.6/site-packages/h2o/job.py in poll(self, poll_updates)
76             if (isinstance(self.job, dict)) and ("stacktrace" in list(self.job)):
77                 raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: "
---> 78                                        "\n{}".format(self.job_key, self.exception, self.job["stacktrace"]))
79             else:
80                 raise EnvironmentError("Job with key %s failed with an exception: %s" % (self.job_key, self.exception))

OSError: Job with key $03017f00000132d4ffffffff$_b7cb7ed92c31bf94c7e42a48c6dd41d8 failed with an exception: DistributedException from /127.0.0.1:54321: 'Categorical value out of bounds, got 1, next cat starts at 3', caused by java.lang.AssertionError: Categorical value out of bounds, got 1, next cat starts at 3
stacktrace: 
DistributedException from /127.0.0.1:54321: 'Categorical value out of bounds, got 1, next cat starts at 3', caused by java.lang.AssertionError: Categorical value out of bounds, got 1, next cat starts at 3
at water.MRTask.getResult(MRTask.java:494)
at water.MRTask.getResult(MRTask.java:502)
at water.MRTask.doAll(MRTask.java:397)
at water.MRTask.doAll(MRTask.java:392)
at hex.glm.GLM$GLMGradientSolver.getGradient(GLM.java:2880)
at hex.glm.GLM.init(GLM.java:618)
at hex.glm.GLM$GLMDriver.computeImpl(GLM.java:2089)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:248)
at hex.glm.GLM$GLMDriver.compute2(GLM.java:848)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1557)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.AssertionError: Categorical value out of bounds, got 1, next cat starts at 3
at hex.DataInfo.getCategoricalId(DataInfo.java:1094)
at hex.glm.GLMTask$GLMGradientTask.computeCategoricalEtas(GLMTask.java:449)
at hex.glm.GLMTask$GLMGradientTask.map(GLMTask.java:549)
at water.MRTask.compute2(MRTask.java:658)
at water.MRTask.compute2(MRTask.java:607)
at water.H2O$H2OCountedCompleter.compute1(H2O.java:1560)
at hex.glm.GLMTask$GLMGenericGradientTask$Icer.compute1(GLMTask$GLMGenericGradientTask$Icer.java)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1556)
... 5 more
@exalate-issue-sync
Author

Neema Mashayekhi commented: Fails on the Python client, version 3.30.0.4.

End of Logs:

{noformat}06-01 13:34:22.767 127.0.0.1:54321 64759 FJ-1-5 INFO: Building H2O GLM model with these parameters:
06-01 13:34:22.779 127.0.0.1:54321 64759 FJ-1-5 INFO: {"_train":{"name":"py_1_sid_80cf","type":"Key"},"_valid":null,"_nfolds":0,"_keep_cross_validation_models":true,"_keep_cross_validation_predictions":false,"_keep_cross_validation_fold_assignment":false,"_parallelize_cross_validation":true,"_auto_rebalance":true,"_seed":-1,"_fold_assignment":"AUTO","_categorical_encoding":"AUTO","_max_categorical_levels":10,"_distribution":"AUTO","_tweedie_power":1.5,"_quantile_alpha":0.5,"_huber_alpha":0.9,"_ignored_columns":["hours-per-week","race","educational-num","education","x","marital-status","workclass"],"_ignore_const_cols":true,"_weights_column":"age","_offset_column":null,"_fold_column":null,"_check_constant_response":true,"_is_cv_model":false,"_score_each_iteration":false,"_max_runtime_secs":0.0,"_stopping_rounds":3,"_stopping_metric":"deviance","_stopping_tolerance":1.0E-4,"_response_column":"y","_balance_classes":false,"_max_after_balance_size":5.0,"_class_sampling_factors":null,"_max_confusion_matrix_size":20,"_checkpoint":null,"_pretrained_autoencoder":null,"_custom_metric_func":null,"_custom_distribution_func":null,"_export_checkpoints_dir":null,"_standardize":true,"_useDispersion1":false,"_family":"tweedie","_rand_family":null,"_link":"family_default","_rand_link":null,"_solver":"IRLSM","_tweedie_variance_power":1.7,"_tweedie_link_power":0.0,"_theta":1.0E-10,"_invTheta":1.0E10,"_alpha":null,"_lambda":[0.0],"_startval":null,"_calc_like":false,"_random_columns":null,"_missing_values_handling":null,"_prior":-1.0,"_lambda_search":false,"_HGLM":false,"_nlambdas":-1,"_non_negative":false,"_lambda_min_ratio":-1.0,"_use_all_factor_levels":false,"_max_iterations":-1,"_intercept":true,"_beta_epsilon":1.0E-4,"_objective_epsilon":-1.0,"_gradient_epsilon":-1.0,"_obj_reg":-1.0,"_compute_p_values":true,"_remove_collinear_columns":true,"_interactions":["income","gender"],"_interaction_pairs":null,"_early_stopping":true,"_beta_constraints":null,"_plug_values":null,"_max_active_predictors":-1,"_stdOverride":false,"_glmType":"glm"}
06-01 13:34:22.780 127.0.0.1:54321 64759 FJ-1-5 INFO: Dropping ignored columns: [hours-per-week, race, educational-num, education, x, marital-status, workclass]
06-01 13:34:22.781 127.0.0.1:54321 64759 FJ-1-5 INFO: train dataset already contains 64 (non-empty) chunks. No need to rebalance. [desiredChunks=16, rebalanceRatio=1.0]
06-01 13:34:22.841 127.0.0.1:54321 64759 FJ-1-5 INFO: GLM[dest=GLM_model_python_1591043651296_1, iter=0 lmb=.0E0 obj=Infinity imp=.0E0 bdf=.0E0] using 48842 nobs out of 48842 total
06-01 13:34:22.853 127.0.0.1:54321 64759 FJ-1-5 INFO: Completing model GLM_model_python_1591043651296_1
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: DistributedException from /127.0.0.1:54321: 'Categorical value out of bounds, got 1, next cat starts at 3', caused by java.lang.AssertionError: Categorical value out of bounds, got 1, next cat starts at 3
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at water.MRTask.getResult(MRTask.java:494)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at water.MRTask.getResult(MRTask.java:502)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at water.MRTask.doAll(MRTask.java:397)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at water.MRTask.doAll(MRTask.java:392)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at hex.glm.GLM$GLMGradientSolver.getGradient(GLM.java:2880)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at hex.glm.GLM.init(GLM.java:618)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at hex.glm.GLM$GLMDriver.computeImpl(GLM.java:2089)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:248)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at hex.glm.GLM$GLMDriver.compute2(GLM.java:848)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at water.H2O$H2OCountedCompleter.compute(H2O.java:1557)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: Caused by: java.lang.AssertionError: Categorical value out of bounds, got 1, next cat starts at 3
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at hex.DataInfo.getCategoricalId(DataInfo.java:1094)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at hex.glm.GLMTask$GLMGradientTask.computeCategoricalEtas(GLMTask.java:449)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at hex.glm.GLMTask$GLMGradientTask.map(GLMTask.java:549)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at water.MRTask.compute2(MRTask.java:658)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at water.MRTask.compute2(MRTask.java:607)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at water.MRTask.compute2(MRTask.java:607)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at water.MRTask.compute2(MRTask.java:607)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at water.MRTask.compute2(MRTask.java:607)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at water.MRTask.compute2(MRTask.java:607)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at water.MRTask.compute2(MRTask.java:607)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at water.H2O$H2OCountedCompleter.compute1(H2O.java:1560)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at hex.glm.GLMTask$GLMGenericGradientTask$Icer.compute1(GLMTask$GLMGenericGradientTask$Icer.java)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: at water.H2O$H2OCountedCompleter.compute(H2O.java:1556)
06-01 13:34:22.856 127.0.0.1:54321 64759 FJ-1-5 ERRR: ... 5 more{noformat}

@exalate-issue-sync
Author

Neema Mashayekhi commented: Training completes without error when the H2O jar file is launched manually.

End of Logs:

{noformat}06-01 13:38:20.670 192.168.1.73:54321 64768 #98217-22 INFO: GET /3/Frames/adult.hex, parms: {column_offset=0, full_column_count=-1, row_count=10, row_offset=0, column_count=-1}
06-01 13:38:20.717 192.168.1.73:54321 64768 #98217-21 INFO: POST /4/sessions, parms: {}
06-01 13:38:20.725 192.168.1.73:54321 64768 #98217-23 INFO: POST /99/Rapids, parms: {ast=(tmp= py_2_sid_b0dc (append adult.hex (/ (cols_py adult.hex 'hours-per-week') (cols_py adult.hex 'age')) 'y')), session_id=_sid_b0dc}
06-01 13:38:20.792 192.168.1.73:54321 64768 #98217-22 INFO: POST /3/ModelBuilders/glm, parms: {response_column=y, tweedie_variance_power=1.7, interactions=["income","gender"], family=tweedie, training_frame=py_2_sid_b0dc, lambda=0, compute_p_values=True, remove_collinear_columns=True, weights_column=age, tweedie_link_power=0, ignored_columns=["hours-per-week","race","educational-num","education","x","marital-status","workclass"], standardize=True, intercept=True, solver=IRLSM}
06-01 13:38:20.827 192.168.1.73:54321 64768 FJ-1-25 INFO: Building H2O GLM model with these parameters:
06-01 13:38:20.843 192.168.1.73:54321 64768 FJ-1-25 INFO: {"_train":{"name":"py_2_sid_b0dc","type":"Key"},"_valid":null,"_nfolds":0,"_keep_cross_validation_models":true,"_keep_cross_validation_predictions":false,"_keep_cross_validation_fold_assignment":false,"_parallelize_cross_validation":true,"_auto_rebalance":true,"_seed":-1,"_fold_assignment":"AUTO","_categorical_encoding":"AUTO","_max_categorical_levels":10,"_distribution":"AUTO","_tweedie_power":1.5,"_quantile_alpha":0.5,"_huber_alpha":0.9,"_ignored_columns":["hours-per-week","race","educational-num","education","x","marital-status","workclass"],"_ignore_const_cols":true,"_weights_column":"age","_offset_column":null,"_fold_column":null,"_check_constant_response":true,"_is_cv_model":false,"_score_each_iteration":false,"_max_runtime_secs":0.0,"_stopping_rounds":3,"_stopping_metric":"deviance","_stopping_tolerance":1.0E-4,"_response_column":"y","_balance_classes":false,"_max_after_balance_size":5.0,"_class_sampling_factors":null,"_max_confusion_matrix_size":20,"_checkpoint":null,"_pretrained_autoencoder":null,"_custom_metric_func":null,"_custom_distribution_func":null,"_export_checkpoints_dir":null,"_standardize":true,"_useDispersion1":false,"_family":"tweedie","_rand_family":null,"_link":"family_default","_rand_link":null,"_solver":"IRLSM","_tweedie_variance_power":1.7,"_tweedie_link_power":0.0,"_theta":1.0E-10,"_invTheta":1.0E10,"_alpha":null,"_lambda":[0.0],"_startval":null,"_calc_like":false,"_random_columns":null,"_missing_values_handling":null,"_prior":-1.0,"_lambda_search":false,"_HGLM":false,"_nlambdas":-1,"_non_negative":false,"_lambda_min_ratio":-1.0,"_use_all_factor_levels":false,"_max_iterations":-1,"_intercept":true,"_beta_epsilon":1.0E-4,"_objective_epsilon":-1.0,"_gradient_epsilon":-1.0,"_obj_reg":-1.0,"_compute_p_values":true,"_remove_collinear_columns":true,"_interactions":["income","gender"],"_interaction_pairs":null,"_early_stopping":true,"_beta_constraints":null,"_plug_values":null,"_max_active_predictors":-1,"_stdOverride":false,"_glmType":"glm"}
06-01 13:38:20.845 192.168.1.73:54321 64768 FJ-1-25 INFO: Dropping ignored columns: [hours-per-week, race, educational-num, education, x, marital-status, workclass]
06-01 13:38:20.845 192.168.1.73:54321 64768 FJ-1-25 INFO: train dataset already contains 64 (non-empty) chunks. No need to rebalance. [desiredChunks=16, rebalanceRatio=1.0]
06-01 13:38:20.898 192.168.1.73:54321 64768 FJ-1-25 INFO: GLM[dest=GLM_model_python_1591043825523_1, iter=0 lmb=.0E0 obj=Infinity imp=.0E0 bdf=.0E0] using 48842 nobs out of 48842 total
06-01 13:38:20.931 192.168.1.73:54321 64768 FJ-1-25 INFO: Starting model GLM_model_python_1591043825523_1
06-01 13:38:20.950 192.168.1.73:54321 64768 FJ-1-25 INFO: GLM[dest=GLM_model_python_1591043825523_1, iter=0 lmb=.0E0 obj=4.8273 imp=.1E1 bdf=.45E-1] Got 3 active columns out of 3 total
06-01 13:38:20.985 192.168.1.73:54321 64768 FJ-1-25 INFO: GLM[dest=GLM_model_python_1591043825523_1, iter=0 lmb=.0E0 obj=4.8273 imp=.1E1 bdf=.45E-1] computed in 34+1=35ms, step = 1
06-01 13:38:21.002 192.168.1.73:54321 64768 FJ-1-25 INFO: GLM[dest=GLM_model_python_1591043825523_1, iter=1 lmb=.0E0 obj=4.8258 imp=.32E-3 bdf=.14E0] computed in 16+1=17ms, step = 1
06-01 13:38:21.017 192.168.1.73:54321 64768 FJ-1-25 INFO: GLM[dest=GLM_model_python_1591043825523_1, iter=2 lmb=.0E0 obj=4.8257 imp=.17E-4 bdf=.11E-1] computed in 15+0=15ms, step = 1
06-01 13:38:21.031 192.168.1.73:54321 64768 FJ-1-25 INFO: GLM[dest=GLM_model_python_1591043825523_1, iter=3 lmb=.0E0 obj=4.8257 imp=.27E-6 bdf=.27E-3] relImprovement < eps; relImprovement = 2.7187384703407576E-7, eps = 1.0E-6
06-01 13:38:21.077 192.168.1.73:54321 64768 FJ-1-25 INFO: GLM[dest=GLM_model_python_1591043825523_1, iter=3 lmb=.0E0 obj=4.8257 imp=.27E-6 bdf=.27E-3] solution has 4 nonzeros
06-01 13:38:21.079 192.168.1.73:54321 64768 FJ-1-25 INFO: GLM[dest=GLM_model_python_1591043825523_1, iter=3 lmb=.0E0 obj=4.8257 imp=.27E-6 bdf=.27E-3] Scoring after 252ms
06-01 13:38:21.089 192.168.1.73:54321 64768 FJ-1-25 WARN: Test/Validation dataset column 'income_gender' has levels not trained on: [<=50K_Female, <=50K_Male, >50K_Female]
06-01 13:38:21.138 192.168.1.73:54321 64768 FJ-1-25 INFO: GLM[dest=GLM_model_python_1591043825523_1, iter=3 lmb=.0E0 obj=4.8257 imp=.27E-6 bdf=.27E-3] Model Metrics Type: RegressionGLM
06-01 13:38:21.138 192.168.1.73:54321 64768 FJ-1-25 INFO: Description: N/A
06-01 13:38:21.138 192.168.1.73:54321 64768 FJ-1-25 INFO: model id: GLM_model_python_1591043825523_1
06-01 13:38:21.138 192.168.1.73:54321 64768 FJ-1-25 INFO: frame id: py_2_sid_b0dc
06-01 13:38:21.138 192.168.1.73:54321 64768 FJ-1-25 INFO: MSE: 0.21490197
06-01 13:38:21.138 192.168.1.73:54321 64768 FJ-1-25 INFO: RMSE: 0.46357518
06-01 13:38:21.138 192.168.1.73:54321 64768 FJ-1-25 INFO: mean residual deviance: 0.22721438
06-01 13:38:21.138 192.168.1.73:54321 64768 FJ-1-25 INFO: mean absolute error: 0.35283184
06-01 13:38:21.138 192.168.1.73:54321 64768 FJ-1-25 INFO: root mean squared log error: 0.2238919
06-01 13:38:21.138 192.168.1.73:54321 64768 FJ-1-25 INFO: null DOF: 48841.0
06-01 13:38:21.138 192.168.1.73:54321 64768 FJ-1-25 INFO: residual DOF: 48838.0
06-01 13:38:21.138 192.168.1.73:54321 64768 FJ-1-25 INFO: null deviance: 424507.72
06-01 13:38:21.138 192.168.1.73:54321 64768 FJ-1-25 INFO: residual deviance: 428851.25
06-01 13:38:21.138 192.168.1.73:54321 64768 FJ-1-25 INFO: AIC: NaN
06-01 13:38:21.139 192.168.1.73:54321 64768 FJ-1-25 INFO: GLM[dest=GLM_model_python_1591043825523_1, iter=3 lmb=.0E0 obj=4.8257 imp=.27E-6 bdf=.27E-3] Training metrics computed in 59ms
06-01 13:38:21.145 192.168.1.73:54321 64768 FJ-1-25 INFO: Completing model GLM_model_python_1591043825523_1{noformat}

@exalate-issue-sync
Author

Neema Mashayekhi commented: Check whether the assertions are wrong, since it fails when -ea (Java assertions) is enabled.

@exalate-issue-sync
Author

Wendy commented: When lambda=0, use_all_factor_levels=False, and this creates a problem when the interaction vectors are generated. In this case, the interacting columns are income (<=50K, >50K) and gender (Female, Male), so the full interaction domain is:

<=50K_Female,

<=50K_Male,

>50K_Female,

>50K_Male.

With use_all_factor_levels=False, the domain of the final interaction column is only >50K_Male. This seems incorrect. However, when we checked with R, it did exactly the same.
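The level bookkeeping described above can be sketched in plain Python. This is a minimal illustration, not H2O code: it assumes, as the comment describes, that use_all_factor_levels=False drops the first (reference) level of each factor before the levels are crossed. `interaction_domain` is a hypothetical helper, and the level names are taken from the server log of the successful run.

```python
from itertools import product


def interaction_domain(levels_a, levels_b, use_all_factor_levels):
    """Cross two factors' levels into '<a>_<b>' names.

    Hypothetical helper modelling the behaviour described in this issue:
    with use_all_factor_levels=False, the first (reference) level of EACH
    factor is dropped before the two factors are crossed.
    """
    if not use_all_factor_levels:
        levels_a = levels_a[1:]
        levels_b = levels_b[1:]
    return [f"{a}_{b}" for a, b in product(levels_a, levels_b)]


income = ["<=50K", ">50K"]
gender = ["Female", "Male"]

full = interaction_domain(income, gender, use_all_factor_levels=True)
reduced = interaction_domain(income, gender, use_all_factor_levels=False)

print(full)     # ['<=50K_Female', '<=50K_Male', '>50K_Female', '>50K_Male']
print(reduced)  # ['>50K_Male']
```

The three names missing from `reduced` are exactly the levels that the successful run's log reports as "not trained on" for column 'income_gender'.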

@exalate-issue-sync
Author

Wendy commented: When lambda=0, use_all_factor_levels=False. Hence, if a factor originally has 4 levels, it will now have 3, and that is what the assert checks.

However, when you have an interaction and lambda=0, use_all_factor_levels=False makes the number of categorical levels in the combined column drop by more than 1! Consider gender (2 levels) and race (2 levels): the final combined gender_race interaction column has only 1 level! This is because, with use_all_factor_levels=False, we combine gender (1 level left) and race (1 level left). Hence, it fails the assert.
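The arithmetic behind the failed assertion can be shown with a small sketch. It is a simplified model under stated assumptions, not the actual check in `hex.DataInfo`: it assumes the check expects dropping the reference level to remove exactly one level from a categorical column, and that interaction columns are built by crossing factors whose reference levels were already dropped. `expected_levels` and `interaction_levels` are hypothetical names.

```python
def expected_levels(n_levels, use_all_factor_levels):
    """Levels a single categorical keeps: one reference level is dropped
    when use_all_factor_levels is False (hypothetical model)."""
    return n_levels if use_all_factor_levels else n_levels - 1


def interaction_levels(n_a, n_b, use_all_factor_levels):
    """Levels of an interaction column built by crossing two factors whose
    reference levels were each dropped first (the behaviour described above)."""
    return expected_levels(n_a, use_all_factor_levels) * expected_levels(
        n_b, use_all_factor_levels
    )


# Ordinary factor with 4 levels: exactly one level is dropped (4 -> 3).
assert expected_levels(4, use_all_factor_levels=False) == 3

# gender (2 levels) x race (2 levels): the full interaction has 4 levels,
# but crossing the already-reduced factors leaves only 1.
full = interaction_levels(2, 2, use_all_factor_levels=True)
reduced = interaction_levels(2, 2, use_all_factor_levels=False)

# An "exactly one dropped level" check expects 3 levels here, but 3 levels
# were dropped instead of 1, so the check fails.
print(full - reduced)                                                 # 3
print(reduced == expected_levels(full, use_all_factor_levels=False))  # False
```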

Concrete example: gender=<Female, Male>, race=<Red, Green>. The complete interaction is Female_Red, Female_Green, Male_Red, Male_Green. With use_all_factor_levels=False, the interaction domain contains only Male_Green.

Basically, you only take into account the effect of the interaction column when gender=Male and race=Green.
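To illustrate the last point, here is a minimal sketch of the single remaining interaction indicator, assuming an ordinary 0/1 dummy encoding (an assumption; the comment does not describe H2O's internal encoding). `interaction_indicator` is a hypothetical helper. Only rows with gender=Male and race=Green get a nonzero value, so only those rows inform the interaction coefficient.

```python
def interaction_indicator(rows, kept_level="Male_Green"):
    """Return the 0/1 value of the single kept interaction column per row.

    Hypothetical helper: each row is a (gender, race) pair, and its
    interaction level is formed as '<gender>_<race>'.
    """
    return [1 if f"{g}_{r}" == kept_level else 0 for g, r in rows]


rows = [("Female", "Red"), ("Female", "Green"), ("Male", "Red"), ("Male", "Green")]
print(interaction_indicator(rows))  # [0, 0, 0, 1]
```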

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-7588
Assignee: Wendy
Reporter: Neema Mashayekhi
State: Resolved
Fix Version: 3.30.0.5
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#4681
