real data type inferred for enum target in H2OAutoML #8924

exalate-issue-sync · 2023-05-12T10:00:40Z

I made a simple script to try AutoML on a modified Titanic dataset (predicting an enum variable).

If I leave the target ('Survived' = [0,1]) as an integer, everything runs fine - but as this is a classification problem I force 'Survived' to be type 'enum' by setting the values to 'yes' and 'no'.

H2OXGBoostEstimator trains properly on this data, but H2OAutoML fails.

Here's the error:

{{OSError: Job with key $03017f00000132d4ffffffff$_823029d133bd6a203b841a5297ef42c6 failed with an exception: java.lang.IllegalArgumentException: Test/Validation dataset has categorical column 'Survived' which is real-valued in the training data}}

training data:

[^train.csv]

demo script

[^titanic-demo.py]

exalate-issue-sync · 2023-05-12T10:00:41Z

Sebastien Poirier commented: Sorry for tackling this late. This is actually caused by https://0xdata.atlassian.net/browse/PUBDEV-5975.

You ran the AutoML instance twice using the same {{project_name}}. This is treated as a rerun: the new models are appended to the existing leaderboard, and the {{models}} from the first run are re-scored against the new, modified {{leaderboard_frame}}.

Issue https://0xdata.atlassian.net/browse/PUBDEV-5975 aims to solve those rerun scenarios.
Thanks to this ticket, I'm now considering binding the {{leaderboard}} to its corresponding {{leaderboard_frame}}.
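
Until a fix is in place, one possible workaround for the rerun described above is to give every AutoML run its own {{project_name}}, so that models trained on the integer target are never re-scored against a frame where the target is an enum. The sketch below is only an illustration under stated assumptions: the helper names ({{fresh_project_name}}, {{run_automl}}), the base name "titanic", and the {{max_models}} and {{seed}} values are hypothetical and not taken from the attached script. It also casts the target with {{asfactor()}}, an alternative to relabelling the values as 'yes' and 'no'.

```python
import uuid


def fresh_project_name(base: str) -> str:
    """Return `base` plus a random 8-character suffix (hypothetical helper)."""
    return f"{base}_{uuid.uuid4().hex[:8]}"


def run_automl(csv_path: str, target: str = "Survived", max_models: int = 5):
    """Train AutoML on `csv_path`, treating `target` as a categorical column.

    Needs the h2o package and a Java runtime; both are imported lazily so
    that fresh_project_name() can be used without them.
    """
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()
    frame = h2o.import_file(csv_path)

    # Cast the target to enum so the task is handled as classification.
    frame[target] = frame[target].asfactor()

    aml = H2OAutoML(
        max_models=max_models,  # assumed value, kept small for a quick demo
        seed=1,                 # assumed value
        # A new name per run means no earlier leaderboard is reused.
        project_name=fresh_project_name("titanic"),
    )
    aml.train(y=target, training_frame=frame)
    return aml.leaderboard
```

Calling {{run_automl("train.csv")}} twice then produces two independent leaderboards instead of one appended leaderboard.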

h2o-ops · 2023-05-14T23:47:32Z

JIRA Issue Migration Info

Jira Issue: PUBDEV-6708
Assignee: Sebastien Poirier
Reporter: Kiri Nichol
State: Resolved
Fix Version: 3.28.0.1
Attachments: Available (Count: 2)
Development PRs: Available

Linked PRs from JIRA

#3907

Attachments From Jira

Attachment Name: titanic-demo.py
Attached By: Kiri Nichol
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-6708/titanic-demo.py

Attachment Name: train.csv
Attached By: Kiri Nichol
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-6708/train.csv

h2o-ops closed this as completed May 14, 2023

h2o-ops added the fixVersion/3.28.0.1 label May 14, 2023
