real data type inferred for enum target in H2OAutoML #8924

Closed
exalate-issue-sync bot opened this issue May 12, 2023 · 2 comments

Comments

@exalate-issue-sync

I made a simple script to try AutoML on a modified Titanic dataset (predicting an enum variable).

If I leave the target ('Survived' = [0,1]) as an integer, everything runs fine. But since this is a classification problem, I force 'Survived' to be of type 'enum' by setting its values to 'yes' and 'no'.

H2OXGBoostEstimator trains properly on this data, but H2OAutoML fails.

Here's the error:

{{OSError: Job with key $03017f00000132d4ffffffff$_823029d133bd6a203b841a5297ef42c6 failed with an exception: java.lang.IllegalArgumentException: Test/Validation dataset has categorical column 'Survived' which is real-valued in the training data}}

training data:

[^train.csv]

demo script

[^titanic-demo.py]

@exalate-issue-sync
Author

Sebastien Poirier commented: [~accountid:5d34759b6e55370bc308bdbb] Sorry for getting to this late: this is actually caused by https://0xdata.atlassian.net/browse/PUBDEV-5975.

You ran AutoML twice using the same {{project_name}}, which is treated as a rerun: new models are appended to the existing leaderboard, and the {{models}} from the first run are re-scored against the new, modified {{leaderboard_frame}}. Those first-run models were trained with 'Survived' as a numeric column, so re-scoring them against a frame where 'Survived' is categorical raises the exception above.

Issue https://0xdata.atlassian.net/browse/PUBDEV-5975 aims to solve those rerun scenarios.
Thanks to this ticket, I’m now considering binding the {{leaderboard}} to its corresponding {{leaderboard_frame}}.
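
Until the rerun behaviour changes, one possible workaround is to give each run its own {{project_name}}, so that models trained on the integer target never share a leaderboard with a run that uses the enum target. The following is only a sketch under assumptions: {{run_project_name}} and {{train_automl}} are hypothetical helpers (not part of H2O), the {{max_models}} and {{seed}} values are arbitrary, and the target is converted with {{asfactor()}} rather than by rewriting the values to 'yes'/'no'.

```python
import uuid


def run_project_name(base: str, target_type: str) -> str:
    """Build a project name that differs for every run and target type.

    Hypothetical helper, not part of H2O: the random suffix keeps a new run
    from joining the leaderboard of an earlier run.
    """
    return f"{base}_{target_type}_{uuid.uuid4().hex[:8]}"


def train_automl(csv_path: str, target: str = "Survived", as_enum: bool = True):
    """Train AutoML on csv_path under a fresh project name (illustrative)."""
    # Imported lazily so the naming helper works without an H2O install.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()
    frame = h2o.import_file(csv_path)
    if as_enum:
        # Convert the 0/1 target into a categorical (enum) column.
        frame[target] = frame[target].asfactor()

    aml = H2OAutoML(
        max_models=5,  # assumption: small budget chosen for illustration
        seed=1,        # assumption: arbitrary seed
        project_name=run_project_name("titanic", "enum" if as_enum else "int"),
    )
    aml.train(y=target, training_frame=frame)
    return aml
```

Calling {{train_automl("train.csv", as_enum=False)}} and then {{train_automl("train.csv", as_enum=True)}} would, under these assumptions, start two separate projects instead of a rerun of the same one.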

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-6708
Assignee: Sebastien Poirier
Reporter: Kiri Nichol
State: Resolved
Fix Version: 3.28.0.1
Attachments: Available (Count: 2)
Development PRs: Available

Linked PRs from JIRA

#3907

Attachments From Jira

Attachment Name: titanic-demo.py
Attached By: Kiri Nichol
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-6708/titanic-demo.py

Attachment Name: train.csv
Attached By: Kiri Nichol
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-6708/train.csv
