Documentation on ignored_columns for AutoML is confusing, or even incorrect #12973

Closed
exalate-issue-sync bot opened this issue May 13, 2023 · 4 comments

Comments

@exalate-issue-sync

I am referring to the following description on this page.
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html

ignored_columns: (Optional, Python only) Specify the column or columns (as a list/vector) to be excluded from the model. This is the converse of the x argument.

Read naively, this suggests that the H2OAutoML object in the Python API accepts 'ignored_columns' explicitly. In reality, its train() method only accepts 'x' (the names of the columns to include) and never exposes 'ignored_columns' directly.
http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/modeling.html#h2oautoml
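
Since only 'x' is accepted, one way to get the effect of an exclusion list is to build 'x' from the columns you want to drop. The following is a minimal sketch in plain Python, not part of the H2O API: the helper name `predictors_excluding` and the sample column names are hypothetical.

```python
def predictors_excluding(all_columns, y, excluded):
    """Return the predictor list x: every column except y and the excluded ones.

    Hypothetical helper for illustration only; it is not an H2O function.
    """
    drop = set(excluded) | {y}
    # Preserve the original column order.
    return [c for c in all_columns if c not in drop]


# Made-up column names standing in for the columns of a training frame.
columns = ["id", "age", "income", "fold", "response"]
x = predictors_excluding(columns, y="response", excluded=["id"])
print(x)  # ['age', 'income', 'fold']
```

With a real frame, the column list would presumably come from the training frame's `columns` attribute, and the result would be passed as `x` to `train()` together with `y` and `training_frame`.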

Internally, it derives the 'ignored_columns' parameter for the REST request from the 'x' vector and other parameters such as fold_column or weights_column (cf. PUBDEV-4509), if and only if 'x' is specified.
https://github.com/h2oai/h2o-3/blob/jenkins-rel-xia-2/h2o-py/h2o/automl/autoh2o.py#L336

See also PUBDEV-5057, the issue that led to this description being added in the first place.

@exalate-issue-sync
Author

Angela Bartz commented: [~accountid:5b153fb1b0d76456f36daced] can you verify whether ignored_columns is used in AutoML or if this parameter is ignored?

@exalate-issue-sync
Author

Sebastien Poirier commented: [~accountid:557058:6e44bc1a-dd50-499b-a331-2e049f28773b] I think the {{ignored_columns}} section should be removed from the AutoML documentation, as it is not directly exposed to the end user.

As described in this ticket, AutoML uses this REST API parameter only internally, in both the Python and R clients (as the R client does for all algos), using a simple formula:
{{ignored_columns = all_columns - x - y - fold_column - weights_column}}
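
The formula above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the actual client code: the function name `derive_ignored_columns` and the sample column names are hypothetical, and returning `None` when `x` is missing reflects the earlier remark in this ticket that the derivation only happens if 'x' is specified.

```python
def derive_ignored_columns(all_columns, x, y, fold_column=None, weights_column=None):
    """Sketch of: ignored_columns = all_columns - x - y - fold_column - weights_column.

    Hypothetical helper; the real clients may differ in detail.
    """
    if x is None:
        # Per the ticket, nothing is derived unless x is specified.
        return None
    keep = set(x) | {y}
    # Special columns are kept only when they are actually set.
    keep.update(c for c in (fold_column, weights_column) if c is not None)
    return [c for c in all_columns if c not in keep]


# Made-up column names for illustration.
columns = ["id", "age", "income", "fold", "w", "response"]
ignored = derive_ignored_columns(
    columns,
    x=["age", "income"],
    y="response",
    fold_column="fold",
    weights_column="w",
)
print(ignored)  # ['id']
```

The sketch makes the overlap with {{x}} visible: once {{x}}, {{y}} and the special columns are known, the ignored set is fully determined.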

In my opinion, this is a good thing. Exposing this parameter for other algos in the Python API was an unfortunate mistake: it plays a role very similar to the {{x}} parameter, which can only create confusion and misuse.

@exalate-issue-sync
Author

Angela Bartz commented: Pull request merged into rel-yu.

@hasithjp
Member

JIRA Issue Migration Info

Jira Issue: PUBDEV-6142
Assignee: Angela Bartz
Reporter: Kiyoshi Kamishima
State: Resolved
Fix Version: 3.28.0.2
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#4181
#4182
