AutoML fails when the project_name is parsable as a number #8642

Closed
exalate-issue-sync bot opened this issue May 12, 2023 · 6 comments

Comments

@exalate-issue-sync

If you use a project_name string like "3.26.0.8", AutoML will fail on the client side because it can't retrieve the leaderboard. I guess H2OFrame ids cannot begin with numbers?

{code:java}library(h2o)
h2o.init()

# Import a sample binary outcome train/test set into H2O
train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")

# Identify predictors and response
y <- "response"
x <- setdiff(names(train), y)

# For binary classification, response should be a factor
train[,y] <- as.factor(train[,y])
test[,y] <- as.factor(test[,y])

aml <- h2o.automl(x = x, y = y,
                  training_frame = train,
                  max_models = 2,
                  project_name = "3.26.0.8",
                  seed = 1){code}

This is the error:

{{Error in if (leaderboard$model_id[1, 1] == "") { :  }}

{{argument is of length zero}}

If you try to grab the model from Python, you will see another error:

{code}In [4]: aml = get_automl("3.26.0.8")

H2OValueError Traceback (most recent call last)
in ()
----> 1 aml = get_automl("3.26.0.8")

/home/ledell/venv/h2o-3/local/lib/python2.7/site-packages/h2o/automl/autoh2o.pyc in get_automl(project_name)
610 :returns: A dictionary containing the project_name, leader model, leaderboard, event_log.
611 """
--> 612 return H2OAutoML._fetch_state(project_name)

/home/ledell/venv/h2o-3/local/lib/python2.7/site-packages/h2o/automl/autoh2o.pyc in _fetch_state(project_name, properties)
585 leaderboard = None
586 if should_fetch('leaderboard'):
--> 587 leaderboard = H2OAutoML._fetch_table(state_json['leaderboard_table'], key=project_name+"_leaderboard", progress_bar=False)
588 leaderboard = h2o.assign(leaderboard[1:], project_name+"_leaderboard") # removing index and reassign id to ensure persistence on backend
589

/home/ledell/venv/h2o-3/local/lib/python2.7/site-packages/h2o/automl/autoh2o.pyc in _fetch_table(table, key, progress_bar)
565 H2OJob.PROGRESS_BAR = progress_bar
566 # Parse leaderboard H2OTwoDimTable & return as an H2OFrame
--> 567 return h2o.H2OFrame(table.cell_values, destination_frame=key, column_names=table.col_header, column_types=table.col_types)
568 finally:
569 H2OJob.PROGRESS_BAR = ori_progress_state

/home/ledell/venv/h2o-3/local/lib/python2.7/site-packages/h2o/frame.pyc in __init__(self, python_obj, destination_frame, header, separator, column_names, column_types, na_strings, skipped_columns)
100 assert_is_type(column_types, None, [coltype], {str: coltype})
101 assert_is_type(na_strings, None, [str], [[str]], {str: [str]})
--> 102 check_frame_id(destination_frame)
103
104 self._ex = ExprNode()

/home/ledell/venv/h2o-3/local/lib/python2.7/site-packages/h2o/utils/shared_utils.pyc in check_frame_id(frame_id)
56 raise H2OValueError("Character '%s' is illegal in frame id: %s" % (ch, frame_id))
57 if re.match(r"-?[0-9]", frame_id):
---> 58 raise H2OValueError("Frame id cannot start with a number: %s" % frame_id)
59
60{code}
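
To make the failure concrete, here is a minimal sketch that reuses only the regular expression visible at line 57 of the traceback. The function name {{starts_with_number}} is made up for illustration, and the illegal-character check from line 56 is left out because its character set is not shown above.

```python
import re

# Same pattern as the check at line 57 of shared_utils.py in the traceback above.
LEADING_NUMBER = re.compile(r"-?[0-9]")


def starts_with_number(frame_id):
    """Return True if frame_id begins with a digit, optionally preceded by '-'."""
    return LEADING_NUMBER.match(frame_id) is not None


project_name = "3.26.0.8"
# _fetch_state builds the leaderboard frame id from the project name.
leaderboard_key = project_name + "_leaderboard"

print(starts_with_number(leaderboard_key))        # True: this key is rejected
print(starts_with_number("_" + leaderboard_key))  # False: a leading "_" passes
```

So the project name itself is accepted by AutoML, and the error only appears once the client derives a frame id from it.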

@exalate-issue-sync
Author

Sebastien Poirier commented: [~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c] This is a constraint on {{Frame}} ids. I don’t think it’s worth the effort (and complexity) to fix this for {{AutoML}} project names…

If it’s just for display, you can prefix it with {{_}} and it works: {{project_name="_3.26.0.8"}}… or {{project_name="v3.26.0.8"}}

Should I close it?
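
For anyone who wants to apply the prefix workaround automatically, a small client-side helper could look like the sketch below. {{prefixed_project_name}} is a hypothetical name, not part of the h2o package, and it only handles the leading-number rule seen in the traceback.

```python
import re


def prefixed_project_name(name, prefix="v"):
    """Hypothetical helper: prepend `prefix` when `name` starts with a digit
    or with '-' followed by a digit; otherwise return `name` unchanged."""
    if re.match(r"-?[0-9]", name):
        return prefix + name
    return name


print(prefixed_project_name("3.26.0.8"))              # v3.26.0.8
print(prefixed_project_name("3.26.0.8", prefix="_"))  # _3.26.0.8
print(prefixed_project_name("higgs_run"))             # higgs_run
```

The result can then be passed as {{project_name}}. The prefix itself must not start with a digit, otherwise the name is still rejected.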

@exalate-issue-sync
Author

Sebastien Poirier commented: What I can do though, is to validate {{project_name}} upfront, so that there’s no surprise after the run.

@exalate-issue-sync
Author

Sebastien Poirier commented: For some reason, it doesn’t work in R either, and the error there is much more cryptic (I believe there’s no validation there).

@exalate-issue-sync
Author

Sebastien Poirier commented: This constraint is tied to the Rapids language.
So this is what I’m gonna do:

  • As the AutoML project name is the base name for additional entities like the leaderboard frame, it will be validated against the same naming constraints as Frame ids.
  • Those constraints are relevant only when using the REST API, so this only concerns Py and R (+ maybe Flow).
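
As a rough illustration of what validating upfront could mean on the client, here is a minimal sketch. It is not the change that went into H2O: {{validate_project_name}} is a hypothetical function, and it covers only the leading-number rule, since the set of illegal characters is not shown in this thread.

```python
import re


def validate_project_name(project_name):
    """Hypothetical upfront check: reject a bad name before any model is
    trained, instead of failing later when the leaderboard frame is built."""
    if re.match(r"-?[0-9]", project_name):
        raise ValueError(
            "project_name cannot start with a number: %r "
            "(try a prefix such as 'v' or '_')" % project_name
        )
    return project_name


validate_project_name("v3.26.0.8")  # passes and returns the name unchanged

try:
    validate_project_name("3.26.0.8")
except ValueError as err:
    print(err)
```

Running the check before the AutoML run starts means the user gets the error immediately, with the offending name in the message, rather than after training has finished.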

@exalate-issue-sync
Author

Erin LeDell commented: [~accountid:5b153fb1b0d76456f36daced] Good plan, thanks!

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-6998
Assignee: Sebastien Poirier
Reporter: Erin LeDell
State: Resolved
Fix Version: 3.28.0.1
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#4012
