
Get AutoML leaderboard from sklearn wrapped functions #8431

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 6 comments

Using one of the new sklearn compatible automl models,

{noformat}from h2o.sklearn import H2OAutoMLRegressor, H2OAutoMLClassifier{noformat}

How can we access the leaderboard, which has a summary of the trained models?
I tried passing the model instance to the get_leaderboard function, but that only accepts the H2OAutoML class. Perhaps the leaderboard could be exposed directly on the sklearn classes?

{noformat}H2OTypeError: Argument aml should be an H2OAutoML, got H2OAutoMLClassifier H2OAutoMLClassifier(algo_parameters=None, balance_classes=False,
class_sampling_factors=None, data_conversion='auto',
exclude_algos=None, export_checkpoints_dir=None,
include_algos=None,
keep_cross_validation_fold_assignment=False,
keep_cross_validation_models=False,
keep_cross_validation_predictions=False,
max_after_balance_size=5.0, max_models=None,
max_runtime_secs=None, max_runtime_secs_per_model=None,
modeling_plan=None, monotone_constraints=None, nfolds=5,
project_name=None, seed=4336, sort_metric='AUTO',
stopping_metric='AUTO', stopping_rounds=3,
stopping_tolerance=None, verbosity='warn'){noformat}


Erin LeDell commented: [~accountid:5b153fb1b0d76456f36daced] has been working on some demos of how to use the new sklearn API – Seb, do you have a link to any of those notebooks yet?


Sebastien Poirier commented: [~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c] [~accountid:5e1ba4e7b5771b0ca440cd4e] I’ll publish a complete tutorial for those {{sklearn}} wrappers very soon.

For now, just be aware that when using a {{sklearn}} wrapper of an H2OEstimator or H2OAutoML, you still have full access to the wrapped object, using the {{_estimator}} property.
For example:

{code:python}from h2o.sklearn import H2OAutoMLClassifier
from h2o.automl import get_leaderboard

aml = H2OAutoMLClassifier(max_models=5)
aml.fit(X, y)
lb = aml._estimator.leaderboard
lb_ext = get_leaderboard(aml._estimator, extra_columns='ALL')
{code}

I should probably expose a “public property” though: I didn’t want to create potential naming conflicts at first, but now that there is internal logic to prevent those, I don’t see what prevents me from making it public.
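As a sketch of what such a public property could look like (the class and attribute names below are purely illustrative, not H2O's actual API), a read-only property gives access to the wrapped object without allowing accidental reassignment:

```python
# Illustrative only: a minimal wrapper exposing its private `_estimator`
# through a read-only public property, as suggested above.
class SklearnWrapper:
    def __init__(self, estimator):
        self._estimator = estimator  # would be the wrapped H2O object in practice

    @property
    def estimator(self):
        """Public, read-only access to the wrapped estimator."""
        return self._estimator


wrapper = SklearnWrapper(estimator="fake-automl")
print(wrapper.estimator)  # access the wrapped object without touching _estimator
```

Because no setter is defined, `wrapper.estimator = ...` raises an AttributeError, which sidesteps the naming-conflict concern for writes.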


Stan Biryukov commented: Thanks for the quick reply. Good to know about the {{_estimator}} property.

How do I save one of these sklearn wrapped models? I’m attempting to save a sklearn pipeline and can’t seem to find the best way to save to disk. Happy to open a separate issue if that’s best.

For example, my {{mlt}} object is:

{noformat}Pipeline(memory=None,
steps=[('scaler',
StandardScaler(copy=True, with_mean=True, with_std=True)),
('model',
H2OAutoMLClassifier(algo_parameters=None,
balance_classes=False,
class_sampling_factors=None,
data_conversion='auto', exclude_algos=None,
export_checkpoints_dir=None,
include_algos=None,
keep_cross_validation_fold_assignment=False,
keep_cross_validation_models=False,
keep_cross_validation_predictions=False,
max_after_balance_size=5.0,
max_models=None, max_runtime_secs=60,
max_runtime_secs_per_model=None,
modeling_plan=None,
monotone_constraints=None, nfolds=5,
project_name=None, seed=4336,
sort_metric='AUTO', stopping_metric='AUTO',
stopping_rounds=3, stopping_tolerance=None,
verbosity='warn'))],
verbose=False)
{noformat}

Try pickle dump of everything:

{noformat}import pickle
with open('/workspace/testautoml.pkl', 'wb') as fid:
    pickle.dump(mlt, fid){noformat}

Results in: TypeError: can't pickle dict_keys objects
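That TypeError is not specific to H2O: pickle refuses to serialize dict view objects such as `dict_keys`. A minimal standalone reproduction (no H2O involved, dictionary contents are arbitrary):

```python
import pickle

params = {"max_models": 5, "seed": 4336}
keys = params.keys()  # a dict_keys view object, not a list

try:
    pickle.dumps(keys)
except TypeError as err:
    print(err)  # pickle cannot serialize dict view objects

# Materializing the view as a list makes it picklable again.
blob = pickle.dumps(list(keys))
print(pickle.loads(blob))  # ['max_models', 'seed']
```

Any object in the pipeline that holds a live dict view (rather than a list) will trigger the same failure when the whole pipeline is dumped.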

Try h2o save of just the estimator:
{{model_path = h2o.save_model(model=mlt['model'], path="/tmp/mymodel", force=True)}}

Results in:

{noformat} H2OTypeError: Argument model should be a ModelBase, got H2OAutoMLClassifier H2OAutoMLClassifier(algo_parameters=None, balance_classes=False,
class_sampling_factors=None, data_conversion='auto',
exclude_algos=None, export_checkpoints_dir=None,
include_algos=None,
keep_cross_validation_fold_assignment=False,
keep_cross_validation_models=False,
keep_cross_validation_predictions=False,
max_after_balance_size=5.0, max_models=None,
max_runtime_secs=60, max_runtime_secs_per_model=None,
modeling_plan=None, monotone_constraints=None, nfolds=5,
project_name=None, seed=4336, sort_metric='AUTO',
stopping_metric='AUTO', stopping_rounds=3,
stopping_tolerance=None, verbosity='warn') {noformat}


Sebastien Poirier commented: [~accountid:5e1ba4e7b5771b0ca440cd4e] I could reproduce the issue with pickle, thanks for pointing that out.

The fix is trivial, creating a ticket, will be in next minor release.
However, while the {{dump}} is easy to fix, the {{load}} still fails, as the H2O estimators are not actually picklable due to the {{connection}} instance to the backend.

For now, what you can still do is to save the params and the wrapper class to restore them later:

{code:python}import pickle

model = mlt.named_steps.model
with open('/workspace/testautoml.pkl', 'wb') as fid:
    pickle.dump((model.__class__, model.get_params()), fid)

with open('/workspace/testautoml.pkl', 'rb') as fid:
    cls, params = pickle.load(fid)
restored_model = cls(**params){code}

However, while the restored model is usable, it is of course untrained/unfitted, so this may suit your needs if you want to save the pipeline before training, but don't expect to recover a trained {{H2OAutoML}} or a trained {{H2OEstimator}} from pickle that easily.
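The class-plus-params round trip described above can be exercised without an H2O backend; here is a self-contained sketch using a dummy sklearn-style class ({{DummyClassifier}} and its parameters are purely illustrative):

```python
import pickle


class DummyClassifier:
    """Stand-in for a sklearn-style wrapper; only get_params() matters here."""

    def __init__(self, max_models=None, seed=None):
        self.max_models = max_models
        self.seed = seed

    def get_params(self):
        return {"max_models": self.max_models, "seed": self.seed}


model = DummyClassifier(max_models=5, seed=4336)

# Serialize the class object together with its constructor params...
blob = pickle.dumps((model.__class__, model.get_params()))

# ...and rebuild an equivalent (but untrained) instance later.
cls, params = pickle.loads(blob)
restored = cls(**params)
print(restored.get_params())  # same params, but no fitted state
```

Note that pickle stores the class by reference (module and qualified name), so the class must be importable under the same name when loading.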

I’m creating a quick fix for the {{dump}} issue, as it will at least allow you to dump either an untrained or a trained wrapper, but you’ll still only be able to {{load}} an untrained one…

If you want to be able to pickle trained models, please create a task; I can’t promise a time estimate for this issue, though.


Stan Biryukov commented: Thanks, Sebastien. It would be ideal to save and then load a trained model for production purposes, so I’ll open a new task.


h2o-ops commented May 14, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-7201
Assignee: Sebastien Poirier
Reporter: Stan Biryukov
State: Resolved
Fix Version: 3.28.0.2
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#4220
