
Get AutoML leaderboard from sklearn wrapped functions #8431

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 6 comments

Using one of the new sklearn compatible automl models,

{noformat}from h2o.sklearn import H2OAutoMLRegressor, H2OAutoMLClassifier{noformat}

How can we access the leaderboard, which has a summary of the trained models?
I tried passing the model instance to the get_leaderboard function, but that only accepts the H2OAutoML class. Perhaps the leaderboard could be exposed directly on the sklearn classes?

{noformat}H2OTypeError: Argument aml should be an H2OAutoML, got H2OAutoMLClassifier H2OAutoMLClassifier(algo_parameters=None, balance_classes=False,
class_sampling_factors=None, data_conversion='auto',
exclude_algos=None, export_checkpoints_dir=None,
include_algos=None,
keep_cross_validation_fold_assignment=False,
keep_cross_validation_models=False,
keep_cross_validation_predictions=False,
max_after_balance_size=5.0, max_models=None,
max_runtime_secs=None, max_runtime_secs_per_model=None,
modeling_plan=None, monotone_constraints=None, nfolds=5,
project_name=None, seed=4336, sort_metric='AUTO',
stopping_metric='AUTO', stopping_rounds=3,
stopping_tolerance=None, verbosity='warn'){noformat}


Erin LeDell commented: [~accountid:5b153fb1b0d76456f36daced] has been working on some demos of how to use the new sklearn API – Seb, do you have a link to any of those notebooks yet?


Sebastien Poirier commented: [~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c] [~accountid:5e1ba4e7b5771b0ca440cd4e] I’ll publish a complete tutorial for those {{sklearn}} wrappers very soon.

For now, just be aware that when using a {{sklearn}} wrapper of an H2OEstimator or H2OAutoML, you still have full access to the wrapped object, using the {{_estimator}} property.
For example:

{code:python}from h2o.sklearn import H2OAutoMLClassifier
from h2o.automl import get_leaderboard

aml = H2OAutoMLClassifier(max_models=5)
aml.fit(X, y)
lb = aml._estimator.leaderboard
lb_ext = get_leaderboard(aml._estimator, extra_columns='ALL')
{code}

I should probably expose a “public property” though: I didn’t want to create potential naming conflicts at first, but now that there is internal logic to prevent those, I don’t see what prevents me from making it public.
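As a sketch of what such a public property could look like (the class and attribute names below are purely illustrative, not H2O's actual API), a read-only property gives access to the wrapped object without allowing accidental reassignment:

```python
# Illustrative only: a minimal wrapper exposing its private `_estimator`
# through a read-only public property, as suggested above.
class SklearnWrapper:
    def __init__(self, estimator):
        self._estimator = estimator  # would be the wrapped H2O object in practice

    @property
    def estimator(self):
        """Public, read-only access to the wrapped estimator."""
        return self._estimator


wrapper = SklearnWrapper(estimator="fake-automl")
print(wrapper.estimator)  # access the wrapped object without touching _estimator
```

Because no setter is defined, `wrapper.estimator = ...` raises an AttributeError, which sidesteps the naming-conflict concern for writes.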


Stan Biryukov commented: Thanks for the quick reply. Good to know about the {{_estimator}} property.

How do I save one of these sklearn wrapped models? I’m attempting to save a sklearn pipeline and can’t seem to find the best way to save to disk. Happy to open a separate issue if that’s best.

For example, my {{mlt}} object is:

{noformat}Pipeline(memory=None,
steps=[('scaler',
StandardScaler(copy=True, with_mean=True, with_std=True)),
('model',
H2OAutoMLClassifier(algo_parameters=None,
balance_classes=False,
class_sampling_factors=None,
data_conversion='auto', exclude_algos=None,
export_checkpoints_dir=None,
include_algos=None,
keep_cross_validation_fold_assignment=False,
keep_cross_validation_models=False,
keep_cross_validation_predictions=False,
max_after_balance_size=5.0,
max_models=None, max_runtime_secs=60,
max_runtime_secs_per_model=None,
modeling_plan=None,
monotone_constraints=None, nfolds=5,
project_name=None, seed=4336,
sort_metric='AUTO', stopping_metric='AUTO',
stopping_rounds=3, stopping_tolerance=None,
verbosity='warn'))],
verbose=False)
{noformat}

Try pickle dump of everything:

{noformat}import pickle
with open('/workspace/testautoml.pkl', 'wb') as fid:
    pickle.dump(mlt, fid){noformat}

Results in: TypeError: can't pickle dict_keys objects
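That TypeError is not specific to H2O: pickle refuses to serialize dict view objects such as `dict_keys`. A minimal standalone reproduction (no H2O involved, dictionary contents are arbitrary):

```python
import pickle

params = {"max_models": 5, "seed": 4336}
keys = params.keys()  # a dict_keys view object, not a list

try:
    pickle.dumps(keys)
except TypeError as err:
    print(err)  # pickle cannot serialize dict view objects

# Materializing the view as a list makes it picklable again.
blob = pickle.dumps(list(keys))
print(pickle.loads(blob))  # ['max_models', 'seed']
```

Any object in the pipeline that holds a live dict view (rather than a list) will trigger the same failure when the whole pipeline is dumped.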

Try h2o save of just the estimator:
{{model_path = h2o.save_model(model=mlt['model'], path="/tmp/mymodel", force=True)}}

Results in:

{noformat} H2OTypeError: Argument model should be a ModelBase, got H2OAutoMLClassifier H2OAutoMLClassifier(algo_parameters=None, balance_classes=False,
class_sampling_factors=None, data_conversion='auto',
exclude_algos=None, export_checkpoints_dir=None,
include_algos=None,
keep_cross_validation_fold_assignment=False,
keep_cross_validation_models=False,
keep_cross_validation_predictions=False,
max_after_balance_size=5.0, max_models=None,
max_runtime_secs=60, max_runtime_secs_per_model=None,
modeling_plan=None, monotone_constraints=None, nfolds=5,
project_name=None, seed=4336, sort_metric='AUTO',
stopping_metric='AUTO', stopping_rounds=3,
stopping_tolerance=None, verbosity='warn') {noformat}


Sebastien Poirier commented: [~accountid:5e1ba4e7b5771b0ca440cd4e] I could reproduce the issue with pickle, thanks for pointing that out.

The fix is trivial, creating a ticket, will be in next minor release.
However, while the {{dump}} is easy to fix, the {{load}} still fails, as the H2O estimators are not actually picklable due to the {{connection}} instance to the backend.

For now, what you can still do is to save the params and the wrapper class to restore them later:

{code:python}import pickle

model = mlt.named_steps.model
with open('/workspace/testautoml.pkl', 'wb') as fid:
    pickle.dump((model.__class__, model.get_params()), fid)

with open('/workspace/testautoml.pkl', 'rb') as fid:
    cls, params = pickle.load(fid)
restored_model = cls(**params){code}

However, while the restored model is usable, it is of course untrained/unfitted, so this may suit your needs if you want to save the pipeline before training, but don't expect to recover a trained {{H2OAutoML}} or a trained {{H2OEstimator}} from pickle that easily.
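The class-plus-params round trip described above can be exercised without an H2O backend; here is a self-contained sketch using a dummy sklearn-style class ({{DummyClassifier}} and its parameters are purely illustrative):

```python
import pickle


class DummyClassifier:
    """Stand-in for a sklearn-style wrapper; only get_params() matters here."""

    def __init__(self, max_models=None, seed=None):
        self.max_models = max_models
        self.seed = seed

    def get_params(self):
        return {"max_models": self.max_models, "seed": self.seed}


model = DummyClassifier(max_models=5, seed=4336)

# Serialize the class object together with its constructor params...
blob = pickle.dumps((model.__class__, model.get_params()))

# ...and rebuild an equivalent (but untrained) instance later.
cls, params = pickle.loads(blob)
restored = cls(**params)
print(restored.get_params())  # same params, but no fitted state
```

Note that pickle stores the class by reference (module and qualified name), so the class must be importable under the same name when loading.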

I’m creating a quick fix for the {{dump}} issue, as it will at least allow you to dump either an untrained or a trained wrapper, but you’ll still only be able to {{load}} an untrained one…

If you want to be able to pickle trained models, please create a task; I can’t promise a time estimate for this issue, though.


Stan Biryukov commented: Thanks, Sebastien. It would be ideal to save and then load a trained model for production purposes, so I’ll open a new task.


h2o-ops commented May 14, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-7201
Assignee: Sebastien Poirier
Reporter: Stan Biryukov
State: Resolved
Fix Version: 3.28.0.2
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#4220
