PUBDEV-7831: Add new learning curve plotting function/method #5164
Conversation
_Force-pushed from cf7e9ca to a61a4d2_
The scoring history only has xval deviance with it and nothing else. However, if you want to check out other values apart from deviance, I suggest that you add them to the scoring history instead of having cv_scoring_history; I think that will be easier. Also, the CV in GLM only calculates the deviance values. If you need other metrics calculated, you can add that there as well (in method cv_computeAndSetOptimalParameters(ModelBuilder[] cvModelBuilders)). The CV is used in GLM to figure out the best alpha and lambda values to use, and nothing else. Also, the number of iterations may not be the same as in the main run. I will check what happens when you limit the max runtime. Wendy
Also, at the end of cross-validation, we will calculate the deviance on the hold-out set and pick the best deviance for the alpha/lambda value. However, after that, we will not bother to save the deviance values for the later iterations. This is where it happens (in method cv_computeAndSetOptimalParameters(ModelBuilder[] cvModelBuilders)), where lmin_max is the submodel with the lowest test deviance.
Okay, when you set a max_runtime_secs and nfolds > 1, each CV model and the main model will be allocated equal time to build, like this (in maxRuntimeSecsPerModel(int cvModelsCount, int parallelization) of ModelBuilder.java): _parms._max_runtime_secs / Math.ceil((double)cvModelsCount / parallelization + 1). So, in theory, they should not run out of time. Also, GLM will always run for one iteration before it quits, so as not to return an empty model. This implies that we may use more time than is allowed by max_runtime_secs. Hope this helps. Wendy
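To make the budget formula concrete, here is the arithmetic for an illustrative configuration (a quick sketch; the numbers are made up for the example, and h2o itself does this computation in Java):

```python
import math

# Illustrative values, not taken from the PR:
max_runtime_secs = 60.0   # total budget for main model + CV models
cv_models_count = 5       # nfolds
parallelization = 1       # CV models built sequentially

# Mirrors _parms._max_runtime_secs / Math.ceil((double) cvModelsCount / parallelization + 1)
per_model_budget = max_runtime_secs / math.ceil(cv_models_count / parallelization + 1)
print(per_model_budget)   # 60 / ceil(5/1 + 1) = 60 / 6 = 10.0 seconds per model
```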
Thank you @wendycwong. Sorry, I most likely misled you by concentrating on the time constraint. I added a lot of logging and fortunately I was able to reproduce it after a couple of runs. It looks like the problem is that GLM selects one of the first alpha values tested, but the reported iteration did not correspond to the selected submodel; the marked line below fixes that (h2o-algos/src/main/java/hex/glm/GLMModel.java#L1534):

```java
if (_parms._lambda_search) {
    lambdaSearch = 1;
    iter = _output._submodels[_output._selected_submodel_idx].iteration; // THIS IS THE ADDED LINE
    _output._model_summary.set(0, 3, "nlambda = " + _parms._nlambdas
            + ", lambda.max = " + MathUtils.roundToNDigits(_lambda_max, 4)
            + ", lambda.min = " + MathUtils.roundToNDigits(_output.lambda_best(), 4)
            + ", lambda.1se = " + MathUtils.roundToNDigits(_output.lambda_1se(), 4));
}
```

Note that I still don't understand well how GLM works inside, e.g., I don't know whether the submodels are created during CV or just for the final model, and whether the […]
I created the […]. I don't have a strong opinion about this, but when I look at […], and since I want to get the actual scoring history of the CV models, I would probably have to either […]. But I might be missing some better way to do it - if you still think your idea is better, could you please describe it more concretely?
Thank you again for helping me with the GLM.
_Force-pushed from 859a430 to 6ec0c9d_
@sebhrusen @ledell I went through the algos and created a mapping between stopping criteria (specified in the documentation) and the columns present in the scoring history:

```json
[{
"Stopping criteria": {
"mean_per_class_error": {},
"MSE": {},
"RMSLE": {},
"anomaly_score": {
"IsolationForest": ["mean_anomaly_score"]
},
"custom": {
"GBM":["training_custom", "validation_custom"],
"DRF":["training_custom", "validation_custom"]
},
"custom_increasing": {
"GBM":["training_custom", "validation_custom"],
"DRF":["training_custom", "validation_custom"]
},
"deviance": {
"GLM/lambda_search": ["deviance_train", "deviance_test", "deviance_xval", "deviance_se"],
"DRF/regression": ["training_deviance", "validation_deviance"],
"GBM/regression": ["training_deviance", "validation_deviance"],
"DeepLearning/regression": ["training_deviance", "validation_deviance"],
"XGBoost/regression": ["training_deviance", "validation_deviance"],
},
"logloss/binomial,multinomial": {
"DeepLearning": ["training_logloss", "validation_logloss"],
"DRF": ["training_logloss", "validation_logloss"],
"GBM": ["training_logloss", "validation_logloss"],
"XGBoost": ["training_logloss", "validation_logloss"]
},
"RMSE/binomial,multinomial,regression": {
"DeepLearning": ["training_rmse", "validation_rmse"],
"DRF": ["training_rmse", "validation_rmse"],
"GBM": ["training_rmse", "validation_rmse"],
"XGBoost": ["training_rmse", "validation_rmse"]
},
"MAE/regression": {
"DRF": ["training_mae", "validation_mae"],
"GBM": ["training_mae", "validation_mae"],
"DeepLearning": ["training_mae", "validation_mae"],
"XGBoost": ["training_mae", "validation_mae"]
},
"AUC/binomial, + opt-in in multinomial": {
"DeepLearning": ["training_auc", "validation_auc"],
"DRF": ["training_auc", "validation_auc"],
"GBM": ["training_auc", "validation_auc"],
"XGBoost": ["training_auc", "validation_auc"]
},
"AUCPR/binomial, + opt-in in multinomial": {
"DeepLearning": ["training_pr_auc", "validation_pr_auc"],
"DRF": ["training_pr_auc", "validation_pr_auc"],
"GBM": ["training_pr_auc", "validation_pr_auc"],
"XGBoost": ["training_pr_auc", "validation_pr_auc"]
},
"lift_top_group/binomial": {
"DeepLearning": ["training_lift", "validation_lift"],
"DRF": ["training_lift", "validation_lift"],
"GBM": ["training_lift", "validation_lift"],
"XGBoost": ["training_lift", "validation_lift"]
},
"misclassification/binomial,multinomial": {
"DeepLearning": ["training_classification_error", "validation_classification_error"],
"DRF": ["training_classification_error", "validation_classification_error"],
"GBM": ["training_classification_error", "validation_classification_error"],
"XGBoost": ["training_classification_error", "validation_classification_error"]
}
},
"Not a stopping criterium(based on docs) but present in scoring history": {
"DeepLearning" : ["training_r2", "validation_r2"],
"CoxPH": ["loglik"],
"IsolationForest": ["mean_tree_path_length"],
"GLM": ["objective", "convergence", "negative_log_likelihood", "sum(etai-eta0)^2"],
"PCA": ["objective"]
}
}]
```

Should I use only the names from the stopping criteria (…)?
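If it helps the discussion, here is a minimal sketch of how the plotting code could consume such a mapping, assuming a fallback to the common training_/validation_ prefix convention (METRIC_COLUMNS and columns_for are hypothetical names, not the actual implementation):

```python
# A small subset of the mapping above, keyed by (metric, algo); illustrative only.
METRIC_COLUMNS = {
    ("logloss", "GBM"): ["training_logloss", "validation_logloss"],
    ("deviance", "GLM/lambda_search"): ["deviance_train", "deviance_test",
                                        "deviance_xval", "deviance_se"],
}

def columns_for(metric, algo):
    # Fall back to the generic training_/validation_ naming used by most algos.
    return METRIC_COLUMNS.get(
        (metric, algo),
        ["training_{}".format(metric), "validation_{}".format(metric)],
    )

print(columns_for("logloss", "GBM"))   # ['training_logloss', 'validation_logloss']
print(columns_for("rmse", "XGBoost"))  # ['training_rmse', 'validation_rmse']
```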
_Force-pushed from 6ec0c9d to f99a194_
FYI I created a separate PR for the GLM fix: #5191 (merged now)
_Force-pushed from a24bfd6 to 19626a8_
We discussed this on a call, but I am summarizing the requested changes here:
- New plot label names:
  - Training
  - Training (CV Models)
  - Validation
  - Cross-validation
- Support lower- and upper-case versions of names (e.g. tolower() in R), with the official version being lower case. We should consider whether to make all our metric names default to lower case... something to think about for another day. (A small normalization sketch follows after this list.) Currently:

  ```r
  > h2o.learning_curve_plot(gbm, metric = "AUC")
  Error in match.arg(metric) :
    'arg' should be one of "AUTO", "auc", "aucpr", "mae", "rmse", "anomaly_score", "convergence", "custom", "custom_increasing", "deviance", "lift_top_group", "logloss", "misclassification", "negative_log_likelihood", "objective", "sumetaieta02"
  ```
- Remove the suffix of the AutoML model IDs (for plotting purposes) so that it's easier to read.
- Change the cutoff line color to red or green.
- Add a newline after the "Selected" plot label for more space (let's see if this looks better or worse).
- Notes about GLM differences:
  - deviance (lambda search), iteration
  - objective (no lambda search), iterations
The GLM scoring history is missing some metrics, and in the case of no lambda search we don't have the cross-validation metrics either. Can we add all of this? Let's discuss in #dev-h2o-3 to see if we can fill in the missing pieces in GLM so we can have a more unified plotting experience (especially since we are using the GLM metalearner as the curve we plot for Stacked Ensembles).
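Regarding the case-insensitive names requested above, here is a minimal sketch of the normalization in Python (illustrative only, not the exact h2o-3 code; the R side would combine tolower() with match.arg(), and the allowed list is abbreviated here):

```python
# Abbreviated set of allowed metric names; see the error message above for the full list.
ALLOWED_METRICS = {"auto", "auc", "aucpr", "mae", "rmse", "deviance", "logloss"}

def normalize_metric(metric):
    """Accept 'AUC', 'auc', 'Auc', ...; the official spelling is lower case."""
    normalized = metric.lower()
    if normalized not in ALLOWED_METRICS:
        raise ValueError("metric must be one of %s, got %r" % (sorted(ALLOWED_METRICS), metric))
    return normalized

print(normalize_metric("AUC"))  # 'auc'
```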
```r
#' Create learning curve plot for an H2O Model. Learning curves show error metric dependence on
#' learning progress, e.g., RMSE vs. number of trees trained so far in GBM. There can be up to 4 curves
#' showing Training, Validation, Training on CV Models, and Cross-validation error.
#'
#' @param model an H2O model
#' @param metric Metric to be used for the learning curve plot. These should mostly correspond with the stopping metric.
#' @param cv_ribbon if TRUE, plot the CV mean as a line and the CV standard deviation as a ribbon around the mean;
#'   if NULL, it will attempt to automatically determine whether this is a suitable visualisation
#' @param cv_lines if TRUE, plot scoring history for individual CV models; if NULL, it will attempt to
#'   automatically determine whether this is a suitable visualisation
#'
```
New docstring here. The Python version is basically the same, except for NULL -> None.
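For reference, a minimal end-to-end sketch of how the Python counterpart would be called (the dataset URL and hyperparameters are arbitrary choices for the example; the method name and parameters follow this PR's docstring):

```python
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()
train = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/logreg/prostate.csv")
train["CAPSULE"] = train["CAPSULE"].asfactor()

gbm = H2OGradientBoostingEstimator(nfolds=5, seed=42)
gbm.train(y="CAPSULE", training_frame=train)

# Leaving cv_ribbon/cv_lines as None lets the function decide what is a suitable visualisation.
gbm.learning_curve_plot(metric="logloss", cv_ribbon=None, cv_lines=None)
```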
_Force-pushed from 2bfbbc6 to 43ffcf2_
Copy TwoDimTable as in GAM instead of clone (see hex.gam.GAMModel#copyTwoDimTable for the GAM's implementation)
Move GAM specific modification of ModelBase to h2o-bindings/bin/custom/python/gen_gam.py
Assign default implementation to learning curve plot that complains about missing matplotlib
_Force-pushed from 30fd5cb to 996acbe_
```java
parms._generate_scoring_history = true;
parms._score_iteration_interval = (parms._valid == null) ? 5 : -1;
```
score_iteration_interval might change depending on benchmark results. When _valid is specified, we use lambda search, which provides plenty of information for the learning curve. Otherwise, lambda search is off and the scoring history often contains just one or two entries. Even a metalearner trained on a 250k-row subset of the Airlines dataset has fewer than 10 iterations, so even this score_iteration_interval might be too big.
With benchmark results from @wendycwong, we decided to keep the value at 5, as it has less than a 2% performance impact but improves the learning curve significantly in some situations. This affects only the AUTO metalearner in SE.
```java
        copiedScoringHistory.set(rowIndex, colIndex, sh.get(rowIndex, colIndex));
    }
}
mainModel._output._cv_scoring_history[i] = copiedScoringHistory;
```
Does cloning the object not work for this use case? It seems like you are just copying the table.
```python
if model.algo == "stackedensemble":
    model = model.metalearner()
```

```python
if model.algo not in ("stackedensemble", "glm", "gam", "glrm", "deeplearning",
```
It would be nice to generalize this: instead of enumerating the algos here, add "an interface" that each algo would implement. That would let us get rid of the big if that determines allowed_metrics and allowed_timesteps.
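Something like this hypothetical shape, for example (all names here are illustrative, not existing h2o code):

```python
class LearningCurveInfo:
    """Interface each algo could implement instead of being enumerated in the plotting code."""

    def allowed_metrics(self):
        raise NotImplementedError

    def allowed_timesteps(self):
        raise NotImplementedError


class GBMLearningCurveInfo(LearningCurveInfo):
    def allowed_metrics(self):
        return ["logloss", "rmse", "auc", "custom"]

    def allowed_timesteps(self):
        return ["number_of_trees"]
```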
Squashed commit messages:
* Add prototype implementation of learning curve in R
* Update R version
* Fix NPE when no scoring history available
* Add initial python version
* Unify colors between R and Python
* Add error for models without scoring history in python
* Fix alpha selection in GAM/GLM
* Use glm_model_summary as model_summary in GAMs
* Add tests and fix bugs
* Fix python cv_ribbon default override
* Change default colors and improve R legend
* Fix logic error in R
* Add coxPH and rename cv_individual_lines to cv_lines
* Add CoxPH and IsolationForest
* Map stopping metric to metric in scoring history
* Adjust docstring
* Add examples to docstrings and fix R cran check
* Fix legend in matplotlib2
* Incorporate suggestions from MLI meeting
* Add more docstrings and make logloss the default metric for multiple scenarios
* Copy TwoDimTable as in GAM instead of clone (see hex.gam.GAMModel#copyTwoDimTable for the GAM's implementation)
* Remove matplotlib import at the top of the _explain.py file
* Move GAM specific modification of ModelBase to h2o-bindings/bin/custom/python/gen_gam.py
* Assign default implementation to learning curve plot that complains about missing matplotlib
* Adapt to the new features from Wendy's PR

(cherry picked from commit 8173e3f)
https://h2oai.atlassian.net/browse/PUBDEV-7831
Needs to be tested with multiple different model configurations.
For now, this should work at least on the models that we get from AutoML (it is tested mainly on those).
Questionable choices:
- When provided with a StackedEnsemble, draw the learning curve for the metalearner and make sure to have the metalearner model id in the plot title, to make this choice explicitly visible.
- If it seems to be hard to make a nice ribbon, don't make a ribbon and instead draw the individual CV lines. This is because "DeepLearning epochs don't seem to be saved at the same points, so the sd estimation for the ribbon fails".

Related fixes

Example