
PUBDEV-7831: Add new learning curve plotting function/method #5164

Merged
merged 25 commits into rel-zipf from tomf_pubdev-7831_learning_curve on Mar 22, 2021

Conversation

Contributor

@tomasfryda tomasfryda commented Dec 1, 2020

https://h2oai.atlassian.net/browse/PUBDEV-7831

Needs to be tested with multiple different model configurations.
For now, this should work at least on the models we get from AutoML (it is tested mainly on those).

Questionable choices:

  • When given a StackedEnsemble, draw the learning curve of its metalearner and include the metalearner model id in the plot title so that this choice is explicitly visible.

  • If it seems hard to draw a nice ribbon, skip the ribbon and draw individual CV lines instead. This is because DeepLearning epochs don't seem to be saved at the same points across CV models, so the sd estimate for the ribbon fails (a rough sketch of this fallback is below).
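
A rough sketch of that fallback logic (not the PR's actual implementation; it assumes each CV scoring history is available as a pandas DataFrame):

import numpy as np

def ribbon_or_lines(cv_histories, x_col, y_col):
    # cv_histories: one scoring-history DataFrame per CV model.
    xs = [tuple(h[x_col]) for h in cv_histories]
    if len(set(xs)) == 1:
        # All CV models were scored at the same x values (e.g. the same tree counts),
        # so a mean +/- sd ribbon is well defined.
        ys = np.array([h[y_col].to_numpy() for h in cv_histories])
        return {"kind": "ribbon", "x": np.array(xs[0]), "mean": ys.mean(axis=0), "sd": ys.std(axis=0)}
    # Otherwise (e.g. DeepLearning epochs differ between the CV models) fall back
    # to drawing each CV model's curve individually.
    return {"kind": "lines", "curves": [(h[x_col].to_numpy(), h[y_col].to_numpy()) for h in cv_histories]}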

Related fixes

  • CoxPH scoring history is missing the last entry: Fix CoxPH scoring history #5186 (merged)
  • GLM with lambda search reports the number of iterations of the last model, not of the winning submodel (merged)

Example

[example learning curve plot]

@tomasfryda tomasfryda self-assigned this Dec 1, 2020
@tomasfryda tomasfryda force-pushed the tomf_pubdev-7831_learning_curve branch 3 times, most recently from cf7e9ca to a61a4d2 on December 7, 2020 17:21
@tomasfryda tomasfryda marked this pull request as ready for review December 8, 2020 09:56
@wendycwong
Contributor

@tomasfryda

The scoring history only has the xval deviance and nothing else. However, if you want to check out other values apart from deviance, I suggest that you add them to the scoring history instead of having cv_scoring_history. I think it will be easier.

Also, CV in GLM only calculates the deviance values. If you need other metrics, you can add them there (in method cv_computeAndSetOptimalParameters(ModelBuilder[] cvModelBuilders)) as well. CV is used in GLM only to figure out the best alpha and lambda values, nothing else. Also, the number of iterations may not be the same as in the main run.

I will check what happens when you limit the max runtime.

Wendy

@wendycwong
Contributor

Also, at the end of cross-validation, we calculate the deviance on the hold-out set and pick the best deviance for the alpha/lambda value. However, after that, we do not save the deviance values for the later iterations. This is where it happens (in method cv_computeAndSetOptimalParameters(ModelBuilder[] cvModelBuilders)):

  _parms._lambda = Arrays.copyOf(_parms._lambda,lmin_max+1);
  _xval_deviances = Arrays.copyOf(_xval_deviances, lmin_max+1);
  _xval_sd = Arrays.copyOf(_xval_sd, lmin_max+1);

where lmin_max is the index of the submodel with the lowest test deviance.

@wendycwong
Contributor

Okay, when you set max_runtime_secs and nfolds > 1, each CV model and the main model are allocated an equal amount of time to build, like this (in maxRuntimeSecsPerModel(int cvModelsCount, int parallelization) of ModelBuilder.java):

_parms._max_runtime_secs / Math.ceil((double)cvModelsCount / parallelization + 1).
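
As a quick numeric illustration of that expression (just the arithmetic as quoted, not H2O code):

import math

def max_runtime_secs_per_model(max_runtime_secs, cv_models_count, parallelization):
    # Same expression as the Java line above.
    return max_runtime_secs / math.ceil(cv_models_count / parallelization + 1)

# A 60 s budget with 5-fold CV built sequentially: each CV model and the main model
# get 60 / ceil(5/1 + 1) = 10 s.
print(max_runtime_secs_per_model(60, 5, 1))  # 10.0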

So, in theory, they should not run out of time. Also, GLM will always run for at least one iteration before it quits, so as not to return an empty model. This implies that we may use more time than is allowed by max_runtime_secs. Hope this helps. Wendy

@tomasfryda
Contributor Author

_parms._max_runtime_secs / Math.ceil((double)cvModelsCount / parallelization + 1).

So, in theory, they should not run out of time. Also, GLM will always run for at least one iteration before it quits, so as not to return an empty model. This implies that we may use more time than is allowed by max_runtime_secs. Hope this helps. Wendy

Thank you @wendycwong. Sorry, I most likely misled you by concentrating on the time constraint.

I added a lot of logging and fortunately I was able to reproduce it after a couple of runs. It looks like the problem is that GLM selects one of the first alpha values tested, but generateSummary is called after the model is finished and uses _state._iter as number_of_iterations; however, _state._iter contains the number of iterations from the last alpha value tried (which in this case is not alpha_best). My fix would be something like adding a line to hex.glm.GLMModel#generateSummary:

if (_parms._lambda_search) { //h2o-algos/src/main/java/hex/glm/GLMModel.java#L1534
    lambdaSearch = 1;
    iter = _output._submodels[_output._selected_submodel_idx].iteration;  // THIS IS THE ADDED LINE
    _output._model_summary.set(0, 3, "nlambda = " + _parms._nlambdas + ", lambda.max = " + MathUtils.roundToNDigits(_lambda_max, 4) + ", lambda.min = " + MathUtils.roundToNDigits(_output.lambda_best(), 4) + ", lambda.1se = " + MathUtils.roundToNDigits(_output.lambda_1se(), 4));
}

Note that I still don't understand well how GLM works internally, e.g., I don't know whether the submodels are created during CV or only for the final model, and whether the iteration in a submodel corresponds to the iteration of the final model that is used for prediction, so this fix might be incorrect. What do you think?


The scoring history only has the xval deviance and nothing else. However, if you want to check out other values apart from deviance, I suggest that you add them to the scoring history instead of having cv_scoring_history. I think it will be easier.

I created cv_scoring_history[] because it seemed easier to me than adding it to the scoring_history.

I don't have a strong opinion about this, but when I look at glm@model$scoring_history I get the table below, which already seems like a very wide table with a lot of NAs.

And since I want to get the actual scoring history of the CV models, I would probably have to either

  • add a new column indicating whether a row belongs to a CV model or the final model and rbind the CV scoring histories (see the sketch after the table below), or

  • add multiple columns such as cv_training_rmse so there is a clear distinction between the final model and the CV models; and since some models are scored on a time interval (I think deep learning), this would add a lot of new rows with NAs for either the CV models or the final one.

But I might be missing some better way to do it - if you still think your idea is better, could you please describe it more concretely?

            timestamp   duration iterations negative_log_likelihood objective training_rmse training_logloss
1 2020-12-11 12:25:13  0.000 sec          0               689.51201   0.66236            NA               NA
2 2020-12-11 12:25:13  0.001 sec          1               252.03686   0.36284            NA               NA
3 2020-12-11 12:25:13  0.002 sec          2               222.59586   0.35667            NA               NA
4 2020-12-11 12:25:13  0.002 sec          3               220.00584   0.35661            NA               NA
5 2020-12-11 12:25:13  0.003 sec          4               219.97507   0.35661       0.21102          0.21131
  training_r2 training_auc training_pr_auc training_lift training_classification_error validation_rmse
1          NA           NA              NA            NA                            NA              NA
2          NA           NA              NA            NA                            NA              NA
3          NA           NA              NA            NA                            NA              NA
4          NA           NA              NA            NA                            NA              NA
5     0.81033           NA              NA       2.65561                       0.02498         0.20668
  validation_logloss validation_r2 validation_auc validation_pr_auc validation_lift
1                 NA            NA             NA                NA              NA
2                 NA            NA             NA                NA              NA
3                 NA            NA             NA                NA              NA
4                 NA            NA             NA                NA              NA
5            0.20681       0.82245        0.99494           0.99149         2.48148
  validation_classification_error
1                              NA
2                              NA
3                              NA
4                              NA
5                         0.02239
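
For concreteness, the first alternative (indicator column + rbind) could look roughly like the following pandas sketch. This is a hypothetical illustration, not code from this PR; frame and column names are just for the example.

import pandas as pd

def combined_scoring_history(main_history, cv_histories):
    # main_history: the final model's scoring history as a DataFrame;
    # cv_histories: one scoring-history DataFrame per CV model.
    main = main_history.assign(model="main")
    folds = [h.assign(model="cv_%d" % (i + 1)) for i, h in enumerate(cv_histories)]
    # Columns reported by only some of the models end up as NAs for the others,
    # which is the drawback mentioned above.
    return pd.concat([main] + folds, ignore_index=True, sort=False)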

Thank you again for helping me with the GLM.

@tomasfryda tomasfryda force-pushed the tomf_pubdev-7831_learning_curve branch from 859a430 to 6ec0c9d on December 14, 2020 13:41
@tomasfryda
Contributor Author

@sebhrusen @ledell I went through the algos and created a mapping between stopping criteria (as specified in the documentation) and the columns present in scoring_history. I couldn't find some stopping criteria in the scoring history, namely mean_per_class_error, MSE, and RMSLE. I added some "metadata" after a "/" to note when the criterion is present in the scoring history.

[{
    "mean_per_class_error": {},
    "MSE": {},
    "RMSLE": {},
    "anomaly_score": {
      "IsolationForest": ["mean_anomaly_score"]
    },
    "custom": {
      "GBM":["training_custom", "validation_custom"],
      "DRF":["training_custom", "validation_custom"]
    },
    "custom_increasing": {
      "GBM":["training_custom", "validation_custom"],
      "DRF":["training_custom", "validation_custom"]
    },
    "deviance": {
      "GLM/lambda_search": ["deviance_train", "deviance_test", "deviance_xval", "deviance_se"],
      "DRF/regression": ["training_deviance", "validation_deviance"],
      "GBM/regression": ["training_deviance", "validation_deviance"],
      "DeepLearning/regression": ["training_deviance", "validation_deviance"],
      "XGBoost/regression": ["training_deviance", "validation_deviance"],
    },
    "logloss/binomial,multinomial": {
      "DeepLearning": ["training_logloss", "validation_logloss"],
      "DRF": ["training_logloss", "validation_logloss"],
      "GBM": ["training_logloss", "validation_logloss"],
      "XGBoost": ["training_logloss", "validation_logloss"]
    },
    "RMSE/binomial,multinomial,regression": {
      "DeepLearning": ["training_rmse", "validation_rmse"],
      "DRF": ["training_rmse", "validation_rmse"],
      "GBM": ["training_rmse", "validation_rmse"],
      "XGBoost": ["training_rmse", "validation_rmse"]
    },
    "MAE/regression": {
      "DRF": ["training_mae", "validation_mae"],
      "GBM": ["training_mae", "validation_mae"],
      "DeepLearning": ["training_mae", "validation_mae"],
      "XGBoost": ["training_mae", "validation_mae"]
    },
    "AUC/binomial, + opt-in in multinomial": {
      "DeepLearning": ["training_auc", "validation_auc"],
      "DRF": ["training_auc", "validation_auc"],
      "GBM": ["training_auc", "validation_auc"],
      "XGBoost": ["training_auc", "validation_auc"]
    },
    "AUCPR/binomial, + opt-in in multinomial": {
      "DeepLearning": ["training_pr_auc", "validation_pr_auc"],
      "DRF": ["training_pr_auc", "validation_pr_auc"],
      "GBM": ["training_pr_auc", "validation_pr_auc"],
      "XGBoost": ["training_pr_auc", "validation_pr_auc"]
    },
    "lift_top_group/binomial": {
      "DeepLearning": ["training_lift", "validation_lift"],
      "DRF": ["training_lift", "validation_lift"],
      "GBM": ["training_lift", "validation_lift"],
      "XGBoost": ["training_lift", "validation_lift"]
    },
    "misclassification/binomial,multinomial": {
      "DeepLearning": ["training_classification_error", "validation_classification_error"],
      "DRF": ["training_classification_error", "validation_classification_error"],
      "GBM": ["training_classification_error", "validation_classification_error"],
      "XGBoost": ["training_classification_error", "validation_classification_error"]
    }
  },
  {
    "Not a stopping criterion (based on docs) but present in scoring history": {
      "DeepLearning": ["training_r2", "validation_r2"],
      "CoxPH": ["loglik"],
      "IsolationForest": ["mean_tree_path_length"],
      "GLM": ["objective", "convergence", "negative_log_likelihood", "sum(etai-eta0)^2"],
      "PCA": ["objective"]
    }
  }
]

Should I accept only the names from the stopping criteria (e.g. misclassification), or should I allow the user to specify both (misclassification or classification_error)?
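
For reference, a minimal sketch of how such a mapping could be consumed either way (illustrative names only, not the PR's internals):

METRIC_TO_COLUMNS = {
    "misclassification": ("training_classification_error", "validation_classification_error"),
    "logloss": ("training_logloss", "validation_logloss"),
    "rmse": ("training_rmse", "validation_rmse"),
    # ... remaining entries from the mapping above; aliases such as
    # "classification_error" could simply be added as extra keys.
}

def resolve_metric_columns(metric, scoring_history_columns):
    # Lower-casing makes "Misclassification" and "misclassification" equivalent.
    columns = METRIC_TO_COLUMNS.get(metric.lower())
    if columns is None:
        raise ValueError("Unsupported metric: %s" % metric)
    return [c for c in columns if c in scoring_history_columns]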

@tomasfryda tomasfryda force-pushed the tomf_pubdev-7831_learning_curve branch from 6ec0c9d to f99a194 on December 15, 2020 10:03
@tomasfryda
Contributor Author

tomasfryda commented Dec 16, 2020

FYI I created a separate PR for the GLM fix: #5191 (merged now)

@tomasfryda tomasfryda force-pushed the tomf_pubdev-7831_learning_curve branch from a24bfd6 to 19626a8 on January 13, 2021 18:10
Contributor

@ledell ledell left a comment


We discussed this on a call, but I am summarizing the requested changes here:

  1. New plot label names:
  • Training
  • Training (CV Models)
  • Validation
  • Cross-validation

  2. Support lower- and upper-case versions of the metric names (e.g. tolower() in R), with the official version being lower case. We should consider whether to make all our metric names default to lower case... something to think about for another day.
> h2o.learning_curve_plot(gbm, metric = "AUC")
Error in match.arg(metric) : 
  'arg' should be one of “AUTO”, “auc”, “aucpr”, “mae”, “rmse”, “anomaly_score”, “convergence”, “custom”, “custom_increasing”, “deviance”, “lift_top_group”, “logloss”, “misclassification”, “negative_log_likelihood”, “objective”, “sumetaieta02”

  3. Remove the suffix of the AutoML model IDs (for plotting purposes) so that it's easier to read.

  4. Change the cutoff line color to red or green.

  5. Add a newline after the "Selected" plot label for more space (let's see if this looks better or worse).

  6. Notes about GLM differences:

  • deviance (lambda search), iteration
  • objective (no lambda search), iterations

The GLM scoring history is missing some metrics, and in the case of no lambda search we don't have the cross-validation metrics either. Can we add all of this? Let's discuss in #dev-h2o-3 to see if we can fill in the missing pieces in GLM so we can have a more unified plotting experience (especially since we are using the GLM metalearner as the curve we plot for Stacked Ensembles).

@tomasfryda
Contributor Author

I made the modifications (1-5) and here are the results:

Python: [learning curve plot screenshot]

R: [learning curve plot screenshot, XRT model]

Comment on lines +2392 to +2402
#' Create a learning curve plot for an H2O Model. Learning curves show the dependence of an error metric on
#' learning progress, e.g., RMSE vs. the number of trees trained so far in GBM. There can be up to 4 curves
#' showing Training, Validation, Training on CV Models, and Cross-validation error.
#'
#' @param model an H2O model
#' @param metric Metric to be used for the learning curve plot. These should mostly correspond with the stopping metric.
#' @param cv_ribbon if TRUE, plot the CV mean as a line and the CV standard deviation as a ribbon around the mean;
#' if NULL, it will attempt to automatically determine whether this is a suitable visualisation
#' @param cv_lines if TRUE, plot the scoring history for individual CV models; if NULL, it will attempt to
#' automatically determine whether this is a suitable visualisation
#'
Contributor Author

New docstring here. The Python version is basically the same, except NULL -> None.
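
Hypothetical Python usage, assuming the Python method mirrors the R signature above (i.e. model.learning_curve_plot with metric / cv_ribbon / cv_lines arguments):

import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()
# Any binary-classification dataset works; prostate.csv with the CAPSULE response
# is used here only as an example.
train = h2o.import_file("prostate.csv")
train["CAPSULE"] = train["CAPSULE"].asfactor()

gbm = H2OGradientBoostingEstimator(nfolds=5, seed=1)
gbm.train(y="CAPSULE", training_frame=train)

# Leaving cv_ribbon/cv_lines as None lets the function decide between a CV ribbon
# and individual CV lines, as described in the docstring.
gbm.learning_curve_plot(metric="logloss", cv_ribbon=None, cv_lines=None)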

@tomasfryda tomasfryda force-pushed the tomf_pubdev-7831_learning_curve branch 2 times, most recently from 2bfbbc6 to 43ffcf2 on January 25, 2021 18:41
h2o-core/src/main/java/hex/ModelBuilder.java (review thread resolved)
h2o-py/h2o/model/model_base.py (outdated; review thread resolved)
h2o-py/h2o/explanation/_explain.py (outdated; review thread resolved)
@tomasfryda tomasfryda force-pushed the tomf_pubdev-7831_learning_curve branch from 30fd5cb to 996acbe on March 19, 2021 12:54
@tomasfryda tomasfryda changed the base branch from rel-zermelo to master March 19, 2021 12:54
Comment on lines +212 to +213
parms._generate_scoring_history = true;
parms._score_iteration_interval = (parms._valid == null) ? 5 : -1;
Contributor Author

score_iteration_interval might change depending on benchmark results.

When _valid is specified, we use lambda search, which provides plenty of information for the learning curve. Otherwise, lambda search is off and the scoring history often contains just one or two entries. Even a metalearner trained on a 250k-row subset of the Airlines dataset has fewer than 10 iterations, so even this score_iteration_interval might be too big.

Contributor Author

Based on benchmark results from @wendycwong, we decided to keep the value at 5, as it has less than a 2% performance impact but significantly improves the learning curve in some situations. This affects only the AUTO metalearner in SE.

[benchmark results screenshot, March 22, 2021]

@tomasfryda tomasfryda changed the base branch from master to rel-zipf March 22, 2021 19:08
        copiedScoringHistory.set(rowIndex, colIndex, sh.get(rowIndex, colIndex));
      }
    }
    mainModel._output._cv_scoring_history[i] = copiedScoringHistory;
Contributor

Does cloning the object not work for this use case? It seems like you are just copying the table.

if model.algo == "stackedensemble":
model = model.metalearner()

if model.algo not in ("stackedensemble", "glm", "gam", "glrm", "deeplearning",
Contributor

It would be nice to generalize this: instead of enumerating the algos here, add "an interface" that each algo would implement.

This would let us get rid of the big if that determines allowed_metrics and allowed_timesteps.
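
A rough sketch of what that interface might look like on the Python side (hypothetical, not part of this PR):

class LearningCurveInfo:
    # Each algo-specific model class could implement these instead of being
    # enumerated in a big if/elif in _explain.py.
    def allowed_metrics(self):
        raise NotImplementedError

    def allowed_timesteps(self):
        raise NotImplementedError


class GBMLearningCurveInfo(LearningCurveInfo):
    def allowed_metrics(self):
        return ["deviance", "logloss", "rmse", "mae", "auc", "aucpr",
                "lift_top_group", "misclassification", "custom"]

    def allowed_timesteps(self):
        return ["number_of_trees"]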

@michalkurka michalkurka merged commit 8173e3f into rel-zipf Mar 22, 2021
flaviusburca pushed a commit to mware-solutions/h2o-3 that referenced this pull request Apr 21, 2021

* Add prototype implementation of learning curve in R

* Update R version

* Fix NPE when no scoring history available

* Add initial python version

* Unify colors between R and Python

* Add error for models without scoring history in python

* Fix alpha selection in GAM/GLM

* Use glm_model_summary as model_summary in GAMs

* Add tests and fix bugs

* Fix python cv_ribbon default override

* Change default colors and improve R legend

* Fix logic error in R

* Add coxPH and rename cv_individual_lines to cv_lines

* Add CoxPH and IsolationForest

* Map stopping metric to metric in scoring history

* Adjust docstring

* Add examples to docstrings and fix R cran check

* Fix legend in matplotlib2

* Incorporate suggestions from MLI meeting

* Add more docstrings and make logloss as default metric for multiple scenarios

* Copy TwoDimTable as in GAM instead of clone

See hex.gam.GAMModel#copyTwoDimTable for the GAM's implementation.

* Remove matplotlib import at the top of the _explain.py file

* Move GAM specific modification of ModelBase to h2o-bindings/bin/custom/python/gen_gam.py

* Assign default implementation to learning curve plot that complains about missing matplotlib

* Adapt to the new features from Wendy's PR

(cherry picked from commit 8173e3f)