Add EarlyStopping integration to TensorFlow.Keras autologging #2301
Mostly looks good. We should decide how to handle the trade-off between these options:
1. Log both `loss` & `epoch_loss`, and `accuracy` & `epoch_acc` metrics, including in the non-early-stopping case (the current behavior; IMO this might confuse users). We could improve this by only logging the extra `loss` & `acc` metrics in the early-stopping case.
2. Remove the TensorBoard callback from the Keras model to avoid generating `epoch_loss` and `epoch_acc` metrics. The downside is that we would no longer generate TensorBoard logs as an artifact, and if the user does include a TensorBoard callback, we'll still log `epoch_loss` etc.
3. Provide special handling when logging metrics of the restored model after early stopping: log loss & accuracy with the `epoch_` prefix, and don't log an extra step of other metrics.

Out of these options I like 3 the most, if it's not too complicated. Let me know what you think.
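Option 3 can be sketched without TensorFlow as a pure function over a Keras-style history dict (metric name mapped to per-epoch values). The function name, the `restored_epoch` argument, and the set of renamed keys are hypothetical; the sketch only illustrates the renaming and filtering the comment proposes, not the code merged in this PR.

```python
from typing import Dict, List, Mapping

# Hypothetical set of history keys that the TensorBoard callback also reports
# with an "epoch_" prefix (e.g. "loss" -> "epoch_loss", "acc" -> "epoch_acc").
RENAMED_KEYS = ("loss", "acc", "accuracy")


def restored_model_metrics(history: Mapping[str, List[float]],
                           restored_epoch: int) -> Dict[str, float]:
    """Metrics to log for the model restored by early stopping (option 3).

    Loss and accuracy are renamed with the "epoch_" prefix; every other
    metric (e.g. "val_loss") is skipped, so it gets no extra step.
    """
    return {
        "epoch_" + name: values[restored_epoch]
        for name, values in history.items()
        if name in RENAMED_KEYS
    }


history = {
    "loss": [0.9, 0.5, 0.6, 0.7],
    "acc": [0.6, 0.8, 0.75, 0.7],
    "val_loss": [1.0, 0.6, 0.65, 0.8],
}
print(restored_model_metrics(history, restored_epoch=1))
# {'epoch_loss': 0.5, 'epoch_acc': 0.8}
```

Logged one step past the last epoch, these values would extend the `epoch_loss` and `epoch_acc` series that the TensorBoard callback already produces, instead of adding a lone extra point to the `loss` and `acc` series.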
LGTM
We decided it was best to stick with option 3 for simplicity. Note that the callback logs …
* Add autodetection of job environments to R client (mlflow#2272)
  * R client detection
  * Efficiency
  * Simplify
  * Return tags
  * Add test cases
  * Lint
  * Tweak test name
* EarlyStopping Callback support for Keras Autologging (mlflow#2219)
  Keras autologging now supports the EarlyStopping callback. If EarlyStopping.restore_best_weights==True, then the metrics of the restored model will be logged as an extra step.
* Add REPL-aware listener for Spark datasource autologging (mlflow#2249)
* Fix XGBoost and LightGBM flavor tests (mlflow#2244)
  Add objective and num_class to xgb.train() and lgb.train() because they try to solve a regression task by default, but the iris dataset is a dataset for a classification task.
* Add status in RunView and ExperimentRunsTable (mlflow#1816)
* Changes needed to support the sqlplugin (mlflow#2285)
  * Changes needed to support the sqlplugin
  * Edited plugin name in setup file
  * Updated plugin name in setup file
  Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>
* Elevate how to load mleap flavor (mlflow#2211)
  * Elevate how to load mleap flavor.
  * Load using loadPipeline.
  * Get rid of extra char.
  Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>
* pillow fix (mlflow#2307)
* Add EarlyStopping integration to TensorFlow.Keras autologging (mlflow#2301)
  Merging TF.Keras EarlyStopping integration
* Document MLflow plugin system (mlflow#2270)
  Adds an mlflow.org doc page explaining how to write & use MLflow plugins.

Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: juntai-zheng <39497939+juntai-zheng@users.noreply.github.com>
Co-authored-by: Siddharth Murching <smurching@gmail.com>
Co-authored-by: Harutaka Kawamura <hkawamura0130@gmail.com>
Co-authored-by: Nicolas Laille <nlaille@users.noreply.github.com>
Co-authored-by: Avrilia Floratou <avflor@microsoft.com>
Co-authored-by: Stephanie Bodoff <stephanie.bodoff@databricks.com>
…#2301) Merging TF.Keras EarlyStopping integration
What changes are proposed in this pull request?
Mirrors work on #2219 by adding EarlyStopping callback integration to TensorFlow.Keras (both 1.X and 2.X). Does not include `fit_generator()` support.
How is this patch tested?
Unit tests written in `tests/tensorflow_autolog/test_tensorflow_autolog.py` and `tests/tensorflow_autolog/test_tensorflow2_autolog.py`.
Release Notes
Is this a user-facing change?
Same as #2219, but for TensorFlow.Keras.
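The user-facing behavior #2219 describes (when `EarlyStopping.restore_best_weights` is true, the restored model's metrics are logged as an extra step) can be sketched without TensorFlow or MLflow. The helper names and arguments below are hypothetical, and the sketch rests on three assumptions worth checking against the real code: the restored epoch is the first epoch with the best monitored value (i.e. `min_delta` is 0), weights are only restored when training actually stopped early, and per-epoch metrics were logged at steps 0 to n-1.

```python
from typing import Dict, List, Mapping, Optional, Tuple


def best_epoch(values: List[float], mode: str = "min") -> int:
    """Index of the first best value.

    Assumption: mirrors a strict-improvement check with min_delta == 0,
    so ties keep the earliest epoch.
    """
    pick = min if mode == "min" else max
    return values.index(pick(values))


def restored_model_step(
    history: Mapping[str, List[float]],
    monitor: str,
    mode: str,
    stopped_early: bool,
    restore_best_weights: bool,
) -> Optional[Tuple[int, Dict[str, float]]]:
    """Return (extra_step, metrics) for the restored model, or None.

    Assumption: weights are only restored when training stopped early.
    """
    if not (stopped_early and restore_best_weights):
        return None
    restored = best_epoch(history[monitor], mode)
    # Assumption: per-epoch metrics were logged at steps 0..n-1, so the
    # restored model's metrics go at step n.
    extra_step = len(history[monitor])
    return extra_step, {name: vals[restored] for name, vals in history.items()}


history = {"loss": [0.9, 0.5, 0.6, 0.7], "val_loss": [1.0, 0.6, 0.65, 0.8]}
print(restored_model_step(history, "val_loss", "min", True, True))
# (4, {'loss': 0.5, 'val_loss': 0.6})
```

With MLflow, the returned pair could then be passed on as `mlflow.log_metrics(metrics, step=extra_step)`. For TF.Keras, the option chosen in review would additionally restrict and rename these metrics.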
What component(s) does this PR affect?
How should the PR be classified in the release notes? Choose one:
- `rn/breaking-change`: The PR will be mentioned in the "Breaking Changes" section
- `rn/none`: No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- `rn/feature`: A new user-facing feature worth mentioning in the release notes
- `rn/bug-fix`: A user-facing bug fix worth mentioning in the release notes
- `rn/documentation`: A user-facing documentation change worth mentioning in the release notes