Add EarlyStopping integration to TensorFlow.Keras autologging #2301

Merged
juntai-zheng merged 9 commits into mlflow:master on Jan 15, 2020

juntai-zheng
Collaborator

What changes are proposed in this pull request?

Mirrors the work in #2219 by adding EarlyStopping callback integration to TensorFlow.Keras autologging (both TF 1.X and 2.X). Does not include fit_generator() support.
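To make the patience bookkeeping concrete, here is a simplified, pure-Python model of the rule that `tf.keras.callbacks.EarlyStopping` applies when monitoring a loss: an epoch counts as an improvement only if the loss drops by more than `min_delta`, and training stops once `patience` consecutive epochs fail to improve. The helper `simulate_early_stopping` is hypothetical and is not MLflow or Keras code; returning `None` when the rule never fires is also an assumption (Keras itself leaves `stopped_epoch` at 0 in that case).

```python
from typing import Optional, Sequence, Tuple


def simulate_early_stopping(
    val_losses: Sequence[float], patience: int, min_delta: float = 0.0
) -> Tuple[Optional[int], int, int]:
    """Hypothetical helper: replay a 'min'-mode patience rule over epoch losses.

    Returns (stopped_epoch, best_epoch, epochs_run). stopped_epoch is None
    (an assumption of this sketch) when training runs through every epoch.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement: remember this epoch and reset the patience counter.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # Training halts after this epoch, so epochs 0..epoch have run.
                return epoch, best_epoch, epoch + 1
    return None, best_epoch, len(val_losses)


losses = [1.0, 0.8, 0.85, 0.9, 0.95, 0.7]
stopped, best, epochs_run = simulate_early_stopping(losses, patience=2)
print(stopped, best, epochs_run)  # → 3 1 4
```

With `restore_best_weights=True`, the weights from `best_epoch` are restored, which is why the restored model's metrics can be logged as one extra step at index `epochs_run`. Under this rule, whenever it fires, the best epoch is always `stopped_epoch - max(1, patience)`.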

How is this patch tested?

Unit tests added in tests/tensorflow_autolog/test_tensorflow_autolog.py and tests/tensorflow_autolog/test_tensorflow2_autolog.py.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Same as #2219, but for TensorFlow.Keras.

What component(s) does this PR affect?

  • UI
  • CLI
  • API
  • REST-API
  • Examples
  • Docs
  • Tracking
  • Projects
  • Artifacts
  • Models
  • Scoring
  • Serving
  • R
  • Java
  • Python

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Collaborator

@smurching left a comment

Mostly looks good. We should decide how to handle the following tradeoff:

  1. Log both loss & epoch_loss, and accuracy & epoch_acc metrics, including in the non-early-stopping case (the current behavior; IMO this might confuse users). We could improve this by logging the extra loss & acc metrics only in the early-stopping case.
  2. Remove the TensorBoard callback from the Keras model to avoid generating the epoch_loss and epoch_acc metrics. The downside is that we would no longer generate TensorBoard logs as an artifact, and if the user does include a TensorBoard callback we'll still log epoch_loss etc.
  3. Provide special handling when logging metrics of the restored model after early stopping: log loss & accuracy with the epoch_ prefix, and don't log an extra step of the other metrics.
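Option 3 can be sketched as a small pure-Python helper that builds only the extra step for the restored model. The name `restored_step_metrics`, the `history` layout (a dict of per-epoch lists, mirroring Keras's `History.history`), and placing the extra step at index `len(history[...])` are assumptions for illustration, not the code that was merged.

```python
from typing import Dict, List, Sequence, Tuple


def restored_step_metrics(
    history: Dict[str, List[float]],
    restored_epoch: int,
    keys: Sequence[str] = ("loss", "acc"),
    prefix: str = "epoch_",
) -> Tuple[int, Dict[str, float]]:
    """Hypothetical helper: build the extra logging step after early stopping.

    Only the loss/accuracy keys are re-logged, renamed with `prefix`, so no
    extra step is written for any other metric (option 3). The step index is
    the number of epochs already logged, one past the last training epoch.
    """
    epochs_run = max((len(values) for values in history.values()), default=0)
    metrics = {
        prefix + key: history[key][restored_epoch] for key in keys if key in history
    }
    return epochs_run, metrics


history = {
    "loss": [0.9, 0.5, 0.6, 0.7],
    "acc": [0.60, 0.80, 0.78, 0.75],
    "val_loss": [1.0, 0.8, 0.85, 0.9],
}
step, metrics = restored_step_metrics(history, restored_epoch=1)
print(step, metrics)  # → 4 {'epoch_loss': 0.5, 'epoch_acc': 0.8}
```

Note that `val_loss` is deliberately left out of the extra step, which is the "don't log an extra step of other metrics" part of option 3.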

Out of these options I think I like 3 the most, if it's not too complicated; let me know what you think.

Collaborator

@smurching left a comment

LGTM

@juntai-zheng
Collaborator Author

We decided it was best to stick with option number 3 for simplicity. Note that the callback logs epoch_loss and epoch_acc for TF 1.X, but loss and acc for TF 2.X.
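Because the metric names differ by TensorFlow major version, the extra step has to reuse whichever names were logged during training. A minimal sketch, assuming the version string comes from something like `tf.__version__`; the helper `restored_metric_names` is hypothetical and only encodes the naming described in the comment above.

```python
from typing import Dict, Sequence


def restored_metric_names(
    tf_version: str, base_keys: Sequence[str] = ("loss", "acc")
) -> Dict[str, str]:
    """Hypothetical helper: map base metric keys to the names to re-log.

    epoch_-prefixed under TF 1.X, unprefixed under TF 2.X.
    """
    major = int(tf_version.split(".")[0])
    prefix = "epoch_" if major < 2 else ""
    return {key: prefix + key for key in base_keys}


print(restored_metric_names("1.15.0"))  # → {'loss': 'epoch_loss', 'acc': 'epoch_acc'}
print(restored_metric_names("2.1.0"))  # → {'loss': 'loss', 'acc': 'acc'}
```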

@juntai-zheng merged commit 3bec7c1 into mlflow:master on Jan 15, 2020
@juntai-zheng added the rn/feature label on Jan 15, 2020
amrqt added a commit to criteo-forks/mlflow that referenced this pull request Jan 16, 2020
* Add autodetection of job environments to R client (mlflow#2272)

* R client detection

* Efficiency

* Simplify

* Return tags

* Add test cases

* Lint

* Tweak test name

* EarlyStopping Callback support for Keras Autologging (mlflow#2219)

Keras autologging now supports the EarlyStopping callback. If EarlyStopping.restore_best_weights==True, then the metrics of the restored model will be logged as an extra step.

* Add REPL-aware listener for Spark datasource autologging (mlflow#2249)

* Fix XGBoost and LightGBM flavor tests (mlflow#2244)

Add objective and num_class to xgb.train() and lgb.train(): both default to a regression objective, but the iris dataset is a classification task.

* Add status in RunView and ExperimentRunsTable (mlflow#1816)

* Changes needed to support the sqlplugin (mlflow#2285)

* Changes needed to support the sqlplugin

* Edited plugin name in setup file

* Updated plugin name in setup file

Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>

* Elevate how to load mleap flavor (mlflow#2211)

* Elevate how to load mleap flavor.

* Load using loadPipeline.

* Get rid of extra char.

Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>

* pillow fix (mlflow#2307)

* Add EarlyStopping integration to TensorFlow.Keras autologging (mlflow#2301)

Merging TF.Keras EarlyStopping integration

* Document MLflow plugin system (mlflow#2270)

Adds an mlflow.org doc page explaining how to write & use MLflow plugins.

Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: juntai-zheng <39497939+juntai-zheng@users.noreply.github.com>
Co-authored-by: Siddharth Murching <smurching@gmail.com>
Co-authored-by: Harutaka Kawamura <hkawamura0130@gmail.com>
Co-authored-by: Nicolas Laille <nlaille@users.noreply.github.com>
Co-authored-by: Avrilia Floratou <avflor@microsoft.com>
Co-authored-by: Stephanie Bodoff <stephanie.bodoff@databricks.com>
avflor pushed a commit to avflor/mlflow that referenced this pull request Aug 22, 2020
Labels
rn/feature Mention under Features in Changelogs.