Fix flaky keras test #2926

harupy · 2020-06-12T05:47:32Z

What changes are proposed in this pull request?

test_model_save_load in test_keras_model_export.py fails intermittently when the model's gradients explode during training and its predictions become infinite. This PR fixes the issue by using a smaller learning rate. An example of the failure:

https://github.com/mlflow/mlflow/pull/2914/checks?check_run_id=762832699#step:5:347
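
To illustrate why a smaller learning rate helps, here is a minimal sketch using plain gradient descent on a one-weight linear model. It is not the code from the test: the function name `train_linear`, the toy data, and the specific learning rates are assumptions chosen for illustration. With a step size above the stability threshold the weight grows without bound and ends up non-finite, while a small step size converges.

```python
import math


def train_linear(lr, steps=200):
    """Fit y = w * x with plain gradient descent on mean squared error.

    Hypothetical toy example; returns the final weight.
    """
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0 * x for x in xs]  # the true weight is 2.0
    w = 0.0
    for _ in range(steps):
        # d/dw of mean((w * x - y) ** 2)
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


small = train_linear(lr=0.01)            # step size below the stability limit
large = train_linear(lr=1.0, steps=500)  # step size far above it

print("lr=0.01 finite:", math.isfinite(small))
print("lr=1.0  finite:", math.isfinite(large))
```

For this toy problem each update multiplies the error in `w` by `1 - 15 * lr`, so any `lr` above roughly 0.13 makes the error grow at every step. A real Keras model has no such simple closed form, but the failure mode is the same.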

How is this patch tested?

(Details)

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/projects: MLproject format, project running backends
area/scoring: Local serving, model deployment tools, spark UDFs
area/tracking: Tracking Service, tracking client APIs, autologging

Interface

area/uiux: Front-end, user experience, JavaScript, plotting
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

codecov-commenter · 2020-06-12T06:07:05Z

Codecov Report

Merging #2926 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #2926   +/-   ##
=======================================
  Coverage   85.04%   85.04%           
=======================================
  Files          20       20           
  Lines        1050     1050           
=======================================
  Hits          893      893           
  Misses        157      157

Last update 7479f35...4fd81ba.

harupy · 2020-06-12T06:28:01Z

I wrote a notebook to verify that a small learning rate prevents gradients from exploding.

https://colab.research.google.com/drive/1a0b60Gk9ItEfQDluyi8W49xE6Uckf5eS?usp=sharing

aarondav

Looks OK. I wonder if we could also set a fixed random seed so this is deterministic?

harupy · 2020-06-13T02:32:53Z

@aarondav We could do that too. I'll add a fixture that sets a fixed random seed.
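
A seed-fixing helper along these lines might look like the sketch below. It is an assumption about the shape of such a helper, not the code added in this PR: the name `fix_random_seed` and the default seed are hypothetical, and NumPy and TensorFlow are treated as optional so the snippet runs without them. The TensorFlow branch uses `tf.random.set_seed` when it is available (TF 2.x) and falls back to `tf.set_random_seed` for older versions.

```python
import random


def fix_random_seed(seed=0):
    """Seed the random number generators a Keras test is likely to touch.

    Hypothetical helper; NumPy and TensorFlow are seeded only if installed.
    """
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    try:
        import tensorflow as tf
    except ImportError:
        return

    if hasattr(tf.random, "set_seed"):
        # TensorFlow 2.x
        tf.random.set_seed(seed)
    else:
        # TensorFlow 1.x
        tf.set_random_seed(seed)


# Re-seeding with the same value reproduces the same draws.
fix_random_seed(42)
first = [random.random() for _ in range(3)]
fix_random_seed(42)
second = [random.random() for _ in range(3)]
print(first == second)
```

In a pytest suite, a helper like this could be called from a fixture so that every test in the module starts from the same seed.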

harupy added 2 commits June 12, 2020 14:46

Run test 100 times

5fe057c

Use execution_number to enable parametrize

12ece70

harupy added 3 commits June 12, 2020 16:37

Use a smaller learning rate

9fcf5d5

Remove fixture to repeat test

2243418

Remove execution_number

bc155f0

harupy marked this pull request as ready for review June 12, 2020 15:37

aarondav approved these changes Jun 12, 2020

aarondav added the "needs author feedback" label Jun 12, 2020

stale bot removed the "needs author feedback" label Jun 13, 2020

harupy added 4 commits June 13, 2020 11:45

Add a fixture to fix a random seed

d35ab6e

Use set_random_seed for tf < 2.0.0

c1ebbf2

Move tf imports

523f6aa

Add comment on why a small learning rate is used

4fd81ba

aarondav merged commit c196333 into mlflow:master Jun 15, 2020

smurching added the rn/none label Jun 18, 2020

avflor pushed a commit to avflor/mlflow that referenced this pull request Aug 22, 2020

Fix flaky keras test (mlflow#2926)

956c0aa
