Fix tensorflow 2.5.2 cross test failure #4995

Merged
merged 7 commits into from
Nov 4, 2021

Conversation

WeichenXu123
Collaborator

@WeichenXu123 WeichenXu123 commented Nov 3, 2021

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>

What changes are proposed in this pull request?

Fix the TensorFlow 2.5.2 cross-version test failure.
TensorFlow 2.5.2 was released today (https://pypi.org/project/tensorflow/2.5.2/), and the new release causes this failure.

How is this patch tested?

Unit tests.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@github-actions github-actions bot added the area/tracking (Tracking service, tracking client APIs, autologging) and rn/none (List under Small Changes in Changelogs) labels Nov 3, 2021
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@@ -912,7 +912,7 @@ def test_import_tensorflow_with_fluent_autolog_enables_tf_autologging():

# NB: For backwards compatibility, fluent autologging enables TensorFlow and
# Keras autologging upon tensorflow import in TensorFlow 2.5.1
Collaborator

The comments are also outdated here.

@@ -927,7 +927,7 @@ def test_import_tf_keras_with_fluent_autolog_enables_tf_autologging():

# NB: For backwards compatibility, fluent autologging enables TensorFlow and
# Keras autologging upon tf.keras import in TensorFlow 2.5.1
Collaborator

ditto

@@ -912,7 +912,7 @@ def test_import_tensorflow_with_fluent_autolog_enables_tf_autologging():

# NB: For backwards compatibility, fluent autologging enables TensorFlow and
# Keras autologging upon tensorflow import in TensorFlow 2.5.1
- if Version(tf.__version__) != Version("2.5.1"):
+ if Version(tf.__version__) >= Version("2.6"):
Collaborator

Why do we also want to disable this for Version(tf.__version__) < Version("2.5.1")? It looks like only 2.5.1 and 2.5.2 are causing failures.

Collaborator Author

@dbczumar
I think the previous logic here was wrong. Only TF >= 2.6 redirects keras autologging to tensorflow autologging.

Collaborator Author

CC @liangz1

For more context, you can read this PR:
#4766

Simple explanation:
In TensorFlow >= 2.6, the Keras embedded in TensorFlow (tf.keras) is linked to the standalone keras package. In earlier versions, the embedded Keras was a separate module from the standalone keras package.

Because of this change in TensorFlow >= 2.6, mlflow.tensorflow.autolog (which patches tf and the embedded tf.keras) conflicts with mlflow.keras.autolog (which patches keras). As a workaround, for TF >= 2.6 we disable mlflow.keras.autolog and trigger only mlflow.tensorflow.autolog.
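
To make the difference between the old and new test conditions concrete, here is a minimal sketch. It is an illustration only, not the MLflow test code: the helper names (release_tuple, old_check, new_check) are hypothetical, and it uses plain tuple comparison instead of packaging.version.Version so that it runs without extra dependencies.

```python
def release_tuple(version: str) -> tuple:
    """Parse 'X.Y.Z' into a tuple of ints (simplified: no pre-release tags)."""
    return tuple(int(part) for part in version.split(".")[:3])


def old_check(tf_version: str) -> bool:
    # Previous test condition: true for anything other than exactly 2.5.1.
    return release_tuple(tf_version) != (2, 5, 1)


def new_check(tf_version: str) -> bool:
    # Condition in this PR: only TF >= 2.6 redirects keras autologging
    # to tensorflow autologging.
    return release_tuple(tf_version) >= (2, 6)


if __name__ == "__main__":
    for v in ["2.4.0", "2.5.1", "2.5.2", "2.6.0"]:
        print(v, "old:", old_check(v), "new:", new_check(v))
```

With the old condition, 2.5.2 (and every version below 2.5.1) took the same branch as 2.6, which is why the new release broke the test. The new condition separates the versions at 2.6 only.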

Collaborator

@dbczumar dbczumar left a comment

LGTM! Thanks @WeichenXu123 ! Can we update the comments in the tests as well, as @liangz1 pointed out?

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Comment on lines 930 to 931
if Version(tf.__version__) >= Version("2.6"):
assert autologging_is_disabled(mlflow.keras.FLAVOR_NAME)
Member

@harupy harupy Nov 4, 2021

@WeichenXu123 @dbczumar
In tensorflow < 2.6, does import tensorflow.keras enable keras autologging?

Collaborator Author

No.

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
# NB: In Tensorflow >= 2.6, we redirect keras autologging to tensorflow autologging
# so the original keras autologging is disabled
if Version(tf.__version__) >= Version("2.6"):
import keras # pylint: disable=unused-variable,unused-import
Collaborator Author

I added import keras so that the following assertion, autologging_is_disabled(mlflow.keras.FLAVOR_NAME), is meaningful.

Member

Does import tensorflow.keras run import keras or not?

Member

I'd prefer to create a separate test (e.g. test_import_keras_with_fluent_autolog_does_not_enable_keras_autologging)

Collaborator Author

Does import tensorflow.keras run import keras or not?

For TF >= 2.6, yes.
I'll remove import keras in this case.
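
One way to check this kind of question empirically is to inspect sys.modules after the import. The sketch below is an assumption-laden illustration: import_pulls_in is a hypothetical helper, and the demonstration uses standard-library modules so that it runs without TensorFlow installed.

```python
import importlib
import sys


def import_pulls_in(module_name: str, dependency_name: str) -> bool:
    """Return True if dependency_name is loaded after importing module_name.

    Caveat: if dependency_name was already imported earlier in the process,
    this returns True regardless, so run it in a fresh interpreter.
    """
    importlib.import_module(module_name)
    return dependency_name in sys.modules


if __name__ == "__main__":
    # Standard-library demonstration: the json package imports json.decoder.
    print(import_pulls_in("json", "json.decoder"))
```

In an environment with TensorFlow installed, calling import_pulls_in("tensorflow.keras", "keras") in a fresh interpreter would be the analogous check. Per the answer above, it should report that keras is loaded for TF >= 2.6.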

Collaborator Author

@WeichenXu123 WeichenXu123 Nov 4, 2021

I'd prefer to create a separate test (e.g. test_import_keras_with_fluent_autolog_does_not_enable_keras_autologging)

We already have it: "test_import_keras_with_fluent_autolog_enables_tensorflow_autologging"

The test test_import_tensorflow_with_fluent_autolog_enables_tf_autologging is a bit different: it covers the case where we import tensorflow first (invoking set_up_tensorflow_autologging) and then import keras (invoking conditionally_set_up_keras_autologging). A different invocation order leads to different code path coverage.

Member

We already have it: "test_import_keras_with_fluent_autolog_enables_tensorflow_autologging"

Ah I missed that.

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@WeichenXu123
Collaborator Author

A new error (unrelated to this PR) seems to have appeared:

E   tensorflow.python.framework.errors_impl.AlreadyExistsError: Another metric with the same name already exists.

This happens when TF 2.6 is installed (it installs an incompatible keras version, keras 2.7, as a dependency).

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@harupy
Member

harupy commented Nov 4, 2021

This happens when TF 2.6 is installed (it installs an incompatible keras version, keras 2.7, as a dependency).

Found related issue: tensorflow/tensorflow#52922

@BenWilson2
Member

TF 2.6.2 was released to PyPI about 2 hours ago with a fix for the dependency versions.

@BenWilson2 BenWilson2 merged commit 5c61208 into mlflow:master Nov 4, 2021
Labels
area/tracking Tracking service, tracking client APIs, autologging rn/none List under Small Changes in Changelogs.
5 participants