Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove url casting in local artifact storage path #4890

Merged
merged 11 commits into from
Oct 14, 2021

Conversation

BenWilson2
Copy link
Member

Signed-off-by: Ben Wilson benjamin.wilson@databricks.com

What changes are proposed in this pull request?

Removed the casting to url-encoded pathing in urllib.request.pathname2url() and replaced with os.path.normpath() so that artifacts that have non-url-compliant characters will not get unicode-converted and result in retrieval with pyfunc operators to fail.
Fixes: [4785, 4879]

How is this patch tested?

Unit tests

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Fixes issues with logging and retrieving pyfunc flavors where the model name value has non-url-compliant characters within its name. (i.e., "sklearn:model", "sklearn model") which would before have been converted (e.g., "sklearn model" -> "sklearn%20model") and would cause a path resolution error on loading.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

…ocal file location

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@BenWilson2 BenWilson2 added the bug Something isn't working label Oct 12, 2021
@github-actions github-actions bot added area/artifacts Artifact stores and artifact logging area/models MLmodel format, model serialization/deserialization, flavors area/tracking Tracking service, tracking client APIs, autologging rn/bug-fix Mention under Bug Fixes in Changelogs. labels Oct 12, 2021
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@BenWilson2 BenWilson2 removed the bug Something isn't working label Oct 12, 2021
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…asting for non windows paths

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…del uri for drive considerations.

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@@ -62,7 +62,16 @@ def _download_artifact_from_uri(artifact_uri, output_path=None):
a local output path will be created.
"""
if os.path.exists(artifact_uri):
artifact_uri = path_to_local_file_uri(artifact_uri)
if os.name != "nt":
# If we're dealing with local files, just reference the direct pathing.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update this inline comment to indicate why we have special treatment for Windows?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certainly

Comment on lines 375 to 378
if os.name != "nt":
path = os.path.normpath(path)
else:
path = pathname2url(path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need to make this change now that we have logic in _download_artifacts_from_uri to address the issue? If it's not absolutely necessary, I think we should revert this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the key to getting the windows tests to pass. I did a run with this logic disabled and the vast majority of those tests failed on the windows env.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this be critical to getting windows to pass? The logic for windows remains exactly the same when this change is applied.

@@ -380,7 +380,8 @@ def test_model_log_without_specified_conda_env_uses_default_env_with_expected_de

def test_pyfunc_serve_and_score(prophet_model):

artifact_path = "model"
# Verify that a name with non-url-compliant characters can be handled properly
artifact_path = "model : prophet"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test case for _download_artifact_from_uri rather than including test coverage in test_prophet_model_export?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most definitely.

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Comment on lines 375 to 380
if (
os.name != "nt"
): # Local file paths for non-nt based systems do not need url encoding to resolve paths.
path = os.path.normpath(path)
else: # nt-based systems need url-enconding to convert non-supported char types in file names and paths.
path = pathname2url(path)
Copy link
Collaborator

@dbczumar dbczumar Oct 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenWilson2 I still don't think we need this logic for windows tests to pass, since the logic is the same for windows users with & without this change applied.

Can we remove this logic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally forgot about that last minute change I did last night with the additional wrapper in artifact_utils. Reverting this.

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@@ -380,6 +380,7 @@ def test_model_log_without_specified_conda_env_uses_default_env_with_expected_de

def test_pyfunc_serve_and_score(prophet_model):

# Verify that a name with non-url-compliant characters can be handled properly
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this inline comment?

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…ting functionality.

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Copy link
Collaborator

@dbczumar dbczumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @BenWilson2 !

@BenWilson2 BenWilson2 merged commit a633ace into master Oct 14, 2021
@BenWilson2 BenWilson2 deleted the pyfunc-logmodel-reconcile branch October 14, 2021 02:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/artifacts Artifact stores and artifact logging area/models MLmodel format, model serialization/deserialization, flavors area/tracking Tracking service, tracking client APIs, autologging rn/bug-fix Mention under Bug Fixes in Changelogs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants