Fix artifact URI constructions in FileStore and SqlAlchemyStore #2221

dbczumar · 2019-12-18T10:11:59Z

What changes are proposed in this pull request?

Currently, the FileStore and SqlAlchemyStore construct experiment and run artifact root URIs by appending the experiment ID or run artifact path to the store's artifact root URI via the posixpath.join function. Unfortunately, this behavior is not aware of URI structures such as queries and fragments. As a result, for an artifact root of db://root:password@localhost:port/dbname?driver=odbc, experiment URIs take the form db://root:password@localhost:port/dbname?driver=odbc/{experiment_id}, and run artifact URIs take the form db://root:password@localhost:port/dbname?driver=odbc/{experiment_id}/{run_id}/artifacts. We see that the experiment and run information is being appended to the query component of the URI, rather than the path.

This PR fixes the behavior to correctly append experiment and run details to the path component of artifact root URIs. For the case above, the stores will now construct experiment and run artifact URIs as follows:

db://root:password@localhost:port/dbname/{experiment_id}?driver=odbc
db://root:password@localhost:port/dbname/{experiment_id}/{run_id}/artifacts?driver=odbc

How is this patch tested?

Thorough unit tests, extensive manual testing

Manual test plan
This PR was manually tested by verifying the correct behavior of the MLflow artifact and model . tracking APIs against artifact stores with the following URIs:

The current working directory (local filesystem)
A specific local filesystem directory ("/tmp/directory@")
An Amazon S3 bucket location of the form ("s3://bucket_name/artifact/root/path")
A Microsoft SQL Server database running in a Docker container with the following URI: mssql+pyodbc://root:password@localhost:1433/dbname?driver=ODBC+Driver+17+for+SQL+Server, after merging these changes with a copy of the pending DB artifact storage plugin branch (https://github.com/dbczumar/mlflow/tree/dbartifacts_uri_update).

For each case, I ran the following code (replacing the artifact location with the specified URI):

import mlflow
import mlflow.pyfunc
from mlflow.tracking.artifact_utils import _download_artifact_from_uri
from mlflow.tracking.client import MlflowClient
from mlflow.pyfunc import PythonModel

client = MlflowClient()

exp = client.create_experiment("s3exp", artifact_location="s3://artifact-testing-for-mlflow-bucket/subpath/root")


mlflow.set_experiment("s3exp")

with mlflow.start_run():

    class Mod(PythonModel):
        
        def predict(self, ctx, inp):
            return 7

    mlflow.pyfunc.log_model(
        python_model=Mod(),
        artifact_path="mymodel")

    model_uri = "runs:/{run_id}/mymodel".format(run_id=mlflow.active_run().info.run_id)

print(mlflow.pyfunc.load_model(model_uri))

with mlflow.start_run():
    local_path = "/tmp/myartifact.txt"
    with open(local_path, "w") as f:
        f.write("MY FILE")

    mlflow.log_artifact(local_path, "artifact/sublocation")
    artifact_uri = "runs:/{run_id}/artifact/sublocation/myartifact.txt".format(run_id=mlflow.active_run().info.run_id)
    downloaded_location = _download_artifact_from_uri(artifact_uri)
    with open(downloaded_location, "r") as f:
        print(f.read())

I then verified that artifacts were written to the appropriate storage locations (e.g., S3, local filesystem, SQL server) and that the script produced correct output of the form:

<mlflow.pyfunc.model._PythonModelPyfuncWrapper object at 0x1100051d0>
MY FILE

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s) does this PR affect?

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

sueann

Looks mostly good to me. A few things:

Left some inline questions / comments - only 1 blocking question.
Would you mind adding some details of the manual test in the PR description (especially where/if it tests what unit tests do not cover)?
Not sure if the test failure is flaky but it looks somewhat real.
I agree the test cases could be a bit more modularized and de-duped, but the coverage is good so I'm ok merging with the tests as is.

sueann · 2019-12-18T17:15:01Z

mlflow/utils/uri.py

+    :return: A new URI with a path component consisting of the specified `paths` appended to
+             the path component of the specified `uri`.
+
+    >>> uri1 = "s3://root/base/path?param=value"


Good catch! Will file a followup PR to fix this, since we're close to branch cut.

sueann · 2019-12-18T17:15:15Z

mlflow/utils/uri.py

+    >>> uri1 = "s3://root/base/path?param=value"
+    >>> uri1 = append_to_uri_path(uri, "some/subpath", "/anotherpath")
+    >>> assert uri1 == "s3://root/base/path/some/subpath/anotherpath?param=value"
+    >>> uri2 = "a/posix/path"


Good catch! Will file a followup PR to fix this, since we're close to branch cut.

sueann · 2019-12-18T17:38:03Z

mlflow/utils/uri.py

+    >>> result3 = _join_posixpaths_and_append_absolute_suffixes("/absolutepath", "relpath")
+    >>> assert result3 == "/absolutepath/relpath"
+    >>> result4 = _join_posixpaths_and_append_absolute_suffixes("/absolutepath1", "/absolutepath2")
+    >>> assert result4 == "/absolutepath1/absolutepath2"


we should have unit tests for these - can't just trust the docs :-) it is covered by the million test cases here though so if we need to get this in i'm ok with it as is.

Good point! Included these here to demonstrate how the function works. The existing unit tests should cover these cases.

sueann · 2019-12-18T17:45:10Z

tests/utils/test_uri.py

+        ("//path", "/subpath", "//path/subpath"),
+        ("///path", "/subpath", "///path/subpath"),
+        ("/path", "/subpath/subdir", "/path/subpath/subdir"),
+        ("file:path", "", "file:path/"),


it's interesting that for 1-level deep paths we return a trailing "/" and for deeper paths we don't. is that intentional?

and more importantly, are there any discrepancies in downstream behaviors this difference can cause?

This appears to be caused by the behavior of posixpath.join(). I tried the following cases out in Python2 and Python 3:

>>> import posixpath >>> posixpath.join("base", "") 'base/' >>> posixpath.join("base/", "") 'base/' >>> posixpath.join("/base", "",) '/base/'

Given that we were previously leveraging posixpath.join() for the FileStore and SqlAlchemyStore and that multiple artifact stores rely on it for URI path component construction, this behavior seems consistent with pre-existing behavior and is (hopefully) safe.

dbczumar · 2019-12-18T21:24:25Z

@sueann Thanks for reviewing! I've updated the PR description to include my manual test plan (already executed). Regarding the failing test, the errant case appears to be:

tests/tensorflow/test_tensorflow_model_export.py::test_iris_data_model_can_be_loaded_and_evaluated_as_pyfunc FAILED

This is a known flaky test that will hopefully be fixed by @jdlesage in #2223.

sueann

lgtm - thanks!

* Replace ValueError check with Exception check for JSON.loads in HTTP requests (mlflow#2204) * Replace ValueError check with Exception check for JSON.loads in HTTP requests When an HTTP request is executed, it tries to parse an error response as JSON. This check only handles ValueError. `json.loads` can throw multiple types of errors and is not specific about which. Therefore, I've switched to catch the `Exception`. * Add killed status constraint (mlflow#2110) * Add killed status constraint * Fix migration script * Do migration only if check constraints are defined * Use alter_column to replace constraints * Apply previous constraints with the table_args arguments * Specify `native_enum=False` * Update tests * Update sql latest schema * relaunch travis * relaunch travis * Fix mlflow.keras.save_model with DBFS (mlflow#2216) Fixed a bug where mlflow.keras.save_model failed against "/dbfs/..." paths due to the lack of support for random writes when using DBFS FUSE. * Fix handling of empty directories in fs based artifact repositories. (mlflow#1891) * Fix handling of empty directories in artifact filesystem based artifact repositories. * Fixed tests. * Update. * Update. * nits * nit * Fix python27 compatibility. * fix * Fix download artifacts behavior and minor bug in s3 repo behavior. * Removed debug print. * Updated test. * Updated test for local artifact repo. * nit * nit: Removed outdated comment. * Fix artifact URI constructions in FileStore and SqlAlchemyStore (mlflow#2221) * append_uri_to_path with tons of tests * Store changes with test cases * Test multi-arg path * Lint * docs casing * Revert * Docstring * Add Java implementation of Spark datasource autologging (mlflow#2181) Adds Java implementation of Spark datasource autologging as described in https://docs.google.com/document/d/11nhwZtj-rps0stxuIioFBM9lkvIh_ua45cAFy_PqdHU/edit under a new mlflow-spark maven package. The current implementation notifies subscribers with datasource path, version, and format information. A follow-up PR will add logic to register a subscriber from Python when mlflow.spark.autolog() is called. * registry.html points to model-registry.html (mlflow#2232) * registry.html points to model-registry.html * 1.5 Changelog Changes (mlflow#2231) Add changelog for MLflow 1.5 Co-authored-by: Andy Chow <58712524+andychow-db@users.noreply.github.com> Co-authored-by: Jean-Denis Lesage <jdlesage@gmail.com> Co-authored-by: tomasatdatabricks <33237569+tomasatdatabricks@users.noreply.github.com> Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com> Co-authored-by: Siddharth Murching <smurching@gmail.com> Co-authored-by: Mani Parkhe <mani@databricks.com> Co-authored-by: juntai-zheng <39497939+juntai-zheng@users.noreply.github.com>

…ow#2221) * append_uri_to_path with tons of tests * Store changes with test cases * Test multi-arg path * Lint * docs casing * Revert * Docstring

dbczumar added 7 commits December 18, 2019 00:20

append_uri_to_path with tons of tests

d335991

Store changes with test cases

3b704b5

Test multi-arg path

d947de4

Lint

07f0ece

docs casing

c7dff28

Revert

60f126a

Docstring

96d1f07

dbczumar added the rn/none List under Small Changes in Changelogs. label Dec 18, 2019

dbczumar requested review from smurching, sueann and tomasatdatabricks December 18, 2019 10:12

sueann reviewed Dec 18, 2019

View reviewed changes

sueann approved these changes Dec 18, 2019

View reviewed changes

dbczumar merged commit 5187d7c into mlflow:master Dec 18, 2019

dbczumar mentioned this pull request Dec 21, 2019

Use append_to_uri_path for URI construction in other relevant locations #2237

Merged

22 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix artifact URI constructions in FileStore and SqlAlchemyStore #2221

Fix artifact URI constructions in FileStore and SqlAlchemyStore #2221

dbczumar commented Dec 18, 2019 •

edited

sueann left a comment

sueann Dec 18, 2019

dbczumar Dec 18, 2019

sueann Dec 18, 2019

dbczumar Dec 18, 2019

sueann Dec 18, 2019

dbczumar Dec 18, 2019

sueann Dec 18, 2019

sueann Dec 18, 2019

dbczumar Dec 18, 2019

dbczumar commented Dec 18, 2019 •

edited

sueann left a comment

Fix artifact URI constructions in FileStore and SqlAlchemyStore #2221

Fix artifact URI constructions in FileStore and SqlAlchemyStore #2221

Conversation

dbczumar commented Dec 18, 2019 • edited

What changes are proposed in this pull request?

How is this patch tested?

Release Notes

Is this a user-facing change?

What component(s) does this PR affect?

How should the PR be classified in the release notes? Choose one:

sueann left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dbczumar commented Dec 18, 2019 • edited

sueann left a comment

Choose a reason for hiding this comment

dbczumar commented Dec 18, 2019 •

edited

dbczumar commented Dec 18, 2019 •

edited