New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix artifact URI constructions in FileStore and SqlAlchemyStore #2221
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good to me. A few things:
- Left some inline questions / comments - only 1 blocking question.
- Would you mind adding some details of the manual test in the PR description (especially where/if it tests what unit tests do not cover)?
- Not sure if the test failure is flaky but it looks somewhat real.
- I agree the test cases could be a bit more modularized and de-duped, but the coverage is good so I'm ok merging with the tests as is.
:return: A new URI with a path component consisting of the specified `paths` appended to | ||
the path component of the specified `uri`. | ||
|
||
>>> uri1 = "s3://root/base/path?param=value" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uri =
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Will file a followup PR to fix this, since we're close to branch cut.
>>> uri1 = "s3://root/base/path?param=value" | ||
>>> uri1 = append_to_uri_path(uri, "some/subpath", "/anotherpath") | ||
>>> assert uri1 == "s3://root/base/path/some/subpath/anotherpath?param=value" | ||
>>> uri2 = "a/posix/path" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uri =
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Will file a followup PR to fix this, since we're close to branch cut.
>>> result3 = _join_posixpaths_and_append_absolute_suffixes("/absolutepath", "relpath") | ||
>>> assert result3 == "/absolutepath/relpath" | ||
>>> result4 = _join_posixpaths_and_append_absolute_suffixes("/absolutepath1", "/absolutepath2") | ||
>>> assert result4 == "/absolutepath1/absolutepath2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should have unit tests for these - can't just trust the docs :-) it is covered by the million test cases here though so if we need to get this in i'm ok with it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Included these here to demonstrate how the function works. The existing unit tests should cover these cases.
("//path", "/subpath", "//path/subpath"), | ||
("///path", "/subpath", "///path/subpath"), | ||
("/path", "/subpath/subdir", "/path/subpath/subdir"), | ||
("file:path", "", "file:path/"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's interesting that for 1-level deep paths we return a trailing "/" and for deeper paths we don't. is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and more importantly, are there any discrepancies in downstream behaviors this difference can cause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to be caused by the behavior of posixpath.join()
. I tried the following cases out in Python2 and Python 3:
>>> import posixpath
>>> posixpath.join("base", "")
'base/'
>>> posixpath.join("base/", "")
'base/'
>>> posixpath.join("/base", "",)
'/base/'
Given that we were previously leveraging posixpath.join()
for the FileStore and SqlAlchemyStore and that multiple artifact stores rely on it for URI path component construction, this behavior seems consistent with pre-existing behavior and is (hopefully) safe.
@sueann Thanks for reviewing! I've updated the PR description to include my manual test plan (already executed). Regarding the failing test, the errant case appears to be:
This is a known flaky test that will hopefully be fixed by @jdlesage in #2223. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm - thanks!
* Replace ValueError check with Exception check for JSON.loads in HTTP requests (mlflow#2204) * Replace ValueError check with Exception check for JSON.loads in HTTP requests When an HTTP request is executed, it tries to parse an error response as JSON. This check only handles ValueError. `json.loads` can throw multiple types of errors and is not specific about which. Therefore, I've switched to catch the `Exception`. * Add killed status constraint (mlflow#2110) * Add killed status constraint * Fix migration script * Do migration only if check constraints are defined * Use alter_column to replace constraints * Apply previous constraints with the table_args arguments * Specify `native_enum=False` * Update tests * Update sql latest schema * relaunch travis * relaunch travis * Fix mlflow.keras.save_model with DBFS (mlflow#2216) Fixed a bug where mlflow.keras.save_model failed against "/dbfs/..." paths due to the lack of support for random writes when using DBFS FUSE. * Fix handling of empty directories in fs based artifact repositories. (mlflow#1891) * Fix handling of empty directories in artifact filesystem based artifact repositories. * Fixed tests. * Update. * Update. * nits * nit * Fix python27 compatibility. * fix * Fix download artifacts behavior and minor bug in s3 repo behavior. * Removed debug print. * Updated test. * Updated test for local artifact repo. * nit * nit: Removed outdated comment. * Fix artifact URI constructions in FileStore and SqlAlchemyStore (mlflow#2221) * append_uri_to_path with tons of tests * Store changes with test cases * Test multi-arg path * Lint * docs casing * Revert * Docstring * Add Java implementation of Spark datasource autologging (mlflow#2181) Adds Java implementation of Spark datasource autologging as described in https://docs.google.com/document/d/11nhwZtj-rps0stxuIioFBM9lkvIh_ua45cAFy_PqdHU/edit under a new mlflow-spark maven package. The current implementation notifies subscribers with datasource path, version, and format information. A follow-up PR will add logic to register a subscriber from Python when mlflow.spark.autolog() is called. * registry.html points to model-registry.html (mlflow#2232) * registry.html points to model-registry.html * 1.5 Changelog Changes (mlflow#2231) Add changelog for MLflow 1.5 Co-authored-by: Andy Chow <58712524+andychow-db@users.noreply.github.com> Co-authored-by: Jean-Denis Lesage <jdlesage@gmail.com> Co-authored-by: tomasatdatabricks <33237569+tomasatdatabricks@users.noreply.github.com> Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com> Co-authored-by: Siddharth Murching <smurching@gmail.com> Co-authored-by: Mani Parkhe <mani@databricks.com> Co-authored-by: juntai-zheng <39497939+juntai-zheng@users.noreply.github.com>
…ow#2221) * append_uri_to_path with tons of tests * Store changes with test cases * Test multi-arg path * Lint * docs casing * Revert * Docstring
…ow#2221) * append_uri_to_path with tons of tests * Store changes with test cases * Test multi-arg path * Lint * docs casing * Revert * Docstring
What changes are proposed in this pull request?
Currently, the FileStore and SqlAlchemyStore construct experiment and run artifact root URIs by appending the experiment ID or run artifact path to the store's artifact root URI via the
posixpath.join
function. Unfortunately, this behavior is not aware of URI structures such as queries and fragments. As a result, for an artifact root ofdb://root:password@localhost:port/dbname?driver=odbc
, experiment URIs take the formdb://root:password@localhost:port/dbname?driver=odbc/{experiment_id}
, and run artifact URIs take the formdb://root:password@localhost:port/dbname?driver=odbc/{experiment_id}/{run_id}/artifacts
. We see that the experiment and run information is being appended to the query component of the URI, rather than the path.This PR fixes the behavior to correctly append experiment and run details to the path component of artifact root URIs. For the case above, the stores will now construct experiment and run artifact URIs as follows:
How is this patch tested?
Thorough unit tests, extensive manual testing
Manual test plan
This PR was manually tested by verifying the correct behavior of the MLflow artifact and model . tracking APIs against artifact stores with the following URIs:
mssql+pyodbc://root:password@localhost:1433/dbname?driver=ODBC+Driver+17+for+SQL+Server
, after merging these changes with a copy of the pending DB artifact storage plugin branch (https://github.com/dbczumar/mlflow/tree/dbartifacts_uri_update).For each case, I ran the following code (replacing the artifact location with the specified URI):
I then verified that artifacts were written to the appropriate storage locations (e.g., S3, local filesystem, SQL server) and that the script produced correct output of the form:
Release Notes
Is this a user-facing change?
What component(s) does this PR affect?
How should the PR be classified in the release notes? Choose one:
rn/breaking-change
- The PR will be mentioned in the "Breaking Changes" sectionrn/none
- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature
- A new user-facing feature worth mentioning in the release notesrn/bug-fix
- A user-facing bug fix worth mentioning in the release notesrn/documentation
- A user-facing documentation change worth mentioning in the release notes