New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable mlflow-artifacts scheme as wrapper around http artifact scheme #5070
Conversation
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
|
||
def __init__(self, artifact_uri): | ||
tracking_uri = get_tracking_uri() | ||
resolved_artifacts_uri = artifact_uri.replace("mlflow-artifacts:", f"{tracking_uri}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should only perform this replacement if the URI doesn't specify a host. We also need to handle the case where the URI has one or three slashes after the scheme.
Right now, if the URI specifies a host (e.g. mlflow-artifacts://myhostname:5000
), the following behavior occurs:
In [11]: MlflowArtifactsRepository("mlflow-artifacts://foo:5050/my/artifact/path").artifact_uri
Out[11]: 'http://localhost:5000//foo:5050/my/artifact/path'
Similarly, if a hostname-less mlflow-artifacts
URI is specified with one or three slashes after the scheme, the following behaviors occur:
In [14]: mlflow.set_tracking_uri("http://localhost:5000/")
In [15]: MlflowArtifactsRepository("mlflow-artifacts:/my/artifact/path").artifact_uri
Out[15]: 'http://localhost:5000//my/artifact/path'
In [16]: MlflowArtifactsRepository("mlflow-artifacts:///my/artifact/path").artifact_uri
Out[16]: 'http://localhost:5000////my/artifact/path'
While the extra slashes in the path shouldn't be harmful, it would be nice to remove them.
To clarify re:hostnames - when a user supplies an artifact URI of the form mlflow-artifacts://<hostname>/<path>
or mlflow-artifacts://<hostname>:<port>/<path>
, we should resolve the URI to http://<hostname>/api/2.0/mlflow-artifacts/artifacts/<path>
or http://<hostname>:<port>/api/2.0/mlflow-artifacts/artifacts/<path>
respectively.
Can we add test cases for each of the above scenarios as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some... (not so minor) changes to handle resolving all of the different ways that this uri can be configured. I have a validation check in here for raising an Exception if only a port is passed; I'd really like an opinion on whether this is needed or if it's a silly thing to check for.
Also, should I take those private functions and make them class methods? I don't see those functions being particularly useful outside of this sub class of HttpArtifactRepository.
@dbczumar @harupy any thoughts / advice on these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a validation check in here for raising an Exception if only a port is passed.
This sounds good to me!
should I take those private functions and make them class methods?
I don't have a strong preference between private functions and class methods. The current code looks ok to me.
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
uri = MlflowArtifactsRepository( # pylint: disable=unused-variable | ||
failing_condition | ||
).artifact_uri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uri = MlflowArtifactsRepository( # pylint: disable=unused-variable | |
failing_condition | |
).artifact_uri | |
MlflowArtifactsRepository(failing_condition) |
nit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to just the class instantiation
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
|
||
@classmethod | ||
def resolve_uri(cls, artifact_uri): | ||
tracking_uri = get_tracking_uri() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we raise an exception if the scheme of tracking_uri
is not http
or https
(e.g. sqlite:///foo/bar
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great idea :) added! I also simplified the parsing logic and temporarily set the tracking uri in the test function to validate against a valid server address (then reverted the global variable to not affect other tests with an inadvertent side-effect)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update! A tracking URI looks like http://localhost:5000
and doesn't contain /api/2.0/mlflow-artifacts/artifacts
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the tests and cleaned all of that up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BenWilson2 Sorry I wasn't clear. The resolved artifact URI needs to have /api/2.0/mlflow-artifacts/artifacts
(e.g. http://localhost:5000/api/2.0/mlflow-artifacts/artifacts/path/to/dir
).
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…i parsing Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…-artifact-scheme-ml-18423 Updating to new version of black Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
I installed the new version of black after merging master into this branch. Black on local (21.10b0) suggests no changes, but the lint test is failing in CI. @harupy have you seen this since the merge of the black update? |
Does
|
f"mlflow-artifacts://{base_path}/redundant", | ||
f"{tracking_uri.scheme}://{tracking_uri.netloc}/{base_api_uri}{base_path}/redundant", | ||
), | ||
("mlflow-artifacts:/", uri_temp,), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
("mlflow-artifacts:/", uri_temp,), | |
("mlflow-artifacts:/", uri_temp,), |
@BenWilson2 Removing the command here should fix the lint issue.
print(f"\nuri_parse: {uri_parse}") | ||
print(f"track_parse: {track_parse}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print(f"\nuri_parse: {uri_parse}") | |
print(f"track_parse: {track_parse}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted my debug ;)
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Signed-off-by: Ben Wilson benjamin.wilson@databricks.com
What changes are proposed in this pull request?
Support mlflow-artifacts: scheme for proxied mlflow server artifact handling functionality.
The implementation is a wrapper around the HTTP artifact repository scheme.
This supports both a 'basic scheme' (user leaving the resolving server address out of the uri) and approaches that retain the host (and/or port) for replacing the scheme with the originating registered uri for the artifact server.
How is this patch tested?
Unit tests
Does this PR change the documentation?
ci/circleci: build_doc
check. If it's successful, proceed to thenext step, otherwise fix it.
Details
on the right to open the job page of CircleCI.Artifacts
tab.docs/build/html/index.html
.Release Notes
Is this a user-facing change?
Support for mlflow-artifacts scheme added (i.e., mlflow.create_experiment(name="my-experiment", artifact_location="mlflow-artifacts:/root/models/my-experiment-models/model-1"). Defining host, port, or both is also supported.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts
: Artifact stores and artifact loggingarea/build
: Build and test infrastructure for MLflowarea/docs
: MLflow documentation pagesarea/examples
: Example codearea/model-registry
: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models
: MLmodel format, model serialization/deserialization, flavorsarea/projects
: MLproject format, project running backendsarea/scoring
: MLflow Model server, model deployment tools, Spark UDFsarea/server-infra
: MLflow Tracking server backendarea/tracking
: Tracking Service, tracking client APIs, autologgingInterface
area/uiux
: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/docker
: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy
: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows
: Windows supportLanguage
language/r
: R APIs and clientslanguage/java
: Java APIs and clientslanguage/new
: Proposals for new client languagesIntegrations
integrations/azure
: Azure and Azure ML integrationsintegrations/sagemaker
: SageMaker integrationsintegrations/databricks
: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/breaking-change
- The PR will be mentioned in the "Breaking Changes" sectionrn/none
- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature
- A new user-facing feature worth mentioning in the release notesrn/bug-fix
- A user-facing bug fix worth mentioning in the release notesrn/documentation
- A user-facing documentation change worth mentioning in the release notes