Throw Exception When Invalid URI with Databricks Scheme Provided #4877
Signed-off-by: Yun Park <yun@databricks.com>
One small unit test I'd recommend adding
@@ -48,7 +48,6 @@ def test_extract_db_type_from_uri():
 ("nondatabricks://profile:prefix", (None, None)),
 ("databricks://profile", ("profile", None)),
 ("databricks://profile/", ("profile", None)),
-("databricks://", ("", None)),
I think we were technically allowing this case to be ~= "databricks", because of the bug/loophole that was the source of the problem this PR is fixing (we just default to the default profile if there is no netloc).
Could we add in the release notes which cases used to work (sort of as intended) but will now be given an error instead? I think databricks:/, databricks://, and databricks:/default (or databricks:/databricks??) might be those? (should confirm the last two though)
mlflow/utils/uri.py
@@ -68,6 +68,9 @@ def get_db_info_from_uri(uri):
     """
     parsed_uri = urllib.parse.urlparse(uri)
     if parsed_uri.scheme == "databricks":
+        # netloc should not be an empty string unless URI is formatted incorrectly.
+        if parsed_uri.netloc == "":
+            raise MlflowException("URI is formatted incorrectly: no netloc in URI '%s'." % uri)
Could the error message suggest that this may be because the URI has only one slash?
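One way the message could carry that hint is sketched below. This is a standalone illustration, not the code merged in this PR: it raises ValueError instead of MlflowException so it runs without MLflow installed, and the function name check_databricks_uri and the exact wording of the hint are assumptions.

```python
from urllib.parse import urlparse


def check_databricks_uri(uri):
    """Hypothetical helper: reject databricks-scheme URIs that have no netloc."""
    parsed = urlparse(uri)
    # urlparse only fills in netloc when the scheme is followed by "//",
    # so "databricks:/profile" ends up with an empty netloc.
    if parsed.scheme == "databricks" and parsed.netloc == "":
        raise ValueError(
            "URI is formatted incorrectly: no netloc in URI '%s'. "
            "This may be due to a single slash after 'databricks:' "
            "where two are expected, e.g. 'databricks://profile:prefix'." % uri
        )
    return parsed
```

URIs with another scheme pass through unchanged, so the check only affects the databricks scheme.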
@sueann - Addressed your comments!
Signed-off-by: Yun Park yun@databricks.com
What changes are proposed in this pull request?
When a user writes a Databricks registry URI with a single slash, e.g. databricks:/profile:prefix, the URI incorrectly resolves to the local registry. The user most likely intended the registry identified by the correctly formatted URI (databricks://profile:prefix), not the local one. This PR raises an exception for such URIs instead.
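The behavior follows from how Python's urllib.parse.urlparse splits these strings: a network location is recognized only after a double slash, so with a single slash the profile and prefix land in the path and netloc is empty. A minimal sketch (the helper name split_uri is hypothetical and exists only for illustration):

```python
from urllib.parse import urlparse


def split_uri(uri):
    """Hypothetical helper: return (scheme, netloc, path) as urllib parses them."""
    parsed = urlparse(uri)
    return parsed.scheme, parsed.netloc, parsed.path


# Compare a correctly formatted URI with the single-slash and empty variants.
for uri in ("databricks://profile:prefix", "databricks:/profile:prefix", "databricks://"):
    print(uri, "->", split_uri(uri))
```

Only the first URI yields a non-empty netloc, which is what the new check relies on.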
How is this patch tested?
(Details)
Release Notes
Is this a user-facing change?
Tracking and registry URIs provided to the client that reference a Databricks endpoint (using the databricks URI scheme) must be correctly formatted using 2 slashes after the colon instead of 1. Previously, if URIs such as databricks://, databricks:/, databricks:/default or databricks:/scope:prefix were provided, they defaulted to the local profile (in Databricks, this would be the default tracking server or registry in the workspace). This will no longer be the case, and you will instead get an error about an incorrectly formatted URI.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging

Interface
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language
language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations
integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes