Use REPL context attributes if available to avoid calling JVM methods #5132
Conversation
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
mlflow/utils/databricks_utils.py
Outdated
_env_var_prefix = "DATABRICKS_"


def _use_env_var_if_exists(env_var, *, if_exists=lambda x: os.environ[x]):
Introduced this decorator to make it easier to preserve the existing logic for older runtime versions.
""" | ||
|
||
def decorator(f): | ||
@functools.wraps(f) |
nice use of the decorator factory here. +1
really clever, elegant, and simplified solution.
mlflow/utils/databricks_utils.py
Outdated
@@ -50,6 +79,7 @@ def _get_context_tag(context_tag_key):
    return None


@_use_env_var_if_exists(_env_var_prefix + "ACL_PATH_OF_ACL_ROOT")
Can we prefix these environment variables with DATABRICKS?
We probably don't need ACL_PATH_OF_ACL_ROOT, since this is used for is_in_databricks_notebook / get_notebook_id. We can rely on DATABRICKS_NOTEBOOK_ID for those.
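A sketch of the fallback order suggested here (the helper name `_legacy_acl_path_check` is hypothetical, standing in for the old JVM-backed lookup):

```python
import os


def _legacy_acl_path_check():
    # Placeholder for the legacy JVM-backed ACL_PATH_OF_ACL_ROOT lookup;
    # always False off-cluster in this sketch.
    return False


def is_in_databricks_notebook():
    # Prefer the REPL-context-derived env var; fall back to the old check.
    if os.environ.get("DATABRICKS_NOTEBOOK_ID"):
        return True
    return _legacy_acl_path_check()
```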
@dbczumar Thanks for the comment! _env_var_prefix already adds DATABRICKS_, or am I missing something?
Doh. Sorry - missed that.
We probably don't need ACL_PATH_OF_ACL_ROOT, since this is used for is_in_databricks_notebook / get_notebook_id. We can rely on DATABRICKS_NOTEBOOK_ID for those.
Makes sense!
LGTM once #5132 (comment) is addressed. Thanks Haru!
@BenWilson2 @dbczumar Thanks for the review, I still need to update the code for dynamic metadata (e.g. command run id).
@@ -133,6 +166,7 @@ def get_notebook_id():
    return None


@_use_env_var_if_exists(_ENV_VAR_PREFIX + "NOTEBOOK_PATH")
def get_notebook_path():
    """Should only be called if is_in_databricks_notebook is true"""
    path = _get_property_from_spark_context("spark.databricks.notebook.path")
Does this work with ephemeral notebooks, both within and outside of jobs?
Signed-off-by: harupy <hkawamura0130@gmail.com>
re-LGTM! Thanks @harupy !
Force-pushed from 30473e2 to 5ac0475.
What changes are proposed in this pull request?
Use REPL context attributes if available to avoid calling JVM methods.
How is this patch tested?
Installed mlflow from this branch on Databricks and confirmed that mlflow code can run under multiprocessing.
Does this PR change the documentation?

- If so, check the ci/circleci: build_doc CI job. If it's successful, proceed to the next step, otherwise fix it.
- Click Details on the right to open the CircleCI job page, then open the Artifacts tab and preview docs/build/html/index.html.

Release Notes
Is this a user-facing change?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What component(s), interfaces, languages, and integrations does this PR affect?
Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging

Interface
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support

Language
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages

Integrations
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:
- rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/feature - A new user-facing feature worth mentioning in the release notes
- rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- rn/documentation - A user-facing documentation change worth mentioning in the release notes