Adding DatabricksArtifactRepository to the MLflow client. #2911
Conversation
Path resolution fixes for DatabricksArtifactRepository
…-db/mlflow into databricks-artifact-repo
Note: This PR was filed against the main MLflow fork after undergoing extensive review on @dbczumar's fork here: dbczumar#5
@@ -0,0 +1,324 @@
# -*- coding: utf-8 -*-
@arjundc-db Can we add unit tests for `log_artifact()` / `log_artifacts()`, `download_artifacts()`, and `list_artifacts()` that construct a `DatabricksArtifactRepo` rooted at a subdirectory of the run artifact root (e.g., `dbfs:/databricks/mlflow-tracking/<EXP_ID>/<RUN_ID>/artifacts/my/path`) and ensure that these APIs operate on paths relative to this subdirectory? (E.g., `list_artifacts('foo')` should list artifacts at `dbfs:/databricks/mlflow-tracking/<EXP_ID>/<RUN_ID>/artifacts/my/path/foo` rather than `dbfs:/databricks/mlflow-tracking/<EXP_ID>/<RUN_ID>/artifacts/foo`.)
To test this, we should be able to mock out `_upload_to_cloud`, `_call_endpoint`, and `_download_from_cloud` and ensure that they're called with the expected paths / request bodies. Let me know if this makes sense!
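A minimal sketch of the kind of test requested above, using `unittest.mock`. `FakeSubdirRepo` is a hypothetical stand-in written only for illustration; it is not the actual `DatabricksArtifactRepository`, and the signature given to `_call_endpoint` here is an assumption. The root URI uses placeholder `EXP_ID` / `RUN_ID` segments.

```python
from unittest import mock

# Placeholder root; a real one embeds an experiment ID and a run ID.
RUN_ARTIFACT_ROOT = "dbfs:/databricks/mlflow-tracking/EXP_ID/RUN_ID/artifacts"


class FakeSubdirRepo:
    """Hypothetical stand-in, NOT the real DatabricksArtifactRepository."""

    def __init__(self, artifact_uri):
        if not artifact_uri.startswith(RUN_ARTIFACT_ROOT):
            raise ValueError("artifact_uri must lie under the run artifact root")
        # Repo root expressed relative to the run artifact root, e.g. "my/path".
        self.root_rel = artifact_uri[len(RUN_ARTIFACT_ROOT):].strip("/")

    def _call_endpoint(self, run_relative_path):
        # Assumed signature; a real implementation would issue a REST request.
        raise NotImplementedError("mocked in tests")

    def list_artifacts(self, path=None):
        # Resolve the caller's path against the repo root, not the run root.
        parts = [p for p in (self.root_rel, path) if p]
        return self._call_endpoint("/".join(parts))


def test_list_artifacts_is_relative_to_repo_root():
    repo = FakeSubdirRepo(RUN_ARTIFACT_ROOT + "/my/path")
    with mock.patch.object(repo, "_call_endpoint", return_value=[]) as endpoint:
        repo.list_artifacts("foo")
    # The request targets my/path/foo, not foo directly under the run root.
    endpoint.assert_called_once_with("my/path/foo")
```

The same pattern would extend to the upload and download helpers: patch the method that touches the network, call the public API with a relative path, and assert on the run-relative path the mock received.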
Done
os.path.join(artifact_path, 'subdir'))]
log_artifact_mock.assert_has_calls(calls)

def test_list_artifacts(self, databricks_artifact_repo):
Can we test that outputs are computed relative to `run_relative_artifact_repo_root_path` by invoking list on a `DatabricksArtifactRepo` that is rooted at a subdirectory of the run artifact root?
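To illustrate what "computed relative to the repo root" could mean for listing output, here is a small sketch. `to_repo_relative` is a hypothetical helper, not a function from this PR; it assumes the listing returns POSIX-style paths relative to the run artifact root.

```python
import posixpath


def to_repo_relative(run_relative_paths, repo_root_rel):
    """Re-express run-relative paths relative to the repository root.

    Both arguments are assumed to be POSIX-style paths relative to the run
    artifact root; an empty repo_root_rel means the repo sits at that root.
    """
    start = repo_root_rel or "."
    return [posixpath.relpath(p, start) for p in run_relative_paths]


# Paths as a listing might report them, relative to the run artifact root.
listed = ["my/path/foo/a.txt", "my/path/foo/sub"]
relative = to_repo_relative(listed, "my/path")
print(relative)  # ['foo/a.txt', 'foo/sub']
```

A test for a repo rooted at `my/path` would then assert on the `foo/...` form rather than the `my/path/foo/...` form.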
Done
Paginated client
…-db/mlflow into databricks-artifact-repo
LGTM! Great work @arjundc-db !
Codecov Report
@@ Coverage Diff @@
## master #2911 +/- ##
==========================================
- Coverage 85.23% 85.04% -0.20%
==========================================
Files 20 20
Lines 1050 1050
==========================================
- Hits 895 893 -2
- Misses 155 157 +2
* Add protos and compilation
* Adding databricks_artifact_repo to store/artifact
* Addressing comments
* Addressing comments and fixing _download_file
* Code Clean-up and lint
* Adding multi-part upload logic and unit tests
* Addressing comments
* Addressing comments, making azure download more memory efficient and other fixes.
* Small fix
* Fixing list_artifacts
* Addressing final comments.
* Making extract_run_id static
* Adding AWS support
* Addressing comments
* Fix - needs docs and tests
* Comment and simplification
* Special case for empty file upload to AWS
* Clean up and added tests for relative path
* Page
* Fix
* Added relative path test cases
* Added test for list_artifacts pagination
* Fixing travis failures
* Fixes
* More fixes
* More fixes
* Clean-up
* Clean-up

Co-authored-by: Corey Zumar <corey.zumar@databricks.com>
Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>
What changes are proposed in this pull request?
This PR adds a new artifact repository 'DatabricksArtifactRepository' to the MLflow client along with the required test cases.
How is this patch tested?
Unit Tests (added) and manual testing.
Release Notes
Is this a user-facing change?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What component(s), interfaces, languages, and integrations does this PR affect?
Components
- `area/artifacts`: Artifact stores and artifact logging
- `area/build`: Build and test infrastructure for MLflow
- `area/docs`: MLflow documentation pages
- `area/examples`: Example code
- `area/model-registry`: Model Registry service, APIs, and the fluent client calls for Model Registry
- `area/models`: MLmodel format, model serialization/deserialization, flavors
- `area/projects`: MLproject format, project running backends
- `area/scoring`: Local serving, model deployment tools, spark UDFs
- `area/tracking`: Tracking Service, tracking client APIs, autologging

Interface
- `area/uiux`: Front-end, user experience, JavaScript, plotting
- `area/docker`: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- `area/sqlalchemy`: Use of SQLAlchemy in the Tracking Service or Model Registry
- `area/windows`: Windows support

Language
- `language/r`: R APIs and clients
- `language/java`: Java APIs and clients

Integrations
- `integrations/azure`: Azure and Azure ML integrations
- `integrations/sagemaker`: SageMaker integrations

How should the PR be classified in the release notes? Choose one:
- `rn/breaking-change` - The PR will be mentioned in the "Breaking Changes" section
- `rn/none` - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- `rn/feature` - A new user-facing feature worth mentioning in the release notes
- `rn/bug-fix` - A user-facing bug fix worth mentioning in the release notes
- `rn/documentation` - A user-facing documentation change worth mentioning in the release notes