Adding DatabricksArtifactRepository to the MLflow client. #2911

arjundc-db · 2020-06-08T20:12:44Z

What changes are proposed in this pull request?

This PR adds a new artifact repository 'DatabricksArtifactRepository' to the MLflow client along with the required test cases.

How is this patch tested?

Unit Tests (added) and manual testing.

Release Notes

Is this a user-facing change?

[x] No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

[x] area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/projects: MLproject format, project running backends
area/scoring: Local serving, model deployment tools, spark UDFs
[x] area/tracking: Tracking Service, tracking client APIs, autologging

Interface

area/uiux: Front-end, user experience, JavaScript, plotting
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
[x] rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

dbczumar · 2020-06-08T20:44:16Z

Note: This PR was filed against the main MLflow fork after undergoing extensive review on @dbczumar's fork here: dbczumar#5

mlflow/store/artifact/dbfs_artifact_repo.py

dbczumar · 2020-06-08T21:17:05Z

tests/store/artifact/test_databricks_artifact_repo.py

@@ -0,0 +1,324 @@
+# -*- coding: utf-8 -*-


@arjundc-db Can we add unit tests for `log_artifact()` / `log_artifacts()`, `download_artifacts()`, and `list_artifacts()` that construct a `DatabricksArtifactRepo` rooted at a subdirectory of the run artifact root (e.g., `dbfs:/databricks/mlflow-tracking/<EXP_ID>/<RUN_ID>/artifacts/my/path`) and ensure that these APIs operate on paths relative to this subdirectory? (E.g., `list_artifacts('foo')` should list artifacts at `dbfs:/databricks/mlflow-tracking/<EXP_ID>/<RUN_ID>/artifacts/my/path/foo` rather than `dbfs:/databricks/mlflow-tracking/<EXP_ID>/<RUN_ID>/artifacts/foo`.)

To test this, we should be able to mock out _upload_to_cloud, _call_endpoint, and _download_from_cloud and ensure that they're called with the expected paths / request bodies. Let me know if this makes sense!
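A rough sketch of the kind of test being requested, for illustration only. `SubdirRootedRepo` is a hypothetical stand-in, not the class added in this PR, and the assumption that `_call_endpoint` receives a single run-relative path is made up to keep the example self-contained. It only shows the pattern of mocking the endpoint call and asserting on the resolved path.

```python
import posixpath
from unittest import mock


class SubdirRootedRepo:
    """Hypothetical stand-in for a repo rooted below the run artifact root."""

    def __init__(self, run_relative_root):
        # Repo root expressed relative to the run artifact root, e.g. "my/path".
        self.run_relative_root = run_relative_root

    def _call_endpoint(self, run_relative_path):
        # Assumed signature; the real method is replaced by a mock in the check below.
        raise NotImplementedError

    def list_artifacts(self, path=None):
        # Resolve the caller's path against the repo root, not the run root.
        if path:
            run_relative_path = posixpath.join(self.run_relative_root, path)
        else:
            run_relative_path = self.run_relative_root
        return self._call_endpoint(run_relative_path)


def check_list_artifacts_resolves_against_subdir():
    repo = SubdirRootedRepo("my/path")
    with mock.patch.object(repo, "_call_endpoint", return_value=[]) as endpoint_mock:
        repo.list_artifacts("foo")
    # The endpoint should see the path under the subdirectory, not just "foo".
    endpoint_mock.assert_called_once_with("my/path/foo")
    return endpoint_mock.call_args


check_list_artifacts_resolves_against_subdir()
```

The same pattern would presumably apply to `_upload_to_cloud` and `_download_from_cloud`: patch the method, call the public API with a relative path, and assert on the arguments the mock received.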

dbczumar · 2020-06-08T21:24:01Z

tests/store/artifact/test_databricks_artifact_repo.py

+                               os.path.join(artifact_path, 'subdir'))]
+            log_artifact_mock.assert_has_calls(calls)
+
+    def test_list_artifacts(self, databricks_artifact_repo):


Can we test that outputs are computed relative to run_relative_artifact_repo_root_path by invoking list on a DatabricksArtifactRepo that is rooted at a subdirectory of the run artifact root?
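To make the expectation concrete, here is a minimal sketch of how listing results might be expressed relative to the repo root. `relativize_listing` is a hypothetical helper, not code from this PR; only the parameter name `run_relative_artifact_repo_root_path` is taken from the comment above.

```python
import posixpath


def relativize_listing(run_relative_paths, run_relative_artifact_repo_root_path):
    """Hypothetical helper: rewrite run-relative paths relative to the repo root."""
    return [
        posixpath.relpath(path, run_relative_artifact_repo_root_path)
        for path in run_relative_paths
    ]


# Paths as a backend might report them, relative to the run artifact root.
listing = relativize_listing(["my/path/foo/a.txt", "my/path/foo/b.txt"], "my/path")
print(listing)  # ['foo/a.txt', 'foo/b.txt']
```

A test for a repo rooted at `my/path` would then assert that the returned paths start with `foo/` rather than `my/path/foo/`.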

dbczumar

LGTM! Great work @arjundc-db!

codecov-commenter · 2020-06-16T03:34:31Z

Codecov Report

Merging #2911 into master will decrease coverage by 0.19%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2911      +/-   ##
==========================================
- Coverage   85.23%   85.04%   -0.20%     
==========================================
  Files          20       20              
  Lines        1050     1050              
==========================================
- Hits          895      893       -2     
- Misses        155      157       +2

Impacted Files	Coverage Δ
R/tracking-server.R	`96.22% <0.00%> (-3.78%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf0330b...dfc7f60. Read the comment docs.

* Add protos and compilation
* Adding databricks_artifact_repo to store/artifact
* Addressing comments
* Addressing comments and fixing _download_file
* Code Clean-up and lint
* Adding multi-part upload logic and unit tests
* Addressing comments
* Addressing comments, making azure download more memory efficient and other fixes.
* Small fix
* Fixing list_artifacts
* Addressing final comments.
* Making extract_run_id static
* Adding AWS support
* Addressing comments
* Fix - needs docs and tests
* Comment and simplification
* Special case for empty file upload to AWS
* Clean up and added tests for relative path
* Page
* Fix
* Added relative path test cases
* Added test for list_artifacts pagination
* Fixing travis failures
* Fixes
* More fixes
* More fixes
* Clean-up
* Clean-up

Co-authored-by: Corey Zumar <corey.zumar@databricks.com>
Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>

dbczumar and others added 20 commits May 11, 2020 17:01

Add protos and compilation

e63182b

Adding databricks_artifact_repo to store/artifact

cb69e26

Addressing comments

3af5d3b

Addressing comments and fixing _download_file

38630d3

Code Clean-up and lint

1cb98e9

Adding multi-part upload logic and unit tests

870c0aa

Addressing comments

10cf458

Addressing comments, making azure download more memory efficent and o…

ecdce6a

…ther fixes.

Small fix

116c5d0

Fixing list_artifacts

11be341

Addressing final comments.

0609a5b

Making extract_run_id static

3f1327d

Adding AWS support

98035b4

Addressing comments

0b1af46

Fix - needs docs and tests

3d923cb

Comment and simplification

d3fab4a

Special case for empty file upload to AWS

368973d

Merge pull request #4 from dbczumar/databricks-repo-fix

b67e4dc

Path resolution fixes for DatabricksArtifactRepository

Merge branch 'databricks-artifact-repo' of https://github.com/arjundc…

5e87cf7

…-db/mlflow into databricks-artifact-repo

Clean up and added tests for relative path

72427a3

arjundc-db requested a review from dbczumar June 8, 2020 20:13

arjundc-db added area/artifacts Artifact stores and artifact logging area/tracking Tracking service, tracking client APIs, autologging labels Jun 8, 2020

dbczumar reviewed Jun 8, 2020

mlflow/store/artifact/dbfs_artifact_repo.py

dbczumar reviewed Jun 8, 2020

dbczumar and others added 3 commits June 9, 2020 13:33

Page

f82c607

Fix

30e3885

Added relative path test cases

a03e099

arjundc-db and others added 3 commits June 9, 2020 22:46

Merge pull request #5 from dbczumar/databricks-repo-pagination

50a5e28

Paginated client

Merge branch 'databricks-artifact-repo' of https://github.com/arjundc…

8c75590

…-db/mlflow into databricks-artifact-repo

Added test for list_artifacts pagination

6d36297

dbczumar approved these changes Jun 11, 2020

dbczumar and others added 8 commits June 15, 2020 11:59

Merge branch 'master' into databricks-artifact-repo

d84fa21

Fixing travis failures

ec85f06

Merge branch 'databricks-artifact-repo' of https://github.com/arjundc…

70ca1d7

…-db/mlflow into databricks-artifact-repo

Fixes

c6585d6

More fixes

51ca61f

More fixes

d1b9dfd

Clean-up

72ecab5

Clean-up

dfc7f60

arjundc-db merged commit 74be64c into mlflow:master Jun 16, 2020

smurching added rn/none List under Small Changes in Changelogs. area/artifacts Artifact stores and artifact logging and removed area/artifacts Artifact stores and artifact logging labels Jun 18, 2020

arjundc-db mentioned this pull request Jul 1, 2020

[SETUP-BUG] azure requirement required by default #3009

Closed
