Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate artifact storage configuration to docker env #1621

Merged
merged 15 commits into from Sep 16, 2019

Conversation

nlaille
Copy link
Contributor

@nlaille nlaille commented Jul 18, 2019

What changes are proposed in this pull request?

Propagate artifact storage configuration to docker environment so artifacts can be logged to remote storage.

How is this patch tested?

Additional unit tests.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Artifact storage configuration is now propagated to docker env so artifacts can be logged to remote storage.

What component(s) does this PR affect?

  • UI
  • CLI
  • API
  • REST-API
  • Examples
  • Docs
  • Tracking
  • Projects
  • Artifacts
  • Models
  • Scoring
  • Serving
  • R
  • Java
  • Python

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

@nlaille nlaille changed the title WIP - Propagate artifact storage configuration to docker env Propagate artifact storage configuration to docker env Jul 23, 2019
@nlaille
Copy link
Contributor Author

nlaille commented Aug 5, 2019

@smurching I included your comment from #1544 about /mlflow/projects/code/ in this pr.

@nlaille
Copy link
Contributor Author

nlaille commented Sep 1, 2019

@smurching , do you think this pr make sense ?

@smurching
Copy link
Collaborator

@nlaille sorry for the delay, would definitely be good to get this in - one thing that'd help accelerate review actually is if you could split this up into two PRs:

  • One containing the logic change to propagate artifact storage configuration to the docker enviornment
  • Another doing the projects refactoring you referenced in your original PR description (thanks for doing that btw!)

The main reason I suggest this is that it's just easier to review a substantial refactoring if it's not mixed together with logic changes - thanks :)

@smurching smurching self-requested a review September 3, 2019 21:04
@nlaille
Copy link
Contributor Author

nlaille commented Sep 4, 2019

@smurching I will keep this one to integration the propagation and when it will be merged I will propose another one about the refactoring.

@nlaille
Copy link
Contributor Author

nlaille commented Sep 4, 2019

@smurching good to review.

@smurching
Copy link
Collaborator

@nlaille thanks! Looking now

Copy link
Collaborator

@smurching smurching left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nlaille thanks for the PR!

It generally LGTM - were you able to QA the credential propagation with any of the artifact stores here? I also had some high-level thoughts on the PR that I'd like to chat with some of the other committers about before merging:

  1. Currently, users run MLflow projects using a combination of (environment type, execution target, tracking URI, artifact URI). The tracking URI & artifact URI are currently pluggable, while "environment type" and "execution target" are fixed (the former must be "conda" or "docker", while the latter can be "local", "databricks", or "kubernetes"). Eventually, we may make the "execution target" pluggable as well.

  2. Given 1), we might consider standardizing the interface exposed by different artifact storage locations to project execution targets. That is, maybe all ArtifactRepositories should be able to override a method like _get_credential_context that returns a context object/dictionary with fields like credential_files, credential_envs (assuming that files & environment variables are sufficient to capture credential information, with the possibility of adding additional fields later on)

The latter approach would make it easier for ArtifactRepository implementors to ensure their plugins can be used when running in different environments/execution targets. IMO the simplest & least-restrictive thing is to defer this for now (i.e. keep the PR as is) and wait till we have the use case of running against pluggable execution targets more fleshed out, but I'll chat about this with others.

@nlaille
Copy link
Contributor Author

nlaille commented Sep 12, 2019

@smurching let me know about your chat, I started to work on similar stuff as 1). I will put it on hold in the meantime.

About 2), I can implement it just after this one. It would make more sense, even if it is only used for docker "environment type" on local "execution target" at first.

Copy link
Collaborator

@smurching smurching left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nlaille sounds good regarding the plan to allow artifact repositories to expose credential information in a follow-up PR. In the interim, just pushed a commit adding a test verifying that we don't throw when we receive an unexpected artifact URI (e.g. an artifact URI corresponding to a plugin). Will merge this once tests pass with that commit!

@smurching smurching added the rn/feature Mention under Features in Changelogs. label Sep 16, 2019
@smurching smurching merged commit 18fca9e into mlflow:master Sep 16, 2019
avflor pushed a commit to avflor/mlflow that referenced this pull request Aug 22, 2020
…docker-based MLflow projects locally (mlflow#1621)

Artifact storage configuration (credentials etc) are now propagated to the docker container used to run docker-based MLflow projects locally, so artifacts can be logged to remote storage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rn/feature Mention under Features in Changelogs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants