
Enable use of Azure ML Environments and code upload #14

Closed
tomasvanpottelbergh opened this issue Sep 2, 2022 · 13 comments · Fixed by #15

Comments

@tomasvanpottelbergh
Contributor

Thanks for creating this plugin!

When evaluating it for use in our projects, I was a bit surprised by the approach of baking the code into a Docker image, instead of using Azure ML Environments (or Docker images) with code upload on job submission.

The current approach works fine for production deployments, but it doesn't support the experimentation workflow very well. I modified the plugin locally to support submitting jobs using Environments and code snapshotting. Would this change be welcome as a PR? Or would you like to keep this current approach or have both options?

@marrrcin
Contributor

marrrcin commented Sep 2, 2022

Could you elaborate how your approach works? Are all of the features of Kedro preserved?

@tomasvanpottelbergh
Contributor Author

The main change is to reference the Environment by name:

```python
environment=self.environment_name or self.config.azure.environment_name,
```

instead of

```python
environment=Environment(
    image=self.docker_image or self.config.docker.image
),
```

and to add `code="."` to the `command` call.

I'm fairly new to Kedro, so could you clarify which Kedro features you think might be lost?

@marrrcin
Contributor

marrrcin commented Sep 2, 2022

We initially force Docker-based executions to prevent issues with missing dependencies and code changes. In our MLOps lifecycle, we always run valid training jobs from CI/CD - that's the only way we can be 100% sure that experiments are reproducible. Enabling ad-hoc executions with code upload might interfere with that.

With that in mind, if that's the only change that needs to be made and the pipeline works the same, we can adopt that approach as an optional feature.

A new CLI command should be introduced to give users an easy way to create this Azure environment, and the config should get a new parameter that switches the execution mode. In the azure config section I propose introducing an `environment_name`, which will be null by default. If set, it will enable your way of starting the pipeline.
A few notes:

  • if `environment_name` is set, the plugin should display a warning during the `run` command that the code is submitted from the local directory and runs might not be reproducible.
  • if `environment_name` is set, the plugin needs to make sure that the AML environment exists - if it doesn't, it would be best to create one without user intervention, based on the `docker.image` config.
  • introduce a new CLI command `create-environment` to create the environment manually, also based on the `docker.image` config.

@tomasvanpottelbergh
Contributor Author

Thanks for your comments and being open to the suggestion!

We initially force Docker-based executions to prevent issues with missing dependencies and code changes. In our MLOps lifecycle, we always run valid training jobs from CI/CD - that's the only way we can be 100% sure that experiments are reproducible. Enabling ad-hoc executions with code upload might interfere with that.

I understand that CI/CD runs are preferred, but for experimentation this creates a large overhead.

* if `environment_name` is set, the plugin should display a warning during the `run` command that the code is submitted from the local directory and runs might not be reproducible.

Azure ML runs submitted as I suggest should also be fully reproducible, since they capture both the code and the Environment used in the experiment. I do agree that it should be made clear that the code is uploaded and not taken from the Docker image.

* if `environment_name` is set, the plugin needs to make sure that the AML environment exists - if it doesn't, it would be best to create one without user intervention, based on the `docker.image` config.

By creating an AML Environment based on the `docker.image` config, do you mean a registered one, or an ad-hoc one as in the current implementation? I think a registered one would be preferable.

* introduce a new CLI command `create-environment` to create the environment manually, also based on the `docker.image` config.

I agree with the need for such a command; this was the next thing on my list. Regarding how to create it, I would also like to support the `BuildContext` option, which would avoid having to manually build the Docker image and push it to the ACR. What do you think?

@marrrcin
Contributor

marrrcin commented Sep 5, 2022

By creating an AML Environment based on the `docker.image` config, do you mean a registered one, or an ad-hoc one as in the current implementation? I think a registered one would be preferable.

Registered one.

I agree with the need for such a command; this was the next thing on my list. Regarding how to create it, I would also like to support the `BuildContext` option, which would avoid having to manually build the Docker image and push it to the ACR. What do you think?

Sounds good to me, go ahead - I'm looking forward to your PR!

@marrrcin
Contributor

@tomasvanpottelbergh how is your PR going?

@tomasvanpottelbergh
Contributor Author

Sorry, I was on vacation, so I didn't make much progress. I am first sorting out a few other things to make the plugin work for our workflow, after which I can clean things up and open the PR. Feel free to start a PR already if you were also working on this.

@marrrcin
Contributor

We're not actively working on it at the moment, so please proceed at your own pace 🙂

@tomasvanpottelbergh
Contributor Author

tomasvanpottelbergh commented Sep 21, 2022

I am starting to clean up my code to be able to open a PR for this. While doing that, I had some more thoughts I'd like your opinion on:

  • In hindsight, the code upload and Environments features do not really need to be coupled, and it would be clearer if code upload could be configured explicitly. I suggest adding an optional config field `code_directory` which specifies the directory to upload, with no code upload if no value is given.
  • I feel that having the `DockerConfig` together with the `environment_name` in the config will be confusing. The `DockerConfig` could specify a different image than the one the Environment uses, potentially confusing the user. Therefore I propose removing the `DockerConfig` and always referring only to a registered environment using `environment_name`.

Looking forward to your thoughts on this!

@marrrcin
Contributor

  1. When we split out the functionality of env creation, it would be a two-step process for the plugin users. We want this plugin to be easy to use, so if something can be automated and transparent to the end user (Data Scientist), we want to automate it.
  2. About the `DockerConfig` - what would be the backward-compatible (with the current version of the plugin) behaviour if you remove it? The flow with Docker is our go-to solution for all of our plugins.

It would be great if you could show us the flow you've prepared as the steps necessary to run Kedro on Azure with code upload, and we can iterate on it from a high-level perspective.

@tomasvanpottelbergh
Contributor Author

  1. When we split out the functionality of env creation, it would be a two-step process for the plugin users. We want this plugin to be easy to use, so if something can be automated and transparent to the end user (Data Scientist), we want to automate it.

Understood, but currently it's already a two-step process. My fear with automating the environment creation is that any change to the config would need to be reflected in the environment as well. Users might expect this to happen automatically, whereas it would be impractical to update the environment on every run.

2. About the `DockerConfig` - what would be the backward-compatible (with the current version of the plugin) behaviour if you remove it? The flow with Docker is our go-to solution for all of our plugins.

I think keeping backward compatibility would complicate the design. In theory we could keep the `DockerConfig` and add fields to mirror the Environment YAML schema, but because of my argument above I don't think this is desirable.

It would be great if you could show us the flow you've prepared as the steps necessary to run Kedro on Azure with code upload, and we can iterate on it from a high-level perspective.

  1. (Optional) Create and register an Azure ML Environment, for example using an Environment YAML file and the Azure CLI
  2. Run `kedro azureml init`:
     • specifying the Azure ML Environment (either curated or as created above)
     • choosing whether to enable code upload

I can also just open a PR to make things more concrete and we could iterate there.

@marrrcin
Contributor

Users might expect this to happen automatically, whereas it would be impractical to update the environment on every run.

Is this an expensive process? Maybe it can be done in two phases - check for changes, then update only if there are changes.

I can also just open a PR to make things more concrete and we could iterate there.

Go ahead! :)

@tomasvanpottelbergh
Contributor Author

Is this an expensive process? Maybe it can be done in two phases - check for changes, then update only if there are changes.

In most cases it is, possibly involving file uploads and a Docker build, which needs a compute cluster to run. Checking for changes would also be technically difficult or impossible for my workflow of creating an environment based on a build context.

Go ahead! :)

Thanks, I will try to have something ready this week.
