
Enable use of Azure ML Environments and code upload #14

Closed
tomasvanpottelbergh opened this issue Sep 2, 2022 · 13 comments · Fixed by #15

Comments

@tomasvanpottelbergh
Contributor

Thanks for creating this plugin!

When evaluating it for use in our projects, I was a bit surprised by the approach of baking the code into a Docker image, instead of using Azure ML Environments (or Docker images) with code upload on job submission.

The current approach works fine for production deployments, but it doesn't support the experimentation workflow very well. I modified the plugin locally to support submitting jobs using Environments and code snapshotting. Would this change be welcome as a PR? Or would you like to keep this current approach or have both options?

@marrrcin
Contributor

marrrcin commented Sep 2, 2022

Could you elaborate how your approach works? Are all of the features of Kedro preserved?

@tomasvanpottelbergh
Contributor Author

The main change is to reference the Environment by name:

```python
environment=self.environment_name or self.config.azure.environment_name,
```

instead of

```python
environment=Environment(
    image=self.docker_image or self.config.docker.image
),
```

and to add `code="."` to the `command` call.

I'm fairly new to Kedro, so could you clarify which Kedro features you think might be lost?

@marrrcin
Contributor

marrrcin commented Sep 2, 2022

We initially force Docker-based executions to prevent issues with missing dependencies and code changes. In our MLOps lifecycle, we always run valid training jobs from CI/CD - that's the only way we can be 100% sure that experiments are reproducible. Enabling ad-hoc executions with code upload might interfere with that.

With that in mind, if that's the only change that needs to be made and the pipeline works the same, we can adopt that approach as an optional feature.

A new CLI command should be introduced to give users an easy way to create this Azure environment, and the config should get a new parameter that switches the execution mode. In the azure config section I propose introducing an `environment_name`, which will be null by default. If set, it will enable your way of starting the pipeline.
A few notes:

  • if `environment_name` is set, the plugin should display a warning during the `run` command that the code is submitted from the local directory and runs might not be reproducible.
  • if `environment_name` is set, the plugin needs to make sure that the AML environment exists - if it doesn't, it would be best to create one without user intervention, based on the `docker.image` config.
  • introduce a new CLI command `create-environment` to create the environment manually, also based on the `docker.image` config.

@tomasvanpottelbergh
Contributor Author

Thanks for your comments and being open to the suggestion!

We initially force Docker-based executions to prevent issues with missing dependencies and code changes. In our MLOps lifecycle, we always run valid training jobs from CI/CD - that's the only way we can be 100% sure that experiments are reproducible. Enabling ad-hoc executions with code upload might interfere with that.

I understand that CI/CD runs are preferred, but for experimentation this creates a large overhead.

* if `environment_name` is set, the plugin should display a warning during the `run` command that the code is submitted from the local directory and runs might not be reproducible.

Azure ML runs submitted as I suggest should also be fully reproducible, since they capture both the code and the Environment used in the experiment. I do agree that it should be made clear that the code is uploaded and not taken from the Docker image.

* if `environment_name` is set, the plugin needs to make sure that the AML environment exists - if it doesn't, it would be best to create one without user intervention, based on the `docker.image` config.

By creating an AML Environment based on the `docker.image` config, do you mean a registered one, or an ad-hoc one as in the current implementation? I think a registered one would be preferable.

* introduce a new CLI command `create-environment` to create the environment manually, also based on the `docker.image` config.

I agree with the need for such a command; this was the next thing on my list. Regarding how to create it, I would also like to support the `BuildContext` option, which would avoid having to manually build the Docker image and push it to the ACR. What do you think?

@marrrcin
Contributor

marrrcin commented Sep 5, 2022

By creating an AML Environment based on the `docker.image` config, do you mean a registered one, or an ad-hoc one as in the current implementation? I think a registered one would be preferable.

Registered one.

I agree with the need for such a command; this was the next thing on my list. Regarding how to create it, I would also like to support the `BuildContext` option, which would avoid having to manually build the Docker image and push it to the ACR. What do you think?

Sounds good to me, go ahead - I'm looking forward to your PR!

@marrrcin
Contributor

@tomasvanpottelbergh how is your PR going?

@tomasvanpottelbergh
Contributor Author

Sorry, I was on vacation, so I didn't make much progress. I am first sorting out a few other things to make the plugin work for our workflow, after which I can clean things up and open the PR. Feel free to start a PR already if you were also working on this.

@marrrcin
Contributor

We're not actively working on it at the moment, so please proceed at your own pace 🙂

@tomasvanpottelbergh
Contributor Author

tomasvanpottelbergh commented Sep 21, 2022

I am starting to clean up my code to be able to open a PR for this. While doing that, I had some more thoughts I'd like your opinion on:

  • In hindsight, the code upload and Environments features do not really need to be coupled, and it would be clearer if code upload could be configured explicitly. I suggest adding an optional config field `code_directory` which specifies the directory to upload, with no code upload if no value is given.
  • I feel that having the `DockerConfig` together with the `environment_name` in the config will be confusing. The `DockerConfig` could specify a different image than the one the Environment uses, potentially confusing the user. Therefore I propose removing the `DockerConfig` and always referring only to a registered environment using `environment_name`.

Looking forward to your thoughts on this!

@marrrcin
Contributor

  1. When we split out the functionality of env creation, it would be a two-step process for the plugin users. We want this plugin to be easy to use, so if something can be automated and transparent to the end user (Data Scientist), we want to automate it.
  2. About the `DockerConfig` - what would be the backward-compatible (with the current version of the plugin) behaviour if you remove it? The flow with Docker is our go-to solution for all of our plugins.

It would be great if you could show us the flow you've prepared as the steps necessary to run Kedro on Azure with code upload, and we can iterate on it from a high-level perspective.

@tomasvanpottelbergh
Contributor Author

  1. When we split out the functionality of env creation, it would be a two-step process for the plugin users. We want this plugin to be easy to use, so if something can be automated and transparent to the end user (Data Scientist), we want to automate it.

Understood, but currently it's already a two-step process. My fear with automating the environment creation is that any change to the config would need to be reflected in the environment as well. Users might expect this to happen automatically, whereas it would be impractical to update the environment on every run.

2. About the `DockerConfig` - what would be the backward-compatible (with the current version of the plugin) behaviour if you remove it? The flow with Docker is our go-to solution for all of our plugins.

I think keeping backward compatibility would complicate the design. In theory we could keep the `DockerConfig` and add fields to mirror the Environment YAML schema, but because of my argument above I don't think this is desirable.

It would be great if you could show us the flow you've prepared as the steps necessary to run Kedro on Azure with code upload, and we can iterate on it from a high-level perspective.

  1. (Optional) Create and register an Azure ML Environment, for example using an Environment YAML file and the Azure CLI
  2. Run `kedro azureml init`:
     • specifying the Azure ML Environment (either curated or as created above)
     • choosing whether to enable code upload

I can also just open a PR to make things more concrete and we could iterate there.

@marrrcin
Contributor

Users might expect this to happen automatically, whereas it would be impractical to update the environment on every run.

Is this an expensive process? Maybe it can be done in two phases - check for changes, then update only if there are changes.

I can also just open a PR to make things more concrete and we could iterate there.

Go ahead! :)

@tomasvanpottelbergh
Contributor Author

Is this an expensive process? Maybe it can be done in two phases - check for changes, then update only if there are changes.

In most cases it is, possibly involving file uploads and a Docker build, which needs a compute cluster to run. Checking for changes would also be technically difficult or impossible for my workflow of creating an environment based on a build context.

Go ahead! :)

Thanks, I will try to have something ready this week.
