
Feature/node tag compute #13

Merged
marrrcin merged 13 commits into getindata:develop on Oct 18, 2022

Conversation

fdroessler
Contributor

This provides the ability to override the compute target on a per-node basis using Kedro node tags.

I saw this comment and had a rough implementation running in my plugin prototype. Happy to discuss or change the details of the implementation, but I wanted to give it an initial stab.
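
For illustration, tagging a node for a specific compute target would look roughly like this on the pipeline side (a minimal sketch; the azureml.compute.<target> tag convention is just what my prototype assumes, not a finalized API):

# rough sketch of the tag-based approach, assuming an "azureml.compute.<target>" tag convention
from kedro.pipeline import node, pipeline

def train_model(features):
    ...

training = pipeline(
    [
        node(
            func=train_model,
            inputs="features",
            outputs="model",
            name="model_training",
            # the plugin would read this tag and run the step on the named compute target
            tags=["azureml.compute.gpu-cluster"],
        )
    ]
)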

@em-pe
Member

em-pe commented Aug 29, 2022

Hi @fdroessler. Happy to see your contribution! I'm wondering if we could move the compute specs into the plugin config file (similar to the resources block in kedro-kubeflow) to keep Azure concepts separate from pipeline code.

You can ignore the sonarcloud and e2e tests, as they require credentials not available in forks; we'll run them on our side.

@marrrcin
Contributor

marrrcin commented Aug 29, 2022

Following up on @em-pe: we would like this feature to be fully separated out from the Python code, so as not to couple the execution layer with the logic layer.
The config for managing resources in the kedro-kubeflow plugin looks like this:

  resources:

    # For nodes that require more RAM you can increase the "memory"
    data_import_step:
      memory: 2Gi

    # Training nodes can utilize more than one CPU if the algorithm
    # supports it
    model_training:
      cpu: 8
      memory: 1Gi

    # GPU-capable nodes can request 1 GPU slot
    tensorflow_step:
      nvidia.com/gpu: 1

    # Default settings for the nodes
    __default__:
      cpu: 1 
      memory: 1Gi 

So in general, you have:

step_name:
    cpu: 4
    memory: 4Gi

etc.

For kedro-azureml we'll probably want something like this (excerpt from azureml.yml):

azure:
    compute_targets:
        step_name:
            cluster: <cluster name>
        # (... other steps)
        __default__:
            cluster: <cluster name> # <-- this should be populated from the config init, right now it's in the azure.cluster_name field but it should be moved here

See how the __default__ handling is implemented here https://github.com/getindata/kedro-kubeflow/blob/45bf6c5945428f954968b0edcd2810491e4c5a5f/kedro_kubeflow/config.py#L288
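
In essence the fallback just merges the step-specific entry over the __default__ entry; a minimal sketch (the function name and dict shape here are placeholders for discussion, not the plugin's actual API):

from typing import Any, Dict

def get_compute_target(compute_targets: Dict[str, Dict[str, Any]], step_name: str) -> Dict[str, Any]:
    """Return the compute settings for a step, falling back to __default__."""
    defaults = compute_targets.get("__default__", {})
    overrides = compute_targets.get(step_name, {})
    # step-specific keys win over the defaults
    return {**defaults, **overrides}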

Please let us know if you have any additional questions.

@fdroessler
Contributor Author

OK, I see the general idea here. Just a few questions off the top of my head:

  • Does that also imply that one needs to keep the node names and the resources section in sync? And would that fail early enough to avoid having to build and push an image again if there is a typo?
  • For most pipelines only a small number of nodes would need entries, but for larger pipelines couldn't that file get a bit messy?

What was appealing to me about this approach was that I can read the pipelines (and, as far as I know, also the kedro-viz graph) and know exactly which part of my pipeline is running where. But maybe that benefit is not worth the coupling. I'll give the other suggestion a shot in a separate branch.

@em-pe
Member

em-pe commented Aug 31, 2022

I think the correctness of the configuration could be handled with simple config validation, but I agree that the messiness of the resources mapping can be a problem for massive pipelines: it hurts readability, invites duplication, and we lose the visibility of resource allocation that kedro-viz brings to the table.

I have one more idea that kind of marries the two approaches: what if we add a config section that maps Kedro node tags to Azure compute resources?

You would have something like this in the config file:

azure:
    ....
    resources:
        __default__:
            cluster: <default cluster>
        chunky:
            cluster: <himem cluster>
        gpu:
            cluster: <gpu enabled cluster>

... and then you operate with the chunky and gpu tags within the pipeline. We keep the flexibility and simplicity of tags and stay platform-independent at the same time. That would work for other execution environments as well.
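
For illustration, resolving a node's cluster from its tags could then be as simple as this (a sketch only; the helper name and config shape are assumptions for the discussion):

from typing import Iterable

def resolve_cluster(resources: dict, node_tags: Iterable[str]) -> str:
    """Pick the cluster for a node from its tags, falling back to __default__."""
    for tag in node_tags:
        if tag in resources:
            return resources[tag]["cluster"]
    return resources["__default__"]["cluster"]

# e.g. a node tagged "gpu" resolves to the <gpu enabled cluster> entry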

@fdroessler @marrrcin any thoughts?

@marrrcin
Contributor

marrrcin commented Sep 1, 2022

Great idea @em-pe , sounds good to me!

@fdroessler
Contributor Author

I like it. Will change accordingly :)

@marrrcin
Contributor

How is your progress @fdroessler ? Do you need some help?

@fdroessler
Contributor Author

How is your progress @fdroessler ? Do you need some help?

Thanks for following up. I had a couple of busy weeks with weddings and the like. But I will have time this week to finish it :) I'll reach out in case I need help!

@marrrcin
Contributor

@fdroessler how is it going? 🙂

@fdroessler
Contributor Author

Slow progress, but getting there. Thanks for the patience; it was mostly busy work :) but I have not forgotten about it. Running some tests today.

@fdroessler
Contributor Author

fdroessler commented Oct 12, 2022

@marrrcin I pushed some code earlier with an initial implementation. I was wondering if this was roughly along the lines you and @em-pe were thinking? Would be good to get some comments on the general approach (not the code in detail yet) before cleaning it up and adding tests.

@@ -52,6 +80,11 @@ class KedroAzureRunnerConfig(BaseModel):
account_name: "{storage_account_name}"
# Name of the storage container
container: "{storage_container}"
resources:
__default__:
Contributor

Three things here:

  1. Let's remove azure.cluster_name in favour of resources.__default__, so this setting lives in only one place.
  2. I would rename resources to compute, since "compute" is the term used in the Azure SDK, so it will be much clearer for users.
  3. I would not generate chunky as a template entry here; let's stick to something self-explanatory like

your_node_tag:
    cluster_name: high-cpu-cluster

Also, please provide a description of the compute node in the form of a # comment, like the other nodes in this template.

for node in dummy_pipeline_compute_tag.nodes:
    if node.tags:
        for tag in node.tags:
            if "azureml.compute" in tag:
Contributor

Please remember to update the tests once you finish the changes to the kedro tag -> azure compute mapping.

@marrrcin
Contributor

@fdroessler Initially looks good, added my comments.

@fdroessler
Contributor Author

Thanks for the feedback. I adapted things according to your suggestions and added a small section in the quickstart guide.

@marrrcin
Contributor

Looks good to me!
Just resolve the conflicts and I will merge your changes 🙂

This branch has conflicts that must be resolved before continuing.
Conflicting files:
CHANGELOG.md
docs/source/03_quickstart.md

marrrcin merged commit d19e0f1 into getindata:develop on Oct 18, 2022
@marrrcin
Contributor

Merged. We will verify it in the e2e tests, and if anything goes wrong I will create an additional issue. If everything passes, we will probably create a new release next week.

Thanks so much for your contribution!
