Feature/node tag compute #13
Hi @fdroessler. Happy to see your contribution! I'm wondering if we could move the compute specs into the plugin config file (similar to the resources block in kedro-kubeflow) to keep Azure concepts separate from pipeline code. You can ignore the SonarCloud and e2e tests, as they require credentials not available in forks; we'll run them on our side.
Following up on @em-pe, we would like this feature to be fully separated out from the Python code, so that the execution layer is not coupled with the logic layer.

    resources:
      # For nodes that require more RAM you can increase the "memory"
      data_import_step:
        memory: 2Gi
      # Training nodes can utilize more than one CPU if the algorithm
      # supports it
      model_training:
        cpu: 8
        memory: 1Gi
      # GPU-capable nodes can request 1 GPU slot
      tensorflow_step:
        nvidia.com/gpu: 1
      # Default settings for the nodes
      __default__:
        cpu: 1
        memory: 1Gi

So in general, you have:

    step_name:
      cpu: 4
      memory: 4Gi

etc. For kedro-azureml we'll probably want something like this (excerpt from

    azure:
      compute_targets:
        step_name:
          cluster: <cluster name>
        # (... other steps)
        __default__:
          cluster: <cluster name> # <-- this should be populated from the config init, right now it's in the azure.cluster_name field but it should be moved here

See how the

Please let us know if you have any additional questions.
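As a rough illustration of the per-step lookup proposed above, here is a minimal Python sketch. It assumes the `compute_targets` section has already been loaded into a plain dictionary; the function name `cluster_for_step` and the cluster names are hypothetical and not part of kedro-azureml.

```python
from typing import Dict, Mapping

# Hypothetical, already-parsed `azure.compute_targets` section.
# The cluster names are placeholders chosen for this example.
COMPUTE_TARGETS: Dict[str, Dict[str, str]] = {
    "model_training": {"cluster": "gpu-cluster"},
    "__default__": {"cluster": "cpu-cluster"},
}


def cluster_for_step(
    step_name: str, targets: Mapping[str, Mapping[str, str]]
) -> str:
    """Return the cluster for a step, falling back to the __default__ entry."""
    entry = targets.get(step_name, targets["__default__"])
    return entry["cluster"]


if __name__ == "__main__":
    print(cluster_for_step("model_training", COMPUTE_TARGETS))
    print(cluster_for_step("data_import_step", COMPUTE_TARGETS))
```

Steps without an explicit entry silently use `__default__`, which is why that entry has to be present in the config.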
OK, I see the general idea here. Just a few questions off the top of my head:
What appealed to me about this approach was that I can read the pipelines (and, as far as I know, also the kedro-viz graph) and know exactly which part of my pipeline is running where. But maybe that benefit is not worth the coupling. I'll give the other suggestions a shot in a separate branch.
I think the correctness of the configuration could be handled with simple config validation, but I agree that the messiness of the resources mapping can be a problem for massive pipelines: it hurts readability, leads to duplication, and we don't have visibility of the resource allocation that
I have one more idea that kind of marries the two approaches: what if we add a config section that maps Kedro node tags to Azure compute resources? You would have something like this in the config file:
... and then you operate with
@fdroessler @marrrcin any thoughts?
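The tag-to-compute idea from this comment could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the mapping is a plain dictionary from tag name to cluster name, and `compute_for_tags` together with the tag and cluster names are hypothetical, not the plugin's actual API.

```python
from typing import Iterable, Mapping


def compute_for_tags(
    tags: Iterable[str], tag_to_compute: Mapping[str, str], default: str
) -> str:
    """Pick the compute target for a node based on its Kedro tags.

    Tags that are not present in the mapping are ignored. If the tags map
    to more than one distinct target, the choice is ambiguous and an error
    is raised instead of guessing.
    """
    matches = sorted({tag_to_compute[t] for t in tags if t in tag_to_compute})
    if len(matches) > 1:
        raise ValueError(f"Conflicting compute targets for tags: {matches}")
    return matches[0] if matches else default


if __name__ == "__main__":
    # Hypothetical tag -> cluster mapping.
    mapping = {"gpu": "gpu-cluster", "bigmem": "high-memory-cluster"}
    print(compute_for_tags({"gpu", "training"}, mapping, "cpu-cluster"))
    print(compute_for_tags({"preprocessing"}, mapping, "cpu-cluster"))
```

Raising on conflicting tags is one possible design choice; a real implementation might instead define a precedence order.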
Great idea @em-pe, sounds good to me!
I like it. Will change accordingly :)
How is your progress @fdroessler? Do you need some help?
Thanks for following up. I had a couple of busy weeks with weddings and the like. But I will have time this week to finish it :) I'll reach out in case I need help!
…o-azureml into feature/node_tag_compute
@fdroessler how is it going? 🙂
Slow progress, but getting there. Thanks for your patience; it was mostly busy work :) but I have not forgotten about it. Running some tests today.
    @@ -52,6 +80,11 @@ class KedroAzureRunnerConfig(BaseModel):
          account_name: "{storage_account_name}"
          # Name of the storage container
          container: "{storage_container}"
        resources:
          __default__:
Three things here:
- Let's remove `azure.cluster_name` in favor of `resources.__default__`, to have this setting in only one place.
- I would rename `resources` to `compute`, as `compute` is used in the Azure SDK, so it will be much clearer for the users.
- I would not generate `chunky` as a template here; maybe let's stick to something self-commenting like

      your_node_tag:
        cluster_name: high-cpu-cluster

  Also, please provide a description of the `compute` node in the form of a `# comment`, like in other nodes in this template.
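A simple validation of the section discussed in this review could look like the sketch below. It assumes the renamed `compute` section is available as a plain dictionary with a `__default__` entry and a `cluster_name` key per tag, as suggested above. The project itself defines its config with pydantic models, so this stand-alone function (`validate_compute_section`, a hypothetical name) only illustrates the checks, not how the plugin performs them.

```python
from typing import Any, List, Mapping


def validate_compute_section(compute: Mapping[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    # The fallback entry must exist so untagged nodes have a compute target.
    if "__default__" not in compute:
        errors.append("missing '__default__' entry")
    # Every entry (tag or default) must name a cluster.
    for tag, spec in compute.items():
        if not isinstance(spec, dict) or not spec.get("cluster_name"):
            errors.append(f"'{tag}' needs a non-empty 'cluster_name'")
    return errors


if __name__ == "__main__":
    good = {
        "__default__": {"cluster_name": "cpu-cluster"},
        "your_node_tag": {"cluster_name": "high-cpu-cluster"},
    }
    print(validate_compute_section(good))
    print(validate_compute_section({"chunky": {}}))
```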
tests/test_generator.py
    for node in dummy_pipeline_compute_tag.nodes:
        if node.tags:
            for tag in node.tags:
                if "azureml.compute" in tag:
Please remember to update the tests once you finish the changes to the kedro tag -> azure compute mapping.
@fdroessler Initially looks good, added my comments.
… of the cluster name in the config
Thanks for the feedback. I adapted things according to your suggestions and added a small section in the quickstart guide.
Looks good to me!
Merged. We will verify it in e2e tests, and if anything goes wrong I will create an additional issue. If everything passes, we will probably create a new release next week. Thanks so much for your contribution!
This provides the ability to override the compute target on a per-node basis using Kedro node tags.
I saw this comment and had a rough implementation running in my plugin prototype. Happy to discuss or change the details of the implementation, but I gave it an initial stab.