Skip to content

adding deployment spec overrides via node label selector#5845

Merged
kubeedge-bot merged 3 commits into
kubeedge:masterfrom
EraKin575:override
Dec 6, 2024
Merged

adding deployment spec overrides via node label selector#5845
kubeedge-bot merged 3 commits into
kubeedge:masterfrom
EraKin575:override

Conversation

@EraKin575

@EraKin575 EraKin575 commented Sep 5, 2024

Copy link
Copy Markdown
Contributor

What type of PR is this?
Adding deployment spec overrides via node label selector to have granular node characteristics.

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

A part of #5755

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


@kubeedge-bot kubeedge-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 5, 2024
@kubeedge-bot kubeedge-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 5, 2024
@WillardHu

Copy link
Copy Markdown
Collaborator

You can add some description in the "What this PR does / why we need it" part.

@kubeedge-bot kubeedge-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 9, 2024
@EraKin575

Copy link
Copy Markdown
Contributor Author

@WillardHu can you please review if I am going to the right track?

@EraKin575

Copy link
Copy Markdown
Contributor Author

question: 1. How can I update the edgeapplication crd yaml to test these changes?
2.
If both TargetNodeGroups and TargetNodeLabels are defined for the same workload, and each specifies different overrides for the same property (e.g., different NodeSelector values), there could be conflicts. What should I prioritize here?

@WillardHu

WillardHu commented Sep 12, 2024

Copy link
Copy Markdown
Collaborator

question: 1. How can I update the edgeapplication crd yaml to test these changes? 2. If both TargetNodeGroups and TargetNodeLabels are defined for the same workload, and each specifies different overrides for the same property (e.g., different NodeSelector values), there could be conflicts. What should I prioritize here?

Sorry, I've been busy recently, so I'm replying late.

  1. You can execute make generate to update crds.
  2. Each item configuration of the nodeSelector and nodeGroup generates a new Deployment. If a user configuration error causes multiple Pods of the same application to be deployed to one node, resulting in resource conflict, We can reflect the operation result of deployment in the edgeapplication status first. Then we can judge whether verification is necessary based on user feedback. I think validation becomes expensive when there are a large number of nodes and overriders configurations. What do you think? @EraKin575 @Shelley-BaoYue @wbc6080

By the way, for deployment generated by nodeSelector, we need to define their names in other ways. For example, <edgeapplication>-<random number>.

@EraKin575

EraKin575 commented Sep 13, 2024

Copy link
Copy Markdown
Contributor Author

question: 1. How can I update the edgeapplication crd yaml to test these changes? 2. If both TargetNodeGroups and TargetNodeLabels are defined for the same workload, and each specifies different overrides for the same property (e.g., different NodeSelector values), there could be conflicts. What should I prioritize here?

Sorry, I've been busy recently, so I'm replying late.

1. You can execute `make generate` to update crds.

2. Each item configuration of the `nodeSelector` and `nodeGroup` generates a new Deployment. If a user configuration error causes multiple Pods of the same application to be deployed to one node, resulting in resource conflict, We can reflect the operation result of deployment in the edgeapplication status first. Then we can judge whether verification is necessary based on user feedback. I think validation becomes expensive when there are a large number of nodes and overriders configurations.  What do you think? @EraKin575  @Shelley-BaoYue  @wbc6080

By the way, for deployment generated by nodeSelector, we need to define their names in other ways. For example, <edgeapplication>-<random number>.

make generate is causing an error when running:
image

I have also updated the OpenAPI schema before make generate

@kubeedge-bot kubeedge-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 13, 2024

@wbc6080 wbc6080 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can try to run hack/update-codegen.sh to pass the make verify CI test and run hack/generate-crds.sh to generate new crds

@wbc6080

wbc6080 commented Sep 14, 2024

Copy link
Copy Markdown
Collaborator

What do you think? @EraKin575 @Shelley-BaoYue @wbc6080

Maybe we can discuss more details in community meeting

@EraKin575

Copy link
Copy Markdown
Contributor Author

You can try to run hack/update-codegen.sh to pass the make verify CI test and run hack/generate-crds.sh to generate new crds

Its still generating the same error. I tried to upgrade the controller gen version to latest version and it apparently worked. Should I go ahead with it?

@wbc6080 wbc6080 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EraKin575 Do you mean that after upgrading the controller gen version, hack/generate-crds.sh was executed successfully? Based on this success, you can run hack/update-codegen.sh again, and then execute git status. You should be able to see that some files have been automatically modified. Push these modifications up again, and you should be able to solve the CI error.

@EraKin575

Copy link
Copy Markdown
Contributor Author

Hi! I have a question. Can you please tell me about the matching logic for targetnodegroup overriders? How does it find the node group to override the deployment spec?

@wbc6080 wbc6080 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! I have a question. Can you please tell me about the matching logic for targetnodegroup overriders? How does it find the node group to override the deployment spec?

Sorry, I didn't understand your question clearly. Are you asking how to build nodes into node groups? Or how to fully populate edge application deployment in different node groups?
If it is the former, we define the NodeGroup CRD to describe a node group. Nodes with specific labels can be add to a node group management through the matchlable field.

If it is the latter, in edge application CRD , the Workload Templates field can define the Deployment Template of the edge application, and WorkloadScopes can be configured differently according to the needs of different node groups, such as configuring an image repository address. Workload Templates and WorkloadScopes can be added together to form a complete deployment for deployment.

You can also read nodegroup's proposal for more information https://github.com/kubeedge/kubeedge/blob/master/docs/proposals/node-group-management.md

@WillardHu

WillardHu commented Sep 27, 2024

Copy link
Copy Markdown
Collaborator

Hi! I have a question. Can you please tell me about the matching logic for targetnodegroup overriders? How does it find the node group to override the deployment spec?

  1. Assign a instance overridemanager.OverrideManager to Overrider field in controller management.
    Overrider: &overridemanager.OverrideManager{
  2. Loops overrider-infos to overrides the Deployment yaml
  3. Loops overriders to executes func ApplyOverrides(..) of each Overrider implementation(argsoverrider.go, envoverrider.go, imageoverrider.go, etc...).
    func (o *OverrideManager) ApplyOverrides(rawObjs *unstructured.Unstructured, overrideInfo OverriderInfo) error {
  4. Updates, creates or deletes k8s resources.

@EraKin575

EraKin575 commented Sep 28, 2024

Copy link
Copy Markdown
Contributor Author

Hi! I have a question. Can you please tell me about the matching logic for targetnodegroup overriders? How does it find the node group to override the deployment spec?

1. Assign a instance `overridemanager.OverrideManager` to `Overrider` field in controller management.https://github.com/kubeedge/kubeedge/blob/841368a8b412169dff3cd7bb34b12f3fe8cfa9a7/cloud/pkg/controllermanager/controllermanager.go#L63

2. Loops overrider-infos to overrides the Deployment yaml https://github.com/kubeedge/kubeedge/blob/841368a8b412169dff3cd7bb34b12f3fe8cfa9a7/cloud/pkg/controllermanager/edgeapplication/edgeapplicationcontroller.go#L130

3. Loops overriders to executes func `ApplyOverrides(..)` of each Overrider implementation(argsoverrider.go, envoverrider.go, imageoverrider.go, etc...).
   https://github.com/kubeedge/kubeedge/blob/841368a8b412169dff3cd7bb34b12f3fe8cfa9a7/cloud/pkg/controllermanager/edgeapplication/overridemanager/overridemanager.go#L35

4. Updates, creates or deletes k8s resources.

thanks for the help @WillardHu , but already I knew this part. I am sorry I couldn't explain my problem better. I am struggling to find the logic for finding the correct targetnodegroup that implements the deployments on. Example;

targetNodeGroups:
      - name: beijing
        overriders:
          resourcesOverriders:
            - containerName: container-1
              value: {}
      - name: shanghai
        overriders:
          resourcesOverriders:
            - containerName: container-1
              value: {}

where is the code to match the name of targetnodegroup with the actual NodeGroup where it does the deployment?

@kubeedge-bot kubeedge-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 3, 2024
@EraKin575

EraKin575 commented Oct 3, 2024

Copy link
Copy Markdown
Contributor Author

Hi, @WillardHu can you review this PR ? I have made the required changes and it seems to be working. For now, I have added tests for command overrider and it passed. If you approve the changes, I will add test for every overrider
cc @wbc6080

@EraKin575 EraKin575 requested a review from wbc6080 October 5, 2024 08:04
@kubeedge-bot kubeedge-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 5, 2024
Comment thread hack/generate-crds.sh Outdated
if [ "$(which controller-gen)" == "" ]; then
echo "Start to install controller-gen tool"
GO111MODULE=on go install -v sigs.k8s.io/controller-tools/cmd/controller-gen@v0.6.2
GO111MODULE=on go install -v sigs.k8s.io/controller-tools/cmd/controller-gen@latest

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A consistent version is required here, and the latest version will make a difference in comparison checks

Comment on lines +161 to +166
if err := utils.ApplyNodeAffinity(tmplCopy, info.TargetNodeLabelSelector); err != nil {
klog.Errorf("failed to apply node affinity to obj %s/%s of gvk %s, %v",
tmplCopy.GetNamespace(), tmplCopy.GetName(), tmplCopy.GroupVersionKind(), err)
errs = append(errs, err)
continue
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code can be put in the nodeselectoroverrider.go

Comment on lines +168 to +170
klog.Warningf("No nodes found matching labels for obj %s/%s of gvk %s",
tmplCopy.GetNamespace(), tmplCopy.GetName(), tmplCopy.GroupVersionKind())
continue

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be best to write the message to the EdgeApplication Status. And the historical error handling doesn't look good, the Reconcile() function returns an error only make it constant retried.

Comment on lines +62 to +69
affinity["nodeAffinity"].(map[string]interface{})["requiredDuringSchedulingIgnoredDuringExecution"].(map[string]interface{})["nodeSelectorTerms"].([]interface{})[0].(map[string]interface{})["matchExpressions"] = append(
affinity["nodeAffinity"].(map[string]interface{})["requiredDuringSchedulingIgnoredDuringExecution"].(map[string]interface{})["nodeSelectorTerms"].([]interface{})[0].(map[string]interface{})["matchExpressions"].([]interface{}),
map[string]interface{}{
"key": key,
"operator": "In",
"values": []interface{}{value},
},
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the processing performance is better with map, but the readability is worse. Is it better to use jsonpath?

@WillardHu

Copy link
Copy Markdown
Collaborator

The nameoverrider.go file is also to be modified

@WillardHu

WillardHu commented Oct 8, 2024

Copy link
Copy Markdown
Collaborator

where is the code to match the name of targetnodegroup with the actual NodeGroup where it does the deployment?

Refer to nodeselectoroverrider.go, when each node is bound by a node group, the node automatically adds a label apps.kubeedge.io/belonging-to @EraKin575

@EraKin575

Copy link
Copy Markdown
Contributor Author

You need to restore the changes related to the device @EraKin575

Should I also restore those crd files which have been modified and not added with 'v1alpha2' device versioning?

Comment thread cloud/pkg/controllermanager/edgeapplication/overridemanager/imageoverrider.go Outdated
Comment thread cloud/pkg/controllermanager/edgeapplication/edgeapplicationcontroller.go Outdated
@EraKin575 EraKin575 requested a review from WillardHu October 10, 2024 23:08
@EraKin575 EraKin575 force-pushed the override branch 4 times, most recently from c15ebd0 to 2709b73 Compare October 16, 2024 19:28
@kubeedge-bot kubeedge-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 18, 2024
@kubeedge-bot kubeedge-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 26, 2024
@EraKin575 EraKin575 force-pushed the override branch 2 times, most recently from 84197e0 to e4d1de5 Compare October 28, 2024 09:50
@EraKin575

Copy link
Copy Markdown
Contributor Author

@WillardHu please review the PR
CC @wbc6080

@EraKin575 EraKin575 changed the title WIP:adding deployment spec overrides via node label selector adding deployment spec overrides via node label selector Nov 2, 2024
@kubeedge-bot kubeedge-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 2, 2024
@WillardHu

Copy link
Copy Markdown
Collaborator

Please rebase your commits to one commit. @EraKin575

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

wip:adding deployment spec overrides for node label selector

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

wip:adding deployment spec overrides for node label selector

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified overriders

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

updated openapischema

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

added openapiv3schema

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

fixed type for targetnodelabel

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

added utility functions

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

implemented targetnodelabelselector functionality

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

added tests for env overrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

updated crds

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

updated crds

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

updated crds

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

corrected gci

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

correct import order

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

correct import order

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

correct import order

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

correct import order

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

correct import order

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

added image overrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

fixed overrider test

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

fixed overridermanager import order

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

added integration tests

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

trying to fix integration test

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

trying to fix integration test

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

correct import order

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

restored changes related to device

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

restored changes related to device

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

restored changes related to device

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

restored changes related to device

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

restored changes related to device

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

added controller gen version compatible to go1.21

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

moved node affinity to node selector

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

moved node affinity to node selector

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

changed edgeapplicationcontroller

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

changed edgeapplicationcontroller

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

changed edgeapplicationcontroller

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

moved logic to node selector overrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

moved logic to node selector overrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

moved logic to node selector overrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified integration test

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified integration test

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified integration test

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified integration test

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified integration test

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

removed unnecessary comments

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

trying to fix integration tests

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

trying to fix integration tests

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

removed unnecessary function

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

added node affinity to node selector

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

trying to fix integration test

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

trying to fix integration test

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

trying to fix integration test

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

resolve conflicts

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

modified nameoverrider

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>

added watcher

Signed-off-by: EraKin575 <tejaskumar574@gmail.com>
Signed-off-by: EraKin575 <tejaskumar574@gmail.com>
Signed-off-by: EraKin575 <tejaskumar574@gmail.com>
@WillardHu

Copy link
Copy Markdown
Collaborator

/lgtm

@kubeedge-bot kubeedge-bot added the lgtm Indicates that a PR is ready to be merged. label Dec 3, 2024

@fisherxu fisherxu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@kubeedge-bot

Copy link
Copy Markdown
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fisherxu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 6, 2024
@kubeedge-bot kubeedge-bot merged commit 6f8d383 into kubeedge:master Dec 6, 2024
@Shelley-BaoYue Shelley-BaoYue added this to the v1.20 milestone Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants