Conditions error on deleting AWSManagedControlPlane #2180

Closed
richardcase opened this issue Dec 30, 2020 · 8 comments · Fixed by #3157 or #3234
Labels
area/provider/eks Issues or PRs related to Amazon EKS provider help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@richardcase
Member

richardcase commented Dec 30, 2020

/kind bug
/area provider/eks
/help

What steps did you take and what happened:
When an EKS cluster (AWSManagedControlPlane) with managed node group (AWSManagedMachinePool) is deleted in the e2e tests the following error is seen:

E1229 22:54:05.485310       1 controller.go:257] controller-runtime/controller "msg"="Reconciler error" "error"="error deleting network for AWSManagedControlPlane eks-nodes-hxozyw/cluster-aoqabq-control-plane: error patching conditions: The condition \"ClusterSecurityGroupsReady\" was modified by a different process and this caused a merge/ChangeCondition conflict:   \u0026v1alpha3.Condition{\n  \tType:               \"ClusterSecurityGroupsReady\",\n  \tStatus:             \"False\",\n  \tSeverity:           \"Info\",\n- \tLastTransitionTime: v1.Time{Time: s\"2020-12-29 22:54:03 +0000 UTC\"},\n+ \tLastTransitionTime: v1.Time{Time: s\"2020-12-29 22:54:05 +0000 UTC\"},\n- \tReason:             \"Deleting\",\n+ \tReason:             \"Deleted\",\n  \tMessage:            \"\",\n  }\n" "controller"="awsmanagedcontrolplane" "name"="cluster-aoqabq-control-plane" "namespace"="eks-nodes-hxozyw"

This implies that 2 different manager processes are modifying the ClusterSecurityGroupsReady condition on the AWSManagedControlPlane instance.

What did you expect to happen:
I expect this error not to occur when deleting. If we are changing the ClusterSecurityGroupsReady condition on the AWSManagedControlPlane in 2 different managers, then this needs to be changed. The only manager allowed to modify the AWSManagedControlPlane is the EKS control plane manager.

Logs from an e2e run with the error are here

Environment:

  • Cluster-api-provider-aws version: 0.6.3
@k8s-ci-robot
Contributor

@richardcase:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. area/provider/eks Issues or PRs related to Amazon EKS provider help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Dec 30, 2020
@richardcase
Member Author

/milestone v0.7.x

@k8s-ci-robot k8s-ci-robot added this to the v0.7.x milestone Mar 12, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 10, 2021
@richardcase
Member Author

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 10, 2021
@randomvariable randomvariable modified the milestones: v0.7.x, v1.x Nov 8, 2021
@randomvariable
Member

Is this error still a thing?

@richardchen331
Contributor

@richardcase
Member Author

We had this issue with the EKS e2e tests in release-0.6 as well. Depending on the fix, it may be worth backporting.

@richardchen331
Contributor

Here's the proposed fix: #3157

5 participants