Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent unintended resource updates to LB attatchments #9794

Merged
merged 7 commits into from Oct 28, 2020

Conversation

rdrgmnzs
Copy link
Contributor

@rdrgmnzs rdrgmnzs commented Aug 21, 2020

For Terraform when using aws_autoscaling_attachment a situation can happen where the LB is attached/detached from it's associated ASG (more details can be found on the Note section of https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_attachment ).

This change should prevent this from happening. To simplify the code, I also moved cloudformantion and kops managed to the new attachment format.

Fixes #9891, fixes #9913

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 21, 2020
@rdrgmnzs rdrgmnzs requested a review from rifelpet August 21, 2020 16:12
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Aug 21, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rdrgmnzs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/provider/aws Issues or PRs related to aws provider approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 21, 2020
@rifelpet
Copy link
Member

If someone has other attachment resources in their terraform state, this change would conflict. We may need to expose a way to provide additional ELB names or TG ARNs to be included in these lists.

@rdrgmnzs
Copy link
Contributor Author

@rifelpet not sure I follow. Isn't the external LB resource in Kops providing a way to attach other LB resources already? From what I understand the changes above don't prevent other LBs from being attached with aws_autoscaling_attachment it only gives a initial list of LBs that are associated with the ASG. Or am I missing something?

@rifelpet
Copy link
Member

rifelpet commented Aug 21, 2020

Ah I forgot that InstanceGroupSpec already has an ExternalLoadBalancers field. That note in the terraform docs suggests you should use either separate attachment resources or the field in the ASG resource but not both (hence the need to use ignore_changes).

The externallb integration test has the externalLoadBalancers field set, I would expect this PR to be updating its terraform output to include those additional load balancers / TGs, but that test doesnt have any changes at all. Is that expected?

@rdrgmnzs
Copy link
Contributor Author

Ah, see I was reading that note in the docs as you could use aws_autoscaling_attachment and inline load_balancers. However if you were using aws_autoscaling_attachment you would need to set the ignore_changes.

That's sort of how I ended up here, while using aws_autoscaling_attachment as it previously was (no inline load_balancers) I kept seeing the LBs being attached and detached between different terraform runs with no changes to the actual terraform file.

Either way, moving ExternalLoadBalancers to be inline load_balancers as well seems to be an easy enough change so let me update the diff to take care of that.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 27, 2020
Comment on lines +31 to +40
type TargetGroup struct {
Name *string
Lifecycle *fi.Lifecycle

// ARN is the Amazon Resource Name for the Target Group
ARN *string

// Shared is set if this is an external LB (one we don't create or own)
Shared *bool
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created the TargetGroup struct mostly as a skeleton struct so that if someone decides to implement TargetGroup for uses in other places (such as API) they won't then have to figure out how to convert a list of string in AutoscalingGroup to an actual struct.

Copy link
Member

@rifelpet rifelpet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want a release note instructing anyone using terraform or cloudformation that is also defining LB attachments not using kops (in a separate .tf file in their terraform state, for example) to pass those TGs or ELBs through the ClusterSpec.

upup/pkg/fi/cloudup/awstasks/autoscalinggroup.go Outdated Show resolved Hide resolved
@rdrgmnzs
Copy link
Contributor Author

rdrgmnzs commented Sep 1, 2020

/test pull-kops-e2e-k8s-containerd
/test pull-kops-e2e-cni-cilium
/test pull-kops-verify-staticcheck

Copy link
Member

@rifelpet rifelpet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if the migration from aws_autoscaling_attachment to inline aws_autoscaling_group fields will be seamless or if there will be a period of time during the terraform apply in which the ASG is not registered with the ELB / TG which would result in downtime for the service.

docs/releases/1.19-NOTES.md Show resolved Hide resolved
@rdrgmnzs
Copy link
Contributor Author

rdrgmnzs commented Sep 3, 2020

Terraform plan shows it as a NOOP using terraform 0.12.29 and I saw no downtime while testing.

Copy link
Member

@rifelpet rifelpet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm other than one nit 👍

upup/pkg/fi/cloudup/awstasks/autoscalinggroup.go Outdated Show resolved Hide resolved
@rdrgmnzs
Copy link
Contributor Author

rdrgmnzs commented Sep 8, 2020

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 8, 2020
@rifelpet
Copy link
Member

rifelpet commented Oct 5, 2020

@rdrgmnzs any update on this? looks like its becoming a pretty common issue for terraform users.

@rdrgmnzs
Copy link
Contributor Author

rdrgmnzs commented Oct 5, 2020

Hey @rifelpet I’ve had this on hold because I found instances where the switch causes the API endpoint to have a downtime when migrating from the old format to the new one. Unfortunately I haven’t been able to find a way around that.

What do you think about having a note in the release that indicates folks will most likely experience that when migrating? I’m also open to other suggestions on how to handle this.

@rifelpet
Copy link
Member

rifelpet commented Oct 6, 2020

downtime could be problematic to anyone running a nodeport service and a load balancer defined through terraform.

I wonder if we could use the terraform state commands to modify the resource states. For example, if the upgrade switches from aws_autoscaling_attachment to the load_balancers field of the aws_autoscaling_group, terraform state rm aws_autoscaling_attachment.foo might do the trick.

Similar to the release notes we had for terraform 0.12 support in 1.17 and 1.18.

@bmelbourne
Copy link
Contributor

bmelbourne commented Oct 7, 2020

I came across this issue whilst working on #9940 and I like this solution provided users are using Terraform 0.12+, so I'd suggest merging in Kops v1.20 when Terraform 0.11 is deprecated.

@bmelbourne
Copy link
Contributor

@rdrgmnzs
Can you add Fixes #9891, fixes #9913 to the PR description to give the other contributors an opportunity to review this PR?

@rdrgmnzs
Copy link
Contributor Author

rdrgmnzs commented Oct 27, 2020

/hold cancel

Ok, added instructions on switching over without downtime & fixed nit.

@rdrgmnzs rdrgmnzs removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 27, 2020
Copy link
Member

@rifelpet rifelpet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two minor comments but overall lgtm!

docs/releases/1.19-NOTES.md Outdated Show resolved Hide resolved
pkg/model/awsmodel/autoscalinggroup.go Show resolved Hide resolved
@rifelpet
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 28, 2020
@k8s-ci-robot k8s-ci-robot merged commit f466403 into kubernetes:master Oct 28, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Oct 28, 2020
k8s-ci-robot added a commit that referenced this pull request Oct 29, 2020
Automated cherry pick of #9794 and #10138 onto release-1.19
k8s-ci-robot added a commit that referenced this pull request Nov 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/documentation area/nodeup area/provider/aws Issues or PRs related to aws provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
4 participants