Fix RollingUpdate behaviour when using LaunchTemplates for both kops & terraform spec updates #8261

Merged (1 commit) on Jan 4, 2020


KashifSaadat
Contributor

@KashifSaadat KashifSaadat commented Jan 3, 2020

There have been a few regressions in the way InstanceGroups are marked for a RollingUpdate when LaunchTemplates are involved (e.g. when using Mixed Instance Policies). This bug resulted in one of two scenarios that I've seen (with previous attempted fixes regressing one of the scenarios):

  • Updates via kops (kops update cluster --yes) did not set an LT Version, so an Instance Group using a Mixed Instance Policy was always marked as NeedsUpdate when performing a Rolling Update
  • Updates via terraform (kops update cluster --out=tf-files/ --target=terraform) did not mark Instance Groups for a RollingUpdate, as the Version was not picked up and matched between changes

This PR aims to fix both scenarios. I've tested a cluster build with two Instance Groups (one standard, one with a mixed instances policy), via both kops updates and terraform, and in both cases the correct nodes appear to be marked for an update between changes.
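
To make the first scenario concrete, here is a minimal sketch of a version-matching check. It is not the kops implementation: `groupSpec`, `instance`, and `needsUpdate` are hypothetical names invented for illustration, and the only behaviour taken from the description above is that a missing LT Version leaves nothing to match against, so every node gets flagged.

```go
package main

import "fmt"

// groupSpec is a hypothetical, simplified view of what a group is configured
// to launch. It is not a kops or AWS SDK type.
type groupSpec struct {
	TemplateName    string
	TemplateVersion *string // nil models "no LT Version was recorded"
}

// instance is a hypothetical record of what a running node was launched from.
type instance struct {
	ID              string
	TemplateName    string
	TemplateVersion string
}

// needsUpdate reports whether the instance differs from the group's spec.
// With no recorded version there is nothing to match against, so every
// instance is flagged, which mirrors the first scenario in the description.
func needsUpdate(g groupSpec, i instance) bool {
	if g.TemplateVersion == nil {
		return true
	}
	return i.TemplateName != g.TemplateName || i.TemplateVersion != *g.TemplateVersion
}

func main() {
	v2 := "2"
	group := groupSpec{TemplateName: "nodes-mixed", TemplateVersion: &v2}
	nodes := []instance{
		{ID: "i-old", TemplateName: "nodes-mixed", TemplateVersion: "1"},
		{ID: "i-new", TemplateName: "nodes-mixed", TemplateVersion: "2"},
	}
	for _, n := range nodes {
		fmt.Printf("%s needsUpdate=%v\n", n.ID, needsUpdate(group, n))
	}
}
```

The second scenario is the opposite failure: if the recorded version never changes between updates, a check of this shape never flags anything.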

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) and size/S (denotes a PR that changes 10-29 lines, ignoring generated files) labels Jan 3, 2020
@KashifSaadat
Contributor Author

/assign @gambol99

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: KashifSaadat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Jan 3, 2020
@gambol99
Contributor

gambol99 commented Jan 4, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label ("Looks good to me"; indicates that a PR is ready to be merged) Jan 4, 2020
@k8s-ci-robot k8s-ci-robot merged commit c345b93 into kubernetes:master Jan 4, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Jan 4, 2020
@KashifSaadat KashifSaadat deleted the lt-rolling-updates branch January 6, 2020 08:10
@paalkr

paalkr commented Mar 2, 2020

Will this be added to the 1.16 branch, and included in the next 1.16 release?

k8s-ci-robot added a commit that referenced this pull request Mar 31, 2020
…8567-upstream-release-1.16

Automated cherry pick of #8261: Fix RollingUpdate behaviour when using LaunchTemplates for #8567: Treat nil of LaunchTemplateSpecification.Version as $Default
k8s-ci-robot added a commit that referenced this pull request Mar 31, 2020
…8261-#8567-upstream-release-1.17

Automated cherry pick of #8038: Fix Handling of LaunchTemplate Versions for #8261: Fix RollingUpdate behaviour when using LaunchTemplates for #8567: Treat nil of LaunchTemplateSpecification.Version as $Default
kaldorn pushed a commit to getoutreach/kops that referenced this pull request Jul 29, 2020
… when using LaunchTemplates for kubernetes#8567: Treat nil of LaunchTemplateSpecification.Version as  from kubernetes#8808
jaredallard pushed a commit to getoutreach/kops that referenced this pull request Mar 29, 2021
… when using LaunchTemplates for kubernetes#8567: Treat nil of LaunchTemplateSpecification.Version as  from kubernetes#8808
jaredallard pushed a commit to getoutreach/kops that referenced this pull request Apr 14, 2021
Cache LaunchConfigurations

On any given read operation for LCs, warm a thread-safe cache
if needed. Continue to use this cache until a write operation
is performed.
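
A rough sketch of the pattern this commit message describes (warm on first read, reuse until a write) might look like the following. It is an illustration only: `launchConfig`, `lcCache`, `Get`, and `Invalidate` are hypothetical names, and the `fetch` callback stands in for whatever cloud API call lists the LaunchConfigurations.

```go
package main

import (
	"fmt"
	"sync"
)

// launchConfig is a hypothetical stand-in for a LaunchConfiguration record.
type launchConfig struct{ Name, ImageID string }

// lcCache warms itself on the first read and is reused until Invalidate.
type lcCache struct {
	mu    sync.Mutex
	warm  bool
	items map[string]launchConfig
	fetch func() ([]launchConfig, error) // assumed to list every LC
}

func newLCCache(fetch func() ([]launchConfig, error)) *lcCache {
	return &lcCache{fetch: fetch}
}

// Get returns the named entry, filling the cache first if it is cold.
func (c *lcCache) Get(name string) (launchConfig, bool, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.warm {
		all, err := c.fetch()
		if err != nil {
			return launchConfig{}, false, err
		}
		c.items = make(map[string]launchConfig, len(all))
		for _, lc := range all {
			c.items[lc.Name] = lc
		}
		c.warm = true
	}
	lc, ok := c.items[name]
	return lc, ok, nil
}

// Invalidate is meant to be called after any write, so the next read refetches.
func (c *lcCache) Invalidate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.warm = false
	c.items = nil
}

func main() {
	calls := 0
	cache := newLCCache(func() ([]launchConfig, error) {
		calls++
		return []launchConfig{{Name: "nodes", ImageID: "ami-123"}}, nil
	})
	cache.Get("nodes")
	cache.Get("nodes")
	fmt.Println("fetches after two reads:", calls)
	cache.Invalidate()
	cache.Get("nodes")
	fmt.Println("fetches after a write and another read:", calls)
}
```

Holding the mutex across the fetch keeps the sketch simple and guarantees a single warm-up, at the cost of blocking concurrent readers while the list call is in flight.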

Cache AMIs

AMIs can often be the same across different ASGs.
Cache on each fetch for faster lookup later.
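
Unlike the bulk warm-up used for LaunchConfigurations, this reads as per-key memoization: look an image up once, then serve later requests for the same ID from memory. A minimal sketch under that assumption, with hypothetical names (`image`, `imageCache`, `Resolve`) and a `lookup` callback standing in for the real image query:

```go
package main

import (
	"fmt"
	"sync"
)

// image is a hypothetical stand-in for an AMI description.
type image struct{ ID, Name string }

// imageCache memoizes lookups by image ID.
type imageCache struct {
	mu     sync.Mutex
	byID   map[string]image
	lookup func(id string) (image, error) // assumed remote call
}

func newImageCache(lookup func(id string) (image, error)) *imageCache {
	return &imageCache{byID: make(map[string]image), lookup: lookup}
}

// Resolve returns the cached image if present, otherwise looks it up and
// stores the result. Failed lookups are not cached.
func (c *imageCache) Resolve(id string) (image, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if img, ok := c.byID[id]; ok {
		return img, nil
	}
	img, err := c.lookup(id)
	if err != nil {
		return image{}, err
	}
	c.byID[id] = img
	return img, nil
}

func main() {
	calls := 0
	cache := newImageCache(func(id string) (image, error) {
		calls++
		return image{ID: id, Name: "example-image"}, nil
	})
	// Two groups sharing one image trigger a single lookup.
	cache.Resolve("ami-123")
	cache.Resolve("ami-123")
	fmt.Println("lookups:", calls)
}
```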

Cache autoscaling groups

On any given read operation for ASGs, warm a thread-safe cache
if needed. Continue to use this cache until a write operation
is performed.

Don't default adding MIMEBOUNDARY headers when a mixed instances policy is set

Fixed "NeedsUpdate" status of nodes in mixedinstancegroups after rolling update kubernetes#7445

https://github.com/kubernetes/kops/pull/7445/files

Upgrading k8s-srcdst to v.0.2.2

https://github.com/kubernetes/kops/pull/7388/files

Align AWS and kops validation for spot allocation strategy

https://github.com/kubernetes/kops/pull/7660/files

add our calico changes

calico-kube-controllers is required: https://github.com/kubernetes/kops/pull/7517/files

calico-node patch: https://github.com/getoutreach/kube_factory/blob/master/patches/calico/calico-node.yaml

calico-config patch:
https://github.com/getoutreach/kube_factory/blob/master/patches/calico/calico-config.yaml

calico-typha:
https://github.com/getoutreach/kube_factory/blob/master/addons/calico/calico-typha.yaml

calico-kube-controllers:
https://github.com/getoutreach/kube_factory/blob/master/addons/calico/calico-kube-controllers.yaml

Update aws_cloud.go

Patching in capacity-optimized spot allocation strategy and updating AWS SDK

Fix Handling of LaunchTemplate Versions for MixedInstancePolicy according to  kubernetes#8047

Automated cherry pick of kubernetes#8261: Fix RollingUpdate behaviour when using LaunchTemplates for kubernetes#8567: Treat nil of LaunchTemplateSpecification.Version as  from kubernetes#8808

Machine types update from - kubernetes#7947

A4-935 Make CircleCI build pipeline for kops fork

[A4-935](https://outreach-io.atlassian.net/browse/A4-935)

Adds a `.circleci/config.yml` to allow us to reproducibly build and
upload assets for our fork of kops.  This is used mainly to backport
fixes and features into a 0.13-based branch.

The management of this fork is complicated by the fact that kops
configures nodes to go load the `nodeup` binary from a well known URL
managed by vanilla upstream.  We need to have our own S3 bucket with our
own custom built binaries ready for download onto our nodes if we are to
make changes to `nodeup` behavior, which is sometimes necessary for the
features we want to backport.  So this CircleCI build goes through all
the effort of building those assets and uploading them to S3.

Tweak `Gopkg.toml` and run `make dep-ensure`

Updates `Gopkg.toml` to attempt to work around the fact that "goautoneg"
no longer lives at bitbucket.org.  The update process here was very
finicky.  I had to make the update and delete some old generated files
to get `make dep-ensure` to run to completion.

Checks in the results of `make dep-ensure`.  I suspect that last time
there were changes to `Gopkg.toml` in [1] the changes to generated files
were not fully committed and so we've partly lost the ability to build
from this particular fork of kops.

[1] 0984f14

Update gitignore preventing checkin of go-bindata vendor

Upload to path without a `+`

Upload a duplicate copy of our assets to a path that doesn't include a
`+` sign.

Although the S3 issue can be worked around by referencing the path as
`%2B`, it seems `kops`, via the Go `url` package, will aggressively
convert it back into a `+` and not re-encode it.  The kops and Go
behaviors would be fine if S3 followed the spec, but it doesn't.  The
easiest and safest work-around to this whole mess is to just not have
any + signs in our path.
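
One way this can happen with Go's standard `net/url` package is sketched below. It is not taken from the kops source: the bucket host, the version string `1.13.0+abc`, and the helpers `rebuild` and `plusFreePath` are all made up for illustration. Parsing decodes `%2B` into `+` in `URL.Path`, and a URL rebuilt from that decoded path is serialized with a literal `+`, because `+` is a legal path character and is not re-escaped.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// rebuild re-serializes a URL from its decoded parts, the way code that only
// carries u.Path around (and drops u.RawPath) would.
func rebuild(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	out := url.URL{Scheme: u.Scheme, Host: u.Host, Path: u.Path}
	return out.String(), nil
}

// plusFreePath is a hypothetical helper that sidesteps the problem by never
// putting a '+' in the upload path in the first place.
func plusFreePath(p string) string {
	return strings.ReplaceAll(p, "+", "-")
}

func main() {
	in := "https://example-bucket.s3.amazonaws.com/kops/1.13.0%2Babc/nodeup"
	out, err := rebuild(in)
	if err != nil {
		panic(err)
	}
	fmt.Println("input:  ", in)
	fmt.Println("rebuilt:", out)
	fmt.Println("safe:   ", plusFreePath("kops/1.13.0+abc/nodeup"))
}
```

Replacing `+` with `-` is just one possible naming choice; the commit message only says the duplicate path avoids `+`, not what it uses instead.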

Expose API Server flags needed for aws pod identities

This adds the fields described in the documentation here:

https://github.com/aws/amazon-eks-pod-identity-webhook/blob/master/SELF_HOSTED_SETUP.md#kubernetes-api-server-configuration

Update k8s-1.12.yaml.template

fix: calico

Merge pull request #12 from getoutreach/fix-calico

fix: calico
jaredallard pushed a commit to getoutreach/kops that referenced this pull request Apr 19, 2021