
Kubernetes upgrades to v1.19 are flaky #3564

Closed
vincepri opened this issue Aug 31, 2020 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.
Comments

@vincepri
Member

vincepri commented Aug 31, 2020

During today's session, we manually tested upgrading Kubernetes clusters from v1.18 to v1.19. More often than not, the upgrades did not complete successfully.

From @ncdc:

In Kubernetes 1.19.0, the certificates API (which handles CertificateSigningRequests for node bootstrapping, among other things) introduced version v1, and the kube-controller-manager switched to speaking only v1 of this API.

If you have a control plane with a load balancer in front of it, it is possible (and even likely) that during an upgrade from 1.18 to 1.19 the 1.18 kube-controller-manager is removed (as part of deleting one of the old machines) and the 1.19 kube-controller-manager acquires the leader election lock. It then connects through the load balancer, and there is a chance it gets routed to one of the remaining 1.18 apiservers. If and when this happens, the kube-controller-manager is unable to process any new CertificateSigningRequests, because it asks for certificates v1, which doesn't exist in 1.18. This means all new node bootstrapping fails.
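As an illustration of that failure mode, here is a minimal sketch (not the kube-controller-manager's actual code path) that uses client-go discovery to ask an apiserver whether it serves certificates.k8s.io/v1. Against a 1.18 apiserver the lookup fails, which is the same condition that leaves a 1.19 kube-controller-manager unable to sign new CSRs. The kubeconfig source is whatever KUBECONFIG points at; that is an assumption for the example.

```go
package main

import (
	"fmt"
	"os"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Illustrative: point KUBECONFIG at a kubeconfig that goes through the load balancer.
	config, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		panic(err)
	}
	// Ask the apiserver whether it serves the certificates v1 group/version.
	if _, err := dc.ServerResourcesForGroupVersion("certificates.k8s.io/v1"); err != nil {
		fmt.Println("certificates.k8s.io/v1 not served (expected on a 1.18 apiserver):", err)
		return
	}
	fmt.Println("certificates.k8s.io/v1 is served (1.19+ apiserver)")
}
```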

The solution is to update kubeadm so that the kube-controller-manager, kube-scheduler, and kubelet on control plane nodes talk to the local API endpoint (i.e. the local apiserver static pod) instead of going through the load balancer.
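A quick way to see which endpoint a component is actually using is to inspect the server field in its kubeconfig. The sketch below assumes the standard kubeadm file path /etc/kubernetes/controller-manager.conf; after the kubeadm change the server should be the local apiserver (e.g. https://<node-ip>:6443) rather than the load balancer endpoint.

```go
package main

import (
	"fmt"

	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the controller-manager's kubeconfig and print the server URL(s) it targets.
	cfg, err := clientcmd.LoadFromFile("/etc/kubernetes/controller-manager.conf")
	if err != nil {
		panic(err)
	}
	for name, cluster := range cfg.Clusters {
		fmt.Printf("cluster %q -> server %s\n", name, cluster.Server)
	}
}
```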

For more information, see this Slack thread.

Related to kubernetes/kubeadm#2270 and kubernetes/kubeadm#2271.

/milestone v0.3.10
/kind bug

@k8s-ci-robot
Contributor

@vincepri: The label(s) kind/ cannot be applied, because the repository doesn't have them


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added this to the v0.3.10 milestone Aug 31, 2020
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 31, 2020
@vincepri
Member Author

/close

This has been fixed in Kubernetes v1.19.1.

@k8s-ci-robot
Contributor

@vincepri: Closing this issue.



@chymy
Contributor

chymy commented Apr 13, 2021

Why is upgrading across two minor versions not allowed? Is it related to API compatibility? For example, upgrading from v1.17.9 to v1.19.4:

```go
// Since upgrades to the next minor version are allowed, irrespective of the patch version.
ceilVersion := semver.Version{
	Major: fromVersion.Major,
	Minor: fromVersion.Minor + 2,
	Patch: 0,
}
if toVersion.GTE(ceilVersion) {
	allErrs = append(allErrs,
		field.Forbidden(
			field.NewPath("spec", "version"),
			fmt.Sprintf("cannot update Kubernetes version from %s to %s", previousVersion, in.Spec.Version),
		),
	)
}
return allErrs
}
```
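To make the behaviour of that ceiling concrete, here is a small standalone sketch, assuming the blang/semver package used by the snippet above (the helper name is made up for illustration): v1.17.9 -> v1.19.4 crosses two minor versions, so toVersion is >= the ceiling of 1.19.0 and the update is rejected, while v1.18.x -> v1.19.4 stays below the ceiling of 1.20.0 and is allowed.

```go
package main

import (
	"fmt"

	"github.com/blang/semver"
)

// skipsTooManyMinors mirrors the ceiling check quoted above: the target version
// must stay below fromVersion's minor + 2 (patch level is ignored).
func skipsTooManyMinors(from, to semver.Version) bool {
	ceil := semver.Version{Major: from.Major, Minor: from.Minor + 2, Patch: 0}
	return to.GTE(ceil)
}

func main() {
	to := semver.MustParse("1.19.4")
	fmt.Println(skipsTooManyMinors(semver.MustParse("1.17.9"), to)) // true: forbidden (ceiling 1.19.0)
	fmt.Println(skipsTooManyMinors(semver.MustParse("1.18.2"), to)) // false: allowed (ceiling 1.20.0)
}
```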

@neolit123
Member

neolit123 commented Apr 13, 2021 via email
