-
Notifications
You must be signed in to change notification settings - Fork 38.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kubernetes v1.6.4 -> v1.7.0 update breaks statefulsets #48327
Comments
@CallMeFoxie There are no sig labels on this issue. Please add a sig label by: |
Deleted the first affinity-test-0 POD and this is the difference between new (affinity-test-0.yaml) and old (affinity-test-1.yaml) PODs, a lot has changed! |
/sig apps |
Seeing this issue as well when upgrading from 1.6.4 to 1.7.3. |
This was broken in #44137 That PR straight out removed PodHostnameAnnotation and PodSubdomainAnnotation without leaving a clear migration path for statefulsets. This broke things in at least two ways:
This is an unacceptable migration story for statefulsets. Think of the very real case where a statefulset has a preStop hook or you have an external controller to manage additional state. Think mysql master and needing to do a master/slave swap before restarting the pod. These tools are all broken because the pods are not resolvable by the service or DNS, basically you have to know the IP address and connect directly. |
cc @k82cn |
Hey @liggitt you actually already supported both simultaneously for quite some time and the hostname/subdomain annotations have been deprecated for a while. The problem is that even brand new StatefulSet pods on 1.6.8 were still using the annotations instead of the spec fields. See: https://github.com/kubernetes/kubernetes/blob/v1.6.8/pkg/controller/statefulset/stateful_set_utils.go#L182-L190 Additionally, the problem introduced by not transitioning to the new fields is that all statefulset pods stopped resolving in DNS as soon as the apiservers were upgraded, which is a very serious and surprising operational problem. I think bugs related to not taking updates are much less severe by comparison since they don't cause an immediate application outage. Furthermore, there is absolutely no possible upgrade path other than deleting all Statefulsets before the upgrade and re-creating them after since the hostname and subdomain fields are immutable. Surely, there has to be something better that could be done. This is the change that broke everything: 2899f47#diff-2ad3f4a4088a7249756d1b81af193493L188 It switched StatefulSets to use the spec fields, and removed the code in the getter that would honor the annotations in the same PR, leaving no versions of kubernetes pre-1.7 that would create statefulsets that can be safely upgraded. I think the fix that would help the most people is:
|
FYI @liggitt we used the above admission controller (HubSpot#7) to safely migrate our few hundred statefulsets to the new fields before upgrading. So I see no reason why @PaulFurtado's 1.6.9 suggestion wouldn't be possible. Is there some context we don't get? |
Another proposal being considered:
This should get us back to a safe state as far as I can tell. We would then have time to work out a plan for how to eventually stop respecting the annotation. |
Automatic merge from submit-queue [1.6] StatefulSet: Set Pod hostname/subdomain fields. This is a partial, manual cherrypick of #44137 to provide a path to mitigate #48327. To prepare for upgrading to 1.7+ in which the hostname/subdomain annotations are removed, we should have started setting the corresponding fields in a prior version. Pods that are freshly created by StatefulSet after updating to v1.6.9 will continue to function after upgrading to Kubernetes v1.7.x. ```release-note StatefulSet: Set hostname/subdomain fields on new Pods in addition to the deprecated annotations, to allow mitigation of Pod DNS issues upon upgrading to Kubernetes v1.7.x. ```
#50942 and #51149 should at least allow a viable, safe upgrade path between 1.6.[x>=9] and 1.7.[y>=5] as per the suggestion of @PaulFurtado. I'm going to meet with the API Machinery SIG lead and see if there is more that we can do here. |
So we landed on relaxing the validation in 1.7.x to make fields mutable if empty and modifying the controller to set them if empty. There still may be disruption to the network if you don't go through 1.6.9 and do a rolling recreate of all of the Pods, but at least it should be self healing. |
Automatic merge from submit-queue Manual cherrypick of #51043 StatefulSet controller no longer attempts to mutate v1.PodSpec.Hostname or v1.PodSpec.Subdomain fixes: #51043 Partial fix for #48327 Manual cherrypick of #51044 ```release-note StatefulSet: Fix "forbidden pod updates" error on Pods created prior to upgrading to 1.7. (#48327) ```
v1.6.9 is out now: https://github.com/kubernetes/kubernetes/releases/tag/v1.6.9 |
Automatic merge from submit-queue (batch tested with PRs 51113, 46597, 50397, 51052, 51166) Add statefulset upgrade tests to cluster_upgrade **What this PR does / why we need it**: Adds already created statefulset upgrade tests to cluster_upgrade.go. With further test infra changes, this will allow them to be continuously run, giving better signals. Detect and prevent issues like #48327 **Release note**: ```release-note NONE ```
…-upstream-release-1.6 Automatic merge from submit-queue. Automated cherry pick of #51199 Cherry pick of #51199 on release-1.6. This allows users to completely avoid any disruption due to #48327, by first upgrading to a 1.6.x release containing this patch, before upgrading to 1.7.x. #51199: Makes Hostname and Subdomain fields of v1.PodSpec settable
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.. Add statefulset upgrade tests to be run as part of upgrade testing Statefulset upgrade testing is not running at all in any testsuite. This has caused issues in the past like: #48327 Changing the tag to make it run in existing upgrade test clusters. @krzyzacy @kubernetes/sig-apps-misc @kubernetes/sig-release-members @kow3ns @enisoc
Automatic merge from submit-queue. Allow Hostname and Subdomain to be set if empty **What this PR does / why we need it**: This PR allows the Hostname and Subdomain field of v1.PodSpec to be set when empty, and modifies the StatefulSet controller to set them when empty. For #48327: We have merged #50942 to ensure that the Hostname and Subdomain fields are set when a new Pod is created. Users should upgrade to 1.6.9 and perform a rolling restart of all Pods in their StatefulSets to ensure that these fields are set prior to an upgrade to 1.7.5. We have merged #51149 and #51044 to rollback the attempted mutation introduced in #44137. This PR allows the Hostname and Subdomain field to be set exactly once, so that when users fail to read the notes, and encounter this issue, their clusters should self heal (even though they will experience a temporary network disruption for Pods in their StatefulSets.) ```release-note StatefulSet will now fill the `hostname` and `subdomain` fields if they're empty on existing Pods it owns. This allows it to self-correct the issue where StatefulSet Pod DNS entries disappear after upgrading to v1.7.x (#48327). ```
We've addressed this in the referenced PRs. |
/kind bug
What happened:
Updated from Kubernetes v1.6.4 environment to v1.7.0 environment with existing statefulsets. After seeing some errors regarding to scheduling I noticed repeated event:
It seems that the statefulset is trying to update all its pods but with something that is immutable. Trying to figure out what and my current best guess is serviceAccount (which was not there previously when it was created).
What you expected to happen:
Delete and re-create the pods since it is changing immutable fields
How to reproduce it (as minimally and precisely as possible):
I have logs from controller-manager but they are rather spammy at --v=8 so if you have something to look for - here is what I guessed should be relevant (private info removed): https://gist.github.com/CallMeFoxie/353971434e20a3048b368d3fc835ab76
Environment:
kubectl version
): v1.6.4 -> v1.7.0The text was updated successfully, but these errors were encountered: