[job failed] 1.9-master upgrade|downgrade jobs #60764

Closed
krzyzacy opened this Issue Mar 5, 2018 · 67 comments

Comments

@krzyzacy
Member

krzyzacy commented Mar 5, 2018

this is an umbrella issue for k8s-testgrid.appspot.com/sig-release-master-upgrade

there are multiple test failures, and I will open individual issues for each failing test

/priority failing-test
/priority critical-urgent
/kind bug
/status approved-for-milestone
/sig cluster-lifecycle
/sig gcp

cc @jdumars @jberkus
and also cc @krousey who's our upgrade expert

@krzyzacy

Member

krzyzacy commented Mar 5, 2018

/milestone v1.10

@krzyzacy

Member

krzyzacy commented Mar 5, 2018

cc @kubernetes/sig-cluster-lifecycle-bugs
I think sig-cluster-lifecycle will want to triage the downgrade suite, which is timing out during the downgrade
/assign @roberthbailey @luxas @lukemarsden @jbeda
Assigning you folks for now; please reassign as appropriate

@krousey

Member

krousey commented Mar 5, 2018

Downgrades are failing to delete the stateful set test's namespace because PVCs remain. All the other tests are actually passing.

@krzyzacy

Member

krzyzacy commented Mar 5, 2018

@krousey do you know who I need to bug next?

@krousey

Member

krousey commented Mar 5, 2018

In particular: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-master-new-downgrade-cluster-parallel/1910

I0305 18:18:01.917] STEP: Destroying namespace "e2e-tests-sig-apps-statefulset-upgrade-86ffb" for this suite.
I0305 18:28:01.928] Mar  5 18:28:01.928: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
I0305 18:28:02.233] Mar  5 18:28:02.233: INFO: namespace: e2e-tests-sig-apps-statefulset-upgrade-86ffb, resource: bindings, ignored listing per whitelist
I0305 18:28:02.237] Mar  5 18:28:02.236: INFO: namespace: e2e-tests-sig-apps-statefulset-upgrade-86ffb, resource: persistentvolumeclaims, items remaining: 3
I0305 18:28:02.244] Mar  5 18:28:02.243: INFO: namespace: e2e-tests-sig-apps-statefulset-upgrade-86ffb, DeletionTimetamp: 2018-03-05 18:18:01 +0000 UTC, Finalizers: [kubernetes], Phase: Terminating
I0305 18:28:02.246] Mar  5 18:28:02.246: INFO: namespace: e2e-tests-sig-apps-statefulset-upgrade-86ffb, total namespaces: 12, active: 11, terminating: 1
I0305 18:28:02.249] Mar  5 18:28:02.249: INFO: Couldn't delete ns: "e2e-tests-sig-apps-statefulset-upgrade-86ffb": namespace e2e-tests-sig-apps-statefulset-upgrade-86ffb was not deleted with limit: timed out waiting for the condition, namespaced content other than pods remain (&errors.errorString{s:"namespace e2e-tests-sig-apps-statefulset-upgrade-86ffb was not deleted with limit: timed out waiting for the condition, namespaced content other than pods remain"})
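
To spot which resource types are blocking a namespace in a log like the one above, the "items remaining" lines can be pulled out mechanically. This is only a minimal sketch: the regular expression is inferred from the two line shapes visible in this log, and the helper name `remaining_resources` is hypothetical, not part of any Kubernetes tooling.

```python
import re

# Pattern inferred from the log above (an assumption, not a documented format):
#   "... INFO: namespace: <ns>, resource: <resource>, items remaining: <n>"
REMAINING = re.compile(
    r"namespace: (?P<ns>[^,\s]+), resource: (?P<resource>[^,\s]+), "
    r"items remaining: (?P<count>\d+)"
)


def remaining_resources(log_text):
    """Return {(namespace, resource): count} for every 'items remaining' line."""
    found = {}
    for line in log_text.splitlines():
        match = REMAINING.search(line)
        if match:
            key = (match.group("ns"), match.group("resource"))
            found[key] = int(match.group("count"))
    return found


sample = (
    "I0305 18:28:02.233] Mar  5 18:28:02.233: INFO: namespace: "
    "e2e-tests-sig-apps-statefulset-upgrade-86ffb, resource: bindings, "
    "ignored listing per whitelist\n"
    "I0305 18:28:02.237] Mar  5 18:28:02.236: INFO: namespace: "
    "e2e-tests-sig-apps-statefulset-upgrade-86ffb, resource: "
    "persistentvolumeclaims, items remaining: 3\n"
)

for (ns, resource), count in remaining_resources(sample).items():
    print(f"{ns}: {count} {resource} left")
```

Lines that only report a whitelisted resource (such as `bindings`) do not match and are skipped.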
@krousey

Member

krousey commented Mar 5, 2018

@krzyzacy We should pull in someone from storage and maybe someone from apps.

@krzyzacy

Member

krzyzacy commented Mar 5, 2018

/sig storage
/sig apps

cc @kow3ns @saad-ali ^^

@kow3ns kow3ns added this to Backlog in Workloads Mar 5, 2018

@kow3ns kow3ns moved this from Backlog to In Progress in Workloads Mar 6, 2018

@pospispa

Contributor

pospispa commented Mar 18, 2018

ACK. Waiting on reviews and approvals (PRs #61282 and #61316)
ETA: depends on reviews and approvals
Risks: N/A

pospispa added a commit to pospispa/kubernetes that referenced this issue Mar 18, 2018

Always Start pvc-protection-controller and pv-protection-controller
After K8s 1.10 is upgraded to K8s 1.11, the finalizer [kubernetes.io/pvc-protection] is added to PVCs
because the StorageObjectInUseProtection feature will be GA in K8s 1.11.
However, when K8s 1.11 is downgraded to K8s 1.10 and the StorageObjectInUseProtection feature is disabled,
the finalizers remain on the PVCs. Because pvc-protection-controller is not started in K8s 1.10, the finalizers
are not removed automatically from deleted PVCs, so deleted PVCs are not removed from the system
but remain in the Terminating phase.
The same applies to pv-protection-controller and the [kubernetes.io/pv-protection] finalizer on PVs.

Therefore pvc-protection-controller is always started, so that it can remove finalizers
from PVCs automatically when a PVC is not in active use by a pod.
Likewise, pv-protection-controller is always started to remove finalizers from PVs automatically when a PV is not
Bound to a PVC.

Related issue: kubernetes#60764
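
Why a leftover finalizer keeps a deleted PVC in Terminating can be shown with a toy model. This is a sketch of the general finalizer rule only (an object marked for deletion goes away once its finalizer list is empty); the `Obj` and `Store` classes are hypothetical stand-ins, not Kubernetes code.

```python
from dataclasses import dataclass, field

PVC_FINALIZER = "kubernetes.io/pvc-protection"


@dataclass
class Obj:
    name: str
    finalizers: list = field(default_factory=list)
    deleting: bool = False  # stands in for metadata.deletionTimestamp being set


class Store:
    """Toy stand-in for the API server's deletion handling (hypothetical)."""

    def __init__(self):
        self.objects = {}

    def add(self, obj):
        self.objects[obj.name] = obj

    def delete(self, name):
        # A delete request only marks the object; removal happens in _reap.
        self.objects[name].deleting = True
        self._reap(name)

    def remove_finalizer(self, name, finalizer):
        obj = self.objects[name]
        obj.finalizers = [f for f in obj.finalizers if f != finalizer]
        self._reap(name)

    def _reap(self, name):
        obj = self.objects[name]
        # The object disappears only when it is marked AND has no finalizers.
        if obj.deleting and not obj.finalizers:
            del self.objects[name]


store = Store()
store.add(Obj("data-0", finalizers=[PVC_FINALIZER]))
store.delete("data-0")
print("after delete:", "data-0" in store.objects)
store.remove_finalizer("data-0", PVC_FINALIZER)
print("after finalizer removal:", "data-0" in store.objects)
```

With no controller running to call the equivalent of `remove_finalizer`, the object stays in the first state indefinitely, which matches the stuck namespace seen in the downgrade job.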

@pospispa

Contributor

pospispa commented Mar 18, 2018

In K8s 1.10 before K8s 1.11 release:

  1. Modify the pvc-protection-controller to run in finalizer deletion only mode if disabled.
  2. Modify pv-protection-controller to run in finalizer deletion only mode if disabled.

I've created PR #61324 for the above.
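
The "finalizer deletion only" mode described above can be sketched as a single reconcile function. This is an illustration under assumptions: `sync_pvc`, `protection_enabled` and `used_by_pods` are hypothetical names, and the thread does not say which component adds the finalizer when the feature is enabled, so the "added" branch is folded in here purely for contrast.

```python
from dataclasses import dataclass, field

PVC_FINALIZER = "kubernetes.io/pvc-protection"


@dataclass
class PVC:
    name: str
    finalizers: list = field(default_factory=list)
    deleting: bool = False  # deletion has been requested
    used_by_pods: set = field(default_factory=set)  # hypothetical usage index


def sync_pvc(pvc, protection_enabled):
    """Run one reconcile pass and return the action taken."""
    has_finalizer = PVC_FINALIZER in pvc.finalizers

    # Removal runs regardless of the feature setting: this is the
    # "finalizer deletion only" behaviour when the feature is disabled.
    if pvc.deleting and has_finalizer and not pvc.used_by_pods:
        pvc.finalizers.remove(PVC_FINALIZER)
        return "removed"

    # Adding protection happens only when the feature is enabled
    # (assumption made for illustration, see the note above).
    if protection_enabled and not pvc.deleting and not has_finalizer:
        pvc.finalizers.append(PVC_FINALIZER)
        return "added"

    return "noop"


stale = PVC("data-0", finalizers=[PVC_FINALIZER], deleting=True)
print(sync_pvc(stale, protection_enabled=False), stale.finalizers)
```

The point of the design is that a disabled feature stops new finalizers from appearing but still cleans up the ones left behind by a newer release.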

pospispa added a commit to pospispa/kubernetes that referenced this issue Mar 19, 2018

Always Start pvc-protection-controller and pv-protection-controller
After K8s 1.10 is upgraded to K8s 1.11, the finalizer [kubernetes.io/pvc-protection] is added to PVCs
because the StorageObjectInUseProtection feature will be GA in K8s 1.11.
However, when K8s 1.11 is downgraded to K8s 1.10 and the StorageObjectInUseProtection feature is disabled,
the finalizers remain on the PVCs. Because pvc-protection-controller is not started in K8s 1.10, the finalizers
are not removed automatically from deleted PVCs, so deleted PVCs are not removed from the system
but remain in the Terminating phase.
The same applies to pv-protection-controller and the [kubernetes.io/pv-protection] finalizer on PVs.

Therefore pvc-protection-controller is always started, so that it can remove finalizers
from PVCs automatically when a PVC is not in active use by a pod.
Likewise, pv-protection-controller is always started to remove finalizers from PVs automatically when a PV is not
Bound to a PVC.

Related issue: kubernetes#60764

gnufied added a commit to gnufied/kubernetes that referenced this issue Mar 19, 2018

Always Start pvc-protection-controller
After K8s 1.9 is upgraded to K8s 1.10, the finalizer [kubernetes.io/pvc-protection] is added to PVCs
because the StorageObjectInUseProtection feature is enabled by default in K8s 1.10.
However, when K8s 1.10 is downgraded to K8s 1.9, the finalizers remain on the PVCs. Because pvc-protection-controller
is not started by default in K8s 1.9, the finalizers are not removed automatically from deleted PVCs, so
deleted PVCs are not removed but remain in the Terminating phase.

Therefore pvc-protection-controller is always started, so that it can remove finalizers
from PVCs automatically when a PVC is not in active use by a pod.

Related issue: kubernetes#60764

gnufied added a commit to gnufied/kubernetes that referenced this issue Mar 19, 2018

Backport pv-protection-controller Finalizer Removal Part
After K8s 1.9 is upgraded to K8s 1.10, the finalizer [kubernetes.io/pv-protection] is added to PVs
because the StorageObjectInUseProtection feature is enabled by default in K8s 1.10.
However, when K8s 1.10 is downgraded to K8s 1.9, the finalizers remain on the PVs. Because pv-protection-controller
does not exist in K8s 1.9, PV finalizers are not removed automatically from deleted PVs, so
deleted PVs remain in the system.

Therefore the finalizer-removing part of pv-protection-controller is backported from K8s 1.10 in order to remove
finalizers automatically when a PV is deleted and is not Bound to a PVC.

Related issue: kubernetes#60764
Related pv-protection-controller PR: kubernetes#58743
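
The PV rule stated in this commit message (remove the finalizer when a PV is deleted and is not Bound to a PVC) reduces to a small predicate. This is a hedged sketch, not the backported code; the function name `pv_finalizers_after_sync` and its plain-argument shape are assumptions made for illustration.

```python
PV_FINALIZER = "kubernetes.io/pv-protection"


def pv_finalizers_after_sync(finalizers, deleting, phase):
    """Return the finalizer list a PV should carry after one pass.

    finalizers: current finalizer names on the PV
    deleting:   True when deletion of the PV has been requested
    phase:      the PV's status phase, e.g. "Available", "Bound", "Released"
    """
    if deleting and phase != "Bound":
        # Safe to let the PV go: drop only the protection finalizer.
        return [f for f in finalizers if f != PV_FINALIZER]
    # Still bound, or not being deleted: leave the list untouched.
    return list(finalizers)


print(pv_finalizers_after_sync([PV_FINALIZER], deleting=True, phase="Released"))
print(pv_finalizers_after_sync([PV_FINALIZER], deleting=True, phase="Bound"))
```

Other finalizers on the PV are preserved, so the PV is only removed once every owner of a finalizer has released it.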

@k8s-merge-robot

Contributor

k8s-merge-robot commented Mar 20, 2018

[MILESTONENOTIFIER] Milestone Issue: Up-to-date for process

@krzyzacy @lukemarsden @luxas @roberthbailey

Issue Labels
  • sig/apps sig/cluster-lifecycle sig/gcp sig/storage: Issue will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move issue out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.
@jberkus

jberkus commented Mar 21, 2018

@krzyzacy @childsb @saad-ali where are we on this? I thought we had a plan, and it was executed, and the remaining open PRs were for the future 1.9 patches. What's waiting to be done so that we can move this issue out of the milestone?

@liggitt

Member

liggitt commented Mar 21, 2018

@liggitt

Member

liggitt commented Mar 21, 2018

some tests are still flaking in the job, but no solid red tests

@krzyzacy

Member

krzyzacy commented Mar 21, 2018

I think we can close this issue, and I'll keep an eye there and open a few more flake issues

@krzyzacy krzyzacy closed this Mar 21, 2018

Workloads automation moved this from In Progress to Done Mar 21, 2018

pospispa added a commit to pospispa/kubernetes that referenced this issue Apr 20, 2018

Always Start pvc-protection-controller and pv-protection-controller
After K8s 1.10 is upgraded to K8s 1.11, the finalizer [kubernetes.io/pvc-protection] is added to PVCs
because the StorageObjectInUseProtection feature will be GA in K8s 1.11.
However, when K8s 1.11 is downgraded to K8s 1.10 and the StorageObjectInUseProtection feature is disabled,
the finalizers remain on the PVCs. Because pvc-protection-controller is not started in K8s 1.10, the finalizers
are not removed automatically from deleted PVCs, so deleted PVCs are not removed from the system
but remain in the Terminating phase.
The same applies to pv-protection-controller and the [kubernetes.io/pv-protection] finalizer on PVs.

Therefore pvc-protection-controller is always started, so that it can remove finalizers
from PVCs automatically when a PVC is not in active use by a pod.
Likewise, pv-protection-controller is always started to remove finalizers from PVs automatically when a PV is not
Bound to a PVC.

Related issue: kubernetes#60764

k8s-merge-robot added a commit that referenced this issue Apr 21, 2018

Merge pull request #61324 from pospispa/60764-K8s-1.10-StorageObjectI…
…nUseProtection-downgrade-issue

Automatic merge from submit-queue (batch tested with PRs 61324, 62880, 62765). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Always Start pvc-protection-controller and pv-protection-controller

**What this PR does / why we need it**:
After K8s 1.10 is upgraded to K8s 1.11, the finalizer `[kubernetes.io/pvc-protection]` is added to PVCs
because the `StorageObjectInUseProtection` feature will be GA in K8s 1.11.
However, when K8s 1.11 is downgraded to K8s 1.10 and the `StorageObjectInUseProtection` feature is disabled, the finalizers remain on the PVCs. Because `pvc-protection-controller` is not started in K8s 1.10, the finalizers are not removed automatically from deleted PVCs, so deleted PVCs are not removed from the system but remain in the `Terminating` phase.
The same applies to `pv-protection-controller` and the `[kubernetes.io/pv-protection]` finalizer on PVs.

Therefore `pvc-protection-controller` is always started, so that it can remove finalizers from PVCs automatically when a PVC is not in active use by a pod.
Likewise, `pv-protection-controller` is always started to remove finalizers from PVs automatically when a PV is not `Bound` to a PVC.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes N/A
This issue #60764 is for downgrade from K8s 1.10 to K8s 1.9.
This PR fixes the same problem but for downgrade from K8s 1.11 to K8s 1.10.

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```

pospispa added a commit to pospispa/kubernetes that referenced this issue Apr 21, 2018

Always Start pvc-protection-controller and pv-protection-controller
The StorageObjectInUseProtection feature is enabled by default in K8s 1.10+. Assume a K8s cluster is used with this feature enabled, i.e. finalizers are added to all PVs and PVCs. If the cluster admin then disables the StorageObjectInUseProtection feature and a user deletes a PVC that is not in active use by a pod, the PVC is not removed from the system because of the finalizer. The user therefore has to remove the finalizer manually to have the PVC removed. Note: deleted PVs are likewise not removed from the system because of their finalizers.

Therefore pvc-protection-controller is always started, so that it can remove finalizers from PVCs automatically when a PVC is not in active use by a pod. Likewise, pv-protection-controller is always started to remove finalizers from PVs automatically when a PV is not Bound to a PVC.

Related issue:
kubernetes#60764

Related PRs:
kubernetes#61370
kubernetes#61324

k8s-merge-robot added a commit that referenced this issue Jun 4, 2018

Merge pull request #62938 from pospispa/60764-StorageObjectInUseProte…
…ction-downgrade-issue-cherry-pick-into-K8s-1.10

Automatic merge from submit-queue.

cherry-pick into K8s 1.10: Always Start pvc-protection-controller and pv-protection-controller

**What this PR does / why we need it**:
The StorageObjectInUseProtection feature is enabled by default in K8s 1.10+. Assume a K8s cluster is used with this feature enabled, i.e. finalizers are added to all PVs and PVCs. If the cluster admin then disables the StorageObjectInUseProtection feature and a user deletes a PVC that is not in active use by a pod, the PVC is not removed from the system because of the finalizer. The user therefore has to remove the finalizer manually to have the PVC removed. Note: deleted PVs are likewise not removed from the system because of their finalizers.

This problem was fixed in [K8s 1.9.6](https://github.com/kubernetes/kubernetes/releases/tag/v1.9.6) in PR #61370.
This problem is also fixed in K8s 1.11+ in PR #61324.
However, this problem is not fixed in K8s 1.10, which is why I've cherry-picked PR #61324 and am proposing to merge it into K8s 1.10.
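
For clusters without the fix, the manual finalizer removal mentioned above amounts to patching `metadata.finalizers` on the stuck object. This sketch only builds a JSON merge patch body; the helper name `finalizer_removal_patch` is hypothetical, and how the body is sent to the API server is left out on purpose.

```python
import json


def finalizer_removal_patch(current_finalizers, finalizer):
    """Build a JSON merge patch body that drops one finalizer.

    A JSON merge patch replaces a list wholesale, so the body carries the
    full list of finalizers that should remain. If none remain, the field
    is set to null, which in a merge patch means "delete this field".
    """
    remaining = [f for f in current_finalizers if f != finalizer]
    return json.dumps({"metadata": {"finalizers": remaining or None}})


body = finalizer_removal_patch(
    ["kubernetes.io/pvc-protection"], "kubernetes.io/pvc-protection"
)
print(body)
```

Because the list is replaced as a whole, a patch built from a stale copy of the object could drop a finalizer added by someone else in the meantime, so the current list should be read immediately before patching.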

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes N/A

Related issue: #60764

**Special notes for your reviewer**:

**Release note**:

```release-note
If the StorageObjectInUseProtection feature is disabled and a Persistent Volume (PV) or Persistent Volume Claim (PVC) that contains a finalizer is deleted, it was not automatically removed from the system. Now it is automatically removed.
```

KIVagant added a commit to KIVagant/kubernetes that referenced this issue Jul 4, 2018

Always Start pvc-protection-controller and pv-protection-controller
After K8s 1.10 is upgraded to K8s 1.11, the finalizer [kubernetes.io/pvc-protection] is added to PVCs
because the StorageObjectInUseProtection feature will be GA in K8s 1.11.
However, when K8s 1.11 is downgraded to K8s 1.10 and the StorageObjectInUseProtection feature is disabled,
the finalizers remain on the PVCs. Because pvc-protection-controller is not started in K8s 1.10, the finalizers
are not removed automatically from deleted PVCs, so deleted PVCs are not removed from the system
but remain in the Terminating phase.
The same applies to pv-protection-controller and the [kubernetes.io/pv-protection] finalizer on PVs.

Therefore pvc-protection-controller is always started, so that it can remove finalizers
from PVCs automatically when a PVC is not in active use by a pod.
Likewise, pv-protection-controller is always started to remove finalizers from PVs automatically when a PV is not
Bound to a PVC.

Related issue: kubernetes#60764