
Upgrade k8s-infra prow build clusters from v1.14 to v1.15 #1120

Closed
spiffxp opened this issue Aug 7, 2020 · 27 comments
Assignees: spiffxp
Labels: area/prow, sig/testing

Comments

@spiffxp
Member

spiffxp commented Aug 7, 2020

v1.14 deprecation was announced here: https://cloud.google.com/kubernetes-engine/docs/release-notes#coming-soon-20200722

The clusters should have been automatically upgraded: https://cloud.google.com/kubernetes-engine/docs/release-notes#scheduled_automatic_upgrades

This issue is to confirm whether they have been, and if not, initiate such an upgrade
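A quick way to confirm from the CLI, as a sketch (the cluster name and region here are assumptions; only the project matches the shell prompts later in this thread):

# Check what the control plane and each node pool are actually running
gcloud container clusters describe prow-build \
  --project=k8s-infra-prow-build --region=us-central1 \
  --format='value(currentMasterVersion)'
gcloud container node-pools list \
  --cluster=prow-build --project=k8s-infra-prow-build --region=us-central1 \
  --format='table(name,version)'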

/area prow
/wg k8s-infra
/sig testing

@k8s-ci-robot added the area/prow, wg/k8s-infra, and sig/testing labels Aug 7, 2020
@spiffxp
Member Author

spiffxp commented Aug 8, 2020

We're sitting on 1.14.10-gke.42 for the control plane, and 1.14.10-gke.37 for the main node pool

@spiffxp
Member Author

spiffxp commented Aug 31, 2020

We're still sitting on 1.14. I am waiting until we've drained the bulk of outstanding v1.20 PRs (ref: https://groups.google.com/g/kubernetes-dev/c/YXGBa6pxLzo/discussion) before explicitly triggering this.

There's still a chance it'll happen when we're not watching though.

@spiffxp
Member Author

spiffxp commented Sep 11, 2020

Upgrading k8s-infra-prow-build's control plane from 1.14.10-gke.42 to 1.15.12-gke.17
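(Roughly the command for that, as a sketch — the cluster name and region are assumptions, the target version is the one above:)

gcloud container clusters upgrade prow-build \
  --project=k8s-infra-prow-build --region=us-central1 \
  --master --cluster-version=1.15.12-gke.17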

@spiffxp
Member Author

spiffxp commented Sep 11, 2020

Control plane upgraded, next the greenhouse nodepool. This may disrupt bazel-based jobs, though they should fall back to not using the cache when it's unavailable.

@spiffxp
Member Author

spiffxp commented Sep 11, 2020

Greenhouse nodepool upgraded. Waiting until kubernetes/test-infra#19182 (comment) is resolved before proceeding with the main nodepool

@spiffxp
Member Author

spiffxp commented Sep 11, 2020

/assign
OK, prow was reverted and sinker is behaving as expected again.

@spiffxp
Member Author

spiffxp commented Sep 11, 2020

Upgrading k8s-infra-prow-build's default node pool from 1.14.10-gke.42 to 1.15.12-gke.17
(4-5 min per node, currently 41 nodes... check back in 2 hours)
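(For reference, the node pool upgrade is roughly the following, as a sketch — pool name and region are assumptions. GKE drains and recreates the nodes one at a time, which is where the 4-5 min per node comes from:)

gcloud container clusters upgrade prow-build \
  --project=k8s-infra-prow-build --region=us-central1 \
  --node-pool=pool4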

@spiffxp
Member Author

spiffxp commented Sep 11, 2020

29/41 nodes done

@spiffxp
Member Author

spiffxp commented Sep 11, 2020

And everything is at 1.15.12-gke.17 after ~4 hours

... so I think that node-by-node upgrade and cluster-autoscaling aren't the best of friends. I suspect this would have gone more quickly if we had spun up an entirely new nodepool that was on 1.15.12 to begin with, and cordoned the old nodepool.
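For next time, that would look roughly like this (a sketch; the new pool name, sizes, and region are assumptions):

# Create a replacement pool already at the target version
gcloud container node-pools create pool5 \
  --cluster=prow-build --project=k8s-infra-prow-build --region=us-central1 \
  --node-version=1.15.12-gke.17 \
  --num-nodes=10 --enable-autoscaling --min-nodes=5 --max-nodes=80

# Keep new pods off the old pool, then drain it at our own pace
kubectl cordon -l cloud.google.com/gke-nodepool=pool4
kubectl drain -l cloud.google.com/gke-nodepool=pool4 \
  --ignore-daemonsets --delete-local-data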

I would like to move up to v1.16 next but I think we'll leave it here for the weekend

@spiffxp
Member Author

spiffxp commented Sep 11, 2020

As a rough guess of "how disruptive was this?" I filtered down to kubernetes/kubernetes jobs on prow.k8s.io. Accepting that some presubmits are just gonna fail (on top of whatever flakiness may be out there), this looks reasonably non-disruptive. Specifically I'm looking at the left half of the graph (now - 6h ago)
[screenshot: Screen Shot 2020-09-11 at 4.09.49 PM]

Another guess - look at the plank dashboard. No increase in jobs hitting failure state
(https://monitoring.prow.k8s.io/d/e1778910572e3552a935c2035ce80369/plank-dashboard?orgId=1&from=now-12h&to=now&var-cluster=k8s-infra-prow-build&var-org=All&var-repo=All&var-state=$__all&var-type=$__all&var-group_by_1=type&var-group_by_2=state&var-group_by_3=cluster)
[screenshot: Screen Shot 2020-09-11 at 4.16.46 PM]

@spiffxp
Member Author

spiffxp commented Sep 11, 2020

/close

@k8s-ci-robot
Contributor

@spiffxp: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@spiffxp
Member Author

spiffxp commented Sep 12, 2020

/reopen
On the other hand, @tpepper has noticed a number of jobs seem to be stuck in running for a while
e.g. https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/94470/pull-kubernetes-node-e2e/1304501511241338880/ says "7h" in progress

https://prow.k8s.io/tide-history?repo=kubernetes%2Fkubernetes&branch=master shows tide last issued a TRIGGER action around 1:50pm PT

@k8s-ci-robot
Contributor

@spiffxp: Reopened this issue.

In response to this:

/reopen
On the other hand, @tpepper has noticed a number of jobs seem to be stuck in running for a while
e.g. https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/94470/pull-kubernetes-node-e2e/1304501511241338880/ says "7h" in progress

https://prow.k8s.io/tide-history?repo=kubernetes%2Fkubernetes&branch=master shows tide last issued a TRIGGER action around 1:50pm PT

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot reopened this Sep 12, 2020
@spiffxp
Member Author

spiffxp commented Sep 12, 2020

looking at that node e2e job, I get this for prowjob yaml: https://prow.k8s.io/prowjob?prowjob=a56a5463-f464-11ea-a7c8-9eb8089ce657

pulling some useful fields from that

cluster: k8s-infra-prow-build
pod_name: a56a5463-f464-11ea-a7c8-9eb8089ce657

let's go look at the build cluster

spiffxp@cloudshell:~ (k8s-infra-prow-build)$ k get pods -n test-pods | grep a56a5463-f464-11ea-a7c8-9eb8089ce657
a56a5463-f464-11ea-a7c8-9eb8089ce657   1/1     Terminating   0          7h43m

let's see, are there other pods stuck in Terminating status?

$ k get pods -n test-pods | grep Terminating
01c66c46-f46d-11ea-8508-9a96569b20f1   1/1     Terminating   0          6h45m
14aea8d7-f45d-11ea-8508-9a96569b20f1   2/2     Terminating   0          8h
1c10ce48-f475-11ea-8508-9a96569b20f1   2/2     Terminating   0          5h47m
1fee4dd0-f46e-11ea-8508-9a96569b20f1   2/2     Terminating   0          6h37m
30bd3711-f46b-11ea-8508-9a96569b20f1   1/1     Terminating   0          6h58m
31079e2c-f46e-11ea-9130-56f34bfb1616   1/1     Terminating   0          6h37m
310be890-f46e-11ea-9130-56f34bfb1616   1/1     Terminating   0          6h37m
36910195-f46a-11ea-8508-9a96569b20f1   1/1     Terminating   0          7h5m
3c38c3b4-f469-11ea-8508-9a96569b20f1   1/1     Terminating   0          7h12m
3ca1a5cf-f45a-11ea-8233-42bc8ee613a9   1/1     Terminating   0          8h
44edcb66-f461-11ea-9284-967bb86a2b2f   1/1     Terminating   0          8h
5009aedd-f45e-11ea-8c01-ca1150a2ee58   1/1     Terminating   0          8h
50103df1-f45e-11ea-8c01-ca1150a2ee58   1/1     Terminating   0          8h
51784dc1-f450-11ea-8233-42bc8ee613a9   1/1     Terminating   0          10h
533d30e5-f456-11ea-8233-42bc8ee613a9   1/1     Terminating   0          9h
5681a11b-f471-11ea-8508-9a96569b20f1   1/1     Terminating   0          6h14m
584553d6-f477-11ea-8508-9a96569b20f1   1/1     Terminating   0          5h31m
6101bc0d-f470-11ea-a7c8-9eb8089ce657   1/1     Terminating   0          6h21m
6105dc9e-f470-11ea-a7c8-9eb8089ce657   1/1     Terminating   0          6h21m
6ca76e16-f46e-11ea-a7c8-9eb8089ce657   2/2     Terminating   0          6h35m
6ef7f131-f473-11ea-8508-9a96569b20f1   1/1     Terminating   0          5h59m
70f9109e-f466-11ea-8508-9a96569b20f1   2/2     Terminating   0          7h32m
7de725fe-f46a-11ea-8508-9a96569b20f1   2/2     Terminating   0          7h3m
7ff7423f-f45d-11ea-8508-9a96569b20f1   2/2     Terminating   0          8h
826a3f3c-f454-11ea-8233-42bc8ee613a9   1/1     Terminating   0          9h
84f080c7-f471-11ea-9130-56f34bfb1616   2/2     Terminating   0          6h13m
8ed24720-f47a-11ea-8508-9a96569b20f1   1/1     Terminating   0          5h8m
a56a5463-f464-11ea-a7c8-9eb8089ce657   1/1     Terminating   0          7h45m
a56f764b-f464-11ea-a7c8-9eb8089ce657   2/2     Terminating   0          7h45m
a92edf86-f46f-11ea-8508-9a96569b20f1   1/1     Terminating   0          6h26m
acf9b650-f468-11ea-8508-9a96569b20f1   1/1     Terminating   0          7h16m
acfc2c7d-f468-11ea-8508-9a96569b20f1   2/2     Terminating   0          7h16m
b2a12f4d-f467-11ea-8508-9a96569b20f1   1/1     Terminating   0          7h23m
b696febc-f460-11ea-8508-9a96569b20f1   1/1     Terminating   0          8h
c8fb882c-f475-11ea-9130-56f34bfb1616   1/1     Terminating   0          5h42m
c978d235-f463-11ea-8508-9a96569b20f1   1/1     Terminating   0          7h51m
cbf83c22-f477-11ea-9130-56f34bfb1616   2/2     Terminating   0          5h28m
dfa5f616-f437-11ea-a379-bea988342348   1/1     Terminating   0          13h
eb769c01-f45d-11ea-8508-9a96569b20f1   1/1     Terminating   0          8h
f18dab5a-f476-11ea-9284-967bb86a2b2f   1/1     Terminating   0          5h34m
f1c2fa92-f476-11ea-9284-967bb86a2b2f   2/2     Terminating   0          5h34m
f2f1c361-f462-11ea-8508-9a96569b20f1   2/2     Terminating   0          7h57m
f65273f6-f470-11ea-9130-56f34bfb1616   2/2     Terminating   0          6h17m
fa67cfe1-f467-11ea-8508-9a96569b20f1   2/2     Terminating   0          7h21m

Possible mitigations:

  • try running /test foo on the afflicted PR(s) and see if that triggers a new job
  • delete the pod, see if plank reschedules

Also:

  • why are things stuck in terminating?

Definitely think migrating to a new node pool is the upgrade path to use next time

@spiffxp
Member Author

spiffxp commented Sep 12, 2020

ok, does manually deleting do any better?

spiffxp@cloudshell:~ (k8s-infra-prow-build)$ k delete -n test-pods pod a56a5463-f464-11ea-a7c8-9eb8089ce657
pod "a56a5463-f464-11ea-a7c8-9eb8089ce657" deleted
# hangs...
^C
spiffxp@cloudshell:~ (k8s-infra-prow-build)$ k get pods -n test-pods | grep a56a5463-f464-11ea-a7c8-9eb8089ce657
a56a5463-f464-11ea-a7c8-9eb8089ce657   1/1     Terminating   0          8h
spiffxp@cloudshell:~ (k8s-infra-prow-build)$ k delete -n test-pods pod a56a5463-f464-11ea-a7c8-9eb8089ce657 --grace-period=0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "a56a5463-f464-11ea-a7c8-9eb8089ce657" force deleted
# hangs...
^C
spiffxp@cloudshell:~ (k8s-infra-prow-build)$ k get pods -n test-pods | grep a56a5463-f464-11ea-a7c8-9eb8089ce657
a56a5463-f464-11ea-a7c8-9eb8089ce657   1/1     Terminating   0          8h

no

@spiffxp
Member Author

spiffxp commented Sep 12, 2020

spiffxp@cloudshell:~ (k8s-infra-prow-build)$ k describe pod -n test-pods a56a5463-f464-11ea-a7c8-9eb8089ce657
Name:                      a56a5463-f464-11ea-a7c8-9eb8089ce657
Namespace:                 test-pods
Priority:                  0
Node:                      gke-prow-build-pool4-2020082817590115-b6fe6f0c-wzg9/10.128.15.230
Start Time:                Fri, 11 Sep 2020 19:26:11 +0000

How about that node

spiffxp@cloudshell:~ (k8s-infra-prow-build)$ k get nodes | grep gke-prow-build-pool4-2020082817590115-b6fe6f0c-wzg9
spiffxp@cloudshell:~ (k8s-infra-prow-build)$

Looking at logs https://console.cloud.google.com/logs/viewer?project=k8s-infra-prow-build&minLogLevel=0&expandAll=false&timestamp=2020-09-12T03:45:42.461000000Z&customFacets=&limitCustomFacetWidth=true&dateRangeStart=2020-09-11T03:45:42.712Z&dateRangeEnd=2020-09-12T03:45:42.712Z&interval=P1D&resource=k8s_node&scrollTimestamp=2020-09-11T19:38:52.000000000Z&filters=text:gke-prow-build-pool4-2020082817590115-b6fe6f0c-wzg9&angularJsUrl=%2Flogs%2Fviewer%3Fproject%3Dk8s-infra-prow-build%26minLogLevel%3D0%26expandAll%3Dfalse%26timestamp%3D2020-09-12T03:20:19.152000000Z%26customFacets%3D%26limitCustomFacetWidth%3Dtrue%26dateRangeStart%3D2020-09-11T19:20:19.000Z%26dateRangeEnd%3D2020-09-11T21:20:19.404Z%26interval%3DCUSTOM%26resource%3Dk8s_cluster%26scrollTimestamp%3D2020-09-11T19:47:25.914570000Z%26filters%3Dtext:a56a5463-f464-11ea-a7c8-9eb8089ce657

12:37pm node status is NodeNotReady
12:38pm event: DeletingNode
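(Same thing from the CLI, as a sketch — the filter is just the node name from above:)

gcloud logging read \
  'resource.type="k8s_node" AND "gke-prow-build-pool4-2020082817590115-b6fe6f0c-wzg9"' \
  --project=k8s-infra-prow-build --freshness=1d --limit=50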

@spiffxp
Member Author

spiffxp commented Sep 12, 2020

Seems like we're running into kubernetes/kubernetes#72226

Issuing a /test command will create a new pod. Still not clear how to get rid of the pods stuck in terminating
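Pods with a pending finalizer won't finish deleting even with --force; a quick way to see what a stuck pod is waiting on (sketch, using the pod from above):

kubectl get pod -n test-pods a56a5463-f464-11ea-a7c8-9eb8089ce657 \
  -o jsonpath='{.metadata.finalizers}'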

@alejandrox1
Contributor

Would it be reasonable for these kinds of workloads to go down the

kubectl delete po <pod> --grace-period=0 --force

route?

@spiffxp
Member Author

spiffxp commented Sep 12, 2020

Issuing that /test command caused the pod to disappear

Last entry in the log for that pod

request: {
   @type: "k8s.io/Patch"    
   metadata: {
    finalizers: null     
   }
  }

So then I manually edited the finalizer for another pod

  finalizers:
  - prow.x-k8s.io/gcsk8sreporter # delete this line
  labels:
    created-by-prow: "true"
    preset-bazel-remote-cache-enabled: "true"
    preset-bazel-scratch-dir: "true"
    preset-dind-enabled: "true"
    preset-kind-volume-mounts: "true"
    prow.k8s.io/build-id: "1304507550300901377"
    prow.k8s.io/id: fa67cfe1-f467-11ea-8508-9a96569b20f1
    prow.k8s.io/job: ci-kubernetes-kind-ipv6-e2e-parallel-1-19
    prow.k8s.io/type: periodic
  name: fa67cfe1-f467-11ea-8508-9a96569b20f1
spiffxp@cloudshell:~ (k8s-infra-prow-build)$ k edit pods -n test-pods fa67cfe1-f467-11ea-8508-9a96569b20f1
pod/fa67cfe1-f467-11ea-8508-9a96569b20f1 edited
spiffxp@cloudshell:~ (k8s-infra-prow-build)$ k get pods -n test-pods -o=yaml fa67cfe1-f467-11ea-8508-9a96569b20f1
Error from server (NotFound): pods "fa67cfe1-f467-11ea-8508-9a96569b20f1" not found

@spiffxp
Member Author

spiffxp commented Sep 12, 2020

@alejandrox1 I tried that (see #1120 (comment)) and it didn't delete

@spiffxp
Member Author

spiffxp commented Sep 12, 2020

Everything that had a deletionTimestamp was hung (this was intended for a markdown table, but the formatting looked worse, so fixed-width it is)

$ echo "| created | job | pod | node | "; echo "| --- | --- | --- | --- |"; k get pods -n test-pods --field-selector=status.phase=Running -o=json | jq -r '.items | map(select(.metadata.deletionTimestamp))[] | "|\(.status.startTime) | \(.metadata.labels["prow.k8s.io/job"]) | \(.metadata.name) | \(.spec.nodeName) |"' | sort | tee old-nodepool-jobs
| created | job | pod | node |
| --- | --- | --- | --- |
|2020-09-11T14:05:53Z | ci-kubernetes-e2e-gci-gce-serial | dfa5f616-f437-11ea-a379-bea988342348 | gke-prow-build-pool4-2020082817590115-6b0f3325-1rlt |
|2020-09-11T17:01:01Z | ci-kubernetes-e2e-gce-cos-k8sbeta-serial | 51784dc1-f450-11ea-8233-42bc8ee613a9 | gke-prow-build-pool4-2020082817590115-aee934c3-8fnw |
|2020-09-11T17:31:02Z | ci-kubernetes-e2e-gce-cos-k8sstable1-serial | 826a3f3c-f454-11ea-8233-42bc8ee613a9 | gke-prow-build-pool4-2020082817590115-b6fe6f0c-5j1n |
|2020-09-11T17:44:02Z | ci-kubernetes-gce-conformance-latest | 533d30e5-f456-11ea-8233-42bc8ee613a9 | gke-prow-build-pool4-2020082817590115-b6fe6f0c-hjcr |
|2020-09-11T18:12:02Z | ci-kubernetes-e2e-gci-gce-slow | 3ca1a5cf-f45a-11ea-8233-42bc8ee613a9 | gke-prow-build-pool4-2020082817590115-b6fe6f0c-79j2 |
|2020-09-11T18:32:11Z | ci-kubernetes-integration-master | 14aea8d7-f45d-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-b6fe6f0c-9q3g |
|2020-09-11T18:35:11Z | ci-kubernetes-integration-stable3 | 7ff7423f-f45d-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-b6fe6f0c-5j1n |
|2020-09-11T18:38:11Z | ci-kubernetes-e2e-gci-gce-ingress-canary | eb769c01-f45d-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-6b0f3325-c0w9 |
|2020-09-11T18:41:11Z | pull-kubernetes-e2e-gce-100-performance | 50103df1-f45e-11ea-8c01-ca1150a2ee58 | gke-prow-build-pool4-2020082817590115-b6fe6f0c-dxrc |
|2020-09-11T18:41:11Z | pull-kubernetes-e2e-gce-ubuntu-containerd | 5009aedd-f45e-11ea-8c01-ca1150a2ee58 | gke-prow-build-pool4-2020082817590115-b6fe6f0c-s8k5 |
|2020-09-11T18:58:11Z | ci-kubernetes-e2e-gci-gce-alpha-features | b696febc-f460-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-b6fe6f0c-s8k5 |
|2020-09-11T19:03:36Z | pull-kubernetes-e2e-gce-100-performance | 44edcb66-f461-11ea-9284-967bb86a2b2f | gke-prow-build-pool4-2020082817590115-6b0f3325-8qfl |
|2020-09-11T19:14:11Z | ci-kubernetes-kind-e2e-parallel-1-19 | f2f1c361-f462-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-b6fe6f0c-whxr |
|2020-09-11T19:20:11Z | ci-kubernetes-node-kubelet-features-1-16 | c978d235-f463-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-b6fe6f0c-slgg |
|2020-09-11T19:26:11Z | pull-kubernetes-verify | a56f764b-f464-11ea-a7c8-9eb8089ce657 | gke-prow-build-pool4-2020082817590115-6b0f3325-9nrq |
|2020-09-11T19:39:11Z | periodic-kubernetes-bazel-test-master | 70f9109e-f466-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-6b0f3325-45tc |
|2020-09-11T19:48:11Z | ci-kubernetes-e2e-gci-gce-reboot | b2a12f4d-f467-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-6b0f3325-k9j0 |
|2020-09-11T19:55:11Z | ci-kubernetes-node-kubelet-features-1-17 | acf9b650-f468-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-6b0f3325-c0w9 |
|2020-09-11T19:55:11Z | ci-kubernetes-verify-master | acfc2c7d-f468-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-6b0f3325-t898 |
|2020-09-11T19:59:11Z | ci-kubernetes-e2e-gci-gce-flaky-repro | 3c38c3b4-f469-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-6b0f3325-np72 |
|2020-09-11T20:06:11Z | ci-kubernetes-e2e-gce-cos-k8sbeta-default | 36910195-f46a-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-6b0f3325-kzss |
|2020-09-11T20:08:11Z | ci-kubernetes-integration-beta | 7de725fe-f46a-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-6b0f3325-p0xd |
|2020-09-11T20:13:11Z | ci-kubernetes-e2e-gce-cos-k8sbeta-reboot | 30bd3711-f46b-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-6b0f3325-xfw6 |
|2020-09-11T20:26:11Z | ci-kubernetes-e2e-gce-cos-k8sbeta-slow | 01c66c46-f46d-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-aee934c3-2drd |
|2020-09-11T20:34:11Z | ci-kubernetes-gce-conformance-latest-kubetest2 | 1fee4dd0-f46e-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-aee934c3-zvdk |
|2020-09-11T20:34:41Z | pull-kubernetes-e2e-gce-100-performance | 310be890-f46e-11ea-9130-56f34bfb1616 | gke-prow-build-pool4-2020082817590115-aee934c3-002v |
|2020-09-11T20:34:41Z | pull-kubernetes-e2e-gce-ubuntu-containerd | 31079e2c-f46e-11ea-9130-56f34bfb1616 | gke-prow-build-pool4-2020082817590115-aee934c3-2zd5 |
|2020-09-11T20:37:22Z | pull-kubernetes-verify | 6ca76e16-f46e-11ea-a7c8-9eb8089ce657 | gke-prow-build-pool4-2020082817590115-aee934c3-q882 |
|2020-09-11T20:45:11Z | ci-kubernetes-e2e-gce-cos-k8sstable1-default | a92edf86-f46f-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-aee934c3-7wf1 |
|2020-09-11T20:50:11Z | pull-kubernetes-e2e-gce-100-performance | 6105dc9e-f470-11ea-a7c8-9eb8089ce657 | gke-prow-build-pool4-2020082817590115-aee934c3-qqsj |
|2020-09-11T20:50:11Z | pull-kubernetes-e2e-gce-ubuntu-containerd | 6101bc0d-f470-11ea-a7c8-9eb8089ce657 | gke-prow-build-pool4-2020082817590115-aee934c3-glsn |
|2020-09-11T20:54:41Z | pull-kubernetes-dependencies | f65273f6-f470-11ea-9130-56f34bfb1616 | gke-prow-build-pool4-2020082817590115-aee934c3-2zd5 |
|2020-09-11T20:57:11Z | ci-kubernetes-e2e-gce-cos-k8sbeta-alphafeatures | 5681a11b-f471-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-aee934c3-8fnw |
|2020-09-11T20:58:41Z | pull-kubernetes-e2e-kind-ipv6 | 84f080c7-f471-11ea-9130-56f34bfb1616 | gke-prow-build-pool4-2020082817590115-aee934c3-8fnw |
|2020-09-11T21:12:11Z | ci-kubernetes-e2e-gci-gce | 6ef7f131-f473-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-aee934c3-kcnp |
|2020-09-11T21:24:11Z | ci-kubernetes-verify-stable1 | 1c10ce48-f475-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-aee934c3-t9mn |
|2020-09-11T21:29:11Z | pull-kubernetes-node-e2e | c8fb882c-f475-11ea-9130-56f34bfb1616 | gke-prow-build-pool4-2020082817590115-aee934c3-s2d1 |
|2020-09-11T21:37:11Z | pull-kubernetes-e2e-gce-100-performance | f18dab5a-f476-11ea-9284-967bb86a2b2f | gke-prow-build-pool4-2020082817590115-aee934c3-qxkl |
|2020-09-11T21:37:11Z | pull-kubernetes-verify | f1c2fa92-f476-11ea-9284-967bb86a2b2f | gke-prow-build-pool4-2020082817590115-aee934c3-tbzc |
|2020-09-11T21:40:11Z | ci-kubernetes-e2e-gce-cos-k8sbeta-ingress | 584553d6-f477-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-aee934c3-wfnz |
|2020-09-11T21:43:41Z | pull-kubernetes-integration | cbf83c22-f477-11ea-9130-56f34bfb1616 | gke-prow-build-pool4-2020082817590115-aee934c3-wfjt |
|2020-09-11T22:03:11Z | ci-kubernetes-e2e-gce-master-new-gci-kubectl-skew-stable1 | 8ed24720-f47a-11ea-8508-9a96569b20f1 | gke-prow-build-pool4-2020082817590115-aee934c3-wfnz |

Patched in empty finalizers for everything that had a deletionTimestamp.

$ for p in $(k get pods -n test-pods --field-selector=status.phase=Running -o=json | jq -r '.items | map(select(.metadata.deletionTimestamp) | .metadata.name)[]'); do 
  k patch -n test-pods pod $p --type=json -p='[{"op": "replace", "path": "/metadata/finalizers", "value":[]}]'; 
done
pod/14aea8d7-f45d-11ea-8508-9a96569b20f1 patched
pod/1c10ce48-f475-11ea-8508-9a96569b20f1 patched
pod/1fee4dd0-f46e-11ea-8508-9a96569b20f1 patched
pod/30bd3711-f46b-11ea-8508-9a96569b20f1 patched
pod/31079e2c-f46e-11ea-9130-56f34bfb1616 patched
pod/310be890-f46e-11ea-9130-56f34bfb1616 patched
pod/36910195-f46a-11ea-8508-9a96569b20f1 patched
pod/3c38c3b4-f469-11ea-8508-9a96569b20f1 patched
pod/3ca1a5cf-f45a-11ea-8233-42bc8ee613a9 patched
pod/44edcb66-f461-11ea-9284-967bb86a2b2f patched
pod/5009aedd-f45e-11ea-8c01-ca1150a2ee58 patched
pod/50103df1-f45e-11ea-8c01-ca1150a2ee58 patched
pod/51784dc1-f450-11ea-8233-42bc8ee613a9 patched
pod/533d30e5-f456-11ea-8233-42bc8ee613a9 patched
pod/5681a11b-f471-11ea-8508-9a96569b20f1 patched
pod/584553d6-f477-11ea-8508-9a96569b20f1 patched
pod/6101bc0d-f470-11ea-a7c8-9eb8089ce657 patched
pod/6105dc9e-f470-11ea-a7c8-9eb8089ce657 patched
pod/6ca76e16-f46e-11ea-a7c8-9eb8089ce657 patched
pod/6ef7f131-f473-11ea-8508-9a96569b20f1 patched
pod/70f9109e-f466-11ea-8508-9a96569b20f1 patched
pod/7de725fe-f46a-11ea-8508-9a96569b20f1 patched
pod/7ff7423f-f45d-11ea-8508-9a96569b20f1 patched
pod/826a3f3c-f454-11ea-8233-42bc8ee613a9 patched
pod/84f080c7-f471-11ea-9130-56f34bfb1616 patched
pod/8ed24720-f47a-11ea-8508-9a96569b20f1 patched
pod/a56f764b-f464-11ea-a7c8-9eb8089ce657 patched
pod/a92edf86-f46f-11ea-8508-9a96569b20f1 patched
pod/acf9b650-f468-11ea-8508-9a96569b20f1 patched
pod/acfc2c7d-f468-11ea-8508-9a96569b20f1 patched
pod/b2a12f4d-f467-11ea-8508-9a96569b20f1 patched
pod/b696febc-f460-11ea-8508-9a96569b20f1 patched
pod/c8fb882c-f475-11ea-9130-56f34bfb1616 patched
pod/c978d235-f463-11ea-8508-9a96569b20f1 patched
pod/cbf83c22-f477-11ea-9130-56f34bfb1616 patched
pod/dfa5f616-f437-11ea-a379-bea988342348 patched
pod/eb769c01-f45d-11ea-8508-9a96569b20f1 patched
pod/f18dab5a-f476-11ea-9284-967bb86a2b2f patched
pod/f1c2fa92-f476-11ea-9284-967bb86a2b2f patched
pod/f2f1c361-f462-11ea-8508-9a96569b20f1 patched
pod/f65273f6-f470-11ea-9130-56f34bfb1616 patched

Nothing stuck in terminating anymore

spiffxp@cloudshell:~ (k8s-infra-prow-build)$ k get pods -n test-pods | grep Terminating
spiffxp@cloudshell:~ (k8s-infra-prow-build)$

@spiffxp
Member Author

spiffxp commented Sep 12, 2020

Looks like most of the pods that were stuck in Terminating are now running. Calling it a night, we'll see if prow/tide end up picking up their results later.

spiffxp@cloudshell:~ (k8s-infra-prow-build)$ k get pods -n test-pods $(cat patched-pods.txt)
NAME                                   READY   STATUS      RESTARTS   AGE
14aea8d7-f45d-11ea-8508-9a96569b20f1   2/2     Running     0          7m53s
1c10ce48-f475-11ea-8508-9a96569b20f1   2/2     Running     0          7m48s
1fee4dd0-f46e-11ea-8508-9a96569b20f1   2/2     Running     0          7m51s
30bd3711-f46b-11ea-8508-9a96569b20f1   1/1     Running     0          7m51s
31079e2c-f46e-11ea-9130-56f34bfb1616   1/1     Running     0          7m50s
310be890-f46e-11ea-9130-56f34bfb1616   1/1     Running     0          7m50s
36910195-f46a-11ea-8508-9a96569b20f1   1/1     Running     0          7m51s
3c38c3b4-f469-11ea-8508-9a96569b20f1   1/1     Running     0          7m51s
3ca1a5cf-f45a-11ea-8233-42bc8ee613a9   1/1     Running     0          7m53s
44edcb66-f461-11ea-9284-967bb86a2b2f   1/1     Running     0          7m53s
5009aedd-f45e-11ea-8c01-ca1150a2ee58   1/1     Running     0          7m53s
50103df1-f45e-11ea-8c01-ca1150a2ee58   1/1     Running     0          7m54s
51784dc1-f450-11ea-8233-42bc8ee613a9   1/1     Running     0          7m54s
533d30e5-f456-11ea-8233-42bc8ee613a9   1/1     Running     0          7m54s
5681a11b-f471-11ea-8508-9a96569b20f1   1/1     Running     0          7m50s
584553d6-f477-11ea-8508-9a96569b20f1   1/1     Running     0          7m49s
6101bc0d-f470-11ea-a7c8-9eb8089ce657   1/1     Running     0          7m51s
6105dc9e-f470-11ea-a7c8-9eb8089ce657   1/1     Running     0          7m50s
6ca76e16-f46e-11ea-a7c8-9eb8089ce657   2/2     Running     0          7m52s
6ef7f131-f473-11ea-8508-9a96569b20f1   1/1     Running     0          7m51s
70f9109e-f466-11ea-8508-9a96569b20f1   2/2     Running     0          7m54s
7de725fe-f46a-11ea-8508-9a96569b20f1   2/2     Running     0          7m53s
7ff7423f-f45d-11ea-8508-9a96569b20f1   2/2     Running     0          7m55s
826a3f3c-f454-11ea-8233-42bc8ee613a9   1/1     Running     0          7m56s
84f080c7-f471-11ea-9130-56f34bfb1616   2/2     Running     0          7m52s
8ed24720-f47a-11ea-8508-9a96569b20f1   1/1     Running     0          7m50s
a56f764b-f464-11ea-a7c8-9eb8089ce657   2/2     Running     0          7m55s
a92edf86-f46f-11ea-8508-9a96569b20f1   1/1     Running     0          7m53s
acf9b650-f468-11ea-8508-9a96569b20f1   1/1     Running     0          7m56s
acfc2c7d-f468-11ea-8508-9a96569b20f1   2/2     Running     0          7m56s
b2a12f4d-f467-11ea-8508-9a96569b20f1   1/1     Running     0          7m56s
b696febc-f460-11ea-8508-9a96569b20f1   1/1     Running     0          7m57s
c8fb882c-f475-11ea-9130-56f34bfb1616   1/1     Running     0          7m52s
c978d235-f463-11ea-8508-9a96569b20f1   1/1     Running     0          7m58s
cbf83c22-f477-11ea-9130-56f34bfb1616   2/2     Running     0          7m52s
dfa5f616-f437-11ea-a379-bea988342348   1/1     Running     0          7m58s
eb769c01-f45d-11ea-8508-9a96569b20f1   1/1     Running     0          7m58s
f18dab5a-f476-11ea-9284-967bb86a2b2f   0/1     Error       0          7m53s
f1c2fa92-f476-11ea-9284-967bb86a2b2f   2/2     Running     0          7m54s
f2f1c361-f462-11ea-8508-9a96569b20f1   2/2     Running     0          7m59s
f65273f6-f470-11ea-9130-56f34bfb1616   0/2     Completed   0          7m55s

@spiffxp
Member Author

spiffxp commented Sep 12, 2020

(also definitely cordoning and migrating to a new pool next time)

@spiffxp
Member Author

spiffxp commented Sep 12, 2020

/close
Looks like tide merged everything it needed to

@k8s-ci-robot
Contributor

@spiffxp: Closing this issue.

In response to this:

/close
Looks like tide merged everything it needed to

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
