From 4e1485fa971abbfa95d4fc244cf65416c92cb7d6 Mon Sep 17 00:00:00 2001
From: "W. Trevor King"
Date: Thu, 2 Jan 2020 10:02:24 -0800
Subject: [PATCH] pkg/cvo/updatepayload: Drop ephemeral-storage request

I dunno why there is no capacity reporting in 4.2, but
ephemeral-storage capacity reporting is not working there, leading to
version pods dying with [1]:

  Node didn't have enough resource: ephemeral-storage, requested: 2097152, used: 0, capacity: 0

For an example of a 4.2 cluster without ephemeral-storage capacity
reporting, see this 4.2.10 -> 4.2.12 update test [2]:

  $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12620/artifacts/e2e-aws-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-a0dbe73b7831a8ddb9a2c58a560461d7c2c23a92231289a2104b93e7723c0eff/cluster-scoped-resources/core/nodes/ip-10-0-129-58.ec2.internal.yaml | yaml2json | jq .status.capacity | json2yaml
  attachable-volumes-aws-ebs: '39'
  cpu: '4'
  hugepages-1Gi: '0'
  hugepages-2Mi: '0'
  memory: 16419384Ki
  pods: '250'

Capacity reporting is working in 4.3, e.g. see this 4.2.12 ->
4.3.0-0.nightly-2020-01-02-141332 update test [3].

  $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/13437/artifacts/e2e-aws-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-c6c63e67c3d38a704c8695a40bb64b9975df2bda3f00c9379592cd5596126f2d/cluster-scoped-resources/core/nodes/ip-10-0-130-241.ec2.internal.yaml | yaml2json | jq .status.capacity | json2yaml
  attachable-volumes-aws-ebs: '39'
  cpu: '4'
  ephemeral-storage: 124768236Ki
  hugepages-1Gi: '0'
  hugepages-2Mi: '0'
  memory: 16419384Ki
  pods: '250'

Although 4.3 kubelet capacity reporting works, we still need to drop
the 4.3 request, to support flows like:

1. 4.2 cluster running with 4.2 CVO and 4.2 kubelets (so no capacity
   reporting).
2. Admin requests an update to 4.3.1.
3. 4.2 CVO launches a version pod without requests, because of the 4.2
   reversion [4].  This works fine.
4. Update gets far enough to run a 4.3 CVO.
5. Update hangs on some 4.3.1 bug, while it's still running 4.2
   kubelets.
6. Admin requests an update to 4.3.2.
7. 4.3 CVO launches a version pod with an ephemeral-storage request,
   which hangs because the 4.2 kubelets are still running and not
   reporting ephemeral-storage capacity.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1786315
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12620
[3]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/13437
[4]: https://github.com/openshift/cluster-version-operator/pull/288
---
 pkg/cvo/updatepayload.go | 1 -
 1 file changed, 1 deletion(-)

diff --git a/pkg/cvo/updatepayload.go b/pkg/cvo/updatepayload.go
index 7f6aa09a29..13e9d33684 100644
--- a/pkg/cvo/updatepayload.go
+++ b/pkg/cvo/updatepayload.go
@@ -175,7 +175,6 @@ func (r *payloadRetriever) fetchUpdatePayloadToDir(ctx context.Context, dir stri
 						Requests: corev1.ResourceList{
 							corev1.ResourceCPU: resource.MustParse("10m"),
 							corev1.ResourceMemory: resource.MustParse("50Mi"),
-							corev1.ResourceEphemeralStorage: resource.MustParse("2Mi"),
 						},
 					},
 				}},