Bug 1787422: pkg/cvo/updatepayload: Drop ephemeral-storage request #289
I dunno why there is no capacity reporting in 4.2, but ephemeral-storage capacity reporting is not working there, leading to version pods dying with [1]:

    Node didn't have enough resource: ephemeral-storage, requested: 2097152, used: 0, capacity: 0

For an example of a 4.2 cluster without ephemeral-storage capacity reporting, see this 4.2.10 -> 4.2.12 update test [2]:

    $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12620/artifacts/e2e-aws-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-a0dbe73b7831a8ddb9a2c58a560461d7c2c23a92231289a2104b93e7723c0eff/cluster-scoped-resources/core/nodes/ip-10-0-129-58.ec2.internal.yaml | yaml2json | jq .status.capacity | json2yaml
    attachable-volumes-aws-ebs: '39'
    cpu: '4'
    hugepages-1Gi: '0'
    hugepages-2Mi: '0'
    memory: 16419384Ki
    pods: '250'

Capacity reporting is working in 4.3, e.g. see this 4.2.12 -> 4.3.0-0.nightly-2020-01-02-141332 update test [3]:

    $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/13437/artifacts/e2e-aws-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-c6c63e67c3d38a704c8695a40bb64b9975df2bda3f00c9379592cd5596126f2d/cluster-scoped-resources/core/nodes/ip-10-0-130-241.ec2.internal.yaml | yaml2json | jq .status.capacity | json2yaml
    attachable-volumes-aws-ebs: '39'
    cpu: '4'
    ephemeral-storage: 124768236Ki
    hugepages-1Gi: '0'
    hugepages-2Mi: '0'
    memory: 16419384Ki
    pods: '250'

Although 4.3 kubelet capacity reporting works, we still need to drop the 4.3 request, to support flows like:

1. 4.2 cluster running with 4.2 CVO and 4.2 kubelets (so no capacity reporting).
2. Admin requests an update to 4.3.1.
3. 4.2 CVO launches a version pod without requests, because of the 4.2 reversion [4]. This works fine.
4. Update gets far enough to run a 4.3 CVO.
5. Update hangs on some 4.3.1 bug, while it's still running 4.2 kubelets.
6. Admin requests an update to 4.3.2.
7. 4.3 CVO launches a version pod with an ephemeral-storage request, which hangs because the 4.2 kubelets are still running and not reporting ephemeral-storage capacity.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1786315
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12620
[3]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/13437
[4]: openshift#288
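To make the failure mode concrete, here is a minimal Go sketch of the kind of fit check that produces the error quoted in the description. It is a simplified model, not the kubelet's actual admission code: the `nodeCapacity` type and the `admit` function are hypothetical names introduced for illustration, and the sketch assumes that a resource missing from a node's capacity map is treated as having zero capacity, which is consistent with the `capacity: 0` in the reported message.

```go
package main

import "fmt"

// nodeCapacity is a hypothetical stand-in for a node's status.capacity,
// keyed by resource name, with quantities expressed in bytes.
type nodeCapacity map[string]int64

// admit is a simplified, assumed model of a fit check: a request is
// accepted only if it fits in capacity minus current usage. A resource
// that the node does not report at all reads as zero capacity, so any
// non-zero request for it is rejected.
func admit(capacity nodeCapacity, used map[string]int64, resource string, requested int64) error {
	if requested == 0 {
		// A pod with no request for this resource always passes this check.
		return nil
	}
	total := capacity[resource] // zero when the key is absent
	inUse := used[resource]     // reading a nil map is safe in Go
	if requested > total-inUse {
		return fmt.Errorf("Node didn't have enough resource: %s, requested: %d, used: %d, capacity: %d",
			resource, requested, inUse, total)
	}
	return nil
}

func main() {
	const request = 2 * 1024 * 1024 // 2097152 bytes, as in the quoted error

	// Capacity maps modeled on the two must-gather outputs above.
	node42 := nodeCapacity{"memory": 16419384 * 1024}
	node43 := nodeCapacity{
		"memory":            16419384 * 1024,
		"ephemeral-storage": 124768236 * 1024,
	}

	fmt.Println("4.2 kubelet, with request:   ", admit(node42, nil, "ephemeral-storage", request))
	fmt.Println("4.3 kubelet, with request:   ", admit(node43, nil, "ephemeral-storage", request))
	fmt.Println("4.2 kubelet, without request:", admit(node42, nil, "ephemeral-storage", 0))
}
```

Under this model, only the combination in step 7 (a pod carrying the request, landing on a node that does not report the resource) is rejected, which is why dropping the request from the version pod sidesteps the problem regardless of kubelet version.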
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: wking
@wking: No Bugzilla bug is referenced in the title of this pull request.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@wking: This pull request references Bugzilla bug 1787422, which is invalid:
/bugzilla refresh
@wking: This pull request references Bugzilla bug 1787422, which is invalid:
/bugzilla refresh
@wking: This pull request references Bugzilla bug 1787422, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.
I think CI is hung up on my changing the PR base from master to release-4.3. Project name from here. Cleared the project with:

    oc delete project ci-op-n14zrm00

Trying again:

/retest
@wking: The following tests failed, say
Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Looks like CI cannot recover. Opening a replacement PR...

/close
@wking: Closed this PR.
Forward-porting #288 to 4.3. Although 4.3 kubelet capacity reporting works, we still need to drop the 4.3 request, to support flows like: