ci-operator/step-registry/ipi/conf/aws: Default to m5.xlarge COMPUTE_NODE_TYPE #19195
…NODE_TYPE

Bumping from m4, to which we've defaulted since a8426c0 (steps: Update cloud (AWS/GCP) to match Azure worker size of 4 core, 2021-02-16, openshift#15923). For more on m4 vs. m5 in our AWS CI zones, see [1]. From [2], m5.xlarge has 4 vCPU and 16 GiB memory, just like m4.xlarge [3]. It should also bump our EBS bandwidth from m4.xlarge's "dedicated 750 Mbps" [3] to m5.xlarge's "up to 4,750 Mbps".

This should avoid failures like [4]:

  alert MachineWithNoRunningPhase fired for 3523 seconds with labels: {api_version="machine.openshift.io/v1beta1", container="kube-rbac-proxy", endpoint="https", exported_namespace="openshift-machine-api", instance="10.128.0.77:8443", job="machine-api-operator", name="ci-op-2bslq277-8d118-ldpvh-worker-us-west-2d-9jlgb", namespace="openshift-machine-api", phase="Failed", pod="machine-api-operator-55dd6d8d9d-gh5xw", service="machine-api-operator", severity="warning"}
  alert MachineWithoutValidNode fired for 3474 seconds with labels: {api_version="machine.openshift.io/v1beta1", container="kube-rbac-proxy", endpoint="https", exported_namespace="openshift-machine-api", instance="10.128.0.77:8443", job="machine-api-operator", name="ci-op-2bslq277-8d118-ldpvh-worker-us-west-2d-9jlgb", namespace="openshift-machine-api", phase="Failed", pod="machine-api-operator-55dd6d8d9d-gh5xw", service="machine-api-operator", severity="warning"}

which is from picking a zone that lacks m4 support:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1403576531913019392/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/machines.json | jq -r '.items[] | select(.metadata.name == "ci-op-2bslq277-8d118-ldpvh-worker-us-west-2d-9jlgb").status.errorMessage'
  ci-op-2bslq277-8d118-ldpvh-worker-us-west-2d-9jlgb: reconciler failed to Create machine: failed to launch instance: error launching instance: Your requested instance type (m4.xlarge) is not supported in your requested Availability Zone (us-west-2d). Please retry your request by not specifying an Availability Zone or choosing us-west-2a, us-west-2b, us-west-2c.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1713157#c1
[2]: https://aws.amazon.com/ec2/instance-types/m5/
[3]: https://aws.amazon.com/ec2/instance-types/
[4]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1403576531913019392
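
When triaging more jobs that hit the same failure, it can help to pull the rejected instance type, the rejected zone, and the zones AWS suggests straight out of the errorMessage text quoted above. The following is a minimal sketch, not part of this pull request. The function names are hypothetical, and the parsing assumes the error keeps the exact wording shown in the log above.

```shell
# Hypothetical helpers (not part of this PR). Each takes the machine's
# status.errorMessage string as its first argument and assumes the AWS
# wording seen in the failure above.

# Print the instance type AWS rejected, e.g. m4.xlarge.
rejected_instance_type() {
  printf '%s\n' "$1" |
    sed -n -E 's/.*requested instance type \(([^)]+)\).*/\1/p'
}

# Print the Availability Zone that lacks the instance type.
rejected_zone() {
  printf '%s\n' "$1" |
    sed -n -E 's/.*requested Availability Zone \(([^)]+)\).*/\1/p'
}

# Print the zones AWS suggests instead, one per line.
supported_zones() {
  printf '%s\n' "$1" |
    sed -n -E 's/.*or choosing ([^.]*).*/\1/p' |
    tr -d ' ' | tr ',' '\n'
}
```

Each function can be fed the string printed by the curl and jq pipeline above, for example by capturing that output in a variable and passing it as the single argument. If the message does not match the expected wording, the functions print nothing rather than guessing.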
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, wking