Bug 1981999: bump RHCOS boot images for 4.9 #5231

Conversation

miabbott
Member

@miabbott miabbott commented Sep 21, 2021

This updates the RHCOS boot image metadata in the installer with the
most recent version of RHCOS 4.9 artifacts.

This includes fixes for the following BZs:

1967483 - coreos-installer fails to download Ignition (DNS error, failed to lookup address)
1974411 - Installation with multipath parameters in parmfile fails (DNS resolution missing)
1980679 - On a Azure IPI installation MCO fails to create new nodes
1999577 - RHCOS live ISO can fail to boot in UEFI mode; drops to grub shell
2002374 - Inexplicably slow kubelet on bootstrap makes installation fail
2004605 - RHCOS-4.9 failed to boot in FIPS mode on s390x
2004676 - Boot option recovery menu prevents image boot

Changes generated with the following:

```
$ plume cosa2stream --target data/data/rhcos-stream.json --distro rhcos --url https://rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com/art/storage/releases aarch64=49.84.202109201428-0 ppc64le=49.84.202109211846-0 s390x=49.84.202109201416-0 x86_64=49.84.202109172039-0
$ ./hack/update-rhcos-bootimage.py https://rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com/art/storage/releases/rhcos-4.9-aarch64/49.84.202109201428-0/aarch64/meta.json aarch64
$ ./hack/update-rhcos-bootimage.py https://rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com/art/storage/releases/rhcos-4.9-ppc64le/49.84.202109201428-0/ppc64le/meta.json ppc64le
$ ./hack/update-rhcos-bootimage.py https://rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com/art/storage/releases/rhcos-4.9-s390x/49.84.202109201416-0/s390x/meta.json s390x
$ ./hack/update-rhcos-bootimage.py https://rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com/art/storage/releases/rhcos-4.9/49.84.202109172039-0/x86_64/meta.json amd64
```
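
The `meta.json` URLs in the commands above appear to follow a regular pattern, so they can be derived from the same per-architecture build IDs that are passed to `plume cosa2stream`. The following is a minimal Python sketch of that derivation. The helper names (`meta_url`, `installer_arch`) are hypothetical, and the directory layout is inferred only from the URLs quoted here, not from any documented contract. Note that the ppc64le command above uses build ID 49.84.202109201428-0 while the `plume` invocation lists 49.84.202109211846-0; the sketch uses the `plume` values.

```python
# Sketch: rebuild the update-rhcos-bootimage.py invocations from build IDs.
BASE_URL = (
    "https://rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com"
    "/art/storage/releases"
)

# Build IDs exactly as passed to `plume cosa2stream` in the PR description.
BUILDS = {
    "aarch64": "49.84.202109201428-0",
    "ppc64le": "49.84.202109211846-0",
    "s390x": "49.84.202109201416-0",
    "x86_64": "49.84.202109172039-0",
}


def meta_url(arch: str, build: str, stream: str = "rhcos-4.9") -> str:
    """Return the meta.json URL for one architecture.

    Assumption (inferred from the quoted URLs): x86_64 builds live under the
    bare stream directory, other architectures under "<stream>-<arch>".
    """
    stream_dir = stream if arch == "x86_64" else f"{stream}-{arch}"
    return f"{BASE_URL}/{stream_dir}/{build}/{arch}/meta.json"


def installer_arch(arch: str) -> str:
    """Map the RHCOS architecture name to the one passed to the script."""
    return "amd64" if arch == "x86_64" else arch


for arch, build in BUILDS.items():
    print(
        f"./hack/update-rhcos-bootimage.py {meta_url(arch, build)} "
        f"{installer_arch(arch)}"
    )
```

Generating the commands from a single mapping makes it harder for one architecture's build ID to drift from the value given to `plume`.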

@openshift-ci openshift-ci bot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Sep 21, 2021
@openshift-ci
Contributor

openshift-ci bot commented Sep 21, 2021

@miabbott: This pull request references Bugzilla bug 1981999, which is invalid:

  • expected dependent Bugzilla bug 2005108 to target a release in 4.10.0, but it targets "4.9.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1981999: bump RHCOS boot images for 4.9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Sep 21, 2021
@miabbott
Member Author

It looks like the CI jobs need to be updated with new image references; I believe openshift/release#22103 is the fix needed

@openshift-ci
Contributor

openshift-ci bot commented Sep 21, 2021

@miabbott: This pull request references Bugzilla bug 1981999, which is invalid:

  • expected dependent Bugzilla bug 2004596 to be in one of the following states: MODIFIED, ON_QA, VERIFIED, but it is ASSIGNED instead
  • expected dependent Bugzilla bug 2005108 to target a release in 4.10.0, but it targets "4.9.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.
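
The two complaints above amount to two independent checks on each dependent bug: its state and its target release. The sketch below illustrates that logic in Python. It is not the bot's actual implementation; the `Bug` type and `dependent_bug_problems` function are hypothetical. The fields marked as placeholders are not stated in this thread and are filled in only so the example runs.

```python
from dataclasses import dataclass

# States the bot reports as acceptable for a dependent bug.
ALLOWED_DEPENDENT_STATES = ("MODIFIED", "ON_QA", "VERIFIED")


@dataclass
class Bug:
    id: int
    status: str
    target_release: str


def dependent_bug_problems(dep: Bug, expected_target: str) -> list:
    """Return human-readable reasons why a dependent bug is not acceptable."""
    problems = []
    if dep.status not in ALLOWED_DEPENDENT_STATES:
        problems.append(
            f"expected dependent Bugzilla bug {dep.id} to be in one of the "
            f"following states: {', '.join(ALLOWED_DEPENDENT_STATES)}, "
            f"but it is {dep.status} instead"
        )
    if dep.target_release != expected_target:
        problems.append(
            f"expected dependent Bugzilla bug {dep.id} to target a release in "
            f'{expected_target}, but it targets "{dep.target_release}" instead'
        )
    return problems


dependents = [
    # target_release here is a placeholder; the thread only gives the status.
    Bug(id=2004596, status="ASSIGNED", target_release="4.10.0"),
    # status here is a placeholder; the thread only gives the target release.
    Bug(id=2005108, status="MODIFIED", target_release="4.9.0"),
]

for dep in dependents:
    for problem in dependent_bug_problems(dep, expected_target="4.10.0"):
        print(problem)
```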

In response to this:

Bug 1981999: bump RHCOS boot images for 4.9


@vikaslaad

/retest

@mtnbikenc
Member

/uncc

@openshift-ci openshift-ci bot removed the request for review from mtnbikenc September 22, 2021 13:09
@miabbott
Member Author

/retest

@miabbott
Member Author

/retest

@miabbott
Member Author

/test e2e-gcp
/test e2e-vsphere
/test e2e-azure

@miabbott
Member Author

/test e2e-metal-ipi-ovn-ipv6
/test e2e-metal-ipi-virtualmedia

@miabbott
Member Author

/retest

@miabbott
Member Author

/test ci/prow/golint

@miabbott
Member Author

/test golint

@openshift-ci
Contributor

openshift-ci bot commented Sep 22, 2021

@miabbott: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test e2e-aws
  • /test e2e-aws-upgrade
  • /test e2e-gcp-upgrade
  • /test e2e-metal-ipi-ovn-ipv6-required
  • /test gofmt
  • /test golint
  • /test govet
  • /test images
  • /test openstack-manifests
  • /test shellcheck
  • /test tf-lint
  • /test unit
  • /test verify-codegen
  • /test verify-vendor
  • /test yaml-lint

The following commands are available to trigger optional jobs:

  • /test e2e-aws-disruptive
  • /test e2e-aws-fips
  • /test e2e-aws-proxy
  • /test e2e-aws-rhel8
  • /test e2e-aws-shared-vpc
  • /test e2e-aws-single-node
  • /test e2e-aws-upi
  • /test e2e-aws-workers-rhel7
  • /test e2e-aws-workers-rhel8
  • /test e2e-azure
  • /test e2e-azure-resourcegroup
  • /test e2e-azure-shared-vpc
  • /test e2e-azure-upi
  • /test e2e-crc
  • /test e2e-gcp
  • /test e2e-gcp-shared-vpc
  • /test e2e-gcp-upi
  • /test e2e-gcp-upi-xpn
  • /test e2e-kubevirt
  • /test e2e-libvirt
  • /test e2e-metal
  • /test e2e-metal-assisted
  • /test e2e-metal-ipi
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-metal-ipi-virtualmedia
  • /test e2e-metal-single-node-live-iso
  • /test e2e-openstack
  • /test e2e-openstack-byon
  • /test e2e-openstack-kuryr
  • /test e2e-openstack-parallel
  • /test e2e-openstack-provider-network
  • /test e2e-openstack-upi
  • /test e2e-ovirt
  • /test e2e-vsphere
  • /test e2e-vsphere-upi
  • /test tf-fmt

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-installer-release-4.9-e2e-aws
  • pull-ci-openshift-installer-release-4.9-e2e-aws-fips
  • pull-ci-openshift-installer-release-4.9-e2e-aws-upgrade
  • pull-ci-openshift-installer-release-4.9-e2e-aws-workers-rhel7
  • pull-ci-openshift-installer-release-4.9-e2e-aws-workers-rhel8
  • pull-ci-openshift-installer-release-4.9-e2e-crc
  • pull-ci-openshift-installer-release-4.9-e2e-libvirt
  • pull-ci-openshift-installer-release-4.9-e2e-metal-ipi-ovn-ipv6
  • pull-ci-openshift-installer-release-4.9-e2e-metal-ipi-ovn-ipv6-required
  • pull-ci-openshift-installer-release-4.9-e2e-metal-single-node-live-iso
  • pull-ci-openshift-installer-release-4.9-e2e-openstack
  • pull-ci-openshift-installer-release-4.9-e2e-openstack-kuryr
  • pull-ci-openshift-installer-release-4.9-e2e-ovirt
  • pull-ci-openshift-installer-release-4.9-gofmt
  • pull-ci-openshift-installer-release-4.9-golint
  • pull-ci-openshift-installer-release-4.9-govet
  • pull-ci-openshift-installer-release-4.9-images
  • pull-ci-openshift-installer-release-4.9-openstack-manifests
  • pull-ci-openshift-installer-release-4.9-shellcheck
  • pull-ci-openshift-installer-release-4.9-tf-fmt
  • pull-ci-openshift-installer-release-4.9-tf-lint
  • pull-ci-openshift-installer-release-4.9-unit
  • pull-ci-openshift-installer-release-4.9-verify-codegen
  • pull-ci-openshift-installer-release-4.9-verify-vendor
  • pull-ci-openshift-installer-release-4.9-yaml-lint

In response to this:

/test ci/prow/golint
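
The rejected command used the full status context name (`ci/prow/golint`), whereas `/test` expects the short job name (`golint`). A small Python sketch of that mapping follows; `rerun_command` is a hypothetical helper, and the `ci/prow/` prefix rule is inferred only from the context names and rerun commands that appear in this thread.

```python
PROW_PREFIX = "ci/prow/"


def rerun_command(context: str) -> str:
    """Turn a status context such as "ci/prow/golint" into a /test command."""
    if context.startswith(PROW_PREFIX):
        job = context[len(PROW_PREFIX):]
    else:
        # Already a short job name; use it unchanged.
        job = context
    return f"/test {job}"


print(rerun_command("ci/prow/golint"))
print(rerun_command("e2e-vsphere"))
```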


@miabbott
Member Author

/test golint

@petr-muller
Member

/test golint

@miabbott
Member Author

/retest

@miabbott
Member Author

/retest

@miabbott
Member Author

e2e-aws is failing after the cluster comes up, on [sig-network-edge][Feature:Idling] Unidling should work with TCP (while idling). This looks like https://bugzilla.redhat.com/show_bug.cgi?id=2000746

@openshift-ci
Contributor

openshift-ci bot commented Sep 22, 2021

@miabbott: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-crc | 4462e5e | link | false | /test e2e-crc |
| ci/prow/e2e-aws-fips | 4462e5e | link | false | /test e2e-aws-fips |
| ci/prow/e2e-aws-workers-rhel7 | 4462e5e | link | false | /test e2e-aws-workers-rhel7 |
| ci/prow/e2e-vsphere | 4462e5e | link | false | /test e2e-vsphere |
| ci/prow/e2e-azure | 4462e5e | link | false | /test e2e-azure |

Full PR test history. Your PR dashboard.


@miabbott
Member Author

e2e-aws-upgrade showed that bootstrapping failed to complete, but the bootstrap node itself came up.

I can see cri-o started successfully and CSRs getting approved, but perhaps the API server wasn't fully initialized; I'm seeing liveness probes on the API server fail.

Sep 22 15:23:43 ip-10-0-12-222 systemd[1]: Started Container Runtime Interface for OCI (CRI-O).

Sep 22 15:23:57 ip-10-0-12-222 systemd[1]: Started Kubernetes Kubelet.

Sep 22 15:45:35 ip-10-0-12-222 bootkube.sh[10083]: Waiting up to 20m0s for the Kubernetes API
Sep 22 15:45:36 ip-10-0-12-222 bootkube.sh[10083]: Still waiting for the Kubernetes API: Get "https://localhost:6443/readyz": dial tcp [::1]:6443: connect: connection refused
Sep 22 15:45:41 ip-10-0-12-222 bootkube.sh[10083]: API is up

Sep 22 15:52:54 ip-10-0-12-222 kubelet.sh[2225]: I0922 15:52:54.983187 2237 patch_prober.go:29] interesting pod/bootstrap-kube-apiserver-ip-10-0-12-222 container/kube-apiserver namespace/openshift-kube-apiserver: Liveness probe status=failure output="HTTP probe failed with statuscode: 500" start-of-body=[+]ping ok
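
To see how long each bootstrap step took relative to the first quoted log line, the timestamps can be turned into elapsed times. This is a rough Python sketch. The event labels are paraphrases of the journal lines above, the year is assumed to be 2021 because the journal output omits it, and `%b` parsing assumes an English locale.

```python
from datetime import datetime

# (timestamp, label) pairs taken from the journal excerpts above.
EVENTS = [
    ("Sep 22 15:23:43", "CRI-O started"),
    ("Sep 22 15:23:57", "kubelet started"),
    ("Sep 22 15:45:35", "bootkube.sh starts waiting for the API"),
    ("Sep 22 15:45:41", "API is up"),
    ("Sep 22 15:52:54", "kube-apiserver liveness probe failure"),
]


def parse(timestamp: str, year: int = 2021) -> datetime:
    """Parse a journal timestamp; the year is an assumption (not logged)."""
    return datetime.strptime(f"{year} {timestamp}", "%Y %b %d %H:%M:%S")


def elapsed_since_first(events):
    """Return (label, timedelta since the first event) for every event."""
    start = parse(events[0][0])
    return [(label, parse(ts) - start) for ts, label in events]


for label, delta in elapsed_since_first(EVENTS):
    print(f"+{delta}  {label}")
```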

@miabbott
Member Author

e2e-azure seems to be hitting quota issues:

 level=info msg=Creating infrastructure resources...
level=error
level=error msg=Error: creating/updating User Assigned Identity "ci-op-60bir3p4-f9836-7h6cb-identity" (Resource Group "ci-op-60bir3p4-f9836-7h6cb-rg"): msi.UserAssignedIdentitiesClient#CreateOrUpdate: Failure responding to request: StatusCode=502 -- Original Error: autorest/azure: Service returned an error. Status=502 Code="Forbidden" Message="The directory object quota limit for the Tenant has been exceeded. Please ask your administrator to increase the quota limit or delete objects to reduce the used quota."
level=error
level=error msg=  on ../tmp/openshift-install-vnet-136931567/main.tf line 58, in resource "azurerm_user_assigned_identity" "main":
level=error msg=  58: resource "azurerm_user_assigned_identity" "main" {
level=error
level=error
level=fatal msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change 

@miabbott
Member Author

e2e-vsphere appears to be hitting availability problems with VMC

 level=info msg=Creating infrastructure resources...
level=error
level=error msg=Error: POST https://ibmvcenter.vmc-ci.devcluster.openshift.com/rest/com/vmware/cis/session: 503 Service Unavailable
level=error
level=error msg=  on ../tmp/openshift-install--087038106/main.tf line 6, in provider "vsphere":
level=error msg=   6: provider "vsphere" {
level=error
level=error
level=error msg=Failed to read tfstate: open /tmp/openshift-install--087038106/terraform.tfstate: no such file or directory
level=fatal msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change 
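
Both the Azure and vSphere failures were identified as infrastructure problems by reading the installer log. The sketch below shows one way that kind of triage could be scripted in Python. The pattern table and the `classify` function are hypothetical and were chosen only to match the two log excerpts quoted in this thread; they are not part of any CI tooling.

```python
import re

# Hypothetical signatures, matching only the excerpts quoted in this thread.
INFRA_PATTERNS = {
    "cloud quota exhausted": re.compile(r"quota limit .* has been exceeded"),
    "provider endpoint unavailable": re.compile(r"\b503 Service Unavailable\b"),
}


def classify(log_text: str) -> list:
    """Return the labels of all infrastructure signatures found in the log."""
    return [
        label
        for label, pattern in INFRA_PATTERNS.items()
        if pattern.search(log_text)
    ]


azure_log = (
    'Message="The directory object quota limit for the Tenant has been '
    'exceeded. Please ask your administrator to increase the quota limit."'
)
vsphere_log = (
    "level=error msg=Error: POST https://ibmvcenter.vmc-ci.devcluster."
    "openshift.com/rest/com/vmware/cis/session: 503 Service Unavailable"
)

print(classify(azure_log))
print(classify(vsphere_log))
```

An empty result does not prove a product bug; it only means none of the known infrastructure signatures matched.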

@miabbott
Member Author

/bugzilla refresh

@openshift-ci
Contributor

openshift-ci bot commented Sep 22, 2021

@miabbott: This pull request references Bugzilla bug 1981999, which is invalid:

  • expected dependent Bugzilla bug 2004596 to be in one of the following states: MODIFIED, ON_QA, VERIFIED, but it is ASSIGNED instead
  • expected dependent Bugzilla bug 2005108 to target a release in 4.10.0, but it targets "4.9.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh


@sdodson sdodson added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Sep 22, 2021
@sdodson
Member

sdodson commented Sep 22, 2021

/lgtm
/approve
All of the critical platforms except vsphere installed but then failed e2e tests. We've always accepted boot image bumps that at least complete the installation. The vsphere problem is infrastructure, and we've got a while to sort it out and regression test with QE before this ships.

@miabbott
Copy link
Member Author

In e2e-aws-fips, it looks like the cluster came up but then failed on [sig-devex][Feature:Templates] templateinstance impersonation tests should pass impersonation update tests [Suite:openshift/conformance/parallel]

Needs a BZ

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 22, 2021
@openshift-ci
Contributor

openshift-ci bot commented Sep 22, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 22, 2021
@sdodson sdodson added the staff-eng-approved Indicates a release branch PR has been approved by a staff engineer (formerly group/pillar lead). label Sep 22, 2021
@sdodson
Member

sdodson commented Sep 22, 2021

/refresh

@openshift-ci-robot
Contributor

@miabbott: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure | 4462e5e | link | false | /test e2e-azure |
| ci/prow/e2e-vsphere | 4462e5e | link | false | /test e2e-vsphere |


@sdodson
Member

sdodson commented Sep 22, 2021

/override ci/prow/e2e-aws
/override ci/prow/e2e-aws-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Sep 22, 2021

@sdodson: Overrode contexts on behalf of sdodson: ci/prow/e2e-aws, ci/prow/e2e-aws-upgrade

In response to this:

/override ci/prow/e2e-aws
/override ci/prow/e2e-aws-upgrade


@openshift-merge-robot openshift-merge-robot merged commit e82d2ec into openshift:release-4.9 Sep 22, 2021
@openshift-ci
Contributor

openshift-ci bot commented Sep 22, 2021

@miabbott: All pull requests linked via external trackers have merged:

Bugzilla bug 1981999 has been moved to the MODIFIED state.

In response to this:

Bug 1981999: bump RHCOS boot images for 4.9


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. staff-eng-approved Indicates a release branch PR has been approved by a staff engineer (formerly group/pillar lead).