RHCOS: bump to 44.81.202001291430.0 #3016

Closed


miabbott
Member

The problem with the RHCOS image on GCP was fixed with
coreos/coreos-assembler#1079 and new images have been produced using
the fixed `coreos-assembler`.

Signed-off-by: Micah Abbott <miabbott@redhat.com>
@openshift-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Jan 29, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign wking
You can assign the PR to them by writing /assign @wking in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@miabbott
Member Author

Ran the following to generate the change:

hack/update-rhcos-bootimage.py https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.4/44.81.202001291430.0/x86_64/meta.json amd64

Package differences:

$ ./differ.py --first-endpoint art --first-version 44.81.202001241431.0 --second-endpoint art --second-version 44.81.202001291430.0
{                                    
    "sources": {                                                                                          
        "44.81.202001241431.0": "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.4/44.81.202001241431.0/x86_64/commitmeta.json",
        "44.81.202001291430.0": "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.4/44.81.202001291430.0/x86_64/commitmeta.json"
    },                                                                                                    
    "diff": {
        "libarchive": {             
            "44.81.202001241431.0": "libarchive-3.3.2-7.el8.x86_64",                       
            "44.81.202001291430.0": "libarchive-3.3.2-8.el8_1.x86_64"                                                                                                                                                
        },                                                                                                
        "machine-config-daemon": {   
            "44.81.202001241431.0": "machine-config-daemon-4.4.0-202001241232.git.1.189a2ca.el8.x86_64",
            "44.81.202001291430.0": "machine-config-daemon-4.4.0-202001291201.git.1.3ab2a31.el8.x86_64"
        },                         
        "openshift-clients": {                                                                            
            "44.81.202001241431.0": "openshift-clients-4.4.0-202001240656.git.1.db0174c.el8.x86_64",
            "44.81.202001291430.0": "openshift-clients-4.4.0-202001280616.git.1.0d9ea0b.el8.x86_64"
        },
        "openshift-hyperkube": {
            "44.81.202001241431.0": "openshift-hyperkube-4.4.0-202001241232.git.0.e552f5f.el8.x86_64",
            "44.81.202001291430.0": "openshift-hyperkube-4.4.0-202001291201.git.0.07d0e39.el8.x86_64"
        },
        "sqlite-libs": {
            "44.81.202001241431.0": "sqlite-libs-3.26.0-3.el8.x86_64",
            "44.81.202001291430.0": "sqlite-libs-3.26.0-4.el8_1.x86_64"
        }
    }
}
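
To read a report like the one above at a glance, the two builds of each package can be printed side by side. The sketch below is not part of `differ.py`: `summarize_diff` is a hypothetical helper, and it assumes the report has the `sources` and `diff` keys shown in the output above, with the older build listed first under `sources`.

```python
import json


def summarize_diff(report):
    """Return one 'package: old -> new' line per entry in the report.

    Assumes the layout shown in the output above: a "sources" mapping whose
    first key is the older build, and a "diff" mapping keyed by package name.
    """
    old, new = list(report["sources"])[:2]
    lines = []
    for name in sorted(report["diff"]):
        builds = report["diff"][name]
        # A package missing from one build is reported as "(absent)".
        lines.append(
            f"{name}: {builds.get(old, '(absent)')} -> {builds.get(new, '(absent)')}"
        )
    return lines


# Trimmed copy of the report above: two of the five packages, with the
# source URLs replaced by a placeholder because they are not used here.
SAMPLE = json.loads("""
{
    "sources": {
        "44.81.202001241431.0": "(URL omitted)",
        "44.81.202001291430.0": "(URL omitted)"
    },
    "diff": {
        "libarchive": {
            "44.81.202001241431.0": "libarchive-3.3.2-7.el8.x86_64",
            "44.81.202001291430.0": "libarchive-3.3.2-8.el8_1.x86_64"
        },
        "sqlite-libs": {
            "44.81.202001241431.0": "sqlite-libs-3.26.0-3.el8.x86_64",
            "44.81.202001291430.0": "sqlite-libs-3.26.0-4.el8_1.x86_64"
        }
    }
}
""")

for line in summarize_diff(SAMPLE):
    print(line)
```

Sorting by package name keeps the summary stable between runs, which makes it easier to paste into a PR comment and compare by eye.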

@miabbott
Member Author

/test e2e-gcp

@miabbott
Member Author

cc: @yuqi-zhang

@yuqi-zhang
Contributor

/test e2e-azure

@miabbott
Member Author

/retest

@sdodson
Member

sdodson commented Jan 29, 2020

/test e2e-metal

@sdodson
Member

sdodson commented Jan 29, 2020

/test e2e-vsphere

@yuqi-zhang
Contributor

AWS seems to be running into resource issues.
Azure is failing as expected with an etcd issue that is being worked on.
GCP is failing with a new error:

time="2020-01-29T19:13:06Z" level=debug msg="2020/01/29 19:13:06 [ERROR] <root>: eval: *terraform.EvalApplyPost, err: Error waiting to create Image: Error waiting for Creating Image: timeout while waiting for state to become 'DONE' (last state: 'RUNNING', timeout: 4m0s)"
time="2020-01-29T19:13:06Z" level=debug msg="2020/01/29 19:13:06 [ERROR] <root>: eval: *terraform.EvalSequence, err: Error waiting to create Image: Error waiting for Creating Image: timeout while waiting for state to become 'DONE' (last state: 'RUNNING', timeout: 4m0s)"
time="2020-01-29T19:13:06Z" level=error
time="2020-01-29T19:13:06Z" level=error msg="Error: Error waiting to create Image: Error waiting for Creating Image: timeout while waiting for state to become 'DONE' (last state: 'RUNNING', timeout: 4m0s)"

The image creation timed out; not sure if this is a flake.
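
When comparing several CI runs, it can help to pull the wait state and timeout out of installer log lines like the ones above. The following is a minimal sketch, not something the installer provides: `parse_wait_timeout` is a hypothetical helper, and the regular expression is written only against the message text quoted above, so other Terraform messages may not match.

```python
import re

# Matches the "timeout while waiting for state ..." fragment quoted above.
_TIMEOUT_RE = re.compile(
    r"timeout while waiting for state to become '(?P<target>[^']+)' "
    r"\(last state: '(?P<last>[^']+)', timeout: (?P<timeout>[^)]+)\)"
)


def parse_wait_timeout(line):
    """Return the target state, last state, and timeout found in a log line.

    Returns None when the line does not contain the timeout message.
    """
    match = _TIMEOUT_RE.search(line)
    return match.groupdict() if match else None


# The level=error line from the log excerpt above.
LINE = (
    'time="2020-01-29T19:13:06Z" level=error msg="Error: Error waiting to create Image: '
    "Error waiting for Creating Image: timeout while waiting for state to become 'DONE' "
    "(last state: 'RUNNING', timeout: 4m0s)\""
)

print(parse_wait_timeout(LINE))
```

A last state of `RUNNING` means the operation was still in progress when the wait gave up, which is consistent with a slow image creation rather than an explicit failure.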

@yuqi-zhang
Contributor

/test e2e-gcp

@yuqi-zhang
Contributor

/test e2e-gcp

@yuqi-zhang
Contributor

/retest

The failure seems pretty consistent, wonder if something is up with the gcp image

@miabbott
Member Author

> The failure seems pretty consistent, wonder if something is up with the gcp image

I was able to successfully boot a single RHCOS node in GCP with this image:

$ gcloud compute --project=openshift-rhcos-devel instances get-serial-port-output miabbott-rhcos-44-vm --zone=us-central1-a | tail

Specify --start=74085 in the next get-serial-port-output invocation to get only the new output starting from here.
[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Started Update UTMP about System Runlevel Changes.

Red Hat Enterprise Linux CoreOS 44.81.202001291430.0 (Ootpa) 4.4
SSH host key: SHA256:d9q5cZdugtxJ3jiVkcGlCH9OUswrImnUfw88xzentlI (ECDSA)
SSH host key: SHA256:DbwK5BHPc9j88RVi+xhT0rJaesTlP8DAmO9WDDEvF9U (ED25519)
SSH host key: SHA256:Md5yvY8Jz5pRq/SgGYws2PZlOaudfVTpVXPD2Q+Q98Y (RSA)
ens4: 10.128.0.34 fe80::681d:fb64:d705:78f0
miabbott-rhcos-44-vm login: 

$ sshq -l core 23.236.52.188
Warning: Permanently added '23.236.52.188' (ECDSA) to the list of known hosts.
Enter passphrase for key '/home/miabbott/.ssh/id_rsa': 
Red Hat Enterprise Linux CoreOS 44.81.202001291430.0
  Part of OpenShift 4.4, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.4/architecture/architecture-rhcos.html

---
Last login: Thu Jan 30 14:39:40 2020 from 108.49.50.209
[core@miabbott-rhcos-44-vm ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc fq_codel state UP group default qlen 1000
    link/ether 42:01:0a:80:00:22 brd ff:ff:ff:ff:ff:ff
    inet 10.128.0.34/32 brd 10.128.0.34 scope global dynamic noprefixroute ens4
       valid_lft 86149sec preferred_lft 86149sec
    inet6 fe80::681d:fb64:d705:78f0/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
[core@miabbott-rhcos-44-vm ~]$ rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● ostree://d65097b8671baabfc50040980f75b598f606d98c78b298b3737cf83da41175c5
                   Version: 44.81.202001291430.0 (2020-01-29T14:35:55Z)
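
The manual check above (boot the image, then confirm the deployed version) could be scripted by reading the `Version:` line out of captured `rpm-ostree status` text. This is a sketch under assumptions: `first_deployment_version` is a hypothetical helper, it parses only the human-readable layout shown above, and it looks at the first `Version:` line only.

```python
import re


def first_deployment_version(status_text):
    """Return the version from the first 'Version:' line, or None if absent."""
    match = re.search(r"^\s*Version:\s*(\S+)", status_text, flags=re.MULTILINE)
    return match.group(1) if match else None


# The `rpm-ostree status` output captured in the session above.
STATUS = """\
State: idle
AutomaticUpdates: disabled
Deployments:
● ostree://d65097b8671baabfc50040980f75b598f606d98c78b298b3737cf83da41175c5
                   Version: 44.81.202001291430.0 (2020-01-29T14:35:55Z)
"""

EXPECTED = "44.81.202001291430.0"  # the build this PR bumps to

found = first_deployment_version(STATUS)
print("match" if found == EXPECTED else f"mismatch: found {found}")
```

Parsing the text output is fragile if the layout changes, so treat this as a quick sanity check rather than a robust validation step.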

@miabbott
Member Author

/test e2e-gcp

@openshift-ci-robot
Contributor

@miabbott: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-vsphere | 2bbb29a | link | /test e2e-vsphere |
| ci/prow/e2e-azure | 2bbb29a | link | /test e2e-azure |
| ci/prow/e2e-libvirt | 2bbb29a | link | /test e2e-libvirt |
| ci/prow/e2e-aws-scaleup-rhel7 | 2bbb29a | link | /test e2e-aws-scaleup-rhel7 |
| ci/prow/e2e-gcp | 2bbb29a | link | /test e2e-gcp |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@yuqi-zhang
Contributor

I found a similar BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1752380
Looking at https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/3016/pull-ci-openshift-installer-master-e2e-gcp/220/artifacts/e2e-gcp/installer/.openshift_install.log, we seem to be hitting the timeout without any explicit errors. In other runs, image creation finished in about 1 minute, so I'm not sure what we changed here that's causing it to hit the timeout.

@miabbott
Member Author

It appears there may be a problem with how the RHCOS images for GCP are being created that is leading to the timeouts and the failure of the e2e-gcp tests.

https://bugzilla.redhat.com/show_bug.cgi?id=1796632

Closing this PR until that BZ is addressed.

@miabbott closed this on Jan 30, 2020
@russellb
Member

russellb commented Feb 3, 2020

@miabbott Maybe leave this open, but with a hold, to reflect that RHCOS still needs an update? Otherwise, what should I be watching to know when the RHCOS update lands?

@yuqi-zhang
Contributor

The BZ Micah linked above needs to be fixed before we can generate a working RHCOS build again. We will open a new PR with the bump and tag you.
