ext.config.systemd.journal-compat failing on SCOS in Prow #1505

Closed
jlebon opened this issue May 8, 2024 · 2 comments

jlebon commented May 8, 2024

--- FAIL: ext.config.systemd.journal-compat (58.22s)
        cluster.go:162: Error: Unit kola-runext.service exited with code 125
        cluster.go:162: 2024-05-08T15:19:40Z cli: Unit kola-runext.service exited with code 125
        harness.go:1263: kolet failed: : kolet run-test-unit failed: Process exited with status 1

Journal:

May  8 15:19:37.232425 init.scope[1]: Started kola-runext.service.
...
May  8 15:19:37.892038 kola-runext.service[2106]: 2024-05-08 15:19:37.889648267 +0000 UTC m=+0.519405285 system refresh
...
May  8 15:19:39.529980 init.scope[1]: kola-runext.service: Main process exited, code=exited, status=125/n/a
May  8 15:19:39.530128 init.scope[1]: kola-runext.service: Failed with result 'exit-code'.

It's not very clear what's going on.

journal.txt
console.txt

jlebon added a commit to jlebon/os that referenced this issue May 8, 2024
There's a messy situation right now with the default `policy.json`
shipped by `containers-common` and the RHEL keys missing from the c9s
compose.

This is tracked at openshift#1505. But this
test is totally unrelated to all this. So let's work around it for now
to unblock CI.
jlebon added a commit to jlebon/os that referenced this issue May 9, 2024
There's a messy situation right now with the default `policy.json`
shipped by `containers-common` and the RHEL keys missing from the c9s
compose.

This is tracked at openshift#1505. But this
test is totally unrelated to all this. So let's work around it for now
to unblock CI by just downloading the missing keys.

jlebon commented May 9, 2024

OK, so `podman run` fails, but because we capture stdout and stderr and don't output them on failure, the error gets swallowed. Here's the error it actually hit:

Error: copying system image from manifest list: Source image rejected: None of the signatures were accepted, reasons: open /etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release: no such file or directory

containers-common normally ships those key files (which is an issue in itself: https://bugzilla.redhat.com/show_bug.cgi?id=2182197). But of course, it doesn't ship them on RHEL: https://gitlab.com/redhat/centos-stream/rpms/containers-common/-/blob/5ebc0aa1895f562b7647f6a7acabd8805259dcaa/containers-common.spec#L154.
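
For context, this failure mode is just the standard containers-policy.json mechanism: `signedBy` requirements reference GPG key files by absolute path, and the pull is rejected if a referenced key file doesn't exist. A rough diagnostic sketch (assuming the stock /etc/containers/policy.json layout and Python 3 on the host; not part of the actual test) that lists any missing key files referenced by the policy:

```python
#!/usr/bin/env python3
# Diagnostic sketch: list GPG key files referenced by the containers
# signature policy that don't exist on disk. Assumes the standard
# containers-policy.json(5) layout with "signedBy" requirements.
import json
import os

POLICY = "/etc/containers/policy.json"

def requirements(policy):
    """Yield every policy requirement dict, from both the default
    policy and all per-transport/per-scope policies."""
    yield from policy.get("default", [])
    for scopes in policy.get("transports", {}).values():
        for reqs in scopes.values():
            yield from reqs

with open(POLICY) as f:
    policy = json.load(f)

missing = set()
for req in requirements(policy):
    if req.get("type") != "signedBy":
        continue
    # keyPath is a single path; keyPaths (in newer policies) is a list.
    paths = list(req.get("keyPaths", []))
    if "keyPath" in req:
        paths.append(req["keyPath"])
    missing.update(p for p in paths if not os.path.exists(p))

for path in sorted(missing):
    print(f"missing key referenced by policy: {path}")
```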

And the reason CI on c9s is hitting this is that we're pulling containers-common from RHEL:

Installing 544 packages:
  ...
  containers-common-3:1-75.rhaos4.16.el9.x86_64 (rhel-9.4-server-ose-4.16)

And the reason for that is that, for some reason, those packages have epoch 3 while c9s has epoch 2: https://gitlab.com/redhat/centos-stream/rpms/containers-common/-/blob/5ebc0aa1895f562b7647f6a7acabd8805259dcaa/containers-common.spec#L12
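
To make the comparison concrete: RPM compares epoch before version and release, so any epoch-3 build sorts newer than any epoch-2 build regardless of the NVRs. A small sketch using the python3-rpm bindings (the c9s EVR below is made up for illustration):

```python
# Sketch: epoch dominates RPM version comparison.
# Requires the python3-rpm bindings; the c9s EVR below is hypothetical.
import rpm

ocp = ("3", "1", "75.rhaos4.16.el9")   # epoch 3, from the OCP repo
c9s = ("2", "1", "999.el9")            # epoch 2, hypothetical c9s build

# labelCompare returns 1 if the first EVR is newer, -1 if older, 0 if equal.
print(rpm.labelCompare(ocp, c9s))      # -> 1: the epoch-3 OCP build wins
```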

So, we should reach out to the maintainers to understand what's going on there. For now, we can force containers-common to come from c9s-appstream. Long-term, this will be fixed by #799, because then we can better delimit the hack of adding OCP repos to the SCOS build to just the OCP layer (though ideally, we can get rid of that too once all the packages are in CentOS proper).

jlebon added a commit to jlebon/os that referenced this issue May 9, 2024
There's a messy situation right now where the containers-common package
is higher versioned in OCP than in c9s proper. And because we need the
OCP repo for now to compose SCOS, we get the OCP one, which causes
issues because unlike the c9s version, it doesn't ship the RHEL keys.

Work around this by pinning containers-common to the c9s-appstream repo.

See also: openshift#1505 (comment)

Fixes: openshift#1505

aaradhak commented May 9, 2024

I recently came across a discussion on forum-ocp-art regarding the epoch number discrepancy, with reference to the containers-common package downgrade issue that occurred recently:

┌─────────────────────────────┬──────────────────────────────────────┬───────┐
│ tag                         │ build                                │ epoch │
├─────────────────────────────┼──────────────────────────────────────┼───────┤
│ rhaos-4.12-rhel-8-candidate │ containers-common-1-36.rhaos4.12.el8 │   2   │
│ rhaos-4.13-rhel-9-candidate │ containers-common-1-37.rhaos4.13.el9 │   3   │
│ rhaos-4.14-rhel-9-candidate │ containers-common-1-37.rhaos4.13.el9 │   3   │
│ rhaos-4.15-rhel-9-candidate │ containers-common-1-37.rhaos4.13.el9 │   3   │
│ rhaos-4.16-rhel-9-candidate │ containers-common-1-63.rhaos4.16.el9 │   2   │
│ rhaos-4.17-rhel-9-candidate │ containers-common-1-37.rhaos4.13.el9 │   3   │
└─────────────────────────────┴──────────────────────────────────────┴───────┘
