Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCO-392: openshift-hack/images/os: delete #1805

Merged
merged 1 commit into from
Mar 6, 2024

Conversation

jlebon
Copy link
Member

@jlebon jlebon commented Nov 14, 2023

All the logic there is geared towards machine-os-content which is no
longer used at all in the cluster.

Nowadays, the container to modify is rhel-coreos, which is what is
already being done in CI:

https://github.com/openshift/release/blob/463a8f244ba0f807e76e6fdf974f98d24efd1ced/ci-operator/config/openshift/kubernetes/openshift-kubernetes-master.yaml#L87-L97

@openshift-ci-robot openshift-ci-robot added the backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. label Nov 14, 2023
@openshift-ci-robot
Copy link

@jlebon: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@jlebon
Copy link
Member Author

jlebon commented Nov 14, 2023

/hold
This probably also requires tweaking openshift/release.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 14, 2023
@openshift-ci-robot
Copy link

@jlebon: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@jlebon
Copy link
Member Author

jlebon commented Nov 15, 2023

Surfacing some internal discussions.

OK so after looking at this more:

  1. I also think that Aravindh’s PR in o/k: remove reference to machine-os-content release#45669 will fix CI. It will simply not build the Dockerfile in the o/k repo anymore, but as Colin mentioned, that's already useless today, so we're no worse off.

  2. I think that https://github.com/openshift/release/blob/463a8f244ba0f807e76e6fdf974f98d24efd1ced/ci-operator/config/openshift/kubernetes/openshift-kubernetes-master.yaml#L86C1-L105 is already doing the right thing. Look at e.g. the test logs from this PR: OCPBUGS-16922: Remove skip flag for e2e tests related to AdmissionWebhookMatchConditions #1790, which is the last PR to have merged into o/k with passing CI. You can see in the rpms logs the RPM getting built:

    2023-11-02T14:22:12.359308815Z Wrote: /tmp/openshift/build-rpms/rpm/RPMS/x86_64/openshift-hyperkube-1.28.3-1.0.6576be7.x86_64.rpm

    and in the rhel-coreos logs, the RPM getting installed in RHCOS:

    2023-11-02T14:33:38.224689257Z [2/2] STEP 4/6: RUN rpm -Uvh --oldpackage /*.rpm && rm -vf /*.rpm /usr/share/rpm/__db.* /usr/share/rpm/.*.lock
    2023-11-02T14:33:38.922902163Z Verifying...                          ########################################
    2023-11-02T14:33:39.716321595Z Preparing...                          ########################################
    2023-11-02T14:33:40.095222451Z Updating / installing...
    2023-11-02T14:33:40.095222451Z openshift-hyperkube-1.28.3-1.0.6576be7########################################
    2023-11-02T14:33:42.666254503Z Cleaning up / removing...
    2023-11-02T14:33:42.666254503Z openshift-hyperkube-4.15.0-20231027200########################################
    

    So basically, rhel-coreos is already being patched today with the PR code. We were also patching machine-os-content, but as we've said, that's a completely useless operation since nothing actually uses it anymore in-cluster.

One confusing difference is that the layering hack for rhel-coreos lives in openshift/release whereas the layering hack for machine-os-content lives in this repo. So we could either delete openshift-hack/images/os/ and keep the hack in openshift/release. Or move the rhel-coreos layering hack from openshift/release to this repo.

Someone asked:

My understanding is that kubelets come in via RHCOS, so openshift-hack/images/os/ is "I'm touching openshift/kubernetes, and one output of this repo is the kubelet/hyperkube, so I need to build a new RHCOS via this hack image so my CI will test that new kubelet/hyperkube".

Yes, CI builds the RPMs from the PR and layers it onto RHCOS. The Dockerfile here does it on machine-os-content. The literal Dockerfile in openshift/release does it on rhel-coreos. Only the latter has mattered for the past few releases since machine-os-content isn't actually what's used in the cluster.

@jlebon
Copy link
Member Author

jlebon commented Nov 15, 2023

Only the latter has mattered for the past few releases since machine-os-content isn't actually what's used in the cluster.

One way to confirm this is to look at the logs from #1790 some more.

artifacts/ci-operator.log has the following logs related to the rhel-coreos build:

 107   │ {"level":"info","msg":"Building rhel-coreos","time":"2023-11-02T14:37:37Z"}
 109   │ {"level":"info","msg":"Found existing build \"rhel-coreos-amd64\"","time":"2023-11-02T14:37:37Z"}
 110   │ {"level":"trace","msg":"Waiting for build to be complete.","name":"rhel-coreos-amd64","namespace":"ci-op-isxxxvqg","time":"2023-11-02T14:37:37Z"}
 113   │ {"level":"info","msg":"Build rhel-coreos-amd64 succeeded after 5m9s","time":"2023-11-02T14:42:45Z"}
 114   │ {"for-build":"rhel-coreos","level":"debug","msg":"Running command: /usr/bin/manifest-tool --debug --insecure --docker-cfg /secrets/manifest-tool/.dockerconfigjson push from-args --platforms linux/amd64 --template image-registry.openshift-image-registry.svc:5000/ci-op-isxxxvqg/pipeline:rhel-coreos-amd64 --target image-registry.openshift-image-registry.svc:5000/ci-op-isxxxvqg/pipeline:rhel-coreos","time":"2023-11-02T14:42:45Z"}
 115   │ {"for-build":"rhel-coreos","level":"debug","msg":"manifest-tool command succeeded","output":"Digest: sha256:bc7967fc2d190b95e3c983c4ee6dcc3bc927968fc181cb69a755186170543f8a 403\n","time":"2023-11-02T14:42:46Z"}
 116   │ {"for-build":"rhel-coreos","level":"info","msg":"Image ci-op-isxxxvqg/pipeline:rhel-coreos created","time":"2023-11-02T14:42:46Z"}
 117   │ {"level":"info","msg":"Tagging rhel-coreos into stable","time":"2023-11-02T14:42:46Z"}

Note the digest there (bc7967fc2d190b95e3c983c4ee6dcc3bc927968fc181cb69a755186170543f8a).

In artifacts/e2e-gcp/gather-extra/artifacts/releaseinfo.json of the gcp-e2e logs, we see:

3170   │         {
3171   │           "name": "rhel-coreos",
3172   │           "annotations": {
3173   │             "io.openshift.build.commit.id": "",
3174   │             "io.openshift.build.commit.ref": "",
3175   │             "io.openshift.build.source-location": ""
3176   │           },
3177   │           "from": {
3178   │             "kind": "DockerImage",
3179   │             "name": "registry.build04.ci.openshift.org/ci-op-isxxxvqg/stable@sha256:bc7967fc2d190b95e3c983c4ee6dcc3bc927968fc181cb69a755186170543f8a"
3180   │           },
3181   │           "generation": null,
3182   │           "importPolicy": {},
3183   │           "referencePolicy": {
3184   │             "type": ""
3185   │           }
3186   │         },

So the rhel-coreos image in the payload for that test is definitely the same one we built.

Then, looking at the journal for e.g. master-0, we can see when the MCO pivots RHCOS onto that image:

2254   │ Nov 02 14:59:29.677838 ci-op-isxxxvqg-b2c4f-52h6s-master-0 rpm-ostree[2702]: Process [pid: 3064 uid: 0 unit: libpod-ea44607742bfb72cffd6d001c148a7aac26715c2f2df1bccc4180a378abc6076.scope] connected to transaction progress
2255   │ Nov 02 14:59:29.681461 ci-op-isxxxvqg-b2c4f-52h6s-master-0 podman[2586]: Pulling manifest: ostree-unverified-image:docker://registry.build04.ci.openshift.org/ci-op-isxxxvqg/stable@sha256:bc7967fc2d190b95e3c983c4ee6dcc3bc927968fc18
1cb69a755186170543f8a
2256   │ Nov 02 14:59:29.841402 ci-op-isxxxvqg-b2c4f-52h6s-master-0 rpm-ostree[2702]: Fetching ostree-unverified-image:docker://registry.build04.ci.openshift.org/ci-op-isxxxvqg/stable@sha256:bc7967fc2d190b95e3c983c4ee6dcc3bc927968fc181cb69
a755186170543f8a
2257   │ Nov 02 14:59:30.892395 ci-op-isxxxvqg-b2c4f-52h6s-master-0 rpm-ostree[2702]: layers already present: 43; layers needed: 9 (413.5 MB)
2258   │ Nov 02 14:59:30.893547 ci-op-isxxxvqg-b2c4f-52h6s-master-0 podman[2586]: Importing: ostree-unverified-image:docker://registry.build04.ci.openshift.org/ci-op-isxxxvqg/stable@sha256:bc7967fc2d190b95e3c983c4ee6dcc3bc927968fc181cb69a7
55186170543f8a (digest: sha256:13b9a2d421ef30611e9a52a730e9d0de0dccf848e96f6653717d8694deb40b9f)
2259   │ Nov 02 14:59:30.893547 ci-op-isxxxvqg-b2c4f-52h6s-master-0 podman[2586]: ostree chunk layers stored: 43 needed: 8 (274.3?MB)
2260   │ Nov 02 14:59:30.893547 ci-op-isxxxvqg-b2c4f-52h6s-master-0 podman[2586]: custom layers stored: 0 needed: 1 (139.2?MB)
2261   │ Nov 02 14:59:30.893547 ci-op-isxxxvqg-b2c4f-52h6s-master-0 podman[2586]: Fetching ostree chunk sha256:220010b6c898 (61.7?MB)
2317   │ Nov 02 14:59:47.672879 ci-op-isxxxvqg-b2c4f-52h6s-master-0 rpm-ostree[2702]: Created new deployment /ostree/deploy/rhcos/deploy/72b2ac91644c90566293670566870a15522919883d818f581230919afbc1061a.0
2318   │ Nov 02 14:59:47.682181 ci-op-isxxxvqg-b2c4f-52h6s-master-0 rpm-ostree[2702]: sanitycheck(/usr/bin/true) successful
2319   │ Nov 02 14:59:47.695322 ci-op-isxxxvqg-b2c4f-52h6s-master-0 rpm-ostree[2702]: Pruned container image layers: 0
2320   │ Nov 02 14:59:48.230651 ci-op-isxxxvqg-b2c4f-52h6s-master-0 rpm-ostree[2702]: Txn Rebase on /org/projectatomic/rpmostree1/rhcos successful
2321   │ Nov 02 14:59:48.231129 ci-op-isxxxvqg-b2c4f-52h6s-master-0 podman[2586]: Freed: 969.6?kB (pkgcache branches: 0)
2322   │ Nov 02 14:59:48.232488 ci-op-isxxxvqg-b2c4f-52h6s-master-0 rpm-ostree[2702]: Unlocked sysroot
2323   │ Nov 02 14:59:48.233489 ci-op-isxxxvqg-b2c4f-52h6s-master-0 rpm-ostree[2702]: Process [pid: 3064 uid: 0 unit: libpod-ea44607742bfb72cffd6d001c148a7aac26715c2f2df1bccc4180a378abc6076.scope] disconnected from transaction progress

Notably, we see the RPM diff:

2324   │ Nov 02 14:59:48.358999 ci-op-isxxxvqg-b2c4f-52h6s-master-0 podman[2586]: Upgraded:
2325   │ Nov 02 14:59:48.358999 ci-op-isxxxvqg-b2c4f-52h6s-master-0 podman[2586]:   openshift-clients 4.15.0-202310252144.p0.g3544c64.assembly.stream.el9 -> 4.15.0-202310311005.p0.gf67a45d.assembly.stream.el9
2326   │ Nov 02 14:59:48.358999 ci-op-isxxxvqg-b2c4f-52h6s-master-0 podman[2586]: Downgraded:
2327   │ Nov 02 14:59:48.358999 ci-op-isxxxvqg-b2c4f-52h6s-master-0 podman[2586]:   openshift-hyperkube 4.15.0-202310272003.p0.g993e262.assembly.stream.el9 -> 1.28.3-1.0.6576be7
2328   │ Nov 02 14:59:48.358999 ci-op-isxxxvqg-b2c4f-52h6s-master-0 podman[2586]: Changes queued for next boot. Run "systemctl reboot" to start a reboot

jlebon added a commit to jlebon/release that referenced this pull request Feb 6, 2024
That image isn't actually used in the cluster. We already have
code in these same files to update the real image that gets used
(`rhel-coreos`), so let's just drop these references.

See also openshift/kubernetes#1805 (comment)
and following.

See also https://issues.redhat.com/browse/MCO-392.
@openshift-bot
Copy link

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 14, 2024
jlebon added a commit to jlebon/release that referenced this pull request Feb 22, 2024
That image isn't actually used in the cluster. We already have
code in these same files to update the real image that gets used
(`rhel-coreos`), so let's just drop these references.

See also openshift/kubernetes#1805 (comment)
and following.

See also https://issues.redhat.com/browse/MCO-392.
@openshift-ci-robot
Copy link

@jlebon: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@jlebon jlebon changed the title openshift-hack/images/os: make kubelet layering hack "OSTree native", drop machine-os-content ref openshift-hack/images/os: delete Feb 22, 2024
@jlebon
Copy link
Member Author

jlebon commented Feb 22, 2024

OK, I've updated this now to just nuke that whole directory since it's no longer used.

@jlebon
Copy link
Member Author

jlebon commented Feb 23, 2024

(This requires openshift/release#49156 to be clear.)

jlebon added a commit to jlebon/kubernetes that referenced this pull request Mar 5, 2024
This will sanity-check in a much more straight-forward way that the
kubelet built from PRs is what's actually getting tested in CI.

See also
openshift#1805 (comment)
and following.
jlebon added a commit to jlebon/kubernetes that referenced this pull request Mar 5, 2024
This will sanity-check in a much more straight-forward way that the
kubelet built from PRs is what's actually getting tested in CI.

See also
openshift#1805 (comment)
and following.
@jlebon
Copy link
Member Author

jlebon commented Mar 5, 2024

Only the latter has mattered for the past few releases since machine-os-content isn't actually what's used in the cluster.

One way to confirm this is to look at the logs from #1790 some more.

At the suggestion of @soltysh, I opened #1905 which is a much more direct way to verify this.

All the logic there is geared towards `machine-os-content` which is no
longer used at all in the cluster.

Nowadays, the container to modify is `rhel-coreos`, which is what is
already being done in CI:

https://github.com/openshift/release/blob/463a8f244ba0f807e76e6fdf974f98d24efd1ced/ci-operator/config/openshift/kubernetes/openshift-kubernetes-master.yaml#L87-L97
@openshift-ci-robot
Copy link

@jlebon: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

openshift-merge-bot bot pushed a commit to openshift/release that referenced this pull request Mar 6, 2024
That image isn't actually used in the cluster. We already have
code in these same files to update the real image that gets used
(`rhel-coreos`), so let's just drop these references.

See also openshift/kubernetes#1805 (comment)
and following.

See also https://issues.redhat.com/browse/MCO-392.
@jlebon
Copy link
Member Author

jlebon commented Mar 6, 2024

Now that openshift/release#49156 is in, CI here should pass.
/retest

@soltysh
Copy link

soltysh commented Mar 6, 2024

/retest

@soltysh
Copy link

soltysh commented Mar 6, 2024

/test unit

@soltysh
Copy link

soltysh commented Mar 6, 2024

/override ci/prow/unit
this failure is not related, I'll handle that separately

Copy link

openshift-ci bot commented Mar 6, 2024

@soltysh: Overrode contexts on behalf of soltysh: ci/prow/unit

In response to this:

/override ci/prow/unit
this failure is not related, I'll handle that separately

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link

openshift-ci bot commented Mar 6, 2024

@jlebon: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve
/remove-label backports/unvalidated-commits
/label backports/validated-commits
/hold cancel

@openshift-ci openshift-ci bot added backports/validated-commits Indicates that all commits come to merged upstream PRs. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. labels Mar 6, 2024
@soltysh soltysh changed the title openshift-hack/images/os: delete MCO-392: openshift-hack/images/os: delete Mar 6, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 6, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 6, 2024

@jlebon: This pull request references MCO-392 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target either version "4.16." or "openshift-4.16.", but it targets "4.15.0" instead.

In response to this:

All the logic there is geared towards machine-os-content which is no
longer used at all in the cluster.

Nowadays, the container to modify is rhel-coreos, which is what is
already being done in CI:

https://github.com/openshift/release/blob/463a8f244ba0f807e76e6fdf974f98d24efd1ced/ci-operator/config/openshift/kubernetes/openshift-kubernetes-master.yaml#L87-L97

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 6, 2024
Copy link

openshift-ci bot commented Mar 6, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlebon, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 6, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit a0beecc into openshift:master Mar 6, 2024
20 checks passed
@openshift-bot
Copy link

[ART PR BUILD NOTIFIER]

This PR has been included in build openshift-enterprise-pod-container-v4.16.0-202403070215.p0.ga0beecc.assembly.stream.el9 for distgit openshift-enterprise-pod.
All builds following this will include this PR.

jlebon added a commit to jlebon/os that referenced this pull request Mar 7, 2024
This was previously enabled (openshift#1048, openshift#1374) and then disabled again (openshift#1084, openshift#1393).

The last time we tried it, the issue was that there were some references
remaining in openshift/kubernetes and openshift/release. Those have been
cleaned up now:

openshift/release#49156
openshift/kubernetes#1805

So... third time's the charm!
jlebon added a commit to jlebon/os that referenced this pull request Mar 7, 2024
This was previously enabled (openshift#1048, openshift#1374) and then disabled again (openshift#1084, openshift#1393).

The last time we tried it, the issue was that there were some references
remaining in openshift/kubernetes and openshift/release. Those have been
cleaned up now:

openshift/release#49156
openshift/kubernetes#1805

So... third time's the charm!
memodi pushed a commit to memodi/release that referenced this pull request Mar 14, 2024
)

That image isn't actually used in the cluster. We already have
code in these same files to update the real image that gets used
(`rhel-coreos`), so let's just drop these references.

See also openshift/kubernetes#1805 (comment)
and following.

See also https://issues.redhat.com/browse/MCO-392.
BradLugo pushed a commit to BradLugo/openshift-release that referenced this pull request Jun 13, 2024
)

That image isn't actually used in the cluster. We already have
code in these same files to update the real image that gets used
(`rhel-coreos`), so let's just drop these references.

See also openshift/kubernetes#1805 (comment)
and following.

See also https://issues.redhat.com/browse/MCO-392.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. backports/validated-commits Indicates that all commits come to merged upstream PRs. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants