
COS-2611: Use deploy-via-container #657

Merged

Conversation

cgwalters
Member

@cgwalters cgwalters commented Oct 19, 2021

Use deploy-via-container

This will cause us to run through the ostree-native container
stack when generating the disk images.

Today for RHCOS we're using the "custom origin" stuff which
lets us inject metadata about the built source, but rpm-ostree
doesn't understand it.

With this, in the future (particularly after coreos/coreos-assembler#2685),
rpm-ostree status will show the booted container and understand it;
for example, in theory we could have rpm-ostree upgrade work (if
we provisioned from a tag instead of a digest).

A side benefit is that this should fix coreos/coreos-assembler#2685,
because we're now pulling a single large file over 9p instead of lots
of little ones.


@openshift-ci
Contributor

openshift-ci bot commented Oct 19, 2021

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 19, 2021
@openshift-ci openshift-ci bot requested review from bgilbert and lucab October 19, 2021 15:52
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 19, 2021
@cgwalters
Member Author

/test all

@cgwalters
Member Author

cgwalters commented Oct 19, 2021

Experimenting with ostree-containers in OCP/RHCOS8

First, https://copr.fedorainfracloud.org/coprs/g/CoreOS/rpm-ostree-rhel8/ contains binary RPMs for RHEL8.

Using that with this PR, I've run coreos-assembler build:

New format container:

registry.ci.openshift.org/coreos/walters-rhcos-ostreecontainer - you can use this container via e.g. FROM registry.ci.openshift.org/coreos/walters-rhcos-ostreecontainer in a Dockerfile, or just kubectl/podman run it.

Old (current) format container (archive ostree + extensions RPMs):

registry.ci.openshift.org/coreos/walters-rhcos-ostreecontainer-oldformat

OCP release image:

registry.ci.openshift.org/coreos/walters-ocp410-ostreecontainer@sha256:8b0b769a8993e0c3e3bef11805644fb6ce89927473ed8640825a58f7091c708d

Things to try

rpm-ostree rebase --experimental ostree-unverified-registry:registry.ci.openshift.org/coreos/walters-rhcos-ostreecontainer

will flip your machine over into the new format container image.
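If you want to flip back afterwards, rolling back to the previous deployment should work:

# should return you to the previous (non-container) deployment; -r reboots
rpm-ostree rollback -r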

Generating a derived build in-cluster

I pushed an rhcos branch of my fcos-derivation-example.

I then did this:

$ oc project default
$ oc create is my-rhcos
$ oc new-build --name rhcos-derivation-example --to=my-rhcos https://github.com/cgwalters/fcos-derivation-example#rhcos
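To watch the build and confirm the output landed in the imagestream, something like this should work (the build name here is a guess at the first generated one):

$ oc logs -f build/rhcos-derivation-example-1   # name assumed; whatever oc new-build created
$ oc get istag my-rhcos:latest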

Actually pulling the image

The first problem you'll notice here is that the registry is only on the pod network, not the host network. I chose to expose the registry to make it easier to fetch content from the host network. However, clearly we need to make it easy to fetch content from the pod network into the host network via a proxy - should we have rpm-ostree-on-RHCOS know how to automatically invoke env KUBECONFIG=/var/lib/kubelet/kubeconfig oc proxy or so?
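As a strawman for that, tunneling the registry to the host through the API server might look roughly like this - untested, and it assumes the credentials in the kubelet kubeconfig are even allowed to port-forward, which they may well not be:

$ env KUBECONFIG=/var/lib/kubelet/kubeconfig \
    oc -n openshift-image-registry port-forward svc/image-registry 5000:5000 &
$ # --tls-verify=false since localhost won't match the service certificate
$ skopeo inspect --tls-verify=false docker://localhost:5000/default/my-rhcos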

Allow anonymous pulls

The default kubelet pull secret does not have credentials for the in-cluster registry. (TODO: figure out how that works)
The simplest thing here is to just: oc policy add-role-to-user registry-viewer system:anonymous

Add the cluster service CA to the host trust root

I think I took the content of the openshift-service-ca.crt configmap in the openshift-config namespace, injected it into the host as /etc/pki/ca-trust/source/anchors/cluster-ca.crt, and reran update-ca-trust.
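In shell form that was roughly the following (a sketch; it assumes the CA sits under the usual service-ca.crt data key):

$ oc -n openshift-config get configmap openshift-service-ca.crt \
    -o 'jsonpath={.data.service-ca\.crt}' > /etc/pki/ca-trust/source/anchors/cluster-ca.crt
$ update-ca-trust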

Result

# rpm-ostree rebase --experimental ostree-unverified-image:docker://image-registry.openshift-image-registry.svc.cluster.local:5000/default/my-rhcos
Pulling manifest: ostree-unverified-image:docker://default-route-openshift-image-registry.apps.ci-ln-v5gsgit-f76d1.origin-ci-int-gce.dev.openshift.com/default/my-rhcos
Importing: ostree-unverified-image:docker://default-route-openshift-image-registry.apps.ci-ln-v5gsgit-f76d1.origin-ci-int-gce.dev.openshift.com/default/my-rhcos (digest: sha256:58e44636c937ce03ec302ff623a9bb2f89726943fb1f15fd77edb02a0ced9424)
Using base: sha256:a07e4df9a074523a6904cb0d7c63f135b47f907c6667d20cf830ecf9f17a7c7d
Downloading layer: sha256:a68afa4a0bf1e0715adcdc83e2fbd622d92f0df84d43a4c0be626a3d4f4bdff3 (1.1 MB)
Downloading layer: sha256:279379e48bc0e016719bab7466cdac25ffe2806c6566450955684e172ccdc8ca (319 bytes)
Downloading layer: sha256:13660ae46cffb342b5f9b184a14cb7819e6ce5ba1ce107ef0efbea3a129c4682 (17.7 MB)
Staging deployment... done
Freed: 1.3 GB (pkgcache branches: 0)
Added:
  irssi-1.1.1-3.el8.x86_64
Changes queued for next boot. Run "systemctl reboot" to start a reboot

@cgwalters
Member Author

I am not too familiar with the kubelet/CRI/cri-o here - how does it work today for the kubelet to pull an image from the internal registry? How is the authentication handled?

I found this code and may start tracing from there.

But basically I think what we want here is an easy way for the host update process to get credentials to pull an image the same way the kubelet does (but not actually pull the image into container storage).
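For example, borrowing the kubelet's auth file from the host would look like the sketch below - purely illustrative of the mechanism, since as noted above that pull secret may not actually carry credentials for the in-cluster registry:

$ # sketch only: reuse the kubelet's pull secret for a host-side pull
$ skopeo inspect --authfile /var/lib/kubelet/config.json \
    docker://image-registry.openshift-image-registry.svc:5000/default/my-rhcos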

@mrunalp
Member

mrunalp commented Oct 19, 2021

I am not too familiar with the kubelet/CRI/cri-o here - how does it work today for the kubelet to pull an image from the internal registry? How is the authentication handled?

@cgwalters
kubelet delegates to CRI-O for pulling images.
Image pull requests end up in CRI ImagePullRequest. If the kubelet has credentials for an image pull and passed them along, they are extracted
here - https://github.com/cri-o/cri-o/blob/main/server/image_pull.go#L40 - and passed down to containers/image.

Here are some docs for how the kubelet can populate the pull secret:

  1. Specifying pull secret for a pod: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
  2. Configuring the global pull secret: https://kubernetes.io/docs/concepts/containers/images/#configuring-nodes-to-authenticate-to-a-private-registry. We use /var/lib/kubelet/config.json

CRI-O also has a global_auth_file configuration that is used for getting pull secrets. It is the fallback, used in cases such as mirrors, which have no pull-secret support in k8s or the CRI. This points to the kubelet config file and can be updated per https://docs.openshift.com/container-platform/4.8/openshift_images/managing_images/using-image-pull-secrets.html#images-update-global-pull-secret_using-image-pull-secrets.
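For the per-pod route (item 1 above), a minimal sketch - every name here is a placeholder:

$ # create a pull secret and reference it from a pod's imagePullSecrets
$ kubectl create secret docker-registry regcred \
    --docker-server=registry.example.com \
    --docker-username=myuser --docker-password=mypass
$ kubectl run test --image=registry.example.com/app:latest \
    --overrides='{"apiVersion":"v1","spec":{"imagePullSecrets":[{"name":"regcred"}]}}'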

@cgwalters
Member Author

@mrunalp How does it work today for the kubelet to pull images from a cluster-internal registry? So crio is on the host network, and trying out nsenter -m -n -t $(pidof crio) gives me e.g.

$ host image-registry.openshift-image-registry.svc
Host image-registry.openshift-image-registry.svc not found: 3(NXDOMAIN)
$

Yet, clearly it's resolving that DNS name somehow. I've been looking in the crio code and it's not obvious...oh wait, does it have something to do with s.config.SeparatePullCgroup? But that's not about networking...

@cgwalters
Member Author

OK moving this to ostreedev/ostree-rs-ext#121

@haircommander
Member

@mrunalp How does it work today for the kubelet to pull images from a cluster-internal registry? So crio is on the host network, and trying out nsenter -m -n -t $(pidof crio) gives me e.g.

$ host image-registry.openshift-image-registry.svc
Host image-registry.openshift-image-registry.svc not found: 3(NXDOMAIN)
$

Yet, clearly it's resolving that DNS name somehow. I've been looking in the crio code and it's not obvious...oh wait, does it have something to do with s.config.SeparatePullCgroup? But that's not about networking...

Not quite: SeparatePullCgroup is for isolating the memory used to pull large images away from CRI-O. What you're looking for is the image destination resolution code, which lives at https://github.com/cri-o/cri-o/blob/main/internal/storage/image.go#L395.

As far as translating between a registry URL and an actual IP/port to pull from, I am not as familiar. I usually pull in @mtrmac or @vrothberg for such questions.

@mtrmac

mtrmac commented Oct 21, 2021

WRT the low-level pull implementation: There is a bit of mapping for *.docker.io host names, but basically it’s just the very simplest possible Go HTTP request, relying on, AFAIK, ordinary DNS.

At least https://github.com/openshift/cluster-image-registry-operator/blob/master/manifests/0000_90_cluster-image-registry-operator_01_operand-servicemonitor.yaml does suggest that the quoted name should be resolvable within the cluster’s pods.

So this should be a ~general Kubernetes DNS question, and I’m afraid I don’t know the details of that.

@cgwalters
Member Author

the quoted name should be resolvable within the cluster’s pods.

Right - that's straightforward. But here I want to pull from the host network namespace. So far at least in OpenShift we have explicitly not had the host depend on or use cluster DNS, because it creates a dependency cycle - the host needs to use DNS to pull the pods that serve DNS in-cluster, etc. There is some thought that we should have the host optionally use cluster DNS only when it's up, and have a health check that resets the configuration back if e.g. the cluster DNS pods are being upgraded, etc.

Perhaps...I basically need to copy the code from containers/skopeo#1476 into crio too, and have the MCO do something like "if kubelet is up, inject configuration into rpm-ostreed.service to tell it to fork off a crio binary instead"? Or, perhaps even simpler, add /run/crio/image-pull-proxy.sock implemented by crio, and have the ostree code use it if it exists, otherwise fall back to forking skopeo?

But, I'd like to avoid duplicating that code...

OK, since no one seems to know offhand how this works, I'll try to dig in. If it's easy to replicate outside of crio (maybe, e.g., the kubelet is doing a DNS lookup in the pod network and passing that IP down to crio via a backchannel?) I'll do that; otherwise I'll fall back to patching crio.

@cgwalters cgwalters force-pushed the rpm-ostree-containers branch 2 times, most recently from fbfcfa8 to 8783c69 Compare October 29, 2021 18:43
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 4, 2021
@cgwalters
Member Author

cgwalters commented Nov 30, 2021

OK, https://copr.fedorainfracloud.org/coprs/g/CoreOS/rpm-ostree-rhel8 is updated with the latest skopeo+rpm-ostree changes. Previously it was trying to target both Fedora and RHEL8 (via C8S), but since the code is in Fedora, I've now changed it to target just RHEL8. Even doing that required switching from c8s to epel8, because https://git.centos.org/rpms/json-c/c/aa505d489ccc4ad2e2abfcc61b08b8f8b272c4f4?branch=c8s broke binary backwards compatibility (and libdnf links to json-c).

I also made a few fixes to ensure the stack runs on RHEL8, which are rolled into coreos/rpm-ostree#3249.

Next step: update release image etc. (EDIT: DONE)

See #657 (comment) for new status.

@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 30, 2021
@cgwalters
Member Author

cgwalters commented Dec 1, 2021

First, today I randomly stumbled across https://docs.openshift.com/container-platform/4.9/registry/accessing-the-registry.html#registry-accessing-directly_accessing-the-registry, which is exactly about this issue.

When I tried it, I was extremely confused, because podman did indeed seem able to contact the registry (i.e. resolve the service DNS name), though it failed as expected due to not trusting the service CA.

After some more digging, I finally figured it out:

$ cat /etc/hosts
# Kubernetes-managed hosts file (host network).
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.30.122.47 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
$

Something is special-casing the registry in /etc/hosts - and of course the host image-registry.openshift-image-registry.svc command talks directly to the DNS resolver and so bypasses /etc/hosts. I love that the name of the command is host and it doesn't read /etc/hosts!

This misled me to somehow think some magic with network namespaces was involved, because I then didn't even try using e.g. curl or podman/oc to interact with the registry. I need to retrain my fingers to use getent hosts instead. But it's more typing.
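For the record, the contrast looks like this (the host output is quoted from above; the getent output is reconstructed from the /etc/hosts entry, so treat it as illustrative):

$ host image-registry.openshift-image-registry.svc
Host image-registry.openshift-image-registry.svc not found: 3(NXDOMAIN)
$ getent hosts image-registry.openshift-image-registry.svc
172.30.122.47   image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local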

Also, hooray for the cookie openshift-generated-node-resolver which has exactly one hit on code search and so immediately helps me find the code involved.

And now I discover yet another chunk of load-bearing bash script run as root.

cgwalters added a commit to cgwalters/release that referenced this pull request Dec 2, 2021
This changes the CI configuration for the `mcbs` branch to use
openshift/os#657
@cgwalters
Member Author

OK now I'm banging my head against image pull authentication again. I would swear previously I did

export KUBECONFIG=/etc/kubernetes/kubeconfig
oc registry login

as root to get a ~/.docker/config.json, but that isn't working now.

What we really need to drill into here, I think, is re-using the kubelet credentials.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 13, 2022
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 17, 2022
@cgwalters cgwalters removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 11, 2023
@cgwalters
Member Author

OK, we have a green for RHEL now. However, this one could still use some more manual testing
/hold
because I've realized now it may intersect with the bug surfaced by openshift/machine-config-operator#3857
Additionally, it just looks...well, worse unfortunately, until we finally publish official pull specs.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 11, 2023
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 11, 2023
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 11, 2024
@cgwalters
Member Author

cgwalters commented Jan 16, 2024

because I've realized now it may intersect with the bug surfaced by openshift/machine-config-operator#3857

I was asked about the status of this issue; I think it's likely to be fixed in the MCO now, but the simplest way to test is basically:

  • launch or access a test worker node (or a whole cluster if that's easier)
  • skopeo copy docker://<pull spec> oci:/var/tmp/someoci
  • rpm-ostree rebase ostree-unverified-image:oci:/var/tmp/someoci
  • Kill and restart the MCD pod and see how it errors out

If it errors out with just "unexpected osimageurl", that's basically OK. If it errors in a more fatal way, then most likely the osImageURL parsing still needs work.

The most real test is to scale up a new worker node (or a whole cluster) from a disk image with this change.

Additionally, it just looks...well, worse unfortunately, until we finally publish official pull specs.

This, however, is still an issue.

@jmarrero
Member

  • I built an RHCOS image with this change and pushed it to Quay.
  • I started a cluster with cluster bot on GCP.
  • I ran the above steps, rebooted the node, and killed all machine-config-daemon pods.
  • The MachineConfigPool details show:
NodeDegraded
True
Jan 17, 2024, 1:49 PM
1 nodes are reporting degraded status on sync
Node ci-ln-1xk0c2t-72292-9lww5-worker-a-2vc5k is reporting: "unexpected on-disk state validating against rendered-worker-7e34c96a1c0e79f4e4ffd254c0d167cd: expected target osImageURL \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fa632104c396a0dae5ac25814f791eb8650cd862bcb37dbe9eab107c30f56699\", have \"/var/tmp/rhcosoci\" (\"82aebaf74dfbbb748e4d4367c8cba7bb99d9c20537d03ec48681a1f7a0179c92\")"

@cgwalters
Member Author

Yeah, that's the error I'd expect, which is generally good. It means we didn't fail to parse it, at least.

@cgwalters
Member Author

I built an rhcos image with this change, pushed it to quay.

Note this change really has no effect on the generated container image, only on disk images.

@jmarrero
Member

@cgwalters any reason not to let CI run on this and merge if all looks good then?

I guess the pull spec looking like a directory is not that awful. Not sure if we are tracking official pull specs anywhere else.

@cgwalters
Member Author

Unfortunately CI in this repository still doesn't cover running OpenShift, which is where this change would matter in general. Most of our kola unit tests don't care.

@dustymabe
Member

/retest

The most real test is to scale up a new worker node (or a whole cluster) from a disk image with this change.

So this is what still needs to be tested, I think. It's been a while since I've done this, but @jmarrero I think you can create an image in GCP (or an AMI in AWS if that's easier), then modify the machine pool of a test cluster in that cloud to specify a different AMI ID or GCP image name for workers, then scale it up.

@jmarrero
Member

jmarrero commented Jan 22, 2024

Tested on a cluster by:

  1. Uploaded the AMI:
cosa buildextend-aws --upload --credentials-file=awscreds --bucket=s3://mybucket --region=us-east-1 --grant-user=####
  2. On a cluster created by cluster bot, modified the worker machine-set to use the AMI I just uploaded.
  3. Deleted one of the machines that are part of the machine-set.
  4. Verified that the new machine is using the AMI I uploaded and that it becomes healthy.

I also changed the AMI name by a few characters to make sure it was really using the one I uploaded.

@dustymabe
Member

Thanks @jmarrero - also, coreos/coreos-assembler#3700 just merged, which should give us a less hacky-looking imgref when running rpm-ostree status.

I'd say this is unblocked now?

@jmarrero
Member

/lgtm
/approve


openshift-ci bot commented Jan 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jmarrero

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mike-nguyen
Member

mike-nguyen commented Jan 22, 2024

/retitle COS-2611: Use deploy-via-container

@openshift-ci openshift-ci bot changed the title Use deploy-via-container COS-2611: Use deploy-via-container Jan 22, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 22, 2024
@openshift-ci-robot

openshift-ci-robot commented Jan 22, 2024

@cgwalters: This pull request references COS-2611, which is a valid Jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.16.0" version, but no target version was set.

In response to this: (the PR description, quoted above)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@dustymabe
Member

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 22, 2024

openshift-ci bot commented Jan 22, 2024

@cgwalters: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                             Commit   Details  Required  Rerun command
ci/prow/build-test-qemu-kola-all      767f4e7  link     true      /test build-test-qemu-kola-all
ci/prow/rhcos-86-build-test-metal     767f4e7  link     true      /test rhcos-86-build-test-metal
ci/prow/build-test-qemu-kola-metal    767f4e7  link     true      /test build-test-qemu-kola-metal
ci/prow/build-test-qemu-kola-basic    767f4e7  link     true      /test build-test-qemu-kola-basic
ci/prow/rhcos-86-build-test-qemu      767f4e7  link     true      /test rhcos-86-build-test-qemu
ci/prow/rhcos-90-build-test-qemu      767f4e7  link     true      /test rhcos-90-build-test-qemu
ci/prow/rhcos-90-build-test-metal     767f4e7  link     true      /test rhcos-90-build-test-metal
ci/prow/build-test-qemu-kola-upgrade  767f4e7  link     true      /test build-test-qemu-kola-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jmarrero
Member

/retest

@openshift-merge-bot openshift-merge-bot bot merged commit 9697abf into openshift:master Jan 22, 2024
7 checks passed
Development

Successfully merging this pull request may close these issues.

support cosa init --ostree docker://quay.io/coreos-assembler/fcos:testing-devel