
OCPNODE-2205: Lazy image pull support #1600

Open · wants to merge 1 commit into base: master
Conversation

@harche (Contributor) commented Mar 18, 2024

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 18, 2024
@openshift-ci openshift-ci bot requested review from russellb and sdodson March 18, 2024 19:53
@harche harche changed the title WIP: Lazy image pull support OCPNODE-2205: Lazy image pull support Mar 18, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 18, 2024
openshift-ci-robot commented Mar 18, 2024

@harche: This pull request references OCPNODE-2205 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

xref : https://issues.redhat.com/browse/OCPNODE-2205

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 18, 2024
@harche (Contributor Author) commented Mar 18, 2024

/cc @sairameshv

# Non-Goals

* Enabling lazy image pulling on the control plane nodes.
* Replacement of the standard OCI image format; `eStargz` will be an additional supported format.
Member

Our tools do not support eStargz; we are not able to produce this format.

We can use the zstd:chunked format.

The same comment applies in other places in this patch.

Contributor Author

Updated. Thanks.


Stargz store performs lazy pulling if the image is eStargz or zstd:chunked, and CRI-O performs lazy pulling for them when the stargz store is enabled. So I think the formats don't need to be limited to zstd:chunked, and eStargz can still be listed as one of the lazy-pullable formats.

Contributor Author

@ktock to begin with, only the zstd:chunked format will be supported by OCP; i.e., we will not technically prevent anyone from using any other format, but official support will extend, at least in the beginning, only to the zstd:chunked format.

  name: cluster
spec:
  lazyImagePools:
    - "serverless-custom-pool1"
Contributor

I think we should have the type in addition to the pool name, just in case we want to change the type of the lazy image pull later.

```toml
[storage.options]
additionallayerstores = ["/var/lib/stargz-store/store:ref"]
```
As of now, that `c/storage` configuration file does not support drop-in config. The entire content of that file is [hard-coded in MCO](https://github.com/openshift/machine-config-operator/blob/release-4.16/templates/common/_base/files/container-storage.yaml). Without drop-in config support, we risk overriding that critical file and potentially losing custom configuration that may already be present there. A request for adding drop-in config support for applying similar custom configurations to `c/storage` [already exists](https://github.com/containers/storage/issues/1306). Hence, this proposal advocates adding support for drop-in config in `c/storage`.
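Drop-in configs usually follow "later file wins" semantics, applied in lexical order of file names. As a sketch of what the requested c/storage feature might behave like (hypothetical paths and a simplified in-memory model; this is not the actual c/storage implementation, which does not support drop-ins today):

```go
package main

import (
	"fmt"
	"sort"
)

// mergeConfig overlays drop-in fragments onto a base configuration.
// Fragments are applied in lexical order of their file names, so later
// files win -- the usual systemd-style drop-in convention.
func mergeConfig(base map[string]string, dropIns map[string]map[string]string) map[string]string {
	merged := make(map[string]string, len(base))
	for k, v := range base {
		merged[k] = v
	}
	names := make([]string, 0, len(dropIns))
	for name := range dropIns {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		for k, v := range dropIns[name] {
			merged[k] = v
		}
	}
	return merged
}

func main() {
	// Base file as laid down by the MCO (keys abbreviated to dotted strings).
	base := map[string]string{
		"storage.driver": "overlay",
	}
	// Hypothetical drop-in, e.g. /etc/containers/storage.conf.d/10-stargz.conf.
	dropIns := map[string]map[string]string{
		"10-stargz.conf": {
			"storage.options.additionallayerstores": "/var/lib/stargz-store/store:ref",
		},
	}
	merged := mergeConfig(base, dropIns)
	fmt.Println(merged["storage.driver"])
	fmt.Println(merged["storage.options.additionallayerstores"])
}
```

The point of the sketch is only the merge order: the MCO-managed base file stays untouched, and the lazy-pull configuration arrives as a separate fragment.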


We would need this for KEP-4191 also, since we are going to modify the imagestore.


* **Faster Serverless:** Serverless endpoints need to serve cold requests as fast as possible. Lazy pulling allows the serverless endpoint to start serving the user request without blocking on the entire container image being downloaded to the host.
* **Improved User Experience:** Large images often lead to extended pod startup times. Lazy pulling drastically reduces the initial delay, providing a smoother user experience.
* **Potential Benefits for Large Images:** If a container image is shared between various containers which utilize only part of the image during their execution, lazy image pulling speeds up such container startup by downloading only the relevant bits from those images.


It would be worth exploring what types of images would not work with stargz, and how we can deduce that.

i.e., can we understand why stargz is not working for a given image, and what kind of logs we can look at to determine that for support/operational purposes?

Contributor Author

That snapshotter supports the zstd:chunked and eStargz formats. OCP will support the zstd:chunked format only. You can enable debugging in /etc/containers/storage.conf to get more verbose output of the image handling.

@kannon92 Mar 29, 2024

That isn't really answering my question.

> Potential Benefits for Large Images: If the container image is shared between various containers which utilize only part of the image during their execution, lazy image pulling speeds up such container startup by downloading only the relevant bits from those images.

This makes it sound like the benefit of lazy pulling is not guaranteed. So I wanted to know how we could tell whether it was working or not, from a support and a user perspective.

@kannon92 Mar 29, 2024

Will it always work if someone uses zstd:chunked for their containers? And in all cases, is that all that is needed?

Are there cases where all the bits would need to be downloaded for a container to start?

Contributor Author

If the Stargz Store plugin is not running on the host, then the images will not get lazy pulled; i.e., the container start times will be like those of regular container images.

If that plugin is running and the users suspect any defect in the plugin, they can look at the unit logs for details.

You can use an image in zstd:chunked format, but without that plugin the lazy part of image pulling won't work, so you would have container startup times comparable to regular container images.


I'm mostly asking because the language you used makes it sound like this isn't guaranteed.

I think if the stargz service plugin is enabled, images will get lazy pulled.

I would maybe just drop the "Potentially".

But adding some logs on debugging this would be useful for docs/feature enablement.

Contributor

Basically lazy pulling is trading throughput for latency — startup might be faster, but all operations are a bit slower due to the indirection required for the “is the file available already, or do I need to download it” checks.

So, the answer to “what types of images don’t benefit” from lazy pulling is “all well-designed ones” (when throughput matters): if all files in the image are actually necessary, and consumed over the lifetime of the application, then it is faster to pull and to use the image using an ordinary pull, rather than the FUSE slowly-download-files-on-demand lazy approach.

Lazy pulling should be beneficial only if many files in the container are not actually necessary for processing the typical request (and if the files that are necessary are so few that the per-file request latency is not overwhelming). I’m sure that happens frequently in practice, but it is quite likely an indication of something that could be fixed in the image.


(There are other benefits to zstd:chunked, like possibly sharing files across container images / layers, but those don’t depend on lazy pulling.)
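The throughput-for-latency trade described above can be made concrete with a toy model (all numbers invented, purely illustrative): an eager pull pays the full download before the container starts; a lazy pull starts after fetching only the hot files, but every file access pays an extra per-request latency, so the total work is larger.

```go
package main

import "fmt"

// Illustrative-only model of the tradeoff: all numbers are invented.
// eagerStart: time until the container starts with an ordinary pull.
// lazyStart:  time until start when only the hot files are fetched.
// lazyTotal:  time until *all* files have been touched via FUSE,
//             paying a per-file request latency on each access.
func pullTimes(totalFiles, hotFiles int, perFileMs, perRequestMs float64) (eagerStart, lazyStart, lazyTotal float64) {
	eagerStart = float64(totalFiles) * perFileMs
	lazyStart = float64(hotFiles) * (perFileMs + perRequestMs)
	lazyTotal = float64(totalFiles) * (perFileMs + perRequestMs)
	return
}

func main() {
	// An image where only 10% of the files are needed to serve the first request.
	eager, lazyFirst, lazyAll := pullTimes(1000, 100, 2.0, 1.0)
	fmt.Printf("eager start: %.0fms, lazy start: %.0fms, lazy total: %.0fms\n", eager, lazyFirst, lazyAll)
	// Lazy wins on time-to-first-request (300ms vs 2000ms) but loses on
	// total work (3000ms vs 2000ms): throughput traded for latency.
}
```

If all files are hot (`hotFiles == totalFiles`), the lazy path is strictly slower, which is exactly the "well-designed image" case in the comment above.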

Contributor Author

> Basically lazy pulling is trading throughput for latency

stargz-snapshotter will continue to download the rest of the image in the background.

Contributor

The FUSE overhead is paid during every single filesystem operation during the container runtime, isn’t it? For a long-running container which is not running purely out of memory (e.g. a single statically linked binary) it will eventually overwhelm any possible savings.

@harche (Contributor Author) commented Mar 28, 2024

/test markdownlint

* Add support for drop-in config in [container storage config](https://github.com/containers/storage/blob/main/pkg/config/config.go)
* Package the upstream [Stargz Store plugin](https://github.com/containerd/stargz-snapshotter) for installation on RHCOS
* Enable lazy image pulling in OpenShift using the [Stargz Store plugin](https://github.com/containerd/stargz-snapshotter) with the [zstd:chunked](https://github.com/containers/storage/pull/775) image format.
* Support lazy image pulling only on the worker nodes (or a subset of worker nodes).
Member

(why?)

Contributor Author

For the dev preview phase we thought it would be better to limit the usage to the worker nodes, because we still have an important pending issue of downloaded chunks not being verified in the backing c/storage library. @giuseppe, @ktock and @mtrmac are working on addressing that - containers/storage#795 (comment)

We will be blocking on that before graduating this to TechPreview.

Contributor

FWIW broken layer integrity enforcement is a security blocker.

I don’t know that it’s appropriate to ship to customers in any channel; at the very least it would require a well-advertised caveat that customers certainly see before actively opting in.

Contributor

(Also, at the very least I am not actively working on that; I don't know about the others listed.

As a blocker for this enhancement it probably should be tracked in a more salient place than a closed upstream PR.)


* Add support for drop-in config in [container storage config](https://github.com/containers/storage/blob/main/pkg/config/config.go)
* Package the upstream [Stargz Store plugin](https://github.com/containerd/stargz-snapshotter) for installation on RHCOS
* Enable lazy image pulling in OpenShift using the [Stargz Store plugin](https://github.com/containerd/stargz-snapshotter) with the [zstd:chunked](https://github.com/containers/storage/pull/775) image format.
Member

I am curious how "battle tested" this is. I guess we'll find out in testing, but the number of new moving parts in the code here leaves me with an instinct that this will require a maintainer.

Contributor Author

> I am curious how "battle tested" this is. I guess we'll find out in testing

Initial tests in OCP are promising. We are also proposing to add an OpenShift e2e job around lazy image pulling in this EP.

Contributor

I agree, the plugin will certainly require a knowledgeable maintainer able to deal with customer support escalations. Even if there were no bugs in it (which I doubt), expertise would still be required to allow post-mortem analysis of broken systems.

That’s a non-trivial ongoing cost.

# Non-Goals

* Enabling lazy image pulling on the control plane nodes.
* Replacement of the standard OCI image format; `zstd:chunked` will be an additional supported format.
Member

I know what you're trying to say here but it sounds like it's implying zstd:chunked is separate from OCI; as you know it's just an alternative compression scheme with metadata, but readers may not. How about

Replacement of the standard OCI image format; zstd:chunked is a variant of zstd compressed layers in OCI, which is supported by podman and moby

Contributor Author

Sounds great, thanks. Updated.

lazyImagePull:
  enabled: true
```
* Optionally, a list of the supported image format types for lazy pulling can also be specified. At the time of writing this document, OpenShift will only support the `zstd:chunked` image format. If the user doesn't explicitly specify the image format, the default value is set to `zstd:chunked`.
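A sketch of how such a field might be defaulted and validated (hypothetical function and type names; the only grounded facts are the `zstd:chunked` default and the single supported format described above):

```go
package main

import "fmt"

// supportedFormats lists the lazy-pullable formats OpenShift would
// accept; per this proposal, only zstd:chunked to begin with.
var supportedFormats = map[string]bool{"zstd:chunked": true}

// normalizeFormats applies the documented default when the list is
// empty and rejects entries that are not supported.
func normalizeFormats(requested []string) ([]string, error) {
	if len(requested) == 0 {
		return []string{"zstd:chunked"}, nil // documented default
	}
	for _, f := range requested {
		if !supportedFormats[f] {
			return nil, fmt.Errorf("unsupported lazy image format %q", f)
		}
	}
	return requested, nil
}

func main() {
	formats, err := normalizeFormats(nil)
	fmt.Println(formats, err)
	_, err = normalizeFormats([]string{"estargz"})
	fmt.Println(err)
}
```

Keeping the field a list, as discussed below, means this validation grows by one map entry per newly supported format or snapshotter.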
Member

Since they can only specify one thing, why would this be mentioned in the config?

Contributor Author

Currently we have the eStargz and zstd:chunked formats supported by the stargz-snapshotter. We are going to support only the zstd:chunked format to begin with. But in the future, we may support other snapshotters with their own image formats, e.g. https://github.com/awslabs/soci-snapshotter

So we are just trying to make sure that in the future we are able to support more snapshotters and formats (if required).

/cc @rphillips


[Service]
Type=notify
Environment=HOME=/home/core
Member

Why? That's bizarre.

Contributor Author

Thanks, dropped it.

```toml
[Unit]
Description=Stargz Store plugin
After=network.target
Member

This means very little and I doubt it's necessary https://systemd.io/NETWORK_ONLINE/

Contributor Author

Thanks, dropped it.

RestartSec=1

[Install]
WantedBy=multi-user.target
Member

For us actually, WantedBy=crio.service or kubelet-dependencies.target.

Contributor Author

Thanks, updated to crio.service

[Service]
Type=notify
Environment=HOME=/home/core
ExecStart=/usr/local/bin/stargz-store --log-level=debug --config=/etc/stargz-store/config.toml /var/lib/stargz-store/store
Member

Right so here's where I think what we're doing in splitting binaries in CoreOS and configuration in MCO makes little sense.

I think if this is packaged as an RPM then this systemd unit should live in that RPM and not in the MCO.

And most importantly, it's ExecStart=/usr/bin/stargz-store.

(/usr/local is machine local state on CoreOS)

Contributor Author

Updated to ExecStart=/usr/bin/stargz-store, thanks.

If we ship the systemd unit with the RPM, is there a way to enable/disable the service using the MCO? We would want this service only if the user explicitly wants to use lazy image pulling. Otherwise, there is no reason to have that service running by default.
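On that question: a MachineConfig can enable or disable an RPM-shipped unit by listing it in its systemd section, without shipping the unit file itself. A sketch (the unit name and object name are assumptions based on this proposal, not an agreed design):

```yaml
# Sketch only: enabling an RPM-shipped unit via the MCO.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-enable-stargz-store
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: stargz-store.service
          enabled: true
```

Rendering such an object only when the user opts in would keep the service off by default.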


#### Resource Consumption with FUSE

The stargz-snapshotter employs a FUSE filesystem, which can lead to increased resource usage on the node during intensive local disk I/O operations. It’s important to note that this does not affect containers performing heavy I/O with local volume mounts, as they will not experience performance degradation.
Member

Contributor Author

Thanks for that pointer, added a remark in that section about it at the end.


* Enabling lazy image pulling on the control plane nodes.
* Replacement of the standard OCI image format; `zstd:chunked` will be an additional supported format.
* Alterations to the image build process; developers can use standard OCI build tools, such as buildah, to build and convert images to the `zstd:chunked` format.
Member

Yes. But...I think a notable callout here is going to need to be the docs work to describe how to do that.

In general it should be highlighted here more obviously that because zstd:chunked isn't the default, in reality customers will need to enable it in their image builds for their own images, and if they want the benefit for our images, they'll need to re-compress (which is probably not very viable).

Contributor Author

Good idea, I will add a documentation related goal so that we won't miss it.

Contributor

> and if they want the benefit for our images, they'll need to re-compress (which is probably not very viable).

In particular recompressing the OpenShift product images is, AFAIK, not viable in production workloads, because the release image payload, and the image integrity design, is built around the release image containing a full list of image digests of the OpenShift images.

Contributor

Similarly any signed image. And I’d expect operators / Helm charts to fairly frequently refer to images by digest.

So… this will most likely primarily apply to customer-internal images only.

containerRuntimeConfig:
  lazyImagePull:
    enabled: true
    format:
Contributor

format: zstd:chunked

Contributor Author

That will support only one, but in the future if we want to support more than one, the list would be helpful.

format:
    - "zstd:chunked"
    - "ystd:hunked"
    - "xstd:junked"

Contributor Author


* Add support for drop-in config in [container storage config](https://github.com/containers/storage/blob/main/pkg/config/config.go)
* Package the upstream [Stargz Store plugin](https://github.com/containerd/stargz-snapshotter) for installation on RHCOS
* Enable lazy image pulling in OpenShift using the [Stargz Store plugin](https://github.com/containerd/stargz-snapshotter) with the [zstd:chunked](https://github.com/containers/storage/pull/775) image format.

Stargz store also supports eStargz, and CRI-O performs lazy pulling for both zstd:chunked and eStargz, so can we list that format here as well?

Contributor Author

@ktock Thanks for your reply. This enhancement proposal is aimed at introducing and supporting the new feature in OpenShift. While it is true that the stargz snapshotter supports both the zstd:chunked and eStargz formats, from OpenShift's supportability point of view we have to evaluate the overall ecosystem around any feature and how best we can support our customers. At this point in time, Red Hat's main container image building tools, like podman and buildah, focus on supporting the zstd:chunked image format.

But having said that, we would definitely officially support more formats if the overall container tools from Red Hat support those formats as well. In fact, this is the reason we have added the format field in the API with this enhancement proposal.

@ktock Apr 5, 2024

Thanks for the clarification!

> This enhancement proposal is aimed at introducing and supporting the new feature in Openshift.

👍

> While it is true that the stargz snapshotter supports both zstd:chunked and eStargz formats, from the Openshift's supportability point of view we have to evaluate overall ecosystem around any feature and how best we can support our customers. At this point in time, Redhat's main container image building tools like podman and buildah focus on supporting zstd:chunked image format.
> But having said that, we would definitely officially support more formats if overall container tools from Redhat support those formats as well. In fact, this is the reason we have added format field in the API with this enhancement proposal.

So the runtime will have logic to decide whether it enables lazy pulling on the image being pulled, right? I have a question in the proposal, related to the implementation of that logic: PTAL #1600 (comment)

Contributor Author

We have an operator in OpenShift called the Machine Config Operator (MCO). We use that operator to lay down node-specific configurations (e.g. config files in /etc). Our plan is to use that operator to lay down the required configuration for the c/storage library used by CRI-O to enable this feature. In fact, that config file for c/storage is already placed by the MCO; we just need to update it with the lazy-image-related configuration (if the user decides to use this feature).

lazyImagePull:
  enabled: true
  format:
    - "zstd:chunked"

How is this value consumed by the runtime? i.e., how can the runtime use this value to decide whether it enables lazy pulling on the image being pulled? Maybe this should be a media type + manifest annotation?

@mtrmac (Contributor) Apr 9, 2024

Right now, none of this exists at a generic level: the on-demand layer plugin is provided with an image reference and a layer digest, and that’s it. So if this had to be implemented today, that plugin would need to re-parse the image manifest, find the layer, and impose its own conditions on the layer’s metadata.

If we need to add a generic feature, that will need to be co-designed along with this enhancement.
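As a sketch of the kind of per-layer condition such a plugin would have to impose after re-parsing the manifest (the media type is the standard OCI zstd layer type; the annotation key is, to the best of my knowledge, the one the zstd:chunked writer in containers/storage uses, but verify against the c/storage source before relying on it):

```go
package main

import "fmt"

const (
	// Standard OCI media type for zstd-compressed layers.
	zstdLayerMediaType = "application/vnd.oci.image.layer.v1.tar+zstd"
	// TOC annotation written by c/storage's zstd:chunked compressor
	// (assumption: check the c/storage source before depending on it).
	tocChecksumKey = "io.github.containers.zstd-chunked.manifest-checksum"
)

// layerDescriptor is a minimal stand-in for an OCI descriptor.
type layerDescriptor struct {
	MediaType   string
	Annotations map[string]string
}

// lazyPullable reports whether a layer looks like a zstd:chunked layer:
// zstd media type plus a table-of-contents annotation.
func lazyPullable(l layerDescriptor) bool {
	if l.MediaType != zstdLayerMediaType {
		return false
	}
	_, hasTOC := l.Annotations[tocChecksumKey]
	return hasTOC
}

func main() {
	plain := layerDescriptor{MediaType: "application/vnd.oci.image.layer.v1.tar+gzip"}
	chunked := layerDescriptor{
		MediaType:   zstdLayerMediaType,
		Annotations: map[string]string{tocChecksumKey: "sha256:..."},
	}
	fmt.Println(lazyPullable(plain), lazyPullable(chunked))
}
```

This is only the shape of the check; as noted above, today the plugin receives just an image reference and a layer digest, so any such condition would have to be implemented inside the plugin itself.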

Contributor Author

> How is this value consumed by the runtime? i.e. How can the runtime use this value for deciding whether it enable lazy pulling on the pulling image? Maybe this should be a mediatype + manifest annotation?

@ktock As of now, this value isn't directly used by the runtime. We are using this value in the MCO to prepare the configuration file for the c/storage library.

Contributor

c/storage AFAIK has no option to restrict ALS lookups by compression format right now.


## Upgrade / Downgrade Strategy

* Downgrade expectations - In the event of a cluster downgrade to a version that does not support the zstd:chunked image format, containers utilizing that image format will get downloaded again, and their startup times will be comparable to the non-zstd:chunked equivalent container image. Even in the absence of the stargz-snapshotter, those containers will start, but without any performance gains.
@mtrmac (Contributor) Apr 9, 2024

I don’t see any mechanism for containers to be downloaded again.

(Cc: @giuseppe to keep me honest about the c/storage aspects.)

Once an Image object exists in c/storage, it doesn’t go automatically away when a backend layer disappears; the image is just broken. Once a layer record exists in c/storage metadata, it doesn’t go automatically away when the layer store changes and the layer no longer exists; the layer is just broken (and might even “infect” future pulls that share this layer, because the metadata lookup will indicate that the layer already exists locally).

Unless the downgrade triggers a complete wipe? I don’t know whether it does, and the text above is not worded in a way consistent with that.

(Also, zstd in a non-chunked variant has been supported for a long time; and layers are locally stored uncompressed, so the compression format doesn’t matter all that much. But that’s all mostly irrelevant.)

> Even in the absence of stargz-snapshotter, those container will start but without any performance gains.

Absolutely not; the snapshotted layers are stored in different paths from the point of view of the graph driver (and I very much hope that the backend plugins are not trying to store anything in the graph driver-managed locations).

See above about “once a layer record exists”…

This downgrade either needs to explicitly trigger a wipe (assuming that’s possible at all), or must be prevented.


And it’s not just downgrades: disabling the snapshotters on an already-running node would break the node for the same reasons.

Contributor Author

Yes, you are correct; the images need to be cleaned up if the stargz store service is not enabled (or fails to start). I will update the proposal to explicitly call out that scenario. Thanks.


#### What consequences does it have on the cluster health?

Containers that use zstd:chunked image format will observe no associated performance gains in their start up time.
Contributor

See the “downgrade” section; just disabling the option will break the node, AFAICS.

@mtrmac (Contributor) left a comment

  • Just to make sure OCPNODE-2205: Lazy image pull support #1600 (comment) is considered
  • I don’t see any discussion of registry credentials. The lazy-pull layer store does not currently get any credentials from the actual pull operation in c/image, they must be provided to the plugin via other means
    • I see no discussion of that happening here at all
    • Assuming the capabilities as of today, that would mean that per-image pull secrets linked from the Pod object would not be used at all; at best this would work for the cluster-global pull secret. That's possible, but a fairly significant downside for any production workloads. Or maybe this particular snapshotter's apparent ability to capture credentials by interposing a CRI proxy could be used (ingenious but eww, and I don't know how that could work in multi-tenant situations).
    • Alternatively, some new feature to provide the credentials directly from the pull operation would have to be built. And because providing credentials via file paths passed to FUSE is out of the question, that would be a major rearchitecture of the API, AFAICS

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 8, 2024
@ktock

ktock commented May 8, 2024

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 8, 2024
@ktock

ktock commented May 8, 2024

> registry credentials

WDYT about the following design for allowing c/image to directly share creds with the Additional Layer Store?

  • The Additional Layer Store implementation (e.g. stargz-store) provides a helper binary that is executed from c/image.
  • This helper binary is registered to c/image using registries.conf with the following field (store-helper can be any command name of the helper binary).
additional-layer-store-auth-helper = "store-helper"
  • That helper binary receives registry creds via stdin, and the Additional Layer Store can use those creds for registry authentication.

@mtrmac

mtrmac commented May 8, 2024

How is execution of that binary bound to a particular pull RPC and on-demand mount creation? OpenShift is multi-tenant, and some users are not allowed to see some images. (The “obvious” approach of “the ALS pull operation should just provide the credentials while setting up the on-demand mount” would just get us back where we started, “this would be a major rearchitecture of the c/storage … snapshotter API to not be based on FUSE paths”: if we could privately pass a credential handle down to the snapshotter from c/storage, we could just as well pass the credentials themselves.)

Or is this proposal entirely giving up on multi-tenant, content to support only the cluster pull secret?

  • If so, the backend could probably just read the cluster pull secret directly, instead of inventing an RPC API and settings
  • If not, building all that extra binary / configuration / … API for a limited short-term feature … is possible, but feels, at the very least, inelegant.

@romfreiman

Are you familiar with this work?
#1599

cc: @hexfusion

@ktock

ktock commented May 9, 2024

How is execution of that binary bound to a particular pull RPC and on-demand mount creation? OpenShift is multi-tenant, and some users are not allowed to see some images.

I think c/image should check if the client is allowed to pull the image with the passed creds, e.g. by accessing the manifest. If the puller has access to the image, the already-mounted layers can be reused (like cached non-lazy layers are reused).

@mtrmac

mtrmac commented May 13, 2024

Yes, c/image would be accessing the manifest, and you’re right that this, in a sense, validates the credentials being used. (And, anyway, CRI-O’s pull operations also boil down to the manifest access check if layers are already present, so that must be good enough.)

So that would work. The way the credentials get from the “on-registry source” to the on-demand FUSE filesystem breaks our neat abstractions and seems inelegant to me, but it is ultimately possible (and, in c/image, it would be hidden in a private API).


openshift-ci bot commented May 17, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lalatendumohanty for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Harshal Patil <harpatil@redhat.com>

openshift-ci bot commented May 17, 2024

@harche: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@harche

harche commented May 17, 2024

@mtrmac I updated the enhancement to have stricter goals based on your feedback. Thanks.

@ktock

ktock commented Jun 24, 2024

@mtrmac @giuseppe PTAL


@mtrmac mtrmac left a comment


  • This must be reviewed by an OpenShift expert; my review is not sufficient in any case
  • I continue to be generally, intuitively (no numbers), skeptical of on-demand pulling. If applications are being created with huge, mostly unused images, trimming the images, or using a large shared base layer, seems to me to be in all respects better than on-demand FUSE usage.
  • Who is going to be the SME on call for downtime and support escalations ( OCPNODE-2205: Lazy image pull support #1600 (comment) )?

```yaml
lazyImagePull:
  enabled: true
  format:
    - "zstd:chunked"
```

c/storage AFAIK has no option to restrict ALS lookups by compression format right now.

Comment on lines +48 to +49
* Add support for chunk verification in `c/storage`
* Add support for lazy images pulling from a private registry in `c/image`

IIRC this has happened — link to it, or remove it from the list?


Really, this is a list of implementation steps, not goals, anyway…


## Proposal

In order to lazily pull images, we need to have the [Stargz Store plugin](https://github.com/containerd/stargz-snapshotter/blob/aaa46a75dd97e401025f82630c9d3d4e41c9f670/docs/INSTALL.md#install-stargz-store-for-cri-opodman-with-systemd) installed on the node, as well as an `additionallayerstores` configuration for the container storage library. [ContainerRuntimeConfig](https://github.com/openshift/machine-config-operator/blob/release-4.16/docs/ContainerRuntimeConfigDesign.md) will be used to deploy the required configuration on the target nodes.
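For context, the c/storage side of this is configured in storage.conf. A minimal fragment, with the store mount path taken from the stargz-store install docs (the exact path is deployment-specific, so treat it as illustrative):

```toml
# /etc/containers/storage.conf (fragment)
[storage]
driver = "overlay"

[storage.options]
# Directory where stargz-store exposes lazily-pulled layers to c/storage.
additionallayerstores = ["/var/lib/stargz-store/store:ref"]
```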

… and the credential passing hook?


## Upgrade / Downgrade Strategy

* Downgrade expectations - If you downgrade the cluster to a version that does not support the `zstd:chunked` image format, any containers using this format will fail to start. However, since `zstd:chunked` is supported by `c/storage`, deleting the image from the node (for example, using `crictl rmi <image>`) will allow the image to be redownloaded, and the containers should start successfully. Note, though, that without the benefits of lazy image pulling, the start time for these containers will be similar to that of regular container images.

How is “deleting the image from the node” happening? Is that automatic? Would admins be required to ssh into the nodes before triggering the update, or something similarly implausible?


@ktock ktock Jul 2, 2024


Commented to #1600 (comment)


#### What consequences does it have on the cluster health?

Containers that used lazy-pulled images will fail to start; nodes need to be cleaned up using `crictl rmi <image>` to recover.

How can that reasonably conveniently be done?

Would users be expected to do that while the ALS is still enabled?

  • What if they forget?
  • If they do that, can’t the Kubelet trigger a new pull afterwards again?

Or would that only happen after the downgrade? Will the cluster be working well enough to make that possible?


Or is this an ~irreversible decision: once ALS is enabled, is there no practical way back?

Something else?


Commented to #1600 (comment)

@ktock

ktock commented Jul 2, 2024

@mtrmac Thanks for comments.

About #1600 (comment) and #1600 (comment) :
If the repair of c/storage should be handled automatically after a config change or downgrade, I think CRI-O should add logic similar to the existing "clean_shutdown_file".

WDYT about the following design?

CRI-O adds the following flag to crio.conf:

  • additional_layer_store_file="<location>": If the Additional Layer Store is enabled, CRI-O creates a file at the specified location when it starts. After a restart, CRI-O can thus tell whether its previous invocation used the Additional Layer Store.

When CRI-O is started with the Additional Layer Store disabled but the previous invocation had it enabled (i.e. "additional_layer_store_file" exists), CRI-O performs a check+repair of c/storage to delete damaged images.

c/storage will have an API, Store.IsAdditionalLayerStoreEnabled(), that returns true if the Additional Layer Store is enabled. This will be used by CRI-O to know whether the Additional Layer Store is enabled.
