Add eStargz specification to OCI v1 (support lazy pulling) #815

Open
ktock opened this issue Dec 9, 2020 · 2 comments · May be fixed by #877

ktock commented Dec 9, 2020

TL;DR

  • Standardize eStargz archive format as an optional extension to OCI Image Spec v1: https://github.com/containerd/stargz-snapshotter/blob/v0.2.0/docs/stargz-estargz.md
  • Define org.opencontainers.image.toc.digest annotation for enabling chunk-level content verification
  • No need to introduce a new layer media type, because eStargz is fully compatible with application/vnd.oci.image.layer.v1.tar+gz
    • Though compression methods other than gzip are out of scope, this spec can be smoothly extended to other compression methods (e.g. zstd) in the future

Overview

Pull is one of the time-consuming steps in the container lifecycle. One of the root causes is the tar (+gzip) archived layer, which doesn't allow image consumers (e.g. container runtimes, builders, etc.) to run a container until the entire contents are locally available.

This proposal aims to solve this issue by enabling lazy pulling for OCI images. Lazy pulling here means that image consumers don't download the entire image on a pull operation but fetch the necessary chunks of contents on demand. This reduces the time taken for pull and lets the container start up quickly.

We propose standardizing the lazily-pullable and OCI-compatible tar.gz extension "eStargz" (https://github.com/containerd/stargz-snapshotter/blob/v0.2.0/docs/stargz-estargz.md), which is developed in the containerd Stargz Snapshotter project. The recent benchmarking result shows the performance improvement on the pull operation (please also see the README for a detailed explanation).

[Figure: benchmarking result]

Because eStargz is fully compatible with the current spec,

  • it can be lazily pulled without any changes to the registry
  • it can still run on eStargz-agnostic runtimes, so the community can adopt the new spec without the risk of breaking their environments

Though this proposal focuses on the extension to the gzip-compressed layer, we believe eStargz can be smoothly extended to other compression methods in the future. Recently, the Podman community has been trying to define zstd:chunked, a zstd version of the lazily-pullable format, based on the eStargz spec. Standardizing eStargz will also help standardize zstd:chunked in the future, with a minimal amount of changes to the spec. This consistency of the format across compression methods should also help runtime implementers adopt lazy pulling without unnecessary complexity.

Thanks @AkihiroSuda for the discussion about this proposal.

Goal

The goal of this proposal is to add support for lazy pulling to the OCI Image Spec by standardizing the eStargz spec (https://github.com/containerd/stargz-snapshotter/blob/v0.2.0/docs/stargz-estargz.md) as an optional extension and by defining an annotation, org.opencontainers.image.toc.digest, for content verification. No changes are needed to the OCI Distribution Spec, because eStargz can be lazily pulled from a registry as long as the registry supports HTTP Range Requests, which are already included in that spec.
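As a rough illustration of the Range Request mentioned above, the following sketch builds (but does not send) a request for one byte range of a layer blob. The registry host, repository name, digest, and the helper name blob_range_request are hypothetical; authentication and redirects, which real registries commonly require, are left out.

```python
import urllib.request


def blob_range_request(registry, repository, digest, start, end):
    """Build a GET request for bytes start..end (inclusive) of a layer blob."""
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    # Blob endpoint layout as defined by the OCI Distribution Spec.
    url = f"https://{registry}/v2/{repository}/blobs/{digest}"
    request = urllib.request.Request(url, method="GET")
    # HTTP byte ranges are inclusive on both ends.
    request.add_header("Range", f"bytes={start}-{end}")
    return request


# Hypothetical registry, repository, and digest, for illustration only.
req = blob_range_request(
    "registry.example.com", "library/app", "sha256:" + "0" * 64, 1024, 2047
)
print(req.full_url)
print(req.get_header("Range"))
```

A registry that honors the header answers with 206 Partial Content and only the requested bytes, which is what lets an image consumer fetch individual chunks instead of the whole layer.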

Proposed Changes

Fig 1. The Structure
Fig 2. Prefetching Support
Fig 3. Content Verification

Standardize eStargz archive format as an optional extension to application/vnd.oci.image.layer.v1.tar+gz (Fig 1 and 2)

eStargz is compatible with application/vnd.oci.image.layer.v1.tar+gz, so a new Media Type doesn't need to be introduced. Instead, we propose adding the eStargz spec to the OCI Image Spec as an optional extension to +gzip Media Types.

An overview of eStargz follows. For more details, please refer to the eStargz spec.

  • Gzip-compressing each tar entry per file (or per chunk if the file is large). This enables the image consumer to decompress each tar entry selectively.
  • Adding a TOC JSON to the layer tar blob. This contains the metadata and content offsets of all files. This allows image consumers to mount a layer without scanning the entire tar.gz and to extract only the necessary contents.
  • Adding meta entries that indicate "prioritized" files that SHOULD be prefetched when mounting the layer. This helps image consumers make sure that these files are locally available and avoid network-related overheads when reading them.
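The first two points can be sketched in a few lines. This is a simplified illustration, not the eStargz format itself: the TOC is kept as an in-memory list with hypothetical field names (name, offset, length, size) rather than being embedded in the blob, and large files are not split into chunks. It only shows that per-entry gzip members can be decompressed selectively while the whole blob stays a valid tar.gz.

```python
import gzip
import io
import tarfile


def build_layer(files):
    """Return (blob, toc); every tar entry is compressed as its own gzip member."""
    blob = bytearray()
    toc = []
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        header = info.tobuf(format=tarfile.USTAR_FORMAT)  # one 512-byte block
        padding = b"\0" * (-len(data) % tarfile.BLOCKSIZE)
        member = gzip.compress(header + data + padding)
        # Record where this entry's gzip member lives inside the blob.
        toc.append({"name": name, "offset": len(blob),
                    "length": len(member), "size": len(data)})
        blob += member
    # Two zero blocks mark the end of a tar archive.
    blob += gzip.compress(b"\0" * (2 * tarfile.BLOCKSIZE))
    return bytes(blob), toc


def read_file(blob, toc, name):
    """Decompress only the gzip member that holds the requested file."""
    entry = next(e for e in toc if e["name"] == name)
    member = blob[entry["offset"]:entry["offset"] + entry["length"]]
    raw = gzip.decompress(member)
    # Skip the tar header block, then drop the trailing padding.
    return raw[tarfile.BLOCKSIZE:tarfile.BLOCKSIZE + entry["size"]]


files = {"etc/motd": b"hello\n", "bin/app": b"\x7fELF" + b"\0" * 1000}
blob, toc = build_layer(files)

# Selective read: only one member is decompressed.
motd = read_file(blob, toc, "etc/motd")

# The concatenated members still read as an ordinary tar.gz.
with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
    names = archive.getnames()
```

In a lazy pull, the slice taken in read_file would correspond to a ranged fetch from the registry rather than a slice of a local buffer.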

Define org.opencontainers.image.toc.digest annotation (Fig 3)

In the current OCI Spec, a layer can be verified against the Digest of the layer written in the descriptor in the manifest. However, when a user lazily pulls a layer (i.e. fetches and extracts chunks separately on demand), this verification method cannot be applied because the entire layer contents haven't been acquired.

To solve this issue, eStargz can verify the contents at chunk granularity on demand. The digest of each chunk is written in the TOC JSON so that image consumers can verify chunks separately every time they acquire file contents. The TOC JSON itself is verified against the digest written in a pre-defined annotation on the layer descriptor in the manifest, which is already verifiable with the current spec. More details of this extension are described in the eStargz definition doc.

To enable this, we propose adding the following pre-defined annotation, following OCI's naming convention for annotations.

  • org.opencontainers.image.toc.digest: OCI Digest of the TOC JSON in the layer
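The two-step verification described above could look roughly like the sketch below. The annotation key comes from this proposal; the TOC layout used here (an "entries" list with "name" and "digest" fields) and the helper names are simplified assumptions for illustration, not the field names of the eStargz spec.

```python
import hashlib
import json

TOC_DIGEST_ANNOTATION = "org.opencontainers.image.toc.digest"


def oci_digest(data):
    """Format a SHA-256 digest in the OCI '<algorithm>:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify_toc(toc_bytes, descriptor):
    """Check the TOC JSON against the annotation on the layer descriptor."""
    expected = descriptor["annotations"][TOC_DIGEST_ANNOTATION]
    if oci_digest(toc_bytes) != expected:
        raise ValueError("TOC digest mismatch")
    return json.loads(toc_bytes)


def verify_chunk(toc, name, chunk):
    """Check one fetched chunk against the digest recorded in the verified TOC."""
    entry = next(e for e in toc["entries"] if e["name"] == name)
    if oci_digest(chunk) != entry["digest"]:
        raise ValueError(f"chunk digest mismatch for {name}")
    return chunk


# Hypothetical TOC and layer descriptor, for illustration only.
chunk = b"hello\n"
toc_bytes = json.dumps(
    {"entries": [{"name": "etc/motd", "digest": oci_digest(chunk)}]}
).encode()
descriptor = {
    "mediaType": "application/vnd.oci.image.layer.v1.tar+gz",
    "annotations": {TOC_DIGEST_ANNOTATION: oci_digest(toc_bytes)},
}

toc = verify_toc(toc_bytes, descriptor)
verified = verify_chunk(toc, "etc/motd", chunk)
```

Because the manifest (and therefore the annotation) is already covered by the existing digest chain, trust extends from the manifest to the TOC and from the TOC to each chunk without downloading the whole layer.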

Out-of-scope

This proposal focuses on lazy pulling and on standardizing, for OCIv1, the eStargz spec that is already used in the wild. Thus some requirements discussed in OCIv2 are out of scope for this proposal, including:

Though OCIv2 is out of scope for this proposal, eStargz doesn't conflict with the OCIv2 discussion.

This proposal focuses on the extension to application/vnd.oci.image.layer.v1.tar+gz; other compression methods (e.g. zstd) are out of scope.


jonjohnsonjr commented Dec 10, 2020

Is there a similar tool to stargzify for eStargz that I can play with?

Edit: nvm, found https://github.com/containerd/stargz-snapshotter/blob/v0.2.0/docs/ctr-remote.md


ktock commented Dec 18, 2020

Added the recent eStargz creation support in ggcr to the list.
@mattmoor @jonjohnsonjr Thank you very much for this!

Could we move this discussion forward?
@cyphar WDYT?

mattmoor added a commit to mattmoor/imgutil that referenced this issue Dec 18, 2020
This pulls in the estargz support in google/go-containerregistry, which enables buildpack builders to produce estargz layers by simply setting `GGCR_EXPERIMENT_ESTARGZ=1`.

Related: opencontainers/image-spec#815
Signed-off-by: Matt Moore <mattmoor@vmware.com>