Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support remote cache based on OCI Image Manifest layout #2251

Closed
joaodrp opened this issue Jul 13, 2021 · 12 comments
Closed

Support remote cache based on OCI Image Manifest layout #2251

joaodrp opened this issue Jul 13, 2021 · 12 comments

Comments

@joaodrp
Copy link

joaodrp commented Jul 13, 2021

Context

When using Buildkit, namely through buildx, it's possible to source/export cache images from/to a remote container registry. This is achieved using the --cache-from and --cache-to options of the build command.

Problem

As it stands, this feature is not supported or consistent across registry providers. This is likely because the compliance of these images with the OCI Image Spec is questionable.

When this feature was first introduced, cache images were represented with a Docker Manifest List.

Sample
{
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "digest": "sha256:136482bf81d1fa351b424ebb8c7e34d15f2c5ed3fc0b66b544b8312bda3d52d9",
            "size": 2427917,
            "annotations": {
                "buildkit/createdat": "2021-07-02T17:08:09.095229615Z",
                "containerd.io/uncompressed": "sha256:3461705ddf3646694bec0ac1cc70d5d39631ca2f4317b451c01d0f0fd0580c90"
            }
        },
        ...
        {
            "mediaType": "application/vnd.buildkit.cacheconfig.v0",
            "digest": "sha256:7aa23086ec6b7e0603a4dc72773ee13acf22cdd760613076ea53b1028d7a22d8",
            "size": 1654
        }
    ]
}

According to the Docker Image Spec, the manifests portion of a list should contain references to manifests, with the possible mediaTypes being application/vnd.docker.distribution.manifest.v2+json and application/vnd.docker.distribution.manifest.v1+json. This seems to be where the compliance issues started.

Later on, cache images started using an OCI Image Index instead of a Docker Manifest List. Still, the compliance problems persisted as the only thing that changed were the mediaType field values.

Sample
{
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
            "digest": "sha256:136482bf81d1fa351b424ebb8c7e34d15f2c5ed3fc0b66b544b8312bda3d52d9",
            "size": 2427917,
            "annotations": {
                "buildkit/createdat": "2021-07-02T17:08:09.095229615Z",
                "containerd.io/uncompressed": "sha256:3461705ddf3646694bec0ac1cc70d5d39631ca2f4317b451c01d0f0fd0580c90"
            }
        },
        ...
        {
            "mediaType": "application/vnd.buildkit.cacheconfig.v0",
            "digest": "sha256:7aa23086ec6b7e0603a4dc72773ee13acf22cdd760613076ea53b1028d7a22d8",
            "size": 1654
        }
    ]
}

According to the OCI Image Spec, "an image index is a higher-level manifest which points to specific image manifests". While the spec may be loose enough to accommodate unknown (but not unexpected) media types within indexes, a good portion of the existing registries currently reject pushes of these cache images.

Although these cache images are currently supported by the reference distribution registry, this is not by design but rather due to an unforeseen side effect of an old workaround, which has converged layer and manifest link validations (distribution/distribution#3452). Without this, the registry requires all "artifacts" referenced in a list/index to exist and have been uploaded through the PUT /v2/<name>/manifests endpoint, which does not accept these layers (only JSON payloads that conform with the Docker/OCI Image Manifest specs are allowed).

Some of the registries that currently support Buildkit cache images are built on top of the distribution registry. The fact that they support these images may not be by design but rather because they were not aware of this validation bug in distribution. This might be an unpleasant surprise for some, so continued support may not be guaranteed in the mid/long term.

Support across providers

Disclaimer: I am a maintainer of the GitLab Container Registry and the Distribution registry.

I tested support for cache images against a series of providers and shared my findings below. These tests were performed on July 4 and 5, 2021. The command used was the following:

docker buildx build \
   --cache-to=type=registry,ref=$REGISTRY_ADDRESS/$REPOSITORY_PATH:cache,mode=max \
   --platform linux/amd64,linux/arm64,linux/arm/v7 \
   -t $REGISTRY_ADDRESS/$REPOSITORY_PATH:image \
   --push - <<<$(echo -e "FROM alpine:3.14.0\nRUN echo 'foo' > /foo")
Provider Supported Observations
Amazon ECR 🔴 Does not allow pushes.
Azure Container Registry 🟢
DockerHub 🟢 There is no indication of the image size in the UI.
GitHub Container Registry 🟡 Pushing and pulling works, but there are UI/UX problems, as the manifest payload is not rendered properly.
GitLab Container Registry 🟡 Pushing and pulling works, but there are UI/UX problems, as the image metadata can't be parsed as usual.
Google Container Registry 🔴 Does not allow pushes.
JFrog Artifactory 🔴 Does not allow pushes.
Red Hat Quay 🔴 Does not allow pushes.

I also found reports of incompatibilities with other registries, such as for Sonatype Nexus (#2068), but I did not test it.

For transparency, out of the above, I can say that the GitLab Container Registry only supports push/pulls because we were not aware of the validation issue in distribution.

Related issues

Proposal

Compatibility across registry providers would be higher if Buildkit used an OCI Image Manifest instead of an "OCI" Image Index to represent cache images. Even for those that currently support them, the UX would be improved.

Cache images are comprised of a list of tarballs/layers and a config, so this seems to be the ideal use case for a manifest and not an index. This is similar to how Helm, WASM, and Homebrew implemented their (non-container) images.

For example, instead of this index:

{
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
            "digest": "sha256:136482bf81d1fa351b424ebb8c7e34d15f2c5ed3fc0b66b544b8312bda3d52d9",
            "size": 2427917,
            "annotations": {
                "buildkit/createdat": "2021-07-02T17:08:09.095229615Z",
                "containerd.io/uncompressed": "sha256:3461705ddf3646694bec0ac1cc70d5d39631ca2f4317b451c01d0f0fd0580c90"
            }
        },
        ...
        {
            "mediaType": "application/vnd.buildkit.cacheconfig.v0",
            "digest": "sha256:7aa23086ec6b7e0603a4dc72773ee13acf22cdd760613076ea53b1028d7a22d8",
            "size": 1654
        }
    ]
}

We could have the following manifest:

{
    "schemaVersion": 2,
    "config": {
        "mediaType": "application/vnd.buildkit.cacheconfig.v0",
        "size": 1654,
        "digest": "sha256:7aa23086ec6b7e0603a4dc72773ee13acf22cdd760613076ea53b1028d7a22d8"
      },
    "layers": [
        {
            "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
            "digest": "sha256:136482bf81d1fa351b424ebb8c7e34d15f2c5ed3fc0b66b544b8312bda3d52d9",
            "size": 2427917,
            "annotations": {
                "buildkit/createdat": "2021-07-02T17:08:09.095229615Z",
                "containerd.io/uncompressed": "sha256:3461705ddf3646694bec0ac1cc70d5d39631ca2f4317b451c01d0f0fd0580c90"
            }
        },
        ...
    ]
}

AFAIK, based on the conversations in docker/buildx#173, the only reason why this was not done from the beginning was that the OCI Manifest Spec mentions that layers "MUST have the base layer at index 0. Subsequent layers MUST then follow in stack order...", so layer order matters according to the spec. To avoid going against that rule (because Buildkit layers have no order - they're not used to create a container filesystem) it seems that it was decided to use a list/index instead, which end up causing the problem we're discussing here.

Given that layer order is not applicable to these cache images, would this really be a spec violation? Perhaps this is where the OCI Image Manifest spec can be adjusted. Regardless, It seems more tolerable to use a manifest and ignore the layer order requirement rather than using a list/index with non-manifest references.

While the ultimate solution seems to be the conclusion and wide adoption of an OCI Artifacts spec, until that becomes a reality, we should do the best we can with what we have now to ensure wide compatibility and a smooth UX across providers. Based on the above, I believe the best option would be to use an OCI manifest instead of an index.

Backward compatibility

In order to avoid a breaking change, whenever pulling a cache image, Buildkit could detect whether it's based on an index (old) or a manifest (new). If it's an index, it would parse/process it as it does today. Support for index based images could be deprecated and later removed without impacting users.

Support across providers

The described proposal would make cache images compatible with all OCI complaint registries, without requiring these to implement any workarounds.

To test this, I manually crafted a manifest payload following the format proposed here and uploaded it (as well as the referenced config/layers) to all registries, following the OCI Distribution Spec.

Manifest Payload
{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.buildkit.cacheconfig.v0",
    "size": 1360,
    "digest": "sha256:7ef6538d16596526a8b2824cea1c52dc3bef2cf6987f3c963fd9e77ba1bc8226"
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:136482bf81d1fa351b424ebb8c7e34d15f2c5ed3fc0b66b544b8312bda3d52d9",
      "size": 2427917,
      "annotations": {
        "buildkit/createdat": "2021-07-02T17:08:09.095229615Z",
        "containerd.io/uncompressed": "sha256:3461705ddf3646694bec0ac1cc70d5d39631ca2f4317b451c01d0f0fd0580c90"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:5843afab387455b37944e709ee8c78d7520df80f8d01cf7f861aae63beeddb6b",
      "size": 2811478,
      "annotations": {
        "buildkit/createdat": "2021-07-02T17:08:09.105436766Z",
        "containerd.io/uncompressed": "sha256:72e830a4dff5f0d5225cdc0a320e85ab1ce06ea5673acfe8d83a7645cbd0e9cf"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:58ab47519297212468320b23b8100fc1b2b96e8d342040806ae509a778a0a07a",
      "size": 2709626,
      "annotations": {
        "buildkit/createdat": "2021-07-02T17:08:09.119131923Z",
        "containerd.io/uncompressed": "sha256:5e04d10b60a48b2fef3614c8b2bf64312b03380ac01bcec220e16885fe660be5"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:cc28e5fb26aec14963e8cf2987c137b84755a031068ea9284631a308dc087b35",
      "size": 110,
      "annotations": {
        "buildkit/createdat": "2021-07-14T10:22:42.8079342Z",
        "containerd.io/uncompressed": "sha256:be99b9c0a6081ee2b9a13e4eebcc09350f956b2c2fefe2173b2d63bf898793a5"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:cc28e5fb26aec14963e8cf2987c137b84755a031068ea9284631a308dc087b35",
      "size": 110,
      "annotations": {
        "buildkit/createdat": "2021-07-14T10:22:42.8079342Z",
        "containerd.io/uncompressed": "sha256:be99b9c0a6081ee2b9a13e4eebcc09350f956b2c2fefe2173b2d63bf898793a5"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:cc28e5fb26aec14963e8cf2987c137b84755a031068ea9284631a308dc087b35",
      "size": 110,
      "annotations": {
        "buildkit/createdat": "2021-07-14T10:22:42.8079342Z",
        "containerd.io/uncompressed": "sha256:be99b9c0a6081ee2b9a13e4eebcc09350f956b2c2fefe2173b2d63bf898793a5"
      }
    }
  ]
}
Provider Supported Observations
Amazon ECR 🟢
Azure Container Registry 🟢
DockerHub 🔴 Not OCI compliant. Rejects the manifest upload with DENIED - unknown manifest class for application/vnd.buildkit.cacheconfig.v0.
GitHub Container Registry 🟢
GitLab Container Registry 🟢
Google Container Registry 🟢
JFrog Artifactory 🟢
Red Hat Quay 🔴 Does not seem to be OCI compliant. Is doing something funny with the manifest validation. Getting MANIFEST_INVALID - manifest schema version not supported when advertising the OCI manifest media type in the content-type header or DENIED - unknown manifest class for application/vnd.buildkit.cacheconfig.v0.

Apart from the above, this also works for the Distribution registry, and therefore any provider based on it that retained OCI compliance. This will continue to be true after the fix for the validation issue that created space for the problem described here.

The UI was also consistent across providers.

@SteveLasker
Copy link

Minor nit: as these layers aren't ordinal, and not meant to construct an actual container image, I'd suggest the layer.mediaType would be application/tar.
Would also suggest manifest.config.mediaType use a version: application/vnd.buildkit.cacheconfig.v1

Most registries already support the OCI Artifacts approach as they've relaxed config.mediaType and layer.mediaType validations to support additional types.

See: OCI Artifact Implementors, which isn't a complete list of supported registries, but a good start.

An easy way to validate this across all registries would be to use ORAS
The following command would simulate a buildx cache push:

oras push registry.example.com/hello-artifact:v1 \
--manifest-config /dev/null:application/vnd.buildkit.cacheconfig.v1 \
./artifact.txt:application/tar

By replacing registry.example.com with gcr, ecr, acr, gitlab, etc. you can validate the experience would work. Based on the list of currently working registries, looks like you'll have more consistent working experiences that align with registry lifecycle management (GC) expectations.

I know the big one missing is docker hub, however @justincormack said this being worked on, and tracked here: docker/roadmap#135

@tonistiigi tonistiigi changed the title Cache images: OCI compliance and compatibility with registry providers Support remote cache based on OCI Image Manifest layout Jul 14, 2021
@tonistiigi
Copy link
Member

Support across providers

I've used cache manifests in Google's Artifact Registry that iiuc supersedes their Container Registry.

I'd like to see testing results for the proposed format for the same set of registries as I don't think all of them would just work.

the ultimate solution seems to be the conclusion and wide adoption of an OCI Artifacts spec,

Can you give an overview what changes are to be expected from this(if any) and what is the migration/backward compatibility story here?

Generally, I don't see many blockers for accepting it. The initial version should be implemented as an opt-in key to --cache-to and --cache-from should be able to detect either format(atm it already detects index-based cache manifest and inline cache info). Then in the next steps we can discuss the path of making it default and starting deprecation.

@tonistiigi
Copy link
Member

tonistiigi commented Jul 14, 2021

The relevant code is in

type manifestList struct {
for export and
if err := json.Unmarshal(dt, &mfst); err != nil {
for import. Doesn't look like it would be too tricky to add conditions to move these descriptors around.

@joaodrp
Copy link
Author

joaodrp commented Jul 14, 2021

I'd like to see testing results for the proposed format for the same set of registries as I don't think all of them would just work.

@tonistiigi, I've updated the description with the test results for that. I wouldn't have proposed this if the result wasn't far better than the current status.

Can you give an overview what changes are to be expected from this(if any) and what is the migration/backward compatibility story here?

I think we're still far from a final spec (and even more from wide adoption from the majority of providers), but @SteveLasker may be able to provide more insight. Regardless, in the end, it'll be a simplified/different payload format, so we can use the same approach as proposed here, i.e., retain backward compatibility until it's safe to deprecate and remove a given format.

Generally, I don't see many blockers for accepting it. The initial version should be implemented as an opt-in key to --cache-to and --cache-from should be able to detect either format(atm it already detects index-based cache manifest and inline cache info). Then in the next steps we can discuss the path of making it default and starting deprecation.

Makes sense. An oci-manifest=true key seems a good pick.

@SteveLasker
Copy link

SteveLasker commented Jul 14, 2021

I've updated the description with the test results for that.

@joaodrp, wow, that's awesome to see so much support. I know Joey Schoor was working to enable Quay, as they had the same idea many years ago. It just wasn't proposed as a standard, which is what we're trying to do here. I'll reach out to Hank to get the latest thinking.

Looking at the results @joaodrp provided, it looks like you'll have a wider support matrix for buildx and other types.

@tonistiigi, are you asking what the manifest would look like?

@BillDett
Copy link

Thanks for the heads up here- we're tracking this on the Quay side via https://issues.redhat.com/browse/PROJQUAY-2253. Not sure when we'll be able to add to quay.io as we have some other structural work going on that is higher priority at moment. We will be adding user-managed OCI mime types to next version of Red Hat Quay and that (along with this fix) will find its way over to quay.io eventually.

@theomessin
Copy link

Any update on this issue?

@rafavallina
Copy link

rafavallina commented Oct 20, 2022

Hello everyone,

Wanted to check in on this issue and check in on the updates to Buildkit. Since the last comment here, The OCI has added the OCI Artifact manifest into the specification, which can be used to index a wider variety of artifacts, including signatures and also cached layers.

Has there been any progress in adopting this specification as part of Buildkit? Are there plans to do so?

@georgeseifada
Copy link

I'm also interested in this feature.

@kattmang
Copy link
Contributor

kattmang commented Mar 7, 2023

Just wanted to let folks know I've begun work on a candidate PR for this and have a successfully exported remote cache manifest pushed to AWS ECR: https://imgur.com/a/ox7iOGH. I also plan on exposing this via an opt-in cache key similar to type=registry and mode=max. I plan on opening a PR sometime next week.

kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue Mar 7, 2023
Signed-off-by: Kang, Matthew <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue Mar 7, 2023
Signed-off-by: Kang, Matthew <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue Mar 7, 2023
)

Signed-off-by: Kang, Matthew <impulsecss@gmail.com>
Signed-off-by: Matt Kang <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue Mar 17, 2023
… of cache manifest (opt-in on export, inferred on import) moby/buildkit moby#2251

Signed-off-by: Kang, Matthew <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue Mar 17, 2023
… of cache manifest (opt-in on export, inferred on import) moby/buildkit moby#2251

Signed-off-by: Kang, Matthew <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue Mar 17, 2023
… of cache manifest (opt-in on export, inferred on import) moby/buildkit moby#2251

Signed-off-by: Kang, Matthew <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue Mar 17, 2023
… of cache manifest (opt-in on export, inferred on import) moby/buildkit moby#2251

Signed-off-by: Kang, Matthew <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue May 1, 2023
… of cache manifest (opt-in on export, inferred on import) moby/buildkit moby#2251

Signed-off-by: Kang, Matthew <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue May 1, 2023
… of cache manifest (opt-in on export, inferred on import) moby/buildkit moby#2251

Signed-off-by: Kang, Matthew <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue May 3, 2023
… of cache manifest (opt-in on export, inferred on import) moby/buildkit moby#2251

Signed-off-by: Matt Kang <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue May 3, 2023
Signed-off-by: Matt Kang <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue May 3, 2023
Signed-off-by: Matt Kang <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue May 11, 2023
… of cache manifest (opt-in on export, inferred on import) moby/buildkit moby#2251

Signed-off-by: Kang, Matthew <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue May 11, 2023
Signed-off-by: Matt Kang <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue May 11, 2023
Signed-off-by: Matt Kang <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue May 12, 2023
Signed-off-by: Matt Kang <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue May 12, 2023
Signed-off-by: Matt Kang <impulsecss@gmail.com>
kattmang added a commit to kattmang/buildkit-cache-manifest-oci that referenced this issue May 12, 2023
Signed-off-by: Matt Kang <impulsecss@gmail.com>
@tonistiigi
Copy link
Member

#3724

@digglife
Copy link

For others' reference, the fix has been released to buildkit v0.12.

It's still fresh(2023-07-13) so you might need to manually upgrade.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants