Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: dockerfile: support Dockerfile.sum for pinning sources #2794

Closed
AkihiroSuda opened this issue Apr 10, 2022 · 27 comments
Closed

RFC: dockerfile: support Dockerfile.sum for pinning sources #2794

AkihiroSuda opened this issue Apr 10, 2022 · 27 comments

Comments

@AkihiroSuda
Copy link
Member

AkihiroSuda commented Apr 10, 2022

EDIT: the current PR:


Dockerfile.sum is an equivalent of go.sum but s/go/Dockerfile/ .
The content is a subset of BuildInfo:

{
    "sources": [
      {
        "type": "docker-image",
        "ref": "docker.io/library/alpine:latest",
        "pin": "sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
      },
      {
        "type": "http",
        "ref": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md",
        "pin": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"
      }
    ]
}

When Dockerfile.sum exists in the context, the Dockerfile builder does:

  • Pinning the digest of docker-image sources (FROM ...)
  • Pinning the digest of http sources (ADD https://...)
  • Recording the consumed entries to the build info structure (["containerimage.buildinfo"].consumedPin)

In the future, Dockerfile should also support ADD <gitref>. and pinning its commit hash.

POC

https://github.com/AkihiroSuda/buildkit_poc/commits/pin-poc.20220411-0

⚠️ The filename and the format of the "Dockerfile.sum" are subject to change.

$ cat Dockerfile
FROM alpine
ADD https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md /README.md

$ cat Dockerfile.sum 
{
    "sources": [
      {
        "type": "docker-image",
        "ref": "docker.io/library/alpine:latest",
        "pin": "sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
      },
      {
        "type": "http",
        "ref": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md",
        "pin": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"
      }
    ]
}

$ sudo buildctl build --frontend dockerfile.v0  --local dockerfile=. --local context=. --metadata-file metadata.json 
[+] Building 3.0s (6/6) FINISHED                                                                                                           
 => [internal] load build definition from Dockerfile                                                                                  0.0s
 => => transferring dockerfile: 603B                                                                                                  0.0s
 => [internal] load .dockerignore                                                                                                     0.0s
 => => transferring context: 2B                                                                                                       0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                                                      2.8s
 => [1/2] FROM docker.io/library/alpine@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454                       0.0s
 => => resolve docker.io/library/alpine@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454                       0.0s
 => https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md                                                                 0.0s
 => CACHED [2/2] ADD https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md /README.md                                     0.0s

$ cat metadata.json 
{
  "containerimage.buildinfo": {
    "frontend": "dockerfile.v0",
    "sources": [
      {
        "type": "docker-image",
        "ref": "docker.io/library/alpine:latest",
        "pin": "sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
      },
      {
        "type": "http",
        "ref": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md",
        "pin": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"
      }
    ],
    "consumedPin": {
      "digest": "sha256:42b78052859819b268e047da95512b20d2e64991d662e4af9f286d743f20b2d4",
      "sources": [
        {
          "type": "docker-image",
          "ref": "docker.io/library/alpine:latest",
          "pin": "sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
        },
        {
          "type": "http",
          "ref": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md",
          "pin": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"
        }
      ]
    }
  }
}

When a docker-image pin is wrong:

$sudo buildctl build --frontend dockerfile.v0  --local dockerfile=. --local context=. --metadata-file metadata.json 
[+] Building 1.6s (3/3) FINISHED                                                                                                           
 => [internal] load build definition from Dockerfile                                                                                  0.0s
 => => transferring dockerfile: 603B                                                                                                  0.0s
 => [internal] load .dockerignore                                                                                                     0.0s
 => => transferring context: 2B                                                                                                       0.0s
 => ERROR [internal] load metadata for docker.io/library/alpine:latest                                                                1.4s
------
 > [internal] load metadata for docker.io/library/alpine:latest:
------
Dockerfile:1
--------------------
   1 | >>> FROM alpine
   2 |     ADD https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md /README.md
   3 |     
--------------------
error: failed to solve: alpine: docker.io/library/alpine:latest@sha256:fedbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454: not found

When an http pin is wrong:

$ sudo buildctl build --frontend dockerfile.v0  --local dockerfile=. --local context=. --metadata-file metadata.json 
[+] Building 0.6s (5/6)                                                                                                                    
 => [internal] load build definition from Dockerfile                                                                                  0.0s
 => => transferring dockerfile: 603B                                                                                                  0.0s
 => [internal] load .dockerignore                                                                                                     0.0s
 => => transferring context: 2B                                                                                                       0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                                                      0.0s
 => [1/2] FROM docker.io/library/alpine@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454                       0.0s
 => => resolve docker.io/library/alpine@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454                       0.0s
 => ERROR https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md                                                           0.3s
------
 > https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md:
------
error: failed to solve: digest mismatch sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53: sha256:fe4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53
@AkihiroSuda
Copy link
Member Author

@tonistiigi @crazy-max @ktock PTAL 🙏

@crazy-max
Copy link
Member

crazy-max commented Apr 10, 2022

Looks like a nice way to provide/show the buildinfo feature to the end user, thanks @AkihiroSuda!

Using the ResolveImageConfig interface is clever so we have a common ground. I'm just afraid of the fallback to ImageMetaResolver. I think in case it falls back, it will have already do a metadata resolution in the frontend that could be useless (and could potentially increase rate-limit on Docker Hub).

Atm metadata resolution is not cached in the frontend (I think it uses the containerimage metaresolver which lacks of cache support). I recall about #706 and wonder if we should not look first at improving the cache experience on this side and maybe align it with your proposal.

I will think a bit more about your proposal but sgtm!

@dlorenc
Copy link

dlorenc commented Apr 10, 2022

This looks awesome.

I'd just note that it more closely resembles a lock file rather than a go.sum file to me. They're very close so it's mostly just a nitpick. go.sum serves a few additional purposes beyond just pinning, related to the transparency log and version resolution.

@crazy-max
Copy link
Member

crazy-max commented Apr 10, 2022

Wonder if we could also think about a docker build(x) tidy cmd to add new requirements and fix current deps.

@tonistiigi
Copy link
Member

Some things to consider here:

BuildInfo is a result for a specific build, not a specific Dockerfile. A Dockerfile can have lots of targets depending on different images. And with the latest Buildx you can have a single build (one BuildInfo) from multiple Dockerfiles. If sum file is created manually, this isn't really a big issue as it only affects cases where different targets would use different pins. But I think the intent is to use sums from a previous build or generate this file.

BuildInfo in v0.10 also contains Attrs field that is opt-in for now, not only sources. We should think if there is a solution that isn't limited to the pinned sources but also allows to load the configuration from the previous build.

Is this meant to be a LLB level or frontend level(Dockerfile) concept? Given it is loaded from Dockerfile, I guess frontend, but the pinning concept could be generalized in LLB as well. We already added --build-context that includes lots of this functionality. You can already do a build that sends the pins for previous images, and they are used even if image tags have been updated. This only works for images atm though (and build stages), not git/http URLs etc. and assumes that client-side can already attach the pins to the build request. As an example, @crazy-max and I discussed possibility for --build-context <image-ref-with-embedded-buildinfo> and --build-context <previous-build-metadata-file-with-buildinfo>.json. This would load the pins (and config) from a previous build.

I didn't quite get the consumedPin thing. If I build with pins from a previous source, ideally, I would expect to get an identical image. So I would want the new BuildInfo to be the same as the previous one without any extra keys.

@AkihiroSuda
Copy link
Member Author

@crazy-max

Using the ResolveImageConfig interface is clever so we have a common ground. I'm just afraid of the fallback to ImageMetaResolver. I think in case it falls back, it will have already do a metadata resolution in the frontend that could be useless (and could potentially increase rate-limit on Docker Hub).

The current POC does not increase the numbers of HTTP request AFAICS, and I don't see increased possibility of hitting the rate limit.

Wonder if we could also think about a docker build(x) tidy cmd to add new requirements and fix current deps.

👍


@dlorenc

I'd just note that it more closely resembles a lock file rather than a go.sum file to me. They're very close so it's mostly just a nitpick. go.sum serves a few additional purposes beyond just pinning, related to the transparency log and version resolution.

Is it better to change Dockerfile.sum to something else like Dockerfile.pin or Dockerfile.lock ?


@tonistiigi

We should think if there is a solution that isn't limited to the pinned sources but also allows to load the configuration from the previous build.

Maybe we should have a version field or mediaType field for extensibility?

Is this meant to be a LLB level or frontend level(Dockerfile) concept? Given it is loaded from Dockerfile, I guess frontend, but the pinning concept could be generalized in LLB as well.

Frontend, because the image meta resolver has to be called before generating LLB.
The pin format is designed to be portable to other frontends though.

I didn't quite get the consumedPin thing. If I build with pins from a previous source, ideally, I would expect to get an identical image. So I would want the new BuildInfo to be the same as the previous one without any extra keys.

Should this be moved to exporter metadata?

@ktock
Copy link
Collaborator

ktock commented Apr 11, 2022

Sounds very cool 👍

AkihiroSuda@99071c3#diff-50c7e2f4bb8c30ace7a5a6d2858ade67023db56d633f887b26768afae38ece7dR76

// HTTPChecksum returns the checksum if url is present in a.Pin.
// Otherwise returns an empty string without an error.

Shouldn't we produce an error if there is a missing entry in Dockerfile.sum? (go.mod seems to do so)

And +1 to @crazy-max 's docker build(x) tidy
Maybe, Dockerfile or something can provide the user a way to declare how to acquire a checksum value of a dependency (e.g. downloading SHA256SUMS file from github release page).

@nishakm
Copy link

nishakm commented Apr 11, 2022

Would something like this help?
https://twitter.com/_ctlfsh/status/1250596462502703104

@imjasonh
Copy link

Cosign has also explored adding a CLI to resolve image references in a Dockerfile so they're more reproducible/verifiable: see sigstore/cosign#707 and PR sigstore/cosign#1120

If there was one reliable, canonical way to do this, that would be a huge improvement.

@cpuguy83
Copy link
Member

Also considering something like a Dockerfile.mod where such locks could be added (as they are in go.mod), however a separate .lock file may be good here just because many times you don't want to actually pin resources (e.g. images like ubuntu:18.04 you probably want the most up to date version).

@nishakm
Copy link

nishakm commented Apr 11, 2022

Just a thought: how much do you want to pin? Pinning the FROM image should be straightforward, but pinning the state of software at the time of build is much harder. In order to do tern lock Dockerfile, we had to pull out the created_by commands from the Dockerfile, pull out the environment variables from the config, do a variable expansion in the created_by string, and write it back. For individual components, we had to first inventory the image and then replace all instances of the package in the Dockerfile with a pinned dependency. This doesn't work for ADD/COPY statements after build, especially if you are copying specific files with no context.

@AkihiroSuda
Copy link
Member Author

@ktock

Shouldn't we produce an error if there is a missing entry in Dockerfile.sum? (go.mod seems to do so)

No, not all entries have to be / can be pinned.
Especially http source entries.


@nishakm @imjasonh The goal of this PR is to cover http sources (ADD https://...) and git sources (ADD git://...) as well as docker-image (FROM ...) sources. Tern and Cosign do not seem capable to support non-image sources.

@rittneje
Copy link
Contributor

I don't understand the purpose of this proposal. If you really want to pin the base image, you can already do this by referencing the image digest instead of the tag. Also, the image tag could come from a build arg, so I don't see how you could have one authoritative pin/lock/sum file in that kind of circumstance.

I think instead of this, there should just be a way to specify the digest as part of the ADD instruction itself, just like you can already do for FROM.

@cpuguy83
Copy link
Member

@rittneje

There's a couple of reasons:

  1. Not having to change the source definition to pin the resources
  2. Not having to change the same definition in multiple places in the same source
  3. Applying the same pins across multiple builds.

@rittneje
Copy link
Contributor

@cpuguy83 I'm assuming by "source" here you mean the Dockerfile.

Not having to change the source definition to pin the resources

If you use build args for the pins then you don't need to change the Dockerfile per se, just whatever is supplying the args. Also, if you are downloading files within RUN via curl or the like, then already you should be doing something similar for sha256sum. So actually, it is preferable to have it all in the Dockerfile, rather than half in the Dockerfile and half in some new lock/pin/sum file.

Not having to change the same definition in multiple places in the same source

Build args already handle this too. For example:

ARG IMAGE_DIGEST=sha256:...

FROM ubuntu@${IMAGE_DIGEST} AS install
...

FROM ubuntu@${IMAGE_DIGEST} AS final
...

Applying the same pins across multiple builds.

I don't know what this means.

@cpuguy83
Copy link
Member

Build args do not handle this.
Build args require the author of the Dockerfile to add those args explicitly, and everywhere.
It also requires the person performing the build to know what the args are.
It also assumes the person building is calling "docker build" directly as opposed to something like "make build".

@cpuguy83
Copy link
Member

I don't know what this means.

Multiple dockerfiles or build targets in one repo.

@rittneje
Copy link
Contributor

@cpuguy83 One thing that is unclear to me from this proposal is what specifically happens when a tag gets changed. For example, suppose I do FROM python:3.8. At the time that resolved to sha256:abc123, so that gets recorded in the lock/sum/pin file. But now that tag is pointing to sha256:def456 in DockerHub. Does buildkit fail the build (because it pulls python:3.8 and now the digests don't match), or does it essentially rewrite FROM python:3.8 into FROM python@sha256:abc123? A similar situation unfolds for ADD git.

@cpuguy83
Copy link
Member

If you have it pinned, it doesn't matter what digest the tag points to in DockerHub, you've pinned it to what you want it to be.

@AkihiroSuda
Copy link
Member Author

Does buildkit fail the build (because it pulls python:3.8 and now the digests don't match), or does it essentially rewrite FROM python:3.8 into FROM python@sha256:abc123?

Latter for regular tags, former for immutable tags

@thaJeztah
Copy link
Member

thaJeztah commented Apr 27, 2022

@AkihiroSuda
Copy link
Member Author

Yes, this proposal covers #975, but provides a generic way for multiple LLB source ops

@thaJeztah
Copy link
Member

Yes, to some extend this can cover that feature request (but likely would require one to manually create the Dockerfile.sum (which also feels a bit awkward).

I haven't looked in-depth, but wondering how this proposal works with build-args (which can influence the build-steps, and thus what "checksum" to look for?), but I guess in that case it's an exercise to the user to create entries for every variant 🤔

@rittneje
Copy link
Contributor

How would this work when working with a multi-arch base image? I haven't really worked with them, but my understanding is you would have multiple images with the same tag but different digests (one per os/arch).

@thaJeztah
Copy link
Member

For multi arch it can refer to the manifest index (which is the "wrapper" tying together the image variants)

@AkihiroSuda
Copy link
Member Author

We have source-policy-file now: https://github.com/moby/buildkit/blob/v0.12.2/docs/build-repro.md#reproducing-the-pinned-dependencies

{
  "rules": [
    {
      "action": "CONVERT",
      "selector": {
        "identifier": "docker-image://docker.io/library/alpine:latest"
      },
      "updates": {
        "identifier": "docker-image://docker.io/library/alpine:latest@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
      }
    },
    {
      "action": "CONVERT",
      "selector": {
        "identifier": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md"
      },
      "updates": {
        "attrs": {"http.checksum": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"}
      }
    }
  ]
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants