
Apply SOURCE_DATE_EPOCH to the files and the whiteouts inside OCI tar layers #3560

Open · AkihiroSuda wants to merge 1 commit into master from source-date-epoch-1.7
Conversation

@AkihiroSuda (Member) commented Jan 31, 2023

Propagate the build-arg:SOURCE_DATE_EPOCH opt value to the differ, to limit the upper bound of the file timestamps and set the whiteout timestamps.

With this commit, the following workarounds mentioned in docs/build-repro.md are no longer needed for reproducible builds:

# Limit the timestamp upper bound to SOURCE_DATE_EPOCH.
# Workaround for https://github.com/moby/buildkit/issues/3180
ARG SOURCE_DATE_EPOCH
RUN find $( ls / | grep -E -v "^(dev|mnt|proc|sys)$" ) -newermt "@${SOURCE_DATE_EPOCH}" -writable -xdev | xargs touch --date="@${SOURCE_DATE_EPOCH}" --no-dereference
# Squash the entire stage for resetting the whiteout timestamps.
# Workaround for https://github.com/moby/buildkit/issues/3168
FROM scratch
COPY --from=0 / /

Limitations:

* containerd 1.7 is needed for the containerd worker mode. (containerd/containerd@9c9f564)

Backup: ac6f97b (withdrawn due to a dependency on ctx)
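For illustration, a minimal Go sketch of the timestamp-capping idea: every entry written into the layer tar gets its modification time capped at SOURCE_DATE_EPOCH. The function and demo below are hypothetical stand-ins, not the differ code changed by this PR.

```go
package main

import (
	"archive/tar"
	"fmt"
	"time"
)

// clampToEpoch caps a tar entry's timestamps at SOURCE_DATE_EPOCH so that no
// file in the layer appears newer than the epoch. Access/change times are not
// reproducible across builds, so they are dropped entirely.
func clampToEpoch(hdr *tar.Header, epoch time.Time) {
	if hdr.ModTime.After(epoch) {
		hdr.ModTime = epoch
	}
	hdr.AccessTime = time.Time{}
	hdr.ChangeTime = time.Time{}
}

func main() {
	epoch := time.Unix(0, 0).UTC()
	hdr := &tar.Header{Name: "etc/os-release", ModTime: time.Now()}
	clampToEpoch(hdr, epoch)
	fmt.Println(hdr.Name, hdr.ModTime) // etc/os-release 1970-01-01 00:00:00 +0000 UTC
}
```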

cache/blobs.go (outdated)
@@ -62,6 +63,8 @@ func (sr *immutableRef) computeBlobChain(ctx context.Context, createIfNeeded boo
}

func computeBlobChain(ctx context.Context, sr *immutableRef, createIfNeeded bool, comp compression.Config, s session.Group, filter map[string]struct{}) error {
Member:

I don't understand how this is safe. If the blob already exists, then just the blob is used without any timestamp check. If the first build uses one epoch and the second uses another or no epoch at all then the second build gets epoch from the first build.

Member Author (@AkihiroSuda):

Now validated in cache/blobs.go:blobExistsWithSourceDateEpoch()
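A rough sketch of what such a check could look like; the stand-in type and fields below are assumptions for illustration, not BuildKit's actual cache metadata API.

```go
package main

import (
	"fmt"
	"time"
)

// refMeta is a stand-in for the per-ref metadata kept about a blob.
type refMeta struct {
	blobDigest      string
	sourceDateEpoch *time.Time
}

// blobExistsWithSourceDateEpoch reuses an existing blob only if it was
// produced with the same SOURCE_DATE_EPOCH (or if no epoch is requested),
// so a build with a different epoch cannot silently inherit old timestamps.
func blobExistsWithSourceDateEpoch(r refMeta, epoch *time.Time) bool {
	if r.blobDigest == "" {
		return false // no blob recorded at all
	}
	if epoch == nil {
		return true // caller does not care about the epoch
	}
	return r.sourceDateEpoch != nil && r.sourceDateEpoch.Equal(*epoch)
}

func main() {
	e1 := time.Unix(1672531200, 0)
	e2 := time.Unix(0, 0)
	r := refMeta{blobDigest: "sha256:deadbeef", sourceDateEpoch: &e1}
	fmt.Println(blobExistsWithSourceDateEpoch(r, &e1)) // true: same epoch, reuse blob
	fmt.Println(blobExistsWithSourceDateEpoch(r, &e2)) // false: differ must run again
}
```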

@AkihiroSuda marked this pull request as draft February 8, 2023 05:52
@ReillyBrogan

Are this and #3263 still blocked?

@AkihiroSuda (Member Author)

> Are this and #3263 still blocked?

Yes, will try to update next week, sorry

@AkihiroSuda force-pushed the source-date-epoch-1.7 branch 3 times, most recently from 6155189 to 45acbcd on March 7, 2023 10:54
@AkihiroSuda marked this pull request as ready for review March 7, 2023 10:54
@AkihiroSuda (Member Author)

Rebased

@AkihiroSuda force-pushed the source-date-epoch-1.7 branch 2 times, most recently from 75276f4 to 7121a39 on March 14, 2023 05:15
@AkihiroSuda (Member Author)

Rebased again

if cdAddress := sb.ContainerdAddress(); cdAddress != "" {
// https://github.com/containerd/containerd/commit/9c9f564a35df7990870a53a125afbf88ac412753
integration.CheckContainerdVersion(t, cdAddress, ">= 1.7.0-beta.1")
Member:

It looks good for the containerd worker, but I'm not sure what we should do for dockerd with the containerd snapshotter enabled.

Member:

Maybe add a factory in the integration sandbox that returns the containerd version depending on the worker, if that makes sense?

Member Author (@AkihiroSuda):

I guess it can be revisited in a separate PR?
The current code is passing the current CI.

@AkihiroSuda (Member Author)

Rebased

@tonistiigi (Member) left a comment:

If I understand correctly, this also converts the blobs of base images. I don't think that is correct, at least not as the default behavior. I think it may be possible to use the DiffIDs in the image config that the frontend passes to the exporter to figure out which layers are from the base image and which are not.

Not for this PR, but we are also assuming that archive/tar and compress/gzip are deterministic. We at least need to add test coverage that would reveal possible changes in Go updates and give us the option to vendor the old library if needed.
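A simplified sketch of the DiffID-prefix idea suggested above; the helper below is hypothetical and works on plain strings rather than OCI image configs.

```go
package main

import "fmt"

// baseLayerCount returns how many of the final image's layers come from the
// base image, by matching the base image's DiffIDs as a prefix of the final
// DiffID chain. Layers beyond that count are new diffs created by the build.
func baseLayerCount(baseDiffIDs, finalDiffIDs []string) int {
	n := 0
	for i, d := range baseDiffIDs {
		if i >= len(finalDiffIDs) || finalDiffIDs[i] != d {
			break
		}
		n++
	}
	return n
}

func main() {
	base := []string{"sha256:aaa", "sha256:bbb"}
	final := []string{"sha256:aaa", "sha256:bbb", "sha256:ccc"}
	fmt.Println(baseLayerCount(base, final), "base layers can be left untouched")
}
```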

})
}

if _, ok := filter[sr.ID()]; ok {
eg.Go(func() error {
_, err := g.Do(ctx, fmt.Sprintf("%s-%t", sr.ID(), createIfNeeded), func(ctx context.Context) (interface{}, error) {
if sr.getBlob() != "" {
if blobExistsWithSourceDateEpoch(sr, sourceDateEpoch) {
Member:

Does this mean that if there is no blob for the requested epoch but there is one for another epoch, we will run the differ again? Is that a good idea? I think it would be better to do something similar to compressions, where we convert one blob to another.

@@ -305,6 +320,9 @@ func (sr *immutableRef) setBlob(ctx context.Context, desc ocispecs.Descriptor) (
sr.queueMediaType(desc.MediaType)
sr.queueBlobSize(desc.Size)
sr.appendURLs(desc.URLs)
if sourceDateEpoch != nil {
sr.queueSourceDateEpoch(*sourceDateEpoch)
Member:

Not sure I understand how this works. Doesn't it just fix a ref to a specific epoch forever? Instead, it should track multiple variants of blobs that are all associated with a ref (similar to compressions).

Member Author (@AkihiroSuda):

> it should track multiple variants of blobs that are all associated with a ref (similar to compressions).

Does that require putting the epoch value into an OCI descriptor?

Member:

You mean for the remote build-cache or for something else? In the build-cache we already have a "created time", but it is not exactly the same (although both would be epoch values). We might need a separate field for that.

If you mean the case of avoiding regenerating blobs for base image layers when they were already generated with the same epoch, then yes, I think saving the epoch would be needed. On the other hand, I don't think any user would want us to regenerate the base image layers just to change the timestamps in them.

Member Author (@AkihiroSuda):

sr.queueSourceDateEpoch() is only called for new diffs, so it seems safe for now
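For context, a minimal sketch of the variant-tracking alternative discussed in this thread, where blobs are keyed by compression and epoch instead of fixing a single epoch on the ref; the types below are illustrative only and do not reflect BuildKit's metadata model.

```go
package main

import (
	"fmt"
	"time"
)

// blobVariantKey distinguishes blob variants of the same ref, similar to how
// compression variants are tracked today.
type blobVariantKey struct {
	compression string // e.g. "gzip", "zstd"
	epoch       int64  // SOURCE_DATE_EPOCH in seconds; 0 when not set
}

// refBlobs maps each variant key to an already-generated blob digest, so an
// existing blob can be reused instead of re-running the differ.
type refBlobs struct {
	variants map[blobVariantKey]string
}

func (r *refBlobs) lookup(comp string, epoch *time.Time) (string, bool) {
	key := blobVariantKey{compression: comp}
	if epoch != nil {
		key.epoch = epoch.Unix()
	}
	d, ok := r.variants[key]
	return d, ok
}

func main() {
	e := time.Unix(1672531200, 0)
	r := refBlobs{variants: map[blobVariantKey]string{
		{compression: "gzip", epoch: e.Unix()}: "sha256:abc",
	}}
	fmt.Println(r.lookup("gzip", &e)) // sha256:abc true
	fmt.Println(r.lookup("zstd", &e)) // "" false: would need a new blob
}
```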

@tonistiigi requested a review from ktock March 23, 2023 04:39
@ktock (Collaborator) commented Mar 23, 2023

> If I understand correctly, this also converts the blobs of base images. I don't think that is correct, at least not as the default behavior.

Maybe we can introduce an envvar to experimentally allow this behaviour?

Another possible way is to manage epoch variants like compression variants, by making compression.Type aware of the source date epoch.

@AkihiroSuda marked this pull request as draft March 28, 2023 23:01
@AkihiroSuda force-pushed the source-date-epoch-1.7 branch 3 times, most recently from 300f800 to 415a7ae on March 30, 2023 11:32
@AkihiroSuda (Member Author)

> If I understand correctly, this also converts the blobs of base images. I don't think that is correct, at least not as the default behavior. I think it may be possible to use the DiffIDs in the image config that the frontend passes to the exporter to figure out which layers are from the base image and which are not.

Updated PR to avoid converting the base images, by checking len(sr.getImageRefs()) > 0.
Please see cache/blobs.go:computeBlobChain().
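A simplified sketch of that guard; shouldRewriteLayer and its parameters are hypothetical stand-ins for the corresponding check in computeBlobChain.

```go
package main

import "fmt"

// shouldRewriteLayer reports whether the SOURCE_DATE_EPOCH rewrite applies to
// a layer: only newly created diffs are rewritten, while layers already backed
// by an image ref (base image layers) keep their existing blobs untouched.
// imageRefCount stands in for len(sr.getImageRefs()).
func shouldRewriteLayer(epochSet bool, imageRefCount int) bool {
	if !epochSet {
		return false // no SOURCE_DATE_EPOCH requested for this build
	}
	return imageRefCount == 0 // skip layers that came from a pulled image
}

func main() {
	fmt.Println(shouldRewriteLayer(true, 2)) // false: base image layer
	fmt.Println(shouldRewriteLayer(true, 0)) // true: new diff gets capped timestamps
}
```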

> Maybe we can introduce an envvar to experimentally allow this behaviour?

I don't see a use case for this, but it can be covered in a follow-up PR if somebody needs it.

@AkihiroSuda marked this pull request as ready for review March 30, 2023 13:16
@@ -65,9 +65,12 @@ See also the [documentation](/frontend/dockerfile/docs/reference.md#buildkit-bui

## Caveats
Collaborator:

nit: The limitation described in this PR should be documented somewhere?

> Limitations:
>
> * containerd 1.7 is needed for the containerd worker mode. (containerd/containerd@9c9f564)

Member Author (@AkihiroSuda):

Mentioned in docs/build-repro.md

Commit message: Apply SOURCE_DATE_EPOCH to the files and the whiteouts inside OCI tar layers

Propagate the `build-arg:SOURCE_DATE_EPOCH` opt value to the differ,
to limit the upper bound of the file timestamps and set the whiteout timestamps.

With this commit, the following workarounds mentioned in `docs/build-repro.md`
are no longer needed for reproducible builds:

> ```dockerfile
> # Limit the timestamp upper bound to SOURCE_DATE_EPOCH.
> # Workaround for moby#3180
> ARG SOURCE_DATE_EPOCH
> RUN find $( ls / | grep -E -v "^(dev|mnt|proc|sys)$" ) -newermt "@${SOURCE_DATE_EPOCH}" -writable -xdev | xargs touch --date="@${SOURCE_DATE_EPOCH}" --no-dereference
> ```

> ```dockerfile
> # Squash the entire stage for resetting the whiteout timestamps.
> # Workaround for moby#3168
> FROM scratch
> COPY --from=0 / /
> ```

Limitations:
* containerd 1.7 is needed for the containerd worker mode.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
@AkihiroSuda (Member Author)

@tonistiigi Could you take a look?
