cmd/image/export: Allow generating deterministic tarballs #2904

Merged
merged 2 commits into from
Jun 26, 2024

Conversation

mauriciovasquezbernal
Member

@mauriciovasquezbernal mauriciovasquezbernal commented May 27, 2024

The build command now supports the SOURCE_DATE_EPOCH environment variable[0] to allow reproducible builds. Also, the export command now always sorts the index to guarantee that the exported tarball is deterministic.

Testing

Before

# IG_SOURCE_PATH is needed to build in-tree gadgets that contain a wasm module
$ export IG_SOURCE_PATH=<absolute path to IG source location>

$ cd gadgets/trace_open # any other gadget works as well

# multiple invocations of build produce an image with a different digest each time
$ sudo -E ig image build . -t trace_open:last
INFO[0000] Experimental features enabled
Successfully built ghcr.io/inspektor-gadget/gadget/trace_open:last@sha256:a00e322b2ad607cfb7051dbe241f976b96a189fe7618fc5f4a48e65ffa2fa518

$ sudo -E ig image build . -t trace_open:last
INFO[0000] Experimental features enabled
Successfully built ghcr.io/inspektor-gadget/gadget/trace_open:last@sha256:6c530558cb88a5bf3e6be89866477e819111d5af2e077d192c86e42fe05f7946

# exporting those images will certainly produce a tar file with a changing digest as well.

After

# IG_SOURCE_PATH is needed to build in-tree gadgets that contain a wasm module
$ export IG_SOURCE_PATH=<absolute path to IG source location>

# the digest is constant when setting SOURCE_DATE_EPOCH

$ SOURCE_DATE_EPOCH=0 sudo -E ig image build . -t trace_open:latest
INFO[0000] Experimental features enabled
Pulling builder image ghcr.io/inspektor-gadget/ebpf-builder:latest
latest: Pulling from inspektor-gadget/ebpf-builder
Digest: sha256:aa544c4316d583ad7ac4a500b97b67cfbb22e7f3125f8a9a002a4eaedf7f45ef
Status: Image is up to date for ghcr.io/inspektor-gadget/ebpf-builder:latest
Successfully built ghcr.io/inspektor-gadget/gadget/trace_open:latest@sha256:faec585ef38bf306497bb3989753fe9657c05a0fc14d0fabd590ef5ec4aab0b3

$ SOURCE_DATE_EPOCH=0 sudo -E ig image build . -t trace_open:latest
INFO[0000] Experimental features enabled
Pulling builder image ghcr.io/inspektor-gadget/ebpf-builder:latest
latest: Pulling from inspektor-gadget/ebpf-builder
Digest: sha256:aa544c4316d583ad7ac4a500b97b67cfbb22e7f3125f8a9a002a4eaedf7f45ef
Status: Image is up to date for ghcr.io/inspektor-gadget/ebpf-builder:latest
Successfully built ghcr.io/inspektor-gadget/gadget/trace_open:latest@sha256:faec585ef38bf306497bb3989753fe9657c05a0fc14d0fabd590ef5ec4aab0b3

# the exported images are constant as well
$ SOURCE_DATE_EPOCH=0 sudo -E ig image build . -t trace_open:latest
INFO[0000] Experimental features enabled
Pulling builder image ghcr.io/inspektor-gadget/ebpf-builder:latest
latest: Pulling from inspektor-gadget/ebpf-builder
Digest: sha256:aa544c4316d583ad7ac4a500b97b67cfbb22e7f3125f8a9a002a4eaedf7f45ef
Status: Image is up to date for ghcr.io/inspektor-gadget/ebpf-builder:latest
Successfully built ghcr.io/inspektor-gadget/gadget/trace_open:latest@sha256:faec585ef38bf306497bb3989753fe9657c05a0fc14d0fabd590ef5ec4aab0b3
$ sudo -E ig image export trace_open:latest /tmp/image1.tar
INFO[0000] Experimental features enabled
Successfully exported images to /tmp/image1.tar

$ SOURCE_DATE_EPOCH=0 sudo -E ig image build . -t trace_open:latest
INFO[0000] Experimental features enabled
Pulling builder image ghcr.io/inspektor-gadget/ebpf-builder:latest
latest: Pulling from inspektor-gadget/ebpf-builder
Digest: sha256:aa544c4316d583ad7ac4a500b97b67cfbb22e7f3125f8a9a002a4eaedf7f45ef
Status: Image is up to date for ghcr.io/inspektor-gadget/ebpf-builder:latest
Successfully built ghcr.io/inspektor-gadget/gadget/trace_open:latest@sha256:faec585ef38bf306497bb3989753fe9657c05a0fc14d0fabd590ef5ec4aab0b3
$ sudo -E ig image export trace_open:latest /tmp/image2.tar
INFO[0000] Experimental features enabled
Successfully exported images to /tmp/image2.tar

$ sha256sum /tmp/image1.tar /tmp/image2.tar
c3d83221dbda9770558a82cd85c6d1e93891c9dc0e111f5fdd9cd1926459bc54  /tmp/image1.tar
c3d83221dbda9770558a82cd85c6d1e93891c9dc0e111f5fdd9cd1926459bc54  /tmp/image2.tar

This will be useful for #2853 and for the test artifacts generated in #2818.

@alban
Member

alban commented May 27, 2024

Instead of disabling CreatedAt, could you pick the last mtime of the source files (gadget.yaml, program.bpf.c, etc.)?

cc @vbatts for https://github.com/vbatts/tar-split

@mauriciovasquezbernal
Member Author

Instead of disabling CreatedAt, could you pick the last mtime of the source files (gadget.yaml, program.bpf.c, etc.)?

Do you mean in all cases? Or only when --disable-created-at is passed?

(The only issue I see in the future is knowing which files to use: there are going to be gadgets with program.bpf.c, program.go, etc., or any combination of them.)

@mauriciovasquezbernal mauriciovasquezbernal changed the title cm/image/export: Allow generating deterministic tarballs cmd/image/export: Allow generating deterministic tarballs May 27, 2024
@vbatts
Contributor

vbatts commented May 28, 2024

First off: neat. Reproducibility helps make things more predictable and usually has ripple benefits.

Question: just how deterministic do you want or need these archives to be?

The approach is likely "good enough" for simple use cases, but for anything slightly more complicated (changes between versions of golang archive/tar, differences between the golang, libarchive, GNU and BSD implementations, etc.) it does not go far enough for reproducibility.
Further, there are subtle issues: the xattr metadata is a key-value dictionary with no defined order, so with multiple xattrs (SELinux, IMA hash, capabilities) the order can change every time, and the hashes of the tar will never be consistent.

So, it all depends on your use-cases

@mauriciovasquezbernal
Member Author

I'd like the export command to generate a deterministic tarball, so we only care about the implementation in golang and not other libraries. How does your library work? Is it just a matter of using github.com/vbatts/tar-split/archive/tar instead of archive/tar? or do I need to do something else?

@vbatts
Contributor

vbatts commented May 31, 2024

The tar-split library is a stream reader that you put inline before you would use archive/tar; it builds something like a replay log so you can reassemble the exact tarball from what has been extracted.
It doesn't sound like it matches your use-case.
Perhaps what you've done is simple and good enough.

@mauriciovasquezbernal mauriciovasquezbernal force-pushed the mauricio/test-import-export branch 3 times, most recently from 7f7452d to 9a1a7d0 Compare May 31, 2024 17:02
Base automatically changed from mauricio/test-import-export to main May 31, 2024 17:55
@mauriciovasquezbernal
Member Author

mauriciovasquezbernal commented May 31, 2024

Instead of disabling CreatedAt, could you pick the last mtime of the source files (gadget.yaml, program.bpf.c, etc.)?

Updated the logic to use this when a --deterministic flag is passed.
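
To make the suggestion concrete, here is a minimal sketch of picking the latest mtime among the source files. It assumes every regular file in the gadget directory counts as a source file, which sidesteps the question of which files to use; the names latestModTime and demo are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

// latestModTime returns the most recent modification time among the regular
// files under dir. Hypothetical helper, not the function used in the PR.
func latestModTime(dir string) (time.Time, error) {
	var latest time.Time
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // skip directories, symlinks, etc.
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		if info.ModTime().After(latest) {
			latest = info.ModTime()
		}
		return nil
	})
	return latest, err
}

// demo creates a throwaway directory with two files whose mtimes are set
// explicitly, then returns the computed and the expected latest mtime.
func demo() (time.Time, time.Time, error) {
	var zero time.Time
	dir, err := os.MkdirTemp("", "gadget-src")
	if err != nil {
		return zero, zero, err
	}
	defer os.RemoveAll(dir)

	older := time.Unix(1600000000, 0)
	newer := time.Unix(1700000000, 0)
	files := map[string]time.Time{"gadget.yaml": older, "program.bpf.c": newer}
	for name, mtime := range files {
		p := filepath.Join(dir, name)
		if err := os.WriteFile(p, []byte(name), 0o644); err != nil {
			return zero, zero, err
		}
		if err := os.Chtimes(p, mtime, mtime); err != nil {
			return zero, zero, err
		}
	}
	got, err := latestModTime(dir)
	return got, newer, err
}

func main() {
	got, want, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("latest mtime matches newest file:", got.Equal(want))
}
```

Note that mtimes change on checkout or copy, so this value is only stable when the source tree itself is reproduced with the same timestamps.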

@@ -331,6 +380,10 @@ func tarFolderToFile(src, filePath string) error {
return err
}

header.ModTime = headerTime
Contributor

Also, I'll say I don't recall the particulars about filepath.Walk() logic. With just a couple of files it may not matter, but could also be a source of non-determinism. That may want a little investigation.

Member Author

Documentation says:

The files are walked in lexical order, which makes the output deterministic but requires Walk to read an entire directory into memory before proceeding to walk that directory.

So I think we're fine or are you thinking about something else?

Member

It really depends on what we are walking.
If we walk only the files of interest, they should not change between builds, so we should produce the same OCI images.
But if we walk all the files, then adding or removing one can affect the walk order and act as a side effect.

In our specific case, I guess this is OK because the directory is a temporary one used as the OCI store, so it should only contain files of interest.
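
The points in this thread can be sketched as follows: walk a directory with filepath.Walk (lexical order) and write a tar whose headers carry a fixed modification time. This is an illustration under assumptions, not the PR's tarFolderToFile; tarDir and demo are hypothetical names, symlinks are not handled, and the temporary directory stands in for the OCI store.

```go
package main

import (
	"archive/tar"
	"bytes"
	"crypto/sha256"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"time"
)

// tarDir writes the contents of src to w as a tar archive. Every header gets
// the same modTime and has its owner fields cleared, so the output depends
// only on file names, modes and contents. Hypothetical helper.
func tarDir(src string, w io.Writer, modTime time.Time) error {
	tw := tar.NewWriter(w)
	err := filepath.Walk(src, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(src, path)
		if err != nil || rel == "." {
			return err // nil for the root directory, which is skipped
		}
		header, err := tar.FileInfoHeader(info, "")
		if err != nil {
			return err
		}
		header.Name = filepath.ToSlash(rel)
		header.ModTime = modTime // fixed timestamp instead of the file's mtime
		header.AccessTime, header.ChangeTime = time.Time{}, time.Time{}
		header.Uid, header.Gid, header.Uname, header.Gname = 0, 0, "", ""
		if err := tw.WriteHeader(header); err != nil || !info.Mode().IsRegular() {
			return err // nil for directories, which have no body
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		_, err = tw.Write(data)
		return err
	})
	if err != nil {
		return err
	}
	return tw.Close()
}

// demo archives the same directory twice, changing the files' mtimes in
// between, and returns the SHA-256 digest of each archive.
func demo() (string, string, error) {
	dir, err := os.MkdirTemp("", "oci-store")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	names := []string{"oci-layout", "index.json"}
	var digests [2]string
	for i := range digests {
		mtime := time.Unix(1600000000+int64(i)*3600, 0)
		for _, name := range names {
			p := filepath.Join(dir, name)
			if err := os.WriteFile(p, []byte(name), 0o644); err != nil {
				return "", "", err
			}
			if err := os.Chtimes(p, mtime, mtime); err != nil {
				return "", "", err
			}
		}
		var buf bytes.Buffer
		if err := tarDir(dir, &buf, time.Unix(0, 0)); err != nil {
			return "", "", err
		}
		digests[i] = fmt.Sprintf("%x", sha256.Sum256(buf.Bytes()))
	}
	return digests[0], digests[1], nil
}

func main() {
	a, b, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("digests equal:", a == b)
}
```

As noted above, this only holds while the walked directory contains nothing but the files of interest; an extra file changes the archive regardless of timestamps.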

Member

@eiffel-fl eiffel-fl left a comment

Hi!

I am not sure I understand the usage of EBPFSource.

Best regards.

@mauriciovasquezbernal mauriciovasquezbernal force-pushed the mauricio/stable-export branch 2 times, most recently from a91af8d to e87f983 Compare June 13, 2024 12:35
Member

@alban alban left a comment

LGTM


Also, shouldn't it be the default?

I'm not sure. I think it's fine to have this disabled by default so the created annotation is the time when the image was built.

I am also tempted to make it the default, but I don't have a strong opinion.

@mauriciovasquezbernal
Member Author

I'll mark this on hold for a bit, perhaps having the --deterministic flag on the build command is a bad idea. I'll check how it's done in docker.

@vbatts
Contributor

vbatts commented Jun 25, 2024

I would not add a flag, and if this is not breaking anything then I would roll forward with it.

@mauriciovasquezbernal
Member Author

I didn't find a way to remove the flag. Having this by default would imply that the org.opencontainers.image.created annotation is set to the latest mod time of the source files, which I don't think is correct. So I'd rather stick with a specific flag for it.

Btw, I checked and docker doesn't support reproducible builds (yet?).

@eiffel-fl
Member

Btw, I checked and docker doesn't support reproducible builds (yet?).

buildkit/buildx, which we use to bake the arm64 image, permits reproducible images:
https://github.com/moby/buildkit/blob/master/docs/build-repro.md
https://docs.docker.com/build/ci/github-actions/reproducible-builds/

@vbatts
Contributor

vbatts commented Jun 26, 2024 via email

@mauriciovasquezbernal
Member Author

buildkit/buildx, which we use to bake the arm64 image, permits reproducible images:
https://github.com/moby/buildkit/blob/master/docs/build-repro.md
https://docs.docker.com/build/ci/github-actions/reproducible-builds/

Thanks for the pointer. Now I understand the piece I was missing: https://reproducible-builds.org/docs/source-date-epoch/.

I updated the logic to use this variable, and now the --deterministic flag is gone.

@alban @eiffel-fl could you please check it again, given that I made some changes?

(I also fixed a missing piece of logic to iterate through a map in order; it was also causing the images to be non-deterministic)
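
For context on the map issue: Go randomizes map iteration order, so any output built by ranging over a map directly can differ between runs. A minimal sketch of the usual fix, sorting the keys first, is below; sortedKeys and the sample annotation values are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in lexical order so that callers can
// iterate over the map deterministically. Hypothetical helper.
func sortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Sample values for illustration only.
	annotations := map[string]string{
		"org.opencontainers.image.title":   "trace_open",
		"org.opencontainers.image.created": "1970-01-01T00:00:00Z",
	}
	// Ranging over annotations directly would visit keys in a random order.
	for _, k := range sortedKeys(annotations) {
		fmt.Printf("%s=%s\n", k, annotations[k])
	}
}
```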

The build command now supports the SOURCE_DATE_EPOCH env variable[0] to
allow reproducible builds. Also, the export command is changed to
always sort the index to guarantee the exported tarball is deterministic.

[0]: https://reproducible-builds.org/docs/source-date-epoch/

Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com>
Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com>
Member

@eiffel-fl eiffel-fl left a comment

Hi!

I have one small comment with regard to time parsing.
Otherwise it looks good to me and I definitely think this is a great addition.

Best regards.

}

if sourceDateEpoch, ok := os.LookupEnv("SOURCE_DATE_EPOCH"); ok {
sde, err := strconv.ParseInt(sourceDateEpoch, 10, 64)
Member

Should we rather use a time-related function to parse this value?
Or, since this is always in seconds, is it OK to use a "simple" string conversion?

Member Author

I think it's fine to use the string conversion, as the goal is to convert from string to number. I don't think the time package provides any function that takes a Unix time as a string.

Btw, this parsing code was taken from https://reproducible-builds.org/docs/source-date-epoch/.
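
A self-contained sketch of the parsing discussed here: read SOURCE_DATE_EPOCH as whole seconds with strconv.ParseInt and convert it with time.Unix. The function name parseSourceDateEpoch and the fallback to the current time are assumptions for illustration; the PR's actual code is only partially visible in the hunk above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// parseSourceDateEpoch converts the value of SOURCE_DATE_EPOCH (seconds since
// the Unix epoch, as a decimal string) into a UTC time. Hypothetical helper.
func parseSourceDateEpoch(value string) (time.Time, error) {
	sec, err := strconv.ParseInt(value, 10, 64)
	if err != nil {
		return time.Time{}, fmt.Errorf("invalid SOURCE_DATE_EPOCH %q: %w", value, err)
	}
	return time.Unix(sec, 0).UTC(), nil
}

func main() {
	// Assumed fallback: use the current time when the variable is not set.
	created := time.Now().UTC()
	if value, ok := os.LookupEnv("SOURCE_DATE_EPOCH"); ok {
		parsed, err := parseSourceDateEpoch(value)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		created = parsed
	}
	// RFC 3339 is the format the OCI spec uses for the created annotation.
	fmt.Println(created.Format(time.RFC3339))
}
```

Running it as `SOURCE_DATE_EPOCH=0 go run .` should always print the same timestamp, whereas without the variable the output changes on every run.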

@mauriciovasquezbernal
Member Author

Thanks for the review and all comments folks, I'll merge it now!

@mauriciovasquezbernal mauriciovasquezbernal merged commit f687e2a into main Jun 26, 2024
64 of 65 checks passed
@mauriciovasquezbernal mauriciovasquezbernal deleted the mauricio/stable-export branch June 26, 2024 16:58