-
Notifications
You must be signed in to change notification settings - Fork 239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/image/export: Allow generating deterministic tarballs #2904
Conversation
Instead of disabling CreatedAt, could you pick the last mtime of the source files (gadget.yaml, program.bpf.c, etc.)? |
Do you mean in all cases? Or only when --disable-created-at is passed? (only issue I see in the future is understanding which file to use, there are going to be gadgets with program.bpf.c and program.go, etc. or any of them) |
First off: neat. Reproducibility helps make things more predictable and usually has ripple benefits. Question: just how deterministic to you want or need these archives to be? The approach is likely "good enough" for some simple use cases, but any slightly more complicated use cases like changes within the versions of golang So, it all depends on your use-cases |
I'd like the export command to generate a deterministic tarball, so we only care about the implementation in golang and not other libraries. How does your library work? Is it just a matter of using |
6270935
to
476b7f5
Compare
6049284
to
60351dc
Compare
the tar-split library is a stream reader that you can put inline before you would use |
7f7452d
to
9a1a7d0
Compare
476b7f5
to
9ca3c41
Compare
Updated the logic to use this when a --deterministic flag is passed. |
@@ -331,6 +380,10 @@ func tarFolderToFile(src, filePath string) error { | |||
return err | |||
} | |||
|
|||
header.ModTime = headerTime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'll say I don't recall the particulars about filepath.Walk()
logic. With just a couple of files it may not matter, but could also be a source of non-determinism. That may want a little investigation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation says:
The files are walked in lexical order, which makes the output deterministic but requires Walk to read an entire directory into memory before proceeding to walk that directory.
So I think we're fine or are you thinking about something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It really depends on what we are walking on.
If we walk only on the files of interest, they should not change between several build and should produce the same OCI images.
But, if we are walking all the files, then adding or removing one can have impact on the order of walking and act as a side effect.
In our specific case, I guess this is OK because the directory is temporary one and is used as OCI store, so it should only contains files of interest.
9ca3c41
to
f87cbfd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
I am not sure not understand the usage of EBPFSource
.
Best regards.
a91af8d
to
e87f983
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Also, shouldn't it be the default?
I'm not sure. I think it's fine to have this disable by default so the created annotation is the time when the image was built.
I am also tempted to make it the default, but I don't have a strong opinion.
e87f983
to
b5cad43
Compare
I'll mark this on hold for a bit, perhaps having the --deterministic flag on the build command is a bad idea. I'll check how it's done in docker. |
I would not add a flag, and if this is not breaking anything than I would roll forward with it. |
b5cad43
to
f942b0f
Compare
I didn't find a way to remove the flag. Having this by default would imply that the Btw, I checked and docker doesn't support reproducible builds (yet?). |
|
On 25/06/24 14:52 -0700, Mauricio Vásquez wrote:
Btw, I checked and docker doesn't support reproducible builds (yet?).
no. Only reproducible once the image is pushed to a registry. The
reproducibility is only when pulled, that it can again be pushed and not
mutate.
|
f942b0f
to
43a49da
Compare
Thanks for the pointer. Now I understood the piece I was missing: https://reproducible-builds.org/docs/source-date-epoch/. I updated the logic to use this variable, and now the --deterministic flag is gone. @alban @eiffel-fl could you please check it again given that I did some changes? (I also updated a missing piece of logic to iterate in order through a map, it was also causing the images to be not deterministic) |
The build command now supports the SOURCE_DATE_EPOCH env variable[0] to allow reproducible builds. Also, the export command is changed to always sort the index to guarentee the exported tarball is deterministic. [0]: https://reproducible-builds.org/docs/source-date-epoch/ Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com>
Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com>
43a49da
to
6095949
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
I have one small comment with regard to time parsing.
Otherwise it looks good to me and I definitely think this is a great addition.
Best regards.
} | ||
|
||
if sourceDateEpoch, ok := os.LookupEnv("SOURCE_DATE_EPOCH"); ok { | ||
sde, err := strconv.ParseInt(sourceDateEpoch, 10, 64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rather use a time related function to parse this value?
Or since this is always seconds this is OK to use "simple" string conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to use the string conversion as the goal is to convert from string to number. I don't think the time package provides any function that will take an unix time as a string.
Btw, this parsing code was taken from https://reproducible-builds.org/docs/source-date-epoch/.
@@ -331,6 +380,10 @@ func tarFolderToFile(src, filePath string) error { | |||
return err | |||
} | |||
|
|||
header.ModTime = headerTime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It really depends on what we are walking on.
If we walk only on the files of interest, they should not change between several build and should produce the same OCI images.
But, if we are walking all the files, then adding or removing one can have impact on the order of walking and act as a side effect.
In our specific case, I guess this is OK because the directory is temporary one and is used as OCI store, so it should only contains files of interest.
Thanks for the review and all comments folks, I'll merge it now! |
The build command now supports the SOURCE_DATE_EPOCH env variable[0] to allow reproducible builds. Also, the export command is changed to always sort the index to guarentee the exported tarball is deterministic.
Testing
Before
After
This will be useful for #2853 and for tests artifacts generated in #2818