helm package should produce bitwise-deterministic tar files (#3612)
Comments
I think this is actually due to gzip and is intentional by design. Part of the gzip header has a mod time for whatever is compressed in the file, so by calling If you |
|
Following up on my previous comment, you can reproduce this yourself at home:
So in this example, I've shown that I've got stable/traefik, both in compressed and decompressed versions from the official stable/traefik chart. If I gzip this up, we should be able to assert that we come up with the same shasum because the contents are the same, right? Wrongo.
But the shasums of both the recently-gzipped tarball and the old shasum are still identical.
So in other words, if you want a deterministic shasum of the contents of the chart, I'd suggest decompressing the packages first before checking the shasum. Would that work for you? |
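The gzip header behaviour described above can be checked in isolation. The following is a minimal Python sketch, not Helm's own code: the payload and the two timestamps are made up for illustration. It relies on `gzip.compress` accepting an `mtime` argument (Python 3.8 and later).

```python
import gzip

# Hypothetical stand-in for the bytes of an uncompressed chart tarball.
payload = b"apiVersion: v1\nname: traefik\n" * 10

# Same payload, compressed with two different header timestamps.
first = gzip.compress(payload, mtime=1_500_000_000)
second = gzip.compress(payload, mtime=1_500_000_001)

# Bytes 4..7 of a gzip stream hold the MTIME field, so the archives differ.
archives_differ = first != second
mtime_fields_differ = first[4:8] != second[4:8]

# The decompressed contents are still byte-for-byte identical.
contents_match = gzip.decompress(first) == gzip.decompress(second)

# Pinning the timestamp makes the compressed output repeatable.
pinned_match = gzip.compress(payload, mtime=0) == gzip.compress(payload, mtime=0)

print(archives_differ, mtime_fields_differ, contents_match, pinned_match)
```

This only covers the timestamp in the gzip header; it says nothing about the order of entries inside the tar stream.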
The timestamp issue in the default gzip settings appears to be irrelevant since there are only a few unique shas that appear in thousands of tests on Perhaps sorting the dependencies in a deterministic manner before serialization is all that is needed, but any help on where to put the sort would be appreciated. For now a workaround is to do a file diff on the decompressed packages after removing the dependency tars. This won't catch differences in dependencies except version differences (which can happen for local dependencies, e.g.), but will catch most differences until this can be patched. |
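As a variation on the workaround above, one could compare packages by a digest of their contents rather than of the raw .tgz bytes. The sketch below is an assumption-laden illustration in Python, not something Helm provides: `content_digest` and `pack` are hypothetical helpers, and the digest deliberately ignores entry order, timestamps and ownership. Unlike the workaround described above, it does not strip dependency tarballs first.

```python
import hashlib
import io
import tarfile


def content_digest(tgz_bytes: bytes) -> str:
    """Hash member names and file contents in sorted order, ignoring metadata."""
    digest = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:*") as archive:
        for member in sorted(archive.getmembers(), key=lambda m: m.name):
            digest.update(member.name.encode("utf-8") + b"\0")
            if member.isfile():
                data = archive.extractfile(member).read()
                # Length prefix keeps adjacent entries from running together.
                digest.update(len(data).to_bytes(8, "big"))
                digest.update(data)
    return digest.hexdigest()


def pack(entries):
    """Demo scaffolding: write (name, data) pairs to a .tgz in the given order."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in entries:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


a = pack([("c/Chart.yaml", b"name: c\n"), ("c/values.yaml", b"x: 1\n")])
b = pack([("c/values.yaml", b"x: 1\n"), ("c/Chart.yaml", b"name: c\n")])
changed = pack([("c/Chart.yaml", b"name: c\n"), ("c/values.yaml", b"x: 2\n")])

print(a == b)                                         # raw bytes differ
print(content_digest(a) == content_digest(b))         # contents match
print(content_digest(a) == content_digest(changed))   # a real change is detected
```

The trade-off is that such a digest is no longer the checksum of the published file, so it is only useful for deciding whether a chart needs to be republished.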
I was unable to reproduce this behaviour using a stable chart. Can you provide a sample chart that produces this behaviour? My steps to test this:
Every invocation of /shrug |
https://github.com/cmaher/nondeterministic-charts demonstrates this using I don't think stable/traefik is going to be affected by this, because it doesn't have any dependencies. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
I found a way to reproduce this. One thing that seems to help is having more than one dependency in the requirements file. Also, the larger the values.yaml file, the easier it is to reproduce.
|
/remove-lifecycle rotten |
We have been able to reproduce this issue with some of our local charts as well. |
The problem definitely manifests itself on macOS. None of the suggestions mentioned in this thread solve the issue. I've created a repo to demonstrate the problem and a workaround: https://github.com/adrian-gierakowski/helm-package-sha-test. |
The problem is due to the order of entries in the tar file. It is not a gzip issue: the two tar files are the same size but contain their files in a different order.
The issue is that the POSIX directory enumeration API does not define a specific order in which file names are returned. There may be a "common" order, but since sorting/ordering is not something that can be done generically (it is locale/user specific), the core filesystems are not forced to provide a consistent order. Result order may depend on available cache entries, on the order of data returned by elevator algorithms that read from the disk, or on a number of other out-of-order execution behaviors.
We noticed this specifically when multiple sub-charts were involved. The order of the sub-charts in the tar file was sometimes different. One form was more common (70% or so). Note that we had not noticed this problem until we had a chart with sub-charts like this one. That seems to be what triggered the whole problem of ordering within the tar file.
Also note that the two tar files, when extracted, produce exactly the same files and directories. They are just stored in the tar file in a different order, which causes different hashes. You can see this with tar -t on each of the helm tgz (or extracted tar) files.
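The effect of entry order on the hash can be shown without Helm at all. This is a minimal Python sketch under stated assumptions: the two sub-chart paths are invented, `make_tar` and `listing` are hypothetical helpers, and all metadata is fixed so that order is the only thing that varies.

```python
import hashlib
import io
import tarfile


def make_tar(files: dict, names: list) -> bytes:
    """Write the given members into an uncompressed tar in exactly this order."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as archive:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = 0  # fixed metadata, so only the order varies
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()


def listing(tar_bytes: bytes) -> list:
    """Return member names in archive order, similar to `tar -t`."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as archive:
        return archive.getnames()


sub_a = "parent/charts/sub-a/Chart.yaml"
sub_b = "parent/charts/sub-b/Chart.yaml"
files = {sub_a: b"name: sub-a\n", sub_b: b"name: sub-b\n"}

one = make_tar(files, [sub_a, sub_b])
two = make_tar(files, [sub_b, sub_a])

print(len(one) == len(two))                                   # same size
print(hashlib.sha256(one).digest() == hashlib.sha256(two).digest())  # hashes differ
print(listing(one))
print(listing(two))
```

Both archives extract to the same two files; only the listings, and therefore the hashes, differ.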
|
Is this related to dependencies, or does the issue just become more likely to occur the more files a chart contains?
Is there really not a way to enforce this order in Go? @Michael-Sinz |
You can write your own code to take the enumeration and sort it. Pick your sort to be consistent across all locales.
Unix/Posix does not push costly operations onto all use cases when most cases do not need that - it assumes that you would compose one operation with another to get the desired result.
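A sketch of "take the enumeration and sort it" follows. It is written in Python for brevity, although the question was about Go, and it is not Helm's implementation; `sorted_walk` and the file names are hypothetical. The sort compares strings by code point, which does not depend on the user's locale.

```python
import os
import tempfile


def sorted_walk(root: str) -> list:
    """List regular files under root as '/'-separated relative paths, in a fixed order."""
    paths = []
    for current, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            relative = os.path.relpath(os.path.join(current, filename), root)
            paths.append(relative.replace(os.sep, "/"))
    # Plain str comparison is by code point, so the result ignores locale settings
    # and whatever order the filesystem happened to return.
    return sorted(paths)


with tempfile.TemporaryDirectory() as root:
    names = ("values.yaml", "charts/sub-b/Chart.yaml", "Chart.yaml", "charts/sub-a/Chart.yaml")
    for name in names:
        path = os.path.join(root, *name.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write("placeholder\n")
    ordered = sorted_walk(root)

print(ordered)
```

Feeding entries to the archive writer in this order, instead of in enumeration order, is the composition of operations described above.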
|
This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs. |
Is there no progress on this issue? The lack of deterministic output for constant input is a big problem. It means that helm package is not deterministic and we cannot depend on it producing the same results. Imagine if a compiler could produce different code for the same source code from run to run: how would you debug such a program? |
Agreed. We've had to make clumsy workarounds to only package the chart if one of the source files changes. One issue is the embedded time stamps. |
@Michael-Sinz I have not seen any activity from the community on this bug. Please feel free to work on a fix! |
We still would like to have deterministic helm chart building. The fact that the same helm chart from the same source tree from the same git hash with the same helm version does not produce deterministic hashes is concerning. |
Deterministic output is the goal of the Reproducible Builds project. It has a recommended tar example:
# requires GNU Tar 1.28+
$ tar --sort=name \
--mtime="@${SOURCE_DATE_EPOCH}" \
--owner=0 --group=0 --numeric-owner \
--pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
-cf product.tar build
But as Lines 205 to 228 in f41f46c
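For comparison, the same normalisation can be sketched with a library instead of the tar CLI. The Python below is an illustration only and not Helm's save code: `build_tgz` is a hypothetical helper that sorts names, pins timestamps and ownership, and fixes the gzip header, roughly mirroring the flags in the tar command above.

```python
import gzip
import hashlib
import io
import tarfile


def build_tgz(files: dict, epoch: int = 0) -> bytes:
    """Pack files into a .tgz whose bytes depend only on names, contents and epoch."""
    tar_buffer = io.BytesIO()
    with tarfile.open(fileobj=tar_buffer, mode="w", format=tarfile.GNU_FORMAT) as archive:
        for name in sorted(files):          # roughly --sort=name
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = epoch              # roughly --mtime="@${SOURCE_DATE_EPOCH}"
            info.uid = info.gid = 0         # roughly --owner=0 --group=0
            info.uname = info.gname = ""    # no user or group names recorded
            info.mode = 0o644
            archive.addfile(info, io.BytesIO(files[name]))
    out = io.BytesIO()
    # An empty filename and a fixed mtime keep the gzip header constant as well.
    with gzip.GzipFile(filename="", fileobj=out, mode="wb", mtime=epoch) as gz:
        gz.write(tar_buffer.getvalue())
    return out.getvalue()


chart = {"foo/Chart.yaml": b"name: foo\n", "foo/values.yaml": b"replicaCount: 1\n"}
shuffled = dict(reversed(list(chart.items())))

same = hashlib.sha256(build_tgz(chart)).hexdigest() == hashlib.sha256(build_tgz(shuffled)).hexdigest()
print(same)
```

The output is repeatable for a given compression library; different zlib versions or compression levels may still produce different bytes for the same input.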
|
Hi, we are really waiting for this issue to be fixed. The real advantage of using OCI is that only the delta changes in a chart are committed to the repository, which saves a great deal of space there. We find this feature very useful with OCI container images. Without it, Helm's OCI support will not be complete. |
Any updates? |
I can reproduce it like this:
$ helm version
version.BuildInfo{Version:"v3.10.0", GitCommit:"ce66412a723e4d89555dc67217607c6579ffcb21", GitTreeState:"clean", GoVersion:"go1.19.1"}
$ helm create foo
Creating foo
$ for i in {1..10}; do helm package foo >/dev/null; md5sum foo-0.1.0.tgz; sleep 0.5; done
39e2b6bff940ed840873926f271d72f8 foo-0.1.0.tgz
f34d771d8039d77c9b76340c7940ab55 foo-0.1.0.tgz
f34d771d8039d77c9b76340c7940ab55 foo-0.1.0.tgz
afe15ee8c34f069e967e88e70c7bbd29 foo-0.1.0.tgz
afe15ee8c34f069e967e88e70c7bbd29 foo-0.1.0.tgz
7c1b20acd7a9a73944af13fec2303cf0 foo-0.1.0.tgz
7c1b20acd7a9a73944af13fec2303cf0 foo-0.1.0.tgz
16cef47516265bed7f8dbd9ef145042c foo-0.1.0.tgz
16cef47516265bed7f8dbd9ef145042c foo-0.1.0.tgz
1532cb3b9a5d828e19d74b034caebace foo-0.1.0.tgz
I tried setting the |
FWIW the example from reproducible builds worked well for me, but that obviously requires custom scripting, which is not ideal. |
Fix for the above has been pending review/merge. Not sure what the blocker is TBH: |
We have a CI process for publishing charts where we try to determine whether a pushed repository needs to have its charts built and published. We would like to do this with a bitwise comparison of the generated tgz files. This works for charts without dependencies, but it is inconsistent for charts with dependencies.
For the setup (Using v2.8.1):
With charts:
Running:
produces several (3, after a significant number of iterations) different shas. The shas also appear to be non-uniformly distributed (e.g. 1 sha appears 70% of the time).
My guess is that https://github.com/kubernetes/helm/blob/master/pkg/chartutil/save.go#L160 is iterating over dependencies in a non-deterministic order, thus producing different tar files.