-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache chart layers without timestamp #8216
Conversation
d94a092
to
191fb64
Compare
Currently, whenever the chart is printed, the digest of the .tar.gz content layer is printed as the digest. The manifest digest is important for OCI purposes, particularly in pushing to a registry. These changes build on top of PR helm#8216, which is more critical and should be merged first. Resolves helm#8248. Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
This commit allows for the creation of a tar.gz archive of a chart without a timestamp. Doing so ensures that charts with identical content will have identical digests, which is important for storage in a registry. Effort was made to ensure the functionality of `helm package` was not affected. Additionally, a unit test for the registry cache's StoreReference function, to help track this bug. Resolves helm#8212 Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
191fb64
to
521c763
Compare
@bacongobbler This PR has been updated to reflect the changes you requested in #8249 . Please let me know if anything further is needed here. |
Currently, whenever the chart is printed, the digest of the .tar.gz content layer is printed as the digest. The manifest digest is important for OCI purposes, particularly in pushing to a registry. These changes build on top of PR helm#8216, which is more critical and should be merged first. Resolves helm#8248. Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
Currently, whenever the chart is printed, the digest of the .tar.gz content layer is printed as the digest. The manifest digest is important for OCI purposes, particularly in pushing to a registry. These changes build on top of PR helm#8216, which is more critical and should be merged first. Resolves helm#8248. Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
@bacongobbler again updated based on comments made on PR #8249 |
@@ -288,7 +288,7 @@ func (cache *Cache) saveChartConfig(ch *chart.Chart) (*ocispec.Descriptor, bool, | |||
func (cache *Cache) saveChartContentLayer(ch *chart.Chart) (*ocispec.Descriptor, bool, error) { | |||
destDir := filepath.Join(cache.rootDir, ".build") | |||
os.MkdirAll(destDir, 0755) | |||
tmpFile, err := chartutil.Save(ch, destDir) | |||
tmpFile, err := chartutil.SaveWithOpts(ch, destDir, chartutil.WithoutTimestamp()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be changed for both helm package
and helm dep up
as well? Why should this be a special case just for helm chart save
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 92 in 65ee1e0
name, err := chartutil.Save(ch, dest) |
helm/pkg/downloader/manager.go
Line 711 in 65ee1e0
_, err = chartutil.Save(ch, destPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bacongobbler I can't speak for helm dep up
, but as for helm package
, the timestamp was deliberately introduced in Helm v3.0.3 (specifically PR #4161) to resolve issue #4158.
Let me know your thoughts on this -- I did not want to break someone else's fix, but am happy to do as you ask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, so coming back to the original question... Why should we make a special case for helm chart save
? If users expect that their files are timestamped correctly within the package, why should helm chart save
be any different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I ask this is because we have been talking about using OCI as the standard protocol for distributing charts. The changes we make here are likely going to be used by community members in future Helm releases.
If #4158 was opened for helm package
and helm dep up
, why wouldn't someone from the community open a similar issue against helm chart save
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I ask this is because we have been talking about using OCI as the standard protocol for distributing charts. The changes we make here are likely going to be used by community members in future Helm releases.
If #4158 was opened for
helm package
andhelm dep up
, why wouldn't someone from the community open a similar issue againsthelm chart save
?
@bacongobbler The OCI specification is such that the digest of each layer (or blob) is determined by its content. The digests of the individual layers "bubble up" to affect the digest of the manifest, as well. This is used for registry-side deduplication of both blobs and manifests, as well as identification. If two charts contain the exact same content but a different digest (due to the timestamp), it causes OCI deduplication and identification efforts to fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that only an issue if you're rebuilding the chart from source via helm chart save
? If you're fetching and pushing the same content across multiple registries via helm chart pull
and helm chart push
, the digest remains the same.
The same argument could be made for helm package
and chart provenance as well. If you rebuild the package from source, the chart's signature will fail verification because the digest changed. I would argue that's intentional behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The UX of helm chart save
is designed to mimic that of docker build
.
Here's an example Dockerfile
and index.html
:
# Dockerfile
FROM scratch
ADD index.html /index.html
<!-- index.html -->
<html><h1>Hello World</h1></html>
Now we run docker build
:
$ docker build .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM scratch
--->
Step 2/2 : ADD index.html /index.html
---> 49c8907c89fa
Successfully built 49c8907c89fa
Now wait 1 minute, and run the command again without modifying either file:
$ docker build .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM scratch
--->
Step 2/2 : ADD index.html /index.html
---> Using cache
---> 49c8907c89fa
Successfully built 49c8907c89fa
Notice that the resulting image built has the same digest (49c8907c89fa
).
Now in the context of Helm - why, if we have not made any modification to any files within a chart directory, should new digests be generated dependent on the time which they are saved?
If you're fetching and pushing the same content across multiple registries via helm chart pull and helm chart push, the digest remains the same
This is correct, however it fails if I decide to republish/mirror these charts into my own private registry. There needs to be some indication that 2 charts across separate registries are indeed identical, and the digest is what we are using to determine that.
There may be a way around this if we decide to decouple helm chart save
from the tar-ing process (i.e. first run helm package mychart/
then helm chart save mychart-1.0.0.tgz
, using the raw .tgz for the layer), but I do not think this is necessarily a user-friendly approach.
I dont know the correct answer, but I do see this PR as fixing a registry-related regression introduced in 3.0.3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why, if we have not made any modification to any files within a chart directory, should new digests be generated dependent on the time which they are saved?
I'm not arguing whether the digests are generated based on the time that helm package
is called. I am arguing whether the digests are generated based on the time the files have been modified. There's a slight nuance in behaviour here.
#4158 was opened because the user expected the timestamps of their files to be preserved during a helm package
similar to tar -c
.
In regards to docker build
's behaviour... That's another matter entirely. They cache the underlying build context. If the files did not change in your local directory since the last build, docker build
re-uses the same build context.
Taking your same example Dockerfile here, but instead let's add the --no-cache
flag to demonstrate:
><> cat Dockerfile
# Dockerfile
FROM scratch
ADD index.html /index.html
><> cat index.html
<!-- index.html -->
<html><h1>Hello World</h1></html>
><> docker build .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM scratch
--->
Step 2/2 : ADD index.html /index.html
---> Using cache
---> 1eb4e4f55090
Successfully built 1eb4e4f55090
><> docker build --no-cache .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM scratch
--->
Step 2/2 : ADD index.html /index.html
---> 6b92fac2d0a8
Successfully built 6b92fac2d0a8
><>
Notice that the resulting digest differed from the first call to docker build
. This is because docker build
checks the mtime of the build context to compute the build context, and Docker re-computed the package digest based on that timestamp.
In other words, if you invalidate the cache (the build context which docker build
runs from) and call docker build
using the exact same files, it will result in a new digest. Because the mtime changed.
If docker didn't care about the mtime of the build context, the resulting digest would've been the same for me as it would be for you.
That is why I'm against this change - every other packaging solution handles this in the same manner. tar
does this. docker build
does this. helm package
does this. mtime is preserved from the files into the package. Why can't helm chart save
do the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to have helm package
produce bitwise-deterministic packages, then I'd suggest attempting to fix #3612. Stripping timedata from the underlying files isn't what's causing this issue.
As reported in that issue's first comment, helm package
will produce the same SHA 70% of the time. The other 30% are due to files being in another order within the tar package.
This was also clearly pointed out earlier in #3612 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tested this with the test case provided in #3612: https://github.com/cmaher/nondeterministic-charts
$ export HELM_EXPERIMENTAL_OCI=1
$ ./helm version
version.BuildInfo{Version:"v3.2+unreleased", GitCommit:"d862cd9843663fc9cd621fcbb2ad8cdef90e2b1e", GitTreeState:"clean", GoVersion:"go1.14.4"}
$ cd parent
$ helm dep up
$ for i in {1..1000}; do ../helm chart save . helm-test-8216:$i; done
Here's the result.
$ helm chart list | awk '{ print $4 }' | sort | uniq
28be032
71c690a
95b90db
DIGEST
Please have a look at the assertions being made in #3612. Stripping the timestamp alone will not produce bitwise-deterministic package checksums.
This PR does not solve the problem this PR is attempting to fix. Please try out the suggestions made in #3612 to see if it will result in bitwise-deterministic packages. Based on some of the comments, I believe that fixing it in this way will get you the results you're trying to achieve.
removing from the milestone. |
closing as we're planning to remove the OCI cache, working around the need for this as users will be pushing packages built with |
What this PR does / why we need it:
This commit allows for the creation of a tar.gz archive of a chart
without a timestamp. Doing so ensures that charts with identical content
will have identical digests, which is important for storage in a
registry. Effort was made to ensure the functionality of
helm package
was not affected.
Additionally, a unit test for the registry cache's StoreReference
function, to help track this bug.
closes #8212
Special notes for your reviewer:
If applicable: