From 139d8763114e4a177445f9e5aa99d2b532c9663b Mon Sep 17 00:00:00 2001 From: Christian Richter Date: Fri, 7 May 2021 16:24:56 +0200 Subject: [PATCH 1/3] Add hip for download cache Co-authored-by: Matt Farina Signed-off-by: Matt Farina Signed-off-by: Christian Richter --- hips/hip-9999.md | 94 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 94 insertions(+) create mode 100644 hips/hip-9999.md diff --git a/hips/hip-9999.md b/hips/hip-9999.md new file mode 100644 index 00000000..080ad0d8 --- /dev/null +++ b/hips/hip-9999.md @@ -0,0 +1,94 @@ +--- +hip: "9999" +title: "Download-cache for charts & provenance files" +authors: [ "Matt Farina " ] +created: "2021-20-20" +type: "feature" +status: "draft" +--- + +## Abstract + +This document describes the introduction of a download-cache for charts and provenance files to helm. This will speed up repeated operations and enable offline access to charts in repositories that have been previously pulled. + +## Motivation + +There are three primary motivations for leveraging a cache the impact [both the application operator and the application distributor](https://github.com/helm/community/blob/main/user-profiles.md). + +When repeatedly installing, upgrading, or showing information about a chart version within a repository, Helm will download the chart each time. When downloading the chart Helm currently puts the chart into the repositories cache directory (where index files are cached) but never leverages the cache. Helm cannot use the chart version in the cache because the identifier is the name and charts version (e.g., `wordpress-1.2.3.tgz`). A name and version is insufficient as the same chart and version could come from two or more different repositories and have different content. + +The first motivation is to make the cache useful by handling it in a manner where the downloaded chart's identifier can be make unique enough to be useful. + +The second motivation is around use and experience. Downloading the chart version each time an operation is run on a chart in a repository can lead to wasted time. For example, if one run two show commands (i.e., `helm show readme example-repo/foo` and `helm show values example-repo/foo`) the chart will be downloaded twice from the repository. This can be avoided with the use of a cache. + +The third motivation is for Helm repository owners. Cached charts means fewer downloads from repositories. As the Helm maintainers saw when operating their own repository, bandwidth use from a Helm repository can become large. There is a cost to that for the host. Reducing that number can have a real financial impact on the host. + +## Rationale + +The cache proposed here is designed to work with programatic use through the SDK and use existing information about charts for identifiers. + +### Digest Identifier + +The digest of a chart, used in provenance files and repository indexes, provides a unique identifier about a chart. A digest for a chart is a sha512 hash. This hash will be the identifier in the cache. + +Using the digest is both unique and useful for keeping the size of the cache minimal. If the same chart is in two repositories, such as when a chart is coppied from an upstream repository to an organizations internal repository, the hash is the same and the cached contents are the same. This means a cache based on this hash can be content addressable. + +### SDK Usage + +The Helm SDK is used by other applications to interact with instances running in clusters, charts, and repositories. Those applications should not be coupled to the way in which the Helm Client works with its storage (i.e. the file system). Applications using the SDK may want to have the cache reside in memory or in another system (e.g., object storage). The design of this cache system will use an interface and provide an implementation that uses the filesystem for the Helm Client cache. + +## Specification + +The `ChartDownloader` will be extended to providing the caching features. To do this, two new properties will be added to the `ChartDownloader` to accomodate a cache for both the chart archives and the provenance files. These can be stored in two separate locations. + +When the `ChartDownloader` resolves a chart in a repository it will also obtain the digest from the repository index. This piece of information is already available in the index. Using the digest, `ChartDownloader` will check if the chart archive and provenance file are already in the cache. If the chart archive is not in the cache the `ChartDownloader` will retrieve it and place it in the cache prior to returning as before. The same will happen for provenance files. If no provenance file is found on download the cache will be marked in a manner to note that none was available. If the files are in the cache they will be returned rather than downloading the files again. + +The [cache provided for the Helm client](https://github.com/rancher/sandbox/gofilecache) is based on the Go build cache and uses the file system. It uses the first two characters of the hash as the top directory followed by the rest of the hash for the second level directory. This is similar to the way in with Git stores objects on disk. Both the chart and provenance caches will live within the cache directory Helm already uses. + +The Helm Client would have two new environment variables and flags to specify the locations for the provenance files and chart archive cache. These would follow the same format as the current environment variable use for individual caching locations. + +## Backwards compatibility + +Helm charts are untouched in their structure. There is no impact to the charts themselves. + +Index files, where digests are looked up, remain untouched. There is no impact to the index files. + +The existing Helm cache directory is used for the cache storage. This location needs to exist for storing index files locally. There is no impact to the cache location. + +Those who use the `ChartDownloader` from the Helm SDK will have two new properties to define when instantiating it. + +## Security implications + +If a chart archive or a provenance file is in the cache it will be used instead of being downloaded. This means that a malicious user could place a chart archive in the cache in the location where the good one should be. When Helm looks up the chart listed in the repository it would retrieve the the malicious chart from the cache instead. + +For this to work the malicious user would need access to the system Helm is running on with write access to the cache. The current cache directory does not provide write access to those who are not the user or a system admin. That would mean the malicious user would need access as the user or a system admin to exploit this issue. + +To further mitigate the issue, `ChartDownloader` will check that the archive conforms to the digest that is being requested. If it does not the `ChartDownloader` will retrieve it from the source. + +This does mean that if the source chart archive does not have the same digest the one listed in the index it will be retrieved every time and bypass the cache. + +## How to teach this + +The Helm client will have a flag to bypass the cache. This will be in the client documentation alongside the other flags. + +## Reference implementation + +An implementation of the design described in this HIP is under development [here](https://github.com/dragonchaser/helm/tree/add_download_cache). + +## Rejected ideas + +There are two rejected ideas with regard to the cache. + +### Using The Current Repository Cache + +Charts are currently downloaded to the `$HELM_CACHE_HOME/repository` directory in the form `name-1.2.3.tgz`. This directory could be used to store chart archives and provenance files using the digest as the name. This is not ideal as users debugging issues would have many files and would not be able to easily identify the repositories. It would also make clearing the cache of charts and provenance files while preserving the repository cache more difficult. + +A better solution is to use the same caching methods as other disk based object caches. + +### Placing Files In The ~/.cache Directory + +One suggestion was to use the `~/.cache` directory which is common on POSIX systems to hold the cache. This does not follow the current caching directory used by Helm that works across platform. A better solution is to use the existing Helm cache location. + +## Open issues + +There are no open issues From 9dcf0e2ae089f405b4c39be34df875b7857d0aec Mon Sep 17 00:00:00 2001 From: Christian Richter Date: Mon, 7 Jun 2021 14:05:12 +0200 Subject: [PATCH 2/3] Rename file to hip-0013.md Signed-off-by: Christian Richter --- hips/{hip-9999.md => hip-0013.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename hips/{hip-9999.md => hip-0013.md} (99%) diff --git a/hips/hip-9999.md b/hips/hip-0013.md similarity index 99% rename from hips/hip-9999.md rename to hips/hip-0013.md index 080ad0d8..c93cadb9 100644 --- a/hips/hip-9999.md +++ b/hips/hip-0013.md @@ -1,5 +1,5 @@ --- -hip: "9999" +hip: "0013" title: "Download-cache for charts & provenance files" authors: [ "Matt Farina " ] created: "2021-20-20" From 7097cbae05f3daae58649abd509f7ae039e357a0 Mon Sep 17 00:00:00 2001 From: Christian Richter Date: Mon, 7 Jun 2021 17:48:18 +0200 Subject: [PATCH 3/3] Incorporate changes Signed-off-by: Christian Richter --- hips/hip-0013.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/hips/hip-0013.md b/hips/hip-0013.md index c93cadb9..c1ed98cf 100644 --- a/hips/hip-0013.md +++ b/hips/hip-0013.md @@ -25,21 +25,21 @@ The third motivation is for Helm repository owners. Cached charts means fewer do ## Rationale -The cache proposed here is designed to work with programatic use through the SDK and use existing information about charts for identifiers. +The cache proposed here is designed to work with programmatic use through the SDK and use existing information about charts for identifiers. ### Digest Identifier The digest of a chart, used in provenance files and repository indexes, provides a unique identifier about a chart. A digest for a chart is a sha512 hash. This hash will be the identifier in the cache. -Using the digest is both unique and useful for keeping the size of the cache minimal. If the same chart is in two repositories, such as when a chart is coppied from an upstream repository to an organizations internal repository, the hash is the same and the cached contents are the same. This means a cache based on this hash can be content addressable. +Using the digest is both unique and useful for keeping the size of the cache minimal. If the same chart is in two repositories, such as when a chart is copied from an upstream repository to an organizations internal repository, the hash is the same and the cached contents are the same. This means a cache based on this hash can be content addressable. ### SDK Usage -The Helm SDK is used by other applications to interact with instances running in clusters, charts, and repositories. Those applications should not be coupled to the way in which the Helm Client works with its storage (i.e. the file system). Applications using the SDK may want to have the cache reside in memory or in another system (e.g., object storage). The design of this cache system will use an interface and provide an implementation that uses the filesystem for the Helm Client cache. +The Helm SDK is used by other applications to interact with instances running in clusters, charts, and repositories. Those applications should not be coupled to the way in which the Helm Client works with its storage (i.e. the file system). Applications using the SDK may want to have the cache reside in memory or in another system (e.g., object storage). The design of this cache system will use an interface and provide an implementation that uses the file system for the Helm Client cache. ## Specification -The `ChartDownloader` will be extended to providing the caching features. To do this, two new properties will be added to the `ChartDownloader` to accomodate a cache for both the chart archives and the provenance files. These can be stored in two separate locations. +The `ChartDownloader` will be extended to provide the caching features. To do this, two new properties will be added to the `ChartDownloader` to accommodate a cache for both the chart archives and the provenance files. These can be stored in two separate locations. When the `ChartDownloader` resolves a chart in a repository it will also obtain the digest from the repository index. This piece of information is already available in the index. Using the digest, `ChartDownloader` will check if the chart archive and provenance file are already in the cache. If the chart archive is not in the cache the `ChartDownloader` will retrieve it and place it in the cache prior to returning as before. The same will happen for provenance files. If no provenance file is found on download the cache will be marked in a manner to note that none was available. If the files are in the cache they will be returned rather than downloading the files again.