Replies: 1 comment
-
This still fails with the updated fork - could you create a PR with your fix there?
-
This action splits the extracted output of `docker save` into a separate cache for each layer.tar file, plus one "root cache" containing all the metadata. Currently the layer.tar cache keys contain the layer's folder name from the extracted output. From looking into #75 it seems that there are often multiple layer folders with the same layer.tar file but different metadata. The action uploads a cache for every folder, even when it shares the same layer.tar as another folder.
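For illustration, here's a minimal sketch (in TypeScript) of the extracted layout and the current per-folder keying; the directory names and the `layer-<folder>` key format are hypothetical, not the action's actual scheme:

```typescript
// Typical layout of extracted `docker save` output (names are illustrative):
//
// extracted/
// ├── manifest.json
// ├── <image-config>.json
// ├── 0b5e.../           <- one folder per layer
// │   ├── VERSION
// │   ├── json           <- per-folder metadata (differs between folders)
// │   └── layer.tar      <- may be byte-identical to another folder's layer.tar
// └── 7fe1.../
//     ├── VERSION
//     ├── json
//     └── layer.tar
import * as fs from "fs";
import * as path from "path";

function layerFolders(extractedRoot: string): string[] {
  // Every top-level directory containing a layer.tar is a layer folder.
  return fs
    .readdirSync(extractedRoot, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => entry.name)
    .filter((name) => fs.existsSync(path.join(extractedRoot, name, "layer.tar")));
}

// Keying by folder name uploads one cache per folder, even when two
// folders hold identical layer.tar files.
for (const folder of layerFolders("extracted")) {
  console.log(`cache key (current scheme): layer-${folder}`);
}
```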
On Linux this doesn't waste too much space, since the extracted output uses symlinks to avoid duplication. Unfortunately those symlinks cause issue #75: a layer's content might be uploaded as a symlink in one run and then can't be uploaded in full later. On Windows the extracted output contains a full copy of every layer.tar, and each copy is cached in full.
These issues could be avoided by using the SHA256 digest of each layer.tar in its cache key. Doing this would cache one copy of each unique layer.tar under a canonical ID.
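As a minimal sketch (assuming a hypothetical `layer-` key prefix), computing that digest is just a streaming SHA256 over each layer.tar:

```typescript
import * as crypto from "crypto";
import * as fs from "fs";

// Stream the file through SHA256 rather than reading it into memory,
// since layer.tar files can be large.
function sha256OfFile(file: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash("sha256");
    fs.createReadStream(file)
      .on("data", (chunk) => hash.update(chunk))
      .on("end", () => resolve(hash.digest("hex")))
      .on("error", reject);
  });
}

// Folders holding byte-identical layer.tar files produce the same key,
// so only one copy is ever uploaded.
async function cacheKeyFor(layerTar: string): Promise<string> {
  return `layer-${await sha256OfFile(layerTar)}`;
}
```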
Fortunately the extracted output already contains the SHA256 digests of the layer.tar files. In `manifest.json` there's a `Config` entry for every image, and that entry points to another JSON file listing the digests of each layer.tar in the image (the "diff_ids"). These files can be used to map out which layer folders share the same layer.tar. The action could store one copy of each layer.tar keyed by its digest, then delete the remaining duplicates from the root cache. After loading the root cache, the layer map could be used to restore layer.tar files to all the folders they were originally found in.

I'm testing a WIP version of this in my fork and am happy to submit a pull request if desired.
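For reference, the digest mapping could look roughly like this. It's a sketch assuming the standard `docker save` layout, where `manifest.json` is an array of entries whose `Config` file contains `rootfs.diff_ids` in the same order as the entry's `Layers` list; the function names are illustrative, not my fork's actual code:

```typescript
import * as fs from "fs";
import * as path from "path";

interface ManifestEntry {
  Config: string;   // e.g. "<image-config>.json"
  Layers: string[]; // e.g. ["<folder>/layer.tar", ...], in diff_ids order
}

// Map each layer digest to every folder whose layer.tar has that digest.
function digestToFolders(extractedRoot: string): Map<string, string[]> {
  const manifest: ManifestEntry[] = JSON.parse(
    fs.readFileSync(path.join(extractedRoot, "manifest.json"), "utf8")
  );
  const map = new Map<string, string[]>();
  for (const image of manifest) {
    const config = JSON.parse(
      fs.readFileSync(path.join(extractedRoot, image.Config), "utf8")
    );
    const diffIds: string[] = config.rootfs.diff_ids; // "sha256:<hex>" strings
    image.Layers.forEach((layerTar, i) => {
      const folder = path.dirname(layerTar);
      const folders = map.get(diffIds[i]) ?? [];
      if (!folders.includes(folder)) folders.push(folder);
      map.set(diffIds[i], folders);
    });
  }
  return map;
}
```

With this map the save step can keep one layer.tar per digest and delete the duplicates, and the restore step can copy each cached layer.tar back into every folder recorded for its digest.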