Hi
This follows a short discussion on IRC with @duglin
Goal: Minimize Docker image size
Method: Not copying files that already exist in the base image (or top layer), unless they are different
Use case:
We have a base image, 'multicloud/common'. It includes a whole bunch of stuff, including a set of files under /workdir/lib (a bunch of .jar files, to be specific).
We use maven to run our builds, and maven copies dependencies to ./lib, for both the 'common' project, and the projects that depend on it. One of these projects, for example, is called 'agent', and is packaged in a docker image called 'multicloud/agent', which is FROM multicloud/common.
As mentioned before, when building 'agent', maven copies all of its dependencies (files) to ./lib, including all of those that have already been packaged under multicloud/common's /workdir/lib.
Of course, in the Dockerfile for 'agent', I do a COPY (or ADD) ./lib /workdir/lib.
99% of the files copied are exactly the same (CRC-wise, and probably timestamp-wise) as those that are already on the top layer of the docker image file system. However, copy-on-write still adds them to a new layer, effectively increasing the docker image size (dramatically, in some cases).
It would be great if docker build's ADD or COPY command had something similar to 'cp -u' - or even better - something that would calculate the CRC32 of each file and copy it only if it has changed.
IMO this could dramatically decrease image sizes in many other use cases as well.
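To make the requested behaviour concrete, here is a rough sketch in Python of "copy only if the CRC32 differs". It is only an illustration of the semantics, not how docker build works today and not an existing ADD/COPY option. The names `crc32_of` and `copy_if_changed` are made up for this example, and the file-size check before the CRC is my own shortcut.

```python
import os
import shutil
import zlib


def crc32_of(path, chunk_size=1 << 20):
    """Return the CRC32 of a file, reading it in chunks to keep memory use flat."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def copy_if_changed(src_dir, dst_dir):
    """Copy files from src_dir into dst_dir, skipping files whose content already matches.

    Hypothetical helper: returns the relative paths that were actually copied.
    """
    copied = []
    for root, _dirs, files in os.walk(src_dir):
        for name in sorted(files):
            src = os.path.join(root, name)
            rel = os.path.relpath(src, src_dir)
            dst = os.path.join(dst_dir, rel)
            # Cheap size check first; only compute CRC32 when the sizes agree.
            if (
                os.path.isfile(dst)
                and os.path.getsize(src) == os.path.getsize(dst)
                and crc32_of(src) == crc32_of(dst)
            ):
                continue  # identical file already present, nothing to write
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves the timestamp
            copied.append(rel)
    return copied
```

In the 'agent' case, something like `copy_if_changed("./lib", "/workdir/lib")` would write only the handful of jars that are new or different from the ones in multicloud/common, so only those would end up in the new layer.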