
skipcache argument for Dockerfile ADD command fills up your whole disk space in no time #42864

Open
evandrocoan opened this issue Sep 19, 2021 · 2 comments

evandrocoan commented Sep 19, 2021

Description

I am building a development Docker image to run integration tests with https://github.com/pytest-docker-compose/pytest-docker-compose. As part of this, I always install a big .tar.gz of my application when building the image, before running the tests. The problem is that because the application tar.gz file is big, I easily end up with no free space: each time I build my application and run the tests, I unpack a new tar.gz file, which is cached on disk when I create a new image (to run the tests after my changes).

Running docker build with --no-cache would not be a good solution, because after I add my .tar.gz, I use a RUN --mount cache to install a cached Python wheel (built from Cython/C++) which is linked against the .tar.gz application I just added. I also tried docker run with --no-cache, but even that command saves the cache.

ADD ./package/$TARGZNAME /
ADD ./MyPythonApp /MyPythonApp
WORKDIR /MyPythonApp
# Rebuild the wheel only when the Python sources changed; otherwise install
# the previously built wheel straight from the BuildKit cache mount.
RUN --mount=type=cache,target=/cache/MyPythonApp /bin/bash -c "set -ex; \
    function clear() { \
        rm -rf ./build; \
        rm -rf ./MyPythonApp.egg-info; \
    }; \
    function install() { \
        python3 -m pip install /cache/MyPythonApp/dist/*.whl; \
    }; \
    function build() { \
        rm -rf ./dist; \
        python3 setup.py build_ext -j $(nproc) bdist_wheel; \
        clear; \
        rsync --recursive --verbose --delete --size-only --ignore-times /MyPythonApp /cache/; \
    }; \
    clear; \
    diff --recursive --no-dereference --exclude='*.cpp' --exclude='dist' . /cache/MyPythonApp || build; \
    install || { build && install; } \
    "

I cache the Python wheel because it takes time to build, and I only need to rebuild it when the Python source code changes (and not when my .tar.gz application source code changes; however, the .tar.gz code/DLLs are required to build and install the Python wheel, which is why I put it after the ADD ... skipcache command).

This issue always happens when I use the skipcache parameter with the ADD command, which was recommended on https://stackoverflow.com/a/58801213/ (Disable cache for specific RUN commands):

Use

ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache

before the RUN line you want to always run. This works because ADD will always fetch the file/URL, and the above URL generates random data on each request; Docker then compares the result to see if it can use the cache.

I have also tested this and it works nicely, since it does not require any additional Docker command-line arguments and also works from a docker-compose.yaml file :)

Someone also commented on the answer:

not sure why no one has highlighted this yet, but this does add a layer with a useless skipcache file.
Using ARG CACHEBUST + RUN echo $CACHEBUST does not.

It seems, then, that another problem is the extra layer containing this skipcache file.
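
For reference, a minimal sketch of that ARG CACHEBUST pattern (the argument name, the run-tests.sh step, and the date-based value are only examples, not anything from this issue):

FROM debian:11
# Any new value of CACHEBUST invalidates the cache from this point on,
# without writing an extra file into the image.
ARG CACHEBUST=1
RUN echo "$CACHEBUST"
RUN ./run-tests.sh   # hypothetical step that should always re-run

DOCKER_BUILDKIT=1 docker build --build-arg CACHEBUST="$(date +%s)" .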

Steps to reproduce the issue:

  1. Check your free space (df -h)
  2. Create a Dockerfile which untars a 1 GB file:
    FROM debian:11
    ARG TARGZNAME
    ADD ./package/$TARGZNAME / skipcache
    WORKDIR /
    CMD /bin/bash
  3. Run it with:
    DOCKER_BUILDKIT=1 docker build \
            --progress=plain \
            --tag "imagebug" \
            --file "bug.Dockerfile" \
            --build-arg "TARGZNAME=$TARGZNAME" \
            .
  4. Check your free space (df -h)

Describe the results you received:
The free space was reduced by much more than 1 GB at step 4.
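
For reference, the growth shows up as Docker build cache rather than as images; it can be inspected with standard CLI commands (exact output format varies by version, and the second command assumes the buildx plugin is installed):

docker system df    # shows a "Build Cache" total alongside images and containers
docker buildx du    # itemizes the individual BuildKit cache records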

Describe the results you expected:
It is expected that the free space is the same at steps 1 and 4.

The rationale behind this is that if I am always skipping the cache, the cache should not be saved in the first place.

If this skipcache argument to ADD does not allow this, how can I ensure the cache for this step is not saved when the build completes? Otherwise it quickly fills my whole disk after several test cycles.

I like the caching in general; I just do not need the dangling old caches from my previous .tar.gz files, created by the ADD ... skipcache command, because the steps I marked with ADD ... skipcache are never going to reuse them.

Output of docker version:

$ docker version
Client: Docker Engine - Community
 Version:           19.03.13
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        4484c46d9d
 Built:             Wed Sep 16 17:02:52 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          20.10.2
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8891c58
  Built:            Mon Dec 28 16:15:19 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 6
 Server Version: 20.10.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-47-generic
 Operating System: Ubuntu 20.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.52GiB
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support
WARNING: No blkio weight support
WARNING: No blkio weight_device support

Related: docker/buildx#458, moby/buildkit#1359

thaJeztah (Member) commented:

I cache the Python wheel because it takes time to build, and I only need to rebuild it when the Python source code changes (and not when my .tar.gz application source code changes)

Would it be an option to mount the (content of) tar.gz in the stage that uses it? Are the files that are extracted from the .tar.gz located in a single directory, or are they spread over the filesystem?
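
For illustration, a rough sketch of one way that could look: bind-mount the archive from the build context during the RUN step, so the .tar.gz itself never becomes an image layer (the paths and names here are hypothetical, not taken from this issue):

# syntax=docker/dockerfile:1
FROM debian:11
ARG TARGZNAME
# Bind-mount the build context's package directory instead of ADDing it; only
# the extracted files end up in the layer, not the archive itself.
RUN --mount=type=bind,source=package,target=/mnt/package \
    tar -xzf "/mnt/package/$TARGZNAME" -C /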

evandrocoan (Author) commented:

It would not be possible to mount them because they are spread across the filesystem, i.e., /etc/, /opt/, and /usr/.

I do not know why the cache of a stage should be saved when the stage is always skipped. Saving the cache in this case does not make much sense: if the step is always invalidated, its cache will never be reused and will just pile up as garbage.

For now, I added this to my pipeline:

# Remove the previous image; the "|| cd ." no-op just ignores the error when it does not exist.
docker rmi deploy_image:latest || cd .;

DOCKER_BUILDKIT=1 docker build \
    --progress=plain \
    --tag deploy_image \
    --file deploy.Dockerfile \
    .

# Prune only the regular layer-cache records; the RUN --mount=type=cache data is kept.
docker builder prune --force --filter type=regular;

With this, the garbage cache is always cleaned up, although I am no longer using the skipcache trick, as it does nothing useful for me: the cache is always invalidated anyway, because my .tar.gz changes on every build. The one useful thing skipcache could do for my workflow would be to not save the cache when skipping it.
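
As a softer alternative (assuming a Docker CLI where docker builder prune supports the flag), the build cache can also be capped by total size instead of filtered by record type:

# Prune the build cache down to roughly 10GB instead of removing it all.
docker builder prune --force --keep-storage 10GB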
