Could the default mount cache id include target architecture? #2598

Open
couling opened this issue Feb 4, 2022 · 4 comments

Comments

@couling

couling commented Feb 4, 2022

Problem

There's a "gotcha" when working with cache mounts and building for multiple architectures. The stated use case for these is:

This mount type allows the build container to cache directories for compilers and package managers.

However, both compilers and package managers are often architecture dependent. The default id for the cache is just the value of target, so the cache is, by default, shared between architectures. This can be damaging.

At best, the cache gets flushed and is useless every build with a different architecture.

At worst, the code using the cache can't detect the incorrect architecture and gets confused by the content.

At worst worst, the code using the cache can't detect the incorrect architecture and builds a corrupt image as a result.

It's fine to expect programs using the cache to detect the cache is stale. But it's extremely uncommon for such programs to detect the wrong architecture's cache has been swapped in.
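To make the default concrete: the Dockerfile reference states that a cache mount's id defaults to the value of target. A minimal sketch of two mounts that therefore resolve to the same cache, whatever --platform is passed (the image and packages are placeholders for illustration only):

```dockerfile
FROM alpine:latest
# No id given, so the cache id defaults to the target path.
RUN --mount=type=cache,target=/var/cache/apk \
    apk add python3
# Spelled out explicitly: the same cache as above, for every architecture.
RUN --mount=type=cache,id=/var/cache/apk,target=/var/cache/apk \
    apk add py3-pip
```

Nothing in either id refers to the platform, which is why builds for different architectures end up reading and writing the same directory.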

Example Error

If I have a Dockerfile:

FROM alpine:latest AS base
RUN --mount=type=cache,sharing=locked,target=/var/cache/apk \
    apk add python3 py3-pip py3-wheel

And then I build twice (with qemu installed):

docker build -t my_image:latest_arm64 --platform linux/arm64 .
docker build -t my_image:latest_x86_64 --platform linux/x86_64 .

I'll end up with errors caused by the cache:

WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.15/main: UNTRUSTED signature
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.15/community: UNTRUSTED signature
ERROR: unable to select packages:
py3-pip (no such package):
   required by: world[py3-pip]
py3-wheel (no such package):
   required by: world[py3-wheel]
python3 (no such package):
   required by: world[python3]
ERROR: executor failed running [/bin/sh -c apk add python3 py3-pip py3-wheel]: exit code: 3

Workaround

As a workaround you can include ${TARGETARCH} in the id. For example:

FROM alpine:latest AS base
ARG TARGETARCH
RUN --mount=type=cache,sharing=locked,id=${TARGETARCH}/var/cache/apk,target=/var/cache/apk \
    apk add python3 py3-pip py3-wheel

Since TARGETARCH is set by default, the workaround only needs a change to the Dockerfile, and the existing build commands will then work.

Desired enhancement

Ideally this "workaround" should be the default behaviour: include the value from TARGETARCH in the default id. If developers want to share a cache between multiple architectures, the current behaviour would still be available by setting an id manually. But it means that by default the cache would "just work".

The worst case of not knowing about this behaviour would be slower builds and increased network usage on multi-arch builds for platform-independent caches (Java, JavaScript...).
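As a rough sketch of how both kinds of cache can be expressed with explicit ids today, the fragment below keeps one architecture-dependent cache per target and one deliberately shared cache. The id names, the node:lts-alpine image, the package.json files in the build context, and the assumption that npm's download cache (~/.npm) is safe to share across platforms are all illustrative, not taken from this issue:

```dockerfile
FROM node:lts-alpine AS deps
# Automatic platform ARGs must be redeclared inside the stage to be usable.
ARG TARGETARCH
WORKDIR /app
COPY package.json package-lock.json ./
# Architecture-dependent cache: one copy per target architecture.
RUN --mount=type=cache,sharing=locked,id=apk-${TARGETARCH},target=/var/cache/apk \
    apk add python3
# Assumed platform-independent cache: a fixed id shared by every architecture.
RUN --mount=type=cache,id=npm-shared,target=/root/.npm \
    npm ci
```

Under the proposed default, only the second mount would need an explicit id; under the current default, only the first one does.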

@tonistiigi
Member

I think the behavior depends on the use case. In a lot of cases the same cache is desired for all platforms, e.g. when downloading package source code that does not contain binaries, it is usually identical. Also, for a general build cache, other languages such as Go just understand the cases when the cache is specific to a platform. If your case does not, then I think the ability to separate it via id is a good approach.

Regarding apk, I don't think this is how you would do it, and none of your packages are cached with this method.

I would do:

FROM alpine:latest AS base
RUN --mount=type=cache,sharing=locked,target=/etc/apk/cache \
   ls -l /etc/apk/cache && apk add --no-cache python3 py3-pip py3-wheel && ls -l /etc/apk/cache

That actually caches the packages that have been installed before and doesn't seem to have any requirements for TARGETARCH in id either. ls is just for debug so you see what is in cache before and after.

@couling
Author

couling commented Feb 4, 2022

As I say, this is really about the safety of the defaults. I realise there are two use cases:

  • platform-independent code - worst case poorer caching
  • platform-dependent code - worst case failed builds or corrupted images

My reason for raising this request is that on balance I prefer default safety over default performance.

Alternatively a note about this in the documentation wouldn't go amiss. It took me an unfortunate amount of time to figure out what was going wrong.

Ultimately it's your call so I won't labor the point.


> Regarding apk, I don't think this is how you would do it, and none of your packages are cached with this method.

The example I give is an SSCCE of what can go wrong. It's not a suggested way to cache PIP packages. It caches the package index and saves some performance loss from --no-cache. Its use case is a little bit lost in the given example.

@tonistiigi
Member

> platform-dependent code - worst case failed builds or corrupted images

A failed build isn't necessarily a worst case in the dev phase but a hint for the user that they forgot to set id. Not understanding that your build is inefficient although you think you did everything correctly might hurt more in the long run. In a lot of cases TARGETARCH is even completely wrong, e.g. all our internal Dockerfiles are cross-compiling, where separating the cache by target doesn't make any sense.
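For readers unfamiliar with the cross-compiling case, here is a hypothetical sketch (not one of the Dockerfiles referred to above). The build stage always runs on the build host's platform and only the compiler output differs per target, so a single shared cache serves every target. The golang image tag, paths, and output name are assumptions:

```dockerfile
# The build stage runs natively on the host platform, never under emulation.
FROM --platform=$BUILDPLATFORM golang:1.17 AS build
# Redeclare the automatic platform ARGs so RUN can see them.
ARG TARGETOS
ARG TARGETARCH
WORKDIR /src
COPY . .
# Default ids (the target paths) are shared across all targets; the Go
# toolchain itself distinguishes cache entries by GOOS/GOARCH.
RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/app .
```

Adding TARGETARCH to these ids would split the module download cache for no benefit, which is the inefficiency described above.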

> Alternatively a note about this in the documentation wouldn't go amiss.

PR welcome.

> It caches the package index and saves some performance loss from --no-cache

IIUC it caches only the index, meaning if you change the command it will still download all packages again, but they will always be the old versions. And I guess if the index gets old it will just fail to download packages because they don't exist anymore? A more useful pattern is to cache the packages (it's a bit confusing that I use --no-cache but it still does that), so if the command changes you always get the latest packages, but the packages that were already downloaded once are not downloaded again.

@ciaranmcnulty

Just as a side-note to this, I had a similar problem and tried to resolve it with id=apk-${TARGETPLATFORM} but it didn't look like it was being expanded - would allowing arg usage here help with similar issues?
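One possible explanation, offered as an assumption rather than a diagnosis: the automatic platform ARGs are defined in the global scope and are not visible inside a build stage until they are redeclared there, as the workaround earlier in this thread does with ARG TARGETARCH. Whether variables in the mount id are expanded at all may also depend on the Dockerfile frontend version in use. A sketch with the declaration in place:

```dockerfile
FROM alpine:latest AS base
# Without this line the variable is not in scope for the stage.
ARG TARGETPLATFORM
# TARGETPLATFORM has the form os/arch (for example linux/arm64).
RUN --mount=type=cache,sharing=locked,id=apk-${TARGETPLATFORM},target=/etc/apk/cache \
    apk add --no-cache python3
```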
