Building with BUILDKIT and cache-from fails in Google Cloud Build #40262

Closed
liiri opened this issue Nov 27, 2019 · 29 comments

Comments

@liiri

liiri commented Nov 27, 2019

I'm trying to migrate my docker build in Google Cloud Build CI to use DOCKER_BUILDKIT.

The build step is now using docker:19.03-rc-dind:

  - name: 'docker:19.03-rc-dind'
    id: 'Build and Push'
    entrypoint: 'sh'
    args:
      - '-c'
      - |
        export DOCKER_BUILDKIT=1 && \
        docker build -f service.Dockerfile -t gcr.io/acme/service:latest --build-arg BUILDKIT_INLINE_CACHE=1 . && \
        docker push gcr.io/acme/service:latest

And this works great. However, when I tried to integrate cache-from, I'm getting errors. Switching the build command above to

docker build -f service.Dockerfile -t gcr.io/acme/service:latest --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from gcr.io/acme/service:latest .

Fails with the following:

error on cache query: invalid build cache from {MediaType:application/vnd.docker.distribution.manifest.v2+json Digest:sha256:xxx Size:3685 URLs:[] Annotations:map[] Platform:<nil>}

When I run the same command using the same docker:19.03-rc-dind image on my local macOS machine, the build completes successfully with --cache-from and actually loads the cache context.

Is this an issue with the Google Cloud environment? This gitlab issue suggests it may be an issue with gcr (Google Container Registry), but I don't think so, as I can pull from gcr fine locally.

@liiri liiri changed the title Building with buildkit and cache-from fails in Google Cloud Build Building with BUILDKIT and cache-from fails in Google Cloud Build Nov 27, 2019
@thaJeztah
Member

ping @tonistiigi @AkihiroSuda

@thaJeztah
Member

Are you still having this issue on a current version of 19.03? (I see you were using a release candidate; docker:19.03-rc-dind)

@liiri
Author

liiri commented Jan 19, 2020

Hi, I now tried to re-integrate this using docker:latest, but it fails on earlier steps. My dockerfile is using FROM rabbitmq:management-alpine, and

export DOCKER_BUILDKIT=1 && \
docker build -f service.Dockerfile -t gcr.io/acme/service:latest --build-arg BUILDKIT_INLINE_CACHE=1 .

fails with the following output:

#2 [internal] load .dockerignore
#2 transferring context: 339B done
#2 DONE 0.1s

#1 [internal] load build definition from service.Dockerfile
#1 transferring dockerfile: 245B done
#1 DONE 0.0s

#3 [internal] load metadata for docker.io/library/rabbitmq:management-alpin...
#3 ERROR: docker.io/library/rabbitmq:management-alpine not found

#6 [internal] load build context
#6 DONE 0.0s

#5 [1/3] FROM docker.io/library/rabbitmq:management-alpine
#5 resolve docker.io/library/rabbitmq:management-alpine 0.0s done
#5 ERROR: docker.io/library/rabbitmq:management-alpine not found

#4 [internal] helper image for file operations
#4 resolve docker.io/docker/dockerfile-copy:v0.1.9@sha256:e8f159d3f00786604b93c675ee2783f8dc194bb565e61ca5788f6a6e9d304061 0.0s done
#4 CANCELED
------
> [internal] load metadata for docker.io/library/rabbitmq:management-alpine:
------
------
> [1/3] FROM docker.io/library/rabbitmq:management-alpine:
------
docker.io/library/rabbitmq:management-alpine not found
ERROR

I verified that building without DOCKER_BUILDKIT doesn't fail.

@thaJeztah
Member

Hi, I now tried to re-integrate this using docker:latest, but it fails on earlier steps.

Looking at your example again; so that's only the client that's upgraded, correct?

When I run the same command using the same docker:19.03-rc-dind image on my local macOS machine, the build completes successfully with --cache-from and actually loads the cache context.

Would you be able to check what version of the daemon is running in Google Cloud Build CI (e.g., add a docker version and docker info to the steps you're executing)? From the above, it's possible that it's running an older version of the daemon.

@liiri
Author

liiri commented Jan 20, 2020

Correct, that's the only change I made. Possibly Google Cloud Build made other environment changes that I'm not aware of.
Commands output:
docker version

Client: Docker Engine - Community
Version: 19.03.5
API version: 1.39 (downgraded from 1.40)
Go version: go1.12.12
Git commit: 633a0ea838
Built: Wed Nov 13 07:22:05 2019
OS/Arch: linux/amd64
Experimental: false

Server: Docker Engine - Community
Engine:
Version: 18.09.3
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: 774a1f4
Built: Thu Feb 28 05:59:55 2019
OS/Arch: linux/amd64
Experimental: false

docker info

Client:
Debug Mode: false

Server:
Containers: 4
Running: 3
Paused: 0
Stopped: 1
Images: 170
Server Version: 18.09.3
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-1044-gcp
Operating System: Ubuntu 18.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.607GiB
Name: worker-824dd2d6-b2b4-476b-b59b-b5d174c0d682
ID: UZVV:CNSH:L34X:M35U:7RDX:3LFR:FQTN:ZKFX:STWD:KGD4:6XID:F3CI
Docker Root Dir: /var/lib/docker
Debug Mode: true
File Descriptors: 49
Goroutines: 73
System Time: 2020-01-20T10:30:58.766640791Z
EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Registry Mirrors:
https://mirror.gcr.io/
Live Restore Enabled: false
Product License: Community Engine

@thaJeztah
Member

Ah, yes, that's a pretty old version (and not even up-to-date with the latest 18.09.x release)
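
For anyone who wants to catch this kind of client/daemon mismatch early in a CI step, here is a minimal shell sketch. It assumes that a 19.03 daemon is the minimum needed for BuildKit with --cache-from (an assumption based on this thread, not a documented requirement). The helper names version_ge and require_daemon are made up for illustration; the daemon version is read with docker version --format '{{.Server.Version}}'.

```shell
# Succeeds when dotted version $1 is greater than or equal to version $2.
# Fields are compared numerically, so "19.03.8" >= "19.03.0" holds.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# Hypothetical guard: fail the step when the daemon is older than the minimum.
# The default minimum of 19.03.0 is an assumption taken from this thread.
require_daemon() {
  min=${1:-19.03.0}
  server=$(docker version --format '{{.Server.Version}}') || return 1
  if ! version_ge "$server" "$min"; then
    echo "Docker daemon $server is older than $min" >&2
    return 1
  fi
}
```

Calling require_daemon before docker build would have flagged the 18.09.3 daemon reported above, even though the client was already 19.03.5.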

@liiri
Author

liiri commented Jan 20, 2020

OK thanks, I'll refer the issue to Google support and will reopen this if I manage to reproduce with an upgraded version. Thanks

@liiri liiri closed this as completed Jan 20, 2020
@lukas1994

@liiri did you manage to get it to work on Google Cloud Build?

@liiri
Author

liiri commented Feb 7, 2020

@lukas1994 not yet, currently waiting for Google to upgrade their Docker server version

@matti

matti commented Feb 13, 2020

this is also broken in google cloud shell

@rdeknijf

@liiri Do you have a ticket in the issuetracker that we can follow? I can't seem to find anything.

@liiri
Author

liiri commented May 27, 2020

@rdeknijf no, sorry, I haven't had the time or priority to do it yet. I would appreciate it if anyone else who encountered the issue could report it there.

@rdeknijf

@vitaliytv

Cloud Build Upgraded to Docker server version 19.03.8.

@matti

matti commented Jul 8, 2020

now it works, but for some reason all FROM images need to be pulled first. I created a helper script for this: https://github.com/matti/gbuilder

@thaJeztah
Member

@matti could you describe what's happening if you didn't pull the images? ISTR BuildKit will check the image metadata of the images specified in --cache-from, and pull them on-demand

@matti

matti commented Jul 8, 2020

@thaJeztah In google cloud build this happens: moby/buildkit#1271

It does pull them locally when running exactly the same commands. So my workaround is to "scan" all the dependencies and pull them: https://github.com/matti/gbuilder/blob/5cf510f6abe75a69a5a746a377fdb198565ed5f4/gbuilder#L40-L53
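
The idea behind that workaround can be sketched roughly as follows. This is not the gbuilder script itself, just a minimal illustration of scanning a Dockerfile for FROM images and pulling them before the build. The names list_base_images and prepull_base_images are hypothetical, and FROM lines that use ARG variables are not resolved.

```shell
# Print the external base images referenced by FROM lines in a Dockerfile.
# Skips "scratch" and references to earlier build stages (FROM ... AS name).
# Assumption: image names are literal, not built from ARG variables.
list_base_images() {
  awk '
    toupper($1) == "FROM" {
      i = 2
      # Skip optional flags such as --platform=linux/amd64.
      while (i <= NF && $i ~ /^--/) i++
      image = $i
      if (image != "scratch" && !(image in stages)) print image
      # Remember the stage alias so later "FROM alias" lines are ignored.
      if (toupper($(i + 1)) == "AS") stages[$(i + 2)] = 1
    }
  ' "$1" | sort -u
}

# Pull every base image up front so the build finds it in the local store.
prepull_base_images() {
  for image in $(list_base_images "$1"); do
    docker pull "$image" || return 1
  done
}
```

Running prepull_base_images service.Dockerfile before docker build would pull each base image with the classic pull path, which is what made the builds pass in the report above.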

@thaJeztah
Member

You're using a registry mirror?

Looks like that issue was fixed in moby/buildkit#1397, but I think that's not (yet) in a Docker release. I could check whether it's possible to backport it to 19.03, but the diff looks rather big, so I'm not sure it's safe to do so

@thaJeztah
Member

Looks like it depends on the containerd 1.3.x client, which is too big of a change, so don't think we'd be able to backport that

@matti

matti commented Jul 9, 2020

Like I said, this issue happens only in google cloud build which was just updated to 19.03.8 (https://issuetracker.google.com/issues/157501467) and now it supports BuildKit.

It might be that google cloud build internally has a registry mirror, don't know. Here's the output from failing google cloud build.

#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.0s

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 380B 0.0s done
#1 DONE 0.0s

#3 [internal] load metadata for docker.io/library/golang:1.14.4-alpine3.12
#3 ERROR: docker.io/library/golang:1.14.4-alpine3.12 not found

#13 importing cache manifest from eu.gcr.io/svc-vodka/k8s-node-labeler:lates...
#13 DONE 0.0s

#6 [internal] load build context
#6 DONE 0.0s

#14 [builder 1/8] FROM docker.io/library/golang:1.14.4-alpine3.12
#14 resolve docker.io/library/golang:1.14.4-alpine3.12 0.1s done
#14 ERROR: docker.io/library/golang:1.14.4-alpine3.12 not found
------
 > [internal] load metadata for docker.io/library/golang:1.14.4-alpine3.12:
------
------
 > [builder 1/8] FROM docker.io/library/golang:1.14.4-alpine3.12:
------
failed to solve with frontend dockerfile.v0: failed to build LLB: failed to load cache key: docker.io/library/golang:1.14.4-alpine3.12 not found

@thaJeztah
Member

I recall some workarounds had to be applied for non-standard things in the gcr registry (moby/buildkit#1143, moby/buildkit#720), but that wouldn't explain why it fails to pull an image from docker hub (unless there's some mirror configured by default on their side perhaps?).

Could be worth opening a ticket in the buildkit repository for visibility https://github.com/moby/buildkit/issues 🤔

@thaJeztah
Member

Ah, looks like it's indeed the mirror issue; moby/buildkit#1271. From kubernetes/kubernetes#92030, it appears that Google Cloud configures a mirror (https://mirror.gcr.io) by default, which would likely explain the issue
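
To check whether a given environment has such a mirror configured, the plain docker info output can be filtered for its Registry Mirrors section (the same section that shows https://mirror.gcr.io/ in the docker info output earlier in this thread). A small sketch, with list_registry_mirrors as a made-up helper name, assuming the mirror entries are printed as URLs, one per line, directly under that heading:

```shell
# Read `docker info` text on stdin and print the configured registry mirrors.
# Assumption: mirrors appear as URL lines right after "Registry Mirrors:".
list_registry_mirrors() {
  awk '
    /Registry Mirrors:/ { in_mirrors = 1; next }
    in_mirrors && /^[ \t]*[a-z]+:\/\// {
      gsub(/^[ \t]+/, "")
      print
      next
    }
    # Any other line ends the mirrors section.
    in_mirrors { in_mirrors = 0 }
  '
}
```

For example, docker info 2>/dev/null | list_registry_mirrors prints nothing when no mirror is set, so a non-empty result is a hint that the "not found" errors may come from the mirror rather than from Docker Hub.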

@matti

matti commented Jul 9, 2020

So what's next? Can we mitigate this somehow on environments where mirror is configured?

@tonistiigi
Member

but that wouldn't explain why it fails to pull an image from docker hub (unless there's some mirror configured by default on their side perhaps?).

I'm pretty sure there is an automatic mirror, but the image golang:1.14.4-alpine3.12 does not exist on that mirror. Handling that error gracefully is enabled by moby/buildkit#1397

@matti

matti commented Jul 9, 2020

I'm not sure what is meant by "automatic mirror", but for example that golang:1.14.4-alpine3.12 is not found after ~50 builds so the mirror does not "lazy update".

@tonistiigi
Member

I just meant a mirror that you didn't configure yourself but that is automatically added to your build by the infra. Yes, that is the problem with the gcr mirror: it doesn't fall back by itself when it doesn't have the contents, the way the mirror mode in the docker/distribution registry does, for example.

@ishallbethat

@tonistiigi is there a solution now? I'm not using Cloud Build, but similarly I pull images from gcr, and when gcr doesn't contain an image specified in my Dockerfile I get an "image not found" error.

@matti

matti commented Aug 28, 2020 via email

inferno-chromium added a commit to google/fuzzbench that referenced this issue Sep 14, 2020
…GCB build systems" (#653)"

Has bug fixes
- Fix for buildkit failures, see
  moby/moby#40262
  Use latest docker version and pull ubuntu:xenial image.
- Fix for coverage build failure, also add the dependent
  {benchmark}-project-builder template.
- Add missing tests for base-image and benchmark-coverage
  cloud build usecases.

Original patch by Tanishq Rupaal.

This reverts commit 45cad69.
inferno-chromium added a commit to google/fuzzbench that referenced this issue Sep 14, 2020
#729)

* Reland "Use single source of truth for building images for Local and GCB build systems" (#653)"

Has bug fixes
- Fix for buildkit failures, see
  moby/moby#40262
  Use latest docker version and pull ubuntu:xenial image.
- Fix for coverage build failure, also add the dependent
  {benchmark}-project-builder template.
- Add missing tests for base-image and benchmark-coverage
  cloud build usecases.

Original patch by Tanishq Rupaal.

This reverts commit 45cad69.

* Fix CI failure with openthread, bump revision.

* Revert openthread fix, as it breaks KLEE, unrelated to this CL.

* Improve comment based on review.
@hansbogert

This is going to get worse with many people, me included, moving to the gcr mirror because of the Docker Hub rate limits.
