cmd/go: unclear how to cache transitive dependencies in a Docker image #27719

Open
wedow opened this issue Sep 17, 2018 · 31 comments

@wedow
commented Sep 17, 2018

What version of Go are you using (go version)?

go version go1.11 linux/amd64

Does this issue reproduce with the latest release?

yes

What did you do?

I'm attempting to populate a Docker cache layer with compiled dependencies based on the contents of go.mod. The general recommendation with Docker is to use go mod download; however, this only caches sources.

go build all can be used to compile these sources but instead of relying on go.mod contents, it requires my application source to be present to determine which deps to build. This causes a cache invalidation on every code change and renders the step useless.

Here's a Dockerfile demonstrating my issue:

FROM golang:1.11-alpine
RUN apk add git

ENV CGO_ENABLED=0 GOOS=linux

WORKDIR /app

COPY go.mod go.sum ./

RUN go mod download

# this fails
RUN go build all
# => go: warning: "all" matched no packages

COPY . .

# this now works but isn't needed
RUN go build all

# compile app along with any unbuilt deps
RUN go build

From the "Package lists and patterns" documentation (go help packages):

When using modules, "all" expands to all packages in the main module and their dependencies, including dependencies needed by tests of any of those.

where the main module is defined by the contents of go.mod (if I'm understanding this correctly).

Since "the main module's go.mod file defines the precise set of packages available for use by the go command", I would expect go build all to rely on go.mod and build any packages listed within.

Other actions which support "all" have this issue but some have flags which resolve it (go list -m all).

@davecheney

Contributor

commented Sep 18, 2018

@wedow

Author

commented Sep 18, 2018

Thanks Dave, go build ./... is a bit of an improvement since it doesn't include the test dependencies that all does. However it still requires my application source to be present and gives go: warning: "./..." matched no packages if run with only go.mod and go.sum present.

@davecheney

Contributor

commented Sep 18, 2018

@wedow

Author

commented Sep 18, 2018

For sure. I've found in most previous projects that dependency build times are fast enough not to be an issue, so in the end the existing behaviour is probably fine.

Part of my current project is the creation of a custom Terraform Provider for managing some of our internal systems. Building the Terraform packages locally only happens once, so it's not a big deal, but they need to be rebuilt every time a new Docker image is built. When these packages are already compiled, go build completes in under a second. When they need to be rebuilt from scratch, go build can take up to two minutes locally, or longer on our CI servers.

Some time can be saved by using go mod download to cache the Terraform package sources but afaict there is no command to compile them after download without having our package main present for go build to determine what the dependencies actually are.

Based on the existing module documentation, I would expect the go.mod file to have an accurate list of required dependencies and for the toolchain to be able to rely on it in isolation.

We do similar things with projects in other languages for building Docker images. The flow is generally:

  1. Copy package manifest (Gemfile, package.json, etc.) into container
  2. Download dependency code and compile associated libraries (bundle, npm install, etc.)
  3. Copy the rest of our project source into container

This lets us avoid having to rebuild dependencies on every commit. It would be nice if this could be replicated with the Go module system. go mod download gets us halfway but doesn't allow caching of compilation artifacts.

Here's an example repo: https://github.com/wedow/docker-go-build

To see the issue we're having, clone it and run docker build ., then add a comment or something to main.go and run docker build . again. Ideally all deps would be built and cached prior to the COPY . . step, and the final go build would be a sub-second operation.

@bcmills bcmills added this to the Go1.12 milestone Sep 18, 2018

@myitcv

Member

commented Nov 13, 2018

I think what you're after here is:

go list -export $(go list -m)/...
The -export flag causes list to set the Export field to the name of a
file containing up-to-date export information for the given package.

This will populate the build cache (go env GOCACHE) with the results of compiling for the -export flag. The module cache ($GOPATH/pkg/mod), as you say, contains the module-related caches.

If you want to install main packages too then:

go install $(go list -f '{{ $ip := .ImportPath}}{{if eq .Name "main"}}{{$ip}}{{end}}' $(go list -m)/...)
@bcmills

Member

commented Nov 13, 2018

go build all can be used to compile these sources but instead of relying on go.mod contents, it requires my application source to be present to determine which deps to build.

Yes, that is working as designed: in module mode, all refers to the transitive imports of the packages in the main module, not the packages in its module dependencies. That's not going to change.

This causes a cache invalidation on every code change and renders the step useless.

If the code changes are only in your .go source files, then only the cache entries for the packages containing those source files should be invalidated: the cache contents for the other transitive dependencies should be unaffected.

The build artifact cache is separate from the module cache: the former is controlled by GOCACHE (and defaults to $HOME/.cache/go-build on Linux), while the latter is a subdirectory of the first entry in GOPATH. You may need to set the GOCACHE environment variable to make sure it is within the container; see Build and test caching for details.
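A minimal Go sketch of the two default locations described above, assuming Linux and the pre-Go 1.15 rules (Go 1.15 later added a GOMODCACHE override, which is not modeled). The helper names are hypothetical; go env GOCACHE GOPATH remains the authoritative way to check.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// moduleCacheDir returns pkg/mod under the first GOPATH entry, where
// Go 1.11 through 1.14 keep downloaded modules. The default GOPATH
// ($HOME/go when GOPATH is unset) and GOMODCACHE are not handled here.
func moduleCacheDir(gopath string) string {
	entries := filepath.SplitList(gopath)
	if len(entries) == 0 {
		return ""
	}
	return filepath.Join(entries[0], "pkg", "mod")
}

// buildCacheDir approximates the build cache location: GOCACHE if set,
// otherwise go-build under the user cache directory
// ($XDG_CACHE_HOME, else $HOME/.cache, on Linux).
func buildCacheDir() (string, error) {
	if dir := os.Getenv("GOCACHE"); dir != "" {
		return dir, nil
	}
	base, err := os.UserCacheDir()
	if err != nil {
		return "", err
	}
	return filepath.Join(base, "go-build"), nil
}

func main() {
	fmt.Println("module cache:", moduleCacheDir(os.Getenv("GOPATH")))
	if dir, err := buildCacheDir(); err == nil {
		fmt.Println("build cache: ", dir)
	} else {
		fmt.Println("build cache: unknown:", err)
	}
}
```

In the official golang images GOPATH is /go, so the module cache is /go/pkg/mod, while a build running as root puts the build cache in /root/.cache/go-build. Both directories have to survive for a later build to skip downloading and recompiling.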

Can you confirm that both the build cache and the module cache are present and populated in your docker image after the first go build all?

@bcmills bcmills changed the title cmd/go: go action all ignores go.mod file cmd/go: unclear how to cache transitive dependencies in a Docker image Nov 13, 2018

@bcmills bcmills modified the milestones: Go1.12, Go1.13 Nov 13, 2018

@wedow

Author

commented Nov 29, 2018

Thanks guys, I think there may be some confusion about which caches are being affected and when.

The issue is in how docker caches layers after each operation. When my source files are changed, all side effects which occur after the COPY . . line (such as populating GOCACHE) are lost. Those changes are isolated in a layer which has been invalidated and must be fully rebuilt.
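The invalidation behavior described above can be modeled in a few lines of Go. This is a simplified, hypothetical model of Docker's layer cache, not its real algorithm: each layer's key hashes its parent's key, the instruction text, and (for COPY) the copied bytes, so changing any COPY input changes that layer's key and every key after it.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// step is one Dockerfile instruction in this toy model; for COPY steps,
// inputs stands in for the contents of the copied files.
type step struct {
	instruction string
	inputs      []byte
}

// layerKeys gives each layer a key derived from its parent's key, its
// instruction text, and its inputs. Because keys are chained, changing
// one layer's inputs changes the keys of that layer and all later ones.
func layerKeys(steps []step) []string {
	keys := make([]string, len(steps))
	parent := ""
	for i, s := range steps {
		h := sha256.New()
		fmt.Fprintf(h, "%s\x00%s\x00", parent, s.instruction)
		h.Write(s.inputs)
		parent = hex.EncodeToString(h.Sum(nil))
		keys[i] = parent
	}
	return keys
}

// dockerfile models the Dockerfile from this issue; mainGo stands in for
// the application source copied by COPY . . (hypothetical contents).
func dockerfile(mainGo string) []step {
	return []step{
		{instruction: "COPY go.mod go.sum ./", inputs: []byte("module example.com/app\n")},
		{instruction: "RUN go mod download"},
		{instruction: "COPY . .", inputs: []byte(mainGo)},
		{instruction: "RUN go build"},
	}
}

func main() {
	before := layerKeys(dockerfile("package main // v1"))
	after := layerKeys(dockerfile("package main // v2"))
	for i, s := range dockerfile("") {
		fmt.Printf("%-24s reused: %v\n", s.instruction, before[i] == after[i])
	}
}
```

With main.go edited, the first two layers keep their keys and are reused, while COPY . . and every later layer is rebuilt. That is why anything written to GOCACHE in those later layers does not carry over to the next build.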

The go list -export $(go list -m)/... command works great for populating GOCACHE but since it must come after COPY . ., it must be fully re-run during every build. go build all also has this issue.

I'm looking for a command which can compile the dependencies listed in the go.mod file in isolation so that it can occur before that COPY . . line that adds our sources to the container. Totally understand if that's not possible with the current module system. I may just experiment with parsing and building the deps separately.

@myitcv

Member

commented Dec 1, 2018

The go list -export $(go list -m)/... command works great for populating GOCACHE but since it must come after COPY . ., it must be fully re-run during every build

I'm unclear why you say it must come after the copy - please can you explain?

I'm looking for a command which can compile the dependencies listed in the go.mod file in isolation so that it can occur before that COPY . .

go list -export $(go list -m)/... should be all you need here. But let's unravel the question above first.

@bcmills

Member

commented Dec 1, 2018

@myitcv, note that -export may at some point do less than a full build. I don't think it's a perfect fit for the use-case.

@hinshun

commented Jan 18, 2019

You have to export both GOCACHE and GOPATH/pkg/mod:

Example:

FROM golang:1.11-alpine AS mod
RUN apk add -U git
WORKDIR /src
COPY go.mod .
COPY go.sum .
RUN go mod download

FROM golang:1.11-alpine
COPY --from=mod $GOCACHE $GOCACHE
COPY --from=mod $GOPATH/pkg/mod $GOPATH/pkg/mod
WORKDIR /src
COPY . .
RUN go build

alexwh added a commit to BEANSQUAD/paul-bot that referenced this issue Jan 19, 2019

revert to a single stage build
this has the disadvantage of being a large image (because it's based on
golang:alpine and not alpine, plus the fact that building the project
creates various cache files), though these cache files are the things we
need in order to speed up compiling the project on the main go build
line. most of the compilation time is spent on the dependent libraries
(i.e. discordgo, crypto, stdlib, etc).

there is not a functional method of compiling dependencies from a bare
go.mod and go.sum file[1], you must have a valid go project in the
directory for go build all to work, at which point the docker layer
cache has been invalidated by `COPY . /app`. the proposed
`go list -export $(go list -m)/...` does not compile all dependencies
either, showcased by checking the size of $GOCACHE before and after
running `go build all`

without doing funky stuff like bind-mounting a volume container into the
build container[2], inflating the image size for faster compiling seems
to be the best tradeoff, as the image will only stay local anyway

[1] golang/go#27719
[2] https://github.com/banzaicloud/docker-golang

alexwh added a commit to BEANSQUAD/paul-bot that referenced this issue Jan 22, 2019

@dbudworth

commented Feb 16, 2019

@myitcv the go list trick only works if you have your source present.
The way we avoid re-downloading all deps is to copy over go.mod and go.sum, then run go mod download, which creates the package source cache but does not create the compiled cache of the modules.

So we're looking for a way to get the stuff listed in go.mod compiled and placed in ~/.cache before we copy all the project source over; this lets us avoid the lengthy re-compile of our deps on each build.

Think of it as a two-phase build:
phase 1: copy go.mod, download and (hopefully) compile deps
phase 2: copy project source and compile our stuff against the phase 1 cached stuff

@wedow

Author

commented Feb 16, 2019

@dbudworth it doesn't really seem possible to do what we're looking to do with the currently available tooling. I came up with a hacky workaround to get the results I was looking for and just updated my example repo to illustrate it.

The basic idea is to use a dummy import file that triggers compilation of the dependencies when run through go build. This file is added to the Docker image along with go.mod, compiled to prime the cache, then removed before the real application source files are added.
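The dummy-file step could be automated. The sketch below is a hypothetical generator, not the file from the example repo: given a list of dependency import paths (for example, produced by go list and committed alongside the Dockerfile), it renders a main package that blank-imports each one. Paths with an internal element and the standard library's vendor/ tree are skipped because the go command refuses those imports from another module; other packages may still fail for reasons this sketch does not check, such as build constraints or cgo.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// importable reports whether another module may import path: the go
// command rejects paths containing an "internal" element (outside their
// own tree) and the standard library's vendor/ packages.
func importable(path string) bool {
	if strings.HasPrefix(path, "vendor/") {
		return false
	}
	for _, elem := range strings.Split(path, "/") {
		if elem == "internal" {
			return false
		}
	}
	return true
}

// dummyMain renders a main package that blank-imports every importable
// path in pkgs, sorted and de-duplicated. The caller is expected to have
// excluded the main module's own packages already.
func dummyMain(pkgs []string) string {
	var sel []string
	for _, p := range pkgs {
		if importable(p) {
			sel = append(sel, p)
		}
	}
	sort.Strings(sel)
	var b strings.Builder
	b.WriteString("// Code generated for Docker cache warming. DO NOT EDIT.\n\n")
	b.WriteString("package main\n\nimport (\n")
	for i, p := range sel {
		if i > 0 && p == sel[i-1] {
			continue // skip duplicates
		}
		fmt.Fprintf(&b, "\t_ %q\n", p)
	}
	b.WriteString(")\n\nfunc main() {}\n")
	return b.String()
}

func main() {
	// Example input; in practice this list would come from a go list query.
	fmt.Print(dummyMain([]string{"github.com/pkg/errors", "golang.org/x/sys/unix"}))
}
```

Written to a file such as warm/main.go (a hypothetical name) next to go.mod and go.sum, the output could be compiled with go build ./warm before COPY . . and deleted afterwards, following the same flow as the hand-written dummy file.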

While I'd much prefer a way to compile dependencies separate from application code as part of the official toolchain, this method does dramatically reduce subsequent docker image build times for our project and has really sped up our CI process.

@dinvlad

commented Feb 28, 2019

Same issue here. The go build step depends on main.go AND it compiles vendor dependencies, which means every time we change main.go, it recompiles it AND all of the vendor dependencies. The only way around that for now appears to be @wedow's workaround of a dummy_main.go that includes dummy imports of all vendor dependencies. So we run go build on that file first, and only then COPY/ADD main.go and go build the latter (and this second go build reuses the deps pre-compiled by the previous go build).

This would be somewhat easier to handle if docker build supported a -v option so we could mount a "compilation cache" directory at build time.

@benweissmann

commented Apr 12, 2019

Would it be possible to add a --install or --compile flag to go mod download, that would compile and cache the downloaded packages?

@bcmills

Member

commented Apr 12, 2019

@benweissmann, that seems like it would have significant overlap with go get, which does build and install the requested packages.

@bcmills

Member

commented Apr 12, 2019

@dinvlad

go build step depends on main.go AND it compiles vendor dependencies. Which means every time we change main.go, it will recompile it AND all of the vendor dependencies.

The Go build cache is content-addressed, and contains intermediate artifacts. If you are correctly storing the build cache (as @hinshun describes), then it should not recompile dependencies whose sources are unchanged.
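A toy illustration of why a content-addressed cache tolerates source edits, assuming a simplified key scheme (cmd/go's real action IDs hash many more inputs, such as the compiler version and build flags): each package's key depends only on its own source and its imports' keys, so editing the main package changes its key while its dependencies' keys, and the artifacts stored under them, stay valid. The package paths are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// pkg is a toy package: its own source plus the import paths it uses.
type pkg struct {
	source  string
	imports []string
}

// actionKey hashes a package's path, its source, and the keys of its
// imports (computed recursively and memoized). Go forbids import
// cycles, so the recursion terminates for valid graphs.
func actionKey(path string, graph map[string]pkg, memo map[string]string) string {
	if k, ok := memo[path]; ok {
		return k
	}
	p := graph[path]
	deps := append([]string(nil), p.imports...)
	sort.Strings(deps)
	h := sha256.New()
	fmt.Fprintf(h, "path %s\nsource %s\n", path, p.source)
	for _, d := range deps {
		fmt.Fprintf(h, "import %s %s\n", d, actionKey(d, graph, memo))
	}
	k := hex.EncodeToString(h.Sum(nil))
	memo[path] = k
	return k
}

// exampleGraph builds a two-package program whose main source varies.
func exampleGraph(mainSource string) map[string]pkg {
	return map[string]pkg{
		"example.com/app": {source: mainSource, imports: []string{"example.com/dep"}},
		"example.com/dep": {source: "package dep"},
	}
}

func main() {
	oldGraph, newGraph := exampleGraph("package main // v1"), exampleGraph("package main // v2")
	oldMemo, newMemo := map[string]string{}, map[string]string{}
	for _, path := range []string{"example.com/dep", "example.com/app"} {
		same := actionKey(path, oldGraph, oldMemo) == actionKey(path, newGraph, newMemo)
		fmt.Printf("%s reusable: %v\n", path, same)
	}
}
```

Reuse still requires the cache directory itself to persist between builds, which is exactly the part that Docker's layer invalidation interferes with.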

The only way around that for now appears to be @wedow's workaround of a dummy_main.go that includes dummy imports of all vendor dependencies.

You can use go list to query the dependencies of your top-level package and request to build those dependencies explicitly. (A dummy .go file is fine too, but not strictly necessary.)

@bcmills bcmills removed the WaitingForInfo label Apr 12, 2019

@bcmills

Member

commented Apr 12, 2019

Please try the above approach (saving both GOCACHE and GOPATH/pkg/mod and using go list to compute the set of packages to warm the cache) and let us know if there are any remaining issues.

@wedow

Author

commented Apr 16, 2019

@bcmills I'm kind of at a loss on how to explain the issue in a different way. The go list approach is incompatible with docker's caching mechanism. It requires the presence of my application source. Any subsequent change to that source invalidates docker's cache which also throws away anything in GOCACHE.

Similarly, @hinshun's approach of copying GOCACHE from a previous build step has no effect because go mod download doesn't populate GOCACHE. There is nothing to be copied.

You mention an --install flag would overlap with go get, but go get requires application source whereas go mod download does not and works on .mod files. If there is a way to have either go get operate on .mod files in isolation, or have go mod download populate GOCACHE after downloading, there'd be no issue. Since this doesn't work, we need a new option or command or something to accomplish this.

Personally, go mod download --install or even go mod install seem like good fits.

@bcmills

Member

commented Apr 16, 2019

The go list approach is incompatible with docker's caching mechanism. It requires the presence of my application source. Any subsequent change to that source invalidates docker's cache

Yes, you'd need to prime the cache in your Docker image from a specific version of your application source, and changing that source would invalidate the image caching. (I suspect that you could discard that source from the final image, but I don't use Docker much so I'm a bit fuzzy on the details.)

You could also use go list to compute the dependency versions (and dependency packages), and build those even without your application source.

go get does not require your application source in general: it can download packages and modules as needed. (You still need to pass it an appropriate list of packages to build, though.)

@bcmills bcmills removed the WaitingForInfo label Apr 16, 2019

@wedow

Author

commented Apr 16, 2019

you'd need to prime the cache in your Docker image

I may be misunderstanding you but Docker doesn't have this capability.

Correct me if I'm wrong but you're suggesting using go list to generate a separate list of dependencies from what's already maintained in the .mod file and also committing that when dependencies are updated. Then using that list to build those dependencies (with go get or otherwise).

If you're suggesting using go list as part of the Dockerfile, that again doesn't work due to the cache invalidation issue. Unless go list has an option for parsing only the .mod file?

@bcmills

Member

commented Apr 16, 2019

Correct me if I'm wrong but you're suggesting using go list to generate a separate list of dependencies from what's already maintained in the .mod file and also committing that when dependencies are updated. Then using that list to build those dependencies (with go get or otherwise).

Yes, exactly: use go list to produce a list of modules and versions, and to separately produce a list of packages to prime in the cache. Then commit that alongside your Dockerfile (or wherever you like), and have the Docker image run the equivalent of go mod init foo && go get -m $(<module_list.txt) && go get $(<package_list.txt) && rm go.mod go.sum.

@dinvlad

commented Apr 16, 2019

Is module_list.txt the output of go list -m all? And is package_list.txt = go.sum? Thanks

@bcmills

Member

commented Apr 16, 2019

Is module_list.txt the output of go list -m all?

Probably, yes. With the main module (go list -m) filtered out, and perhaps with the output transformed a bit into something that works as an argument to go get.

And is package_list.txt = go.sum?

No, it's probably more like go list all minus go list ./... (with both commands evaluated in module mode).

@dinvlad

commented Apr 16, 2019

Hmm, thanks. If we experiment with a wrapper to produce the right lists, would you accept a PR to add this as a first-class option for go mod download?

@wedow

Author

commented Apr 16, 2019

To be honest, I'm a bit confused on the purpose of the .mod and .sum files if they're not meant as a list of dependencies. Seems weird to create a third file for this purpose.

Other language ecosystems don't seem to have an issue with building dependencies based on some sort of manifest file.

  1. bundle install only needs a Gemfile
  2. npm install only needs a package.json
  3. cargo build only needs a Cargo.toml and an empty src/lib.rs file.

What makes Go special that this isn't feasible?

EDIT: To clarify, go mod download works exactly as expected on .mod files. All we want is to compile whatever was just downloaded by that command. It's cumbersome to maintain custom tooling to generate a redundant list of dependencies just to be able to compile them. I can't understand why if go mod download is able to fetch dependencies, another command can't exist to then build them.

I'd also be very happy to come up with a PR for this if it'd be welcome.

@bcmills

Member

commented Apr 19, 2019

@wedow, you can copy in your go.mod and go.sum files to list the module versions. So I suppose you don't need a module_list.txt; those files suffice.

You still need a package_list.txt to tell the build exactly which targets you want to be warmed in the cache. (Presumably you don't want to pre-build packages that aren't actually needed to satisfy the transitive imports of the packages and tests you're running in the image.)

@hinshun

commented Apr 19, 2019

Alternatively, if your docker daemon is recent enough (18.09+), you can export DOCKER_BUILDKIT="1" to use the new image builder with cache mount features. Currently only available on the experimental Dockerfile frontend, see: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md#run---mounttypecache

# syntax = docker/dockerfile:experimental
FROM golang
...
RUN --mount=type=cache,target=/root/.cache/go-build go build ...
@gopherbot

commented May 9, 2019

Change https://golang.org/cl/175985 mentions this issue: cmd/coordinator: stop using gitlock, use go modules

gopherbot pushed a commit to golang/build that referenced this issue May 9, 2019

cmd/coordinator: stop using gitlock, use go modules
Also, along for the ride:

* update from jessie to stretch
* update from Go 1.10 to Go 1.12
* move to multi-stage Dockerfile, including drawterm, reducing image size
* remove the static linking which was resulting in build warnings
* clean up Makefile

Updates golang/go#26872
Updates golang/go#27719

Change-Id: Ic4dc9b8539fb8662c9621c113fa94b70bc7de061
Reviewed-on: https://go-review.googlesource.com/c/build/+/175985
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
@gopherbot

commented May 9, 2019

Change https://golang.org/cl/176257 mentions this issue: cmd/buildlet/stage0, cmd/scaleway: stop using gitlock, use go modules

gopherbot pushed a commit to golang/build that referenced this issue May 9, 2019

cmd/{buildlet/stage0,scaleway,tip}, devapp: stop using gitlock, use go modules

Updates golang/go#26872
Updates golang/go#27719

Change-Id: I4de6d4f157b349911362e02b1781abd8b813f87a
Reviewed-on: https://go-review.googlesource.com/c/build/+/176257
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
@kuujo

commented Jun 22, 2019

FWIW I ran into this exact same issue and found no great solution, and I know it's not sexy but I just used go mod vendor to vendor the dependencies on the host before COPYing the source into the Docker build, then set GOFLAGS=-mod=vendor prior to building binaries inside the Docker build. Then the vendor directory is just cleaned up after the Docker build. Until there's a way to download dependencies from go.mod/go.sum, using a temporary vendor directory just feels much simpler and more maintainable than these other solutions.

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019

@nicollecastrog

commented Jul 9, 2019

Thanks @hinshun for your comment, it's how I found out about Buildkit, which is working really well for our team! 🙌

In case anyone is using Go modules (and coming from a Node/Ruby background such as myself), there's no good way to "pre-compile" the Go modules ahead of doing the full compilation. I think the fact that Go is a compiled language makes this particularly challenging, but others can feel free to correct me on that as I'm new to Go! However, using Buildkit, what you can do is use a mounted cache that is shared across builds (for a given build agent) so that your Go modules don't re-download every time:

# syntax = docker/dockerfile:experimental
FROM golang:1.12-stretch
...
RUN --mount=type=cache,target=/go/pkg/mod go build ...

The different thing to note here ^ is the target=/go/pkg/mod. As mentioned previously in this thread, go mod download doesn't populate the go build cache (GOCACHE, which is usually someplace like /root/.cache/go-build), but rather downloads the modules to a different location, which took me a while to work out amongst all the googling: /go/pkg/mod.

This alone is a great optimisation! The Go modules downloads were taking forever for us and they hardly ever change. You can apply that same cache mount logic to other things besides the Go modules too:

...
RUN --mount=type=cache,target=/var/cache/apk apk add --update curl ...
...
RUN --mount=type=cache,target=/usr/local/share/.cache/yarn/v1 yarn ...

Buildkit (and in particular this cache mount beauty) brought our full monorepo build times down from about 45 minutes to 12 minutes 🎉

Hope this helps someone 🤷‍♀😄
