cmd/go: unclear how to cache transitive dependencies in a Docker image #27719

Open
wedow opened this Issue Sep 17, 2018 · 13 comments


wedow commented Sep 17, 2018

What version of Go are you using (go version)?

go version go1.11 linux/amd64

Does this issue reproduce with the latest release?

yes

What did you do?

I'm attempting to populate a Docker cache layer with compiled dependencies based on the contents of go.mod. The general recommendation with Docker is to use go mod download; however, this only caches the sources.

go build all can be used to compile these sources but instead of relying on go.mod contents, it requires my application source to be present to determine which deps to build. This causes a cache invalidation on every code change and renders the step useless.

Here's a Dockerfile demonstrating my issue:

FROM golang:1.11-alpine
RUN apk add git

ENV CGO_ENABLED=0 GOOS=linux

WORKDIR /app

COPY go.mod go.sum ./

RUN go mod download

# this fails
RUN go build all
# => go: warning: "all" matched no packages

COPY . .

# this now works but isn't needed
RUN go build all

# compile app along with any unbuilt deps
RUN go build

From package lists and patterns:

When using modules, "all" expands to all packages in the main module and their dependencies, including dependencies needed by tests of any of those.

where the main module is defined by the contents of go.mod (if I'm understanding this correctly).

Since "the main module's go.mod file defines the precise set of packages available for use by the go command", I would expect go build all to rely on go.mod and build any packages listed within.

Other actions which support "all" have this issue but some have flags which resolve it (go list -m all).

Contributor

davecheney commented Sep 18, 2018

Author

wedow commented Sep 18, 2018

Thanks Dave, go build ./... is a bit of an improvement since it doesn't include the test dependencies that all does. However, it still requires my application source to be present and gives go: warning: "./..." matched no packages if run with only go.mod and go.sum present.

Contributor

davecheney commented Sep 18, 2018

Author

wedow commented Sep 18, 2018

For sure. I've found in most previous projects that dependency build times are fast enough to not be an issue so in the end the existing behaviour is probably fine.

Part of my current project is the creation of a custom Terraform Provider for managing some of our internal systems. Building the Terraform packages only happens once locally so not a big deal, but they need to be rebuilt every time a new docker image is built. When these packages are already compiled, go build completes in under a second. When they need to be rebuilt from scratch, go build can take up to two minutes locally or longer on our CI servers.

Some time can be saved by using go mod download to cache the Terraform package sources but afaict there is no command to compile them after download without having our package main present for go build to determine what the dependencies actually are.

Based on the existing module documentation, I would expect the go.mod file to have an accurate list of required dependencies and for the toolchain to be able to rely on it in isolation.

We do similar things with projects in other languages for building Docker images. The flow is generally:

  1. Copy package manifest (Gemfile, package.json, etc.) into container
  2. Download dependency code and compile associated libraries (bundle, npm install, etc.)
  3. Copy the rest of our project source into container

This lets us avoid having to rebuild dependencies on every commit. It would be nice if this could be replicated with the Go module system. go mod download gets us halfway but doesn't allow caching of compilation artifacts.
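The reason this ordering matters can be shown with a toy model of layer caching. The sketch below is not Docker's actual implementation; it only assumes that each layer's cache key depends on the previous key, the instruction, and the content of any copied files. The names `step`, `build`, and `layerKeys` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// step is a hypothetical stand-in for one Dockerfile instruction plus
// the content of any files it copies into the image.
type step struct {
	instruction string
	inputs      string // concatenated content of copied files, if any
}

// layerKeys chains a hash through the steps: each key depends on the
// previous key, so a change in one step changes every later key too.
func layerKeys(steps []step) []string {
	keys := make([]string, 0, len(steps))
	prev := ""
	for _, s := range steps {
		sum := sha256.Sum256([]byte(prev + "\x00" + s.instruction + "\x00" + s.inputs))
		prev = fmt.Sprintf("%x", sum)
		keys = append(keys, prev)
	}
	return keys
}

// build returns the steps of the manifest-first flow for the given
// file contents.
func build(goMod, source string) []step {
	return []step{
		{"COPY go.mod go.sum ./", goMod},
		{"RUN go mod download", ""},
		{"COPY . .", source},
		{"RUN go build", ""},
	}
}

func main() {
	before := layerKeys(build("module app", "package main // v1"))
	after := layerKeys(build("module app", "package main // v2"))
	for i := range before {
		fmt.Printf("step %d reused: %v\n", i+1, before[i] == after[i])
	}
}
```

In this model, a source-only change leaves the first two steps reusable and invalidates the last two, so any compilation that happens after COPY . . is repeated on every build.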

Here's an example repo: https://github.com/wedow/docker-go-build

To see the issue we're having, clone it and run docker build ., add a comment or something to main.go and run docker build . again. Ideally all deps would be built and cached prior to the COPY . . step and the final go build would be a sub-second operation.

bcmills added this to the Go1.12 milestone Sep 18, 2018

Member

myitcv commented Nov 13, 2018

I think what you're after here is:

go list -export $(go list -m)/...
The -export flag causes list to set the Export field to the name of a
file containing up-to-date export information for the given package.

This will populate the build cache (go env GOCACHE) with the results of compiling for the -export flag. The module cache ($GOPATH/pkg/mod) as you say contains the module-related caches.

If you want to install main packages too then:

go install $(go list -f '{{ $ip := .ImportPath}}{{if eq .Name "main"}}{{$ip}}{{end}}' $(go list -m)/...)

Member

bcmills commented Nov 13, 2018

> go build all can be used to compile these sources but instead of relying on go.mod contents, it requires my application source to be present to determine which deps to build.

Yes, that is working as designed: in module mode, all refers to the transitive imports of the packages in the main module, not the packages in its module dependencies. That's not going to change.

> This causes a cache invalidation on every code change and renders the step useless.

If the code changes are only in your .go source files, then only the cache entries for the packages containing those source files should be invalidated: the cache contents for the other transitive dependencies should be unaffected.

The build artifact cache is separate from the module cache: the former is controlled by GOCACHE (and defaults to a go-build directory under the user cache directory, typically $HOME/.cache/go-build on Linux), while the latter is the pkg/mod subdirectory of the first entry in GOPATH. You may need to set the GOCACHE environment variable to make sure it is within the container; see Build and test caching for details.

Can you confirm that both the build cache and the module cache are present and populated in your docker image after the first go build all?

bcmills changed the title from "cmd/go: go action all ignores go.mod file" to "cmd/go: unclear how to cache transitive dependencies in a Docker image" Nov 13, 2018

bcmills changed the milestone from Go1.12 to Go1.13 Nov 13, 2018

Author

wedow commented Nov 29, 2018

Thanks guys, I think there may be some confusion about which caches are being affected and when.

The issue is in how docker caches layers after each operation. When my source files are changed, all side effects which occur after the COPY . . line (such as populating GOCACHE) are lost. Those changes are isolated in a layer which has been invalidated and must be fully rebuilt.

The go list -export $(go list -m)/... command works great for populating GOCACHE but since it must come after COPY . ., it must be fully re-run during every build. go build all also has this issue.

I'm looking for a command which can compile the dependencies listed in the go.mod file in isolation so that it can occur before that COPY . . line that adds our sources to the container. Totally understand if that's not possible with the current module system. I may just experiment with parsing and building the deps separately.
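A minimal sketch of the "parse the deps separately" idea follows, assuming a small line-based reader is good enough for a simple go.mod. The function name `requiredModules` and the example module list are hypothetical. Note that go.mod lists modules, not packages, so this alone does not say which packages inside each module the application actually imports.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// requiredModules extracts "path version" pairs from the require
// directives of a go.mod file. It is a deliberately small line-based
// reader, not a full go.mod parser.
func requiredModules(goMod string) []string {
	var mods []string
	inBlock := false
	sc := bufio.NewScanner(strings.NewReader(goMod))
	for sc.Scan() {
		line := sc.Text()
		// Drop trailing comments such as "// indirect".
		if i := strings.Index(line, "//"); i >= 0 {
			line = line[:i]
		}
		fields := strings.Fields(line)
		switch {
		case len(fields) == 0:
			continue
		case inBlock && fields[0] == ")":
			inBlock = false
		case inBlock && len(fields) >= 2:
			mods = append(mods, fields[0]+" "+fields[1])
		case fields[0] == "require" && len(fields) == 2 && fields[1] == "(":
			inBlock = true
		case fields[0] == "require" && len(fields) >= 3:
			mods = append(mods, fields[1]+" "+fields[2])
		}
	}
	return mods
}

func main() {
	// Illustrative go.mod content, not taken from the example repo.
	const example = `module example.com/app

require (
	github.com/pkg/errors v0.8.0
	golang.org/x/text v0.3.0 // indirect
)

require github.com/google/uuid v1.1.0
`
	for _, m := range requiredModules(example) {
		fmt.Println(m)
	}
}
```

For anything beyond a quick experiment, a real go.mod parser such as the golang.org/x/mod/modfile package is likely a safer choice than hand-rolled line splitting.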

Member

myitcv commented Dec 1, 2018

> The go list -export $(go list -m)/... command works great for populating GOCACHE but since it must come after COPY . ., it must be fully re-run during every build

I'm unclear why you say it must come after the copy - please can you explain?

> I'm looking for a command which can compile the dependencies listed in the go.mod file in isolation so that it can occur before that COPY . .

go list -export $(go list -m)/... should be all you need here. But let's first unravel the question above.

Member

bcmills commented Dec 1, 2018

@myitcv, note that -export may at some point do less than a full build. I don't think it's a perfect fit for the use-case.


hinshun commented Jan 18, 2019

You have to export both GOCACHE and GOPATH/pkg/mod:

Example:

FROM golang:1.11-alpine AS mod
RUN apk add -U git
WORKDIR /src
COPY go.mod .
COPY go.sum .
RUN go mod download

FROM golang:1.11-alpine
COPY --from=mod $GOCACHE $GOCACHE
COPY --from=mod $GOPATH/pkg/mod $GOPATH/pkg/mod
WORKDIR /src
COPY . .
RUN go build

alexwh added a commit to BEANSQUAD/paul-bot that referenced this issue Jan 19, 2019

revert to a single stage build
this has the disadvantage of being a large image (because it's based on
golang:alpine and not alpine, plus the fact that building the project
creates various cache files), though these cache files are the things we
need in order to speed up compiling the project on the main go build
line. most of the compilation time is spent on the dependent libraries
(i.e. discordgo, crypto, stdlib, etc).

there is not a functional method of compiling dependencies from a bare
go.mod and go.sum file[1], you must have a valid go project in the
directory for go build all to work, at which point the docker layer
cache has been invalidated by `COPY . /app`. the proposed
`go list -export $(go list -m)/...` does not compile all dependencies
either, showcased by checking the size of $GOCACHE before and after
running `go build all`

without doing funky stuff like bind-mounting a volume container into the
build container[2], inflating the image size for faster compiling seems
to be the best tradeoff, as the image will only stay local anyway

[1] golang/go#27719
[2] https://github.com/banzaicloud/docker-golang


dbudworth commented Feb 16, 2019

@myitcv the go list trick only works if you have your source present
The way we avoid re-downloading all deps is to simply copy over go.mod and go.sum then run go mod download which creates the package source cache, but does not create the compiled cache of the modules.

so we're looking for a way to get the stuff listed in go.mod compiled and placed in ~/.cache before we copy all the project source over; this lets us avoid the lengthy re-compile of our deps on each build

think of it as a 2 phase build
phase 1: copy go.mod, download and (hopefully) compile deps
phase 2: copy project source and compile our stuff against phase 1 cached stuff

Author

wedow commented Feb 16, 2019

@dbudworth it doesn't really seem possible to do what we're looking to do with the currently available tooling. I came up with a hacky workaround to get the results I was looking for and just updated my example repo to illustrate it.

The basic idea is the use of a dummy import file which can trigger the compilation of dependencies when run through go build. This file is added with go.mod to the docker image, then compiled to prime the cache, then removed before adding the real application source files.

While I'd much prefer a way to compile dependencies separate from application code as part of the official toolchain, this method does dramatically reduce subsequent docker image build times for our project and has really sped up our CI process.
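For readers who want a feel for the dummy import file, here is a minimal sketch of a generator for one. It is an assumption about the general shape of such a file, not a copy of what the example repo does; the function name `dummyImports` and the package list are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// dummyImports renders a Go source file whose only purpose is to
// blank-import the given packages so that "go build" compiles them.
func dummyImports(pkgs []string) string {
	sorted := append([]string(nil), pkgs...)
	sort.Strings(sorted)

	var b strings.Builder
	b.WriteString("package main\n\nimport (\n")
	for _, p := range sorted {
		fmt.Fprintf(&b, "\t_ %q\n", p)
	}
	b.WriteString(")\n\nfunc main() {}\n")
	return b.String()
}

func main() {
	// Hypothetical dependency packages; a real list would mirror the
	// imports used by the application.
	fmt.Print(dummyImports([]string{
		"github.com/pkg/errors",
		"github.com/google/uuid",
	}))
}
```

The generated file would be copied into the image together with go.mod and go.sum, built once to prime the cache, and removed before the real sources are added. The list has to contain importable package paths (not main packages), and it needs regenerating whenever the application's imports change.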


dinvlad commented Feb 28, 2019

Same issue here. The go build step depends on main.go and also compiles the vendor dependencies, which means that every time we change main.go, it recompiles both main.go and all of the vendor dependencies. The only way around that for now appears to be @wedow's workaround of a dummy_main.go that includes dummy imports of all vendor dependencies. We run go build on that file first, and only then COPY/ADD main.go and go build it (this later go build reuses the deps pre-compiled by the previous go build).

This would be somewhat easier to handle if docker build supported a -v option so we could mount a "compilation cache" directory at build time.
