
Proposal: DAG image builder (can execute multi-stage build in parallel) #32550

Closed
AkihiroSuda opened this issue Apr 12, 2017 · 14 comments
Labels
area/builder kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. roadmap

Comments

@AkihiroSuda
Member

Proposal

Beginning with 17.05, Docker supports "multi-stage" Dockerfiles.

My proposal is to convert such a Dockerfile into a DAG internally and execute it in parallel.

[image]

No change to the file format or UX.

A POC is available: #32402

Tasks

Implement generic DAG utility

I made a small generic DAG package: https://github.com/AkihiroSuda/go-dag

import (
	"github.com/AkihiroSuda/go-dag"
	"github.com/AkihiroSuda/go-dag/scheduler"
)

// buildStage is assumed to be defined elsewhere.

g := &dag.Graph{
	Nodes: []dag.Node{0, 1, 2},
	Edges: []dag.Edge{
		{Depender: 2, Dependee: 0}, // node 2 depends on node 0
		{Depender: 2, Dependee: 1}, // node 2 depends on node 1
	},
}
concurrency := 0
scheduler.Execute(g, concurrency, func(n dag.Node) { buildStage(n) })

If this design is correct, I think we can vendor this package (or just copy it into github.com/docker/docker/pkg/dag).

Determine Dockerfile DAG granularity

Alternatively, we could use a fine-grained DAG like this, but I'm -1 ATM because it is likely to cause implementation issues:

[image]

Refactor builder pkg to create DAG

My previous POC (#32402) was implemented in a weird way:

  • builder/dockerfile/parser (unchanged): parses the Dockerfile text and returns a parsed tree structure
  • builder/dockerfile/parallel: re-parses the output from builder/dockerfile/parser and creates a DAG

Re-parsing the output from builder/dockerfile/parser is likely to cause implementation issues; in fact, I forgot to implement support for ARG.

Rather, we should refactor builder/dockerfile/parser to emit a DAG directly.

Investigate other use cases for the DAG

#32402 (comment)

The DAG-based execution engine should be the new core of the builder, providing not only concurrent stage execution but also concurrent build jobs, more efficient processing and cache reuse, cache export/import for DAG branches, etc. At the same time, this core should be separated from the frontend (Dockerfile) logic, to provide more options for declaring the build definition and more extension points for others to reuse this solver.

Investigate why docker build is slow

The storage driver seems to be the bottleneck, but I haven't looked into this deeply.

#9656
#32402 (comment)
#32402 (comment)

cc @tonistiigi @dnephin @aaronlehmann @vdemeester @cpuguy83 @simonferquel @thaJeztah

@AkihiroSuda AkihiroSuda added the kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. label Apr 12, 2017
@simonferquel
Contributor

I would prefer to have a fine-grained DAG, possibly even more fine-grained than a direct equivalent of the Dockerfile commands (opening up optimizations like auto commit squashing, squashing of multiple ENV/COPY calls, more precise dependency management enabling more parallelism, etc.).

For example, if you take

FROM img AS foo
<...>
FROM img2
RUN <...>
COPY --from=foo /foo /bar

we could run the foo and img2 contexts in parallel until we actually need to COPY from foo.
It would also be very useful to have this fine-grained DAG observable, so that we can do a pre-pass with semantic validation and repo list extraction (very useful for the upcoming on-demand credential streaming).

@thaJeztah
Member

A fine-grained DAG could be a continuation of this proposal? I see benefits in having that, but also agree that adding complexity at the start may not be wise; happy to hear from others though!

@duglin
Contributor

duglin commented Apr 12, 2017

While this looks like an interesting optimization, given the newness of some of the features this requires (like multi-stage builds), I'd prefer to wait until we know how well people react to and use the existing new build features before we introduce something like this.

@tonistiigi
Member

In addition to my comments in #32402 (comment), I wrote a proposal for a direction this could take: https://gist.github.com/tonistiigi/059fc72c4630f066d94dafb5e0e70dc6 . Please note that it is only a proposal; nobody is developing it at the moment.

I think the correct way to get started on this would be:

  • Fix builder dependency problem builder: clean up builder dependancy interface #32904
  • Implement parsing of build stages so they can be preprocessed and analyzed
  • Remove linear execution of build stages. Make sure unnecessary stages are skipped.
  • Make sure multiple builder invocations don't duplicate workloads.

I think specifically skipping unnecessary stages is a good goal to tackle before working out what configuration of parallelization would be optimal. It would also make it possible for #32487 to build images for multiple operating systems.
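The "skip unnecessary stages" step above amounts to a reachability walk over the stage DAG from the requested target: anything the target does not (transitively) depend on can be dropped. A minimal sketch, with an assumed `deps` map and `reachable` helper (illustrative names, not the builder's API):

```go
package main

import "fmt"

// reachable returns the set of build stages actually needed for target,
// so stages that appear earlier in the Dockerfile but are unused can be
// skipped instead of being built linearly.
func reachable(target string, deps map[string][]string) map[string]bool {
	need := map[string]bool{}
	var visit func(string)
	visit = func(s string) {
		if need[s] {
			return
		}
		need[s] = true
		for _, d := range deps[s] {
			visit(d)
		}
	}
	visit(target)
	return need
}

func main() {
	deps := map[string][]string{
		"final":   {"builder"},
		"tests":   {"builder"}, // not needed when building "final"
		"builder": nil,
	}
	fmt.Println(reachable("final", deps)["tests"]) // false: tests stage is skipped
}
```

The same walk is what lets a multi-platform build (as in #32487) select only the stages relevant to each target OS.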

@AkihiroSuda or anyone else, if the above looks good to you, let me know if you would be interested in helping with any of this or #32904 (or actually any builder proposal). We can do a quick design discussion or live design doc for the specific items.

@AkihiroSuda
Member Author

@tonistiigi

Thank you a lot for working on this, SGTM 👍.

Connection with docker/docker project

Now this section would need to be rewritten for "docker product" and "moby project" (cc @shykes @moby/moby-maintainers)

Sources

With regard to the "git repositories, http archives" sources, do you plan to add more remote sources? (Then it might be good to consider more modularization.) Or are they just there for compatibility with the current implementation?

Snapshots

I wonder if we can just use containerd snapshot drivers for simplicity.
Non-standard drivers could be implemented as third-party containerd plugins.

Cache import/export

👍

Also, it might be good to consider P2P caching here? (Scope of docker/distribution?)

@tonistiigi
Member

@AkihiroSuda

With regard to the "git repositories, http archives" sources, do you plan to add more remote sources? (Then it might be good to consider more modularization.) Or are they just there for compatibility with the current implementation?

BuildKit itself should be quite open to adding new sources. docker build would probably ship with a stricter set. BuildKit at its core is a library, so if you are thinking of having pluggable sources, that would just need to be implemented by a tool that includes it.

I wonder if we can just use containerd snapshot drivers for simplicity.
Non-standard drivers could be implemented as third-party containerd plugins.

We definitely want to rely on containerd snapshot drivers for the actual implementation. One of the issues is that for the docker build use case we need to share the snapshots with other components (like docker images). So currently I see it as a thin wrapper around containerd drivers that provides reference-counting capabilities.
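The reference-counting wrapper described above can be sketched as follows. These are hypothetical types for illustration, not the actual containerd or moby API: the point is only that the underlying snapshot is released once the last sharer (builder, image store, ...) drops its reference.

```go
package main

import (
	"fmt"
	"sync"
)

// refSnapshot wraps a snapshot with a reference count: the underlying
// resource is freed only when every component sharing it has released
// its reference. Illustrative sketch; not a real containerd type.
type refSnapshot struct {
	mu      sync.Mutex
	refs    int
	release func() // frees the underlying snapshotter resource
}

func (s *refSnapshot) Acquire() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.refs++
}

func (s *refSnapshot) Release() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.refs--
	if s.refs == 0 {
		s.release()
	}
}

func main() {
	freed := false
	s := &refSnapshot{release: func() { freed = true }}
	s.Acquire() // builder holds the snapshot
	s.Acquire() // image store shares it
	s.Release()
	fmt.Println(freed) // false: still referenced by the image store
	s.Release()
	fmt.Println(freed) // true: last reference released
}
```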

Also, it might be good to consider P2P caching here?

Agree, but one problem at a time.

I think I'll open up a new issue with that proposal, even if the code for it would not end up in this repo, so we don't have to share the gist. We can discuss there whether we are ready to open a new repo, etc.

@AkihiroSuda
Member Author

Should we close this issue for housekeeping purposes?

Now we have the buildkit repo: https://github.com/moby/buildkit

@dnephin
Member

dnephin commented Jul 4, 2017

Let's keep it open to track this issue in this repo, since the builder is still here.

@AkihiroSuda
Member Author

ok

@AkihiroSuda
Member Author

Now BuildKit supports converting Dockerfile to LLB DAG 🎉
moby/buildkit#106

@vdemeester
Member

@tonistiigi @AkihiroSuda should we close this one, as buildkit supports it (mainly so we don't forget to close it once buildkit is integrated into moby)?

@AkihiroSuda
Member Author

This is being integrated into Docker/Moby via PR #37151 🎉

@vdemeester Probably after merging the PR?

@vdemeester
Member

@AkihiroSuda yes 👍

@AkihiroSuda
Member Author

Let's close this one as #37151 has been merged now 🎉


9 participants