# Multi-stage Build Issues #4246
Comments
I think moby/moby#32100 would fix this
Would be fixed by moby/moby#32507 and moby/moby#32904 . Copy and many other metadata operations can be implemented without creating any layers.
This seems like a really specific use case, and I don't think it reflects the general problem that is solved by multi-stage builds. I would personally put those 4 into 4 separate Dockerfiles. They are building different applications, not a single one. You could use …
I'll look into this.
As far as I can tell from exploring …: notice the creation of a new layer for each …
I disagree. One of the primary objectives of multistage builds is the separation of build time concerns from the run time image. Essentially, the example manufactures three different artifacts needed by the run time image using three different stages that focus exclusively on providing the environments needed to construct each artifact. Once finished, the last stage transfers the artifacts (golang executables) from their no longer necessary build environments, combining them to create the final run time image.

At a minimum, at least 2 stages are required when a run time artifact must be built, instead of simply copied from the Build Context. In this situation, the first stage is polluted by the build environment needed to construct the run time artifact, while the second stage extracts the constructed run time artifact from its build environment by transferring it into the run time image. Therefore, it's not unreasonable to expect scenarios where more than one run time artifact must be built to satisfy the run time image requirements.

Finally, one could easily create another example involving the building of two dynamic C++ libraries, with a third stage creating an executable that depends on them. This can be accomplished by furnishing the appropriate source code and substituting the ONBUILD golang images with corresponding ONBUILD C++ ones. Any feedback regarding the Issue: Ignores Aggregate Build Context?
Why are intermediate layers a problem? Layers from a previous stage are not in the final stage, so it shouldn't matter how many layers you have in intermediate stages. You can grab them all as a single layer in the final stage. Also, moby/moby#32904 will allow for …
This does seem reasonable, and I believe that works fine, as you demonstrate in your example.
I don't really see the issue. You can do something like this to append to merge contexts:
You can filter by starting from a fresh base and using … Also …
Agreed, there shouldn't be a problem. However, I currently don't know how to "grab them all as a single layer". As far as I can tell, two …
How would one write the golang example without rewriting the golang ONBUILD image?
This line from your example should accomplish that. It will be a single layer in the final image:
That golang example is already working, right? So is the problem that it's so verbose, and that each stage seems to be very similar?
Yes, of course, within the example …. However, according to what I now understand from our posts, once moby/moby#32904 is merged, one should be able to eliminate the required extra stage and simply issue four independent …
Therefore, the above should generate exactly 2 layers:
Let me know if my understanding above is incorrect.
Yes. The Example: Current Multistage Design should work.
Yes & Yes. Due to the Aggregate Build Context and the lack of mechanisms to partition/map it so each stage can be defined with its own Local Build Context, one can't use the current golang on-build trigger image to implement any stage. There are essentially two reasons for the repetitive code: …
It makes it possible for the builder to squash these layers, but we do not want to do that. You should not care about the number of layers, and in the future not even know how many layers there were. Multiple layers that don't share contents do not perform any worse than a single one. In that case, multiple layers perform much better, as they can reuse the data from previous builds. Checking for deduplication is a separate issue. If these copies share sources, then that is not how multi-stage builds should be used.

What you are asking for should basically be …
Thanks for reminding me, as the original reason for eliminating layers was to flush build time artifacts from the run time image. Since multistage builds properly separate build and run time concerns, you're right, layer count doesn't matter.
I believe I understand this reference. For me, the Build Context represents the essential abstraction for resolving a stage's file path references. A stage acquires input artifacts for its transforms and shares output artifacts via its Build Context. It's just a simple file system whose content and structure are unique to a given stage. To remain simple, the file paths do not directly expose concepts of an image or stage reference. Therefore, in order to include other abstractions like image or stage file paths, these abstractions must be mapped to a Build Context file path.

This is analogous to how the Unix file system works. In Unix, network files, in-memory file systems, RAID arrays, ... can be mounted into the local file system, permitting processes to read and write to these hidden abstractions using simple file path references to the local file system, concealing the complexity of where/how these files are actually stored. Additionally, the simple file path references present a static interface that can be rebound to a different hidden abstraction. For example, a simple file path reference can be bound to a RAID array, then rebound to another hidden abstraction, like an in-memory file system. After rebinding, the processes referencing this file path wouldn't know/care about the change.

So what's my point? I would suggest eliminating the notion of stage/file references from … A couple of final points: …
What you are calling context isn't really any different from any of the other sources that build can use, like images, stages, tar archives, git repos. It is just the source that happens to contain the files from the working dir of the client. An important property of these sources, one that makes the core of the builder work, is that they are all immutable.
Exactly the point! A Build Context is an abstraction, just like the *nix file system is an abstraction allowing various kinds of resources to present themselves as simple file path(s) that can be traversed, read, renamed, ... using a standard interface. Therefore, instead of limiting the notion of a "Build Context" to the concrete definition, the "source that happens to contain the files from the working dir of the client", extend it to include images, stages, tar archives, and git repos by reflecting these things as Build Context file paths. Prior to the introduction of …
What's suggested by … Finally, …
In addition to my initial reply, the below discusses a few more reasons why the suggested workaround is problematic:
All the issues above apply to the provided workaround, due to the lack of declarative and flexible mapping mechanisms offered by …
Came here to complain about the lack of global args w/ multistage builds... and WhisperingChaos's critique did not disappoint! 5/5, will subscribe.
Thank you for your kind compliment of my critique, although I'm not quite sure what you expect to experience by subscribing to this thread. I do appreciate that the core maintainers/developers were polite enough to respond to my arguments, given the competition for their time between responding to the other community posts and their driving desire to improve Docker through actually writing code. However, it's evident to me that even if they suspected the validity of some of the technical arguments presented above, they believe the already encoded multi-stage mechanisms address the concerns well enough for the common use cases.
#### TL;DR
The current semantics of `--from` intrinsically induce pathological coupling between build stages. Its intimate binding to build stage implementation opposes the principle of encapsulation necessary to permit reuse, as well as to reason, in isolation, about an individual stage's behavior. By defeating encapsulation, `--from` thwarts applying current Dockerfile reuse features, such as `ONBUILD`, and inhibits the introduction of future reuse mechanisms.

To avoid the harmful traits associated with `--from`, the existing Build Context abstraction should be adapted so its content can be extended by mounting a stage's image file path into it, instead of introducing the new stage/image reference concept to Dockerfile development. By extending its content and introducing a mapping mechanism to the existing Build Context abstraction, the `--from` syntax can be eliminated, current reuse features restored, and the introduction of new reuse mechanisms unencumbered.

TOC
- `--from` Issues
- `ONBUILD` Triggers
- Extra Build Stage & Redundant `COPY`ing (*extra layers are OK, see comment*)

#### Issue: Tight, Pathological Coupling
The design of `--from` ensures the `COPY` instruction tightly couples itself to the implementation of other build stages. Tight coupling results from `--from`'s purposely crafted facility to directly reference artifacts of other build stages, within a given Dockerfile, by stage names/positions and their physical locations (paths) in those other images.

This pathological coupling, encouraging the internals of any build stage to intimately bind themselves to any other stage within a Dockerfile, eliminates the interface boundary between stages. This absence of an interface boundary negates encapsulation, prohibiting human developers and algorithms from considering an individual build stage as a "black box" when defining or analyzing its behavior.
The issue expresses itself by, for example, defeating Dockerfile reuse mechanisms (`ONBUILD`).

#### Issue: Precludes ONBUILD Trigger Support
`ONBUILD` trigger support enables a developer to declaratively encode an image's transform behavior: operations responsible for converting a set of input artifacts to output ones. This declarative code includes a specification of an input interface followed by command(s) that execute a transform. The input interface definition emerges from the union of source file artifact (directory/filename) references specified by the triggered `ADD`/`COPY` Dockerfile commands and is statically defined during the construction of the `ONBUILD` image, while the transform consists of one or more `RUN` commands.

##### Example
Create a golang compiler image that executes `ONBUILD` commands to automatically produce a golang executable image but not run it. Define the input interface, the path to copy golang source file(s) from the compiler image's Build Context, as `/golang/app`. Name the compiler image `exgolang`. Create the Dockerfile for this image by modifying a copy of the Docker Hub golang:1.7-onbuild image Dockerfile.

Dockerfile Contents:
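A sketch of such a Dockerfile, assuming it follows golang:1.7-onbuild with the input path changed to `/golang/app` and the automatic run removed (the exact contents here are an assumption, not the author's original file):

```dockerfile
# Hypothetical reconstruction of the exgolang Dockerfile.
FROM golang:1.7

RUN mkdir -p /go/src/app
WORKDIR /go/src/app

# Input interface: golang source is copied from the Build Context
# path /golang/app, as described above.
ONBUILD COPY /golang/app /go/src/app
# Transform: download dependencies and compile, producing /go/bin/app.
ONBUILD RUN go-wrapper download
ONBUILD RUN go-wrapper install
```

Built once with `docker build -t exgolang .`, the triggers then fire in any downstream build whose Dockerfile begins `FROM exgolang`.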
To reuse the defined trigger behavior, simply encode a `FROM` statement that references the image name (`FROM exgolang`) configured with `ONBUILD` commands. By promoting the DRY principle, `ONBUILD` triggers dramatically increase an image's build time utility, reliability, and adaptability while simultaneously eliminating or greatly decreasing the code required to employ this image in other Dockerfiles by other developers. Given this understanding, an `ONBUILD` trigger definition is remarkably akin to a function definition.

##### Example
Using the `exgolang` image created above, generate a golang `server` executable from source `server.go` located in `/golang/app/`.

Build Context

Dockerfile

Docker build command:
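For concreteness, a minimal setup matching this description might be (the tree layout and image tag are assumptions):

```
Build Context:
.
├── Dockerfile
└── golang/
    └── app/
        └── server.go

Dockerfile:
FROM exgolang

Build command:
docker build -t server .
```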
The single-instruction Dockerfile above, when executed by `docker build`:

- copies the Build Context's `/golang/app` directory into the image directory `/go/src/app`;
- builds `/go/bin/app` from the `server.go` that resides in the resultant image's file system.

As described and demonstrated by example, images incorporating `ONBUILD` statements are analogous to function definitions. This similarity extends to the equivalence of an `ONBUILD` image's input interface to a function's parameter list. As in the case of a function parameter list, an `ONBUILD` image's body, the series of `ONBUILD` statements, binds (couples) to the file paths referenced by each instruction, just like statements within a function body bind to its parameters. For example, the `COPY` issued by the trigger statement `ONBUILD COPY /golang/app /go/src/app` binds to the source file path `/golang/app`. This file path, `/golang/app`, is equivalent to a parameter defined for a function and performs a similar role, as it represents an interface element. Given this equivalence, why isn't there a mapping mechanism, like the one implemented for functions, that maps arguments specified by an invocation statement to parameters?

When formulating `ONBUILD` support, the design avoided implementing an argument-to-parameter mapping mechanism on the trigger invocation statement: `FROM`. Although this mapping mechanism is intrinsic to function invocation, I speculate that, at the time trigger support was implemented, the multistage build feature was a distant, future consideration. Meanwhile, the limitation of a single-stage Dockerfile masked this issue, as the Build Context could be structured to mirror the input interface required by a single stage's `ONBUILD` triggers. In other words, the Build Context file path (argument) names exactly match the (parameter) names required by the `ONBUILD ADD`/`COPY` instructions. However, introducing multistage builds starkly silhouettes the absence of an argument-to-parameter mapping mechanism.

Multistage support forces the once "elemental" Build Context, whose content and structure was dictated by the needs of a single `FROM`, to become a composite one that must comply with the dependencies of two or more `FROM` statements. Since the problems inherent to the transformation from an elemental to a composite Build Context diminish not only trigger support but also affect non-trigger statements that follow a `FROM`, their discussion occurs in the topic Issue: Ignores Aggregate Build Context below. Besides this issue of composite Build Contexts, the pathological coupling introduced by `--from` impedes applying `ONBUILD` triggers. `COPY` trigger instructions are currently bound at the time of their creation to a Build Context file path. If `COPY` were to include `--from`, which stage name/position should it bind to, given that it would have to resolve the stage name within the context of all other existing and future Dockerfiles? Unfortunately, without introducing another mechanism to rebind the source file path references specified by `ONBUILD COPY` instructions within the scope of its invocation, it's very difficult within a multistage Dockerfile to reuse existing trigger-enabled images once, let alone twice.

#### Issue: Ignores Aggregate Build Context
Since the Dockerfile semantics before incorporating multistage assumed a single `FROM` statement, the expected Build Context reflected only those source artifacts located in the directory structure required by the `ADD`/`COPY` commands immediately following `FROM`. Incorporating many `FROM` statements within a single Dockerfile requires a means to initially compose/aggregate the Build Context from the more elemental ones needed by each `FROM`, then partition this composite/aggregate to supply the specific (elemental) Build Context expected by an individual `FROM` (stage).

##### Example
Using the `exgolang` image created above, attempt to generate three golang server executables from an Aggregate Build Context. Note, issues related to partitioning the Aggregate Build Context are broadly applicable to any multistage Dockerfile, without regard to its use of `ONBUILD`.

Build Context

Dockerfile

Docker build command:
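A hedged sketch of such a Dockerfile (stage names are assumptions): three stages reusing `exgolang` against one Aggregate Build Context.

```dockerfile
# All three stages receive the SAME Aggregate Build Context, so every
# stage's ONBUILD COPY trigger resolves to the same source files --
# there is no mechanism to give each FROM its own elemental context.
FROM exgolang as server
FROM exgolang as logger
FROM exgolang as health
```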
Unfortunately, the multistage build design ignores addressing Aggregate Build Context issues by failing to provide a mechanism that both partitions and restructures the Aggregate Build Context to supply the elemental Build Context needed by a specific `FROM`. Therefore, executing the above `docker build` command copies the same golang source `/server/golang/app/server.go` into three distinct images, runs the compiler, and generates the same `server` executable, writing it to each image's `/go/bin` directory.

Additionally, when incorporating stages referencing `ONBUILD` triggers, current multistage Dockerfile support not only inhibits their use but, when "it works", the outcome can be dangerous, especially when the trigger assumes a Build Context interface of "." (an everything interface), as in `COPY . /go/src`. In this situation, the entire Aggregate Build Context would be accessible to any stage, thereby polluting an individual stage's source artifact set with artifacts from all other stages.

#### Issue: Complexity due to added Dockerfile abstractions
Any worthwhile program must apply coupling to map its abstractions to an implementation. However, it's important to minimize coupling whenever possible. One method to reduce coupling relies on limiting the abstractions required to only the essential ones applicable to realize the encoded algorithm's objective.
The purpose of a Dockerfile is to provide the scaffolding needed to deliver source artifact(s) to a transform that then produces output artifact(s). Since the transforms, executed by the `RUN` command, rely on reading and writing files within a file system, the source artifacts must eventually be mapped as files within a file system. Perhaps due to a desire to align with this necessity, the Build Context abstraction responsible for providing source artifacts was also designed to represent source artifacts as files within a file system. This design choice, matching the representation of the Build Context with the one required by the underlying transforms (files in a file system), resulted in Dockerfile commands, like `COPY`, whose syntax and behavior nearly mirror those of a corresponding OS command, such as `cp`, and facilitated Dockerfile adoption by leveraging a developer's existing understanding of it.

The introduction of `COPY --from` adds a new abstraction, the stage/image reference, to Dockerfile coding. This additional abstraction necessitated changing `COPY`'s interface and weaving the resolution of stage/image references into its implementation so `COPY`'s binding mechanisms could differentiate between Build Context and other stage/image sources. Besides adding some complexity to applying `COPY`, introducing the stage/image reference abstraction imposes implications for features that rely on `COPY`'s behavior. When assessing these implications, one hopes for beneficial or neutral outcomes. However, in this situation, the rigid binding of `--from` to a particular stage/image precludes the use of `COPY --from` in any current reuse mechanism, such as `ONBUILD`, or future one. This negative outcome not only prevents reuse mechanisms, like `ONBUILD`, from referencing other stages/images but also diminishes the utility of `--from`, as it can't be applied in all valid contexts of the `COPY` instruction.

An oft-cited strength of Unix-derivative OSes is their insistence on mapping various abstractions, like hard drives, IPC, ..., to a file. Therefore, instead of adding complexity by creating a corresponding concrete OS concept for each supported device/abstraction, which in many cases would only offer a slightly different interface, Unix designers mapped new abstractions (especially devices) to a single one: the file. Once mapped, the majority of the code written to manage/manipulate this single abstraction (the file) immediately applies to the new one. Since image/stage references are essentially file path references, perhaps, in lieu of explicitly exposing `--from`'s stage/image reference abstraction, it should be mapped to an existing abstraction: the Build Context.

Recasting the stage/image references as file paths in the Build Context confers the following benefits:
- Eliminates the `--from` option.
- `COPY` reverts to its prior, simpler syntax.

#### Issue: Extra Build Stage & Redundant COPYing

If the objective of a multistage build is the creation of a single layer representing a runtime image, the current semantics of `COPY --from` require an extra build stage and redundant `COPY`ing when the resultant build artifacts must be assembled from more than one build stage or image.

##### Example

Applying the current semantics of `COPY --from`, create a golang webserver, whose stdout and stderr are redirected to a remote logging facility, as a single layer in the resulting image.

```dockerfile
FROM golang:nanoserver as webserver
COPY /web /code
WORKDIR /code
RUN go build webserver.go

FROM golang:nanoserver as remotelogger
COPY /remotelogger /code
WORKDIR /code
RUN go build remotelogger.go

# extra build stage and physical copying due to semantics of COPY --from
# in order to generate a single layer in the next build stage
FROM scratch as extra_redundant_copying
COPY --from=webserver /code/webserver.exe /redundant/webserver.exe
COPY --from=remotelogger /code/remotelogger.exe /redundant/remotelogger.exe
COPY /script/pipem.ps1 /redundant

FROM microsoft/nanoserver
COPY --from=extra_redundant_copying /redundant /
CMD ["\\pipem.ps1"]
EXPOSE 8080
```

The above situation generalizes to N extra build stages and X redundant copy operations when there's a desire to create a resultant image of N layers where each layer requires artifacts from more than a single stage.

Recommendations:
- Eliminate `--from` as an option to `COPY`.
- Introduce a new instruction, `CONTEXT`, that mounts the desired Aggregate Build Context file paths, similar to the `docker run -v` option, into the Build Context created for an individual stage.
- Introduce a new instruction, `MOUNT`, that's analogous to `CONTEXT`. However, `MOUNT` mounts an image's file path into the Aggregate Build Context instead of mounting it into the stage's local Build Context.

Applying the recommendations above, when compared to the currently implemented multistage design:
- Restores reuse mechanisms such as `ONBUILD` triggers.
- Builds on the `CONTEXT` and `MOUNT` instructions proposed by the links referenced above.

#### Comparison: Current Multistage Design vs. Recommended
The examples below concretely contrast, through the encoding of the same scenario, the benefits offered by the recommended approach when compared to the existing multistage design.
##### Scenario
Using already available Docker Hub images, construct a container composed of three independent golang executables. One executable implements a webserver, another a logging device that relays messages to a remote server, while the third reports on the webserver's health.
##### Initial Build Context
The initial Build Context common to both examples.
Build Context (initial aggregate/global context)
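A plausible layout for this context, inferred from the stage descriptions in the Explained section below (the exact paths are assumptions):

```
.
├── Dockerfile
├── webserver/
│   └── golang/app/server.go
├── logger/
│   └── golang/app/server.go
├── health/
│   └── golang/app/server.go
└── script/
    └── script.sh
```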
##### Example: Current Multistage Design
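A hedged sketch of what the current-design Dockerfile could look like (stage names and paths are assumptions; plain `golang:1.7` stages are written out manually, since the `exgolang` ONBUILD triggers can't be remapped to per-stage sources):

```dockerfile
FROM golang:1.7 as webserver
COPY /webserver/golang/app /go/src/app
WORKDIR /go/src/app
RUN go build -o /go/bin/app

FROM golang:1.7 as logger
COPY /logger/golang/app /go/src/app
WORKDIR /go/src/app
RUN go build -o /go/bin/app

FROM golang:1.7 as health
COPY /health/golang/app /go/src/app
WORKDIR /go/src/app
RUN go build -o /go/bin/app

FROM alpine
# --from couples this stage to the implementation details (stage names
# and internal paths) of every stage above -- the critique's core point.
COPY --from=webserver /go/bin/app /bin/webserver
COPY --from=logger /go/bin/app /bin/logger
COPY --from=health /go/bin/app /bin/health
COPY /script/script.sh /start.sh
CMD ["/start.sh"]
```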
##### Example: Recommended Multistage Design
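A hedged sketch using the proposed `CONTEXT` and `MOUNT` instructions (hypothetical syntax, not implemented in Docker; mappings are chosen to satisfy `exgolang`'s `/golang/app` input interface):

```dockerfile
# Hypothetical syntax: CONTEXT maps an Aggregate Build Context path into
# this stage's Local Build Context; MOUNT publishes a path from the
# stage's result back into the Aggregate Build Context.
FROM exgolang as webserver
CONTEXT /webserver/golang/app /golang/app
MOUNT /go/bin/app /final/bin/webserver

FROM exgolang as logger
CONTEXT /logger/golang/app /golang/app
MOUNT /go/bin/app /final/bin/logger

FROM exgolang as health
CONTEXT /health/golang/app /golang/app
MOUNT /go/bin/app /final/bin/health

FROM alpine
CONTEXT /final/bin/ /bin/
CONTEXT /script/script.sh /start.sh
COPY . /
CMD ["/start.sh"]
```

Note that each stage reuses the unmodified `exgolang` ONBUILD image; only the declarative `CONTEXT`/`MOUNT` mappings differ between stages.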
##### Differences
Recommended Multistage Design, when compared to Current Multistage Design:

- Employs reuse mechanisms, like `ONBUILD`, that minimize developer-produced code, and declares the `CONTEXT & MOUNT` mappings separately from Dockerfile operations like `COPY`.
- Reuses existing `ONBUILD` images.
- Mirrors the familiar mounting semantics of `docker run -v`.
- Eliminates `--from` and stage/image reference support by replacing both with a mapping mechanism that encourages encapsulation.
- Only the `CONTEXT`, `MOUNT`, and `FROM` instructions need be parsed to reveal the data dependencies between stages.

#### Example: Recommended Multistage Design: Explained
- The webserver stage begins from its `FROM` image. For this stage, the webserver's golang source named `server.go` is the only file that appears in the "root" dir of the Local Build Context. Once this stage finishes, `MOUNT` associates the file `/go/bin/app`, located in the last container created by this stage, to the Aggregate Build Context as `/final/bin/webserver`.

  Local Build Context

  Aggregate Build Context

- The logger stage begins from its `FROM` image. For this stage, the logger's golang source named `server.go` is the only file that appears in the "root" dir of the Local Build Context. Once this stage finishes, `MOUNT` associates the file `/go/bin/app`, located in the last container created by this stage, to the Aggregate Build Context as `/final/bin/logger`.

  Local Build Context

  Aggregate Build Context

- The health stage begins from its `FROM` image. For this stage, the health's golang source named `server.go` is the only file that appears in the "root" dir of the Local Build Context. Once this stage finishes, `MOUNT` associates the file `/go/bin/app`, located in the last container created by this stage, to the Aggregate Build Context as `/final/bin/health`.

  Local Build Context

  Aggregate Build Context

- The final stage composes its Local Build Context by mounting the Aggregate Build Context's `/final/bin/` directory and projecting (renaming) it as `/bin/`. Additionally, the shell script `script.sh` is renamed to `start.sh`.

  Local Build Context

- The final image is produced by `COPY`ing the Local Build Context into the root directory of `alpine`.