ARG before FROM in Dockerfile doesn't behave as expected #34129

Closed
Benjamin-Dobell opened this issue Jul 16, 2017 · 24 comments

@Benjamin-Dobell Benjamin-Dobell commented Jul 16, 2017

Description

It's documented that ARG can appear before FROM, so that arguments may be substituted into image names etc.

Rather than having some ARG before and some ARG after FROM, for consistency I attempted to place all my ARG before FROM. However, to my surprise (after a lot of debugging) I determined that my arguments are always blank after FROM.

I believe the meta-arg functionality/refactoring may somehow be responsible:

239c53b

Steps to reproduce the issue:

  1. Produce a Dockerfile such as:
ARG environment
FROM alpine:3.5
ENV ENVIRONMENT=${environment:-development}
RUN echo "$ENVIRONMENT" > /value_of_environment
  2. Build the image and run it, printing the value of the environment ARG (stored in /value_of_environment):
docker run $(docker build -q --build-arg environment=production .) cat /value_of_environment

Describe the results you received:

development

Describe the results you expected:

production

Additional information you deem important (e.g. issue happens only occasionally):

Altering the Dockerfile such that ARG comes after FROM i.e.

FROM alpine:3.5
ARG environment
ENV ENVIRONMENT=${environment:-development}
RUN echo "$ENVIRONMENT" > /value_of_environment

then running again:

docker run $(docker build -q --build-arg environment=production .) cat /value_of_environment

gives the expected output of production.

Output of docker version:

Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:31:53 2017
 OS/Arch:      darwin/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:51:55 2017
 OS/Arch:      linux/amd64
 Experimental: true

Output of docker info:

Containers: 59
 Running: 0
 Paused: 0
 Stopped: 59
Images: 370
Server Version: 17.06.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 457
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.31-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 5.818GiB
Name: moby
ID: BCV5:MEMK:BYKI:I2IU:QY2V:5DRM:F2FP:JFAG:SM46:M2WJ:73YV:3KLP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 20
 Goroutines: 40
 System Time: 2017-07-16T19:58:09.054157098Z
 EventsListeners: 1
No Proxy: *.local, 169.254/16
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Member

@boaz0 boaz0 commented Jul 17, 2017

@thaJeztah correct me if I'm wrong.

@Benjamin-Dobell after investigating this, 239c53b is not the origin of this behavior.

Basically, after the FROM instruction all the build arguments are reset and thus aren't available in the Dockerfile.

From what I found, the purpose of ARG before FROM is to use it inside the FROM instruction (see #31352).

Member

@thaJeztah thaJeztah commented Jul 17, 2017

Yes, this doesn't look like a bug; see this pull request, which adds some more information docker/cli#333

Member

@boaz0 boaz0 commented Jul 17, 2017

@thaJeztah I guess we can close this

Author

@Benjamin-Dobell Benjamin-Dobell commented Jul 17, 2017

Irrespective of whether this was implemented this way intentionally or is a bug, I think it's a bit of a usability nightmare.

It's not clearly documented that this is the expected behaviour, and it makes for a messy Dockerfile. But more importantly, it opens a Pandora's box of confusing edge cases.

What if I intend to use an ARG in both my FROM statement and after it? Am I expected to have multiple ARG statements referring to the same build-arg?

What happens if I use default value syntax ARG argument=some_value before FROM and just ARG argument after FROM? What is the expected value of argument after FROM if no argument build-arg was passed?

Member

@thaJeztah thaJeztah commented Jul 17, 2017

> What is the expected value of argument after FROM if no argument build-arg was passed?

The same as it would be if you're not using multi-stage build; empty / no value set

Author

@Benjamin-Dobell Benjamin-Dobell commented Jul 17, 2017

@thaJeztah I know that's true now, I've experimented with it. The issue is that it's hugely non-obvious.

If this is expected behaviour and no-one is willing to change it, then at the very least ARG (before FROM) ought to be deprecated, and the syntax used prior to FROM should instead be FROMARG (which must come before FROM).

Member

@thaJeztah thaJeztah commented Jul 17, 2017

ARG is reset after each FROM. If this is documented, why would ARG before FROM have to be deprecated?

/cc @tonistiigi @dnephin

Author

@Benjamin-Dobell Benjamin-Dobell commented Jul 17, 2017

Improved documentation is always appreciated, and would have saved me some time. However, just because behaviour is documented doesn't preclude the behaviour itself from scrutiny.

ARG has too much complexity to it. I'd argue this functionality shouldn't have been added to the ARG keyword in the first place; it has effectively been repurposed and its behaviour is now far too nuanced. A new keyword FROMARG from the outset would have made a lot more sense.

Author

@Benjamin-Dobell Benjamin-Dobell commented Jul 17, 2017

I should note that I'm not actually an advocate of expanding the grammar when the usage of the existing grammar can be expanded.

However, in this particular instance ARG has had its existing semantics altered; the behaviour is not additive. Previously whenever you referenced an ARG defined argument you'd have access to the value as expected. Now argument interpolation is much more context aware.

It's extremely confusing in single-stage builds, and perhaps more so in multi-stage ones. If arguments really are tied to build stages (although I must confess I'm not sure why this is desirable), then you've suddenly a need to look at the previous "stage", beyond the FROM verb.

Realistically, you can't pass different arguments to different build stages (they're typically provided as CLI arguments). So there's no legitimate reason to scope arguments to build stages. Additionally:

> a “cache miss” occurs upon its first usage, not its definition

So there is zero incentive to intersperse ARG definitions throughout a file. Therefore, the most logical behaviour would be to encourage all ARG definitions to be placed at the top of a file (where they can clearly be seen) and then update the behaviour to ensure there's no funny business with build stages.

Member

@tonistiigi tonistiigi commented Jul 17, 2017

> However, in this particular instance ARG has had it's semantics altered. Previously whenever you referenced an ARG defined argument you'd have access to the value as expected. Now argument interpolation is much more context aware.

The new ARG features are 100% backward compatible. No previous Dockerfile needs any changes.

> then you've suddenly a need to look at the previous "stage", beyond the FROM verb.

It's the opposite. Build args are defined by stage so you only need to look at the args for the current stage. Whatever you define in other stages has no effect on the current stage.

> a “cache miss” occurs upon its first usage, not its definition

All args are used in every RUN command. If an argument changes, it breaks all cache from the very first time RUN is used.
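To make the cache remark concrete, here is a minimal sketch. It assumes the builder behaviour described in this comment, where every RUN that follows an ARG declaration receives that argument in its environment. The argument name `build_label` and the commands are hypothetical and chosen only for illustration.

```dockerfile
FROM alpine:3.5

# No ARG has been declared yet, so changing --build-arg build_label
# does not affect the cache for this step.
RUN apk add --no-cache curl

# From this line on, build_label is part of the environment of every RUN.
ARG build_label=dev

# Under the behaviour described above, this step is rebuilt when
# build_label changes, even though the command never references it.
RUN echo "tools installed"

RUN echo "label is $build_label"
```

Under that assumption, declaring an ARG as late as possible keeps the earlier, expensive RUN steps cacheable across builds that only differ in the argument value.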

Member

@dnephin dnephin commented Jul 17, 2017

> However, in this particular instance ARG has had it's semantics altered.

The semantics changed with multi-stage builds. The change doesn't really have anything to do with ARG in FROM. It just happens they came out in the same release.

> If arguments really are tied to build stages, then you've suddenly a need to look at the previous "stage", beyond the FROM verb.

I think you're misunderstanding the scope. They are only scoped to the stage where they are declared.

> (although I must confess I'm not sure why this is desirable) ... you can't pass different arguments to different build stages (they're typically provided as CLI arguments). So there's no legitimate reason to scope arguments to build stages

The use cases supported by a Dockerfile expanded quite a bit with multi-stage builds. It's no longer the case that a single Dockerfile will produce a single image. You can use --target to run different stages. At this time the build is still sequential but in the future we should be able to build more optimally. Not every build stage will run on every build.

In this context the design should make more sense. Although the values might not change, which lines actually run will change depending on the --target, which means the args must be defined in each stage, not in the meta section before a FROM.
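A rough sketch of what per-stage declaration looks like in practice. The stage names (`builder`, `runtime`), the file path and the argument reuse are illustrative assumptions, not taken from any real project.

```dockerfile
# Hypothetical two-stage file; each stage declares the arguments it needs.
FROM alpine:3.5 AS builder
# Declared in this stage only.
ARG environment=development
RUN echo "building for $environment" > /build_info

FROM alpine:3.5 AS runtime
# Must be declared again here; the declaration in "builder" is not visible.
ARG environment=development
COPY --from=builder /build_info /build_info
RUN echo "running in $environment"

# Build only the first stage:
#   docker build --target builder --build-arg environment=production .
# Build through to the last stage:
#   docker build --build-arg environment=production .
```

Because each stage carries its own declarations, a stage selected with `--target` can be read on its own without scanning the stages around it.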

Author

@Benjamin-Dobell Benjamin-Dobell commented Jul 17, 2017

> All args are used in every RUN command. If argument changes it breaks all cache from the very first time RUN is used.

Yikes! That also needs documenting... and changing.

> It's the opposite. Build args are defined by stage so you only need to look at the args for the current stage. Whatever you define in other stages has no effect on the current stage.

When looking at a Dockerfile, what syntax marks the beginning of a new build stage?

FROM does, and yet, somehow it accesses ARG defined prior to this line.

Member

@tonistiigi tonistiigi commented Jul 17, 2017

I was just clarifying what "first use" means. You use an ARG by executing a RUN command. No changes from the time ARG was introduced.

FROM defines a stage. What do you mean by accessing ARG?
There is a specific syntax that can be used to avoid redefining a default value for ARG multiple times in the same file (something that you asked in #34129 (comment) btw). That requires both places to define that they want to share it. No ARG defined before FROM accidentally leaks into any build stage.
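The shared-default syntax mentioned here can be sketched as follows. This is a minimal illustration that borrows the `NODE_VERSION` name from a later comment in this thread; the echo command is only there to show the value.

```dockerfile
# The default is written once, in the section before FROM.
ARG NODE_VERSION=latest
FROM node:${NODE_VERSION}-alpine

# Re-declaring without a value opts this stage in to the value above
# (or to whatever was passed with --build-arg NODE_VERSION=...).
ARG NODE_VERSION
RUN echo "node image tag: ${NODE_VERSION}"
```

Both places have to name the argument: the declaration before FROM supplies the default, and the bare `ARG NODE_VERSION` inside the stage makes it visible to the instructions that follow.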

Author

@Benjamin-Dobell Benjamin-Dobell commented Jul 17, 2017

To be clear, I'm not saying I don't understand how the current implementation works; what has been written in this issue explains it clearly enough. I'm suggesting the implementation itself is non-ideal and confusing; after all, I read the existing docs and literally cloned Docker Compose, Docker client and finally Docker before working out what was going on, at which point I opened this issue.

It's just too complicated. Adding so much complexity to the Dockerfile syntax and the corresponding documentation is simply not sustainable.

> The semantics changed with multi-stage builds. The change doesn't really have anything to do with ARG in FROM. It just happens they came out in the same release.

I don't think it's necessarily 100% accurate that multi-stage and ARG in FROM are independent. They should have been independent, but I think the existence of multi-stage impacted the implementation of ARG in FROM.

The properties of ARG were:

  1. It may appear after FROM.

  2. The argument defined by ARG may be used on any line following the definition.

(2. is the way Dockerfiles always worked, sequential, state is additive, never subtractive).

A feature request comes along:

> I'd like to use arguments in FROM.

Reasonable enough, the two previously defined properties still hold if implemented. We now have a third property:

  3. ARG may appear before FROM.

This can cleanly be implemented, without any backwards compatibility issues. Except, it wasn't; it could have been, but it wasn't.

Instead, property 2. was violated: suddenly ARG can't always be used after it's defined. If it appears before FROM, then it can only be used in FROM, not on all subsequent lines.

That's changing the semantics of ARG, hence why I'm suggesting it should have been FROMARG, a keyword that can only appear in the "meta section" prior to FROM.

Mind you, this constraint is artificial in nature, there's zero reason 3. shouldn't have been implemented cleanly. The only reason the current implementation was deemed acceptable is because multi-stage builds were also coming, and it was also violating 2., albeit in a (roughly) well-defined fashion.

Anyway, my issue is complexity; that's subjective and given I'm not a maintainer, not for me to decide. Documentation is certainly better than nothing, so this issue may be closed if you see fit.

@ferrouswheel ferrouswheel commented Aug 10, 2017

As a new user of ARG it was very unintuitive why my ARG was empty. I saw someone use an example of ARG in a Dockerfile, but they were using it in the FROM line. For me it makes sense to define any parameterisation of a Dockerfile at the very top, so I didn't question it. Only upon rereading the docs after reading this issue do I understand why.

I would suggest a warning that ARG gets reset after FROM in the documentation, as not everyone is up to speed on multistage builds.

@shaunc shaunc commented Sep 20, 2017

@Benjamin-Dobell I wanted to use build-args in multistage builds to pass secure keys to intermediate build stages which would then disappear. I haven't completely got confirmation that this is secure, but I was actually happy to see your issue.

For the record, aside from implementation details which respondents seem to be burdening you with, clearing build args -- at least so they can't be read from the build history -- seems IMO to be a very important feature... well worth the complexity.

UPDATE -- sigh ... I guess I spoke prematurely. Multistage builds don't help with the fact that args are written to build history.

Member

@tonistiigi tonistiigi commented Sep 20, 2017

@shaunc Are you saying that build-arg defined for an intermediate stage is visible in the history of the final stage? This should not happen if you use COPY --from.
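The pattern being asked about might look like the sketch below. The stage name `fetch`, the argument `fetch_token` and the file path are hypothetical. Whether this is safe enough for a given secret is not settled in this thread, and the Docker documentation advises against passing secrets as build arguments, so treat it as an illustration of scoping only.

```dockerfile
FROM alpine:3.5 AS fetch
# Hypothetical argument, declared only in the intermediate stage.
ARG fetch_token
RUN echo "fetched with a token" > /artifact

FROM alpine:3.5
# Only the file is carried over; fetch_token is not declared in this stage.
COPY --from=fetch /artifact /artifact
```

Running `docker history` on the resulting image is one way to check what the final stage actually records.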

@lucendio lucendio commented Dec 7, 2017

I ran into the same issue, and in order to underline the impact of that behaviour I want to share my example here, whose cause took a significant amount of time to figure out. It's still totally unexpected, and I won't exactly call that user experience.
Please, if you don't see the necessity to change that behaviour, then at least document it as the creator of this issue suggested, so that people can stumble upon this.

docker image build \
        --build-arg NODE_VERSION="4.8.3" \
        --build-arg NPM_VERSION="4.5.0" \
        .

Does not work as expected. NPM_VERSION holds "latest".

ARG NODE_VERSION="latest"
ARG NPM_VERSION="latest"
FROM node:${NODE_VERSION}-alpine

RUN npm install -g npm@${NPM_VERSION}
...

Works as intended. NPM_VERSION holds "4.5.0".

ARG NODE_VERSION="latest"
FROM node:${NODE_VERSION}-alpine

ARG NPM_VERSION="latest"
RUN npm install -g npm@${NPM_VERSION}
...

Member

@tonistiigi tonistiigi commented Dec 7, 2017

> Please, if you don't see the necessity to change that behaviour, then at least document it so that people can stumble upon this.

https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact
https://docs.docker.com/engine/reference/builder/#scope

If this is a common pattern a PR would probably be accepted that detects this case (at least for variable substitution) and shows a warning about possible misuse.

@tonistiigi tonistiigi closed this Dec 7, 2017

@nik-shornikov nik-shornikov commented May 6, 2018

As far as this keyword behaves with multiple FROM statements, in "multi-stage" builds, ARG lets you specify different defaults for different stages, but there is no way (nor should there be) to pass different values explicitly to different stages. That's far more convoluted than having ARGs go into effect from the keyword down, across any number of stages/FROMs.
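A small sketch of the "different defaults for different stages" point. The stage names and the `log_level` argument are made up for illustration.

```dockerfile
FROM alpine:3.5 AS build
# Default used when this stage runs without --build-arg log_level.
ARG log_level=debug
RUN echo "build stage log level: $log_level"

FROM alpine:3.5 AS release
# Same argument name, different default for this stage.
ARG log_level=warn
RUN echo "release stage log level: $log_level"

# docker build --build-arg log_level=info .
# passes the same value, "info", to both stages.
```

The defaults can differ per stage, but a value given on the command line applies to every stage that declares the argument.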

@ClementWalter ClementWalter commented Aug 31, 2018

If you want to use the same ARG before and after FROM, simply re-declare it after, e.g.:

ARG my_arg
FROM my_image:$my_arg
# This should be empty
RUN echo "my_arg is $my_arg"

# Re-declare
ARG my_arg
# This should not be empty
RUN echo "my_arg is $my_arg"

@himslm01 himslm01 commented Dec 9, 2018

> simply re-declare it

This is an oversimplification. You are not considering default values and the programming rule of one single source of truth.

ARG my_arg="default"
FROM my_image:$my_arg
# This should be empty
RUN echo "my_arg is $my_arg"

# Re-declare
ARG my_arg="default"
# This should not be empty
RUN echo "my_arg is $my_arg"

We now have the arg's default value defined twice in one file - we have lost the single source of truth.

Member

@thaJeztah thaJeztah commented Dec 9, 2018

> This is an over simplification. You are not considering default values

The example given actually takes care of default values;

docker build --no-cache -<<'EOF'
ARG my_arg=latest
FROM busybox:$my_arg
# This should be empty
RUN echo "my_arg is $my_arg"

# Re-declare
ARG my_arg
# This should not be empty
RUN echo "my_arg is $my_arg"
EOF

Sending build context to Docker daemon  2.048kB
Step 1/5 : ARG my_arg=latest
Step 2/5 : FROM busybox:$my_arg
 ---> 59788edf1f3e
Step 3/5 : RUN echo "my_arg is $my_arg"
 ---> Running in 029ff9c3cdc8
my_arg is 
Removing intermediate container 029ff9c3cdc8
 ---> f9135f511c84
Step 4/5 : ARG my_arg
 ---> Running in 7c9616537324
Removing intermediate container 7c9616537324
 ---> 35ccdf7ea0a9
Step 5/5 : RUN echo "my_arg is $my_arg"
 ---> Running in 1e712eef0399
my_arg is latest
Removing intermediate container 1e712eef0399
 ---> 56c25e303cb9
Successfully built 56c25e303cb9

I also posted some examples in #37622 (comment), #37345 (comment)

@varnav varnav commented Jul 3, 2019

I lost a couple of hours to this. Intuitively I was expecting that ARG before FROM in a multistage build would be a global ARG (for all stages). It simply gets cleared instead.
