New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude “default” build-args from image history #31584

Merged
merged 1 commit into from Mar 22, 2017

Conversation

@dave-tucker
Contributor

dave-tucker commented Mar 6, 2017

- What I did

Based on the discussion in #30588 (comment). This PR removes any of the "default" build args from the image history.
This is due to these being considered transparent - i.e the result of an image shouldn't differ based on the HTTP_PROXY variable.

- How I did it

Edited the builder code to make sure that the any default variables are filtered from the saveCmd unless there is a corresponding ARG statement`

- How to verify it

Use the force tests Luke...

Or alternatively.

Dockerfile:

FROM busybox
ARG http_proxy
RUN echo "Test build arg filterage"

Then

$ docker build -t ba-test --build-arg="HTTP_PROXY=http://user:password@proxy.example.com" --build-arg="http_proxy=http://someproxy.example.com" .
$ docker image history ba-test

The HTTP_PROXY variable will be absent from the history but http_proxy will be present.

- Description for the changelog

The values of default build time arguments (e.g HTTP_PROXY) are no longer displayed in docker image history unless a corresponding ARG instruction is written in the Dockerfile.

- A picture of a cute animal (not mandatory but encouraged)

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Mar 6, 2017

Member

Thanks! ping @tonistiigi ptal; IIRC you mentioned that the build arg should be included if it's explicitly defined in the Dockerfile, but perhaps I misunderstood

Member

thaJeztah commented Mar 6, 2017

Thanks! ping @tonistiigi ptal; IIRC you mentioned that the build arg should be included if it's explicitly defined in the Dockerfile, but perhaps I misunderstood

@tonistiigi

This comment has been minimized.

Show comment
Hide comment
@tonistiigi

tonistiigi Mar 6, 2017

Member

Yes, I think adding only explicit ones is slightly better than skipping the builtin ones.

Member

tonistiigi commented Mar 6, 2017

Yes, I think adding only explicit ones is slightly better than skipping the builtin ones.

@dave-tucker

This comment has been minimized.

Show comment
Hide comment
@dave-tucker

dave-tucker Mar 7, 2017

Contributor

@thaJeztah @tonistiigi updated to only remove from history if is a builtin arg and a corresponding ARG instruction is not present in the Dockerfile.

Contributor

dave-tucker commented Mar 7, 2017

@thaJeztah @tonistiigi updated to only remove from history if is a builtin arg and a corresponding ARG instruction is not present in the Dockerfile.

@duglin

This comment has been minimized.

Show comment
Hide comment
@duglin

duglin Mar 7, 2017

Contributor

If I'm following this correctly, I disagree with this PR. In particular this:

This is due to these being considered transparent - i.e the result of an image shouldn't differ based on the HTTP_PROXY variable.

A changed HTTP_PROXY MUST change the image being built. We have no idea what RUN cmd will actually make use of this variable and in what way. For all we know the RUN cmd will echo the entire list of env vars into a file and save that in the image. I would hope a changed HTTP_PROXY would cause a new image to be built.

Contributor

duglin commented Mar 7, 2017

If I'm following this correctly, I disagree with this PR. In particular this:

This is due to these being considered transparent - i.e the result of an image shouldn't differ based on the HTTP_PROXY variable.

A changed HTTP_PROXY MUST change the image being built. We have no idea what RUN cmd will actually make use of this variable and in what way. For all we know the RUN cmd will echo the entire list of env vars into a file and save that in the image. I would hope a changed HTTP_PROXY would cause a new image to be built.

@thaJeztah thaJeztah added this to backlog in maintainers-session Mar 9, 2017

@cpuguy83

This comment has been minimized.

Show comment
Hide comment
@cpuguy83

cpuguy83 Mar 9, 2017

Contributor

@duglin Honestly I think this is fairly surprising behavior. The cache is there to help the user get faster builds. If the user wants to break the cache, they certainly can do so manually, meanwhile it's not possible to not break the build cache if the user doesn't want to break it.

Contributor

cpuguy83 commented Mar 9, 2017

@duglin Honestly I think this is fairly surprising behavior. The cache is there to help the user get faster builds. If the user wants to break the cache, they certainly can do so manually, meanwhile it's not possible to not break the build cache if the user doesn't want to break it.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Mar 9, 2017

Member

We're discussing this again in the maintainers meeting, and we think that busting the cache when switching to a different proxy is really a corner case. If someone wants it in the history, they need to define it in the Dockerfile. If they need to explicitly bust the cache, --no-cache is probably the best approach.

Member

thaJeztah commented Mar 9, 2017

We're discussing this again in the maintainers meeting, and we think that busting the cache when switching to a different proxy is really a corner case. If someone wants it in the history, they need to define it in the Dockerfile. If they need to explicitly bust the cache, --no-cache is probably the best approach.

@estesp

This comment has been minimized.

Show comment
Hide comment
@estesp

estesp Mar 9, 2017

Contributor

Not to mention that if we started to itemize all the ways environmental variations could affect an image (example: switching networking config on the system hosting the daemon and getting a different DNS server without a recent (hours old) update to a CNAME/A record for a site used to download code), we would be able to never satisfy ourselves that cacheing was even a reasonable commitment based on the information we do know.

Contributor

estesp commented Mar 9, 2017

Not to mention that if we started to itemize all the ways environmental variations could affect an image (example: switching networking config on the system hosting the daemon and getting a different DNS server without a recent (hours old) update to a CNAME/A record for a site used to download code), we would be able to never satisfy ourselves that cacheing was even a reasonable commitment based on the information we do know.

@thaJeztah thaJeztah removed this from backlog in maintainers-session Mar 9, 2017

@alexellis

This comment has been minimized.

Show comment
Hide comment
@alexellis

alexellis Mar 9, 2017

Contributor

In the meeting we also said that this should cover defaults such as HTTP_PROXY, NO_PROXY, FTP_PROXY and HTTPS_PROXY. Any new build-args in the future should be excluded from this behavior.

Contributor

alexellis commented Mar 9, 2017

In the meeting we also said that this should cover defaults such as HTTP_PROXY, NO_PROXY, FTP_PROXY and HTTPS_PROXY. Any new build-args in the future should be excluded from this behavior.

@duglin

This comment has been minimized.

Show comment
Hide comment
@duglin

duglin Mar 9, 2017

Contributor

My comments were not about allowing people to bust the cache, it was about making sure that when the cache is used its because we know for sure that its because we have the exact same inputs. Changing env vars available to a RUN are supposed to bust the cache.

Contributor

duglin commented Mar 9, 2017

My comments were not about allowing people to bust the cache, it was about making sure that when the cache is used its because we know for sure that its because we have the exact same inputs. Changing env vars available to a RUN are supposed to bust the cache.

@tiborvass

This comment has been minimized.

Show comment
Hide comment
@tiborvass

tiborvass Mar 9, 2017

Collaborator

@dave-tucker can you please add a comment where the list of default build args are defined, so that if we ever add more default build args, it is understood that this new behavior is applied to that whole list. And if that is not desired for some newly added args, then there will have to be two different lists: "default" one and the "exclude_from_history" one. Today there is no reason to make them separate.

Collaborator

tiborvass commented Mar 9, 2017

@dave-tucker can you please add a comment where the list of default build args are defined, so that if we ever add more default build args, it is understood that this new behavior is applied to that whole list. And if that is not desired for some newly added args, then there will have to be two different lists: "default" one and the "exclude_from_history" one. Today there is no reason to make them separate.

@tiborvass

This comment has been minimized.

Show comment
Hide comment
@tiborvass

tiborvass Mar 9, 2017

Collaborator

Maybe @cyli or @diogomonica could have a look at this too ?

Collaborator

tiborvass commented Mar 9, 2017

Maybe @cyli or @diogomonica could have a look at this too ?

@duglin

This comment has been minimized.

Show comment
Hide comment
@duglin

duglin Mar 9, 2017

Contributor

To be clear.. if I have:

RUN doit.sh

and in doit.sh I have:

if [[ -n "$HTTP_PROXY" ]; then
   app1
else
   app2
fi

Then with this PR, building once with HTTP_PROXY set and once w/o will yield the exact same results because the cache will be used in the 2nd run even though it should not have been.

Contributor

duglin commented Mar 9, 2017

To be clear.. if I have:

RUN doit.sh

and in doit.sh I have:

if [[ -n "$HTTP_PROXY" ]; then
   app1
else
   app2
fi

Then with this PR, building once with HTTP_PROXY set and once w/o will yield the exact same results because the cache will be used in the 2nd run even though it should not have been.

@alexellis

This comment has been minimized.

Show comment
Hide comment
@alexellis

alexellis Mar 9, 2017

Contributor

@duglin we discussed using --no-cache for this. As a long-term proxy user - this would mean changing from the office network to hotel WiFi would invalidate all my images' cache if I did a rebuild.

Contributor

alexellis commented Mar 9, 2017

@duglin we discussed using --no-cache for this. As a long-term proxy user - this would mean changing from the office network to hotel WiFi would invalidate all my images' cache if I did a rebuild.

@duglin

This comment has been minimized.

Show comment
Hide comment
@duglin

duglin Mar 9, 2017

Contributor

I feel like we're talking about different things here. How does --no-cache help someone - aside from basically having them start from scratch?

Let's back up... first, in order for HTTP_PROXY to be used as a build-arg it needs to be specified on the docker build command line. This means the user is explicitly aware of it. Now, I realize that with this PR we're making it implicit but to me a config file property is just a short-cut so that people don't need to type something over and over - it shouldn't change the semantics of how things work.

So, would people really expect (ignoring any config file):

docker build --build-arg HTTP_PROXY=myproxy.com .

and

docker build .

to yield the same results if they used my "doit.sh" Dockerfile example above? I would claim no.

Asking the user to know how an image was previously built so that they could add --no-cache when the issue is related to an env var the Docker itself knows about and has total control over its value in the container seems more than a little odd to me.

Contributor

duglin commented Mar 9, 2017

I feel like we're talking about different things here. How does --no-cache help someone - aside from basically having them start from scratch?

Let's back up... first, in order for HTTP_PROXY to be used as a build-arg it needs to be specified on the docker build command line. This means the user is explicitly aware of it. Now, I realize that with this PR we're making it implicit but to me a config file property is just a short-cut so that people don't need to type something over and over - it shouldn't change the semantics of how things work.

So, would people really expect (ignoring any config file):

docker build --build-arg HTTP_PROXY=myproxy.com .

and

docker build .

to yield the same results if they used my "doit.sh" Dockerfile example above? I would claim no.

Asking the user to know how an image was previously built so that they could add --no-cache when the issue is related to an env var the Docker itself knows about and has total control over its value in the container seems more than a little odd to me.

@estesp

This comment has been minimized.

Show comment
Hide comment
@estesp

estesp Mar 9, 2017

Contributor

@duglin Why would anyone use your doit.sh example..it is quite remarkably a stupid thing to do that no one has any example of anyone doing :)

This is exactly what people are expecting to get the same results; the months spent arguing over even giving users http_proxy was an understanding that for some people to have "normal" internet access, this is a required feature. What has come up now is that many corporations have usernames and passwords embedded in this variable which are now stored in their (possibly public) images on DockerHub as a historical build layer. That is extremely unnecessary for a feature that is merely giving them access to the external internet and was never designed and should never be used to make build-time decisions. It's like saying I might decide to do something different in my Dockerfile because the TZ setting is different. The principle of least surprise is trying to work in our favor here. Some people like to come up with far-fetched reasons to surprise people with behavior, but I'm guessing that kind of Dockerfile is not exactly going to be a viral hit :)

Contributor

estesp commented Mar 9, 2017

@duglin Why would anyone use your doit.sh example..it is quite remarkably a stupid thing to do that no one has any example of anyone doing :)

This is exactly what people are expecting to get the same results; the months spent arguing over even giving users http_proxy was an understanding that for some people to have "normal" internet access, this is a required feature. What has come up now is that many corporations have usernames and passwords embedded in this variable which are now stored in their (possibly public) images on DockerHub as a historical build layer. That is extremely unnecessary for a feature that is merely giving them access to the external internet and was never designed and should never be used to make build-time decisions. It's like saying I might decide to do something different in my Dockerfile because the TZ setting is different. The principle of least surprise is trying to work in our favor here. Some people like to come up with far-fetched reasons to surprise people with behavior, but I'm guessing that kind of Dockerfile is not exactly going to be a viral hit :)

@duglin

This comment has been minimized.

Show comment
Hide comment
@duglin

duglin Mar 9, 2017

Contributor

I'm not suggesting that we can or should try to take all aspects of a container into account. I've used this:

RUN date > build.timestamp

as an example of something people shouldn't do in their Dockerfiles if they really want "build.timestamp" to be the last time they ran the build because caching will muck with that logic.

And in cases like that, the limitation is easy to understand... Docker doesn't know what's going on in the cmd string so it can't make an educated guess as to whether or not to use the cache (at least for the cmd string part of it).

This case is different, as I mentioned above. This is about 1) an env var that the user is explicitly mentioning and 2) an env var the Docker is setting in the container, which means it knows whether its been changed.

I've heard people complain about how we have 2 types of env vars during builds:
1 - known, saved in the image, part of cache check
2 - known, not saved in the image, part of cache check
I'm sure they'll be thrilled to know that now we're adding a third:
3 - known, not saved in the image and not part of the cache checking
:-)

Contributor

duglin commented Mar 9, 2017

I'm not suggesting that we can or should try to take all aspects of a container into account. I've used this:

RUN date > build.timestamp

as an example of something people shouldn't do in their Dockerfiles if they really want "build.timestamp" to be the last time they ran the build because caching will muck with that logic.

And in cases like that, the limitation is easy to understand... Docker doesn't know what's going on in the cmd string so it can't make an educated guess as to whether or not to use the cache (at least for the cmd string part of it).

This case is different, as I mentioned above. This is about 1) an env var that the user is explicitly mentioning and 2) an env var the Docker is setting in the container, which means it knows whether its been changed.

I've heard people complain about how we have 2 types of env vars during builds:
1 - known, saved in the image, part of cache check
2 - known, not saved in the image, part of cache check
I'm sure they'll be thrilled to know that now we're adding a third:
3 - known, not saved in the image and not part of the cache checking
:-)

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Mar 10, 2017

Member

The proposal is that if the HTTP_PROXY is an essential part of the image (i.e., the doit.sh example), that people should define it in the image;

ARG HTTP_PROXY
COPY doit.sh /
RUN /doit.sh

If it's defined in the Dockerfile, it's an explicit choice of the image-author, and is not ignored in the cache.

There probably are edge cases where a different proxy should produce a different result, but my expectation is that 99.9% of the cases, a proxy is just needed to give people internet access. Producing a different image because a proxy was used for those case, would be the same as invalidating the cache if the IP-address changes (/different network), or the DNS server changed (which would far more likely being a differentiator when building)

Member

thaJeztah commented Mar 10, 2017

The proposal is that if the HTTP_PROXY is an essential part of the image (i.e., the doit.sh example), that people should define it in the image;

ARG HTTP_PROXY
COPY doit.sh /
RUN /doit.sh

If it's defined in the Dockerfile, it's an explicit choice of the image-author, and is not ignored in the cache.

There probably are edge cases where a different proxy should produce a different result, but my expectation is that 99.9% of the cases, a proxy is just needed to give people internet access. Producing a different image because a proxy was used for those case, would be the same as invalidating the cache if the IP-address changes (/different network), or the DNS server changed (which would far more likely being a differentiator when building)

@dave-tucker

This comment has been minimized.

Show comment
Hide comment
@dave-tucker

dave-tucker Mar 10, 2017

Contributor

@tiborvass the list of build args used comes from https://github.com/docker/docker/blob/master/builder/dockerfile/builder.go#L40. This is the only place these are defined in the code so this approach should be still be ok if new args are added.

Contributor

dave-tucker commented Mar 10, 2017

@tiborvass the list of build args used comes from https://github.com/docker/docker/blob/master/builder/dockerfile/builder.go#L40. This is the only place these are defined in the code so this approach should be still be ok if new args are added.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Mar 10, 2017

Member

@dave-tucker we were discussing, and thought it would be good to put a comment there so that in case new "default" variables are added in future, we should review if those should be stripped from history as well, or if we should "split" the list at that point

Member

thaJeztah commented Mar 10, 2017

@dave-tucker we were discussing, and thought it would be good to put a comment there so that in case new "default" variables are added in future, we should review if those should be stripped from history as well, or if we should "split" the list at that point

@dave-tucker

This comment has been minimized.

Show comment
Hide comment
@dave-tucker

dave-tucker Mar 13, 2017

Contributor

@thaJeztah @tiborvass added some comments to the code!

Contributor

dave-tucker commented Mar 13, 2017

@thaJeztah @tiborvass added some comments to the code!

@tiborvass

This comment has been minimized.

Show comment
Hide comment
@tiborvass

tiborvass Mar 16, 2017

Collaborator

LGTM

Collaborator

tiborvass commented Mar 16, 2017

LGTM

@dave-tucker

This comment has been minimized.

Show comment
Hide comment
@dave-tucker

dave-tucker Mar 17, 2017

Contributor

@thaJeztah added some docs 📖

Contributor

dave-tucker commented Mar 17, 2017

@thaJeztah added some docs 📖

@thaJeztah

Thanks! I left some suggestions, but looks like we've got things covered. Possibly @mstanleyjones would have suggestions for wording as well 😃

Show outdated Hide outdated docs/reference/builder.md Outdated
Show outdated Hide outdated docs/reference/builder.md Outdated
Show outdated Hide outdated docs/reference/builder.md Outdated
Show outdated Hide outdated docs/reference/builder.md Outdated
Show outdated Hide outdated docs/reference/builder.md Outdated
Show outdated Hide outdated docs/reference/builder.md Outdated
@dave-tucker

This comment has been minimized.

Show comment
Hide comment
@dave-tucker

dave-tucker Mar 17, 2017

Contributor

Addressed @thaJeztah's comments.
@alexellis' comment re: behaviour/behavior still stands. IMO there is only one correct way 🇬🇧 :trollface:

Contributor

dave-tucker commented Mar 17, 2017

Addressed @thaJeztah's comments.
@alexellis' comment re: behaviour/behavior still stands. IMO there is only one correct way 🇬🇧 :trollface:

Show outdated Hide outdated docs/reference/builder.md Outdated
Show outdated Hide outdated docs/reference/builder.md Outdated
Show outdated Hide outdated docs/reference/builder.md Outdated
Show outdated Hide outdated docs/reference/builder.md Outdated
@thaJeztah

LGTM

@dave-tucker

This comment has been minimized.

Show comment
Hide comment
@dave-tucker

dave-tucker Mar 21, 2017

Contributor

Updated docs to address comments from @mstanleyjones and @thaJeztah

Contributor

dave-tucker commented Mar 21, 2017

Updated docs to address comments from @mstanleyjones and @thaJeztah

@thaJeztah

Have some suggestions, but I'm ok with addressing them in a follow up 😅

Show outdated Hide outdated docs/reference/builder.md Outdated
Show outdated Hide outdated docs/reference/builder.md Outdated
Ignore built-in allowed build-args in image history
Removes the build-args from the image history if they are in the
BuiltinAllowedBuildArgs map unless they are explicitly defined in an ARG
instruction.

Signed-off-by: Dave Tucker <dt@docker.com>
@dave-tucker

This comment has been minimized.

Show comment
Hide comment
@dave-tucker

dave-tucker Mar 21, 2017

Contributor

@mstanleyjones @thaJeztah thanks for the suggestions! Updated once again...

Contributor

dave-tucker commented Mar 21, 2017

@mstanleyjones @thaJeztah thanks for the suggestions! Updated once again...

@thaJeztah

LGTM, thanks!

@vdemeester vdemeester merged commit c497499 into moby:master Mar 22, 2017

6 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 31910 has succeeded
Details
janky Jenkins build Docker-PRs 40521 has succeeded
Details
powerpc Jenkins build Docker-PRs-powerpc 627 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 11609 has succeeded
Details
z Jenkins build Docker-PRs-s390x 509 has succeeded
Details

@GordonTheTurtle GordonTheTurtle added this to the 17.05.0 milestone Mar 22, 2017

@dave-tucker dave-tucker deleted the dave-tucker:ignore-default-build-args branch Mar 23, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment