New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix build with `ADD` urls without any sub path #34217

Merged
merged 2 commits into from Sep 14, 2017

Conversation

@yongtang
Member

yongtang commented Jul 22, 2017

- What I did
This fix tries to address the issue raised in #34208 where in Dockerfile an ADD followed by an url without any sub path will cause an error.

The issue is because the temporary filename relies on the sub path.

- How I did it
This fix fixes this issue by having index.html as the filename if no sub path exists.

- How to verify it
An integration test has been added.

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)
5vznpa4dkmk

This fix fixes #34208.

Signed-off-by: Yong Tang yong.tang.github@outlook.com

@thaJeztah

Thanks for looking into this; I suspected this was the cause of the issue being reported, but think this may be difficult to fix (without introducing unwanted side-effects)

perhaps @tonistiigi has some ideas as well

Show outdated Hide outdated builder/dockerfile/copy.go Outdated
Show outdated Hide outdated builder/dockerfile/copy.go Outdated
@yongtang

This comment has been minimized.

Show comment
Hide comment
@yongtang

yongtang Jul 22, 2017

Member

Thanks @thaJeztah for the review. I changed the default filename to unnamed-file.

For URL resources, the build process is actually different from a local file system. A URL always points to one file. For example,

# Add the default web page of https://example.com/
ADD https://example.com/ filename.foo

#Add the web page of "Get Started, Part 2: Containers"
ADD https://docs.docker.com/get-started/part2/ filename.foo 

Also, see the line:

filename := filepath.Base(filepath.FromSlash(u.Path)) // Ensure in platform semantics

filename := filepath.Base(filepath.FromSlash(u.Path))

In case of the target directory like:

ADD https://example.com/ /some/path/

The filename is actually path (after stripping /some/.../).

The only issue with filepath.Base(filepath.FromSlash(u.Path)) is that, it does not handle empty filenames. If u.Path == "" or u.Path == "/", then "." is returned.

filename := filepath.Base(filepath.FromSlash(u.Path)) // returned value is actually "."

For that I tend to think having a default filename of unnamed-file for u.Path == "" or u.Path == "/" would be suffice to address the issue.

Member

yongtang commented Jul 22, 2017

Thanks @thaJeztah for the review. I changed the default filename to unnamed-file.

For URL resources, the build process is actually different from a local file system. A URL always points to one file. For example,

# Add the default web page of https://example.com/
ADD https://example.com/ filename.foo

#Add the web page of "Get Started, Part 2: Containers"
ADD https://docs.docker.com/get-started/part2/ filename.foo 

Also, see the line:

filename := filepath.Base(filepath.FromSlash(u.Path)) // Ensure in platform semantics

filename := filepath.Base(filepath.FromSlash(u.Path))

In case of the target directory like:

ADD https://example.com/ /some/path/

The filename is actually path (after stripping /some/.../).

The only issue with filepath.Base(filepath.FromSlash(u.Path)) is that, it does not handle empty filenames. If u.Path == "" or u.Path == "/", then "." is returned.

filename := filepath.Base(filepath.FromSlash(u.Path)) // returned value is actually "."

For that I tend to think having a default filename of unnamed-file for u.Path == "" or u.Path == "/" would be suffice to address the issue.

@ijc

This comment has been minimized.

Show comment
Hide comment
@ijc

ijc Jul 24, 2017

Contributor

Is it necessary for the download target's filename to have anything to do with the URL path at all? IOW why not just use ioutil.TempFile() (either passing it the existing tmpdir or arranging to clean it up separately) to get a unique local name?

Contributor

ijc commented Jul 24, 2017

Is it necessary for the download target's filename to have anything to do with the URL path at all? IOW why not just use ioutil.TempFile() (either passing it the existing tmpdir or arranging to clean it up separately) to get a unique local name?

@ijc

This comment has been minimized.

Show comment
Hide comment
@ijc

ijc Jul 24, 2017

Contributor

Or since you already have a dedicated/local/unique tmpdir, why not just call it bob?

Contributor

ijc commented Jul 24, 2017

Or since you already have a dedicated/local/unique tmpdir, why not just call it bob?

@yongtang

This comment has been minimized.

Show comment
Hide comment
@yongtang

yongtang Jul 24, 2017

Member

@ijc The name is not an issue as long as the target is also a file:

ADD http://example.com /filename.foo

However, I realized it will be an issue in case the target is a directory (was pointed out by @thaJeztah):

ADD http://example.com /

In that case, the container will have a file of /blob in its file system. That is something that should be addressed.

I think in case

ADD http://example.com /

is specified, an error should be returned.

In case

ADD https://docs.docker.com/get-started/part2/ /

is specified, a filename named /part2 should be created in container's file system. (The part2 actually carries a meaning in HTTP 's URL and it is very common in some web pages.)

I will update the PR.

Member

yongtang commented Jul 24, 2017

@ijc The name is not an issue as long as the target is also a file:

ADD http://example.com /filename.foo

However, I realized it will be an issue in case the target is a directory (was pointed out by @thaJeztah):

ADD http://example.com /

In that case, the container will have a file of /blob in its file system. That is something that should be addressed.

I think in case

ADD http://example.com /

is specified, an error should be returned.

In case

ADD https://docs.docker.com/get-started/part2/ /

is specified, a filename named /part2 should be created in container's file system. (The part2 actually carries a meaning in HTTP 's URL and it is very common in some web pages.)

I will update the PR.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Jul 24, 2017

Member

I don't think we should take part2 as name here; it has a trailing slash, so it's a directory, not a file (and the webserver happens to be configured to treat index.html as a default.

In such a case;

  • if webserver has a header specifying the filename, use it
  • if not, and no target filename was specified, we should probably error, as there's no "good" choice
Member

thaJeztah commented Jul 24, 2017

I don't think we should take part2 as name here; it has a trailing slash, so it's a directory, not a file (and the webserver happens to be configured to treat index.html as a default.

In such a case;

  • if webserver has a header specifying the filename, use it
  • if not, and no target filename was specified, we should probably error, as there's no "good" choice
@ijc

This comment has been minimized.

Show comment
Hide comment
@ijc

ijc Jul 25, 2017

Contributor

I had missed that filename was used in the return as well, so I thought it was just a local tempfile, sorry.

I agree with #34217 (comment) and erroring rather than fabricating names. Content-Disposition is the header to look for I think.

Contributor

ijc commented Jul 25, 2017

I had missed that filename was used in the return as well, so I thought it was just a local tempfile, sorry.

I agree with #34217 (comment) and erroring rather than fabricating names. Content-Disposition is the header to look for I think.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Jul 25, 2017

Member

Yes, I think that's the right header (see my first comment #34217 (comment))

Member

thaJeztah commented Jul 25, 2017

Yes, I think that's the right header (see my first comment #34217 (comment))

Show outdated Hide outdated builder/dockerfile/copy.go Outdated

@yongtang yongtang requested review from dnephin and tonistiigi as code owners Jul 27, 2017

@yongtang

This comment has been minimized.

Show comment
Hide comment
@yongtang

yongtang Jul 27, 2017

Member

Thanks all for the review. The PR has been updated so that srcURL, dest, and Content-Disposition would be taken into consideration to find the file name. Please take a look and let me know if there are any issues.

Member

yongtang commented Jul 27, 2017

Thanks all for the review. The PR has been updated so that srcURL, dest, and Content-Disposition would be taken into consideration to find the file name. Please take a look and let me know if there are any issues.

Show outdated Hide outdated builder/dockerfile/copy.go Outdated
Show outdated Hide outdated integration-cli/docker_cli_build_test.go Outdated
@dnephin

This comment has been minimized.

Show comment
Hide comment
@dnephin

dnephin Jul 28, 2017

Member

I see now my suggestion has already been discussed.

I had missed that filename was used in the return as well, so I thought it was just a local tempfile, sorry.

You were correct. It is used in the return, but only as the source for the copy operation.

We should just use any random or hardcoded filename if filepath.Base() is not valid. We could even removed filepath.Base() entirely.

Member

dnephin commented Jul 28, 2017

I see now my suggestion has already been discussed.

I had missed that filename was used in the return as well, so I thought it was just a local tempfile, sorry.

You were correct. It is used in the return, but only as the source for the copy operation.

We should just use any random or hardcoded filename if filepath.Base() is not valid. We could even removed filepath.Base() entirely.

Show outdated Hide outdated builder/dockerfile/copy.go Outdated
@yongtang

This comment has been minimized.

Show comment
Hide comment
@yongtang

yongtang Aug 15, 2017

Member

@dnephin Thanks for the review. The PR has been updated. The dest is still carried though as the docker file will need to have some "definitive" name. If both src and dest end with '/' then it it is actually problematic:

ADD http://example.org /

In the above case, we could not use a random name because we don't want to copy a "random__name__xxx" file to /:

/__random__name__xxx__ 

Please take a look.

Member

yongtang commented Aug 15, 2017

@dnephin Thanks for the review. The PR has been updated. The dest is still carried though as the docker file will need to have some "definitive" name. If both src and dest end with '/' then it it is actually problematic:

ADD http://example.org /

In the above case, we could not use a random name because we don't want to copy a "random__name__xxx" file to /:

/__random__name__xxx__ 

Please take a look.

@dnephin

This comment has been minimized.

Show comment
Hide comment
@dnephin

dnephin Aug 15, 2017

Member

In the above case, we could not use a random name because we don't want to copy a "random__name__xxx" file to /

Yes, that's a good point. So the problem case is when source is a directory path, and dest is also a directory path. If dest is a file path we don't really need it, because it will be used later on during the copy. I think this is not really obvious from the code right now.

Instead of passing in dest string what do you think about this:

  • keep the existing definition for type sourceDownloader
  • have downloadSource attempt to get a filename from either u.Path or the response
  • if it fails to get a filename from either then return the source and random generated filename, with a custom error errUndefinedPath (or something like that).

In the caller getCopyInfoForSourcePath() it can check dest if it gets that error:

remote, path, err := o.download(orig)
switch err {
case err == errUndefinedPath && strings.HasSuffix(dest, "/"):
     return nil, errors.Errorf("cannot determine filename for source %s...")
case err:
    return nil, err
}

In the case where err == errUndefinedPath but dest is a path to a file then we're ok, because dest will be used later, right?

Member

dnephin commented Aug 15, 2017

In the above case, we could not use a random name because we don't want to copy a "random__name__xxx" file to /

Yes, that's a good point. So the problem case is when source is a directory path, and dest is also a directory path. If dest is a file path we don't really need it, because it will be used later on during the copy. I think this is not really obvious from the code right now.

Instead of passing in dest string what do you think about this:

  • keep the existing definition for type sourceDownloader
  • have downloadSource attempt to get a filename from either u.Path or the response
  • if it fails to get a filename from either then return the source and random generated filename, with a custom error errUndefinedPath (or something like that).

In the caller getCopyInfoForSourcePath() it can check dest if it gets that error:

remote, path, err := o.download(orig)
switch err {
case err == errUndefinedPath && strings.HasSuffix(dest, "/"):
     return nil, errors.Errorf("cannot determine filename for source %s...")
case err:
    return nil, err
}

In the case where err == errUndefinedPath but dest is a path to a file then we're ok, because dest will be used later, right?

Show outdated Hide outdated builder/dockerfile/copy.go Outdated
@yongtang

This comment has been minimized.

Show comment
Hide comment
@yongtang

yongtang Aug 23, 2017

Member

@dnephin Thanks for the review. The PR has been updated. Please take a look.

Member

yongtang commented Aug 23, 2017

@dnephin Thanks for the review. The PR has been updated. Please take a look.

@dnephin

thanks, LGTM!

@yongtang

This comment has been minimized.

Show comment
Hide comment
@yongtang
Member

yongtang commented Sep 12, 2017

Ping @thaJeztah.

@thaJeztah

LGTM, thanks!

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Sep 13, 2017

Member

I triggered CI, because it was quite a while back that it was last run on this PR

Member

thaJeztah commented Sep 13, 2017

I triggered CI, because it was quite a while back that it was last run on this PR

yongtang added some commits Jul 21, 2017

Fix build with `ADD` urls without any sub path
This fix tries to address the issue raised in #34208 where
in Dockerfile an `ADD` followed by an url without any sub path
will cause an error.

The issue is because the temporary filename relies on the sub path.

An integration test has been added.

This fix fixes #34208.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Add unit test to cover changes.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
@yongtang

This comment has been minimized.

Show comment
Hide comment
@yongtang

yongtang Sep 14, 2017

Member

All tests passed. Thanks all for the review!

Member

yongtang commented Sep 14, 2017

All tests passed. Thanks all for the review!

@yongtang yongtang merged commit d60c186 into moby:master Sep 14, 2017

6 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 36838 has succeeded
Details
janky Jenkins build Docker-PRs 45478 has succeeded
Details
powerpc Jenkins build Docker-PRs-powerpc 5879 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 17072 has succeeded
Details
z Jenkins build Docker-PRs-s390x 5658 has succeeded
Details

@yongtang yongtang deleted the yongtang:34208-http-add-root branch Sep 14, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment