Block pulling Windows images on non-Windows daemons #29001
Instead of downloading the config before starting to download the layers (which will add a noticeable delay), how running
@aaronlehmann That can't be done on Windows, as some filenames in Linux images are not valid in NTFS, resulting in a failure mid-download. I'm not sure if it can be done in the Linux case: because the Linux daemon is not aware of foreign layers, the failure occurs during download rather than when attempting to run the resulting image. I'll try it on Linux pulling a Windows image, but even if it works, it would make the handling diverge between platforms.
If the problem is foreign layers, those are easy to detect before downloading the layers. We could abort the pull if the platform is Linux and any of the descriptors in the manifest has a media type indicating a foreign layer. That wouldn't require downloading the image config before starting the layer downloads.
The problem isn't only the foreign layers. It just happens that all Windows images have at least one foreign layer, since Windows images must be based on one of the two base images, both of which contain foreign layers. Downloading the image config and the layers at the same time is a race over which fails first, and it produces a different error depending on which download wins. I initially avoided this solution (reading the image config before the layer download) due to the expected performance impact, and did what you are suggesting (checking the layer descriptor before attempting to download a foreign layer). See #28903. It was suggested there to fail earlier by doing on Linux the same checks that Windows already does. Both of these solve the same problem, but have different drawbacks.
Edit: Just to be clear, I don't have a strong preference for this solution vs #28903; they both (almost) solve the same problem of Linux users being confused when attempting to download a Windows-based image.
ping @aaronlehmann @stevvooe any way to get this PR, or #28903, moving again?
My preference here is for something like this (untested, only meant to illustrate the idea): https://github.com/aaronlehmann/docker/tree/block-wrong-os This runs the download manager in a separate goroutine, so the The NTFS issue wouldn't be a factor because Windows daemons always download and check the config first. The foreign layer issue on Linux could be solved by adding a check for foreign layers in the loop over
@darrenstahlmsft WDYT about @aaronlehmann's comment?
ping @darrenstahlmsft
also ping @stevvooe PTAL
b6d0e66 to 40ca3c9
Updated to be similar to the commit by @aaronlehmann except that in the case of an error downloading the layer, the config download is not cancelled so that the resulting error message takes priority over the layer download failure. I think that lets us remove the serialization of the Windows config as well. I've left that for a follow-up PR though.
I see that
// Regression test for https://github.com/docker/docker/issues/28892
func (s *DockerSuite) TestPullWindowsImageFailsOnLinux(c *check.C) {
testRequires(c, DaemonIsLinux, Network)
_, _, err := dockerCmdWithError("pull", "microsoft/nanoserver")
Ideally we'd serve a minimal Windows image from a local registry instead of depending on Docker Hub. I just filed #30626 about something similar.
This is probably outside the scope of this PR though. I think the download-frozen-image-v2.sh script in the Dockerfile is capable of downloading the image at docker build time, but it would take some extra scripting work to load this image into a local registry without the use of docker pull / docker push.
It would be even better to use a plain "scratch" image with a Windows config instead of downloading the 333MB nanoserver image.
@AkihiroSuda There is no such thing. Nanoserver is the smallest Windows image available.
distribution/pull_v2.go
Outdated
select {
case <-downloadsDone:
case err = <-errChan:
cancel()
Instead of calling cancel here, let's add defer cancel() after the call to WithCancel. We should be making sure to always cancel the context when returning anyway, as this releases the resources associated with it.
distribution/pull_v2.go
Outdated
@@ -552,6 +553,7 @@ func (p *v2Puller) pullSchema2(ctx context.Context, ref reference.Named, mfst *s
 configJSON []byte // raw serialized image config
 downloadedRootFS *image.RootFS // rootFS from registered layers
 configRootFS *image.RootFS // rootFS from configuration
+release func() // relase resources from rootFS download
typo: "relase"
This looks like the right approach to me. Moving to code review. Let me know if I can be of any help in tracking down the plugin test issue. Also pinging @dmcgowan who converted plugins to use the
40ca3c9 to f3cfb73
Fixed the plugin error, but now it's failing on TestDaemonNoSpaceLeftOnDeviceError, which still seems related. I'll take a look at that tomorrow.
distribution/pull_v2.go
Outdated
@@ -533,7 +533,8 @@ func (p *v2Puller) pullSchema2(ctx context.Context, ref reference.Named, mfst *s
 }

 configChan := make(chan []byte, 1)
-errChan := make(chan error, 1)
+errChan := make(chan error, 2)
+downloadsDone := make(chan error, 1)
This can be downloadsDone := make(chan struct{}), since no data is ever conveyed over it.
Huh, I was certain I'd made it a struct{} chan in the first place, oops... Fixed.
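As a small illustration of the point above, a `chan struct{}` that is closed on completion works as a pure signal. The `startDownloads` helper here is a hypothetical name for this sketch, not a function from the PR.

```go
package main

import "fmt"

// startDownloads is a hypothetical helper: it runs download in a goroutine and
// returns a channel that is closed, never sent on, once download has returned.
func startDownloads(download func()) <-chan struct{} {
	downloadsDone := make(chan struct{}) // carries no data, only the "done" signal
	go func() {
		defer close(downloadsDone)
		download()
	}()
	return downloadsDone
}

func main() {
	layers := 0
	done := startDownloads(func() { layers = 3 })
	<-done // unblocks when the channel is closed
	fmt.Println("layers downloaded:", layers)
}
```

Because the channel is closed rather than written to, any number of receivers can wait on it, and receiving again after completion does not block.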
Ah, I think I see the problem that is causing
One way to fix this would be to use separate error channels for layer downloads and the config retrieval.
480c8b1 to 0755ed4
Signed-off-by: Darren Stahl <darst@microsoft.com>
0755ed4 to d553040
Good catch @aaronlehmann. Fixed the hang.
LGTM
if runtime.GOOS == "windows" && unmarshalledConfig.OS == "linux" {
return nil, fmt.Errorf("image operating system %q cannot be used on this platform", unmarshalledConfig.OS)
} else if runtime.GOOS != "windows" && unmarshalledConfig.OS == "windows" {
Why not just check runtime.GOOS != unmarshalledConfig.OS here?
(Although we can support runtime.GOOS == "freebsd" && unmarshalledConfig.OS == "linux" in future 😄)
why not just check runtime.GOOS != unmarshalledConfig.OS here?
I'm afraid this might have some unintended consequences. For example, what if there are images out there built with nonstandard tools that don't set the OS field? For now I think it's best to explicitly check for incompatible configurations.
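To make the trade-off concrete, here is a small sketch comparing the explicit check with the plain inequality. `checkImageOS` and `strictCheckImageOS` are hypothetical helpers that take the daemon OS as a parameter so both cases can be exercised on any machine; the real check would presumably pass runtime.GOOS.

```go
package main

import (
	"fmt"
	"runtime"
)

// checkImageOS mirrors the explicit approach: only known-incompatible
// combinations are rejected, so an empty OS field is let through.
func checkImageOS(daemonOS, imageOS string) error {
	if daemonOS == "windows" && imageOS == "linux" {
		return fmt.Errorf("image operating system %q cannot be used on this platform", imageOS)
	} else if daemonOS != "windows" && imageOS == "windows" {
		return fmt.Errorf("image operating system %q cannot be used on this platform", imageOS)
	}
	return nil
}

// strictCheckImageOS is the simpler inequality discussed above. It also rejects
// images whose config leaves the OS field empty.
func strictCheckImageOS(daemonOS, imageOS string) error {
	if daemonOS != imageOS {
		return fmt.Errorf("image operating system %q cannot be used on this platform", imageOS)
	}
	return nil
}

func main() {
	// An image built by a tool that did not set the OS field.
	fmt.Println("explicit:", checkImageOS("linux", ""))
	fmt.Println("strict:  ", strictCheckImageOS("linux", ""))
	// Against the OS this program is actually running on.
	fmt.Println("current: ", checkImageOS(runtime.GOOS, "windows"))
}
```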
ok
LGTM ping @anusha-ragunathan @vdemeester
LGTM 🐸
This is an alternative solution to #28903
- What I did
fixes #28892
Block pulling Windows images on Linux daemons.
- How I did it
If the listed OS of an image's config is Windows, but the current daemon platform is not Windows, fail to download the image.
- How to verify it
Pull microsoft/nanoserver on a Linux daemon. See the test that checks this.
- Description for the changelog
Block pulling of Windows images on a non-Windows daemon.
Signed-off-by: Darren Stahl <darst@microsoft.com>
/cc @jhowardmsft @jstarks