New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block pulling Windows images on non-Windows daemons #29001

Merged
merged 1 commit into from Feb 16, 2017

Conversation

@darstahl
Contributor

darstahl commented Nov 30, 2016

This is an alternative solution to #28903

- What I did

fixes #28892

Block pulling Windows images on Linux daemons.

- How I did it

If the listed OS of an image's config is Windows, but the current daemon platform is not Windows, fail to download the image.

- How to verify it

Pull microsoft/nanoserver on a Linux daemon. See test that checks this.

- Description for the changelog

Block pulling of Windows images on a non-windows daemon.

Signed-off-by: Darren Stahl darst@microsoft.com

/cc @jhowardmsft @jstarks

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Nov 30, 2016

Contributor

Instead of downloading the config before starting to download the layers (which will add a noticeable delay), how running p.config.DownloadManager.Download in a goroutine and aborting it by canceling the context if the config turns out to be incompatible?

Contributor

aaronlehmann commented Nov 30, 2016

Instead of downloading the config before starting to download the layers (which will add a noticeable delay), how running p.config.DownloadManager.Download in a goroutine and aborting it by canceling the context if the config turns out to be incompatible?

@darstahl

This comment has been minimized.

Show comment
Hide comment
@darstahl

darstahl Nov 30, 2016

Contributor

@aaronlehmann That can't be done on Windows, as some filenames in Linux images are not valid in NTFS resulting in a failure mid download. I'm not sure if it can be done in the Linux case, since the Linux daemon is not aware of foreign layers, the failure is occurring during download, rather than when attempting to run the resulting image. I'll try it on Linux pulling a Windows image, but even if it works, it would be diverging the handling between platforms.

Contributor

darstahl commented Nov 30, 2016

@aaronlehmann That can't be done on Windows, as some filenames in Linux images are not valid in NTFS resulting in a failure mid download. I'm not sure if it can be done in the Linux case, since the Linux daemon is not aware of foreign layers, the failure is occurring during download, rather than when attempting to run the resulting image. I'll try it on Linux pulling a Windows image, but even if it works, it would be diverging the handling between platforms.

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Nov 30, 2016

Contributor

If the problem is foreign layers, those are easy to detect before downloading the layers. We could abort the pull if the platform is Linux and any of the descriptors in the manifest has a media type indiciating a foreign layer. That wouldn't require downloading the image config before starting the layer downloads.

Contributor

aaronlehmann commented Nov 30, 2016

If the problem is foreign layers, those are easy to detect before downloading the layers. We could abort the pull if the platform is Linux and any of the descriptors in the manifest has a media type indiciating a foreign layer. That wouldn't require downloading the image config before starting the layer downloads.

@darstahl

This comment has been minimized.

Show comment
Hide comment
@darstahl

darstahl Dec 1, 2016

Contributor

The problem isn't only the foreign layers. It just happens that all Windows images have at least one foreign layer, since Windows images must be based off one of the two base images, both of which contain foreign layers. Downloading both the image config and the layers at the same time is a race between which would fail first, and produce a different error depending on which download wins.

I initially avoided this solution (reading image config before layer download), due to the expected performance impact, and did what you are suggesting (checking the layer descriptor before attempting to download a foreign layer). See #28903. It was suggested there to fail earlier by doing the same checks that Windows already does on Linux. Both of these solve the same problem, but have different drawbacks.

Edit: Just to be clear, I don't have a huge preference for this solution vs #28903, they both (almost) solve the same problem of Linux users being confused when attempting to download a Windows based image.

Contributor

darstahl commented Dec 1, 2016

The problem isn't only the foreign layers. It just happens that all Windows images have at least one foreign layer, since Windows images must be based off one of the two base images, both of which contain foreign layers. Downloading both the image config and the layers at the same time is a race between which would fail first, and produce a different error depending on which download wins.

I initially avoided this solution (reading image config before layer download), due to the expected performance impact, and did what you are suggesting (checking the layer descriptor before attempting to download a foreign layer). See #28903. It was suggested there to fail earlier by doing the same checks that Windows already does on Linux. Both of these solve the same problem, but have different drawbacks.

Edit: Just to be clear, I don't have a huge preference for this solution vs #28903, they both (almost) solve the same problem of Linux users being confused when attempting to download a Windows based image.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Dec 12, 2016

Member

ping @aaronlehmann @stevvooe any way to get this PR, or #28903, moving again?

Member

thaJeztah commented Dec 12, 2016

ping @aaronlehmann @stevvooe any way to get this PR, or #28903, moving again?

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Dec 12, 2016

Contributor

My preference here is for something like this (untested, only meant to illustrate the idea):

https://github.com/aaronlehmann/docker/tree/block-wrong-os

This runs the download manager in a separate goroutine, so the pullSchema2 function can abort the pull as soon as the config is downloaded, if that config specifies the wrong platform. This doesn't require fully downloading the config before starting the layer downloads on Linux.

The NTFS issue wouldn't be a factor because Windows daemons always download and check the config first. The foreign layer issue on Linux could be solved by adding a check for foreign layers in the loop over mfst.Layers at the top of pullSchema2, before any of these downloads start. This will let us fail even faster than when trying to download a layer.

Contributor

aaronlehmann commented Dec 12, 2016

My preference here is for something like this (untested, only meant to illustrate the idea):

https://github.com/aaronlehmann/docker/tree/block-wrong-os

This runs the download manager in a separate goroutine, so the pullSchema2 function can abort the pull as soon as the config is downloaded, if that config specifies the wrong platform. This doesn't require fully downloading the config before starting the layer downloads on Linux.

The NTFS issue wouldn't be a factor because Windows daemons always download and check the config first. The foreign layer issue on Linux could be solved by adding a check for foreign layers in the loop over mfst.Layers at the top of pullSchema2, before any of these downloads start. This will let us fail even faster than when trying to download a layer.

@AkihiroSuda

This comment has been minimized.

Show comment
Hide comment
@AkihiroSuda

AkihiroSuda Jan 19, 2017

Member

@darrenstahlmsft

WDYT about @aaronlehmann 's comment?

Member

AkihiroSuda commented Jan 19, 2017

@darrenstahlmsft

WDYT about @aaronlehmann 's comment?

@LK4D4

This comment has been minimized.

Show comment
Hide comment
@LK4D4

LK4D4 Jan 27, 2017

Contributor

ping @darrenstahlmsft

Contributor

LK4D4 commented Jan 27, 2017

ping @darrenstahlmsft

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Jan 30, 2017

Member

also ping @stevvooe PTAL

Member

thaJeztah commented Jan 30, 2017

also ping @stevvooe PTAL

@darstahl

This comment has been minimized.

Show comment
Hide comment
@darstahl

darstahl Jan 31, 2017

Contributor

Updated to be similar to the commit by @aaronlehmann except that in the case of an error downloading the layer, the config download is not cancelled so that the resulting error message takes priority over the layer download failure.

I think that lets us remove the serialization of the Windows config as well. I've left that for a follow-up PR though.

Contributor

darstahl commented Jan 31, 2017

Updated to be similar to the commit by @aaronlehmann except that in the case of an error downloading the layer, the config download is not cancelled so that the resulting error message takes priority over the layer download failure.

I think that lets us remove the serialization of the Windows config as well. I've left that for a follow-up PR though.

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Feb 1, 2017

Contributor

I see that TestEventsPluginOps timed out on both janky and experimental, so I suspect this is causing some problem with pulling plugins.

Contributor

aaronlehmann commented Feb 1, 2017

I see that TestEventsPluginOps timed out on both janky and experimental, so I suspect this is causing some problem with pulling plugins.

// Regression test for https://github.com/docker/docker/issues/28892
func (s *DockerSuite) TestPullWindowsImageFailsOnLinux(c *check.C) {
testRequires(c, DaemonIsLinux, Network)
_, _, err := dockerCmdWithError("pull", "microsoft/nanoserver")

This comment has been minimized.

@aaronlehmann

aaronlehmann Feb 1, 2017

Contributor

Ideally we'd serve a minimal Windows image from a local registry instead of depending on Docker Hub. I just filed #30626 about something similar.

This is probably outside the scope of this PR though. I think the download-frozen-image-v2.sh script in the Dockerfile is capable of downloading the image at docker build time, but it would take some extra scripting work to load this image into a local registry without the use of docker pull / docker push.

@aaronlehmann

aaronlehmann Feb 1, 2017

Contributor

Ideally we'd serve a minimal Windows image from a local registry instead of depending on Docker Hub. I just filed #30626 about something similar.

This is probably outside the scope of this PR though. I think the download-frozen-image-v2.sh script in the Dockerfile is capable of downloading the image at docker build time, but it would take some extra scripting work to load this image into a local registry without the use of docker pull / docker push.

This comment has been minimized.

@AkihiroSuda

AkihiroSuda Feb 3, 2017

Member

It would be even better to use some plain "scratch" image with Windows config, instead of downloading 333MB nanoserver image.

@AkihiroSuda

AkihiroSuda Feb 3, 2017

Member

It would be even better to use some plain "scratch" image with Windows config, instead of downloading 333MB nanoserver image.

This comment has been minimized.

@jhowardmsft

jhowardmsft Feb 3, 2017

Contributor

@AkihiroSuda There is no such thing. Nanoserver is the smallest Windows image available.

@jhowardmsft

jhowardmsft Feb 3, 2017

Contributor

@AkihiroSuda There is no such thing. Nanoserver is the smallest Windows image available.

Show outdated Hide outdated distribution/pull_v2.go Outdated
Show outdated Hide outdated distribution/pull_v2.go Outdated
@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Feb 1, 2017

Contributor

This looks like the right approach to me. Moving to code review.

Let me know if I can be of any help in tracking down the plugin test issue. Also pinging @dmcgowan who converted plugins to use the pull_v2 code.

Contributor

aaronlehmann commented Feb 1, 2017

This looks like the right approach to me. Moving to code review.

Let me know if I can be of any help in tracking down the plugin test issue. Also pinging @dmcgowan who converted plugins to use the pull_v2 code.

@darstahl

This comment has been minimized.

Show comment
Hide comment
@darstahl

darstahl Feb 2, 2017

Contributor

Fixed the plugin error, but now it's failing on TestDaemonNoSpaceLeftOnDeviceError, which still seems related. I'll take a look at that tomorrow.

Contributor

darstahl commented Feb 2, 2017

Fixed the plugin error, but now it's failing on TestDaemonNoSpaceLeftOnDeviceError, which still seems related. I'll take a look at that tomorrow.

Show outdated Hide outdated distribution/pull_v2.go Outdated
@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Feb 2, 2017

Contributor

Ah, I think I see the problem that is causing TestDaemonNoSpaceLeftOnDeviceError to hang.

receiveConfig may get an error from the layer downloads on errChan, and afterwards the select below receiveConfig will wait for activity on downloadsDone or errChan, but neither will happen, because the downloading goroutine has already returned and its message on errChan was swallowed up by receiveConfig.

One way to fix this would be to use separate error channels for layer downloads and the config retrieval.

Contributor

aaronlehmann commented Feb 2, 2017

Ah, I think I see the problem that is causing TestDaemonNoSpaceLeftOnDeviceError to hang.

receiveConfig may get an error from the layer downloads on errChan, and afterwards the select below receiveConfig will wait for activity on downloadsDone or errChan, but neither will happen, because the downloading goroutine has already returned and its message on errChan was swallowed up by receiveConfig.

One way to fix this would be to use separate error channels for layer downloads and the config retrieval.

Block Windows images on Linux
Signed-off-by: Darren Stahl <darst@microsoft.com>
@darstahl

This comment has been minimized.

Show comment
Hide comment
@darstahl

darstahl Feb 2, 2017

Contributor

Good catch @aaronlehmann. Fixed the hang.

Contributor

darstahl commented Feb 2, 2017

Good catch @aaronlehmann. Fixed the hang.

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Feb 2, 2017

Contributor

LGTM

Contributor

aaronlehmann commented Feb 2, 2017

LGTM

if runtime.GOOS == "windows" && unmarshalledConfig.OS == "linux" {
return nil, fmt.Errorf("image operating system %q cannot be used on this platform", unmarshalledConfig.OS)
} else if runtime.GOOS != "windows" && unmarshalledConfig.OS == "windows" {

This comment has been minimized.

@AkihiroSuda

AkihiroSuda Feb 3, 2017

Member

why not just check runtime.GOOS != unmarshelledConfig.OS here?
(Although we can support runtime.GOOS == "freebsd" && unmarshalledConfig.OS == "linux" in future 😄)

@AkihiroSuda

AkihiroSuda Feb 3, 2017

Member

why not just check runtime.GOOS != unmarshelledConfig.OS here?
(Although we can support runtime.GOOS == "freebsd" && unmarshalledConfig.OS == "linux" in future 😄)

This comment has been minimized.

@aaronlehmann

aaronlehmann Feb 3, 2017

Contributor

why not just check runtime.GOOS != unmarshelledConfig.OS here?

I'm afraid this might have some unintended consequences. For example, what if there are images out there built with nonstandard tools that don't set the OS field? For now I think it's best to explicitly check for incompatible configurations.

@aaronlehmann

aaronlehmann Feb 3, 2017

Contributor

why not just check runtime.GOOS != unmarshelledConfig.OS here?

I'm afraid this might have some unintended consequences. For example, what if there are images out there built with nonstandard tools that don't set the OS field? For now I think it's best to explicitly check for incompatible configurations.

This comment has been minimized.

@AkihiroSuda
@AkihiroSuda

AkihiroSuda Feb 3, 2017

Member

ok

@runcom

This comment has been minimized.

Show comment
Hide comment
@runcom
Member

runcom commented Feb 16, 2017

@vdemeester

LGTM 🐸

@vdemeester vdemeester merged commit c31f73a into moby:master Feb 16, 2017

4 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 30278 has succeeded
Details
janky Jenkins build Docker-PRs 38891 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 9939 has succeeded
Details

@GordonTheTurtle GordonTheTurtle added this to the 1.14.0 milestone Feb 16, 2017

@darstahl darstahl deleted the darstahl:WindowsOnLinux branch Apr 3, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment