Block pulling Windows images on non-Windows daemons #29001

Merged
merged 1 commit into moby:master from WindowsOnLinux
Feb 16, 2017

Conversation

darstahl
Contributor

@darstahl darstahl commented Nov 30, 2016

This is an alternative solution to #28903

- What I did

fixes #28892

Block pulling Windows images on Linux daemons.

- How I did it

If the listed OS of an image's config is Windows, but the current daemon platform is not Windows, fail to download the image.

- How to verify it

Pull microsoft/nanoserver on a Linux daemon. See test that checks this.

- Description for the changelog

Block pulling of Windows images on a non-windows daemon.

Signed-off-by: Darren Stahl <darst@microsoft.com>

/cc @jhowardmsft @jstarks

@aaronlehmann
Contributor

Instead of downloading the config before starting to download the layers (which will add a noticeable delay), how about running p.config.DownloadManager.Download in a goroutine and aborting it by canceling the context if the config turns out to be incompatible?

@darstahl
Contributor Author

darstahl commented Nov 30, 2016

@aaronlehmann That can't be done on Windows, as some filenames in Linux images are not valid in NTFS, resulting in a failure mid-download. I'm not sure if it can be done in the Linux case: since the Linux daemon is not aware of foreign layers, the failure occurs during download rather than when attempting to run the resulting image. I'll try it on Linux pulling a Windows image, but even if it works, it would make the handling diverge between platforms.

@aaronlehmann
Contributor

If the problem is foreign layers, those are easy to detect before downloading the layers. We could abort the pull if the platform is Linux and any of the descriptors in the manifest has a media type indicating a foreign layer. That wouldn't require downloading the image config before starting the layer downloads.
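
As a rough illustration of such a check (a sketch under assumptions, not code from this PR): the `descriptor` type below is a simplified stand-in for a manifest layer descriptor, `checkForeignLayers` is a hypothetical helper, the digests are placeholders, and the constant is the Docker schema2 foreign-layer media type as I understand it.

```go
package main

import "fmt"

// Media types as used by Docker image manifest schema 2 (assumed values).
const (
	mediaTypeLayer        = "application/vnd.docker.image.rootfs.diff.tar.gzip"
	mediaTypeForeignLayer = "application/vnd.docker.image.rootfs.foreign.diff.tar.gzip"
)

// descriptor is a simplified stand-in for a manifest layer descriptor.
type descriptor struct {
	MediaType string
	Digest    string
}

// checkForeignLayers rejects a manifest that lists a foreign layer when the
// daemon OS (goos) is not Windows. No layer data is downloaded for this check.
func checkForeignLayers(goos string, layers []descriptor) error {
	if goos == "windows" {
		return nil
	}
	for _, d := range layers {
		if d.MediaType == mediaTypeForeignLayer {
			return fmt.Errorf("layer %s is a foreign layer and cannot be pulled on %s", d.Digest, goos)
		}
	}
	return nil
}

func main() {
	// Placeholder digests, for illustration only.
	layers := []descriptor{
		{MediaType: mediaTypeForeignLayer, Digest: "sha256:aaaa"},
		{MediaType: mediaTypeLayer, Digest: "sha256:bbbb"},
	}
	fmt.Println(checkForeignLayers("linux", layers))
	fmt.Println(checkForeignLayers("windows", layers))
}
```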

@darstahl
Contributor Author

darstahl commented Dec 1, 2016

The problem isn't only the foreign layers. It just happens that all Windows images have at least one foreign layer, since Windows images must be based on one of the two base images, both of which contain foreign layers. Downloading the image config and the layers at the same time creates a race over which fails first, producing a different error depending on which download wins.

I initially avoided this solution (reading image config before layer download), due to the expected performance impact, and did what you are suggesting (checking the layer descriptor before attempting to download a foreign layer). See #28903. It was suggested there to fail earlier by doing the same checks that Windows already does on Linux. Both of these solve the same problem, but have different drawbacks.

Edit: Just to be clear, I don't have a huge preference for this solution vs #28903, they both (almost) solve the same problem of Linux users being confused when attempting to download a Windows based image.

@thaJeztah
Member

ping @aaronlehmann @stevvooe any way to get this PR, or #28903, moving again?

@aaronlehmann
Contributor

My preference here is for something like this (untested, only meant to illustrate the idea):

https://github.com/aaronlehmann/docker/tree/block-wrong-os

This runs the download manager in a separate goroutine, so the pullSchema2 function can abort the pull as soon as the config is downloaded, if that config specifies the wrong platform. This doesn't require fully downloading the config before starting the layer downloads on Linux.

The NTFS issue wouldn't be a factor because Windows daemons always download and check the config first. The foreign layer issue on Linux could be solved by adding a check for foreign layers in the loop over mfst.Layers at the top of pullSchema2, before any of these downloads start. This will let us fail even faster than when trying to download a layer.

@AkihiroSuda
Member

@darrenstahlmsft

WDYT about @aaronlehmann's comment?

@LK4D4
Contributor

LK4D4 commented Jan 27, 2017

ping @darrenstahlmsft

@thaJeztah
Member

also ping @stevvooe PTAL

@darstahl
Contributor Author

Updated to be similar to the commit by @aaronlehmann, except that when a layer download fails, the config download is not cancelled, so that an error from the config download takes priority over the layer download failure.

I think that lets us remove the serialization of the Windows config as well. I've left that for a follow-up PR though.

@aaronlehmann
Contributor

I see that TestEventsPluginOps timed out on both janky and experimental, so I suspect this is causing some problem with pulling plugins.

// Regression test for https://github.com/docker/docker/issues/28892
func (s *DockerSuite) TestPullWindowsImageFailsOnLinux(c *check.C) {
	testRequires(c, DaemonIsLinux, Network)
	_, _, err := dockerCmdWithError("pull", "microsoft/nanoserver")
Contributor

Ideally we'd serve a minimal Windows image from a local registry instead of depending on Docker Hub. I just filed #30626 about something similar.

This is probably outside the scope of this PR though. I think the download-frozen-image-v2.sh script in the Dockerfile is capable of downloading the image at docker build time, but it would take some extra scripting work to load this image into a local registry without the use of docker pull / docker push.

Member

It would be even better to use a plain "scratch" image with a Windows config, instead of downloading the 333MB nanoserver image.

Member

@AkihiroSuda There is no such thing. Nanoserver is the smallest Windows image available.

select {
case <-downloadsDone:
case err = <-errChan:
cancel()
Contributor

Instead of calling cancel here, let's add defer cancel() after the call to WithCancel. We should be making sure to always cancel the context when returning anyway, as this releases the resources associated with it.

@@ -552,6 +553,7 @@ func (p *v2Puller) pullSchema2(ctx context.Context, ref reference.Named, mfst *s
 	configJSON       []byte        // raw serialized image config
 	downloadedRootFS *image.RootFS // rootFS from registered layers
 	configRootFS     *image.RootFS // rootFS from configuration
+	release          func()        // relase resources from rootFS download
Contributor

typo: "relase"

@aaronlehmann
Contributor

This looks like the right approach to me. Moving to code review.

Let me know if I can be of any help in tracking down the plugin test issue. Also pinging @dmcgowan who converted plugins to use the pull_v2 code.

@darstahl
Contributor Author

darstahl commented Feb 2, 2017

Fixed the plugin error, but now it's failing on TestDaemonNoSpaceLeftOnDeviceError, which still seems related. I'll take a look at that tomorrow.

@@ -533,7 +533,8 @@ func (p *v2Puller) pullSchema2(ctx context.Context, ref reference.Named, mfst *s
}

 	configChan := make(chan []byte, 1)
-	errChan := make(chan error, 1)
+	errChan := make(chan error, 2)
+	downloadsDone := make(chan error, 1)
Contributor

This can be downloadsDone := make(chan struct{}), since no data is ever conveyed over it.
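
For illustration, a minimal sketch of a pure completion signal, assuming a hypothetical helper `runDownloads`. The channel carries no values: closing it is the whole message.

```go
package main

import "fmt"

// runDownloads is a hypothetical helper that runs work in a goroutine and
// returns a channel that is closed once the work has finished.
func runDownloads(work func()) <-chan struct{} {
	done := make(chan struct{}) // struct{} because no data is ever sent
	go func() {
		defer close(done)
		work()
	}()
	return done
}

func main() {
	layers := 0
	done := runDownloads(func() { layers = 3 })
	<-done // receiving from a closed channel returns immediately
	fmt.Println("downloads finished, layers:", layers)
}
```

Closing the channel also means any number of receivers can wait on it, which a single buffered send would not allow.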

Contributor Author

Huh, I was certain I'd made it a struct{} chan in the first place, oops... Fixed.

@aaronlehmann
Contributor

Ah, I think I see the problem that is causing TestDaemonNoSpaceLeftOnDeviceError to hang.

receiveConfig may get an error from the layer downloads on errChan. Afterwards, the select below receiveConfig waits for activity on downloadsDone or errChan, but neither will happen, because the downloading goroutine has already returned and its message on errChan was swallowed by receiveConfig.

One way to fix this would be to use separate error channels for layer downloads and the config retrieval.
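
A simplified sketch of the separate-channel approach, under assumptions: `pull`, `errConfig`, and `errLayer` are hypothetical names, and both downloads are simulated. The point is that the wait for the config can no longer consume a layer error, so the later select still sees it.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical errors used to simulate failing downloads.
var (
	errConfig = errors.New("config download failed")
	errLayer  = errors.New("layer download failed")
)

// pull simulates a config download and a layer download running in parallel,
// each reporting failures on its own error channel.
func pull(configErr, layerErr error) error {
	configChan := make(chan string, 1)
	configErrChan := make(chan error, 1)
	layerErrChan := make(chan error, 1)
	downloadsDone := make(chan struct{})

	go func() { // config retrieval
		if configErr != nil {
			configErrChan <- configErr
			return
		}
		configChan <- `{"os":"linux"}`
	}()

	go func() { // layer downloads
		if layerErr != nil {
			layerErrChan <- layerErr
			return
		}
		close(downloadsDone)
	}()

	// Waiting for the config only listens for config errors.
	select {
	case <-configChan:
	case err := <-configErrChan:
		return err
	}

	// A layer error is still pending on its own channel, so this cannot hang.
	select {
	case <-downloadsDone:
		return nil
	case err := <-layerErrChan:
		return err
	}
}

func main() {
	fmt.Println(pull(nil, nil))
	fmt.Println(pull(nil, errLayer))
	fmt.Println(pull(errConfig, errLayer))
}
```

In this sketch a config error is reported ahead of a layer error whenever both occur, because the config select runs first.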

@darstahl darstahl force-pushed the WindowsOnLinux branch 2 times, most recently from 480c8b1 to 0755ed4 on February 2, 2017 19:06
Signed-off-by: Darren Stahl <darst@microsoft.com>
@darstahl
Contributor Author

darstahl commented Feb 2, 2017

Good catch @aaronlehmann. Fixed the hang.

@aaronlehmann
Contributor

LGTM

if runtime.GOOS == "windows" && unmarshalledConfig.OS == "linux" {
	return nil, fmt.Errorf("image operating system %q cannot be used on this platform", unmarshalledConfig.OS)
} else if runtime.GOOS != "windows" && unmarshalledConfig.OS == "windows" {
Member

why not just check runtime.GOOS != unmarshalledConfig.OS here?
(Although we can support runtime.GOOS == "freebsd" && unmarshalledConfig.OS == "linux" in the future 😄)

Contributor

why not just check runtime.GOOS != unmarshalledConfig.OS here?

I'm afraid this might have some unintended consequences. For example, what if there are images out there built with nonstandard tools that don't set the OS field? For now I think it's best to explicitly check for incompatible configurations.
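
To make the difference concrete, here is a sketch with a hypothetical helper `checkOS` that takes the daemon OS as a parameter instead of reading runtime.GOOS. It mirrors the explicit checks under review and shows that a config with an empty OS field still passes, which a plain inequality test would reject.

```go
package main

import "fmt"

// checkOS is a hypothetical helper that rejects only the two known
// incompatible combinations and accepts everything else.
func checkOS(daemonOS, imageOS string) error {
	if daemonOS == "windows" && imageOS == "linux" {
		return fmt.Errorf("image operating system %q cannot be used on this platform", imageOS)
	}
	if daemonOS != "windows" && imageOS == "windows" {
		return fmt.Errorf("image operating system %q cannot be used on this platform", imageOS)
	}
	return nil
}

func main() {
	fmt.Println(checkOS("linux", "windows")) // rejected
	fmt.Println(checkOS("windows", "linux")) // rejected
	fmt.Println(checkOS("linux", ""))        // accepted: OS field not set
	fmt.Println("linux" != "")               // the plain inequality would reject it
}
```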

Member

ok

@runcom
Member

runcom commented Feb 16, 2017

LGTM ping @anusha-ragunathan @vdemeester

Member

@vdemeester vdemeester left a comment

LGTM 🐸

@vdemeester vdemeester merged commit c31f73a into moby:master Feb 16, 2017
@GordonTheTurtle GordonTheTurtle added this to the 1.14.0 milestone Feb 16, 2017
@darstahl darstahl deleted the WindowsOnLinux branch April 3, 2017 20:53
Successfully merging this pull request may close these issues.

Docker pull microsoft/nanoserver on Linux daemon fails with "unknown blob"
10 participants