Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test images: Adds Windows Container images support (part 1) #76838

Merged

Conversation

claudiubelu
Copy link
Contributor

@claudiubelu claudiubelu commented Apr 19, 2019

What type of PR is this?

/kind feature

/sig testing
/sig windows

What this PR does / why we need it:

Adds Windows support to the test/images/image-util.sh script.

Adds support for building test images for multiple Windows versions, as we have to support both LTS and SAC channels. In order to build Windows container images for multiple OS versions, --isolation=hyperv is required. However, not all clouds / nodes supports or have it enabled by default, which is why we're going to rely on having multiple nodes to build the Windows images, until this issue is addressed.

All Windows images will be based on the images:
mcr.microsoft.com/windows/servercore:ltsc2019, mcr.microsoft.com/windows/servercore:1903, mcr.microsoft.com/windows/servercore:1909). There are a couple of features that are needed from this image, especially powershell.

A Windows node with Docker installed is required to build Windows images. The connection URL to it must be set in the REMOTE_DOCKER_URL_${os_version} env variable. Additionally, the authentication to the remote docker node is done through certificates, which must be found in ~/.docker-${os-version}.

By default, the REMOTE_DOCKER_URL_${os_version} env variable is set to "" in the Makefile, and because of it, the image-util.sh script will skip building and pushing Windows images.

Added GOOS argument to the go build process in order to be able to build Windows binaries. Additionally, the OS env variable was added to the images Makefiles (default value is linux) in order to maintain default behaviour.

Some images require a different Dockerfile for Windows images, since they have different ways of installing dependencies. Because of this, if a image needs to be built for Windows, it will first check for a Dockerfile_windows file instead of the default one. If there isn't one, it means that the same Dockerfilecan be used for both Windows and Linux.

Added busybox image for Windows. Most Windows images will be based on it, which will help reduce the command line differences between Linux and Windows, but not entirely.

Adds a README explaining the image building process, including the Windows Container image building process.

Changes the image naming template from:

$REGISTRY/$IMAGE-$arch:$TAG

to

$REGISTRY/$IMAGE:$TAG-$os_name-$arch

The previous naming template would generate a plethora of images (Ai * N images, where Ai is the number of OS/architectures for the image i and N is the number of images), while the new naming template will reduce the number of images to N.

The new template also includes the OS name, as we plan to integrate Windows images into the manifest lists as well.

When building images, their REGISTRY can be set to a custom one, instead of the default gcr.io/kubernetes-e2e-test-images. Some images are based on other images we're already building
(e.g.: kitten, nautilus), but their base images are set in the default registry name, which can be undesirable. This PR will allow the image-util.sh script to configure the REGISTRY for those images.

Windows images will require other base images, and thus, we will need to explicitly specify the OS type a base image is for in order to avoid confusion or errors. The BASEIMAGE entries now have an OS prefix.

Bumps the image versions in order to avoid breaking any currently built images and breaking the CIs.

Enables the Image Promoter to use Windows build nodes to build the Windows test images.

Which issue(s) this PR fixes:

Fixes #77268

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 19, 2019
@k8s-ci-robot k8s-ci-robot requested review from ixdy and luxas April 19, 2019 17:38
@neolit123
Copy link
Member

/assign @mkumatag
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 19, 2019
@claudiubelu claudiubelu force-pushed the test-images/windows-support branch 2 times, most recently from 2b1c11b to d6913c3 Compare April 22, 2019 15:46
@claudiubelu
Copy link
Contributor Author

/retest

@claudiubelu
Copy link
Contributor Author

/retest

@adelina-t
Copy link
Contributor

/cc @PatrickLang

@PatrickLang
Copy link
Contributor

cc @dims

@dims
Copy link
Member

dims commented Apr 23, 2019

/assign @listx

Copy link
Member

@mkumatag mkumatag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all thanks for this PR. Just got few minutes to review and here are my review comments, will take some more time later to review in detail.

One very basic question I have is: who will build the windows images on that remote machine? Will that machine will be accessible for whoever is building/pushing images now.

test/images/image-util.sh Outdated Show resolved Hide resolved
test/images/image-util.sh Outdated Show resolved Hide resolved
Copy link
Contributor Author

@claudiubelu claudiubelu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all thanks for this PR. Just got few minutes to review and here are my review comments, will take some more time later to review in detail.

One very basic question I have is: who will build the windows images on that remote machine? Will that machine will be accessible for whoever is building/pushing images now.

Are you speaking conceptually, or in practice? Basically, through this PR, the Linux build node will be able to send the docker build / push commands remotely to the Windows build node.

The goal is to also have Windows images built in the same process that builds and publishes the images to gcr.io/kubernetes-e2e-test-images. AFAIK, only Google folks have the rights to push those images there, so they will have to also create a Windows build node that is able to push to that repository.

test/images/image-util.sh Outdated Show resolved Hide resolved
test/images/image-util.sh Outdated Show resolved Hide resolved
test/images/dnsutils/Dockerfile_windows Outdated Show resolved Hide resolved
@timothysc timothysc added the area/conformance Issues or PRs related to kubernetes conformance tests label Apr 23, 2019
@@ -0,0 +1,113 @@
# Building Kubernetes test images
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) docs are good

@BenTheElder
Copy link
Member

BenTheElder commented Feb 21, 2020

changing the image naming template is also a breaking change imo, I'm not sure we should be doing that. Surely there are consumers of the named images?

@BenTheElder
Copy link
Member

A Windows node will be required though. I could share mine if needed.

this really needs to be something managed under wg-k8s-infra, it's not sustainable to have to borrow your node every time we need to push these (also the push is automated now anyhow ...?)

@listx
Copy link

listx commented Feb 21, 2020

Today if I do make all-push WHAT=test-webserver then a bunch of images get built/pushed into gcr.io/kubernetes-e2e-test-images. Let's say test-webserver was pushed as version 1.0 in this case. If we want to now push Windows containers separately (that is, only build Windows images), is that possible (without overwriting the artifacts from the first make all-push ... invocation)? If yes, how does that play with manifest lists?

@claudiubelu
Copy link
Contributor Author

claudiubelu commented Feb 24, 2020

A Windows node with Docker installed is required to build Windows images. The connection URL to it must be set in the REMOTE_DOCKER_URL_${os_version} env variable. Additionally, the authentication to the remote docker node is done through certificates, which must be found in ~/.docker-${os-version}.

I haven't had a chance to see how related this is but noting that I would actually like to see us remove the remote docker daemon stuff from the existing build as-is, it adds a lot of unnecessary complexity when the build could just be invoked on the remote host.. :/

So, the REMOTE_DOCKER_URL bits are not too big IMO. They were smaller originally when the idea was to simply have a single Windows node build the images for all other Windows versions using the --isolation=hyperv option. The multiple Windows nodes idea came from the fact that GCE does not have Hyper-V enabled for some reason, and thus, --isolation=hyperv is not possible, so there was no other option. The plan was to drop the multiple Windows nodes bits after they've addressed that.

As an additional note, it is a lot easier to build go Windows binaries on Linux than on Windows; I've had issues with this in the past. Plus, having the REMOTE_DOCKER_URL bits in image-util.sh means that it is easier to coordinate the image building workflow: in order to create the image manifest list for both Linux and Windows, all the images need to be build and pushed first, which is easier to ensure / control in the current state: one node coordinates everything. Having multiple nodes build and push individually, and then having to coordinate with each other when they finish / fail, so the final manifest list can be build properly sounds like a nightmare (plus, it would probably involve additional changes to the Image Promoter for a niche case, which is unfeasable).

And finally, now it's a single bash script, which is easier to maintain rather than having an additional PowerShell script for the image building on the Windows nodes (from my experience, people tend not to touch something like this).

In the future the dockerized build can preform pretty efficiently with native go caching without the rsync dance we do today.

I think some of the other build maintainers are in agreement here. the main reason it hasn't been dropped already is that removing it will be somewhat involved.

EDIT: The rsync / existing remote stuff is in the main makefile, not the test image build, ignore above.

Is it still possible to build for linux without needing a remote windows node? Ideally I should still be able to check out the repo and build and push my test images without relying on the pre-built ones or now needing to obtain a windows remote docker daemon.

Yes you can. Basically, the windows/amd64 lines in the BASEIMAGE files are ignored if you don't have any REMOTE_DOCKER_URL env var set.

changing the image naming template is also a breaking change imo, I'm not sure we should be doing that. Surely there are consumers of the named images?

Well, we can drop that if needed, even though I wouldn't be a fan of doing som Having soo many "images" for just one image feels bloat-y. And I don't know how is it breaking. Those are test images, so they should only be used for testing purposes. And additionally, we're not going to touch any of the already published images; they're still going to be there untouched. Plus, since we now have the Image Promoter, those new images are going to be published in a different registry.

A Windows node will be required though. I could share mine if needed.

this really needs to be something managed under wg-k8s-infra, it's not sustainable to have to borrow your node every time we need to push these (also the push is automated now anyhow ...?)

There is a resource group sponsored by Microsoft on Azure and under sig-windows, so it's not my node(s). And I've made changes to to this PR to allow the Image Promoter to use those Windows build nodes (see cloudbuild.yaml changes, and also: kubernetes/test-infra#16083).

Today if I do make all-push WHAT=test-webserver then a bunch of images get built/pushed into gcr.io/kubernetes-e2e-test-images. Let's say test-webserver was pushed as version 1.0 in this case. If we want to now push Windows containers separately (that is, only build Windows images), is that possible (without overwriting the artifacts from the first make all-push ... invocation)? If yes, how does that play with manifest lists?

No, not really. You can either build and push only for Linux images, or both Linux and Windows images. The only case in which you can build only Windows images is if the image's BASEIMAGE file only contains windows/amd64 lines, and no linux ones (see the busybox/BASEIMAGE file added in this PR). If you reaally want to have only Windows images for the test-webserver image, clearly you can remove the Linux entries from its BASEIMAGE file. All the Linux images and tags will still be there, but the manifest list will be overwritten, because that's what the image-util.sh script does.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 24, 2020
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 24, 2020
@claudiubelu
Copy link
Contributor Author

/retest

@claudiubelu
Copy link
Contributor Author

/test pull-kubernetes-e2e-kind

@smourapina
Copy link

Hello @claudiubelu !
This is Bug Triage for release 1.18 checking in, with a friendly reminder that code freeze is coming in about 1 week (5 March), so please take this into account!

@PatrickLang
Copy link
Contributor

/lgtm
/approve

@smourapina - we're still blocked on reviews for e2e/testing. If you can help ping SIG-Testing too that would help

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 2, 2020
@yliaog
Copy link
Contributor

yliaog commented Mar 2, 2020

@listx is there any more concern about this PR?

@listx
Copy link

listx commented Mar 2, 2020

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: claudiubelu, listx, PatrickLang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 2, 2020
@PatrickLang
Copy link
Contributor

/test pull-kubernetes-e2e-gce

@k8s-ci-robot k8s-ci-robot merged commit a504821 into kubernetes:master Mar 3, 2020
SIG-Windows automation moved this from In Review (v1.18) to Done (v1.18) Mar 3, 2020
conformance-definition automation moved this from In Progress to Done Mar 3, 2020
@k8s-ci-robot
Copy link
Contributor

@claudiubelu: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-e2e-gce 91dc590 link /test pull-kubernetes-e2e-gce

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

claudiubelu added a commit to claudiubelu/kubernetes that referenced this pull request Mar 4, 2020
A previous PR (kubernetes#76838) introduced the ability to build and publish
Windows Test Images to kubernetes/test/images/image-util.sh.

Additionally, that PR also configured the Image Promoter to use a
few Windows Remote Docker build nodes to build the Windows Test Images,
however, there is a minor issue: the build container has a different $HOME
folder than expected (is: /builder/home, expected: /root - since it's the
root user), and the Remote Docker credentials are mounted in /root.

Because of that, image-build.sh cannot find the credentials it needs.
This will have to be properly fixed, but for now, we can just skip
the Windows image building part.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/conformance Issues or PRs related to kubernetes conformance tests area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.

Kubernetes test images should include Windows images in their manifest lists