New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local clusters with nested containers. #51661

Closed
wants to merge 2 commits into
base: master
from

Conversation

@Q-Lee
Contributor

Q-Lee commented Aug 30, 2017

What this PR does / why we need it: build an image capable of running a cluster in nested containers for local and remote testing.

Special notes for your reviewer: since the base image should be built from checked in code, we'll need a follow up PR to reference the google-containers/nested-cluster-base image in the WORKSPACE file.

Release note:

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Aug 30, 2017

Adding do-not-merge/release-note-label-needed because the release note process has not been followed.
One of the following labels is required "release-note", "release-note-action-required", "release-note-experimental" or "release-note-none".
Please see: https://github.com/kubernetes/community/blob/master/contributors/devel/pull-requests.md#write-release-notes-if-needed.

@ixdy

This comment has been minimized.

Member

ixdy commented Aug 30, 2017

@Q-Lee I think you may need to apply a release note label.

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Aug 30, 2017

Adding do-not-merge/release-note-label-needed because the release note process has not been followed.
One of the following labels is required "release-note", "release-note-action-required", "release-note-experimental" or "release-note-none".
Please see: https://github.com/kubernetes/community/blob/master/contributors/devel/pull-requests.md#write-release-notes-if-needed.

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Aug 30, 2017

Adding do-not-merge/release-note-label-needed because the release note process has not been followed.
One of the following labels is required "release-note", "release-note-action-required", "release-note-experimental" or "release-note-none".
Please see: https://github.com/kubernetes/community/blob/master/contributors/devel/pull-requests.md#write-release-notes-if-needed.

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Aug 30, 2017

Adding do-not-merge/release-note-label-needed because the release note process has not been followed.
One of the following labels is required "release-note", "release-note-action-required", "release-note-experimental" or "release-note-none".
Please see: https://github.com/kubernetes/community/blob/master/contributors/devel/pull-requests.md#write-release-notes-if-needed.

@Q-Lee

This comment has been minimized.

Contributor

Q-Lee commented Aug 31, 2017

/assign ixdy

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Aug 31, 2017

Adding do-not-merge/release-note-label-needed because the release note process has not been followed.
One of the following labels is required "release-note", "release-note-action-required", "release-note-experimental" or "release-note-none".
Please see: https://github.com/kubernetes/community/blob/master/contributors/devel/pull-requests.md#write-release-notes-if-needed.

@dims

This comment has been minimized.

Member

dims commented Aug 31, 2017

@Q-Lee @ixdy Does this have to be in the main repo?

@spxtr

This comment has been minimized.

Member

spxtr commented Aug 31, 2017

@Q-Lee @ixdy Does this have to be in the main repo?

It makes some sense to put it here because it is an e2e test framework that isn't specific to any single component. It's also closely tied to bazel artifacts. @Q-Lee can probably say more.

@kubernetes kubernetes deleted a comment from k8s-merge-robot Sep 1, 2017

@kubernetes kubernetes deleted a comment from k8s-merge-robot Sep 1, 2017

@kubernetes kubernetes deleted a comment from k8s-merge-robot Sep 1, 2017

@kubernetes kubernetes deleted a comment from k8s-merge-robot Sep 1, 2017

@kubernetes kubernetes deleted a comment from k8s-merge-robot Sep 1, 2017

@kubernetes kubernetes deleted a comment from k8s-merge-robot Sep 1, 2017

@kubernetes kubernetes deleted a comment from k8s-merge-robot Sep 1, 2017

@kubernetes kubernetes deleted a comment from k8s-merge-robot Sep 1, 2017

Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false"
EOM
# sed -i 's/ExecStart=\/usr\/bin\/kubelet/ExecStart=\/usr\/bin\/kubelet --fail-swap-on=false/' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
systemctl enable kubelet

This comment has been minimized.

@errordeveloper

errordeveloper Jan 29, 2018

Member

This can be done ahead of the time also.

This comment has been minimized.

@Q-Lee

Q-Lee Jan 29, 2018

Contributor

We install the kubelet with bazel's deb artifact. Since bazel doesn't offer a run command for our container, this is our first chance to enable the service.

This comment has been minimized.

@errordeveloper

errordeveloper Jan 30, 2018

Member

I cannot believe there is no way to put this file in place beforehand. I am just worried that debugging all the layers here will be very hard.

mkdir -p /etc/srv/kubernetes
# Change the kubelet to not fail with swap on.
cat > /etc/systemd/system/kubelet.service.d/kubeadm-20.conf << EOM

This comment has been minimized.

@errordeveloper

errordeveloper Jan 29, 2018

Member

I am not sure why this has to be done at startup, please consider making this part of the build.

# Apply a pod network.
# Calico is an ip-over-ip overlay network. This saves us from many of the
# difficulties from configuring an L2 network.
kubectl --kubeconfig=/etc/kubernetes/admin.conf apply -f http://docs.projectcalico.org/v2.4/getting-started/kubernetes/installation/hosted/kubeadm/1.6/calico.yaml

This comment has been minimized.

@errordeveloper

errordeveloper Jan 29, 2018

Member

If this is to be used for testing, I don't think hardcoding the network addon is a good idea.

This comment has been minimized.

@BenTheElder

BenTheElder Jan 29, 2018

Member

It's going to be used moreso for integration style testing, the network addons should be tested in other ways. I don't think there is any intention to support testing them here.

This comment has been minimized.

@errordeveloper

errordeveloper Jan 30, 2018

Member

It's not so hard to parametrize this, and I don't agree with the reason why Calico was chosen here, and I'd argue that configuration of Calico is much more complex, and the manifest is downloaded from an external URL (which cannot be relied upon).
FIY, an overlay network is actually safer choice for docker-in-docker, as it's completely isolated from the environment it runs in, but I cannot speak of how exactly Calico IPIP mode works in that respect – something to check.

This comment has been minimized.

@BenTheElder

BenTheElder Jan 30, 2018

Member

Hmm fair enough. Maybe something to PR in a follow-up then?

I do think we probably want to avoid this being commonly configured though, so we can make it easy for users to know that locally running tests and the CI tests match etc.

This comment has been minimized.

@Q-Lee

Q-Lee Jan 30, 2018

Contributor

Although we don't intend to test every network provider ourselves, it's reasonable to use this to test your provider. Calico was chosen because it worked. If you'd like to get Weave working, that would be awesome.

And yes, depending on the external URL bothers me. But for now it sounds better than maintaining overlay configs. Until we have a better way to consume addons, or unless we see failures due to it, we'll do something similar for metrics-server, kubernetes-dashboard, etc.

This comment has been minimized.

@ixdy

ixdy Jan 30, 2018

Member

nit: switch from http to https for the URL? or does that cause problems because certificates?

kubectl --kubeconfig=/etc/kubernetes/admin.conf get ds -n kube-system kube-proxy -o json | jq '.spec.template.spec.containers[0].command |= .+ ["--conntrack-max-per-core=0"]' | kubectl --kubeconfig=/etc/kubernetes/admin.conf apply -f -
# Apply a pod network.
# Calico is an ip-over-ip overlay network. This saves us from many of the
# difficulties from configuring an L2 network.

This comment has been minimized.

@errordeveloper

errordeveloper Jan 29, 2018

Member

What are those difficulties? I've use Weave Net with docker-in-docker many times and it just works.

This comment has been minimized.

@errordeveloper

errordeveloper Jan 30, 2018

Member

It'd good if you could enumerate those difficulties, otherwise it just seems like there were unknowns and no particular consideration was given, and I'd hope that it wasn't the case.

This comment has been minimized.

@Q-Lee

Q-Lee Jan 30, 2018

Contributor

Partially answered above. I tried a few overlays (flannel, weave, etc.), and chose the one that took the least work to get working in my existing setup. The L2 networks were particularly difficult, because they had the most dependencies on kernel resources and CNI tools.

@errordeveloper

This comment has been minimized.

Member

errordeveloper commented Jan 30, 2018

Not intending to get in the way, just want to point out thoughts I've head while reading this PR, perhaps we can iterate on a few things together once this lands.

@Q-Lee

This comment has been minimized.

Contributor

Q-Lee commented Jan 30, 2018

@errordeveloper that's the idea. There's no way one person can be an expert in every aspect of this, but once we have something working, experts can improve the pieces.

@ixdy

generally looks fine, just a few minor comments.

ARG DEBIAN_FRONTEND=noninteractive
RUN clean-install apt-utils \
apt-transport-https \

This comment has been minimized.

@ixdy

ixdy Jan 30, 2018

Member

minor nit: it might be nice to alphabetize this.

This comment has been minimized.

@Q-Lee

Q-Lee Jan 31, 2018

Contributor

Sure, with the caveat (and a comment) that apt-utils be first.

# of the base image. This is necessary to avoid using cached local
# versions of image e.g. when updating insecure base images.
docker build --pull -t ${IMAGE} .
rm -f dind

This comment has been minimized.

@ixdy

ixdy Jan 30, 2018

Member

this is fine, but FYI the other pattern I've seen is to create a tempdir, copy all needed dependencies to that tempdir, and then delete it afterwards.

This comment has been minimized.

@Q-Lee

Q-Lee Jan 31, 2018

Contributor

That's a better idea. I'll defer for now, because rebuilding the base image is a hassle.

docker_bundle(
name = "encapsulated-cluster-bundle",
images = {"gcr.io/google-containers/encapsulated-cluster:0.1": "encapsulated-cluster-build"},

This comment has been minimized.

@ixdy

ixdy Jan 30, 2018

Member

it might be worth pulling the 0.1 tag out into a variable, much like we do in Makefiles.

set -o errexit
set -o nounset
set -o pipefail

This comment has been minimized.

@ixdy

ixdy Jan 30, 2018

Member

it's probably worth including that explanation in a comment just after the boilerplate. also maybe open an issue to track refactoring?

(also, a minor quibble about naming this integ-test.sh when it really runs the e2e tests, not the "integration" tests. of course our naming is horrible.)

ginkgo_args+=("--nodes=25")
fi
if [[ "${GINKGO_UNTIL_IT_FAILS:-}" == true ]]; then

This comment has been minimized.

@ixdy

ixdy Jan 30, 2018

Member

side note: I love how some of theses use y/n, others use true/false. (I know you're just inheriting existing behavior.)

This comment has been minimized.

@Q-Lee

Q-Lee Jan 31, 2018

Contributor

Sigh.

"integ-test.sh",
"start.sh",
"//build:master-components-amd64",
"//build/nested-cluster/util:k8sversion",

This comment has been minimized.

@ixdy

ixdy Jan 30, 2018

Member

even easier: depend on //build:version, which use a genrule to produce a file named version containing the "git version" string.

)
func main() {
// All this does is print the version of k8s we've built.

This comment has been minimized.

@ixdy

ixdy Jan 30, 2018

Member

as I've noted elsewhere, I don't think you need this program at all.

This comment has been minimized.

@Q-Lee

Q-Lee Jan 31, 2018

Contributor

That's true, but I want something in Go, so I have a place to start from.

This comment has been minimized.

@ixdy

ixdy Jan 31, 2018

Member

do you intend for this program to do anything besides print out the git version?

This comment has been minimized.

@Q-Lee

Q-Lee Feb 1, 2018

Contributor

I want to move the setup/testing into Go.

# make dind
endef
dind:
@echo "$$DIND_RELEASE_HELP_INFO"

This comment has been minimized.

@ixdy

ixdy Jan 30, 2018

Member

should this be a tab rather than 2 spaces?

This comment has been minimized.

@Q-Lee

Q-Lee Jan 31, 2018

Contributor

Whoops, vim doesn't let me use tabs :(

"vendor", "test/e2e/generated/bindata.go", "hack/boilerplate/test",
"pkg/generated/bindata.go"]
'vendor', 'test/e2e/generated/bindata.go', 'hack/boilerplate/test',
'pkg/generated/bindata.go', 'build/nested-cluster-base']

This comment has been minimized.

@ixdy

ixdy Jan 30, 2018

Member

is this still needed now that you've put the dind stuff in third_party?

@@ -0,0 +1,84 @@
#!/bin/bash

This comment has been minimized.

@ixdy

ixdy Jan 30, 2018

Member

the amount of code duplication makes me a little sad. :(

This comment has been minimized.

@Q-Lee

Q-Lee Jan 31, 2018

Contributor

Do we really care about most of these arguments? Why not statically set them?

I plan to either modularize ginkgo-e2e.sh, or move some of this functionality into Go. But I'm not yet sure which is the better answer.

This comment has been minimized.

@fejta

fejta Jan 31, 2018

Contributor

We want to delete ginkgo-e2e.sh: kubernetes/test-infra#3536

It is not obvious to me why we need this file as opposed to setting the right ginkgo flags?

@BenTheElder

This comment has been minimized.

Member

BenTheElder commented Jan 31, 2018

@Q-Lee

This comment has been minimized.

Contributor

Q-Lee commented Jan 31, 2018

/test pull-kubernetes-verify

@Q-Lee

This comment has been minimized.

Contributor

Q-Lee commented Jan 31, 2018

/retest

@Q-Lee

This comment has been minimized.

Contributor

Q-Lee commented Jan 31, 2018

/test pull-kubernetes-cross

@BenTheElder

This comment has been minimized.

Member

BenTheElder commented Jan 31, 2018

/test pull-kubernetes-bazel-build

@fejta

Why do we need/want this in the k/k repo?

@@ -0,0 +1,84 @@
#!/bin/bash

This comment has been minimized.

@fejta

fejta Jan 31, 2018

Contributor

We want to delete ginkgo-e2e.sh: kubernetes/test-infra#3536

It is not obvious to me why we need this file as opposed to setting the right ginkgo flags?

name = "nested-kubelet-base",
digest = "sha256:f8527c0f16436d14bfacb192ce5522f7da73103d4e1f44f5cec2983c996e9adf",
registry = "gcr.io",
repository = "qlee-dev/nested-kubelet-base-amd64",

This comment has been minimized.

@timothysc

timothysc Feb 2, 2018

Member

nope nope nope.

This comment has been minimized.

@BenTheElder

BenTheElder Feb 2, 2018

Member

Per previous comments this is temporary so that the PR builds, there will be a follow up to make this use a k8s.io image. It doesn't make sense to publish this there until this is accepted.

@timothysc

This comment has been minimized.

Member

timothysc commented Feb 2, 2018

Please work within both sig-cluster-lifecycle and sig-testing to approve this... I'm not ok with this, until we have settled on a unified path.

@timothysc timothysc self-assigned this Feb 2, 2018

@Q-Lee

This comment has been minimized.

Contributor

Q-Lee commented Feb 2, 2018

@timothysc this has been discussed many times in sig-testing. Also, what's cluster life-cycle's interest? This is entirely for testing purposes.

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Feb 7, 2018

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: BenTheElder, krousey, Q-Lee
We suggest the following additional approver: lavalamp

Assign the PR to them by writing /assign @lavalamp in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Feb 7, 2018

@Q-Lee: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-bazel-build ff06708 link /test pull-kubernetes-bazel-build
pull-kubernetes-bazel-test ff06708 link /test pull-kubernetes-bazel-test
pull-kubernetes-verify ff06708 link /test pull-kubernetes-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@fejta

This PR wants to do the following:

  1. Define a new way to produce artifacts with an -amd64 suffix
  2. Define/create a bunch of dind-test targets and images, which require bazel
  3. Add a dev image to the root workspace, which bazel cannot build
  4. Fork ginkgo-e2e.sh
  5. Add third_party/moby to the repo

Can we put each these into their own PR, explaining why they are necessary (especially in the absence of a dind system)?

Otherwise it seems to possible to do all of this outside the main repo:

  1. The new rules produce the same images as the preexisting docker_bundle() commands without an -amd64 suffix
  2. Add a local_repository(name="main-repo", path="/path/to/k8s.io/kubernetes") line in a different repo's WORKSPACE and replace lines like //build/deps with @main-repo//build/deps
  3. Add to the new repo's WORKSPACE
  4. Not sure why we need/want to fork ginkgo-e2e.sh
  5. Add to new repo's WORKSPACE

Where do I go wrong?

@@ -130,8 +130,8 @@ def file_extension(filename):
return os.path.splitext(filename)[1].split(".")[-1].lower()
skipped_dirs = ['Godeps', 'third_party', '_gopath', '_output', '.git', 'cluster/env.sh',
"vendor", "test/e2e/generated/bindata.go", "hack/boilerplate/test",
"pkg/generated/bindata.go"]
'vendor', 'test/e2e/generated/bindata.go', 'hack/boilerplate/test',

This comment has been minimized.

@fejta

fejta Feb 7, 2018

Contributor

This change is not germane to the purpose of this PR

@@ -14,6 +14,7 @@ filegroup(
srcs = [
":package-srcs",
"//build/debs:all-srcs",
"//build/nested-cluster:all-srcs",

This comment has been minimized.

@fejta

fejta Feb 7, 2018

Contributor

Doesn't seem like this dir exists?

name = "dind-test-base",
digest = "sha256:c83ef23a0de78b8c43933a66d58fdeb61c526c12c1c2ab89db008ffb76b8577b",
registry = "gcr.io",
repository = "qlee-dev/dind-test-base-amd64",

This comment has been minimized.

@fejta

fejta Feb 7, 2018

Contributor

I am not convinced we should have qlee-dev images in the root WORKSPACE

This comment has been minimized.

@errordeveloper

errordeveloper Feb 14, 2018

Member

Yes, it doesn't look like this is marked as WIP, so this should get replaced with something org-wide.

@@ -66,6 +67,12 @@ DOCKERIZED_BINARIES = {
stamp = True,
) for binary in DOCKERIZED_BINARIES.keys()]
[docker_bundle(
name = binary + "-amd64",
images = {"gcr.io/google_containers/%s-amd64:{STABLE_DOCKER_TAG}" % binary: binary + "-internal"},

This comment has been minimized.

@fejta

fejta Feb 7, 2018

Contributor

How are these images different? It looks like they contain the exact same target.

TAG = 0.1
IMAGE_NO_TAG = ${PREFIX}/${NAME}-${ARCH}
IMAGE = ${IMAGE_NO_TAG}:$(TAG)

This comment has been minimized.

@fejta

fejta Feb 7, 2018

Contributor

IMAGE = gcr.io/qlee_dev/dind-test-base-amd64:0.1 seems a lot easier to read here

COPY dockerd-entrypoint.sh /bin/dockerd-entrypoint.sh
# Remove unwanted systemd services.
RUN for i in /lib/systemd/system/sysinit.target.wants/*; do [ "${i##*/}" = "systemd-tmpfiles-setup.service" ] || rm -f "$i"; done; \

This comment has been minimized.

@fejta

fejta Feb 7, 2018

Contributor

Does RUN automatically blow up if something exits 1? or do we need &&

This comment has been minimized.

@errordeveloper

errordeveloper Feb 14, 2018

Member

I believe run means sh -c "<cmd>", and not sh -e -c "<cmd>", or anything like that.
However, I don't think it matter so much here – we are trimming a well-known set here.
Although, it maybe nicer to be more explicit – backup wanted file firs, remove directory and then re-create it with just that one file.

@errordeveloper

This comment has been minimized.

Member

errordeveloper commented Feb 14, 2018

Otherwise it seems to possible to do all of this outside the main repo

I would much prefer if this was in a dedicated repo, or perhaps test-infra maybe a good place to put this, as right now test-infra loops back into main repo for e2e shell script and that make it rather a very awkward thing for someone trying to wrap their head around. Starting to put turnup script for testing into test-infra would make sense in the context of unwinding the spaghetti of go-bash-go code that is split between repos right now.

@errordeveloper

This comment has been minimized.

Member

errordeveloper commented Feb 14, 2018

We want to delete ginkgo-e2e.sh: kubernetes/test-infra#3536
It is not obvious to me why we need this file as opposed to setting the right ginkgo flags?

@fejta I was leading up to very much the same point in a string of comments I made earlier...

@ixdy

This comment has been minimized.

Member

ixdy commented Mar 20, 2018

/close

since I believe this now exists in test-infra

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment