This repository has been archived by the owner on Jul 30, 2021. It is now read-only.

Use static assets for temporary control plane. #425

Merged: 1 commit into kubernetes-retired:master from the static-manifests branch, Apr 14, 2017

Conversation

@diegs (Contributor) commented Apr 11, 2017

This change:

  • Starts the bootstrap control plane using static manifests instead of
    launching them in-process.
  • Auto-detects whether self-hosted etcd is being used in the run phase,
    rather than requiring synchronized flag usage.
  • Simplifies a lot of code based on the above refactorings.

After this change clients can completely customize their bootstrap
environment, including changing the version of kubernetes that is
launched.

We might be able to un-vendor Kubernetes and pull in specific
dependencies after this change. I started to investigate but it
looks like it'd be a fairly big change to include here.

Fixes #168.
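
For illustration, a minimal sketch of how the run phase might auto-detect self-hosted etcd without a flag, assuming detection is keyed off the presence of the rendered bootstrap etcd manifest (the path constant mirrors one shown later in this review; the helper name and the detection approach are assumptions, not necessarily what this PR does):

package bootstrap

import (
    "os"
    "path/filepath"
)

// Mirrors asset.AssetPathBootstrapEtcd from this PR.
const assetPathBootstrapEtcd = "bootstrap-manifests/bootstrap-etcd.yaml"

// selfHostedEtcd reports whether the rendered assets include a bootstrap etcd
// manifest, which is one way the run phase could infer that self-hosted etcd
// is in use without a flag. Hypothetical helper; the real detection logic in
// this PR may differ.
func selfHostedEtcd(assetDir string) (bool, error) {
    _, err := os.Stat(filepath.Join(assetDir, assetPathBootstrapEtcd))
    if os.IsNotExist(err) {
        return false, nil
    }
    if err != nil {
        return false, err
    }
    return true, nil
}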

@diegs diegs requested a review from aaronlevy April 11, 2017 18:02
@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on Apr 11, 2017
@diegs (Contributor Author) commented Apr 11, 2017

rktbot run tests


# Start etcd.
if [ "$SELF_HOST_ETCD" = true ] ; then
echo "WARNING: THIS IS NOT YET FULLY WORKING - merely here to make ongoing testing easier"
- etcd_render_flags="--etcd-servers=http://10.3.0.15:2379 --experimental-self-hosted-etcd"
- etcd_start_flags="--etcd-server=http://${COREOS_PRIVATE_IPV4}:12379 --experimental-self-hosted-etcd"
+ etcd_render_flags="--etcd-servers=http://10.3.0.15:2379 --experimental-self-hosted-etcd" #TODO(diegs): maybe add --bootstrap-etcd-server=http://${COREOS_PRIVATE_IPV4}:12379
diegs (Contributor Author):

Note my TODO here: I've currently hardcoded the bootstrap etcd server to be at http://127.0.0.1:12379. If we need more flexibility we will need a flag to adjust for that.

Contributor:

I think hard-coding is fine for now. It should only be an issue if a deployer wanted to use 12379 for the final etcd cluster - which would be odd.

}

// Copy the static manifests to the kubelet's pod manifest path.
manifestsDir := filepath.Join(assetDir, asset.AssetPathBootstrapManifests)
diegs (Contributor Author):

I think there's a possible race here when bootstrapping etcd: if the apiserver tries to come up and etcd isn't up yet it will die. We could mitigate this by setting the restartPolicy on the bootstrap api server (and perhaps other pods too) or by waiting for etcd to be ready before bringing up the other pods (this was effectively the previous behavior).

Contributor:

I might start by leaving the restart policy as the default (Always). If we don't need to provide coordination points (e.g. x starts before y), it keeps bootkube somewhat simpler -- and the api-server should just keep retrying until it can access etcd. This is a common pattern across Kubernetes.

diegs (Contributor Author):

Agreed, done.

@diegs (Contributor Author) commented Apr 11, 2017

rktbot run tests

diegs pushed a commit to diegs/mantle that referenced this pull request Apr 12, 2017
These changes are needed to make the tests pass after this bootkube
change:

kubernetes-retired/bootkube#425
@diegs (Contributor Author) commented Apr 12, 2017

Note: tests won't pass as-is due to the flag and quickstart changes.

I have a separate patch for pluton (coreos/mantle#549) which allows the tests to pass.

}
return err

if err = CleanupBootstrapControlPlane(b.assetDir, b.podManifestPath); err != nil {
Reviewer:

Can this be moved to the top of this function, using defer? That way the assets get cleaned up even if bootkube hits an error before reaching this function.

diegs (Contributor Author):

Excellent point, done.
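
For context, a minimal sketch of the defer pattern being suggested here, with hypothetical stubs standing in for bootkube's real functions (only CleanupBootstrapControlPlane, assetDir, and podManifestPath come from the hunk above; the rest is made up for illustration):

package bootstrap

import "log"

type bootkube struct {
    assetDir        string
    podManifestPath string
}

// Hypothetical stand-ins for the real functions touched by this PR.
func CleanupBootstrapControlPlane(assetDir, podManifestPath string) error { return nil }
func (b *bootkube) startBootstrapControlPlane() error                     { return nil }
func (b *bootkube) waitForSelfHostedControlPlane() error                  { return nil }

// Run sketches the suggestion: register the cleanup with defer before doing
// any work, so the temporary assets are removed even if a later step returns
// an error.
func (b *bootkube) Run() error {
    defer func() {
        if err := CleanupBootstrapControlPlane(b.assetDir, b.podManifestPath); err != nil {
            log.Printf("error cleaning up temporary control plane: %v", err)
        }
    }()

    if err := b.startBootstrapControlPlane(); err != nil {
        return err
    }
    return b.waitForSelfHostedControlPlane()
}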

@peebs commented Apr 12, 2017

Overall LGTM, just one nit about how the cleanup routine is called. Thanks for digging into the test change required here. Since tests pass with coreos/mantle#549 locally, this PR should more or less be ready to go. Working on making that PR unnecessary as well.

@diegs (Contributor Author) commented Apr 12, 2017

Local test results:

diegs@x260:bootkube [static-manifests %]% ~/go/bin/pluton run --parallel 2 --basename diegs-bootkube-dev --platform=gce --gce-image=projects/coreos-cloud/global/images/coreos-stable-1298-7-0-v20170401 --bootkubeRepo=quay.io/coreos/bootkube-dev --bootkubeTag=57489f653e7fc7b865259648e723f1dc99024c79 bootkube.*
=== RUN bootkube.destruct.reboot
=== RUN bootkube.selfetcd.destruct.delete
=== RUN bootkube.smoke
=== RUN bootkube.destruct.delete
=== RUN bootkube.selfetcd.smoke
=== RUN bootkube.selfetcd.scale
=== RUN bootkube.selfetcd.destruct.reboot
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- PASS: bootkube.destruct.reboot (612.91s)
--- PASS: bootkube.selfetcd.smoke (649.16s)
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- PASS: bootkube.destruct.delete (455.37s)
--- PASS: bootkube.smoke (533.51s)
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- PASS: bootkube.selfetcd.destruct.reboot (687.68s)
--- PASS: bootkube.selfetcd.destruct.delete (705.94s)
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- FAIL: bootkube.selfetcd.scale (730.83s)
run.go:107: creating cluster: final node check: cannot detect all nodes in kubectl output
map[10.240.0.27:{}]
[0xc42041e200 0xc420334e00]
harness: test suite failed
FAIL

The selfetcd.scale test has been flaky but I'll look into the logs to make sure it isn't anything troubling.

@peebs commented Apr 12, 2017

@diegs: looks like another flake due to hyperkube downloads freezing. I suspect if you look at the journal logs for that test you'll find the worker machine taking forever to download hyperkube. That test didn't make it past running bootkube and making sure the worker node is registered, which means all 6 other tests ran the same code and got further.

@aaronlevy (Contributor) left a comment:

Just need some minor changes - but overall this is great!



hostNetwork: true
containers:
- name: kube-apiserver
image: quay.io/coreos/hyperkube:v1.6.0_coreos.0
Contributor:

bump image to v1.6.1

diegs (Contributor Author):

Done.

diegs (Contributor Author):

Note: I also added the flock here based on other discussions to try to reduce the port contention churn. However, it often takes >30s for the self-hosted pods to come up, and definitely >30s to migrate etcd, and we don't tear down the temporary apiserver until all that is done. So, unless we increase the flock timeouts more we'll still see pod restarts for the self-hosted apiserver.

hostNetwork: true
containers:
- name: kube-controller-manager
image: quay.io/coreos/hyperkube:v1.6.0_coreos.0
Contributor:

bump image to v1.6.1

diegs (Contributor Author):

Done

hostNetwork: true
containers:
- name: kube-scheduler
image: quay.io/coreos/hyperkube:v1.6.0_coreos.0
Contributor:

bump image to v1.6.1

diegs (Contributor Author):

Done


# Start etcd.
if [ "$SELF_HOST_ETCD" = true ] ; then
echo "WARNING: THIS IS NOT YET FULLY WORKING - merely here to make ongoing testing easier"
- etcd_render_flags="--etcd-servers=http://10.3.0.15:2379 --experimental-self-hosted-etcd"
- etcd_start_flags="--etcd-server=http://${COREOS_PRIVATE_IPV4}:12379 --experimental-self-hosted-etcd"
+ etcd_render_flags="--etcd-servers=http://10.3.0.15:2379 --experimental-self-hosted-etcd" #TODO(diegs): maybe add --bootstrap-etcd-server=http://${COREOS_PRIVATE_IPV4}:12379
aaronlevy (Contributor):

I think we might be able to also drop the --etcd-servers flag altogether. 10.3.0.15 is the etcd service IP - which is already being inferred elsewhere from the etcd service asset (detectEtcdIP())

@aaronlevy (Contributor) commented Apr 12, 2017:

Sorry this is for the render side (not start) - so nothing to detect yet. But I do think we should probably just pick the service IP for the user if they're using self-hosted etcd. It has to be within the service-cidr, and we already assume certain IPs for other services (e.g. DNS gets the .10).

That change can be follow-up/optimization though (maybe just add a TODO if not addressed here)

diegs (Contributor Author):

The render code actually already does this: https://github.com/kubernetes-incubator/bootkube/blob/master/cmd/bootkube/render.go#L163

I'll just clean up the unnecessary uses in the scripts.
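
For reference, a small sketch of deriving fixed service IPs from the service CIDR, as discussed above. The .10 DNS and .15 etcd offsets come from this thread; the assumed 10.3.0.0/24 CIDR, the helper name, and its shape are illustrations, not the actual render.go code:

package bootstrap

import (
    "fmt"
    "net"
)

// serviceIP returns the address at the given offset inside the service CIDR,
// e.g. offset 10 for DNS (10.3.0.10) and offset 15 for the self-hosted etcd
// service (10.3.0.15) if the service CIDR is 10.3.0.0/24. Hypothetical helper.
func serviceIP(serviceCIDR string, offset byte) (net.IP, error) {
    _, ipnet, err := net.ParseCIDR(serviceCIDR)
    if err != nil {
        return nil, err
    }
    ip := ipnet.IP.To4()
    if ip == nil {
        return nil, fmt.Errorf("this sketch only handles IPv4 service CIDRs")
    }
    ip = append(net.IP(nil), ip...) // copy before mutating
    ip[3] += offset                 // assumes the offset fits in the last octet
    if !ipnet.Contains(ip) {
        return nil, fmt.Errorf("offset %d is outside %s", offset, serviceCIDR)
    }
    return ip, nil
}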

return err
}
}
if err := os.RemoveAll(asset.BootstrapSecretsDir); err != nil {
Contributor:

Agree with @pbx0 about trying to clean up regardless of success -- but also, specifically for the secrets: we should always try to remove these (even if the steps above fail).

diegs (Contributor Author):

Agreed. Switched the order to remove the secrets first. Overall the chaining of all this is a little brittle. I'll give some thought to a smarter way for handling the shared secrets.
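
A minimal sketch of the ordering being agreed on here, assuming a cleanup function that owns both the copied secrets and the copied bootstrap manifests (the constant values mirror ones shown in this review; the function name, signature, and manifest-removal details are assumptions):

package bootstrap

import (
    "io/ioutil"
    "os"
    "path/filepath"
)

// Mirror of the paths referenced in this PR.
const (
    bootstrapSecretsDir    = "/tmp/bootkube/secrets"
    bootstrapManifestsPath = "bootstrap-manifests"
)

// cleanupBootstrapControlPlane removes the copied secrets first, then the
// bootstrap pod manifests, so the secrets are gone even if a later removal
// fails. Hypothetical sketch of the ordering discussed above.
func cleanupBootstrapControlPlane(assetDir, podManifestPath string) error {
    if err := os.RemoveAll(bootstrapSecretsDir); err != nil {
        return err
    }
    manifests, err := ioutil.ReadDir(filepath.Join(assetDir, bootstrapManifestsPath))
    if err != nil {
        return err
    }
    for _, m := range manifests {
        if err := os.Remove(filepath.Join(podManifestPath, m.Name())); err != nil {
            return err
        }
    }
    return nil
}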

return err
}
for _, secret := range secrets {
if err := copyFile(filepath.Join(secretsDir, secret.Name()), filepath.Join(asset.BootstrapSecretsDir, secret.Name()), true); err != nil {
Contributor:

Leaving the secrets as world-readable feels a bit wrong - but I'm not immediately sure what expectations we should set for how bootkube is executed and/or whether the temp control plane should have a specific user (or be privileged). Now that bootkube doesn't have to bind on a protected port, it doesn't really need special permissions, so I think we could scope its access down more.

Maybe we just add a TODO for now to re-assess the temporary secret handling later? I don't think this really should block us right now - as it is only the first node on first boot.

diegs (Contributor Author):

Is it safe to assume that bootkube and the kubelet are always both running as root (that seems to be the case on Container Linux)? If so, I think we can lock this down to 0700.

Contributor:

@dghubble do you know what our expectations are for how we're executing bootkube?

Contributor:

Actually it looks like we do run it as root. If we keep that assumption we could change the permissions (0700 for the dir, 0600 for the files).

diegs (Contributor Author):

Sounds good, I already verified that was true on the hack/ scripts, made the change, and ran it through the tests.

@aaronlevy (Contributor):

rktbot run tests

@aaronlevy (Contributor) left a comment:

One last minor comment - but this looks great. Can you squash commits? Then if tests are still flaky we can walk through some manual tests to at least get this merged.

if err != nil {
return err
}
return ioutil.WriteFile(dst, data, os.FileMode(0644))
Contributor:

If this is just for secrets, can we switch to 0600?

diegs (Contributor Author):

It's also for manifests, but I think 0600 is good there too.
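
A minimal sketch of the tightened-up copy discussed in this thread, assuming everything runs as root and using 0700 for the directory and 0600 for the files. The signature here is simplified for illustration; the real copyFile in this PR takes an extra argument and does an overwrite check:

package bootstrap

import (
    "io/ioutil"
    "os"
    "path/filepath"
)

// copyFile copies src to dst with mode 0600 and ensures dst's directory
// exists with mode 0700, on the assumption that bootkube and the kubelet both
// run as root. Sketch only; the real helper differs.
func copyFile(src, dst string) error {
    if err := os.MkdirAll(filepath.Dir(dst), 0700); err != nil {
        return err
    }
    data, err := ioutil.ReadFile(src)
    if err != nil {
        return err
    }
    return ioutil.WriteFile(dst, data, os.FileMode(0600))
}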

@aaronlevy (Contributor):

I just ran some manual smoke tests (also with self-hosted etcd). All looks good.

@diegs (Contributor Author) commented Apr 13, 2017

Cool, I also ran it through manual tests (hack/multi-node) as well as invoking the modified pluton tests again.

@diegs force-pushed the static-manifests branch 2 times, most recently from 10c641e to ce97962, on April 13, 2017 at 17:27
@diegs (Contributor Author) commented Apr 13, 2017

Ok, squashed and rebased. Running one more pass of tests now.

@diegs (Contributor Author) commented Apr 13, 2017

All tests passed except for the etcd scale test. I think that's unrelated.

=== RUN   bootkube.selfetcd.destruct.reboot
=== RUN   bootkube.smoke
=== RUN   bootkube.destruct.reboot
=== RUN   bootkube.destruct.delete
=== RUN   bootkube.selfetcd.smoke
=== RUN   bootkube.selfetcd.scale
=== RUN   bootkube.selfetcd.destruct.delete
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- PASS: bootkube.selfetcd.destruct.reboot (561.92s)
--- PASS: bootkube.selfetcd.smoke (579.02s)
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- PASS: bootkube.destruct.delete (523.29s)
--- PASS: bootkube.destruct.reboot (629.67s)
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- PASS: bootkube.smoke (463.82s)
--- PASS: bootkube.selfetcd.destruct.delete (409.79s)
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- FAIL: bootkube.selfetcd.scale (942.62s)
        scale.go:44: scaling down: Waited 150 seconds for etcd to scale: expected 1 etcd pods got 3: [Running Running Running]
harness: test suite failed
FAIL

fi, err := os.Stat(dst)
if fi != nil {
return fmt.Errorf("file already exists: %v", dst)
} else if !os.IsNotExist(err) {

Reviewer:

nit: else is not required.
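
For clarity, the same check written with the else dropped, as the nit suggests. This is a sketch wrapped in a hypothetical function, and it assumes the branch truncated in the hunk above simply returns err:

package bootstrap

import (
    "fmt"
    "os"
)

// checkDst mirrors the snippet above with the else removed; behavior is
// unchanged because the first branch returns before the second check runs.
func checkDst(dst string) error {
    fi, err := os.Stat(dst)
    if fi != nil {
        return fmt.Errorf("file already exists: %v", dst)
    }
    if !os.IsNotExist(err) {
        return err
    }
    return nil
}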

AssetPathBootstrapControllerManager = "bootstrap-manifests/bootstrap-controller-manager.yaml"
AssetPathBootstrapScheduler = "bootstrap-manifests/bootstrap-scheduler.yaml"
AssetPathBootstrapEtcd = "bootstrap-manifests/bootstrap-etcd.yaml"
BootstrapSecretsDir = "/tmp/bootkube/secrets"

Reviewer:

Why is this in /tmp?

diegs (Contributor Author):

Static manifests can only use filesystem mounts, not shared secrets. I need to put them somewhere temporarily--is there a more general place than /tmp to do so?

@yifan-gu commented Apr 13, 2017:

@diegs My dumb question is why not just bootstrap-secrets/?


nvm, I think it doesn't matter, we will remove the secrets anyway.

@yifan-gu:

LGTM from my side.
Might need to wait for #429 and #422.

@aaronlevy (Contributor) left a comment:

LGTM. Feel free to merge when ready

diegs pushed a commit to diegs/mantle that referenced this pull request Apr 14, 2017
These changes are needed to make the tests pass after this bootkube
change:

kubernetes-retired/bootkube#425
@diegs diegs merged commit 3523114 into kubernetes-retired:master Apr 14, 2017
@diegs diegs deleted the static-manifests branch April 14, 2017 17:42