This repository has been archived by the owner on Jul 30, 2021. It is now read-only.

Use static assets for temporary control plane. #425

Merged: 1 commit into kubernetes-retired:master from the static-manifests branch, Apr 14, 2017

Conversation

@diegs (Contributor) commented Apr 11, 2017

This change:

  • Starts the bootstrap control plane using static manifests instead of
    launching them in-process.
  • Auto-detects whether self-hosted etcd is being used in the run phase,
    rather than requiring synchronized flag usage.
  • Simplifies a lot of code based on the above refactorings.

After this change clients can completely customize their bootstrap
environment, including changing the version of kubernetes that is
launched.

We might be able to un-vendor Kubernetes and pull in specific
dependencies after this change. I started to investigate but it
looks like it'd be a fairly big change to include here.

Fixes #168.
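
For illustration, a minimal sketch of how the run phase might auto-detect self-hosted etcd without a flag, assuming detection is keyed off the presence of the rendered bootstrap etcd manifest (the path constant mirrors one shown later in this review; the helper name and the detection approach are assumptions, not necessarily what this PR does):

package bootstrap

import (
    "os"
    "path/filepath"
)

// Mirrors asset.AssetPathBootstrapEtcd from this PR.
const assetPathBootstrapEtcd = "bootstrap-manifests/bootstrap-etcd.yaml"

// selfHostedEtcd reports whether the rendered assets include a bootstrap etcd
// manifest, which is one way the run phase could infer that self-hosted etcd
// is in use without a flag. Hypothetical helper; the real detection logic in
// this PR may differ.
func selfHostedEtcd(assetDir string) (bool, error) {
    _, err := os.Stat(filepath.Join(assetDir, assetPathBootstrapEtcd))
    if os.IsNotExist(err) {
        return false, nil
    }
    if err != nil {
        return false, err
    }
    return true, nil
}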

@diegs diegs requested a review from aaronlevy April 11, 2017 18:02
@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on Apr 11, 2017
@diegs (Contributor Author) commented Apr 11, 2017

rktbot run tests


# Start etcd.
if [ "$SELF_HOST_ETCD" = true ] ; then
echo "WARNING: THIS IS NOT YET FULLY WORKING - merely here to make ongoing testing easier"
- etcd_render_flags="--etcd-servers=http://10.3.0.15:2379 --experimental-self-hosted-etcd"
- etcd_start_flags="--etcd-server=http://${COREOS_PRIVATE_IPV4}:12379 --experimental-self-hosted-etcd"
+ etcd_render_flags="--etcd-servers=http://10.3.0.15:2379 --experimental-self-hosted-etcd" #TODO(diegs): maybe add --bootstrap-etcd-server=http://${COREOS_PRIVATE_IPV4}:12379
diegs (Contributor Author):

Note my TODO here: I've currently hardcoded the bootstrap etcd server to be at http://127.0.0.1:12379. If we need more flexibility we will need a flag to adjust for that.

Contributor:

I think hard-coding is fine for now. It should only be an issue if a deployer wanted to use 12379 for the final etcd cluster - which would be odd.

}

// Copy the static manifests to the kubelet's pod manifest path.
manifestsDir := filepath.Join(assetDir, asset.AssetPathBootstrapManifests)
diegs (Contributor Author):

I think there's a possible race here when bootstrapping etcd: if the apiserver tries to come up and etcd isn't up yet it will die. We could mitigate this by setting the restartPolicy on the bootstrap api server (and perhaps other pods too) or by waiting for etcd to be ready before bringing up the other pods (this was effectively the previous behavior).

Contributor:

I might start by leaving the restart policy as the default (Always). If we don't need to provide coordination points (e.g. x starts before y), it keeps bootkube somewhat simpler -- and the api-server should just keep retrying until it can access etcd. This is a common pattern across Kubernetes.

diegs (Contributor Author):

Agreed, done.

@diegs (Contributor Author) commented Apr 11, 2017

rktbot run tests

diegs pushed a commit to diegs/mantle that referenced this pull request Apr 12, 2017
These changes are needed to make the tests pass after this bootkube
change:

kubernetes-retired/bootkube#425
@diegs (Contributor Author) commented Apr 12, 2017

Note: tests won't pass as-is due to the flag and quickstart changes.

I have a separate patch for pluton (coreos/mantle#549) which allows the tests to pass.

}
return err

if err = CleanupBootstrapControlPlane(b.assetDir, b.podManifestPath); err != nil {
Reviewer:

Can this be moved to the top of this function, using defer? That way the assets get cleaned up even if bootkube hits an error before reaching this function.

diegs (Contributor Author):

Excellent point, done.
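
For context, a minimal sketch of the defer pattern being suggested here, with hypothetical stubs standing in for bootkube's real functions (only CleanupBootstrapControlPlane, assetDir, and podManifestPath come from the hunk above; the rest is made up for illustration):

package bootstrap

import "log"

type bootkube struct {
    assetDir        string
    podManifestPath string
}

// Hypothetical stand-ins for the real functions touched by this PR.
func CleanupBootstrapControlPlane(assetDir, podManifestPath string) error { return nil }
func (b *bootkube) startBootstrapControlPlane() error                     { return nil }
func (b *bootkube) waitForSelfHostedControlPlane() error                  { return nil }

// Run sketches the suggestion: register the cleanup with defer before doing
// any work, so the temporary assets are removed even if a later step returns
// an error.
func (b *bootkube) Run() error {
    defer func() {
        if err := CleanupBootstrapControlPlane(b.assetDir, b.podManifestPath); err != nil {
            log.Printf("error cleaning up temporary control plane: %v", err)
        }
    }()

    if err := b.startBootstrapControlPlane(); err != nil {
        return err
    }
    return b.waitForSelfHostedControlPlane()
}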

@peebs commented Apr 12, 2017

Overall LGTM, just one nit about how the cleanup routine is called. Thanks for digging into the test change required here. Since tests pass with coreos/mantle#549 locally, this PR should more or less be ready to go. Working on making that PR unnecessary as well.

@diegs (Contributor Author) commented Apr 12, 2017

Local test results:

diegs@x260:bootkube [static-manifests %]% ~/go/bin/pluton run --parallel 2 --basename diegs-bootkube-dev --platform=gce --gce-image=projects/coreos-cloud/global/images/coreos-stable-1298-7-0-v20170401 --bootkubeRepo=quay.io/coreos/bootkube-dev --bootkubeTag=57489f653e7fc7b865259648e723f1dc99024c79 bootkube.*
=== RUN bootkube.destruct.reboot
=== RUN bootkube.selfetcd.destruct.delete
=== RUN bootkube.smoke
=== RUN bootkube.destruct.delete
=== RUN bootkube.selfetcd.smoke
=== RUN bootkube.selfetcd.scale
=== RUN bootkube.selfetcd.destruct.reboot
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- PASS: bootkube.destruct.reboot (612.91s)
--- PASS: bootkube.selfetcd.smoke (649.16s)
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- PASS: bootkube.destruct.delete (455.37s)
--- PASS: bootkube.smoke (533.51s)
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- PASS: bootkube.selfetcd.destruct.reboot (687.68s)
--- PASS: bootkube.selfetcd.destruct.delete (705.94s)
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- FAIL: bootkube.selfetcd.scale (730.83s)
run.go:107: creating cluster: final node check: cannot detect all nodes in kubectl output
map[10.240.0.27:{}]
[0xc42041e200 0xc420334e00]
harness: test suite failed
FAIL

The selfetcd.scale test has been flaky but I'll look into the logs to make sure it isn't anything troubling.

@peebs commented Apr 12, 2017

@diegs: looks like another flake due to hyperkube downloads freezing. I suspect if you look at the journal logs for that test you'll find the worker machine taking forever to download hyperkube. That test didn't make it past running bootkube and making sure the worker node is registered, which means all 6 other tests ran the same code and got further.

@aaronlevy (Contributor) left a comment:

Just need some minor changes - but overall this is great!



hostNetwork: true
containers:
- name: kube-apiserver
image: quay.io/coreos/hyperkube:v1.6.0_coreos.0
Contributor:

bump image to v1.6.1

diegs (Contributor Author):

Done.

diegs (Contributor Author):

Note: I also added the flock here based on other discussions to try to reduce the port contention churn. However, it often takes >30s for the self-hosted pods to come up, and definitely >30s to migrate etcd, and we don't tear down the temporary apiserver until all that is done. So, unless we increase the flock timeouts more we'll still see pod restarts for the self-hosted apiserver.

hostNetwork: true
containers:
- name: kube-controller-manager
image: quay.io/coreos/hyperkube:v1.6.0_coreos.0
Contributor:

bump image to v1.6.1

diegs (Contributor Author):

Done

hostNetwork: true
containers:
- name: kube-scheduler
image: quay.io/coreos/hyperkube:v1.6.0_coreos.0
Contributor:

bump image to v1.6.1

diegs (Contributor Author):

Done


# Start etcd.
if [ "$SELF_HOST_ETCD" = true ] ; then
echo "WARNING: THIS IS NOT YET FULLY WORKING - merely here to make ongoing testing easier"
- etcd_render_flags="--etcd-servers=http://10.3.0.15:2379 --experimental-self-hosted-etcd"
- etcd_start_flags="--etcd-server=http://${COREOS_PRIVATE_IPV4}:12379 --experimental-self-hosted-etcd"
+ etcd_render_flags="--etcd-servers=http://10.3.0.15:2379 --experimental-self-hosted-etcd" #TODO(diegs): maybe add --bootstrap-etcd-server=http://${COREOS_PRIVATE_IPV4}:12379
aaronlevy (Contributor):

I think we might be able to also drop the --etcd-servers flag altogether. 10.3.0.15 is the etcd service IP - which is already being inferred elsewhere from the etcd service asset (detectEtcdIP())

@aaronlevy (Contributor) commented Apr 12, 2017:

Sorry this is for the render side (not start) - so nothing to detect yet. But I do think we should probably just pick the service IP for the user if they're using self-hosted etcd. It has to be within the service-cidr, and we already assume certain IPs for other services (e.g. DNS gets the .10).

That change can be follow-up/optimization though (maybe just add a TODO if not addressed here)

diegs (Contributor Author):

The render code actually already does this: https://github.com/kubernetes-incubator/bootkube/blob/master/cmd/bootkube/render.go#L163

I'll just clean up the unnecessary uses in the scripts.
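
For reference, a small sketch of deriving fixed service IPs from the service CIDR, as discussed above. The .10 DNS and .15 etcd offsets come from this thread; the assumed 10.3.0.0/24 CIDR, the helper name, and its shape are illustrations, not the actual render.go code:

package bootstrap

import (
    "fmt"
    "net"
)

// serviceIP returns the address at the given offset inside the service CIDR,
// e.g. offset 10 for DNS (10.3.0.10) and offset 15 for the self-hosted etcd
// service (10.3.0.15) if the service CIDR is 10.3.0.0/24. Hypothetical helper.
func serviceIP(serviceCIDR string, offset byte) (net.IP, error) {
    _, ipnet, err := net.ParseCIDR(serviceCIDR)
    if err != nil {
        return nil, err
    }
    ip := ipnet.IP.To4()
    if ip == nil {
        return nil, fmt.Errorf("this sketch only handles IPv4 service CIDRs")
    }
    ip = append(net.IP(nil), ip...) // copy before mutating
    ip[3] += offset                 // assumes the offset fits in the last octet
    if !ipnet.Contains(ip) {
        return nil, fmt.Errorf("offset %d is outside %s", offset, serviceCIDR)
    }
    return ip, nil
}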

return err
}
}
if err := os.RemoveAll(asset.BootstrapSecretsDir); err != nil {
Contributor:

Agree with @pbx0 about trying to clean up regardless of success -- but also, specifically for the secrets: we should always try to remove these (even if the steps above fail).

diegs (Contributor Author):

Agreed. Switched the order to remove the secrets first. Overall the chaining of all this is a little brittle. I'll give some thought to a smarter way for handling the shared secrets.
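
A minimal sketch of the ordering being agreed on here, assuming a cleanup function that owns both the copied secrets and the copied bootstrap manifests (the constant values mirror ones shown in this review; the function name, signature, and manifest-removal details are assumptions):

package bootstrap

import (
    "io/ioutil"
    "os"
    "path/filepath"
)

// Mirror of the paths referenced in this PR.
const (
    bootstrapSecretsDir    = "/tmp/bootkube/secrets"
    bootstrapManifestsPath = "bootstrap-manifests"
)

// cleanupBootstrapControlPlane removes the copied secrets first, then the
// bootstrap pod manifests, so the secrets are gone even if a later removal
// fails. Hypothetical sketch of the ordering discussed above.
func cleanupBootstrapControlPlane(assetDir, podManifestPath string) error {
    if err := os.RemoveAll(bootstrapSecretsDir); err != nil {
        return err
    }
    manifests, err := ioutil.ReadDir(filepath.Join(assetDir, bootstrapManifestsPath))
    if err != nil {
        return err
    }
    for _, m := range manifests {
        if err := os.Remove(filepath.Join(podManifestPath, m.Name())); err != nil {
            return err
        }
    }
    return nil
}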

return err
}
for _, secret := range secrets {
if err := copyFile(filepath.Join(secretsDir, secret.Name()), filepath.Join(asset.BootstrapSecretsDir, secret.Name()), true); err != nil {
Contributor:

Leaving the secrets as world-readable feels a bit wrong - but I'm not immediately sure what expectations we should set for how bootkube is executed and/or whether the temp control plane should have a specific user (or be privileged). Now that bootkube doesn't have to bind on a protected port, it doesn't really need special permissions, so I think we could scope its access down more.

Maybe we just add a TODO for now to re-assess the temporary secret handling later? I don't think this really should block us right now - as it is only the first node on first boot.

diegs (Contributor Author):

Is it safe to assume that bootkube and the kubelet are always both running as root (that seems to be the case on Container Linux)? If so, I think we can lock this down to 0700.

Contributor:

@dghubble do you know what our expectations are for how we're executing bootkube?

Contributor:

Actually it looks like we do run it as root. If we keep that assumption we could change the permissions (0700 for the dir, 0600 for the files).

diegs (Contributor Author):

Sounds good, I already verified that was true on the hack/ scripts, made the change, and ran it through the tests.

@aaronlevy (Contributor):

rktbot run tests

@aaronlevy (Contributor) left a comment:

One last minor comment - but this looks great. Can you squash commits? Then if tests are still flaky we can walk through some manual tests to at least get this merged.

if err != nil {
return err
}
return ioutil.WriteFile(dst, data, os.FileMode(0644))
Contributor:

If this is just for secrets, can we switch to 0600?

diegs (Contributor Author):

It's also for manifests, but I think 0600 is good there too.
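
A minimal sketch of the tightened-up copy discussed in this thread, assuming everything runs as root and using 0700 for the directory and 0600 for the files. The signature here is simplified for illustration; the real copyFile in this PR takes an extra argument and does an overwrite check:

package bootstrap

import (
    "io/ioutil"
    "os"
    "path/filepath"
)

// copyFile copies src to dst with mode 0600 and ensures dst's directory
// exists with mode 0700, on the assumption that bootkube and the kubelet both
// run as root. Sketch only; the real helper differs.
func copyFile(src, dst string) error {
    if err := os.MkdirAll(filepath.Dir(dst), 0700); err != nil {
        return err
    }
    data, err := ioutil.ReadFile(src)
    if err != nil {
        return err
    }
    return ioutil.WriteFile(dst, data, os.FileMode(0600))
}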

@aaronlevy (Contributor):

I just ran some manual smoke tests (also with self-hosted etcd). All looks good.

@diegs (Contributor Author) commented Apr 13, 2017

Cool, I also ran it through manual tests (hack/multi-node) as well as invoking the modified pluton tests again.

@diegs force-pushed the static-manifests branch 2 times, most recently from 10c641e to ce97962, on April 13, 2017 at 17:27
@diegs (Contributor Author) commented Apr 13, 2017

Ok, squashed and rebased. Running one more pass of tests now.

@diegs (Contributor Author) commented Apr 13, 2017

All tests passed except for the etcd scale test. I think that's unrelated.

=== RUN   bootkube.selfetcd.destruct.reboot
=== RUN   bootkube.smoke
=== RUN   bootkube.destruct.reboot
=== RUN   bootkube.destruct.delete
=== RUN   bootkube.selfetcd.smoke
=== RUN   bootkube.selfetcd.scale
=== RUN   bootkube.selfetcd.destruct.delete
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- PASS: bootkube.selfetcd.destruct.reboot (561.92s)
--- PASS: bootkube.selfetcd.smoke (579.02s)
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- PASS: bootkube.destruct.delete (523.29s)
--- PASS: bootkube.destruct.reboot (629.67s)
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- PASS: bootkube.smoke (463.82s)
--- PASS: bootkube.selfetcd.destruct.delete (409.79s)
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
--- FAIL: bootkube.selfetcd.scale (942.62s)
        scale.go:44: scaling down: Waited 150 seconds for etcd to scale: expected 1 etcd pods got 3: [Running Running Running]
harness: test suite failed
FAIL

fi, err := os.Stat(dst)
if fi != nil {
return fmt.Errorf("file already exists: %v", dst)
} else if !os.IsNotExist(err) {

Reviewer:

nit: else is not required.
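
For clarity, the same check written with the else dropped, as the nit suggests. This is a sketch wrapped in a hypothetical function, and it assumes the branch truncated in the hunk above simply returns err:

package bootstrap

import (
    "fmt"
    "os"
)

// checkDst mirrors the snippet above with the else removed; behavior is
// unchanged because the first branch returns before the second check runs.
func checkDst(dst string) error {
    fi, err := os.Stat(dst)
    if fi != nil {
        return fmt.Errorf("file already exists: %v", dst)
    }
    if !os.IsNotExist(err) {
        return err
    }
    return nil
}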

AssetPathBootstrapControllerManager = "bootstrap-manifests/bootstrap-controller-manager.yaml"
AssetPathBootstrapScheduler = "bootstrap-manifests/bootstrap-scheduler.yaml"
AssetPathBootstrapEtcd = "bootstrap-manifests/bootstrap-etcd.yaml"
BootstrapSecretsDir = "/tmp/bootkube/secrets"

Reviewer:

Why is this in /tmp?

diegs (Contributor Author):

Static manifests can only use filesystem mounts, not shared secrets. I need to put them somewhere temporarily--is there a more general place than /tmp to do so?

@yifan-gu commented Apr 13, 2017:

@diegs My dumb question is why not just bootstrap-secrets/?


nvm, I think it doesn't matter, we will remove the secrets anyway.

@yifan-gu:

LGTM from my side.
Might need to wait for #429 and #422.

@aaronlevy (Contributor) left a comment:

LGTM. Feel free to merge when ready

diegs pushed a commit to diegs/mantle that referenced this pull request Apr 14, 2017
These changes are needed to make the tests pass after this bootkube
change:

kubernetes-retired/bootkube#425
@diegs diegs merged commit 3523114 into kubernetes-retired:master Apr 14, 2017
@diegs diegs deleted the static-manifests branch April 14, 2017 17:42