Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coreos kube-up now with less cloud init #33965

Conversation

euank
Copy link
Contributor

@euank euank commented Oct 3, 2016

This update includes significant refactoring. It moves almost all of the
logic into bash scripts, modeled after the gci cluster scripts.

The reason to do this is:

  1. Avoid duplicating the saltbase manifests by reusing gci's parsing logic (easier maintenance)
  2. Take an incremental step towards sharing more code between gci/trusty/coreos, again for better maintenance
  3. Pave the way for making future changes (e.g. improved rkt support, kubelet support) easier to share

The primary differences from the gci scripts are the following:

  1. Use of the /opt/kubernetes directory over /home/kubernetes
  2. Support for rkt as a runtime
  3. No use of logrotate
  4. No use of /etc/default/
  5. No logic related to noexec mounts or gci-specific firewall-stuff

It will make sense to move 2 over to gci, as well as perhaps a few other small improvements. That will be a separate PR for ease of review.

Ref #29720, this is a part of that because it removes a copy of them.

Fixes #24165

cc @yifan-gu

Since this logic largely duplicates logic from the gci folder, it would be nice if someone closely familiar with that gave an OK or made sure I didn't fall into any gotchas related to that, so cc @andyzheng0831


This change is Reviewable

@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note-label-needed labels Oct 3, 2016
@yifan-gu yifan-gu assigned yifan-gu and dchen1107 and unassigned jbeda Oct 3, 2016
@yifan-gu yifan-gu added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Oct 3, 2016
@euank euank force-pushed the coreos-kube-up-now-with-less-cloud-init branch from e3e2f25 to 3653a37 Compare October 3, 2016 20:36
@@ -0,0 +1,8 @@
# CoreOS image

The [CoreOS operating oystem](https://coreos.com/why/) is a Linux distribution optimized for running containers securely at scale.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/oystem/system

@yifan-gu
Copy link
Contributor

yifan-gu commented Oct 4, 2016

@euank Looks like we don't need cluster/gce/coreos/configure-kubelet.sh as well.

@k8s-ci-robot
Copy link
Contributor

Jenkins unit/integration failed for commit 082275dc4fdfc9ef19f83335ef29007df10ff304. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

function setup-kubelet-dir {
echo "Making /var/lib/kubelet executable for kubelet"
mount --bind /var/lib/kubelet /var/lib/kubelet/
mount -B -o remount,exec,suid,dev /var/lib/kubelet
Copy link
Contributor

@yifan-gu yifan-gu Oct 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove some tailing spaces?

@@ -0,0 +1,173 @@
#!/bin/bash

# Copyright 2015 The Kubernetes Authors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2016

-o "${tmp_kube_env}" \
http://metadata.google.internal/computeMetadata/v1/instance/attributes/kube-env
# Convert the yaml format file into a shell-style file.
sed 's/: /=/' < "${tmp_kube_env}" > "${KUBE_DIR}/kube-env"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@euank Have you verified that this will always work for today's kube env yamls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does work as best as I can tell based on sourcing the file and printenv and based on skimming it.

Any specific concerns?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@euank Thank you.

@yifan-gu
Copy link
Contributor

yifan-gu commented Oct 4, 2016

LGTM other than the comments above. Thanks for the work!!! @euank

@euank
Copy link
Contributor Author

euank commented Oct 4, 2016

As a hint for reviewers, doing a (vim)diff between the gci and the coreos versions of configure.sh and configure-helper.sh is incredibly useful for review.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 6, 2016
@euank euank force-pushed the coreos-kube-up-now-with-less-cloud-init branch from 082275d to aca0aa2 Compare October 10, 2016 23:34
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 10, 2016
@k8s-ci-robot
Copy link
Contributor

Jenkins GCE e2e failed for commit aca0aa29541b024c0eb436c6cdab6c82957c22c5. Full PR test history.

The magic incantation to run this job again is @k8s-bot gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins GKE smoke e2e failed for commit aca0aa29541b024c0eb436c6cdab6c82957c22c5. Full PR test history.

The magic incantation to run this job again is @k8s-bot gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins GCI GKE smoke e2e failed for commit aca0aa29541b024c0eb436c6cdab6c82957c22c5. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@euank euank force-pushed the coreos-kube-up-now-with-less-cloud-init branch from aca0aa2 to 6bb8c49 Compare October 11, 2016 03:07
@k8s-ci-robot
Copy link
Contributor

Jenkins Kubemark GCE e2e failed for commit 6bb8c4928668118446f048f5a6516ff0dbe08e10. Full PR test history.

The magic incantation to run this job again is @k8s-bot kubemark e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@googlebot
Copy link

We found a Contributor License Agreement for you (the sender of this pull request) and all commit authors, but as best as we can tell these commits were authored by someone else. If that's the case, please add them to this pull request and have them confirm that they're okay with these commits being contributed to Google. If we're mistaken and you did author these commits, just reply here to confirm.

@euank euank force-pushed the coreos-kube-up-now-with-less-cloud-init branch from 05ec52f to 3536ecc Compare December 15, 2016 19:46
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 15, 2016
@roberthbailey
Copy link
Contributor

@euank

Sorry for the long delay on the code review. I only two concerns:

  1. It seems you've duplicated some code between shell scripts. They are likely to diverge in the future and it would be nice to consolidate them into one location.

  2. Can you verify that coreos/gci are mostly copies of each other now rather than intimately sharing the node startup scripts (as appears to be the case from https://github.com/euank/kubernetes/tree/3536ecc6ff7f043d5a104b92b803881e5f7e33af/cluster/gce/coreos)? I'm concerned that we'll make a change to gci that will break the coreos startup scripts and won't notice as we don't have continuous testing of them.

@bgrant0607
Copy link
Member

@grodrigues3 Is there a PR that adds an OWNERS file for cluster/gce/coreos? Could we break it out and get it in earlier than the rest?

@roberthbailey
Copy link
Contributor

@bgrant0607 -- juju and vagrant are the only directories under cluster that have an OWNERS file.

@euank
Copy link
Contributor Author

euank commented Dec 16, 2016

@roberthbailey Per the original comment, number 2. This PR is "an incremental step towards sharing more code between gci/trusty/coreos, again for better maintenance"

I want to do that in a followup PR because that change will also be intrusive to GCI and would best be reviewed separately.

I don't believe there is yet any shared code between the gci/coreos folders yet (though I want them to in the future as above).
I don't think any change in the gci folder will impact the coreos folder at this time.

The new thing that could break this is the shared saltbase manifests (which all of gci/coreos/trusty share with this) since there's not much testing there, but there wasn't testing before and on at least one occasion the non-shared coreos-dedicated manifests were broken regardless.

@roberthbailey
Copy link
Contributor

@euank - thanks. That answered by second question. My only remaining question is about the duplication of functions in shell.

# such as GCI and Ubuntu Trusty. We directly copy manifests from
# cluster/addons and cluster/saltbase/salt. The script of cluster initialization
# will remove the salt configuration and evaluate the variables in the manifests.
function kube::release::package_kube_manifests_tarball() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this redundant with the function of the same name in build/lib/release.sh?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The was a rebase error. After fixing it, it's a 3-line diff in the original function

I'll update the PR after verifying things still work..

Thanks for catching it!

set -o errexit
set -o nounset
set -o pipefail

function set-broken-motd {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI We've found that setting the motd is really useful for debugging node startup failure.

@bgrant0607
Copy link
Member

@roberthbailey I was suggesting that we create a new OWNERS file

@roberthbailey
Copy link
Contributor

Yup. I asked @euank to send a quick PR to add one.

@euank euank force-pushed the coreos-kube-up-now-with-less-cloud-init branch from 3536ecc to 65e8d77 Compare December 16, 2016 22:02
@euank
Copy link
Contributor Author

euank commented Dec 16, 2016

@roberthbailey I addressed your comment and rebased it back on master again to pick up any missing gci changes.

I validated that it works for me with my kube-up environment of:

export KUBE_SSH_USER=core
export KUBE_OS_DISTRIBUTION=coreos

export KUBE_GCE_MASTER_PROJECT=coreos-cloud
export KUBE_GCE_MASTER_IMAGE=$(gcloud compute images list --project coreos-cloud --filter  coreos-alpha --format=json | jq '.[0].name' -r)
export KUBE_GCE_NODE_PROJECT=coreos-cloud
export KUBE_GCE_NODE_IMAGE=${KUBE_GCE_MASTER_IMAGE}

I'll send an owners pr pronto as well including myself, @yifan-gu, and @ethernetdan as owners.

k8s-github-robot pushed a commit that referenced this pull request Dec 17, 2016
Automatic merge from submit-queue

cluster/gce/coreos: add OWNERS

See #33965 for context.

The code in `cluster/gce/coreos` has mostly been written/maintained by @yifan-gu and myself thusfar, so I added our names to the owner list.

@ethernetdan has also volunteered as well (thanks!).

**Release note**:
```release-note
NONE
```

cc @roberthbailey
@roberthbailey
Copy link
Contributor

@euank looks like it needs another rebase.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 17, 2016
This is for reviewing ease as the following commits introduce changes
to make the coreos kube-up deployment share significant code with the
gci code.
This update includes significant refactoring. It moves almost all of the
logic into bash scripts, modeled after the `gci` cluster scripts.

The primary differences between the two are the following:
1. Use of the `/opt/kubernetes` directory over `/home/kubernetes`
2. Support for rkt as a runtime
3. No use of logrotate
4. No use of `/etc/default/`
5. No logic related to noexec mounts or gci-specific firewall-stuff
We don't use this bit of gci currently.
@euank euank force-pushed the coreos-kube-up-now-with-less-cloud-init branch from 65e8d77 to 5a2d080 Compare December 18, 2016 05:36
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 18, 2016
@euank
Copy link
Contributor Author

euank commented Dec 19, 2016

Rebased ..

@roberthbailey
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 20, 2016
@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 3994845 into kubernetes:master Dec 20, 2016
@euank euank deleted the coreos-kube-up-now-with-less-cloud-init branch December 20, 2016 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rework the kube-manifests for kube-up for coreos on gce
10 participants