
kubeadm: move the "kubelet-start" phase after "kubeconfig" for "init" #90892

Merged

Conversation


@xphoniex xphoniex commented May 8, 2020

What type of PR is this?

/kind feature

What this PR does / why we need it:

Running kubeadm init gets stuck on Alpine Linux because it uses a different init system that lacks a retry mechanism, and kubeadm tries to start the kubelet service before the required conf files are in place, leaving the service in a crashed state.

Which issue(s) this PR fixes:

Fixes kubernetes/kubeadm#1986

Special notes for your reviewer:

We could run the kubelet-restart phase only when we detect a non-systemd init system such as OpenRC, but running it unconditionally barely makes any difference in performance.

Does this PR introduce a user-facing change?:

action required: kubeadm: Move the "kubeadm init" phase "kubelet-start" later in the init workflow, after the "kubeconfig" phase. This makes kubeadm start the kubelet only after the KubeletConfiguration component config file (/var/lib/kubelet/config.yaml) is generated and solves a problem where init systems like OpenRC cannot crashloop the kubelet service.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

n/a

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 8, 2020
@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 8, 2020
@k8s-ci-robot
Contributor

Welcome @xphoniex!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @xphoniex. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 8, 2020
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 8, 2020
@@ -177,6 +177,7 @@ func NewCmdInit(out io.Writer, initOptions *initOptions) *cobra.Command {
initRunner.AppendPhase(phases.NewKubeConfigPhase())
initRunner.AppendPhase(phases.NewControlPlanePhase())
initRunner.AppendPhase(phases.NewEtcdPhase())
initRunner.AppendPhase(phases.NewKubeletRestartPhase())
Member

@neolit123 neolit123 May 8, 2020

why not patch the NewKubeletStartPhase phase to have a managed restart loop (in kubeadm, for something like 2 minutes) in the case of a detected openrc system? would that work?

adding a new phase does not seem needed for the systemd case, so we should avoid it.

also did you try that the change recommended here fixes the problem?
kubernetes/kubeadm#1986
(moving the kubelet start phase)

Contributor Author

why not patch the NewKubeletStartPhase phase to have a managed restart loop (in kubeadm, for something like 2 minutes) in the case of a detected openrc system? would that work?

It could work, but kubeadm should not become another init system, as this would cause sync issues and might break something in edge cases.

Unless we expect the kubelet to fail randomly in other ways and rely on restarting to heal those failures, a one-time restart would suffice here.

adding a new phase does not seem needed for the systemd case, so we should avoid it.

As systemd already does the restart itself, I expect a one-time restart not to cause any harm and to actually improve start time by about a second; we can also easily turn this phase into a no-op for systemd.

also did you try that the change recommended here fixes the problem?
kubernetes/kubeadm#1986
(moving the kubelet start phase)

I didn't try it, but it'd have been my proposed solution had I not seen this comment of yours:

this is something that we discussed at some point, but the problem is that 
at this point it could be a breaking change to all the users that separated 
the init process into phases.

I wanted my solution to change as little as possible, so I didn't modify existing phases like NewKubeletStartPhase; and since moving phases could be a breaking change, I opted for creating a small, separate phase.

Member

@neolit123 neolit123 May 9, 2020

we also have the NewKubeletFinalizePhase phase, and this could potentially be a sub-phase in there called "restart".
the "restart" phase can detect the init system and A) do nothing in the case of systemd, or B) perform a restart if openrc is detected as the init system.

adding it as a parent phase is far from ideal.

this is something that we discussed at some point, but the problem is that
at this point it could be a breaking change to all the users that separated
the init process into phases.

in any case i think i should bring this up as a discussion topic in next week's meeting and see what the wider group thinks. we might as well just move the phase.

Contributor Author

kubeadm gets stuck on NewWaitControlPlanePhase which is at line 181: https://github.com/kubernetes/kubernetes/blob/a713c8f6fb21e3f171510a104b700aef5fd88555/cmd/kubeadm/app/cmd/init.go#L180-L186

NewKubeletFinalizePhase is way further down at line 186.

Member

Let's discuss this in the office hours. In the long term we should try to reconcile systemd, OpenRC, and Windows' Service Manager:

  • A service should be installed disabled if it's not installed with a default configuration (ready to go). The same should apply to the kubelet.
  • If a service is enabled, it should restart after it crashes (what it now does constantly on systemd).
  • kubeadm (and any other deployment tool) should enable the service and start it after it's configured.

This would require some syncing with SIG Release and SIG Arch, and a huge "action required" note somewhere.

Member

@neolit123 neolit123 May 13, 2020

@xphoniex

so we had a discussion about this problem in the kubeadm meeting today.
the introduction of a new phase or reordering phases is not exactly desired to solve the openrc case for the time being.

so we discussed this:

why not patch the NewKubeletStartPhase phase to have a managed restart loop (in kubeadm, for something like 2 minutes) in the case of a detected openrc system? would that work?

we sort of agreed during the meeting that this is an OK option, but now that i started thinking more about the actual implementation, it ended up hacky and abrupt for kubeadm to manage the restart. we may have to os.Exit(1) kubeadm from a goroutine and so on.

[1] so my latest proposal for a short-term fix is to just move the kubelet-start phase before the wait-control-plane phase, and here is why it will not be that breaking:

  • if users are using systemd, due to systemd supporting restarts, the location of the kubelet-start phase is not important as long as they are calling it before wait-control-plane.
  • systemd users can optionally adjust their phase order as long as we have an action-required release note.
  • this has been bugging systemd and windows people too, as the crashlooping of kubeadm's managed kubelet is not really needed.

this solves @rosti's comment above:

kubeadm (and any other deployment tool) should enable the service and start it after it's configured.

also solves this openrc issue kubernetes/kubeadm#1986.

long term

  • openrc should support automatic restarts? i don't know what the state of this problem is there... but an init system not supporting programmable restarts is far from ideal.
  • apply the changes @rosti suggests above to k8s packages.

alternative options (if we end up rejecting [1])

  • as proposed by @rosti, Alpine users can try wrapping the kubelet in a shim that manages its restarts, given openrc cannot do that.
  • Alpine users can break the kubeadm init process in phases and reorder the kubelet-start.
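The second alternative above can be sketched as a shell script. The phase names match `kubeadm init phase` subcommands of that era, but all flags and config are elided, so treat this as an outline of the idea rather than a recipe:

```shell
#!/bin/sh
# Sketch: run "kubeadm init" phase by phase, with kubelet-start
# reordered to come after kubeconfig. We only echo the commands here;
# drop the echo (and add your own flags or --config) to actually run them.
phases="preflight certs kubeconfig kubelet-start control-plane etcd"
for p in $phases; do
  echo "kubeadm init phase $p"
done
```

The point of the ordering is that by the time `kubelet-start` runs, the kubeconfig and kubelet config files already exist, so an init system without restart support never sees a crashing kubelet.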

Contributor Author

we sort of agreed during the meeting that this is an OK option, but now that i started thinking more about the actual implementation, it ended up hacky and abrupt for kubeadm to manage the restart. we may have to os.Exit(1) kubeadm from a goroutine and so on.

Also, there are places where kubeadm stops the kubelet service, e.g. to write config files. It wouldn't break anything atm, but if there's a need in the future to stop the kubelet service, someone's going to be very confused why their code isn't working, since the goroutine keeps restarting it.

[1] so my latest proposal for a short-term fix is to just move the kubelet-start phase before the wait-control-plane phase

sounds good to me!

I thought the issue was some users running kubeadm phase by phase, hence why we wouldn't want to reorder it. Otherwise, yeah, this should not break anything, but if a user was running kubeadm phase by phase, they will need to modify their scripts.

openrc should support automatic restarts? i don't know what the state of this problem is there... but an init system not supporting programmable restarts is far from ideal.

openrc has something called supervise-daemon, which is supposed to handle restarts for us, but it's experimental and basically misbehaving on my VM and some older alpine releases.

we need a simple fix for now to get this working, and in the long run supervise-daemon or whatever else the gentoo/alpine community comes up with would be the answer.
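For context, opting a service into supervise-daemon is done from its OpenRC init script. A minimal sketch follows; the `supervisor`, `respawn_delay`, and `respawn_max` variables are real openrc-run knobs, while the paths and kubelet flags shown are assumptions:

```shell
#!/sbin/openrc-run
# /etc/init.d/kubelet (sketch): have supervise-daemon restart the
# kubelet when it crashes. Experimental, as noted in the thread.
supervisor=supervise-daemon
command=/usr/bin/kubelet
command_args="--config=/var/lib/kubelet/config.yaml"
respawn_delay=2   # seconds between restart attempts
respawn_max=0     # 0 = keep retrying forever
```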

alternative options (if we end up rejecting [1])

see above ^

Member

I am glad that we are having an action plan here for the time being. Let's do what @neolit123 proposed and put an "Action Required" release note on this PR.

Contributor Author

Done, please check @neolit123.

@neolit123
Member

/priority backlog

@k8s-ci-robot k8s-ci-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 8, 2020
Member

@rosti rosti left a comment

Thanks for tackling this, @xphoniex!


@xphoniex xphoniex force-pushed the fix-kubeadm-getting-stuck-alpine branch from a713c8f to 64cca18 Compare May 15, 2020 11:07
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 15, 2020
@neolit123
Member

@xphoniex

please change the box under Does this PR introduce a user-facing change? to:

action required: kubeadm: Move the "kubeadm init" phase "kubelet-start" later in the init workflow, after the "kubeconfig" phase. This makes kubeadm start the kubelet only after the KubeletConfiguration component config file (/var/lib/kubelet/config.yaml) is generated and solves a problem where init systems like OpenRC cannot crashloop the kubelet service.

@neolit123
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 15, 2020
@neolit123
Member

/remove-priority backlog
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 15, 2020
@neolit123
Member

@kubernetes/sig-cluster-lifecycle-pr-reviews
/retest

@xphoniex
Contributor Author

@neolit123 is this even related to kubeadm?

W0515 15:12:38.071]   File "/workspace/./test-infra/jenkins/../scenarios/kubernetes_e2e.py", line 111, in check_env
W0515 15:12:38.071]     subprocess.check_call(cmd, env=env)
W0515 15:12:38.072]   File "/usr/lib/python2.7/subprocess.py", line 190, in check_call
W0515 15:12:38.072]     raise CalledProcessError(retcode, cmd)
W0515 15:12:38.072] subprocess.CalledProcessError: Command '('kubetest', '--dump=/workspace/_artifacts', '--gcp-service-account=/etc/service-account/service-account.json', '--build=bazel', '--stage=gs://kubernetes-release-pull/ci/pull-kubernetes-e2e-gce', '--up', '--down', '--test', '--provider=gce', '--cluster=e2e-7d24a0a517-674b9', '--gcp-network=e2e-7d24a0a517-674b9', '--extract=local', '--gcp-node-image=gci', '--gcp-zone=us-west1-b', '--ginkgo-parallel=30', '--runtime-config=batch/v2alpha1=true', '--test_args=--ginkgo.skip=\\[Slow\\]|\\[Serial\\]|\\[Disruptive\\]|\\[Flaky\\]|\\[Feature:.+\\] --minStartupPods=8', '--timeout=80m')' returned non-zero exit status 1
E0515 15:12:38.073] Command failed

@neolit123
Member

/retitle kubeadm: move the "kubelet-start" phase after "kubeconfig" for "init"

@k8s-ci-robot k8s-ci-robot changed the title Fix kubeadm getting stuck alpine kubeadm: move the "kubelet-start" phase after "kubeconfig" for "init" May 15, 2020
@neolit123
Member

is this even related to kubeadm?

some test jobs are flaky. /retest retests them.

@xphoniex
Contributor Author

/retest

@xphoniex
Contributor Author

xphoniex commented May 15, 2020

This is really not fun: pull-kubernetes-verify ran for 2h0m22s only to fail due to a seemingly unrelated issue:

+++ Running case: verify.openapi-spec
+++ working dir: /home/prow/go/src/k8s.io/kubernetes
+++ command: bash "hack/make-rules/../../hack/verify-openapi-spec.sh"
Downloading https://github.com/coreos/etcd/releases/download/v3.4.7/etcd-v3.4.7-linux-amd64.tar.gz succeed
{"component":"entrypoint","file":"prow/entrypoint/run.go:245","func":"k8s.io/test-infra/prow/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 15s grace period","time":"2020-05-15T18:08:52Z"}

Can someone help verify if it's related to my commit or not? Because if not, the test jobs are not just flaky, they are broken.

EDIT: I just checked, all PRs on the first page show pull-kubernetes-e2e-gce as failed.

@neolit123
Member

/approve
/retest

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123, xphoniex

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 15, 2020
@xphoniex xphoniex requested a review from neolit123 May 19, 2020 05:44
@neolit123
Member

neolit123 commented May 19, 2020

will LGTM myself if nobody has comments until EOD today.

Member

@rosti rosti left a comment

Thanks @xphoniex!
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 19, 2020
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit bb4a211 into kubernetes:master May 19, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone May 19, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

openrc: kubeadm is starting kubelet before complete config is written to disk
5 participants