
kubeadm: add an upgrade health check that deploys a Job #81319

Merged

Conversation

@neolit123
Member

neolit123 commented Aug 12, 2019

What this PR does / why we need it:

  • Add a new preflight check for upgrade that runs the pause container
    with -v in a Job (a rough Go sketch of this flow follows the list below).
  • Wait for the Job to complete and return an error after N seconds.
  • Manually clean up the Job because we don't have the TTL controller
    enabled in kubeadm yet (it's still alpha).
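
For reference, the flow described in the list above could look roughly like the following client-go sketch (pre-1.18 signatures without context arguments, matching the codebase at the time). The function name, namespace, timeout, and image tag are illustrative assumptions, not the exact kubeadm implementation:

package upgrade

import (
	"fmt"
	"time"

	batchv1 "k8s.io/api/batch/v1"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	clientset "k8s.io/client-go/kubernetes"
)

// createJobAndWait creates a short-lived Job that runs "pause -v", waits for it
// to complete, and always deletes it afterwards (no TTL controller available).
func createJobAndWait(client clientset.Interface) error {
	const (
		ns      = "kube-system"
		name    = "upgrade-health-check"
		timeout = 15 * time.Second
	)
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: ns},
		Spec: batchv1.JobSpec{
			Template: v1.PodTemplateSpec{
				Spec: v1.PodSpec{
					RestartPolicy: v1.RestartPolicyNever,
					Containers: []v1.Container{{
						Name:  name,
						Image: "k8s.gcr.io/pause:3.1",
						Args:  []string{"-v"}, // pause -v prints the version and exits
					}},
				},
			},
		},
	}

	if _, err := client.BatchV1().Jobs(ns).Create(job); err != nil {
		return fmt.Errorf("could not create Job %q: %v", name, err)
	}
	// Clean up manually, since the TTL controller is not enabled in kubeadm clusters yet.
	defer func() {
		_ = client.BatchV1().Jobs(ns).Delete(name, &metav1.DeleteOptions{})
	}()

	// Poll until the Job reports a succeeded Pod, or fail after the timeout.
	return wait.PollImmediate(time.Second, timeout, func() (bool, error) {
		j, err := client.BatchV1().Jobs(ns).Get(name, metav1.GetOptions{})
		if err != nil {
			return false, nil // tolerate transient API errors and keep polling
		}
		return j.Status.Succeeded > 0, nil
	})
}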

Which issue(s) this PR fixes:

Fixes kubernetes/kubeadm#1717

Special notes for your reviewer:
"fixes" a security audit item, although not super critical as the upgrade process will catch further problems in failing components.

Does this PR introduce a user-facing change?:

kubeadm: add an upgrade health check that deploys a Job

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@kubernetes/sig-cluster-lifecycle-pr-reviews
/kind feature
/kind cleanup
/area security
/priority backlog

@k8s-ci-robot

Contributor

k8s-ci-robot commented Aug 12, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@neolit123

Member Author

neolit123 commented Aug 12, 2019

/hold

Member

rosti left a comment

Thanks @neolit123 !
I am not sure if this is going to pull us through. What if the pods are in a crash loop?
They may be reported as all running, but restart in a few seconds. Hence, the cluster won't be operational.
In fact, what is an operational cluster? If all of the control plane pods are running, but there is no CNI plugin installed, the cluster is still as good as useless.
The closest we can get to answering the original question is to use the liveness probe where available (but that may pose its own set of security risks if not done properly).
Having a check that spans more than the API server and etcd means that a misconfigured scheduler, controller manager, proxy, etc. will be more difficult to fix than just executing kubeadm upgrade apply --config=fixed-config.yaml
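
For context, the check under discussion at this point (per the PR's original title) verifies that the control-plane Pods report phase Running. A rough illustration only; the label selector and error wording are assumptions, not the PR's actual code:

package upgrade

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
)

// controlPlanePodsRunning returns an error if any control-plane Pod in
// kube-system is not in the Running phase. As discussed above, "Running"
// does not guarantee the Pod is healthy (it may be crash-looping).
func controlPlanePodsRunning(client clientset.Interface) error {
	pods, err := client.CoreV1().Pods(metav1.NamespaceSystem).List(metav1.ListOptions{
		LabelSelector: "tier=control-plane",
	})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		if pod.Status.Phase != v1.PodRunning {
			return fmt.Errorf("pod %s is in phase %q, expected Running", pod.Name, pod.Status.Phase)
		}
	}
	return nil
}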

@neolit123

Member Author

neolit123 commented Aug 13, 2019

I am not sure if this is going to pull us through. What if the pods are in a crash loop?
They may be reported as all running, but restart in a few seconds. Hence, the cluster won't be operational.

We can hold for N seconds and see if the Pods are still running after that period, but the crash-loop period is unknown, so this is yet another arbitrary constant.

In fact, what is an operational cluster? If all of the control plane pods are running, but there is no CNI plugin installed, the cluster is still as good as useless.

IMO, CNI demands a separate check that ensures that the Pods have networking.
It's doable, but I'm not sure we should add it or, in general, check whether the cluster is perfectly healthy.

The closest we can get to answering the original question is to use the liveness probe where available (but that may pose its own set of security risks if not done properly).

Then again, if the liveness probes are working as expected, the Pods should be in a Running state.

Having a check that spans more than the API server and etcd means that a misconfigured scheduler, controller manager, proxy, etc. will be more difficult to fix than just executing kubeadm upgrade apply --config=fixed-config.yaml

Yet --config is not a recommended pattern, as we've agreed; it should not be used to modify or fix the cluster state.
IMO we should deprecate --config and move reconfiguration to the planned reconfiguration feature.

I must admit my biggest concern is not that this change checks a Running state, but rather that it checks the Running state of all control-plane Pods, which can break potential parallel control-plane node upgrades.

@rosti, do you have a proposal for how to better approach the security audit ticket, or are your comments suggesting that you do not agree with it and that we should take no action?

@neolit123

Member Author

neolit123 commented Aug 13, 2019

/retest

@neolit123

Member Author

neolit123 commented Aug 13, 2019

If this PR merges, we can do a health check on the local components' /health* endpoints:
#81385
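
The idea in that follow-up is to query the components' health endpoints directly over HTTP rather than go through Pod status. A rough sketch of such a probe; the URL handling, timeout, and TLS expectations here are assumptions, not the #81385 code:

package upgrade

import (
	"fmt"
	"net/http"
	"time"
)

// checkLocalHealthz performs a plain HTTP GET against a component health
// endpoint and treats anything other than 200 OK as a failure. Endpoints
// served over HTTPS with a self-signed CA would need extra TLS configuration.
func checkLocalHealthz(url string) error {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%s returned status %d, expected 200", url, resp.StatusCode)
	}
	return nil
}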

@rosti

Member

rosti commented Aug 14, 2019

I am not sure if this is going to pull us through. What if the pods are in a crash loop?
They may be reported as all running, but restart in a few seconds. Hence, the cluster won't be operational.

We can hold for N seconds and see if the Pods are still running after that period, but the crash-loop period is unknown, so this is yet another arbitrary constant.

That sounds a bit like reimplementing the liveness probes in a different way.

In fact, what is an operational cluster? If all of the control plane pods are running, but there is no CNI plugin installed, the cluster is still as good as useless.

IMO, CNI demands a separate check that ensures that the Pods have networking.
It's doable, but I'm not sure we should add it or, in general, check whether the cluster is perfectly healthy.

That's my point. We should see what we need to check and what not. Obviously, we cannot ensure a fully operational cluster and it's not our job to verify that.

The closest we can get to answering the original question is to use the liveness probe where available (but that may pose its own set of security risks if not done properly).

Then again, if the liveness probes are working as expected, the Pods should be in a Running state.

If a liveness probe fails, it will restart the container in question according to the restart policy. The Pod will be in the Running state if there is at least one container that is starting, running, or restarting.

Having a check that spans more than the API server and etcd means that a misconfigured scheduler, controller manager, proxy, etc. will be more difficult to fix than just executing kubeadm upgrade apply --config=fixed-config.yaml

Yet --config is not a recommended pattern, as we've agreed; it should not be used to modify or fix the cluster state.
IMO we should deprecate --config and move reconfiguration to the planned reconfiguration feature.

+1 to that.

I must admit my biggest concern is not that this change checks a Running state, but rather that it checks the Running state of all control-plane Pods, which can break potential parallel control-plane node upgrades.

Indeed, at a minimum only a single instance of every component is required to claim that we are operational.

@rosti, do you have a proposal for how to better approach the security audit ticket, or are your comments suggesting that you do not agree with it and that we should take no action?

Actually, I deliberated quite a lot on this one and couldn't find a solution that seemed reliable enough to me.

@neolit123

Member Author

neolit123 commented Aug 21, 2019

This PR should be closed, as the approach is not ideal; instead, I'm going to refactor it to use another approach, as discussed during the office hours.

@neolit123 neolit123 changed the title from "kubeadm: add upgrade preflight check for 'Running' control-plane Pods" to "kubeadm: add an upgrade health check that deploys a DaemonSet" Aug 27, 2019
@neolit123 neolit123 force-pushed the neolit123:1.16-kubeadm-upgrade-health-check branch from 6064dca to c1160de Aug 27, 2019
@neolit123

Member Author

neolit123 commented Aug 27, 2019

@rosti @kubernetes/sig-cluster-lifecycle-pr-reviews
Updated the PR to use a DaemonSet with the pause image, as @fabriziopandini recommended.

@neolit123

Member Author

neolit123 commented Aug 27, 2019

/retest

Member

rosti left a comment

Thanks @neolit123 !

I don't like the idea of using a DaemonSet; a Job would be much better in my opinion. We just want to verify that we can start something on the cluster.

A simple Job, like this one, should do the trick:

apiVersion: batch/v1
kind: Job
metadata:
  name: health-check
  namespace: kube-system
spec:
  backoffLimit: 0             # do not retry failed Pods
  activeDeadlineSeconds: 30   # fail the Job if it does not finish within 30s
  template:
    spec:
      restartPolicy: Never
      terminationGracePeriodSeconds: 0
      securityContext:
        runAsUser: 999
        runAsGroup: 999
        runAsNonRoot: true
      tolerations:
      - key: node-role.kubernetes.io/master   # allow scheduling on control-plane nodes
        effect: NoSchedule
      containers:
      - name: health-check
        image: k8s.gcr.io/pause:3.1
        args: ["-v"]           # pause -v prints its version and exits, so the Job can complete

We may also want to give the health check Job a random name suffix, to ensure name uniqueness (see the sketch after this comment).
/hold
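
For the name-uniqueness point, two common options, purely as an illustration of what kubeadm could do; the prefix and helper names here are assumptions:

package upgrade

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/rand"
)

// Option 1: let the API server append a random suffix to the Job name.
func generatedJobMeta() metav1.ObjectMeta {
	return metav1.ObjectMeta{
		GenerateName: "upgrade-health-check-",
		Namespace:    metav1.NamespaceSystem,
	}
}

// Option 2: build a unique name client-side with a short random suffix.
func uniqueJobName() string {
	return fmt.Sprintf("upgrade-health-check-%s", rand.String(5))
}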

@neolit123

Member Author

neolit123 commented Aug 28, 2019

I don't like the idea of using a DaemonSet; a Job would be much better in my opinion. We just want to verify that we can start something on the cluster.

I wish you had expressed these comments during our discussion with @fabriziopandini at the office hours last week (was it?).

@rosti

Member

rosti commented Aug 28, 2019

@neolit123 we can go with a DaemonSet for 1.16, but let's copy over the spec and s/prepull/health-check over it.
Later on, we can turn it into a Job.

@neolit123

Member Author

neolit123 commented Aug 28, 2019

we can go with a DaemonSet for 1.16, but let's copy over the spec and s/prepull/health-check over it.

I will try to experiment with a Job later this week.

/remove-kind feature

@timothysc timothysc removed their request for review Sep 6, 2019
@neolit123 neolit123 force-pushed the neolit123:1.16-kubeadm-upgrade-health-check branch from c1160de to d777c91 Nov 22, 2019
@neolit123 neolit123 changed the title from "kubeadm: add an upgrade health check that deploys a DaemonSet" to "kubeadm: add an upgrade health check that deploys a Job" Nov 22, 2019
@neolit123

Member Author

neolit123 commented Nov 22, 2019

Updated to use a Job for the preflight check.

@neolit123 neolit123 force-pushed the neolit123:1.16-kubeadm-upgrade-health-check branch 2 times, most recently from 2adc722 to 29bad3c Nov 22, 2019
@rosti
rosti approved these changes Nov 22, 2019
Member

rosti left a comment

Thanks @neolit123 !
Looks good and I have just non-blocking nits.
I would have used the TTL controller (even though it's still in alpha), but it's fine like this too.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Nov 22, 2019
- Add a new preflight check for upgrade that runs the pause container
with -v in a Job.
- Wait for the Job to complete and return an error after N seconds.
- Manually clean the Job because we don't have the TTL controller
enabled in kubeadm yet (it's still alpha).
@neolit123 neolit123 force-pushed the neolit123:1.16-kubeadm-upgrade-health-check branch from 29bad3c to 906d315 Nov 22, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm label Nov 22, 2019

- // If client.Discovery().RESTClient() is nil, the fake client is used, and that means we are dry-running. Just proceed
+ // If client.Discovery().RESTClient() is nil, the fake client is used.
+ // Return early because the kubeadm dryrun dynamic client only handles the core/v1 GroupVersion.

@neolit123

neolit123 Nov 22, 2019

Author Member

I had to add this.

The dynamic client fails when trying to get the batch/v1 Job here:

unstructuredObj, err := clg.dynamicClient.Resource(action.GetResource()).Namespace(action.GetNamespace()).Get(action.GetName(), metav1.GetOptions{})

I think we should stop using the dynamic client for dry-run, but this is a much larger refactor.
Until then, we have to skip the logic in this preflight check during dry-run.
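
The guard that the new comment describes boils down to an early return when the fake (dry-run) client is detected; a minimal sketch, where the function name and log message are illustrative rather than the actual kubeadm symbols:

package upgrade

import (
	"fmt"

	clientset "k8s.io/client-go/kubernetes"
)

// skipOnDryRun reports whether the Job-based health check should be skipped.
// With the fake client used for dry-run, Discovery().RESTClient() is nil, and
// the kubeadm dry-run dynamic client only handles the core/v1 GroupVersion,
// so creating a batch/v1 Job would fail.
func skipOnDryRun(client clientset.Interface) bool {
	if client.Discovery().RESTClient() == nil {
		fmt.Println("[upgrade/health] skipping the Job-based health check during dry-run")
		return true
	}
	return false
}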

@rosti

rosti Nov 26, 2019

Member

+1, we can have a backlog issue for this so that an eager contributor can try it.

@neolit123

Member Author

neolit123 commented Nov 22, 2019

@rosti updated; see the latest comment above.

@neolit123

Member Author

neolit123 commented Nov 22, 2019

/hold cancel

@rosti
rosti approved these changes Nov 26, 2019
Member

rosti left a comment

Thanks @neolit123 !
/lgtm


@k8s-ci-robot k8s-ci-robot added the lgtm label Nov 26, 2019
@k8s-ci-robot k8s-ci-robot merged commit 2bc3804 into kubernetes:master Nov 26, 2019
14 of 15 checks passed

tide: Not mergeable. Retesting: pull-kubernetes-kubemark-e2e-gce-big
cla/linuxfoundation: neolit123 authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-dependencies: Job succeeded.
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-kind: Job succeeded.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-node-e2e-containerd: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Nov 26, 2019