
controller/discoverer: add readiness check #135

Merged

Conversation

@JulienBalestra (Contributor) commented Aug 10, 2019

This PR adds a healthcheck to the local volume provisioner pod.

This lets other controllers in the cluster validate whether a node has all of its system (daemonset) components up and running.

An immediate consumer would be the cluster-autoscaler, which could use it to improve its handling of local volumes.

As a first step, I added a readiness check for the discoverer.DiscoverLocalVolumes() call. These checks could easily be extended to other packages.
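
For illustration, a minimal sketch of the pattern (all names here are hypothetical, not the PR's actual code): a mutex-guarded flag updated by the discovery loop and served on an HTTP /ready endpoint.

```go
package main

import (
	"net/http"
	"sync"
)

// readiness is a hypothetical stand-in for the provisioner's state:
// the discovery loop records whether its last pass succeeded, and an
// HTTP handler exposes that state on /ready.
type readiness struct {
	mu    sync.RWMutex
	ready bool
}

// set is called by the discovery loop after each pass.
func (r *readiness) set(ok bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.ready = ok
}

// handler returns 200 when ready, 503 otherwise, so it can back a
// Kubernetes readinessProbe.
func (r *readiness) handler(w http.ResponseWriter, _ *http.Request) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if r.ready {
		w.WriteHeader(http.StatusOK)
		return
	}
	w.WriteHeader(http.StatusServiceUnavailable)
}

func main() {
	r := &readiness{}
	http.HandleFunc("/ready", r.handler)
	http.ListenAndServe(":8080", nil) // the port is an assumption for this sketch
}
```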

@k8s-ci-robot (Contributor) commented Aug 10, 2019

Welcome @JulienBalestra!

It looks like this is your first PR to kubernetes-sigs/sig-storage-local-static-provisioner 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/sig-storage-local-static-provisioner has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot requested review from cofyc and wongma7 Aug 10, 2019
@k8s-ci-robot (Contributor) commented Aug 10, 2019

Hi @JulienBalestra. Thanks for your PR.

I'm waiting for a kubernetes-sigs or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@JulienBalestra JulienBalestra force-pushed the DataDog:JulienBalestra/readiness branch from e73e171 to b40dd62 Aug 13, 2019
@msau42 (Contributor) commented Aug 13, 2019

/ok-to-test
/assign

@JulienBalestra JulienBalestra force-pushed the DataDog:JulienBalestra/readiness branch from d8d591f to 044b7b6 Aug 13, 2019
@k8s-ci-robot k8s-ci-robot added size/XL and removed size/L labels Aug 13, 2019
@JulienBalestra JulienBalestra force-pushed the DataDog:JulienBalestra/readiness branch 2 times, most recently from bbbf190 to 9afa79a Aug 13, 2019
@JulienBalestra (Contributor, Author) commented Aug 13, 2019

/retest

Gopkg.lock (Outdated)

```toml
[[projects]]
branch = "master"
digest = "1:b669bf7ea147e4d75e097e89f45c1e7de52e79dc913bbbae94b79177f7b075b6"
name = "github.com/heptiolabs/healthcheck"
```

@cofyc (Member) commented Aug 14, 2019

Hi, thanks for contributing this feature! Because this is a k8s project, for consistent behavior, what about using this package instead? https://godoc.org/k8s.io/apiserver/pkg/server/healthz
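
For reference, a minimal sketch of what that suggestion could look like (the check name and wiring are assumptions, not the code this PR ended up with):

```go
package main

import (
	"net/http"

	"k8s.io/apiserver/pkg/server/healthz"
)

func main() {
	mux := http.NewServeMux()
	// NamedCheck wraps an arbitrary check function; a non-nil error
	// marks the check as failing.
	discovery := healthz.NamedCheck("local-volume-discovery", func(_ *http.Request) error {
		// A real check would consult the discoverer's state here.
		return nil
	})
	// InstallHandler serves the checks on /healthz (and /healthz/<name>).
	healthz.InstallHandler(mux, healthz.PingHealthz, discovery)
	http.ListenAndServe(":8080", mux) // port is an assumption for this sketch
}
```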

@cofyc (Member) commented Aug 14, 2019

/retest

@cofyc (Member) commented Aug 14, 2019

I've opened a PR for the build issue: #137

@msau42 (Contributor) commented Aug 16, 2019

/retest

A Contributor left a comment:

It would be good to add a page to the docs/ section (linked from the README) on how to use this feature.

@cofyc I thought we already had a doc that talked about all the metrics that we export? It may be good to combine these two docs into a "monitoring" doc.

```go
}
d.readySync.Lock()
d.ready = discoErrors == nil
```

@msau42 (Contributor) commented Aug 16, 2019

Do we need to store an array of errors? Can it just be a bool?

@JulienBalestra (Contributor, Author) commented Aug 16, 2019

Yes, in this case it could be 👍
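
For illustration only (the surrounding types are assumptions; only the field names mirror the diff), the bool version could look like:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Discoverer is a hypothetical stand-in for the real type.
type Discoverer struct {
	readySync sync.Mutex
	ready     bool
}

// discoverPass stands in for one discovery pass and returns its error.
func (d *Discoverer) discoverPass() error {
	return errors.New("example failure") // placeholder outcome
}

// DiscoverLocalVolumes records readiness with a single bool instead of
// collecting a []error slice, since readiness only needs to know
// whether the whole pass succeeded.
func (d *Discoverer) DiscoverLocalVolumes() {
	err := d.discoverPass()
	d.readySync.Lock()
	d.ready = err == nil
	d.readySync.Unlock()
}

func main() {
	d := &Discoverer{}
	d.DiscoverLocalVolumes()
	fmt.Println("ready:", d.ready)
}
```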

```diff
 klog.V(7).Infof("Discovering volumes at hostpath %q, mount path %q for storage class %q", config.HostDir, config.MountDir, class)

 reclaimPolicy, err := d.getReclaimPolicyFromStorageClass(class)
 if err != nil {
 	klog.Errorf("Failed to get ReclaimPolicy from storage class %q: %v", class, err)
-	return
+	return err
```

@msau42 (Contributor) commented Aug 16, 2019

I prefer the error handling style where the callers log the errors (to avoid double logging at every layer).

@JulienBalestra (Contributor, Author) commented Aug 16, 2019

Would you like me to move all the different logging lines in this function to the caller, (d *Discoverer) DiscoverLocalVolumes()?

Don't you think we're going to lose some details about the returned error if we only log in the caller? It looks like all the log messages add more context to the returned error. Should we wrap them with this additional context?

@msau42 (Contributor) commented Aug 16, 2019

How about return fmt.Errorf("Failed to get ReclaimPolicy ...")?
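
A sketch of that wrapped-error style, with the callee adding context and only the top-level caller logging (function bodies are placeholders, not the PR's code):

```go
package main

import (
	"errors"
	"fmt"

	"k8s.io/klog"
)

// getReclaimPolicyFromStorageClass is a placeholder for the real lookup.
func getReclaimPolicyFromStorageClass(class string) (string, error) {
	return "", errors.New("storage class not found")
}

// discover returns the error wrapped with context instead of logging it,
// avoiding double logging at every layer.
func discover(class string) error {
	policy, err := getReclaimPolicyFromStorageClass(class)
	if err != nil {
		return fmt.Errorf("failed to get ReclaimPolicy from storage class %q: %v", class, err)
	}
	_ = policy // the real code would go on to use the policy
	return nil
}

func main() {
	if err := discover("local-storage"); err != nil {
		klog.Errorf("discovery failed: %v", err) // the single log site
	}
}
```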

@JulienBalestra (Contributor, Author) commented Aug 16, 2019

I've made the changes; let me know if it matches what you proposed 👍

@cofyc (Member) commented Aug 16, 2019

Yes, agree; the metrics section is here.

@cofyc (Member) commented Aug 16, 2019

/retest

@JulienBalestra JulienBalestra force-pushed the DataDog:JulienBalestra/readiness branch from 9afa79a to 8771d8d Aug 16, 2019
@k8s-ci-robot k8s-ci-robot added size/L and removed size/XL labels Aug 16, 2019
@JulienBalestra JulienBalestra force-pushed the DataDog:JulienBalestra/readiness branch from 8771d8d to e7b2cdc Aug 16, 2019
@JulienBalestra (Contributor, Author) commented Aug 20, 2019

/retest

@JulienBalestra JulienBalestra force-pushed the DataDog:JulienBalestra/readiness branch from e7b2cdc to 47ab39e Aug 20, 2019
A Contributor left a comment:

Sorry for the delay; I just have a minor comment on the readme. Thank you so much for working on this!

/approve

docs/provisioner.md (review thread resolved)
@k8s-ci-robot (Contributor) commented Sep 5, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JulienBalestra, msau42

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JulienBalestra JulienBalestra force-pushed the DataDog:JulienBalestra/readiness branch from 47ab39e to 7277627 Sep 6, 2019

The readiness state is exposed on the path `/ready`.

The state becomes ready when the discovered local volumes are successfully created.

@msau42 (Contributor) commented Sep 6, 2019

one more nit clarification: on the initial pass

@msau42 (Contributor) commented Sep 6, 2019

Would it be worth calling out that if there were no disks found on the initial pass, then it will also be ready?

@JulienBalestra (Contributor, Author) commented Sep 9, 2019

@msau42 it's not only on the initial pass. For example, if the RBAC is removed after the pod becomes ready, PV creation will fail, so the pod will become NotReady.

This is what I had in mind; would you like me to change this to become ready on the initial pass only?

@msau42 (Contributor) commented Sep 9, 2019

Hm, ok, I think the RBAC removal afterwards is a valid scenario of it becoming not ready. The main scenario I was thinking of was:

  1. Start the provisioner before any disks were added/mounted.
  2. The provisioner sets ready = true because there were no disks to discover.
  3. After disks were added/mounted, now there are issues, and readiness gets set to false.

Is it ok for the provisioner to have been ready in step 2? I think it's fine (and I don't see an easy way to really detect this scenario), but do you think it's worth clarifying that it might be ready even if there were no disks and/or no work to do (i.e. all the disks were already discovered)?

@JulienBalestra (Contributor, Author) commented Sep 10, 2019

@msau42 Agreed, I'm going to add some information about this in the docs 👍
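
As context for the docs change, a consumer only needs to poll the endpoint; a trivial sketch (the URL and port are assumptions):

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// The provisioner exposes its readiness state on /ready; the host
	// and port here are assumptions for this sketch.
	resp, err := http.Get("http://localhost:8080/ready")
	if err != nil {
		fmt.Println("not ready:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("ready:", resp.StatusCode == http.StatusOK)
}
```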

Signed-off-by: Julien Balestra <julien.balestra@datadoghq.com>
@JulienBalestra JulienBalestra force-pushed the DataDog:JulienBalestra/readiness branch from 7277627 to 2e2bd9b Sep 10, 2019
@msau42 (Contributor) commented Sep 10, 2019

/lgtm
Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm label Sep 10, 2019
@k8s-ci-robot k8s-ci-robot merged commit 93915e3 into kubernetes-sigs:master Sep 10, 2019
6 of 7 checks passed

tide: Not mergeable. Needs lgtm label.
cla/linuxfoundation: JulienBalestra authorized
pull-sig-storage-local-static-provisioner: Job succeeded.
pull-sig-storage-local-static-provisioner-e2e: Job succeeded.
pull-sig-storage-local-static-provisioner-e2e-gke: Job succeeded.
pull-sig-storage-local-static-provisioner-test: Job succeeded.
pull-sig-storage-local-static-provisioner-verify: Job succeeded.
@JulienBalestra JulienBalestra deleted the DataDog:JulienBalestra/readiness branch Sep 10, 2019