Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a kubelet metric to track certificate expiration. #51031

Conversation

jcbsmpsn
Copy link
Contributor

@jcbsmpsn jcbsmpsn commented Aug 21, 2017

Fix #51964

Add a metric to the kubelet to monitor remaining lifetime of the certificate that
authenticates the kubelet to the API server.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 21, 2017
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Aug 21, 2017
@jcbsmpsn jcbsmpsn force-pushed the metric-certificate-expiration-on-kubelet branch from 8b1d7ab to b0d77d0 Compare August 21, 2017 18:53
glog.V(2).Infof("Waiting %v for next certificate rotation", m.rotationDeadline.Sub(time.Now()))
for m.rotationDeadline.After(time.Now()) {
metrics.ClientCertificateExpiration.Set(m.rotationDeadline.Sub(time.Now()).Seconds())
time.Sleep(1 * time.Minute)
Copy link
Member

@liggitt liggitt Aug 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug code? or did you mean to make this spin every minute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did mean this to spin every minute. So it can update the metric.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to do this. Just set the deadline as a gauge and the gauge value to the expected time in seconds. And only update the value when the cert manager starts or updates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spinning a goroutine for this is unusual.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt we should also expose this for the masters for serving certs.

Copy link
Member

@liggitt liggitt Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And add serving cert lifetime in the kubelet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarterclayton Do you a metric on the apiserver that shows the analogous value for the apiserver's certificate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, each of the components that has a serving cert and each of the clients that request client certs want this.

@jcbsmpsn jcbsmpsn force-pushed the metric-certificate-expiration-on-kubelet branch 2 times, most recently from 6ab3358 to 7447bb7 Compare August 24, 2017 23:17
@jcbsmpsn
Copy link
Contributor Author

/assign @crassirostris

@crassirostris
Copy link

Overall LGTM

But be aware that MustRegisted will panic if e.g. the name is incorrect (since you form it dynamically), maybe use Register method instead?

@jcbsmpsn
Copy link
Contributor Author

@crassirostris Your point about MustRegister vs Register is good, but the name is formed from static values, not from user data. I'd rather it crashed and get debugged rather than have the metric disappear and that not be noticed.

@jcbsmpsn
Copy link
Contributor Author

/assign @Random-Liu

@jcbsmpsn jcbsmpsn force-pushed the metric-certificate-expiration-on-kubelet branch from 7447bb7 to bb54bfd Compare August 30, 2017 16:52
@jcbsmpsn jcbsmpsn force-pushed the metric-certificate-expiration-on-kubelet branch from bb54bfd to f1fef11 Compare August 30, 2017 16:55
@crassirostris
Copy link

@jcbsmpsn

@crassirostris Your point about MustRegister vs Register is good, but the name is formed from static values, not from user data. I'd rather it crashed and get debugged rather than have the metric disappear and that not be noticed.

But won't it crash anyway, just upper the stack, rather than in this place? I understand that now Name is passed from the constant, but that might change in the future. Anyway, it's your call, I don't insist :)

@smarterclayton
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 1, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jcbsmpsn, smarterclayton

No associated issue. Update pull-request body to add a reference to an issue, or get approval with /approve no-issue

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@crassirostris
Copy link

@jcbsmpsn You need either /approve no-issue label or create an issue and add it to the description in form of Fix <issue-link>

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 5, 2017
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

5 similar comments
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

2 similar comments
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Sep 23, 2017

@jcbsmpsn: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-kops-aws f1fef11 link /test pull-kubernetes-e2e-kops-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 51031, 51705, 51888, 51727, 51684). If you want to cherry-pick this change to another branch, please follow the instructions here..

@k8s-github-robot k8s-github-robot merged commit d4ac62c into kubernetes:master Sep 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants