certificate_manager: Check that template differs from current cert before rotation #69991

Merged
merged 1 commit into from
Oct 25, 2018

agunnerson-ibm
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

With the current behavior, when kubelet starts, a templateChanged
event is always fired off because it only checks if getLastRequest
matches getTemplate. The last request only exists in memory and thus
is initially nil and can't ever match the current template during
startup.

This causes kubelet to request the signing of a new CSR every time it's
restarted. This commit changes the behavior so that templateChanged is
only fired off if the current template matches neither the current
certificate nor the last template.
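
The revised condition can be sketched roughly as follows. This is a simplified illustration and not the code from this PR: the `template` type, the `shouldSignalTemplateChange` function, and the boolean `certSatisfies` argument are hypothetical stand-ins for the real certificate request type and certificate check.

```go
package main

import (
	"fmt"
	"reflect"
)

// template is a hypothetical stand-in for *x509.CertificateRequest.
type template struct {
	CommonName string
}

// shouldSignalTemplateChange reports whether a templateChanged event is
// warranted. certSatisfies says whether the certificate on disk already
// satisfies the current template. lastRequest is nil right after startup
// because the last request is only kept in memory.
func shouldSignalTemplateChange(certSatisfies bool, lastRequest, current *template) bool {
	return !certSatisfies && !reflect.DeepEqual(lastRequest, current)
}

func main() {
	current := &template{CommonName: "system:node:node-1"}

	// Fresh start with a satisfactory certificate on disk: no event.
	fmt.Println(shouldSignalTemplateChange(true, nil, current)) // prints false

	// Fresh start with a certificate that does not satisfy the template: event.
	fmt.Println(shouldSignalTemplateChange(false, nil, current)) // prints true

	// A request for this exact template was already made: no event.
	last := &template{CommonName: "system:node:node-1"}
	fmt.Println(shouldSignalTemplateChange(false, last, current)) // prints false
}
```

With only the second comparison (the previous behavior described above), the first case would fire an event on every restart because `lastRequest` starts out as `nil`.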

Which issue(s) this PR fixes:

Fixes #69471

Release note:

When `--rotate-server-certificates` is enabled, kubelet will no longer request a new certificate on startup if the current certificate on disk is satisfactory.

@k8s-ci-robot added the release-note, size/M, and kind/bug labels Oct 18, 2018
@k8s-ci-robot added the needs-sig, cncf-cla: yes, needs-ok-to-test, and sig/api-machinery labels and removed the needs-sig label Oct 18, 2018
@fedebongio
Contributor

/cc @caesarxuchao

@liggitt
Member

liggitt commented Oct 19, 2018

/ok-to-test

@k8s-ci-robot removed the needs-ok-to-test label Oct 19, 2018
@liggitt
Member

liggitt commented Oct 23, 2018

there's a gofmt error that needs fixing to make the bot happy:

-func (m* manager) certSatisfiesTemplate() bool {
+func (m *manager) certSatisfiesTemplate() bool {
 	m.certAccessLock.RLock()
 	defer m.certAccessLock.RUnlock()
 	if m.cert == nil {

Run ./hack/update-gofmt.sh

-glog.V(2).Infof("Current certificate CN (%s) does not match requested CN (%s), rotating now", m.cert.Leaf.Subject.CommonName, template.Subject.CommonName)
-return time.Now()
+glog.V(2).Infof("Current certificate CN (%s) does not match requested CN (%s)", m.cert.Leaf.Subject.CommonName, template.Subject.CommonName)
+return false
 }
Member

also check sets.NewString(m.cert.Leaf.Subject.Organization...).HasAll(template.Subject.Organization...)
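
The suggested check is a subset test: every organization requested by the template must already be present in the certificate. A dependency-free sketch of the same idea is shown below. The `hasAll` helper is hypothetical and only stands in for the `sets.NewString(...).HasAll(...)` call from the comment; the subject values are made up for illustration. Note that `pkix.Name` in the Go standard library exposes the organizations as the `Organization` field.

```go
package main

import (
	"crypto/x509/pkix"
	"fmt"
)

// hasAll reports whether every element of want is present in have.
// Hypothetical helper standing in for sets.NewString(have...).HasAll(want...).
func hasAll(have, want []string) bool {
	seen := make(map[string]struct{}, len(have))
	for _, s := range have {
		seen[s] = struct{}{}
	}
	for _, s := range want {
		if _, ok := seen[s]; !ok {
			return false
		}
	}
	return true
}

func main() {
	// Illustrative subjects only; real values come from the certificate and template.
	certSubject := pkix.Name{CommonName: "system:node:node-1", Organization: []string{"system:nodes"}}
	tmplSubject := pkix.Name{CommonName: "system:node:node-1", Organization: []string{"system:nodes"}}

	// The certificate covers every organization in the template.
	fmt.Println(hasAll(certSubject.Organization, tmplSubject.Organization)) // prints true

	// The template now asks for an organization the certificate lacks.
	tmplSubject.Organization = append(tmplSubject.Organization, "example:extra")
	fmt.Println(hasAll(certSubject.Organization, tmplSubject.Organization)) // prints false
}
```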

@liggitt
Member

liggitt commented Oct 23, 2018

/test pull-kubernetes-e2e-gke

@liggitt
Member

liggitt commented Oct 23, 2018

thanks for the PR, just a couple comments. exercising the logic in a unit test would be good as well

@liggitt added this to the v1.13 milestone Oct 23, 2018
@liggitt added the sig/auth and priority/important-soon labels Oct 23, 2018
return time.Now()
}

m.certAccessLock.RLock()
Contributor

certSatisfiesTemplate already locks the mutex, so this will deadlock.
Recommend splitting certSatisfiesTemplate into two functions: an unlocked one and a locking wrapper. Call the unlocked one here and the locked one in manager.Start.
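
The split recommended here could look roughly like the sketch below. The `manager` fields `certCN` and `templateCN` and the `needsRotation` method are hypothetical simplifications; only the pattern (a helper that assumes the lock is held, plus a thin wrapper that takes the lock) is the point, not the actual certificate comparison.

```go
package main

import (
	"fmt"
	"sync"
)

// manager is a pared-down, hypothetical version of the certificate manager.
type manager struct {
	certAccessLock sync.RWMutex
	certCN         string // stand-in for the current certificate
	templateCN     string // stand-in for the requested template
}

// certSatisfiesTemplateLocked does the comparison without locking.
// The caller must already hold certAccessLock.
func (m *manager) certSatisfiesTemplateLocked() bool {
	return m.certCN != "" && m.certCN == m.templateCN
}

// certSatisfiesTemplate is the locking wrapper for callers that do not
// hold certAccessLock.
func (m *manager) certSatisfiesTemplate() bool {
	m.certAccessLock.RLock()
	defer m.certAccessLock.RUnlock()
	return m.certSatisfiesTemplateLocked()
}

// needsRotation is a hypothetical caller that already holds the lock for
// its own reads, so it calls the unlocked variant.
func (m *manager) needsRotation() bool {
	m.certAccessLock.RLock()
	defer m.certAccessLock.RUnlock()
	return !m.certSatisfiesTemplateLocked()
}

func main() {
	m := &manager{certCN: "system:node:node-1", templateCN: "system:node:node-1"}
	fmt.Println(m.certSatisfiesTemplate()) // prints true
	fmt.Println(m.needsRotation())         // prints false
}
```

Keeping the comparison in one unlocked function means both entry points share the same logic and no code path acquires `certAccessLock` twice.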

@k8s-ci-robot added the size/L label and removed the size/M label Oct 24, 2018
@agunnerson-ibm
Contributor Author

Thanks everyone for the comments! I've updated this PR to address them.

Changes:

  • Fixed all gofmt errors
  • Added check to ensure that the organizations in the x509 subject of the cert match the template
  • Split certSatisfiesTemplate into certSatisfiesTemplate and certSatisfiesTemplateLocked and use the latter if certAccessLock is already locked
  • Added unit tests for certSatisfiesTemplate

@agunnerson-ibm
Contributor Author

Fixed the lint issue reported by pull-kubernetes-verify.

Contributor

@awly awly left a comment

Two small nits, LGTM otherwise

}

if m.certSatisfiesTemplate() != tc.shouldSatisfy {
if tc.shouldSatisfy {
Contributor

Don't wrap errors, store certSatisfiesTemplate result in a var and do a simpler error print:

t.Errorf("cert: %+v, template: %+v, certSatisfiesTemplate returned %v, want %v", m.cert, tc.template, got, tc.shouldSatisfy)

Contributor Author

Thanks, I'll fix that.

m := manager{
cert: tlsCert,
getTemplate: func() *x509.CertificateRequest { return tc.template },
usages: []certificates.KeyUsage{},
Contributor

Is this needed?

Contributor Author

Nope, I'll remove it.

…fore rotation

With the current behavior, when kubelet starts, a `templateChanged`
event is always fired off because it only checks if `getLastRequest`
matches `getTemplate`. The last request only exists in memory and thus
is initially `nil` and can't ever match the current template during
startup.

This causes kubelet to request the signing of a new CSR every time it's
restarted. This commit changes the behavior so that `templateChanged` is
only fired off if the current template matches neither the current
certificate nor the last template.

Fixes kubernetes#69471

Signed-off-by: Andrew Gunnerson <andrew.gunnerson@us.ibm.com>
@agunnerson-ibm
Contributor Author

The latest commit should address @awly's comments above.

@awly
Contributor

awly commented Oct 24, 2018

/lgtm

@k8s-ci-robot added the lgtm label Oct 24, 2018
@liggitt
Member

liggitt commented Oct 24, 2018

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: agunnerson-ibm, liggitt


@k8s-ci-robot added the approved label Oct 24, 2018
@liggitt
Member

liggitt commented Oct 24, 2018

thanks for putting this together

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-ci-robot merged commit d96f235 into kubernetes:master Oct 25, 2018
@agunnerson-ibm deleted the issue-69471 branch October 25, 2018 14:35
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Kubelet generates a new CSR on start even if it has a valid certificate on disk
6 participants