Make bootstrap client cert loading part of rotation #69890
/assign @awly @mikedanese @liggitt
As discussed. @aaronlevy, this will allow bootkube and other self-hosting mechanisms to rely on bootstrapped master nodes and use static pods.
2e6a32b to 33ad8ce
I reverted #66056 in this. Perhaps we could instead do a more aggressive dial timeout for bootstrap credentials (now that we always do both in the background)?
Testing this here and in openshift/origin#21274 so I have a better baseline across different environments.
47a2f3d to 3c493bc
There are four core scenarios to test:
Re #66056: it should be fine to remove if we add aggressive timeout and fast retries for initial bootstrap.
For context: it was needed on GCE because network programming can lag behind and bootstrapping would get trapped in a very long timeout, even after master IP becomes reachable from Node. cc @mikedanese
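A minimal sketch of the "aggressive timeout and fast retries" idea, assuming a plain TCP reachability probe. The helper names `retryFast` and `probe`, the address, and all durations are hypothetical and are not taken from the kubelet code in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// retryFast calls attempt up to maxAttempts times, sleeping interval between
// failures. It returns the number of attempts made and the last error, if any.
// Hypothetical helper for illustration; not part of the kubelet.
func retryFast(maxAttempts int, interval time.Duration, attempt func() error) (int, error) {
	if maxAttempts < 1 {
		return 0, errors.New("maxAttempts must be at least 1")
	}
	var err error
	for i := 1; i <= maxAttempts; i++ {
		if err = attempt(); err == nil {
			return i, nil
		}
		if i < maxAttempts {
			time.Sleep(interval)
		}
	}
	return maxAttempts, fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

// probe opens and immediately closes a TCP connection. The short dial timeout
// makes an unreachable address fail fast instead of waiting on a long default.
func probe(addr string, dialTimeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, dialTimeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	// Placeholder address and durations, chosen only for the example.
	addr := "127.0.0.1:6443"
	n, err := retryFast(5, 200*time.Millisecond, func() error {
		return probe(addr, 500*time.Millisecond)
	})
	fmt.Printf("attempts=%d err=%v\n", n, err)
}
```

The point of the short per-attempt timeout is that a node whose network programming lags behind gets many cheap attempts rather than one long one, so it picks up the master soon after it becomes reachable.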
cmd/kubelet/app/server.go
Outdated
// XXX: When an external bootstrap source is available, it should be possible to always use that source
// to retrieve new credentials.
config := certConfig
if current != nil {
IIUC, if the certificate manager has an expired cert and fails to rotate it, Current() will return that expired cert.
Can we check that current is still valid here and fall back to certConfig if it's not?
Current() returns nil if the cert is expired.
Or at least it should, that was the intended behavior a while back. The transport rotation loop checks for the transition from valid -> invalid and uses that as the "die if this persists"
The cached cert is only updated on successful rotation: http://go/gh/kubernetes/kubernetes/blob/3c493bc95a825f0f6eb38ef78a19811f204053ec/staging/src/k8s.io/client-go/util/certificate/certificate_manager.go#L396
And Current() just returns the cached cert without checking expiry: http://go/gh/kubernetes/kubernetes/blob/3c493bc95a825f0f6eb38ef78a19811f204053ec/staging/src/k8s.io/client-go/util/certificate/certificate_manager.go#L204-L208
Ah, I added the filtering to getCurrentCertificateOrBootstrap. There's very little point in the cert manager handing me back an expired cert - can you think of a case where that makes sense? If not, we may want to change the signature and verify callers get the right thing (callers should have to deal with nil in general). Alternatively, we make every caller filter for a valid cert.
I think manager.Current() should always check expiration and return nil when the cert is expired.
Specifically, the scenario I'm concerned about is:
- kubelet has valid bootstrap creds
- kubelet bootstraps new client cert with short expiration
- manager.Current() starts returning this cert
- later, rotation using bootstrapped client cert fails
- even later, client cert expires
- rotation flow still gets it from manager.Current(), but fails trying to rotate because it's expired
I'd expect rotation to switch back to using the bootstrap cert again when the active client cert expires.
ok, I agree. I'll make that change and add tests
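The behavior agreed on in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the client-go implementation: `certStore`, `credentialSource`, and `withExpiry` are hypothetical names, and the certificate carries only a `NotAfter` time.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"sync"
	"time"
)

// certStore is a hypothetical stand-in for the certificate manager.
type certStore struct {
	mu   sync.RWMutex
	cert *tls.Certificate
}

// Current returns the cached certificate, or nil when nothing is cached or
// the cached leaf is past its NotAfter time. Callers must handle nil.
func (s *certStore) Current() *tls.Certificate {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.cert == nil || s.cert.Leaf == nil {
		return nil
	}
	if time.Now().After(s.cert.Leaf.NotAfter) {
		return nil
	}
	return s.cert
}

// credentialSource reports which credentials a rotation attempt would use:
// the current client cert while it is valid, otherwise the bootstrap creds.
func credentialSource(s *certStore) string {
	if s.Current() != nil {
		return "client-cert"
	}
	return "bootstrap"
}

// withExpiry builds a certificate whose leaf expires ttl from now. Only
// NotAfter is populated, so it is not usable for a real TLS handshake.
func withExpiry(ttl time.Duration) *tls.Certificate {
	return &tls.Certificate{Leaf: &x509.Certificate{NotAfter: time.Now().Add(ttl)}}
}

func main() {
	s := &certStore{}
	fmt.Println(credentialSource(s)) // bootstrap: no cert yet

	s.cert = withExpiry(time.Hour)
	fmt.Println(credentialSource(s)) // client-cert: cert is valid

	s.cert = withExpiry(-time.Hour)
	fmt.Println(credentialSource(s)) // bootstrap: cert has expired
}
```

Putting the expiry check inside Current() means every caller gets the fallback for free, instead of each caller having to filter for a valid cert.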
return clientConfig, clientConfig, nil
}

store, err := certificate.NewFileStore("kubelet-client", certDir, certDir, "", "")
Is this only used to generate pemPath?
Move it down to where it's used.
it's high in the function to fail earlier on local config
// kubeconfigPath on disk is populated based on bootstrapPath but pointing to the location of the client cert
// in certDir. This preserves the historical behavior of bootstrapping where on subsequent restarts the
// most recent client cert is used to request new client certs instead of the initial token.
func LoadClientConfig(kubeconfigPath string, bootstrapPath string, certDir string) (certConfig, userConfig *restclient.Config, err error) {
bootstrapPath, certDir string
return &transportConfig, closeAllConns, nil
}

if len(s.BootstrapKubeconfig) > 0 {
Should this get called before starting the cert manager above?
this is in the unmanaged path. In the managed path bootstrap is managed by the manager.
/assign @caesarxuchao
/assign @awly
3c493bc to 4e141c8
Still looking at making the waitForServer path able to have a more aggressive timeout and dial.
Since the timeouts seemed prone to follow-up, I split all that out. This is just pulling bootstrap into rotation, and having m.Current() return nil when the cert is expired.
f6e246b to f008184
/retest
@awly this is back to effectively the original PR you reviewed. Was your previous review sufficient or were you looking for more?
I'm +1 on getting this in to CI and seeing how it goes.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: mikedanese, smarterclayton
/hold
Ensure that bootstrap+clientcert-rotation in the Kubelet can:

1. happen in the background so that static pods aren't blocked by bootstrap
2. collapse down to a single call path for requesting a CSR
3. reorganize the code to allow future flexibility in retrieving bootstrap creds

Fetching the first certificate and later certificates when the kubelet is using client rotation and bootstrapping should share the same code path. We also want to start the Kubelet static pod loop before bootstrapping completes. Finally, we want to take an incremental step towards improving how the bootstrap credentials are loaded from disk (potentially allowing for a CLI call to get credentials, or a remote plugin that better integrates with cloud providers or KSMs).

Reorganize how the kubelet client config is determined. If rotation is off, simplify the code path. If rotation is on, load the config from disk, and then pass that into the cert manager. The cert manager creates a client each time it tries to request a new cert.

Preserve existing behavior where:

1. bootstrap kubeconfig is used if the current kubeconfig is invalid/expired
2. we create the kubeconfig file based on the bootstrap kubeconfig, pointing to the location that new client certs will be placed
3. the newest client cert is used once it has been loaded
Expose both a Stop() method (for cleanup) and a method to force cert rotation, but only expose Stop() on the interface. Verify that we choose the correct client.
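One way to read "only expose Stop() on the interface" is sketched below: the concrete type has both methods, but the interface handed to callers omits the forced-rotation one. All names here (`Manager`, `manager`, `RotateCerts`, `NewManager`) are hypothetical for this sketch and should not be taken as the client-go definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// Manager is the narrow interface most callers receive.
type Manager interface {
	Current() string // stand-in for returning a certificate
	Stop()
}

// manager is the concrete type. It has one more method than the interface.
type manager struct {
	serial  int
	stopped bool
}

// Current returns a label for the latest issued cert, or "" if none exists.
func (m *manager) Current() string {
	if m.serial == 0 {
		return ""
	}
	return fmt.Sprintf("cert-%d", m.serial)
}

// Stop marks the manager as stopped; later rotations are refused.
func (m *manager) Stop() { m.stopped = true }

// RotateCerts forces an immediate rotation. It is deliberately absent from
// Manager, so only code holding the concrete type (such as tests) can call it.
func (m *manager) RotateCerts() error {
	if m.stopped {
		return errors.New("manager is stopped")
	}
	m.serial++
	return nil
}

// NewManager returns the narrow interface, hiding RotateCerts from callers.
func NewManager() Manager { return &manager{} }

func main() {
	concrete := &manager{}
	var m Manager = concrete

	if err := concrete.RotateCerts(); err != nil {
		fmt.Println("rotate failed:", err)
	}
	fmt.Println(m.Current()) // cert-1

	m.Stop()
	fmt.Println(concrete.RotateCerts()) // manager is stopped
}
```

Keeping the forced-rotation method off the interface lets tests drive rotation deterministically without widening what production callers can do.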
f008184 to de293b2
New changes are detected. LGTM label has been removed.
Comments addressed
/hold cancel
/retest
@smarterclayton: The following tests failed, say
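Ensure that bootstrap+client-cert rotation in the Kubelet can:
1. happen in the background so that static pods aren't blocked by bootstrap
2. collapse down to a single call path for requesting a CSR
3. reorganize the code to allow future flexibility in retrieving bootstrap creds
Reorganize how the kubelet client config is determined. If rotation is off, simplify the code path. If rotation is on, load the config from disk, and then pass that into the cert manager. The cert manager creates a client each time it tries to request a new cert.
Preserves existing behavior where:
1. bootstrap kubeconfig is used if the current kubeconfig is invalid/expired
2. we create the kubeconfig file based on the bootstrap kubeconfig, pointing to the location that new client certs will be placed
3. the newest client cert is used once it has been loaded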
reverted in #71173, original release note was:
Fixes #68686