New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wait for APIServer 'ok' forever during CSINode initialization during Kubelet init #89589
Conversation
To speed up unit tests and add more observability when things go wrong.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jsafrane The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/priority important-soon |
224dd86
to
60e901a
Compare
cc @liggitt to check CSINode probe. Is here a better way how to check TLS bootstrap is finished and kubelet is ready to start for real? |
pkg/volume/csi/csi_plugin.go
Outdated
@@ -914,3 +919,30 @@ func highestSupportedVersion(versions []string) (*utilversion.Version, error) { | |||
} | |||
return highestSupportedVersion, nil | |||
} | |||
|
|||
// waitForAPIServerForever waits forever to get the APIServer Version as a proxy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please modify this comment to be consistent with the node retrieval on line 934
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
// https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/ | ||
// Using a dummy node name, IsNotFound error is expected. IsForbidden or timeout are bad. | ||
_, lastErr = client.StorageV1().CSINodes().Get(context.TODO(), nonExistingNodeName, meta.GetOptions{}) | ||
if lastErr == nil || apierrors.IsNotFound(lastErr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lastErr shouldn't be nil if the node is non-existent :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you never know :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could crash break whole cluster by creating a node with nonExistingNodeName.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the node may exist, we can give it name such as possiblyExistingNodeName
60e901a
to
eacb857
Compare
Trick with getting CSINode does not work:
|
With the following change to csi_plugin (nonExistingNodeName is no longer used), test suites pass:
|
Updated to get |
eacb857
to
8bdbd4d
Compare
@jsafrane: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/lgtm |
Edited the release note a bit to emphasize kubelet panic. |
…89-upstream-release-1.17 Automated cherry pick of #89589: Wait for APIServer 'ok' forever during CSINode
…89-upstream-release-1.18 Automated cherry pick of #89589: Wait for APIServer 'ok' forever during CSINode
This is fixed version of #88000 when @davidz627 is out.
I kept David's commit there, so he gets credit for the fix.
If APIServer takes a long time to initialize (faster than the timeout for CSI Node intialization) the Kubelet may kill itself before the APIServer has time to come up. This PR makes it so that the goroutine that initializes CSINode will poll and wait for APIServer initialization before starting the exponential backoff to create CSINode objects.
Fixes: #87964
/cc @misterikkit @msau42 @rphillips @marun @tedyu
/kind bug