add secret back to the workqueue with delay time, avoid expired bootstrap tokens not being deleted #77713
/sig auth |
@@ -52,6 +54,7 @@ type TokenCleanerOptions struct { | |||
func DefaultTokenCleanerOptions() TokenCleanerOptions { | |||
return TokenCleanerOptions{ | |||
TokenSecretNamespace: api.NamespaceSystem, | |||
SecretResync: defaultSecretResyncInterval, |
this is an extremely short resync period. I probably wouldn't set this below an hour
I looked at the kubeadm command and it does not provide a usable default value. The token controller is only used for GC, so a longer interval is reasonable. I have changed it to one hour.
what is the offset from the expected GC time?
are you seeing a resync that never happens?
As I understand it, the token controller only enqueues a secret when the secret is added or modified. I added a resync so that all secrets are re-enqueued periodically.
> what is the offset from the expected GC time?
Currently set to one hour, maybe longer?
9a39c6c
to
f3de10a
Compare
The bug cites 5 minutes as the desired wait. A one-hour resync doesn't look like it fixes the issue. Why don't you just add the secret back to the workqueue with a delay == expiration - now? |
Given the small volume of expected secrets of this type, that seems like a reasonable approach |
@mikedanese @liggitt |
f3de10a
to
d8c6ff6
Compare
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
d8c6ff6
to
73e2187
Compare
73e2187
to
73a7133
Compare
/priority backlog |
@@ -38,6 +38,8 @@ import ( | |||
"k8s.io/kubernetes/pkg/util/metrics" | |||
) | |||
|
|||
const defaultSecretResyncInterval = 1 * time.Hour |
we can drop the resync now
2b5fd41
to
0c4bb40
Compare
/test pull-kubernetes-e2e-gce |
} | ||
|
||
// token expires after 3 seconds | ||
time.AfterFunc(3*time.Second, verifyFunc) |
this doesn't test that the secret got auto-added back into the queue. we need to verify the length of the queue is 0, then after the secret expiration, the secret is auto-added back into the queue
You are right. I modified the unit test and found a problem with the previous code: GetExpiration should return how long until the secret expires, but the previous code returned a negative number. I should write a good test :)
} | ||
|
||
secret := newTokenSecret("tokenID", "tokenSecret") | ||
addSecretExpiration(secret, timeString(3*time.Second)) |
making this identical to the sleep below seems likely to introduce flakes. suggest an expiration of 2 seconds here, and a wait.PollImmediate check below checking every 100ms (with a timeout of wait.ForeverTestTimeout) until the queue length is 1
Makes sense, updated. Thanks :)
return expired | ||
} | ||
|
||
// GetExpiration get expiration time from now |
need to doc the following things:
- the meaning of the boolean (isExpired)
- the meaning of a zero-value duration when isExpired=false (no expiration)
expiration, secret.Namespace, secret.Name, err) | ||
return 0, true | ||
} | ||
expAt := currentTime.Sub(expTime) |
would be easier to read as
timeRemaining := expTime.Sub(currentTime)
if timeRemaining <= 0 {
...
}
return timeRemaining, false
@liggitt updated, PTAL |
a few last clarifications, then squash to a single commit. thanks!
@@ -188,7 +188,8 @@ func (tc *TokenCleaner) syncFunc(key string) error { | |||
|
|||
func (tc *TokenCleaner) evalSecret(o interface{}) { | |||
secret := o.(*v1.Secret) | |||
if bootstrapsecretutil.HasExpired(secret, time.Now()) { | |||
expiration, ok := bootstrapsecretutil.GetExpiration(secret, time.Now()) |
rename to ttl, alreadyExpired
} | ||
|
||
// GetExpiration checks if the secret expires | ||
// isExpired indicates whether it expires, and timeRemaining indicates how long it will expire |
isExpired indicates if the secret is already expired.
timeRemaining indicates how long until it does expire.
if the secret has no expiration timestamp, returns 0, false.
if there is an error parsing the secret's expiration timestamp, returns 0, true.
@@ -201,4 +202,7 @@ func (tc *TokenCleaner) evalSecret(o interface{}) { | |||
klog.V(3).Infof("Error deleting Secret: %v", err) | |||
} | |||
} |
} else if ttl > 0 {
…trap tokens not being deleted
720647b
to
5c1815a
Compare
/test pull-kubernetes-e2e-gce-100-performance |
@liggitt I've squashed the commits. Thank you very much for the code review, PTAL |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: liggitt, zjj2wry The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@zjj2wry thanks for your work on fixing this! I've hit the same problem with tokens being issued by a Node Authorizer, causing a build-up of 4000+ old bootstrap token secrets that should have been cleared down. Eventually resulted in a timeout to the API fetching secrets and no new nodes getting bootstrap tokens issued. Do you think this would be eligible for a cherry-pick to the 1.15, 1.16 branches? Doesn't look like it has made it into any release yet. |
This is already in all v1.16.x and v1.17.x releases |
Ah apologies my mistake, hadn't spotted it in the changelog but I can see it's made it through 👍 |
What type of PR is this?
What this PR does / why we need it:
Fixes #77505
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: