Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add secret back to the workqueue with delay time, avoid expired bootstrap tokens not being deleted #77713

Merged
merged 1 commit into from Aug 1, 2019

Conversation

zjj2wry
Copy link
Contributor

@zjj2wry zjj2wry commented May 10, 2019

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:
Fixes #77505
Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 10, 2019
@zjj2wry
Copy link
Contributor Author

zjj2wry commented May 10, 2019

/sig auth
/cc @liggitt

@k8s-ci-robot k8s-ci-robot requested a review from liggitt May 10, 2019 03:40
@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label May 10, 2019
@@ -52,6 +54,7 @@ type TokenCleanerOptions struct {
func DefaultTokenCleanerOptions() TokenCleanerOptions {
return TokenCleanerOptions{
TokenSecretNamespace: api.NamespaceSystem,
SecretResync: defaultSecretResyncInterval,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an extremely short resync period. I probably wouldn't set this below an hour

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the kubeadm command and did not provide a valid default value. The token controller is only used for gc. It is reasonable to set a longer time. I have modified it to an hour

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the offset from the expected GC time?
are you seeing a resync that never happens?

Copy link
Contributor Author

@zjj2wry zjj2wry May 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that the token controller will only enqueue the object when adding or modifying the secret. Add a resync in order to re-enqueue all the secrets.

what is the offset from the expected GC time?

Currently set to one hour, maybe longer?

@mikedanese
Copy link
Member

mikedanese commented Jun 14, 2019

The bug sites 5 minutes as the desired wait. This doesn't look like it fixes the issue with a one hour resync. Why don't you just add the secret back to the workqueue with a delay == expiration - now?

@liggitt
Copy link
Member

liggitt commented Jul 8, 2019

The bug sites 5 minutes as the desired wait. This doesn't look like it fixes the issue with a one hour resync. Why don't you just add the secret back to the workqueue with a delay == expiration - now?

Given the small volume of expected secrets of this type, that seems like a reasonable approach

@zjj2wry
Copy link
Contributor Author

zjj2wry commented Jul 10, 2019

@mikedanese @liggitt
thank you for reviewing, so token will be deleted in real time 👍 , I will modify this pr

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 11, 2019
@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Jul 11, 2019
@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Jul 11, 2019
@zjj2wry
Copy link
Contributor Author

zjj2wry commented Jul 11, 2019

/priority backlog
@mikedanese @liggitt updated, ptal

@k8s-ci-robot k8s-ci-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 11, 2019
@@ -38,6 +38,8 @@ import (
"k8s.io/kubernetes/pkg/util/metrics"
)

const defaultSecretResyncInterval = 1 * time.Hour
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can drop the resync now

@zjj2wry zjj2wry force-pushed the secret_resync_duration branch 3 times, most recently from 2b5fd41 to 0c4bb40 Compare July 11, 2019 18:15
@zjj2wry
Copy link
Contributor Author

zjj2wry commented Jul 12, 2019

/test pull-kubernetes-e2e-gce

}

// token expires after 3 seconds
time.AfterFunc(3*time.Second, verifyFunc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't test that the secret got auto-added back into the queue. we need to verify the length of the queue is 0, then after the secret expiration, the secret is auto-added back into the queue

Copy link
Contributor Author

@zjj2wry zjj2wry Jul 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, I modified the unit test and found the problem with the previous code. GetExpiration I should be trying to return how long it will expire, the previous code returned a negative number. I should write a good test : )

@zjj2wry zjj2wry changed the title Add a resync to avoid expired secrets not being deleted add secret back to the workqueue with delay time, avoid expired bootstrap tokens not being deleted Jul 13, 2019
}

secret := newTokenSecret("tokenID", "tokenSecret")
addSecretExpiration(secret, timeString(3*time.Second))
Copy link
Member

@liggitt liggitt Jul 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making this identical to the sleep below seems likely to introduce flakes. suggest an expiration of 2 seconds here, and a wait.PollImmediate check below checking every 100ms (with a timeout of wait.ForeverTestTimeout) until the queue length is 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense, updated. thanks :) :)

return expired
}

// GetExpiration get expiration time from now
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to doc the following things:

  • the meaning of the boolean (isExpired)
  • the meaning of a zero-value duration when isExpired=false (no expiration)

expiration, secret.Namespace, secret.Name, err)
return 0, true
}
expAt := currentTime.Sub(expTime)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be easier to read as

timeRemaining := expTime.Sub(currentTime)
if timeRemaining <= 0 {
   ...
}
return timeRemaining, false

@zjj2wry
Copy link
Contributor Author

zjj2wry commented Jul 30, 2019

@liggitt updated,PTAL

Copy link
Member

@liggitt liggitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few last clarifications, then squash to a single commit. thanks!

@@ -188,7 +188,8 @@ func (tc *TokenCleaner) syncFunc(key string) error {

func (tc *TokenCleaner) evalSecret(o interface{}) {
secret := o.(*v1.Secret)
if bootstrapsecretutil.HasExpired(secret, time.Now()) {
expiration, ok := bootstrapsecretutil.GetExpiration(secret, time.Now())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to ttl, alreadyExpired

}

// GetExpiration checks if the secret expires
// isExpired indicates whether it expires, and timeRemaining indicates how long it will expire
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isExpired indicates if the secret is already expired.
timeRemaining indicates how long until it does expire.
if the secret has no expiration timestamp, returns 0, false.
if there is an error parsing the secret's expiration timestamp, returns 0, true.

@@ -201,4 +202,7 @@ func (tc *TokenCleaner) evalSecret(o interface{}) {
klog.V(3).Infof("Error deleting Secret: %v", err)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

} else if ttl > 0 {

@zjj2wry
Copy link
Contributor Author

zjj2wry commented Aug 1, 2019

/test pull-kubernetes-e2e-gce-100-performance

@zjj2wry
Copy link
Contributor Author

zjj2wry commented Aug 1, 2019

@liggitt already squash commits, very thank you for code review, ptal

@liggitt
Copy link
Member

liggitt commented Aug 1, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 1, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, zjj2wry

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 1, 2019
@k8s-ci-robot k8s-ci-robot merged commit b658028 into kubernetes:master Aug 1, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Aug 1, 2019
@KashifSaadat
Copy link
Contributor

@zjj2wry thanks for your work on fixing this! I've hit the same problem with tokens being issued by a Node Authorizer, causing a build-up of 4000+ old bootstrap token secrets that should have been cleared down. Eventually resulted in a timeout to the API fetching secrets and no new nodes getting bootstrap tokens issued.

Do you think this would be eligible for a cherry-pick to the 1.15, 1.16 branches? Doesn't look like it has made it into any release yet.

@liggitt
Copy link
Member

liggitt commented Jan 13, 2020

Do you think this would be eligible for a cherry-pick to the 1.15, 1.16 branches? Doesn't look like it has made it into any release yet.

This is already in all v1.16.x and v1.17.x releases

@KashifSaadat
Copy link
Contributor

Ah apologies my mistake, hadn't spotted it in the changelog but I can see it's made it through 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note-none Denotes a PR that doesn't merit a release note. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tokencleaner not removing expired bootstrap tokens
6 participants