Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not N^2 loading webhook configurations #114794

Merged
merged 1 commit into from
Jan 13, 2023

Conversation

lavalamp
Copy link
Member

@lavalamp lavalamp commented Jan 4, 2023

This adds a processing delay to deduplicate notifications to reload webhook configurations. Solves N^2 behavior on startup and prevents useless work when webhooks change in rapid succession.

What type of PR is this?

/kind bug
/kind cleanup

What this PR does / why we need it:

Fixes N^2 behavior I noticed while working on #113985.

Does this PR introduce a user-facing change?

kube-apiserver: removed N^2 behavior loading webhook configurations.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 4, 2023
@k8s-ci-robot k8s-ci-robot added area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 4, 2023
@lavalamp
Copy link
Member Author

lavalamp commented Jan 4, 2023

I'll add corresponding changes to the mutating version of this file once someone looks at this.

@lavalamp
Copy link
Member Author

lavalamp commented Jan 4, 2023

/sig api-machinery

@lavalamp
Copy link
Member Author

lavalamp commented Jan 4, 2023

The test fail is real, the goroutine doesn't get cleaned up.

@lavalamp lavalamp force-pushed the improved-has-synced branch 3 times, most recently from 7b76004 to 23ee5d9 Compare January 5, 2023 05:34
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 5, 2023
@lavalamp lavalamp force-pushed the improved-has-synced branch 3 times, most recently from 0ca4937 to 5d16e01 Compare January 5, 2023 21:36
)

// validatingWebhookConfigurationManager collects the validating webhook objects so that they can be called.
type validatingWebhookConfigurationManager struct {
configuration *atomic.Value
lister admissionregistrationlisters.ValidatingWebhookConfigurationLister
hasSynced func() bool
delayer *synctrack.Delayer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you make this a pointer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with the configuration member above, which also doesn't need to be a pointer...

Comment on lines 77 to 82
go func() {
defer wg.Done()
for i := 0; i < adds; i++ {
d.Add()
}
}()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes a difference, but you're documenting that Add may be called in parallel., so maybe you could have called them in parallel?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@deads2k
Copy link
Contributor

deads2k commented Jan 6, 2023

having updateConfiguration handle the complete list of validating or mutating admission webhooks ensures

  1. order of updates to admission webhooks will always be honored. It is impossible to update-2 for webhook/B to take effect before update-1 for webhook/A to take effect.

But the only known failure mode is actually the lister.List call itself: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/admission/configuration/validating_webhook_manager.go#L77-L84

With the new ability to ensure processing of the initial list has happened (your previous hasSync'd fix) and the synchronous processing done in the add/update/delete func, is there still value in having the List versus having special handling for delete and performing a running adding/remove, perhaps with a periodic sanity check for a diff with a lock and panic on failure?

@lavalamp
Copy link
Member Author

lavalamp commented Jan 6, 2023

With the new ability to ensure processing of the initial list has happened (your previous hasSync'd fix) and the synchronous processing done in the add/update/delete func, is there still value in having the List versus having special handling for delete and performing a running adding/remove, perhaps with a periodic sanity check for a diff with a lock and panic on failure?

I think it was probably just done this way for simplicity? But yes, it could be changed to not list. I think that would be more invasive? This change is pretty small. And waiting and then listing once is probably more efficient on startup than doing N inserts (not that it matters much for this code -- it's about the precedent for me).

@deads2k
Copy link
Contributor

deads2k commented Jan 6, 2023

And waiting and then listing once is probably more efficient on startup than doing N inserts (not that it matters much for this code -- it's about the precedent for me)

Which precedent do we want? Prior to your PR that tracks "have all event handlers handled", this was the state of the art. After your PR, it is possible to build reliable tracking without the list and I think that combined with a periodic consistency checker, is likely how I would suggest people handle it. Would you suggest using a list plus a delay instead?

@k8s-ci-robot k8s-ci-robot added the sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. label Jan 9, 2023
@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 10, 2023
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 12, 2023
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 12, 2023
@logicalhan
Copy link
Member

/remove-sig instrumentation

@k8s-ci-robot k8s-ci-robot removed the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Jan 12, 2023
@lavalamp lavalamp force-pushed the improved-has-synced branch 2 times, most recently from ac3d387 to 3a0d86f Compare January 12, 2023 19:11
limitations under the License.
*/

package synctrack
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making the test external to make it clear that you're not using anything internal for the test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

staging/src/k8s.io/client-go/tools/cache/synctrack/lazy.go Outdated Show resolved Hide resolved
func (z *Lazy[T]) Get() (T, error) {
e := z.cache.Load()
if e == nil {
// Since we don't force a constructor, nil is a possible value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catche! (typo intended 😂 )

@apelisse
Copy link
Member

/lgtm
/approve
If you want to make the test external, I don't really care:
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 12, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 12, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: c71b82421fff3463a7870a49e2f3c4cbb662f5bd

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: apelisse, lavalamp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Add a "lazy" type to track when an update is needed. It uses a nested
locking technique to avoid extra evaluation calls.
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 12, 2023
@apelisse
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 12, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 2d1ecb82975186122e94b77b4c2d206724b92e7c

@lavalamp
Copy link
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 12, 2023
@k8s-ci-robot k8s-ci-robot merged commit cc9cc4d into kubernetes:master Jan 13, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/cloudprovider area/code-generation area/kubectl cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants