
sched: dynamic event handlers registration #101394

Merged
merged 1 commit into from May 24, 2021

Conversation

Huang-Wei
Member

@Huang-Wei Huang-Wei commented Apr 23, 2021

What type of PR is this?

/kind feature
/sig scheduling

What this PR does / why we need it:

The Kubernetes scheduler now registers event handlers dynamically, rather than hard-coding the event registration logic. See the Design Doc for more details.

This PR consists of 3 parts:

  • move clusterEventMap to Configurator, so the clusterEventMap can be used during event handlers registration
  • dynamic event handlers registration for core API resources (existing integration test TestServiceAffinityEnqueue can ensure the test coverage)
  • dynamic event handlers registration for custom resources (new integration test TestCustomResourceEnqueue added to ensure the test coverage)

Which issue(s) this PR fixes:

Part of #100347.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Scheduler now registers event handlers dynamically.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [Design Doc]: https://docs.google.com/document/d/1RRlWKiFD77NpJPBOfEpUDsqNxi43wbwExlLO-gh3kfY/edit#
(join sig-scheduling google group to view the doc)

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 23, 2021
@k8s-ci-robot
Contributor

@Huang-Wei: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Apr 23, 2021
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 23, 2021
@k8s-ci-robot k8s-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 23, 2021
Comment on lines +357 to +394
case framework.CSINode:
    informerFactory.Storage().V1().CSINodes().Informer().AddEventHandler(
        buildEvtResHandler(at, framework.CSINode, "CSINode"),
    )
case framework.PersistentVolume:
    // MaxPDVolumeCountPredicate: since it relies on the counts of PV.
    //
    // PvAdd: Pods created when there are no PVs available will be stuck in
    // unschedulable queue. But unbound PVs created for static provisioning and
    // delay binding storage class are skipped in PV controller dynamic
    // provisioning and binding process, will not trigger events to schedule pod
    // again. So we need to move pods to active queue on PV add for this
    // scenario.
    //
    // PvUpdate: Scheduler.bindVolumesWorker may fail to update assumed pod volume
    // bindings due to conflicts if PVs are updated by PV controller or other
    // parties, then scheduler will add pod back to unschedulable queue. We
    // need to move pods to active queue on PV update for this scenario.
    informerFactory.Core().V1().PersistentVolumes().Informer().AddEventHandler(
        buildEvtResHandler(at, framework.PersistentVolume, "Pv"),
    )
case framework.PersistentVolumeClaim:
    // MaxPDVolumeCountPredicate: add/update PVC will affect counts of PV when it is bound.
    informerFactory.Core().V1().PersistentVolumeClaims().Informer().AddEventHandler(
        buildEvtResHandler(at, framework.PersistentVolumeClaim, "Pvc"),
    )
case framework.StorageClass:
    if at&framework.Add != 0 {
        informerFactory.Storage().V1().StorageClasses().Informer().AddEventHandler(
            cache.ResourceEventHandlerFuncs{
                AddFunc: sched.onStorageClassAdd,
            },
        )
    }
Member

do we really think that a user will configure a scheduler without registering the volume-based plugins? I feel that those should be treated the same as pod and node.

Member Author

do we really think that a user will configure a scheduler without registering the volume-based plugins?

Nope, I think volume-based plugins are always needed. The design intention is that they don't interact with the internal cache (while pod and node do), so they fit the switch case naturally, leveraging buildEvtResHandler to generalize their event handler instantiation.

Member

We could move the buildEvtResHandler function outside, so we can still reuse it. The way I interpret this switch statement is that it decides whether or not those events are necessary to register. If we are saying the storage-based ones should always be registered, then I think it is better to treat them like pod/node.

Member Author

(missed this comment earlier)

Another benefit is that we can generalize the registration of add/update/delete events dynamically. For example, currently the storage-related registrations are hard-coded: some register add/update, some register add only, etc. But if we leverage the unioned cluster events, we can register the event handlers dynamically, which makes the code resilient to further changes such as adding or removing events. WDYT?

// options.kubeConfig can be nil in tests.
if options.kubeConfig != nil {
dynClient := dynamic.NewForConfigOrDie(options.kubeConfig)
dynInformerFactory = dynamicinformer.NewFilteredDynamicSharedInformerFactory(dynClient, 0, v1.NamespaceAll, nil)
Member

is there any overhead (specifically memory and traffic between the api-server and scheduler) to instantiating this informer? if so, we should have an explicit enable flag in component config that is false by default.

Member Author

In terms of the default scheduler, there is no plugin registering CR events, so IMO instantiating a dynInformerFactory won't do any concrete work.

Member

so the dynamic informer cache is populated only if an event is registered?

Member Author

yes, just like coreInformerFactory, by default the embedded informers are empty.

Member

hmm, so someone needs to initiate an informer then, so that makes it even more necessary to pass this factory via framework handler to the plugins that will use it, right?

Member Author

Yes, that's your point at #101394 (comment). For now, plugin developers instantiate typed informers and register event handlers accordingly.

Member

It is related, but might be different. If you remember, an informer is initiated only if it is explicitly accessed before starting the informer. Is this the case here (I don't have experience with dynamic informers, so asking just to make sure)?

Member Author
@Huang-Wei Huang-Wei Apr 27, 2021

If you remember, an informer is initiated only if it is explicitly accessed before starting the informer.

Yes, similar here. For the dynamic informer factory, an informer for CR xyz is only initiated when we fetch the GVR of xyz and call its Informer() function. In this PR, that's in the default case, i.e., only triggered when a custom plugin implements EventsToRegister() to declare its interest in CR xyz.

However, the framework doesn't really care about the cache for CRs; the out-of-tree plugins are responsible for building their own cache, if needed. The hook set up here is fairly simple: it just guarantees that upon certain CR events, the relevant unschedulable Pods can be moved properly.

(BTW: I rethink the necessity of offering the dynInformerFactory in framework handle. See my other comment #101394 (comment))

@@ -281,10 +286,33 @@ func New(client clientset.Interface,
sched.StopEverything = stopEverything
sched.client = client

addAllEventHandlers(sched, informerFactory)
// Build dynamic client and dynamic informer factory
var dynInformerFactory dynamicinformer.DynamicSharedInformerFactory
Member

if plugins want to access CRD resources, it is better if they do through the same informer factory used for registering the events, so we may want to plumb this to the framework handler just like we did for shared informer factory.

Member Author

Good point. I need to think about it. I'm a bit concerned that it would add an extra burden for plugin developers, such as fetching the GVR before adding event handlers, etc. I probably need to compare it with how controller-runtime tackles similar problems, and come up with a better abstraction.

Member Author

I rethought it a bit. It's better not to force out-of-tree plugins to use dynamic informers, as typed ones are more applicable and easier to manipulate, in both core logic and tests.

Member

We are not forcing them to use it, but I think it is useful to make it available to them since the cache is already populated for the types they registered events for. If they create their own informer for the same type, aren't we wasting memory because we will have two informerFactory caches?

Member Author
@Huang-Wei Huang-Wei May 19, 2021

but I think it is useful to make it available to them since the cache is already populated for the types they registered events for

Not quite sure the plugin developers would really use it, as the typed informer and other generated code are much easier to manipulate and more performant.

If they create their own informer for the same type, aren't we wasting memory because we will have two informerFactory caches?

Yes, this is true. But AFAIK, using the dynamic client/informer all the way down can impose overhead, as it stores objects as interfaces under the hood, which causes a lot of reflection operations.

Resolved review threads:
pkg/scheduler/eventhandlers.go (3 threads, outdated)
test/integration/scheduler/queue_test.go (1 thread)
pkg/scheduler/scheduler.go (1 thread, outdated)
@Huang-Wei
Member Author

@ahg-g any further comments on this PR?

@ahg-g
Member

ahg-g commented May 21, 2021

/lgtm
/hold

for squash

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 21, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 21, 2021
- move clusterEventMap to Configurator
- dynamic event handlers registration for core API resources
- dynamic event handlers registration for custom resources
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 21, 2021
@Huang-Wei
Member Author

@ahg-g Commits squashed.

@Huang-Wei
Member Author

@dims Could you take a look at the vendor/ changes, or route this to the correct reviewer? Thanks.

@dims
Member

dims commented May 21, 2021

/approve

the vendor/ changes are good.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, Huang-Wei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 21, 2021
@Huang-Wei
Member Author

/retest

@ahg-g
Member

ahg-g commented May 21, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 21, 2021
@Huang-Wei
Member Author

/hold cancel
/retest

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2021
@Huang-Wei
Member Author

/retest
