Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplified Multi-point plugin configuration for scheduler #2891

Closed
4 tasks done
damemi opened this issue Aug 23, 2021 · 25 comments
Closed
4 tasks done

Simplified Multi-point plugin configuration for scheduler #2891

damemi opened this issue Aug 23, 2021 · 25 comments
Assignees
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team
Milestone

Comments

@damemi
Copy link
Contributor

damemi commented Aug 23, 2021

Enhancement Description

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 23, 2021
@damemi
Copy link
Contributor Author

damemi commented Aug 23, 2021

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 23, 2021
@damemi
Copy link
Contributor Author

damemi commented Aug 23, 2021

/assign
To continue discussion from kubernetes/kubernetes#102303:

It sounds like there is general consensus on @ahg-g's original proposal of a simplified single extension point, with a weight field for any score plugins.

To implement this, I think the best approach is:

  • Define a new extension point MultiPoint in Plugins (prefer a name besides all, because this may not register all points, more on that)
type Plugins struct {
	MultiPoint PluginSet `json:"multiPoint,omitEmpty"`
...
}
type MultiPointPlugin interface {
  Plugin
  // Register allows a plugin to automatically register itself in 
  // multiple extension points, as defined by the plugin
  Register(fw *framework.Framework)
}
// frameworkImpl is the component responsible for initializing and running scheduler
// plugins.
type frameworkImpl struct {
...
	pluginsMap           map[string]config.Plugin // <--new
	multiPointPlugins    []framework.MultiPointPlugin // <--new
	scorePluginWeight    map[string]int
	queueSortPlugins     []framework.QueueSortPlugin
	preFilterPlugins     []framework.PreFilterPlugin
	filterPlugins        []framework.FilterPlugin
	postFilterPlugins    []framework.PostFilterPlugin
	preScorePlugins      []framework.PreScorePlugin
	scorePlugins         []framework.ScorePlugin
	reservePlugins       []framework.ReservePlugin
	preBindPlugins       []framework.PreBindPlugin
	bindPlugins          []framework.BindPlugin
	postBindPlugins      []framework.PostBindPlugin
	permitPlugins        []framework.PermitPlugin
...
}
  • Modify getExtensionPoints() to return a the MultiPoint extensions as well, so that the factory functions for these get registered
func getExtensionPoints(plugins *config.Plugins) map[string]extensionPoint {
	return []extensionPoint{
		{&plugins.MultiPoint, &f.multiPointPlugins},
		{&plugins.PreFilter, &f.preFilterPlugins},
		{&plugins.Filter, &f.filterPlugins},
		{&plugins.PostFilter, &f.postFilterPlugins},
		{&plugins.Reserve, &f.reservePlugins},
		{&plugins.PreScore, &f.preScorePlugins},
		{&plugins.Score, &f.scorePlugins},
		{&plugins.PreBind, &f.preBindPlugins},
		{&plugins.Bind, &f.bindPlugins},
		{&plugins.PostBind, &f.postBindPlugins},
		{&plugins.Permit, &f.permitPlugins},
		{&plugins.QueueSort, &f.queueSortPlugins},
	}
}
for _, e := range f.getExtensionPoints(profile.Plugins) {
	if err := updatePluginList(e.slicePtr, *e.plugins, pluginsMap); err != nil {
		return nil, err
	}
}

for _, mp := range f.multiPointPlugins {
	mp.Register(f)
}

Plugins can then implement Register() like so:

func (pl *MyPlugin) Register(fw *framework.Framework) {
	myEPs := &config.Plugins{
		PreFilter: &config.PluginSet{
			Enabled: []Plugin{{Name: pl.Name()}}
		},
		Filter: &config.PluginSet{
			Enabled: []Plugin{{Name: pl.Name()}}
		},
		PreScore: &config.PluginSet{
			Enabled: []Plugin{{Name: pl.Name()}}
		},
		Score: &config.PluginSet{
			Enabled: []Plugin{{Name: pl.Name(), Weight: 1}}
		},
	}
	fw.RegisterMultiPoint(myEPs)
}

using a new public function RegisterMultiPoint() in framework.go, which duplicates the behavior from the original function registration in NewFramework():

func (f *frameworkImpl) RegisterMultiPoint(eps *config.Plugins) error {
  for name, e := f.getExtensionPoints(eps) {
    // by passing `eps` as set by the plugin in its Register() func,
    // the map returned from `f.getExtensionPoints(eps)` will only have the
    // function's defined config profiles in each *e.plugins
    // So only those will be appended to the existing slicePtrs
    if err := updatePluginList(e.slicePtr, *e.plugins, f.pluginsMap); err != nil {
      return err
    }
  }
}

This duplication allows the existing manual plugin config behavior to be preserved, while also taking advantage of the error checking already available with these functions.

The Register() extension point puts flexibility in the developer's hands to define several Multi-Point registration approaches. For example, a plugin which provides Filter and Score extensions could add an option in its instantiation to only enable one or the other:

func (pl *MyPlugin) Register(fw *framework.Framework) {
	myEPs := &config.Plugins{
		PreFilter: &config.PluginSet{
			Enabled: []Plugin{{Name: pl.Name()}}
		},
		Filter: &config.PluginSet{
			Enabled: []Plugin{{Name: pl.Name()}}
		}}

	if pl.NoScoring == false {
		myEPs.PreScore = &config.PluginSet{Enabled: []Plugin{{Name: pl.Name()}}},
		myEPs.Score = &config.PluginSet{Enabled: []Plugin{{Name: pl.Name(), Weight: 1}}},
	}
	fw.RegisterMultiPoint(myEPs)
}

For MultiPoint disabling of default plugins, it looks like we need to address that in mergePlugins() in pkg/scheduler/apis/config/v1beta2/default_plugins.go. We can pass the list of Disabled multipoint plugins to each call to mergePluginSet() as the starting set of disabled plugins. Should any default plugins match one that has been globally disabled, they will be removed then.

// mergePlugins merges the custom set into the given default one, handling disabled sets.
func mergePlugins(defaultPlugins, customPlugins *v1beta2.Plugins) *v1beta2.Plugins {
	if customPlugins == nil {
		return defaultPlugins
	}
	
	multiPointDisabled := make([]string, len(customPlugins.MultiPoint.Disabled))
	for p := range customPlugins.MultiPoint.Disabled {
		multiPointDisabled = append(multiPointDisabled, p.Name)
	}
	defaultPlugins.QueueSort = mergePluginSet(defaultPlugins.QueueSort, customPlugins.QueueSort, multiPointDisabled)
	defaultPlugins.PreFilter = mergePluginSet(defaultPlugins.PreFilter, customPlugins.PreFilter, multiPointDisabled)
	defaultPlugins.Filter = mergePluginSet(defaultPlugins.Filter, customPlugins.Filter, multiPointDisabled)
	defaultPlugins.PostFilter = mergePluginSet(defaultPlugins.PostFilter, customPlugins.PostFilter, multiPointDisabled)
	defaultPlugins.PreScore = mergePluginSet(defaultPlugins.PreScore, customPlugins.PreScore, multiPointDisabled)
	defaultPlugins.Score = mergePluginSet(defaultPlugins.Score, customPlugins.Score, multiPointDisabled)
	defaultPlugins.Reserve = mergePluginSet(defaultPlugins.Reserve, customPlugins.Reserve, multiPointDisabled)
	defaultPlugins.Permit = mergePluginSet(defaultPlugins.Permit, customPlugins.Permit, multiPointDisabled)
	defaultPlugins.PreBind = mergePluginSet(defaultPlugins.PreBind, customPlugins.PreBind, multiPointDisabled)
	defaultPlugins.Bind = mergePluginSet(defaultPlugins.Bind, customPlugins.Bind, multiPointDisabled)
	defaultPlugins.PostBind = mergePluginSet(defaultPlugins.PostBind, customPlugins.PostBind, multiPointDisabled)
	return defaultPlugins
}

func mergePluginSet(defaultPluginSet, customPluginSet v1beta2.PluginSet, multiDisabled []string) v1beta2.PluginSet {
	disabledPlugins := sets.NewString()
	disabledPlugins.Insert(multiDisabled...) // this is the only change here
...
}

@ahg-g @alculquicondor @Huang-Wei please let me know what you think of this proposal. I'm pretty fond of the shared benefits between administrators (ease-of-use) and developers (control over implementation/options for multiple configs/etc), and I think this accomplishes all of that with a set of minimally invasive changes.

@yuzhiquan
Copy link
Member

yuzhiquan commented Aug 24, 2021

/cc

@alculquicondor
Copy link
Member

alculquicondor commented Aug 24, 2021

What you are proposing is that we support single and multi extension point configurations, right? I don't think the extra complexity is worth the maintenance cost. There's also the problem of extra hierarchy in the YAML:

profiles:
  - plugins:
    multiPoint:
          enabled:
          - name: PodTopologySpread
          - scoringWeight: 1

But the reality is that we would have to support both methods for a while (because we have to support v1beta1 and v1beta2). So what would happen is that the internal types (in pkg/scheduler/config) would look somewhat like what you propose. But v1beta3 profiles would only have a single plugins PluginSet field. Then we do manual conversion into the internal type.
The good thing is that we can be sure that when instantiating the framework we only need to process multiPoint or individual extension points, but not both.

In general, there should be no need for a Register method, as we can simply iterate over all the interfaces and register the ones that the plugin implements. Maybe we can offer this as a default implementation (like we have the default NormalizeScore). Then the plugins that want to offer the option to disable specific extension points can implement their own registration.

@damemi
Copy link
Contributor Author

damemi commented Aug 24, 2021

What you are proposing is that we support single and multi extension point configurations, right? I don't think the extra complexity is worth the maintenance cost

I think both methods should always be supported. imo, this change is not very complex.

Maintaining individual config options is important for development and debugging. Also given that the internal representations will remain the same no matter what we do, I don't see the benefit to removing the current config.

There's also the problem of extra hierarchy in the YAML

Isn't the hierarchy at the same level as all the other extension points? This doesn't change the yaml in that sense

The good thing is that we can be sure that when instantiating the framework we only need to process multiPoint or individual extension points, but not both.

I think we should still process both, say a user wants to disable all scoring plugins. With the current config, they can disable: "*". But if we only have multi-point, there are new requirements:

  1. Each plugin needs to implement disabled-scoring options
  2. Each of those now needs to be manually configured to disable their scoring

number 1 we can take the time to implement for in-tree, but there's no guarantee for 3rd party plugins. But for number 2, we have now actually added complexity to the user's config.

In general, there should be no need for a Register method, as we can simply iterate over all the interfaces and register the ones that the plugin implements

I considered this, but the main problem with this approach is you can't differentiate between when to report an error that the plugin does not implement the matching interface, and when to just ignore it. We should maintain this error reporting.

Also, giving a specific extension point makes this feature more "opt-in" for developers, and provides a concise location to control different configurations (like you mentioned in kubernetes/kubernetes#102303 (comment)). Also consider the possibility that some plugins may wish to define a set of disjoint configurations.

@damemi
Copy link
Contributor Author

damemi commented Aug 25, 2021

I took my proposal from above and put it into a WIP PR for this enhancement: #2899

I also tried to include some of the alternatives @alculquicondor mentioned, and I'm happy to discuss this further at tomorrow's sig meeting

@ahg-g
Copy link
Member

ahg-g commented Sep 2, 2021

This is going directly into beta under v1beta3 scheduler component config api.

@Priyankasaggu11929
Copy link
Member

Priyankasaggu11929 commented Sep 3, 2021

/milestone v1.23

@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Sep 3, 2021
@Priyankasaggu11929
Copy link
Member

Priyankasaggu11929 commented Sep 3, 2021

Hello @damemi, 1.23 Enhancements shadow here. Just checking in as we approach enhancements freeze on Thursday 09/09. Here's where this enhancement currently stands:

  • KEP file using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable
  • KEP has a test plan section filled out.
  • KEP has up to date gradution criteria.
  • KEP has a production readiness review that has been completed and merged into k/enhancements.

For this enhancement, it looks like we need the following to be updated in the PR #2899:

  • KEP status marked as implementable.
  • A completed PRR for the beta release merged into k/enhancements (since I see in the latest comments that KEP is directly going into beta)
  • And finally the KEP merged into k/enhancements by the enhancements freeze

Also, can you update this issue's description when you get a chance to make it easier for us to track. (i.e. correct beta release target)

Thanks!

@Priyankasaggu11929 Priyankasaggu11929 added the tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team label Sep 3, 2021
@Priyankasaggu11929
Copy link
Member

Priyankasaggu11929 commented Sep 7, 2021

Hello @damemi, 1.23 Enhancements shadow here. Just checking in once again as we approach enhancements freeze on Thursday 09/09/2021.

For this enhancement, we still need the following to be updated:

  • Update issue description with the appropriate latest Enhancements target
  • Update KEP status from provisional to implementable in the PR KEP-2891: Simplified scheduler plugin config #2899
  • A completed PRR for the beta release merged into k/enhancements (since I see in the latest comments that KEP is directly going into beta)
  • KEP merged into k/enhancements by the enhancements freeze

Also, could we add some more information or pointers here in the issue or the KEP doc itself, explaining why this Enhancement is targeted directly at Beta & not alpha, and whether it is already implemented as part of some other feature?

That would be helpful to understand this section better. Thank you.

#### Alpha

- N/A (this feature is targeted directly at beta in the `v1beta3` scheduler API)

Thanks

@damemi
Copy link
Contributor Author

damemi commented Sep 7, 2021

@Priyankasaggu11929 targeting directly to beta was mentioned by @ahg-g above, but for more detail, this change is going into an API which already exists in beta level. So, there is no alpha for this feature. Is that okay?

@Priyankasaggu11929
Copy link
Member

Priyankasaggu11929 commented Sep 7, 2021

@damemi

but for more detail, this change is going into an API which already exists in beta level. So, there is no alpha for this feature. Is that okay?

Yes. That adds some more context to it. Thank you.

@Priyankasaggu11929
Copy link
Member

Priyankasaggu11929 commented Sep 9, 2021

@damemi, thanks so much for the changes. With the KEP PR merged now, this enhancement is ready for v1.23 enhancements freeze.

Just one bit, if I understand correctly, the enhancement is graduating to beta this release. So, this line in kep.yaml file might need updating from 1.24 to 1.23. Thank you!

@nate-double-u
Copy link

nate-double-u commented Sep 17, 2021

Hi @damemi 👋 1.23 Docs shadow here.

This enhancement is marked as Needs Docs for the 1.23 release.

Please follow the steps detailed in the documentation to open a PR against the dev-1.23 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thu November 18, 11:59 PM PDT.

Also, if needed take a look at Documenting for a release to familiarize yourself with the docs requirement for the release.

Thanks!

@Priyankasaggu11929
Copy link
Member

Priyankasaggu11929 commented Nov 8, 2021

Hello @damemi 👋

Checking in once more as we approach 1.23 code freeze at 6:00 pm PST on Tuesday, November 16.

Please ensure the following items are completed:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • All PRs are fully merged by the code freeze deadline.
  • Have a documentation placeholder PR open by Thursday, November 18.

As always, we are here to help should questions come up.

Thank you so much! 🙂

@damemi
Copy link
Contributor Author

damemi commented Nov 8, 2021

@Priyankasaggu11929 thanks for the reminder, will open one soon

@damemi
Copy link
Contributor Author

damemi commented Nov 11, 2021

Docs PR opened at kubernetes/website#30442

@Priyankasaggu11929
Copy link
Member

Priyankasaggu11929 commented Nov 12, 2021

Thanks so much @damemi. Could you please link the k/k PR(s), and K/website docs PR links in the issue description. It would help track all the PR(s) properly.

Just want to be very sure that I don't miss any PRs. Thank you once again! 🙂

@damemi
Copy link
Contributor Author

damemi commented Nov 30, 2021

All PRs tracked in this issue for Alpha have merged 🎉

@alculquicondor
Copy link
Member

alculquicondor commented Nov 30, 2021

Shouldn't it say beta? We released a new beta API and there is no feature gate

@damemi
Copy link
Contributor Author

damemi commented Nov 30, 2021

@alculquicondor
Copy link
Member

alculquicondor commented Dec 1, 2021

Thanks for updating

@gracenng gracenng added tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team and removed tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team labels Jan 9, 2022
@k8s-triage-robot
Copy link

k8s-triage-robot commented Apr 9, 2022

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 9, 2022
@ahg-g
Copy link
Member

ahg-g commented Apr 9, 2022

I think we can declare this done and will graduate to GA with component config.

/close

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Apr 9, 2022

@ahg-g: Closing this issue.

In response to this:

I think we can declare this done and will graduate to GA with component config.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team
Projects
None yet
Development

No branches or pull requests

9 participants