
feat: implement "queue-sort" extension point for scheduling framework #77529


draveness
Contributor

@draveness draveness commented May 7, 2019

What type of PR is this?

/kind feature
/priority important-soon
/sig scheduling

What this PR does / why we need it:

Implement "queue-sort" extension point for scheduling framework

Which issue(s) this PR fixes:

Fixes #77524

KEP: kubernetes/enhancements#624

Does this PR introduce a user-facing change?:

Support "queue-sort" extension point for scheduling framework

@k8s-ci-robot k8s-ci-robot added the release-note, do-not-merge/work-in-progress, kind/feature, size/L, priority/important-soon, sig/scheduling, and cncf-cla: yes labels May 7, 2019
@draveness
Contributor Author

draveness commented May 7, 2019

/assign @bsalamat @misterikkit

This is a preliminary implementation for the scheduling framework. I'll add more unit tests once the design is good to go :)

@draveness draveness force-pushed the feature/add-queuesort-extension-point branch 3 times, most recently from 2df3136 to 57f14f5 May 7, 2019 04:54
type QueueSortPlugin interface {
	Plugin
	// Less is used to sort pods in the scheduling queue.
	Less(*internalqueue.PodInfo, *internalqueue.PodInfo) bool
}
Contributor

Would it be better to name it Sort?

Contributor Author

@draveness draveness May 7, 2019

I followed the KEP definition:

These plugins are used to sort pods in the scheduling queue. A queue sort plugin essentially will provide a "less(pod1, pod2)" function. Only one queue sort plugin may be enabled at a time.

Go's sort package also prefers the name Less in this scenario, e.g. func Slice(slice interface{}, less func(i, j int) bool)

Contributor

All right, understood.

Member

Plugins will not be able to use internalqueue.PodInfo if they import the scheduler's code, because Go does not allow packages outside pkg/scheduler to import its internal packages. We should probably move PodInfo out of internal.

cc @misterikkit

Contributor Author

I could move PodInfo into this file if that's appropriate.

Member

It feels a bit odd to move PodInfo here, but since we want plugins to use it, I guess we don't have any other option.

Contributor

Out of scope for this PR, but @bsalamat, does the same logic apply to the NodeInfoSnapshot used here, which is also under internal?

Contributor Author

@draveness draveness May 11, 2019

> It feels a bit odd to move PodInfo here, but since we want plugins to use it, I guess we don't have any other option.

Moved LessFunc and PodInfo to the framework interface. PTAL

@draveness draveness changed the title [WIP] feat: implement "queue-sort" extension point for scheduling framework feat: implement "queue-sort" extension point for scheduling framework May 11, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label May 11, 2019
@draveness
Contributor Author

/retest

@draveness draveness force-pushed the feature/add-queuesort-extension-point branch from de393e4 to 39b37cb May 12, 2019 02:43
@draveness
Contributor Author

/retest

pkg/scheduler/internal/queue/scheduling_queue.go (outdated, resolved)
pkg/scheduler/framework/v1alpha1/framework.go (outdated, resolved)
pkg/scheduler/internal/queue/scheduling_queue_test.go (outdated, resolved)
}

// only active one at a time.
return f.queueSortPlugins[0].Less
Member

An alternative to this approach is to support multiple plugins for QueueSort. We could rename Less function to Cmp and let it return -1, 0, or 1. A returned 0 means that the values are equal. In that case, we can call the next plugin Cmp until one returns non-zero. For now, I think we can stay with the current implementation.

pkg/scheduler/framework/v1alpha1/framework.go (outdated, resolved)
@draveness draveness force-pushed the feature/add-queuesort-extension-point branch 2 times, most recently from d9ae14d to 58ceeda May 15, 2019 01:21
@draveness draveness force-pushed the feature/add-queuesort-extension-point branch from 58ceeda to d60bccc May 15, 2019 01:40
@draveness
Contributor Author

/test pull-kubernetes-dependencies

Member

@bsalamat bsalamat left a comment

/lgtm
/approve

Thanks, @draveness!

@k8s-ci-robot k8s-ci-robot added the lgtm label May 15, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, draveness

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label May 15, 2019
@draveness
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit 796ecb9 into kubernetes:master May 16, 2019
@draveness draveness deleted the feature/add-queuesort-extension-point branch May 16, 2019 08:26
@k8s-ci-robot
Contributor

@draveness: The following test failed, say /retest to rerun them all:

Test name                Commit   Details  Rerun command
pull-kubernetes-e2e-gce  d60bccc  link     /test pull-kubernetes-e2e-gce

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@@ -69,6 +70,10 @@ func NewFramework(r Registry, _ *runtime.Unknown) (Framework, error) {
// TODO: For now, we assume any plugins that implements an extension
// point wants to be called at that extension point. We should change this
// later and add these plugins based on the configuration.
if qsp, ok := p.(QueueSortPlugin); ok {
f.queueSortPlugins = append(f.queueSortPlugins, qsp)
Contributor

With the current implementation, once len(f.queueSortPlugins) reaches 1, we don't need to append any more plugins.

@apelisse
Member

Not sure if this is caused by this PR, but I've noticed this flake:

INFO: From Testing //pkg/scheduler/internal/queue:go_default_test:
==================== Test output for //pkg/scheduler/internal/queue:go_default_test:
==================
WARNING: DATA RACE
Write at 0x00c000334750 by goroutine 29:
  runtime.mapdelete_faststr()
      GOROOT/src/runtime/map_faststr.go:297 +0x0
  k8s.io/kubernetes/pkg/scheduler/util.(*heapData).Pop()
      pkg/scheduler/util/heap.go:113 +0x203
  container/heap.Pop()
      GOROOT/src/container/heap/heap.go:64 +0xb0
  k8s.io/kubernetes/pkg/scheduler/util.(*Heap).Pop()
      pkg/scheduler/util/heap.go:200 +0x5b
  k8s.io/kubernetes/pkg/scheduler/internal/queue.(*PriorityQueue).flushBackoffQCompleted()
      pkg/scheduler/internal/queue/scheduling_queue.go:356 +0x490
  k8s.io/kubernetes/pkg/scheduler/internal/queue.(*PriorityQueue).flushBackoffQCompleted-fm()
      pkg/scheduler/internal/queue/scheduling_queue.go:334 +0x41
  k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1()
      staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x61
  k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil()
      staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0x108
  k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until()
      staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x5a
 
Previous read at 0x00c000334750 by goroutine 28:
  runtime.mapaccess2_faststr()
      GOROOT/src/runtime/map_faststr.go:107 +0x0
  k8s.io/kubernetes/pkg/scheduler/util.(*Heap).Get()
      pkg/scheduler/util/heap.go:221 +0x11d
  k8s.io/kubernetes/pkg/scheduler/internal/queue.TestPriorityQueue_AddUnschedulableIfNotPresent_Backoff()
      pkg/scheduler/internal/queue/scheduling_queue_test.go:327 +0xa57
  testing.tRunner()
      GOROOT/src/testing/testing.go:865 +0x163
 
Goroutine 29 (running) created at:
  k8s.io/kubernetes/pkg/scheduler/internal/queue.(*PriorityQueue).run()
      pkg/scheduler/internal/queue/scheduling_queue.go:200 +0xd0
  k8s.io/kubernetes/pkg/scheduler/internal/queue.NewPriorityQueueWithClock()
      pkg/scheduler/internal/queue/scheduling_queue.go:193 +0x9af
  k8s.io/kubernetes/pkg/scheduler/internal/queue.TestPriorityQueue_AddUnschedulableIfNotPresent_Backoff()
      pkg/scheduler/internal/queue/scheduling_queue.go:164 +0x70
  testing.tRunner()
      GOROOT/src/testing/testing.go:865 +0x163
 
Goroutine 28 (finished) created at:
  testing.(*T).Run()
      GOROOT/src/testing/testing.go:916 +0x65a
  testing.runTests.func1()
      GOROOT/src/testing/testing.go:1157 +0xa8
  testing.tRunner()
      GOROOT/src/testing/testing.go:865 +0x163
  testing.runTests()
      GOROOT/src/testing/testing.go:1155 +0x523
  testing.(*M).Run()
      GOROOT/src/testing/testing.go:1072 +0x2eb
  main.main()
      bazel-out/k8-fastbuild/bin/pkg/scheduler/internal/queue/linux_amd64_race_stripped/go_default_test%/testmain.go:124 +0x2e1
==================
--- FAIL: TestRecentlyTriedPodsGoBack (1.00s)
    testing.go:809: race detected during execution of test
FAIL
================================================================================
==================== Test output for //pkg/scheduler/internal/queue:go_default_test:
==================
WARNING: DATA RACE
Write at 0x00c00019ddd0 by goroutine 30:
  runtime.mapdelete_faststr()
      GOROOT/src/runtime/map_faststr.go:297 +0x0
  k8s.io/kubernetes/pkg/scheduler/util.(*heapData).Pop()
      pkg/scheduler/util/heap.go:113 +0x203
  container/heap.Pop()
      GOROOT/src/container/heap/heap.go:64 +0xb0
  k8s.io/kubernetes/pkg/scheduler/util.(*Heap).Pop()
      pkg/scheduler/util/heap.go:200 +0x5b
  k8s.io/kubernetes/pkg/scheduler/internal/queue.(*PriorityQueue).flushBackoffQCompleted()
      pkg/scheduler/internal/queue/scheduling_queue.go:356 +0x490
  k8s.io/kubernetes/pkg/scheduler/internal/queue.(*PriorityQueue).flushBackoffQCompleted-fm()
      pkg/scheduler/internal/queue/scheduling_queue.go:334 +0x41
  k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1()
      staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x61
  k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil()
      staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0x108
  k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until()
      staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x5a
 
Previous read at 0x00c00019ddd0 by goroutine 29:
  runtime.mapaccess2_faststr()
      GOROOT/src/runtime/map_faststr.go:107 +0x0
  k8s.io/kubernetes/pkg/scheduler/util.(*Heap).Get()
      pkg/scheduler/util/heap.go:221 +0x11d
  k8s.io/kubernetes/pkg/scheduler/internal/queue.TestPriorityQueue_AddUnschedulableIfNotPresent_Backoff()
      pkg/scheduler/internal/queue/scheduling_queue_test.go:327 +0xa57
  testing.tRunner()
      GOROOT/src/testing/testing.go:865 +0x163
 
Goroutine 30 (running) created at:
  k8s.io/kubernetes/pkg/scheduler/internal/queue.(*PriorityQueue).run()
      pkg/scheduler/internal/queue/scheduling_queue.go:200 +0xd0
  k8s.io/kubernetes/pkg/scheduler/internal/queue.NewPriorityQueueWithClock()
      pkg/scheduler/internal/queue/scheduling_queue.go:193 +0x9af
  k8s.io/kubernetes/pkg/scheduler/internal/queue.TestPriorityQueue_AddUnschedulableIfNotPresent_Backoff()
      pkg/scheduler/internal/queue/scheduling_queue.go:164 +0x70
  testing.tRunner()
      GOROOT/src/testing/testing.go:865 +0x163
 
Goroutine 29 (finished) created at:
  testing.(*T).Run()
      GOROOT/src/testing/testing.go:916 +0x65a
  testing.runTests.func1()
      GOROOT/src/testing/testing.go:1157 +0xa8
  testing.tRunner()
      GOROOT/src/testing/testing.go:865 +0x163
  testing.runTests()
      GOROOT/src/testing/testing.go:1155 +0x523
  testing.(*M).Run()
      GOROOT/src/testing/testing.go:1072 +0x2eb
  main.main()
      bazel-out/k8-fastbuild/bin/pkg/scheduler/internal/queue/linux_amd64_race_stripped/go_default_test%/testmain.go:124 +0x2e1
==================
--- FAIL: TestRecentlyTriedPodsGoBack (1.00s)
    testing.go:809: race detected during execution of test
FAIL

@apelisse
Member

Interestingly, I haven't been able to reproduce this flake (still, I think it should be looked at; it looks pretty bad). I did manage to trigger another one, though:

==================== Test output for //pkg/scheduler/internal/queue:go_default_test (run 857 of 1000):
--- FAIL: TestPendingPodsMetric (0.00s)
    --- FAIL: TestPendingPodsMetric/add_pods_to_unschedulableQ_and_then_move_all_to_activeQ (0.00s)
        scheduling_queue_test.go:1309: ActivePods: Expected 50, got 59
        scheduling_queue_test.go:1317: BackoffPods: Expected 0, got -8
FAIL

@draveness
Contributor Author

It seems this change makes no difference in these tests: both cases pass a nil framework into the initializer, so the code never enters the if branch:

	comp := activeQComp
	if fwk != nil {
		if queueSortFunc := fwk.QueueSortFunc(); queueSortFunc != nil {
			comp = func(podInfo1, podInfo2 interface{}) bool {
				pInfo1 := podInfo1.(*framework.PodInfo)
				pInfo2 := podInfo2.(*framework.PodInfo)

				return queueSortFunc(pInfo1, pInfo2)
			}
		}
	}

Could you give me some pointers on how to trigger the TestPendingPodsMetric failure?

@apelisse
Member

bazel test --runs_per_test=1000 //pkg/scheduler/internal/queue:go_default_test

That should trigger the failure pretty quickly

@draveness
Contributor Author

draveness commented May 18, 2019

There's some problem with bazel on my laptop. I ran the tests 100 times with go test and did not trigger the failure.

$ go test ./pkg/scheduler/internal/queue/... -count=100
ok  	k8s.io/kubernetes/pkg/scheduler/internal/queue	191.594s

@tedyu
Contributor

tedyu commented May 18, 2019

Using bazel, it is easy to reproduce:

https://pastebin.com/0Q2M8dHk

@draveness
Contributor Author

draveness commented May 18, 2019

I reverted the commit, and the test fails the same way. I'll open a flaky test issue: #78064

@apelisse
Member

Thanks for tracking this! Have you opened an issue for the race condition or managed to investigate it?

@draveness
Contributor Author

No, I didn't open an issue for the race condition problem. Do you have links to failing CI jobs?

@apelisse
Member

The one described in #77529 (comment). A race condition could also be responsible for the other error.

@tedyu
Contributor

tedyu commented May 22, 2019

Tried the following:

diff --git a/pkg/scheduler/internal/queue/scheduling_queue_test.go b/pkg/scheduler/internal/queue/scheduling_queue_test.go
index 0252014e83..e96c06d7aa 100644
--- a/pkg/scheduler/internal/queue/scheduling_queue_test.go
+++ b/pkg/scheduler/internal/queue/scheduling_queue_test.go
@@ -323,11 +323,13 @@ func TestPriorityQueue_AddUnschedulableIfNotPresent_Backoff(t *testing.T) {

        // Since there was a move request at the same cycle as "oldCycle", these pods
        // should be in the backoff queue.
+       q.lock.RLock()
        for i := 1; i < totalNum; i++ {
                if _, exists, _ := q.podBackoffQ.Get(newPodInfoNoTimestamp(&expectedPods[i])); !exists {
                        t.Errorf("Expected %v to be added to podBackoffQ.", expectedPods[i].Name)
                }
        }
+       q.lock.RUnlock()
 }

 func TestPriorityQueue_Pop(t *testing.T) {

The test still fails with bazel

Development

Successfully merging this pull request may close these issues.

Add QueueSort extension point for the scheduling framework
8 participants