
Enable adaptive scoring in ImageLocalityPriority #65745

Merged
merged 3 commits into kubernetes:master from silveryfu:image-locality-scoring on Jul 13, 2018

Conversation

@silveryfu
Contributor

silveryfu commented Jul 3, 2018

What this PR does / why we need it:

This PR replaces the original, purely image-size-based scoring with an adaptive scoring scheme. The new scheme considers not only an image's size but also its "spread", which is defined as follows:

Given an image i, spread_i = num_node_has_i / total_num_nodes

Each image then receives the score score_i = size_i * spread_i, as proposed by @resouer. The final node score is the sum of the image scores over all images in the pod spec that are present on the node.

The goal of this heuristic is to better balance image locality against other scheduling policies. In particular, it aims to mitigate and prevent the undesirable "node heating problem", i.e., pods being assigned to the same node or a small set of nodes because of preferred image locality. Given an image, the larger its spread, the more weight image locality can safely receive, since more nodes can be expected to already have the image.
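For illustration only, here is a minimal Go sketch of the scheme described above. The helper names and parameters (scaledImageScore, nodeImageScore, the plain maps) are invented for this sketch and are not the PR's actual identifiers:

```go
// scaledImageScore weighs an image's size by its spread across the cluster.
// A widely distributed image contributes close to its full size, while a rare
// image contributes much less, which discourages piling pods onto the few
// nodes that happen to have it.
func scaledImageScore(sizeBytes int64, numNodesWithImage, totalNumNodes int) int64 {
	if totalNumNodes == 0 {
		return 0
	}
	spread := float64(numNodesWithImage) / float64(totalNumNodes)
	return int64(float64(sizeBytes) * spread)
}

// nodeImageScore sums the scaled scores of the pod's images that are already
// present on the node (absent images contribute nothing).
func nodeImageScore(imageSizesOnNode map[string]int64, imageNodeCounts map[string]int,
	podImages []string, totalNumNodes int) int64 {
	var sum int64
	for _, image := range podImages {
		if size, ok := imageSizesOnNode[image]; ok {
			sum += scaledImageScore(size, imageNodeCounts[image], totalNumNodes)
		}
	}
	return sum
}
```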

The new image state information in scheduler cache, enabled in this PR, allows other potential heuristics to be explored.

Special notes for your reviewer:

@resouer

Additional unit tests are WIP.

Release note:

NONE

@k8s-ci-robot k8s-ci-robot requested review from bsalamat and timothysc Jul 3, 2018

@silveryfu silveryfu changed the title Enable adaptive scoring for ImageLocalityPriority Enable adaptive scoring in ImageLocalityPriority Jul 3, 2018

@silveryfu

Contributor Author

silveryfu commented Jul 3, 2018

Looking forward to more discussions over this issue - balancing data locality and scheduling quality has always been a central (and fun) topic 🔢

@idealhack

Member

idealhack commented Jul 3, 2018

/ok-to-test
/assign @k82cn @resouer

@silveryfu silveryfu force-pushed the silveryfu:image-locality-scoring branch 4 times, most recently from b4679d5 to cad55e9 Jul 3, 2018

@bsalamat
Contributor

bsalamat left a comment

Thanks, @silveryfu. I have some early comments.

// If existing images' total size is larger than max, just make it highest priority.
return schedulerapi.MaxPriority
}

- return int((int64(schedulerapi.MaxPriority) * (sumSize - minImgSize) / (maxImgSize - minImgSize)) + 1)
+ return int((int64(schedulerapi.MaxPriority) * (sumScores - minThreshold) / (maxThreshold - minThreshold)) + 1)


@bsalamat

bsalamat Jul 3, 2018

Contributor

Why do we add 1 at the end? It doesn't seem to be necessary. If we don't add 1, we can simplify the body of the function. We can remove the switch and change the above conditions to:

if sumScores < minThreshold {
   sumScores = minThreshold
}
if sumScores > maxThreshold {
   sumScores = maxThreshold
}


@silveryfu

silveryfu Jul 3, 2018

Author Contributor

Thank you Bobby for the comments - I agree with you on this change.
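For readers following along, a minimal sketch of what the simplified function could look like with the clamping suggested above. The 23MB/1000MB bounds mirror the thresholds that appear later in this PR; the exact constant and function names here are assumptions:

```go
const (
	mb           int64 = 1024 * 1024
	minThreshold       = 23 * mb
	maxThreshold       = 1000 * mb
	maxPriority        = 10 // stands in for schedulerapi.MaxPriority
)

// calculateScore clamps sumScores into [minThreshold, maxThreshold] and maps it
// linearly onto 0..maxPriority, with no trailing "+ 1" and no switch statement.
func calculateScore(sumScores int64) int {
	if sumScores < minThreshold {
		sumScores = minThreshold
	}
	if sumScores > maxThreshold {
		sumScores = maxThreshold
	}
	return int(maxPriority * (sumScores - minThreshold) / (maxThreshold - minThreshold))
}
```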

// Usage counter
Usage int
// Used to track the total number of nodes
TotalNumNodes int


@bsalamat

bsalamat Jul 3, 2018

Contributor

Why do you need to keep this information for every image? It is available from the cache and is the same for all images in a cluster.


@silveryfu

silveryfu Jul 3, 2018

Author Contributor

For clarity, Usage or NumNodes (the suggested name in the next comment) varies from image to image. The TotalNumNodes does not, but:

"It is available from the cache and is the same for all images in a cluster."

The priority function needs a way to get the TotalNumNodes information, and the nodeInfo struct passed to it needs to carry/piggyback this information - it would be great to figure out a better alternative for doing so. (Transient information seems to serve other purposes.)

Just as a side note, I find the key to this updated scoring function is that it considers not only node local features (nodeInfo) but also cluster-level features (i.e., NumNodes and TotalNumNodes).


@bsalamat

bsalamat Jul 3, 2018

Contributor

I think the best option is to add the total number of nodes to meta information of the priority functions. Then it will be available.


@silveryfu

silveryfu Jul 4, 2018

Author Contributor

Got it! I will move TotalNumNodes to meta info - Thanks :)

// Size of the image
Size int64
// Usage counter
Usage int


@bsalamat

bsalamat Jul 3, 2018

Contributor

Usage is not a good name. IIUC, this is the number of nodes with the image. In that case, something similar to NumNodes or NumInstalledNodes would be better.


@silveryfu

silveryfu Jul 3, 2018

Author Contributor

Got it - I agree, let's rename it to NumNodes and TotalNumNodes.
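With that renaming, the per-node summary could look roughly like this (a sketch only; per the discussion above, TotalNumNodes moves to the priority metadata instead of living on the summary):

```go
// ImageStateSummary is the compact, per-node view of an image's state that the
// priority function needs; it is attached to the NodeInfo of every node that
// has the image.
type ImageStateSummary struct {
	// Size of the image in bytes
	Size int64
	// NumNodes is the number of nodes in the cluster that have this image
	NumNodes int
}
```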

- nodeNameToInfo[name] = info.Clone()
+ clone := info.Clone()
+ // Add image state information to the clone for all images in the node
+ for image := range info.imageStates {


@bsalamat

bsalamat Jul 3, 2018

Contributor

Please note that the outer loop iterates over all the nodes in the cluster. Why do you need to do this for each node?
And, this happens at the beginning of every schedule cycle. Given that the information is needed only to run a scoring function, it should be fine if it is not precise. So, using live data from the cache should be fine and we don't need to clone the information at the beginning of every scheduling cycle.


@silveryfu

silveryfu Jul 3, 2018

Author Contributor

I agree any potential overhead added to the main scheduling loop should be reasoned about carefully - I had it in mind :)

  1. The part I am concerned about (this may be unnecessary) is concurrent access. If we do not do a "clone" here for the image, then reading the imageStates needs to acquire the mutex of the scheduler cache. Currently the mutex is acquired only here:

https://github.com/silveryfu/kubernetes/blob/cad55e9f27771a64895a7e95b77ea7651c546925/pkg/scheduler/core/generic_scheduler.go#L124

  2. Here we are not cloning but instead making an ImageStateSummary for each image state (for each image, this generation happens only once, with imageStateSummaries := make(map[string]*ImageStateSummary)). The reason is that the original imageState struct can contain information we do not need, and what we need is aggregated information - aggregation is relatively cheap here (just len()s).

  3. I agree approximate results may be good enough (for clarification, this would allow the imageStates data aggregation to happen outside the scheduling loop).

The key difference seems to be "lazy aggregation" vs. "active aggregation". Lazy aggregation postpones the aggregation until the scheduling loop (which is the most accurate); active aggregation is less accurate (some nodeInfo will not have the most up-to-date NumNodes and TotalNumNodes when they are cloned). Active aggregation should add little overhead to the scheduling loop, as all the imageStateSummaries will be computed beforehand.


@resouer

resouer Jul 4, 2018

Member

In one of my recent PRs, I changed the lock in the scheduler cache to an RWMutex, which should improve the multi-reader case if that concerns you. That said, the point that priority scoring only needs to be "best effort" really makes sense here :D


@silveryfu

silveryfu Jul 6, 2018

Author Contributor

:) That's really nice @resouer ! Though multi-reader access does not concern me - it is the writers to the cache that should never block the readers, and that is not possible w/o cloning.

@resouer @bsalamat Based on the discussion above, I will revise it to do "active aggregation" - that is, creating imageStateSummaries each time we SetNode() and passing its reference via NodeInfo (and TotalNumNodes through meta).

This will avoid additional cloning in the scheduling cycle (at the cost of some accuracy, depending on how often the nodes refresh themselves - btw, how long is that by default?).

I will do the revision this weekend. Thank you for the comments!


@resouer

resouer Jul 7, 2018

Member

Node heartbeat interval is 10s, and will be refactored to be on-demand in the future. Should be OK.

@@ -406,7 +451,11 @@ func (cache *schedulerCache) UpdateNode(oldNode, newNode *v1.Node) error {
if !ok {
n = NewNodeInfo()
cache.nodes[newNode.Name] = n
} else {
cache.removeNodeImageUsage(n.node)


@bsalamat

bsalamat Jul 3, 2018

Contributor

Don't you need to remove and re-add images when an existing node is updated?


@silveryfu

silveryfu Jul 3, 2018

Author Contributor

I agree - this is what lines #454-#455 do: "remove and re-add images when an existing node is updated."

The original UpdateNode function seems to have a gratuitous argument: "oldNode" - shall we remove it?

@silveryfu

Contributor Author

silveryfu commented Jul 3, 2018

The main design choices:

  1. The new scoring function needs the following information: (1) node-level information about the image (i.e., whether the image is on the node); (2) cluster-level information about the image (e.g., how many nodes have this image and how many nodes there are in total); (3) the image size. Among these, (2) is the tricky one, because we need to track the image states in schedulercache, piggyback the information on the nodeInfo, and pass it to the priority function. To this end, this PR adds cache.imageStates and replaces nodeinfo.imageSizes with nodeinfo.imageStates in schedulercache. The cache.imageStates keeps track, for each image, of which nodes have it, while the nodeinfo.imageStates keeps only the "aggregated" data about an image if and only if this image exists on the node (a sketch of these two structures follows this list).

  2. Cache.imageStates is updated accordingly upon node updates.

  3. Nodeinfo.imageStates can now be used to check whether an image exists on a node and to fetch its state information. The state information is not "materialized" until the scheduling loop, for accuracy (as Bobby points out, we could use approximate results here instead, e.g., for the TotalNumNodes).

  4. In the scheduling loop, for each imageState entry in cache.imageStates, an ImageStateSummary will be created (at UpdateNodeNameToInfoMap()) by aggregating the information in imageState (e.g., sets{nodes} => len(sets{nodes}) for tracking how many nodes have this image). Each nodeInfo will complete the entries in its nodeInfo.imageStates for each image it has. Note that it is not feasible to pass references under concurrent access.

  5. The ImageLocalityPriority can now look at the nodeInfo.imageStates when scoring.
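To make choice 1 concrete, here is a rough sketch of the cluster-wide entry and its aggregation into the per-node summary (field and helper names are illustrative; ImageStateSummary refers to the struct sketched earlier in this thread):

```go
import "k8s.io/apimachinery/pkg/util/sets"

// imageState is the cluster-wide bookkeeping kept in the scheduler cache for
// every image reported by any node.
type imageState struct {
	// size of the image in bytes
	size int64
	// nodes is the set of names of nodes that currently have this image
	nodes sets.String
}

// summarize aggregates a cluster-wide imageState entry into the compact
// per-node form consumed by the priority function; the aggregation is cheap,
// essentially a len() on the node set.
func summarize(state *imageState) *ImageStateSummary {
	return &ImageStateSummary{
		Size:     state.size,
		NumNodes: len(state.nodes),
	}
}
```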

func calculateFinalScore(sumScores int64) int {
// Bound the score range by the minimum and maximum image sizes.
minThreshold := minImgSize
maxThreshold := maxImgSize


@resouer

resouer Jul 7, 2018

Member

You can rename these two globally.


@silveryfu

silveryfu Jul 9, 2018

Author Contributor

Done.


// in imageStates cache. After the removal, if any image becomes free, i.e., the image
// is no longer available on any node, the image entry will be removed from imageStates.
func (cache *schedulerCache) removeNodeImageUsage(node *v1.Node) {
if node == nil {


@resouer

resouer Jul 9, 2018

Member

please remember to update names of these two functions as well


@silveryfu

silveryfu Jul 9, 2018

Author Contributor

Done.

@silveryfu silveryfu force-pushed the silveryfu:image-locality-scoring branch 5 times, most recently from 8c327e9 to 78a4ca6 Jul 9, 2018

}
return nodeNameToInfo
}

func createNodeImageStatesMap(nodes []*v1.Node) map[string]map[string]*ImageStateSummary {


@resouer

resouer Jul 10, 2018

Member

Mind adding a brief comment to these two helpers as well?


@silveryfu

silveryfu Jul 10, 2018

Author Contributor

Done and I changed the other function's name to createImageExistenceMap.

Thanks for the reviews!

@silveryfu silveryfu force-pushed the silveryfu:image-locality-scoring branch from 78a4ca6 to f74d04f Jul 10, 2018


// Node2
// Image: gcr.io/250:latest 250MB
// Score: (250M-23M)/97.7M + 1 = 3
// Score: (250M/2-23M)/97.7M = 1


@bsalamat

bsalamat Jul 10, 2018

Contributor

I don't understand the numbers in this comment. Shouldn't this be (10 * (250/2 - 23)/(1000 - 23))?


@silveryfu

silveryfu Jul 11, 2018

Author Contributor

Yes, I will replace all equations in comments with the full form (here simplified by canceling out 10 with 977). Thanks for pointing this out!
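For reference, the full form for the Node2 example above works out as follows (assuming minThreshold = 23MB, maxThreshold = 1000MB, a maximum priority of 10, and the image present on half of the nodes, i.e. spread = 1/2):

score = int(10 * (250/2 - 23) / (1000 - 23)) = int(10 * 102 / 977) = int(1.04) = 1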

// Image: gcr.io/40:latest 40MB, gcr.io/140:latest 140MB
// Score: (40M+140M-23M)/97.7M + 1 = 2
// Image: gcr.io/40:latest 40MB, gcr.io/300:latest 300MB
// Score: ((40M+300M)/2-23M)/97.7M = 1


@bsalamat

bsalamat Jul 10, 2018

Contributor

The same question about this comment.


@silveryfu

silveryfu Jul 11, 2018

Author Contributor

Fixed.

- nodeNameToInfo[node.Name].SetNode(node)
+ nodeInfo := nodeNameToInfo[node.Name]
+ nodeInfo.imageStates = imageStatesMap[node.Name]
+ nodeInfo.SetNode(node)


@bsalamat

bsalamat Jul 10, 2018

Contributor

Please move line 43 before line 42 to ensure that node is set first after creation of NodeInfo.


@silveryfu

silveryfu Jul 11, 2018

Author Contributor

Got it, will fix.


@silveryfu

silveryfu Jul 11, 2018

Author Contributor

Just in case I am missing an important point - I understand the rationale here to be "it makes more sense to set the node and then add imageStates", but correctness-wise shouldn't they be equivalent?


@bsalamat

bsalamat Jul 11, 2018

Contributor

Yes, they are equivalent in the current implementation.

for _, image := range node.Status.Images {
for _, name := range image.Names {
if _, ok := imageExistenceMap[name]; !ok {
imageExistenceMap[name] = sets.NewString(name)


@bsalamat

bsalamat Jul 10, 2018

Contributor

Shouldn't this be sets.NewString(node.name)?


@silveryfu

silveryfu Jul 11, 2018

Author Contributor

Yes - thank you for catching the bug - will be extra careful with functions outside direct unit test coverage.
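For reference, a sketch of the helper with the fix applied - close to what this thread converges on, though not necessarily the exact merged code:

```go
import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/sets"
)

// createImageExistenceMap maps each image name to the set of node names that
// have the image, so that an image's "spread" can later be computed as
// len(nodeSet) / totalNumNodes.
func createImageExistenceMap(nodes []*v1.Node) map[string]sets.String {
	imageExistenceMap := make(map[string]sets.String)
	for _, node := range nodes {
		for _, image := range node.Status.Images {
			for _, name := range image.Names {
				if _, ok := imageExistenceMap[name]; !ok {
					// Fix: record the node's name, not the image name.
					imageExistenceMap[name] = sets.NewString(node.Name)
				} else {
					imageExistenceMap[name].Insert(node.Name)
				}
			}
		}
	}
	return imageExistenceMap
}
```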

@@ -29,11 +32,55 @@ func CreateNodeNameToInfoMap(pods []*v1.Pod, nodes []*v1.Node) map[string]*NodeI
}
nodeNameToInfo[nodeName].AddPod(pod)
}
imageStatesMap := createNodeImageStatesMap(nodes)


@bsalamat

bsalamat Jul 10, 2018

Contributor

We are allocating a lot of memory to store image information for all the nodes upfront and then use that information inside the following loop and then discard all the image info. I guess, you could replace createNodeImageStatesMap with a function that receives a node and imageExistenceMap and returns map[string]*ImageStateSummary for the given node. Then, you could replace this code with the following:

imageExistenceMap := createImageExistenceMap(nodes)
for _, node := range nodes {
...
    nodeInfo := nodeNameToInfo[node.Name]
    nodeInfo.SetNode(node)
    nodeInfo.imageStates = createNodeImageStatesMap(node, imageExistenceMap)
}


@silveryfu

silveryfu Jul 11, 2018

Author Contributor

I agree! Thanks Bobby, I learned better taste again :)

(Also, I named the new function getXXX instead of createXXX because here the caller does not care whether it creates or gets.)
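A sketch of the reshaped helper along these lines, using the getNodeImageStates name adopted later in this thread (same imports and ImageStateSummary type as in the sketches above; treat the body as an approximation of the merged code):

```go
// getNodeImageStates builds the per-node map of image state summaries for a
// single node, using the cluster-wide imageExistenceMap to fill in how many
// nodes have each image.
func getNodeImageStates(node *v1.Node, imageExistenceMap map[string]sets.String) map[string]*ImageStateSummary {
	imageStates := make(map[string]*ImageStateSummary)
	for _, image := range node.Status.Images {
		for _, name := range image.Names {
			imageStates[name] = &ImageStateSummary{
				Size:     image.SizeBytes,
				NumNodes: len(imageExistenceMap[name]),
			}
		}
	}
	return imageStates
}
```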

@silveryfu silveryfu force-pushed the silveryfu:image-locality-scoring branch from f74d04f to 5f724cf Jul 11, 2018

@silveryfu silveryfu force-pushed the silveryfu:image-locality-scoring branch from 73b4442 to 2003a0d Jul 12, 2018

@bsalamat
Contributor

bsalamat left a comment

/lgtm

Thanks a lot, @silveryfu!

}
return nodeNameToInfo
}

// getNodeImageStates returns the given node's image states based on the given imageExistence map.
func getNodeImageStates(node *v1.Node, imageExistenceMap map[string]sets.String) map[string]*ImageStateSummary {


@bsalamat

bsalamat Jul 12, 2018

Contributor

We should do our best to cover all of our code by unit and integration tests. I think we didn't have enough coverage in our existing tests for these functions. As you know, there was a bug in the next function.

@resouer

Member

resouer commented Jul 13, 2018

/lgtm

Thanks for taking this up!

@k8s-ci-robot

Contributor

k8s-ci-robot commented Jul 13, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, resouer, silveryfu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

- sumSize := totalImageSize(nodeInfo, pod.Spec.Containers)
+ var score int
+ if priorityMeta, ok := meta.(*priorityMetadata); ok {
+ score = calculatePriority(sumImageScores(nodeInfo, pod.Spec.Containers, priorityMeta.totalNumNodes))


@ravisantoshgudimetla

ravisantoshgudimetla Jul 13, 2018

Contributor

I know that this predates your changes but it'd be better to include initContainers.


@bsalamat

bsalamat Jul 13, 2018

Contributor

@silveryfu and @resouer, could you guys work on a follow-up PR to address this?


@resouer

resouer Jul 13, 2018

Member

Thanks! Ref this comment: #65745 (comment). There will be a follow-up PR for initContainers specifically.


@ravisantoshgudimetla

ravisantoshgudimetla Jul 13, 2018

Contributor

Ohh sorry, did not notice it. Thanks @resouer
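For reference, a hedged sketch of what such a follow-up could look like. sumImageScores and its signature are taken from the snippet above; sumAllImageScores is an invented wrapper name, and the actual follow-up PR may be shaped differently:

```go
// sumAllImageScores scores init container images the same way as regular
// container images, so a pod's init-only images also count toward locality.
func sumAllImageScores(nodeInfo *schedulercache.NodeInfo, pod *v1.Pod, totalNumNodes int) int64 {
	sum := sumImageScores(nodeInfo, pod.Spec.Containers, totalNumNodes)
	sum += sumImageScores(nodeInfo, pod.Spec.InitContainers, totalNumNodes)
	return sum
}
```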

@k8s-github-robot

Contributor

k8s-github-robot commented Jul 13, 2018

Automatic merge from submit-queue (batch tested with PRs 66011, 66111, 66106, 66039, 65745). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit b883f4c into kubernetes:master Jul 13, 2018

17 checks passed

Submit Queue: Queued to run github e2e tests a second time.
cla/linuxfoundation: silveryfu authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-cross: Skipped
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gke: Skipped
pull-kubernetes-e2e-kops-aws: Job succeeded.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-local-e2e: Skipped
pull-kubernetes-local-e2e-containerized: Skipped
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
@@ -113,6 +140,7 @@ func (cache *schedulerCache) Snapshot() *Snapshot {
func (cache *schedulerCache) UpdateNodeNameToInfoMap(nodeNameToInfo map[string]*NodeInfo) error {
cache.mu.Lock()
defer cache.mu.Unlock()


@ravisantoshgudimetla

ravisantoshgudimetla Jul 13, 2018

Contributor

Please remove this blank line

return nil
}

// addNodeImageStates adds states of the images on given node to the given nodeInfo and update the imageStates in
// scheduler cache. This function assumes the lock to scheduler cache has been acquired.
func (cache *schedulerCache) addNodeImageStates(node *v1.Node, nodeInfo *NodeInfo) {


@ravisantoshgudimetla

ravisantoshgudimetla Jul 13, 2018

Contributor

I think you could just pass node.Status.Images here.

}

cache.addNodeImageStates(node, n)


@ravisantoshgudimetla

ravisantoshgudimetla Jul 13, 2018

Contributor

I think SetNode could return an error rendering this cache addition invalid.

@@ -579,7 +569,6 @@ func (n *NodeInfo) SetNode(node *v1.Node) error {
}
}
n.TransientInfo = newTransientSchedulerInfo()
n.updateImageSizes()


@ravisantoshgudimetla

ravisantoshgudimetla Jul 13, 2018

Contributor

Why can't we set the imageStates here?


@silveryfu

silveryfu Jul 27, 2018

Author Contributor

Good point - the main reason is that when we add a new node, we need to update or create both the cache.imageStates and the nodeinfo's image state summaries; if we set the imageStates in node_info separately, then we would need to traverse the new node's images twice - once for the cache and once for the nodeinfo.

Thank you for the comments, I will address the above in follow-up PRs.
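For readers following along, a rough sketch of the single traversal described above, which updates the cluster-wide cache and builds the node-local summaries in one pass (an approximation of the merged code, with the same in-package types as the sketches above):

```go
// addNodeImageStates walks the node's images once, updating cache.imageStates
// and building the node-local ImageStateSummary map in the same pass.
func (cache *schedulerCache) addNodeImageStates(node *v1.Node, nodeInfo *NodeInfo) {
	newSum := make(map[string]*ImageStateSummary)
	for _, image := range node.Status.Images {
		for _, name := range image.Names {
			// Update or create the cluster-wide entry for this image.
			state, ok := cache.imageStates[name]
			if !ok {
				state = &imageState{
					size:  image.SizeBytes,
					nodes: sets.NewString(node.Name),
				}
				cache.imageStates[name] = state
			} else {
				state.nodes.Insert(node.Name)
			}
			// Build the node-local summary from the same entry.
			if _, ok := newSum[name]; !ok {
				newSum[name] = &ImageStateSummary{
					Size:     state.size,
					NumNodes: len(state.nodes),
				}
			}
		}
	}
	nodeInfo.imageStates = newSum
}
```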

@ravisantoshgudimetla
Contributor

ravisantoshgudimetla left a comment

Apologies for my late comments. The approach and logic LGTM. I just have a few comments that could be addressed in a follow-on PR.

k8s-github-robot pushed a commit that referenced this pull request Jul 16, 2018

Kubernetes Submit Queue
Merge pull request #66224 from nikhita/fix-scheduler-panic
Automatic merge from submit-queue (batch tested with PRs 66203, 66224). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

scheduler: fix panic while removing node from imageStates cache

Currently, when I run `hack/local-up-cluster.sh`, the scheduler encounters a panic. From `/tmp/kube-scheduler.log `:

```
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x15e9988]

goroutine 55 [running]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x4242880, 0x877d870)
	/usr/local/go/src/runtime/panic.go:502 +0x229
k8s.io/kubernetes/pkg/scheduler/cache.(*schedulerCache).removeNodeImageStates(0xc4203dfe50, 0xc420ae3b80)
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/scheduler/cache/cache.go:510 +0xe8
k8s.io/kubernetes/pkg/scheduler/cache.(*schedulerCache).UpdateNode(0xc4203dfe50, 0xc420ae3b80, 0xc420415340, 0x0, 0x0)
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/scheduler/cache/cache.go:442 +0xcd
k8s.io/kubernetes/pkg/scheduler/factory.(*configFactory).updateNodeInCache(0xc420d2ca00, 0x4b680c0, 0xc420ae3b80, 0x4b680c0, 0xc420415340)
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/scheduler/factory/factory.go:794 +0x9a
k8s.io/kubernetes/pkg/scheduler/factory.(*configFactory).(k8s.io/kubernetes/pkg/scheduler/factory.updateNodeInCache)-fm(0x4b680c0, 0xc420ae3b80, 0x4b680c0, 0xc420415340)
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/scheduler/factory/factory.go:248 +0x52
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(0xc4209fc8f0, 0xc4209fc900, 0xc4209fc910, 0x4b680c0, 0xc420ae3b80, 0x4b680c0, 0xc420415340)
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/controller.go:202 +0x5d
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0x42cf8f, 0xc4215035a0, 0x0)
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:552 +0x18a
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0xc4214addf0, 0x42cad9, 0xc421598f30)
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:203 +0x9c
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:548 +0x81
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc421503768)
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc4214adf68, 0xdf8475800, 0x0, 0x40fdd01, 0xc4215d2360)
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc421503768, 0xdf8475800, 0xc4215d2360)
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*processorListener).run(0xc420187100)
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:546 +0x78
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*processorListener).(k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.run)-fm()
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:390 +0x2a
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc4209b4840, 0xc42025e370)
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/home/nraghunath/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62
```

#65745 was merged recently which introduced
https://github.com/kubernetes/kubernetes/blob/c861ceb41a239a0c3c0fe767f1f02a721e54ae6b/pkg/scheduler/cache/cache.go#L506-L510

If `!ok` i.e. `state` is nil, `state.nodes` ends up in a panic.

**Release note**:

```release-note
NONE
```
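The fix in #66224 essentially guards that map lookup, roughly along these lines (a sketch, not the exact diff):

```go
// removeNodeImageStates removes the node from every image's node set and drops
// images that are no longer present on any node. Guarding the map lookup avoids
// the nil dereference described above when the image entry is already gone.
func (cache *schedulerCache) removeNodeImageStates(node *v1.Node) {
	if node == nil {
		return
	}
	for _, image := range node.Status.Images {
		for _, name := range image.Names {
			state, ok := cache.imageStates[name]
			if ok {
				state.nodes.Delete(node.Name)
				if len(state.nodes) == 0 {
					// The image is no longer available on any node.
					delete(cache.imageStates, name)
				}
			}
		}
	}
}
```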
@ravisantoshgudimetla

Contributor

ravisantoshgudimetla commented Jul 16, 2018

Since we are touching the scheduler cache, it would be better:

  • To feature-gate this plugin until all the tests are in place to find issues like #66224 (even if this priority is not enabled by default).
  • To write more integration, unit, or e2e tests exercising this before removing the feature gate.
@bsalamat

Contributor

bsalamat commented Jul 16, 2018

Feature gates have the down side of bugs not getting discovered because features are never exercised, but I totally support the idea of adding more tests. #66224 shows an area where more tests are needed. We definitely need another test case to cover that scenario.

@ravisantoshgudimetla

Contributor

ravisantoshgudimetla commented Jul 16, 2018

Feature gates have the down side of bugs not getting discovered because features are never exercised

That is true, but I think in integration tests we can enable this feature gate explicitly and make it a blocking or presubmit test case. For example, the TestTaintNodeByCondition integration test is an example of such a test.

@bsalamat

Contributor

bsalamat commented Jul 16, 2018

That is true, but I think in integration tests we can enable this feature gate explicitly and make it a blocking or presubmit test case.

True, but if the integration test coverage for a feature were enough, it would already catch bugs in the feature. Enabling the feature gate in the integration tests helps discover bugs introduced in a feature by future developments, but is not great in finding existing bugs. When a feature is enabled at HEAD, it goes through a lot more testing, in various test environments and by various people who try the HEAD.
Given that the change to this priority function was not a major one, I am not so sure about feature-gating it.

@resouer

Member

resouer commented Jul 17, 2018

I agree that feature gate may not be perfect for this specific case, as the goal is to expose a change in HEAD to all test cases versus "I want to test this WIP/alpha change only here and do not want to bother all the others".

@ravisantoshgudimetla

Contributor

ravisantoshgudimetla commented Jul 17, 2018

enabling the feature gate in the integration tests helps discover bugs introduced in a feature by future developments, but is not great in finding existing bugs.

Agreed. My concern is more along the lines of keeping the master branch stable. Since we are breaking PRs into multiple PRs - one for modifying code and another for the corresponding tests (e2e or integration) - there is a chance that we miss some edge case in the first PR and cause instability in master by the time the test PR merges. That was the reason I was asking for a feature gate here.

I agree that feature gate may not be perfect for this specific case, as the goal is to expose a change in HEAD to all test cases versus "I want to test this WIP/alpha change only here and do not want to bother all the others".

Agreed - it's just that the line becomes important most of the time when we weigh stability of the master branch against soak time for a feature :(.

k8s-github-robot pushed a commit that referenced this pull request Sep 2, 2018

Kubernetes Submit Queue
Merge pull request #68081 from silveryfu/image-locality-tests-new
Automatic merge from submit-queue (batch tested with PRs 63437, 68081). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Enable ImageLocalityPriority by default with integration tests

**What this PR does / why we need it**:

This PR is a follow-up to [#63842](#63842). It moves the ImageLocalityPriority function to the default priority functions of the default algorithm provider and adds integration tests for the updated scheduling policy.

- Compared to [#64662](#64662), this PR does not provide an e2e test due to concerns that a large image may add too much overhead to the testing infrastructure and pipeline. We should add e2e tests in the future using large enough image(s) in follow-up PRs.

- Compared to [#64662](#64662), this PR simplifies the code changes and keeps code changes under test/integration/scheduler/.

- The PR contains a bug fix for [#65745](#65745) - caught by the integration test - where the image states are not properly cloned to the scheduler's cachedNodeInfoMap. We might split this fix into a separate PR.

The integration test covers the following: a pod requiring a large image (~= 3GB) is submitted to the cluster, and there is a single node in the cluster that has the same large image; the pod should get scheduled to that node. We might also consider whether more scenarios are desired.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

Kindly ping @resouer and @bsalamat 

**Release note**:

```release-note
None
```