Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

koord-scheduler: retrieve reservationInfo from cache first in preFilter hook #906

Conversation

KunWuLuan
Copy link
Member

@KunWuLuan KunWuLuan commented Dec 27, 2022

fix: prefilter hook get stale reservation from indexer

currently prefilter hook of resource reservation will list reservations from informer cache and construct reservationInfo to judge whether the pod match the reservation(https://github.com/koordinator-sh/koordinator/blob/main/pkg/scheduler/plugins/reservation/hook.go#L147):

		rOnNode, err := indexer.ByIndex(NodeNameIndex, node.Name)
		if err != nil {
			klog.V(3).InfoS("PreFilterHook failed to list reservations",
				"node", node.Name, "index", NodeNameIndex, "err", err)
			return
		}
		klog.V(6).InfoS("PreFilterHook indexer list reservation on node",
			"node", node.Name, "count", len(rOnNode))
		count := 0
		for _, obj := range rOnNode {
			r, ok := obj.(*schedulingv1alpha1.Reservation)
			if !ok {
				klog.V(5).Infof("unable to convert to *schedulingv1alpha1.Reservation, obj %T", obj)
				continue
			}
			// only count available reservations, ignore succeeded ones
			if !util.IsReservationAvailable(r) {
				continue
			}
			if matchReservation(pod, newReservationInfo(r)) {
				matchedCache.Add(r)
				count++
			} else {
				klog.V(6).InfoS("got reservation on node does not match the pod",
					"reservation", klog.KObj(r), "pod", klog.KObj(pod), "reason",
					dumpMatchReservationReason(pod, newReservationInfo(r)))
			}
		}

However, reservations in informer cache can be stale because in pervious cycles pods can have reserve the resouce of the reservation, which leads to a failed scheduling of the pod:

$ k get event | grep busybox-deployment-basic-58868685fc-vk92n
8m54s       Warning   FailedScheduling    pod/busybox-deployment-basic-58868685fc-vk92n    running Reserve plugin "Reservation": reservation is stale and does not match any more

How to reproduce:
A reservation with label selector ownership and a deployment:

apiVersion: scheduling.koordinator.sh/v1alpha1
kind: Reservation
metadata:
  name: reservation-demo1
spec:
  # allocateOnce: true
  template: # set resource requirements
    namespace: default
    spec:
      containers:
        - resources:
            requests:
              cpu: 100m
              memory: 100Mi
      schedulerName: koord-scheduler # use koord-scheduler
  owners: # set the owner specifications
    - labelSelector:
        matchLabels:
          app: test
  ttl: 1h # set the TTL, the reservation will get expired 1 hour later
---
apiVersion: apps/v1 # for versions before 1.8.0 use apps/v1beta1
kind: Deployment
metadata:
  name: busybox-deployment-basic
  labels:
    app: busybox
spec:
  replicas: 2
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      schedulerName: koord-scheduler
    #  nodeSelector:
    #    env: test-team
      containers:
      - name: busybox
        image: busybox
        command: ["sleep", "1d"]
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: "100m"
            memory: "100Mi"

@KunWuLuan KunWuLuan force-pushed the fix-rr-get-stale-version-from-indexer branch from af00a62 to 0699156 Compare December 27, 2022 13:01
@codecov
Copy link

codecov bot commented Dec 27, 2022

Codecov Report

Base: 66.63% // Head: 66.65% // Increases project coverage by +0.01% 🎉

Coverage data is based on head (afbf4b8) compared to base (0a02d28).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #906      +/-   ##
==========================================
+ Coverage   66.63%   66.65%   +0.01%     
==========================================
  Files         238      238              
  Lines       27675    27682       +7     
==========================================
+ Hits        18442    18451       +9     
- Misses       7965     7967       +2     
+ Partials     1268     1264       -4     
Flag Coverage Δ
unittests 66.65% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/scheduler/plugins/reservation/hook.go 79.16% <100.00%> (+0.63%) ⬆️
pkg/scheduler/plugins/reservation/plugin.go 80.83% <100.00%> (ø)
pkg/scheduler/plugins/reservation/rcache.go 89.16% <100.00%> (+0.09%) ⬆️
pkg/util/httputil/reverseproxy.go 84.84% <0.00%> (+0.53%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@eahydra eahydra changed the title fix: prefilter hook get stale reservation from indexer koord-scheduler: reservation prefilter hook get stale reservation from reservationCache first Dec 27, 2022
@eahydra
Copy link
Member

eahydra commented Dec 27, 2022

@saintube please review the PR

@eahydra
Copy link
Member

eahydra commented Dec 27, 2022

/milestone v1.2

@koordinator-bot koordinator-bot bot added this to the v1.2 milestone Dec 27, 2022
@eahydra eahydra changed the title koord-scheduler: reservation prefilter hook get stale reservation from reservationCache first koord-scheduler: retrieve reservationInfo from cache first in preFilter hook Dec 27, 2022
@KunWuLuan KunWuLuan force-pushed the fix-rr-get-stale-version-from-indexer branch 2 times, most recently from 1118d2a to 68a19cc Compare December 28, 2022 01:43
Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
@KunWuLuan KunWuLuan force-pushed the fix-rr-get-stale-version-from-indexer branch from 68a19cc to afbf4b8 Compare December 28, 2022 08:40
@saintube
Copy link
Member

/lgtm

Copy link
Member

@eahydra eahydra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@koordinator-bot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eahydra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@koordinator-bot koordinator-bot bot merged commit b0c95a0 into koordinator-sh:main Dec 29, 2022
FillZpp pushed a commit that referenced this pull request Jan 16, 2023
…er hook (#906)

Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
lucming pushed a commit to lucming/koordinator that referenced this pull request Feb 8, 2023
…er hook (koordinator-sh#906)

Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants