
K8s EKS Scheduler possibly not respecting node score for nodeAffinity & nodeAnti-Affinity #88112

Open
martinwoods opened this issue Feb 13, 2020 · 5 comments


What happened:
The on-demand nodes have the following label and taint:
Labels: lifecycle=on-demand
Taints: on-demand=true:PreferNoSchedule
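
For reference, a minimal sketch of how a node could be given this label and taint with kubectl (the node name below is a placeholder, not from this cluster):

# <on-demand-node-name> is a placeholder for the actual node name
kubectl label node <on-demand-node-name> lifecycle=on-demand
kubectl taint node <on-demand-node-name> on-demand=true:PreferNoSchedule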

We run a number of stateful apps that we'd PREFER to run on the on-demand nodes. An example stateful app is Elasticsearch, where the affinity is set as follows:

client:
  tolerations:
    - key: "on-demand"
      operator: "Equal"
      value: "true"
      effect: "PreferNoSchedule"
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: lifecycle
          operator: In
          values:
          - on-demand
        - key: lifecycle
          operator: NotIn
          values:
          - ec2spot 

We are running 6 on-demand and 6 spot nodes in the EKS cluster.

If we reduce the number of on-demand nodes from 6 to 3, we're left with 3 on-demand and 6 spot nodes, which is what we want.

But the problem is that some of the stateful pods end up on spot nodes, which is NOT what we want.

What you expected to happen:
The scheduler should see that the node score is higher for the on-demand nodes and bind the stateful pods to those nodes, based on the affinity below from the pod's yaml:

kubectl get pod ???-???-es-elasticsearch-client-??????-????? -o yaml | grep -iA12 'affinity'
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: lifecycle
            operator: In
            values:
            - on-demand
          - key: lifecycle
            operator: NotIn
            values:
            - ec2spot
        weight: 100
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              app: elasticsearch
              component: client
              release: ???-???-es
          topologyKey: kubernetes.io/hostname
        weight: 1

How to reproduce it (as minimally and precisely as possible):
  • Set the taints in the Terraform config for the infrastructure deployment
  • Configure the yaml in the Helm deployment with the affinities & tolerations
  • Use the AWS console to add/remove spot and on-demand nodes via ASGs (see the verification sketch below)
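
After scaling the ASGs, the labels and taints actually present on the nodes can be checked with something like the following (a hedged sketch using the label and taint keys from above):

# list nodes with the lifecycle label as an extra column
kubectl get nodes -L lifecycle
# show the taints on each node
kubectl describe nodes | grep -iA1 'taints'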

Anything else we need to know?:

Environment: AWS EKS

  • Kubernetes version (use kubectl version): 1.14 (the latest version supported by EKS)
  • Cloud provider or hardware configuration: AWS
@athenabot athenabot commented Feb 13, 2020

/sig node
/sig cloud-provider

These SIGs are my best guesses for this issue. Please comment /remove-sig <name> if I am incorrect about one.

🤖 I am a bot run by vllry. 👩‍🔬

@martinwoods martinwoods commented Feb 13, 2020

/sig scheduling

@martinwoods changed the title from "K8s EKS Scheduler possibly not respecting node score for nodeAffinity & nodeAffinity" to "K8s EKS Scheduler possibly not respecting node score for nodeAffinity & nodeAnti-Affinity" on Feb 13, 2020
@martinwoods martinwoods commented Feb 14, 2020

I attempted to create a PriorityClass to see if it would help the scheduler treat the stateful pods differently.

cat priorityclass.yaml

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-stateful
value: 1000001
globalDefault: false
description: "Use this class for stateful service pods only mongodb|elasticsearch|redis."

apply it:

k apply -f priorityclass.yaml
priorityclass.scheduling.k8s.io/high-priority-stateful created

Then I picked the Elasticsearch data pods to test this PriorityClass by setting data.priorityClassName in the helm chart values yaml:

data:
  priorityClassName: high-priority-stateful

When I run:

kubectl get pods -n spt-dev -o custom-columns=NAME:.metadata.name,PRIORITY:.spec.priorityClassName | grep 'high-priority-stateful'

NAME                                                            PRIORITY
???-???-es-elasticsearch-data-0                                 high-priority-stateful

But once again, when I test my scenario, I still end up with stateful pods bound to spot nodes:

for n in $(k get no -l lifecycle=ec2spot --no-headers | cut -d " " -f1) ; do kubectl get pods --all-namespaces --no-headers --field-selector spec.nodeName=${n}; done | egrep 'mongo|elasticsearch|redis'

???-???          ???-???-mongodb-backup-mgob-0                       1/1   Running   0     60m
???-???          ???-???-es-elasticsearch-master-1                   1/1   Running   0     59m
???-???          ???-???-redis-slave-1                               2/2   Running   0     55m
@ahg-g ahg-g commented Feb 14, 2020

Copying the response I posted on Slack:

The default scheduler is configured to prefer nodes with lower utilization, which I think conflicts with the preferred affinity you are setting. You can fix this by configuring the scheduler to give a higher weight to the pod/node affinity priorities.

In general, I think that all priorities that act on pod-specific configuration (affinities and the new spread) should by default be configured with a higher weight than the rest.
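
For illustration only, here is a minimal sketch of what such a weight change could look like as a scheduler policy wrapped in a ConfigMap. The ConfigMap name, namespace, and weights are assumptions, not something prescribed in this thread, and on EKS the managed control plane's kube-scheduler cannot be reconfigured, so this would only apply to a self-managed or secondary scheduler started with --policy-configmap (or the equivalent JSON given to --policy-config-file):

# Sketch only: ConfigMap name, namespace, and weights are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-policy
  namespace: kube-system
data:
  policy.cfg: |
    {
      "kind": "Policy",
      "apiVersion": "v1",
      "priorities": [
        {"name": "NodeAffinityPriority",       "weight": 10},
        {"name": "InterPodAffinityPriority",   "weight": 10},
        {"name": "TaintTolerationPriority",    "weight": 10},
        {"name": "LeastRequestedPriority",     "weight": 1},
        {"name": "BalancedResourceAllocation", "weight": 1},
        {"name": "SelectorSpreadPriority",     "weight": 1}
      ]
    }

The priority names are from the v1.14-era scheduler Policy API; my understanding is that omitting the predicates section makes the scheduler fall back to its default predicates, but that is worth verifying against the version in use.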

@hprateek43 hprateek43 commented Feb 17, 2020

@ahg-g Do you recommend modifying the default scheduler with a configmap, or adding a new scheduler with the desired weights and pointing the deployments at it? Does changing the default weights affect the system pods in a negative way?
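
For context, if the second-scheduler route were taken, workloads are pointed at it via spec.schedulerName in the pod template. A minimal hedged sketch (the scheduler name, pod name, and image are placeholders, not from this thread):

apiVersion: v1
kind: Pod
metadata:
  name: es-data-example                          # placeholder pod name
spec:
  schedulerName: affinity-weighted-scheduler     # hypothetical second scheduler
  priorityClassName: high-priority-stateful      # the class created earlier in this thread
  containers:
  - name: elasticsearch
    image: <elasticsearch-image>                 # placeholder image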
