
Does not consider daemonset resources for node pool if the daemonset matches the nodepool using a node affinity #1337

Open
myaser opened this issue Jun 20, 2024 · 3 comments · May be fixed by #1381
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@myaser

myaser commented Jun 20, 2024

Description

Observed Behavior:
I created a daemonset with the following nodeAffinity:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: zalando.org/nvidia-gpu
                operator: Exists
            - matchExpressions:
              - key: karpenter.k8s.aws/instance-gpu-manufacturer
                operator: In
                values:
                - nvidia

Observing Karpenter's logs and experimenting with scheduling pods of different sizes, I found that Karpenter's daemonset resource calculation excludes this pod.

I was able to confirm this in the code.

Here, only the first of the nodeSelectorTerms is read, relying on an outer loop to remove the first term and continue with the next one.

But that outer loop does not run for the daemonset calculation, as shown here.

To validate my findings, I flipped the order of the terms, and the calculation was corrected:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: karpenter.k8s.aws/instance-gpu-manufacturer
                operator: In
                values:
                - nvidia
            - matchExpressions:
              - key: zalando.org/nvidia-gpu
                operator: Exists
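In Kubernetes, the nodeSelectorTerms under requiredDuringSchedulingIgnoredDuringExecution are ORed: a pod fits a node if any single term matches (and all matchExpressions within that term match). A minimal sketch of that semantics (hypothetical helper names, simplified stand-ins for the k8s.io/api/core/v1 types, not Karpenter's actual code) shows why checking only the first term misses this daemonset:

```go
package main

import "fmt"

// matchExpr is a simplified stand-in for corev1.NodeSelectorRequirement.
type matchExpr struct {
	key      string
	operator string // "Exists" or "In"
	values   []string
}

// term is one nodeSelectorTerm: an AND of its matchExpressions.
type term []matchExpr

// termMatches reports whether every expression in a single term matches the labels.
func termMatches(t term, labels map[string]string) bool {
	for _, e := range t {
		v, ok := labels[e.key]
		switch e.operator {
		case "Exists":
			if !ok {
				return false
			}
		case "In":
			found := false
			for _, want := range e.values {
				if ok && v == want {
					found = true
					break
				}
			}
			if !found {
				return false
			}
		}
	}
	return true
}

// anyTermMatches implements the Kubernetes semantics: nodeSelectorTerms are
// ORed, so the pod fits if ANY term matches.
func anyTermMatches(terms []term, labels map[string]string) bool {
	for _, t := range terms {
		if termMatches(t, labels) {
			return true
		}
	}
	return false
}

func main() {
	// Labels a GPU node provisioned by the reported NodePool would carry.
	nodeLabels := map[string]string{
		"karpenter.k8s.aws/instance-gpu-manufacturer": "nvidia",
	}
	terms := []term{
		{{key: "zalando.org/nvidia-gpu", operator: "Exists"}},
		{{key: "karpenter.k8s.aws/instance-gpu-manufacturer", operator: "In", values: []string{"nvidia"}}},
	}

	// Correct OR semantics: the second term matches, so the daemonset
	// belongs on the node and its resources should be counted.
	fmt.Println(anyTermMatches(terms, nodeLabels)) // true

	// Checking only the first term (the reported bug, in effect) misses it,
	// so the daemonset's resources are excluded from the overhead.
	fmt.Println(termMatches(terms[0], nodeLabels)) // false
}
```

Flipping the term order, as above, works around the bug only because it moves the matching term into the single position that is checked.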

Expected Behavior:
All nodeSelectorTerms should be considered when calculating daemonset resources.

Reproduction Steps (Please include YAML):

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: datalab-workloads
spec:
  disruption:
    budgets:
    - nodes: 10%
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never
  template:
    metadata:
    spec:
      kubelet:
        clusterDNS:
        - 10.0.1.100
        cpuCFSQuota: false
        kubeReserved:
          cpu: 100m
          memory: 282Mi
        maxPods: 32
        systemReserved:
          cpu: 100m
          memory: 164Mi
      nodeClassRef:
        name: datalab-workloads
      requirements:
      - key: node.kubernetes.io/instance-type
        operator: In
        values:
        - g4dn.xlarge
        - g4dn.4xlarge
        - g4dn.12xlarge
        - g4dn.16xlarge
        - g4dn.metal
        - g5.xlarge
        - g5.4xlarge
        - g5.16xlarge
        - g5.24xlarge
        - g5.48xlarge
        - g3s.xlarge
        - g4dn.2xlarge
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - spot
        - on-demand
      - key: kubernetes.io/arch
        operator: In
        values:
        - arm64
        - amd64
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - eu-central-1a
        - eu-central-1b
        - eu-central-1c
      startupTaints:
      - effect: NoSchedule
        key: zalando.org/node-not-ready
      taints:
      - effect: NoSchedule
        key: dedicated
        value: datalab-workloads

Versions:

  • karpenter Version: 0.36.2
  • Kubernetes Version (kubectl version): v1.30.2
@myaser myaser added the kind/bug Categorizes issue or PR as related to a bug. label Jun 20, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jun 20, 2024
@jigisha620
Contributor

Hi @myaser, I was able to track this down. It definitely is a bug since we are not considering all affinities for calculating daemonset resources. Do you have a workaround for your use-case that can get you unblocked on this?

@myaser
Author

myaser commented Jun 26, 2024

Hi @myaser, I was able to track this down. It definitely is a bug since we are not considering all affinities for calculating daemonset resources. Do you have a workaround for your use-case that can get you unblocked on this?

For my specific case, I just flipped the order of the affinities, since the other term is meant for Cluster Autoscaler.

We are in the middle of a migration, so we run both CA (legacy) and Karpenter together.

@jonathan-innis
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 1, 2024