
Overlapping labels can lead to HPA matching incorrect pods #124307

Open
adrianmoisey opened this issue Apr 14, 2024 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling.

Comments

@adrianmoisey

adrianmoisey commented Apr 14, 2024

What happened?

When an HPA targets a Deployment whose label selector matches Pods that don't belong to it (because of overlapping labels, for example), those "other" Pods are considered by the HPA to be part of its scaling target.

What did you expect to happen?

I have always been led to believe that this behaviour is correct and that the user should use labels that are specific enough not to overlap with other Deployments.
This is stated here:

Do not overlap labels or selectors with other controllers (including other Deployments and StatefulSets). Kubernetes doesn't stop you from overlapping, and if multiple controllers have overlapping selectors those controllers might conflict and behave unexpectedly.

However, the VPA recently fixed the same behaviour in kubernetes/autoscaler#6460.

This makes me wonder whether I should expect the HPA to check the owner references too, and only target that Deployment's Pods.

How can we reproduce it (as minimally and precisely as possible)?

  1. Create two Deployments, both with the same labels/selectors
  2. In one of the Deployments, set resource requests
  3. Deploy an HPA (and metrics-server)
  4. Watch the HPA emit warning events about Pods that are not controlled by the HPA's target Deployment
    1. e.g.: Warning FailedGetResourceMetric 87s (x3 over 117s) horizontal-pod-autoscaler failed to get cpu utilization: missing request for cpu in container ubuntu of Pod other-64886557cb-ldtnt
YAML files
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: workload
  name: workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: other
  template:
    metadata:
      labels:
        app: other
    spec:
      containers:
      - command:
        - /bin/bash
        - -c
        - sha256sum /dev/zero
        image: ubuntu
        imagePullPolicy: Always
        name: nginx
        resources:
          requests:
            cpu: 10m
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: other
  name: other
spec:
  replicas: 1
  selector:
    matchLabels:
      app: other
  template:
    metadata:
      labels:
        app: other
    spec:
      containers:
      - command:
        - /bin/bash
        - -c
        - sleep infinity
        image: ubuntu
        imagePullPolicy: Always
        name: ubuntu
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: workload
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: workload
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
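
For context, the HPA controller selects Pods through the target's scale subresource selector rather than through owner references. For the workload Deployment above, the scale subresource would look roughly like this (a sketch; values inferred from the manifests above), and its app=other selector matches Pods from both Deployments:

apiVersion: autoscaling/v1
kind: Scale
metadata:
  name: workload
spec:
  replicas: 1
status:
  replicas: 1
  # serialized form of the Deployment's .spec.selector;
  # it also matches Pods of the "other" Deployment
  selector: app=other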

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: v1.29.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2

Cloud provider

  • kind
  • GKE
  • kOps (AWS)

OS version

No response

Install tools

kind

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@adrianmoisey adrianmoisey added the kind/bug Categorizes issue or PR as related to a bug. label Apr 14, 2024
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Apr 14, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 14, 2024
@adrianmoisey
Author

/sig autoscaling

@k8s-ci-robot k8s-ci-robot added sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 14, 2024
@dbenque

dbenque commented Apr 15, 2024

Since the spec uses a scaleTargetRef and not a labelSelector, I think it makes sense to scope the HPA to the Deployment and not to the set of Pods matching that Deployment's labelSelector. This is what the VPA and the Deployment controller already do.

That being said, the documentation clearly says:

The controller manager finds the target resource defined by the scaleTargetRef, then selects the pods based on the target resource's .spec.selector labels,...

So fixing this would be a change in behaviour, as discussed in the SIG. Does it really make sense for Deployments to have overlapping selectors? It is technically possible, yes, but is it good practice? Probably not. It leads to many problems, like overlapping PDBs, which result in 500 errors when you try to evict pods. If there is a real need for using a labelSelector to define the scope of the HPA, maybe the next HPA spec should explicitly introduce that capability by offering both scaleTargetRef and labelSelector in the spec (with only one usable at a time). I am clearly not in favour of such a change, but if the need really exists, that would be a clear way to address it.

I would be in favour of providing a means to fix this issue and align the behaviour of all the controllers (VPA, Deployment). If there is a consensus for fixing it, then to protect backward compatibility we could propose piloting the activation of the fix at the controller (or HPA object) level using a new parameter (or an annotation on the HPA). With that new parameter, the user could decide on the behaviour: "strict scaleTargetRef using ownerRef" versus "labelSelector".
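
A rough sketch of what such an opt-in could look like at the HPA object level; the annotation name here is purely hypothetical and does not exist in any Kubernetes release:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: workload
  annotations:
    # hypothetical opt-in: restrict Pod matching to Pods owned (via ownerReferences)
    # by the scale target, instead of relying on the label selector alone
    autoscaling.alpha.kubernetes.io/pod-selection: owner-ref
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: workload
  minReplicas: 1
  maxReplicas: 10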

@adrianmoisey
Author

I think it makes sense to scope the HPA to the deployment and not to the set of pods matching the labelSelector of that deployment. This is what is done in the VPA and also in the Deployment Controller.

As an end user, I think this is what one would "expect" when starting out on Kubernetes.
I also really like the consistency that this would add.

@sftim
Contributor

sftim commented Apr 22, 2024

I think we might need a KEP to change the way that Kubernetes interprets the spec of a HorizontalPodAutoscaler.

VerticalPodAutoscaler is not part of the formal Kubernetes API (yet), whereas HorizontalPodAutoscaler already is.

@OmarHawk

OmarHawk commented Apr 26, 2024

I believe I just ran into the same issue.

We have the following situation:

  • one namespace, app
  • two Deployments in that namespace, app-frontend and app-backend
  • both Deployments, their ReplicaSets, and their Pods have overlapping labels, since everything sort of belongs together:
    -- app: app
  • at the same time, each Deployment/ReplicaSet/Pod also has a distinct label (just FYI):
    -- Deployment 1: app-role: app-frontend
    -- Deployment 2: app-role: app-backend
  • both have the following selector in their Deployment spec:
  selector:
    matchLabels:
      app: app
  • they have the following default replica counts:
    -- app-frontend: 2
    -- app-backend: 1
  • and the following CPU resource requests / limits per Pod:
    -- app-frontend: 200m / 1000m
    -- app-backend: 600m / 1200m

The backend one performs heavy operations in the background, causing nearly 100% utilization of that single Pod all the time (which is fine for us), while the frontend one has nearly zero load, as it only produces load on a per-request basis.

We have configured an HPA with a scaleTargetRef like this:

scaleTargetRef:
    kind: Deployment
    name: app-frontend
    apiVersion: apps/v1

So my assumption would be that metrics defined like this:

  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80

would only apply to the app-frontend Deployment, as that is the target.
From what I observe, the metrics also seem to consider the heavy-load app-backend Pods, leading to undesired upscaling of the app-frontend Deployment. In my opinion this is documented nowhere; I have tried to find explanations for what I observed, but haven't found anything. Additionally, I think this is also undesired behaviour.

BTW, scaling itself happens only on app-frontend, not on app-backend, which is fine.

@adrianmoisey
Author

What are the label selectors for each deployment?

@OmarHawk

OmarHawk commented Apr 26, 2024

I've updated my description.

@OmarHawk

After adjusting the matchLabels selectors accordingly, it actually seems to work properly.
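
For completeness, a sketch of what the adjusted, non-overlapping selectors could look like, reusing the app-role labels described above (Deployment excerpts only; all other fields are assumed unchanged):

# app-frontend Deployment (excerpt)
spec:
  selector:
    matchLabels:
      app-role: app-frontend
  template:
    metadata:
      labels:
        app: app               # shared label can stay on the Pods
        app-role: app-frontend # selector now matches only this Deployment's Pods
---
# app-backend Deployment (excerpt)
spec:
  selector:
    matchLabels:
      app-role: app-backend
  template:
    metadata:
      labels:
        app: app
        app-role: app-backend

Note that a Deployment's .spec.selector is immutable after creation, so a change like this typically means recreating the Deployment.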
