
HPA not able to find CPU, looking at unrelated pod #63615

Closed
rdzimmer opened this issue May 9, 2018 · 13 comments

@rdzimmer commented May 9, 2018

What happened:
I created HPAs for several Deployments using the helm yaml below:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata: 
  name: {{template "fullname" . }}
  namespace: default
spec: 
  scaleTargetRef: 
    apiVersion: apps/v1
    kind: Deployment
    name: {{template "fullname" . }}
  maxReplicas: 5
  minReplicas: 1
  targetCPUUtilizationPercentage: 100

It works fine except on my pod "metric", which has 3 containers. All containers have Kubernetes resource requests and limits for CPU and memory, yet the HPA reports it cannot find the CPU usage.

kubectl get -o json hpa release-metric
{
    "apiVersion": "autoscaling/v1",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {
        "annotations": {
            "autoscaling.alpha.kubernetes.io/conditions": "[{\"type\":\"AbleToScale\",\"status\":\"True\",\"lastTransitionTime\":\"2018-05-09T15:17:51Z\",\"reason\":\"SucceededGetScale\",\"message\":\"the HPA controller was able to get the target's current scale\"},{\"type\":\"ScalingActive\",\"status\":\"False\",\"lastTransitionTime\":\"2018-05-09T15:17:51Z\",\"reason\":\"FailedGetResourceMetric\",\"message\":\"the HPA was unable to compute the replica count: missing request for cpu on container release-event in pod default/release-event-5486b6976c-q2btv\"}]"
        },
        "creationTimestamp": "2018-05-09T15:17:21Z",
        "name": "release-metric",
        "namespace": "default",
        "resourceVersion": "16249",
        "selfLink": "/apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/release-metric",
        "uid": "11bee95f-539c-11e8-a15c-005056b43c41"
    },
    "spec": {
        "maxReplicas": 5,
        "minReplicas": 1,
        "scaleTargetRef": {
            "apiVersion": "extensions/v1beta1",
            "kind": "Deployment",
            "name": "release-metric"
        },
        "targetCPUUtilizationPercentage": 100
    },
    "status": {
        "currentReplicas": 1,
        "desiredReplicas": 0
    }
}

The odd thing is that the "event" pod is completely unrelated to the "metric" pod. In another environment, the HPA is looking at a different pod, "applications", that again has nothing to do with the "metric" Deployment. It seems to get confused about which pod to look at when there are multiple containers in the Deployment.
I created another HPA against another pod with 2 containers, and again that HPA is looking for CPU on unrelated pods.
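
For reference, one quick way to confirm the CPU requests actually set on the target Deployment's containers (a sketch using kubectl's JSONPath support; the Deployment name is taken from the example above):

kubectl get deployment release-metric -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.resources.requests.cpu}{"\n"}{end}'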

What you expected to happen:
CPU is correctly monitored for the pod and horizontally scaled.

How to reproduce it (as minimally and precisely as possible):
Create a "deployment" HPA for a pod with 2 or more containers.

Anything else we need to know?:
I tried creating an HPA against the ReplicaSet for my "metric" service, and that works okay. The problem is, I'm not sure how to get the ReplicaSet name into the helm yaml to replace name: {{template "fullname" . }}, since in my helm charts it's turning into -metric-.

Environment:

  • Kubernetes version (use kubectl version): v1.9.1+icp
  • Cloud provider or hardware configuration: vmware
  • OS (e.g. from /etc/os-release): rhel 7.4
  • Kernel (e.g. uname -a): 3.10.0-693.21.1.el7.x86_64

/kind bug
/sig autoscaling
/sig scalability

@IvanAlegre commented May 14, 2018

+1 on v1.9.7-gke.0

@rdzimmer (Author) commented May 15, 2018

I've experimented with different apiVersions, but it hasn't made a difference. For some reason the apiVersion in the stored definition differs from the yaml: I created it with apps/v1 but it became extensions/v1beta1.

@rdzimmer (Author) commented May 15, 2018

I've now hit this with a Deployment with just 1 container per pod, so we can throw out my theory that it only affects deployments with 2+ containers. Again, it's reporting that it wasn't able to get the CPU for a pod that isn't related to the HPA.

@ImamHyderAli commented May 21, 2018

You will have to add the flag --horizontal-pod-autoscaler-use-rest-clients=false to the manifest file kube-controller-manager.yaml, which is located at /etc/kubernetes/manifests.

[screenshot of the edited kube-controller-manager.yaml]

  • Delete the old HPA:
    $ kubectl delete hpa (hpa name)
  • Then autoscale the Deployment:
    $ kubectl autoscale deployment Deployment_Name --cpu-percent=100 --min=1 --max=5

@rdzimmer (Author) commented May 21, 2018

@ImamHyderAli Thank you for the suggestion. Unfortunately, I have already been running with the --horizontal-pod-autoscaler-use-rest-clients=false flag on my controller-manager. I'm in the process of getting Kubernetes 1.10 today and will let you know if that resolves the issue.

ps -ef | grep controller-manager
root      44213  44197  2 May17 ?        02:27:36 /hyperkube controller-manager --master=https://127.0.0.1:8001 --service-account-private-key-file=/etc/cfc/conf/server.key --feature-gates=TaintBasedEvictions=true,PersistentLocalVolumes=true,VolumeScheduling=true --root-ca-file=/etc/cfc/conf/ca.crt --min-resync-period=3m --cluster-cidr=10.1.0.0/16 --cluster-signing-cert-file=/etc/cfc/conf/ca.crt --cluster-signing-key-file=/etc/cfc/conf/ca.key --use-service-account-credentials=true --kubeconfig=/etc/cfc/conf/kube-controller-manager-config.yaml --pv-recycler-pod-template-filepath-hostpath=/etc/cfc/conf/recycler.yaml --pv-recycler-pod-template-filepath-nfs=/etc/cfc/conf/recycler.yaml --v=2 --leader-elect=true --horizontal-pod-autoscaler-use-rest-clients=false

@huxiaoliang commented May 24, 2018

It is better to port the HPA from autoscaling/v1 to autoscaling/v2beta1; refer to [here](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/) for details. I saw somewhere before that targetCPUUtilizationPercentage will be deleted from k8s in the future (or next release).

Notice that the targetCPUUtilizationPercentage field has been replaced with an array called metrics. The CPU utilization metric is a resource metric, since it is represented as a percentage of a resource specified on pod containers. Notice that you can specify other resource metrics besides CPU. By default, the only other supported resource metric is memory.

@yadavkkumar commented May 24, 2018

@rdzimmer You need the metrics server to fetch CPU usage, and remove the --horizontal-pod-autoscaler-use-rest-clients=false flag from the controller-manager.

You need to specify

resources:
  requests:
    cpu: 200m

in your deployment yaml file.

After this, the HPA will fetch the CPU usage from the metrics server.
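
One way to verify the metrics pipeline end to end (a sketch assuming metrics-server is deployed and registered as an API service):

kubectl top pods
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods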

@rdzimmer (Author) commented May 24, 2018

Based on the last comment, I am not using --horizontal-pod-autoscaler-use-rest-clients=false anymore.
Some HPAs work, but most do not. I do have non-zero CPU and memory requests and limits for all of the containers.
Here is the HPA yaml:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: metric-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: metric
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50

Results in this:

kubectl get -o json hpa metric-hpa
{
    "apiVersion": "autoscaling/v1",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {
        "annotations": {
            "autoscaling.alpha.kubernetes.io/conditions": "[{\"type\":\"AbleToScale\",\"status\":\"True\",\"lastTransitionTime\":\"2018-05-24T12:23:16Z\",\"reason\":\"SucceededGetScale\",\"message\":\"the HPA controller was able to get the target's current scale\"},{\"type\":\"ScalingActive\",\"status\":\"False\",\"lastTransitionTime\":\"2018-05-24T12:23:16Z\",\"reason\":\"FailedGetResourceMetric\",\"message\":\"the HPA was unable to compute the replica count: missing request for cpu on container event-observer in pod default/event-observer-66f6879df-nqs5z\"}]"
        },
        "creationTimestamp": "2018-05-24T12:22:46Z",
        "name": "metric-hpa",
        "namespace": "default",
        "resourceVersion": "186187",
        "selfLink": "/apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/metric-hpa",
        "uid": "2a398998-5f4d-11e8-b961-005056b460d7"
    },
    "spec": {
        "maxReplicas": 10,
        "minReplicas": 1,
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "metric"
        },
        "targetCPUUtilizationPercentage": 50
    },
    "status": {
        "currentReplicas": 1,
        "desiredReplicas": 0
    }
}

The odd thing was that I have apiVersion: apps/v1 in the metric yaml file, but it turns into "apiVersion": "extensions/v1beta1" in kubectl get -o json deployment metric. I'm looking into why, and whether that's a common factor among the ones that do not work.
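
For what it's worth, Deployments in these releases are served under several API groups at once, and a plain kubectl get deployment may come back under whichever group kubectl prefers. A specific group can be requested explicitly (a sketch, using the Deployment name from above):

kubectl get deployments.apps metric -o jsonpath='{.apiVersion}'
kubectl get deployments.extensions metric -o jsonpath='{.apiVersion}'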

@ImamHyderAli commented May 25, 2018

@rdzimmer Can you please list the pods (all namespaces) running in your cluster, and specify the current version of Kubernetes?

@rdzimmer (Author) commented May 29, 2018

@ImamHyderAli We found the issue. All of the Deployments that had HPAs had Kubernetes resource requests and limits set. However, 2 Deployments out of the roughly 30 did not. I didn't realize it would matter if Deployments without HPAs were missing CPU requests. We were planning on adding requests to those Deployments, but hadn't gotten to them yet.

I'm not sure if this is a bug or working as designed. I could see others running into the same trouble if they happen to miss adding resources to just one container in a big environment.
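
For anyone else hunting down a missing request in a big environment, something like this lists every running container without a CPU request (a sketch assuming the jq CLI is available):

kubectl get pods --all-namespaces -o json | jq -r '.items[] | . as $p | .spec.containers[] | select(.resources.requests.cpu == null) | "\($p.metadata.namespace)/\($p.metadata.name): \(.name)"'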

@yadavkkumar commented May 30, 2018

@rdzimmer You need to specify those resource requests and limits in your deployment. In Kubernetes 1.10, the metrics server provides the memory resource metric by default, and other resource types like CPU have to be specified manually.
I think it was designed this way to give the user the ability to specify the amount of CPU resource they require.
You can also go through the HPA documentation.

rdzimmer closed this Jun 1, 2018

@pawel-furmaniak commented May 10, 2019

@rdzimmer I think your deployment selector was matching the labels of pods from other deployments in your namespace. In that case you also need to add CPU requests to those containers for the autoscaler to work.
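
One way to check for that kind of selector overlap (a sketch; app=metric is a hypothetical label, substitute the Deployment's actual matchLabels):

kubectl get deployment metric -o jsonpath='{.spec.selector.matchLabels}'
kubectl get pods -l app=metric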
