
vertical-pod-autoscaler: how to configure for integration with prometheus? #1551

Closed
piontec opened this issue Jan 3, 2019 · 14 comments

Comments

piontec (Contributor) commented Jan 3, 2019

I'm running VPA 0.3.0 on k8s 1.10.11. I'm trying to find docs that explain how VPA fetches metrics history from Prometheus. Which metrics and labels does it expect to be present in Prometheus? Currently, no matter whether I give it a correct or incorrect Prometheus URL in --prometheus-address, nothing is logged about Prometheus queries or connections, even with --v=8.

bskiba (Member) commented Jan 3, 2019

I think you also need --storage=prometheus. Otherwise it defaults to reading from checkpoints on startup.
https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/pkg/recommender/main.go#L37
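
For anyone setting this up: these flags go on the vpa-recommender container, not on the VerticalPodAutoscaler object. A minimal sketch of the relevant Deployment fragment, assuming the image tag and Prometheus URL shown here (both are illustrative; use the values from your own cluster):

```yaml
# Fragment of the vpa-recommender Deployment spec (illustrative)
containers:
  - name: recommender
    image: k8s.gcr.io/vpa-recommender:0.3.0   # version discussed in this thread; adjust
    args:
      - --v=4
      - --storage=prometheus                  # read history from Prometheus on startup
      - --prometheus-address=http://prometheus.default.svc.cluster.local:9090
```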

piontec (Contributor, Author) commented Jan 3, 2019

Thanks for the tip! Does that mean VPA uses either CRD checkpoints or history stored in Prometheus (reading it back from there), but not both at the same time? Does it read any custom metrics from Prometheus, or does it use it only as storage for data it previously got from metrics-server?

bskiba (Member) commented Jan 3, 2019

VPA uses checkpoints or history from Prometheus to initialise on startup. With checkpoints, it writes them periodically to etcd (as CRD objects). With Prometheus, it doesn't store any metrics; it assumes Prometheus is already set up to gather pod metrics from the cluster, and it reads container_cpu_usage_seconds_total and container_memory_usage_bytes.
You can take a look at this file to see how it is done in more detail.
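
For reference, the history fetch amounts to range queries over those two cAdvisor series. Something along these lines (the job label and lookback window are assumptions that depend on your scrape config, not the recommender's exact queries):

```promql
# CPU: per-container usage rate reconstructed from the cumulative counter
rate(container_cpu_usage_seconds_total{job="kubernetes-cadvisor"}[5m])

# Memory: instantaneous usage in bytes
container_memory_usage_bytes{job="kubernetes-cadvisor"}
```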

bskiba (Member) commented Jan 8, 2019

Any luck? If you managed to set this up successfully, it would be extremely helpful if you could write a short note on how you set it up that we could put in the docs.

piontec (Contributor, Author) commented Jan 8, 2019

Sorry, for now I've left it running without Prometheus. But maybe I can test this in the next few days.

wyb1 (Contributor) commented Jan 22, 2019

I set up the VPA to use Prometheus. I set --storage=prometheus and --prometheus-address=http://prometheus.default.svc.cluster.local:9090. Looking at the logs, it looks like the vpa-recommender is using Prometheus.
@bskiba is there any reason you are using container_memory_usage_bytes over container_memory_working_set_bytes? I found that container_memory_working_set_bytes delivers more accurate results regarding memory usage.
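
For context on the difference (general cAdvisor semantics, not VPA-specific): container_memory_usage_bytes includes reclaimable page cache, while the working set subtracts inactive file cache and is what the kubelet uses for eviction decisions. Label names vary across cAdvisor versions; the selector below is illustrative:

```promql
# Total memory, including page cache the kernel can reclaim under pressure
container_memory_usage_bytes{container_name="hamster"}

# Usage minus inactive file cache; the basis for kubelet eviction decisions
container_memory_working_set_bytes{container_name="hamster"}
```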

bskiba (Member) commented Jan 24, 2019

@wyb1 This is good to know, would you maybe have time to add a short note on how to set this up in the FAQ? It would be very helpful to other users!

Wrt the metric used, no specific reason that I know of; the CPU metric is also not used correctly (#1501). Talking to someone knowledgeable about Prometheus @ k8s to sort this out is on my TODO list, but I haven't been able to get to it yet :(

wyb1 (Contributor) commented Jan 25, 2019

@bskiba no problem, I can add it to the FAQ.
Just for clarification: if I configure the VPA to use Prometheus, it only uses Prometheus for the history, and it creates the checkpoints using metrics that Prometheus gets from cAdvisor. Is this correct?

bskiba (Member) commented Jan 29, 2019

@wyb1

> Just for clarification, if I configure the VPA to use prometheus it only uses prometheus for the history.

This part is correct: it only uses Prometheus for history.

> So it creates the checkpoints using metrics that prometheus gets from cadvisor. Is this correct?

After fetching history, VPA also starts gathering metrics in real time from the Kubernetes metrics API. If you have a working setup where the metrics API is backed by Prometheus fetching metrics from cAdvisor, then the live metrics will come from there. Checkpoints are created based on history + observed metrics.
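
Those checkpoints land in the cluster as VerticalPodAutoscalerCheckpoint objects, so you can inspect what the recommender has accumulated. For example (resource names per a standard VPA install; the checkpoint name is a placeholder):

```
# List the recommender's saved checkpoints
kubectl get verticalpodautoscalercheckpoints --all-namespaces

# Dump one checkpoint's accumulated CPU/memory histograms
kubectl get verticalpodautoscalercheckpoint <checkpoint-name> -o yaml
```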

panza24 commented Sep 3, 2021

> I set up the VPA to use Prometheus. I set --storage=prometheus and --prometheus-address=http://prometheus.default.svc.cluster.local:9090. Looking at the logs, it looks like the vpa-recommender is using Prometheus.
> @bskiba is there any reason you are using container_memory_usage_bytes over container_memory_working_set_bytes? I found that container_memory_working_set_bytes delivers more accurate results regarding memory usage.

How did you set it?

My yaml looks like this:

apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: hamster-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: hamster
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 100m
          memory: 50Mi
        maxAllowed:
          cpu: 1
          memory: 500Mi
        controlledResources: ["cpu", "memory"]

mohang6770 commented

Whereabouts do you set these values? Could anyone give an example YAML file?

> I set up the VPA to use Prometheus. I set --storage=prometheus and --prometheus-address=http://prometheus.default.svc.cluster.local:9090. Looking at the logs, it looks like the vpa-recommender is using Prometheus.
> @bskiba is there any reason you are using container_memory_usage_bytes over container_memory_working_set_bytes? I found that container_memory_working_set_bytes delivers more accurate results regarding memory usage.

nc-gcz commented Nov 25, 2022

@mohang6770, I think it's here (taken from Fairwinds VPA Helm chart): https://artifacthub.io/packages/helm/fairwinds-stable/vpa#utilize-prometheus-for-history

recommender:
  extraArgs:
    prometheus-address: |
      http://prometheus-operator-prometheus.prometheus-operator.svc.cluster.local:9090
    storage: prometheus

Adjust the address to match your Prometheus instance.
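
Once the flags are in place, one way to confirm the recommender is actually talking to Prometheus is to check its startup logs (the namespace and deployment name here are assumptions based on a standard VPA install):

```
kubectl -n kube-system logs deployment/vpa-recommender | grep -i prometheus
```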

