How can I autoscale Prometheus shards using HPA? #4946
Comments
Thanks for the detailed report!
Hmm I would need to look more into the details of how HPA works.
This is a very good question. Even if #4735 implements the scale subresource, it's probably not doing what we want since, as you noted, it's going to add more replicas instead of more shards. To be honest, we need to review your use case in more depth, and I think it will become even more pressing with the agent CRD...
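For background, the scale subresource is the interface HPA reads and writes; below is a minimal sketch of how a CRD could point it at shards instead of replicas. The field paths (.spec.shards, .status.shards, .status.selector) are illustrative assumptions, not the actual design:

```yaml
# Hypothetical excerpt of the Prometheus CustomResourceDefinition
# (apiextensions.k8s.io/v1). With these paths, HPA's "desired
# replicas" would drive the shard count rather than the replica count.
spec:
  versions:
    - name: v1
      subresources:
        scale:
          specReplicasPath: .spec.shards        # assumed mapping to shards
          statusReplicasPath: .status.shards    # assumed status field
          labelSelectorPath: .status.selector   # lets HPA locate the pods
```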
We've been discussing it offline with @slashpai and came to the conclusion that #4735 had to be reverted for now (see #4952). The plan is rather to re-implement the scale subresource with the following changes:
@SHEELE41 does it make sense from your point of view?
Thank you for answering, @simonpasquier.
That's exactly what I wanted!

My opinion

As you said at #4952, I also thought the same. This may solve the problem that HPA can't get metrics, but, as you know, it will not yet be enough to make HPA scale the number of shards rather than the number of replicas. However, I don't think even scaling the number of replicas will work properly, because the Prometheus CR's replicas value is applied per shard: HPA will try to make the observed pod count match its desired replica count, but the operator multiplies replicas across every shard's StatefulSet. If not, the user should create one HPA per StatefulSet to scale the number of replicas, but each such HPA will scale only the pods in its target StatefulSet (when only …).
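To illustrate the multiplication with the values from my report (shards=3, replicas=1), here is a minimal excerpt of a Prometheus CR spec:

```yaml
# shards selects how many StatefulSets the operator creates;
# replicas is the pod count of EACH shard StatefulSet, so:
#   total pods = shards * replicas
# With the values below: 3 * 1 = 3 pods. If an HPA bumped replicas
# to 2, ALL shards would grow: 3 * 2 = 6 pods, without
# redistributing any scrape targets.
spec:
  shards: 3
  replicas: 1
```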
My use case
My idea is that scaling should only work for shards. There's little incentive to scale the number of replicas IMHO: increasing that number is never going to spread the load. I think it could work if we add a …
Oh, I see. Then I think we can deal with this problem in the way you suggested. By the way, do you have any plan for an implementation that makes the Prometheus pods (not the CR) carry the same labels specified in the Prometheus CR?
I think both cases need some code in the makeStatefulSetSpec func for the PodTemplateSpec. In general, users would then be able to scale by just adding the appropriate labels.

From /pkg/prometheus/statefulset.go:683:

```go
// In cases where an existing selector label is modified, or a new one is added, new sts cannot match existing pods.
// We should try to avoid removing such immutable fields whenever possible since doing
// so forces us to enter the 'recreate cycle' and can potentially lead to downtime.
// The requirement to make a change here should be carefully evaluated.
podSelectorLabels := map[string]string{
	"app.kubernetes.io/name":       "prometheus",
	"app.kubernetes.io/managed-by": "prometheus-operator",
	"app.kubernetes.io/instance":   p.Name,
	"prometheus":                   p.Name,
	shardLabelName:                 fmt.Sprintf("%d", shard),
	prometheusNameLabelName:        p.Name,
}
```

However, most users will not know what values these operator-managed pod selector labels take, since the Prometheus CR does not expose them. Therefore, I wonder if you have a plan to provide this feature. I really appreciate your help. :)
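For what it's worth, the Prometheus CR already lets users attach their own labels to pods via spec.podMetadata.labels; a minimal sketch follows (the CR name k8s and the team label are placeholders). This does not expose the operator-managed selector labels quoted above, which is why matching pods from an HPA or selector remains the open problem:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  shards: 3
  replicas: 1
  podMetadata:
    labels:
      team: monitoring  # copied onto every Prometheus pod
```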
We don't want to expose a …

What do you think?
Oh, okay! Is there something I can help with? :)
I have been working on this, will try to submit a patch as soon as possible :)
Well noted :)
Glad you all are looking into this; just adding our voice of support. We were looking into using Keda (https://keda.sh/docs/2.8/concepts/scaling-deployments/#scaling-of-custom-resources) to scale the Prometheus Operator resource shards as well...
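For reference, a sketch of the kind of Keda ScaledObject that approach implies; the CR name k8s and the thresholds are placeholders, and this only becomes viable once the Prometheus CRD exposes a /scale subresource that maps onto shards:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-shards
spec:
  scaleTargetRef:
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus   # any CR works once its CRD defines /scale
    name: k8s
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"
```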
Any news on that? |
@tlorreyte Unfortunately I didn't get much time to look further into this. Would you like to contribute this change?
@slashpai did you manage to work on this further? I don't mind helping; is there a branch you are working on?
@Migueljfs Please feel free to work on the issue. I didn't get enough time to work on this. |
It would be amazing to be able to make a shard running in us-east-1a scrape only targets in its own AZ; this would greatly reduce data transfer between AZs.
Indeed :) It's one of the ideas we have in #5495; it would be awesome to start discussions at some point!
What did you do?
I already read #3130, #2590 and saw that Prometheus shards can be autoscaled via the HPA.
I want to increase only the 'shards' value, not the 'replicas' value, when the overall CPU usage of the Prometheus shards (pods) increases.
(Because I want each Prometheus object to scrape its targets mutually exclusively.)
So, I used kubernetes-sigs/metrics-server for CPU-usage-based scaling and wrote an HPA manifest with the 'Prometheus' CRD as the target.
However, this HPA could not get the CPU usage of the target, so the target CPU usage remained unknown.
I realized this stands to reason because the 'Prometheus' CRD does not implement the labelSelector /scale subresource.
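For clarity, this is the shape of the HPA manifest I mean (the CR name k8s and the thresholds are placeholders); the target utilization stays unknown because there is no /scale subresource for HPA to read:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: prometheus-autoscaler
spec:
  scaleTargetRef:
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus   # fails: the CRD exposes no /scale subresource
    name: k8s
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```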
In prometheus-operator, a StatefulSet is created per shard, and by running kubectl edit I checked that each shard's ID is pre-written into its StatefulSet manifest.
(In my case: shards=3 & replicas=1.)
So even if I write an HPA manifest targeting each StatefulSet, it won't work as I expect.
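A sketch of what such a per-StatefulSet HPA would look like (StatefulSet names follow the operator's prometheus-&lt;name&gt;[-shard-&lt;n&gt;] convention; prometheus-k8s-shard-1 is a placeholder). Each such HPA would only change the replica count inside one shard, and the operator is likely to reconcile the StatefulSet back to the CR's replicas value anyway:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: prometheus-k8s-shard-1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: prometheus-k8s-shard-1  # placeholder shard StatefulSet name
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```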
In conclusion, …
FYI, I'm using Prometheus to record external servers' metrics, not my Kubernetes cluster's metrics.
Environment
Prometheus Operator version:
v0.58.0
Kubernetes version information:
v1.24