PodMonitor vs ServiceMonitor: what is the difference? #3119
Comments
I think ServiceMonitor should be the default choice unless you have some reason to use PodMonitor. For example, you may want to scrape a set of pods that all share a certain label which is not consistent between different services. There is some helpful discussion in the original podmonitor issue #38, but I agree that the docs could be improved here.
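As a rough sketch of that label-based case (the names, namespace, and labels below are made up for illustration), a PodMonitor can select pods directly by a shared label, regardless of which Service, if any, they belong to:

```yaml
# Hypothetical PodMonitor: scrape every pod carrying the label `team: backend`,
# even if the pods are spread across several Services (or have none at all).
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: backend-pods
  namespace: monitoring
spec:
  selector:
    matchLabels:
      team: backend
  namespaceSelector:
    matchNames:
      - production
  podMetricsEndpoints:
    - port: metrics        # named container port that exposes /metrics
      interval: 30s
```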
The way I see it, …
@pgier @brancz, …
Thanks in advance for your time.
It will scrape all pods behind the service, because the …
Hello, I should say that I'm a little bit confused too: when talking about ServiceMonitor, I would expect my service to be monitored, not the associated Endpoints (otherwise I would have been looking for an EndpointsMonitor). This would be particularly useful when one has a highly available exporter and wants only one of the exporter's pods to be scraped each time, to avoid having duplicate metrics and to avoid uselessly loading the service which the exporter queries.

Without the Prometheus Operator, one would use a kubernetes_sd_config with a role of type service in this case. Prometheus would then scrape the service IP, which would randomly hit one of the exporter's pods. That's precisely what I thought the ServiceMonitor did in the first place due to its naming (ServiceMonitor lets one think that it will be translated to a kubernetes_sd_config with the service role).

How can we achieve this with the actual ServiceMonitor and PodMonitor? Is it too late to rename the actual ServiceMonitor to EndpointsMonitor and create a “real” ServiceMonitor?
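For reference, the plain-Prometheus setup described above (without the operator) would look roughly like the following; the job name, namespace, and label filter are assumptions for illustration:

```yaml
# Sketch of a kubernetes_sd_config with the `service` role: targets are the
# services' ClusterIP addresses, so each scrape is load-balanced by kube-proxy
# to a single pod behind the service.
scrape_configs:
  - job_name: exporter-via-service
    kubernetes_sd_configs:
      - role: service
        namespaces:
          names:
            - monitoring
    relabel_configs:
      # Keep only services labeled app=my-exporter (illustrative label).
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: my-exporter
        action: keep
```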
@yann-soubeyrand scraping metric endpoints via a load balancer is an antipattern in Prometheus. Best practice is to scrape targets directly, and this is why a ServiceMonitor examines all endpoints of a Service and scrapes them that way.
@paulfantom so how do you achieve high availability for your exporters? Do you scrape all the replicas of your exporters and then get duplicated metrics and useless load on your applications (elasticsearch or postgresql exporters can be quite demanding on the server they get the metrics from)?
Hi @paulfantom, that's a real question: I'd really like to know what the right approach is for scraping highly available exporters if scraping them through a load balancer is an antipattern.
A highly available exporter is an anti-pattern in itself, and that has the consequence described in #3119 (comment). The best practice is to have an exporter close to the instance that is being monitored. However, if you need to do this then …
Thanks for your answer @paulfantom. In case one cannot have the exporter near the instance being monitored (like a PostgreSQL instance on AWS RDS), what's the best practice for the deployment of the exporter? Should one deploy a single instance of it and use a PodMonitor (and accept that there can be holes in the metrics when the exporter gets unavailable, which can happen for various reasons like a k8s node being drained in response to cluster scale-in)?
In such a case, you need to deploy at least 2 instances and deduplicate data at the Prometheus level.
Hi @paulfantom I'm in the same situation. I have a redis exporter that cannot be close to redis itself (Azure managed redis). What is the best practice for deduping data at the prometheus level without thanos or cortex? (sorry, I've searched but only came up with thanos/cortex as a solution. I'm hopeful for a short term solution while I plan out a cortex deployment)
No matter whether you are using Cortex or Thanos, you will always want two Prometheus instances for high availability reasons. If you don't have control over the targets themselves, then I recommend still running 1 exporter per process of a system. For query-time deduplication, you can just use the Thanos querier and none of the other components, which would simplify your setup a lot if that's all you are looking for.
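A minimal sketch of that setup with the operator's Prometheus CRD might look like this (the resource names are assumptions; `prometheus_replica` is the operator's default replica external label):

```yaml
# Two identical Prometheus replicas scraping the same targets. The operator
# adds a per-replica `prometheus_replica` external label, which Thanos Query
# can deduplicate on at query time.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  replicas: 2
  replicaExternalLabelName: prometheus_replica
  serviceMonitorSelector: {}   # select all ServiceMonitors in matching namespaces
```

The Thanos query component would then be pointed at both replicas and started with `--query.replica-label=prometheus_replica`; no other Thanos components are needed just for query-time deduplication.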
yep, it helps me a lot.
While this is an older issue, I would make the argument that exporters that Prometheus scrapes through a proxy (like the SNMP exporter or the blackbox exporter) are potentially almost anti-patterns in themselves, in that the scraping occurs via a proxy. However, these are in widespread use because of their important functionality (actually the only functionality that makes Prometheus viable for us). In these cases, it makes a lot of sense for these exporters to be highly available and load-balanced, as they are effectively proxies.

While I agree that this load balancing is an anti-pattern, I would not consider such exporters a "service" (in the sense of monitoring other stuff via them; it makes sense to monitor the exporter itself, but that is already covered by the ServiceMonitor). I think it would make sense to support these common use cases simply, without having to set up multiple Prometheus instances and deduplication (this also enables independent scaling of exporters and Prometheus, as duplicating Prometheus instances is quite demanding in a lot of environments), with something like a ServiceMonitor that actually scrapes the service IP. Happy to hear feedback if that makes sense, but just my 2 cents :)
I don't think that's related to the original question, or else you misunderstand how ServiceMonitors work. They use the service's endpoints directly, not the kube-proxy balancer itself.
I am talking about a ServiceMonitor-like resource that uses the service IP instead of the endpoint IPs, for the use case outlined by yann-soubeyrand and discussed with brancz. Given paulfantom's stance that this is an anti-pattern and therefore wouldn't make sense to support, I was following on from that to explain why I thought it might still make sense as something that may be considered. I can see however that this might make more sense as a feature request rather than a comment here, but I thought it was relevant given the discussion above.
I think that one way to monitor a service without scraping data from all its pods is to use the URI of the service.
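If you go down that road, a minimal sketch (the service name, namespace, and port are assumptions) is a static scrape config pointed at the service's cluster DNS name, so every scrape goes through the service IP and lands on a single pod:

```yaml
# Hypothetical static scrape of a service's DNS name instead of its endpoints.
scrape_configs:
  - job_name: my-exporter-service
    metrics_path: /metrics
    static_configs:
      - targets:
          - my-exporter.monitoring.svc.cluster.local:9187
```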
Although one can make a very reasonable argument that Prometheus should aim to scrape endpoints and not services, would you not say that it is at least confusing to have a CRD called "ServiceMonitor" that monitors endpoints, when the Prometheus configuration does support monitoring both endpoints and services independently? At least for educational purposes I'd recommend renaming it, or perhaps at least providing further clarification of this fact in the ServiceMonitor documentation.
Hi @v-pap, I am confused: why would Prometheus need to monitor a service without scraping data? And why do we need to monitor a service without data scraping? :(
is it possible to abuse the …
What happens when pods are registered in multiple …
This can potentially happen, yes. The recommendation is to narrow down with labels when this happens.
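For example (the extra label here is hypothetical), giving each monitor a selector that includes a distinguishing label keeps the same targets from being picked up twice:

```yaml
# Narrowed selector: this monitor only matches Services labeled
# `monitor: primary`; a second monitor would select `monitor: secondary`
# instead, so the two never pick up the same targets.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-primary
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: my-app
      monitor: primary
  endpoints:
    - port: metrics
```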
This issue has been automatically marked as stale because it has not had any activity in the last 60 days. Thank you for your contributions.
PodMonitor - no service
Going to close this out as I think this issue has outlived its usefulness.
PodMonitor doesn't need services ^--^
I have a case that hasn't been covered in the discussion, and hence I wonder what the best practice is for my case. There is a service which is scaled to hundreds of instances; there is a business metric that is exposed as a gauge and (under the hood) is stored in a database that all of the instances of the service can access, and the metric is related to the service as a whole, not to any specific instance.

So, I want to let Prometheus understand that I want the scraper to make a single request to my service within a specified interval (there are different metrics that I plan to scrape hourly and others that I'd like to scrape daily). Although I read that scraping off a load balancer is a bad practice (and I kind of accept that it is when we're talking about monitoring specific instances), are there any better options for my case?
Did you ever find a good solution for this? I ended up using additionalScrapeConfigs, but I would strongly prefer another solution.
@torbenaa nope, I did the same and also despise it.
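For completeness, the `additionalScrapeConfigs` approach mentioned above is wired up roughly like this (the Secret name, key, and targets are assumptions): the raw scrape config goes into a Secret, and the Prometheus custom resource references it.

```yaml
# Secret holding raw Prometheus scrape config entries (a YAML list of jobs,
# appended verbatim to the generated scrape_configs).
apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs
  namespace: monitoring
stringData:
  additional.yaml: |
    - job_name: my-exporter-service
      static_configs:
        - targets:
            - my-exporter.monitoring.svc.cluster.local:9187
---
# Prometheus custom resource referencing the Secret above.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: additional.yaml
```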
Hi, I'm new to prometheus-operator and I do not see the difference between ServiceMonitor and PodMonitor.
The documentation describes them as:
But this does not explain when to use a particular one.
I played with them a little bit. Both of them use selector rules to find pods.
So you can either define a PodMonitor that searches for pods with e.g. the label app: myAPP, or, if your app is behind a service labeled e.g. app: myAPP, you create a ServiceMonitor that will scrape metrics from all pods behind the service with this label. Both of them will produce the same result in the Targets tab of the Prometheus UI.
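To make that concrete, a minimal sketch of both variants for the `app: myAPP` example (port names and the remaining fields are assumptions) could look like this; note that the ServiceMonitor's selector matches Service labels, while the PodMonitor's selector matches Pod labels:

```yaml
# ServiceMonitor: selects Services labeled app: myAPP and scrapes the pods
# behind them via the Service's Endpoints.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myAPP
  endpoints:
    - port: metrics          # named port on the Service
---
# PodMonitor: selects Pods labeled app: myAPP directly; no Service required.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myAPP
  podMetricsEndpoints:
    - port: metrics          # named container port on the Pod
```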
So I would really appreciate it if someone could explain to me when one should use a PodMonitor and when a ServiceMonitor.