Don’t ignore prometheus-operator community #27940

Closed
Morriz opened this issue Oct 14, 2020 · 24 comments · May be fixed by istio/istio.io#10288
Labels: area/extensions and telemetry, kind/enhancement, lifecycle/automatically-closed, lifecycle/stale
Milestone: Backlog

Comments

@Morriz

Morriz commented Oct 14, 2020

You have chosen to target only Prometheus and stated that annotations have become "the de facto method" for configuration, but I think most deployments in the wild use prometheus-operator, which brought a more robust "standard": the ServiceMonitor CRD. I strongly advise adding a flag that lets users choose to have the operator deploy these ServiceMonitor CRs.

Benefits: no metrics merging is needed (istio proxies/components get their own endpoints), so the user can view istio-only metrics separately. This would allow for more community effort in the realm of istio performance observability.

@howardjohn transferred this issue from istio/community Oct 14, 2020
@howardjohn
Member

Hi @Morriz. We have not intended to ignore the prometheus-operator community, nor do I think we have. Metrics merging is primarily intended to work around the difficulty of configuring Prometheus for users not using the operator; ServiceMonitor CRDs make it unnecessary in most circumstances. We provide some ServiceMonitors that monitor the control plane as well as all istio proxies: https://github.com/istio/istio/blob/master/samples/addons/extras/prometheus-operator.yaml. Let me know if there is anything else we can do to integrate better.
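
For orientation, the control-plane half of that sample looks roughly like the ServiceMonitor below. This is a sketch rather than a copy of the linked file; the istio: pilot selector and the http-monitoring port name are assumptions based on the default istiod Service, so check the sample for the authoritative version.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: istiod-monitor          # hypothetical name
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: pilot              # label carried by the default istiod Service (assumption)
  endpoints:
  - port: http-monitoring       # istiod's self-monitoring port (15014)
    interval: 15s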

@Morriz
Author

Morriz commented Oct 14, 2020

Awesome. Did not see this before. I think it is a bit hard to navigate all the istio docs, which seem to be spread over multiple places.

But would those 2 service monitors in the example docs then cover all istio targets? And what about sane default alerting rules?

It would be nice to see support for the two prometheus scraping flavors that exist, and to let the operator manage the prometheus configuration and resources. I don't think it makes sense to expect the community to keep up with the istio scraping endpoints.

I am not deep into the matter, but as a platform engineer, this is my concern: that I can rely on the operator to deliver prometheus metrics with sane alerting rules out of the box.

So something like this could suffice imo:

prometheus:
  enabled: false
  rulesEnabled: # any istio rules? all enabled by default
    someRule: false # override, like in prom-op
  prometheusOperator:
    enabled: true # this will result in deployment of service monitors and deliver rules as CRs
  prometheus: '' # prometheus instance to deliver metrics to

Or am I on the wrong track here? I always feel a bit dumb as an outside user, but maybe that can help simplify things as well ;)

@Morriz
Author

Morriz commented Oct 14, 2020

Of course I mean the istio-operator when I talk about "the operator" :)

@howardjohn
Member

And what about sane default alerting rules?

This is a topic orthogonal to operator vs non-operator I think(?). I agree this would be useful, I think there is some ongoing work here #24310

Awesome. Did not see this before. I think it is a bit hard to navigate all the istio docs, which seem to be spread over multiple places.

Were you looking in https://preliminary.istio.io/latest/docs/ops/integrations/? I think we need to mention the operator here - I thought we did already

cc @douglas-reid

@Morriz
Author

Morriz commented Oct 14, 2020

Yes, there is a mention of the sample ServiceMonitors in the docs (and one then has to go to the samples folder and inspect the manifests), but that does not address the concerns I just raised. Automatic deployment of these resources by istio would.

@howardjohn
Member

howardjohn commented Oct 14, 2020 via email

@Morriz
Author

Morriz commented Oct 14, 2020

I have not come across any reasoning against my suggestions in that blog post.

I understand not being opinionated about deploying entire solutions (like jaeger over zipkin), but seeing that istio exports prometheus metrics, it would be very beneficial to deliver the related metrics scraping configuration/resources out of the box. I don't think any solutions other than prometheus will consume these metrics (all major container platforms have adopted it, I think). Is there anything else out there that can? Don't you agree it would serve most of us if istio took ownership of this and automated it for us, instead of us reinventing the wheel in user space?

I don't want to come across as pushy or anything; it's mostly my autistic brain trying to see the bigger picture ;) Maybe the time is not ripe and we will see future iterations revisit this request.

@douglas-reid
Contributor

@Morriz I think fundamentally this is an issue of scope (and scope creep). Istio, as a project, does not want to be responsible for Prometheus deployment and configuration, for a number of reasons. Here, you've asked for customizable control over alerting rules for the operator -- but imagine having to replicate that for non-operator installs. And someone else will (and has) asked for customization on the Prometheus resource itself (and also on the standalone installs). And others will (and have) asked for mTLS options, etc. None of these are core to Istio's installation, and would present a very real maintenance and testing burden for the Istio operator itself.

We do, however, want to make integration easy -- and providing samples of the operator configuration (and the annotations for non-operator use cases) was meant to achieve that goal. We need to provide clearer documentation and make sure that the samples are up-to-date and discoverable. Any advice/suggestions on that aspect are very welcome.

As for sane alerting, there is a long-standing issue open for developing a core set of suggested alerting rules. It'd be great to have a community-driven development of such an artifact that we could add to the samples.

@Morriz
Author

Morriz commented Oct 15, 2020

I understand now. It is quite an endeavor. Let's hope the community will step in. Good guidance helps. I can allocate time in our team to focus on this, though I don't know where to deliver it to. An "istio-community" org on GitHub (like "prometheus-community" now has for charts and such) would be awesome. I don't want us to be driving that now, but maybe down the road it will become clear.

@douglas-reid added this to the Backlog milestone Oct 20, 2020
@Shohou

Shohou commented Oct 21, 2020

https://preliminary.istio.io/latest/docs/ops/integrations/prometheus/
Searching for "operator" there gives 0 results.
So discovering that prometheus-operator-based installs don't work out of the box, and finding the related samples, is quite a task.

@arturre

arturre commented Jan 8, 2021

Before 1.8.1 (I'm referring to 1.7.x, for example) it was possible to use

  addonComponents:
    prometheusOperator:
      enabled: true

to have all the Prometheus ServiceMonitor CRs installed by istioctl without an actual Prometheus deployment.
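
For context, that fragment sat inside an IstioOperator resource, roughly as sketched below (written against the 1.7-era API from memory, not verified against a specific release):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  addonComponents:
    prometheusOperator:
      enabled: true   # 1.7.x behaviour described above: render the ServiceMonitor/PodMonitor CRs without deploying Prometheus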

I agree with @Morriz that we need some machinery for istioctl to install these CRs, as the Prometheus community is moving towards the ServiceMonitor/PodMonitor CRDs for scraping data.

@douglas-reid
Contributor

I agree with @Morriz that we need some machinery for istioctl to install these CRs, as the Prometheus community is moving towards the ServiceMonitor/PodMonitor CRDs for scraping data.

The suggested mechanism right now is: kubectl apply -f samples/addons/extras/prometheus-operator.yaml. Are you looking for something beyond that?

@stevehipwell
Contributor

Sorry if this is a hijack, but I think it's relevant here. Currently there is no documentation for using metrics merging with the Prometheus Operator pattern (direct or federated). As a Prometheus Operator user I would add a ServiceMonitor for each of my services exposing custom metrics, but if mTLS is strict this will error. What we need are instructions on how to convert from the ServiceMonitor pattern to the metrics merging pattern, which I suspect is as simple as annotating our pods with the Prometheus scrape annotations and using the sample PodMonitor?

Also, I'd be interested in the pros/cons of federating Prometheus when using the operator pattern. By the looks of it, if you want to run Kiali from the production-ready Prometheus instance, all the metrics would need federating anyway?

@douglas-reid
Contributor

Sorry if this is a hijack, but I think it's relevant here. Currently there is no documentation for using metrics merging with the Prometheus Operator pattern (direct or federated). As a Prometheus Operator user I would add a ServiceMonitor for each of my services exposing custom metrics, but if mTLS is strict this will error. What we need are instructions on how to convert from the ServiceMonitor pattern to the metrics merging pattern, which I suspect is as simple as annotating our pods with the Prometheus scrape annotations and using the sample PodMonitor?

Also, I'd be interested in the pros/cons of federating Prometheus when using the operator pattern. By the looks of it, if you want to run Kiali from the production-ready Prometheus instance, all the metrics would need federating anyway?

@stevehipwell do you want to open a separate issue for the documentation you mention? that might enable better tracking/resolution. there is another open issue about mTLS configuration with the latest releases of Istio that might also be worth tracking.

@stevehipwell
Contributor

@douglas-reid which other issue were you referring to?

@psibi

psibi commented Apr 13, 2021

As a Prometheus Operator user I would add a ServiceMonitor for each of my services exposing custom metrics, but if mTLS is strict this will error. What we need are instructions on how to convert from the ServiceMonitor pattern to the metrics merging pattern, which I suspect is as simple as annotating our pods with the Prometheus scrape annotations and using the sample PodMonitor?

I just tested this out and it doesn't work with prometheus-operator. @douglas-reid Do we have any docs on how to make it work with something like the ServiceMonitor pattern? Or does the metrics merging pattern not work with prometheus-operator? Either way, having that information on the official pages would be very helpful.

@stevehipwell
Contributor

@psibi what doesn't work? Currently metric merging works fine although it'd be really useful if there was an original_container or similar label added so we could get metrics for specific containers.

@psibi

psibi commented Apr 13, 2021

@stevehipwell This is what my testing setup looks like:

  • I have my own Prometheus instance running (via prometheus-operator), and it is not part of the istio service mesh. I have also applied the ServiceMonitor and PodMonitor present here: https://raw.githubusercontent.com/istio/istio/release-1.9/samples/addons/extras/prometheus-operator.yaml
  • I have created a new namespace with istio injection enabled. (My istio setup has metrics merging enabled.)
  • The namespace also has strict mTLS enabled.
  • Now I deployed an application in it which exposes some application-specific prometheus metrics.

Now when I go to the Prometheus dashboard, I can't see any application metrics. Note that I was able to get it working if I make these changes:

  • Use permissive mTLS
  • Add a service or pod monitor for the application

So my question is: is there a way to make it work with strict mTLS when Prometheus is not part of the service mesh?

@stevehipwell
Contributor

@psibi you want to look at metrics merging and the sample code (although that might need modifying). Basically, once merging is enabled the proxy serves the merged metrics without mTLS and the sample PodMonitor collects them; you don't set ServiceMonitors for your own pods.
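
A minimal sketch of what such a PodMonitor could look like is below. This is not the sample file; the security.istio.io/tlsMode label selector, port 15020 and the /stats/prometheus path are assumptions about the default sidecar setup, in which the istio-agent serves the merged metrics in plaintext.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: istio-merged-metrics             # hypothetical name
  namespace: istio-system
spec:
  selector:
    matchLabels:
      security.istio.io/tlsMode: istio   # only pods with an injected sidecar (assumption)
  namespaceSelector:
    any: true
  podMetricsEndpoints:
  - path: /stats/prometheus              # merged app + Envoy metrics served by the istio-agent
    targetPort: 15020                    # agent status port, served in plaintext outside mTLS
    interval: 15s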

@psibi

psibi commented Apr 13, 2021

@stevehipwell Thanks, I have enabled metrics merging in my istio setup, although I don't see any sample code for metrics merging (the sample code seems to be for option 2).

Basically, once merging is enabled the proxy serves the merged metrics without mTLS and the sample PodMonitor collects them; you don't set ServiceMonitors for your own pods.

Yes, I understand that. The only problem is that I don't get my application metrics. But I do seem to get istio metrics fine in strict mTLS. So I'm wondering if I'm missing something.

@stevehipwell
Contributor

@psibi what are the settings you're using for metrics merging? Specifically the PodMonitor, the istio config, and the application annotations.

@psibi

psibi commented Apr 14, 2021

@stevehipwell I figured out the issue: it was caused by missing annotations in my deployment. Thanks for the help!
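
For anyone hitting the same symptom: the annotations in question are the standard Prometheus scrape hints on the pod template, which the sidecar injector reads to set up metrics merging. A hypothetical deployment sketch follows (names, port and path are illustrative; use whatever your app actually serves):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                          # hypothetical application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        prometheus.io/scrape: "true"    # opt in to scraping / merging
        prometheus.io/port: "8080"      # the app's own metrics port (illustrative)
        prometheus.io/path: "/metrics"  # the app's own metrics path (illustrative)
    spec:
      containers:
      - name: my-app
        image: example.com/my-app:latest   # hypothetical image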

@stevehipwell
Contributor

Glad to hear you've got it sorted now @psibi.

@istio-policy-bot added the lifecycle/stale label Jul 13, 2021
Prioritization automation moved this from P2 to Done Jul 28, 2021
@istio-policy-bot

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2021-01-13. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.

@istio-policy-bot added the lifecycle/automatically-closed label Jul 28, 2021