Don’t ignore prometheus-operator community #27940

Closed
Morriz opened this issue Oct 14, 2020 · 24 comments · May be fixed by istio/istio.io#10288
Labels: area/extensions and telemetry, kind/enhancement, lifecycle/automatically-closed, lifecycle/stale
Milestone: Backlog

Comments

@Morriz

Morriz commented Oct 14, 2020

You have chosen to target only Prometheus and stated that annotations have become "the de facto method" for configuration, but I think most deployments in the wild use prometheus-operator, which brought a more robust "standard": the ServiceMonitor CRD. I strongly advise adding a flag that lets users choose to have the operator deploy these ServiceMonitor CRs.

Benefits: no metrics merging is needed (istio proxies/components get their own endpoints), so the user can view istio-only metrics separately. This would allow for more community effort in the realm of istio performance observability.

@howardjohn transferred this issue from istio/community Oct 14, 2020
@howardjohn
Member

Hi @Morriz. We have not intended to ignore the prometheus-operator community, nor do I think we have. Metrics merging is primarily intended to work around the difficulty of configuring Prometheus for users not using the operator; ServiceMonitor CRDs make it unnecessary in most circumstances. We provide some ServiceMonitors that monitor the control plane as well as all istio proxies: https://github.com/istio/istio/blob/master/samples/addons/extras/prometheus-operator.yaml. Let me know if there is anything else we can do to integrate better.
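
For orientation, the control-plane half of that sample looks roughly like the ServiceMonitor below. This is a sketch rather than a copy of the linked file; the istio: pilot selector and the http-monitoring port name are assumptions based on the default istiod Service, so check the sample for the authoritative version.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: istiod-monitor          # hypothetical name
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: pilot              # label carried by the default istiod Service (assumption)
  endpoints:
  - port: http-monitoring       # istiod's self-monitoring port (15014)
    interval: 15s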

@Morriz
Author

Morriz commented Oct 14, 2020

Awesome. Did not see this before. I think it is a bit hard to navigate all the istio docs, which seem to be spread over multiple places.

But would those 2 service monitors in the example docs then cover all istio targets? And what about sane default alerting rules?

It would be nice to see support for the two prometheus scraping flavors that exist, and to let the operator manage the prometheus configuration and resources. I don't think it makes sense to expect the community to keep up with the istio scraping endpoints.

I am not deep into the matter, but as a platform engineer, this is my concern: that I can rely on the operator to deliver prometheus metrics with sane alerting rules out of the box.

So something like this could suffice imo:

prometheus:
  enabled: false
  rulesEnabled: # any istio rules? all enabled by default
    someRule: false # override, like in prom-op
  prometheusOperator:
    enabled: true # this will result in deployment of service monitors and deliver rules as CRs
  prometheus: '' # prometheus instance to deliver metrics to

Or am I on the wrong track here? I always feel a bit dumb as an outside user, but maybe that can help simplify things as well ;)

@Morriz
Author

Morriz commented Oct 14, 2020

Of course I mean the istio-operator when I talk about "the operator" :)

@howardjohn
Member

And what about sane default alerting rules?

This is a topic orthogonal to operator vs non-operator I think(?). I agree this would be useful, I think there is some ongoing work here #24310

Awesome. Did not see this before. I think it is a bit hard to navigate all the istio docs, which seem to be spread over multiple places.

Were you looking in https://preliminary.istio.io/latest/docs/ops/integrations/? I think we need to mention the operator here - I thought we did already

cc @douglas-reid

@Morriz
Author

Morriz commented Oct 14, 2020

Yes, there is a mention of the sample ServiceMonitors in the docs (and one then has to go to the samples folder and inspect the manifests), but that does not address the concerns I just raised. Automatic deployment of these resources by istio would.

@howardjohn
Member

howardjohn commented Oct 14, 2020 via email

@Morriz
Author

Morriz commented Oct 14, 2020

I have not come across any reasoning against my suggestions in that blog post.

I understand not being opinionated about deploying entire solutions (like jaeger over zipkin), but seeing that istio exports prometheus metrics, it would be very beneficial to deliver the related metrics scraping configuration/resources out of the box. I don't think any solutions other than prometheus will consume these metrics (all major container platforms have adopted it, I think). Is there anything else out there that can? Don't you agree it would serve most of us if istio took ownership of this and automated it for us, instead of us reinventing the wheel in user space?

I don't want to come across as pushy or anything; it's mostly my autistic brain trying to see the bigger picture ;) Maybe the time is not ripe and we will see future iterations revisit this request.

@douglas-reid
Contributor

@Morriz I think fundamentally this is an issue of scope (and scope creep). Istio, as a project, does not want to be responsible for Prometheus deployment and configuration, for a number of reasons. Here, you've asked for customizable control over alerting rules for the operator -- but imagine having to replicate that for non-operator installs. And someone else will (and has) asked for customization on the Prometheus resource itself (and also on the standalone installs). And others will (and have) asked for mTLS options, etc. None of these are core to Istio's installation, and would present a very real maintenance and testing burden for the Istio operator itself.

We do, however, want to make integration easy -- and providing samples of the operator configuration (and the annotations for non-operator use cases) was meant to achieve that goal. We need to provide clearer documentation and make sure that the samples are up-to-date and discoverable. Any advice/suggestions on that aspect are very welcome.

As for sane alerting, there is a long-standing issue open for developing a core set of suggested alerting rules. It'd be great to have a community-driven development of such an artifact that we could add to the samples.

@Morriz
Author

Morriz commented Oct 15, 2020

I understand now. It is quite an endeavor. Let's hope the community will step in. Good guidance helps. I can allocate time in our team to focus on this, though I don't know where to deliver it to. An "istio-community" org on GitHub (like "prometheus-community" now has for charts and such) would be awesome. I don't want us to be driving that now, but maybe down the road it will become clear.

@douglas-reid added this to the Backlog milestone Oct 20, 2020
@Shohou

Shohou commented Oct 21, 2020

https://preliminary.istio.io/latest/docs/ops/integrations/prometheus/
Searching for "operator" there gives 0 results.
So discovering that prometheus-operator-based installs don't work out of the box, and finding the related samples, is quite a task.

@arturre

arturre commented Jan 8, 2021

Before 1.8.1 (I'm referring to 1.7.x, for example) it was possible to use

  addonComponents:
    prometheusOperator:
      enabled: true

to have all the Prometheus ServiceMonitor CRs installed by istioctl without an actual Prometheus deployment.
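
For context, that fragment sat inside an IstioOperator resource, roughly as sketched below (written against the 1.7-era API from memory, not verified against a specific release):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  addonComponents:
    prometheusOperator:
      enabled: true   # 1.7.x behaviour described above: render the ServiceMonitor/PodMonitor CRs without deploying Prometheus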

I agree with @Morriz that we need some machinery for istioctl to install these CRs, as the Prometheus community is moving towards the ServiceMonitor/PodMonitor CRDs for scraping data.

@douglas-reid
Contributor

I agree with @Morriz that we need some machinery for istioctl to install these CRs, as the Prometheus community is moving towards the ServiceMonitor/PodMonitor CRDs for scraping data.

The suggested mechanism right now is: kubectl apply -f samples/addons/extras/prometheus-operator.yaml. Are you looking for something beyond that?

@stevehipwell
Contributor

Sorry if this is a hijack, but I think it's relevant here. Currently there is no documentation for using metrics merging with the Prometheus Operator pattern (direct or federated). As a Prometheus Operator user I would add a ServiceMonitor for each of my services exposing custom metrics, but if mTLS is strict this will error. What we need are instructions on how to convert from the ServiceMonitor pattern to the metrics merging pattern, which I suspect is as simple as annotating our pods with the Prometheus scrape annotations and using the sample PodMonitor?

Also, I'd be interested in the pros/cons of federating Prometheus when using the operator pattern. By the looks of it, if you want to run Kiali from the production-ready Prometheus instance, all the metrics would need federating anyway?

@douglas-reid
Contributor

Sorry if this is a hijack, but I think it's relevant here. Currently there is no documentation for using metrics merging with the Prometheus Operator pattern (direct or federated). As a Prometheus Operator user I would add a ServiceMonitor for each of my services exposing custom metrics, but if mTLS is strict this will error. What we need are instructions on how to convert from the ServiceMonitor pattern to the metrics merging pattern, which I suspect is as simple as annotating our pods with the Prometheus scrape annotations and using the sample PodMonitor?

Also, I'd be interested in the pros/cons of federating Prometheus when using the operator pattern. By the looks of it, if you want to run Kiali from the production-ready Prometheus instance, all the metrics would need federating anyway?

@stevehipwell do you want to open a separate issue for the documentation you mention? that might enable better tracking/resolution. there is another open issue about mTLS configuration with the latest releases of Istio that might also be worth tracking.

@stevehipwell
Contributor

@douglas-reid which other issue were you referring to?

@psibi

psibi commented Apr 13, 2021

As a Prometheus Operator user I would add a ServiceMonitor for each of my services exposing custom metrics, but if mTLS is strict this will error. What we need are instructions on how to convert from the ServiceMonitor pattern to the metrics merging pattern, which I suspect is as simple as annotating our pods with the Prometheus scrape annotations and using the sample PodMonitor?

I just tested this out and it doesn't work with prometheus-operator. @douglas-reid Do we have any docs on how to make it work with something like the ServiceMonitor pattern? Or does the metrics merging pattern not work with prometheus-operator? Either way, having that information on the official pages would be very helpful.

@stevehipwell
Contributor

@psibi what doesn't work? Currently metric merging works fine although it'd be really useful if there was an original_container or similar label added so we could get metrics for specific containers.

@psibi

psibi commented Apr 13, 2021

@stevehipwell This is what my testing setup looks like:

  • I have my own Prometheus instance running (via prometheus-operator), and it is not part of the istio service mesh. I have also applied the ServiceMonitor and PodMonitor present here: https://raw.githubusercontent.com/istio/istio/release-1.9/samples/addons/extras/prometheus-operator.yaml
  • I have created a new namespace with istio injection enabled. (My istio setup has metrics merging enabled.)
  • The namespace also has strict mTLS enabled.
  • Now I deployed an application in it which exposes some application-specific prometheus metrics.

Now when I go to the Prometheus dashboard, I can't see any application metrics. Note that I was able to get it working if I make these changes:

  • Use permissive mTLS
  • Add a service or pod monitor for the application

So my question is: is there a way to make it work with strict mTLS when Prometheus is not part of the service mesh?

@stevehipwell
Contributor

@psibi you want to look at metrics merging and the sample code (although that might need modifying). Basically, once merging is enabled the proxy serves the merged metrics without mTLS and the sample PodMonitor collects them; you don't set ServiceMonitors for your own pods.
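
A minimal sketch of what such a PodMonitor could look like is below. This is not the sample file; the security.istio.io/tlsMode label selector, port 15020 and the /stats/prometheus path are assumptions about the default sidecar setup, in which the istio-agent serves the merged metrics in plaintext.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: istio-merged-metrics             # hypothetical name
  namespace: istio-system
spec:
  selector:
    matchLabels:
      security.istio.io/tlsMode: istio   # only pods with an injected sidecar (assumption)
  namespaceSelector:
    any: true
  podMetricsEndpoints:
  - path: /stats/prometheus              # merged app + Envoy metrics served by the istio-agent
    targetPort: 15020                    # agent status port, served in plaintext outside mTLS
    interval: 15s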

@psibi

psibi commented Apr 13, 2021

@stevehipwell Thanks, I have enabled metrics merging in my istio setup, although I don't see any sample code for metrics merging (the sample code seems to be for option 2).

Basically, once merging is enabled the proxy serves the merged metrics without mTLS and the sample PodMonitor collects them; you don't set ServiceMonitors for your own pods.

Yes, I understand that. The only problem is that I don't get my application metrics. But I do seem to get istio metrics fine in strict mTLS. So I'm wondering if I'm missing something.

@stevehipwell
Contributor

@psibi what are the settings you're using for metrics merging? Specifically the PodMonitor, the istio config, and the application annotations.

@psibi

psibi commented Apr 14, 2021

@stevehipwell I figured out the issue: it was caused by missing annotations in my deployment. Thanks for the help!
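
For anyone hitting the same symptom: the annotations in question are the standard Prometheus scrape hints on the pod template, which the sidecar injector reads to set up metrics merging. A hypothetical deployment sketch follows (names, port and path are illustrative; use whatever your app actually serves):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                          # hypothetical application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        prometheus.io/scrape: "true"    # opt in to scraping / merging
        prometheus.io/port: "8080"      # the app's own metrics port (illustrative)
        prometheus.io/path: "/metrics"  # the app's own metrics path (illustrative)
    spec:
      containers:
      - name: my-app
        image: example.com/my-app:latest   # hypothetical image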

@stevehipwell
Contributor

Glad to hear you've got it sorted now @psibi.

@istio-policy-bot added the lifecycle/stale label Jul 13, 2021
Prioritization automation moved this from P2 to Done Jul 28, 2021
@istio-policy-bot

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2021-01-13. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.

@istio-policy-bot added the lifecycle/automatically-closed label Jul 28, 2021