-
Notifications
You must be signed in to change notification settings - Fork 7.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update prometheus config and add mixer dashboard #2853
Conversation
cc: @jeffmendoza for prom config comments (and grafana setup). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No LGTM from me since I don't know about any of this stuff, but it sure looks cool!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm for the pilot port
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: nmittler Assign the PR to them by writing The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just did a quick pass on the dashboard content json. Looks nice. Minor NIT comments
"thresholds": [], | ||
"timeFrom": null, | ||
"timeShift": null, | ||
"title": "Server Errors (4xxs)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Server error (5xxs) ? and the other one "non non-success (4xxs) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Updated.
"thresholds": [], | ||
"timeFrom": null, | ||
"timeShift": null, | ||
"title": "Config Resolution Count", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider clarifying this. It is unclear to me if this means number of resolved rules, or some internal async resolution success count. I think this is number of eventual rules whose match clause evaluated to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is the number of total times Resolve() is called in Mixer. It is distinct from the number of resolved rules.
"height": "40", | ||
"panels": [ | ||
{ | ||
"content": "<center><h2>Adapters and Config</h2></center>", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious: should we split config and adapter section ? Is there specific reason you think grouping these is useful ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just a collection point for high-level, Mixer-specific metrics. When the rest of the runtime2 metrics come online, I think we should split these out into separate sections.
LGTM |
@guptasu can i have a /lgtm ? |
@@ -12,6 +12,7 @@ ENV GF_AUTH_ANONYMOUS_ORG_ROLE=Admin | |||
|
|||
COPY ./start.sh /start.sh | |||
COPY ./grafana-dashboard.json /grafana-dashboard.json | |||
COPY ./mixer-dashboard.json /mixer-dashboard.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be aware that if you add or remove a file from a Dockerfile then there's likely a corresponding update to make in the top-level Makefile (in this case adding a file to the GRAFANA_FILES list). This doesn't yet cause a build breakage but eventually will.
This PR adds a new alpha-quality Mixer component dashboard to the set of grafana
dashboards that Istio provides out-of-the-box. This dashboard uses a combination
of envoy-generated and Mixer-generated metrics to allow monitoring of the performance
of Istio Mixer.
As part of configuring the dashboard, kubernetes service discovery configuration
was added to the default prometheus scraping config (provides data like cpu, etc.,
as well as supporting HA Mixer configurations that run multiple Mixer replicas.).
This PR also updates the pilot template to expose the monitoring port that was added
in previous PRs to start the process of exposing those metrics.
At the moment, the Mixer dashboard provides monitoring of resource usage (cpu, memory, disk), health information (cluster membership, etc.), adapter and config overview, and breakdowns of metrics per adapter. Improvements / additions to the dashboard will follow in subsequent PRs.