Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update prometheus config and add mixer dashboard #2853

Merged
merged 2 commits into from
Jan 26, 2018

Conversation

douglas-reid
Copy link
Contributor

@douglas-reid douglas-reid commented Jan 25, 2018

This PR adds a new alpha-quality Mixer component dashboard to the set of grafana
dashboards that Istio provides out-of-the-box. This dashboard uses a combination
of envoy-generated and Mixer-generated metrics to allow monitoring of the performance
of Istio Mixer.

As part of configuring the dashboard, kubernetes service discovery configuration
was added to the default prometheus scraping config (provides data like cpu, etc.,
as well as supporting HA Mixer configurations that run multiple Mixer replicas.).

This PR also updates the pilot template to expose the monitoring port that was added
in previous PRs to start the process of exposing those metrics.

At the moment, the Mixer dashboard provides monitoring of resource usage (cpu, memory, disk), health information (cluster membership, etc.), adapter and config overview, and breakdowns of metrics per adapter. Improvements / additions to the dashboard will follow in subsequent PRs.

image

@douglas-reid
Copy link
Contributor Author

cc: @jeffmendoza for prom config comments (and grafana setup).

Copy link
Contributor

@geeknoid geeknoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No LGTM from me since I don't know about any of this stuff, but it sure looks cool!

Copy link
Contributor

@jeffmendoza jeffmendoza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@nmittler nmittler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm for the pilot port

@istio-merge-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nmittler
We suggest the following additional approver: frankbu

Assign the PR to them by writing /assign @frankbu in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

Copy link
Contributor

@guptasu guptasu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just did a quick pass on the dashboard content json. Looks nice. Minor NIT comments

"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Server Errors (4xxs)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Server error (5xxs) ? and the other one "non non-success (4xxs) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Updated.

"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Config Resolution Count",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider clarifying this. It is unclear to me if this means number of resolved rules, or some internal async resolution success count. I think this is number of eventual rules whose match clause evaluated to true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the number of total times Resolve() is called in Mixer. It is distinct from the number of resolved rules.

"height": "40",
"panels": [
{
"content": "<center><h2>Adapters and Config</h2></center>",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious: should we split config and adapter section ? Is there specific reason you think grouping these is useful ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just a collection point for high-level, Mixer-specific metrics. When the rest of the runtime2 metrics come online, I think we should split these out into separate sections.

@guptasu
Copy link
Contributor

guptasu commented Jan 26, 2018

LGTM

@douglas-reid
Copy link
Contributor Author

@guptasu can i have a /lgtm ?

@douglas-reid douglas-reid merged commit fcf441d into istio:master Jan 26, 2018
rshriram added a commit that referenced this pull request Jan 26, 2018
rshriram added a commit that referenced this pull request Jan 26, 2018
* Revert "Update prometheus config and add mixer dashboard (#2853)"

This reverts commit fcf441d.

* Revert "RBAC adapter implementation (#2743)"

This reverts commit c4628a5.

* Revert "renabling problematic .circleci path in codeowners (#2891)"

This reverts commit d3cbb43.

* Update CODEOWNERS
@@ -12,6 +12,7 @@ ENV GF_AUTH_ANONYMOUS_ORG_ROLE=Admin

COPY ./start.sh /start.sh
COPY ./grafana-dashboard.json /grafana-dashboard.json
COPY ./mixer-dashboard.json /mixer-dashboard.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be aware that if you add or remove a file from a Dockerfile then there's likely a corresponding update to make in the top-level Makefile (in this case adding a file to the GRAFANA_FILES list). This doesn't yet cause a build breakage but eventually will.

@douglas-reid douglas-reid deleted the component-monitoring branch June 18, 2018 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants