helm: meta-monitoring #2068

Merged: @krajorama merged 4 commits into main from dimitar/helm-chart-metamonitoring on Jun 24, 2022

@dimitarvdimitrov (Contributor) commented on Jun 9, 2022

What this PR does

Aims to make it easier to monitor Mimir/GEM when it is deployed via the Helm chart. The goal is to make the process as streamlined as possible, so that an operator spends minimal time configuring scrape and relabelling configs.

This PR achieves this by vendoring the Grafana Agent Operator Helm chart. The mimir-distributed chart creates custom resources that in turn create two Grafana Agents: one that scrapes metrics and one that collects logs. Optionally, the chart can also create resources that scrape the relevant metrics from cadvisor, kubelet, and kube-state-metrics, because these metrics are used in the alerts and dashboards.
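To make the architecture above more concrete, here is a rough sketch of the kind of Grafana Agent Operator resources involved on the metrics side: a `GrafanaAgent` selects a `MetricsInstance`, which in turn selects ServiceMonitors and remote-writes the scraped samples. The resource names, labels, service account, and remote-write URL are hypothetical, and this is not the exact set of manifests the mimir-distributed chart renders. The logs side follows the same pattern with `LogsInstance` and `PodLogs` resources.

```yaml
# Sketch only: names, labels, service account, and URL are hypothetical placeholders.
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: meta-monitoring-metrics
spec:
  serviceAccountName: meta-monitoring-agent
  metrics:
    # Run every MetricsInstance that carries this label.
    instanceSelector:
      matchLabels:
        agent: meta-monitoring-metrics
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  name: meta-monitoring
  labels:
    agent: meta-monitoring-metrics
spec:
  # Placeholder Prometheus remote-write endpoint (e.g. a separate Mimir or Grafana Cloud).
  remoteWrite:
    - url: https://prometheus.example.com/api/v1/push
  # An empty namespace selector matches ServiceMonitors in all namespaces.
  serviceMonitorNamespaceSelector: {}
  # Only ServiceMonitors with this (hypothetical) label are scraped.
  serviceMonitorSelector:
    matchLabels:
      app.kubernetes.io/name: mimir
```

Splitting metrics and logs into separate agents keeps each pipeline independently configurable, which matches the two-agent setup described in the PR.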

Which issue(s) this PR fixes or relates to

Fixes #2014

Checklist

  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@Logiraptor (Contributor) left a comment:

Good job! I was able to send metrics from a local Mimir to my Grafana Cloud account, but I found a few rough edges along the way. Can you take a look?

metaMonitoring:
  grafanaAgent:
    # -- Controls whether to create PodLogs, MetricsInstance, LogsInstance, and GrafanaAgent CRs to scrape the
    # ServiceMonitors of the chart and ship metrics and logs to the remote endpoints below.
@dimitarvdimitrov (Contributor Author) replied:

It doesn't work with the defaults. And yes, the ServiceMonitors need to be enabled. I will clarify this in the docs.
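A minimal values override reflecting this reply might look like the sketch below. Only `metaMonitoring.grafanaAgent` appears in the snippet above; the `metaMonitoring.serviceMonitor.enabled`, `grafanaAgent.enabled`, `installOperator`, and `remote.url` keys are assumptions to verify against the chart's `values.yaml`, and both endpoint URLs are placeholders.

```yaml
# Hypothetical values override; check every key name against the chart's values.yaml.
metaMonitoring:
  serviceMonitor:
    # ServiceMonitors must be enabled, otherwise the agent has nothing to scrape.
    enabled: true
  grafanaAgent:
    # Create the PodLogs, MetricsInstance, LogsInstance, and GrafanaAgent CRs.
    enabled: true
    # Assumed toggle for installing the vendored Grafana Agent Operator.
    installOperator: true
    metrics:
      remote:
        # Placeholder Prometheus remote-write endpoint.
        url: https://prometheus.example.com/api/v1/push
    logs:
      remote:
        # Placeholder Loki push endpoint.
        url: https://loki.example.com/loki/api/v1/push
```

The point of the sketch is that the defaults alone are not enough: both the ServiceMonitors and the agent resources have to be switched on, and the agent needs somewhere to send the data.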

@dimitarvdimitrov force-pushed the dimitar/helm-chart-metamonitoring branch from ca9e934 to 801d0d7 on June 17, 2022 18:33
@trevorwhitney (Contributor) commented:

@dimitarvdimitrov, this PR was very inspiring. I just put up something similar for Loki: grafana/helm-charts#1514

@dimitarvdimitrov (Contributor Author) commented:

I realized there is no ServiceMonitor for the overrides-exporter; I will add one.

#2125 adds a new helper for ServiceMonitors. I will wait for it to be merged before adding one here; it looks to be trivial.
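For reference, a ServiceMonitor for the overrides-exporter would look roughly like the following. This is a hand-written sketch, not the output of the helper from #2125; the resource name, namespace, labels, and the port name `http-metrics` are assumptions and must match the labels and ports of the actual overrides-exporter Service.

```yaml
# Sketch only: name, namespace, labels, and port name are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mimir-overrides-exporter
  labels:
    app.kubernetes.io/name: mimir
spec:
  # Look for the Service only in the namespace Mimir is installed into.
  namespaceSelector:
    matchNames:
      - mimir
  # Select the overrides-exporter Service by its component label.
  selector:
    matchLabels:
      app.kubernetes.io/component: overrides-exporter
  endpoints:
    # Scrape the named Service port on the standard metrics path.
    - port: http-metrics
      path: /metrics
      interval: 15s
```

Generating this from a shared helper, as #2125 proposes, keeps the labels and ports consistent across all Mimir components instead of repeating a manifest like this for each one.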

@krajorama (Contributor) left a comment:

LGTM. My remaining comments are mostly nitpicks that we can deal with separately if you want.

@Logiraptor (Contributor) left a comment:

This time I was able to send metrics and logs to Grafana Cloud, and to send metrics to a local Mimir, so everything appears to be working.

I found one last UX bug.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@dimitarvdimitrov force-pushed the dimitar/helm-chart-metamonitoring branch from dbb7977 to 2a67ba3 on June 23, 2022 14:00
@dimitarvdimitrov mentioned this pull request on Jun 23, 2022
@krajorama merged commit 00bc6b8 into main on Jun 24, 2022
@krajorama deleted the dimitar/helm-chart-metamonitoring branch on June 24, 2022 14:35
rlex added a commit to rlex/mimir that referenced this pull request Jun 28, 2022
* main: (63 commits)
  Add new section on website for links to blog posts, podcasts and talks. (grafana#2216)
  Rename codified errors to errors catalog (grafana#2256)
  Helm: add a step to contributing doc (grafana#2257)
  Signal that 2.2 release is now in progress. (grafana#2254)
  Removed migration of alertmanager local state files from old hierarchy (Cortex 1.8 and earlier) (grafana#2253)
  operations/mimir: Change multi_zone_ingester_max_unavailable to 25 (grafana#2251)
  Helm: weekly release (grafana#2252)
  Jsonnet: Configure ingester max global metadata per user and per metric (grafana#2250)
  Helm: metamonitor naming (grafana#2236)
  Mimir documentation about out-of-order (grafana#2183)
  Vendor latest mimir-prometheus/main (grafana#2243)
  Set CODEOWNERS to primary technical writer (grafana#2242)
  Use BasicLifecycler for distributors and auto-forget (grafana#2154)
  Docs: Basic documentation for deploying the ruler using jsonnet. (grafana#2127)
  Fix post merge reviews on 2187 (grafana#2230)
  Add tests for user metadata in the ingester (grafana#2184)
  Change the error message template for per-tenant limits (grafana#2234)
  helm: meta-monitoring (grafana#2068)
  Article about migrating from Consul to memberlist. Added documentation for /memberlist endpoint. (grafana#2166)
  Update runbooks to mention possibility to investigate memberlist KV store in various alerts (grafana#2158)
  ...
masonmei pushed a commit to udmire/mimir that referenced this pull request Jul 11, 2022
* Add meta-monitoring

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Successfully merging this pull request may close these issues.

Include a working metamonitoring setup in helm chart
5 participants