WIP: Minor corrections to Master/Seed MLA stack docs #1653

Open
wants to merge 1 commit into
base: main
@@ -5,27 +5,27 @@ date = 2018-08-17T12:07:15+02:00
weight = 10
+++

The Master / Seed Cluster MLA (Monitoring Logging & Alerting) stack monitors KKP components running in the KKP master and seed clusters, including control plane components of the user clusters. Unlike the [User Cluster MLA stack]({{< ref "../user-cluster/">}}) it does not monitor applications running in the user clusters. Only KKP administrators can access this monitoring data.
The Master / Seed Cluster MLA (Monitoring Logging & Alerting) stack monitors KKP components running in the KKP master and seed clusters, including control plane components of the user clusters. Unlike the [User Cluster MLA stack]({{< ref "../user-cluster/">}}), it does not monitor applications running in the user clusters. Only KKP administrators can access this monitoring data.

It uses [Prometheus](https://prometheus.io) and its [Alertmanager](https://prometheus.io/docs/alerting/alertmanager/) for monitoring and alerting. The logging stack consists of Promtail and [Grafana Loki](https://grafana.com/oss/loki/). Dashboarding is done with [Grafana](https://grafana.com).
It uses [Prometheus](https://prometheus.io) and its [Alertmanager](https://prometheus.io/docs/alerting/alertmanager/) for monitoring and alerting. The logging stack consists of [Promtail](https://grafana.com/docs/loki/latest/send-data/promtail/) and [Grafana Loki](https://grafana.com/oss/loki/). Visualization is done with [Grafana](https://grafana.com) dashboards.

## Overview

There is a single Prometheus service in each seed cluster's `monitoring` namespace, which is responsible for monitoring the cluster's components (like the KKP controller manager) and serves as the main datasource for the accompanying Grafana service. Besides that there is a Prometheus inside each user cluster namespace, which in turn monitors the Kubernetes control plane (apiserver, controller manager, etcd cluster etc.) of that customer cluster. The seed-level Prometheus scrapes all customer-cluster Prometheus instances and combines their metrics for creating the dashboards in Grafana.
There is a single Prometheus service in each seed cluster's `monitoring` namespace, which is responsible for monitoring the cluster's components (like the KKP controller manager) and serves as the main datasource for the accompanying Grafana service. Besides that, there is a Prometheus instance inside each user cluster namespace, which in turn monitors the Kubernetes control plane (apiserver, controller manager, etcd cluster, etc.) of that user cluster. The seed-level Prometheus scrapes all user cluster Prometheus instances and combines their metrics to create the dashboards in Grafana.

Along the seed-level Prometheus, there is a single alertmanager running in the seed, which *all* Prometheus instances are using to relay their alerts (i.e. the Prometheus inside the customer clusters send their alerts to the seed cluster's alertmanager).
Alongside the seed-level Prometheus, there is a single Alertmanager running in the seed, which *all* Prometheus instances use to relay their alerts (i.e. the Prometheus instances inside the user clusters send their alerts to the seed cluster's Alertmanager).
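
To illustrate the alert relay, a Prometheus instance points at an Alertmanager through the `alerting` section of its configuration. The following is only a minimal sketch: the service address is a hypothetical placeholder, and the configuration that KKP actually generates may use service discovery instead of a static target.

```yaml
# Sketch of a Prometheus configuration fragment, not the configuration KKP ships.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # Hypothetical in-cluster address of the seed's Alertmanager.
            - alertmanager.monitoring.svc.cluster.local:9093
```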

![Monitoring architecture diagram](/img/kubermatic/main/monitoring/architecture/architecture.png)

## Federation

The seed-level Prometheus uses Prometheus' native federation mechanism to scrape the customer Prometheus instances. To prevent excessive amount of data in the seed, it will however only scrape a few selected metrics, namely those labelled with `kubermatic=federate`.
The seed-level Prometheus uses Prometheus' native federation mechanism to scrape the user cluster Prometheus instances. To prevent an excessive amount of data in the seed, however, it only scrapes a few selected metrics, namely those labelled with `kubermatic=federate`.

The last of these options is used for pre-aggregated metrics, which combine highly detailed time series (like from etcd) into smaller, easier to handle metrics that can be readily used inside Grafana.
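
As a rough illustration of the federation described above, a scrape job on the seed-level Prometheus could select only the labelled series through the `/federate` endpoint. This is a sketch under assumptions: the job name and target address are hypothetical, and the real KKP configuration may discover targets differently.

```yaml
# Sketch of a federation scrape job; names and targets are placeholders.
scrape_configs:
  - job_name: user-cluster-federation   # hypothetical job name
    honor_labels: true                   # keep the labels set by the scraped Prometheus
    metrics_path: /federate
    params:
      'match[]':
        # Only series carrying this label are pulled into the seed.
        - '{kubermatic="federate"}'
    static_configs:
      - targets:
          # Hypothetical address of one user cluster Prometheus.
          - prometheus.cluster-example.svc.cluster.local:9090
```

Setting `honor_labels: true` is the usual choice for federation, since it prevents the seed-level Prometheus from overwriting labels that the source instance already attached.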

## Grafana

In a default KKP installation we ship Grafana as *readonly* metrics dashboard.
In a default KKP installation, we ship Grafana as a *read-only* metrics dashboard.
When working with Grafana, please keep in mind that **ALL CHANGES** made using the Grafana UI (like adding datasources, etc.) **WILL NOT BE PERSISTED**. Dashboards, graphs, datasources, etc. are defined using the Helm chart.

## Storage Requirements
@@ -35,7 +35,7 @@ Depending on how user clusters are used, disk usage for Prometheus can vary grea
* 100 MiB used by the seed-level Prometheus for each user cluster
* 50-300 MiB used by the user-level Prometheus, depending on its WAL size.

These values can also vary if you tweak the retention periods.
These values also vary if you change the retention periods.

## Installation
Please follow the [Installation of the Master / Seed MLA Stack Guide]({{< relref "../../../tutorials-howtos/monitoring-logging-alerting/master-seed/installation/" >}}).
@@ -23,31 +23,28 @@ This guide assumes the following tools are available:

## Monitoring & Alerting Components

This chapter describes how to setup the Kubermatic Kubernetes Platform (KKP) master / seed monitoring & alerting components. It's highly recommended to install this
stack on the master and all seed clusters.
This chapter describes how to set up the Kubermatic Kubernetes Platform (KKP) master / seed monitoring & alerting components. It's highly recommended to install this stack on the master and all seed clusters.

It uses [Prometheus](https://prometheus.io) and its [Alertmanager](https://prometheus.io/docs/alerting/alertmanager/) for monitoring and alerting. Dashboarding is done with [Grafana](https://grafana.com). More information can be found in the [Architecture]({{< relref "../../../../architecture/monitoring-logging-alerting/master-seed/" >}}) document.
The stack uses the following components:

- [Prometheus](https://prometheus.io) for monitoring.
- [Alertmanager](https://prometheus.io/docs/alerting/alertmanager/) for alerting.
- [Grafana](https://grafana.com) for visualization using dashboards.

More information can be found in the [Architecture]({{< relref "../../../../architecture/monitoring-logging-alerting/master-seed/" >}}) document.

### Installation

As with KKP itself, it's recommended to use a single `values.yaml` to configure all Helm charts. There
are a few important options you might want to override for your setup:
As with KKP itself, it's recommended to use a single `values.yaml` to configure all Helm charts. There are a few important options you might want to override for your setup:

* `prometheus.host` is used for the external URL in Prometheus, e.g. `prometheus.kubermatic.example.com`.
* `alertmanager.host` is used for the external URL in Alertmanager, e.g. `alertmanager.kubermatic.example.com`.
* `prometheus.storageSize` (default: `100Gi`) controls the volume size for each Prometheus replica; this should
be large enough to hold all data as per your retention time (see next option). Long-term storage for Prometheus
* `prometheus.host` is used for the external URL in Prometheus, e.g. `prometheus.kkp.example.com`.
* `alertmanager.host` is used for the external URL in Alertmanager, e.g. `alertmanager.kkp.example.com`.
* `prometheus.storageSize` (default: `100Gi`) controls the volume size for each Prometheus replica; this should be large enough to hold all data as per your retention time (see next option). Long-term storage for Prometheus
blocks is provided by Thanos, an optional extension to the Prometheus chart.
* `prometheus.tsdb.retentionTime` (default: `15d`) controls how long metrics are stored in Prometheus before they
are deleted. Larger retention times require more disk space. Long-term storage is accomplished by Thanos, so the
retention time for Prometheus itself should not be set to extremely large values (like multiple months).
* `prometheus.ruleFiles` is a list of Prometheus alerting rule files to load. Depending on whether or not the
target cluster is a master or seed, the `/etc/prometheus/rules/kubermatic-master-*.yaml` entry should be removed
* `prometheus.tsdb.retentionTime` (default: `15d`) controls how long metrics are stored in Prometheus before they are deleted. Larger retention times require more disk space. Long-term storage is accomplished by Thanos, so the retention time for Prometheus itself should not be set to extremely large values (like multiple months).
* `prometheus.ruleFiles` is a list of Prometheus alerting rule files to load. Depending on whether or not the target cluster is a master or seed, the `/etc/prometheus/rules/kubermatic-master-*.yaml` entry should be removed
in order to not trigger bogus alerts.
* `prometheus.blackboxExporter.enabled` enables the integration between Prometheus and Blackbox Exporter, which is used to monitor the API endpoints of user clusters created on the seed. `prometheus.blackboxExporter.url` should be adjusted accordingly (the default value is `blackbox-exporter:9115`).
* `grafana.user` and `grafana.password` should be set with custom values if no identity-aware proxy is configured.
In this case, `grafana.provisioning.configuration.disable_login_form` should be set to `false` so that a manual
login is possible.
* `grafana.user` and `grafana.password` should be set to custom values if no identity-aware proxy is configured.
In this case, `grafana.provisioning.configuration.disable_login_form` should be set to `false` so that a manual login is possible.

An example `values.yaml` could look like this if all options mentioned above are customized:
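
As a minimal sketch only, and assuming the dotted option names above map to nested YAML keys as is usual for Helm values, such a file might be shaped as follows. All hostnames and credentials are placeholders, and the `ruleFiles` list is deliberately incomplete; consult the chart's defaults for the real entries.

```yaml
# Illustrative values.yaml sketch built only from the options listed above.
prometheus:
  host: prometheus.kkp.example.com
  storageSize: 100Gi        # default value
  tsdb:
    retentionTime: 15d      # default value
  ruleFiles:
    # Keep this entry only on a master cluster; remove it on pure seeds.
    # Other entries from the chart's defaults are omitted in this sketch.
    - /etc/prometheus/rules/kubermatic-master-*.yaml
  blackboxExporter:
    enabled: true
    url: blackbox-exporter:9115

alertmanager:
  host: alertmanager.kkp.example.com

grafana:
  # Placeholder credentials, relevant when no identity-aware proxy is configured.
  user: admin
  password: "change-me"
  provisioning:
    configuration:
      disable_login_form: false
```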

Expand Down Expand Up @@ -110,15 +107,16 @@ complete list of options.

## Logging Components

This chapter describes how to setup the Kubermatic Kubernetes Platform (KKP) master / seed logging components. It's highly recommended to install this
stack on the master and all seed clusters.
This chapter describes how to set up the Kubermatic Kubernetes Platform (KKP) master / seed logging components. It's highly recommended to install this stack on the master and all seed clusters.

The logging stack consists of Promtail and [Grafana Loki](https://grafana.com/oss/loki/). More information can be found in the [Architecture]({{< relref "../../../../architecture/monitoring-logging-alerting/master-seed/" >}}) document.
The logging stack consists of the following components:

- [Promtail](https://grafana.com/docs/loki/latest/send-data/promtail/)
- [Grafana Loki](https://grafana.com/oss/loki/)

More information can be found in the [Architecture]({{< relref "../../../../architecture/monitoring-logging-alerting/master-seed/" >}}) document.

### Requirements

The exact requirements for the stack depend highly on the expected cluster load; the following are the minimum
viable resources:
The exact requirements for the stack depend highly on the expected cluster load; the following are the minimum viable resources:

* 2 GB RAM
* 2 CPU cores