Merged
@@ -6,6 +6,7 @@
[id="about-accessing-monitoring-web-service-apis_{context}"]
= About accessing monitoring web service APIs

[role="_abstract"]
You can directly access web service API endpoints from the command line for the following monitoring stack components:

* Prometheus
@@ -6,6 +6,7 @@
[id="about-creating-alerting-rules-for-user-defined-projects_{context}"]
= Creating alerting rules for user-defined projects

[role="_abstract"]
In {ocp}, you can create alerting rules for user-defined projects. Those alerting rules will trigger alerts based on the values of the chosen metrics.

If you create alerting rules for a user-defined project, consider the following key behaviors and important limitations when you define the new rules:
1 change: 1 addition & 0 deletions modules/monitoring-about-managing-alerts.adoc
@@ -6,6 +6,7 @@
[id="about-managing-alerts_{context}"]
= Managing alerts

[role="_abstract"]
In {ocp}, the Alerting UI enables you to manage alerts, silences, and alerting rules.

* *Alerting rules*. Alerting rules contain a set of conditions that outline a particular state within a cluster. Alerts are triggered when those conditions are true. An alerting rule can be assigned a severity that defines how the alerts are routed.
1 change: 1 addition & 0 deletions modules/monitoring-about-monitoring-dashboards.adoc
@@ -6,6 +6,7 @@
[id="mon-dashboards-adm-perspective_{context}"]
= Monitoring dashboards in the Administrator perspective

[role="_abstract"]
Use the *Administrator* perspective to access dashboards for the core {ocp} components, including the following items:

* API performance
@@ -6,6 +6,7 @@
[id="about-specifying-limits-and-requests-for-monitoring-components_{context}"]
= About specifying limits and requests for monitoring components

[role="_abstract"]
You can configure resource limits and requests for the following core platform monitoring components:

* Alertmanager
@@ -6,6 +6,7 @@
[id="accessing-alerting-rules-for-your-project_{context}"]
= Accessing alerting rules for user-defined projects

[role="_abstract"]
To list alerting rules for a user-defined project, you must have been assigned the `monitoring-rules-view` cluster role for the project.

.Prerequisites
1 change: 1 addition & 0 deletions modules/monitoring-accessing-metrics-outside-cluster.adoc
@@ -6,6 +6,7 @@
[id="accessing-metrics-from-outside-cluster_{context}"]
= Accessing metrics from outside the cluster for custom applications

[role="_abstract"]
You can query Prometheus metrics from outside the cluster when monitoring your own services with user-defined projects. Access this data from outside the cluster by using the `thanos-querier` route.

This access only supports using a bearer token for authentication.
5 changes: 1 addition & 4 deletions modules/monitoring-accessing-the-alerting-ui.adoc
@@ -4,14 +4,10 @@
// * logging/logging_alerts/log-storage-alerts.adoc

:_mod-docs-content-type: PROCEDURE

// The ultimate solution DOES NOT NEED separate IDs and titles, it is just needed for now so that the tests will not break

// tag::ADM[]
[id="monitoring-accessing-the-alerting-ui-adm_{context}"]
= Accessing the Alerting UI from the Administrator perspective
// end::ADM[]

// tag::DEV[]
[id="monitoring-accessing-the-alerting-ui-dev_{context}"]
= Accessing the Alerting UI from the Developer perspective
@@ -27,6 +23,7 @@
:perspective: Developer
// end::DEV[]

[role="_abstract"]
The Alerting UI is accessible through the *{perspective}* perspective of the {ocp} web console.

// tag::ADM[]
@@ -6,6 +6,7 @@
[id="accessing-a-monitoring-web-service-api_{context}"]
= Accessing a monitoring web service API

[role="_abstract"]
The following example shows how to query the service API receivers for the Alertmanager service used in core platform monitoring.
You can use a similar method to access the `prometheus-k8s` service for core platform Prometheus and the `thanos-ruler` service for Thanos Ruler.

@@ -19,6 +19,7 @@
:component: alertmanager
// end::UWM[]

[role="_abstract"]
You can add secrets to the Alertmanager configuration by editing the `{configmap-name}` config map in the `{namespace-name}` project.

After you add a secret to the config map, the secret is mounted as a volume at `/etc/alertmanager/secrets/<secret_name>` within the `alertmanager` container for the Alertmanager pods.
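
The following is a minimal sketch for the core platform variant. It assumes that `{configmap-name}` resolves to `cluster-monitoring-config`, that `{namespace-name}` resolves to `openshift-monitoring`, and that the component key is `alertmanagerMain`. The secret name `example-receiver-auth` is hypothetical.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      secrets:
      # Hypothetical secret name; the secret must already exist in the same project.
      - example-receiver-auth
```

Under these assumptions, the secret would be mounted at `/etc/alertmanager/secrets/example-receiver-auth` in the `alertmanager` container.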
@@ -6,6 +6,7 @@
[id="adding-cluster-id-labels-to-metrics_{context}"]
= Adding cluster ID labels to metrics

[role="_abstract"]
If you manage multiple {ocp} clusters and use the remote write feature to send metrics data from these clusters to an external storage location, you can add cluster ID labels to identify the metrics data coming from different clusters. You can then query these labels to identify the source cluster for a metric and distinguish that data from similar metrics data sent by other clusters.

This way, if you manage many clusters for multiple customers and send metrics data to a single centralized storage system, you can use cluster ID labels to query metrics for a particular cluster or customer.
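
As a rough illustration, a remote write configuration that adds a cluster ID label could look like the following sketch. It assumes the core platform `cluster-monitoring-config` config map and the `prometheusK8s` component key. The endpoint URL and the target label name `cluster_id` are placeholders chosen for this example.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      # Placeholder endpoint; replace with your remote storage URL.
      - url: "https://remote-write-endpoint.example.com"
        writeRelabelConfigs:
        # Copy the temporary cluster ID source label into a persistent label.
        - sourceLabels:
          - __tmp_openshift_cluster_id__
          targetLabel: cluster_id
          action: replace
```

Metrics arriving at the remote endpoint would then carry a `cluster_id` label that you can filter on in queries.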
@@ -18,6 +18,7 @@
:component: thanosRuler
// end::UWM[]

[role="_abstract"]
// tag::CPM[]
You can assign tolerations to any of the monitoring stack components to enable moving them to tainted nodes.
// end::CPM[]
@@ -19,6 +19,7 @@
:component: prometheus
// end::UWM[]

[role="_abstract"]
You can attach custom labels to all time series and alerts leaving Prometheus by using the external labels feature of Prometheus.
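
A minimal sketch of an external labels configuration follows. It assumes the core platform `cluster-monitoring-config` config map and the `prometheusK8s` component key. For user-defined projects the config map and key differ. The label names and values are hypothetical.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      externalLabels:
        # Hypothetical labels added to every time series and alert leaving Prometheus.
        region: eu
        environment: prod
```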

.Prerequisites
@@ -9,6 +9,7 @@
:FeatureName: Metrics collection profile
include::snippets/technology-preview.adoc[]

[role="_abstract"]
To choose a metrics collection profile for core {ocp} monitoring components, edit the `cluster-monitoring-config` `ConfigMap` object.
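
For illustration, the edited object might resemble the following sketch. It assumes that the profile is set through a `collectionProfile` key under `prometheusK8s` and that `minimal` is an accepted value. Verify both against the documentation for your version, because this feature is a Technology Preview.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      # Assumed key and value; check the supported profiles for your release.
      collectionProfile: minimal
```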

.Prerequisites
1 change: 1 addition & 0 deletions modules/monitoring-common-terms.adoc
@@ -6,6 +6,7 @@
[id="monitoring-common-terms_{context}"]
= Glossary of common terms for {ocp} monitoring

[role="_abstract"]
This glossary defines common terms that are used in {ocp} monitoring.

Alertmanager::
@@ -6,6 +6,7 @@
[id="components-for-monitoring-user-defined-projects_{context}"]
= Components for monitoring user-defined projects

[role="_abstract"]
{ocp}
ifndef::openshift-dedicated,openshift-rosa[]
{product-version}
1 change: 1 addition & 0 deletions modules/monitoring-configurable-monitoring-components.adoc
@@ -22,6 +22,7 @@
:thanos: thanosRuler
// end::UWM[]

[role="_abstract"]
This table shows the monitoring components you can configure and the keys used to specify the components in the `{configmap-name}` config map.

// tag::UWM[]
@@ -19,6 +19,7 @@
:component: thanosRuler
// end::UWM[]

[role="_abstract"]
To use a persistent volume (PV) for monitoring components, you must configure a persistent volume claim (PVC).
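
A PVC for a monitoring component is typically declared through a `volumeClaimTemplate` entry in the monitoring config map. The following sketch assumes the core platform `cluster-monitoring-config` config map and the `prometheusK8s` component. The storage class name and requested size are placeholders.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          # Placeholder storage class; use one that exists in your cluster.
          storageClassName: example-storage-class
          resources:
            requests:
              # Placeholder size; choose a value that fits your retention needs.
              storage: 40Gi
```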

.Prerequisites
1 change: 1 addition & 0 deletions modules/monitoring-configuring-alert-routing-console.adoc
@@ -7,6 +7,7 @@
[id="configuring-alert-routing-console_{context}"]
= Configuring alert routing with the {ocp} web console

[role="_abstract"]
You can configure alert routing through the {ocp} web console to ensure that you learn about important issues with your cluster.

[NOTE]
@@ -6,6 +6,7 @@
[id="configuring-alert-routing-default-platform-alerts_{context}"]
= Configuring alert routing for default platform alerts

[role="_abstract"]
You can configure Alertmanager to send notifications so that you receive important alerts from your cluster. Customize where and how Alertmanager sends notifications about default platform alerts by editing the default configuration in the `alertmanager-main` secret in the `openshift-monitoring` namespace.

[NOTE]
@@ -6,6 +6,7 @@
[id="configuring-alert-routing-for-user-defined-projects_{context}"]
= Configuring alert routing for user-defined projects

[role="_abstract"]
If you are a non-administrator user who has been given the `alert-routing-edit` cluster role, you can create or edit alert routing for user-defined projects.

.Prerequisites
@@ -6,6 +6,7 @@
[id="configuring-alert-routing-user-defined-alerts-secret_{context}"]
= Configuring alert routing for user-defined projects with the Alertmanager secret

[role="_abstract"]
If you have enabled a separate instance of Alertmanager that is dedicated to user-defined alert routing, you can customize where and how the instance sends notifications by editing the `alertmanager-user-workload` secret in the `openshift-user-workload-monitoring` namespace.

[NOTE]
@@ -6,6 +6,7 @@
[id="configuring-audit-logs-for-metrics-server_{context}"]
= Configuring audit logs for Metrics Server

[role="_abstract"]
You can configure audit logs for Metrics Server to help you troubleshoot issues with the server.
Audit logs record the sequence of actions in a cluster. They can record user, application, or control plane activities.

@@ -6,6 +6,7 @@
[id="configuring-different-alert-receivers-for-default-platform-alerts-and-user-defined-alerts_{context}"]
= Configuring different alert receivers for default platform alerts and user-defined alerts

[role="_abstract"]
You can configure different alert receivers for default platform alerts and user-defined alerts to ensure the following results:

* All default platform alerts are sent to a receiver owned by the team in charge of these alerts.
1 change: 1 addition & 0 deletions modules/monitoring-configuring-external-alertmanagers.adoc
@@ -21,6 +21,7 @@
:component-name: Thanos Ruler
// end::UWM[]

[role="_abstract"]
The {ocp} monitoring stack includes a local Alertmanager instance that routes alerts from Prometheus.

// tag::CPM[]
@@ -9,6 +9,7 @@
:FeatureName: Metrics collection profile
include::snippets/technology-preview.adoc[]

[role="_abstract"]
By default, Prometheus collects metrics exposed by all default metrics targets in {ocp} components.
However, you might want Prometheus to collect fewer metrics from a cluster in certain scenarios:

1 change: 1 addition & 0 deletions modules/monitoring-configuring-persistent-storage.adoc
@@ -6,6 +6,7 @@
[id="configuring-persistent-storage_{context}"]
= Configuring persistent storage

[role="_abstract"]
Run cluster monitoring with persistent storage to gain the following benefits:

* Protect your metrics and alerting data from data loss by storing them in a persistent volume (PV). As a result, they can survive pods being restarted or recreated.
@@ -23,6 +23,7 @@
:label: thanos-ruler
// end::UWM[]

[role="_abstract"]
You can configure pod topology spread constraints for
// tag::CPM[]
all the pods deployed by the {cmo-full}
1 change: 1 addition & 0 deletions modules/monitoring-configuring-remote-write-storage.adoc
@@ -18,6 +18,7 @@
:component: prometheus
// end::UWM[]

[role="_abstract"]
You can configure remote write storage to enable Prometheus to send ingested metrics to remote systems for long-term storage. Doing so has no impact on how or for how long Prometheus stores metrics.
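
A minimal remote write configuration might look like the following sketch, which assumes the core platform `cluster-monitoring-config` config map and the `prometheusK8s` component key. The endpoint URL is a placeholder, and authentication settings are omitted for brevity.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      # Placeholder endpoint; most real endpoints also need authentication settings.
      - url: "https://remote-write-endpoint.example.com"
```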

.Prerequisites
@@ -6,6 +6,7 @@
[id="monitoring-configuring-secrets-for-alertmanager_{context}"]
= Configuring secrets for Alertmanager

[role="_abstract"]
The {ocp} monitoring stack includes Alertmanager, which routes alerts from Prometheus to endpoint receivers.
If you need to authenticate with a receiver so that Alertmanager can send alerts to it, you can configure Alertmanager to use a secret that contains authentication credentials for the receiver.

@@ -6,6 +6,7 @@
[id="controlling-the-impact-of-unbound-attributes-in-user-defined-projects_{context}"]
= Controlling the impact of unbound metrics attributes in user-defined projects

[role="_abstract"]
Developers can create labels to define attributes for metrics in the form of key-value pairs. The number of potential key-value pairs corresponds to the number of possible values for an attribute. An attribute that has an unlimited number of potential values is called an unbound attribute. For example, a `customer_id` attribute is unbound because it has an infinite number of possible values.

Every assigned key-value pair has a unique time series. The use of many unbound attributes in labels can result in an exponential increase in the number of time series created. This can impact Prometheus performance and can consume a lot of disk space.
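
One possible mitigation is to cap the number of samples accepted per scrape for user-defined projects. The sketch below assumes the `user-workload-monitoring-config` config map and an `enforcedSampleLimit` key under `prometheus`. The limit value is an arbitrary example, not a recommendation.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      # Arbitrary example value; scrapes that exceed the limit are rejected.
      enforcedSampleLimit: 50000
```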
@@ -6,6 +6,7 @@
[id="creating-alerting-rules-for-user-defined-projects_{context}"]
= Creating alerting rules for user-defined projects

[role="_abstract"]
You can create alerting rules for user-defined projects. Those alerting rules will trigger alerts based on the values of the chosen metrics.
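
As a sketch, an alerting rule for a user-defined project can be expressed as a `PrometheusRule` object. The project name `ns1`, the rule and alert names, and the `version` metric exposed by a job named `prometheus-example-app` are all hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alert
  namespace: ns1          # hypothetical user-defined project
spec:
  groups:
  - name: example
    rules:
    - alert: VersionAlert # hypothetical alert name
      # Fire only after the condition has held for one minute.
      for: 1m
      # Hypothetical metric and job; the alert fires when the value equals 0.
      expr: version{job="prometheus-example-app"} == 0
      labels:
        severity: warning
```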

[NOTE]
@@ -18,6 +18,7 @@
:component: prometheus
// end::UWM[]

[role="_abstract"]
You can create cluster ID labels for metrics by adding the `write_relabel` settings for remote write storage in the `{configmap-name}` config map in the `{namespace-name}` namespace. By adding a cluster ID label, you can uniquely identify metrics and track them consistently across clusters and workloads.

ifndef::openshift-dedicated,openshift-rosa[]
@@ -6,6 +6,7 @@
[id="creating-cluster-monitoring-configmap_{context}"]
= Creating a cluster monitoring config map

[role="_abstract"]
You can configure the core {ocp} monitoring components by creating and updating the `cluster-monitoring-config` config map in the `openshift-monitoring` project. The {cmo-first} then configures the core components of the monitoring stack.
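
For reference, a starting point for the config map could look like the following sketch. The `config.yaml` body is intentionally empty here; component settings are added beneath it later.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  # Monitoring component settings go inside this block.
  config.yaml: |
```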

.Prerequisites
@@ -6,6 +6,7 @@
[id="creating-cross-project-alerting-rules-for-user-defined-projects_{context}"]
= Creating cross-project alerting rules for user-defined projects

[role="_abstract"]
You can create alerting rules that are not bound to their project of origin by configuring a project in the `user-workload-monitoring-config` config map. The `PrometheusRule` objects created in these projects are then applicable to all projects.

Therefore, you can have generic alerting rules that apply to multiple user-defined projects instead of having individual `PrometheusRule` objects in each user project. You can filter which projects are included or excluded from the alerting rule by using PromQL queries in the `PrometheusRule` object.
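
The project configuration described above might be sketched as follows. This assumes that the relevant key is `namespacesWithoutLabelEnforcement` at the top level of `config.yaml`; confirm the key name for your version. The project name `ns1` is hypothetical.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    # Assumed key; rules created in the listed projects are not bound to those projects.
    namespacesWithoutLabelEnforcement: [ ns1 ]
```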
1 change: 1 addition & 0 deletions modules/monitoring-creating-new-alerting-rules.adoc
@@ -6,6 +6,7 @@
[id="creating-new-alerting-rules_{context}"]
= Creating new alerting rules

[role="_abstract"]
As a cluster administrator, you can create new alerting rules based on platform metrics.
These alerting rules trigger alerts based on the values of chosen metrics.

1 change: 1 addition & 0 deletions modules/monitoring-creating-scrape-sample-alerts.adoc
@@ -6,6 +6,7 @@
[id="creating-scrape-sample-alerts_{context}"]
= Creating scrape sample alerts

[role="_abstract"]
You can create alerts that notify you when:

* The target cannot be scraped or is not available for the specified `for` duration
1 change: 1 addition & 0 deletions modules/monitoring-default-monitoring-components.adoc
@@ -6,6 +6,7 @@
[id="default-monitoring-components_{context}"]
= Default monitoring components

[role="_abstract"]
By default, the {ocp} {product-version} monitoring stack includes the following components:

.Default monitoring stack components
1 change: 1 addition & 0 deletions modules/monitoring-default-monitoring-targets.adoc
@@ -6,6 +6,7 @@
[id="default-monitoring-targets_{context}"]
= Default monitoring targets

[role="_abstract"]
ifndef::openshift-dedicated,openshift-rosa[]
In addition to the components of the stack itself, the default monitoring stack monitors additional platform components.

1 change: 1 addition & 0 deletions modules/monitoring-deploying-a-sample-service.adoc
@@ -6,6 +6,7 @@
[id="deploying-a-sample-service_{context}"]
= Deploying a sample service

[role="_abstract"]
To test monitoring of a service in a user-defined project, you can deploy a sample service.

.Prerequisites
@@ -7,6 +7,7 @@
[id="determining-why-prometheus-is-consuming-disk-space_{context}"]
= Determining why Prometheus is consuming a lot of disk space

[role="_abstract"]
Developers can create labels to define attributes for metrics in the form of key-value pairs. The number of potential key-value pairs corresponds to the number of possible values for an attribute. An attribute that has an unlimited number of potential values is called an unbound attribute. For example, a `customer_id` attribute is unbound because it has an infinite number of possible values.

Every assigned key-value pair has a unique time series. The use of many unbound attributes in labels can result in an exponential increase in the number of time series created. This can impact Prometheus performance and can consume a lot of disk space.
@@ -6,6 +6,7 @@
[id="disabling-cross-project-alerting-rules-for-user-defined-projects_{context}"]
= Disabling cross-project alerting rules for user-defined projects

[role="_abstract"]
Creating cross-project alerting rules for user-defined projects is enabled by default. Cluster administrators can disable the capability in the `cluster-monitoring-config` config map for the following reasons:

* To prevent user-defined monitoring from overloading the cluster monitoring stack.
@@ -6,6 +6,7 @@
[id="disabling-monitoring-for-user-defined-projects_{context}"]
= Disabling monitoring for user-defined projects

[role="_abstract"]
After enabling monitoring for user-defined projects, you can disable it again by setting `enableUserWorkload: false` in the cluster monitoring `ConfigMap` object.
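
For illustration, the relevant part of the object might look like the following sketch, assuming the cluster monitoring config map is `cluster-monitoring-config` in the `openshift-monitoring` project.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # Turns off monitoring for user-defined projects.
    enableUserWorkload: false
```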

[NOTE]
1 change: 1 addition & 0 deletions modules/monitoring-disabling-the-local-alertmanager.adoc
@@ -6,6 +6,7 @@
[id="monitoring-disabling-the-local-alertmanager_{context}"]
= Disabling the local Alertmanager

[role="_abstract"]
A local Alertmanager that routes alerts from Prometheus instances is enabled by default in the `openshift-monitoring` project of the {ocp} monitoring stack.

If you do not need the local Alertmanager, you can disable it by configuring the `cluster-monitoring-config` config map in the `openshift-monitoring` project.
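
A sketch of such a configuration follows. It assumes that the local Alertmanager is controlled by an `enabled` field under the `alertmanagerMain` key.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      # Disables the local Alertmanager instance in openshift-monitoring.
      enabled: false
```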
5 changes: 1 addition & 4 deletions modules/monitoring-editing-silences.adoc
@@ -3,19 +3,16 @@
// * observability/monitoring/managing-alerts.adoc

:_mod-docs-content-type: PROCEDURE

// The ultimate solution DOES NOT NEED separate IDs and titles, it is just needed for now so that the tests will not break

// tag::ADM[]
[id="editing-silences-adm_{context}"]
= Editing silences from the Administrator perspective
// end::ADM[]

// tag::DEV[]
[id="editing-silences-dev_{context}"]
= Editing silences from the Developer perspective
// end::DEV[]

[role="_abstract"]
You can edit a silence, which expires the existing silence and creates a new one with the changed configuration.

.Prerequisites
@@ -6,6 +6,7 @@
[id="enabling-a-separate-alertmanager-instance-for-user-defined-alert-routing_{context}"]
= Enabling a separate Alertmanager instance for user-defined alert routing

[role="_abstract"]
ifndef::openshift-rosa,openshift-dedicated[]
In some clusters, you might want to deploy a dedicated Alertmanager instance for user-defined projects, which can help reduce the load on the default platform Alertmanager instance and can better separate user-defined alerts from default platform alerts.
endif::[]
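
As a hedged sketch, a dedicated Alertmanager instance for user-defined projects might be enabled as follows. This assumes the `user-workload-monitoring-config` config map in the `openshift-user-workload-monitoring` project and the `alertmanager` component key with the fields shown.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    alertmanager:
      # Deploys a separate Alertmanager instance for user-defined projects.
      enabled: true
      # Assumed field; allows users to define alert routing with AlertmanagerConfig objects.
      enableAlertmanagerConfig: true
```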