Closed
60 changes: 6 additions & 54 deletions install_config/monitoring/configuring-etcd-monitoring.adoc
@@ -7,7 +7,7 @@

If the `etcd` service does not run correctly, the operation of the whole {product-title} cluster is at risk. Therefore, it is advisable to configure monitoring of `etcd`.

Follow these steps to configure `etcd` monitoring:
To enable `etcd` monitoring:

.Procedure

@@ -27,68 +27,20 @@ node-exporter-b2mrp 2/2 Running 0
node-exporter-fd52p 2/2 Running 0 33m
node-exporter-hfqgv 2/2 Running 0 33m
prometheus-k8s-0 4/4 Running 1 35m
prometheus-k8s-1 0/4 ContainerCreating 0 21s
prometheus-k8s-1 4/4 Running 0 35m
prometheus-operator-6c9fddd47f-9jfgk 1/1 Running 0 36m
----

. Open the configuration file for the cluster monitoring stack:
. Set this variable to `true` in the Ansible inventory file:
+
----
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
----

. Under `config.yaml: |+`, add the `etcd` section.

.. If you run `etcd` in static pods on your master nodes, you can specify the `etcd` nodes using the selector:
+
----
...
data:
config.yaml: |+
...
etcd:
targets:
selector:
openshift.io/component: etcd
openshift.io/control-plane: "true"
----
`openshift_cluster_monitoring_operator_etcd_enabled`

juzhao, Apr 15, 2020

If we only set `openshift_cluster_monitoring_operator_etcd_enabled=true` without creating the `kube-etcd-client-certs` secret first, we get an error and the Prometheus pod never reaches the Running state:
`$ oc -n openshift-monitoring get pod | grep prometheus-k8s
prometheus-k8s-0   0/4   ContainerCreating   0   8m

$ oc -n openshift-monitoring describe pod prometheus-k8s-0
Events:
  Type     Reason       Age              From                        Message
  ----     ------       ----             ----                        -------
  Normal   Scheduled    1m               default-scheduler           Successfully assigned openshift-monitoring/prometheus-k8s-0 to juzhao-311-node-1
  Warning  FailedMount  2s (x8 over 1m)  kubelet, juzhao-311-node-1  MountVolume.SetUp failed for volume "secret-kube-etcd-client-certs" : secrets "kube-etcd-client-certs" not found
`
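The failure above shows up as a `prometheus-k8s` pod whose READY count stays below its container total (for example `0/4`) with a status other than `Running`. A minimal sketch of a check for that symptom, assuming the default column layout of `oc get pod` output (NAME, READY, STATUS, RESTARTS, AGE); the function name `prometheus_pods_ready` is hypothetical and is not part of `oc` or the monitoring stack:

```shell
# Hypothetical helper: reads `oc -n openshift-monitoring get pod` output on stdin
# and reports prometheus-k8s pods that are not fully ready and Running.
# Returns 0 only if at least one prometheus-k8s pod was seen and all are ready.
prometheus_pods_ready() {
  awk '
    # Only look at the Prometheus pods; the header row and other pods are skipped.
    $1 ~ /^prometheus-k8s-/ {
      seen = 1
      split($2, r, "/")   # READY column, e.g. "0/4" -> r[1] = 0, r[2] = 4
      if (r[1] != r[2] || $3 != "Running") {
        print "not ready: " $1 " (" $2 " " $3 ")"
        bad = 1
      }
    }
    END { exit ((bad || !seen) ? 1 : 0) }
  '
}
```

For example: `oc -n openshift-monitoring get pod | prometheus_pods_ready`. It only detects the symptom; the `FailedMount` event from `oc describe pod` is what points at the missing `kube-etcd-client-certs` secret.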
Since the cluster monitoring operator is installed by default and `etcd` monitoring is not enabled by default, we don't need to change the part that enables `etcd` monitoring. All we need to do is explain how to keep the `etcd` monitoring configuration across an OCP upgrade. The steps are:

  1. Enable `etcd` monitoring in your cluster first.
  2. Add `openshift_cluster_monitoring_operator_etcd_enabled` and, if you run `etcd` on separate hosts, `openshift_cluster_monitoring_operator_etcd_hosts` to the inventory file.
  3. Upgrade OCP to the new version.

The `etcd` monitoring configuration is then kept after the upgrade.
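Before step 3, one way to confirm that step 2 actually landed is to check the inventory for the variable. A minimal sketch, assuming an INI-format inventory with `key=value` lines (for example under `[OSEv3:vars]`); the helper `inventory_sets` is hypothetical, and it does not validate the format of `openshift_cluster_monitoring_operator_etcd_hosts`:

```shell
# Hypothetical helper: succeeds if the given INI-style Ansible inventory file
# sets VARIABLE to exactly VALUE (surrounding spaces ignored, comments skipped).
# Usage: inventory_sets INVENTORY_FILE VARIABLE VALUE
inventory_sets() {
  file=$1 var=$2 want=$3
  awk -v var="$var" -v want="$want" '
    # Skip comment lines starting with "#" or ";".
    /^[ \t]*[#;]/ { next }
    {
      # Split "key=value" on the first "=" only; lines without "=" are ignored.
      i = index($0, "=")
      if (i == 0) next
      key = substr($0, 1, i - 1)
      val = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == var && val == want) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$file"
}
```

For example, against Ansible's default inventory path: `inventory_sets /etc/ansible/hosts openshift_cluster_monitoring_operator_etcd_enabled true && echo "etcd monitoring stays enabled"`.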

Author

@juzhao I'm not sure I follow your comment. The procedure described here sets up `etcd` monitoring, and creating the `kube-etcd-client-certs` secret is explained below.
As you wrote, `etcd` monitoring is assumed to stay enabled after 3.11.x upgrades, but IMO that doesn't need to be documented.

juzhao, Apr 23, 2020

@simonpasquier this is the doc:
https://github.com/openshift/openshift-docs/blob/9bbfadf1d284416b0a21a524e28461ef183e1c75/install_config/monitoring/configuring-etcd-monitoring.adoc
For steps 2 and 3, what should we do if we add the two Ansible parameters?
Following the steps, we cannot get to step 7 ("Click Status, then Targets. If you see an etcd entry, etcd is being monitored").
Since monitoring is installed by default and `etcd` monitoring is disabled by default, I don't think we need to change the `etcd` monitoring configuration part; mentioning how to use the two added Ansible parameters is enough. See the doc in https://bugzilla.redhat.com/show_bug.cgi?id=1808386


.. If you run `etcd` on separate hosts, you need to specify the nodes using IP addresses:
. If you run `etcd` on separate hosts, set this variable in the Ansible inventory file to specify the nodes using IP addresses:
+
----
...
data:
config.yaml: |+
...
etcd:
targets:
ips:
- "127.0.0.1"
- "127.0.0.2"
- "127.0.0.3"
----
`openshift_cluster_monitoring_operator_etcd_hosts`
+
If the IP addresses for `etcd` nodes change, you must update this list.

. Verify that the `etcd` service monitor is now running:
+
----
$ oc -n openshift-monitoring get servicemonitor
NAME                  AGE
alertmanager          35m
etcd                  1m  <1>
kube-apiserver        36m
kube-controllers      36m
kube-state-metrics    34m
kubelet               36m
node-exporter         34m
prometheus            36m
prometheus-operator   37m
----
<1> The `etcd` service monitor.
+
It might take up to a minute for the `etcd` service monitor to start.

. Now you can navigate to the web interface to see more information about the status of `etcd` monitoring.

.. To get the URL, run:

@@ -40,6 +40,12 @@ The {product-title} Ansible `openshift_cluster_monitoring_operator` role configu
|`openshift_cluster_monitoring_operator_alertmanager_storage_class_name`
| If you enabled the `openshift_cluster_monitoring_operator_alertmanager_storage_enabled` option, set a specific StorageClass to ensure that pods are configured to use the `PVC` with that `storageclass`. Defaults to `none`, which applies the default storage class name.

|`openshift_cluster_monitoring_operator_etcd_enabled`
| Enables `etcd` monitoring. Defaults to `false`.

|`openshift_cluster_monitoring_operator_etcd_hosts`
| The list of IP addresses of the `etcd` hosts when `etcd` runs on separate nodes.

|===

[[monitoring-prerequisites]]
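The verification step that lists service monitors (`oc -n openshift-monitoring get servicemonitor`) can also be scripted. A minimal sketch, assuming the default output with a header row; both function names are hypothetical, and the 12 × 5-second polling budget is an arbitrary choice based on the note that the service monitor can take up to a minute to start. An existing `etcd` service monitor does not prove that Prometheus can scrape `etcd`; the Status > Targets page remains the check for that:

```shell
# Hypothetical helper: reads `oc -n openshift-monitoring get servicemonitor`
# output on stdin (header row assumed) and succeeds only if a service
# monitor named exactly "etcd" is listed.
has_etcd_servicemonitor() {
  awk 'NR > 1 && $1 == "etcd" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Hypothetical wrapper: polls every 5 seconds and gives up after 12 attempts
# (about one minute), since the service monitor can take that long to start.
wait_for_etcd_servicemonitor() {
  attempts=0
  until oc -n openshift-monitoring get servicemonitor | has_etcd_servicemonitor; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 12 ]; then
      echo "etcd service monitor not found" >&2
      return 1
    fi
    sleep 5
  done
}
```

Matching on the exact name keeps similarly named monitors (anything merely containing `etcd`) from producing a false positive.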