Update etcd monitoring procedure #19286

simonpasquier · 2020-01-24T14:52:03Z

xref
openshift/openshift-ansible#12084
https://bugzilla.redhat.com/show_bug.cgi?id=1703032

simonpasquier · 2020-01-30T12:49:02Z

@rh-max friendly ping

rh-max

@simonpasquier Looks fine, although I suggest changing the order of the steps a bit.


simonpasquier · 2020-02-28T15:08:45Z

@rh-max please have a look again. The bug associated with this update has been verified by QE.

rh-max · 2020-02-28T16:07:09Z

@simonpasquier Looks fine. Should I ask for this to be merged?

simonpasquier · 2020-03-09T13:28:07Z

@rh-max yes please

rh-max · 2020-03-09T13:33:50Z

@ahardin-rh Could you please review and merge? Thank you.

simonpasquier · 2020-04-03T09:24:55Z

@rh-max @ahardin-rh friendly ping :)

install_config/monitoring/configuring-etcd-monitoring.adoc

juzhao · 2020-04-15T08:25:43Z


-          openshift.io/component: etcd
-          openshift.io/control-plane: "true"
----
+`openshift_cluster_monitoring_operator_etcd_enabled`


If we only set `openshift_cluster_monitoring_operator_etcd_enabled=true` without first creating the `kube-etcd-client-certs` secret, we get an error: the Prometheus pod never reaches the Running state.
```
$ oc -n openshift-monitoring get pod | grep prometheus-k8s
prometheus-k8s-0   0/4   ContainerCreating   0   8m

$ oc -n openshift-monitoring describe pod prometheus-k8s-0
Events:
  Type     Reason       Age              From                        Message
  Normal   Scheduled    1m               default-scheduler           Successfully assigned openshift-monitoring/prometheus-k8s-0 to juzhao-311-node-1
  Warning  FailedMount  2s (x8 over 1m)  kubelet, juzhao-311-node-1  MountVolume.SetUp failed for volume "secret-kube-etcd-client-certs" : secrets "kube-etcd-client-certs" not found
```
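Because the Prometheus pod cannot mount a missing `kube-etcd-client-certs` secret, the secret has to exist before `openshift_cluster_monitoring_operator_etcd_enabled=true` takes effect. As a minimal Python sketch (not the documented procedure), the function below builds such a Secret manifest from local certificate files. The function name `build_etcd_client_secret` and the data key names `etcd-client-ca.crt`, `etcd-client.crt` and `etcd-client.key` are assumptions for illustration; check them against the etcd monitoring procedure before using the result.

```python
import base64
from pathlib import Path


def build_etcd_client_secret(ca_path, cert_path, key_path,
                             namespace="openshift-monitoring"):
    """Build a kube-etcd-client-certs Secret manifest as a plain dict.

    ASSUMPTION: the data key names below are illustrative; verify them
    against the etcd monitoring procedure before applying the manifest.
    """
    files = {
        "etcd-client-ca.crt": ca_path,   # CA that signed the etcd certificates
        "etcd-client.crt": cert_path,    # etcd client certificate
        "etcd-client.key": key_path,     # etcd client private key
    }
    # Values under a Secret's "data" field must be base64-encoded strings.
    data = {
        key: base64.b64encode(Path(path).read_bytes()).decode("ascii")
        for key, path in files.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": "kube-etcd-client-certs", "namespace": namespace},
        "data": data,
    }
```

Assuming the key names match, the dict can be serialized with `json.dumps()` into a file and applied with `oc apply -f`, since `oc apply` accepts JSON manifests as well as YAML.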
Since cluster_monitoring_operator is installed by default and etcd monitoring is not enabled by default, we don't need to change the part about enabling etcd monitoring. All we need to do is explain how to keep the etcd monitoring configuration across an OCP upgrade. The steps are:

1. Enable etcd monitoring in your cluster first.
2. Add `openshift_cluster_monitoring_operator_etcd_enabled` and, if you run etcd on separate hosts, `openshift_cluster_monitoring_operator_etcd_hosts` to the inventory file.
3. Upgrade OCP to the new version.

The etcd monitoring configuration is then kept after the upgrade.
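The inventory part of these steps can be checked before running the upgrade. The following is a minimal Python sketch, assuming a classic INI inventory in which cluster-wide variables live under `[OSEv3:vars]`; the helper names are hypothetical, and treating only `true`/`yes` (any case) as enabled is a simplification of Ansible's boolean parsing.

```python
# Variable names taken from the upgrade steps; the hosts variable is only
# needed when etcd runs on separate hosts.
ETCD_ENABLED_VAR = "openshift_cluster_monitoring_operator_etcd_enabled"
ETCD_HOSTS_VAR = "openshift_cluster_monitoring_operator_etcd_hosts"


def read_osev3_vars(inventory_text):
    """Collect key=value pairs from the [OSEv3:vars] section of an INI inventory."""
    variables = {}
    in_vars = False
    for raw in inventory_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            in_vars = line == "[OSEv3:vars]"  # entering or leaving the vars section
            continue
        if in_vars and "=" in line:
            key, _, value = line.partition("=")  # split on the first "=" only
            variables[key.strip()] = value.strip()
    return variables


def check_etcd_monitoring_vars(inventory_text, separate_etcd_hosts=False):
    """Return the names of etcd monitoring variables that are missing or not enabled."""
    variables = read_osev3_vars(inventory_text)
    problems = []
    # Simplification: only "true"/"yes" (case-insensitive) count as enabled.
    if variables.get(ETCD_ENABLED_VAR, "").lower() not in ("true", "yes"):
        problems.append(ETCD_ENABLED_VAR)
    if separate_etcd_hosts and ETCD_HOSTS_VAR not in variables:
        problems.append(ETCD_HOSTS_VAR)
    return problems
```

An empty list from `check_etcd_monitoring_vars(inventory_text, separate_etcd_hosts=True)` means both variables are present in the inventory text; the value format expected for the hosts variable is not checked.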

simonpasquier · 2020-04-22

@juzhao I'm not sure I follow your comment. The procedure described here sets up etcd monitoring, and creating the `kube-etcd-client-certs` secret is explained further down.
As you wrote, etcd monitoring is assumed to stay enabled across 3.11.x upgrades, but IMO that doesn't need to be documented.

juzhao · 2020-04-23

@simonpasquier this is the doc:
https://github.com/openshift/openshift-docs/blob/9bbfadf1d284416b0a21a524e28461ef183e1c75/install_config/monitoring/configuring-etcd-monitoring.adoc
For steps 2 and 3, what should we do if we have already added the two Ansible parameters?
Following the steps, we cannot get to step 7, "Click Status, then Targets. If you see an etcd entry, etcd is being monitored."
Since monitoring is installed by default and etcd monitoring is disabled by default, I don't think we need to change the etcd monitoring configuration part; it is enough to mention how to use the two new Ansible parameters. See the doc in https://bugzilla.redhat.com/show_bug.cgi?id=1808386
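The check in step 7 (an etcd entry under Status, then Targets) can also be done against the Prometheus HTTP API, whose `GET /api/v1/targets` endpoint returns active targets with their `labels` and `health`. A minimal Python sketch over the decoded JSON response follows; the helper name `etcd_targets_up` is hypothetical, and the job label value `etcd` is an assumption, so adjust it to whatever job name the Targets page actually shows.

```python
def etcd_targets_up(targets_response, job="etcd"):
    """Return (up, total) for active targets whose job label equals `job`.

    `targets_response` is the decoded JSON body of Prometheus's
    GET /api/v1/targets endpoint. ASSUMPTION: etcd targets use job="etcd".
    """
    active = targets_response.get("data", {}).get("activeTargets", [])
    matching = [t for t in active if t.get("labels", {}).get("job") == job]
    up = sum(1 for t in matching if t.get("health") == "up")
    return up, len(matching)
```

A result of `(0, 0)` corresponds to the missing etcd entry described above, while `(n, n)` means every etcd target is being scraped successfully. How the JSON is fetched (for example through the `prometheus-k8s` route with a bearer token) depends on the cluster's authentication setup and is left out.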

vikram-redhat · 2020-06-21T04:19:32Z

@simonpasquier if you have addressed @juzhao's comments, can you please squash your commits so that we can merge it?

simonpasquier · 2020-06-24T12:59:00Z

Closing since it's been superseded by #21136

Update etcd monitoring procedure

ec9dbcb

openshift-ci-robot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) Jan 24, 2020

simonpasquier requested a review from rh-max January 24, 2020 14:53

Restructure the etcd monitoring configuration procedure

bbf5b7b

rh-max suggested changes Jan 30, 2020


Merge pull request #1 from rh-max/simonpasquier/fix-etcd-monitoring-u…

fa68424

…pgrade Reorder steps in the etcd monitoring configuration procedure

juzhao reviewed Apr 15, 2020


install_config/monitoring/configuring-etcd-monitoring.adoc

juzhao reviewed Apr 15, 2020


Fix 'oc get pods' output

9bbfadf

simonpasquier closed this Jun 24, 2020
