Update etcd monitoring procedure #19286
|
@rh-max friendly ping |
rh-max
left a comment
@simonpasquier Looks fine, although I suggest changing the order of the steps a bit.
…pgrade Reorder steps in the etcd monitoring configuration procedure
|
@rh-max please have another look. The bug associated with this update has been verified by QE. |
|
@simonpasquier Looks fine. Should I ask for merging this? |
|
@rh-max yes please |
|
@ahardin-rh Could you please review and merge? Thank you. |
|
@rh-max @ahardin-rh friendly ping :) |
openshift.io/component: etcd
openshift.io/control-plane: "true"
----
`openshift_cluster_monitoring_operator_etcd_enabled`
If we only set openshift_cluster_monitoring_operator_etcd_enabled=true without creating the kube-etcd-client-certs secret first, we get an error and the prometheus pod never reaches the Running state:
$ oc -n openshift-monitoring get pod | grep prometheus-k8s
prometheus-k8s-0 0/4 ContainerCreating 0 8m
`$ oc -n openshift-monitoring describe pod prometheus-k8s-0
Events:
Type Reason Age From Message
Normal Scheduled 1m default-scheduler Successfully assigned openshift-monitoring/prometheus-k8s-0 to juzhao-311-node-1
Warning FailedMount 2s (x8 over 1m) kubelet, juzhao-311-node-1 MountVolume.SetUp failed for volume "secret-kube-etcd-client-certs" : secrets "kube-etcd-client-certs" not found
`
Since cluster_monitoring_operator is installed by default and etcd monitoring is not enabled by default, we don't need to change the part about enabling etcd monitoring. All we need to do is explain how to keep the etcd monitoring configuration across an OCP upgrade. The steps are:
- enable etcd monitoring first in your cluster
- add openshift_cluster_monitoring_operator_etcd_enabled and openshift_cluster_monitoring_operator_etcd_hosts (if you run `etcd` on separate hosts) to the inventory file
- upgrade OCP to the new version
The etcd monitoring configuration is then kept after the upgrade.
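A minimal sketch of a pre-check for the FailedMount error shown above, assuming `oc` is already logged in to the cluster. The function name `etcd_secret_present` is hypothetical; the namespace and secret name are taken from the pod events in this comment.

```shell
#!/bin/sh
# Sketch only: confirm the etcd client certificate secret exists before
# setting openshift_cluster_monitoring_operator_etcd_enabled=true.
# etcd_secret_present is a hypothetical helper name, not part of any tool.
etcd_secret_present() {
    # `oc get secret` exits non-zero when the secret does not exist,
    # so the function's exit status reflects whether it was found.
    oc -n openshift-monitoring get secret kube-etcd-client-certs >/dev/null 2>&1
}

# Example use (commented out so that sourcing this file has no side effects):
# if etcd_secret_present; then
#     echo "kube-etcd-client-certs found"
# else
#     echo "kube-etcd-client-certs missing: prometheus-k8s pods would fail to mount it"
# fi
```

If the check fails, the prometheus-k8s pods would be expected to stay in ContainerCreating, as in the output above.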
@juzhao I'm not sure I follow your comment. The procedure described here is for setting up etcd monitoring, and setting up the kube-etcd-client-certs secret is explained below.
As you wrote, it is assumed that etcd monitoring stays enabled after 3.11.x upgrades, but IMO it doesn't need to be documented.
@simonpasquier this is the doc:
https://github.com/openshift/openshift-docs/blob/9bbfadf1d284416b0a21a524e28461ef183e1c75/install_config/monitoring/configuring-etcd-monitoring.adoc
In step 2 and step 3, what should we do if we have added the two Ansible parameters?
Following the steps, we cannot get to step 7:
Click Status, then Targets. If you see an etcd entry, etcd is being monitored
Since monitoring is installed by default and etcd monitoring is disabled by default, I don't think we need to change the etcd monitoring configuration part; mentioning how to use the two added Ansible parameters is enough. See the doc in https://bugzilla.redhat.com/show_bug.cgi?id=1808386
|
@simonpasquier if you have addressed @juzhao's comments, can you please squash your commits and we can then merge it. |
|
Closing since it's been superseded by #21136 |
xref
openshift/openshift-ansible#12084
https://bugzilla.redhat.com/show_bug.cgi?id=1703032