Skip to content

Conversation

@simonpasquier
Copy link

@simonpasquier simonpasquier commented Jan 24, 2020

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 24, 2020
@simonpasquier simonpasquier requested a review from rh-max January 24, 2020 14:53
@simonpasquier
Copy link
Author

@rh-max friendly ping

Copy link
Contributor

@rh-max rh-max left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonpasquier Looks fine, although I suggest to change the order of the steps a bit.

…pgrade

Reorder steps in the etcd monitoring configuration procedure
@simonpasquier
Copy link
Author

@rh-max please have a look again. The bug associated to this update has been verified by QE.

@rh-max
Copy link
Contributor

rh-max commented Feb 28, 2020

@simonpasquier Looks fine. Should I ask for merging this?

@simonpasquier
Copy link
Author

@rh-max yes please

@rh-max
Copy link
Contributor

rh-max commented Mar 9, 2020

@ahardin-rh Could you please review and merge? Thank you.

@simonpasquier
Copy link
Author

@rh-max @ahardin-rh friendly ping :)

openshift.io/component: etcd
openshift.io/control-plane: "true"
----
`openshift_cluster_monitoring_operator_etcd_enabled`
Copy link

@juzhao juzhao Apr 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we only set openshift_cluster_monitoring_operator_etcd_enabled=true without setting the kube-etcd-client-certs secret first, we would get error, the prometheus pod will not become running
$ oc -n openshift-monitoring get pod | grep prometheus-k8s prometheus-k8s-0 0/4 ContainerCreating 0 8m
`$ oc -n openshift-monitoring describe pod prometheus-k8s-0
Events:
Type Reason Age From Message


Normal Scheduled 1m default-scheduler Successfully assigned openshift-monitoring/prometheus-k8s-0 to juzhao-311-node-1
Warning FailedMount 2s (x8 over 1m) kubelet, juzhao-311-node-1 MountVolume.SetUp failed for volume "secret-kube-etcd-client-certs" : secrets "kube-etcd-client-certs" not found
`
since cluster_monitoring_operator is installed by default, and ectd monitoring is not enabled by default, we don't need to change the enable etcd monitoring part. All we need to do is just explain how to keep the ectd monitoring configuation alongside with the OCP upgrade. steps are

  1. enable ectd monitoring first in your cluster
  2. add openshift_cluster_monitoring_operator_etcd_enabled, openshift_cluster_monitoring_operator_etcd_hosts(If you run etcd on separate host) to inventory file
  3. upgrade OCP to new version
    and the etcd monitoring configuration is kept after upgrade

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juzhao not sure to follow your comment. The procedure described here is to setup the etcd monitoring and setting up the kube-etcd-client-certs secret is explained below.
As you wrote, it is assumed that etcd monitoring stays enabled after 3.11.x upgrades but IMO, it doesn't need to be documented.

Copy link

@juzhao juzhao Apr 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonpasquier this is the doc,
https://github.com/openshift/openshift-docs/blob/9bbfadf1d284416b0a21a524e28461ef183e1c75/install_config/monitoring/configuring-etcd-monitoring.adoc
step 2 and step 3, what should we do if we added the two ansible parameters?
followed the steps, we can not get to step 7
Click Status, then Targets. If you see an etcd entry, etcd is being monitored
since monitoring is installed by default and ectd monitoring is disabled by default, I don't think we need to change ectd monitoring configuation part, only mention how to use the two added ansible parameters is fine, see doc in https://bugzilla.redhat.com/show_bug.cgi?id=1808386

@vikram-redhat
Copy link
Contributor

@simonpasquier if you have addressed @juzhao's comments, can you please squash your commits and we can then merge it.

@simonpasquier
Copy link
Author

Closing since it's been superseded by #21136

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants