OSDOCS-796: Add description of metrics exported by Machine Config Daemon #18787
@juzhao @mike-nguyen @miabbott I owned this epic for MCO and worked with @rh-max on these docs. If you have any questions, LMK.
Thanks, please help to review the docs.
Are there any updates on this? Is this doc OK from a QE point of view?
@kikisdeliveryservice is the right person to review the doc; please reply to @vikram-redhat.
`$ oc logs -f -n openshift-machine-config-operator machine-config-daemon-<hash> -c machine-config-daemon`
`$ journalctl -u pivot.service`
It should be noted that this command should be run on the nodes themselves after using something like oc debug node/<node>.
|Logs kubelet health failures. *
|This is expected to be empty, with a failure count of 0. If the failure count exceeds 2, the error indicates that the threshold is exceeded. This indicates a possible issue with the health of the kubelet. For further investigation, see the logs by running:
`$ journalctl -u kubelet`
Same comment as above; use oc debug node/<node> to access the node first.
Agreed with Micah's comments. Otherwise LGTM.
Thanks @miabbott, implemented the feedback.
sheriff-rh left a comment:
Looks great! Just one suggestion.
Beginning with {product-title} 4.3, the Machine Config Daemon provides a set of metrics. These metrics can be accessed using the Prometheus Cluster Monitoring stack.
The following table describes this set of metrics. Note that:
I recommend a [NOTE] here, instead of a bulleted list.
Thanks @sheriff-rh. I'd rather keep it like this, since those details closely pertain to the table and are not a side note.
Changed my mind and going with your suggestion. Replaced the two bullet points with two [NOTE]s.
|Logs errors encountered during pivot. *
|Pivot errors might prevent OS upgrades from proceeding. For further investigation, run these commands to access the node and see its logs:
`$ oc debug node/<node>`
This is out of order. If we do oc debug node and then try to do oc logs, I don't think that will work.
The explicit commands would look like:
$ oc debug node/ip-10-0-139-75.us-west-2.compute.internal
$ chroot /host
$ journalctl -u pivot.service
This connects to the node, chroots to the node's filesystem, then runs the journalctl command on the node.
Changed accordingly. So how should we describe the difference between the logs from oc logs and those from journalctl? I want to be more explicit than just "You can also run:".
The oc logs command is only going to fetch the logs from the particular container in the pod (machine-config-daemon). It's something that you want to run from your workstation, not from a node on the cluster.
The oc debug node route gets you on the host itself and using journalctl is going to give you the full contents of the journal on the node. You could find the machine-config-daemon logs in that journal, but they will be mixed in with the rest of the node logs and therefore harder to sift through.
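To illustrate the "harder to sift through" point, here is a minimal sketch of filtering mixed journal output down to one component. The `only_matching` helper is hypothetical, the input lines are made-up stand-ins rather than real node output, and it assumes the daemon's entries contain the literal text machine-config-daemon, which may not hold on every node.

```shell
#!/bin/sh
# Hypothetical filter (not part of oc or journalctl): keep only the lines
# read from stdin that contain the given fixed string.
only_matching() {
  grep -F -- "$1"
}

# Stand-in journal lines, invented for illustration only.
# On a node, the input would come from journalctl instead of printf.
printf '%s\n' \
  'example kubelet line' \
  'example machine-config-daemon line' \
  | only_matching machine-config-daemon
```

With oc logs no such filtering is needed, because that command only returns output from the one container.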
so it would look like:
$ oc debug node/ip-10-0-139-75.us-west-2.compute.internal
% chroot /host
% journalctl -u pivot.service
You can also run
$ oc logs -f ....
@kikisdeliveryservice has the right idea: using different prompts to indicate whether you are on your workstation or on the node itself. Thanks, Kirsten!
Thanks @miabbott and @kikisdeliveryservice! I've applied your feedback and improved the structure a bit. Does the result look good to you?
(We can't just opt into using % as the shell prompt in the docs, so I've separated the "before login" and "after login" steps with words.)
The user can also run oc debug node/<node> -- chroot /host journalctl -u pivot.service. With this one liner we won't need to distinguish between the different prompts.
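A minimal sketch of how that one-liner could be reused for both units discussed in this PR (pivot.service and kubelet). The `node_journal_cmd` helper name is hypothetical and not part of oc. It only prints the command, so it can be checked before being run against a cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or the MCO): prints the one-liner
# for a given node name and systemd unit instead of executing it.
node_journal_cmd() {
  node="$1"
  unit="$2"
  printf 'oc debug node/%s -- chroot /host journalctl -u %s\n' "$node" "$unit"
}

# Print the commands for the two units mentioned in this PR.
# "<node>" is a placeholder to replace with a real node name.
node_journal_cmd "<node>" pivot.service
node_journal_cmd "<node>" kubelet
```

Because the whole command runs from the workstation, no second prompt style is needed in the docs.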
Thanks @mike-nguyen, I changed the commands to your simplified command. Since the feedback is implemented and I hear no NOACKs, I'm asking @sheriff-rh to merge this.
sheriff-rh left a comment:
LGTM, thanks Max! Shoot me a message if you need a merge.
@rh-max is this ready for merge?
Hey Andrew @sheriff-rh, could you please merge this one and cherry-pick it into
/cherrypick enterprise-4.3
@sheriff-rh: new pull request created: #19162
@rh-max multiple commits.
@vikram-redhat Sorry, forgot to squash.
…gather link We have internal must-gather docs, so use them instead of pointing out to whatever happens to be on GitHub. The link landed pointing at GitHub in 5a70595 (Add description of metrics exported by Machine Config Daemon, 2019-12-20, openshift#18787).
OSDOCS: https://issues.redhat.com/browse/OSDOCS-796