OSDOCS-796: Add description of metrics exported by Machine Config Daemon by rh-max · Pull Request #18787 · openshift/openshift-docs

rh-max · 2019-12-20T04:13:44Z

OSDOCS: https://issues.redhat.com/browse/OSDOCS-796

openshift-docs-preview-bot · 2019-12-20T15:21:50Z

The preview will be available shortly at:

https://enterprise-4.3-mco-daemon-metrics--ocpdocs.netlify.com/

juzhao · 2019-12-23T00:18:18Z

@mnguyen @miabbott
Is machine-config daemon owned by you? If so, please help to review the doc

kikisdeliveryservice · 2019-12-23T17:31:41Z

@juzhao @mike-nguyen @miabbott

I owned this epic for MCO and worked with @rh-max on these docs. If you have any questions, LMK.

juzhao · 2019-12-24T00:24:58Z

> @juzhao @mike-nguyen @miabbott
>
> I owned this epic for MCO and worked with @rh-max on these docs. If you have any questions, LMK.

Thanks, please help to review the docs

vikram-redhat · 2020-01-05T23:42:14Z

> @juzhao @mike-nguyen @miabbott
>
> I owned this epic for MCO and worked with @rh-max on these docs. If you have any questions, LMK.

Are there any updates on this? Is this doc ok from QE point of view?

juzhao · 2020-01-06T01:27:23Z

@kikisdeliveryservice is the right person to review the doc, please reply to @vikram-redhat

miabbott · 2020-01-06T15:52:50Z

modules/machine-config-daemon-metrics.adoc

+
+`$ oc logs -f -n openshift-machine-config-operator machine-config-daemon-<hash> -c machine-config-daemon`
+
+`$ journalctl -u pivot.service`


Note that this command must be run on the node itself, after accessing the node with something like oc debug node <node>.

miabbott · 2020-01-06T15:53:36Z

modules/machine-config-daemon-metrics.adoc

+|Logs kubelet health failures.  *
+|This is expected to be empty, with failure count of 0. If failure count exceeds 2, the error indicating threshold is exceeded. This indicates a possible issue with the health of the kubelet. For further investigation, see the logs by running:
+
+`$ journalctl -u kubelet`


Same comment as above; use oc debug node <node> to access the node first

mike-nguyen · 2020-01-07T16:15:37Z

Agreed with Micah's comments. Otherwise LGTM.

rh-max · 2020-01-09T15:00:34Z

Thanks @miabbott , implemented the feedback.
To clarify, @kikisdeliveryservice has reviewed this document before the PR was even created. Here, I was just asking for a QE review.
Thanks @kikisdeliveryservice , @miabbott , @juzhao , @mike-nguyen for your work on this! On its way to peer review & merging now.

sheriff-rh

Looks great! Just one suggestion.

sheriff-rh · 2020-01-09T15:16:18Z

modules/machine-config-daemon-metrics.adoc

+
+Beginning {product-title} 4.3, the Machine Config Daemon provides a set of metrics. These metrics can be accessed using the Prometheus Cluster Monitoring stack.
+
+The following table describes this set of metrics. Note that:


I recommend a [NOTE] here, instead of a bulleted list.

Thanks @sheriff-rh . I'd rather keep it like this, since those details closely pertain to the table and are not a side note.

I changed my mind and am going with your suggestion. I replaced the two bullet points with two [NOTE]s.

miabbott · 2020-01-09T15:40:38Z

modules/machine-config-daemon-metrics.adoc

+|Logs errors encountered during pivot. *
+|Pivot errors might prevent OS upgrades from proceeding. For further investigation, run these commands to access the node and see its logs:
+
+`$ oc debug node <node>`


This is out of order. If we do oc debug node and then try to do oc logs I don't think that will work.

The explicit commands would look like:

$ oc debug node/ip-10-0-139-75.us-west-2.compute.internal
$ chroot /host
$ journalctl -u pivot.service

This connects to the node, chroots into the node's filesystem, then runs the journalctl command on the node.

Changed accordingly. So how to describe the difference between logs from oc logs and from journalctl? I want to be more explicit than just "You can also run:".

The oc logs command is only going to fetch the logs from the particular container in the pod (machine-config-daemon). It's something that you want to run from your workstation, not from a node on the cluster.

The oc debug node route gets you on the host itself and using journalctl is going to give you the full contents of the journal on the node. You could find the machine-config-daemon logs in that journal, but they will be mixed in with the rest of the node logs and therefore harder to sift through.
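The two routes described above can be sketched side by side. This is a minimal illustration, not text from the PR: the function names and the `OC` override variable are hypothetical, the pod and node names must be supplied by the reader, and the `-f` (follow) flag is left out so the command returns instead of streaming.

```shell
#!/bin/sh
# OC is an illustrative override: set OC=echo to print the commands
# instead of running them against a cluster.
OC="${OC:-oc}"

# Route 1: run from the workstation. Fetches only the logs of the
# machine-config-daemon container in the given pod.
mcd_container_logs() {
    pod="$1"    # for example machine-config-daemon-<hash>
    "$OC" logs -n openshift-machine-config-operator "$pod" -c machine-config-daemon
}

# Route 2: start a debug pod on the node, chroot into the host
# filesystem, and read the node's journal for one systemd unit.
node_unit_journal() {
    node="$1"   # node name without the node/ prefix
    unit="$2"   # for example pivot.service or kubelet
    "$OC" debug "node/$node" -- chroot /host journalctl -u "$unit"
}
```

Calling `node_unit_journal <node> kubelet` would cover the kubelet case from the earlier review comment in the same way.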

@rh-max @miabbott one way to make this clear would be to use a different shell prompt for the on-node commands. The oc debug node command keeps the $ prompt, but the chroot and journalctl commands use something else, like % or #, to show that they run at a different level than $.

so it would look like:

$ oc debug node/ip-10-0-139-75.us-west-2.compute.internal
% chroot /host
% journalctl -u pivot.service

You can also run
$ oc logs -f ....

@kikisdeliveryservice has the right idea....using different prompts to indicate on your workstation vs on the node itself. thanks Kirsten!

@rh-max wdyt? ^^^

Thanks @miabbott and @kikisdeliveryservice ! I've applied your feedback and improved the structure a bit. Does the result look good to you?
(We can't just opt into using the % for shell prompt in the docs, so I've separated the "before login" and "after login" steps with words.)

The user can also run oc debug node/<node> -- chroot /host journalctl -u pivot.service. With this one liner we won't need to distinguish between the different prompts.
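Because the one-liner needs no interactive session, it also lends itself to checking several nodes in a row. The sketch below assumes a hypothetical helper name and the same illustrative `OC` override; it expects node arguments in the `node/<name>` form.

```shell
#!/bin/sh
# OC is an illustrative override: set OC=echo to preview the commands.
OC="${OC:-oc}"

# Print the pivot.service journal for each node passed as node/<name>.
pivot_journal_for_nodes() {
    for node in "$@"; do
        echo "== $node =="
        "$OC" debug "$node" -- chroot /host journalctl -u pivot.service
    done
}
```

Since `oc get nodes -o name` prints names in the `node/<name>` form, its output could be passed straight to this function, for example `pivot_journal_for_nodes $(oc get nodes -o name)`.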

Thanks @mike-nguyen , I changed the commands to your simplified command. Since the feedback is implemented and I hear no NOACKs, I'm asking @sheriff-rh to merge this.


sheriff-rh

LGTM, thanks Max! Shoot me a message if you need a merge.

vikram-redhat · 2020-01-16T02:49:43Z

@rh-max is this ready for merge?

rh-max · 2020-01-20T11:44:41Z

Hey Andrew @sheriff-rh , could you please merge this one and cherry-pick it into 4.3? Thanks.

sheriff-rh · 2020-01-20T13:55:13Z

/cherrypick enterprise-4.3

openshift-cherrypick-robot · 2020-01-20T13:55:35Z

@sheriff-rh: new pull request created: #19162

Details

In response to this:

/cherrypick enterprise-4.3


vikram-redhat · 2020-01-23T22:28:47Z

@rh-max multiple commits.

rh-max · 2020-01-23T22:39:51Z

@vikram-redhat Sorry, forgot to squash.

…gather link We have internal must-gather docs, so use them instead of pointing out to whatever happens to be on GitHub. The link landed pointing at GitHub in 5a70595 (Add description of metrics exported by Machine Config Daemon, 2019-12-20, openshift#18787).

openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 20, 2019

Add description of metrics exported by Machine Config Daemon

5a70595

rh-max force-pushed the enterprise-4.3-mco-daemon-metrics branch from dffa7bf to 5a70595 Compare December 20, 2019 14:27

openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 20, 2019

rh-max changed the base branch from enterprise-4.3 to master December 20, 2019 14:28

openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 20, 2019

rh-max force-pushed the enterprise-4.3-mco-daemon-metrics branch from b5d8b2b to 5a70595 Compare December 20, 2019 15:23

miabbott reviewed Jan 6, 2020


vikram-redhat changed the title ~~Add description of metrics exported by Machine Config Daemon~~ OSDOCS-796: Add description of metrics exported by Machine Config Daemon Jan 8, 2020

Clarify that actions are done on nodes

95e33de

sheriff-rh approved these changes Jan 9, 2020


miabbott reviewed Jan 9, 2020


rh-max added 2 commits January 10, 2020 09:48

Put information into [NOTE]s

7355026

Fixed the instructions for accessing logs

a657dbc

openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 10, 2020

sheriff-rh approved these changes Jan 10, 2020


sheriff-rh added the peer-review-done Signifies that the peer review team has reviewed this PR label Jan 10, 2020

Add info on difference between two commands & structure improvements

1f0af5d

Simplify instructions and wording around them

e5d64b0

sheriff-rh merged commit 27db8e4 into openshift:master Jan 20, 2020

openshift-cherrypick-robot mentioned this pull request Jan 20, 2020

[enterprise-4.3] OSDOCS-796: Add description of metrics exported by Machine Config Daemon #19162

Merged

wking mentioned this pull request Sep 11, 2020

nodes/nodes/nodes-nodes-machine-config-daemon-metrics: Internal must-gather link #25421

Merged

