Skip to content

OSDOCS-796: Add description of metrics exported by Machine Config Daemon#18787

Merged
sheriff-rh merged 6 commits intoopenshift:masterfrom
rh-max:enterprise-4.3-mco-daemon-metrics
Jan 20, 2020
Merged

OSDOCS-796: Add description of metrics exported by Machine Config Daemon#18787
sheriff-rh merged 6 commits intoopenshift:masterfrom
rh-max:enterprise-4.3-mco-daemon-metrics

Conversation

@rh-max
Copy link
Contributor

@rh-max rh-max commented Dec 20, 2019

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 20, 2019
@rh-max rh-max force-pushed the enterprise-4.3-mco-daemon-metrics branch from dffa7bf to 5a70595 Compare December 20, 2019 14:27
@openshift-ci-robot openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 20, 2019
@rh-max rh-max changed the base branch from enterprise-4.3 to master December 20, 2019 14:28
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 20, 2019
@openshift-docs-preview-bot

The preview will be available shortly at:

@rh-max rh-max force-pushed the enterprise-4.3-mco-daemon-metrics branch from b5d8b2b to 5a70595 Compare December 20, 2019 15:23
@juzhao
Copy link

juzhao commented Dec 23, 2019

@mnguyen @miabbott
Is machine-config daemon owned by you? If so, please help to review the doc

@kikisdeliveryservice
Copy link

@juzhao @mike-nguyen @miabbott

I owned this epic for MCO and worked with @rh-max on these docs. If you have any questions, LMK.

@juzhao
Copy link

juzhao commented Dec 24, 2019

@juzhao @mike-nguyen @miabbott

I owned this epic for MCO and worked with @rh-max on these docs. If you have any questions, LMK.

Thanks, please help to review the docs

@vikram-redhat
Copy link
Contributor

@juzhao @mike-nguyen @miabbott

I owned this epic for MCO and worked with @rh-max on these docs. If you have any questions, LMK.

Are there any updates on this? Is this doc ok from QE point of view?

@juzhao
Copy link

juzhao commented Jan 6, 2020

@kikisdeliveryservice is the right person to review the doc, please reply to @vikram-redhat


`$ oc logs -f -n openshift-machine-config-operator machine-config-daemon-<hash> -c machine-config-daemon`

`$ journalctl -u pivot.service`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be noted that this command should be run on the nodes themselves after using something like oc debug node <node>

|Logs kubelet health failures. *
|This is expected to be empty, with failure count of 0. If failure count exceeds 2, the error indicating threshold is exceeded. This indicates a possible issue with the health of the kubelet. For further investigation, see the logs by running:

`$ journalctl -u kubelet`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above; use oc debug node <node> to access the node first

@mike-nguyen
Copy link
Member

Agreed with Micah's comments. Otherwise LGTM.

@vikram-redhat vikram-redhat changed the title Add description of metrics exported by Machine Config Daemon OSDOCS-796: Add description of metrics exported by Machine Config Daemon Jan 8, 2020
@rh-max
Copy link
Contributor Author

rh-max commented Jan 9, 2020

Thanks @miabbott , implemented the feedback.
To clarify, @kikisdeliveryservice has reviewed this document before the PR was even created. Here, I was just asking for a QE review.
Thanks @kikisdeliveryservice , @miabbott , @juzhao , @mike-nguyen for your work on this! On its way to peer review & merging now.

Copy link
Contributor

@sheriff-rh sheriff-rh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just one suggestion.


Beginning {product-title} 4.3, the Machine Config Daemon provides a set of metrics. These metrics can be accessed using the Prometheus Cluster Monitoring stack.

The following table describes this set of metrics. Note that:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend a [NOTE] here, instead of a bulleted list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sheriff-rh . I'll rather keep it like this, since those details closely pertain to the table, and are not a sort of side note.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed my mind and going with your suggestion. Substituted two bullet points with two [NOTE]s.

|Logs errors encountered during pivot. *
|Pivot errors might prevent OS upgrades from proceeding. For further investigation, run these commands to access the node and see its logs:

`$ oc debug node <node>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is out of order. If we do oc debug node and then try to do oc logs I don't think that will work.

The explicit commands would look like:

$ debug node/ip-10-0-139-75.us-west-2.compute.internal
$ chroot /host
$ journalctl -u pivot.service

This connects to the node, chroot to the node's filesystem, then run's the journalctl command on the node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed accordingly. So how to describe the difference between logs from oc logs and from journalctl? I want to be more explicit than just "You can also run:".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The oc logs command is only going to fetch the logs from the particular container in the pod (machine-config-daemon). It's something that you want to run from your workstation, not from a node on the cluster.

The oc debug node route gets you on the host itself and using journalctl is going to give you the full contents of the journal on the node. You could find the machine-config-daemon logs in that journal, but they will be mixed in with the rest of the node logs and therefore harder to sift through.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rh-max @miabbott one way to make this clear would be to use a different shell prompt for the on-node commands so oc debug node stays the same but instead of using $ for the chroot and journalctl commands you can use something else like % or # to show that it's a different level than $

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it would look like:

$ oc debug node/ip-10-0-139-75.us-west-2.compute.internal
% chroot /host
% journalctl -u pivot.service

You can also run
$ oc logs -f ....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kikisdeliveryservice has the right idea....using different prompts to indicate on your workstation vs on the node itself. thanks Kirsten!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rh-max wdyt? ^^^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @miabbott and @kikisdeliveryservice ! I've applied your feedback and improved the structure a bit. Does the result look good to you?
(We can't just opt into using the % for shell prompt in the docs, so I've separated the "before login" and "after login" steps with words.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user can also run oc debug node/<node> -- chroot /host journalctl -u pivot.service. With this one liner we won't need to distinguish between the different prompts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mike-nguyen , I changed the commands to your simplified command. Since the feedback is implemented and I hear no NOACKs, I'm asking @sheriff-rh to merge this.

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 10, 2020
Copy link
Contributor

@sheriff-rh sheriff-rh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Max! Shoot me a message if you need a merge.

@sheriff-rh sheriff-rh added the peer-review-done Signifies that the peer review team has reviewed this PR label Jan 10, 2020
@vikram-redhat
Copy link
Contributor

@rh-max is this ready for merge?

@rh-max
Copy link
Contributor Author

rh-max commented Jan 20, 2020

Hey Andrew @sheriff-rh , could you please merge this one and cherry-pick it into 4.3? Thanks.

@sheriff-rh sheriff-rh merged commit 27db8e4 into openshift:master Jan 20, 2020
@sheriff-rh
Copy link
Contributor

/cherrypick enterprise-4.3

@openshift-cherrypick-robot

@sheriff-rh: new pull request created: #19162

Details

In response to this:

/cherrypick enterprise-4.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vikram-redhat
Copy link
Contributor

@rh-max multiple commits.

@rh-max
Copy link
Contributor Author

rh-max commented Jan 23, 2020

@vikram-redhat Sorry, forgot to squash.

wking added a commit to wking/openshift-docs that referenced this pull request Sep 11, 2020
…gather link

We have internal must-gather docs, so use them instead of pointing out
to whatever happens to be on GitHub.  The link landed pointing at
GitHub in 5a70595 (Add description of metrics exported by Machine
Config Daemon, 2019-12-20, openshift#18787).
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/openshift-docs that referenced this pull request Sep 21, 2020
…gather link

We have internal must-gather docs, so use them instead of pointing out
to whatever happens to be on GitHub.  The link landed pointing at
GitHub in 5a70595 (Add description of metrics exported by Machine
Config Daemon, 2019-12-20, openshift#18787).
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/openshift-docs that referenced this pull request Sep 21, 2020
…gather link

We have internal must-gather docs, so use them instead of pointing out
to whatever happens to be on GitHub.  The link landed pointing at
GitHub in 5a70595 (Add description of metrics exported by Machine
Config Daemon, 2019-12-20, openshift#18787).
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/openshift-docs that referenced this pull request Sep 21, 2020
…gather link

We have internal must-gather docs, so use them instead of pointing out
to whatever happens to be on GitHub.  The link landed pointing at
GitHub in 5a70595 (Add description of metrics exported by Machine
Config Daemon, 2019-12-20, openshift#18787).
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/openshift-docs that referenced this pull request Sep 21, 2020
…gather link

We have internal must-gather docs, so use them instead of pointing out
to whatever happens to be on GitHub.  The link landed pointing at
GitHub in 5a70595 (Add description of metrics exported by Machine
Config Daemon, 2019-12-20, openshift#18787).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

peer-review-done Signifies that the peer review team has reviewed this PR size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants