Add messages to node's status card #5050

Merged

Conversation

@rawagner (Contributor) commented Apr 15, 2020

Screenshot from 2020-04-15 11-15-03
Screenshot from 2020-04-15 10-50-04
Screenshot from 2020-04-15 10-49-37

@openshift-ci-robot added the component/dashboard, component/metal3, component/shared, and approved labels on Apr 15, 2020
@rawagner force-pushed the node-dashboard-messages branch 3 times, most recently from fba9b25 to 8578428, on April 15, 2020 at 09:35
@andybraren (Contributor) left a comment

Looking good!

  1. Could we add unit labels to the top area of the breakdown popover? There could be a mix of m (millicores) and cores there, so writing those units out would be clearer.
  2. The second screenshot shows "total requested" being 5.5 (cores), but the node's "total capacity" is only 4 (cores). Reading the docs, it seems like the pod scheduler wouldn't/shouldn't allow the total requested to go over the total capacity. Is this just dummy data, a UI bug, or maybe I'm misunderstanding?
  3. The second screenshot also has a red state in the Utilization card but a yellow state in the Status card. I think the two should match, and assuming the 5.5 requested is just dummy data or a bug, that would mean the Utilization card would be yellow as well (for the total limit being 3.9). The third screenshot seems correct though.

'The total CPU requested by all pods on this node is approaching the node’s capacity. New pods may not be schedulable on this node.';

export const MEM_LIMIT_REQ_ERROR =
'This node’s memory resources are overcommitted. The total memory resource limit of all pods exceeds the node’s total capacity. The total memory requested is also approaching the node’s capacity. Pods will be terminated under high load, and new pods may not be schedulable on this node';

Suggested change
'This node’s memory resources are overcommitted. The total memory resource limit of all pods exceeds the node’s total capacity. The total memory requested is also approaching the node’s capacity. Pods will be terminated under high load, and new pods may not be schedulable on this node';
'This node’s memory resources are overcommitted. The total memory resource limit of all pods exceeds the node’s total capacity. The total memory requested is also approaching the node’s capacity. Pods will be terminated under high load, and new pods may not be schedulable on this node.';

I forgot a period, oops. 😄

@rawagner (Contributor, Author) commented Apr 16, 2020

Looking good!

  1. Could we add unit labels to the top area of the breakdown popover? There could be a mix of m (millicores) and cores there, so writing those units out would be clearer.

It would be good to have this implemented console-wide, but we still haven't done it. I could take a look, but until then I think we should stick with what the console uses now (no unit for cores, m for millicores).
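
For illustration, that convention boils down to something like this (a sketch only, not the console's actual formatter):

// Illustrative helper mirroring the current convention: millicores get an "m" suffix, cores get no unit.
const formatCores = (cores: number): string =>
  cores < 1 ? `${Math.round(cores * 1000)}m` : `${Math.round(cores * 10) / 10}`;

formatCores(0.5); // "500m"
formatCores(3.9); // "3.9"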

  2. The second screenshot shows "total requested" being 5.5 (cores), but the node's "total capacity" is only 4 (cores). Reading the docs, it seems like the pod scheduler wouldn't/shouldn't allow the total requested to go over the total capacity. Is this just dummy data, a UI bug, or maybe I'm misunderstanding?

@kyoto @openshift/openshift-team-monitoring
After reading the docs, I would expect the same, but these are real data. I'm using this query:

sum(kube_pod_container_resource_requests{node='<node-name>', resource='cpu'})

I see all masters having higher requests than capacity. Not happening on workers though.
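
(For reference, one way to see which pod phases contribute to that number is a breakdown like the one below; this is purely illustrative and not part of this PR.)

sum by (phase) (
  kube_pod_container_resource_requests{node='<node-name>', resource='cpu'}
  * on(namespace, pod) group_left(phase) (kube_pod_status_phase == 1)
)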

  3. The second screenshot also has a red state in the Utilization card but a yellow state in the Status card. I think the two should match, and assuming the 5.5 requested is just dummy data or a bug, that would mean the Utilization card would be yellow as well (for the total limit being 3.9). The third screenshot seems correct though.

It's not dummy data, but I will fix that: if requests are over 100%, we will show a red state in the Status card too. I think we will need to update the messages if the query is correct and requests can go over 100%.
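
Roughly, the threshold logic would look like the sketch below (illustrative names and a 90% warning threshold assumed here, not the PR's actual code):

enum HealthState { OK, WARNING, ERROR }

// Warn when total requests approach the node's capacity, error once they meet or exceed it.
const getRequestsHealth = (requested: number, capacity: number): HealthState => {
  if (requested >= capacity) {
    return HealthState.ERROR; // red state in the Status card
  }
  if (requested / capacity >= 0.9) {
    return HealthState.WARNING; // yellow state, "approaching capacity"
  }
  return HealthState.OK;
};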

@jtomasek left a comment

/lgtm

@openshift-ci-robot added the lgtm label on Apr 16, 2020
@rawagner (Contributor, Author)

/hold
Let's wait for #4971 to merge first.

@openshift-ci-robot added the do-not-merge/hold label on Apr 16, 2020
@openshift-ci-robot added the needs-rebase label on Apr 16, 2020
@openshift-ci-robot removed the lgtm and needs-rebase labels on Apr 17, 2020
@rawagner (Contributor, Author)

/hold cancel

@openshift-ci-robot removed the do-not-merge/hold label on Apr 17, 2020
@rawagner (Contributor, Author)

/retest

@brancz commented Apr 17, 2020

sum(kube_pod_container_resource_requests{node='<node-name>', resource='cpu'})

This is for sure the wrong query, as it includes completed, pending, and failed pods. It precisely proves my point on the other PR about not re-inventing queries, but instead reusing the ones that have been hardened over the years.

@jtomasek
/lgtm

@openshift-ci-robot added the lgtm label on Apr 17, 2020
@openshift-ci-robot removed the lgtm label on Apr 17, 2020
@rawagner (Contributor, Author) commented Apr 17, 2020

After reading kubernetes/kube-state-metrics#1051, I've updated the requests/limits queries based on https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/8e370046348970ac68bd0fcfd5a15184a6cbdf51/rules/apps.libsonnet#L64-L75 and hopefully didn't mess anything up.

@brancz
I agree that we should do a better job at reusing queries. Would it be feasible to request that the queries we need be included in kubernetes-mixin? Similar to what we already have for the namespace record 'namespace:kube_pod_container_resource_requests_memory_bytes:sum', we would request another one for node, etc.?
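
For illustration, a node-scoped variant following that mixin pattern might look something like this (a sketch only, not necessarily the exact query or recording rule that would land upstream):

sum by (node) (
  kube_pod_container_resource_requests{resource='cpu'}
  * on(namespace, pod) group_left()
    max by (namespace, pod) (kube_pod_status_phase{phase=~'Pending|Running'} == 1)
)

If upstreamed, it could be exposed under a hypothetical record name such as node:kube_pod_container_resource_requests_cpu_cores:sum.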

@jtomasek
/lgtm

@openshift-ci-robot added the lgtm label on Apr 17, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jtomasek, rawagner

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rawagner (Contributor, Author)

/retest

2 similar comments
@rawagner (Contributor, Author)

/retest

@rawagner (Contributor, Author)

/retest

@spadgett (Member)

/hold
for #5100

@openshift-ci-robot added the do-not-merge/hold label on Apr 17, 2020
@spadgett (Member)

/hold cancel

@openshift-ci-robot removed the do-not-merge/hold label on Apr 17, 2020
@openshift-merge-robot merged commit bf309fa into openshift:master on Apr 18, 2020
@spadgett added this to the v4.5 milestone on Apr 20, 2020