Add messages to node's status card #5050

Merged

Conversation

@rawagner (Contributor) commented Apr 15, 2020

Screenshot from 2020-04-15 11-15-03
Screenshot from 2020-04-15 10-50-04
Screenshot from 2020-04-15 10-49-37

@openshift-ci-robot added the component/dashboard, component/metal3, component/shared, and approved labels on Apr 15, 2020
@rawagner force-pushed the node-dashboard-messages branch 3 times, most recently from fba9b25 to 8578428, on April 15, 2020 at 09:35
@andybraren (Contributor) left a comment

Looking good!

  1. Could we add unit labels to the top area of the breakdown popover? There could be a mix of m (millicores) and cores there, so writing those units out would be clearer.
  2. The second screenshot shows "total requested" being 5.5 (cores), but the node's "total capacity" is only 4 (cores). Reading the docs, it seems like the pod scheduler wouldn't/shouldn't allow the total requested to go over the total capacity. Is this just dummy data, a UI bug, or maybe I'm misunderstanding?
  3. The second screenshot also has a red state in the Utilization card but a yellow state in the Status card. I think the two should match, and assuming the 5.5 requested is just dummy data or a bug, that would mean the Utilization card would be yellow as well (for the total limit being 3.9). The third screenshot seems correct though.

'The total CPU requested by all pods on this node is approaching the node’s capacity. New pods may not be schedulable on this node.';

export const MEM_LIMIT_REQ_ERROR =
'This node’s memory resources are overcommitted. The total memory resource limit of all pods exceeds the node’s total capacity. The total memory requested is also approaching the node’s capacity. Pods will be terminated under high load, and new pods may not be schedulable on this node';

Suggested change
'This node’s memory resources are overcommitted. The total memory resource limit of all pods exceeds the node’s total capacity. The total memory requested is also approaching the node’s capacity. Pods will be terminated under high load, and new pods may not be schedulable on this node';
'This node’s memory resources are overcommitted. The total memory resource limit of all pods exceeds the node’s total capacity. The total memory requested is also approaching the node’s capacity. Pods will be terminated under high load, and new pods may not be schedulable on this node.';

I forgot a period, oops. 😄

@rawagner (Contributor, Author) commented Apr 16, 2020

Looking good!

  1. Could we add unit labels to the top area of the breakdown popover? There could be a mix of m (millicores) and cores there, so writing those units out would be clearer.

It would be good to have this implemented console-wide, but we still haven't done it. I could take a look, but until then I think we should stick with what the console uses now (no unit for cores, m for millicores).
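
For illustration, that convention boils down to something like this (a sketch only, not the console's actual formatter):

// Illustrative helper mirroring the current convention: millicores get an "m" suffix, cores get no unit.
const formatCores = (cores: number): string =>
  cores < 1 ? `${Math.round(cores * 1000)}m` : `${Math.round(cores * 10) / 10}`;

formatCores(0.5); // "500m"
formatCores(3.9); // "3.9"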

  2. The second screenshot shows "total requested" being 5.5 (cores), but the node's "total capacity" is only 4 (cores). Reading the docs, it seems like the pod scheduler wouldn't/shouldn't allow the total requested to go over the total capacity. Is this just dummy data, a UI bug, or maybe I'm misunderstanding?

@kyoto @openshift/openshift-team-monitoring
After reading the docs, I would expect the same, but these are real data. I'm using this query:

sum(kube_pod_container_resource_requests{node='<node-name>', resource='cpu'})

I see all masters having higher requests than capacity. Not happening on workers though.
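
(For reference, one way to see which pod phases contribute to that number is a breakdown like the one below; this is purely illustrative and not part of this PR.)

sum by (phase) (
  kube_pod_container_resource_requests{node='<node-name>', resource='cpu'}
  * on(namespace, pod) group_left(phase) (kube_pod_status_phase == 1)
)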

  3. The second screenshot also has a red state in the Utilization card but a yellow state in the Status card. I think the two should match, and assuming the 5.5 requested is just dummy data or a bug, that would mean the Utilization card would be yellow as well (for the total limit being 3.9). The third screenshot seems correct though.

It's not dummy data, but I will fix that: if requests are over 100%, we will show a red state in the Status card too. I think we will need to update the messages if the query is correct and requests can go over 100%.
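
Roughly, the threshold logic would look like the sketch below (illustrative names and a 90% warning threshold assumed here, not the PR's actual code):

enum HealthState { OK, WARNING, ERROR }

// Warn when total requests approach the node's capacity, error once they meet or exceed it.
const getRequestsHealth = (requested: number, capacity: number): HealthState => {
  if (requested >= capacity) {
    return HealthState.ERROR; // red state in the Status card
  }
  if (requested / capacity >= 0.9) {
    return HealthState.WARNING; // yellow state, "approaching capacity"
  }
  return HealthState.OK;
};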

@jtomasek left a comment

/lgtm

@openshift-ci-robot added the lgtm label on Apr 16, 2020
@rawagner (Contributor, Author)

/hold
Let's wait for #4971 to merge first.

@openshift-ci-robot added the do-not-merge/hold label on Apr 16, 2020
@openshift-ci-robot added the needs-rebase label on Apr 16, 2020
@openshift-ci-robot removed the lgtm and needs-rebase labels on Apr 17, 2020
@rawagner (Contributor, Author)

/hold cancel

@openshift-ci-robot removed the do-not-merge/hold label on Apr 17, 2020
@rawagner (Contributor, Author)

/retest

@brancz commented Apr 17, 2020

sum(kube_pod_container_resource_requests{node='<node-name>', resource='cpu'})

This is for sure the wrong query, as it includes completed, pending, and failed pods. It precisely proves my point on the other PR about not re-inventing queries, but instead reusing the ones that have been hardened over the years.

@jtomasek
/lgtm

@openshift-ci-robot added the lgtm label on Apr 17, 2020
@openshift-ci-robot removed the lgtm label on Apr 17, 2020
@rawagner (Contributor, Author) commented Apr 17, 2020

After reading kubernetes/kube-state-metrics#1051, I've updated the requests/limits queries based on https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/8e370046348970ac68bd0fcfd5a15184a6cbdf51/rules/apps.libsonnet#L64-L75 and hopefully didn't mess anything up.

@brancz
I agree that we should do a better job at reusing queries. Would it be feasible to request that the queries we need be included in kubernetes-mixin? Similar to what we already have for the namespace record 'namespace:kube_pod_container_resource_requests_memory_bytes:sum', we would request another one for node, etc.?
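
For illustration, a node-scoped variant following that mixin pattern might look something like this (a sketch only, not necessarily the exact query or recording rule that would land upstream):

sum by (node) (
  kube_pod_container_resource_requests{resource='cpu'}
  * on(namespace, pod) group_left()
    max by (namespace, pod) (kube_pod_status_phase{phase=~'Pending|Running'} == 1)
)

If upstreamed, it could be exposed under a hypothetical record name such as node:kube_pod_container_resource_requests_cpu_cores:sum.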

@jtomasek
/lgtm

@openshift-ci-robot added the lgtm label on Apr 17, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jtomasek, rawagner

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rawagner (Contributor, Author)

/retest

2 similar comments
@rawagner (Contributor, Author)

/retest

@rawagner (Contributor, Author)

/retest

@spadgett (Member)

/hold
for #5100

@openshift-ci-robot added the do-not-merge/hold label on Apr 17, 2020
@spadgett (Member)

/hold cancel

@openshift-ci-robot removed the do-not-merge/hold label on Apr 17, 2020
@openshift-merge-robot merged commit bf309fa into openshift:master on Apr 18, 2020
@spadgett added this to the v4.5 milestone on Apr 20, 2020