
Inconsistencies with Cluster metrics across the UI #5430

Closed

aalves08 opened this issue Mar 17, 2022 · 8 comments
aalves08 (Contributor) commented Mar 17, 2022

System
v2.6.3

Describe the bug
Several inconsistencies with Cluster metrics have been found throughout the UI (homepage vs Cluster Dashboard vs nodes list view, the latter for single-node clusters).

1) In the homepage cluster table, the "usage" for RKE2 clusters is always zero (FIXED in 2.6.4 already)
2) Values for Reserved (MEM and CPU) in the Cluster Dashboard for an RKE2 cluster are broken, i.e. they always display 0 (FIXED in 2.6.4 already)
3) The number of PODS used in the homepage cluster table is inconsistent with the number displayed in the Cluster Dashboard, for all cluster types
4) In the homepage cluster table we use MB for MEM values below 1 GiB, while the Cluster Dashboard only uses GiB (see the unit sketch after this list)
5) Inconsistency in the number of "used" PODS between the Cluster Dashboard and the nodes list view (applicable to single machine count/node clusters)
6) In the homepage cluster table we show the "reserved" values rather than the "usage" values shown in the Cluster Dashboard
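
Item 4 boils down to unit formatting. A minimal sketch of one way to keep the two views consistent, using a hypothetical helper (illustrative only, not the dashboard's actual formatter):

```ts
// Hypothetical formatter: always render memory in GiB so the homepage
// table and the Cluster Dashboard agree on units, even below 1 GiB.
const GIB = 1024 ** 3;

function formatMemoryGiB(bytes: number, precision = 2): string {
  return `${(bytes / GIB).toFixed(precision)} GiB`;
}

// 512 MiB renders as "0.50 GiB" instead of switching to "512 MB".
console.log(formatMemoryGiB(512 * 1024 ** 2));
```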

To Reproduce
1) Create an RKE2 cluster -> check Cluster Dashboard for numbers -> compare with homepage (always zero)
2) Create an RKE2 cluster -> create a deployment with MEM and CPU reserved -> check Cluster Dashboard
3) Create an RKE2 cluster -> check Cluster Dashboard for POD numbers -> compare with homepage
4) Create an RKE2 cluster -> check Cluster Dashboard for MEM units -> compare with homepage
5) Create an RKE2 cluster with a single node -> create a deployment -> check Cluster Dashboard and compare with nodes list view
6) Create an RKE2 cluster -> check Cluster Dashboard for numbers -> compare with homepage

Expected Result

  1. Should display correct metrics
  2. Should display correct metrics
  3. POD numbers should match
  4. Units should be consistent (GiB)
  5. Should show the same PODS values in both places (applicable to single machine count/node clusters)
  6. Since it's a "very expensive" operation for the frontend to fetch the correct "usage" for MEM, CPU, and PODS on the homepage (at least 2 extra API requests per cluster), we decided to remove the "reserved" information for MEM and CPU until we have a proper technical solution for displaying usage in the homepage cluster list (see the request-count sketch below)
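
To make the cost in item 6 concrete, here is an illustrative sketch of what fetching live usage per cluster could look like. The proxy base path and endpoint shapes are assumptions for illustration, not the dashboard's actual code:

```ts
// Assumed per-cluster Kubernetes proxy base path (illustrative only).
async function fetchClusterUsage(clusterId: string) {
  const base = `/k8s/clusters/${clusterId}`;
  // Extra request 1: live node metrics (requires metrics-server in the cluster).
  const nodeMetrics = await fetch(`${base}/apis/metrics.k8s.io/v1beta1/nodes`);
  // Extra request 2: the pod list, needed to count running PODS.
  const pods = await fetch(`${base}/api/v1/pods`);
  return { nodeMetrics: await nodeMetrics.json(), pods: await pods.json() };
}
```

With N clusters on the homepage this means 2 * N additional requests on every load, which is why "usage" was pulled from the list view for now.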

Additional Information

  • Main objectives are to show the "usage" everywhere for MEM, CPU, and PODS (in the applicable views), with consistent numbers and units throughout the UI.
@aalves08 aalves08 added this to the v2.6.5 milestone Mar 17, 2022
@aalves08 aalves08 self-assigned this Mar 17, 2022
aalves08 (Contributor, Author) commented:

@nwmac issue created about the errors we found while digging into cluster metrics.

@gaktive I believe this issue relates better to SURE-4148 and SURE-4090. What do you think?

xhejtman commented:

I see a problem with these metrics as well (2.6.3-patch2). In Cluster Explorer, I see metric numbers like CPU and memory, but they are not current; they seem to be from some point in time and are either not updated or updated at random.

aalves08 (Contributor, Author) commented:

BE response to possibly adding status.usage to the git repo object response:
https://suse.slack.com/archives/C02CX064EBX/p1648551618026609

gaktive (Member) commented Apr 4, 2022

Another possible scenario: SURE-4301 brings up a case where a k8s upgrade to 1.21.8 changes the monitoring values.

aalves08 (Contributor, Author) commented Apr 5, 2022

@gaktive I think we should look into SURE-4301 for 2.6.6. The PR for this issue is already open, and I think SURE-4301 will need quite a bit of time to investigate.

Heads up: we are going to remove the "usage" for CPU and MEM on the homepage. Technically, the values we were displaying there were reserved rather than used resources, and actual usage is the data clients really want to look at. To show that data on the homepage we would need changes to the BE. Unfortunately, this mismatch has been a big source of inconsistency claims; the sketch below illustrates the difference.
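
A rough sketch of the distinction, with deliberately simplified types (conceptual only, not Rancher's actual implementation): "reserved" sums the resource requests of scheduled pods, while "usage" is live consumption as reported by something like metrics-server.

```ts
interface ResourceAmount { cpuMillis: number; memBytes: number }

// Shared summation helper over CPU and memory amounts.
function sum(items: ResourceAmount[]): ResourceAmount {
  return items.reduce(
    (acc, i) => ({
      cpuMillis: acc.cpuMillis + i.cpuMillis,
      memBytes: acc.memBytes + i.memBytes,
    }),
    { cpuMillis: 0, memBytes: 0 }
  );
}

// "Reserved": sum of the resource *requests* of scheduled pods -- what
// the scheduler has set aside, regardless of real consumption.
const reservedTotal = (podRequests: ResourceAmount[]) => sum(podRequests);

// "Usage": sum of live per-node consumption (e.g. from metrics-server) --
// the number clients actually want, but more expensive to fetch.
const usageTotal = (nodeUsage: ResourceAmount[]) => sum(nodeUsage);
```

Same summation, different data source: the two totals disagree whenever pods consume more or less than they request, which is exactly the mismatch behind the inconsistency claims.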

So for now, clients should look at the Cluster Dashboard to get proper metrics for their clusters.

FYI @nwmac

gaktive (Member) commented Apr 20, 2022

Upon discussion, we're OK to remove the usage side if the results are wrong. We just need to put this in the release notes for 2.6.5. I see the docs ticket in place, so that's helpful.

igomez06 commented May 3, 2022

Setup:
Rancher Version: v2.6.5-rc6
Kubernetes Version: RKE2 v1.22.8+rke2r1
HA Install

Steps:

  1. Created an RKE2 cluster; POD numbers are consistent between Cluster Dashboard and homepage
  2. Same RKE2 cluster; MEM units are consistent between Cluster Dashboard and homepage
  3. Same RKE2 cluster; numbers (memory, cores, pods, etc.) are consistent between Cluster Dashboard and homepage
  4. Created a single-node RKE2 cluster, then created a deployment; pod numbers are consistent between Cluster Dashboard and nodes list view
  5. "Reserved" is no longer present on the homepage, as expected

@zube zube bot closed this as completed May 3, 2022
ellerydb commented:

> Upon discussion, we're OK to remove the usage side if the results are wrong. We just need to put this in the release notes for 2.6.5. I see the docs ticket in place, so that's helpful.

As a Rancher platform owner, it was good to have a holistic view of usage across all clusters.

@zube zube bot removed the [zube]: Done label Aug 2, 2022