Steve should only send back metadata for counts that have changed #36681

Closed
nwmac opened this issue Mar 1, 2022 · 8 comments
Labels
area/scalability 10k or bust · kind/enhancement Issues that improve or augment existing functionality · release-note Note this issue in the milestone's release notes · team/area1

Comments

nwmac (Member) commented Mar 1, 2022

Currently, the Steve API sends count metadata over the websocket every time resource count metadata changes. This results in roughly 24K of data (a typical minimum) being sent over the socket that the UI has to process.

This is always the full payload of count metadata.

It would be more efficient if Steve only sent metadata for the resources whose counts have changed, rather than the entire document.
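
For a sense of what that payload contains, here is a rough Go sketch of the counts document; the type and field names are illustrative guesses based on the /v1/counts output, not Steve's actual definitions. With one entry per resource type in the cluster, serializing the whole map on every change adds up quickly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative shape of the counts document. Field names are guesses based
// on the /v1/counts output, not Steve's actual type definitions.
type countsPayload struct {
	Counts map[string]resourceCount `json:"counts"`
}

type resourceCount struct {
	Summary    summary            `json:"summary"`
	Namespaces map[string]summary `json:"namespaces,omitempty"`
}

type summary struct {
	Count int `json:"count"`
}

func main() {
	// A real cluster has one entry per resource type, each potentially
	// broken down by namespace, which is why the full document is large.
	p := countsPayload{Counts: map[string]resourceCount{
		"configmap": {Summary: summary{Count: 12}},
		"pod":       {Summary: summary{Count: 48}},
	}}
	out, _ := json.Marshal(p)
	fmt.Println(string(out))
}
```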

@nwmac nwmac added this to the v2.6.5 milestone Mar 1, 2022
@cbron cbron assigned samjustus and unassigned cbron Mar 3, 2022
@samjustus samjustus modified the milestones: v2.6.5, v2.6.6 Mar 9, 2022
@cbron cbron added area/scalability 10k or bust and removed feature/performance labels Apr 26, 2022
samjustus (Collaborator) commented

@nwmac punting to 2.7

@gaktive gaktive added the JIRA To be used in correspondence with the internal ticketing system. label Oct 12, 2022
gaktive (Member) commented Oct 12, 2022

Internal reference: SURE-5394

@samjustus samjustus removed the JIRA To be used in correspondence with the internal ticketing system. label Oct 13, 2022
@zube zube bot added the kind/enhancement Issues that improve or augment existing functionality label Dec 12, 2022
git-ival commented

Hi @nwmac, I have a few questions about the specifics of this issue.

  1. When you say "24K of data", I assume you mean KB as in kilobytes. Is that correct?
  2. Was this observed on a Rancher cluster with or without downstream clusters?
  3. Were the resource updates coming from Downstream or Local resources?
  4. How many resources of different types existed in the cluster? (Secrets, Projects, Pods, etc.)
  5. Can you provide some details on the Rancher cluster's specifics so that we can reproduce more easily and test any fixes?

MbolotSuse (Contributor) commented

@git-ival This hasn't been fully merged yet. Once it's ready for testing, I'll put together a validation template that gives you more information for reproducing the issue.

MbolotSuse (Contributor) commented

Validation Template

Root Cause

Steve maintains a map of the current counts for various resource types in memory. Previously, each time this map changed, Steve sent the full map of counts over the websocket, so a large object was delivered every time a resource was added or deleted.
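
To make the root cause concrete, here is a minimal sketch in Go of the diff the fix needs to compute: given the previous and current snapshots of the in-memory count map, only the entries that differ should cross the websocket. The map shape and function name are simplified assumptions, not Steve's code.

```go
package main

import "fmt"

// changedCounts returns only the entries whose value differs between the
// previous and current snapshots. Resource types that disappeared entirely
// are reported with a zero count. This is a simplified illustration, not
// Steve's implementation.
func changedCounts(prev, cur map[string]int) map[string]int {
	delta := map[string]int{}
	for resource, n := range cur {
		if old, ok := prev[resource]; !ok || old != n {
			delta[resource] = n
		}
	}
	for resource := range prev {
		if _, ok := cur[resource]; !ok {
			delta[resource] = 0
		}
	}
	return delta
}

func main() {
	prev := map[string]int{"pod": 48, "configmap": 12, "serviceaccount": 40}
	cur := map[string]int{"pod": 48, "configmap": 12, "serviceaccount": 41}
	// Only the serviceaccount entry changed, so only it needs to be sent.
	fmt.Println(changedCounts(prev, cur))
}
```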

What was fixed, or what changes have occurred

  • Steve only sends the counts that have changed. For example, if you create a service account, you will only get the counts (over the websocket) for serviceAccounts, and not for pods or deployments.
  • Counts are now "de-bounced" for five seconds, meaning that a counts update is sent at most once every five seconds and contains all of the changed counts from the last five seconds (see the sketch after this list). This was explicitly requested by the UI in Steve should throttle sending of count metadata #36682.
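
A minimal sketch of that debounce behaviour, again with the payload simplified to a map of resource type to count; the function and channel names are hypothetical and this is not Steve's code:

```go
package main

import (
	"fmt"
	"time"
)

// debounceCounts collects per-resource count changes from updates and emits
// the merged set at most once per interval.
func debounceCounts(updates <-chan map[string]int, interval time.Duration, emit func(map[string]int)) {
	pending := map[string]int{}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case u, ok := <-updates:
			if !ok {
				// Flush whatever is left when the source closes.
				if len(pending) > 0 {
					emit(pending)
				}
				return
			}
			for resource, count := range u {
				pending[resource] = count // later changes overwrite earlier ones
			}
		case <-ticker.C:
			if len(pending) > 0 {
				emit(pending)
				pending = map[string]int{}
			}
		}
	}
}

func main() {
	updates := make(chan map[string]int)
	go func() {
		updates <- map[string]int{"serviceaccount": 41}
		updates <- map[string]int{"configmap": 13}
		close(updates)
	}()
	debounceCounts(updates, 5*time.Second, func(changed map[string]int) {
		fmt.Println("send over websocket:", changed)
	})
}
```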

Areas or cases that should be tested

  • Basic UI functionality. For example:
    • Are the counts on the sidebar accurate?
    • Do they stay accurate for a resource type (e.g. pods) if I create/update/delete a resource of that same type (i.e. make a new pod)?
    • Do they stay accurate for a resource type if I create/update/delete a resource of an unrelated type?

What areas could experience regressions

  • The resource counts that the UI presents
  • The presence of sidebars allowing a user to filter for a specific resource type (e.g. Workloads -> Deployments)

Are the repro steps accurate/minimal?

Yes, they are included here for convenience.

  1. Run rancher/rancher:v2.7.0 (docker install or HA)
  2. Complete basic setup steps through setting a new admin password
  3. Open your browser dev console and filter by websocket connections. Find the websocket connection over which the counts resource is requested.
  4. Add a new resource to the local cluster (for example, create a serviceAccount/configmap/namespace).
  5. Observe that you received counts for every resource type in the cluster.

Q/A

  1. I don't think the exact size is important here - the salient point is that it sent a large JSON payload over the websocket on a frequent basis.
  2. The counts resource is provided by Steve and can be obtained in both local and downstream clusters. I would guess that this issue is substantially worse in the local cluster (where there is more activity and more overall resources), but you can likely notice it by navigating to downstream clusters in the UI and looking at the websocket connections to find where the counts for that cluster are coming from.
  3. As stated above, you can get updates from both local and downstream connections, depending on which cluster you requested the counts for.
  4. You can check this in the clusters you have running with `kubectl api-resources -o wide | wc -l`. From my testing on a basic Rancher install, this number comes out to about 172.
  5. This issue (counts for all resources) is reproducible on every Rancher setup (Docker install, HA install, many downstreams, few downstreams). If you are looking for repro tips to make this as bad as possible from a scaling perspective, I would go with:
  • Create many resources very fast. Secrets/configmaps/tokens are all good candidates for this (see the client-go sketch after this list).
  • Create/delete clusters. Cluster creation/deletion seems to create/delete many different resources of varying types (RBAC, core types, etc.). Because the operation is relatively "noisy", it can produce many of these count updates.
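
If it helps, here is a hedged client-go sketch of the first tip (creating many resources very fast); the kubeconfig path, namespace, and object count are placeholders for your environment:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the kubeconfig for the target (local or downstream) cluster.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Create a burst of configmaps so the counts stream has plenty to report.
	for i := 0; i < 200; i++ {
		cm := &corev1.ConfigMap{
			ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("count-stress-%d", i)},
			Data:       map[string]string{"index": fmt.Sprint(i)},
		}
		if _, err := client.CoreV1().ConfigMaps("default").Create(context.TODO(), cm, metav1.CreateOptions{}); err != nil {
			panic(err)
		}
	}
}
```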

If you would like more specific guidance, please let me know.

floatingman (Contributor) commented

Ran through the validation template and observed the correct behavior.

@zube zube bot closed this as completed Dec 23, 2022
@MbolotSuse MbolotSuse added the release-note Note this issue in the milestone's release notes label Mar 10, 2023
MbolotSuse (Contributor) commented

Release Note

Rancher maintains a /v1/counts endpoint that the UI uses to display resource counts. The UI subscribes to count changes for all resources through a websocket so that it receives updated counts as they happen.

Previously, each message from this socket included all counts for every resource type in the cluster, even if the count had only changed for one specific resource type. This forced the UI to re-process resource counts for every resource type at a high frequency, causing a significant performance impact. Now, Rancher only sends back a count for a resource type if the count has changed from the previously known number, improving UI performance.
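
For anyone who wants to gauge the size of the full document themselves, here is a small Go sketch that fetches /v1/counts once over HTTP. The RANCHER_URL and RANCHER_TOKEN environment variables and the bearer-token header are assumptions about how you reach and authenticate to your install; watching the websocket in the browser dev console (as in the validation template above) is still the way to observe the per-change messages.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// RANCHER_URL and RANCHER_TOKEN are placeholders for your environment.
	url := os.Getenv("RANCHER_URL") + "/v1/counts"
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		panic(err)
	}
	// Bearer-token auth is an assumption about how you authenticate.
	req.Header.Set("Authorization", "Bearer "+os.Getenv("RANCHER_TOKEN"))

	// Skip TLS verification only for throwaway test installs with
	// self-signed certificates.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("GET /v1/counts returned %d bytes\n", len(body))
}
```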

samjustus (Collaborator) commented

/backport 2023-Q2-v2.6.x
