Steve socket is spammed on fresh visit to node detail page #10668

richard-cox · 2024-03-20T17:25:56Z

Setup

Rancher version: v2.9-head

Describe the bug

Rancher now returns a resource.error message when watching an individual resource over websocket
- See [SURE-7122] Excessive WebSocket activity when watching resources with permission by name rancher#41809
- Previously it failed to return a resource.error of "too old", which resulted in socket spam
That revealed a scenario where the UI itself caused the spamming
- See Steve socket is spammed on fresh visit to a resource detail page #10540
- Handle the way we get revisions when watching
That also revealed another bug, which this issue is about
- The resource.error message is now returned when we try to watch a resource we should not
  - For instance the metrics.k8s.io.nodemetrics resource used in the node detail page
  - This resource.error, though, isn't one we let stick, so we just send resource.start / watch again, which results in the same resource.error, repeated ad nauseam
- Impact
  - This happens when navigating to the node detail page, not only when refreshing on the detail page
  - The spam does not stop when going from the detail page to the list page. Going from the node detail page, where we watch an individual node metric, to the node list, where we watch all node metrics, should behave the same as for resources like pods, but it doesn't. For pods, the individual watch is stopped and a watch for all is started.
  - This applies to all resources that we can't watch (no schema watch verb, about 8 out of 182 in a DS rke2 cluster)
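
To illustrate what letting a resource.error "stick" could look like, here is a minimal sketch. It is not the actual dashboard code: the `SocketMessage` shape and the `WatchTracker` class are hypothetical, and the real socket payload may carry different fields.

```typescript
// Assumed, simplified message shape; the real socket payload may differ.
interface SocketMessage {
  name: 'resource.start' | 'resource.stop' | 'resource.error';
  resourceType: string;
  id?: string;
}

// Hypothetical tracker: remembers watches that errored so they are not restarted.
class WatchTracker {
  private errored = new Set<string>();

  // Individual watches are keyed by type and id, collection watches by type only.
  private key(resourceType: string, id?: string): string {
    return id ? `${resourceType}/${id}` : resourceType;
  }

  // Record an incoming message; an error is kept ("sticks") once seen.
  onMessage(msg: SocketMessage): void {
    if (msg.name === 'resource.error') {
      this.errored.add(this.key(msg.resourceType, msg.id));
    }
  }

  // True only when no earlier error is on record for this watch.
  shouldStart(resourceType: string, id?: string): boolean {
    return !this.errored.has(this.key(resourceType, id));
  }
}
```

With something like this, a second resource.start for the same node metric would be skipped after the first resource.error, which breaks the start / error / start cycle described above.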

To Reproduce

Bring up a Rancher instance at or newer than 793118 (7th March)
Ensure that the metrics server is running
- keep things simple, test in local cluster
- log in
- go to <rancher url>/v1/schemas/metrics.k8s.io.nodemetrics
- if this returns, great, you're set
  - if not, enable / install metrics server and ensure the URL above is reachable
  - check with the distro on how to do this, otherwise run the apply command from https://github.com/kubernetes-sigs/metrics-server
Navigate to the node list
Bring up dev tools
Navigate to a node's detail page

Result

The messages sent over the cluster socket repeat the resource.start, resource.stop / resource.error, resource.start cycle

Expected Result

We should not attempt to watch resources the user cannot watch
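
A rough sketch of that expectation, assuming the schema lists its allowed verbs under `attributes.verbs` (treat that field path as an assumption here). `SchemaLike`, `canWatch` and `startWatch` are hypothetical names used only for illustration, not the dashboard's API.

```typescript
// Assumed, simplified schema shape: only the fields this sketch reads.
interface SchemaLike {
  id: string;
  attributes?: { verbs?: string[] };
}

// A resource is treated as watchable only if its schema lists the "watch" verb.
function canWatch(schema?: SchemaLike): boolean {
  return schema?.attributes?.verbs?.includes('watch') ?? false;
}

// Hypothetical wrapper: sends a start request only for watchable schemas.
function startWatch(
  schema: SchemaLike | undefined,
  send: (msg: { resourceType: string }) => void
): boolean {
  if (!schema || !canWatch(schema)) {
    return false; // skipped: nothing is sent over the socket
  }
  send({ resourceType: schema.id });
  return true;
}
```

For a schema such as metrics.k8s.io.nodemetrics without the watch verb, `startWatch` returns false and no message is sent, so there is no resource.error to react to.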

Screenshots


richard-cox · 2024-03-22T08:46:53Z

/backport v2.7.next2

richard-cox · 2024-03-22T08:47:04Z

/backport v2.8.next2

izaac · 2024-04-03T16:33:10Z

This is a manual test candidate. This requires debug-level logs to be enabled on Rancher and the log messages to be monitored. @yonasberhe23

IsaSih · 2024-04-25T02:35:42Z

I noticed there's no /k8s/clusters/<cluster id>/v1/subscribe socket in either the local or the downstream cluster

I see the following resource.error messages on a node from the downstream cluster (etcd node):

I see the following resource.error messages on a node from the local cluster:

Errors occur for resourceTypes: node, pod and apps.deployment

richard-cox · 2024-04-25T09:15:08Z

You'll need to check Big request rows to see the full url path that the socket is using (network tab --> cog icon top right --> checkbox on left)

The main thing to confirm is that there aren't lots and lots of repeated messages every second (green socket message sent, resource.start, resource.error, resource.start, green socket message sent, etc)
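
If the socket messages are copied out of dev tools, a check along these lines could flag the repetition. This is only a sketch: the `LoggedMessage` shape, the function name and the default thresholds are assumptions for illustration, not part of Rancher.

```typescript
// Assumed record for one message copied from the socket log.
interface LoggedMessage {
  name: string;         // e.g. "resource.start"
  resourceType: string; // e.g. "metrics.k8s.io.nodemetrics"
  timeMs: number;       // when the message was seen, in milliseconds
}

// Returns the resource types that have more than maxStarts resource.start
// messages inside any span of windowMs milliseconds.
function findSpammedTypes(log: LoggedMessage[], windowMs = 1000, maxStarts = 2): string[] {
  const startsByType = new Map<string, number[]>();

  for (const m of log) {
    if (m.name !== 'resource.start') {
      continue;
    }
    const times = startsByType.get(m.resourceType) ?? [];
    times.push(m.timeMs);
    startsByType.set(m.resourceType, times);
  }

  const spammed: string[] = [];
  startsByType.forEach((times, resourceType) => {
    times.sort((a, b) => a - b);
    let lo = 0; // left edge of the sliding window
    const exceeded = times.some((t, hi) => {
      while (t - times[lo] > windowMs) {
        lo++;
      }
      return hi - lo + 1 > maxStarts;
    });
    if (exceeded) {
      spammed.push(resourceType);
    }
  });

  return spammed;
}
```

An empty result for the node detail page would match the "no repeated messages" expectation; the thresholds would need tuning against what normal navigation produces.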

IsaSih · 2024-06-19T17:32:52Z

Tests pass on v2.9-16190ecee0aa3fb4aa570aa0fc932a36d9dcd082-head

richard-cox added the kind/bug label Mar 20, 2024

richard-cox added this to the v2.9.0 milestone Mar 20, 2024

richard-cox self-assigned this Mar 20, 2024

github-actions bot added [zube]: To Triage, QA/dev-automation and removed [zube]: To Triage labels Mar 20, 2024

richard-cox mentioned this issue Mar 20, 2024

Handle resources that cannot be watched #10669

Merged

7 tasks

nwmac added [zube]: Backlog and removed [zube]: Backlog labels Mar 21, 2024

This was referenced Mar 22, 2024

[backport v2.7.next1] Steve socket is spammed on fresh visit to node detail page #10685

Closed

[backport v2.8.next1] Steve socket is spammed on fresh visit to node detail page #10686

Closed

github-actions bot added [zube]: Review and removed [zube]: Working labels Mar 22, 2024

richard-cox closed this as completed in #10669 Mar 27, 2024

zube bot added [zube]: Done and removed [zube]: Review labels Mar 27, 2024

github-actions bot reopened this Mar 27, 2024

zube bot added [zube]: To Triage and removed [zube]: Done labels Mar 27, 2024

github-actions bot added [zube]: QA Review and removed [zube]: To Triage labels Mar 27, 2024

yonasberhe23 added [zube]: To Test and removed [zube]: QA Review labels Apr 4, 2024

bmdepesa added the QA/manual-test label Apr 4, 2024

gaktive removed the [zube]: To Test label Apr 22, 2024

IsaSih self-assigned this Apr 29, 2024

IsaSih closed this as completed Jun 19, 2024
