Steve socket is spammed on fresh visit to node detail page #10668

Open
richard-cox opened this issue Mar 20, 2024 · 5 comments · Fixed by #10669
Labels: kind/bug, QA/dev-automation, QA/manual-test

Comments

@richard-cox
Member

richard-cox commented Mar 20, 2024

Setup

  • Rancher version: v2.9-head

Describe the bug

  • Rancher now returns a resource.error message when the UI watches an individual resource over the websocket
  • That exposed a scenario where the UI itself spams the socket
  • It also exposed another bug, which is the subject of this issue
    • The resource.error message is now returned when we try to watch a resource that we should not watch
      • For instance, the metrics.k8s.io.nodemetrics resource used in the node detail page
      • This resource.error is not one that we let stick, so we send resource.start to watch again, which results in the same resource.error, and the cycle repeats indefinitely
    • Impact
      • This happens when navigating to the node detail page, not only when refreshing on the detail page
      • The spam does not stop when going from the detail page to the list page. Going from the node detail page (where we watch an individual node metric) to the node list (where we watch all node metrics) should behave the same as for resources like pod, but it doesn't. For pods, the individual watch is stopped and a watch for all pods is started.
      • This applies to all resources that we can't watch (no watch verb in the schema; about 8 out of 182 in a DS rke2 cluster)
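
The retry cycle described above can be illustrated with a small simulation. This is only a sketch, not the dashboard's actual code: the message shape, the `serverReply` and `simulateWatch` helpers, and the `maxAttempts` cap are all hypothetical and exist only to keep the example finite. It shows that when a resource.error does not stick, the client keeps sending resource.start, and that remembering the error breaks the cycle.

```typescript
// Hypothetical message shape; the real socket payload may carry more fields.
type SocketMessage = {
  name: "resource.start" | "resource.stop" | "resource.error";
  resourceType: string;
};

// Simulated server: rejects watches for any type listed in `unwatchable`.
function serverReply(unwatchable: string[], type: string): SocketMessage {
  const rejected = unwatchable.indexOf(type) !== -1;
  return { name: rejected ? "resource.error" : "resource.start", resourceType: type };
}

// Runs the client loop and returns how many watch requests were sent.
// `maxAttempts` is an artificial cap so the non-sticky case terminates.
function simulateWatch(
  type: string,
  unwatchable: string[],
  errorsStick: boolean,
  maxAttempts = 10
): number {
  const stuck: string[] = []; // types we have given up on
  let sent = 0;
  while (sent < maxAttempts) {
    if (stuck.indexOf(type) !== -1) break; // error was remembered: stop retrying
    sent++;
    const reply = serverReply(unwatchable, type);
    if (reply.name !== "resource.error") break; // watch established
    if (errorsStick) stuck.push(type);
  }
  return sent;
}

// Non-sticky errors retry until the cap; sticky errors stop after one attempt.
console.log(simulateWatch("metrics.k8s.io.nodemetrics", ["metrics.k8s.io.nodemetrics"], false)); // 10
console.log(simulateWatch("metrics.k8s.io.nodemetrics", ["metrics.k8s.io.nodemetrics"], true)); // 1
```

Making the error stick only limits the damage after the first failure; not sending the request at all for unwatchable types avoids the error entirely.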

To Reproduce

  • Bring up a Rancher instance at or newer than 793118 (7th March)
  • Ensure that the metrics server is running
    • To keep things simple, test in the local cluster
    • Log in
    • Go to <rancher url>/v1/schemas/metrics.k8s.io.nodemetrics
    • If this returns the schema, you're set
  • Navigate to the node list
  • Bring up dev tools
  • Navigate to a node's detail page

Result

  • The messages sent over the cluster socket repeat the resource.start, resource.stop / resource.error, resource.start cycle

Expected Result

  • We should not attempt to watch resources the user cannot watch
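
One way to express this expectation is a guard that checks the schema for a watch verb before any watch request is sent. The sketch below is an assumption-laden illustration, not the fix merged for this issue: the `SchemaLike` shape (with `attributes.verbs`), the `canWatch` and `watchIfAllowed` helpers, and the payload passed to `send` are all hypothetical.

```typescript
// Assumed, simplified schema shape: only the fields this sketch needs.
interface SchemaLike {
  id: string;
  attributes?: { verbs?: string[] };
}

// True only when the schema explicitly lists the watch verb.
function canWatch(schema: SchemaLike | undefined): boolean {
  const verbs = schema && schema.attributes && schema.attributes.verbs;
  return Array.isArray(verbs) && verbs.some((v) => v === "watch");
}

// Sends a watch request only for watchable types.
// Returns whether a request was sent. The payload is a placeholder.
function watchIfAllowed(
  schemas: Record<string, SchemaLike>,
  type: string,
  send: (msg: { resourceType: string }) => void
): boolean {
  if (!canWatch(schemas[type])) return false; // skip: would only produce resource.error
  send({ resourceType: type });
  return true;
}
```

A missing schema or a missing verbs list is treated as not watchable, so the guard fails closed instead of triggering another error cycle.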

Screenshots
image

richard-cox added this to the v2.9.0 milestone Mar 20, 2024
richard-cox self-assigned this Mar 20, 2024
github-actions bot added the [zube]: To Triage and QA/dev-automation labels and removed the [zube]: To Triage label Mar 20, 2024
@richard-cox
Member Author

/backport v2.7.next2

@richard-cox
Member Author

/backport v2.8.next2

@izaac
Contributor

izaac commented Apr 3, 2024

This is a manual test candidate. It requires debug-level logs to be enabled on Rancher and the log messages to be monitored. @yonasberhe23

bmdepesa added the QA/manual-test label Apr 4, 2024
@IsaSih

IsaSih commented Apr 25, 2024

I noticed there's no /k8s/clusters/<cluster id>/v1/subscribe socket in either the local or the downstream cluster

I see the following resource.error messages on a node from the downstream cluster (etcd node):

Image

I see the following resource.error messages on a node from the local cluster:

Image

Errors occur for resourceTypes: node, pod and apps.deployment

@richard-cox
Copy link
Member Author

You'll need to enable "Big request rows" to see the full URL path that the socket is using (network tab --> cog icon top right --> checkbox on left)

image

The main thing to confirm is that there aren't lots of repeated messages every second (green socket message sent, resource.start, resource.error, resource.start, green socket message sent, etc.)
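
If the socket messages are copied out of dev tools, a check along these lines could flag the repetition automatically. It is a rough sketch under assumptions: the `LoggedMessage` shape, the `findSpammedTypes` helper, and the default window and threshold values are hypothetical and would need tuning against real traffic.

```typescript
// Hypothetical log entry: a timestamp in milliseconds plus message name and type.
interface LoggedMessage {
  at: number;
  name: string;
  resourceType: string;
}

// Returns the resource types that received more than `threshold`
// resource.error messages within any window shorter than `windowMs`.
function findSpammedTypes(
  log: LoggedMessage[],
  windowMs = 1000,
  threshold = 3
): string[] {
  // Group resource.error timestamps by resource type.
  const errorTimes: Record<string, number[]> = {};
  for (const m of log) {
    if (m.name !== "resource.error") continue;
    if (!errorTimes[m.resourceType]) errorTimes[m.resourceType] = [];
    errorTimes[m.resourceType].push(m.at);
  }

  const spammed: string[] = [];
  for (const type of Object.keys(errorTimes)) {
    const times = errorTimes[type].slice().sort((a, b) => a - b);
    let start = 0;
    // Sliding window over the sorted timestamps.
    for (let end = 0; end < times.length; end++) {
      while (times[end] - times[start] >= windowMs) start++;
      if (end - start + 1 > threshold) {
        spammed.push(type);
        break;
      }
    }
  }
  return spammed;
}
```

An occasional resource.error, such as one per navigation, stays under the threshold; only a tight burst for the same resource type is reported.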

IsaSih self-assigned this Apr 29, 2024
7 participants