fix: nodemetric version comparison by yulken · Pull Request #969 · rancher/steve

yulken · 2026-01-07T13:17:56Z

This PR addresses the following issues:

[BUG] Node usage not being updated in the cluster node list rancher#53151

Analyzing the case, I noticed that setting the flag features=ui-sql-cache=false made the metrics collection work as expected.
In summary, the thing is that NodeMetrics is not natively compatible with the watch verb, meaning its cache will never be updated after Steve starts.

root@cp:/home/rancher# kc api-resources -o wide | grep NodeMetrics
nodes                                                              metrics.k8s.io/v1beta1                    false        NodeMetrics                                 get,list

Similarly, the "Restarts" data for the pod will also not be updated once it is cached, unless there is a modification to the pod itself.

tomleb · 2026-01-07T15:06:50Z

Hi @yulken thanks for looking into that. There should be a "synthetic watcher" somewhere in steve that essentially polls the data for non-watchable APIs. I vaguely remember it not working correctly but have forgotten about it. That would fix the NodeMetrics issue.

As for Pod restart column, I remember it being mentioned somewhere, not sure if it's your issue or some other issue 🤔, I'll have to check. To fix this, we're likely going to want to extract column data that we receive from the api server and store it differently. Basically, same thing as we do with the other duration columns.

We shouldn't bypass the cache since the new filters / sorting functionality depends on it.

tomleb · 2026-01-07T15:17:53Z

There we go, found it: rancher/rancher#52217

yulken · 2026-01-07T15:46:24Z

Thank you @tomleb. I'll investigate this synthetic watcher.

yulken · 2026-01-15T14:57:52Z

@tomleb I found the root cause for this issue:
We use resource version as comparison parameter to determine if we need to update previous data¹.

The thing is that NodeMetrics version is always an empty string, so we should use a different attribute to compare this.

I tried comparing timestamps specifically for NodeMetrics and it seems to work in my local environment.
I think we can reach a better solution, but I'm not sure how.
Do you have any advice for me?

https://github.com/rancher/steve/blob/main/pkg/sqlcache/informer/synthetic_watcher.go#L86C1

Ps.: I also added .vscode to .gitignore.

tomleb

The solution to fix it for NodeMetrics makes sense.

Can you add at least a test?

I'd like to check with UI (@richard-cox) on how we should tackle this to support HA (aka when Rancher has multiple replicas). Without a resourceVersion, the logic where we avoid "stale" data via revision=X doesn't really work.

Are we using mode=resources.changes with NodeMetrics?

pkg/sqlcache/informer/synthetic_watcher.go

richard-cox · 2026-01-26T10:52:21Z

@tomleb which specific resource are we talking about?

metrics.k8s.io podmetrics
- not used in the UI
metrics.k8s.io nodemetrics
- we use the schema for this
- The schema does not report WATCH though, so we don't watch. Instead we poll (every 30 seconds).
- The schema reports the resource to LIST and GET as a different resource metrics.k8s.io nodes
metrics.k8s.io nodes
- We use this type for GET nodemetrics requests given above (in node lists and detail pages)
- The GET request does not supply a revision (because it doesn't come from a watch with one)
- Given that i think replica cache syncs shouldn't be longer than 30s in theory worst case we show 59 second old data (ui polls for data, a change happens after a second, ui polls for data from stale cache not containing change after 29 seconds, ui polls for data after 30 seconds - at which point the change should be in all replicas)?

tomleb · 2026-01-27T16:17:59Z

The schema reports the resource to LIST and GET as a different resource metrics.k8s.io nodes

@richard-cox

TIL, didn't know this was a thing. Why do we do that? I see the schema also reports nodes instead of nodemetrics.

You're right that in theory the data wouldn't be stale for too long, might always be 1 "state" behind. Are we okay with this? What about the general case?

richard-cox · 2026-01-27T16:36:28Z

Why do we do that?

Not quite sure. the resources themselves look exactly the same as well

Are we okay with this? What about the general case?

Until it becomes an issue i say we stick with what we have. In terms of the general case how do you mean?

tomleb · 2026-01-27T16:59:27Z

In terms of the general case how do you mean?

There might be other resources coming from Extensions API server that will also be list-able but not watch-able. Maybe we'd want to handle the general case now. Though I'm tempted to go with "Until it becomes an issue i say we stick with what we have." for that as well. So maybe let's just "wait-and-see" in that case.

tomleb

LGTM

yulken requested a review from a team as a code owner January 7, 2026 13:17

yulken force-pushed the fix/list-metrics-cache-issues branch from db5b321 to 1a82b5a Compare January 15, 2026 14:52

yulken marked this pull request as draft January 15, 2026 15:08

tomleb reviewed Jan 26, 2026

View reviewed changes

pkg/sqlcache/informer/synthetic_watcher.go Show resolved Hide resolved

fix: nodemetric version comparison

1ebfeaf

yulken force-pushed the fix/list-metrics-cache-issues branch from 1a82b5a to 1ebfeaf Compare January 28, 2026 14:33

yulken marked this pull request as ready for review January 28, 2026 15:50

yulken requested a review from tomleb January 28, 2026 15:51

yulken changed the title ~~fix: bypass cache queries for nodemetrics/podmetrics~~ fix: nodemetric version comparison Jan 28, 2026

yulken self-assigned this Jan 28, 2026

tomleb approved these changes Feb 12, 2026

View reviewed changes

yulken merged commit 89138d2 into rancher:main Feb 19, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: nodemetric version comparison#969

fix: nodemetric version comparison#969
yulken merged 1 commit intorancher:mainfrom
yulken:fix/list-metrics-cache-issues

yulken commented Jan 7, 2026 •

edited

Loading

Uh oh!

tomleb commented Jan 7, 2026

Uh oh!

tomleb commented Jan 7, 2026

Uh oh!

yulken commented Jan 7, 2026

Uh oh!

yulken commented Jan 15, 2026

Uh oh!

tomleb left a comment

Uh oh!

Uh oh!

richard-cox commented Jan 26, 2026 •

edited

Loading

Uh oh!

tomleb commented Jan 27, 2026

Uh oh!

richard-cox commented Jan 27, 2026

Uh oh!

tomleb commented Jan 27, 2026

Uh oh!

tomleb left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

yulken commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tomleb commented Jan 7, 2026

Uh oh!

tomleb commented Jan 7, 2026

Uh oh!

yulken commented Jan 7, 2026

Uh oh!

yulken commented Jan 15, 2026

Uh oh!

tomleb left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

richard-cox commented Jan 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tomleb commented Jan 27, 2026

Uh oh!

richard-cox commented Jan 27, 2026

Uh oh!

tomleb commented Jan 27, 2026

Uh oh!

tomleb left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yulken commented Jan 7, 2026 •

edited

Loading

richard-cox commented Jan 26, 2026 •

edited

Loading