Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return error from watcher when kinds do not support watch #1101

Merged
merged 3 commits into from
Dec 9, 2022

Conversation

clux
Copy link
Member

@clux clux commented Dec 7, 2022

A quick fix for #1092. This should just cause the watcher to spin infinitely (like in the case where the crd isn't installed and people try to watch there). It's not 100% ideal, as an explicit crash might be better for many users to be seen early.

sidenote: A previously considered alternative we have thought about is to look at having watcher separate between recoverable and irrecoverable errors (e.g. a resource not being installed or support watch). The problem between trying to do that separation is that those classes are not static; irrecoverably may become recoverable due to external factors (a crd might become installed, a crd might be upgraded to support watch in a new release).

This PR avoids crashing user code without their knowledge (something users generally hate), but it will also potentially not work, get successfully deployed (replacing an older version in a rolling upgrade due to no crashloop), but will just spin-error continuously without actually doing anything. Then again, it's not like an earlier version will have worked with watch, because if a resource does not support watch, it did not support watch before we started error handling.

Passing an error here does give the whole system a (slim) chance to fix itself if a simultaneous upgrade to a resource (adding watch support for the resource) gets deployed. So it's in-line with standard "eventual consistency" ideals. The spin-lock problem is also somewhat mitigated with backoff support.

Closes #1092

Signed-off-by: clux <sszynrae@gmail.com>
@clux clux added this to the 0.77.0 milestone Dec 7, 2022
@clux clux added the changelog-fix changelog fix category for prs label Dec 7, 2022
@codecov-commenter
Copy link

codecov-commenter commented Dec 7, 2022

Codecov Report

Merging #1101 (50fcf74) into main (edd1121) will decrease coverage by 0.00%.
The diff coverage is 80.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1101      +/-   ##
==========================================
- Coverage   72.66%   72.65%   -0.01%     
==========================================
  Files          65       65              
  Lines        4832     4835       +3     
==========================================
+ Hits         3511     3513       +2     
- Misses       1321     1322       +1     
Impacted Files Coverage Δ
kube-runtime/src/watcher.rs 51.25% <80.00%> (+0.60%) ⬆️

kube-runtime/src/watcher.rs Outdated Show resolved Hide resolved
Co-authored-by: teozkr <teo.roijezon@stackable.de>
Signed-off-by: Eirik A <sszynrae@gmail.com>
@clux
Copy link
Member Author

clux commented Dec 9, 2022

Considered writing a test for this, didn't feel super helpful. Behaviour can be verified in a cluster with pod/node-metrics via:

GROUP=metrics.k8s.io VERSION=v1beta1 KIND=NodeMetrics cargo run --example dynamic_watcher

which unwraped before and now prints the error and exits. so this should be a straight improvement in the standard watcher case where people use the normal TryStream pattern.

@clux clux merged commit cde4530 into main Dec 9, 2022
@clux clux deleted the watcher-handle-no-rv branch December 9, 2022 04:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog-fix changelog fix category for prs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic when using watcher with a resource that does not expose a resourceVersion
3 participants