Fix bug in reflector not recovering from "Too large resource version"… #92537
@@ -356,10 +356,7 @@ func TestTooLargeResourceVersionList(t *testing.T) {

result := &example.PodList{}
err = cacher.List(context.TODO(), "pods/ns", storage.ListOptions{ResourceVersion: listRV, Predicate: storage.Everything}, result)
if !errors.IsTimeout(err) {
Is the timeout error no longer checked?
It's checked as part of IsTooLargeResourceVersion.
That seems a bit strange to me... the timeout reason and the TooLargeResourceVersion cause seem orthogonal.
reverted
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
/retest
/lgtm
if !IsTimeout(err) {
return false
}
this is a little strange... why does a CauseTypeResourceVersionTooLarge always have to be a timeout?
But this is how we create the "TooLargeResourceVersion" error.
Do you suggest dropping it and leaving it rely only on the cause?
Actually - I played a bit more with the PR and removed this part completely. The cause type being the official type seems to be enough for the purpose of this PR, and it makes the change easier to cherry-pick.
if status := APIStatus(nil); errors.As(err, &status) && status.Status().Details != nil {
for _, cause := range status.Status().Details.Causes {
if cause.Type == metav1.CauseTypeResourceVersionTooLarge {
return true
}
}
}
Could this be simplified to this:
return HasStatusCause(err, metav1.CauseTypeResourceVersionTooLarge)
nice - didn't know about this function
// NewTooLargeResourceVersionError returns a timeout error with the given retrySeconds for a request for
// a minimum resource version that is larger than the largest currently available resource version for a requested resource.
func NewTooLargeResourceVersionError(minimumResourceVersion, currentRevision uint64, retrySeconds int) error {
err := errors.NewTimeoutError(fmt.Sprintf("Too large resource version: %d, current: %d", minimumResourceVersion, currentRevision), retrySeconds)
-err.ErrStatus.Details.Causes = []metav1.StatusCause{{Message: tooLargeResourceVersionCauseMsg}}
+err.ErrStatus.Details.Causes = []metav1.StatusCause{{Type: metav1.CauseTypeResourceVersionTooLarge}}
should we also continue setting the message?
done
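One way to keep the message while adding the official cause type is to set both fields on a single StatusCause, rather than assigning Causes twice (where the second assignment replaces the first). The sketch below uses hypothetical, simplified types; the tooLargeResourceVersionCauseMsg value is a placeholder, since its actual text is not shown in this PR.

```go
package main

import "fmt"

// Hypothetical stand-ins; field names follow metav1.StatusCause (Type, Message).
type CauseType string

const CauseTypeResourceVersionTooLarge CauseType = "ResourceVersionTooLarge"

// Placeholder text: the real constant's value is not shown in this PR.
const tooLargeResourceVersionCauseMsg = "too large resource version"

type StatusCause struct {
	Type    CauseType
	Message string
}

// TimeoutError is a simplified stand-in for a timeout StatusError.
type TimeoutError struct {
	Message           string
	RetryAfterSeconds int
	Causes            []StatusCause
}

func (e *TimeoutError) Error() string { return e.Message }

// newTooLargeResourceVersionError puts Type and Message on the same
// StatusCause, so neither field is lost to a later assignment.
func newTooLargeResourceVersionError(minimumResourceVersion, currentRevision uint64, retrySeconds int) *TimeoutError {
	return &TimeoutError{
		Message:           fmt.Sprintf("Too large resource version: %d, current: %d", minimumResourceVersion, currentRevision),
		RetryAfterSeconds: retrySeconds,
		Causes: []StatusCause{{
			Type:    CauseTypeResourceVersionTooLarge,
			Message: tooLargeResourceVersionCauseMsg,
		}},
	}
}
```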
return err
}

// IsTooLargeResourceVersion returns true if the error is a TooLargeResourceVersion error.
func IsTooLargeResourceVersion(err error) bool {
If we want to backport this, I would probably leave this method here and delegate to apierrors.IsTooLargeResourceVersion(err) in old releases, rather than remove an exported method.
done
// CauseTypeResourceVersionTooLarge is used to report that resource version is coming
// from the future and request cannot be served.
"resource version is coming from the future" is confusing... consider rephrasing to something like:
CauseTypeResourceVersionTooLarge is used to report that the requested resource version is newer than the data observed by the API server, so the request cannot be served.
thanks - done
@@ -288,7 +288,7 @@ func (r *Reflector) ListAndWatch(stopCh <-chan struct{}) error {
}

list, paginatedResult, err = pager.List(context.Background(), options)
-if isExpiredError(err) {
+if isExpiredError(err) || apierrors.IsTooLargeResourceVersion(err) {
This makes the docs below, and the docs for setIsLastSyncResourceVersionExpired, misleading, since we're setting that in cases other than for expired errors. Should we rename setIsLastSyncResourceVersionExpired to setIsLastSyncResourceVersionUnavailable, or something that can make sense for both expired and toolarge errors?
makes sense - done
Force-pushed from b876884 to f65f961.
@liggitt - thanks for the review; PTAL
Force-pushed from f65f961 to 84cb34c.
Force-pushed from 84cb34c to 3704174.
@liggitt - PTAL
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: liggitt, wojtek-t.
…37-upstream-release-1.18 Automated cherry pick of #92537 upstream release 1.18
Ref #91073
/kind bug