Skip to content

Fix bug in reflector not recovering from "Too large resource version"…#92537

Merged
k8s-ci-robot merged 1 commit into
kubernetes:masterfrom
wojtek-t:fix_reflector_not_making_progress
Jul 1, 2020
Merged

Fix bug in reflector not recovering from "Too large resource version"…#92537
k8s-ci-robot merged 1 commit into
kubernetes:masterfrom
wojtek-t:fix_reflector_not_making_progress

Conversation

@wojtek-t

Copy link
Copy Markdown
Member

Ref #91073

Fix bug in reflector that couldn't recover from "Too large resource version" errors

/kind bug

@wojtek-t wojtek-t added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jun 26, 2020
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 26, 2020
@k8s-ci-robot k8s-ci-robot added area/apiserver kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 26, 2020

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeout error is no longer checked ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's checked as part of IsTooLargeResourceVersion.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that seems a bit strange to me... timeout reason and TooLargeResourceVersion cause seem orthogonal to me

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

@fejta-bot

Copy link
Copy Markdown

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@wojtek-t

Copy link
Copy Markdown
Member Author

/retest

@tedyu

tedyu commented Jun 26, 2020

Copy link
Copy Markdown
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 26, 2020
Comment on lines 569 to 571

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a little strange... why does a CauseTypeResourceVersionTooLarge always have to be a timeout?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is how we create the "TooLargeResourceVersion" error.

Do you suggest dropping it and leaving it rely only on the cause?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually - I played a bit more with the PR and removed this part completely. The cause type being the official type seems to be enough for the purpose of this PR and makes it easier to cherrypick.

Comment on lines 572 to 578

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this be simplified to this:

return HasStatusCause(metav1.CauseTypeResourceVersionTooLarge)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice - didn't know about this function

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also continue setting the message?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we want to backport this, I would probably leave this method here and delegate to apierrors.IsTooLargeResourceVersion(err) in old releases, rather than remove an exported method

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 876 to 877

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"resource version is coming from the future" is confusing... consider rephrasing to something like

CauseTypeResourceVersionTooLarge is used to report that that requested resource version is newer than the data observed by the API server, so the request cannot be served.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks - done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the docs below, and the docs for setIsLastSyncResourceVersionExpired misleading, since we're setting that in cases other than for expired errors. Should we rename setIsLastSyncResourceVersionExpired to setIsLastSyncResourceVersionUnavailable or something that can make sense for both expired and toolarge errors?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense - done

@wojtek-t wojtek-t force-pushed the fix_reflector_not_making_progress branch from b876884 to f65f961 Compare June 30, 2020 06:16
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 30, 2020

@wojtek-t wojtek-t left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt - thanks for the review; PTAL

Comment on lines 876 to 877

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks - done

Comment on lines 572 to 578

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice - didn't know about this function

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense - done

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

Comment on lines 569 to 571

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is how we create the "TooLargeResourceVersion" error.

Do you suggest dropping it and leaving it rely only on the cause?

@wojtek-t wojtek-t force-pushed the fix_reflector_not_making_progress branch from f65f961 to 84cb34c Compare June 30, 2020 07:02
@wojtek-t wojtek-t force-pushed the fix_reflector_not_making_progress branch from 84cb34c to 3704174 Compare June 30, 2020 07:12

@wojtek-t wojtek-t left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt - PTAL

Comment on lines 569 to 571

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually - I played a bit more with the PR and removed this part completely. The cause type being the official type seems to be enough for the purpose of this PR and makes it easier to cherrypick.

@wojtek-t

Copy link
Copy Markdown
Member Author

/retest

@liggitt

liggitt commented Jun 30, 2020

Copy link
Copy Markdown
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 30, 2020
@k8s-ci-robot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants