Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid going back in time in Reflector relist (revived) #83520

Merged
merged 4 commits into from
Nov 7, 2019

Conversation

jpbetz
Copy link
Contributor

@jpbetz jpbetz commented Oct 5, 2019

(This is a based on #67998. It applies the fix @liggitt suggested in #67998 (comment), also addresses component restart and test coverage.)

Fixes the relist part of #59848

Update reflectors to relist with resourceVersion="{newest resource version observed}", falling back to resourceVersion="" if an HTTP 410 (Gone) status code is returned by the relist request. This ensures the reflector never "goes back in time" while running.

The pager was also modified to only fallback to a full list if an "Expired" error is returned on page 2 or later. This is to avoid needless reattempts of "too old" requests. See comment in code explaining rationale.

/sig api-machinery
/kind bug
/priority important-soon
@liggitt @wojtek-t @smarterclayton

When the go-client reflector relists, the ResourceVersion list option is set to the reflector's latest synced resource version to ensure the reflector does not "go back in time" and reprocess events older than it has already processed. If the the server responds with an HTTP 410 (Gone) status code response, the relist falls back to using resourceVersion="".

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 5, 2019
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 5, 2019
@jpbetz
Copy link
Contributor Author

jpbetz commented Oct 5, 2019

/test pull-kubernetes-integration

@jennybuckley
Copy link

/cc @yliaog

@jpbetz
Copy link
Contributor Author

jpbetz commented Oct 7, 2019

This is causing real issues with reflector ListAndWatch in TestCRDDeletionCascading. Investigating.

@jpbetz jpbetz changed the title Avoid going back in time in a single run of Reflector (revived) Avoid going back in time in Reflector (revived) Oct 9, 2019
@jpbetz
Copy link
Contributor Author

jpbetz commented Oct 9, 2019

Code updated to handle HTTP 410 (Gone) status code responses correctly and to start with an initial list using resourceVersion="". I need to update the tests still.

@jpbetz
Copy link
Contributor Author

jpbetz commented Oct 10, 2019

Test failures should all be fixed now. This is ready for review.

@smarterclayton, this should solve both of the problems that caused stale reads

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. area/dependency Issues or PRs related to dependency changes and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Nov 6, 2019
@jpbetz
Copy link
Contributor Author

jpbetz commented Nov 6, 2019

Test was failing because it expected Gone error responses and we've modified the server to always return ResourceExpired. I've updated the test to expect ResourceExpired instead.

@liggitt
Copy link
Member

liggitt commented Nov 6, 2019

Looks like the _output_tests folder accidentally got committed

edit: actually, looks like a bunch of random changes got added under vendor

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jpbetz, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jpbetz
Copy link
Contributor Author

jpbetz commented Nov 6, 2019

Looks like the _output_tests folder accidentally got committed

edit: actually, looks like a bunch of random changes got added under vendor

heh, git failed me. That was a from a git add of a single file. The fun part was after that:

$ git stash
[1]    133831 segmentation fault  git stash

had to back out the commit and nuke my .git to restore things.

@wojtek-t
Copy link
Member

wojtek-t commented Nov 6, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 6, 2019
@liggitt
Copy link
Member

liggitt commented Nov 7, 2019

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 7, 2019
@k8s-ci-robot k8s-ci-robot merged commit 8ed2f47 into kubernetes:master Nov 7, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Nov 7, 2019
jpbetz added a commit to jpbetz/kubernetes that referenced this pull request Jan 2, 2020
jpbetz added a commit to jpbetz/kubernetes that referenced this pull request Jan 3, 2020
jpbetz added a commit to jpbetz/kubernetes that referenced this pull request Jan 3, 2020
k8s-ci-robot added a commit that referenced this pull request Jan 6, 2020
1.17 Fix: Revert reflector changes of PR #83520 to fix #86483 #86791
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/dependency Issues or PRs related to dependency changes area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants