
List requests with limit and resourceVersion return incorrect metadata.resourceVersion in response #83651

Open
jpbetz opened this issue Oct 9, 2019 · 10 comments


@jpbetz (Contributor) commented Oct 9, 2019

List responses return the most recent available metadata.resourceVersion for the resource at the time the list response was constructed:

curl -s "localhost:8082/api/v1/pods?resourceVersion=0" | jq .metadata
{
  "selfLink": "/api/v1/pods",
  "resourceVersion": "303"
}
curl -s "localhost:8082/api/v1/pods?resourceVersion=303" | jq .metadata
{
  "selfLink": "/api/v1/pods",
  "resourceVersion": "303"
}
curl -s "localhost:8082/api/v1/pods?resourceVersion=5" | jq .metadata
{
  "selfLink": "/api/v1/pods",
  "resourceVersion": "303"
}

But if limit and resourceVersion are both set, the exact resourceVersion requested is returned instead.

curl -s "localhost:8082/api/v1/pods?resourceVersion=5&limit=10" | jq .metadata
{
  "selfLink": "/api/v1/pods",
  "resourceVersion": "5"
}

This is what is causing TestCRDDeletionCascading to fail for #83520. That PR changes the reflector relist to use "minimum resource version" semantics to prevent stale reads, but because pagination is used, the request gets "exact resource version" semantics instead. This can result in the reflector getting stuck at a particular resourceVersion with no way of making progress. This did not happen prior to that PR because resourceVersion=0&limit=x is handled with minimum resource version semantics:

curl -s "localhost:8082/api/v1/pods?resourceVersion=0&limit=10" | jq .metadata
{
  "selfLink": "/api/v1/pods",
  "resourceVersion": "303"
}
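
The same behavior shows up through client-go (a minimal sketch, assuming a recent client-go; the clientset construction is omitted and the helper name is illustrative):

package example

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// listAtExactRV illustrates the problem above: with both limit and a
// non-zero resourceVersion set, the request bypasses the watch cache and
// the response echoes the requested resourceVersion ("5") rather than
// the most recent one ("303" in the examples above).
func listAtExactRV(client kubernetes.Interface) error {
    pods, err := client.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
        ResourceVersion: "5",
        Limit:           10,
    })
    if err != nil {
        return err
    }
    fmt.Println(pods.ResourceVersion) // "5", not "303"
    return nil
}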

/sig api-machinery

cc @smarterclayton @liggitt

@liggitt (Member) commented Oct 9, 2019

Isn't that capability what allows continue tokens to work?

@smarterclayton (Contributor) commented Oct 9, 2019

Yes, continue tokens have to bypass watch cache.

@smarterclayton (Contributor) commented Oct 9, 2019

the watch cache (at least today) can't mimic the behavior of viewing at an exact version with the appropriate limit and thus must drop down to etcd.
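
Roughly, that routing decision looks like the following (a sketch of the idea, not the actual cacher code; the exact checks are an implementation detail):

package example

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// shouldDelegateToEtcd sketches the check: a continue token, or a limit
// combined with a non-zero resourceVersion, can't be answered from the
// watch cache (which only holds current state plus a bounded window of
// events), so the request falls through to etcd.
func shouldDelegateToEtcd(opts metav1.ListOptions) bool {
    hasContinuation := opts.Continue != ""
    hasLimit := opts.Limit > 0 && opts.ResourceVersion != "0"
    return hasContinuation || hasLimit
}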

@smarterclayton (Contributor) commented Oct 9, 2019

but I don't think we have to honor the exact rv when that happens - we can use min. The continue token has an embedded RV, which should conflict with a user-specified RV in any case
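
For context, the embedded RV is visible if you decode a continue token by hand (a sketch; the base64-encoded JSON layout follows the etcd3 store's token format, but treat the field names as an internal detail):

package example

import (
    "encoding/base64"
    "encoding/json"
)

// decodeContinueRV sketches why a continue token and a caller-supplied
// resourceVersion conflict: the token already carries the resourceVersion
// of the first page.
func decodeContinueRV(token string) (int64, error) {
    raw, err := base64.RawURLEncoding.DecodeString(token)
    if err != nil {
        return 0, err
    }
    var t struct {
        APIVersion      string `json:"v"`
        ResourceVersion int64  `json:"rv"`
        StartKey        string `json:"start,omitempty"`
    }
    if err := json.Unmarshal(raw, &t); err != nil {
        return 0, err
    }
    return t.ResourceVersion, nil
}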

@smarterclayton (Contributor) commented Oct 9, 2019

it's too late at night for me to reason about this

@jpbetz (Contributor, Author) commented Oct 9, 2019

Rewrote the description to more clearly describe what is going on.

@jpbetz (Contributor, Author) commented Oct 9, 2019

/assign

@liggitt (Member) commented Oct 9, 2019

This can result in the reflector getting stuck at a particular resourceVersion with no way of making progress.

Can you give more details about why it can't make progress? I expected one of the following paths if the client lists at rv=10,limit=100 (a rough sketch of the fallback follows the list):

  • rv=10 is not compacted and current: all is well
  • rv=10 is not compacted, behind current, but there are no watch events for that resource after rv=10:
    • client lists, gets content, starts watch at rv=10, gets no watch events, eventually watch drops, client rewatches at rv=10:
      • still not compacted, all is well
      • compacted, gets 410 Gone, tries to relist at rv=10,limit=100, also gets 410 Gone, falls back to relisting with rv=,limit=100
  • rv=10 is not compacted, behind current, and there are watch events for that resource after rv=10:
    • client lists, gets content, starts watch at rv=10, gets a watch event at rv=11, updates lastSync, eventually watch drops, client rewatches at rv=11
      • still not compacted, all is well
      • compacted, gets 410 Gone, tries to relist at rv=11,limit=100, also gets 410 Gone, falls back to relisting with rv=,limit=100
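
Sketched as code, the fallback in the last two bullets is roughly (not the actual reflector code; listPods here stands in for whatever list function the reflector uses):

package example

import (
    v1 "k8s.io/api/core/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// relistWithFallback sketches the recovery path: relist at the last-known
// resourceVersion with a limit, and if that version has been compacted
// (410 Gone / "Expired"), fall back to an unversioned relist.
func relistWithFallback(listPods func(metav1.ListOptions) (*v1.PodList, error), lastRV string) (*v1.PodList, error) {
    list, err := listPods(metav1.ListOptions{ResourceVersion: lastRV, Limit: 100})
    if apierrors.IsResourceExpired(err) {
        // The requested RV is gone from etcd; relist with rv="" to read
        // the latest state.
        list, err = listPods(metav1.ListOptions{Limit: 100})
    }
    return list, err
}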
@liggitt (Member) commented Oct 9, 2019

This can result in the reflector getting stuck at a particular resourceVersion with no way of making progress.

Can you give more details about why it can't make progress?

spoke offline, debug output of the reflector in the failing test was at https://gist.github.com/jpbetz/f7b8660dec1eb073d304eb59fcee8ff6

this showed an unexpected 404 response to list calls:

...
Failed to list *v1.PartialObjectMetadata: the server could not find the requested resource
Listing and watching *v1.PartialObjectMetadata,
list with pager for RV=69
Failed to list *v1.PartialObjectMetadata: the server could not find the requested resource

need to dig into that response more and figure out the cause

@jpbetz also pointed out that when mixing requests that punch through the watch cache (like LIST calls with limit=...&resourceVersion=... params) and ones that are satisfied by the watch cache (like WATCH with a resourceVersion=... param), the window of acceptable resourceVersions might not overlap perfectly.

Because the etcd store currently returns the exact requested version to a LIST, the response can carry a resourceVersion that is not new enough for the watch cache to serve a watch from. For example, with a watch cache size of 100 (sketched in code below):

  • rv=10 has not been compacted yet, so LIST rv=10,limit=10 punches through to etcd, succeeds, and returns rv=10
  • 101 events have occurred since rv=10, so the watch cache can only serve watches starting at 11, so establishing a watch at rv=10 returns 410 Gone

(this is pre-existing in 1.16)
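
A sketch of that sequence from the client's side (assuming a recent client-go; clientset construction and the intervening 101 events are elided):

package example

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// listThenWatchMismatch sketches the window mismatch: the paginated LIST
// at rv=10 is served by etcd and succeeds, but by the time the client
// tries to WATCH from the returned resourceVersion, the watch cache has
// already moved past it and responds 410 Gone.
func listThenWatchMismatch(ctx context.Context, client kubernetes.Interface) error {
    pods, err := client.CoreV1().Pods("").List(ctx, metav1.ListOptions{
        ResourceVersion: "10", // old but not yet compacted; served from etcd
        Limit:           10,
    })
    if err != nil {
        return err
    }
    // ...more than 100 events happen before the watch is established...
    _, err = client.CoreV1().Pods("").Watch(ctx, metav1.ListOptions{
        ResourceVersion: pods.ResourceVersion, // still "10"
    })
    return err // 410 Gone ("too old resource version")
}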

@jpbetz (Contributor, Author) commented Oct 9, 2019

Thanks for summarizing @liggitt

The test does delete the CRD being watched. I think this explains the "the server could not find the requested resource" responses (LIST CR after CRD is deleted returns 404).

I'll test having the reflector handle 410 Gone watch responses by relisting with RV="".
