
Regression: Watching without resourceVersion fails #13809

Closed
stefwalter opened this issue Sep 10, 2015 · 43 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
Milestone
v1.1

Comments

@stefwalter
Contributor

At some point after Kubernetes 1.0, watching without a resourceVersion broke. This is a v1 API regression.

@stefwalter
Contributor Author

Current behavior against today's master (6a5049f):

$ curl 'http://127.0.0.1:8080/api/v1/watch/nodes'
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "too old resource version: 0 (17)",
  "code": 500
}

Expected behavior:

$ curl 'http://127.0.0.1:8080/api/v1/watch/nodes'
{"type":"MODIFIED","object":{"kind":"Node","apiVersion":"v1","metadata":{"name":"127.0.0.1","selfLink":"/api/v1/nodes/127.0.0.1","uid":"88c574f5-57d2-11e5-a9ef-5254009e0000","resourceVersion":"14","creationTimestamp":"2015-09-10T15:42:25Z","labels":{"kubernetes.io/hostname":"127.0.0.1"}},"spec":{"externalID":"127.0.0.1"},"status":{"capacity":{"cpu":"1","memory":"1017136Ki","pods":"40"},"conditions":[{"type":"Ready","status":"True","lastHeartbeatTime":"2015-09-10T15:42:25Z","lastTransitionTime":"2015-09-10T15:42:25Z","reason":"kubelet is posting ready status"}],"addresses":[{"type":"LegacyHostIP","address":"127.0.0.1"},{"type":"InternalIP","address":"127.0.0.1"}],"nodeInfo":{"machineID":"626b7b7fa48a4472b974b1aea92163c1","systemUUID":"","bootID":"7b7ee651-8880-4d83-8816-8fd27e3d8a3d","kernelVersion":"4.1.6-200.fc22.x86_64","osImage":"Fedora 22 (Twenty Two)","containerRuntimeVersion":"docker://1.7.1.fc22","kubeletVersion":"v1.1.0-alpha.0.1588+e44c8e6661c931","kubeProxyVersion":"v1.1.0-alpha.0.1588+e44c8e6661c931"}}}}
^C

@stefwalter
Contributor Author

@liggitt @derekwaynecarr CC

@derekwaynecarr
Member

Looking to see if I can find where the regression occurred...

@mikedanese mikedanese added kind/bug Categorizes issue or PR as related to a bug. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Sep 10, 2015
@derekwaynecarr
Member

This issue appears to be related to the introduction of the watch_cache:

https://github.com/kubernetes/kubernetes/blob/master/pkg/storage/watch_cache.go#L253

fyi @wojtek-t

@derekwaynecarr
Member

I think the fix is to change

https://github.com/kubernetes/kubernetes/blob/master/pkg/storage/watch_cache.go#L253

and special-case the resourceVersion==0 scenario:

func (w *watchCache) GetAllEventsSinceThreadUnsafe(resourceVersion uint64) ([]watchCacheEvent, error) {
    size := w.endIndex - w.startIndex
    oldest := w.resourceVersion
    if resourceVersion == uint64(0) {
        resourceVersion = oldest
    }

    if size > 0 {
        oldest = w.cache[w.startIndex%w.capacity].resourceVersion
    }
    if resourceVersion < oldest {
        return nil, fmt.Errorf("too old resource version: %d (%d)", resourceVersion, oldest)
    }
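    // ... remainder of GetAllEventsSinceThreadUnsafe unchanged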

@wojtek-t
Member

Yes - I talked with @lavalamp and he said that we don't want to support it. If we want, I'm happy to fix that (it should be a few lines of code).

@wojtek-t
Member

Or you could even take something like "newest" (instead of oldest) - I think that "oldest" might be too old.

@derekwaynecarr
Member

@wojtek-t - the code I outlined above works in my tests (Cockpit UI is now happy)... what do you mean by we do not want to support it? It seems like we should support the absence of a resource version in the same manner as etcd, no? In the snippet above, oldest defaults to w.resourceVersion, which I thought was the last version the cache has seen?

@wojtek-t
Member

"oldest" is kind of "first" (it's the oldest). So yes - it works, but it might be too old (so I would do the newest one).

@wojtek-t
Member

By "too old" I mean that we may unnecessarily read too much data, because we're mostly interested in the "~ current" state.
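
For readers following along, here is a minimal, hypothetical sketch (not the actual kubernetes code) of the two values being discussed, using the field names from the snippet above: w.resourceVersion is the newest resource version the cache has observed, while the entry at w.startIndex is the oldest event still retained in the ring buffer.

// Hypothetical, simplified illustration of the watch cache fields discussed
// above; field names mirror the snippet but this is not the real struct.
package main

import "fmt"

type event struct {
    resourceVersion uint64
}

type watchCache struct {
    cache           []event // ring buffer of recent events
    startIndex      int     // index of the oldest retained event
    endIndex        int     // one past the newest retained event
    capacity        int
    resourceVersion uint64 // newest resource version the cache has seen
}

// oldestCached returns the resource version of the oldest event still in the
// ring buffer -- "oldest" in the discussion above.
func (w *watchCache) oldestCached() uint64 {
    return w.cache[w.startIndex%w.capacity].resourceVersion
}

// newestSeen returns the most recent resource version observed -- "newest"
// in the discussion above; defaulting resourceVersion=0 to this value means
// a client starts from "~ current" state rather than replaying old events.
func (w *watchCache) newestSeen() uint64 {
    return w.resourceVersion
}

func main() {
    w := &watchCache{
        cache:           []event{{resourceVersion: 10}, {resourceVersion: 11}, {resourceVersion: 12}},
        startIndex:      0,
        endIndex:        3,
        capacity:        3,
        resourceVersion: 12,
    }
    fmt.Println("oldest cached:", w.oldestCached(), "newest seen:", w.newestSeen())
}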

@derekwaynecarr
Member

I must be dense, but I read this as meaning current state:

https://github.com/kubernetes/kubernetes/blob/master/pkg/storage/watch_cache.go#L74

And oldest defaults to w.resourceVersion where I did the check.

If you can give an alternate fix I would be happy to review.

It looks like cache_test.go complains when we special-case resourceVersion==0, so it ends up being about 10 lines of code ;-)

@wojtek-t
Member

I'm OOO tomorrow, so I'm happy with your fix for now. I can change it on Monday.

@derekwaynecarr
Member

OK, will open a PR.

@wojtek-t
Member

[if that's urgent]

@lavalamp
Member

@stefwalter How were you using that "feature"? I don't think it's possible to use it safely?

@lavalamp
Member

IMO, it was broken in v1 that you could do that at all.

@liggitt
Member

liggitt commented Sep 10, 2015

I wondered about that... would you get the full history of the resource when watching with no resource version?

@lavalamp
Member

We can't give the full history, and you have no idea at what point in the history we'd start. Certainly no automated system should be watching without a defined starting point.

@wojtek-t
Member

In "v1" there is no guarantee what will be returned if you pass resourceVersion=0 (it basically starts with "now", which is not very well defined).

@derekwaynecarr
Member

@lavalamp - my understanding is that a watch with no resource version should be equivalent to "get me the latest state", not full history.

@stefwalter
Contributor Author

Yes, Cockpit gets the latest state with /watch and then keeps the connection open to receive updates. It only uses resourceVersion=X when the /watch disconnects and it needs to pick up the /watch?resourceVersion=X from where it left off.

To summarize, we're not looking for history. We're looking for "latest state ... and then wait for updates" behavior.

This is in use against kubernetes 1.0 in released products. Hence this is a regression.
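
For context, a rough sketch of the client pattern described above, assuming the pre-regression v1 behavior (initial state arrives as MODIFIED events on a bare /watch, and the client resumes from the last resourceVersion it saw after a disconnect). The types and helper names here are illustrative only, not Cockpit's actual code:

// Hypothetical sketch of a "latest state ... then wait for updates" client.
// Paths match the curl examples earlier in this issue; error handling is minimal.
package main

import (
    "encoding/json"
    "fmt"
    "net/http"
)

type watchEvent struct {
    Type   string `json:"type"`
    Object struct {
        Metadata struct {
            Name            string `json:"name"`
            ResourceVersion string `json:"resourceVersion"`
        } `json:"metadata"`
    } `json:"object"`
}

// watchNodes opens a watch and returns the last resourceVersion it processed,
// so the caller can reconnect from that point after a disconnect.
func watchNodes(lastRV string) (string, error) {
    url := "http://127.0.0.1:8080/api/v1/watch/nodes"
    if lastRV != "" {
        url += "?resourceVersion=" + lastRV
    }
    resp, err := http.Get(url)
    if err != nil {
        return lastRV, err
    }
    defer resp.Body.Close()

    dec := json.NewDecoder(resp.Body)
    for {
        var ev watchEvent
        if err := dec.Decode(&ev); err != nil {
            return lastRV, err // stream ended or broke; caller reconnects
        }
        fmt.Println(ev.Type, ev.Object.Metadata.Name)
        lastRV = ev.Object.Metadata.ResourceVersion
    }
}

func main() {
    rv := "" // first connection: no resourceVersion, relying on the v1 behavior
    for {
        rv, _ = watchNodes(rv)
    }
}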

@lavalamp
Member

@stefwalter You should list first. Otherwise you have no way of knowing when you end up with a complete state.
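
For comparison, a rough sketch of the two-step list-then-watch pattern being recommended here, again against the raw v1 API with minimal error handling (the struct names are made up for illustration): list first to get a complete snapshot plus the resourceVersion it corresponds to, then watch from exactly that version.

// Hypothetical sketch of the list-then-watch pattern suggested above.
package main

import (
    "encoding/json"
    "fmt"
    "net/http"
)

type nodeList struct {
    Metadata struct {
        ResourceVersion string `json:"resourceVersion"`
    } `json:"metadata"`
    Items []struct {
        Metadata struct {
            Name string `json:"name"`
        } `json:"metadata"`
    } `json:"items"`
}

func main() {
    // Step 1: LIST gives a complete snapshot and the version it is valid at.
    resp, err := http.Get("http://127.0.0.1:8080/api/v1/nodes")
    if err != nil {
        panic(err)
    }
    var list nodeList
    if err := json.NewDecoder(resp.Body).Decode(&list); err != nil {
        panic(err)
    }
    resp.Body.Close()
    for _, n := range list.Items {
        fmt.Println("have node:", n.Metadata.Name)
    }

    // Step 2: WATCH from that exact version; every event received afterwards
    // is an incremental change relative to the snapshot above.
    watchURL := "http://127.0.0.1:8080/api/v1/watch/nodes?resourceVersion=" +
        list.Metadata.ResourceVersion
    wresp, err := http.Get(watchURL)
    if err != nil {
        panic(err)
    }
    defer wresp.Body.Close()
    dec := json.NewDecoder(wresp.Body)
    for {
        var ev map[string]interface{}
        if err := dec.Decode(&ev); err != nil {
            break
        }
        fmt.Println("event:", ev["type"])
    }
}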

@derekwaynecarr
Member

@lavalamp - it seems like there is a difference in calling

/watch/api/v1/nodes

vs

/watch/api/v1/nodes?resourceVersion=0

I think the absence of a resourceVersion should default to the latest state.

It seems sensible to me to let a client make one call instead of two.

@stefwalter
Contributor Author

You should list first. Otherwise you have no way of knowing when you end up with a complete state.

Cockpit is a UI that tracks the state of Kubernetes live, on the fly. We have no need to determine when we have a "complete" state. In fact, on a moderately active kubernetes cluster ... it is meaningless to talk about a "complete" state.

@stefwalter
Contributor Author

To clarify: kubernetes 1.0 would list all existing objects as MODIFIED events at the start of a /watch without resourceVersion.

@lavalamp
Member

@derekwaynecarr Yeah, IMO /watch/api/v1/nodes should return a bad request status code, and /watch/api/v1/nodes?resourceVersion=0 should ordinarily give you the out of window error, but it will work for the first few seconds of a cluster's lifecycle.

@lavalamp
Member

@stefwalter Are you using this for single objects or multiple objects?

@stefwalter
Contributor Author

Are you using this for single objects or multiple objects?

Multiple simultaneous watches each for multiple objects like /api/v1/watch/nodes or /api/v1/watch/namespaces/default/pods

@lavalamp
Member

If your UI is view-only, it probably doesn't matter. But if you offer operations like "do x to all the pods" or "do X with all the pods" or "send all the pods to the frobber"-- you basically have no way of safely doing those operations, at least not with the objects you got from the watch.

I am very reluctant to have us do things to enable the old behavior to continue, because the above error is very subtle and I don't want to keep it easy for people to misuse.

@bgrant0607 Must we continue to support resourceVersion-less watches? Do you consider that a breaking change to v1? I consider it an undocumented feature that we ran out of time to remove (aka bug) and therefore think that it should be acceptable to remove from v1.

@smarterclayton may also have an opinion.

@liggitt
Member

liggitt commented Sep 10, 2015

you basically have no way of safely doing those operations

Any client doing write operations already has to handle cases where the state of the world changed between the time they received a watch event and when they made another API call acting on the watch event's object. If they are handling NotFound and Conflict errors correctly, how would this be unsafe?
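
To illustrate, a rough sketch (hypothetical URL and helper, not from any client library) of the kind of handling being described: a write triggered by a watch event that races with another actor comes back as 409 Conflict or 404 Not Found, and the client re-reads and retries rather than assuming its cached copy was current.

// Hypothetical sketch of handling Conflict/NotFound on a write that was
// triggered by a watch event, as described above.
package main

import (
    "bytes"
    "fmt"
    "net/http"
)

// tryUpdate PUTs the given object body and reports whether the caller should
// re-read the current object and retry with an up-to-date resourceVersion.
func tryUpdate(url string, body []byte) (retry bool, err error) {
    req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(body))
    if err != nil {
        return false, err
    }
    req.Header.Set("Content-Type", "application/json")
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return false, err
    }
    defer resp.Body.Close()

    switch resp.StatusCode {
    case http.StatusConflict:
        // 409: our copy was stale; fetch the latest object and retry.
        return true, nil
    case http.StatusNotFound:
        // 404: the object vanished between the watch event and our write.
        return false, fmt.Errorf("object no longer exists")
    default:
        return false, nil
    }
}

func main() {
    retry, err := tryUpdate("http://127.0.0.1:8080/api/v1/namespaces/default/pods/example", []byte(`{}`))
    fmt.Println("retry:", retry, "err:", err)
}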

@stefwalter
Contributor Author

Yes @liggitt, a client's best hope is that their view will eventually converge. Short of kubernetes providing a way to 'lock the cluster' (heh), there is never a way to know you're seeing or acting on all pods or any other sort of item.

In any distributed environment, the aim is to get things to converge and not to imagine that things are consistent at any given point.

The v1 watch behavior lets us make half as many HTTP requests and have a simpler client. For example, instead of 30 requests, we only make 15. Remember that Kubernetes does not provide a way to watch multiple kinds in a single watch, so the number of watches a client has open can be quite high.

Lastly, even if kubernetes chooses to change the v1 watch behavior in a v2 API (which I would be against for the reasons above) ... it's still part of the v1 behavior. Kubernetes shouldn't regress the v1 API or consumers of that API.

@lavalamp
Member

@liggitt

If they are handling NotFound and Conflict errors correctly, how would this be unsafe?

You can't even initiate operations on objects you don't know about. I'm surprised you're making this argument, because you're the one that ran into this problem: #8494

@liggitt
Member

liggitt commented Sep 10, 2015

You can't even initiate operations on objects you don't know about

Different use cases, depending on what you are doing with the objects in the watch stream. My use case involved building a store and querying against it, so knowing when the initial list was complete mattered. If you're just processing one object at a time, the watch stream is enough... when you get a watch event about a resource, you know about it then.

@lavalamp
Member

@stefwalter Even in eventually-consistent situations, there's still a need to deterministically act on every object in a collection, process every change, or update some downstream state.

You can't know that you've e.g. "acted on every pod", because that's not well defined. But you can know that you've "acted on every pod as of revision 2348".

@smarterclayton
Contributor

I think it's a breaking change - it was definitely an intentional part of the API, and we never announced deprecation or discussed removing it.

It's not the perfect pattern for everyone, and it can be emulated in other fashions, but for simple clients and simple integrations this behavior is much more useful.


@derekwaynecarr
Member

I just expect this to map to the etcd watch mental model.

If I do

curl -L 'http://127.0.0.1:4001/v2/keys/?recursive=true&wait=true'

I get the equivalent of LIST (current state) + (wait for updates).

When we moved to the watch_cache.go pattern, we broke this. Seems like a bug.

@lavalamp
Member

OK. I still think it's dangerous, but I think I'm outvoted here.

@wojtek-t
Member

I agree with @lavalamp that it's dangerous.
However, I see @smarterclayton's point that it's a breaking change that we didn't announce, so I think we should fix it. I will send a PR.

@wojtek-t
Member

#13910 sent out for review

@derekwaynecarr
Member

I want to augment my previous statements:

etcd:

curl -L 'http://127.0.0.1:4001/v2/keys/registry/pods?recursive=true&wait=true'

The response ONLY includes modifications made AFTER the request was initiated.

kube-1.0.6

curl -L -k -u vagrant:vagrant 'https://10.245.1.2/api/v1/namespaces/default/pods?watch=true'

The response has MODIFIED records for resources that existed BEFORE the request was initiated, and then all subsequent WATCH events after the request was initiated.

The call chain ultimately leads down this path:
https://github.com/kubernetes/kubernetes/blob/v1.0.6/pkg/registry/generic/etcd/etcd.go#L486
https://github.com/kubernetes/kubernetes/blob/v1.0.6/pkg/tools/etcd_helper_watch.go#L76
https://github.com/kubernetes/kubernetes/blob/v1.0.6/pkg/tools/etcd_helper_watch.go#L193

to where we do a LIST and then convert the initial data into a list of MODIFIED events:
https://github.com/kubernetes/kubernetes/blob/v1.0.6/pkg/tools/etcd_helper_watch.go#L220

kube-1.1

curl -L -k -u vagrant:vagrant 'https://10.245.1.2/api/v1/namespaces/default/pods?watch=true'

Error

Summary
So it seems that we do more than the basic etcd watch pattern, and that kube-api itself was doing the LIST+WATCH pattern for the user when resourceVersion==0.

It seems that for the v1 API, if we want to preserve backward compatibility, we need to keep doing the same.
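
For reference, a rough sketch (not the actual etcd_helper_watch.go code) of what that 1.0 code path effectively did for the client: emit each object from the initial LIST as a synthetic MODIFIED event, then stream the real changes that follow.

// Hypothetical illustration of turning an initial LIST into MODIFIED watch
// events, approximating the kube 1.0 behavior described above.
package main

import "fmt"

type Event struct {
    Type   string
    Object string
}

// startWatch sends one MODIFIED event per existing object, then forwards
// subsequent changes from the updates channel.
func startWatch(existing []string, updates <-chan Event) <-chan Event {
    out := make(chan Event)
    go func() {
        defer close(out)
        for _, obj := range existing {
            out <- Event{Type: "MODIFIED", Object: obj} // initial state
        }
        for ev := range updates {
            out <- ev // live changes after the snapshot
        }
    }()
    return out
}

func main() {
    updates := make(chan Event, 1)
    updates <- Event{Type: "ADDED", Object: "node-2"}
    close(updates)

    for ev := range startWatch([]string{"node-1"}, updates) {
        fmt.Println(ev.Type, ev.Object)
    }
}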

@bgrant0607 bgrant0607 added this to the v1.1 milestone Sep 14, 2015
@bgrant0607 bgrant0607 added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Sep 14, 2015
@bgrant0607
Member

If it worked before, we can't take it away from the v1 API.

@wojtek-t
Member

Agree - the PR is already sent out for review.

@lavalamp lavalamp added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Sep 15, 2015