Old events from the past yielded due to remembered resource_version
#819
Do you have any link to confirm that this is really designed behavior?
@mitar I couldn't find anything on this topic (but I didn't do deep research). But this is the observed behaviour on a real cluster. The only official description is here: https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes The phrasing is such that it does not specify how the initial list must be fetched, or what the sort order is. It only says that a resourceVersion will be returned with the list, and, separately, that the "last returned" resourceVersion can be used to restore the watch. But the latter sentence means that the "last returned" version is from another, interrupted watch, not from the original list. What can be useful is this: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency Here, it explains the nature of the resourceVersion and potential future changes. The last paragraph there hints that in the initial GET operation, the resourceVersion of the list must be used, not of the objects in the list:
Perhaps, this is the resource version to be used for watch-stream's initial seeding:
I can implement this logic, if somebody more proficient with Kubernetes than me can confirm I understood it right.
I suggest you look up the …
I think this is actually related to #700 and #693 and is expected. Just ran into this myself... This document describes how one should use watch. One line in that doc says etcd by default is set to remember 5 mins of resourceVersions... I bet this may not be the case anymore on GKE (at least in my case).
@nolar Thanks for your investigation. I did some tests and I can confirm that it works as you write. 'Watch' should get a list of objects to start watching with the proper version, in the same way kubectl's watch works. Let me know if you need help with implementing it.
… watch from Watch() only returns entries, not the list. Entries in a list can be very old and no longer valid to watch from. Additionally, the initial list that watch() returns is unordered. The correct way of doing this is to first list the entries. The list itself contains a resource_version that is recent. Using this version to watch will result in a correct watch(). A lot has been written about this. The clearest resource for me was: kubernetes-client/python#819
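For reference, a minimal sketch of that list-then-watch pattern with this Python client might look like the following. Pods stand in for the custom resources discussed here, and the namespace and printed output are illustrative only:

```python
# Minimal sketch of the list-then-watch approach described above (illustrative
# names; pods stand in for the custom resources discussed in this issue).
from kubernetes import client, config, watch

config.load_kube_config()
v1 = client.CoreV1Api()

# 1) List first. The *list object* carries a resourceVersion for the whole
#    collection, unlike the per-item versions, which may be arbitrarily old.
pod_list = v1.list_namespaced_pod(namespace="default")
start_version = pod_list.metadata.resource_version

for pod in pod_list.items:
    print("initial:", pod.metadata.name)

# 2) Watch from the list's resourceVersion, not from any item's version.
w = watch.Watch()
for event in w.stream(v1.list_namespaced_pod,
                      namespace="default",
                      resource_version=start_version):
    print(event["type"], event["object"].metadata.name)
```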
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Just to be clear, it's not just the List action response that's ordered arbitrarily … Indeed, this scenario violates the documented guidance in api-concepts that "If the client watch is disconnected they can restart a new watch from the last returned resourceVersion, or ...". Things I hope are true but can't find a clear promise for in the k8s docs 🙏
The API reference only says this about specifying …
I recommend opening an issue in kubernetes/community with these questions. This is a good find of subtle points the k8s docs should explain better 👏
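For comparison, here is a minimal sketch of the reconnect pattern the quoted guidance describes: resume from the last resourceVersion returned by the watch itself, not from a list item. The handle() callback is hypothetical, "410 Gone" handling is omitted, and in-order delivery is assumed, which is the very assumption this issue disputes:

```python
# Hedged sketch of restarting a watch from the last resourceVersion returned
# by the watch itself (not by the initial list items).
from kubernetes import client, config, watch

config.load_kube_config()
v1 = client.CoreV1Api()

def handle(event):
    # Placeholder for user logic (hypothetical callback).
    print(event["type"], event["object"].metadata.name)

resource_version = v1.list_namespaced_pod(namespace="default").metadata.resource_version

while True:
    w = watch.Watch()
    try:
        for event in w.stream(v1.list_namespaced_pod,
                              namespace="default",
                              resource_version=resource_version):
            handle(event)
            # Remember only what the watch stream itself returned.
            resource_version = event["object"].metadata.resource_version
    except Exception:
        # On disconnect, loop and resume from the last returned version.
        continue
```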
The issue is that k8s gives no meaning to the resource version, but the resource version is really just the exported modification index of its underlying datastore, etcd. etcd explicitly does not guarantee linearity of events, but it provides the modification index to allow clients to reorder events again. Imo the only fix is to interpret the resource version in client libraries as what it really is, and not as what Kubernetes currently wants it to be: an opaque value, probably in order to allow changing the underlying datastore in the future. I prepared a branch for that a while ago but have not posted it, as I had no reproducing example. I assume client-go does something very similar, as its watch functions are based on a local cache anyway and can then return values in resource-index order.
Hm, but if etcd returns versions per resource, that is, per object, no? So what, then, is the version returned by Kubernetes when you list multiple resources? How can you then watch a list of multiple resources, and how can you resume that? Is the version of a list of multiple resources simply max(version of every resource in the list)? Is this how Kubernetes does it? etcd versions are monotonically increasing, but I am not sure if this is per resource or across all resources:
Would it not then simply be enough to always take the max of the currently known latest version and the new version information? So instead of just setting the currently known latest version to whatever Kubernetes returned, you compare and set it to the max?
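A purely illustrative sketch of that compare-and-set-max idea is below. Note that Kubernetes documents resourceVersion as an opaque string, so parsing it as an integer is exactly the interpretation the issue's Dilemma section warns about:

```python
# Purely illustrative sketch of the compare-and-set-max idea. Kubernetes
# documents resourceVersion as an opaque string, so treating it as an integer
# is an interpretation the API does not promise.
def remember_latest(current, seen):
    """Return the numerically larger of two resourceVersion strings."""
    if current is None:
        return seen
    return seen if int(seen) > int(current) else current

latest = None
for version in ["223394843", "223380102", "223402295"]:  # arbitrary sample strings
    latest = remember_latest(latest, version)
print(latest)  # -> 223402295
```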
A list operation returns a single object, and I assume Kubernetes sorts the entries in that list itself, so there should be no issue. The problem is only with watch operations: if they are unordered, simply storing the maximum for a reset would not work. But I now found this in the documentation of etcd:
This means that the results should not be unordered, but then I do not know where the problem is. Maybe a buggy etcd version was used, or Kubernetes does not keep this guarantee in its API abstraction. I was not the only one confused by the old documentation; see etcd-io/etcd#5592
A little addition on this issue: I have "solved" this by using the list-then-watch strategy. The resource version strings seem to be scoped to the custom resource type, or maybe to the whole cluster and all resources. You can treat it as a timestamp of the resulting dataset. For the list operation, it returns the resource version of the whole list, not necessarily of the latest item in the list, but maybe of the current value of the version counter (as if it was "now"). The watch stream then continues from that point in versioning history. However, when listing, the objects are modified in the list: they have no … We could in theory get those from the … The … If the "watching-with-no-starting-point" feature is going to be (re)implemented in this client library, these aspects should be taken into account.
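A small hypothetical diagnostic snippet for the point about the list-level resource version (pods used for brevity): the list object's own resourceVersion is a collection-level value and need not match any item's version.

```python
# Hypothetical diagnostic snippet: the list's own resourceVersion is a
# collection-level value and need not equal any item's resourceVersion.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pod_list = v1.list_namespaced_pod(namespace="default")
print("list resourceVersion:", pod_list.metadata.resource_version)
for pod in pod_list.items:
    print("item:", pod.metadata.name, pod.metadata.resource_version)
```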
the list issue is #745
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules: after 90d of inactivity, lifecycle/stale is applied; after 30d of further inactivity, lifecycle/rotten is applied; after another 30d of inactivity, the issue is closed. You can reopen this issue with /reopen, mark it as fresh with /remove-lifecycle rotten, or offer to help out with Issue Triage. Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Current behaviour
The list...() operation (actually, the Kubernetes API) returns the resources in arbitrary order. It can happen that very old resources (e.g. custom resources a month old) go last in the list. In the current implementation, the watcher-streamer remembers the resource_version of the last seen resource; in that case, the old resource version (a month old) will be remembered as the last seen one. And when the HTTP call is disconnected for any reason, the watcher-streamer starts a new one, using that remembered old resource_version as the base.
As a result, all the changes of all the resources of that resource kind are yielded, i.e. everything that happened in the past month, despite having already been yielded before (and presumably handled by the consumer). For the objects created since that old timestamp, it yields the ADDED event, all the MODIFIED events, and even the DELETED events.
Example
Example for my custom resource kind:
Please note the random order of resource_versions. Depending on your luck and the current state of the cluster, you can get either a new-enough resource or the oldest one in the last line.
Let's use the latest resource_version 223394843 with a new watch object:

Well, okay, let's try the recommended resource_version, which is at least known to the API:
All this is dumped immediately; nothing happens in the cluster during these operations. All these changes are old, i.e. not expected, as they were processed before doing list...(). Please note that even the deleted, no-longer-existing resource is yielded ("expr1").
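A hedged sketch of how this replay can be reproduced with the Python client follows; the group, version, plural, and namespace are illustrative placeholders, not the actual resource from this issue:

```python
# Hedged reproduction sketch: start a watch from an *item's* (possibly very old)
# resourceVersion instead of the list's, and observe historical
# ADDED/MODIFIED/DELETED events being replayed immediately.
# group/version/plural and the namespace are illustrative placeholders.
from kubernetes import client, config, watch

config.load_kube_config()
api = client.CustomObjectsApi()

crs = api.list_namespaced_custom_object(
    group="example.com", version="v1", namespace="default", plural="examples")
# Take the version of an arbitrary (last) item -- it may be far in the past.
old_version = crs["items"][-1]["metadata"]["resourceVersion"]

w = watch.Watch()
for event in w.stream(api.list_namespaced_custom_object,
                      group="example.com", version="v1",
                      namespace="default", plural="examples",
                      resource_version=old_version):
    # Old, already-processed events are dumped here immediately.
    print(event["type"], event["object"]["metadata"]["name"])
```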
Dilemma
See kubernetes-client/python-base#131 for a suggested implementation of the monotonically increasing resource_version as remembered by the watcher.
However, one of the unit-tests says about the resource version:
Kubernetes does not keep the promise, and returns the events in random order.
Way A: If the client library starts interpreting the resource versions and remembering the maximum value seen, it can break its compatibility with Kubernetes.
Way B: If the client library decides to treat the resource version as opaque and non-interpretable, it should also stop remembering it, as remembering it leads to re-yielding events from the (potentially distant) past, as demonstrated above.
In the latter case, all resource_version support should not be in Watch.stream() at all; only the users of the watcher-streamer should decide whether they interpret the resource version or not, and track it by their own rules (at their own risk).

Possibly related