Skip to content
This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

Treat resourceVersion as opaque string #467

Closed
wants to merge 3 commits into from

Conversation

nathancoleman
Copy link
Member

Changes proposed in this PR:

The Kubernetes API explicitly disallows parsing the metadata.resourceVersion field on an object as an integer.

You must not assume resource versions are numeric or collatable. API clients may only compare two resource versions for equality (this means that you must not compare resource versions for greater-than or less-than relationships).

With this change, we rely on the watcher to only trigger for newer versions than we've previously seen. Instead of parsing to an integer, we do simple equality checks.

How I've tested this PR:

馃 tests

How I expect reviewers to test this PR:

Checklist:

  • Tests added
  • CHANGELOG entry added

    Run make changelog-entry for guidance in authoring a changelog entry, and
    commit the resulting file, which should have a name matching your PR number.
    Entries should use imperative present tense (e.g. Add support for...)

@nathancoleman nathancoleman added the pr/no-changelog Skip the CI check that requires a changelog entry label Nov 29, 2022
@nathancoleman nathancoleman linked an issue Nov 29, 2022 that may be closed by this pull request
if current, ok := g.gatewayClasses[class.class.Name]; ok {
if utils.ResourceVersionGreater(current.class.ResourceVersion, class.class.ResourceVersion) {
if cachedClass, ok := g.gatewayClasses[class.class.Name]; ok {
if cachedClass.class.ResourceVersion == class.class.ResourceVersion {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So part of the problem with this is that I believe we can actually corrupt our cache if we're simultaneously reconciling resources and only doing equality checks (i.e. an old resource is going to potentially override a newer resource). Since a lot of our internal state storage has changed fairly substantially, I'm not entirely sure if this is still an issue, but it clearly was early on -- which is why we explicitly made the decision early on to ignore the warning against resource version parsing (it has been a monotonically increasing global 64 bit integer based on etcd's revision field since day one IIRC).

I'm fine if we can remove the reliance on that, but just be aware that we may get

  1. (in the best case), additional log spam due to failures in whatever status Update calls we're doing
  2. (in the bad case), potential, hopefully temporary, cache corruption that may or may not affect what we're syncing to Consul
  3. (in the worst case), due to the above, potentially wedging a resource in a reconciliation loop due to affecting upstream dependencies that are stale in cache (i.e. we write in an old gateway somehow, but routes are reconciling that require a gateway status update for such a thing as the number of routes bound to it, the update fails, so the route gets re-reconciled, but never completes until we purge/fix the stale cache entry)

All of this said, while this fix seems fairly straightforward, it can potentially have some pretty large consequences, so I would want to thoroughly vet it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewstucki if I understand this logic correctly, the caller of K8sGatewayClasses.Upsert() wants to mutate if and only if the cached version of the class they have is up-to-date, right? Instead of doing the resource version comparison manually here, why not pass the resource version through the client to the server as a precondition and allow the server to reject the call if it's out-of-date? Store in g.gatewayClasses[class.class.Name] if successful and ignore the conflict error if you want your client not to re-try with an updated version from the cache.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Letting the server do the resourceVersion comparison for you would free you from having to carefully manage when you do and do not add to your client-side cache, meaning your option 2 & 3 would not occur. Although, one more question - are you using the controller-runtime cache-backed clients, and, if so, what is the additional utility to the in-memory cache in question here?

Copy link
Member Author

@nathancoleman nathancoleman Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the concern is less about where we do the comparing and more that the meaning of "out of date" with this PR changes from "a lower incrementing resourceVersion integer than the one I currently have" to "a resource version that exactly matches the one I currently have".

The current state deals with a watch trigger series for versions 1->3->2 by rejecting the last version because 2 < 3, ending up holding resourceVersion 3.

The proposed state would accept the last version because 2 != 3, ending up holding resourceVersion 2.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I don't think I follow (yet!).

In the current implementation, the reconciler has a cached version of an object, with some resourceVersion=a. The intent with the current Upsert() semantic is that this reconciler's client call should only go through if the cache has not seen a more-recent version resourceVersion=b of the object yet.

If, instead of doing the check in-memory, the reconclier unconditionally made the client call to the sever, and passed that resourceVersion=a in the precondition on the UpdateStatus(), the server would ensure that the there had not been any committed changes to the object past a - so the invariant remains that the UpdateStatus() succeeds if and only if the client's cached version (a) is the latest. In fact, the invariant becomes stronger, since you are guaranteed not only that a is newer than any revision you have happened to observe and store in your in-memory cache, but that it is the newest revision available in the database. Storing the object only on a successful call would allow your cache to stay coherent.

With the current implementation, if you want to make sure your upsert is not clobbering newer data on the server (perhaps from other actors), you still need to pass the precondition to the server to guard against the following case:

  • cache observes object with resourceVersion=a
  • reconclier uses cached state to issue an upsert call
  • in-memory resourceVersion comparison is successful, as no new data exists in the cache
  • server updates the object to resourceVersion=b
  • reconclier issues a call to the server to UPDATE the resource
  • cache observes object with resourceVersion=b

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I mentioned the controller-runtime cache-backed client, as well, is that you have a full view of the objects your reconclier is operating over in an in-memory cache as part of the controller-runtime infrastructure. Therefore, a simple pattern to ensure correctness is to pass the resourceVersion on objects you see in a reconcile.Request to the server as a precondition. If the server call fails with a conflict, the cached view that your reconclier used was out-of-date. However, you'll see the up-to-date version at some point in the future, at which point the reconciler's client call will succeed and you've made forward progress.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevekuznetsov The cache/out-of-k8s storage is required for shipping SDS and syncing all of our data to Consul -- having a working set of how things need to be and actually enforcing that when users change things externally (simple example of enforcing intentions updates when a user may add or remove an additional source or destination to a service that we're routing to). RE your comment in your issue about the in-memory store, the original design of this external storage layer was to be pluggable with future implementations, including using durable storage backends, in part to pave the way for working with both on k8s and off-k8s execution modes -- some of that is no longer true though.

RE the current use of resourceVersion I'm thinking that part of our use of resourceVersion could be ameliorated by actually doing the external storage sync after delegating to Kubernetes for the version check during our status updates, if the update succeeds, then as you mentioned, we have made forward progress since we were using a fresh entry and we can go ahead and do the rest of the sync. This does make tracking whether or not we were able to sync state to our external sources as part of the user reflected Status more difficult though, as that now happens after the initial remote update and would require a second pass through the reconciler to actually update the remote sync status.

Just to set expectations, given that there's likely a good amount of work involved with both changing this resourceVersion check and ensuring correctness with both Kubernetes usage and our external state syncing code and the current workload/priorities of our team, I'm not sure that we'll be able to change this behavior in short order. I would like to fix it, but it could take some time.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely - if you have appetite for it, I am more than happy to take on the work. By no means did I mean to show up and ask you to do more - I am sure you are very busy! If that sounds like a good idea to you, I'd love to take more time to learn the code and perhaps sync with you to understand the problem space a little better. I'm sure there's a solution here that does not require breaking that particular API convention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there's a solution here that does not require breaking that particular API convention.

Totally agree that this is tractable and thanks for your desire to contribute! If you do wind up tackling this, let me know if you have any questions or need some feedback.

@nathancoleman
Copy link
Member Author

@stevekuznetsov if you do have an appetite for pursuing that larger change, I will close this PR in favor of that. Let me know!

@stevekuznetsov
Copy link

Absolutely - I can work on that.

@nathancoleman nathancoleman deleted the opaque-resource-version branch December 1, 2022 18:09
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr/no-changelog Skip the CI check that requires a changelog entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parsing Resource Version Is Not Allowed
3 participants