NO-ISSUE: prevent update status conflicts #1621
Conversation
Force-pushed from c896243 to 806b367
@deads2k: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@deads2k: This pull request explicitly references no jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
This PR is pulling openshift/library-go#1658, and it looks like the fix was to issue a live GET after running into a conflict on an update. Why not issue a PATCH instead?
By passing a function, the operator gets to decide whether the conflict is meaningful enough to skip an update (that is, decide not to update); I don't remember whether we ever used this capability. Also, a JSON patch cannot be predicated on another field, so a patch for conditions would rely on list order, which fails whenever a condition is added or removed. Recall that strategic merge patch isn't supported for CRDs. Finally, server-side apply has arrived since we added this style of update, but because every field a field manager sets must be set on every SSA call, we would have to update every controller to set every condition instead of only the ones it cares about. While this could be done, it is not a drop-in replacement (this PR is) and would need to be phased in if someone wishes to undertake it, meaning this method will remain. A sketch of the update-function style follows below.
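To make the update-function style concrete, here is a minimal sketch of that pattern. The `statusClient` interface, its method names, and the `updateStatus` helper are simplified stand-ins, not the actual library-go API; the point is that the caller-supplied function can both mutate the status and return an error to veto the write entirely.

```go
package operatorstatus

import (
	"context"
	"fmt"

	operatorv1 "github.com/openshift/api/operator/v1"
	"k8s.io/apimachinery/pkg/api/equality"
)

// UpdateStatusFunc mutates the desired status in place. Returning an
// error aborts the whole update, which is how a controller can decide
// that a conflict is not meaningful enough to act on.
type UpdateStatusFunc func(status *operatorv1.OperatorStatus) error

// statusClient is a hypothetical, trimmed-down stand-in for the real
// operator client interface.
type statusClient interface {
	GetOperatorStatus(ctx context.Context) (status *operatorv1.OperatorStatus, resourceVersion string, err error)
	UpdateOperatorStatus(ctx context.Context, resourceVersion string, status *operatorv1.OperatorStatus) (*operatorv1.OperatorStatus, error)
}

// updateStatus applies the caller's functions to a copy of the current
// status and writes the result back only when something changed.
func updateStatus(ctx context.Context, client statusClient, fns ...UpdateStatusFunc) (bool, error) {
	oldStatus, resourceVersion, err := client.GetOperatorStatus(ctx)
	if err != nil {
		return false, err
	}
	newStatus := oldStatus.DeepCopy()
	for _, fn := range fns {
		if err := fn(newStatus); err != nil {
			// The caller declined the update; no API call is made.
			return false, fmt.Errorf("declining status update: %w", err)
		}
	}
	if equality.Semantic.DeepEqual(oldStatus, newStatus) {
		return false, nil // nothing changed, skip the API call
	}
	_, err = client.UpdateOperatorStatus(ctx, resourceVersion, newStatus)
	return err == nil, err
}
```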
Okay, so the list would have to be replaced and not merged.
+1
Please help me understand the details of this issue and the provided fix. Does it mean that our caches were/are stale? Does it mean that the update frequency is very high? Finally, does it mean that after the fix, we will issue more requests to the server to get the live object?
Yes, the cache was stale.
It's high sometimes; during upgrades and configuration changes, many controllers try to issue status updates for individual conditions.
Yes, but recall that we're comparing a repeatedly failing UpdateStatus (usually failing more than once) against a single extra GET. Looking at the behavior, we're comparing roughly 5 updates to 2 updates plus one GET, so it's fewer calls overall.
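For illustration, here is a minimal sketch of the GET-on-conflict pattern using client-go's stock retry helper against a Deployment. The resource and annotation are hypothetical, and unlike the actual library-go change (which falls back to a live GET only after hitting a conflict), this sketch re-fetches on every attempt:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// setExampleAnnotation shows the shape of the fix: each attempt starts
// with a live GET, so a stale informer cache can no longer cause the
// same resourceVersion conflict over and over.
func setExampleAnnotation(ctx context.Context, client kubernetes.Interface, ns, name string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Live GET straight from the API server, bypassing any cache.
		d, err := client.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if d.Annotations == nil {
			d.Annotations = map[string]string{}
		}
		d.Annotations["example.com/touched"] = "true" // hypothetical mutation
		_, err = client.AppsV1().Deployments(ns).Update(ctx, d, metav1.UpdateOptions{})
		return err // a Conflict error triggers another round with a fresh GET
	})
}
```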
OK, let's try this PR out. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: deads2k, p0lyn0mial. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.
Merged commit 758ac1c into openshift:master
[ART PR BUILD NOTIFIER] This PR has been included in build ose-cluster-kube-apiserver-operator-container-v4.16.0-202401151732.p0.g758ac1c.assembly.stream for distgit ose-cluster-kube-apiserver-operator.
This reduces the update-conflict messages in the log. I found 50-100 in runs prior to the update here; runs after this update had 5.