NO-ISSUE: prevent update status conflicts #1621
Conversation
Force-pushed from c896243 to 806b367
@deads2k: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@deads2k: This pull request explicitly references no jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
This PR is pulling openshift/library-go#1658, and it looks like the fix was to issue a live GET after running into a conflict on an update. Why not issue a PATCH instead?
By passing a function, the operator gets to decide whether the conflict is meaningful enough to skip an update (that is, decide not to update); I don't remember whether we ever used this capability. Also, a JSON patch cannot be predicated on another field, so a patch for conditions would rely on list order, which fails whenever a condition is added or removed. Recall that strategic merge patch isn't supported for CRDs. Finally, server-side apply has arrived since we added this style of update, but because every field a field manager sets must be set on every SSA call, we would have to update every controller to set every condition instead of only the ones it cares about. While this could be done, it is not a drop-in replacement (this PR is) and would need to be phased in if someone wishes to undertake it, meaning this method will remain. A sketch of the update-function style follows below.
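To make the update-function style concrete, here is a minimal sketch of that pattern. The `statusClient` interface, its method names, and the `updateStatus` helper are simplified stand-ins, not the actual library-go API; the point is that the caller-supplied function can both mutate the status and return an error to veto the write entirely.

```go
package operatorstatus

import (
	"context"
	"fmt"

	operatorv1 "github.com/openshift/api/operator/v1"
	"k8s.io/apimachinery/pkg/api/equality"
)

// UpdateStatusFunc mutates the desired status in place. Returning an
// error aborts the whole update, which is how a controller can decide
// that a conflict is not meaningful enough to act on.
type UpdateStatusFunc func(status *operatorv1.OperatorStatus) error

// statusClient is a hypothetical, trimmed-down stand-in for the real
// operator client interface.
type statusClient interface {
	GetOperatorStatus(ctx context.Context) (status *operatorv1.OperatorStatus, resourceVersion string, err error)
	UpdateOperatorStatus(ctx context.Context, resourceVersion string, status *operatorv1.OperatorStatus) (*operatorv1.OperatorStatus, error)
}

// updateStatus applies the caller's functions to a copy of the current
// status and writes the result back only when something changed.
func updateStatus(ctx context.Context, client statusClient, fns ...UpdateStatusFunc) (bool, error) {
	oldStatus, resourceVersion, err := client.GetOperatorStatus(ctx)
	if err != nil {
		return false, err
	}
	newStatus := oldStatus.DeepCopy()
	for _, fn := range fns {
		if err := fn(newStatus); err != nil {
			// The caller declined the update; no API call is made.
			return false, fmt.Errorf("declining status update: %w", err)
		}
	}
	if equality.Semantic.DeepEqual(oldStatus, newStatus) {
		return false, nil // nothing changed, skip the API call
	}
	_, err = client.UpdateOperatorStatus(ctx, resourceVersion, newStatus)
	return err == nil, err
}
```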
Okay, so the list would have to be replaced and not merged.
+1
Please help me understand the details of this issue and the provided fix. Does it mean that our caches were/are stale? Does it mean that the update frequency is very high? Finally, does it mean that after the fix, we will issue more requests to the server to get the live object?
Yes, the cache was stale.
It's high sometimes; during upgrades and configuration changes, many controllers try to issue status updates for individual conditions.
Yes, but recall that we're comparing a repeatedly failing UpdateStatus (usually failing more than once) against a single extra GET. Looking at the behavior, we're comparing roughly 5 updates to 2 updates plus one GET, so it's fewer calls overall.
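For illustration, here is a minimal sketch of the GET-on-conflict pattern using client-go's stock retry helper against a Deployment. The resource and annotation are hypothetical, and unlike the actual library-go change (which falls back to a live GET only after hitting a conflict), this sketch re-fetches on every attempt:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// setExampleAnnotation shows the shape of the fix: each attempt starts
// with a live GET, so a stale informer cache can no longer cause the
// same resourceVersion conflict over and over.
func setExampleAnnotation(ctx context.Context, client kubernetes.Interface, ns, name string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Live GET straight from the API server, bypassing any cache.
		d, err := client.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if d.Annotations == nil {
			d.Annotations = map[string]string{}
		}
		d.Annotations["example.com/touched"] = "true" // hypothetical mutation
		_, err = client.AppsV1().Deployments(ns).Update(ctx, d, metav1.UpdateOptions{})
		return err // a Conflict error triggers another round with a fresh GET
	})
}
```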
OK, let's try this PR out. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: deads2k, p0lyn0mial. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.
Merged commit 758ac1c into openshift:master
[ART PR BUILD NOTIFIER] This PR has been included in build ose-cluster-kube-apiserver-operator-container-v4.16.0-202401151732.p0.g758ac1c.assembly.stream for distgit ose-cluster-kube-apiserver-operator.
This reduces the update-conflict messages in the log. I found 50-100 in runs prior to the update here; runs after this update had 5.