Bug 1740824: Fix degraded status on upgrade #241
|
@awgreene: This pull request references an invalid Bugzilla bug:
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh |
|
@awgreene: This pull request references a valid Bugzilla bug. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. |
Problem: During an upgrade, multiple marketplace operators are running and updating the ClusterOperatorStatus at the same time, making it difficult to identify the actual state of the marketplace operator. Solution: Implement leader election using the functions provided by the Operator-SDK to prevent multiple marketplace operators from updating the ClusterOperatorStatus at the same time.
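The effect of leader election on status updates can be sketched in plain Go. This is only an illustration of the idea, not the code in this pull request: it does not import the Operator-SDK, and `Lock`, `StatusWriter`, `ErrNotLeader`, and the instance names are hypothetical stand-ins for a cluster-wide lock and the ClusterOperatorStatus.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNotLeader is returned when an instance that does not hold the lock
// tries to write status. (Hypothetical name for this sketch.)
var ErrNotLeader = errors.New("not the leader")

// Lock is a hypothetical in-memory stand-in for a cluster-wide lock.
type Lock struct {
	mu     sync.Mutex
	holder string
}

// TryAcquire makes id the holder if the lock is free and reports whether
// id holds the lock afterwards.
func (l *Lock) TryAcquire(id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == "" {
		l.holder = id
	}
	return l.holder == id
}

// HeldBy reports whether id currently holds the lock.
func (l *Lock) HeldBy(id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.holder == id
}

// StatusWriter models the shared ClusterOperatorStatus as a single string.
type StatusWriter struct {
	lock   *Lock
	mu     sync.Mutex
	status string
}

// Update writes status only when id is the leader.
func (w *StatusWriter) Update(id, status string) error {
	if !w.lock.HeldBy(id) {
		return ErrNotLeader
	}
	w.mu.Lock()
	defer w.mu.Unlock()
	w.status = status
	return nil
}

// Status returns the last status written by a leader.
func (w *StatusWriter) Status() string {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.status
}

func main() {
	lock := &Lock{}
	writer := &StatusWriter{lock: lock}

	// During an upgrade the old and new operator pods overlap.
	fmt.Println("old is leader:", lock.TryAcquire("marketplace-old"))
	fmt.Println("new is leader:", lock.TryAcquire("marketplace-new"))

	// Only the leader's write is accepted.
	fmt.Println("new update:", writer.Update("marketplace-new", "Degraded"))
	fmt.Println("old update:", writer.Update("marketplace-old", "Available"))
	fmt.Println("status:", writer.Status())
}
```

In a real cluster the lock would live in the API server and be released when the old pod goes away, at which point the new instance could acquire it and start reporting status.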
Update an invalid comment that suggests that the ClusterOperatorStatus Degraded condition is set to true when the operator exits.
Problem: The marketplace operator is reporting degraded while it is upgrading. The marketplace operator reports degraded when the number of failed syncs surpasses some threshold of total syncs. The marketplace operator currently reports a failed sync whenever an error is encountered while reconciling an operand (OperatorSources or CatalogSourceConfigs). This allows network issues or invalid operands to drive the marketplace operator to a degraded state. Solution: Rather than reporting a failed sync whenever an error is encountered while reconciling an operand, report a failed sync when the marketplace operator is unable to get or update an existing operand.
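A minimal sketch of the narrower failure accounting described above, under stated assumptions: the error values, the `SyncTracker` type, and the 0.5 ratio are hypothetical and chosen only for illustration; the operator's actual threshold and error types are not given here.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error classes used only for this sketch.
var (
	ErrGetOperand    = errors.New("unable to get operand")
	ErrUpdateOperand = errors.New("unable to update operand")
	ErrInvalidSpec   = errors.New("operand spec is invalid")
)

// SyncTracker counts syncs and reports degraded once the failed/total
// ratio exceeds Threshold (an assumed ratio in the range 0..1).
type SyncTracker struct {
	Threshold float64
	total     int
	failed    int
}

// countsAsFailedSync reports whether err means the operator could not get
// or update an existing operand. Other reconcile errors are ignored.
func countsAsFailedSync(err error) bool {
	return errors.Is(err, ErrGetOperand) || errors.Is(err, ErrUpdateOperand)
}

// Record registers one sync attempt and its outcome (nil means success).
func (t *SyncTracker) Record(err error) {
	t.total++
	if countsAsFailedSync(err) {
		t.failed++
	}
}

// Degraded reports whether the failure ratio is above the threshold.
func (t *SyncTracker) Degraded() bool {
	if t.total == 0 {
		return false
	}
	return float64(t.failed)/float64(t.total) > t.Threshold
}

func main() {
	tracker := &SyncTracker{Threshold: 0.5}

	// Invalid operands no longer count against the operator.
	for i := 0; i < 3; i++ {
		tracker.Record(fmt.Errorf("reconcile: %w", ErrInvalidSpec))
	}
	fmt.Println("degraded after invalid operands:", tracker.Degraded())

	// Repeated failures to get an existing operand do count.
	for i := 0; i < 4; i++ {
		tracker.Record(fmt.Errorf("reconcile: %w", ErrGetOperand))
	}
	fmt.Println("degraded after get failures:", tracker.Degraded())
}
```

Classifying errors with `errors.Is` keeps the check working even when the reconcile loop wraps the underlying error with extra context.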
Problem: It is difficult to identify why a ClusterOperatorStatus condition is in a given state via telemetry. Marketplace currently does not set a reason when setting conditions and telemetry only includes the state and reason. Solution: Update marketplace to set the Reason field when setting a condition.
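To illustrate what setting a Reason might look like, here is a small sketch using a local `Condition` type that loosely mirrors the shape of a ClusterOperatorStatus condition (type, status, reason, message). The type, the `SetCondition` helper, and the reason strings are assumptions made for this example, not the operator's actual code; the transition timestamp is omitted for brevity.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a
// ClusterOperatorStatus condition.
type Condition struct {
	Type    string // for example "Degraded" or "Available"
	Status  string // "True", "False", or "Unknown"
	Reason  string // short machine-readable cause, visible to telemetry
	Message string // human-readable detail
}

// SetCondition replaces the condition with the same Type, or appends it if
// no such condition exists, and returns the updated slice.
func SetCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

func main() {
	var conds []Condition

	// Hypothetical reason strings; the real values are not given here.
	conds = SetCondition(conds, Condition{
		Type:    "Degraded",
		Status:  "False",
		Reason:  "OperandsSynced",
		Message: "all operands reconciled",
	})
	conds = SetCondition(conds, Condition{
		Type:    "Degraded",
		Status:  "True",
		Reason:  "OperandUpdateFailed",
		Message: "failed syncs exceeded the threshold",
	})

	for _, c := range conds {
		fmt.Printf("%s=%s reason=%s\n", c.Type, c.Status, c.Reason)
	}
}
```

Because telemetry carries only the state and the reason, a short and stable Reason value is what lets someone tell two Degraded=True reports apart without access to the cluster.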
|
@awgreene: This pull request references a valid Bugzilla bug. |
|
/approve |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: aravindhp, awgreene, ecordell |
|
/lgtm |
|
PTAL @shawn-hurley |
|
/lgtm |
|
/test e2e-aws-operator |
|
/test e2e-aws |
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
@awgreene: All pull requests linked via external trackers have merged. Bugzilla bug 1740824 has been moved to the MODIFIED state. |
Bug 1740824: Fix Degraded status on upgrade
Problem: There are three bugs that should be addressed within this bugzilla:
1. During an upgrade, multiple marketplace operators are running and updating the ClusterOperatorStatus at the same time, making it difficult to identify the actual state of the marketplace operator.
2. The marketplace operator is reporting degraded while it is upgrading. The marketplace operator reports degraded when the number of failed syncs surpasses some threshold of total syncs. The marketplace operator currently reports a failed sync whenever an error is encountered while reconciling an operand (OperatorSources or CatalogSourceConfigs). This allows network issues or invalid operands to drive the marketplace operator to a degraded state.
3. It is difficult to identify why a ClusterOperatorStatus condition is in a given state via telemetry. Marketplace currently does not set a reason when setting conditions and telemetry only includes the state and reason.
Solution: Three changes need to be made to address the bugzilla:
1. Implement leader election using the functions provided by the Operator-SDK to prevent multiple marketplace operators from updating the ClusterOperatorStatus at the same time.
2. Rather than reporting a failed sync whenever an error is encountered while reconciling an operand, report a failed sync when the marketplace operator is unable to get or update an existing operand.
3. Update marketplace to include a reason when setting a condition that indicates why the condition is being set.