Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Teleport update reconciliation on status updates #34063

Merged
merged 3 commits into from Nov 2, 2023

Conversation

tigrato
Copy link
Contributor

@tigrato tigrato commented Oct 31, 2023

This pull request addresses the issue where the Teleport operator reconciliation runs every time the operator updates the status subresource. This continuous reconciliation has led to an infinite loop, causing millions of reconciliations per minute. When an error occurs, such as having invalid role properties, the Operator updates the status and returns an error, which should trigger a rescheduled reconciliation with exponential backoff. The problem arises because the operator failed to enforce a resource generation change, resulting in an immediate trigger of a new reconciliation when the status field is updated.

This pull request modifies the operator to avoid updating subresources and only trigger updates when there is a change in resource generation.

Fixes #34092

Changelog: Skip Teleport Operator reconciliation on status updates

Special thanks to @strideynet for confirming my hypothesis and giving the solution!

This pull request addresses the issue where the Teleport operator reconciliation runs every time the operator updates the `status` subresource.
This continuous reconciliation has led to an infinite loop, causing millions of reconciliations per minute.
When an error occurs, such as having invalid role properties, the Operator updates the status and returns an error, which should trigger a rescheduled reconciliation with exponential backoff. The problem arises because the operator failed to enforce a resource generation change, resulting in an immediate trigger of a new reconciliation when the `status` field is updated.

This pull request modifies the operator to avoid updating subresources and only trigger updates when there is a change in resource generation.

Special thanks to @strideynet for confirming my hypothesis and giving
the solution!

Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>
@github-actions
Copy link

The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with changelog: followed by the changelog entries for the PR.

Copy link
Contributor

@strideynet strideynet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to move meta.SetStatusCondition into silentUpdateStatus to reduce the likelihood of this occurring again - otherwise lgtm

@tigrato
Copy link
Contributor Author

tigrato commented Nov 2, 2023

Might be nice to move meta.SetStatusCondition into silentUpdateStatus to reduce the likelihood of this occurring again - otherwise lgtm

done in c433dc9

@tigrato tigrato force-pushed the tigrato/skip-reconcialiation-on-status-updates branch from c433dc9 to bd9b2db Compare November 2, 2023 22:09
@tigrato tigrato added this pull request to the merge queue Nov 2, 2023
Merged via the queue into master with commit 9750a45 Nov 2, 2023
33 checks passed
@tigrato tigrato deleted the tigrato/skip-reconcialiation-on-status-updates branch November 2, 2023 22:46
@public-teleport-github-review-bot

@tigrato See the table below for backport results.

Branch Result
branch/v12 Failed
branch/v13 Failed
branch/v14 Failed

tigrato added a commit that referenced this pull request Nov 3, 2023
* Fix Teleport update reconciliation on `status` updates

This pull request addresses the issue where the Teleport operator reconciliation runs every time the operator updates the `status` subresource.
This continuous reconciliation has led to an infinite loop, causing millions of reconciliations per minute.
When an error occurs, such as having invalid role properties, the Operator updates the status and returns an error, which should trigger a rescheduled reconciliation with exponential backoff. The problem arises because the operator failed to enforce a resource generation change, resulting in an immediate trigger of a new reconciliation when the `status` field is updated.

This pull request modifies the operator to avoid updating subresources and only trigger updates when there is a change in resource generation.

Special thanks to @strideynet for confirming my hypothesis and giving
the solution!

Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>

* return proper status conditions on failures

* enforce condition update on silentUpdateStatus

---------

Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>
tigrato added a commit that referenced this pull request Nov 3, 2023
* Fix Teleport update reconciliation on `status` updates

This pull request addresses the issue where the Teleport operator reconciliation runs every time the operator updates the `status` subresource.
This continuous reconciliation has led to an infinite loop, causing millions of reconciliations per minute.
When an error occurs, such as having invalid role properties, the Operator updates the status and returns an error, which should trigger a rescheduled reconciliation with exponential backoff. The problem arises because the operator failed to enforce a resource generation change, resulting in an immediate trigger of a new reconciliation when the `status` field is updated.

This pull request modifies the operator to avoid updating subresources and only trigger updates when there is a change in resource generation.

Special thanks to @strideynet for confirming my hypothesis and giving
the solution!

Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>

* return proper status conditions on failures

* enforce condition update on silentUpdateStatus

---------

Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>
tigrato added a commit that referenced this pull request Nov 3, 2023
* Fix Teleport update reconciliation on `status` updates

This pull request addresses the issue where the Teleport operator reconciliation runs every time the operator updates the `status` subresource.
This continuous reconciliation has led to an infinite loop, causing millions of reconciliations per minute.
When an error occurs, such as having invalid role properties, the Operator updates the status and returns an error, which should trigger a rescheduled reconciliation with exponential backoff. The problem arises because the operator failed to enforce a resource generation change, resulting in an immediate trigger of a new reconciliation when the `status` field is updated.

This pull request modifies the operator to avoid updating subresources and only trigger updates when there is a change in resource generation.

Special thanks to @strideynet for confirming my hypothesis and giving
the solution!

Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>

* return proper status conditions on failures

* enforce condition update on silentUpdateStatus

---------

Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>
tigrato added a commit that referenced this pull request Nov 3, 2023
* Fix Teleport update reconciliation on `status` updates

This pull request addresses the issue where the Teleport operator reconciliation runs every time the operator updates the `status` subresource.
This continuous reconciliation has led to an infinite loop, causing millions of reconciliations per minute.
When an error occurs, such as having invalid role properties, the Operator updates the status and returns an error, which should trigger a rescheduled reconciliation with exponential backoff. The problem arises because the operator failed to enforce a resource generation change, resulting in an immediate trigger of a new reconciliation when the `status` field is updated.

This pull request modifies the operator to avoid updating subresources and only trigger updates when there is a change in resource generation.

Special thanks to @strideynet for confirming my hypothesis and giving
the solution!

Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>

* return proper status conditions on failures

* enforce condition update on silentUpdateStatus

---------

Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>
github-merge-queue bot pushed a commit that referenced this pull request Nov 3, 2023
* Fix Teleport update reconciliation on `status` updates

This pull request addresses the issue where the Teleport operator reconciliation runs every time the operator updates the `status` subresource.
This continuous reconciliation has led to an infinite loop, causing millions of reconciliations per minute.
When an error occurs, such as having invalid role properties, the Operator updates the status and returns an error, which should trigger a rescheduled reconciliation with exponential backoff. The problem arises because the operator failed to enforce a resource generation change, resulting in an immediate trigger of a new reconciliation when the `status` field is updated.

This pull request modifies the operator to avoid updating subresources and only trigger updates when there is a change in resource generation.

Special thanks to @strideynet for confirming my hypothesis and giving
the solution!



* return proper status conditions on failures

* enforce condition update on silentUpdateStatus

---------

Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>
github-merge-queue bot pushed a commit that referenced this pull request Nov 3, 2023
* Fix Teleport update reconciliation on `status` updates

This pull request addresses the issue where the Teleport operator reconciliation runs every time the operator updates the `status` subresource.
This continuous reconciliation has led to an infinite loop, causing millions of reconciliations per minute.
When an error occurs, such as having invalid role properties, the Operator updates the status and returns an error, which should trigger a rescheduled reconciliation with exponential backoff. The problem arises because the operator failed to enforce a resource generation change, resulting in an immediate trigger of a new reconciliation when the `status` field is updated.

This pull request modifies the operator to avoid updating subresources and only trigger updates when there is a change in resource generation.

Special thanks to @strideynet for confirming my hypothesis and giving
the solution!



* return proper status conditions on failures

* enforce condition update on silentUpdateStatus

---------

Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>
github-merge-queue bot pushed a commit that referenced this pull request Nov 3, 2023
* Fix Teleport update reconciliation on `status` updates

This pull request addresses the issue where the Teleport operator reconciliation runs every time the operator updates the `status` subresource.
This continuous reconciliation has led to an infinite loop, causing millions of reconciliations per minute.
When an error occurs, such as having invalid role properties, the Operator updates the status and returns an error, which should trigger a rescheduled reconciliation with exponential backoff. The problem arises because the operator failed to enforce a resource generation change, resulting in an immediate trigger of a new reconciliation when the `status` field is updated.

This pull request modifies the operator to avoid updating subresources and only trigger updates when there is a change in resource generation.

Special thanks to @strideynet for confirming my hypothesis and giving
the solution!



* return proper status conditions on failures

* enforce condition update on silentUpdateStatus

---------

Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Teleport operator's reconciliation loop is reactivated when there are updates to the status field
4 participants