Conversation

stuggi (Contributor) commented Mar 25, 2025

When a deployment gets updated (config, image, or changed spec) and the rollout fails, this should be reflected so the caller can evaluate it. Currently, such a rollout/update results in ProgressDeadlineExceeded, but this is not reflected in the service operators. Since the replicas with the old config are still up and healthy, the service remains functional with the old spec, but the rollout issue is not surfaced.

This adds PollRolloutStatus() and calls it from CreateOrPatch() when the operation != create.

Using rolloutPollInterval and rolloutPollTimeout, the caller can control the interval and timeout of a single poll run, which uses PollUntilContextTimeout().

The status and message of each run are reflected as d.rolloutStatus and d.rolloutMessage. The status can be Complete, Progressing, or ProgressDeadlineExceeded, and the message carries a corresponding string for each status.

To note, the DeadlineExceeded returned by PollUntilContextTimeout() in CreateOrPatch() is ignored, so the caller can reflect the rolloutStatus/rolloutMessage in its DeploymentCondition, reconcile, and trigger a new poll.
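The per-run status evaluation can be sketched as below. The deploymentState struct, field names, and messages are assumptions mirroring the usual `kubectl rollout status` decision logic, not the actual lib-common implementation.

```go
package main

import "fmt"

// Rollout status values as described in the PR (exact names assumed).
const (
	RolloutComplete    = "Complete"
	RolloutProgressing = "Progressing"
	RolloutDeadline    = "ProgressDeadlineExceeded"
)

// deploymentState captures the Deployment status fields the check needs.
type deploymentState struct {
	Generation         int64 // metadata.generation
	ObservedGeneration int64 // status.observedGeneration
	SpecReplicas       int32 // spec.replicas
	UpdatedReplicas    int32 // status.updatedReplicas
	AvailableReplicas  int32 // status.availableReplicas
	DeadlineExceeded   bool  // Progressing condition has reason ProgressDeadlineExceeded
}

// evalRollout maps the current Deployment status to one of the three
// rollout states plus a human-readable message, once per poll run.
func evalRollout(d deploymentState) (string, string) {
	if d.DeadlineExceeded {
		return RolloutDeadline, "ReplicaSet has timed out progressing"
	}
	if d.ObservedGeneration >= d.Generation &&
		d.UpdatedReplicas == d.SpecReplicas &&
		d.AvailableReplicas == d.SpecReplicas {
		return RolloutComplete, "deployment rollout complete"
	}
	return RolloutProgressing,
		fmt.Sprintf("rollout status: %d/%d replicas updated", d.UpdatedReplicas, d.SpecReplicas)
}

func main() {
	// The stuck-rollout case from the example below: 1 of 4 replicas updated.
	status, msg := evalRollout(deploymentState{
		Generation: 2, ObservedGeneration: 2,
		SpecReplicas: 4, UpdatedReplicas: 1, AvailableReplicas: 4,
	})
	fmt.Println(status, "-", msg) // Progressing - rollout status: 1/4 replicas updated
}
```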

Jira: OSPRH-14472

stuggi added a commit to stuggi/keystone-operator that referenced this pull request Mar 25, 2025
When a deployment gets updated (config, image, or changed spec) and
the rollout fails, this should be reflected in the Deployment
condition. Currently, if the rollout of the deployment runs into an
issue and the rollout/update results in ProgressDeadlineExceeded,
it is not reflected in the service operator's conditions.
Instead, because the replicas with the old config are still up and
healthy and the service is still functional with the old spec, the
Deployment condition shows ready, even though the rollout is stuck
and failed to deploy what was requested.

Example:
~~~
NAME       NETWORKATTACHMENTS   STATUS   MESSAGE
keystone                        True     Setup complete
keystone                        True     Setup complete <<<< depl complete

>> add broken httpd config, but could be a broken image, or something else which makes the pod fail to start
keystone                        Unknown   Service config create not started
keystone                        Unknown   Setup started
keystone                        False     rollout status: 1/4 replicas updated
>>> replacement pod is failing
keystone                        False     rollout status: 1/4 replicas updated
...
keystone                        False     keystone ProgressDeadlineExceeded - ReplicaSet "keystone-74db779db5" has timed out progressing.
>>> DeadlineExceeded reached (default 10min)
~~~

As a result, the KeystoneAPI is not ready, because keystone did not
come up with the requested new config (the rollout failed). Other
services which rely on the KeystoneAPI will stop at:

~~~
$ oc get neutronapi -n openstack
NAME      NETWORKATTACHMENTS                                                            STATUS   MESSAGE
neutron   {"openstack/internalapi":["172.17.0.33"],"ovn-kubernetes":["10.217.0.169"]}   False    KeystoneAPI not yet ready
~~~

Depends-On: openstack-k8s-operators/lib-common#613

Jira: OSPRH-14472

Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
stuggi added a commit to stuggi/keystone-operator that referenced this pull request Apr 1, 2025
stuggi (Contributor, Author) commented Apr 3, 2025

closing in favor of a less invasive solution, at least atm

stuggi closed this Apr 3, 2025