[Deployment] Add PollRolloutStatus and poll deployment on updates #613
Closed
stuggi added a commit to stuggi/keystone-operator that referenced this pull request on Mar 25, 2025
When a deployment gets updated (config, image, or a changed spec) and the rollout causes the deployment to fail, it should be reflected in the Deployment condition. Currently, if there is an issue on the rollout of the deployment and the rollout/update results in a ProgressDeadlineExceeded, it is not reflected in the service operator's conditions. Instead, because the old-config replicas are still up and healthy, the service is still functional with the old spec, and the Deployment condition shows ready, but the rollout is stuck and failed to deploy what was requested. Example:

~~~
NAME       NETWORKATTACHMENTS   STATUS    MESSAGE
keystone                        True      Setup complete
keystone                        True      Setup complete     <<<< depl complete

>> add broken httpd config, but could be a broken image, or something else which makes the pod fail to start

keystone                        Unknown   Service config create not started
keystone                        Unknown   Setup started
keystone                        False     rollout status: 1/4 replicas updated     >>> replacement pod is failing
keystone                        False     rollout status: 1/4 replicas updated
...
keystone                        False     keystone ProgressDeadlineExceeded - ReplicaSet "keystone-74db779db5" has timed out progressing.     >>> DeadlineExceeded reached (default 10min)
~~~

As a result the KeystoneAPI is not ready, because keystone is not ready with the newly requested config / the rollout failed. Other services which rely on the KeystoneAPI will stop at:

~~~
$ oc get neutronapi -n openstack
NAME      NETWORKATTACHMENTS                                                             STATUS   MESSAGE
neutron   {"openstack/internalapi":["172.17.0.33"],"ovn-kubernetes":["10.217.0.169"]}   False    KeystoneAPI not yet ready
~~~

Depends-On: openstack-k8s-operators/lib-common#613
Jira: OSPRH-14472
Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
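The stuck rollout described above is visible on the Deployment's `Progressing` condition. A minimal sketch of detecting it, using local stand-in types (the real operator code would use `appsv1.DeploymentCondition` from k8s.io/api; `rolloutStuck` is a hypothetical helper, not part of lib-common):

```go
package main

import "fmt"

// DeploymentCondition is a stand-in for appsv1.DeploymentCondition.
type DeploymentCondition struct {
	Type   string // e.g. "Progressing", "Available"
	Status string // "True", "False", "Unknown"
	Reason string // e.g. "NewReplicaSetAvailable", "ProgressDeadlineExceeded"
}

// rolloutStuck reports whether the Progressing condition indicates the
// rollout hit progressDeadlineSeconds (default 600s, i.e. the 10min above).
func rolloutStuck(conds []DeploymentCondition) bool {
	for _, c := range conds {
		if c.Type == "Progressing" && c.Status == "False" &&
			c.Reason == "ProgressDeadlineExceeded" {
			return true
		}
	}
	return false
}

func main() {
	// Conditions as they look once the broken rollout times out: the old
	// ReplicaSet keeps the Deployment Available, but Progressing has failed.
	conds := []DeploymentCondition{
		{Type: "Available", Status: "True", Reason: "MinimumReplicasAvailable"},
		{Type: "Progressing", Status: "False", Reason: "ProgressDeadlineExceeded"},
	}
	fmt.Println(rolloutStuck(conds)) // prints "true"
}
```

Note the `Available=True` alongside `Progressing=False`: this is exactly why the service operator's condition stays ready while the rollout is stuck.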
When a deployment gets updated (config, image, or a changed spec) and the rollout causes the deployment to fail, it should be reflected so the caller can evaluate it. Currently, if there is an issue like the above, the deployment rollout/update results in a DeadlineExceeded, but it is not reflected to the service operators. Since the old-config replicas are still up and healthy, the service is still functional with the old spec, but the rollout issue is not surfaced.

This adds PollRolloutStatus() and calls it from CreateOrPatch() when the operation != create.

Using rolloutPollInterval and rolloutPollTimeout, the caller can control the interval and timeout of one poll run, which uses PollUntilContextTimeout().

The status and message of each run are reflected as d.rolloutStatus and d.rolloutMessage. Status can be Complete, Progressing, or ProgressDeadlineExceeded. The message carries a corresponding string for each status.

To note, the DeadlineExceeded from PollUntilContextTimeout() in CreateOrPatch() is ignored, to allow the caller to reflect the rolloutStatus/rolloutMessage in its DeploymentCondition, reconcile, and trigger a new poll.

Jira: OSPRH-14472
Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
stuggi added a commit to stuggi/keystone-operator that referenced this pull request on Apr 1, 2025
closing in favor of a less invasive solution, at least atm