Conversation

stuggi (Contributor) commented Mar 25, 2025

When a deployment gets updated (config, image, or changed spec) and the rollout fails, this should be reflected so the caller can evaluate it. Currently, such a rollout/update results in ProgressDeadlineExceeded, but this is not reflected in the service operators. Since the replicas with the old config are still up and healthy, the service remains functional with the old spec, but the rollout issue is not surfaced.

This adds PollRolloutStatus() and calls it from CreateOrPatch() when the operation != create.

Using rolloutPollInterval and rolloutPollTimeout, the caller can control the interval and timeout of a single poll run, which uses PollUntilContextTimeout().

The status and message of each run are reflected as d.rolloutStatus and d.rolloutMessage. The status can be Complete, Progressing, or ProgressDeadlineExceeded, and the message carries a corresponding string for each status.

To note, the DeadlineExceeded returned by PollUntilContextTimeout() in CreateOrPatch() is ignored, so the caller can reflect the rolloutStatus/rolloutMessage in its DeploymentCondition, reconcile, and trigger a new poll.
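The per-run status evaluation can be sketched as below. The deploymentState struct, field names, and messages are assumptions mirroring the usual `kubectl rollout status` decision logic, not the actual lib-common implementation.

```go
package main

import "fmt"

// Rollout status values as described in the PR (exact names assumed).
const (
	RolloutComplete    = "Complete"
	RolloutProgressing = "Progressing"
	RolloutDeadline    = "ProgressDeadlineExceeded"
)

// deploymentState captures the Deployment status fields the check needs.
type deploymentState struct {
	Generation         int64 // metadata.generation
	ObservedGeneration int64 // status.observedGeneration
	SpecReplicas       int32 // spec.replicas
	UpdatedReplicas    int32 // status.updatedReplicas
	AvailableReplicas  int32 // status.availableReplicas
	DeadlineExceeded   bool  // Progressing condition has reason ProgressDeadlineExceeded
}

// evalRollout maps the current Deployment status to one of the three
// rollout states plus a human-readable message, once per poll run.
func evalRollout(d deploymentState) (string, string) {
	if d.DeadlineExceeded {
		return RolloutDeadline, "ReplicaSet has timed out progressing"
	}
	if d.ObservedGeneration >= d.Generation &&
		d.UpdatedReplicas == d.SpecReplicas &&
		d.AvailableReplicas == d.SpecReplicas {
		return RolloutComplete, "deployment rollout complete"
	}
	return RolloutProgressing,
		fmt.Sprintf("rollout status: %d/%d replicas updated", d.UpdatedReplicas, d.SpecReplicas)
}

func main() {
	// The stuck-rollout case from the example below: 1 of 4 replicas updated.
	status, msg := evalRollout(deploymentState{
		Generation: 2, ObservedGeneration: 2,
		SpecReplicas: 4, UpdatedReplicas: 1, AvailableReplicas: 4,
	})
	fmt.Println(status, "-", msg) // Progressing - rollout status: 1/4 replicas updated
}
```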

Jira: OSPRH-14472

stuggi added a commit to stuggi/keystone-operator that referenced this pull request Mar 25, 2025
When a deployment gets updated (config, image, or changed spec) and
the rollout fails, this should be reflected in the Deployment
condition. Currently, if the rollout of the deployment runs into an
issue and the rollout/update results in ProgressDeadlineExceeded,
it is not reflected in the service operator's conditions.
Instead, because the replicas with the old config are still up and
healthy and the service is still functional with the old spec, the
Deployment condition shows ready, even though the rollout is stuck
and failed to deploy what was requested.

Example:
~~~
NAME       NETWORKATTACHMENTS   STATUS   MESSAGE
keystone                        True     Setup complete
keystone                        True     Setup complete <<<< depl complete

>> add broken httpd config, but could be a broken image, or something else which makes the pod fail to start
keystone                        Unknown   Service config create not started
keystone                        Unknown   Setup started
keystone                        False     rollout status: 1/4 replicas updated
>>> replacement pod is failing
keystone                        False     rollout status: 1/4 replicas updated
...
keystone                        False     keystone ProgressDeadlineExceeded - ReplicaSet "keystone-74db779db5" has timed out progressing.
>>> DeadlineExceeded reached (default 10min)
~~~

As a result, the KeystoneAPI is not ready, because keystone did not
come up with the requested new config (the rollout failed). Other
services which rely on the KeystoneAPI will stop at:

~~~
$ oc get neutronapi -n openstack
NAME      NETWORKATTACHMENTS                                                            STATUS   MESSAGE
neutron   {"openstack/internalapi":["172.17.0.33"],"ovn-kubernetes":["10.217.0.169"]}   False    KeystoneAPI not yet ready
~~~

Depends-On: openstack-k8s-operators/lib-common#613

Jira: OSPRH-14472

Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
stuggi added a commit to stuggi/keystone-operator that referenced this pull request Apr 1, 2025
stuggi (Contributor, Author) commented Apr 3, 2025

closing in favor of a less invasive solution, at least atm

stuggi closed this Apr 3, 2025