Avoid reporting outdated ES health on reconciliation error that prevents getting the real one #5349
Quick question: does it mean the operator will also set the status to … ? Because today, if we temporarily exclude a resource, the last reported status is kept even though it does not reflect reality.
No, it is independent.
Could you elaborate on that (maybe in #5330 or a dedicated issue)? What part of the status isn't updated?
I think it's a different issue.
I realized that we can reset the …
I am not sure the split in certificate reconciliation is worth the effort (but I am also not opposed to it).
Regarding the reset to `unknown`: I think we could instead just initialise the reconciliation state with `unknown` health in the reconcile state constructor function.
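A minimal sketch of that suggestion, in Go. All names here (`Health`, `Status`, `State`, `NewState`, `UpdateHealth`, `Current`) are hypothetical stand-ins, not the operator's actual API. The constructor starts from the previously persisted status but resets the health to `unknown`, and the health is only overwritten once it has actually been observed.

```go
package main

import "fmt"

// Health is a hypothetical stand-in for the health value in the status.
type Health string

const (
	HealthUnknown Health = "unknown"
	HealthGreen   Health = "green"
	HealthYellow  Health = "yellow"
	HealthRed     Health = "red"
)

// Status is a hypothetical, trimmed-down status subresource.
type Status struct {
	Health Health
	Phase  string
}

// State accumulates what one reconciliation learns before it is persisted.
type State struct {
	status Status
}

// NewState copies the previous status but resets the health to unknown,
// so a reconciliation that fails before observing the cluster does not
// re-publish a stale value.
func NewState(previous Status) *State {
	s := &State{status: previous}
	s.status.Health = HealthUnknown
	return s
}

// UpdateHealth records a health value that was actually observed.
func (s *State) UpdateHealth(h Health) { s.status.Health = h }

// Current returns the status that would be written back.
func (s *State) Current() Status { return s.status }

func main() {
	previous := Status{Health: HealthYellow, Phase: "ApplyingChanges"}

	// Reconciliation fails before the health could be observed.
	failed := NewState(previous)
	fmt.Println(failed.Current().Health) // unknown

	// Reconciliation gets far enough to observe the cluster.
	ok := NewState(previous)
	ok.UpdateHealth(HealthGreen)
	fmt.Println(ok.Current().Health) // green
}
```

Putting the reset in the constructor means every reconciliation starts from "not observed yet" without any explicit reset step elsewhere in the loop.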
Another possible improvement could be to update the status subresource even if the cluster is unmanaged, but that should probably be discussed in an issue first.
I discussed it with barkbay; he is rather in favor of it but will share his opinion after verifying that it works well with #5328.
LGTM. I think I'm fine with the proposal of having certificates reconciled in a "two-phase" approach. It should also be easy to merge with #5328.
👍 to create an issue to discuss whether or not the status subresource should be updated if the resource itself is not supposed to be managed.
The Elasticsearch health reported by the Operator in the status subresource of the Elasticsearch resource may never be updated if the Operator encounters an error during the reconciliation loop. This commit improves that by:

* Initializing the ES health to `unknown` in the `NewState` constructor, so that we stop reporting a health that may be out of date.
* Splitting HTTP and transport certificate reconciliation, so that the ES health can be retrieved despite an issue with the transport certificates.
* Starting the observer as soon as possible and then updating the ES state with the latest observed state.
The first 2 commits are more of an optimization:
The following ones could be enough on their own:
Relates to #5330.
Testing
* Uncomment `transport.tls.certificate.secretName` and reapply. The cluster should be reported as `green` and not stuck in the `yellow` state, with phase `ApplyingChanges`.
* Do the same by breaking the `http` service. This time the operator will report an `unknown` state, since the observer depends on this service, still with phase `ApplyingChanges`.