Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reported conditions don't align with established conventions #4975

Open
thetechnick opened this issue Aug 19, 2022 · 3 comments
Open

Reported conditions don't align with established conventions #4975

thetechnick opened this issue Aug 19, 2022 · 3 comments
Labels

Comments

@thetechnick
Copy link

thetechnick commented Aug 19, 2022

We are implementing a higher-level controller on top of github.com/rhobs/observability-operator, as part of github.com/openshift/addon-operator.
Now we want to use the new status conditions posted by OO and Prometheus Operator, but this left us confused as these conditions seem to behave different to other conditions.

  1. Condition Status is either "True", "False" or "Unkown"
    Prometheus Operator and also Observability Operator are reporting an extra "Degraded" status, which is doesn't map to any existing API convention and makes interpreting status harder.
    Is there a difference between: Type: "Available", Status: "False" and Status: "Degraded"?
    Is "Degraded" referring to the condition itself or the operator being degraded and thus should not be taken for granted?
    Kubernetes API conventions

  2. What state does the "Reconciled" condition represent?
    If the controller was able to post that condition, it - by definition - reconciled something.
    Whether the status is up to date is already represented by reporting .observedGeneration.
    edit ObservedGeneration is currently not reported at all, making it impossible to judge if status is stale or not.

  3. (nit/suggestion) Conditions could use standardized conditions from apimachinery and their helper functions to make handling easier:
    Standardized Conditions from k8s/apimachinery
    Standardized Condition helper functions from k8s/apimachinery

This is not a blocker for us, but to quote from API conventions:

Conditions are most useful when they follow some consistent conventions

Right now it's very easy to miss-interpret status reported by Prometheus Operator and by extension OO.

Thanks for reading!

@JoaoBraveCoding
Copy link
Contributor

Hi Nico 👋 having read your issue I agree with you that we should try and follow the conventions! I'll review your PR but @simonpasquier will be the best person to discuss the design decisions. Thanks for the contribution!

@simonpasquier
Copy link
Contributor

Is there a difference between: Type: "Available", Status: "False" and Status: "Degraded"?
Is "Degraded" referring to the condition itself or the operator being degraded and thus should not be taken for granted?

Yes there's a difference:

  • (Type: Available, Status: False) means that the Prometheus service is totally unavailable (e.g. no pods are ready).
  • (Type: Available, Status: Degraded) means that the Prometheus service is partially available (e.g. at least 1 pod is ready, some aren't).

As I understood the conventions doc, True, False and Unknown are recommended but it doesn't tell that an operator should limit itself to these values (e.g. Condition status values may be True, False, or Unknown.).

What state does the "Reconciled" condition represent?

It represents the last status from the last reconciliation, it provides feedback to the user as to why the operator can't converge to the desired state (for instance, a reference to a secret key can't be resolved).

edit ObservedGeneration is currently not reported at all, making it impossible to judge if status is stale or not.

I agree that it should be added.

@thetechnick
Copy link
Author

Hi @simonpasquier ,
thanks for taking your time :)

I understand the reasoning behind reporting a degraded state, but this should be reported as a separate condition.
The reason for reporting individual conditions, is to reduce ambiguity and unexpected interpretation of distinct status conditions.

e.g. a Prometheus deployment can be both Degraded: "True" and Available: "True", signaling that it is still working, but not at full capacity.
Available: "Degraded" is ambiguous and special care needs to be taken to understand and handle this status.
It also creates a lot of extra edge cases that require consideration, e.g. "how degrades is?" Available: "True", Reconciled: "Degraded"

"True", "False" and "Unknown" simplify this to a great deal:
"True": The condition is certain to be in effect: e.g. A Deployment is Available == "True", when MinReplicas are ready.
"False": The condition is certain to be not in effect: e.g. A Deployment reports Progressing == "False", when only a single ReplicaSet is present.
"Unknown": No statement on the condition can be made: e.g. A Node reports Ready == "Unknown", when the kubelet didn't report back for a while

The API conventions doc gives little room for ambiguity, specifying each field in detail, including the Status field:

Conditions should follow the standard schema included in k8s.io/apimachinery/pkg/apis/meta/v1/types.go. It should be included as a top level element in status, similar to [...]

// status of the condition, one of True, False, Unknown.
// +required
Status ConditionStatus `json:"status" protobuf:"bytes,2,opt,name=status"`

It's also consistent throughout all kube apis as far as I can tell: https://github.com/kubernetes/api

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants