
Proposal for instance readiness KEP #1692

Merged - 1 commit merged into main on Sep 29, 2020

Conversation

alenkacz (Contributor, Author)

Signed-off-by: Alena Varkockova varkockova.a@gmail.com

What this PR does / why we need it:
This PR proposes an implementation for KEP-34.


The controller will have to pull all the *readiness phase 1 resources* (an additional N requests to the API server, where N is the number of resource types), filtering on the labels `heritage: kudo` and `kudo.dev/instance=<instance-name>`.

From all these resources it would compute readiness based on `Condition: Ready`, `Condition: Available`, or other fields, following the convention of each particular type.
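As a rough illustration of the per-type convention described above, here is a minimal sketch (not KUDO's actual code; the helper name and the StatefulSet replica comparison are assumptions) of how readiness could be computed from a resource's conditions or fields:

```go
package main

import "fmt"

// Condition mirrors the minimal shape of a Kubernetes status condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// resourceReady is a hypothetical helper sketching the per-type convention:
// most types expose a "Ready" or "Available" condition, while e.g. a
// StatefulSet would be compared on readyReplicas vs. desired replicas.
func resourceReady(kind string, conds []Condition, readyReplicas, replicas int32) bool {
	switch kind {
	case "StatefulSet":
		return readyReplicas == replicas
	default:
		for _, c := range conds {
			if (c.Type == "Ready" || c.Type == "Available") && c.Status == "True" {
				return true
			}
		}
		return false
	}
}

func main() {
	deploy := []Condition{{Type: "Available", Status: "True"}}
	fmt.Println(resourceReady("Deployment", deploy, 0, 0)) // true
	fmt.Println(resourceReady("StatefulSet", nil, 2, 3))   // false
}
```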
Contributor (Author):

Not sure what to do for the Service type. Maybe we should drop it from the listed resources?

Member:

Yeah, probably a good idea for phase 1. We can always add it later, but I think it'll require custom code and more than Condition checking.

Contributor (Author):

I've removed Services for now; I'll add them back once we have a clear path for assessing their readiness.

nfnt (Member) left a comment:

LGTM, great that you keep existing status API designs in mind!

keps/0034-instance-health.md (resolved review thread)
ANeumann82 (Member) left a comment:

Looks generally good. I think we should have some more details about different plans and different resources in an operator - not all resources are deployed by the default deploy plan, and not all plans affect the readiness of an operator.


Ready is expected to be an oscillating condition and it will indicate the object was believed to be fully operational at the time it was last verified. There’s an expectation that the value won’t be stalled for more than 1 minute (at least one minute after the state changes, status should be reflecting it).

Unknown state will be set whenever deploy/upgrade/update plan is running - this is because plan run should be an atomic operation going from stable state to another stable state, evaluating health of all resources involved in the plan as part of the execution. It feels redundant to be also checking for readiness on a side.
Member:

What about all plans except deploy/upgrade/update? For example, Kafka has "mirrormaker" and "kafka-connect" plans that deploy additional resources. I would assume that the resources deployed there will be covered by the readiness checks, which is ok.

Where it becomes a bit fuzzy is, for example, the "backup" plan in Cassandra. There we deploy a Job (which is not currently covered by the readiness phase 1 resources). So the "Ready" state would go to "Unknown" because a plan is running, but the "Reason" cannot point to a resource that is unready? I think we need to clarify this.

Member:

The justification of when heuristics should be evaluated or ignored totally makes sense! But doesn't that make the state Ready == false? It is either ready or not... I don't understand the nuance of having known conditions reported as unknown. Because we are not ready and we don't want to say unready? Can't we just indicate unready?

Member:

@ANeumann82 mentions that jobs are not covered under readiness phase 1 resources... but that job creates a pod... is that pod covered?

Contributor (Author):

Per convention, Ready usually covers long-running stuff, so the idea is "is my long-running stuff ready". If you install Kafka, this should give you a signal whether Kafka is ready.

I don't think it should also include jobs that do backups. What that would create is:

  • kafka is ready
  • I trigger the backup plan and the job is applied but not finished
  • kafka is not ready - well, unknown (WHY??? that's not true - or, in terms of backup it might not be ready, but think about any other job that is not "destructive" to the underlying tech)
  • the job finishes, kafka is ready again

To be honest, I don't actually know. For some time I was thinking that any plan run should move the status to unknown, because pretty much anything could be happening in that time and we just... don't know until it finishes.

I am just worried that Ready = Unknown will in some UI get translated to unhealthy and give people a signal that they need to investigate. But maybe it's fine and the UI just needs to merge this with the fact that a plan is in progress (this should ideally also bubble up to the UI).

Member:

I agree that a backup plan isn't a change in ready... there is a need to communicate it... it is just a different status. It is very necessary, but we shouldn't boil all possible statuses into readiness.
It is a distinction worth surfacing IMO... which is: "ready" for what? There are at least two things the service could be "ready" for:

  1. service-level activities (backup, restore, rebalance, scale up)
  2. user-level activities (use in an application)

It seems most of what "ready" means in this document is focused on number 2 (user level).

Contributor (Author):

Yes, correct - the current readiness is number 2. We should definitely note that in the documentation for this feature.

zen-dog (Contributor) left a comment:

LGTM. I left a few questions though.

Readiness of an `Instance` will be communicated via a newly introduced `Status.Conditions` field. The schema of this field will [conform to the schema](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1367) recommended by k8s api machinery. The `Type` of the newly added condition will be `Ready`. The condition can have three values with the following meanings:
True - the last observation of readiness on this instance is that it’s ready
False - the last observation of readiness on this instance is that it’s NOT ready
Unknown - a deploy/upgrade/update plan is running
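The three-valued mapping above can be sketched as a small helper. This is an illustration, not KUDO's implementation; the `PlanRunning` and `ResourceNotReady` reasons appear elsewhere in this KEP, while `ResourcesReady` and the function name are assumptions:

```go
package main

import "fmt"

// readyStatus sketches the proposed Ready-condition semantics:
// a running deploy/upgrade/update plan yields Unknown; otherwise the
// status reflects the last readiness evaluation of watched resources.
func readyStatus(activePlan string, allReady bool) (status, reason string) {
	switch activePlan {
	case "deploy", "upgrade", "update":
		return "Unknown", "PlanRunning"
	}
	if allReady {
		return "True", "ResourcesReady"
	}
	return "False", "ResourceNotReady"
}

func main() {
	s, r := readyStatus("deploy", false)
	fmt.Println(s, r) // Unknown PlanRunning
	s, r = readyStatus("", true)
	fmt.Println(s, r) // True ResourcesReady
}
```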
Contributor:

I wonder what effects other plans could have on readiness. E.g. a service that is doing a re-computation after a node eviction (triggered by a plan) might be in a partially degraded state.

Member:

This feels like an abuse of "Unknown"... if not true, shouldn't it be false for known conditions?
This feels like the opposite of what I would expect... it is hard to understand when it would be false. Also, I would recommend having other conditions for other known states we want to provide.

Contributor (Author):

Yeah, we can always add more conditions - let's just focus on readiness right now, as that's pretty much what's asked for. The beauty of conditions is that you can always add more and it's not a breaking change :)

I chose Unknown here per this guidance in the API design conventions: "Intermediate states may be indicated by setting the status of the condition to Unknown."

I feel like a plan run is an intermediate state.

Member:

I mean, I'm the last to complain about a "troolean" (a tri-state boolean :D)...

Unknown makes sense with the quote from the API design you gave - maybe we can include that with a source in the KEP? I feel like InProgress would make more sense for a running plan, and Unknown for when we can't determine the actual health state of a resource, but maybe that's encoding too much information in the "Ready" condition.

Member:

@alenkacz thanks for the thoughts on Unknown... if I had a mental hangup for this KEP, it was centered around that. Your explanation makes sense. I imagine details will come out of Reason and Message. I'm onboard!

Contributor (Author):

I think that Ready: InProgress would also be kind of... weird. But I agree that Unknown has its issues - there's actually a lot of discussion in the k8s KEP around the troolean; lots of people don't like it :)

Let's try to be consistent with Ready as per the Conditions recommendation, and if we need a different signal as well, let's maybe introduce another condition in the future.

Contributor (Author):

Oh, I love the comment here https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1288-L1298 - I wish there was a ConditionDegraded :) I think that would work really well for a lot of our cases.


Setting the `Ready` condition will be a responsibility of the existing `instance_controller`. It will watch all the types in *readiness phase 1 resources* (it’s already watching most of those, so there’s no big additional performance penalty) and trigger a reconcile for the owning Instance.

The controller will have to pull all the *readiness phase 1 resources* (an additional N requests to the API server, where N is the number of resource types), filtering on the labels `heritage: kudo` and `kudo.dev/instance=<instance-name>`.
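The label filter described in the quoted text can be sketched as plain map matching. In the real controller this would be a client-go List call per resource type with a label selector; the helper below is only an assumed illustration of the selection logic:

```go
package main

import "fmt"

// matchesInstance mimics the label selector the controller would use:
// heritage=kudo plus kudo.dev/instance=<instance-name>. Hypothetical
// sketch; the actual controller would pass a selector to the API server.
func matchesInstance(labels map[string]string, instance string) bool {
	return labels["heritage"] == "kudo" && labels["kudo.dev/instance"] == instance
}

func main() {
	pod := map[string]string{"heritage": "kudo", "kudo.dev/instance": "kafka"}
	fmt.Println(matchesInstance(pod, "kafka")) // true
	fmt.Println(matchesInstance(pod, "zk"))    // false
}
```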
Contributor:

Do we also need to filter out resources of other plans? E.g. if there are additional pods running (monitoring or 3rd-party tools) that are not ready? Currently, we set the kudo.dev/plan annotation, but not as a label.

Member:

+1 to zen-dog's question (which aligns a little with the earlier mention of job pods)

Additionally... if there is a plan execution that launches a temporary resource (a service for a short duration) or a pod that finishes... is that resource involved in the status? Is there a way to opt out of being involved? Should we design an opt-out?

Contributor (Author):

Yeah, good point. I think I like the idea of an opt-out, as it's hard to capture these cases without it - one can use plans in any way possible. Would that, in your mind, solve the problem you pointed out @zen-dog?

zen-dog (Contributor), Sep 25, 2020:

Opt-out (or maybe rather opt-in?) should work. We would only consider resources with `kudo.dev/plan` in (deploy, update), but extend the set with more plans if an operator developer specifies it. But this is still "edgy", as resources can be reused between plans, and other plans like upgrade can have one-time migration Jobs, etc.
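The opt-in rule proposed above (deploy and update always count, other plans only when the operator developer lists them) could be sketched like this; the helper name and signature are assumptions, not a settled design:

```go
package main

import "fmt"

// planCounts reports whether a resource's kudo.dev/plan value participates
// in readiness: deploy and update always do, plus any plans the operator
// developer explicitly opted in. Hypothetical sketch of the opt-in idea.
func planCounts(plan string, optIn []string) bool {
	if plan == "deploy" || plan == "update" {
		return true
	}
	for _, p := range optIn {
		if p == plan {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(planCounts("backup", nil))                // false
	fmt.Println(planCounts("backup", []string{"backup"})) // true
}
```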

Member:

I'm not sure it's enough to simply pull all resources and filter for labels. The issue is that the deploy plan may have multiple steps that deploy resources, but at any given time the plan can land in FATAL_ERROR and cancel the execution of subsequent steps.

If we were to pull only deployed resources with a filter, we could end up with a half-executed deploy plan in FATAL_ERROR but a "Ready" condition reporting "True", because all deployed resources are healthy - but the operator itself isn't.

I think this touches a lot of the problems we'll see with drift detection. I'm not sure if opt-out or opt-in will be the better solution here. Maybe only apply this to the "deploy" plan by default, and allow other plans to opt in?

Contributor (Author):

You actually brought up a very good edge case: what if the plan fails 🤔 If the last plan is in FATAL_ERROR, wouldn't it make sense for Ready to be "False"? 🤔

Contributor (Author):

Or should it rather be Unknown then? I think ideally it should be Degraded https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1288-L1298 but that does not exist yet :)

kensipe (Member) left a comment:

I'm a fan of dropping Service for now.
I'm confused by the use of Unknown as the status for known things.
It would be great to hone this a little, but nothing blocking.


Ready is expected to be an oscillating condition and it will indicate the object was believed to be fully operational at the time it was last verified. There’s an expectation that the value won’t be stalled for more than 1 minute (at least one minute after the state changes, status should be reflecting it).
Member:

I'm not a fan of "fully operational"... we don't know that. What we know is that it is ready from the control plane's perspective... there are no actions for the control plane to handle. That is all we know.

Member:

The 2nd sentence is unclear to me... "stalled"? Are you trying to address "stale" status? And where does the 1-minute value come from?

Contributor (Author):

Rephrased the "fully operational" wording.

I was trying to communicate that, within one minute of something becoming non-ready, the status should reflect it (as a functional requirement). Any idea how to phrase it differently?



The reason for using the Conditions field rather than introducing a separate field is mainly that it’s starting to be established as a go-to pattern in the kubernetes world. Also, conditions have a standard set of metadata fields that are useful for our case (namely Reason and Message). Having a ‘Ready’ condition is also [encouraged by sig-architecture](https://github.com/kubernetes/community/pull/4521/files). (excerpt from that document: *Although they are not a consistent standard, the `Ready` and `Succeeded` condition types may be used by API designers for long-running and bounded-execution objects, respectively.*)
Member:

👏

```
status: "False"
lastTransitionTime: 2018-01-01T00:00:00Z
Reason: ResourceNotReady
Message: Deployment 'xxx' is not ready, Deployment 'yyy' is not ready
```
Member:

Implementation detail warning :)
Is it your expectation that we will create a "message" by joining resource events (which is where we can see, e.g., that a pod can't be scheduled)? That is what's in my mental model... I don't know how you would do it another way. If that is the case, is there a way to identify the resource within the message, or a format for this message?

I also assume we will use the constraints defined in the provided schema, which limits Reason to 1024 characters and Message to 32768, which we may need to be mindful of and have a solution for.

Contributor (Author):

Are you asking whether the resource name will be machine-parseable out of that? 🤔 But yeah, the idea is that we'll join some information we can get from the resources that are not ready. It won't be machine-parseable - that was not my intention. Message has the goal of being human-readable, while Reason should be something a machine can rely on. If machine-parseability is a hard requirement, we can try to support it.

Yeah, good point, we should think about those constraints during implementation. It won't be a problem for Reason, as that should be an enum.



Example of Instance status being NOT ready:
Member:

Should we have an example of Unknown?

```
apiVersion: kudo.dev/v1beta1
kind: Instance
Status:
  Conditions:
    - type: Ready
      status: "Unknown"
      lastTransitionTime: 2018-01-01T00:00:00Z
      Reason: PlanRunning
      Message: Plan 'deploy' is currently running
```


Ready is expected to be an oscillating condition and it will indicate the resources owned by the instance were believed to be ready at the time it was last verified. There’s an expectation that the value won’t be stalled for more than 1 minute (at least one minute after the state changes, status should be reflecting it).

Unknown state will be set whenever deploy/upgrade/update plan is running - this is because plan run should be an atomic operation going from stable state to another stable state, evaluating health of all resources involved in the plan as part of the execution. It feels redundant to be also checking for readiness on a side.
Member:

wdym with "checking readiness on a side"? Do you mean "It feels redundant to be also checking for readiness additionally to the health checks of the plan execution"?

Contributor (Author):

yes, thank you. That was a nice piece of czenglish there :)



kensipe (Member) left a comment:

Thanks for waiting for at least one full round trip of discussion on this.
The direction makes sense and adds a good increment of progress.

Nice stuff!

Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
alenkacz (Contributor, Author):

I'll merge this and extend the opt-ins and annotation filtering in the next PR

@alenkacz alenkacz merged commit 6d96f8e into main Sep 29, 2020
@alenkacz alenkacz deleted the av/kep-34-2 branch September 29, 2020 09:53