
Allow users to wait for conditions from kubectl and using the API #1899

Closed · smarterclayton opened this issue Oct 20, 2014 · 74 comments

Labels: area/kubectl, area/usability, lifecycle/frozen (issue or PR should not be auto-closed due to staleness), priority/important-soon (must be staffed and worked on either currently or very soon, ideally in time for the next release), sig/api-machinery (relevant to SIG API Machinery), sig/cli (relevant to SIG CLI)

Comments

@smarterclayton (Contributor)

Spawned from #1325

It should be easy for users to do two things:

  • Create new resources
  • Determine when they are "ready"

Readiness (#620) is a complex topic, and readiness can mean different things in different contexts. The Kubernetes client CLI and client library (see pkg/client/conditions.go) should provide tools for common readiness conditions and enable developers and administrators to easily script more complex readiness checks. This issue only covers client-side readiness - server-side readiness should be handled elsewhere.

Readiness must have an explicit upper bound (the system may never converge) - probably manifested as a maximum timeout. Certain errors may be transient (network, server) and some may be fatal (resource deleted?). It should be possible for end users to understand the ways that readiness can fail and work through those conditions.

Most resources are likely to have an implicit "ready" state:

  • pods - when the state is "Running", "Succeeded", or "Failed", depending on the restart policy
  • replicationControllers - when there are enough pods in running state to satisfy the label query
  • services - when at least one pod is running and reachable via the service?

However, readiness can vary in infinitely complex ways:

  • services - user must wait for 2 pods to be running for HA, or pods must be running in X zones
  • two services must be running and serving requests (web frontend and database tier) AND the backend database must have its schema created and at the latest version

It should be possible for users to define their own client-side ready conditions via scripting (potentially outside of kubectl), as long as the tools kubectl provides form a common layer of behavior.
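
A minimal sketch of such user-side scripting, layered on kubectl output (the label, the two-replica HA rule from the list above, and the flag spellings are illustrative - modern kubectl, not the 2014 CLI):

# wait until at least 2 pods matching app=frontend report Running
ha_ready() {
  running=$(kubectl get pods -l app=frontend -o jsonpath='{.items[*].status.phase}' \
    | tr ' ' '\n' | grep -cx Running)
  [ "$running" -ge 2 ]
}
until ha_ready; do sleep 2; done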

Possible CLI examples:

$ kubectl create -c foo.json --wait
$ kubectl create -c foo.json --wait=1m
$ kubectl get pod my-pod --wait --wait-for="pod-running"
$ kubectl get pod my-pod --wait --wait-for --format-template='{{ if eq .Status.Condition "Running" }}1{{ else }}0{{ end }}'

Things I'd like to avoid end users doing:

  • Bash for | grep loops on output as much as possible (the kind of loop sketched below)
  • Implementing bash timeout logic
  • Describing common yet complex conditions (replication controller at desired state) in template logic
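
For reference, the hand-rolled pattern this issue aims to make unnecessary - a poll loop with homemade timeout logic (a sketch; the pod name and the 60-second budget are arbitrary):

deadline=$(( $(date +%s) + 60 ))
until kubectl get pod my-pod | grep -q Running; do
  [ "$(date +%s)" -ge "$deadline" ] && { echo "timed out waiting for my-pod" >&2; exit 1; }
  sleep 2
done
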
@smarterclayton (Contributor Author)

@fabianofranz

@smarterclayton (Contributor Author)

Also @ghodss @bgrant0607

@jcantrill

@smarterclayton What is described here, specifically in the CLI examples, speaks to the various individual pieces, but is there a need to wait for a group of related resources to achieve a desired state? Examples: a service, pod, controller or 2 different pods that work in conjunction with one another.

@smarterclayton (Contributor Author)

I doubt that's the first thing we would need. In many cases you can just wait linearly (wait X, then wait Y). Can you describe some concrete examples of multipod coordination for which waiting would be needed?

@jcantrill

@smarterclayton I'm thinking of the case where I apply a config and I want to wait until the results of that operation are 'ready'. The only other example I can think of for multipod is maybe a messaging system of some kind.

@smarterclayton (Contributor Author)

Config is special, because config is just applying the same action to each individual component, or defining your own ready state at the end. If you need multi-step behavior, you already implicitly need a way to describe sequential, stepwise logic, and that is (in the short term) where you punt to shell, or come up with a way to express readiness more simply in your app.

@bgrant0607 (Member)

I'm supportive of building in a mechanism to probe container/pod readiness (#620), similar to liveness. This information is needed by a wide variety of systems and tools, including services and perhaps replication controllers.

Similarly, we'll need a way to aggregate per-pod readiness for sets of pods identified by a label selector. This could perhaps be returned in service and/or replication controller status.

Requirements such as requiring N instances to be ready are also very common. At least, all systems that cause disruptions (e.g., rolling updates) need to be aware of them. I'd express this as an independent disruption policy with a label selector. Because an absolute N isn't very friendly to auto-scaling, there should also be ways to specify a percentage ready or a maximum not ready.

I can also see the utility of waiting for a variety of conditions in the client, including various flavors of readiness and also termination. Is that format template syntax some standard or made up?
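
(This disruption-policy idea eventually took shape as PodDisruptionBudget; a sketch with today's CLI, where the budget name and selector are hypothetical:)

$ kubectl create poddisruptionbudget web-pdb --selector=app=web --min-available=2
$ kubectl create poddisruptionbudget web-pdb --selector=app=web --min-available=50%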

@smarterclayton (Contributor Author)

The template listed is a mode of output using golang templates from kubecfg and kubectl - I'm not sure it's the best tool, but I doubt we want to invent a custom query syntax. The template is probably how you'd script this today in bash without requiring an external tool like jq.

Having the server support simple label-query conditions for readiness would simplify common clients, at the expense of flexibility. I'd prefer never to wait on the server, of course.
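
A rough sketch of that bash pattern - polling template output until the desired value appears (modern flag spellings; the 2014-era flags differed):

until [ "$(kubectl get pod my-pod -o go-template='{{.status.phase}}')" = "Running" ]; do
  sleep 1
done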

@ghodss (Contributor) commented Oct 22, 2014

I wonder if this can be broken out into its own command instead of building it into create, get, etc. It may not be quite as convenient, but I think the wins in simplifying the kubectl interface and the implementation, and in providing more cohesive building blocks for people's scripts, may be worth it.

kubectl wait <condition> [<param1> <param2> ...]
$ kubectl wait pod-running my-pod
$ kubectl wait pod-running @my-pod-id
$ kubectl create -c foo.json | kubectl wait pod-running -  # accept pod name/ID from stdin
$ kubectl wait pod-template my-pod --format-template='{{ if eq .Status.Condition "Running" }}1{{ else }}0{{ end }}'

Maybe you could also define custom <condition>s in plugins or config files. WDYT?

@jcantrill

Why could we not provide both? Waiting for the condition (or not) with one of the other commands is a matter of doing it (or not) if the flag is there. One issue I see is if you could optionally provide a template for the wait but this somehow interferes with a template required for the core action command. Spinning it off into a separate command may allow for more complex conditions, but would this just mean we would be better off allowing clients to craft their own waits?

@ghodss (Contributor) commented Oct 22, 2014

We could. I still think there's a tradeoff in making the interface, documentation, etc. simpler if they're separate commands that can be chained. At the very least, we could start with just a wait subcommand and then add it into the other subcommands if there's enough need.

@fabianofranz (Contributor)

I tend to prefer the flag syntax for its user-friendliness, but I agree with the arguments around cohesion and simplicity (of code) for a separate wait command, as long as pipes are supported and we keep flags as a TODO.

If we decide to go that path we should rely on a syntax pattern for <condition>, something like resource-state (as in pod-running), keeping the same naming conventions already used in other parts of kubectl. For example, a pod should accept any of pods, pod, or po, as in kubectl.go#resolveResource.

@thockin (Member) commented Oct 24, 2014

Also, consistency around stdin: I think other places use -f - for stdin, which implies general file input is legit.

I would suggest something other than - here, since - is used in legitimate names and flags. pod:running or pod=running or pod.running all read better to my eyes.

@jcantrill

+1 for pod:running or pod=running. I think it also reads better and more clearly implies which thing in which desired state.

@jcantrill

Trying to summarize the context of this discussion:

Possible Syntax:

wait <resource> [(-t|--max-time=)<sec>]  <name>|(<label_key>:<label_value>) <state> [replicas]
wait <resource> [(-t|--max-time=)<sec>]  <name>|(<label_key>:<label_value>) --template=<exit conditions>

Possible Examples:

wait service -t 5m foobar    #state is implied? (port is available default e.g. 404 vs ready? 200 returns SOMETHING)
wait replicationController -t 5s foobar 5           #minimum of 5
wait replicationController -t 5s foobar 5!          #exactly 5
wait pod -t 12h foobar running                    #need to specify state?  would we really care about others except to exit early
wait build foobar completed
wait deployment foo:bar failed

• It does not seem to make sense to support 'wait' for all resource types
• Assume resources that support a 'Status' will inherently support wait
• Provide a list of statuses that terminate a wait early (e.g. failed)
• Determine the default wait status for a resource type (can/should we standardize?)
• Providing a template supersedes status conditions
• pkg/client/conditions.go is an example of using closures for wait conditions (a shell analogue is sketched below)
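
A closure-style wait helper in shell, mirroring the condition-function approach of pkg/client/conditions.go (a sketch; the helper name and usage are hypothetical):

wait_for() {  # wait_for <timeout-seconds> <condition command...>
  end=$(( $(date +%s) + $1 )); shift
  until "$@"; do
    [ "$(date +%s)" -ge "$end" ] && return 1
    sleep 2
  done
}
wait_for 300 sh -c '[ "$(kubectl get pod foobar -o jsonpath={.status.phase})" = "Running" ]'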

@thockin (Member) commented Nov 7, 2014

Your minimum/exactly syntax is ad hoc. I dislike ad-hoc syntax - it's inconsistent at best and totally random at worst.

What if we instead consider the Pod status codes here: pending, running, succeeded, failed? Can we assert that every object follows these?

A ReplicationController is pending as long as it has not met its N. A Service is Running as soon as it is created. A pod, well, that's obvious.

Now that I've spelled it out, I don't like it so much. The alternative is an actual expression evaluator. Can we build on Go's format package and embed a simple expression system?

@erictune (Member) commented Nov 7, 2014

I like your line of reasoning, Tim, about several different objects having watchable statuses. I think this could tie in nicely with "readiness checks" when we get around to that. Although that could get complex, as services could independently meet or not meet their # running criteria and their # ready-to-serve criteria. Ugh. Sorry for muddying the waters.

@smarterclayton (Contributor Author)

When I did some of the original conditional waiting in the code (pkg/client/conditions.go), it was obvious there were common categories of waits where you could easily agree "yes, this is a valid thing to wait for". They weren't truly generic, but they potentially could be.

I think service readiness is the complex part, but it might be a concept we're able to discuss. Could we define a readiness check type on a service that looks like (but is not exactly) what containers have? Needs X, needs Y, etc.? The key difference is that services transition between ready and not ready.

@smarterclayton (Contributor Author)

Replying inline to the summary above:

• Does not seem to make sense to support 'wait' for all resource types

There are likely resources that have no meaningful wait. Each resource should justify whether it can wait.

• Assume resources that support a 'Status' will inherently support wait.

Status implies that a resource has a desired state and a delta state, so it does seem that anything with Status is implicitly waitable.

• Provide list of status that terminate wait early (e.g. failed)

Can you define what you mean by this? To me, wait should be "is this condition met" and "here's the maximum I'll wait" - nothing else.

@thockin (Member) commented Nov 7, 2014

The problem with these condition codes is that they are very coarse, and not always obvious.

If a replication controller is "pending", that means either it has not started or it has observed a degraded number of pods - you can't tell which. And that only addresses "readiness", not any other condition, but maybe that is OK?

And what does it mean for a service to be ready? Some number of endpoints? ServiceStatus should probably report that number...

@jcantrill

Regarding "Can you define what you mean by this?" (statuses that terminate a wait early): I mean something that would be internal to the implementation. Why would we continue to wait for a 'failed' pod to get to running when it never will? A short-circuit mechanism seems in order.

@smarterclayton (Contributor Author)

Ok. Being aware of the state machine of the resource type is required - if you ask for "Running" but are in "Failed", you're right, you should stop. If you ask for "Failed" and you're in "Running", you keep waiting. While that couples us to knowing about the state machine, we specifically designed the state machine to be simple enough to do this.
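
A sketch of that short-circuit logic as a client-side check (the pod name is hypothetical; phases per the pod lifecycle discussed above):

phase=$(kubectl get pod my-pod -o jsonpath='{.status.phase}')
case "$phase" in
  Running)          exit 0 ;;   # desired state reached
  Failed|Succeeded) echo "terminal phase $phase; pod will never be Running" >&2; exit 1 ;;
  *)                ;;          # Pending/Unknown - keep waiting
esac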

@bgrant0607 added the priority/awaiting-more-evidence label (lowest priority; possibly useful, but not yet enough support to actually get it done) on Dec 4, 2014
@bgrant0607 removed the priority/awaiting-more-evidence label on Dec 11, 2014
@erictune (Member)

The Smith Resource Manager was presented at SIG-Apps on Jul 24th. Smith handles readiness, for creation-order dependencies, by having a function for each resource type. For TPRs you can specify a field which, if present, indicates readiness. @ash2k, hope I got that somewhat right.

@ash2k (Member) commented Jul 25, 2017

@erictune yes, that is correct. More information is in the readme: https://github.com/atlassian/smith

Please note that not all object kinds are supported right now, but it is trivial to add support. There are also some other limitations (see issues). Anyone interested is welcome to contribute to the project - open issues/PRs.

@smarterclayton (Contributor Author) commented Jul 25, 2017 via email

@smarterclayton (Contributor Author) commented Jul 25, 2017 via email

@cjw296 commented Aug 24, 2017

It would be fantastic to have this for running one-shot database migration jobs that need to run to completion before doing a rolling update of a web app. Guess I'll go hack up a poll/wait loop in the meantime ;-)
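
(With the kubectl wait command that eventually landed - see #64034 below - this case becomes roughly the following; the job name is hypothetical:)

kubectl wait --for=condition=complete --timeout=300s job/db-migrate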

@mml (Contributor) commented Sep 21, 2017

I know this is an old one, but here's another upvote. I've found another case where this would be quite useful: when running conformance tests inside a cluster, it would be nice to know when the job has completed.

@bgrant0607 (Member)

A related issue is that we don't make it easy to determine success or failure of a change, much less wait for it: #34363

@jpza commented Dec 11, 2017

Bumping this

@zoobab commented Dec 19, 2017

Damn this issue was created in 2014.

I need to wait for a container to be in the Running state; I'll return to a shell loop watching the state.

@erikbgithub

@zoobab you probably also want to check that the x/y ready counts match. It's possible that the state is Running but not all containers in the pod are available yet. For instance, 0/1 Running is totally possible. Best is probably to check healthz and logs. A quick way to see per-container readiness is sketched below.
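
(A sketch of checking readiness rather than phase; the pod name is hypothetical. kubectl wait --for=condition=ready performs the same check once available:)

kubectl get pod my-pod -o jsonpath='{.status.containerStatuses[*].ready}'   # e.g. "true false"
kubectl wait --for=condition=ready pod/my-pod --timeout=60s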

@daniilyar commented Feb 26, 2018

One more example of why kubectl is hard to integrate into automation scripts.

Imagine that you run:

kubectl run -it --image=< image > -- echo "run some actions and exit"

And the < image > does not exist in the registry for some reason. This is a very common case. In this case the pod in K8s immediately fails with 'ErrImagePull'. But kubectl hangs forever because it blindly waits for a 'Running' status until the oceans dry up.

If you have a bash script running this command, while loops in the bash script will not help you, because run -it is a blocking operation.

The only workaround for that, if you use bash, is to fork an asynchronous process from your script JUST to check the pod status in a smart enough way.

Kubernetes is an awesome product, but its unfriendliness to automation scripts is a bit strange. The infinite kubectl timeout, the impossibility of waiting on an entity's status - all these minor things make Kubernetes much less automation-friendly than it could be.
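
(A sketch of the non-blocking alternative: run detached, then poll with a bounded loop that fails fast on image-pull errors. The pod and image names are hypothetical:)

kubectl run myjob --image=myimage --restart=Never -- echo "run some actions and exit"
for i in $(seq 1 30); do
  phase=$(kubectl get pod myjob -o jsonpath='{.status.phase}')
  [ "$phase" = "Succeeded" ] && exit 0
  reason=$(kubectl get pod myjob -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}' 2>/dev/null)
  case "$reason" in ErrImagePull|ImagePullBackOff) echo "image pull failed" >&2; exit 1 ;; esac
  sleep 2
done
echo "timed out" >&2; exit 1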

@spiffxp removed the triaged label on Mar 16, 2018
@bgrant0607 added the lifecycle/frozen label (indicates that an issue or PR should not be auto-closed due to staleness) on May 1, 2018
@deads2k (Contributor) commented May 25, 2018

kubectl wait --for=delete <resource>/foo and kubectl wait --for=condition=available --timeout=60s <resource>/foo (e.g. deployment/foo) exist after #64034
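
(Typical pairing with a create, using the new command; the deployment name is hypothetical:)

kubectl create -f foo.yaml
kubectl wait --for=condition=available --timeout=60s deployment/foo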

@chucky2305

Either I am doing something wrong or I misunderstood the concept behind kubectl wait.

I use kubectl wait in my GitLab CI pipeline.

Stage 1: Deploy my app.
Stage 2: Detect whether the app is deployed and running.

In Stage 2 I wanted to use:

kubectl wait --timeout=60s --for condition=ready pod -l release=branch123,pod-name=php-nginx

When the deployment is not ready and no pod can be found with the labels "release=branch123" and "pod-name=php-nginx", kubectl wait ignores the defined timeout (60 seconds in my example). Why is this so? I was expecting it to always wait 60 seconds.

I hope my request is clear. Maybe I am just doing something wrong here?

@Shivang44 commented Jul 20, 2018

@chucky2305 Yeah, unfortunately it instantly returns if the selector does not match (I filed a bug report for that: #66456). Basically, it will only wait if the resource exists; it will not wait on nonexistent resources.

What you could do (I believe) is put those labels on the deployment and wait until the deployment has a condition of available. The benefit of this is that, since the deployment does exist, it won't instantly return.

So your command would be:

kubectl wait deployment --timeout=60s --for condition=available -l label1=value1,label2=value2

@protometa

For fellow noobs from Google wondering how to wait for a deployment update, rollout status is actually the way to go:

kubectl set image deployment/my-deployment my-container=my-image:tag
kubectl rollout status deployment/my-deployment

@aakarshg

Most of what was mentioned above seems to have been accomplished; the only thing I'm still finding myself using grep for is checking a pod's phase. Not sure how it can be done purely using kubectl, without the need for grep.

@soltysh (Contributor) commented Apr 2, 2019

What about kubectl get pods -o jsonpath --template='{.items[*].metadata.name} : {.items[*].status.phase}'?
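
(And for a single pod, the phase can be tested directly in shell - no grep needed; the pod name is hypothetical:)

phase=$(kubectl get pod my-pod -o jsonpath='{.status.phase}')
[ "$phase" = "Running" ] && echo "pod is running"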

bertinatto pushed a commit to bertinatto/kubernetes that referenced this issue on Feb 28, 2024: OCPBUGS-10996: Fix race condition between resizer and kubelet