
Allow users to wait for conditions from kubectl and using the API #1899

Closed · smarterclayton opened this issue Oct 20, 2014 · 74 comments

Labels: area/kubectl, area/usability, lifecycle/frozen (issue or PR should not be auto-closed due to staleness), priority/important-soon (must be staffed and worked on either currently or very soon, ideally in time for the next release), sig/api-machinery (relevant to SIG API Machinery), sig/cli (relevant to SIG CLI)

Comments

@smarterclayton (Contributor)

Spawned from #1325

It should be easy for users to do two things:

  • Create new resources
  • Determine when they are "ready"

Readiness (#620) is a complex topic, and readiness can mean different things in different contexts. The Kubernetes client CLI and client library (see pkg/client/conditions.go) should provide tools for common readiness conditions and enable developers and administrators to easily script more complex readiness checks. This issue only covers client-side readiness - server-side readiness should be handled elsewhere.

Readiness must have an explicit upper bound (the system may never converge) - probably manifested as a maximum timeout. Certain errors may be transient (network, server) and some may be fatal (resource deleted?). It should be possible for end users to understand the ways that readiness can fail and work through those conditions.

Most resources are likely to have an implicit "ready" state:

  • pods - when the state is "Running", "Succeeded", or "Failed", depending on the restart policy
  • replicationControllers - when there are enough pods in running state to satisfy the label query
  • services - when at least one pod is running and reachable via the service?

However, readiness can vary in infinitely complex ways:

  • services - user must wait for 2 pods to be running for HA, or pods must be running in X zones
  • two services must be running and serving requests (web frontend and database tier) AND the backend database must have its schema created and at the latest version

It should be possible for users to define their own client-side ready conditions via scripting (potentially outside of kubectl), as long as the tools kubectl provides form a common layer of behavior.
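
A minimal sketch of such user-side scripting, layered on kubectl output (the label, the two-replica HA rule from the list above, and the flag spellings are illustrative - modern kubectl, not the 2014 CLI):

# wait until at least 2 pods matching app=frontend report Running
ha_ready() {
  running=$(kubectl get pods -l app=frontend -o jsonpath='{.items[*].status.phase}' \
    | tr ' ' '\n' | grep -cx Running)
  [ "$running" -ge 2 ]
}
until ha_ready; do sleep 2; done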

Possible CLI examples:

$ kubectl create -c foo.json --wait
$ kubectl create -c foo.json --wait=1m
$ kubectl get pod my-pod --wait --wait-for="pod-running"
$ kubectl get pod my-pod --wait --wait-for --format-template='{{ if eq .Status.Condition "Running" }}1{{ else }}0{{ end }}'

Things I'd like to avoid end users doing:

  • Bash for | grep loops on output as much as possible (the kind of loop sketched below)
  • Implementing bash timeout logic
  • Describing common yet complex conditions (replication controller at desired state) in template logic
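
For reference, the hand-rolled pattern this issue aims to make unnecessary - a poll loop with homemade timeout logic (a sketch; the pod name and the 60-second budget are arbitrary):

deadline=$(( $(date +%s) + 60 ))
until kubectl get pod my-pod | grep -q Running; do
  [ "$(date +%s)" -ge "$deadline" ] && { echo "timed out waiting for my-pod" >&2; exit 1; }
  sleep 2
done
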
@smarterclayton (Contributor Author)

@fabianofranz

@smarterclayton (Contributor Author)

Also @ghodss @bgrant0607

@jcantrill

@smarterclayton What is described here, specifically in the CLI examples, speaks to the various individual pieces, but is there a need to wait for a group of related resources to achieve a desired state? Examples: a service, pod, controller or 2 different pods that work in conjunction with one another.

@smarterclayton (Contributor Author)

I doubt that's the first thing we would need. In many cases you can just wait linearly (wait X, then wait Y). Can you describe some concrete examples of multipod coordination for which waiting would be needed?

@jcantrill

@smarterclayton I'm thinking of the case where I apply a config and I want to wait until the results of that operation are 'ready'. The only other example I can think of for multipod is maybe a messaging system of some kind.

@smarterclayton (Contributor Author)

Config is special, because config is just applying the same action to each individual component, or defining your own ready state at the end. If you need multi-step behavior, you already implicitly need a way to describe sequential, stepwise logic, and that is (in the short term) where you punt to shell, or come up with a way to express readiness more simply in your app.

@bgrant0607 (Member)

I'm supportive of building in a mechanism to probe container/pod readiness (#620), similar to liveness. This information is needed by a wide variety of systems and tools, including services and perhaps replication controllers.

Similarly, we'll need a way to aggregate per-pod readiness for sets of pods identified by a label selector. This could perhaps be returned in service and/or replication controller status.

Requirements such as requiring N instances to be ready are also very common. At least, all systems that cause disruptions (e.g., rolling updates) need to be aware of them. I'd express this as an independent disruption policy with a label selector. Because an absolute N isn't very friendly to auto-scaling, there should also be ways to specify a percentage ready or a maximum not ready.

I can also see the utility of waiting for a variety of conditions in the client, including various flavors of readiness and also termination. Is that format template syntax some standard or made up?
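
(This disruption-policy idea eventually took shape as PodDisruptionBudget; a sketch with today's CLI, where the budget name and selector are hypothetical:)

$ kubectl create poddisruptionbudget web-pdb --selector=app=web --min-available=2
$ kubectl create poddisruptionbudget web-pdb --selector=app=web --min-available=50%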

@smarterclayton (Contributor Author)

The template listed is a mode of output using golang templates from kubecfg and kubectl - I'm not sure it's the best tool, but I doubt we want to invent a custom query syntax. The template is probably how you'd script this today in bash without requiring an external tool like jq.

Having the server support simple label-query conditions for readiness would simplify common clients, at the expense of flexibility. I'd prefer never to wait on the server, of course.
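
A rough sketch of that bash pattern - polling template output until the desired value appears (modern flag spellings; the 2014-era flags differed):

until [ "$(kubectl get pod my-pod -o go-template='{{.status.phase}}')" = "Running" ]; do
  sleep 1
done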

@ghodss (Contributor) commented Oct 22, 2014

I wonder if this can be broken out into its own command instead of building it into create, get, etc. It may not be quite as convenient, but I think the wins in simplifying the kubectl interface and the implementation, and in providing more cohesive building blocks for people's scripts, may be worth it.

kubectl wait <condition> [<param1> <param2> ...]
$ kubectl wait pod-running my-pod
$ kubectl wait pod-running @my-pod-id
$ kubectl create -c foo.json | kubectl wait pod-running -  # accept pod name/ID from stdin
$ kubectl wait pod-template my-pod --format-template='{{ if eq .Status.Condition "Running" }}1{{ else }}0{{ end }}'

Maybe you could also define custom <condition>s in plugins or config files. WDYT?

@jcantrill

Why could we not provide both? Waiting for the condition (or not) with one of the other commands is a matter of doing it (or not) if the flag is there. One issue I see is if you could optionally provide a template for the wait but this somehow interferes with a template required for the core action command. Spinning it off into a separate command may allow for more complex conditions, but would this just mean we would be better off allowing clients to craft their own waits?

@ghodss (Contributor) commented Oct 22, 2014

We could. I still think there's a tradeoff in making the interface, documentation, etc. simpler if they're separate commands that can be chained. At the very least, we could start with just a wait subcommand and then add it into the other subcommands if there's enough need.

@fabianofranz (Contributor)

I tend to prefer the flag syntax for its user-friendliness, but I agree with the arguments around cohesion and simplicity (of code) for a separate wait command, as long as pipes are supported and we keep flags as a TODO.

If we decide to go that path we should rely on a syntax pattern for <condition>, something like resource-state (as in pod-running), keeping the same naming conventions already used in other parts of kubectl. For example, a pod should accept any of pods, pod, or po, as in kubectl.go#resolveResource.

@thockin (Member) commented Oct 24, 2014

Also, consistency around stdin: I think other places use -f - for stdin, which implies general file input is legit.

I would suggest something other than - here, since - is used in legitimate names and flags. pod:running or pod=running or pod.running all read better to my eyes.

@jcantrill

+1 for pod:running or pod=running. I think it also reads better and more clearly implies which thing in which desired state.

@jcantrill

Trying to summarize the context of this discussion:

Possible Syntax:

wait <resource> [(-t|--max-time=)<sec>]  <name>|(<label_key>:<label_value>) <state> [replicas]
wait <resource> [(-t|--max-time=)<sec>]  <name>|(<label_key>:<label_value>) --template=<exit conditions>

Possible Examples:

wait service -t 5m foobar    #state is implied? (port is available default e.g. 404 vs ready? 200 returns SOMETHING)
wait replicationController -t 5s foobar 5           #minimum of 5
wait replicationController -t 5s foobar 5!          #exactly 5
wait pod -t 12h foobar running                    #need to specify state?  would we really care about others except to exit early
wait build foobar completed
wait deployment foo:bar failed

• It does not seem to make sense to support 'wait' for all resource types
• Assume resources that support a 'Status' will inherently support wait
• Provide a list of statuses that terminate a wait early (e.g. failed)
• Determine the default wait status for a resource type (can/should we standardize?)
• Providing a template supersedes status conditions
• pkg/client/conditions.go is an example of using closures for wait conditions (a shell analogue is sketched below)
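
A closure-style wait helper in shell, mirroring the condition-function approach of pkg/client/conditions.go (a sketch; the helper name and usage are hypothetical):

wait_for() {  # wait_for <timeout-seconds> <condition command...>
  end=$(( $(date +%s) + $1 )); shift
  until "$@"; do
    [ "$(date +%s)" -ge "$end" ] && return 1
    sleep 2
  done
}
wait_for 300 sh -c '[ "$(kubectl get pod foobar -o jsonpath={.status.phase})" = "Running" ]'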

@thockin (Member) commented Nov 7, 2014

Your minimum/exactly syntax is ad hoc. I dislike ad-hoc syntax - it's inconsistent at best and totally random at worst.

What if we instead consider the Pod status codes here: pending, running, succeeded, failed? Can we assert that every object follows these?

A ReplicationController is pending as long as it has not met its N. A Service is Running as soon as it is created. A pod, well, that's obvious.

Now that I've spelled it out, I don't like it so much. The alternative is an actual expression evaluator. Can we build on Go's format package and embed a simple expression system?

@erictune (Member) commented Nov 7, 2014

I like your line of reasoning, Tim, about several different objects having watchable statuses. I think this could tie in nicely with "readiness checks" when we get around to that. Although that could get complex, as services could independently meet or not meet their # running criteria and their # ready-to-serve criteria. Ugh. Sorry for muddying the waters.

@smarterclayton (Contributor Author)

When I did some of the original conditional waiting in the code (pkg/client/conditions.go), it was obvious there were common categories of waits where you could easily agree "yes, this is a valid thing to wait for". They weren't truly generic, but they potentially could be.

I think service readiness is the complex part, but it might be a concept we're able to discuss. Could we define a readiness check type on a service that looks like (but is not exactly) what containers have? Needs X, needs Y, etc.? The key difference is that services transition between ready and not ready.

@smarterclayton (Contributor Author)

Replying inline to the summary above:

• Does not seem to make sense to support 'wait' for all resource types

There are likely resources that have no meaningful wait. Each resource should justify whether it can wait.

• Assume resources that support a 'Status' will inherently support wait.

Status implies that a resource has a desired state and a delta state, so it does seem that anything with Status is implicitly waitable.

• Provide list of status that terminate wait early (e.g. failed)

Can you define what you mean by this? To me, wait should be "is this condition met" and "here's the maximum I'll wait" - nothing else.

@thockin (Member) commented Nov 7, 2014

The problem with these condition codes is that they are very coarse, and not always obvious.

If a replication controller is "pending", that means either it has not started or it has observed a degraded number of pods - you can't tell which. And that only addresses "readiness", not any other condition, but maybe that is OK?

And what does it mean for a service to be ready? Some number of endpoints? ServiceStatus should probably report that number...

@jcantrill

Regarding "Can you define what you mean by this?" (statuses that terminate a wait early): I mean something that would be internal to the implementation. Why would we continue to wait for a 'failed' pod to get to running when it never will? A short-circuit mechanism seems in order.

@smarterclayton (Contributor Author)

Ok. Being aware of the state machine of the resource type is required - if you ask for "Running" but are in "Failed", you're right, you should stop. If you ask for "Failed" and you're in "Running", you keep waiting. While that couples us to knowing about the state machine, we specifically designed the state machine to be simple enough to do this.
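
A sketch of that short-circuit logic as a client-side check (the pod name is hypothetical; phases per the pod lifecycle discussed above):

phase=$(kubectl get pod my-pod -o jsonpath='{.status.phase}')
case "$phase" in
  Running)          exit 0 ;;   # desired state reached
  Failed|Succeeded) echo "terminal phase $phase; pod will never be Running" >&2; exit 1 ;;
  *)                ;;          # Pending/Unknown - keep waiting
esac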

@bgrant0607 added the priority/awaiting-more-evidence label (lowest priority; possibly useful, but not yet enough support to actually get it done) on Dec 4, 2014
@bgrant0607 removed the priority/awaiting-more-evidence label on Dec 11, 2014
@erictune (Member)

The Smith Resource Manager was presented at SIG-Apps on Jul 24th. Smith handles readiness, for creation-order dependencies, by having a function for each resource type. For TPRs you can specify a field which, if present, indicates readiness. @ash2k, hope I got that somewhat right.

@ash2k (Member) commented Jul 25, 2017

@erictune yes, that is correct. More information is in the readme: https://github.com/atlassian/smith

Please note that not all object kinds are supported right now, but it is trivial to add support. There are also some other limitations (see issues). Anyone interested is welcome to contribute to the project - open issues/PRs.

@smarterclayton (Contributor Author) commented Jul 25, 2017 via email

@smarterclayton (Contributor Author) commented Jul 25, 2017 via email

@cjw296 commented Aug 24, 2017

It would be fantastic to have this for running one-shot database migration jobs that need to run to completion before doing a rolling update of a web app. Guess I'll go hack up a poll/wait loop in the meantime ;-)
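
(With the kubectl wait command that eventually landed - see #64034 below - this case becomes roughly the following; the job name is hypothetical:)

kubectl wait --for=condition=complete --timeout=300s job/db-migrate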

@mml (Contributor) commented Sep 21, 2017

I know this is an old one, but here's another upvote. I've found another case where this would be quite useful: when running conformance tests inside a cluster, it would be nice to know when the job has completed.

@bgrant0607 (Member)

A related issue is that we don't make it easy to determine success or failure of a change, much less wait for it: #34363

@jpza commented Dec 11, 2017

Bumping this

@zoobab commented Dec 19, 2017

Damn this issue was created in 2014.

I need to wait for a container to be in the Running state; I'll return to a shell loop watching the state.

@erikbgithub

@zoobab you probably also want to check that the x/y ready counts match. It's possible that the state is Running but not all containers in the pod are available yet. For instance, 0/1 Running is totally possible. Best is probably to check healthz and logs. A quick way to see per-container readiness is sketched below.
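
(A sketch of checking readiness rather than phase; the pod name is hypothetical. kubectl wait --for=condition=ready performs the same check once available:)

kubectl get pod my-pod -o jsonpath='{.status.containerStatuses[*].ready}'   # e.g. "true false"
kubectl wait --for=condition=ready pod/my-pod --timeout=60s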

@daniilyar commented Feb 26, 2018

One more example of why kubectl is hard to integrate into automation scripts.

Imagine that you run:

kubectl run -it --image=< image > -- echo "run some actions and exit"

And the < image > does not exist in the registry for some reason. This is a very common case. In this case the pod in K8s immediately fails with 'ErrImagePull'. But kubectl hangs forever because it blindly waits for a 'Running' status until the oceans dry up.

If you have a bash script running this command, while loops in the bash script will not help you, because run -it is a blocking operation.

The only workaround for that, if you use bash, is to fork an asynchronous process from your script JUST to check the pod status in a smart enough way.

Kubernetes is an awesome product, but its unfriendliness to automation scripts is a bit strange. The infinite kubectl timeout, the impossibility of waiting on an entity's status - all these minor things make Kubernetes much less automation-friendly than it could be.
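
(A sketch of the non-blocking alternative: run detached, then poll with a bounded loop that fails fast on image-pull errors. The pod and image names are hypothetical:)

kubectl run myjob --image=myimage --restart=Never -- echo "run some actions and exit"
for i in $(seq 1 30); do
  phase=$(kubectl get pod myjob -o jsonpath='{.status.phase}')
  [ "$phase" = "Succeeded" ] && exit 0
  reason=$(kubectl get pod myjob -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}' 2>/dev/null)
  case "$reason" in ErrImagePull|ImagePullBackOff) echo "image pull failed" >&2; exit 1 ;; esac
  sleep 2
done
echo "timed out" >&2; exit 1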

@spiffxp removed the triaged label on Mar 16, 2018
@bgrant0607 added the lifecycle/frozen label (indicates that an issue or PR should not be auto-closed due to staleness) on May 1, 2018
@deads2k (Contributor) commented May 25, 2018

kubectl wait --for=delete <resource>/foo and kubectl wait --for=condition=available --timeout=60s <resource>/foo (e.g. deployment/foo) exist after #64034
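
(Typical pairing with a create, using the new command; the deployment name is hypothetical:)

kubectl create -f foo.yaml
kubectl wait --for=condition=available --timeout=60s deployment/foo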

@chucky2305

Either I am doing something wrong or I misunderstood the concept behind kubectl wait.

I use kubectl wait in my GitLab CI pipeline.

Stage 1: Deploy my app.
Stage 2: Detect whether the app is deployed and running.

In Stage 2 I wanted to use:

kubectl wait --timeout=60s --for condition=ready pod -l release=branch123,pod-name=php-nginx

When the deployment is not ready and no pod can be found with the labels "release=branch123" and "pod-name=php-nginx", kubectl wait ignores the defined timeout (60 seconds in my example). Why is this so? I was expecting it to always wait 60 seconds.

I hope my request is clear. Maybe I am just doing something wrong here?

@Shivang44 commented Jul 20, 2018

@chucky2305 Yeah, unfortunately it instantly returns if the selector does not match (I filed a bug report for that: #66456). Basically, it will only wait if the resource exists; it will not wait on nonexistent resources.

What you could do (I believe) is put those labels on the deployment and wait until the deployment has a condition of available. The benefit of this is that, since the deployment does exist, it won't instantly return.

So your command would be:

kubectl wait deployment --timeout=60s --for condition=available -l label1=value1,label2=value2

@protometa

For fellow noobs from Google wondering how to wait for a deployment update, rollout status is actually the way to go:

kubectl set image deployment/my-deployment my-container=my-image:tag
kubectl rollout status deployment/my-deployment

@aakarshg

Most of what was mentioned above seems to have been accomplished; the only thing I'm still finding myself using grep for is checking a pod's phase. Not sure how it can be done purely using kubectl, without the need for grep.

@soltysh (Contributor) commented Apr 2, 2019

What about kubectl get pods -o jsonpath --template='{.items[*].metadata.name} : {.items[*].status.phase}'?
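
(And for a single pod, the phase can be tested directly in shell - no grep needed; the pod name is hypothetical:)

phase=$(kubectl get pod my-pod -o jsonpath='{.status.phase}')
[ "$phase" = "Running" ] && echo "pod is running"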

bertinatto pushed a commit to bertinatto/kubernetes that referenced this issue on Feb 28, 2024: OCPBUGS-10996: Fix race condition between resizer and kubelet