Pod state / status #1146
We have a PodStatus type, which is a string, with a few values defined. People will start programming to those strings. We should define a state machine that governs what causes state changes. We did this internally and it was a great help in reasoning about what happens.
If nobody balks, I can port our internal ideas over into a design doc for discussion.
Yes, I totally support this. While implementing Restart Policy support (#1147) and thinking about other features, such as garbage collection and events, I couldn't help wishing we had a state machine defined. I also feel that several states are missing from our PodStatus, for example an Error / Failure state. I even started coding a state machine based on our internal design for Restart. :-)
@thockin There was some discussion about this when I added the binding object.
The summary is that we will have only the three states we have now. Additional states will come in the form of clarifications of those three main states, in another field (PodStateReason or some such).
We're trying to avoid a monolithic state machine like the one we have internally. This approach will make it easier to have states that not all components need to understand.
Hmm, I'm not very satisfied with just those 3 states.
Internally (on the node side) we defined a fairly simple state machine in …
ACCEPTED: The request to create the entity has been accepted, and a …
PREPARING: Machine state is actively being changed to instantiate the entity.
ACTIVE: The entity is ready to be used.
SUCCEEDED: The entity has completed. No extra effort to remove machine …
FAILED: The entity has failed. No extra effort to remove machine state is …
CLEANING: Machine state is actively being changed to remove the entity.
DEFUNCT: An error was encountered while CLEANING. No extra effort to …
CLEANED: The entity no longer has any presence in actual machine state.
We could probably flatten ACCEPTED and PREPARING. These states should be …
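The node-side lifecycle above can be read as a small transition table. The Go sketch below illustrates that reading; it is not the internal implementation. The exact set of legal transitions is an assumption inferred from the state descriptions (several of which are cut off above), and the type and function names are hypothetical.

```go
package main

import "fmt"

// EntityState is a hypothetical encoding of the internal node-side states.
type EntityState string

const (
	Accepted  EntityState = "ACCEPTED"
	Preparing EntityState = "PREPARING"
	Active    EntityState = "ACTIVE"
	Succeeded EntityState = "SUCCEEDED"
	Failed    EntityState = "FAILED"
	Cleaning  EntityState = "CLEANING"
	Defunct   EntityState = "DEFUNCT"
	Cleaned   EntityState = "CLEANED"
)

// transitions lists the moves assumed legal from each state.
// Assumption: any pre-cleanup state may fail, and both ended states
// (SUCCEEDED, FAILED) proceed to CLEANING.
var transitions = map[EntityState][]EntityState{
	Accepted:  {Preparing, Failed},
	Preparing: {Active, Failed},
	Active:    {Succeeded, Failed},
	Succeeded: {Cleaning},
	Failed:    {Cleaning},
	Cleaning:  {Cleaned, Defunct},
	Defunct:   {}, // terminal: cleanup hit an error
	Cleaned:   {}, // terminal: nothing left in machine state
}

// Transition returns the new state, or the unchanged state plus an error
// if the move is not in the table.
func Transition(from, to EntityState) (EntityState, error) {
	for _, next := range transitions[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("illegal transition %s -> %s", from, to)
}
```

Under these assumed rules, `Transition(Active, Cleaning)` is rejected, because an entity must reach SUCCEEDED or FAILED before cleanup begins.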
On Wed, Sep 3, 2014 at 1:29 PM, bgrant0607 wrote:
So in the k8s model, we'd have …
@bgrant0607 How are we supposed to model containers that are being restarted? As a substate under RUNNING?
Well, my point was that the distinct states provide useful information that …
On Thu, Sep 4, 2014 at 12:16 AM, Daniel Smith wrote:
Is this just internal Kubelet state, or would it be exposed in Kubelet and/or apiserver APIs?
I'm skeptical of the utility of such fine-grained states in a flat state space, and I think it's a big mistake to expose such a thing in the API. The probability that we'll need to add new states in the future as we add functionality and components is 100%, and the likelihood that it would be feasible to extend the set of states is 0%. Every piece of code that looked at the state field would almost certainly need to be changed. The problem is that states within a flat state space have no inherent, inferable semantics, so code that looks at such states is brittle.
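To make the brittleness concrete, here is a hypothetical Go client written against a flat state enum. The state names follow the internal list earlier in this thread; the `IsFinished` function and the later-added `Suspended` state are invented for illustration.

```go
package main

// FlatState is a hypothetical flat, fine-grained pod state.
type FlatState string

const (
	Accepted  FlatState = "ACCEPTED"
	Preparing FlatState = "PREPARING"
	Active    FlatState = "ACTIVE"
	Succeeded FlatState = "SUCCEEDED"
	Failed    FlatState = "FAILED"
	// Suspended stands in for a state added in a later release.
	Suspended FlatState = "SUSPENDED"
)

// IsFinished was written before Suspended existed. Any client of a flat
// enum must guess what an unknown state means; this one guesses
// "finished", which silently misclassifies the new state.
func IsFinished(s FlatState) bool {
	switch s {
	case Accepted, Preparing, Active:
		return false
	default:
		return true
	}
}
```

Adding `Suspended` breaks nothing at compile time, since Go does not check switches for exhaustiveness; the misclassification only shows up at runtime, in every client that made its own guess.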
Is a new state a new reason for not-yet-running, a new sort-of-running, or a new termination reason? What about restarting? Different forms of suspended? Waiting for activation by an external workflow, by an auto-sizer filling in resource values, or by a gang scheduler scheduling another pod? How about LOST? Does deliberately stopped count as FAILED? What is the state of a durable pod while the machine is down or unreachable? Is ACCEPTED admitted by the apiserver, or by the Kubelet? What state is the pod in during execution of PreStart, PreStop, or PostStop lifecycle hooks? While failing liveness and/or readiness probes?
Simple states have value to humans who want a high-level understanding of what's happening. The states are of questionable value to our components, however.
I advocate that additional useful information should be conveyed by other fields, such as some kind of one-word …
I'm fine with renaming RUNNING to ACTIVE.
On Thu, Sep 4, 2014 at 8:56 AM, bgrant0607 wrote:
I completely agree that a fine-grained state machine is a terrible API …
Good questions, though many of them apply equally to what we call …
To the first part, none of the above are represented (they are all …
I'd say "lost" (we expected a pod to be present but it was not) is FAILED.
Stopped is interesting: at the node level we call it FAILED, I think, …
If the machine disappears for longer than $timeout, pods become FAILED.
ACCEPTED means the apiserver has ACKed the transaction.
PreStart is new (we did not have events yet) and I would say PREPARING …
PreStop is ACTIVE (the main application is in control).
PostStop is "ENDED" (either SUCCEEDED or FAILED) because the main …
Failing probes is still ACTIVE (the main application is in control).
With simple semantics, you can know whether, for example, it is …
Image download failed: will it be retried or not? Relying on …
I'm willing to invest more time in this, thinking and discussing how …
This is not about names :) [If anything, RUNNING is better, since it …]
@thockin I'm supportive of adding more fine-grained reasons, but they shouldn't be considered an exhaustive, finite enumerated list. We WILL need to add more fine distinctions in the future. I'm also fine with keeping the state internally and not exposing it in the APIs.
I could easily imagine inserting a stage before ACCEPTED, for example for pods pending an external admission control check.
Treating deliberate stops as failures would be misinterpreted by many tools/systems. And, as described in #137, there isn't a bounded number of reasons why a pod might be deliberately stopped.
My point about the lifecycle hooks was that there will be components that want to know about them, and that the Kubelet itself will need to keep track of whether they have been executed or not. Ditto with failing probes, unresponsive hosts, and several other examples. So how does one decide what to elevate to a state and what not to? I suggest we don't elevate fine distinctions to top-level status, but remain open-ended and flexible about reporting the finer distinctions.
Image load failed: this was not intended to be an API. I intended it as the cause of a failure (not retried at that level). Maybe it was a bad example, since it could relate to retry logic, as discussed in #941 and #1088.
On Thu, Sep 4, 2014 at 1:28 PM, bgrant0607 wrote:
IMO, reasons should be orthogonal to state, if we expose state this way.
I'd argue against it: the new state does not offer any semantic value …
Yeah, I could see STOPPED as a peer of SUCCEEDED and FAILED.
I think I agree, if I understand. These states are not fine …
For example, during task setup internally, we have a number of states …
@thockin "reasons should be orthogonal to state": You mean that the clients shouldn't also need to look at state in order to interpret reason? I agree with that.
State before ACCEPTED: accepted by the apiserver or by the Kubelet? One example where exposing a pre-accepted state would be useful is awaiting action by a budget-checking service. Anyway, I don't want to rathole on this. It was just one example.
STOPPED: STOPPED, SUCCEEDED, and FAILED are all flavors of TERMINATED -- no longer active. We should capture and report more detailed termination reasons, exit codes, etc.
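One way to read this: a single TERMINATED state whose details record how the pod ended, so SUCCEEDED, FAILED, and STOPPED can be derived rather than enumerated. The Go struct and function below are a hypothetical sketch; the field names, including the `Stopped` flag, are assumptions.

```go
package main

// Terminated describes how a pod that is no longer active ended. Clients
// that only need "is it still running?" never look at these fields.
type Terminated struct {
	ExitCode int    // exit code of the main process
	Stopped  bool   // deliberately stopped by a user or controller
	Reason   string // free-form detail; deliberately not an exhaustive enum
}

// Flavor recovers the STOPPED / SUCCEEDED / FAILED distinction
// from the termination details.
func Flavor(t Terminated) string {
	switch {
	case t.Stopped:
		// A deliberate stop is not reported as a failure,
		// whatever the exit code.
		return "STOPPED"
	case t.ExitCode == 0:
		return "SUCCEEDED"
	default:
		return "FAILED"
	}
}
```

Checking `Stopped` first keeps deliberate stops from being misread as failures, which addresses the concern raised earlier about tools and systems misinterpreting them.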
While I can see why the Kubelet might want to keep track of them internally, I can't imagine cases where CLEANING, DEFUNCT, and CLEANED are going to be significant to API clients or even to cluster-level management components. This is why I consider them "fine distinctions". I didn't say that reasons/subStates/stateDetails could have no semantics, only that not everything needed to understand them, since they'd have the option of looking at the coarse states instead.
ACCEPTED and PREPARING are similar, and a client has another way to distinguish them: whether the host info (host, hostIP, etc.) has been initialized.
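For example, a client that sees only a coarse pending state could infer the finer stage from the host fields. In this Go sketch, the `PodInfo` type, its field names, and the "Pending" value are hypothetical stand-ins for the host info mentioned above.

```go
package main

// PodInfo is a hypothetical subset of pod status fields.
type PodInfo struct {
	Phase  string // coarse state, e.g. "Pending"
	Host   string // empty until the pod is assigned to a machine
	HostIP string
}

// PendingStage derives the ACCEPTED-like vs. PREPARING-like distinction
// for a pending pod without a dedicated state value. It returns ""
// for pods that are not pending.
func PendingStage(p PodInfo) string {
	if p.Phase != "Pending" {
		return ""
	}
	if p.Host == "" {
		return "ACCEPTED" // not yet bound to a host
	}
	return "PREPARING" // bound, but not yet running
}
```

This inference is only as fresh as the client's copy of the status, a limitation the next comment spells out.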
With respect to what a client can assume: The apiserver will necessarily have stale information, and the client will have even-more-stale information. If the client thought, for example, that the state were PREPARING, does that mean that the pod is not yet running? Of course not.
I propose we keep three high-level states, waiting (or pending), active (formerly running), and terminated, and push everything else into Reason.
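Here is a minimal Go sketch of this proposal: a coarse state plus a free-form, one-word reason. The names `PodState`, `Waiting`, `Active`, `Terminated`, `Reason`, `IsDone`, and `Describe`, and the sample reason strings, are illustrative assumptions only.

```go
package main

// PodState is a deliberately small set of coarse states.
type PodState string

const (
	Waiting    PodState = "Waiting"
	Active     PodState = "Active"
	Terminated PodState = "Terminated"
)

// PodStatus pairs the coarse state with an open-ended reason. New reasons
// can be added without changing clients that only read State.
type PodStatus struct {
	State  PodState
	Reason string // one-word detail; never treated as an exhaustive list
}

// IsDone answers a common client question from the coarse state alone,
// so it keeps working no matter which reasons are added later.
func IsDone(s PodStatus) bool { return s.State == Terminated }

// Describe renders the status for humans, appending the reason if present.
func Describe(s PodStatus) string {
	if s.Reason == "" {
		return string(s.State)
	}
	return string(s.State) + " (" + s.Reason + ")"
}
```

For example, `Describe(PodStatus{State: Waiting, Reason: "ImagePulling"})` yields `Waiting (ImagePulling)`, and a reason introduced in a future release changes nothing for callers of `IsDone`.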
What I currently see in type Status seems pretty reasonable: