Proposal: Make Pods (collections of containers) a first order container object. #8781

Open
brendandburns opened this Issue Oct 26, 2014 · 74 comments

@brendandburns

Pods

This is a proposal to change the first order container object within the Docker API from a single container to a pod of containers.

A pod (as in a pod of whales or pea pod) models an application-specific "logical host" in a containerized environment. It may contain one or more containers which are relatively tightly coupled -- in a pre-container world, they would have executed on the same physical or virtual host.

This is somewhat related to #8637, but that proposal has much more to do with namespacing containers into a single namespace than with grouping containers into logical hosts for the purposes of scheduling, resource tracking, isolation, and sharing.

In this proposal there are two sub-proposals:

  • The first is that a new API object, representing a Pod, be added to the Docker API.
  • The second is that this new API object replace the existing singleton container object in future versions of the Docker API.

Since these topics are somewhat orthogonal, I will address them each in separate sections.

Pods as an API object

Inherently, a Pod represents an atomic unit of an application. It is the smallest piece of the application that it makes sense to consider as a unit. Primarily this atomicity is in terms of running the Pod. A pod may be made up of many containers, but the state of those containers must be treated as atomic. Either they are all running under the same Docker daemon, or they are not. It does not make sense to have a partially running pod. Nor does it make sense to run different containers from a single pod in different Docker daemons.

There are numerous examples of such multi-container atomic units, for example:

  • A user-facing web server, and a side-car container that periodically syncs the server's contents from version control.
  • A primary database container, and a periodic backup container that copies the database out to network storage.
  • Multiple containers synchronizing work through IPC or shared memory.
  • Side-car containers that provide thick libraries and simplified APIs that other containers can consume (e.g. master election).

In all of these cases, the important characteristic is that the containers involved are symbiotic; it doesn't make sense to place the containers in these example pods onto different hosts.

What does it mean to be a Pod?

Pods share a network namespace (and consequently an IP address). Members of a pod can address each other via localhost. Pods also share a set of data volumes, which the containers can use to share data with one another. Importantly, pods do not share a chroot, so data volumes are the only way to share storage. Pods also share a resource hierarchy; the individual containers within a pod may have their own specific resource limits, but those limits are subdivisions of the resources allocated to the entire pod.
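
As a rough illustration only (the container and image names here are made up, and this emulation still lacks a shared parent cgroup and atomic creation), part of this sharing can already be approximated with existing Docker flags:

docker run -d --name web -v /data example/web-server
docker run -d --name sync --net=container:web --volumes-from web example/content-sync

In this sketch the sync container reaches the web server via localhost and shares the /data volume, but each container still sits in its own top-level cgroup, which is exactly the gap a first-class pod object closes.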

Why pods instead of scheduled co-location?

Co-location via a scheduling system achieves some of the goals of a pod, but it has significant downsides, including the fact that the containers don't share a network namespace (and thus have to rely on additional discovery mechanisms). Additionally, they don't share a cgroup, so you cannot express a parasitic container that opportunistically steals resources from a co-container in its pod; instead, that parasitic container steals from all containers on the host. Finally, the fact that the linkage between the containers is expressed as scheduling constraints, rather than as an explicit grouping, makes it harder to reason about the container group and also makes the scheduler more complicated.

Pods as the only way to run containers

It would be a somewhat significant revision to the Docker API to transform the current singleton containers into Pods of containers, but it is a worthwhile endeavor, because it will retain the simplicity of the API.

Put concretely, there is no reason to introduce two different API objects (singleton containers and pods) when a Pod of a single container can effectively represent the singleton case. Sticking to a single API object will limit complexity both in the code and in the documentation, and will also give users a seamless evolution from single-container Pods to more sophisticated multi-container pods.

Implementation and further details

Pods are a foundational part of the Kubernetes API. The Kubernetes API spec for a Pod can be found in the v1 API

A fully functional implementation of pods in terms of Docker containers can be found inside of kubelet.go.
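
Roughly speaking (and glossing over many details), that implementation follows an "infra container" pattern: a minimal placeholder container is started first to own the pod's network namespace, and each application container then joins it, much like the sketch above. The names and images below are placeholders for illustration:

docker run -d --name mypod-infra example/pause
docker run -d --name mypod-app1 --net=container:mypod-infra app1-image
docker run -d --name mypod-app2 --net=container:mypod-infra app2-image

Because the pod's IP address belongs to the infra container's network namespace, it can survive restarts of the individual application containers.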

@thockin

thockin Oct 26, 2014

Contributor

Awesome write-up. I think this idea would be great for the ecosystem overall. Less wrapping and hackery in kubernetes while bringing this simple but important idea to non-kubernetes users.

I think there are a number of steps toward the ultimate goal that could be delivered independently.

  1. No pod-level resources or volumes. This can be developed purely on top of existing constructs (as kubernetes has done). Build out the API and CLI experience. Maybe only shared netns, to keep scope simple. Bare containers imply a trivial pod-wrapper.

  2. Phase out bare container interfaces.

  3. Introduce more pod-level concepts one by one: volumes, other namespaces, cgroups, GIDs and disk accounting, etc.

The main point being we don't have to jump straight to the final product :)

Also as to "It does not make sense to have a partially running pod", we probably want to call out the run-to-completion aspect, and that restart policy applies per-pod, rather than per-container (or else re-debate that)

@jbeda

jbeda Oct 26, 2014

Contributor

I think that there is one big motivation that you forgot, @brendandburns. Specifically, the pod is an atomic scheduling unit. As we move to a world where we want to use every last resource (CPU, RAM) on a host we need to worry about how the system acts as we near 100% utilization.

Simple constraint-based co-scheduling falls short in situations where the intent of the user is to land N containers on a single host.

A naive way to do this would be to have a constraint-based co-scheduling algorithm. You could have an 'anchor' container that you schedule first. And then you specify that all of the 'secondary' containers should schedule to the same host as the 'anchor'. This is how systems like fleet work. The problem is that you may schedule the 'anchor' on a host that doesn't have enough room for the 'secondary' containers. To make things even more complicated, there may be cases where there is no 'primary' (all containers are peers that come and go) and all of the containers in question aren't known a priori.

By having users specify workloads in terms of pods, we have a resource scheduling specification that sits above the individual containers. This separates the idea of what resources are needed on a machine from what code (in the form of a set of containers) will be running on that machine.

@crosbymichael

crosbymichael Oct 31, 2014

Contributor

Thanks!

@crosbymichael

crosbymichael Oct 31, 2014

Contributor

@aanand this may be interesting with your current proposal.

@tristanz

tristanz Oct 31, 2014

+1 for pods in docker/Fig/docker cluster. We've found the pod concept to be a very helpful way to organize sidekicks, and it greatly simplifies working with groups of containers.

Absent pods, the tendency is to overload the container with services that should really be split out. @brendandburns's examples are spot on. While you can try to have sidekicks follow the main container around without a pod concept, our experience with Fleet is that this is painful. I suspect the Docker Cluster proposal could avoid a lot of extra bloat by adding pods to Docker proper.

@kelseyhightower

kelseyhightower Oct 31, 2014

I've had experience dealing with pods during my work with k8s, and I gotta say it feels right. It also helps push the idea shared by many in the Docker community that each container should run a single application. Using pods makes that advice easy to follow, since you have the pod construct to ensure the collection of related containers is managed as a single unit.

@ibuildthecloud

ibuildthecloud Nov 1, 2014

Contributor

If a pod is just containers that share a netns, volumes, and cgroups, then why don't we just add custom cgroup support (#8551)? If you had the ability to say --cgroup=containerid, then a pod could be created by just launching a "pod container" like

docker run -name mypod some-blank-image

Then

docker run --volumes-from mypod --cgroup=mypod --netns=mypod app1image
docker run --volumes-from mypod --cgroup=mypod --netns=mypod app2image

Because of the --netns=containerid, that automatically implies co-location from a scheduling perspective.

@brendandburns

brendandburns Nov 1, 2014

Placing everything into the same cgroup would require that all of the containers in that cgroup share the same resource and memory constraints (and possibly fight with each other).

Imagine that I have two containers:

  • A web serving container which is user facing and thus needs lots of CPU and high QoS.
  • A background sync container which loads data; it doesn't need much CPU and needs only limited memory.

I don't want a memory leak/bug in my background sync container to steal memory or CPU from my user-facing web container (or, even worse, possibly cause it to run out of memory and crash). I want two different sets of tight resource constraints for the two different containers, and I can't achieve that if they share a cgroup. Support for sub-groups within the original pod cgroup would mitigate this somewhat, but at that point, you basically have a Pod anyway.
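
To make that concrete, here is a sketch (with made-up numbers and images) of the two separate limits using today's per-container flags; what is missing is any way to also cap the pair under a common pod-level parent:

docker run -d --name web -m 1g --cpu-shares 1024 example/web-server
docker run -d --name sync -m 128m --cpu-shares 64 example/background-sync

The pod is the enclosing resource envelope that both of these sub-limits would be carved out of.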

There is an additional problem with the "place it in a pod container" approach: because the API calls to create the containers that make up a pod are not atomic, you can get into situations where the scheduler cannot complete the scheduling. You have a sequence of API calls to create the N containers that make up the pod, and it is no longer an atomic operation that either succeeds or fails; that makes the logic much more complicated, since a failure at the end would force you to roll back a bunch of other operations. We already have to do this in Kubernetes in order to achieve the shared network namespace, and it's somewhat painful.

@ibuildthecloud

ibuildthecloud Nov 2, 2014

Contributor

I don't think pods are really an obvious concept to users. Most people I talk to about Kubernetes do not get pods at all. So I don't disagree with what you want to accomplish; I just think it needs to be presented in a different way.

First, @brendandburns, your issues with regards to scheduling can be addressed by just separating create and start. The Docker CLI didn't have it before, but the API was always there. docker create shouldn't schedule a container to a host.

It sounds like all that is needed is for the --cgroup flag I proposed to allow creating a sub-cgroup of another container, i.e. --cgroup=child-of:containerid. Now this raises a lot of other problems, because I don't think systemd has such a concept.

@crosbymichael

crosbymichael Nov 3, 2014

Contributor

Are there some public use cases or files from the community of how pods are currently being used? Are people actually using them as intended or are you seeing many people running a pod with only one container?

@thockin

thockin Nov 3, 2014

Contributor

We've seen both "correct" uses and "less-correct" uses. I put those in
quotes because, frankly, who are we to tell people how to use a particular
tool? That said, we based the pod design on significant experience running
jobs both without it and with it. I don't have stats at hand, but it was a
non-trivial fraction of jobs internally that use more than one container in
a pod. The number quickly tails off after 3 containers in a pod.

@tiborvass

tiborvass Nov 4, 2014

Collaborator

Are pods just a group of containers (as described in the proposal), or are they a group of (pods OR containers)? In other words, is there such a thing as pods of pods of containers, and if so, what are the typical use cases for those?

@cpuguy83

cpuguy83 Nov 4, 2014

Contributor

@thockin From what I understood, the way pods are implemented (sharing a network namespace) was because there was no better way to handle real linking of containers (due to shortcomings of links, esp. pre /etc/hosts population). Is that not the case?

I too would like to understand pods vs groups.
I feel like shared net namespaces are only something you'd want in certain scenarios (e.g. a simple LAMP stack where nothing is similar), but if links were better (and they are pretty good in 1.3) they would fit both the simple and the more complex use cases.
That's my view, and I'm sure I'm missing something about what pods are doing.

@timothysc

timothysc Nov 4, 2014

+1 to take the model into docker proper.

@thockin

thockin Nov 4, 2014

Contributor

Pods are groups of containers. As much as a recursive definition sounded fun, I could not come up with a single use case where that was the only solution or even the best solution. KISS and YAGNI.

@jbeda

jbeda Nov 4, 2014

Contributor

@tiborvass As implemented in Kubernetes (and internally at Google) there is no sub-nesting of pods. The main reason for this is that it complicates the networking even further.

@cpuguy83 I think that the idea of grouping for applications is being conflated with grouping for resource allocation and placement. In my mind these are two separate types of objects. Based on the experiences at Google, you end up with objects at the following layers -- perhaps overly simplified but useful for this discussion:

  • Container -- (Google internal: Task) this is the smallest unit that can be monitored, restarted, upgraded, named, etc.
  • Pod -- (Google internal: Alloc) -- A resource scheduling primitive. This can also be thought of as a "resource reservation".
  • Replicated Pod Set -- (Google internal: Job) -- this is a set of Pods that are horizontally replicated. First-generation systems at Google had this be a pretty structured concept (a numbered array of replicas). With Kubernetes and Omega it is a more loosely coupled concept consisting of a label selector/query and a replication controller. In most application architectures this is what you'd consider a tier or perhaps a simple micro-service.
  • Application Config -- This is the sum total configuration of all resources (including things not listed here) that makes up an application.

Something like fig has traditionally played at the "Application Config" level. In my mind, systems like Panamax also fit in at that level.

As to why the containers in a pod share a network namespace -- we discussed this quite a bit before settling on this model for Kubernetes and considered the following options:

  1. Each container in the pod gets its own IP/netns that is taken from the shared bridge for the node.
  2. The pod gets its own IP and bridge. Each container gets an IP that is private to that Pod and is NATed out through the pod IP. In this way there is an IP space within the pod that gets NATed out of the Pod.
  3. The pod gets its own IP and all of the containers in the pod share it.

We went with option 3 because the primary use case for Pods is for relatively tightly coupled containers. If the containers are different tiers in an application (for example, database vs. web) then they really should be different pods communicating through links/naming/service discovery. The common case for Pods is when you have a set of containers that were built to work together.

If you'd be tempted to run something like supervisord inside a container, you could instead explode it out into a set of different containers that are visible to the container management system. They'd then be individually upgradable, monitorable and resource isolated. You could then have them run at different QoS so that the less critical services could be preempted by more critical services.

@thockin

thockin Nov 4, 2014

Contributor

@cpuguy83 Shared netns was not just about links, though it did start there. We wanted to make a simple primitive that is purpose-built for the tight relationship that containers-in-a-pod have. Links are fancy and abstract and slow. We did not want or need that. What we wanted was a high-performance, low-overhead way for things to communicate. The loopback interface fits the bill.

As soon as other namespaces can be shared (e.g. IPC) that is an obvious extension of pods.

Pods are also at least conceptually about resource sharing. We can't yet put multiple containers under a shared ancestor cgroup, but you can rest assured it will matter before too long.

Regarding LAMP: don't think of a pod as LAMP; think of one pod as A, one pod as M, and one pod as P, all running on L. Any of those might have need for helper apps (think symbiosis) or might not. It's the symbiotic helper apps for which pods are intended, not for Apache-MySQL comms.

@tiborvass

tiborvass Nov 4, 2014

Collaborator

@thockin @jbeda Thanks for clarifying this; I think it helps a lot in understanding the concept. IMHO, it would be useful for the proposal to start by explaining that Pods are resource-oriented units, and not meant for Apache-MySQL communication.

I would like to understand what is overlapping with what in the different proposals, and what the advantages/downsides are. For example, could we think of this proposal as being an important ingredient for separation of concerns? As in, resource management at the one-host level should not be done by something like #8859 (the equivalent of a Replicated Pod Set?) but by this notion of pods. However, #8859 could be the horizontal scaling ingredient, and #8637 would be one level up, at the application level?

Maybe this should be discussed on IRC; I just find it hard to analyze one proposal without seeing the bigger picture.

@jbeda

jbeda Nov 4, 2014

Contributor

Right now the Docker Clustering (#8859) proposal is missing both a preferred method of replication and any higher level organizational primitives/tools (labels). Both of these are going to be necessary.

#8637 is really about object namespacing, I think. It is being conflated with application config. (Namespace here is at the cluster/API level, not at the Linux kernel level -- not enough words for the concepts we have.) See https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/namespaces.md for how we are approaching that in Kubernetes.

@bgrant0607

bgrant0607 Nov 4, 2014

@thockin Virtually all production services at Google use our internal legacy equivalent of pods. The most common multi-container use cases include pushing log data (not unlike fluentd or logstash) and content/config management / data loading.

However, pods are useful even without multiple containers. Why? Data and resource durability. Pods make it possible to keep data resident (e.g., in volumes) and to hold onto resources across container restarts and even across full container replacements, such as for image updates.

@ibuildthecloud Yes, features like shared namespaces and shared cgroups would make it easier for us to implement pods outside Docker, but would make our wrapper even fatter.

@philips

philips Nov 5, 2014

Contributor

@ibuildthecloud I don't understand your concern about subgroups + systemd. systemd has the concept of a slice hierarchy: http://www.freedesktop.org/software/systemd/man/systemd.slice.html

Overall, +1 on this proposal. It gives users an abstraction to avoid building complex multi-component containers.

@ibuildthecloud

ibuildthecloud Nov 6, 2014

Contributor

@philips I was pointing out that if we were to do what I proposed, --cgroup=child-of:containerid, it wouldn't work with systemd. systemd will create a scope for container X, and then if you were to say --cgroup=child-of:X that would be a scope that is a child of a scope.

I'd like to reiterate that I don't think pods are a straightforward concept for users. If you want this feature to be in Docker, I think you're going to have to find a different approach or way of explaining it to users.

It seems to me that the root thing Google wants to address with pods is really being able to effectively manage resources between containers. Shared volumes and netns are already supported. Defining pods as shared volumes + netns + resource containers doesn't make sense to me; it's a very specific configuration.

I think you would have more success getting a proposal through if you introduced a concept that was purely oriented towards managing resources between containers. For example, libvirt has resource partitions that I think address what you would need. So if Docker had resource partitions natively, then a Kubernetes pod == shared volumes + netns + resource partition.

@dqminh

dqminh Nov 6, 2014

Contributor

+1 for this proposal overall

@ibuildthecloud iirc, sharing resources is just one aspect of this; there's also a need to atomically schedule the set of containers that make up a pod. The system has to make sure that either all containers that make up a pod can be scheduled on the host or none at all, which is currently hard to do with the container as the only primitive.

@aanand

aanand Nov 6, 2014

Contributor

I agree that it’s worth talking about how this will fit with #8637. Both serve use cases which I believe are largely orthogonal (indeed, I suspect they appeal to largely disjunct sets of users), so my hope is that they can coexist, as long as we design both of them right.

The purposes of pods have been amply outlined. (I don’t have strong feelings about their usefulness, because I haven’t encountered the problems they solve.)

I see the primary purposes of groups being:

  • Scope container names so that multiple people can work on multiple apps without treading on one another’s toes
  • Provide some of the functionality server-side which tools like Fig currently do client-side to improve performance (e.g. Fig’s filtering of the global container list is unusably slow on hosts with thousands of running containers)
  • Provide a way to create/update multiple, associated containers in a single call - in a clustered future, this enables the cluster to make scheduling decisions about where to place those containers based on the fact that they’re a logical group. (A bit like how pods are expressly single-host, but far more generic and up to the particular cluster implementation).

Unlike pods, groups are intended to:

  • exist across an arbitrary number of cluster nodes
  • contain an application’s entire stack (or any arbitrary slice of it)

This makes them - as this proposal has pointed out - much less about resource sharing and much more about logical grouping.

For this reason, I hope they can coexist just fine - intuitively, it seems as though a group should be able to contain containers, pods or a mixture of both.

The major potential pain points I see are:

  • Naming. My current thinking is that containers within a group have names prefixed with <groupname>/. Group names thus live in the same namespace as container names. How will pods be named? How will containers within pods be named?
  • Educating users. As has already been raised, pods are weird to someone who doesn’t understand their use cases. Explaining the difference between groups and pods, and when to use one or the other, is going to be tough.
@jbeda

jbeda Nov 6, 2014

Contributor

@aanand I think we are in violent agreement here -- namespacing resources and pods are orthogonal issues. This is how we have modeled it in Kubernetes. See my comment here: #8637 (comment)

As for pods being non-obvious to users, @dqminh has it right that the high order bit is atomic scheduling of a set of containers.

Now -- here is the philosophical question -- how opinionated do we want to be?

  1. Fine decomposition of resources and namespace connections. We could have the idea of resource partitions, network namespaces, file chroots and processes that map to all of those. A container/pod could be created by creating these and plugging them together. This would be super flexible.
  2. Pick a common pattern (resource reservation = network ns = pod, root process = file chroot = container) and go for ease of use.

My gut is that the Docker world is about making some simplifying choices to make things easier for users. If users really want that fine grained control over how to map these things they should probably be using LXC or LMCTFY or libcontainer directly.

With this in mind, why do pods at all? I'd say that Docker did an amazing job making this experience simple for the single node case. As we move to clustering and more sophisticated use cases (as outlined in this issue) we will need to add some complexity. In my mind pods are a minimal amount of additional complexity to unlock new use cases and solve some real problems with clustering.

But - with that being said, I'd agree that many, many users won't want to know or care about pods. With that in mind, I'd say we make pods fade into the background in the Docker tooling. If a user wants to launch a single container, we create a pod around it and hide the pod from the user. If the user wants to delete that container, we delete the pod out from under them. It is only when users need pods that they start using advanced commands that expose what is really going on.

An alternative would be to support both naked containers and pods in the Docker engine. However, our experience with this at Google tells us this is a mistake. Having those two cases complicated every tool.

@timothysc

timothysc Nov 6, 2014

@jbeda It "sounds like" you believe the POD should be the atom in the engine, not opposed just distilling.

Could you elaborate on how naked containers + pods "complicated things."

@jbeda It "sounds like" you believe the POD should be the atom in the engine, not opposed just distilling.

Could you elaborate on how naked containers + pods "complicated things."

@brendandburns

brendandburns Nov 6, 2014

@aanand I think we all agree that your groups proposal and pods are complementary.

An analogy that I think works well to explain the two proposals is as follows:

Groups in #8637 are basically similar to package foo in a programming language like Java or Go. They exist to organize and draw boundaries (possibly with ACLs) between groups of objects, but they are a loose coupling of things that share some common purpose (e.g. an application stack).

Pods in this proposal are more akin to a class Bar in Java or a type Bar struct in Go. They are a tightly coupled atomic package, where it doesn't make sense to split the object any further. It is one conceptual whole that is indivisible.

In this context, I think #8637 and this proposal are wholly complementary, exactly as packages and classes/structs are complementary in programming languages.

@timothysc naked containers + pods complicate things largely because you need to have duplicate code paths, validation logic, etc. Unless you implement stand-alone containers as syntactic sugar on top of pods, all of the logic around validating, creating, deleting, etc. needs to be duplicated. There are ways to factor it so as to increase code reuse, but there is still 2x the number of cases to unit test, etc.

Additionally, even if stand-alone containers are syntactic sugar in the API, every upstream consumer of the API now also needs to know about both pods and containers, and create different code paths to deal with them too. Imagine I'm writing a rollout tool: it's much, much simpler if the only thing I ever need to roll out is a Pod (including a singleton Pod), rather than having to understand and call different APIs depending on whether they are stand-alone containers or Pods.

@thockin

thockin Nov 6, 2014

Contributor

@timothysc mixing naked containers and pods permeates everything. Every piece of code needs to check "are you talking about a pod or a container". It infects all historical data - any database of information now has to keep two schemata around or find a way to blend them. It makes every feature more complicated because you always need to think about the intersection of things. APIs that apply to one, don't always apply to the other, and now you have multiple levels of API that only work if the target of the call is the right kind.

It's a mess

@ibuildthecloud

ibuildthecloud Nov 7, 2014

Contributor

@jbeda My first inclination is that fine decomposition would be the way to go here because I don't think pods are common enough to be a common pattern.

If Docker had the primitives you needed to easily do pods in Kubernetes, then pods could just be a Kubernetes construct. Not that this is really the best reason to do something, but pods can effectively be the differentiator between Kubernetes and other orchestration tools for Docker that exist today. If pods are as useful as you assert, then they should take off and become more commonplace, users will understand them better, and then we can create the "common pattern" in Docker itself.

If we were to actually include pods in Docker, I do like the idea of making them an implicit construct for users who don't care about them. So, as mentioned, just create a new pod when someone says docker run.

Contributor

ibuildthecloud commented Nov 7, 2014

@jbeda My first inclination is that fine decomposition would be the way to go here because I don't think pods are common enough to be a common pattern.

If Docker had the primitives you needed to easily do pods in Kubernetes, then pods could just be a Kubernetes construct. Not that this is really the best reason to do something, but pods can effectively be the differentiator between Kubernetes and other orchestration tools for Docker that exist today. If pods are as useful as you guys assert they are, then they should take off, become more commonplace, users will understand them more, and then we can create the "common pattern" in Docker itself.

If we were to actually include pods in Docker, I do like the idea of making them an implicit construct for users that don't care about them. So, as mentioned, just create a new pod when someone says `docker run`.

thockin commented Nov 7, 2014

If docker gives us primitives to create pods, but does not support pods
itself, Kubernetes will continue to be a relatively fat wrapper over
Docker. One of the goals we have is to slim down that wrapper.

timothysc commented Nov 7, 2014

@ibuildthecloud I'm ensconced in the belief that as Docker ventures closer to clustering, they will encounter similar problems, so why recreate unnecessary pain?

I'm a huge fan of analogies, and with regards to user understanding, I like the envelope + letter analogy, where a POD is an envelope, a Container is a letter, and Clustering is the mail service. Users can still think in terms of letters (Containers), but internal infrastructure would always wrap the letter in the envelope (POD), as it needs to before it can be mailed off to another machine.

If folks want to name POD something else I don't particularly care, but the analogy still applies.

bgrant0607 commented Nov 7, 2014

Another way to think about Pods:

Pods are the primary isolation boundary: most namespaces (network, IPC, PID, user), most cgroups (cpu, memory, freezer, etc.), volumes.

"Containers" (or, subcontainers, if you will) are used for application management: images, mount namespaces, job cgroup, and process groups that are packed within pods, with sub-cgroups to track/bound resources (cpu, memory, etc.).

This model provides a clean separation of concerns, without fine-grain wiring together of custom namespaces, cgroups, volume containers, etc.

thockin commented Nov 7, 2014

We KNOW that people make containers with multiple apps inside them, even
though docker considers that an anti-pattern. Pods strengthen the idea of
running one app per container and make it feasible in at least some of
those cases.

ibuildthecloud commented Nov 8, 2014

Just to make it clear, I'm not opposed in concept to this proposal. I very much trust that you guys know what you're doing. The only reason I've chimed in is that the way pods are presented today in Kubernetes and in this proposal is not really easily digestible by most users, IMHO. I don't think they'll be accepted into Docker unless they are better understood.

Why would it not be sufficient to just add the concept of resource partitions? I think people will understand resource partitions and see the usefulness. I think most users won't use them, but they get the purpose. Here is what I'm imagining:

docker resource-partition create foo
docker create --resource-partition foo myapp1
docker create --resource-partition foo myapp2
docker resource-partition start foo

I think that will address your scheduling concerns. This doesn't address shared volumes or network namespaces. To fully represent that I'm thinking the following may work.

docker resource-partition create foo
docker create --resource-partition foo --volumes-from resource-partition:foo --net resource-partition:foo myapp1
docker create --resource-partition foo --volumes-from resource-partition:foo --net resource-partition:foo myapp2
docker resource-partition start foo

I don't think it's super obvious and straightforward to users that the volumes and network namespace are shared in a Pod. I think if the user has to explicitly indicate they want that, it would be easier to digest.

brendandburns commented Nov 13, 2014

Worth pointing out that the elastic container service announced today has an equivalent concept ("Task")

johngossman commented Nov 14, 2014

@derekwaynecarr I completely agree with your concern here. At some level the imperative language vs declarative language debate is overblown, and the two are constantly blending into each other (@jbeda may recall Bog's jokes about adding "if" and "for" to XML, which turned out to not be so funny).
But ultimately the reason we invent things like XML, JSON and YAML is because writing code that reasons over arbitrary code like bash scripts is hard and ugly.

crosbymichael commented Nov 17, 2014

Could you explain to me the difference between pods and nested/sub-containers? Are they the same thing but with a different name?

discordianfish commented Nov 17, 2014

I think no one really knows what nested/sub-containers are; I would say it depends on the specifics. What is described here could very well be implemented by a nested/sub-container feature: the "pod" described here would be one parent container which might have several sub-containers, which can (but are not required to) share namespaces with the parent container. Kubernetes pods could just be a specific usage of nested/sub-containers.

brendandburns commented Nov 17, 2014

@crosbymichael if you add network namespaces and disk volumes to "nested/sub containers" then I think that the feature set is reasonably the same, but what is important is the API.

A Pod needs to be an atomic object in the API. I need to be able to issue a single command to add a Pod, a single command to update a Pod, a single command to delete a Pod.

wyaeld commented Nov 26, 2014

Ambassadors and data containers are great examples of existing Docker concepts that should be part of a pod definition.

I want a similar Podfile resource that makes it easy for me to package my containers in some envelope that treats my pod as a first-class resource, and then all of my CLI interaction should work against the Podfile in the same manner Docker can work against a Dockerfile when working at the individual image level.

Would like to +1 both of these statements by @smarterclayton & @derekwaynecarr. I really, really want to be able to keep a single defined config somewhere that describes the deployed 'unit of work', and that unit is frequently not a single container. Right now, every container goes out with monstrous 10-15 line commands plugging everything together, and I can't just start, stop and move them around as a unit.

timothysc commented Jan 21, 2015

ping @crosbymichael, @vieux - has there been any resolution or insight on status here?

kelseyhightower commented Apr 15, 2015

@crosbymichael, @vieux, @timothysc Are there plans to support a pod-like concept in Swarm? If so, would it make sense to push ahead on this proposal?

timothysc commented Apr 15, 2015

@kelseyhightower I would hope so, but alas I believe this issue has stalled.

cpuguy83 commented Apr 15, 2015

@kelseyhightower I think we need to make a decision on what level of abstraction should go into Docker core for this.
Pods are possible with Docker today, but only when using Docker in a particular way, i.e. `docker run --net container:<id> --ipc container:<id> --cgroup-parent <cgroup path>`.

Are pods an API-only concept? Does it make sense to create a pod from the CLI (maybe `docker run --parent <container id>`?)

Probably the best way to move forward with this is to introduce a docs-only style proposal that details exactly what this API looks like, how users might use it, etc., so that it can either be approved or have improvements suggested.
I think with this issue the way it is, someone can only say "👍 pods are cool"
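
To make that concrete, here is a rough sketch of wiring up a two-container "pod" by hand with flags that already exist (the container names, the /mypod cgroup path and the content-syncer image are illustrative; this is roughly what Kubernetes does with its infrastructure/pause container):

# An "infrastructure" container owns the shared network/IPC namespaces and the
# shared volume; it only needs to stay alive.
docker run -d --name mypod-infra --cgroup-parent /mypod \
    -v /data busybox sleep 86400

# Application containers join its namespaces, volumes and cgroup parent.
docker run -d --name mypod-web \
    --net container:mypod-infra --ipc container:mypod-infra \
    --volumes-from mypod-infra --cgroup-parent /mypod nginx

docker run -d --name mypod-sync \
    --net container:mypod-infra --ipc container:mypod-infra \
    --volumes-from mypod-infra --cgroup-parent /mypod content-syncer

The catch, as noted above, is that nothing ties these three containers together as one object: they have to be created, updated and deleted individually, which is exactly the gap this proposal is about.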

cpuguy83 commented Apr 15, 2015

Also, with a docs-only proposal, if the writer is unwilling or doesn't have the time to implement it, it becomes a LOT easier for someone else to come along and build it.

thockin commented Apr 15, 2015

The problem is that writing such a doc is a non-trivial time commitment
with no clear indicator that it is something Docker actually wants. Nobody
wants to force themselves into a situation where they are not welcome.

I think with this issue the way it is someone can only say " pods are
cool"

If some of the movers and shakers said "we get it and we think it is a good
idea, please start to work out the details" it would be in a very different
place from where it is now.

Here's how I see the rationale for it. Docker is a piece of a larger
system for work management, whether that system is Swarm, Kubernetes,
Mesos, Lattice, ECS, or any of the dozen others that have emerged this
year. Users of these systems sometimes need to be able to schedule
containers that are co-located and linked in some way(s) (be that --link,
--net=container, --ipc=container, --volumes-from, etc). Being able to
write a declarative definition of what you want to run (viz. Docker
compose) is powerful and simple.

For lack of this atom in docker, people either make monolith containers
running multiple apps or else build further rectification systems on top of
Docker's atoms. Having Docker implement pods (under whatever name)
directly means that all of these orchestration systems get thinner and
highlight Docker's capabilities rather than burying docker in wrappers and
layers. It means that users can do, by hand, what the orchestration
systems do.

I really see it as a matter of convergent evolution - multiple systems have
(or will) evolve this same basic primitive. It's better for Docker to
offer it directly. I'd much rather be talking about how Docker pods work
than how we manipulate Docker into producing a pod construct.

I'd be happy to help work through how the CLI and API might look if only I
knew I wasn't wasting my time.

brendandburns commented Apr 15, 2015

@cpuguy83 As the original author of this issue, I think @thockin is precisely correct. The reason that this exists as an issue, rather than a design doc or working implementation, is because the implementation is significant and non-trivial, and I didn't/don't want to force things if they aren't wanted, or presume any particular design if the developers of Docker wanted to go in a different direction.

If the decision makers at Docker believe (as I do) that this is a good modification of the Docker API, and it actually has a realistic chance of becoming a part of the Docker API, then I am more than willing to work out a more detailed design document of what it could look like and hand it over to others for implementation, as well as working on the implementation myself.

Perhaps @shykes @crosbymichael et al. can confirm that people are interested in seeing Pods as part of the Docker node API?

Thanks (and thanks to everyone for their continued interest in moving this proposal forward)
Brendan

resouer commented Apr 19, 2015

+1 for this, as I have been forced to use Docker like a VM with mmonit inside for such a long time; it's really disgusting that I have to write 'monit start xxx' instead of the real start command in the CMD part.

The 'real' apps in the real world are always a bunch of processes. 12-factor apps? Don't forget that's an ad from a traditional PaaS provider ...

dcieslak19973 commented Apr 19, 2015

I'm relatively new to Docker and Swarm, but it seems to me that Swarm should rely more on Discovery Services for linking services together. Perhaps I don't fully understand the boundary between Docker and Kubernetes, though.

Also, I'm curious how the concept as described here so far would work with things like ZooKeeper, Mongo and other "cluster-able" services that rely on small config changes (e.g. BrokerID), but are otherwise configured identically.

Suppose I want to run Kafka and HBase in a cluster, both of which rely on Zookeeper. Which cluster pod would specify Zookeeper?

ibuildthecloud commented Apr 19, 2015

To me, the best approach would be to proceed to try to accomplish Pods under the banner of "nested containers." I can put together a design for nested containers as the comment has already been made that they are not clearly defined.

Nested containers will actually solve some more advanced use cases I have.

thaJeztah commented Apr 19, 2015

@ibuildthecloud do you think this should be looked at in a wider perspective? (I saw your comment in #11535 (comment)); i.e. discussing what the use-cases are, how (and if) they're technically possible and a global design? Looking at this (and other) discussions I think there are some interested parties.

(just speaking on a personal note here)

resouer commented Apr 20, 2015

@dcieslak19973 Actually, all the "cluster-able" apps remain how you deal with them in Docker today; nothing changes.

I think a POD does not take care of what is already distributed; it cares about what is not distributed/isolated enough.

So you can deploy ZooKeeper in one POD and let HBase components in other PODs consume it.

---------I'm a Split Line-------------------
It suddenly occurs to me that HBase is a good example to show how a POD works. I had a one-Master, one-Slave HBase setup (used by OpenTSDB) in a production environment; here is my HBase Slave node, for example:

HBase Slave:
vcap@ubuntu12-171:~$ monit summary
The Monit daemon 5.2.4 uptime: 48d 18h 49m 

Process 'cloud_agent'               running # part of my monitoring system
Process 'hadoop_datanode'           running
Process 'hbase_regionserver'        running
System 'system_ubuntu12-171'        running

So, if I use one Docker container to replace this VM, I have to run multiple processes inside this HBase Slave container, I have to use monit to manage them, and I have to use `monit start all` as the CMD. It really sucks.

But if I have a POD, I can use three Docker containers instead; they share the same volume, network, etc. but are still isolated.

HBase Slave POD:

Container 'cloud_agent'               running
Container 'hadoop_datanode'           running
Container 'hbase_regionserver'        running

Network: 10.0.0.4
...

My cloud_agent's memory leak will never kill the whole node, and I can also update the 3 containers one by one.
What's more, my HBase Master node can still treat this POD as a single "node" and consume it as it did before (all three containers run on 10.0.0.4).

And yes, a POD works like nested containers to some extent.

luebken commented Aug 17, 2015

I’m investigating patterns around building applications with containers. And from my point of view it seems that a concept like pods is getting more traction.

This discussion paused in April with the question of whether there is support from the Docker decision makers for putting more effort into this proposal. That support was called for before putting more effort (e.g. docs, implementation) into it.

Did I miss something? What can we do to move this proposal forward?

aanand referenced this issue in docker/compose Aug 24, 2015: compose scale (on swarm) with volumes_from #1899 (Closed)

brendandburns commented Sep 2, 2015

@luebken thanks for your continued interest.

We continue to be interested in trying to move this forward, either in the context of Docker or in the context of the OCI; however, we have never had any feedback that there was buy-in from the Docker maintainers.

The issues raised by an end user in docker/compose#1899 clearly show the need for Pods as an API primitive.

@shykes @crosbymichael any further thoughts?

Thanks!
--brendan

luebken commented Sep 2, 2015

Let me add that we see quite some interest in this topic with advanced users. After building a first version of their applications a typical first refactoring we see is “extracting platform concerns into a sidecar”. The last example was sensu for monitoring and fluentd for logging.

Although this can be achieved by opening up the individual namespaces, we feel that the notion of a pod (or nested containers, or container groups) is a simple-to-understand concept. This simplicity is crucial to the adoption of these concepts, and it will improve the overall architecture of many applications.
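
As a rough illustration of that sidecar refactoring with today's primitives (the my-app-image name and the log path are only examples), a logging sidecar can share the application's log volume without being baked into the application image:

# The application writes its logs into an anonymous volume.
docker run -d --name myapp -v /var/log/myapp my-app-image

# The fluentd sidecar tails the same volume; it keeps its own image,
# lifecycle and resource limits. (Fluentd configuration omitted for brevity.)
docker run -d --name myapp-logger --volumes-from myapp fluent/fluentd

A pod would make this pairing explicit and atomic instead of leaving it as two loosely related containers.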

wyaeld commented Sep 2, 2015

A year on, I suspect users are better off just including Kubernetes in
their solution if they want this. It's solid, and avoids enlarging the
docker footprint, which it seems they don't want to do.

jessfraz removed the kind/proposal label Sep 8, 2015

discordianfish commented Feb 25, 2016

@shykes Any news on this? I really wish we could avoid further fragmentation by providing this low-level concept across all deployment mechanisms.

jatins commented Jul 15, 2016

A unit like a pod does seem to be something that people think in terms of. Kubernetes has it. Even AWS ECS has the concept of tasks. From the ECS getting started guide:

Task definition: A description of an application that contains one or more container definitions.

If docker-compose also had the concept of a group of containers, I think it would really unify the way we develop and the way we deploy.

cc: @shykes

schmunk42 commented Jul 15, 2016

We're using separate files for that at the moment, but couldn't we do that in one file (for docker-compose)? Just thinking out loud here...

version: '3'
name: 'default'
depends_on:
  - storage
services:
  web:
    image: nginx
    depends_on:
      - app
  app:
    image: some-app
    depends_on:
      - storage.db
networks:
  external_network: storage
---
version: '3'
name: 'storage'
services:
  db:
    image: mysql

jatins commented Jul 15, 2016

@schmunk42 If all the compose files are in the same folder, then it only creates one network. So won't that be the same as putting all the services in one usual docker-compose.yml file?

thockin commented Jul 15, 2016

A pod is not just a way to describe them together, but a way to ensure that THROUGH THE ENTIRE SYSTEM they stay together and share fate and lifecycle and resources.

At this point, I don't think Docker should implement pods. We have most of the primitives we need to implement pods without all the complexity of a probably different interpretation. This idea has to be implemented from top to bottom or it's not worth doing.

schmunk42 commented Jul 18, 2016

> If all the compose files are in the same folder, then it only creates one network. So won't that be the same as putting all the services in one usual docker-compose.yml file?

That's configurable, I think. I have `---` in my example, which marks a new document in YAML; it should act like two separate files, but as said, it's just a rough idea.

The main reason for splitting up the stacks in our case is that we usually don't want or need to redeploy databases, caches, etc. Restarting an application with a new image is usually no big deal, but killing and removing your database container might take a while and prolongs downtime.

CrazyPandar commented Oct 15, 2017

+1 for this feature. We have some services that need to be bound one by one in pairs: service A binds to localhost, and service B connects to A via localhost. If a container of service A exists, there must be a B connected to it.

resouer commented Oct 16, 2017

@CrazyPandar With CRI (the Container Runtime Interface) now present, this requirement is not really solid any more.

rycus86 commented May 14, 2018

I know I'm late to the party, but maybe have a look at whether this is something that could help people out here in lieu of native pod support: https://github.com/rycus86/podlike

Also an intro: https://blog.viktoradam.net/2018/05/14/podlike/

Vanuan commented Jul 3, 2018

I think the interesting part here is not having another abstraction layer somewhere between services, tasks and containers. What makes pods interesting is the use cases they are needed for: sidecar, ambassador, adapter.

These patterns are based on being able to share resources (a shared loopback network interface, a shared filesystem) while keeping things in separate images. Putting container co-location and synchronized replication aside, this could be implemented using some kind of magical network driver called `pod-like`:

services:
  myservice1-backend:
    image: myservice1
    config: ... # listens to 127.0.0.1:81,
                # talks to other services via 127.0.0.1:80/myserviceN
    networks:
      - myservice1-pod
  myservice1:
    image: envoy
    config: ... # listens to 127.0.0.1:80, myservice1:80
                # proxies myservice1:80 -> 127.0.0.1:81
                # proxies 127.0.0.1:80/myservice2 -> myservice2:80
    networks:
      - myservice1-pod
      - myapp-net

  myservice2-backend:
    image: myservice2
    config: ... # listens to 127.0.0.1:81,
                # talks to other services via 127.0.0.1:80/myserviceN
    networks:
      - myservice2-pod
  myservice2:
    image: envoy
    config: ... # listens to 127.0.0.1:80, 127.0.0.1:81
                # proxies myservice2:80 -> myservice2-backend:80
                # proxies 127.0.0.1:80/myservice1 -> myservice1:80
    networks:
      - myservice2-pod
      - myapp-net


networks:
  myservice1-pod:
    driver: pod-like
  myservice2-pod:
    driver: pod-like
  myapp-net:
    driver: overlay

So what use cases does it enable? Apart from a stronger guarantee that services don't talk directly to each other (which can just as well be enforced using code review)? Is this pod feature really worth the convenience of using 127.0.0.1 instead of myservice1-backend and myservice2-backend? You still need to type that 127.0.0.1 somewhere, right? And it doesn't look like you're getting much re-usability with this pattern. Yes, you probably don't have to use template substitution for service hostnames in your configs, but that's it?

What would really be cool is a use case like "hey, I want all my services that talk HTTP to use this Envoy thing, so that I could control traffic using this Istio Pilot thing". But pods don't enable that. What are they actually useful for?

Vanuan commented Jul 6, 2018

One issue that loopback fixes is that developers don't need to fix their application for a situation where an IP address suddenly changes. Instead of resolving DNS each time, devs rely on the fact that it rarely changes, so they just never bother resolving it again before trying to reconnect. For example, this happened with Elasticsearch: elastic/elasticsearch#16412 (comment)
