Proposal: Make Pods (collections of containers) a first order container object. #8781

Open
brendandburns opened this issue Oct 26, 2014 · 74 comments

Comments


@brendandburns brendandburns commented Oct 26, 2014

Pods

This is a proposal to change the first order container object within the Docker API from a single container to a pod of containers.

A pod (as in a pod of whales or pea pod) models an application-specific "logical host" in a containerized environment. It may contain one or more containers which are relatively tightly coupled -- in a pre-container world, they would have executed on the same physical or virtual host.

This is somewhat related to #8637 but that proposal has much more to do with the namespacing of containers into a single namespace, than grouping containers into logical hosts for the purposes of scheduling, resource tracking, isolation and sharing.

In this proposal there are two sub-proposals:

  • The first is that a new API object, representing a Pod, be added to the Docker API.
  • The second is that this new API object replace the existing singleton container object in future versions of the Docker API.

Since these topics are somewhat orthogonal, I will address them each in separate sections.

Pods as an API object

Inherently, a Pod represents an atomic unit of an application. It is the smallest piece of the application that it makes sense to consider as a unit. Primarily this atomicity is in terms of running the Pod. A pod may be made up of many containers, but the state of those containers must be treated as atomic. Either they are all running under the same Docker daemon, or they are not. It does not make sense to have a partially running pod. Nor does it make sense to run different containers from a single pod in different Docker daemons.

There are numerous examples of such multi-container atomic units:

  • A user-facing web server, and a side-car container that periodically syncs the server's contents from version control.
  • A primary database container, and a periodic backup container that copies the database out to network storage.
  • Multiple containers synchronizing work through IPC or shared memory.
  • Side-car containers that provide thick libraries and simplified APIs that other containers can consume (e.g. master election).

In all of these cases, the important characteristic is that the containers involved are symbiotic: it doesn't make sense to place the containers in these example pods onto different hosts.

What does it mean to be a Pod?

Containers in a pod share a network namespace (and consequently an IP address), so members of a pod can address each other via localhost. They also share a set of data volumes, which they can use to pass data between containers. Importantly, containers in a pod do not share a chroot, so data volumes are the only way to share storage. A pod also has a single resource hierarchy: individual containers within a pod may have their own specific resource limits, but these limits are subdivisions of the resources allocated to the entire pod.
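The sharing rules above can be sketched as a small data model. This is only an illustration in Python: the class names, field names and image names are assumptions made for the example and do not come from the Docker or Kubernetes APIs. It also assumes that "subdivision" means no single container limit may exceed the pod's own allocation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Container:
    name: str
    image: str
    memory_mb: Optional[int] = None  # optional per-container limit


@dataclass
class Pod:
    name: str
    memory_mb: int                                     # limit for the whole pod
    volumes: List[str] = field(default_factory=list)  # visible to every member
    containers: List[Container] = field(default_factory=list)

    def validate(self) -> None:
        """Reject any container limit larger than the pod's own allocation."""
        for c in self.containers:
            if c.memory_mb is not None and c.memory_mb > self.memory_mb:
                raise ValueError(
                    f"{c.name}: {c.memory_mb} MB exceeds pod limit {self.memory_mb} MB"
                )


# Hypothetical pod: a web server plus a content-sync side-car.
web = Pod(
    name="web",
    memory_mb=512,
    volumes=["content"],
    containers=[
        Container("server", "example/web-server", memory_mb=384),
        Container("sync", "example/content-sync", memory_mb=128),
    ],
)
web.validate()  # passes: neither limit exceeds the pod's 512 MB
```

The network namespace is deliberately absent from the model: because it is shared by definition, there is nothing to configure per container.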

Why pods instead of scheduled co-location?

Co-location via a scheduling system achieves some of the goals of a pod, but it has significant downsides. The containers don't share a network namespace, and thus have to rely on additional discovery mechanisms. They also don't share a cgroup, so you cannot express a parasitic container that steals resources, when feasible, from a co-container in its pod; instead, that parasitic container steals from all containers on the host. Finally, because the linkages between the containers are expressed as scheduling constraints rather than as an explicit grouping, it is harder to reason about the container group, and the scheduler becomes more complicated.

Pods as the only way to run containers

It would be a somewhat significant revision to the Docker API to transform the current singleton containers into Pods of containers, but it is a worthwhile endeavor, because it will retain the simplicity of the API.

Put concretely, there is no reason to introduce two different API objects (singleton containers and pods), when a Pod of a single container can effectively represent the singleton case. Sticking to a single API object will limit complexity both in the code and in the documentation, and will also give users a seamless evolution from single-container Pods to more sophisticated multi-container Pods.
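A minimal sketch of that argument in Python, assuming hypothetical `Container` and `Pod` types (neither is an existing Docker API object): a bare container is just a pod with one member, and adding a side-car later does not require switching to a different kind of object.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Container:
    name: str
    image: str


@dataclass(frozen=True)
class Pod:
    name: str
    containers: Tuple[Container, ...]


def pod_from_container(container: Container) -> Pod:
    """Wrap a bare container in a trivial single-member pod."""
    return Pod(name=container.name, containers=(container,))


def add_sidecar(pod: Pod, sidecar: Container) -> Pod:
    """Grow a pod by one member; the pod keeps its name and object type."""
    return Pod(name=pod.name, containers=pod.containers + (sidecar,))


# Image names are placeholders for the example.
single = pod_from_container(Container("web", "example/web-server"))
grown = add_sidecar(single, Container("sync", "example/content-sync"))
```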

Implementation and further details

Pods are a foundational part of the Kubernetes API. The Kubernetes API spec for a Pod can be found in the v1 API.

A fully functional implementation of pods in terms of Docker containers can be found inside of kubelet.go.

@brendandburns brendandburns changed the title Proposal: Make Pods (collections of containers) the first order container object. Proposal: Make Pods (collections of containers) the a order container object. Oct 26, 2014
@brendandburns brendandburns changed the title Proposal: Make Pods (collections of containers) the a order container object. Proposal: Make Pods (collections of containers) a order container object. Oct 26, 2014
@brendandburns brendandburns changed the title Proposal: Make Pods (collections of containers) a order container object. Proposal: Make Pods (collections of containers) a first order container object. Oct 26, 2014
Contributor

@thockin thockin commented Oct 26, 2014

Awesome write-up. I think this idea would be great for the ecosystem overall. Less wrapping and hackery in kubernetes while bringing this simple but important idea to non-kubernetes users.

I think there are a number of steps toward the ultimate goal that could be delivered independently.

  1. No pod-level resources or volumes. This can be developed purely on top of existing constructs (as kubernetes has done). Build out the API and CLI experience. Maybe only shared netns, to keep scope simple. Bare containers imply a trivial pod-wrapper.

  2. Phase out bare container interfaces.

  3. (and onward) Introduce more pod-level concepts one by one: volumes, other namespaces, cgroups, GIDs and disk accounting, etc.

The main point being we don't have to jump straight to the final product :)

Also as to "It does not make sense to have a partially running pod", we probably want to call out the run-to-completion aspect, and that restart policy applies per-pod, rather than per-container (or else re-debate that)

Contributor

@jbeda jbeda commented Oct 26, 2014

I think that there is one big motivation that you forgot, @brendandburns. Specifically, the pod is an atomic scheduling unit. As we move to a world where we want to use every last resource (CPU, RAM) on a host we need to worry about how the system acts as we near 100% utilization.

Simple constraint based co-scheduling creates situations where the intent of the user is to land N containers on a single host.

A naive way to do this would be to have a constraint based co-scheduling algorithm. You could have an 'anchor' container that you schedule first. And then you specify that all of the 'secondary' containers would schedule to the same host as the 'anchor'. This is how systems like fleet work. The problem is that you may schedule the 'anchor' on a host that doesn't have enough room for the 'secondary' containers. To make things even more complicated, there may be cases where there is no 'primary' (all containers are peers that come and go) and where not all of the containers in question are known a priori.

By having users specify workloads in terms of pods, we have a resource scheduling specification that sits above the individual containers. This separates the idea of what resources are needed on a machine from what code (in the form of a set of containers) will be running on that machine.
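The difference between the two approaches can be shown with a toy scheduler in Python. This is a sketch under simplifying assumptions: a single abstract resource, first-fit placement, and hypothetical function and host names. It is not how fleet or Kubernetes actually schedules.

```python
from typing import Dict, List, Optional


def schedule_anchor_first(
    free: Dict[str, int], anchor: int, secondaries: List[int]
) -> Optional[str]:
    """Place the anchor on the first host where it fits, then require the
    secondaries to fit on that same host."""
    for host, capacity in free.items():
        if anchor <= capacity:
            remaining = capacity - anchor
            # The anchor is already committed to this host; if the rest
            # does not fit, the whole group fails.
            return host if sum(secondaries) <= remaining else None
    return None


def schedule_pod(free: Dict[str, int], sizes: List[int]) -> Optional[str]:
    """Place the pod as one unit on the first host that fits the total."""
    need = sum(sizes)
    for host, capacity in free.items():
        if need <= capacity:
            return host
    return None


free = {"host-a": 2, "host-b": 8}
print(schedule_anchor_first(free, anchor=2, secondaries=[3]))  # None
print(schedule_pod(free, [2, 3]))                              # host-b
```

The anchor fits on `host-a`, which leaves no room for the secondary, so the anchor-first strategy fails even though `host-b` could hold both containers. Scheduling the pod as a unit considers the total requirement up front.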

Contributor

@crosbymichael crosbymichael commented Oct 31, 2014

Thanks!

Contributor

@crosbymichael crosbymichael commented Oct 31, 2014

@aanand this may be interesting given your current proposal.


@tristanz tristanz commented Oct 31, 2014

+1 for pods in docker/Fig/docker cluster. We've found the pod concept to be a very helpful way to organize sidekicks, and it greatly simplifies working with groups of containers.

Absent pods, the tendency is to overload the container with services that should really be split out. @brendandburns's examples are spot on. While you can try to have sidekicks follow the main container around without a pod concept, our experience with Fleet is that this is painful. I suspect the Docker Cluster proposal could avoid a lot of extra bloat by adding pods to Docker proper.


@kelseyhightower kelseyhightower commented Oct 31, 2014

I've had experience dealing with pods during my work with k8s and I gotta say it feels right. It also helps push the idea shared by many in the docker community that each container should run a single application. Using pods makes that advice easy to follow since you have the pod construct to ensure the collection of related containers are managed as a single unit.

Contributor

@ibuildthecloud ibuildthecloud commented Nov 1, 2014

If a pod is just containers that share a netns, volumes, and cgroups then why don't we just add custom cgroup support (#8551). If you had the ability to say --cgroup=containerid then a pod can be done by just launching a "pod container" like

docker run --name mypod some-blank-image

Then

docker run --volumes-from mypod --cgroup=mypod --netns=mypod app1image
docker run --volumes-from mypod --cgroup=mypod --netns=mypod app2image

Because of the --netns=containerid, that automatically implies co-location from a scheduling perspective.

Author

@brendandburns brendandburns commented Nov 1, 2014

Placing everything into the same cgroup would require that all of the containers in that cgroup share the same resource and memory constraints (and possibly fight with each other).

Imagine that I have two containers:

  • A web serving container which is user facing and thus needs lots of CPU and high QoS.
  • A background sync. container which loads data; it needs little CPU and only limited memory.

I don't want a memory leak/bug in my background sync. container to steal memory or CPU from my user facing web container (and even worse possibly cause it to run out of memory and crash). I want to have two different sets of tight resource constraints for the two different containers, and I can't achieve that if they share a cgroup. Support for sub-groups within the original pod cgroup would mitigate this somewhat, but at that point, you basically have a Pod anyway.

There is an additional problem with the "place it in a pod container" approach: because the API calls to create the containers in the pod are not atomic, you can get into situations where the scheduler cannot complete the scheduling. You have a sequence of API calls to create the N containers that make up the pod, and it is no longer an atomic operation that either succeeds or fails. That makes the logic much more complicated, since a failure at the end would force you to roll back a bunch of other operations. We already have to do this in Kubernetes in order to achieve the shared network namespace, and it's somewhat painful.
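The rollback burden described here looks roughly like the following Python sketch. The `create` and `remove` callables are hypothetical stand-ins for per-container API calls; nothing here is taken from the kubelet implementation.

```python
from typing import Callable, List


def create_pod_atomically(
    names: List[str],
    create: Callable[[str], str],
    remove: Callable[[str], None],
) -> List[str]:
    """Create every container or none: on any failure, undo the ones
    already created (newest first) and re-raise the error."""
    created: List[str] = []
    try:
        for name in names:
            created.append(create(name))
    except Exception:
        for container_id in reversed(created):
            remove(container_id)
        raise
    return created
```

With a first-class pod object, this bookkeeping would live behind a single API call instead of being reimplemented by every client that wants pod-like behaviour.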

Contributor

@ibuildthecloud ibuildthecloud commented Nov 2, 2014

I don't think pods are really an obvious concept to users. Most people I talk to about Kubernetes do not get pods at all. So I don't disagree with what you want to accomplish, I just think it needs to be presented in a different way.

First, @brendandburns, your issues with regard to scheduling can be addressed by just separating create and start. The Docker CLI didn't have it before, but the API was always there. docker create shouldn't schedule a container to a host.

It sounds like all that is needed is for the --cgroup option I proposed to allow creating a sub-cgroup of the other container, so --cgroup=child-of:containerid. Now this raises a lot of other problems because I don't think systemd has such a concept.

Contributor

@crosbymichael crosbymichael commented Nov 3, 2014

Are there some public use cases or files from the community of how pods are currently being used? Are people actually using them as intended or are you seeing many people running a pod with only one container?

Contributor

@thockin thockin commented Nov 3, 2014

We've seen both "correct" uses and "less-correct" uses. I put those in
quotes because, frankly, who are we to tell people how to use a particular
tool? That said, we based the pod design on significant experience running
jobs both without it and with it. I don't have stats at hand, but it was a
non-trivial fraction of jobs internally that use more than one container in
a pod. The number quickly tails off after 3 containers in a pod.

Collaborator

@tiborvass tiborvass commented Nov 4, 2014

Are pods just a group of containers (as described in the proposal), or are they a group of (pods OR containers)? In other words, is there such a thing as pods of pods of containers, and if so, what are the typical use cases for those?

Collaborator

@cpuguy83 cpuguy83 commented Nov 4, 2014

@thockin From what I understood, the way pods are implemented (sharing a network namespace) was because there was no better way to handle real linking of containers (due to shortcomings of links, esp. pre /etc/hosts population). Is that not the case?

I too would like to understand pods vs groups.
I feel like shared net namespaces are only something you'd want in certain scenarios (e.g. a simple LAMP stack where nothing is similar), but if links were better (and they are pretty good in 1.3) they would fit both the simple and the more complex use cases.
That's my view, and I'm sure I'm missing something about what pods are doing.


@timothysc timothysc commented Nov 4, 2014

+1 to take the model into docker proper.

Contributor

@thockin thockin commented Nov 4, 2014

Pods are groups of containers. As much as a recursive definition sounded
fun, I could not come up with a single use case where that was the only
solution or even the best solution. KISS and YAGNI

Contributor

@jbeda jbeda commented Nov 4, 2014

@tiborvass As implemented in Kubernetes (and internally at Google) there is no sub-nesting of pods. The main reason for this is that it complicates the networking even further.

@cpuguy83 I think that the idea of grouping for applications is being conflated with grouping for resource allocation and placement. In my mind these are two separate types of objects. Based on the experiences at Google, you end up with objects at the following layers -- perhaps overly simplified but useful for this discussion:

  • Container -- (Google internal: Task) this is the smallest unit that can be monitored, restarted, upgraded, named, etc.
  • Pod -- (Google internal: Alloc) -- A resource scheduling primitive. This can also be thought of as a "resource reservation".
  • Replicated Pod Set -- (Google internal: Job) -- this is a set of Pods that are horizontally replicated. First generation systems at Google had this be a pretty structured concept (numbered array of replicas). With Kubernetes and Omega it is a more loosely coupled concept consisting of a label selector/query and a replication controller. In most application architectures this is what you'd consider a tier or perhaps a simple micro-service.
  • Application Config -- This is the sum total configuration of all resources (including things not listed here) that makes up an application.
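To illustrate the "label selector plus replication controller" idea from the Replicated Pod Set layer, here is a minimal Python sketch. It assumes a plain equality-based selector and represents each pod only by its label dictionary; the function names are hypothetical and are not part of the Kubernetes API.

```python
from typing import Dict, List


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """A pod matches when it carries every key/value pair in the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


def pods_to_create(
    selector: Dict[str, str], desired: int, pods: List[Dict[str, str]]
) -> int:
    """How many pods a controller would need to add (negative: remove)
    to reach the desired replica count for this selector."""
    running = [labels for labels in pods if matches(selector, labels)]
    return desired - len(running)


pods = [
    {"tier": "web", "env": "prod"},
    {"tier": "web", "env": "prod"},
    {"tier": "db", "env": "prod"},
]
print(pods_to_create({"tier": "web"}, desired=3, pods=pods))  # 1
```

The set is defined by the query rather than by a numbered array of replicas, which is what makes it the more loosely coupled of the two designs mentioned.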

Something like fig has traditionally played at the "Application Config" level. In my mind, systems like Panamax also fit in at that level.

As to why the containers in a pod share a network namespace -- we discussed this quite a bit before settling on this model for Kubernetes and considered the following options:

  1. Each container in the pod gets its own IP/netns that is taken from the shared bridge for the node.
  2. The pod gets its own IP and bridge. Each container gets an IP that is private to that Pod and is NATed out through the pod IP. In this way there is an IP space within the pod that gets NATed out of the Pod.
  3. The pod gets its own IP and all of the containers in the pod share it.

We went with option 3 because the primary use case for Pods is for relatively tightly coupled containers. If the containers are different tiers in an application (for example, database vs. web) then they really should be different pods communicating through links/naming/service discovery. The common case for Pods is when you have a set of containers that were built to work together.
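One practical consequence of option 3 is that containers sharing an IP also share a single port space, so two members of the same pod cannot listen on the same port. A small Python sketch of that check follows; the function name and the container names are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List


def port_conflicts(listeners: Dict[str, List[int]]) -> Dict[int, List[str]]:
    """Map each port claimed by more than one container in the pod to the
    names of the containers that claim it."""
    claimed: Dict[int, List[str]] = defaultdict(list)
    for container, ports in listeners.items():
        for port in ports:
            claimed[port].append(container)
    return {port: names for port, names in claimed.items() if len(names) > 1}


print(port_conflicts({"web": [80, 8080], "sync": [9000], "metrics": [8080]}))
# {8080: ['web', 'metrics']}
```

For containers that were built to work together this is rarely a burden, which fits the stated use case of tightly coupled helpers rather than independent application tiers.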

If you'd be tempted to run something like supervisord inside a container you could instead explode it out to a set of different containers that can be visible from the container management systems. They'd then be individually upgradable, monitorable and resource isolated. You could then have them run at different QoS so that the less critical services could be preempted by more critical services.

Contributor

@thockin thockin commented Nov 4, 2014

@cpuguy83 Shared netns was not just about links, though it did start there. We wanted to make a simple primitive that is purpose-built for the tight relationship that containers-in-a-pod have. Links are fancy and abstract and slow. We did not want or need that. What we wanted was a high-performance, low-overhead way for things to communicate. Loopback interfaces fit the bill.

As soon as other namespaces can be shared (e.g. IPC) that is an obvious extension of pods.

Pods are also at least conceptually about resource sharing. We can't yet put multiple containers under a shared ancestor cgroup, but you can rest assured it will matter before too long.

Regarding LAMP: don't think of a pod as LAMP, think of one pod as A, one pod as M, and one pod as P, all running on L. Any of those might have need for helper apps (think symbiosis) or might not. It's the symbiotic helper apps for which pods are intended. Not for apache - mysql comms.

Collaborator

@tiborvass tiborvass commented Nov 4, 2014

@thockin @jbeda Thanks for clarifying this, I think it helps a lot more to understand the concept. IMHO, it would be useful to start explaining in the proposal that Pods are resource-oriented units, and not for Apache - mysql communication.

I would like to understand what is overlapping with what in different proposals and what are the advantages/downsides. For example, could we think of this proposal as being an important ingredient for separation of concerns? As in, resource management on the one-host level, should not be done by something like #8859 (equivalent of Replication Pod Set?) but by this notion of pods. However, #8859 could be the horizontal scaling ingredient, and #8637 would be one level up, at the application level ?

Maybe this should be discussed on IRC; I just find it hard to analyze one proposal without seeing the bigger picture.

Contributor

@jbeda jbeda commented Nov 4, 2014

Right now the Docker Clustering (#8859) proposal is missing both a preferred method of replication and any higher level organizational primitives/tools (labels). Both of these are going to be necessary.

#8637 is really about object namespacing, I think. It is being conflated with application config. (Namespace here is at the cluster/API level, not at the Linux kernel level -- not enough words for the concepts we have). See https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/namespaces.md for how we are approaching that in Kubernetes.


@bgrant0607 bgrant0607 commented Nov 4, 2014

@thockin Virtually all production services at Google use our internal legacy equivalent of pods. The most common multi-container use cases include pushing log data (not unlike fluentd or logstash) and content/config management / data loading.

However, pods are useful even without multiple containers. Why? Data and resource durability. Pods make it possible to keep data resident (e.g., in volumes) and to hold onto resources across container restarts and even across full container replacements, such as for image updates.

@ibuildthecloud Yes, features like shared namespaces and shared cgroups would make it easier for us to implement pods outside Docker, but would make our wrapper even fatter.

Contributor

@philips philips commented Nov 5, 2014

@ibuildthecloud I don't understand your concern about subgroups + systemd. systemd has the concept of a slice hierarchy: http://www.freedesktop.org/software/systemd/man/systemd.slice.html

Overall, +1 on this proposal. It gives users an abstraction to avoid building complex multi-component containers.

Contributor

@ibuildthecloud ibuildthecloud commented Nov 6, 2014

@philips I was pointing out that if we were to do what I proposed, --cgroup=child-of:containerid, that that wouldn't work with systemd. systemd will create a scope for container X, and then if you were to say --cgroup=child-of:X that would be a scope that is a child of a scope.

I'd like to reiterate that I don't think pods are a straightforward concept to users. If you want this feature to be in Docker I think you're going to have to find a different approach or way of explaining it to users.

It seems to me that the root thing that Google wants to address with pods is really about being able to effectively manage resources between containers. The shared volumes and netns are already supported. Defining pods as shared volumes + netns + resource containers doesn't make sense to me. It's a very specific configuration.

I think you would have more success with getting a proposal through if you introduced a concept that was purely oriented towards managing resources between containers. For example libvirt has resource partitions that I think addresses what you would need. So if Docker had resource partitions natively, then a Kubernetes pod == shared volumes + netns + resource partition.

Contributor

@dqminh dqminh commented Nov 6, 2014

+1 for this proposal overall

@ibuildthecloud iirc, sharing resources is just one aspect of this, there's also a need to atomically schedule a set of containers that make up a pod. The system has to make sure that either all containers that make up a pod can be scheduled on the host or none at all, which is currently hard to do with container as the only primitive.

Contributor

@aanand aanand commented Nov 6, 2014

I agree that it’s worth talking about how this will fit with #8637. Both serve use cases which I believe are largely orthogonal (indeed, I suspect they appeal to largely disjunct sets of users), so my hope is that they can coexist, as long as we design both of them right.

The purposes of pods have been amply outlined. (I don’t have strong feelings about their usefulness, because I haven’t encountered the problems they solve.)

I see the primary purposes of groups being:

  • Scope container names so that multiple people can work on multiple apps without treading on one another’s toes
  • Provide some of the functionality server-side which tools like Fig currently do client-side to improve performance (e.g. Fig’s filtering of the global container list is unusably slow on hosts with thousands of running containers)
  • Provide a way to create/update multiple, associated containers in a single call - in a clustered future, this enables the cluster to make scheduling decisions about where to place those containers based on the fact that they’re a logical group. (A bit like how pods are expressly single-host, but far more generic and up to the particular cluster implementation).

Unlike pods, groups are intended to:

  • exist across an arbitrary number of cluster nodes
  • contain an application’s entire stack (or any arbitrary slice of it)

This makes them - as this proposal has pointed out - much less about resource sharing and much more about logical grouping.

For this reason, I hope they can coexist just fine - intuitively, it seems as though a group should be able to contain containers, pods or a mixture of both.

The major potential pain points I see are:

  • Naming. My current thinking is that containers within a group have names prefixed with <groupname>/. Group names thus live in the same namespace as container names. How will pods be named? How will containers within pods be named?
  • Educating users. As has already been raised, pods are weird to someone who doesn’t understand their use cases. Explaining the difference between groups and pods, and when to use one or the other, is going to be tough.

@kelseyhightower kelseyhightower commented Apr 15, 2015

@crosbymichael, @vieux @timothysc Are there plans to support a pod-like concept in Swarm? If so, would it make sense to push ahead on this proposal?


@timothysc timothysc commented Apr 15, 2015

@kelseyhightower I would hope so, but alas I believe this issue has stalled.

Collaborator

@cpuguy83 cpuguy83 commented Apr 15, 2015

@kelseyhightower I think we need to make a decision on what level of abstraction should go into docker core for this.
Pods are possible with Docker today, but only when using Docker in a particular way, i.e. docker run --net container:<id> --ipc container:<id> --cgroup-parent <cgroup path>.
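As a rough sketch of what that "particular way" involves, the Python helpers below build the `docker run` argument lists for a placeholder container and for each member that joins it. The flags are the ones named above plus `--name` and `--volumes-from` from earlier in the thread; the helper names, the placeholder-container approach, and all image and cgroup names are assumptions for illustration. Nothing is executed.

```python
from typing import List


def infra_command(pod: str, image: str, cgroup_parent: str) -> List[str]:
    """Placeholder container whose namespaces the pod members will join."""
    return [
        "docker", "run", "-d",
        "--name", pod,
        "--cgroup-parent", cgroup_parent,
        image,
    ]


def member_command(pod: str, name: str, image: str, cgroup_parent: str) -> List[str]:
    """A pod member: joins the placeholder's network and IPC namespaces,
    reuses its volumes, and sits under the same parent cgroup."""
    return [
        "docker", "run", "-d",
        "--name", f"{pod}-{name}",
        "--net", f"container:{pod}",
        "--ipc", f"container:{pod}",
        "--volumes-from", pod,
        "--cgroup-parent", cgroup_parent,
        image,
    ]


commands = [
    infra_command("mypod", "example/blank", "/pods/mypod"),
    member_command("mypod", "app1", "example/app1", "/pods/mypod"),
    member_command("mypod", "app2", "example/app2", "/pods/mypod"),
]
for argv in commands:
    print(" ".join(argv))
```

Every client has to get this sequence, its ordering, and its cleanup right on its own, which is the gap a pod-level API or CLI command would close.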

Are pods an api only concept? Does it make sense to create a pod from the CLI (maybe docker run --parent <container id> ?).

Probably the best way to move forward with this is to introduce a docs-only proposal that details exactly what this API looks like, how users might use it, etc. Then it can either be approved or have improvements suggested.
I think with this issue the way it is, someone can only say "👍 pods are cool".

Collaborator

@cpuguy83 cpuguy83 commented Apr 15, 2015

Also with a docs-only proposal, if the writer is not willing/doesn't have the time to implement, it becomes a LOT easier for someone else to come along and build it.

Contributor

@thockin thockin commented Apr 15, 2015

The problem is that writing such a doc is a non-trivial time commitment
with no clear indicator that it is something Docker actually wants. Nobody
wants to force themselves into a situation where they are not welcome.

I think with this issue the way it is someone can only say "👍 pods are
cool"

If some of the movers and shakers said "we get it and we think it is a good
idea, please start to work out the details" it would be in a very different
place from where it is now.

Here's how I see the rationale for it. Docker is a piece of a larger
system for work management, whether that system is Swarm, Kubernetes,
Mesos, Lattice, ECS, or any of the dozen others that have emerged this
year. Users of these systems sometimes need to be able to schedule
containers that are co-located and linked in some way(s) (be that --link,
--net=container, --ipc=container, --volumes-from, etc). Being able to
write a declarative definition of what you want to run (viz. Docker
Compose) is powerful and simple.

For lack of this atom in docker, people either make monolith containers
running multiple apps or else build further rectification systems on top of
Docker's atoms. Having Docker implement pods (under whatever name)
directly means that all of these orchestration systems get thinner and
highlight Docker's capabilities rather than burying docker in wrappers and
layers. It means that users can do, by hand, what the orchestration
systems do.

I really see it as a matter of convergent evolution - multiple systems have
(or will) evolve this same basic primitive. It's better for Docker to
offer it directly. I'd much rather be talking about how Docker pods work
than how we manipulate Docker into producing a pod construct.

I'd be happy to help work through how the CLI and API might look if only I
knew I wasn't wasting my time.

Author

@brendandburns brendandburns commented Apr 15, 2015

@cpuguy83 as the original author of this issue, I can say @thockin is precisely correct. The reason that this exists as an issue, rather than a design doc or working implementation, is that the implementation is significant and non-trivial, and I didn't/don't want to force things if they aren't wanted, or presume any particular design if the developers of Docker wanted to go in a different direction.

If the decision makers at Docker believe (as I do) that this is a good modification of the Docker API, and it actually has a realistic chance of becoming a part of the Docker API, then I am more than willing to work out a more detailed design document of what it could look like and hand it over to others for implementation, as well as working on the implementation myself.

Perhaps @shykes @crosbymichael et al. can confirm that people are interested in seeing Pods as part of the Docker node API?

Thanks (and thanks to everyone for their continued interest in moving this proposal forward)
Brendan

Contributor

@resouer resouer commented Apr 19, 2015

+1 for this, as I have been forced to use Docker like a VM with mmonit inside for such a long time. It's really disgusting, as I have to write 'monit start xxx' instead of the real start command in the CMD part.

The 'real' apps in the real world are always a bunch of processes. 12-factor apps? Don't forget it's an ad from a traditional PaaS provider ...


@dcieslak19973 dcieslak19973 commented Apr 19, 2015

I'm relatively new to Docker and Swarm, but it seems to me that Swarm should rely more on Discovery Services for linking services together. Perhaps I don't fully understand the boundary between Docker and Kubernetes, though.

Also, I'm curious how the concept as described here so far would work with things like ZooKeeper, Mongo and other "cluster-able" services that rely on small config changes (e.g. BrokerID), but are otherwise configured identically.

Suppose I want to run Kafka and HBase in a cluster, both of which rely on Zookeeper. Which cluster pod would specify Zookeeper?

@ibuildthecloud
Contributor

@ibuildthecloud ibuildthecloud commented Apr 19, 2015

To me, the best approach would be to proceed to try to accomplish Pods under the banner of "nested containers." I can put together a design for nested containers as the comment has already been made that they are not clearly defined.

Nested containers will actually solve some more advanced use cases I have.

@thaJeztah
Member

@thaJeztah thaJeztah commented Apr 19, 2015

@ibuildthecloud do you think this should be looked at in a wider perspective? (I saw your comment in #11535 (comment)); i.e. discussing what the use-cases are, how (and if) they're technically possible and a global design? Looking at this (and other) discussions I think there are some interested parties.

(just speaking on a personal note here)

@resouer
Contributor

@resouer resouer commented Apr 20, 2015

@dcieslak19973 Actually, all the "cluster-able" apps will be handled the same way you handle them with Docker today; nothing changes.

I think POD is not concerned with what is already distributed; it is concerned with what is not distributed/isolated enough.

So you can deploy ZooKeeper in one POD and let HBase components in other PODs consume it.

---------I'm a Split Line-------------------
It suddenly occurs to me that HBase is a good example to show how POD works. I had a one-Master, one-Slave HBase (used by OpenTSDB) setup in a production environment; here is my HBase Slave node, for example:

HBase Slave:
vcap@ubuntu12-171:~$ monit summary
The Monit daemon 5.2.4 uptime: 48d 18h 49m 

Process 'cloud_agent'               running # part of my monitoring system
Process 'hadoop_datanode'           running
Process 'hbase_regionserver'        running
System 'system_ubuntu12-171'        running

So, if I use one Docker container to replace this VM, I have to run multiple processes inside this HBase Slave container, manage them with monit, and use monit start all as CMD. It really sucks.

But if I have POD, I can use three Docker containers instead; they share the same volume, network, etc. but are still isolated.

HBase Slave POD:

Container 'cloud_agent'               running
Container 'hadoop_datanode'           running
Container 'hbase_regionserver'        running

Network: 10.0.0.4
...

My cloud_agent's memory leak will never kill the whole node, and I can also update the 3 containers one by one.
What's more, my HBase Master node can still treat this POD as a single "node" and consume it as it did before (all three containers run on 10.0.0.4).

And yes, POD works like nested containers to some extent.
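
For comparison, something close to this layout can be approximated today with existing `docker run` flags: `--network container:<name>` joins another container's network namespace, and `--volumes-from` reuses its volumes. Below is a minimal Python sketch that only builds the command lines. The helper name `pod_like_run_commands`, the `/data` mount point, and the `example/...` image names are hypothetical. It does not give the containers a shared lifecycle, which is the part a real POD would add.

```python
import shlex


def pod_like_run_commands(pod_name, images, shared_volume="/data"):
    """Build `docker run` argument lists that approximate a pod.

    `images` maps a container name to an image name. The first entry owns
    the network namespace and an anonymous volume at `shared_volume`
    (a hypothetical path); every later container joins both.
    """
    commands = []
    owner = None
    for name, image in images.items():
        container = f"{pod_name}-{name}"
        argv = ["docker", "run", "-d", "--name", container]
        if owner is None:
            # The first container creates the volume the others will reuse.
            owner = container
            argv += ["--volume", shared_volume]
        else:
            # Later containers share the owner's network stack and volumes.
            argv += ["--network", f"container:{owner}", "--volumes-from", owner]
        argv.append(image)
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # Image names are placeholders, not real images.
    pod = {
        "hadoop_datanode": "example/hadoop-datanode",
        "hbase_regionserver": "example/hbase-regionserver",
        "cloud_agent": "example/cloud-agent",
    }
    for argv in pod_like_run_commands("hbase-slave", pod):
        print(shlex.join(argv))
```

With this arrangement all three containers would answer on the same IP address, but nothing restarts or removes them together, and nothing schedules them as one unit.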

@luebken

@luebken luebken commented Aug 17, 2015

I’m investigating patterns around building applications with containers. And from my point of view it seems that a concept like pods is getting more traction.

This discussion paused in April with the question of whether Docker's decision makers support putting more effort into this proposal. That support was requested before anyone puts more effort (e.g. docs, implementation) into it.

Did I miss something? What can we do to move this proposal forward?

@brendandburns
Author

@brendandburns brendandburns commented Sep 2, 2015

@luebken thanks for your continued interest.

We continue to be interested in trying to move this forward, either in the context of Docker or the context of the OCI; however, we have never had any feedback that there was buy-in from the Docker maintainers.

The issues raised by an end user in docker/compose#1899 clearly show the need for Pods as an API primitive.

@shykes @crosbymichael any further thoughts?

Thanks!
--brendan

@luebken

@luebken luebken commented Sep 2, 2015

Let me add that we see quite some interest in this topic among advanced users. After building a first version of their applications, a typical first refactoring we see is “extracting platform concerns into a sidecar”. The last example was sensu for monitoring and fluentd for logging.

Although this can be achieved by opening up the individual namespaces, we feel that the notion of a pod (or nested containers or container groups) is a simple-to-understand concept. This simplicity is crucial to the adoption of these concepts and it will improve the overall architecture of many applications.

@wyaeld

@wyaeld wyaeld commented Sep 2, 2015

A year on, I suspect users are better off just including Kubernetes in their solution if they want this. It's solid, and avoids enlarging the Docker footprint, which it seems they don't want to do.


@discordianfish
Contributor

@discordianfish discordianfish commented Feb 25, 2016

@shykes Any news on this? I really wish we could avoid further fragmentation by providing this low-level concept across all deployment mechanisms.

@jatins

@jatins jatins commented Jul 15, 2016

A unit like a pod does seem to be something that people think in terms of. Kubernetes has it. Even AWS ECS has the concept of tasks. From the ECS getting started guide:

Task definition: A description of an application that contains one or more container definitions.

If docker-compose also had the concept of a group of containers, I think it'll really unify the way we develop and the way we deploy.

cc: @shykes

@schmunk42
Contributor

@schmunk42 schmunk42 commented Jul 15, 2016

We're using separate files for that at the moment, but couldn't we do that in one file (for docker-compose)? Just thinking out loud here...

version: '3'
name: 'default'
depends_on:
  - storage
services:
  web:
    image: nginx
    depends_on:
      - app
  app:
    image: some-app
    depends_on:
      - storage.db
networks:
  external_network: storage
---
version: '3'
name: 'storage'
services:
  db:
    image: mysql

@jatins

@jatins jatins commented Jul 15, 2016

@schmunk42 If all the compose files are in the same folder, then it only creates one network. So, won't that be the same as putting all the services in one usual docker-compose.yml file?

@thockin
Contributor

@thockin thockin commented Jul 15, 2016

A pod is not just a way to describe them together, but a way to ensure that THROUGH THE ENTIRE SYSTEM they stay together and share fate and lifecycle and resources.

At this point, I don't think Docker should implement pods. We have most of the primitives we need to implement pods without all the complexity of a probably different interpretation. This idea has to be implemented from top to bottom or it's not worth doing.
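
To make "share fate" concrete, here is a small sketch of one strict reading of it: if any member of a pod is not running, the remaining members get stopped, so a pod is never left partially running. The `State` enum and the `containers_to_stop` helper are hypothetical names for illustration, and the rule itself is an assumption about the semantics, not a description of how Docker or Kubernetes behaves.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    EXITED = "exited"


def containers_to_stop(states):
    """Return the names of containers to stop so the pod shares fate.

    `states` maps a container name to its State. If every container is
    running there is nothing to do. Otherwise every container that is
    still running is returned (sorted by name), so the pod ends up fully
    stopped rather than partially running.
    """
    if all(state is State.RUNNING for state in states.values()):
        return []
    return sorted(name for name, state in states.items() if state is State.RUNNING)


if __name__ == "__main__":
    pod = {
        "web": State.RUNNING,
        "sidecar": State.EXITED,
        "logger": State.RUNNING,
    }
    print(containers_to_stop(pod))
```

A description format such as a compose file can list these containers together, but a rule like this has to be enforced by whatever runs and schedules them.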

@schmunk42
Contributor

@schmunk42 schmunk42 commented Jul 18, 2016

If all the compose files are in the same folder, then it only creates one network. So, won't that be the same as putting all the services in one usual docker-compose.yml file?

That's configurable, I think. I have --- in my example which marks a new document in YAML, it should act like two separate files, but as said, it's just a rough idea.

The main reason for splitting up the stacks in our case is that we usually don't want or need to redeploy databases, caches, etc. Restarting an application with a new image is usually no big deal, but killing and removing your database container might take a while and prolongs downtime.

@CrazyPandar

@CrazyPandar CrazyPandar commented Oct 15, 2017

+1 for this feature. We have some services that need to be bound one-to-one in pairs: service a binds to localhost, and service b connects to a via localhost. If a container of service a exists, there must be a b connected to it.

@resouer
Contributor

@resouer resouer commented Oct 16, 2017

@CrazyPandar With CRI present, this requirement is not solid any more.

@rycus86

@rycus86 rycus86 commented May 14, 2018

I know I'm late to the party, but maybe have a look if this is something that could help people out here in lieu of native pod support: https://github.com/rycus86/podlike

Also an intro: https://blog.viktoradam.net/2018/05/14/podlike/

@Vanuan

@Vanuan Vanuan commented Jul 3, 2018

I think the interesting part here is not having another abstraction layer somewhere between services, tasks and containers. What makes pods interesting is the use cases they are needed for: sidecar, ambassador, adapter.

These patterns are based on being able to share resources (shared loopback network interface, shared filesystem) while keeping things in separate images. Putting container co-location and synchronized replication aside, this can be implemented using some kind of magical network driver called pod-like:

services:
  myservice1-backend:
    image: myservice1
    config: ... # listens to 127.0.0.1:81,
                # talks to other services via 127.0.0.1:80/myserviceN
    networks:
      - myservice1-pod
  myservice1:
    image: envoy
    config: ... # listens to 127.0.0.1:80, myservice1:80
                # proxies myservice1:80 -> 127.0.0.1:81
                # proxies 127.0.0.1:80/myservice2 -> myservice2:80
    networks:
      - myservice1-pod
      - myapp-net

  myservice2-backend:
    image: myservice2
    config: ... # listens to 127.0.0.1:81,
                # talks to other services via 127.0.0.1:80/myserviceN
    networks:
      - myservice2-pod
  myservice2:
    image: envoy
    config: ... # listens to 127.0.0.1:80, myservice2:80
                # proxies myservice2:80 -> 127.0.0.1:81
                # proxies 127.0.0.1:80/myservice1 -> myservice1:80
    networks:
      - myservice2-pod
      - myapp-net


networks:
  myservice1-pod:
    driver: pod-like
  myservice2-pod:
    driver: pod-like
  myapp-net:
    driver: overlay

So what use cases does it enable, apart from a stronger guarantee that services don't talk directly to each other (which can as well be enforced using code review)? Is this pod feature really worth it just for the convenience of using 127.0.0.1 instead of myservice1-backend, myservice2-backend? You still need to type that 127.0.0.1 somewhere, right? And it doesn't look like you're getting much re-usability with this pattern. Yes, you probably don't have to use template substitution for service hostnames in your configs, but that's it?

What would really be cool is a use case like "hey, I want all my services that speak HTTP to use this envoy thing, so that I could control traffic using this istio pilot thing". But pods don't enable that. What things are they useful for?

@Vanuan

@Vanuan Vanuan commented Jul 6, 2018

One issue that loopback fixes is that developers don't need to fix their application for a situation where the IP address suddenly changes. Instead of resolving the domain name each time, devs rely on the fact that it rarely changes, so they just never bother resolving it again before trying to reconnect. For example, this happened with elasticsearch: elastic/elasticsearch#16412 (comment)
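
The alternative to relying on loopback is for the client to look the name up again on every reconnect. Here is a minimal sketch of that, assuming plain TCP. The function `connect_with_fresh_lookup` and its injectable `resolve` and `connect` parameters are hypothetical; `socket.getaddrinfo` and `socket.create_connection` are standard-library calls.

```python
import socket
import time


def connect_with_fresh_lookup(host, port, attempts=3, delay=1.0,
                              resolve=socket.getaddrinfo,
                              connect=socket.create_connection):
    """Connect to host:port, resolving the name again before every attempt.

    A cached address is never reused, so if the service moves to a new IP
    between attempts, the next attempt picks up the new address.
    """
    last_error = None
    for _ in range(attempts):
        try:
            # Fresh lookup on each attempt instead of reusing an old result.
            infos = resolve(host, port, type=socket.SOCK_STREAM)
            sockaddr = infos[0][4]
            return connect(sockaddr[:2])
        except OSError as exc:
            last_error = exc
            if delay:
                time.sleep(delay)
    raise ConnectionError(
        f"could not connect to {host}:{port} after {attempts} attempts"
    ) from last_error
```

The `resolve` and `connect` parameters exist only so the retry logic can be exercised without a network; in normal use the defaults are enough.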
