Proposal: Make Pods (collections of containers) a first order container object. #8781
Comments
Awesome write-up. I think this idea would be great for the ecosystem overall. Less wrapping and hackery in kubernetes while bringing this simple but important idea to non-kubernetes users. I think there are a number of steps toward the ultimate goal that could be delivered independently.
3) Introduce more pod-level concepts one by one: volumes, other namespaces, cgroups, GIDs and disk accounting, etc. The main point being we don't have to jump straight to the final product :) Also, as to "It does not make sense to have a partially running pod", we probably want to call out the run-to-completion aspect, and that restart policy applies per-pod rather than per-container (or else re-debate that).
I think that there is one big motivation that you forgot, @brendandburns. Specifically, the pod is an atomic scheduling unit. As we move to a world where we want to use every last resource (CPU, RAM) on a host, we need to worry about how the system acts as we near 100% utilization. Simple constraint-based co-scheduling creates situations where the intent of the user is to land N containers on a single host. A naive way to do this would be to have a constraint-based co-scheduling algorithm. You could have an 'anchor' container that you schedule first, and then you specify that all of the 'secondary' containers schedule to the same host as the 'anchor'. This is how systems like fleet work. The problem is that you may schedule the 'anchor' on a host that doesn't have enough room for the 'secondary' containers. To make things even more complicated, there may be cases where there is no 'primary' (all containers are peers that come and go) and where all of the containers in question aren't known a priori. By having users specify workloads in terms of pods, we have a resource scheduling specification that sits above the individual containers. This separates the idea of what resources are needed on a machine from what code (in the form of a set of containers) will be running on that machine.
Thanks!
@aanand this may be interesting with your current proposal.
+1 for pods in docker/Fig/docker cluster. We've found the pod concept to be a very helpful way to organize sidekicks, and it greatly simplifies working with groups of containers. Absent pods, the tendency is to overload the container with services that should really be split out. @brendandburns's examples are spot on. While you can try to have sidekicks follow the main container around without a pod concept, our experience with Fleet is that this is painful. I suspect the Docker Cluster proposal could avoid a lot of extra bloat by adding pods to Docker proper.
I've had experience dealing with pods during my work with k8s and I gotta say it feels right. It also helps push the idea, shared by many in the Docker community, that each container should run a single application. Using pods makes that advice easy to follow, since you have the pod construct to ensure the collection of related containers is managed as a single unit.
If a pod is just containers that share a netns, volumes, and cgroups, then why don't we just add custom cgroup support (#8551)? If you had the ability to say --cgroup=containerid, then a pod could be done by just launching a "pod container" like […]
Then […]
Because of the --netns=containerid, that automatically implies co-location from a scheduling perspective.
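The commands referenced above were lost in the copy of this thread. As a rough sketch of the same idea using flags that actually shipped in Docker (--net=container:&lt;id&gt; and --volumes-from) rather than the proposed --cgroup/--netns flags, a "pod container" approximation might look like this (all container names and images are illustrative):

```bash
# Hypothetical sketch, not the commands from the original comment.
# 1. Start an idle "pod container" that owns the network namespace and a volume.
docker run -d --name mypod -v /shared-data busybox sleep 1000000

# 2. Join the real containers to it; they see each other on 127.0.0.1
#    and share the /shared-data volume.
docker run -d --name web  --net=container:mypod --volumes-from=mypod my-web-image
docker run -d --name sync --net=container:mypod --volumes-from=mypod my-sync-image
```

What this approximation cannot express is a shared resource parent or atomic scheduling, which is exactly the gap the rest of the thread discusses.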
Placing everything into the same cgroup would require that all of the containers in that cgroup share the same resource and memory constraints (and possibly fight with each other). Imagine that I have two containers:
I don't want a memory leak/bug in my background-sync container to steal memory or CPU from my user-facing web container (and, even worse, possibly cause it to run out of memory and crash). I want to have two different sets of tight resource constraints for the two different containers, and I can't achieve that if they share a cgroup. Support for sub-groups within the original pod cgroup would mitigate this somewhat, but at that point you basically have a Pod anyway. There is an additional problem with the "place it in a pod container" approach: because the API calls to create the collection of containers are not atomic, you can get into situations where the scheduler cannot complete the scheduling. You have a sequence of API calls to create the N containers that make up the pod, and it is no longer an atomic operation that either succeeds or fails; this makes the logic much more complicated, since a failure at the end would force you to roll back a bunch of other operations. We already have to do this in Kubernetes in order to achieve the shared network namespace, and it's somewhat painful.
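To make the resource-isolation point concrete, here is a minimal sketch with today's docker run flags (image names and limits are made up): each container needs its own tight limits, which one flat shared cgroup for the whole group could not express.

```bash
# Illustrative only: separate limits per container.
docker run -d --name web  -m 512m --cpu-shares=1024 my-web-image
docker run -d --name sync -m 128m --cpu-shares=256  my-sync-image
# What a pod adds on top is a parent allocation that both of these limits
# are subdivisions of, so the pair can be scheduled and accounted as one unit.
```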
I don't think pods are really an obvious concept to users. Most people I talk to about Kubernetes do not get pods at all. So I don't disagree with what you want to accomplish, I just think it needs to be presented in a different way. First, @brendandburns, your issues with regard to scheduling can be addressed by just separating create and start. The Docker CLI didn't have it before, but the API was always there. It sounds like all that is needed is that in the […]
Are there some public use cases or files from the community of how pods are currently being used? Are people actually using them as intended, or are you seeing many people running a pod with only one container?
We've seen both "correct" uses and "less-correct" uses. I put those in […]
Are pods just a group of containers (as described in the proposal), or are they a group of (pods OR containers)? In other words, is there such a thing as pods of pods of containers, and if so, what are the typical use cases for those?
@thockin From what I understood, the way pods are implemented (sharing a network namespace) was because there was no better way to handle real linking of containers (due to shortcomings of links, esp. pre /etc/hosts population). Is that not the case? I too would like to understand pods vs. groups.
+1 to take the model into docker proper.
Pods are groups of containers. As much as a recursive definition sounded […]
@tiborvass As implemented in Kubernetes (and internally at Google) there is no sub-nesting of pods. The main reason for this is that it complicates the networking even further. @cpuguy83 I think that the idea of grouping for applications is being conflated with grouping for resource allocation and placement. In my mind these are two separate types of objects. Based on the experiences at Google, you end up with objects at the following layers -- perhaps overly simplified but useful for this discussion:
Something like fig has traditionally played at the "Application Config" level. In my mind, systems like Panamax also fit in at that level. As to why the containers in a pod share a network namespace -- we discussed this quite a bit before settling on this model for Kubernetes and considered the following options:
We went with option 3 because the primary use case for Pods is relatively tightly coupled containers. If the containers are different tiers in an application (for example, database vs. web) then they really should be different pods communicating through links/naming/service discovery. The common case for Pods is when you have a set of containers that were built to work together. If you'd be tempted to run something like supervisord inside a container, you could instead explode it out into a set of different containers that are visible to the container management systems. They'd then be individually upgradable, monitorable and resource-isolated. You could then run them at different QoS so that the less critical services could be preempted by more critical services.
@cpuguy83 Shared netns was not just about links, though it did start there. We wanted to make a simple primitive that is purpose-built for the tight relationship that containers-in-a-pod have. Links are fancy and abstract and slow. We did not want or need that. What we wanted was a high-performance, low-overhead way for things to communicate. Loopback interfaces fit the bill. As soon as other namespaces can be shared (e.g. IPC) that is an obvious extension of pods. Pods are also at least conceptually about resource sharing. We can't yet put multiple containers under a shared ancestor cgroup, but you can rest assured it will matter before too long. Regarding LAMP: don't think of a pod as LAMP; think of one pod as A, one pod as M, and one pod as P, all running on L. Any of those might have need for helper apps (think symbiosis) or might not. It's the symbiotic helper apps for which pods are intended, not for apache - mysql comms.
@thockin @jbeda Thanks for clarifying this, I think it helps a lot more to understand the concept. IMHO, it would be useful for the proposal to start by explaining that Pods are resource-oriented units and not for Apache - mysql communication. I would like to understand what is overlapping with what in the different proposals, and what the advantages/downsides are. For example, could we think of this proposal as being an important ingredient for separation of concerns? As in, resource management at the one-host level should not be done by something like #8859 (equivalent of Replication Pod Set?) but by this notion of pods. However, #8859 could be the horizontal scaling ingredient, and #8637 would be one level up, at the application level? Maybe this should be discussed on IRC; I just find it hard to analyze one proposal without seeing the bigger picture.
Right now the Docker Clustering proposal (#8859) is missing both a preferred method of replication and any higher-level organizational primitives/tools (labels). Both of these are going to be necessary. #8637 is really about object namespacing, I think. It is being conflated with application config. (Namespace here is at the cluster/API level, not the Linux kernel level -- not enough words for the concepts we have.) See https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/namespaces.md for how we are approaching that in Kubernetes.
@thockin Virtually all production services at Google use our internal legacy equivalent of pods. The most common multi-container use cases include pushing log data (not unlike fluentd or logstash) and content/config management / data loading. However, pods are useful even without multiple containers. Why? Data and resource durability. Pods make it possible to keep data resident (e.g., in volumes) and to hold onto resources across container restarts and even across full container replacements, such as for image updates. @ibuildthecloud Yes, features like shared namespaces and shared cgroups would make it easier for us to implement pods outside Docker, but they would make our wrapper even fatter.
@ibuildthecloud I don't understand your concern about subgroups + systemd. systemd has the concept of a slice hierarchy: http://www.freedesktop.org/software/systemd/man/systemd.slice.html Overall, +1 on this proposal. It gives users an abstraction to avoid building complex multi-component containers.
@philips I was pointing out that if we were to do what I proposed, […] I'd like to reiterate that I don't think […] It seems to me that the root thing that Google wants to address with pods is really about being able to effectively manage resources between containers. The shared volumes and netns are already supported. Combining shared volumes + netns + resource containers = pods doesn't make sense to me; it's a very specific configuration. I think you would have more success getting a proposal through if you introduced a concept that was purely oriented towards managing resources between containers. For example, libvirt has resource partitions that I think address what you would need. So if Docker had resource partitions natively, then a Kubernetes pod == shared volumes + netns + resource partition.
+1 for this proposal overall. @ibuildthecloud iirc, sharing resources is just one aspect of this; there's also a need to atomically schedule a set of containers that make up a pod. The system has to make sure that either all containers that make up a pod can be scheduled on the host, or none at all, which is currently hard to do with the container as the only primitive.
I agree that it’s worth talking about how this will fit with #8637. Both serve use cases which I believe are largely orthogonal (indeed, I suspect they appeal to largely disjoint sets of users), so my hope is that they can coexist, as long as we design both of them right. The purposes of pods have been amply outlined. (I don’t have strong feelings about their usefulness, because I haven’t encountered the problems they solve.) I see the primary purposes of groups as being:
Unlike pods, groups are intended to:
This makes them - as this proposal has pointed out - much less about resource sharing and much more about logical grouping. For this reason, I hope they can coexist just fine - intuitively, it seems as though a group should be able to contain containers, pods or a mixture of both. The major potential pain points I see are:
@crosbymichael, @vieux, @timothysc Are there plans to support a pod-like concept in Swarm? If so, would it make sense to push ahead on this proposal?
@kelseyhightower I would hope so, but alas, I believe this issue has stalled.
@kelseyhightower I think we need to make a decision on what level of abstraction should go into Docker core for this. Are pods an API-only concept? Does it make sense to create a pod from the CLI (maybe […]) Probably the best way to move forward with this is to introduce a docs-only style proposal that details exactly what this API looks like, how users might use it, etc.; then it can either be approved or improvements can be suggested.
Also, with a docs-only proposal, if the writer is not willing to or doesn't have the time to implement it, it becomes a LOT easier for someone else to come along and build it.
The problem is that writing such a doc is a non-trivial time commitment […]
If some of the movers and shakers said "we get it and we think it is a good […]
Here's how I see the rationale for it. Docker is a piece of a larger […]
For lack of this atom in docker, people either make monolith containers […]
I really see it as a matter of convergent evolution - multiple systems have […]
I'd be happy to help work through how the CLI and API might look if only I […]
@cpuguy83 Speaking as the original author of this issue: @thockin is precisely correct. The reason that this exists as an issue, rather than a design doc or a working implementation, is that the implementation is significant and non-trivial, and I didn't/don't want to force things if they aren't wanted, or presume any particular design if the developers of Docker wanted to go in a different direction. If the decision makers at Docker […] Perhaps @shykes, @crosbymichael et al. can confirm that people are interested in seeing Pods as part of the Docker node API? Thanks (and thanks to everyone for their continued interest in moving this proposal forward).
+1 for this, as I have been forced to use Docker like a VM with mmonit inside for such a long time. It's really disgusting, as I have to write 'monit start xxx' instead of the real start command in the CMD part. The 'real' apps in the real world are always a bunch of processes. 12-factor apps? Don't forget that's an ad from a traditional PaaS provider...
I'm relatively new to Docker and Swarm, but it seems to me that Swarm should rely more on discovery services for linking services together. Perhaps I don't fully understand the boundary between Docker and Kubernetes, though. Also, I'm curious how the concept as described here so far would work with things like ZooKeeper, Mongo, and other "cluster-able" services that rely on small config changes (e.g. BrokerID) but are otherwise configured identically. Suppose I want to run Kafka and HBase in a cluster, both of which rely on ZooKeeper. Which cluster pod would specify ZooKeeper?
To me, the best approach would be to try to accomplish Pods under the banner of "nested containers." I can put together a design for nested containers, as the comment has already been made that they are not clearly defined. Nested containers will actually solve some more advanced use cases I have.
@ibuildthecloud Do you think this should be looked at from a wider perspective? (I saw your comment in #11535 (comment)); i.e. discussing what the use cases are, how (and if) they're technically possible, and a global design? Looking at this (and other) discussions, I think there are some interested parties. (Just speaking on a personal note here.)
@dcieslak19973 Actually, all the "cluster-able" apps remain how you deal with them in Docker today; nothing changes. I think a pod does not take care of what is already distributed; it cares about what is not distributed/isolated enough. So you can deploy ZooKeeper in one pod and let the HBase components in other pods consume it. ---------I'm a Split Line-------------------
So, if I use one Docker container to replace this VM, I have to run multiple processes inside this HBase slave container, and I have to use […] But if I have a pod, I can use three Docker containers instead, and they share the same volumes and network, etc., but are still isolated.
My cloud_agent's memory leak will never kill the whole node, and I can also update the 3 containers one by one. And yes, a pod works like nested containers to some extent.
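As a rough, purely illustrative sketch of that HBase-slave example with today's primitives (all names, images and limits are made up), emphasizing the per-container limits and independent updates described above:

```bash
# Illustrative only: approximate a three-container "HBase slave pod".
docker run -d --name hbase-pod -v /hbase-data busybox sleep 1000000

docker run -d --name regionserver --net=container:hbase-pod --volumes-from=hbase-pod -m 4g   regionserver-image
docker run -d --name log-agent    --net=container:hbase-pod --volumes-from=hbase-pod -m 256m log-agent-image
docker run -d --name cloud-agent  --net=container:hbase-pod --volumes-from=hbase-pod -m 256m cloud-agent-image

# A leak in cloud-agent hits its own 256m limit instead of taking down the
# node, and each container can be replaced (e.g. for an image update) on its own.
```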
I’m investigating patterns around building applications with containers, and from my point of view it seems that a concept like pods is getting more traction. This discussion paused in April with the question of whether there is support from Docker decision makers in putting more effort into this proposal. This support was called for before putting more effort (e.g. docs, implementation) into it. Did I miss something? What can we do to move this proposal forward?
@luebken Thanks for your continued interest. We continue to be interested in trying to move this forward, either in the context of Docker or the context of the OCI; however, we have never had any feedback that there was buy-in from the Docker maintainers. The issues raised by an end user in docker/compose#1899 clearly show the need for Pods as an API primitive. @shykes @crosbymichael any further thoughts? Thanks!
Let me add that we see quite a lot of interest in this topic from advanced users. After building a first version of their applications, a typical first refactoring we see is “extracting platform concerns into a sidecar”. The last example was Sensu for monitoring and fluentd for logging. Although this can be achieved by opening up the individual namespaces, we feel that the notion of a pod (or nested containers, or container groups) is a simple-to-understand concept. This simplicity is crucial to the adoption of these concepts, and it will improve the overall architecture of many applications.
A year on, I suspect users are better off just including Kubernetes in […]
@shykes Any news on this? I'd really wish we could avoid further fragmentation by providing this low-level concept across all deployment mechanisms.
A unit like […]
If […]
cc: @shykes
We're using separate files for that at the moment, but couldn't we use that in one file (for […]
@schmunk42 If all the compose files are in the same folder, then it only creates one network. So, won't that be the same as putting all the services in one usual […]
A pod is not just a way to describe them together, but a way to ensure that THROUGH THE ENTIRE SYSTEM they stay together and share fate and lifecycle and resources. At this point, I don't think Docker should implement pods. We have most of the primitives we need to implement pods without all the complexity of a probably-different interpretation. This idea has to be implemented from top to bottom or it's not worth doing.
That's configurable, I think. I have […] The main reason for splitting up the stacks in our case is that we usually don't want or need to redeploy databases, caches, etc. Restarting an application with a new image is usually no big deal, but killing and removing your database container might take a while and prolongs downtime.
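For readers following this split-stack discussion, one common way to wire separately managed stacks together without merging them into one compose file is a pre-created shared network (all names and images below are illustrative, not taken from the thread):

```bash
# Illustrative only: a network both stacks attach to.
docker network create shared-backend

# "Database stack": long-lived, rarely redeployed.
docker run -d --name db --network shared-backend postgres:9.6

# "Application stack": redeployed frequently; reaches the database by name.
docker run -d --name app --network shared-backend -e DB_HOST=db my-app-image
```

Unlike a pod, the containers here keep separate network namespaces and reach each other by DNS name rather than 127.0.0.1, which is part of what the comments above are debating.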
+1 for this feature. We have some services that need to be bound one-by-one in pairs: service A binds to localhost, and service B connects to A via localhost. If a container of service A exists, there must be a B connected to it.
@CrazyPandar With CRI present, this requirement no longer holds.
I know I'm late to the party, but maybe have a look at whether this is something that could help people out here in lieu of native pod support: https://github.com/rycus86/podlike Also an intro: https://blog.viktoradam.net/2018/05/14/podlike/
I think the interesting part here is not having another abstraction layer somewhere between services, tasks and containers. What makes pods interesting is the use cases they are needed for: sidecar, ambassador, adapter. These patterns are based on being able to share resources (a shared loopback network interface, a shared filesystem) while keeping things in separate images. Putting container co-location and synchronized replication aside, this can be implemented using some kind of magical network driver called […]
So what use cases does it enable, apart from a stronger guarantee that services don't talk directly to each other (which can just as well be enforced by code review)? Is this pod feature really worth the convenience of using 127.0.0.1 instead of myservice1-backend, myservice2-backend? You still need to type that 127.0.0.1 somewhere, right? And it doesn't look like you're getting much reusability with this pattern. Yes, you probably don't have to use template substitution for service hostnames in your configs, but that's it? What would really be cool is a use case like "hey, I want all my services talking HTTP to use this […]
One issue that loopback fixes is that developers don't need to fix their applications for a situation where the IP address suddenly changes. Instead of resolving DNS each time, devs rely on the fact that it rarely changes, so they just never bother resolving it again before trying to reconnect. For example, this happened with Elasticsearch: elastic/elasticsearch#16412 (comment)
Pods
This is a proposal to change the first order container object within the Docker API from a single container to a pod of containers.
A pod (as in a pod of whales or pea pod) models an application-specific "logical host" in a containerized environment. It may contain one or more containers which are relatively tightly coupled -- in a pre-container world, they would have executed on the same physical or virtual host.
This is somewhat related to #8637, but that proposal has much more to do with namespacing containers into a single namespace than with grouping containers into logical hosts for the purposes of scheduling, resource tracking, isolation and sharing.
In this proposal there are two sub-proposals: making Pods an API object, and making Pods the only way to run containers.
Since these topics are somewhat orthogonal, I will address each in a separate section.
Pods as an API object
Inherently, a Pod represents an atomic unit of an application. It is the smallest piece of the application that it makes sense to consider as a unit. Primarily this atomicity is in terms of running the Pod. A pod may be made up of many containers, but the state of those containers must be treated as atomic. Either they are all running under the same Docker daemon, or they are not. It does not make sense to have a partially running pod. Nor does it make sense to run different containers from a single pod in different Docker daemons.
There are numerous examples of such multi-container atomic units, for example:
In all of these cases, the important characteristic is that the containers involved are symbiotic; it doesn't make sense to place the containers in these example pods onto different hosts.
What does it mean to be a Pod?
Pods share a network namespace (and consequently an IP address). Members of a pod can address each other via localhost. Pods also share a set of data volumes, which the containers can use to share data with one another. Importantly, pods do not share a chroot, so data volumes are the only way to share storage. Pods also share a resource hierarchy: the individual containers within a pod may have their own specific resource limits, but those limits are subdivisions of the resources allocated to the entire pod.
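As a rough sketch of the resource-hierarchy part using a flag Docker does expose (--cgroup-parent), with all names and limits illustrative: each container's cgroup can be nested under a common parent, although Docker itself does not manage the parent's allocation.

```bash
# Illustrative only: nest both containers under one parent cgroup.
# Docker does not set limits on the parent; those would have to be applied
# to /pods/mypod directly (or via a systemd slice), which is part of what
# a first-class pod object would handle for you.
docker run -d --name web  --cgroup-parent=/pods/mypod -m 512m my-web-image
docker run -d --name sync --cgroup-parent=/pods/mypod -m 128m my-sync-image
```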
Why pods instead of scheduled co-location?
Co-location via a scheduling system achieves some of the goals of a pod, but it has significant downsides, including the fact that the containers don't share a network namespace (and thus have to rely on additional discovery mechanisms). Additionally, they don't share a cgroup, so you cannot express a parasitic container that steals resources, when feasible, from a co-container in its pod; instead that parasitic container steals from all containers on the host. Additionally, the fact that the linkages between the containers are expressed as scheduling constraints, rather than as an explicit grouping of the containers, makes it harder to reason about the container group and also makes the scheduler more complicated.
Pods as the only way to run containers
It would be a somewhat significant revision to the Docker API to transform the current singleton containers into Pods of containers, but it is a worthwhile endeavor, because it will retain the simplicity of the API.
Put concretely, there is no reason to introduce two different API objects (singleton containers and pods) when a Pod of a single container can effectively represent the singleton case. Sticking to a single API object will limit complexity both in the code and in the documentation, and will also give users a seamless evolution from single-container Pods to more sophisticated multi-container pods.
Implementation and further details
Pods are a foundational part of the Kubernetes API. The Kubernetes API spec for a Pod can be found in the v1 API
A fully functional implementation of pods in terms of Docker containers can be found inside of kubelet.go.
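For reference, a minimal two-container pod in the Kubernetes v1 API looks roughly like the following (image names, resource values and the volume name are illustrative); it shows the shared volume and the per-container limits described above, while both containers also share the pod's network namespace:

```bash
# Illustrative example of the Kubernetes v1 Pod object referenced above.
cat <<'EOF' | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sync
spec:
  volumes:
    - name: shared-data
      emptyDir: {}
  containers:
    - name: web
      image: my-web-image
      resources:
        limits:
          memory: 512Mi
      volumeMounts:
        - name: shared-data
          mountPath: /data
    - name: sync
      image: my-sync-image
      resources:
        limits:
          memory: 128Mi
      volumeMounts:
        - name: shared-data
          mountPath: /data
EOF
```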