Proposal: Container groups #8637
Comments
+1
Much prefer this to #7576. Strikes a nice balance between prescriptive and extensible, imho. 👍
👍
Very cool to see fig-like behaviour coming to Docker. 😄 I am sad to see links go away, though. I like how links provide an explicit dependency graph on a distributed application. This is, I think, what fig users think they are describing when they write a fig file - therefore IMO thinking through how a group will be able to span multiple hosts is very important.

The problem with the proposal for composability of groups (a hypothetical clustering tool exposing the groups API externally and internally telling Docker about the existence of "remote" containers just by specifying other IPs) is that each cluster node might only have one (internal/external) IP address, and so "refer to service by IP address" may not permit the level of multi-tenancy we'd like to be able to achieve in a cluster of containers (each host can be running many containers, balanced according to some sensible scheduling policy).

I would love it if we could have links, and all the lovely semantic dependency information they provide, first-class in the Docker API, and provide hooks for a clustering tool to supply an (IP, port) pair which it might dynamically allocate for such a link, perhaps using the classic Docker links environment variable syntax. This would allow us to extend nicely to multi-host links with an interface which remains backward compatible for folks who are already developing Docker{files,images} against these expected env vars.
@lukemarsden isn't specifying the containers in a group an explicit dependency graph for a distributed application? If you don't want them communicating with each other, then why are you starting them in a group and interacting with them as a single unit? At least that is my take on it. If I wanted them to be separate then I wouldn't have put them together ;)
+1
How will cgroups be set up for containers in a group? Ideally they should …
@crosbymichael Aanand has not explicitly stated so, but one can infer that a group is intended to represent the entirety of a multi-tiered microservice-based application:
I too would lament the loss of links as the Docker standard for composing applications using multiple containers. @aanand what is the motivation for "Inter-container communication using these hostnames is preferred over explicit links"? I think this merits some discussion (or perhaps this has already taken place and I have missed it!)
Really excited to see this; initial impression is good! I'm wondering if the "group" namespace is just for organisation, or if it also acts as a sandbox? I.e., would I be able to create …

@crosbymichael re: links; … Where …
Unless I'm mistaken, env vars are pretty much going away in links v2. This container grouping is really akin to Kubernetes pods (which share a net namespace).
@cpuguy83 can you reference the Docker links v2 discussion please? Also "If you need them to be split up" - please can you define "split up"? Do you mean "spread over multiple hosts"?
I mean "split up" as in separate entities, whatever that may mean (same host or multi-host).
Thanks for the "links" (hur) 😄
In light of links v2 I can see how this is functionally identical, so +1 from me. Losing the implied dependency graph (that Fig makes use of today) would be sad, though.
I find the proposed networking model confusing. In Kubernetes, the containers within a pod actually share a network namespace and therefore share the same IP address and port space. We've proposed that they share resources (@vishh's question) and other namespaces, also (kubernetes/kubernetes#1615). Copying text from https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/pods.md:

A pod (as in a pod of whales or pea pod) models an application-specific "logical host" in a containerized environment. It may contain one or more containers which are relatively tightly coupled -- in a pre-container world, they would have executed on the same physical or virtual host.

Pods facilitate data sharing and communication among their constituents. The containers in the pod all use the same network namespace/IP and port space, and can find and communicate with each other using localhost. Each pod has an IP address in a flat shared networking namespace that has full communication with other physical computers and containers across the network.

In addition to defining the containers that run in the pod, the pod specifies a set of shared storage volumes. Volumes enable data to survive container restarts and to be shared among the containers within the pod. In the future, pods will share IPC namespaces, CPU, and memory (http://www.linuxplumbersconf.org/2013/ocw//system/presentations/1239/original/lmctfy%20(1).pdf).

Pods also simplify application deployment and management by providing a higher-level abstraction than the raw, low-level container interface. Pods serve as units of deployment and horizontal scaling/replication. Co-location, fate sharing, coordinated replication, resource sharing, and dependency management are handled automatically. Pods can be used to host vertically integrated application stacks, but their primary motivation is to support co-located, co-managed helper programs, such as: …

Individual pods are not intended to run multiple instances of the same application, in general.
Something else we need in order to be able to use this is hooks: #6982. Otherwise, if we want to do anything during the container lifecycle (say, count the number of container restarts), we either need to add the support directly to Docker, which is sometimes desirable but sometimes not, or abandon the mechanism. For this reason, we can't currently use Docker's container restarts, for example.
@bgrant0607 Are events not good enough for counting container restarts? While hooks (or plugins) are indeed important, I don't see how they are relevant to this particular discussion.
@cpuguy83 Ok, counting restarts was not a good example. Hooks are relevant because they are a prerequisite for making this feature usable by anyone whose needs the provided behavior does not fully cover.
Let's talk about what missing functionality you are referring to.
@cpuguy83 Would be happy to go into more detail, here or in a hangout or something. In Kubernetes, containers within pods are never spread across hosts. That's the whole point of that abstraction. For managing sets of pods across multiple hosts, we use labels. My OSCON presentation explains the rationale: …
I liked the proposal until the last section, where I feel it jumped the … Providing a simple grouping mechanism, akin to what Kubernetes calls pods, … Trying to make a cluster look like one big docker node is cute, but not … Also, I +1 Vish's comment strongly. Once you create grouping you have to …
I have a much longer comment I'd like to make, but first I'll get some random small things out of the way.

@crosbymichael Links form a directed graph between containers. Groups in this PR form a set of containers, or essentially a bidirectional full mesh. The key thing missing here is that groups do not describe directional, explicit relationships between containers. Imagine a simple web, app, db application. They would be in the same group, but you no longer know that web talks to app, and app to db. So there are definitely some capabilities for describing an application lost here, especially when you consider security. In the links case you could essentially prevent (assuming some other tool you built) outbound connections from db, because due to the links you know no outbound traffic will be initiated.

@vishh @bgrant0607 Groups seem to be an entirely different concept than pods, so it doesn't seem like we're comparing apples to apples. This does point out that maybe "Groups" is the wrong term for this API. All "groups" means at the moment is a shared DNS namespace and a common start/stop lifecycle.

More comments to come.... :)
I am surprised to hear that groups are supposed to be different than pods - …
I agree with @thockin. I don't understand the point of the proposal, then. This model won't scale to non-trivial deployment and scaling scenarios.
@aanand what would be the relationship between groups and volumes?
@aanand would this deprecate container-to-container links, and the naming/hierarchy system that comes with it? If so, how would we deal with reverse compatibility?
Where do we draw the line? As an architect I like lines (circles and squares too). At the last DockerCon, Docker was given the moniker Docker Engine in an attempt to distinguish the core Docker from all the other services and functionality that we will begin to layer on top of it. This proposal is one of the first pieces of functionality that I see that is logically layered above the core functionality of the Docker Engine. Previously, with fig and Docker, the separation was clear and obvious. But as we want to increase the convenience, utility, and usability of Docker, it makes sense to pull fig “into the fold” and make it a first-class citizen. This way users can get fig functionality but still with a single binary download and a single CLI interface.

This is a tricky maneuver. With fig, let’s say I had some magical Docker orchestration tool (let’s just completely arbitrarily pick Stampede.io as an example 😄). If I were to perfectly implement the Docker Remote API with Stampede.io, then fig would just magically work with Stampede.io. Fig, or any tool that uses the remote API, would work because the remote API is the interface and contract to Docker.

With this proposal, the Docker Remote API is expanded to add this functionality. While some of the functionality is indeed client-side, the remote API is extended to add the Groups API. Now let’s say the creator of Stampede.io (who I happen to know well) doesn’t really want to implement Groups functionality because he’s lazy, or wants to do different things. (Let’s not fool ourselves: group start/stop/update will become very complex if one wants to do real production-worthy deployment methodologies.) The problem is that if he doesn’t implement the API, then it just won’t work. Now, looking back at the fig approach, it will work because it’s physically a layer outside of Docker. So where do we draw the line? Where does the Docker Engine end and where does higher-level functionality start?
What I would propose is that the goal is to not expand the Docker Remote API. Keep it as small as possible, with only the bare minimum primitives needed to allow higher-level functionality to be implemented. Then higher-level functionality, such as Groups, should be developed as a logically separate component that relies only on the Remote API to perform its functionality. If we still want a single binary download, that is fine; that is a packaging issue. I’m purely talking about architecture and design. Logically, something like Groups should be implemented as a bolt-on that requires only the Remote API. This way, people building magical Docker platforms are only required to support the Docker Remote API, and all other higher-level functionality can be layered on top.

You can imagine we are going to see more and more higher-level functionality: scheduling, application templating/management, policy management, service discovery, etc. If all of these things creep into the core Remote API we lose composability. We want to encourage a pluggable ecosystem and not massive vertical implementations.

I still have some real technical comments on the implementation, but I want to hold off until we can talk about this higher-level approach first. I feel this proposal is going to essentially set the precedent for how we approach bringing higher-level functionality into Docker, and I’d like to get it right (or as close to it as possible) now. @shykes thoughts?
The first thing I think we should try to get agreement on here is: "what are groups for?" If groups are meant to be "fig in Docker" then I think we should observe and respect the practice of fig users (much like philosophers of maths observe and reason about the practice of mathematicians).

As far as I see it, fig is all about describing a distributed application as a set of connected containers. Even looking at the simplest possible example of fig on its homepage, it seems to be about describing an app (the counter app) in terms of its flask server and redis database. That's the whole app, not just one service within it. I've heard from fig users when talking to them that they love being able to package their entire distributed application within a single fig file. As the author of fig I think you probably have a lot more visibility into fig users' current practice. Is what I've described so far consistent with your analysis?

If this is accurate, then it seems really very different to the kubernetes pod concept, which seems primarily motivated to allow encapsulation of a component container along with its adjunct services, e.g. a web server with logging (as per @bgrant0607's quote). My view is that the multi-layered approach in Kubernetes is certainly valid (and is a sophisticated way of describing apps), but we should either: …

Given that Docker is currently single-host, the second option would seem like a simpler, more incremental step. In which case, I would support this proposal with the removal of the statement that "groups are supposed to be single-host entities", and I suggest that we figure the multi-host story out later. If we can agree on this, the next step would be a discussion of links (and perhaps volumes).
OK, I’ve slept on it and here are my conclusions: …

As such, I propose to advance the current grouping implementation by dropping the shared hosts file and adopting a Fig-like …

While thinking about multi-host, I also realised that creating/updating a group atomically (i.e. with a single JSON POST) is great in a multi-host scenario, because it lets the cluster make decisions on how exactly to do that. Docker’s single-host implementation can work differently than a multi-host one in terms of the actual steps taken (e.g. a fancy clustering system might choose to start new containers, update a load balancer, then let the old ones die, etc.). So we should definitely stick with that.
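To make the atomic create/update idea concrete, here is a sketch of the kind of JSON blob such a single POST might carry. The endpoint path and field names here are assumptions for illustration; the proposal only says the blob contains a group name plus container names and configurations:

```
POST /groups/create

{
  "Name": "myapp",
  "Containers": {
    "web": { "Image": "myapp-web" },
    "db":  { "Image": "postgres" }
  }
}
```

Re-posting the blob with a changed web entry would then let the implementation (single-host Docker, or a cluster in front of it) decide how to replace the matching containers.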
+1 😄
Great. So I guess the networking (and possibly general multi-host semantics) part of this discussion is closed until you complete a new proposal?

Separately, what is the reasoning behind: …

Feels like a hack.
Having groups as "folders" within the container namespace seems natural enough to me.
@robhaswell Multi-host semantics is likely going to need a lot more thinking about; I don't want to give the impression that discussion of that is "closed" in any way. If you have thoughts or concerns, please voice them!

Groups and containers sharing the same namespace is the opposite of a hack, in my opinion: the way Fig separates things right now (by prefixing everything with a name) is much more hacky and less performant (it really breaks down when a Docker server is being used for lots of things at once). (This makes me realise I omitted that it'll be possible to filter …)
The current implementation does nothing special with volumes where the host path is explicitly specified. For unspecified host paths, we mount them at a predictable path: …

This is nice because it means that when we destroy and recreate containers, we don't need to bother with …

I've been convinced that (semi-)deprecating links is a bad idea; this leaves the question of the naming/hierarchy system. It's true that …
I think: …

is incompatible with: …

It feels like this proposal is trying to blend two notions of grouping: …

I think it would be useful to highlight this more explicitly in the proposal description, and either explain how the implementation will allow you to organize your groups seamlessly between these two notions, or recognize that this should be exposed as two different primitives.
It feels wrong having them share the same namespace, because you cannot perform the same operations on them. Also, as the container namespace is externally controlled (by your registry, usually Docker), it opens up the possibility of unexpected namespace collisions - say there is a new public contributor with the same name as your group. Now you have to rename your group if you want to use their images. I like the proposal, just not shared namespaces.
I think everyone is saying "namespace" and meaning something different. …
I feel going back to … The basics of how Docker orchestration across many servers will work down the line are really yet to be determined. It's evident in this discussion that while Kubernetes may be the predominant Docker orchestration platform at the moment, the approach of wrapping Docker may not be the way we want to go ultimately. Because of the apparent uncertainty of orchestration in Docker, I would defer introducing any orchestration-related features until we have a clearer direction. If Docker were to allow arbitrary metadata to be attached to a container (which I believe has been asked for many times), fig-like functionality could be implemented completely in the Docker client and would not require much change to the Docker Remote API.
👍
I find this proposal very interesting and have a few points of feedback. At Discourse we are using runit, for which I know you are actively trying to provide a workable solution. So here are some ideas, in no particular order:

sockets

If a group is on a single machine, it is key to have access to sockets; a prime example is hosting of a Rails app. The application itself runs in a process called … Thinking of a more general solution here, groups should be able to define shared portions of the filesystem, either volume-like on the host or internal between members of the group. Examples: members A, B may elect to share /shared/bla ... which is only visible between the 2.

yaml

In the year I have worked on Discourse Docker I have grown to detest yaml; even though it is great in theory, end-users just keep mucking it up and it becomes impossible to debug. I know this is "optional", but my vote would be to ship with nothing yaml in core docker (including CLI) and just ship with: …

If people want yaml they can always write the extra tooling to generate the feature-full json.

dependencies

The start/stop process often has various dependencies. For example, you must first start postgres and only after it has started, start the web; the same goes for group shutdown. This logic needs to be written.

logging

We would need something like …

upgrades

Groups open up the window to allowing for seamless upgrades without needing to front stuff with haproxy. For example, say I have a web and a data container and elect to upgrade the web container: it would be super nice if it could start the new web unnamed, ensure it is all started up, and then switch it in to where the old web was.

distribution

Is the registry going to be group aware? Can we use it to distribute groups?

PS. I really wish we could have this kind of discussion on https://forums.docker.com/ - it is far better suited to dealing with big discussions like this :)
As for YAML, it's really helpful that it's the standard for configuration input in cloud-init. That means users may supply Docker orchestration information as user-data to VMs launched on EC2, GCE, or OpenStack clouds. The user-data document could indicate that a VM should have (or install) Docker and should spawn and configure a group of containers as specified. That's powerful for anyone using Docker on a VM-based cloud infrastructure (which I imagine is a pretty large number of users; I hear EC2 is still popular...). I wouldn't say to make the format YAML for just this singular reason, but it's certainly a very nice property in the short term, for as long as cloud-init is a standard.
I don't see it referenced here so I thought I'd drop a pointer: https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/namespaces.md

In Kubernetes we've separated out some things related to this proposal: …

Typically, the app config system would work within a namespace, but it doesn't define the namespace.
Update: see here for the current state of composition (which builds on top of grouping): https://gist.github.com/aanand/9e7ac7185ffd64c1a91a |
I like this a lot, but I still have one big concern. It sounds like the client portion of this is going to be added to docker (core). Right now if I want to run pre-release features of fig (or even features that maybe aren't considered general enough to be added to fig upstream, ex: docker/compose/issues/318), it's very easy for me to run a fork. I don't have to run a custom dockerd or rebuild the docker package at all, I can just run a custom client (fig). If this support is added directly to the core docker client, I now have to rebuild everything to get client-only customizations. I really wish/hope the client portions could be released as a separate package (docker-groups?) so that running a custom client is much less involved. An official docker release (debian/rpm packaging) could still bring together these repos and include all the clients, but it would make it easier for people to run their own fork of individual clients.
@aanand re: group.yml config format and … My use-case for setting this explicitly has always been to keep it consistent between environments. Having to set an environment variable is not ideal for this case, so I really like being able to set it in the … Could it be made optional, and still support an override from an environment variable? I believe that should handle all use cases (consistent between environments, and changing for each environment).
I've been wondering about having a separate override file that could be unversioned.

group.yml:

    name: foo
    containers:
      ...

.override.yml:

    name: bar

This could also be used to give users the default option of not specifying the name in a versioned file: if … However, this is more of a composition concern than a grouping concern. (Proposal for that should land later this week.)
-1. It seems to make the docker client more and more complicated. Fig and Kubernetes do this well; keep working on them instead.
Oops, forgot to close this as it's been deprecated in favour of #9694 |
NOTE: this is out of date - please see #9694
This is a proposal to add the concept of a group to the Docker engine, which is a logical group of potentially interdependent containers, performing related work, potentially on multiple hosts. It replaces issue #7576.
The primary motivations are: …

… (<project name>_<service name>_<numeric suffix>), which is brittle and slow.

Update: the following description and screencast are a little out-of-date - see the ensuing discussion. Furthermore, here's a preview build of the current state of groups and composition: https://gist.github.com/aanand/9e7ac7185ffd64c1a91a
Some progress has been made on implementing this proposal in the figgy branch of @crosbymichael’s fork. I’ve recorded a short screencast to demonstrate how the current implementation fulfils feature 1 above, and could be trivially extended to fulfil feature 2:
Basic functionality
A group has a unique name which occupies the same namespace as container names.
A container can optionally belong to a single group, in which case its name begins with <group name>/. For example, a group named myapp may have the containers myapp/web and myapp/db.
API/CLI additions
Existing API endpoints and CLI commands for starting, stopping, killing and removing containers work exactly the same on containers inside groups as on those outside.
There are new API endpoints for creating, listing and removing groups, and a new CLI command, docker groups, with subcommands for the same.
The API endpoint for creating a container has a new parameter, groupName, which causes the container to be created within the named group (which must already exist). The docker run command has a new argument, --group NAME.
If a name isn't supplied when creating a group, one is generated in a similar manner to container names (<adjective>_<surname>).
There are new API endpoints and CLI commands for starting, stopping and killing all containers in a group.
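As a sketch, a CLI session under these additions might look like the following. The exact subcommand names and output, beyond docker groups and --group, are assumptions based on the text above, not part of the proposal:

```
$ docker groups create myapp                        # create an empty group
$ docker run -d --group myapp --name web myapp-web  # creates container myapp/web
$ docker run -d --group myapp --name db postgres    # creates container myapp/db
$ docker groups                                     # list groups
$ docker groups stop myapp                          # stop all containers in the group
```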
Communication within groups and between hosts [OUT OF DATE - see discussion]
Containers within a group share a network namespace and /etc/hosts file. The hosts file has an entry for each container with its name (sans group prefix) as the hostname.
Inter-container communication using these hostnames is preferred over explicit links.
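For instance, the shared /etc/hosts for a group myapp with web and db containers might read as follows. The addresses are illustrative assumptions; since the group shares one network namespace, both names would resolve to the group's single address:

```
127.0.0.1   localhost
172.17.0.2  web
172.17.0.2  db
```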
There is an API endpoint for creating/updating a group from a blob of JSON which contains both a name for the group and the names and configuration for zero or more containers within the group. If a group with the specified name already exists, containers with matching names within it are killed and removed.
When creating/updating a group with this endpoint, instead of supplying configuration for a particular name, an IP address can be supplied. For example:
This results in no redis container being created, and the given IP address being written into the shared hosts file:
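Sketching the IP-replacement variant described above (field names are assumptions; the proposal does not pin down the JSON schema), a payload such as:

```
{
  "Name": "myapp",
  "Containers": {
    "web":   { "Image": "myapp-web" },
    "redis": "10.1.2.3"
  }
}
```

would create only the web container and write "10.1.2.3  redis" into the shared hosts file.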
Orchestration primitive [OUT OF DATE - see discussion]
A hypothetical clustering/orchestration layer could both use and expose the Docker groups API, but make use of replacement IPs to make the cluster appear like a single Docker host from the outside. Here's a supremely trivial example:
When a user posts JSON containing web and db entries to the cluster:
The first host receives the configuration for web, but the IP address of the second host for redis.
The second host receives the configuration for redis, but the IP address of the first host for web.
.Both containers are now running on separate hosts, but are still communicating as if they were in a single group on a single host. When the cluster is queried, it unifies the contents of each host behind the scenes to present a single collection of containers/groups, just as if it were a single host.