
Proposal: Container groups #8637

Closed
aanand opened this Issue Oct 17, 2014 · 68 comments

aanand (Contributor) commented Oct 17, 2014

NOTE: this is out of date - please see #9694

This is a proposal to add the concept of a group to the Docker engine: a logical set of potentially interdependent containers that perform related work, possibly spanning multiple hosts. It replaces issue #7576.

The primary motivations are:

  1. Provide a platform on which to build a Fig-like stack composition feature without resorting to hacks such as Fig's container naming scheme (<project name>_<service name>_<numeric suffix>), which is brittle and slow.
  2. Scope container names so that multiple people can work on multiple apps without treading on one another’s toes.
  3. Provide a way to create/update multiple, associated containers in a single call - in a clustered future, this enables the cluster to make scheduling decisions about where to place those containers based on the fact that they’re a logical group.

Update: the following description and screencast are a little out-of-date - see the ensuing discussion. Furthermore, here's a preview build of the current state of groups and composition: https://gist.github.com/aanand/9e7ac7185ffd64c1a91a

Some progress has been made on implementing this proposal in the figgy branch of @crosbymichael’s fork. I’ve recorded a short screencast to demonstrate how the current implementation fulfils feature 1 above, and could be trivially extended to fulfil feature 2:

[screencast]

Basic functionality

A group has a unique name which occupies the same namespace as container names.

A container can optionally belong to a single group, in which case its name begins with <group name>/. For example, a group named myapp may have the containers myapp/web and myapp/db.

API/CLI additions

Existing API endpoints and CLI commands for starting, stopping, killing and removing containers work exactly the same on containers inside groups as on those outside.

There are new API endpoints for creating, listing and removing groups, and a new CLI command, docker groups, with subcommands for the same:

$ docker groups list
NAME     CONTAINERS

$ docker groups create foo

$ docker groups list
NAME     CONTAINERS
foo      0

$ docker groups rm foo
foo

The API endpoint for creating a container has a new parameter, groupName, which causes the container to be created within the named group (which must already exist). The docker run command has a new argument, --group NAME:

$ docker run --group=foo --name=sleep ubuntu sleep infinity

$ docker groups list
NAME     CONTAINERS
foo      1

$ docker ps
NAME        COMMAND          ...
foo/sleep   sleep infinity   ...
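As a sketch of what the corresponding API call might carry, the create request for the run above could look like the following (the groupName field comes from the proposal; whether it travels as a body field or a query parameter isn't specified, and the remaining fields follow the existing create-container payload):

```json
{
  "Image": "ubuntu",
  "Cmd": ["sleep", "infinity"],
  "groupName": "foo"
}
```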

If a name isn’t supplied when creating a group, one is generated in a similar manner to container names (<adjective>_<surname>).
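That generation scheme can be sketched in a few lines of Python. The word lists here are abbreviated, hypothetical stand-ins; the engine's own generator draws from much larger adjective/surname tables:

```python
import random

# Abbreviated, hypothetical word lists -- illustrative only.
ADJECTIVES = ["admiring", "brave", "clever", "dreamy"]
SURNAMES = ["curie", "hopper", "lovelace", "turing"]

def generate_group_name(rng=random):
    """Return a name in the <adjective>_<surname> form used for containers."""
    return f"{rng.choice(ADJECTIVES)}_{rng.choice(SURNAMES)}"

print(generate_group_name())  # e.g. "dreamy_hopper"
```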

There are new API endpoints and CLI commands for starting, stopping and killing all containers in a group.

$ docker groups stop foo
foo/sleep

$ docker groups start foo
foo/sleep

$ docker groups kill foo
foo/sleep

Communication within groups and between hosts [OUT OF DATE - see discussion]

Containers within a group share a network namespace and /etc/hosts file. The hosts file has an entry for each container with its name (sans group prefix) as the hostname:

10.0.0.1    web
10.0.0.2    redis

Inter-container communication using these hostnames is preferred over explicit links.

There is an API endpoint for creating/updating a group from a blob of JSON which contains both a name for the group and the names and configuration for zero or more containers within the group. If a group with the specified name already exists, containers with matching names within it are killed and removed.

When creating/updating a group with this endpoint, instead of supplying configuration for a particular name, an IP address can be supplied. For example:

{
  "Name": "foo",
  "Containers": [
    {
      "Name": "web",
      "Image": "my-web-image",
      <...more configuration...>
    },
    {
      "Name": "redis",
      "IP": "123.45.67.89"
    }
  ]
}

This results in no redis container being created, and the given IP address being written into the shared hosts file:

10.0.0.1        web
123.45.67.89    redis
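To make the hosts-file behaviour concrete, here is a minimal Python sketch of how the engine might derive the shared entries from a group payload. The allocate_ip callback is a hypothetical stand-in for the engine's address allocator, which the proposal doesn't specify:

```python
def hosts_entries(group, allocate_ip):
    """Build the shared /etc/hosts lines for a group definition.

    `group` is the JSON payload from the group create/update endpoint.
    Entries that carry an explicit "IP" are written through verbatim
    (no container is created for them); the rest get an address from
    the allocator.
    """
    lines = []
    for c in group["Containers"]:
        ip = c.get("IP") or allocate_ip(c["Name"])
        lines.append(f"{ip}\t{c['Name']}")
    return "\n".join(lines)

group = {
    "Name": "foo",
    "Containers": [
        {"Name": "web", "Image": "my-web-image"},
        {"Name": "redis", "IP": "123.45.67.89"},
    ],
}
print(hosts_entries(group, allocate_ip=lambda name: "10.0.0.1"))
```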

Orchestration primitive [OUT OF DATE - see discussion]

A hypothetical clustering/orchestration layer could both use and expose the Docker groups API, but make use of replacement IPs to make the cluster appear like a single Docker host from the outside. Here's a supremely trivial example:

When a user posts JSON containing web and redis entries to the cluster:

  1. Create a group on two separate hosts.
  2. On one host, create a group with the supplied container config for web, but the IP address of the second host for redis.
  3. On the second host, create a group with the supplied container config for redis, but the IP address of the first host for web.

Both containers are now running on separate hosts, but are still communicating as if they were in a single group on a single host. When the cluster is queried, it unifies the contents of each host behind the scenes to present a single collection of containers/groups, just as if it were a single host.
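The three steps above amount to a pure transformation of the group payload; here is a hedged Python sketch, where the placement map stands in for whatever scheduling decision the hypothetical cluster layer makes:

```python
def split_group(group, placement):
    """Split a group definition into one payload per host.

    `placement` maps container name -> host address (hypothetical; the
    proposal leaves scheduling open). On each host, locally placed
    containers keep their full config; containers placed elsewhere are
    replaced by a {"Name", "IP"} entry pointing at their host, per the
    replacement-IP mechanism described above.
    """
    payloads = {}
    for host in set(placement.values()):
        containers = []
        for c in group["Containers"]:
            if placement[c["Name"]] == host:
                containers.append(c)
            else:
                containers.append({"Name": c["Name"], "IP": placement[c["Name"]]})
        payloads[host] = {"Name": group["Name"], "Containers": containers}
    return payloads

group = {
    "Name": "foo",
    "Containers": [
        {"Name": "web", "Image": "my-web-image"},
        {"Name": "redis", "Image": "redis"},
    ],
}
payloads = split_group(group, {"web": "10.0.1.1", "redis": "10.0.1.2"})
```

Each host would then receive its payload through the ordinary group create/update endpoint, and the replacement-IP entries let the containers keep addressing each other by hostname.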

progrium (Contributor) commented Oct 17, 2014

+1

gabrtv commented Oct 17, 2014

Much prefer this to #7576. Strikes a nice balance between prescriptive and extensible, imho. 👍


Xe commented Oct 17, 2014

👍

lukemarsden (Contributor) commented Oct 17, 2014

Very cool to see fig-like behaviour coming to Docker. 😄

I am sad to see links go away, though. I like how links provide an explicit dependency graph on a distributed application. This is, I think, what fig users think they are describing when they write a fig file - therefore IMO thinking through how a group will be able to span multiple hosts is very important.

The problem with the proposal for composability of groups (hypothetical clustering tool exposing the groups API externally and internally telling Docker about the existence of "remote" containers just by specifying other IPs), is that each cluster node might only have one (internal/external) IP address, and so "refer to service by IP address" may not permit the level of multi-tenancy we'd like to be able to in a cluster of containers (each host can be running many containers, balanced according to some sensible scheduling policy).

I would love it if we could have links and all the lovely semantic dependency information they provide first-class in the Docker API, and provide hooks for a clustering tool to provide an (IP, port) pair which it might dynamically allocate for such a link, perhaps using the classic Docker links environment variable syntax. This would allow us to extend nicely to multi-host links with an interface which remains backward compatible for folks who are already developing Docker{files,images} against these expected env vars.


crosbymichael (Contributor) commented Oct 17, 2014

@lukemarsden isn't specifying the containers in a group an explicit dependency graph for a distributed application? If you don't want them communicating with each other then why are you starting them in a group and interacting with them as a single unit? At least that is my take on it. If I wanted them to be separate then I wouldn't have put them together ;)



shreyu86 commented Oct 17, 2014

+1

vishh (Contributor) commented Oct 17, 2014

How will cgroups be set up for containers in a group? Ideally they should share a common resource pool.


robhaswell commented Oct 17, 2014

@crosbymichael Aanand has not explicitly stated so, but one can infer that a group is intended to represent the entirety of a multi-tiered microservice-based application:

For example, a group named myapp may have the containers myapp/web and myapp/db.

I too would lament the loss of links as the Docker standard for composing applications using multiple containers. @aanand what is the motivation for "Inter-container communication using these hostnames is preferred over explicit links"? I think this merits some discussion (or perhaps this has already taken place and I have missed it!)


thaJeztah (Member) commented Oct 17, 2014

Really excited to see this, initial impression is good!

I'm wondering if the "group" namespace is just for organisation, or if it also acts as a sandbox? I.e., would I be able to create tenant1/redis and tenant2/redis and restrict access to each namespace, even if tenant1 tries to communicate with tenant2/redis by IP address?

@crosbymichael re: links;
I agree that using groups defines which containers belong together, but I agree with @lukemarsden that using links may have its place. Personally, I see links as the "patch cables" in a container stack. Although it's currently not possible to dynamically "patch" those cables, I would really like it if that was possible. A possible use case for this;

some-project/web
some-project/db-dev
some-project/db-production

Where db-production is a copy of the production database. Being able to "swap" the database during development (just for testing) and alias it as db, so that the web container doesn't have to be changed would be awesome.


cpuguy83 (Contributor) commented Oct 17, 2014

Unless I'm mistaken, env vars are pretty much going away in links v2.
Other than that, the only other way to do linking currently is from /etc/hosts.

This container grouping is really akin to Kubernetes pods (which share a net namespace).
I think for this particular use case it's just right.
You have an app, a database, a web frontend, etc.
If you need them to be split up, then don't use grouping.


robhaswell commented Oct 17, 2014

@cpuguy83 can you reference the Docker links v2 discussion please? Also "If you need them to be split up" - please can you define "split up"? Do you mean "spread over multiple hosts" ?


cpuguy83 (Contributor) commented Oct 17, 2014

@robhaswell #7468 #7467

I mean "split up" as in separate entities, whatever that may mean (same host or multi-host).


robhaswell commented Oct 17, 2014

Thanks for the "links" (hur) 😄


robhaswell commented Oct 17, 2014

In light of links v2 I can see how this is functionally identical, so +1 from me. Losing the implied dependency graph (that Fig makes use of today) would be sad though.


bgrant0607 commented Oct 17, 2014

I find the proposed networking model confusing. In Kubernetes, the containers within a pod actually share a network namespace and therefore share the same IP address and port space.

We've proposed that they share resources (@vishh's question) and other namespaces, also (kubernetes/kubernetes#1615).

Copying text from https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/pods.md:

A pod (as in a pod of whales or pea pod) models an application-specific "logical host" in a containerized environment. It may contain one or more containers which are relatively tightly coupled -- in a pre-container world, they would have executed on the same physical or virtual host.

Pods facilitate data sharing and communication among their constituents.

The containers in the pod all use the same network namespace/IP and port space, and can find and communicate with each other using localhost. Each pod has an IP address in a flat shared networking namespace that has full communication with other physical computers and containers across the network.

In addition to defining the containers that run in the pod, the pod specifies a set of shared storage volumes. Volumes enable data to survive container restarts and to be shared among the containers within the pod.

In the future, pods will share IPC namespaces, CPU, and memory (http://www.linuxplumbersconf.org/2013/ocw//system/presentations/1239/original/lmctfy%20(1).pdf).

Pods also simplify application deployment and management by providing a higher-level abstraction than the raw, low-level container interface. Pods serve as units of deployment and horizontal scaling/replication. Co-location, fate sharing, coordinated replication, resource sharing, and dependency management are handled automatically.

Pods can be used to host vertically integrated application stacks, but their primary motivation is to support co-located, co-managed helper programs, such as:

  • content management systems, file and data loaders, local cache managers, etc.
  • log and checkpoint backup, compression, rotation, snapshotting, etc.
  • data change watchers, log tailers, logging and monitoring adapters, event publishers, etc.
  • proxies, bridges, and adapters
  • controllers, managers, configurators, and updaters

Individual pods are not intended to run multiple instances of the same application, in general.


bgrant0607 commented Oct 17, 2014

Something else we need in order to be able to use this is hooks: #6982. Otherwise if we want to do anything during container lifecycle (say, count the number of container restarts), we either need to add the support directly to Docker, which is sometimes desirable but sometimes not, or abandon the mechanism. For this reason, we can't currently use Docker's container restarts, for example.


cpuguy83 (Contributor) commented Oct 17, 2014

@bgrant0607 Are events not good enough for counting container restarts?

While hooks (or plugins) are indeed important, I don't see how they are relevant to this particular discussion.


bgrant0607 commented Oct 17, 2014

@cpuguy83 Ok, counting restarts was not a good example. Hooks are relevant because they are a prerequisite to make this feature usable by anyone for whom the provided behavior does not fully cover and/or match what they need to do.


cpuguy83 (Contributor) commented Oct 17, 2014

@bgrant0607

Let's talk about what missing functionality you are referring to.
If something has been overlooked then we need to discuss.


bgrant0607 commented Oct 18, 2014

@cpuguy83 Would be happy to go into more detail, here or in a hangout or something.

In Kubernetes, containers within pods are never spread across hosts. That's the whole point of that abstraction.

For managing sets of pods across multiple hosts, we use labels.

My OSCON presentation explains the rationale:
https://speakerdeck.com/bgrant0607/managing-containerized-applications


thockin (Contributor) commented Oct 18, 2014

I liked the proposal until the last section, where I feel it jumped the rails.

Providing a simple grouping mechanism, akin to what Kubernetes calls pods, is part of a layered design. That's what I thought was being proposed (in fact, the proposed API is pretty much identical to what we've been doing with Kubernetes since the beginning).

Trying to make a cluster look like one big docker node is cute, but not really that useful. Don't hide the cluster-ness. Embrace it. A cluster is not a node. It fails to meet the node abstraction in all sorts of ways.

Also, I +1 Vish's comment strongly. Once you create grouping you have to answer questions like resources and namespaces (other than net).


ibuildthecloud (Contributor) commented Oct 18, 2014

I have a much longer comment I'd like to make, but first I'll get some random small things out of the way.

@crosbymichael Links form a directed graph between containers. Groups in this PR form a set of containers, or essentially a bidirectional full mesh. The key thing missing here is that groups do not describe directional, explicit relationships between containers. Imagine a simple web, app, db application. They would be in the same group but you no longer know that web talks to app, and app to db. So there are definitely some capabilities for describing an application lost here, especially when you consider security. In the links case you could essentially prevent (assuming some other tool you built) outbound connections from the DB, because thanks to the links you know no outbound traffic will be initiated.

@vishh @bgrant0607 Groups seem to be an entirely different concept than pods, so it doesn't seem like we're comparing apples to apples. This does point out that maybe "groups" is the wrong term for this API. All a group means at the moment is a shared DNS namespace and a common start/stop lifecycle.

More comments to come.... :)


thockin (Contributor) commented Oct 18, 2014

I am surprised to hear that groups are supposed to be different than pods - you use the exact same words and API to describe them, right up until the last piece of the OP description.


@bgrant0607

bgrant0607 Oct 18, 2014

I agree with @thockin. I don't understand the point of the proposal, then. This model won't scale to non-trivial deployment and scaling scenarios.


@shykes

shykes Oct 18, 2014

Collaborator

@aanand what would be the relationship between groups and volumes?


@shykes

shykes Oct 18, 2014

Collaborator

@aanand would this deprecate container-to-container links, and the naming/hierarchy system that comes with it? If so, how would we deal with reverse compatibility?


@ibuildthecloud

ibuildthecloud Oct 18, 2014

Contributor

Where do we draw the line? As an architect I like lines (circles and squares too). Last DockerCon Docker was given the moniker Docker Engine in an attempt to distinguish the core Docker from all the other services and functionality that we will begin to layer on top of it. This proposal is one of the first pieces of functionality that I see that is logically layered above the core functionality of the Docker Engine.

Previously with fig and Docker the separation was clear and obvious. But, as we want to increase the convenience, utility, and usability of Docker it makes sense to pull fig “into the fold” and make it a first class citizen. This way users can get fig functionality but still with a single binary download, and single CLI interface. This is a tricky maneuver.

With fig, let's say I had some magical Docker orchestration tool (let's just completely arbitrarily pick Stampede.io as an example 😄). If I were to perfectly implement the Docker Remote API with Stampede.io, then fig would just magically work with Stampede.io. Fig, or any tool that uses the Remote API, would work because the Remote API is the interface and contract to Docker.

With this proposal, the Docker Remote API is expanded to add this functionality. While some of the functionality is indeed client side, the Remote API is extended to add the Groups API. Now let's say the creator of Stampede.io (who I happen to know well) doesn't really want to implement the Groups functionality, because he's lazy or wants to do different things. (Let's not fool ourselves: group start/stop/update will become very complex if one wants to support real production-worthy deployment methodologies.) The problem is that if he doesn't implement the API, then it just won't work. Looking back at the fig approach, it will work because fig is physically a layer outside of Docker.

So where do we draw the line? Where does the Docker Engine end and where does higher-level functionality start? What I would propose is that the goal should be to not expand the Docker Remote API: keep it as small as possible, with only the bare minimum primitives needed to allow higher-level functionality to be implemented. Higher-level functionality such as Groups should then be developed as a logically separate component that relies only on the Remote API. If we still want a single binary download, that is fine; that is a packaging issue. I'm purely talking about architecture and design. Logically, something like Groups should be implemented as a bolt-on that requires only the Remote API. This way, people building magical Docker platforms are only required to support the Docker Remote API, and all other higher-level functionality can be layered on top.

You can imagine we are going to see more and more higher level functionality: scheduling, application templating/management, policy management, service discovery, etc. If all of these things creep into the core Remote API we lose composability. We want to encourage a pluggable ecosystem and not massive vertical implementations.

I still have some real technical comments on the implementation, but I want to hold off until we can talk about this higher level approach stuff first. I feel this proposal is going to essentially set the precedent on how we approach bringing higher level functionality into Docker and I’d like to get it right (or as close to it) now.

@shykes thoughts?


@lukemarsden

lukemarsden Oct 19, 2014

Contributor

@aanand

The first thing I think we should try to get agreement on here is: "what are groups for?"

If groups are meant to be "fig in Docker" then I think we should observe and respect the practice of fig users (much like philosophers of maths observe and reason about the practice of mathematicians).

As far as I see it, fig is all about describing a distributed application as a set of connected containers.

Even looking at the simplest possible example of fig on its homepage, it seems to be about describing an app (the counter app) in terms of its flask server and redis database. That's the whole app, not just one service within it.

I've heard from fig users when talking to them that they love being able to package their entire distributed application within a single fig file.

As the author of fig I think you probably have a lot more visibility into fig users' current practice. Is what I've described so far consistent with your analysis?

If this is accurate, then it seems really very different from the Kubernetes pod concept, which seems primarily motivated by allowing encapsulation of a component container along with its adjunct services, e.g. a web server with logging (as per @bgrant0607's quote).

My view is that the multi-layered approach in Kubernetes is certainly valid (and is a sophisticated way of describing apps), but we should either:

  1. Go all in on that approach, and drag in the concepts of services as loosely-coupled (multi-host) collections of (tightly-coupled) pods, or,
  2. Agree that groups are, for now, supposed to be simpler, higher level entities than this multi-layered approach. This view is sympathetic to current usage of fig which represents collections of containers which make up an application. This view is from the application developer's point-of-view, paying little heed at this point to the specifics of how it gets deployed - which is more of an ops concern.

Given that Docker is currently single-host, the second option would seem like a simpler, more incremental step.

In which case, I would support this proposal with the removal of the statement that "groups are supposed to be single host entities", and I suggest that we figure the multi-host story out later.

If we can agree on this, the next step would be a discussion of links (and perhaps volumes).


@bgrant0607

bgrant0607 Oct 19, 2014

@aanand wrote: Provide a platform on which to build a Fig-like stack composition feature without resorting to hacks such as Fig's container naming scheme (<project name>_<service name>_<numeric suffix>).

The proposed solution does not solve this problem.

Service deployments and batch processing pipelines are often multi-dimensional entities (e.g., multiple partitions or deployments, multiple release tracks, multiple tiers or stages, multiple micro-services per tier). Labels in Kubernetes were designed to address this problem.
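For readers unfamiliar with that mechanism, labels are arbitrary key/value metadata attached to each object, with queries expressed as selectors over them. A hypothetical sketch, loosely modeled on the Kubernetes labels doc linked above (the key names here are invented):

```yaml
# Two containers in the same logical service, distinguished by labels
# rather than by a name like guestbook_frontend_1:
labels:
  app: guestbook
  tier: frontend
  track: canary
---
labels:
  app: guestbook
  tier: backend
  track: stable

# A selector such as "app=guestbook,track=canary" then picks out one
# slice of a multi-dimensional deployment (tier x track x ...),
# which a single flat group name cannot express.
```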


@shykes

shykes Oct 19, 2014

Collaborator

@bgrant0607, respectfully, I think these are legitimate questions and by all means let's debate them.

But you guys are going a little too heavy on the Kubernetes propaganda here. In many ways that project overlaps with Docker itself, both in current functionality and future goals. There's nothing wrong with that; healthy competition is good. But this is the Docker repo and I'd appreciate it if you kept the focus where it belongs: on improving Docker.

Thanks.



@bgrant0607

bgrant0607 Oct 19, 2014

@aanand @lukemarsden @shykes

Pods would provide functionality needed by most of Google's production services to Docker. OTOH, the simplistic service deployment mechanism wouldn't be usable by even the smallest of our services.

Rather than baking a fairly limited deployment mechanism into the Docker API, I'd make Docker's API more suitable for layering configuration/deployment systems atop the API, as we are doing in Kubernetes. No matter what functionality you add, some significant number of users will need to extend it. If they can't, they'll need to abandon it. Embrace the fact that people will extend, compose, layer, and wrap additional functionality.

In production deployment scenarios, issues such as multiple deployments, workflow dependencies, load balancing, auto-scaling, rolling updates, rollbacks, reproducibility, and shared services (i.e., services that don't know a priori who their clients will be) quickly enter the picture. Additionally, resources other than just Docker containers will need to be deployed, such as updates to data and/or database schemas, or coordinated provisioning at the IaaS layer.

Mechanisms such as labels, annotations (https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/annotations.md, #6839), and lifecycle hooks (#6982) would help make Docker a sound building block for deployment ecosystems.

Additionally, we do need broad agreement on the interfaces with the applications running in containers, the upward interfaces (#2336) and downward interfaces (#8427), including naming/discovery, since those interfaces will couple applications to their environment.


@shykes

shykes Oct 19, 2014

Collaborator

@lukemarsden I agree that we are dealing with 2 different possible meanings of "group" here.

The way I understand it, this proposal matches your bullet point 2: it is a model to define the entire application, similar to Fig (and also very similar to dotcloud.yml :). So it is not equivalent to a pod. Rather, it is comparable to the combination of labels and services in kubernetes.

Right now my question is: are these 2 models (let's call them the F and K models, for Fig and Kubernetes) really so different? Personally I'm not convinced they are, they seem pretty complementary to me.

I think we should try a more rigorous "line by line" comparison of the 2 models, it should help clarify the discussion. In that comparison we should set aside the concept of pods which I believe is completely orthogonal here.


@cpuguy83

cpuguy83 Oct 19, 2014

Contributor

I would think that, ideally, pods would be easier to implement on top of the grouping functionality, and that labelling (and related things) is really separate from this idea.


@shykes

shykes Oct 19, 2014

Collaborator

@bgrant0607 just to clarify, I am not backing either proposal right now, including this one. I would like to find a common framework for comparing the 2 approaches. My feeling is that they are more complementary than we think.

About designing the Docker API to allow extensions: I couldn't agree more. This is the absolute core of what we're trying to do. We just disagree on the best way to provide extension. We both agree that "extension by implementing every pattern ourselves" is not the answer. But you advocate extension by wrapping; I advocate extension by programmability. I believe Docker must be the runtime, not just for the application, but for the software infrastructure around it. That way you get the best of both worlds: the Docker API stays small and fundamental, but it is still facing the developer and their application, unaltered, preventing fragmentation.

To offer an analogy: if Docker is a shell, I want to implement the equivalent of shell functions, rather than everyone creating their own custom wrapper shell just to add 1 function.


@thockin

thockin Oct 19, 2014

Contributor

To be sure, the last thing we want is to shove Kubernetes onto people who don't need it. I think the confusion comes from some previous discussions about adopting a pod-like semantic into the Docker core, and from this proposal being confusingly worded (e.g. "share a network namespace").

Given that this is NOT the intent of the proposal, we should stop trying to make it so.

But I don't think Docker and Kubernetes are competitive - I think they are both pieces of a layered solution. Clean layers are important, as I know you know. Having Docker adopt some of the Kubernetes ideas means that we might be able to make thinner and thinner wrappers, and that some of the things we know are important will be more available to more users.

Tim



@bgrant0607

bgrant0607 Oct 20, 2014

What @thockin said. Trying to help reduce barriers for legacy and production workloads.


@ibuildthecloud

ibuildthecloud Oct 20, 2014

Contributor

There's work on links v2 right now (#7468), yet this proposal seems to abandon links as a means of tying together multi-tiered applications. I don't really see why. Why introduce a new concept that has an impact on networking? I think we are overloading the role of groups by saying that a group is now a shared DNS namespace.

Adding a groups API has huge, huge implications if the group concept means that the lifecycles of the members are tied together in some fashion, because that gets into orchestration, and there are a lot of ways to skin that cat.

If we back up a bit and say that what we really want is fig functionality, why can't we build the majority of the group functionality in the client itself? The Docker API would then only need to be extended with the ability to attach some more metadata to containers.

Contributor

ibuildthecloud commented Oct 20, 2014

There's work on links v2 right now (#7468) but then this proposal seems to abandon links as a means of tying together multi-tiered applications. I don't really see why. Why a new concept that has impact on networking? I think we are overloading the role of groups by saying that a group is now a shared DNS namespace.

Adding a groups API has huge, huge implications if the group concept means that the lifecycle of the members are tied together in some fashion. The reason being that this gets into orchestration, and there's a lot of ways to skin that cat.

If we back up a bit and say what we really want is fig functionality, why can't we build the majority of the group functionality in the client itself. The the Docker API only really needs to be extended to add the ability to attach some more metadata to the containers.
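A minimal sketch of the client-side alternative described in this comment: the engine only stores opaque metadata per container, and all "group" logic lives in the client. `FakeEngine` and `GroupClient` are invented names standing in for the Docker remote API and a Fig-like client; nothing here is real Docker API.

```python
# Hypothetical sketch: group membership carried as container metadata,
# with all grouping logic in the client rather than the engine.

class FakeEngine:
    """Stand-in for the Docker remote API: stores metadata per container."""
    def __init__(self):
        self.containers = {}  # name -> metadata dict

    def create(self, name, metadata):
        self.containers[name] = metadata

    def list(self):
        return self.containers.items()

class GroupClient:
    """Fig-like grouping built purely on container metadata."""
    def __init__(self, engine):
        self.engine = engine

    def up(self, group, services):
        for service in services:
            # No name-prefix hack: the group is carried as metadata.
            self.engine.create(service, {"group": group})

    def ps(self, group):
        return sorted(name for name, md in self.engine.list()
                      if md.get("group") == group)

engine = FakeEngine()
client = GroupClient(engine)
client.up("myapp", ["web", "db"])
client.up("otherapp", ["worker"])
print(client.ps("myapp"))  # → ['db', 'web']
```

Under this model, filtering by group is a metadata query rather than a name-prefix match, which avoids the brittleness the proposal attributes to Fig's naming scheme.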

@erikh

erikh (Contributor) commented Oct 20, 2014

The ideal situation I see is that links and groups are completed as independent projects and complement each other; e.g., the grouping API just leverages the links API to provide discovery.

@erikh

erikh (Contributor) commented Oct 20, 2014

Just to iterate more on this; links and groups can be developed (and improved) independently this way, instead of doing it all in a monolithic architecture.

@shykes

shykes (Collaborator) commented Oct 21, 2014

@thockin @bgrant0607 I have a question on the google model of discovery and networking via service proxy + label queries. How do you suggest we handle inter-host level 3 networking?

More specifically:

The model you describe seems to only allow discovery and networking at the TCP level (setting aside pods, which don't allow inter-host networking) using a userland proxy. However, there are many applications out there which 1) may require distributing their components across machines, and 2) require discovery and networking at the IP level, usually because they already have a discovery system of their own. Many of the popular networking extensions of Docker rely on overlay networking, and even native Docker links are commonly used at the IP level (relying on the fact that, in the current implementation, links do not translate port numbers). If we adopted a networking model which does not accommodate these use cases, we would be shutting out a significant portion of Docker users - forcing them to continue to rely on wrappers, which in turn will cause further fragmentation.

@thockin

thockin (Contributor) commented Oct 21, 2014

On Mon, Oct 20, 2014 at 6:36 PM, Solomon Hykes notifications@github.com wrote:

@thockin @bgrant0607 I have a question on the google model of discovery and networking via service proxy + label queries. How do you suggest we handle inter-host level 3 networking?

I'm not sure I get what "at the IP level" means here. Do you mean "without a proxy"? The intention is that kubernetes publishes information such that other discovery/naming systems can be synced to it, or even that call-outs to naming systems could happen at pod bind-to-host time. We have not detailed this last part yet, but we've discussed it in a few of places, and I think it's even on the 1.0 roadmap. Or am I missing the question?

Tim

@bgrant0607

bgrant0607 commented Oct 21, 2014

Pods have routable IP addresses within a flat address space. However, it's true that they can change when pods are recreated on new hosts.

I see us supporting 2 approaches:

  1. Stable IP addresses that are logically separate from specific/individual pods, for which we publish DNS names. Today, Kubernetes supports load-balanced services. In the future, I expect us to also support services where one IP address maps to one pod at a time (e.g., kubernetes/kubernetes#260, kubernetes/kubernetes#1542). Routing to service addresses is currently done via iptables and the service proxy, but I could imagine overlay networks or other SDN techniques in the future.
  2. API for watching a group of endpoints (IP:port) and/or IP addresses. Kubernetes exposes this now, but it's consumed by the service proxy. I've proposed we allow users to specify groups just to produce this endpoint list (kubernetes/kubernetes#1607). I'm interested in standardizing a simple API of this kind for use in systems other than just Kubernetes.

Right now, Kubernetes requires exactly one port to be associated with services, but I'd like to allow none and multiple (kubernetes/kubernetes#1802).

It would still be the case that Kubernetes wouldn't use the host IP addresses and wouldn't dynamically allocate ports.
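A toy sketch of the second approach (an endpoint list produced from a label query), and of the multi-dimensional label selection mentioned earlier in the thread. The data shapes and function name below are invented for illustration and are not the actual Kubernetes API.

```python
# Hypothetical illustration: a label selector picks a set of pods, and the
# system derives an IP:port endpoint list from the selection.

pods = [
    {"name": "web-1", "labels": {"app": "web", "track": "stable"}, "ip": "10.0.1.5"},
    {"name": "web-2", "labels": {"app": "web", "track": "canary"}, "ip": "10.0.2.7"},
    {"name": "db-1",  "labels": {"app": "db"},                     "ip": "10.0.3.9"},
]

def endpoints(selector, port):
    """Return IP:port endpoints for pods whose labels match every key in selector."""
    return ["%s:%d" % (p["ip"], port)
            for p in pods
            if all(p["labels"].get(k) == v for k, v in selector.items())]

# All web pods, regardless of release track:
print(endpoints({"app": "web"}, 8080))            # → ['10.0.1.5:8080', '10.0.2.7:8080']
# Narrowing by an extra label dimension:
print(endpoints({"app": "web", "track": "stable"}, 8080))
```

Because membership is a query rather than a fixed list, the endpoint set tracks pod churn automatically, which is the property a service proxy (or any other consumer) would watch for.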

@shykes

shykes (Collaborator) commented Oct 21, 2014

@bgrant0607 so how much of that should we do in Docker in your opinion?

@proppy

proppy (Contributor) commented Oct 21, 2014

Containers within a group share a network namespace and /etc/hosts file

Excuse in advance the (maybe) dumb questions:

If containers within a group share the same netns but have separate IPs, does that mean that each container has its own interface visible by all the other containers of the same group?
Can containers of the same group mess with each other's interfaces?

What happens if multiple containers bind on localhost? Do they still get free discovery a-la --net=container:?

@nathanleclaire

nathanleclaire (Contributor) commented Oct 21, 2014

+1 to @proppy's questions; additionally, I was wondering how failure conditions will be handled. If you link to db on an external IP (say it is updated in-place as described) and the node goes down, what happens? Whose responsibility (docker, the process, or other) is it to get it back up?

Which parts of this are on the client (YAML parsing?) and which are located in the daemon (container operations?)?

Can you push and pull groups, and/or would there be a mechanism like docker export?

How does one do redundant containers or groups (e.g. behind a load balancer, for a highly available deploy)? Is there blah_0 or the same image with different names? Speaking of highly available deploys, what does a group of redundant containers across hosts look like in this setup? Can you criss-cross hosts? Does Docker do it automatically?

That said, I love the JSON-update-as-composable building block method; I think it's a very interesting way to stimulate the way that orchestration tool creators think about design.

I think the concerns favouring an explicit dependency graph for links are reasonable. If you have 10 containers in your group.yml (Dockergroup.yml anybody?), and you use a model similar to what you're describing, you've opened up 100 possible network paths vs., say, 5 with the way fig currently is.

I really like this proposal, especially the docker up --parse | docker groups run - mechanism; these are just some things that I've been mulling over about it.
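The path-count point above can be made precise: a shared hosts file makes every container reachable from every other (a fully connected directed graph), while explicit links only open the paths you declare. A quick check of the arithmetic (the comment's "100" rounds n(n-1) for n = 10):

```python
# With a shared namespace, each of n containers can reach the n-1 others.
def shared_namespace_paths(n):
    return n * (n - 1)

n = 10
print(shared_namespace_paths(n))  # → 90 possible paths (~100, as rounded above)

# With explicit links, only declared edges exist, e.g. web->db, web->cache,
# worker->db, worker->queue, web->queue:
explicit_links = 5
print(explicit_links)
```

The gap grows quadratically with group size, which is why the explicit dependency graph matters to a scheduler deciding container placement.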

@bgrant0607

bgrant0607 commented Oct 21, 2014

@shykes This probably deserves a discussion via VC or in person, but briefly:

Ideally, applications would be able to use the same models/interfaces for addressing/naming/discovery whether single-host (Docker-only) or multi-host (Docker + clustering solution such as Kubernetes):

  • localhost for containers sharing a network namespace,
  • IP addresses and DNS names that are stable at least across container restarts and deployment of new container images, but also possibly further decoupled (e.g., for load balancing), and
  • the group endpoint/IP list API for dynamic discovery.

The implementations of those interfaces don't need to be the same in single-host and multi-host scenarios, however, and multiple approaches to management of address/name bindings and groups of addresses/endpoints could be supported (e.g., both imperative management APIs, as with the proposed dynamic links commands in #7468, and labels, as in Kubernetes).

@aanand

aanand (Contributor) commented Oct 21, 2014

Thanks for the feedback, all - fantastic stuff, exactly the discussion I was hoping for. Some loosely grouped answers/clarifications/reckons follow.

Fig and Stuff

To directly answer @lukemarsden’s question: groups are supposed to be a step towards two things: “Fig in Docker”, as you put it (i.e. a brilliant development experience) and a complete story regarding multi-host production workflows.

I agree that most Fig users probably think about Fig as something which spins up their whole, potentially-distributed app. Of course apps managed by Fig aren’t distributed today (as far as I know, only a tiny number of people are even using Fig in production), but I agree that Docker’s composition capabilities should be eventually extensible to be able to compose (i.e. create and spin up) a complete multi-container app on multiple hosts.

If I do say so myself, Fig is wonderfully high-level - almost Platonic - requiring no information beyond “what needs to run, and what needs to talk to what?” It would be fantastic if Fig (“in Docker” or not) in a multi-host scenario could get by with just as little information, with some mechanism in the remote API for providing extra instruction (the form of which might be specific to the clustering system actually in use).

So as well as gauging its usefulness in isolation from any automated composition tool, I’d like to figure out if a grouping feature - either as described here or radically different - would help at all in achieving that, or if it’d be useless, or even a hindrance.

Pods

Regarding the Kubernetes pods comparison that several have raised: yes, they’re similar. Pods were an influence on this proposal; it’s fairly easy to imagine porting Kubernetes’ concept of a “service” (a distributed set of pods) over to groups. (Usage example: I have a group with web + db; in development I spin up my own db container, but in production I maintain a multi-host db service; when deploying I use IP replacement in the hosts file to point the new web container at one node in the db service.)

Links

Criticism of the hosts file / deprecation of links for intra-group communication is valid. Links have a few advantages:

  • They’re already in use.
  • The dependency graph is explicit, as pointed out, which is arguably a good thing for both the Docker engine and any hypothetical orchestration system to be aware of.
  • They make for a smaller set of logical connections within a group (a shared hosts file means, as pointed out, a fully-connected dependency graph).

Still, cross-host links don’t actually exist (ambassador containers are a way to fake them today, but they introduce performance overhead, points of failure and clean-up work). So for multi-host apps to work, we either need to make cross-host links exist first or provide some other mechanism for discovery (like the shared hosts file).

(Aside 1: As @shykes points out, there’s also software which relies on the presence of somewhat conventional networking to do inter-node discovery, for which links in their current incarnation won’t work - though perhaps it’s in the minority.)

(Aside 2: @lukemarsden’s point about environment variables is valid: IP and port are configurable, which works in a broader range of scenarios than just an IP address. That too is an issue with Docker links as they exist right now, as people are already using the hostname in /etc/hosts and hard-coding the same port that the parent container is exposing. If we were serious about making the IP and port mutable, though, the only reasonable approach would be to remove the /etc/hosts entries entirely and ensure the port was unpredictable, so people stop depending on it. But I digress.)

@robhaswell

robhaswell commented Oct 22, 2014

Thanks @aanand.

So for multi-host apps to work, we either need to make cross-host links exist first

This sounds like the remit of the "orchestration system"? It definitely feels outside the scope of the Docker binary. It's good that one can achieve this result without needing an orchestration system (the ambassadors pattern). Might make a nice side project for github.com/docker though.

I don't think that a shared network is the ideal solution - just because nobody outside of Docker has provided this functionality yet is no reason to implement a sub-optimal solution here.

there’s also software which relies on the presence of somewhat conventional networking to do inter-node discovery

You should always be able to tack on Weave or something for this.

On the subject of environment variables - 12factor has these as a core tenet (http://12factor.net/config), and 12factor is a philosophy I thought Docker subscribed to. It's a shame when you have to resort to the hosts file, but it's becoming less necessary as time goes on. Perhaps this is the wrong issue for this discussion though!

@aanand

aanand (Contributor) commented Oct 22, 2014

OK, I’ve slept on it and here are my conclusions:

  • Nothing about grouping should be expressly single-host, or make distributing it across multiple hosts difficult enough that people will abandon it as soon as they scale.
  • A shared hosts file is bad in a clustered scenario, for reasons outlined previously (essentially, it gives the clustering system less knowledge about what needs to talk to what, which restricts the decisions it can make). Links are better.
  • Software which needs networking capabilities beyond what links provide should be able to run, but doesn’t need to be treated as a first-class use case. It’s fine to say “use something like Weave”, as long as that’s not a total hack.

As such, I propose to advance the current grouping implementation by dropping the shared hosts file and adopting a Fig-like links configuration option.

While thinking about multi-host, I also realised that creating/updating a group atomically (i.e. with a single JSON POST) is great in a multi-host scenario, because it lets the cluster make decisions on how exactly to do that. Docker’s single-host implementation can work differently than a multi-host one in terms of the actual steps taken (e.g. a fancy clustering system might choose to start new containers, update a load balancer, then let the old ones die, etc.). So we should definitely stick with that.
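A hedged sketch of what such an atomic group create/update could look like: the whole group (services, links, ports) travels as one JSON document, leaving the engine or cluster free to decide how to converge on it. The endpoint path and every field name below are invented for illustration; the proposal does not specify this payload.

```python
# Hypothetical group definition, serialized as the body of a single POST
# (e.g. POST /groups/myapp) so the whole group is created/updated atomically.
import json

group = {
    "Name": "myapp",
    "Containers": {
        "web": {
            "Image": "myapp-web",
            "Links": ["db"],      # Fig-like explicit dependency
            "Ports": ["80:8000"],
        },
        "db": {
            "Image": "postgres",
        },
    },
}

payload = json.dumps(group, indent=2)
print(payload)
```

Because the scheduler receives the full dependency graph in one request, it can place `web` and `db` together or apart, or roll out replacements behind a load balancer, without the client orchestrating each step.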

@lukemarsden

lukemarsden (Contributor) commented Oct 22, 2014

+1 😄

@robhaswell

robhaswell commented Oct 22, 2014

Great. So I guess the networking and possibly general multi-host semantics part of this discussion is closed until you complete a new proposal?

Separately, what is the reasoning behind:

A group has a unique name which occupies the same namespace as container names.

Feels like a hack.

@lukemarsden

lukemarsden (Contributor) commented Oct 22, 2014

Having groups as "folders" within the container namespace seems natural enough to me.

@aanand

aanand (Contributor) commented Oct 22, 2014

@robhaswell Multi-host semantics is likely going to need a lot more thinking about; I don't want to give the impression that discussion of that is "closed" in any way. If you have thoughts or concerns, please voice them!

Groups and containers sharing the same namespace is the opposite of a hack in my opinion: the way Fig separates things right now (by prefixing everything with a name) is much more hacky and less performant (it really breaks down when a docker server is being used for lots of things at once).

(This makes me realise I omitted that it'll be possible to filter docker ps and GET /containers/json to a single group.)

@aanand

Contributor

aanand commented Oct 22, 2014

@shykes:

what would be the relationship between groups and volumes?

The current implementation does nothing special with volumes where the host path is explicitly specified. For unspecified host paths, we mount them at a predictable path:

/var/lib/docker/groups/<group-name>/<hash-of-container-path>

This is nice because it means that when we destroy and recreate containers, we don't need to bother with VolumesFrom and all the intermediate container shenanigans, which caused a lot of bugs in Fig.
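The predictable-path scheme described above could be derived roughly like this. A sketch only, not the actual engine code: the choice of SHA-256 and the exact base directory are assumptions based on the path shown above.

```python
import hashlib
import os

def group_volume_host_path(group_name, container_path,
                           base="/var/lib/docker/groups"):
    """Derive a stable host path for a volume whose host path was not
    explicitly specified. Hashing the container path means that
    destroying and recreating a container in the group reattaches
    the same data, with no VolumesFrom needed."""
    digest = hashlib.sha256(container_path.encode("utf-8")).hexdigest()
    return os.path.join(base, group_name, digest)
```

Because the function is deterministic, the same group name and container path always map to the same host directory, which is the property the comment relies on.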

would this deprecate container-to-container links, and the naming/hierarchy system that comes with it? If so, how would we deal with reverse compatibility?

I've been convinced that (semi-)deprecating links is a bad idea; this leaves the question of the naming/hierarchy system.

It's true that foo/bar can now mean either "a container inside group foo, referred to in the group JSON as bar" or "a container linked to from a container named foo under the alias bar". I think it's fine for these to coexist, though - there's still no way for them to actually clash, as you can't have both a container named foo and a group named foo. So in short: there's no need to deprecate anything.
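Given that invariant (a group and a container can never share a name), resolving foo/bar is unambiguous. A minimal sketch of the lookup, with hypothetical data structures, not engine code:

```python
def resolve(name, groups, containers, links):
    """Resolve a name like 'foo/bar'. Because groups and containers
    occupy one namespace, the prefix 'foo' is either a group or a
    container, never both, so the two meanings cannot clash."""
    if "/" not in name:
        return ("container", name) if name in containers else None
    prefix, rest = name.split("/", 1)
    if prefix in groups:
        # 'bar' is a member of group 'foo'
        return ("group-member", groups[prefix].get(rest))
    if prefix in containers:
        # 'bar' is a link alias on container 'foo'
        return ("link-alias", links.get((prefix, rest)))
    return None
```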

@proppy

Contributor

proppy commented Oct 22, 2014

I think:

Nothing about grouping should be expressly single-host

is incompatible with:

Containers within a group share a network namespace

It feels like this proposal is trying to blend two notions of grouping:

  • one used for collocation where volume and network namespace are shared and discovery is implicit.
  • one used for orchestration where network and volumes are isolated, and discovery is explicit through link and/or service definition.

I think it would be useful to highlight this more explicitly in the proposal description, and either explain how the implementation will let you move your groups seamlessly between these two notions, or recognize that this should be exposed as two different primitives.

@robhaswell

robhaswell commented Oct 22, 2014

It feels wrong having them share the same namespace, because you cannot perform the same operations on them. Also as the container namespace is externally controlled (by your registry; usually Docker) it opens up the possibility for unexpected namespace collisions - say there is a new public contributor with the same name as your group. Now you have to rename your group if you want to use their images. I like the proposal, just not shared namespaces.

@thockin

Contributor

thockin commented Oct 22, 2014

I think everyone is saying "namespace" and meaning something different.


@ibuildthecloud

Contributor

ibuildthecloud commented Oct 23, 2014

I feel going back to links configuration in a way simplifies this proposal, in that fewer new concepts are introduced. I'd like to take it a bit further and reiterate that maybe we should not even add a Groups API. We need to tread lightly here. What a group is and how it will be used has been a large portion of this discussion. There's a very good reason for that. As we move towards orchestration and cross-server composition of applications, a group and its definition become a powerful construct. Groups form the basis of things like auto-scaling, anti-affinity scheduling, and many other things.

The basics of how Docker orchestration across many servers will work down the line are really yet to be determined. It's evident in this discussion that while Kubernetes may be the predominant Docker orchestration platform at the moment, the approach of wrapping Docker may not be the way we want to go ultimately. Because of the apparent uncertainty of orchestration in Docker, I would defer introducing any orchestration-related features until we have a clearer direction.

If Docker was to allow arbitrary metadata to be attached to a container (which I believe has been asked for many times), fig like functionality could be implemented completely in the Docker client and not require much change to the Docker Remote API.

@tupy

tupy commented Oct 27, 2014

👍

@SamSaffron

SamSaffron commented Oct 27, 2014

I find this proposal very interesting and have a few points of feedback. At Discourse we are using runit, which I know you are actively trying to provide a workable solution for. So here are some ideas, in no particular order:

sockets

If a group is on a single machine it is key to have access to sockets; a prime example is hosting a Rails app. The application itself runs in a process called unicorn and nginx fronts it; for best performance/security it's ideal to use sockets for this communication.

Thinking of a more general solution here, groups should be able to define shared portions of the filesystem, either volume-like on the host or internal between members of the group.

Examples:

  • members A and B may elect to share /shared/bla, which is only visible between the two
  • member B may elect to grant read/write to member A on /shared/bla
  • members A and B may elect to have a volume on the host

yaml

In the year I have worked on Discourse Docker I have grown to detest YAML; even though it is great in theory, end-users just keep mucking it up and it becomes impossible to debug.

I know this is "optional", but my vote would be to ship nothing YAML in core Docker (including the CLI), and instead ship:

  • A feature-limited, Dockerfile-like format
  • A feature-full JSON format

If people want YAML they can always write the extra tooling to generate the feature-full JSON.

dependencies

The start/stop process often has various dependencies: for example, you must first start postgres and only start the web container after postgres is up. The same goes for group shutdown. This logic needs to be written.
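The start-order logic described here boils down to a topological sort of the dependency graph. A sketch using Python's standard library (container names are illustrative; stop order would simply be the reverse):

```python
from graphlib import TopologicalSorter

def start_order(dependencies):
    """dependencies maps each container to the set of containers that
    must be started before it (e.g. web depends on postgres).
    Raises CycleError on circular dependencies."""
    return list(TopologicalSorter(dependencies).static_order())
```

Usage: `start_order({"web": {"postgres"}, "postgres": set()})` guarantees postgres appears before web.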

logging

we would need something like docker group logs to get the list of logs. How logs for a group would integrate with other tooling needs to be thought out, as does stuff like log rotation and so on.

upgrades

Groups open the door to seamless upgrades without needing to front everything with haproxy.

For example, say I have a web and a data container and elect to upgrade the web container. It would be super nice if it could start the new web unnamed, ensure it is all started up, and then swap it in where the old web was.

distribution

Is the registry going to be group-aware? Can we use it to distribute groups?


PS. I really wish we could have this kind of discussion on https://forums.docker.com/ it is far better suited at dealing with big discussions like this :)

@ewindisch

Contributor

ewindisch commented Nov 4, 2014

As for YAML, it's really helpful that it's the standard for configuration input in cloud-init. That means users may supply Docker orchestration information as user-data to VMs launched on EC2, GCE, or OpenStack clouds. The user-data document could indicate that a VM should have (or install) Docker and should spawn and configure a group of containers as specified. That's powerful for anyone using Docker on a VM-based cloud infrastructure (which I imagine is a pretty large number of users, I hear EC2 is still popular...)

I wouldn't say to make the format YAML for just this singular reason, but it's certainly a very nice property in the short-term, for as long as cloud-init is a standard.
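For illustration, cloud-init user-data along these lines could install Docker and bring up a group on first boot. This is a hypothetical sketch: the group.yml contents and the docker up command come from this proposal and may well change, and the package name varies by distro.

```yaml
#cloud-config
package_update: true
packages:
  - docker.io
write_files:
  # Hypothetical group definition from this proposal
  - path: /opt/myapp/group.yml
    content: |
      name: myapp
      containers:
        web:
          image: example/web
        db:
          image: postgres
runcmd:
  - [sh, -c, "cd /opt/myapp && docker up"]
```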

@jbeda

Contributor

jbeda commented Nov 4, 2014

I don't see it referenced here so I thought I'd drop a pointer:

https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/namespaces.md

In Kubernetes we've separated out some things related to this proposal:

  • Object Namespaces -- the idea that different "objects" in the container/cluster management world can belong to different users/applications and operate independently. The doc above talks about that. Note that this is different from Linux kernel namespaces. We struggled with finding a better name here (I'm not sure that group is any better).
  • Pod -- A set of co-managed containers that share resources on a single machine. Proposal for Docker here: #8781
  • Application Config -- Being able to capture all of the configuration that makes up an application is a higher level thing that we haven't nailed down in Kubernetes at this point. I see fig and systems like Panamax filling this need. These systems can get highly opinionated (think puppet, chef, etc.) and there will be variety. An application will typically consist of a set of "micro-services" or tiers in the form of sets of replicated pods. There are then connections between those. Describing, deploying and upgrading is the job of the app config system/layer.

Typically, the app config system would work within a namespace, but it doesn't define the namespace.

@aanand

Contributor

aanand commented Nov 5, 2014

Update: see here for the current state of composition (which builds on top of grouping): https://gist.github.com/aanand/9e7ac7185ffd64c1a91a

@dnephin

Member

dnephin commented Nov 8, 2014

I like this a lot, but I still have one big concern. It sounds like the client portion of this is going to be added to docker (core).

Right now if I want to run pre-release features of fig (or even features that maybe aren't considered general enough to be added to fig upstream, ex: docker/compose/issues/318), it's very easy for me to run a fork. I don't have to run a custom dockerd or rebuild the docker package at all, I can just run a custom client (fig).

If this support is added directly to the core docker client, I now have to rebuild everything to get client-only customizations.

I really wish/hope the client portions could be released as a separate package (docker-groups?) so that running a custom client is much less involved. An official docker release (debian/rpm packaging) could still bring together these repos and include all the clients, but it would make it easier for people to run their own fork of individual clients.

@dnephin

Member

dnephin commented Nov 11, 2014

@aanand re: group.yml config format and name (from another issue)

My use-case for setting this explicitly has always been to keep it consistent between environments. Having to set an environment variable is not ideal for this case, so I really like being able to set it in the group.yml.

Could it be made optional, and still support an override from environment variable? I believe that should handle all use cases (consistent between environments, and changing for each environment).

@aanand

Contributor

aanand commented Nov 12, 2014

I've been wondering about having a separate override file that could be unversioned.

group.yml:

name: foo
containers:
  ...

.override.yml:

name: bar

This could also be used to give users the default option of not specifying the name in a versioned file: if group.yml contains no name, docker up can automatically create an override file with a generated name in it.

However, this is more of a composition concern than a grouping concern. (Proposal for that should land later this week.)
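One way the override could behave is a shallow merge in which top-level keys from .override.yml win. A sketch of that assumed semantics over already-parsed config dicts (not a spec):

```python
def apply_override(base, override):
    """Shallow-merge the override config on top of the base config:
    top-level keys present in .override.yml (e.g. 'name') replace
    those from group.yml, everything else passes through."""
    merged = dict(base)
    merged.update(override)
    return merged
```

With the example above, `apply_override({"name": "foo", ...}, {"name": "bar"})` yields a config named bar while leaving the container definitions untouched.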

@tobegit3hub

tobegit3hub commented Dec 2, 2014

-1. This seems to make the docker client more and more complicated. Fig and Kubernetes already do this well; keep working on them.

@aanand

Contributor

aanand commented Feb 6, 2015

Oops, forgot to close this as it's been deprecated in favour of #9694

@aanand aanand closed this Feb 6, 2015
