Proposal: Network Drivers #9983

Closed
erikh opened this Issue Jan 8, 2015 · 71 comments

@erikh
Contributor

erikh commented Jan 8, 2015

Proposal: Network Driver

THIS PROPOSAL IS A WORK IN PROGRESS

This proposal brings new functionality and new interfaces to Docker's
networking implementation. Some things are still in flux, but our goal is to
get this approved as there is a sizeable effort internally to make this happen.
We need reviewers and comments on this proposal from the community.

You can see some of the work here,
but it is a work in progress, and the state (and maturity) of that
implementation should not be taken as a reflection of this document.

Concepts

Many concepts will be used throughout this and several other proposals that are
in process. You may see them repeated with slightly different verbiage, or
accompanied by at-length descriptions of how these interfaces will be
implemented.

Here's a basic diagram of how the networking itself operates:

network extensions diagram

The following describes each interface and its name. The sub-bullets provide
practical examples of these components, using Docker's existing implementation
as a reference.

  • Driver: executable code which is triggered when certain operations are
    requested.
  • Extension: A collection of Drivers that supply different portions of
    functionality to Docker.
  • State: A key/value store of parameterized data for consumption and mutation by a Driver.
    • libpack is our initial state
      implementation, but we are also evaluating
      ecc and others.
  • Sandbox: An isolated environment
    • libcontainer's functionality -- for now, this is more or less a standard docker container.
  • Endpoint: An addressable endpoint used for communication over a specific network. Endpoints join exactly one network and are expected to create a method of network communication for a container. Endpoints are garbage collected when they no longer belong to any Sandboxes.
    • veth pair
  • Network: A collection of endpoints that are able to communicate with each other. Networks are intended to be isolated from each other and do not cross-communicate.
    • The existing Ethernet bridge and the iptables rules we use.

Container Network Model (or CNM)

The Container Network Model is a set of axioms about how Docker intends to
supply interoperation between networks and containers.

  1. All containers on a specific network can communicate with each other freely.
  2. Multiple networks are the way to segment traffic between containers and
    should be supported by all drivers.
  3. Multiple endpoints per container are the way to join a container to
    multiple networks.
  4. An endpoint is added to a sandbox to provide it with network connectivity.

This has a few consequences:

  • Network-based service discovery will replace on-disk and ENV discovery. This
    allows discovery to be scoped and more flexible for external implementations.
    Implementation of this is still TBD.
  • Links will be deprecated or only exist on the default network.

Notion of a Default Network

Since Docker is a tool heavily used by both operations and development
personnel, with goals that differ according to their skill sets, it is critical
to have a functioning "out of the box" implementation that people can use with
ease. Docker will create a "default" network (named default) for basic
networking.

Networks and Endpoints

Endpoints are a part of a Network. The network is (at least in the simplebridge implementation) isolated, but drivers may implement the notion of a network however they choose.

Docker's new system will allow for N networks, but a 'default' network will be
created as a convenience for users, and with the default driver it will
function similarly to the existing networking solution.

Multiple endpoints can be created for a single container and bound to it at
the same time. The endpoints may live on different networks and may all belong
to one container.

Again, endpoints on different networks should not be able to communicate with each other in the default implementation. It is expected that network operators would configure any bridging between two networks.

Workflow for a Network Driver

At boot, a network driver will be given a replay of its state; this will allow
the driver to return to being in sync with the state of the system, to create
new networks, etc. How replays are handled by drivers is intentionally
undefined.

A network can be requested to be created. In the workflow below, the network
is assumed to have been created already.

A network driver will be asked at docker run (in order, for a given network):

  1. To create an endpoint within the network
  2. To join an endpoint to a sandbox

The driver will also provide port mapping/expose functionality (see below for
API) and communicate with service discovery (TBD).
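
To make the sequence concrete, here is a minimal sketch of the calls Docker might issue at docker run for one already-created network, written against the abstract interfaces defined in the Network API section below. The function name, endpoint name, and portspec format are illustrative assumptions, not part of the proposal:

// Illustrative sketch only: the calls made at `docker run` for a single,
// already-created network. Assumes the Driver, Network and Endpoint
// interfaces from the Network API section below.
func connectContainer(d Driver, netid string, sb sandbox.Sandbox) error {
  // Look up the network that was created earlier.
  net, err := d.GetNetwork(netid)
  if err != nil {
    return err
  }

  // Steps 1 and 2: create an endpoint named "web" within the network and
  // join it to the container's sandbox.
  ep, err := net.Link(sb, "web", false)
  if err != nil {
    return err
  }

  // Port mapping/expose: expose port 80 on the network and publish it to the
  // host (the "80/tcp" portspec format is an assumption).
  return ep.Expose("80/tcp", true)
}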

Network API

Driver abstract interface:

type Driver interface {
  // Restore from state
  Restore(netstate state.State) error

  // Create a new network with the network id and supplied parameters
  AddNetwork(netid string, params []string) error

  // Remove a network using the network ID.
  RemoveNetwork(netid string) error

  // Retrieve a network by ID.
  GetNetwork(id string) (Network, error)

  // ListNetworks returns the IDs of the available networks
  ListNetworks() []string
}

Network abstract interface:

(note that Link and Unlink here are merely for convenience and do not require
an alternative implementation)

// A network is a perimeter of IP connectivity between network services.
type Network interface {
  // Id returns the network's globally unique identifier
  Id() string

  // List of endpoints that belong to this network.
  Endpoints() []Endpoint

  // Link makes the specified sandbox reachable as a named endpoint on the network.
  // If the endpoint already exists, the call will either fail (replace=false), or
  // unlink the previous endpoint.
  //
  // For example mynet.Link(mysandbox, "db", true) will make mysandbox available as
  // "db" on mynet, and will replace the other previous endpoint, if any.
  //
  // The same sandbox can be linked to multiple networks.
  // The same sandbox can be linked to the same network as multiple endpoints.
  Link(s sandbox.Sandbox, name string, replace bool) (Endpoint, error)

  // Unlink removes the specified endpoint, unlinking the corresponding sandbox from the
  // network.
  Unlink(name string) error
}

Endpoint abstract interface:

// An endpoint represents a particular member of a network, registered under a certain name
// and reachable over IP by other endpoints on the same network.
type Endpoint interface {
  // The name of the endpoint.
  Name() string

  // Expose a port over the network. Publish it as the port to the host if
  // requested.
  Expose(portspec string, publish bool) error

  // The network this endpoint belongs to.
  Network() Network
}

docker net tool

docker net is the vehicle we're proposing for manipulating networks. The
basic commands are described below:

  • create: create a network
  • destroy: destroy a network
  • join: join a container to a network
  • leave: remove a container from a network

There will be a forthcoming UI extension to docker run which also selects a
network at run time. This interface is currently to be determined.
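
Purely for illustration (the final UI is not part of this proposal and the exact syntax is still to be determined), a session using these commands might look something like the following, where backend and web are a hypothetical network and container:

  docker net create backend
  docker run -d --name web nginx
  docker net join backend web
  # ... later ...
  docker net leave backend web
  docker net destroy backend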

Our initial implementation

Our initial implementation, which we are dubbing simplebridge, is very similar
to Docker's existing implementation. This driver creates a bridge for each
network and a veth pair for each endpoint. Networks may contain a set of VXLAN
peers which will be attached to the bridge to provide multi-host connectivity.
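
As a rough sketch of the driver side (not the actual simplebridge code, which lives in the branch linked above), AddNetwork for a bridge-per-network driver could boil down to creating and bringing up a Linux bridge. The bridgeDriver type, the bridge naming scheme, and the use of iproute2 via os/exec are assumptions made for illustration:

// Illustrative sketch only -- not the actual simplebridge implementation.
// Assumes "fmt" and "os/exec" are imported; state persistence, IPAM and
// vxlan peer handling are omitted.
func (d *bridgeDriver) AddNetwork(netid string, params []string) error {
  bridge := "docker-" + netid // hypothetical naming scheme

  // Create a Linux bridge for this network and bring it up.
  if out, err := exec.Command("ip", "link", "add", bridge, "type", "bridge").CombinedOutput(); err != nil {
    return fmt.Errorf("creating bridge %s: %v: %s", bridge, err, out)
  }
  if out, err := exec.Command("ip", "link", "set", bridge, "up").CombinedOutput(); err != nil {
    return fmt.Errorf("bringing up bridge %s: %v: %s", bridge, err, out)
  }

  // A real driver would also record the network in its State so that it can
  // be replayed by Restore at the next boot.
  return nil
}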

@lukemarsden

Contributor

lukemarsden commented Jan 8, 2015

This looks great, thanks for working on this!

Maybe this is the wrong proposal to comment on (if so, please tell me where to go), but I'm curious specifically about the driver/extension mechanism. Is there any working code that shows simplebridge as an extension? And are these extensions planned to be late-bound, out-of-process and delivered as a container?

I'm reminded of this tweet from @shykes: https://twitter.com/solomonstre/status/542027779198701568 😄

@erikh

Contributor

erikh commented Jan 9, 2015

There is! We are retooling to create a separate binary since we have reached our initial PoC. The PoC is here: https://github.com/shykes/docker/tree/extensions/extensions/simplebridge -- please keep in mind this is seriously alpha code and some corners were cut to accommodate our PoC.

To answer the extensions question; yes, it is being worked on as a separate project. For now, we're clarifying the interfaces terminology-wise only specifically for this reason -- the other proposals are not ready yet. I hope this helps, even if it is probably not the answer you were looking for. :)

@lukemarsden

Contributor

lukemarsden commented Jan 9, 2015

@erikh Awesome! Thanks for sharing this. This looks like really good progress towards the extensions architecture we talked about at DockerCon Europe. I look forward to seeing the extensions proposals when they're ready. I'll leave the rest of this thread for the networking folks to feed back on.

@dave-tucker

Contributor

dave-tucker commented Jan 9, 2015

@erikh based on @phemmer's comment in #8951 it might be worth highlighting somewhere in the text that this proposal is designed to enable multi-host networking solutions.

@erikh

Contributor

erikh commented Jan 9, 2015

Will do, probably later today. Will update with a post then.

On Fri, 2015-01-09 at 06:00 -0800, Dave Tucker wrote:

@erikh based on @phemmer's comment in #8951 it might be worth
highlighting somewhere in the text that this proposal is designed to
enable multi-host networking solutions.

@phemmer

Contributor

phemmer commented Jan 9, 2015

I understood that this model would allow multi-host networking. But what wasn't clear was whether docker would still be implementing multi-host networking, or if it was simply designing this driver scheme so that someone else could implement it.

@mrunalp

Contributor

mrunalp commented Jan 9, 2015

@erikh Any links to the "simplebridge" implementation (to get an idea about how it all fits together) ?

@thockin

Contributor

thockin commented Jan 9, 2015

My comments as I read it through:

"network's globally unique identifier" - how global? a machine? a cluster? UUID?

Is Driver.Link(foo, ...) just an alias to Driver.GetNetwork(foo).Link(...)? why are the args in a different order between them?

Is Endpoint Name() the same as the netif name (e.g. ifconfig) ?

The existence of Expose() as an API suggests that ports are not exposed by default. Is that a requirement or a suggestion?

What does it mean to create/destroy networks in a dumb L2 model where each container gets an SRIOV interface (or the moral equivalent - macvlan, ipvlan, etc)?

Is it expected that 'docker net create' reaches out into the network fabric (off-host) and configures things? How do other nodes learn about the existence of new networks? Or is the concept of a Network purely single-node-centric?

Where do I specify things like "use this specific IP address" or "use this MAC address" or "set MTU to X" or "use IPv6"?

What is a sandbox? Is it a net namespace? How do I allow multiple containers to join the same net namespace (as we do with --net=container:id)?

How are drivers going to be implemented? The simplest answer for just trying things out would be something like an "exec driver" which looks in a directory, e.g. /var/lib/docker/net-drivers/exec. If it finds any executable file, it creates a network driver with the same name. Whenever a driver API has to be called, simply exec that file with some args indicating the API call and params. That way, we can
quickly prototype and iterate on ideas.

@MalteJ

Contributor

MalteJ commented Jan 9, 2015

@thockin has written:

Where do I specify things like "use this specific IP address" or "use this MAC address" or "set MTU to X" or "use IPv6"?

I would like to realize that via a docker run network-config string that is passed to the driver. e.g. docker run -d --network-config="role=frontend,ipv6=true" nginx.
The Driver API specification should not define the format of this string. It should be up to the driver to specify the format. This way you can realize various configuration strategies. e.g. you could pass a UUID of some central configuration, a JSON string, an http URL, whatever you can imagine.
I would drop all other network related flags.

Yet I have not understood what docker net is doing. Could you clarify that?
Looks like we still have networking in the main Docker binary?
I would prefer docker-network as the default Docker driver binary and for configuration. A separate binary. Not included and no UI in the docker binary.

Update
Another option would be that the driver exposes some parameters that will be added to docker run with a --network- prefix. E.g. the driver tells Docker it accepts the parameters

{
    "role" : "string",
    "ipv6" : "boolean"
}

then docker run accepts the flags --network-role and --network-ipv6 and passes their values to the Driver.

@timothysc

timothysc commented Jan 9, 2015

Maybe I'm missing something, but re:

'Docker's new system will allow for N networks, but a 'default' network will be
created as a convenience for users...'

I don't see how consensus is maintained across a cluster.


@MalteJ

Contributor

MalteJ commented Jan 9, 2015

@timothysc
I don't think the default network will be for clusters. You will have to configure it.

@shettyg

shettyg commented Jan 9, 2015

My takeaway from this proposal is that it is intended for a single user that owns multiple hosts/VMs. The VMs should have IP reachability with each other. The user deploys docker containers across multiple hosts to get the benefits of redundancy. The containers that run on these hosts should completely trust each other. Docker network daemons peer with each other to create overlay tunnels.

For a multi-tenant environment (i.e. people that don't trust each other, but use the same infrastructure/cloud to deploy docker), a very simplistic workflow like this would be needed:

  • docker net network create NETWORK-X --username=developer1 --password=****
    Reaches out to an infrastructure server which verifies credentials and creates a concept of network that belongs only to that user.
  • docker net endpoint create NETWORK-X ENDPOINT-A --username=.. --password=..
    Reaches out to an infrastructure server which verifies credentials and creates an endpoint record called ENDPOINT-A that belongs to NETWORK-X. The infrastructure server may decide on an IP address using IPAM and a MAC address for ENDPOINT-A.
  • docker net endpoint add-acl ENDPOINT-A ACL_PROFILE --username=.. --password=..
    Associates an ACL profile (which can specify things like 'allow only http traffic to this endpoint') with ENDPOINT-A.
  • docker run --username=.. --password=.. --interface=ENDPOINT-A,ENDPOINT-B -d ubuntu COMMAND
    The plugin receives the ENDPOINT-A, ENDPOINT-B strings and contacts the infrastructure server, which in turn creates overlay/underlay networking, provides back the IP address and MAC address to use for the endpoints, and decides which host to place the container on.

If the proposal in question allows extension of the 'docker net' tool for different drivers/plugins,
then multi-tenant networking can be achieved.

@jainvipin

jainvipin commented Jan 9, 2015

overall – a very good proposal!

Quick thoughts and comments:

Concepts:

  • State: is the interface/api going to be generic enough to adapt this to work with etcd (if I were to use kubernetes, for example)?
  • How does an interface (linux netdev interface) relate to the endpoint – it seems like 1:1 mapping between the two.
  • Should the ip-allocator also be described?

CNM:

  • It is perhaps worth clarifying the nature of traffic separation achieved by a ‘network’. A segment is often interpreted as a bridged network or a subset of a bridged network, i.e. all containers belonging to a bridged network are allocated IP addresses from one subnet. This is different from routing separation (VRFs), which allows IP addresses to overlap between two disjoint routed networks. Later in the text, ‘freely’ is used to describe intra-network communication. I am trying to understand the nature of the restriction between two containers in different networks – 1) they can’t talk to each other at all, 2) they can talk but via an external network, or 3) only when exposed by the driver

Workflow for a network driver:

  • Replays by the driver: this is desired behavior. However, I am trying to understand the primary motivation in providing this API – 1) handling the host bootup scenario, 2) the driver/docker restarting, 3) the driver being invoked after networks/endpoints are created, or 4) all of the above
  • During replay, it is not clear what the state of the driver is during bootup: is it the state that the driver had saved earlier (prior to reboot), or is it the ‘desired state’ of the networks and the endpoints within them? Is the state ‘local’ or does it belong to the entire cluster?

Network abstract interface:

  • Network has Id() and List() methods, so it either represents a network or a set of networks. On the other hand, it may make sense to have a method called Endpoints() in the Network interface.
  • Do we also need APIs to add/delete routes in the network?
  • Do we also need an API to allow the driver to manipulate networking configuration (netdev parameters, like what is exposed by the ip tool) in the container’s (sandbox’s) namespace?

docker net tool:

  • Join/leave operations: wouldn’t it be too late for a container to join the network after it has been created? The network should be primed before the application comes up, i.e. the IP is assigned, glued to the underlying network, and the service is exposed to external networks. This should all be done before the container is brought up, just as a volume that a container maps to should be ready before the container starts.

@thewmf

thewmf commented Jan 9, 2015

I am strongly in favor of these axioms and the rest looks pretty good but it will take some time to fully understand. The fact that each container will have multiple IP addresses will ripple through service discovery since you need to use a different IP depending on where you're coming from.

@shettyg There's already authentication between docker and dockerd so we should not need networking-specific auth. Multi-tenant details should mostly be left to each driver, but I think there are some common-sense solutions here; e.g. the network ID namespace is implicitly scoped per tenant and is not globally visible, but inter-tenant communication can be allowed if both sides opt in.

@erikh

Contributor

erikh commented Jan 9, 2015

@mrunalp https://github.com/shykes/docker/tree/extensions/extensions/simplebridge is most of it. The rest of the branch is set up to support it.

@erikh

Contributor

erikh commented Jan 9, 2015

@thockin we're working on a way to deal with parameters, they're just not quite ready yet.

@shettyg

shettyg commented Jan 9, 2015

@thewmf
You said:

There's already authentication between docker and dockerd so we should not need networking-specific auth. Multi-tenant details should mostly be left to each driver, but I think there are some common-sense solutions here; e.g. the network ID namespace is implicitly scoped per tenant and is not globally visible, but inter-tenant communication can be allowed if both sides opt in.

Maybe I am missing something, so do correct me. Assume there is a cloud provider 'X' that wants to let its multiple customers run docker containers on its infrastructure. So 'X' needs to create a network plugin which is specific to its infrastructure. Customer1 (dev) has credentials with docker but not with the infrastructure provider's network controller. You will let Customer1 create a network-X (using 'docker net create') with the network controller and start his containers on that network.

Now Customer2 (QA) can come in and ask to create his containers on network-X too. You need to prevent that. What I am saying is that the plugin should be able to get additional CLI parameters beyond what is needed for the single-customer case. So if the 'docker net' tool allows additional CLI parameters to be passed to the plugin hiding beneath (so 'X' will need one set of key-value pairs and 'Y' may need a different set), it will help a customer-specific workflow. For example, 'docker net' may not have an 'add-acl' CLI command in its default driver, but an infrastructure provider may want to have that command for people using its infrastructure. So its customers should be able to pass additional commands that only its plugin understands.

@erikh

Contributor

erikh commented Jan 9, 2015

Guys, just letting you know that I'm watching and preparing responses to some of the questions in this thread. I hope to have them ready tomorrow or Monday depending on availability. I'm also keeping a list of things requested so we can discuss and review them at a later point, especially to the questions I don't have answers to.

@ibuildthecloud

Contributor

ibuildthecloud commented Jan 9, 2015

It seems like List() []string should be on Driver and not Network. From the description this seems to be a list of network IDs. The Driver can provide the list of available network IDs. Along the same lines, should there be a ListEndpoints() on Network? Is there ever a need for Docker to enumerate the list of all endpoints?

@NetCubist

NetCubist commented Jan 9, 2015

I think there might be some confusion around "sandbox". Is the sandbox the netns or is it the docker container itself?
What is the use case for supporting multiple endpoints (for the same container) in the same network?
As I understand it, the identifier (string name) for an endpoint could be used for service discovery using the network itself. Is that the intention? This would need to be scoped on a per-tenant basis really (not per network). How will we avoid collisions? How will it tie in to DNS lookups?
Can a user choose to skip the "default" network completely? Yes, I suppose.

The proposal mentions "iptables rules"; can you clarify how and when iptables will be used? Assuming other networks want to populate the iptables rules as well, is there a way to coordinate or is that not an issue?


@NetCubist

NetCubist commented Jan 9, 2015

Maybe I missed it, but what is the proposal around policies? Firewall, QoS, etc.?


@LK4D4

Contributor

LK4D4 commented Jan 10, 2015

Also, when are endpoints supposed to be removed? Can an endpoint exist without a sandbox?
How can a driver mutate state if it is passed only in Restore?
And it seems like the sandbox is not very abstract from the network, but maybe this is not a problem.

@jainvipin

jainvipin commented Jan 10, 2015

@NetCubist, @shettyg - I am not sure why QoS/ACLs need to be brought in as an API. Of course that has to be handled by the driver based on the network name, or a label/network-profile. It is not something that the driver APIs need to expose, unless you believe that the ACL/QoS rules (tc, iptables, or OVS rules) must be communicated using Docker APIs.

Note that I already asked earlier about the ability to run commands within the netns of the container; as long as the driver can do that, it can do all the operations on the netdev from the driver.

Feel free to correct me if I understood you wrong on this.


@shettyg

shettyg commented Jan 10, 2015

@jainvipin - Is an ACL per network or QoS per network useful? Don't you want to add an ACL (or QoS or firewall or port mirroring) per endpoint? When one endpoint could be a webserver and another endpoint of the same network could be a database, I see them having different needs. Maybe doing it per network is good enough for the majority of use cases in the container world. If that is the case, you are right.

Also, I agree with you that QoS and ACLs etc. are just labels that the driver can interpret any way it chooses. With the assumption that we need to associate that label per endpoint (and not per network), I was saying that we can pass it to the driver through a docker-network command extension.

Note that based on the code below, I have been assuming that 'add network', 'del network', 'add endpoint', 'del endpoint' etc. are part of the docker-network binary and not part of docker itself.
https://github.com/docker/docker-network/blob/master/main.go

I have also been assuming that different plugins will be part of the docker-network project, similar to what docker machine does here with DigitalOcean, AWS, Azure, etc., and can take additional command-line options.
https://github.com/docker/machine

If different plugins will be independent binaries and not part of the docker-network project, then my points are moot.

@shettyg

shettyg commented Jan 10, 2015

@MalteJ
You said:

I would like to realize that via a docker run network-config string that is passed to the driver. e.g. docker run -d --network-config="role=frontend,ipv6=true" nginx.
The Driver API specification should not define the format of this string. It should be up to the driver to specify the format. This way you can realize various configuration strategies. e.g. you could pass a UUID of some central configuration, a JSON string, an http URL, whatever you can imagine.

I think what you suggest does provide flexibility and I can see how it can be useful.

@jainvipin

jainvipin commented Jan 10, 2015

@shettyg - There is a disconnect, I think. Network drivers, like other plugins (docker-extensions), are notified of events as described by the driver interface mentioned above, so they can offer a specific backend implementation for those events.

@jainvipin - Is a ACL per network or QoS per network useful? Don't you want to add an ACL (or Qos or firewall or port mirroring) per endpoint? When one endpoint could be a webserver and another endpoint of the same network could be a database, I see them having different needs. May be doing it per network is good enough for majority of the use cases of the container world. If that is the case, you are right.

I wasn't suggesting having these rules per network - the policies IMO have to be instantiated per endpoint in a network for them to be useful. Regardless, I was suggesting that this can be done in the driver/plugin, which has hooks into docker via appropriate APIs that are triggered by docker upon the relevant events; in this case that would be the following API described above:

// Create an endpoint for a container with a name, and place it in the sandbox.
Link(netid, name string, sb sandbox.Sandbox, replace bool) (Endpoint, error)

With regard to how the user indicates the policies associated with the container:

Also I agree with you that QoS and ACL etc are just labels that the driver can interpret anyway it chooses. With the assumption that we need to associate that label per end-point (and not network), I was saying that we can pass it to the driver through a docker-network command extension.

Instead of passing them as explicit acl/qos profiles, I was suggesting that this can be done using what @ibuildthecloud suggested in #9882 and let driver handle the specifics of qos or acl, or whatever and keep the infrastructure generic.


@thaJeztah

Member

thaJeztah commented Jan 10, 2015

Instead of passing them as explicit acl/qos profiles, I was suggesting that this can be done using what @ibuildthecloud suggested in #9882 and let driver handle the specifics of qos or acl, or whatever and keep the infrastructure generic.

I like that idea. Using the example that @MalteJ provided earlier;

--network-config="role=frontend,ipv6=true"

While possible (also with #9882) to include all parameters in a single value/string, I think the possibility should be kept open to provide the --network-config multiple times;

 --network-config role=frontend --network-config ipv6=true

Or even, using the standard meta-data flag of #9882;

  --label role=frontend --label ipv6=true

The name of those labels and their meaning is defined by the network-plugin (not by Docker itself). And to prevent collisions, they can be namespaced. Namespacing is not required, just a best practice;

  --label foobarNetWorking.role=frontend --label foobarNetWorking.ipv6=true

@thockin

Contributor

thockin commented Jan 10, 2015

I urge the designers here to not underestimate the value of opaque
interfaces, even at the cost of code duplication or inconsistencies between
plugins.

If the API is too fine grained and prescriptive, it makes the plugins hard
to implement and full of awkward corner cases.

Start simple, and let the common patterns emerge. Capture those in v2.

@MalteJ

Contributor

MalteJ commented Jan 10, 2015

@thockin full ack

@thaJeztah

Member

thaJeztah commented Jan 10, 2015

@thockin I fully agree. I attempted to include that in my comment, but just to be clear; the only thing Docker itself would facilitate is the option to pass --network-opts[] / --label[]. What's in them is up to the plugin-developers to decide. I'll remove the "Interchangeability Workgroup" part, which was more of a sneaky note and only confuses things.

@MalteJ

Contributor

MalteJ commented Jan 11, 2015

@erikh I don't think I have got it, yet.
Could you elaborate which part of the networking logic will stay in Docker and which will be pushed into the driver? Where is the line? And why?

Due to the existence of the Restore method in the driver API and the network management methods (AddNetwork, RemoveNetwork, GetNetwork) it looks like there is still plenty of networking logic in Docker. I am wondering why?

@shettyg

shettyg commented Jan 12, 2015

@jainvipin
You wrote:

Instead of passing them as explicit acl/qos profiles, I was suggesting that this can be done using what >@ibuildthecloud suggested in #9882 and let driver handle the specifics of qos or acl, or whatever and >keep the infrastructure generic.

That would work too, I think. I haven't yet wrapped my head around the entire life-cycle with labels being discussed in multiple proposals. Are you suggesting that (you are likely not), in Dockerfile, a network specific label could be added which is passed onto the network driver along with netid for Link()? So this makes an assumption that the Dockerfile writer is also the person that deploys the application and that is not always true (he could have been just an application writer), right?

Thinking out loudly: If I imagine 'docker-network' binary being used by an ops/dev-ops guy to create network, create endpoint etc, he would also be the one creating the label. I agree that it need not be explicit acl, qos profile (they can be created and associated with an end-point without going through docker-network binary). It can be just a generic label passed through 'docker run'.

So my earlier simplistic workflow about add-acl, authentication etc can be made without using docker-network api or extending it for additional cli commands.

@celebdor

celebdor commented Jan 12, 2015

Great proposal!

I agree with @shettyg about ACLs and QoS being very relevant to multi-tenant
scenarios. I think, though, as @jainvipin said, that this is something that
should be handled not by 'docker net' but by the networking layer manager, i.e.,
whichever networking solution is backing the driver/plugin.

So, for example, if 'simplebridge' wants to provide QoS (which it probably does
not need to), its backing network solution being the traditional Linux stack,
it can just convert a label of the kind

--label role=database

into a tc filter that puts the endpoint into the traffic class (for
example, an HTB or HFSC class) that the admin has prepared for databases, or
directly accept rates and policies for tc filters as labels.

Adding QoS to an endpoint, or ACLs to endpoints and networks, could be handled
with labels as per @thaJeztah's examples when wishing to do it at docker
command execution time; otherwise it can be altered and managed at the solution
provider level.

Regarding Network.List as a way of listing all the networks, I think
it'd be more natural to have it in the Driver interface.
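
As a purely illustrative sketch of that mapping: a driver could translate role=database into a tc filter on its bridge, assuming the admin has already set up an HTB qdisc with per-role classes. The device name, class handles, and helper below are assumptions, not anything this proposal defines:

```go
package main

import (
	"fmt"
	"os/exec"
)

// classForRole maps a role label to an HTB class the admin is assumed to have
// pre-provisioned on the bridge (qdisc handle 1:, classes 1:10, 1:20, ...).
var classForRole = map[string]string{
	"frontend": "1:10",
	"database": "1:20",
}

// applyRoleQoS installs a u32 filter on the bridge so traffic sourced from the
// endpoint's IP is steered into the class reserved for its role.
func applyRoleQoS(bridge, endpointIP, role string) error {
	class, ok := classForRole[role]
	if !ok {
		return fmt.Errorf("no traffic class configured for role %q", role)
	}
	return exec.Command("tc", "filter", "add", "dev", bridge,
		"parent", "1:", "protocol", "ip", "prio", "1",
		"u32", "match", "ip", "src", endpointIP+"/32",
		"flowid", class).Run()
}

func main() {
	// e.g. a container started with --label role=database, attached to docker0.
	if err := applyRoleQoS("docker0", "172.17.0.5", "database"); err != nil {
		fmt.Println("applying QoS failed:", err)
	}
}
```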

@timothysc timothysc referenced this issue in mesosphere/kubernetes-mesos Jan 15, 2015

Open

Networking TBD. #5

@liljenstolpe

liljenstolpe commented Jan 16, 2015

Overall, I like the proposal. I'm not sure that we want to prescribe that more than one network be created, or that networks are a policy boundary. @jainvipin has touched on this. I think what we would want is a description of what characteristics an endpoint wants (make it endpoint-based, not network-based). That could be IPv6 reachability, it could be a policy as to whom it wants to communicate with, it could be how much bandwidth it needs. Networks, segments, etc. are all tools that the plugin can use to render that list of policies or requirements.

@discordianfish

Contributor

discordianfish commented Jan 16, 2015

Hi,

I have a few questions:

  • If we're using libpack, how do we replicate the state in a safe manner across all participating docker hosts?
  • If we're using ecc, will that be reused by swarm? Or would people running swarm and docker have two data stores, one in swarm and one in the docker networking plugin?
  • I assume simplebridge sets up the vxlan peers automatically, so your network spans multiple hosts, right? How does it know about the other docker hosts?

@LK4D4

Contributor

LK4D4 commented Jan 16, 2015

@discordianfish libpack will be distributed; also, I believe there will be support for multiple storage backends, and I hope swarm will support multiple backends too.
About vxlan: it will know from that distributed storage :)

@discordianfish

Contributor

discordianfish commented Jan 16, 2015

@LK4D4 So on a Docker cluster with reworked networking and clustering, I will have two instances of some replicated data store, one for swarm and one for networking? Or can there be just one instance that is shared for both the cluster as well as the networking state?

@LK4D4

Contributor

LK4D4 commented Jan 16, 2015

@discordianfish it can be one instance, of course, or this loses all sense :)

@discordianfish

Contributor

discordianfish commented Jan 16, 2015

Well, this was clear from the proposal to me. So this proposal assumes some sort of replicated data store from which it can also get a list of the other docker hosts for cross-host networking. Is there a proposal on introducing that data store? It seems to me like that is the first step before something like this can be implemented.

@LK4D4

Contributor

LK4D4 commented Jan 16, 2015

@discordianfish Yes, if we want to support multi-host without multicast, we need storage. But the idea of this proposal was not multi-host, I think.

@erikh

Contributor

erikh commented Jan 19, 2015

@discordianfish @vieux and I are talking about how swarm and docker networking could cooperate but there's no immediate plan to share this. I will let you know as I know more.

@erikh

Contributor

erikh commented Jan 19, 2015

@discordianfish libpack intends to have a distributed implementation, but we're still figuring out if libpack is the right tool for us.

@erikh

Contributor

erikh commented Jan 19, 2015

I've added a diagram which goes into how Extensions, Drivers, Networks and Endpoints relate to each other. Please inform me if this is insufficient.

@erikh

Contributor

erikh commented Jan 19, 2015

Hello again folks,

For starters, please review the proposal again for those of you heavily invested. I have made several updates based on your comments which I hope will clarify things. There are quite a few edits, so I will not make note of them here. If you were not mentioned, please, please check the proposal to see if I answered your questions.

@shettyg to answer your questions, we have no intention of mandating ACL, QoS, or other multi-tenancy solutions for drivers at this time. These things will be expected to be implemented by custom drivers with custom configuration.

@MalteJ I really like this flags/configuration system and am working on a way to incorporate it.

Someone asked if Ports were exposed by default. Our feeling is that expose becomes less of a network solution and more of a service discovery mechanism, e.g., for finding your redis port or host, etc. We intend to implement a REST API at a link-local IPv6/IPv4 address which should satisfy the service discovery portion. We're also looking at using A and SRV records. There's a large possibility we will implement both (see https://github.com/docker/dnsserver for a prototype of how DNS might work). This will supplant the links feature in a useful way, hopefully.

As for compose, machine, and swarm integration, I would like to point most of you at @bfirsh or @shykes for now, but realize we have no plans to directly integrate with them immediately; we are, however, evaluating the notion of sharing a state/discovery implementation with the swarm project. I will have more answers on discovery and shared state in a week or two.

I hope this helps! Please let me know if you have additional questions.
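
For the SRV-record idea mentioned above, the container-side lookup could be as simple as the following sketch; the service name "redis" and the "default.docker" zone are illustrative assumptions, not names defined by the proposal:

```go
package main

import (
	"fmt"
	"net"
)

// A minimal sketch of SRV-based service discovery: resolve the redis service
// on the (hypothetical) default network's DNS zone and print each endpoint.
func main() {
	// Looks up _redis._tcp.default.docker against the configured resolver.
	cname, addrs, err := net.LookupSRV("redis", "tcp", "default.docker")
	if err != nil {
		fmt.Println("SRV lookup failed:", err)
		return
	}
	fmt.Println("canonical name:", cname)
	for _, srv := range addrs {
		// Each SRV record carries the target host and port of one endpoint.
		fmt.Printf("endpoint %s:%d (priority %d, weight %d)\n",
			srv.Target, srv.Port, srv.Priority, srv.Weight)
	}
}
```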

@thockin

Contributor

thockin commented Jan 20, 2015

Collected thoughts as I read again:

  1. "Multiple networks ... segment traffic ... should be supported by all
    drivers" - I am still wrestling with a mental model of a dumb network where
    I just want to expose network interfaces with real IP addresses (e.g.
    ipvlan) - you said "should be supported", so I am assuming it is technically
    OK if segmenting is NOT supported?

  2. I don't see how this proposal simplifies the endpoint-per-pod model that
    we (Kubernetes) use. You specifically say that docker run will
    create an endpoint and join it to a sandbox. I suppose that leaves us
    doing what we do today, which is to create a dummy container with the net
    namespace (and endpoint) and use --net=container:id for all of the
    containers in the pod -- I was hoping this proposal would make that less
    fugly. As it is, we just have to explain it with "because that's how
    Docker allows us to model the concept". Can we / should we do better with
    this proposal?

  3. Network.Endpoints() - does that return a list of endpoints ON THIS HOST,
    or a global list?

  4. endpoint.Name() - does this need to be globally unique or just unique on
    a given host? For example, if this is the interface name, what happens if
    two hosts in the same Network use the same interface name?
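
To put names to questions 3 and 4, the shape under discussion is roughly the following; the signatures are an illustration of the question, not the proposal's definitive API, and the scope of each call is exactly what is being asked:

```go
package driver

// Endpoint is an addressable attachment of a sandbox to a network.
type Endpoint interface {
	// Name returns an identifier for the endpoint. Open question (4):
	// must this be unique across the whole Network, or only on the host
	// that created it (e.g. if it doubles as an interface name)?
	Name() string
}

// Network is a collection of endpoints that can reach each other.
type Network interface {
	// Endpoints lists the network's endpoints. Open question (3): is this
	// scoped to the local host, or is it a cluster-wide view backed by the
	// shared state store?
	Endpoints() []Endpoint
}
```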

@bboreham

Contributor

bboreham commented Jan 20, 2015

  1. Some questions on timing:

    A network driver will be asked at docker run

      • From a user's point of view, it is ideal if this happens before their container's entrypoint starts.
      • From an implementer's point of view, it is ideal if this happens after the process exists, so that a network namespace can be hung on it.

    Is it your intention to mandate both of these in the spec?

    If a user desires to join a container to several networks, implemented by drivers from different providers, how will these operations be ordered?

  2. When a container (sandbox) ceases to exist, how does the driver find out? Will Unlink() be called for all endpoints?
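
Read as code, the timing question above is about where the following calls land in the container lifecycle. Link and Unlink are the names used in this thread; the SandboxRef type and the ordering notes are assumptions, not settled spec:

```go
package driver

// SandboxRef identifies the container's network namespace; how it is passed
// (a netns path, a PID, or an opaque key) is one of the details being asked about.
type SandboxRef string

// EndpointLifecycle sketches the two hooks under discussion.
type EndpointLifecycle interface {
	// Link would run after the container process (and hence its network
	// namespace) exists, but before the entrypoint starts, satisfying both
	// ordering constraints listed above. How multiple drivers from
	// different providers are ordered relative to each other is still open.
	Link(sandbox SandboxRef) error

	// Unlink would be called for every endpoint when the sandbox is
	// destroyed, which is how a driver would learn the container is gone.
	Unlink(sandbox SandboxRef) error
}
```
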
@erikh

Contributor

erikh commented Jan 20, 2015

This is already the way it works now, so I presumed it and overlooked it in the proposal.

Sorry about that! I will correct it now.

@bboreham

Contributor

bboreham commented Jan 27, 2015

Currently, Docker writes the IP address it has allocated for a container into the container's /etc/hosts file. How will this be achieved in the new model?

Related: when a container is attached to several networks, which address will go first in /etc/hosts? Some programs take this as 'my' IP address.

Disclosure: I work on Weave.

@bpradipt

bpradipt commented Feb 17, 2015

@erikh, I would like to get clarity on a few questions, primarily from a usage point of view:

  1. Currently the docker daemon either starts with the virtual NAT bridge by default, or can use any pre-created Linux bridge. Will this behaviour change?
  2. Can I provide a specific Open vSwitch (OVS) bridge when starting the docker daemon with the proposed changes?
  3. What will be the high-level steps to run and attach a container to a specific network?
    Will it be something like
    • run containers with --net='none' and then join the container to a network (docker net join ...),
      or
    • run containers with --net=network-label, which attaches the container to the specific network defined with 'network-label',
      or both?

@fmzhen

Contributor

fmzhen commented Mar 24, 2015

I am very interested in the networking plugin. How's it going now?

@jessfraz

Contributor

jessfraz commented Jul 10, 2015

closed by libnetwork

@junneyang

junneyang commented Dec 30, 2016

@shettyg
Very good suggestion, but I think it's hard to accomplish this goal; multi-tenancy needs much more work, and Kubernetes doesn't support it either, since Kubernetes simplifies the network model as well.

A question:
if we comply with the Kubernetes networking norm, how do we achieve the goal of multi-tenancy?

Thanks very much!

@thockin

Contributor

thockin commented Jan 3, 2017
