Proposal: Network Drivers #8952
Comments
Broken link at end of Driver API section should be https://github.com/openvswitch/ovs/blob/master/WHY-OVS.md
Updated. Thanks @billsmith
This is a great idea! One warning: I think specifying the API in terms of the method of creating the topology is a bad idea. The initial strawman for the API was:
A better approach would be for the interface to be in terms of the desired connectivity to be achieved, and to allow the plugin to sort it out. For example, rather than saying "I want a bridge", say "I want this container to talk to these containers". For OVS, this turns into "I want a bridge", while for L3 approaches (say Project Calico), this turns into "I want appropriate ACLs". Basically, I think this approach for network drivers should be broader than just switches. Switches are great and I want to support them, but let's not rule out alternative approaches to building the network topologies.
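To make the contrast concrete, here is a rough Go sketch of the two API styles being discussed; the interface and method names are purely hypothetical and not part of any Docker code:

```go
// Hypothetical sketch of the two API styles; none of these types exist in Docker.
package driver

// Topology-centric: the caller dictates the mechanism (e.g. "create a bridge").
type TopologyDriver interface {
	CreateBridge(name string) error
	AttachContainer(bridge, containerID string) error
}

// Connectivity-centric: the caller states the desired outcome and the driver
// decides whether that means a bridge, L3 routes, ACLs, or something else.
type ConnectivityDriver interface {
	AllowConnectivity(containerID string, peers []string) error
	RevokeConnectivity(containerID string, peers []string) error
}
```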
Does this proposal mean that all drivers will continue to live in the docker codebase? That seems to be suggested by the sentence "From a user perspective, the network driver would be chosen by specifying a flag when running the docker daemon". In practice, only a small set of network drivers will be blessed by inclusion in docker, and it will be hard for users to try out alternative network driver implementations. If instead the network driver API is exposed through a true plug-in system, then an ecosystem of driver implementations can thrive. For the same reason, the details of the API should not assume a particular style of driver. Furthermore, the driver should be selected on a per-container basis. For example, a user might want some containers on a single host to use an Open vSwitch-based driver (for high-performance virtualized networks), others to use a weave-based driver (in order to have an encrypted overlay network crossing firewalls), and others to use the traditional Linux bridge-based docker networking (because they don't need anything more sophisticated).
@Lukasa great point. this api is just a starter for 10 - i'd really like the api to be defined by interested parties :) @dpw i think we'll be basing this atop #8968 so drivers don't have to live in the docker code base. To follow docker's batteries included ethos it would make sense to have a sensible default in tree. I only found out about the plugin proposal today though ;) As for driver on a per-container basis, I'll defer to others to see if this is something we should consider. I'm wary of pushing something like that in a
@dave-tucker Awesome, I'd love to be a voice in this discussion. I'm tentatively open to being a guinea pig for proposed APIs as well if we decide we need to workshop this. Per-container drivers might be a little trickier, but I'm interested in seeing if a straw man can be proposed. I'll have a think about a straw man ideal API from where I'm standing, and propose it back.
I think there is room to be opinionated here on the design in the early days so as to avoid the pathological effects of injecting all kinds of state-distribution issues into the system. Others have mentioned that these efforts should be in line with both the plugin architecture (#8968) and the work going on with clustering (#8859). There should be nothing preventing arbitrary composition of services, but we can imagine that there are patterns of composition (spatial and temporal) which are necessary to meet emerging data-flow patterns and micro-service design. As mentioned, we should not have to worry about the underlying implementation and configuration mechanisms to assemble the graph of services, whether applying a model such as one app per container, full stack (not a good idea), or the assembly of containers into a higher collection (i.e. a pod).

From a high level I would love to see a generic "topology" abstraction, purposely not using "network" as topology infers some scope of spatial isomorphism. The topology class can be sub-classed into hints related to the composition of the graph of services, i.e. one-to-one, one-to-many, many-to-one, many-to-many. This would make the underlying configuration mechanisms and higher-level orchestration functions more intelligent about the patterns of communication, which can be used to add constraints to scheduling decisions. Quite possibly this could be a very simple interface addition to a Docker/Libswarm verb, such as: docker create topology -type one2one -name spark. All of the identifier assignments (MAC, IP, VID, A record, etc.) would be available through a service discovery API (think consul, etcd, etc.).

The decisions about the intra-host and inter-host IPC capabilities need to be formalized under constraints imposed by use-case analysis. For instance, if mobility is a first-class function in Docker, then managing IP bindings becomes critical, for which the options of L2 bridging (scaling challenges) or IP-in-IP encapsulation such as LISP (state-distribution challenges) need to be considered. If we are going down the path of a broker/proxy based technology implementation like Weave, we might as well look at more robust cluster-based IPC mechanisms such as libchan over sockets, which might be more valuable to application designers than stitching together tunnel endpoints. -g
some thoughts..
@All This thread and #8951 do a good job of laying out the approaches to integrate advanced networking models with docker. And as @lexlapax summarizes, these seem to be boiling down to two broad approaches, viz.
Since we still seem to be weighing the proposals, I want to add that while these two approaches can be thought to co-exist, there are potentially different implications wrt the design and implementation of docker as a scalable and lightweight container management/launching infrastructure, so it might not be desirable to support both but just do one right! IMO what can be achieved by one of the proposed approaches can potentially be achieved by the other approach as well. So instead of listing the differences I would like to list the similarities in capabilities of the two approaches in an attempt to make it easier to compare:
Conclusion: I personally favor [2] mainly because it keeps the docker implementation pretty simple and completely offloads the complexity of networking management to the external network orchestrator, which needs to be involved in either approach anyway. However, [1] might be preferable if a driver/plugin based approach as proposed by #8968 is adopted by docker in general. I would love to hear back if there are other considerations where one approach might be favored over the other, or if I am missing something obvious in my understanding. This is a relevant discussion wrt shaping up network integration for docker and I would love to be involved.
@mrunalp Do you have a link to the Go-based OVSDB library that you mention?
@mrunalp https://github.com/socketplane/libovsdb it's still alpha quality, but we're working on that :)
@dave-tucker No worries :)
@dave-tucker, ah I might be missing something but I thought this proposal requires a network configuration to be passed down to docker, which lets it provision the container network (ovs; or linux bridge + iptables etc) by calling into the driver's API (like

My understanding of the proposal is that in order to be able to address complex network scenarios like making use of underlays (vlans) or overlays or just L3 and ACLs based on the user's network case, we want to abstract it under the driver layer. But won't this require the network configuration to be prepared by an orchestrator and then transparently passed to the driver (that understands that config) through

BTW I might not have been clear, but depending on the complexity of the network, orchestrator might just be a fancy term for a bash script or a human, say managing a single host network.
@mapuri concur on the notion of the orchestrator.. it could be anything, including scripts, chef/puppet/ansible, large scale container management frameworks, policy management frameworks like congress or opflex etc..
Sorry, missing something here.. How do you do “multi-host” without coordination? If the call site for Docker functions only operates on a single host, either a single container or a collection of containers, then there needs to be some other call site that crosses different hosts.. I was under the impression that libswarm would take on this role: the Docker backend services would have concrete interfaces which would take advantage of the driver/plugin model proposals, but there should be a higher-level abstraction for multi-host. Like I said in my earlier post, if the composition of services requires knowledge of multiple hosts, i.e. Container B:Port 4001:Host 2 -> depends on -> Container A:Port 4000:Host 1, then Host 2 should have knowledge of this, possibly at initialization but definitely at runtime.. It would be great to be able to do late bindings and just discover services from a central registry when needed, but now we are back to the fact that you need some shared state. As far as I understand, OVSDB only has a local view of the host it operates on, and NSX pulls that along with other DBs to create a global view. What is the expectation here, are we relying on the network community to provide tools for this?
@mapuri if we are including humans as orchestrators your point is valid. @gaberger multi-host is not this proposal. That discussion is happening in #8951.
I have to say, discovery and connectivity, although different disciplines, go hand in hand, which sort of leads me to believe that whatever is connecting the "plumbing" together would probably benefit from knowledge of seeding discovery data.. and as such, it's either something like libswarm et al or something else altogether (possibly multiple somethings), but it's not the docker binary that should be doing this.. and hence, again, my preference for #8216. As the community decides how to get into the "orchestration and discovery" side of things and how best to do it from a docker perspective, the rest of the community still benefits from external tools that can do the connectivity and the discovery data seeding.. including using things like libovsdb outside of docker proper.
@dave-tucker my bad Dave, crossed streams.. Still, you should be careful here when trying to carve up these namespaces.. Services are overloaded onto interface addresses, which is one of the core issues here, but will save that discussion for #8591

@mapuri I haven't looked into the plugin layer but it seems to me this would require some dispatcher to live within the docker daemon in order to broker calls either to the existing engine or a third-party library like libovsdb
@dave-tucker, thanks for clarifying. That was my understanding as well while I was comparing #8952 with #8216 in my previous post. The state availed to the driver/plugin (either K/V or RPC calls) needs to be in a form that it understands, i.e. it can't be something generic and it will mostly be dictated by the orchestrator/controller that publishes it in the first place. For instance, a policy based framework might push policies to allow/prevent communication between two containers through ACL-like rules, while a simple vlan based ovs driver implementation might push ovsdb configuration to associate container interfaces with same/different vlans. This brings me back to the conclusion in my original post: if #8216 and #8952 compare equally in capabilities, do we see any other specific benefits that let us choose one approach over the other? I definitely see the simplicity of #8216 as a potential plus in its favor. @gaberger, yes agreed. With a plugin layer based approach I can see that the docker daemon will need to broker network namespace provisioning calls to a third-party/orchestrator specific plugin, if one is registered.
+1 to not making plugins built-in to docker. Running a separate plugin as a distinct daemon, or a set of exec() calls, or a set of http hooks means we don't need to hack on docker to experiment with ideas. +1 to a less concrete API - we should support a bridgeless mode (think SR-IOV).
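As a rough illustration of the "distinct daemon with HTTP hooks" idea, a minimal out-of-process plugin could look like the sketch below; the endpoint path, payload shape, and port are invented for illustration and are not part of Docker or the plugin proposal:

```go
// Minimal sketch of an out-of-process network plugin reached over HTTP hooks.
// The endpoint path, payload shape, and port are hypothetical.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

type endpointRequest struct {
	ContainerID string `json:"container_id"`
	Network     string `json:"network"`
}

func main() {
	http.HandleFunc("/v1/endpoint/create", func(w http.ResponseWriter, r *http.Request) {
		var req endpointRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// A real driver would create a veth/OVS port here; this stub only logs and acks.
		log.Printf("attach container %s to network %s", req.ContainerID, req.Network)
		w.WriteHeader(http.StatusNoContent)
	})
	log.Fatal(http.ListenAndServe("127.0.0.1:9999", nil))
}
```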
I personally favor #8216 over #8952 for the following reasons:
Having said that, there might be a use case for a simple network interface that allows containers to directly connect to a linux bridge or an OVS for quick experimentation and proof-of-concept type work. Hence a lightweight #8951 based on a linux bridge / OVS bridge might be useful. But #8216 should be the prime model in my opinion.
I agree with @joeswaminathan. There's already lots of work being done to manage the other aspects of this problem (kubernetes handles container scheduling/deployment/monitoring, rudder/flannel handles network address allocation and peer container connections via various methods, like direct routing/ovs/vxlan tunneling). There's no need to attempt to pull all that into a monolithic docker setup. All docker really needs is a way to specify network interfaces and what they are expected to attach to locally (i.e. internal network/external network), in some common nomenclature that external tools can use to properly manage the virtual cabling for that container. Docker itself doesn't need to become aware of the off-host infrastructure that it's living within.
@jainvipin should add #8997 to that list
@joeswaminathan described very well my concerns. Although the idea of having multiple backends to support different networking needs in Docker is for sure compelling, there are too many possibilities to fulfil. It seems better if Docker could provide the simplest networking plumbing by default and be more friendly to external tools that can do more complex networking plumbing.
+1
Agree, and this should be given higher priority.
There's an official proposal for networking drivers which can be found at #9983. This new proposal implements an architecture which has been discussed quite a bit. Implementing a proof of concept of the network drivers was also part of this effort. Should you discover something is confusing or missing from the new proposal, please feel free to comment. Questions and lengthy discussions are better suited to the #docker-network channel on freenode; if you just want to talk about this, that is a better place to have the conversation. We'd like to thank everyone who's provided input, especially those who've sent proposals. I will close this proposal now.
How to configure a DHCP server in Docker so that we can access the application running inside it with that IP?
Authors: @dave-tucker, @mavenugo and @nerdalert.
Problem Statement
We believe that networking in Docker should be driver-based, with multiple backends to cater for the various styles of networking. This would provide a great means for supporting alternative vSwitches like Open vSwitch, Snabb Switch, or even a twist on the existing Linux Bridge solution.
This is a companion proposal to #8951 as it will be based on the Open vSwitch backend provided here.
Solution
The current bridged networking in docker relies on Linux Bridge with iptables programming.
Linux Bridge is only one of many vSwitch implementations available for Linux. Our proposal is to introduce a driver framework alongside backends for the most popular vSwitch solutions today: Open vSwitch and Linux Bridge.
Driver API
Today, a lot of the networking configuration is handled within libcontainer.
In order to be compatible with a driver-based model, we propose moving all of the code that handles networking into Docker itself.
This allows us to create a configuration pipeline, with hooks, giving the drivers complete control over the network setup.
This could look as follows:
This means that when a container is created:
Having the driver as a part of Docker also gives us access to contextual information about a given container.
This would enable us to write metadata to a port (in cases where the vSwitch allows it) which is very useful for troubleshooting and debugging network issues.
The driver API allows the concrete driver to be purely implemented in the vSwitch or to be a combination of a vSwitch and other processes (e.g. Linux Bridge + iptables).
From a user perspective, the network driver would be chosen by specifying a flag when running the docker daemon. Otherwise, a sensible default will be picked, e.g. Open vSwitch if OVS is installed, falling back to Linux Bridge + iptables.
For the reasoning behind using OVS by default in place of Linux Bridge, please see https://github.com/openvswitch/ovs/blob/master/WHY-OVS.md.
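To give a feel for the shape such a driver API could take, here is a minimal Go sketch; all interface and method names are illustrative assumptions rather than a settled design:

```go
// Illustrative sketch only: these names are not part of any accepted Docker API.
package network

// Driver is one possible shape for a pluggable network backend with hooks
// around container network setup and teardown.
type Driver interface {
	// Init is called once when the daemon starts with this driver selected.
	Init(config map[string]string) error

	// CreateEndpoint allocates and attaches an interface for a container,
	// returning the host-side interface name.
	CreateEndpoint(containerID string, opts map[string]string) (ifaceName string, err error)

	// DeleteEndpoint releases the interface when the container stops.
	DeleteEndpoint(containerID string) error

	// SetPortMetadata writes contextual metadata to a port, where the
	// vSwitch supports it, to aid troubleshooting.
	SetPortMetadata(ifaceName string, metadata map[string]string) error
}
```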
Open vSwitch Backend
Today libcontainer has a number of network strategies for connecting containers to bridges. The `veth` strategy is used for bridged networks and, while this solution is widely compatible, it is not the most performant. See this blog post for a performance comparison of Linux Bridge and OVS. As such, the OVS driver will support:
This can be added as a new network strategy to allow code-sharing between drivers, or can be hard-coded into the Open vSwitch driver itself.
OVS configuration is done using the OVSDB protocol specified in RFC 7047.
As such, we have written an open source (Apache 2.0) OVSDB library for Go that can be consumed by Docker for this purpose.
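For a sense of what such a library speaks on the wire, the sketch below hand-builds an RFC 7047 "transact" request using only the standard library; it deliberately avoids assuming any particular client API, the bridge name is just an example, and a complete transaction would also update the root Open_vSwitch table to reference the new bridge:

```go
// Builds an OVSDB (RFC 7047) "transact" request by hand to show the wire
// format; this is an illustration, not the libovsdb API.
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	insertBridge := map[string]interface{}{
		"op":    "insert",
		"table": "Bridge",
		"row":   map[string]string{"name": "docker0-ovs"}, // example bridge name
	}
	request := map[string]interface{}{
		"method": "transact",
		// First param is the database name, followed by the operations.
		"params": []interface{}{"Open_vSwitch", insertBridge},
		"id":     0,
	}
	payload, _ := json.Marshal(request)
	fmt.Println(string(payload))
}
```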
Use of OVSDB is preferred over the Open vSwitch CLI (`ovs-vsctl`) commands as it gives:
Linux Bridge Backend
To maintain compatibility with the existing networking stack, we will also write a Linux Bridge driver.
This will operate in the same fashion as it does today, creating a veth pair unless an alternative network strategy is selected.
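For illustration, the host-side veth plumbing such a driver performs boils down to something like the sketch below, expressed by shelling out to iproute2; the interface and bridge names are examples only, and a real driver would also move the container end into the container's network namespace and assign addresses:

```go
// Sketch of the host-side veth plumbing a Linux Bridge driver performs.
// Interface and bridge names are examples only; requires root privileges.
package main

import (
	"fmt"
	"os/exec"
)

func run(args ...string) error {
	out, err := exec.Command(args[0], args[1:]...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("%v: %s", err, out)
	}
	return nil
}

func main() {
	cmds := [][]string{
		// Create the veth pair: one end stays on the host, the other is
		// later moved into the container's network namespace.
		{"ip", "link", "add", "veth0host", "type", "veth", "peer", "name", "veth0cont"},
		// Attach the host end to the bridge and bring it up.
		{"ip", "link", "set", "veth0host", "master", "docker0"},
		{"ip", "link", "set", "veth0host", "up"},
	}
	for _, c := range cmds {
		if err := run(c...); err != nil {
			fmt.Println("setup failed:", err)
			return
		}
	}
}
```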
Summary
By implementing a Bridged Network Driver framework in Docker we allow for many different implementations of vSwitches to easily integrate with Docker. This gives Docker users choices for performance, reliability and scale in production environments.
The work here should address the following issues: