Proposal: Native Docker Multi-Host Networking #8951
Native Docker Multi-Host Networking
TL;DR Practical SDN for Docker
Application virtualization will have a significant impact on the future of data center networks. Compute virtualization has driven the edge of the network into the server and, more specifically, the virtual switch. The compute workload efficiencies derived from Docker containers will dramatically increase the density of network requirements in the server. Scaling this density will require reliable network fundamentals, while also ensuring the developer has as much or as little interaction with the network as is desired.
A tightly coupled, native integration with Docker will ensure there is base functionality capable of integrating into the vast majority of data center network architectures today and help reduce the barriers to Docker adoption. Just as important for the diverse user base is making Docker networking dead simple to integrate, provision and troubleshoot.
The first step is a native Docker networking solution that can handle multi-host environments, scales to production requirements, and works well with existing network deployments and operations.
Though there are a few existing multi-host networking solutions, they are currently designed as over-the-top solutions layered on Docker that either:
The core of this proposal is to bring multi-host networking as a native part of Docker that handles most of the use-cases, scales and works well with the existing production network and operations. With this provided as a native Docker solution, every orchestration system can enjoy the benefits alike.
There are three ways to approach multi-host networking in docker:
The first option (NAT-based) works by hiding the containers behind a Docker host IP address. The TCP port exposed by a given Docker container is mapped to a unique port on the host machine.
Since the mapped host port has to be unique, containers using well-known port numbers are forced onto ephemeral ports. This adds complexity to network operations, network visibility, troubleshooting and deployment.
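As a toy illustration of the operational problem (the addresses and port range are made up), three containers that all expose DNS on port 53 end up behind three different ephemeral host ports:

```python
import itertools

# Hypothetical example: three DNS containers share one host IP, so each
# well-known port 53 must be remapped to a unique ephemeral host port.
ephemeral_ports = itertools.count(49153)

def map_port(host_ip, container, container_port):
    host_port = next(ephemeral_ports)
    return (host_ip, host_port, container, container_port)

mappings = [map_port("203.0.113.10", name, 53)
            for name in ("dns1", "dns2", "dns3")]
# The load-balancer (and any firewall/IDS behind it) now has to track
# three distinct host ports for what is logically one port-53 service.
```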
Consider, for example, configuring a front-end load-balancer for a DNS service hosted in a Docker cluster.
If you have firewalls or IDS/IPS devices behind the load-balancer, these also need to know which hosts and ephemeral port numbers the DNS service is mapped to.
The second option (IP-based) works by assigning unique IP addresses to the containers, thus avoiding the need for port mapping and solving the issues with downstream load-balancers and firewalls by using well-known ports in pre-determined subnets.
We are proposing a Native Multi-Host networking solution to Docker that handles various production-grade deployment scenarios and use cases.
The power of Docker is its simplicity, yet it scales to the demands of hyper-scale deployments. The same cannot be said today for the native networking solution in Docker. This proposal aims to bridge that gap. The intent is to implement a production-ready, reliable multi-host networking solution that is native to Docker while remaining laser-focused on the user-friendly needs of the developer environment that is at the heart of the Docker transformation.
The new edge of the network is the vSwitch. The virtual port density that application virtualization will drive is an even larger multiplier than the explosion of virtual ports created by OS virtualization. This will create port density far beyond anything to date. In order to scale, the network cannot be seen as merely the existing physical spine/leaf 2-tier physical network architecture, but must also incorporate the virtual edge. Having Docker natively incorporate clear, scalable architectures will avoid the all too common problem of the network blocking innovation.
1. Programmable vSwitch
To implement this solution we require a programmable vSwitch.
Our initial focus will be to develop an API to implement the primitives required of the vSwitch for multi-host networking with a focus on delivering an implementation for Open vSwitch first.
This link, WHY-OVS, covers the rationale for choosing OVS and why it is important to the Docker ecosystem and virtual networking as a whole. Open vSwitch has a mature kernel data-plane (upstream since 3.7) with a rich set of features that addresses the requirements of multi-host networking. In addition to the data-plane performance and functionality, Open vSwitch also has an integrated management-plane called OVSDB that abstracts the switch as a database for applications to make use of.
With this proposal the native implementation in Docker will:
2. Network Integration
The scenarios that we will deal with in this proposal range from the existing port-mapping solution, to VXLAN-based overlays, to native underlay network integration. There are real deployment scenarios for each of these use cases.
Facilitate the common application HA scenario of a service needing a 1:1 NAT mapping between the container's back-end IP address and a front-end IP address from a routable address pool. Alternatively, the containers can also be reachable globally, depending on the user's IP addressing strategy.
3. Flexible Addressing / IP Address Management (IPAM)
In a multi-host environment, IP Addressing Strategy becomes crucial. Some of the Use-cases, as we will see, will also require reasonable IPAM in place. This discussion will also lead to the production-grade scale requirements of Layer2 vs Layer3 networks.
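As a rough sketch of what "reasonable IPAM" could mean here (the class and CIDR are illustrative, not part of the proposal):

```python
import ipaddress

class SimpleIPAM:
    """Toy pool that hands out free addresses from one subnet."""
    def __init__(self, cidr):
        self.free = list(ipaddress.ip_network(cidr).hosts())
        self.allocated = {}

    def allocate(self, container_id):
        addr = self.free.pop(0)
        self.allocated[container_id] = addr
        return addr

    def release(self, container_id):
        self.free.append(self.allocated.pop(container_id))

pool = SimpleIPAM("10.1.0.0/24")
addr = pool.allocate("web-1")   # 10.1.0.1, the first usable address
```

A real implementation would additionally need to persist state and coordinate allocations across hosts, which is where the Layer2 vs Layer3 scale discussion comes in.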
4. Host Discovery
Single Host Network Deployment Scenarios
This is the native single-host Docker networking model as of today. It is the most basic scenario that the solution we are proposing must address seamlessly. This scenario brings the basic Open vSwitch integration into Docker, which we can build on for the multi-host scenarios that follow.
Figure - 1
This scenario adds a flexible addressing scheme to the basic single-host use case, where we can provide IP addressing from one of many different sources.
Figure - 2
Multi Host Network Deployment Scenarios
The following scenarios enable backend Docker containers to communicate with one another across multiple hosts. This fulfills the need for high-availability applications to survive beyond a single node failure.
For environments which need to abstract the physical network, overlay networks create a virtual datapath using supported tunneling encapsulations (VXLAN, GRE, etc.). It is just as important for these networks to be as reliable and consistent as the underlying network. Our experience leads us towards using a consistency protocol, such as a tenant-aware BGP, in order to achieve the worry-free environment developers and operators desire. This also presents an evolvable architecture if a tighter coupling into the native network is of value in the future.
The overlay datapath is provisioned between tunnel endpoints residing in the Docker host, which gives the appearance of all hosts within a given provider segment being directly connected to one another, as depicted in Figure - 3 below.
Figure - 3
As a new container comes online, its prefix is updated in the routing protocol, announcing its location via a tunnel endpoint. As the other Docker hosts receive the updates, a forwarding entry pointing at the tunnel endpoint behind which the container resides is installed into OVS. When the container is deprovisioned, a similar process occurs and the tunnel-endpoint Docker hosts remove the forwarding entry for the deprovisioned container.
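The announce/withdraw flow above can be sketched as a toy forwarding table keyed by container prefix (names and addresses are invented for illustration):

```python
# Each Docker host keeps a table mapping container prefixes to the remote
# tunnel endpoint (VTEP) that announced them.
class ForwardingTable:
    def __init__(self):
        self.routes = {}  # container prefix -> remote VTEP address

    def on_announce(self, prefix, vtep):
        # a route update arrived: install the forwarding entry
        self.routes[prefix] = vtep

    def on_withdraw(self, prefix):
        # the container went away: remove the forwarding entry
        self.routes.pop(prefix, None)

    def next_hop(self, prefix):
        return self.routes.get(prefix)

fib = ForwardingTable()
fib.on_announce("10.1.2.4/32", "192.168.0.12")  # container comes online
fib.on_withdraw("10.1.2.4/32")                  # container deprovisioned
```

In the OVS case these entries would be installed as datapath flows rather than a Python dict, but the control flow is the same.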
The backend can also simply be bridged into a network's broadcast domain and rely on upstream networking to provide reachability. Traditional L2 bridging has significant scaling issues, but it is still very common in data centers with flat VLAN architectures that facilitate live workload migrations of VMs.
This model is fairly critical for DC architectures that require a tight coupling of network and compute as opposed to a ships in the night design of overlays abstracting the physical network.
The underlay network integration can be designed with a specific network architecture in mind; hence we see models like Google Compute Engine, where every host is assigned a dedicated subnet and each pod gets an IP address from that subnet.
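A minimal sketch of that per-host subnet carving, assuming an illustrative cluster-wide /16:

```python
import ipaddress

# Carve one /24 per host out of an illustrative cluster-wide /16; each
# container on a host then draws its address from that host's subnet.
cluster = ipaddress.ip_network("10.244.0.0/16")
subnets = cluster.subnets(new_prefix=24)

assignments = {host: next(subnets)
               for host in ("host-a", "host-b", "host-c")}
# host-a -> 10.244.0.0/24, host-b -> 10.244.1.0/24, host-c -> 10.244.2.0/24
```

Because each host owns a whole subnet, the underlay only needs one route per host rather than one per container.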
Figure - 4 - One Dedicated Static Subnet per Host
The entire backend container space can be advertised into the underlying network for IP reachability. IPv6 is becoming attractive for many in this scenario due to v4 constraints.
By extending L3 to the true edge of the network, the vSwitch enables proven network scale while still retaining the ability to perform disaggregated network services at the edge. Extending gateway protocols to the host will play a significant role in scaling a tight coupling to the network architecture.
Alternatively, underlay integration can also provide flexible addressing combined with /32 host updates to the network in order to provide subnet flexibility.
Figure - 5
Implementing the above provides flexible, scalable multi-host networking as a native part of Docker. This implementation adds a strong networking foundation intent on providing an evolvable network architecture for the future.
This sounds good. What I am not seeing is the API and performance. How does one go about setting this up? How much does it hurt performance?
One of the things we are trying to do in GCE is drive container network perf -> native. veth is awful from a perf perspective. We're working on networking (what you call underlay) without veth and a vbridge at all.
I like the idea of underlay networking in Docker. The first question is: how much can be bundled by default? Does an ovs+vxlan solution make sense as a default, in replacement of veth + regular bridge? Or should they be reserved for opt-in plugins?
@thockin do you have opinions on the best system mechanism to use?
Ah. My experience is somewhat limited.
Google has made good use of OVS internally.
veth pair performance is awful and unlikely to get better.
I have not played with macvlan, but I understand it is ~wire speed, though a bit awkward to use.
We have a patch cooking that fills the need for macvlan-like perf without actually being VLAN (more like old-skool eth0:0 aliases).
If we're going to pick a default, I don't think OVS is the worst choice - it can't be worse perf than veth. But it's maybe more dependency heavy? Not sure.
@thockin @shykes Thanks for the comments.
OVS provides the flexibility of using VXLAN for overlay deployments or native network integration for underlay deployments without sacrificing performance or scale.
I haven't done much work with macvlan to give an answer on how it stacks up to an overall solution that includes functionality, manageability, performance, scale and network operations.
We believe that Native Docker networking solution should be flexible enough to accommodate L2, L3 and Overlay network architectures.
Hi Madhu, Dave and Team:
Definitely a wholesome view of the problem. Thanks for putting it out there. A few questions and comments (on both proposals  and , as they tie into each other quite a bit):
Comments and Questions on proposal on Native-Docker Multi-Host Networking:
[a] OVS Integration: The proposal to natively instantiate OVS from Docker is good.
Comments and Questions on proposal on ‘Network Drivers’:
[g] Multiple vNICs inside a container: Do the APIs proposed here (CreatePort) handle creation of multiple vNICs inside a container?
[h] Update to network configuration: Say a bridge is added with a VXLAN VNID or a VLAN; would your suggestion be to call 'InitBridge', or should this be done during PortCreate() if the VLAN/tunnel/other parameters needed for port creation do not exist?
[j] Driver API performance/scale requirements: It would be good to state an upfront design target for scale/performance.
As always, will be happy to collaborate on this with you and other developers.
@thockin on the macvlan performance, are there any published figures?
from an underlay integration standpoint, I'd imagine that having a bridge would be much easier to manage as you could trunk all vlans to the vswitch and place the container port in the appropriate vlan.... otherwise with a load of mac addresses loose on your underlay you'd need to configure your underlay edge switches to apply a vlan based on a mac address (which won't be known in advance).
I feel like i'm missing something though so please feel free to correct me if i haven't quite grokked the macvlan use case
@jainvipin thanks for the mega feedback. I think the answer to a lot of your questions lies in these simple statements. I firmly believe that all network configuration should be done natively, as a part of Docker. I also believe that
Orchestration systems populating netns and/or bridge details on the host, then asking Docker to plumb this in to the container doesn't seem right to me. I'd much rather see orchestration systems converge on, or create a driver in this framework (or one like it) that does the necessary configuration in Docker itself.
For multi-host, the Network Driver API will be extended to support the required primitives for programming the dataplane. This could take the form of OF datapath programming in the case of OVS, but it could also be adding plain old ip routes in the kernel. This is really up to the driver.
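To make the backend-agnostic idea concrete, here is a hypothetical sketch of such a driver interface; the method names are invented for illustration and are not the actual proposed API:

```python
from abc import ABC, abstractmethod

# Hypothetical driver interface; method names are invented for this sketch.
class NetworkDriver(ABC):
    @abstractmethod
    def init_bridge(self, name): ...

    @abstractmethod
    def create_port(self, bridge, container_id): ...

    @abstractmethod
    def program_route(self, prefix, next_hop): ...

class KernelRouteDriver(NetworkDriver):
    """A non-OVS backend that would simply add kernel IP routes."""
    def __init__(self):
        self.routes = []

    def init_bridge(self, name):
        self.bridge = name

    def create_port(self, bridge, container_id):
        return "veth-" + container_id[:8]

    def program_route(self, prefix, next_hop):
        # a real driver would run: ip route add <prefix> via <next_hop>
        self.routes.append((prefix, next_hop))

drv = KernelRouteDriver()
drv.init_bridge("docker0")
drv.program_route("10.1.2.0/24", "192.168.0.12")
```

An OVS driver would implement the same calls with OpenFlow/OVSDB programming instead; the caller in Docker would not change.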
To that end, all of the improvements we're suggesting here for multi-host are designed to be agnostic to the backend used to deliver them.
The caveat here is that Docker can not be everything to everyone, and the
Having networking be externalized with a clean plugin interface (i.e. exec)
On Tue, Nov 4, 2014 at 6:44 PM, Dave Tucker email@example.com
@dave-tucker There are trade-offs of pulling everything (management, data-plane, and control-plane) in docker. While you highlighted the advantages (and I agree with some as indicated in my comment), I was noting a few disadvantages (versioning/compatibility, inefficiency, docker performance, etc.) so we can weigh it better. This is based on my understanding of things reading the proposal (no experimentation yet).
In contrast, if we can incorporate a small change (#8216) in docker, it can perhaps give scheduler/orchestrator/controller a good way to spawn the containers while allowing them to do networking related things themselves, and not have to move all networking natively inside docker – IMHO a good balance for what the pain point is and yet not make docker very heavy.
'docker run' has about 20-25 options now, and some of them provide further options (e.g. '-a' or '--security-opt'). I don't think it will remain at 25 in the near term; it is likely to grow rapidly into a flat, unstructured set. The growth would come from valid use cases (networking or non-networking), but must we consider solving that problem in this proposal?
I think libswarm can work with either of the two models, where an orchestrator has to play a role of spawning ‘swarmd’ with appropriate network glue points.
What about weave (https://github.com/zettio/weave)? Weave provides a very convenient SDN solution for Docker from my point of view. And it provides encryption out of the box, which is a true plus; it is the only solution with out-of-the-box encryption we have found on the open-source market so far.
Nevertheless, weave's impact on network performance for HTTP-based, REST-like protocols is substantial: about 30% performance loss for small message sizes (< 1,000 bytes) and up to 70% performance loss for big message sizes (> 200,000 bytes). Performance losses were measured for the indicators time per request, transfer rate and requests per second, using apachebench against a simple ping-pong system exchanging data over an HTTP-based, REST-like protocol.
We are writing a paper for the next CLOSER conference to present our performance results. There are some options to optimize weave performance (e.g. not containerizing the weave router should bring 10% to 15% performance plus according to our data).
At the same time, Docker will always have a default. Ideally that default should be enough for 80% of use cases, with plugins as a solution for the rest. When I ask about OVS as a viable default, it's in the context of this "batteries included but removable" model.
It's really exciting to see this proposal for Docker! The lack of multi-host networking has been a glaring gap in Docker's solution for a while now.
I just want to quickly propose an alternative, lighter-weight model that my colleagues and I have been working on. The OVS approach proposed here is great if it's necessary to put containers in layer 2 broadcast domains, but it's not immediately clear to me that this will be necessary for the majority of containerized workloads.
An alternative approach is to pursue network virtualization at Layer 3. A good reference example is Project Calico. This approach uses BGP and ACLs to route traffic between endpoints (in this case containers). This is a much lighter-weight approach, so long as you can accept certain limitations: IP only, and no IP address overlap. Both of these feel like extremely reasonable limitations for a default Docker case.
We've prototyped Calico's approach with Docker, and it works perfectly, so the approach is simple to implement for Docker.
Docker is in a unique position to take advantage of lighter-weight approaches to virtual networking because it doesn't have the legacy weight of hypervisor approaches. It would be a shame to simply follow the path laid by hypervisors without evaluating alternative approaches.
(NB: I spotted #8952 and will comment there as well, I'd like the Calico approach to be viable for integration with Docker regardless of whether it's the default.)
I have some simple opinions here but they may be misguided, so please feel free to correct my assumptions. Sorry if this seems overly simplistic but plenty of this is very new to me, so I’ll focus on how I think this should fit into docker instead. I’m not entirely sure what you wanted me to weigh in on @shykes, so I’m trying to cover everything from a design angle.
I’ll weigh in on the nitty-gritty of the architecture after some more experimentation with openvswitch (you know, when I have a clue :).
After some consideration, I think weave, or something like it, should be the default networking system in docker. While this may ruffle some feathers, we absolutely have to support the simple use case. I think it’s safe to say developers don’t care about openvswitch, they care that they can start postgres and rails and they just work together. Weave brings this capability without a lot of dependencies at the cost of performance, and it’s very possible to embed directly into docker, with some collaborative work between us and the zettio team.
That said, openvswitch should definitely be available and first-class for production use (weave does not appear, at a glance, to be made for especially demanding workloads) and ops professionals will appreciate the necessary complexity with the bonus flexibility. The socketplane guys seem extremely skilled and knowledgeable with openvswitch and we should fully leverage that, standing on the shoulders of giants.
In general, I am all for anything that gets rid of this iptables/veth mess we have now. The code is very brittle and racy, with tons of problems, and basically makes life for ops a lot harder than it needs to be even in trivial deployments. At the end of the day, if ops teams can’t scale docker because of a poor network implementation it simply won’t get adopted in a lot of institutions.
The downside to all of this is if we execute on the above, that we have two first-class network solutions, both of which have to be meticulously maintained regularly, and devs and ops may have an impedance mismatch between dev and prod. I think that’s an acceptable trade for “it just works” on the dev side, as painful as it might end up being for docker maintainers. Ops can always create a staging environment (As they should) if they need to test network capabilities between alternatives, or help devs configure openvswitch if that’s absolutely necessary.
I would like to take the plugin discussion to the relevant pull requests instead of here; I think it's distracting from this discussion. Additionally, the people behind the plugin system are not specifically focused on networking but on a wider goal, so the best place to have that discussion is there.
I hope this was useful. :)
@thockin @jainvipin @shykes I just want to bring your attention to the point that this proposal tries to bring in a solid foundation for network plumbing and in no way precludes higher-order orchestrators from adding more value on top. I think adding more details on the API and integration will help clarify some of these concerns.
From the past, we have some deep scars from approaches that let non-native solutions dictate the basic plumbing model, leading to crippled default behavior and fracturing the community.
@Lukasa Please refer to a couple of important points in this proposal that address exactly your concerns:
"Our experience leads us towards using a consistency protocol, such as a tenant-aware BGP, in order to achieve the worry-free environment developers and operators desire. This also presents an evolvable architecture if a tighter coupling into the native network is of value in the future."
"By extending L3 to the true edge of the network, the vSwitch enables proven network scale while still retaining the ability to perform disaggregated network services at the edge. Extending gateway protocols to the host will play a significant role in scaling a tight coupling to the network architecture."
Please refer to #8952, which provides the details on how a driver/plugin can help in choosing the appropriate networking backend. I believe that is the right place to discuss including an alternative backend that fits best in certain scenarios.
This proposal is to explore all the multi-host networking options and the native Docker integration of those features.
@erikh Thanks for weighing in. Is there anything specific in the proposal that leads you to believe it will make the application developer's life more complex? We wanted to provide a wholesome view of the network operations and choices in a multi-host production deployment, and hence the proposal description became network-operations heavy. I just wanted to assure you that it will in no way expose any complexity to the application developers.
One of the primary goals of Docker is to provide seamless and consistent mechanism from dev to production. Any impedance mismatch between dev and production should be discouraged.
+1 to "I think it’s safe to say developers don’t care about openvswitch, they care that they can start postgres and rails and they just work together."
This proposal is to bring multi-host networking Native to Docker, Transparent to Developers and Friendly to Operations.
I reckon that architecturally there are three layers here...
Crucially, 2) must make as few assumptions as possible about what docker networking looks like, such as to not artificially constrain/exclude different approaches.
As a strawman for 2), how about wiring a
I would like to see a simple but secure standard network solution (e.g. one that prevents ARP spoofing; the current default config is vulnerable to this). It should be easy to replace with something more comprehensive. And there should be an API that you can connect to your network management solution.
I'd like to see this as a composable external tool that works well when wrapped up as a Docker plugin, but doesn't assume anything about the containers it is working with. There's no reason why this needs to be specific to Docker. This also will require service discovery and cluster communication to work effectively, which should be a pluggable layer.
@erikh "developers don't care about openvswitch" - I agree.
Our solution is designed to be totally transparent to developers such that they can deploy their rails or postgres containers safe in the knowledge that the plumbing will be taken care of.
The other point of note here is that the backend doesn't have to be Open vSwitch - it could be whatever so long as it honours the API. You could theoretically have multi-host networking using this control plane, but linux bridge, iptables and whatever in the backend.
We prefer OVS, the only downside being that we require "openvswitch" to be installed on the host, but we've wrapped up all the userland elements in a docker container - the kernel module is available in 3.7+
Hi @MalteJ, Thanks for the feedback.
Given my limited experience, I don't see a compelling reason to do anything in L2 (ovs/vxlan). Is there an argument explaining why people want this? Generic UDP Encapsulation (GUE) seems to provide a simple, performant solution to this network overlay problem, and scales across various environments/providers.
@maceip @c4milo isn't GUE super new and poorly supported in the wild? Regarding vxlan+dove, I believe OVS can be used to manage it. Do you think we would be better off hitting the kernel directly? I can see the benefits of not carrying the entire footprint of OVS if we only use a small part of it - but that should be weighed against the difficulty of writing and maintaining new code. We faced a similar tradeoff between continuing to wrap lxc, or carrying our own implementation with libcontainer. Definitely not a no-brainer either way.
@shykes what do you mean by poorly supported? it just landed in the mainline kernel about 1 month ago and it is being worked on by Google.
Regarding VXLAN+DOVE, it certainly can be managed by OVS and I believe work to integrate it into OVS already started as well as into OpenDaylight.
I guess the decision comes down to the sort of networking Docker wants to provide. You can get as crazy as you want with things like Opendaylight, OpenContrail, OVS and the like, or use something simpler/lighter like VXLAN+DOVE or GUE which wouldn't have a fancy control plane or monitoring but that gets the job done too.
By "poorly supported" I simply mean very few machines with Docker installed
On Wednesday, November 5, 2014, Camilo Aguilar firstname.lastname@example.org
As someone else who's in the coalface of building overlay networks based
I'd like to approach this issue from a slightly different perspective
As folks are trying to migrate their existing applications into the
I believe that this use case is actually highly prevalent for a lot of
From our observations, there are a bunch of open questions in the world
One of the abstractions that other orchestration and cloud providers
From what we've done and what we've had customers ask us for, they often
So in conclusion, while the discussion about how all this can be
@mavenugo I am convinced that the proposal doesn't preclude higher order orchestration to add more value, in contrast may be this proposal requires an orchestrator to do that (which I am okay with).
Would all those benefits not come if the OVS control/data/management plane is not natively integrated into Docker but is completely orchestrated from outside with the network intent? Given that the solution requires some network orchestrator/controller to talk to it, the simplicity comes from that entity/integration and perhaps not from native Docker integration. Having OVS as the default Docker bridge is good, but that may still not require full native integration.
When we say native, we mean native control of the network backend (linux bridge / IP-Tables or OVS or other backend) from the plugin layer (#8952).
This will keep the footprint small and free of external dependencies while getting the network plumbing taken care of.
The proposal is trying to find that simplicity for multi-host Docker networking without the need for external controllers to manage network plumbing, while at the same time not sacrificing functionality or performance. Please refer to @dave-tucker's comment on the performance comparisons. (We have more data to share on these comparisons shortly.)
Also I would recommend jumping to #8952 to discuss on the actual back-end choices via plugin model and we can hash out the API details together.
@liljenstolpe You can do L3-only over VXLAN if you want; choosing a different encapsulation format will disable hardware offload. Likewise OVS with learning disabled can be used as an L3-only vRouter.
Semantics and implementation are orthogonal in many ways, so maybe we should have a more focused discussion on desired semantics for the "batteries included" plugin first and then worry about the implementation. Obvious semantic questions are:
(Disclosure: IBM. We make SDN-VE and OpenDOVE.)
@liljenstolpe I agree and was not trying to imply OVS or VXLAN are the only considerations. I agree with @NetCubist that kernel VXLAN + SD can be a good enough solution. My preferred direction is to leave default networking as-is and use the plugins model to implement any additional networking functionality, but it sounds like Docker has already made their decision.
@thewmf The question is, in an L3-only network, do you NEED an overlay. In L2 networks you certainly do, and in some cases (such as L3 address overlap), an overlay network can address "issues", however they are not the only solutions, and the general case (say 90% of the traffic in a scale-out environment) they are probably not necessary. Therefore, do we want to assume that they will be present? It's an additional "cost" that may not always be justified.
@danehans The question is if we think that overlays are the base? If so, it's burdening the environment when it's not always necessary.
@liljenstolpe I think it's hard to define what is needed without having detailed requirements to build against. One cloud provider may say that supporting overlapping IP's is a requirement but another may say it's not needed. This is a good example of why we, as a community, need to clearly define the requirements. Thus far, high-level analogies are the only thing to build against.
@liljenstolpe @danehans Agreed. Different requirements will lead to different implementations, which is why I suggested that we discuss requirements. I don't think it makes sense to lock in any technology unless it is needed.
I am working in an environment where we want to allow customers to bring their own possibly-overlapping IP addresses so we are definitely looking at overlays, but we can use a plug-in for that. But I'd like to hear people's opinions on the future of default networking. I would like to see Docker move away from NAT and port mapping, but I'm not sure how to do that on random developers' laptops. Maybe IPv6 ULAs... can people stomach that?
referenced this issue
Jan 6, 2015
There's an official proposal for networking drivers which can be found at #9983.
This new proposal implements an architecture which has been discussed quite a bit. Implementing a proof of concept of the network drivers was also part of this effort.
Should you discover something is confusing or missing from the new proposal, please feel free to comment.
Questions and lengthy discussions are more adequate for the #docker-network channel on freenode. Should you just want to talk about this, that is a better place to have the conversation.
We'd like to thank everyone who's provided input, especially those who've sent proposals. I will close this proposal now.
Does this mean docker has no intention of developing/supporting multi-host networking natively? #9983 is just for the creation of a driver scheme, and not the specific goal of multi-host networking. If multi-host networking is still a goal, I would have expected this proposal to remain open, and for it to utilize #9983.
We're reopening this after some discussion with @mavenugo pointing out that our proposal is not a solution for everything in here -- and it should be much closer.
We want this in docker and we don't want to communicate otherwise. So, until we can at least mostly incorporate this proposal into our new extension architecture, we will leave it open and solicit comments.
@c4milo following is the docker-network IRC log between us regarding reopening the proposal.
madhu: erikh: backjlack thanks for all the great work
Related to VxLAN and the network "overlay": the stumbling block to implementation/deployment has always been the requirement for multicast to be enabled in the network... which is rare.
Last year Cumulus Networks and MetaCloud open sourced VXFLD to implement VxLAN with unicast and UDP.
They also submitted it for consideration as a standard.
MetaCloud has since been acquired by Cisco Systems.
VXFLD consists of 2 components that work together to solve the BUM (Broadcast, Unknown unicast & Multicast) problem with VxLAN by using unicast instead of the traditional multicast.
The 2 components are called VXSND and VXRD.
the source for VXFLD is on Github: https://github.com/CumulusNetworks/vxfld
Be sure to read the two .RST files in the VXFLD GitHub repository, as they describe in more detail the two daemons of VXFLD, VXRD and VXSND.
I thought I'd mention VXFLD as it could potentially solve part of your proposal and... the code already exists.
If you use Debian or Ubuntu, Cumulus also has three pre-packaged .deb files for VXFLD:
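For context on what vxsnd/vxrd orchestrate at scale: a point-to-point unicast VXLAN tunnel (no multicast needed) can already be sketched with plain iproute2. The addresses, VNI 42, and device names below are invented, and the commands need root, so this sketch only collects and prints them:

```shell
# Sketch: unicast VXLAN between two hosts with iproute2, bridged into docker0.
# VXFLD generalizes this "remote" peering to many hosts without multicast.
# Printed rather than executed because the commands require root.
cmds='
# on host A (10.0.0.1), pointing at host B (10.0.0.2):
ip link add vxlan42 type vxlan id 42 dstport 4789 local 10.0.0.1 remote 10.0.0.2 dev eth0
ip link set vxlan42 up
brctl addif docker0 vxlan42
'
printf '%s' "$cmds"
```

Host B runs the mirror image with `local`/`remote` swapped; containers on both docker0 bridges then share one L2 segment over UDP.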
I'd like to chime in on this. I've been trying to put together a few arguments for and against doing this transparently to the user, and coming from a telco/"purist SDN" background it's hard to strike a middle ground between ease of use for small deployments and the kind of infrastructure we need to have it scale up into (and integrate with) datacenter solutions.
(I'm rather partial to the OpenVSwitch approach, really, but I understand how weave and pipework can be appealing to a lot of people)
So here are my notes:
This is just a high-level overview of how software-defined networking might work in a Docker/Swarm/Compose environment, written largely from a devops/IaaS perspective but with a fair degree of background on datacenter/telco networking infrastructure, which is fast converging towards full SDN.
There are two sides to the SDN story:
This document will focus largely on the first scenario and a set of user stories, with hints towards the second one at the bottom.
Offhand, there are two possible approaches from an end-user perspective:
This is largely described in http://www.slideshare.net/adrienblind/docker-networking-basics-using-software-defined-networks already, and is what pipework was designed to do.
Arguments for Keeping Things Simple (Sticking to Port Mapping)
Docker's primary networking abstraction is essentially port mapping/linking, with links exposed as environment variables to the containers involved - that makes application configuration very easy, as well as lessening CLI complexity.
Steering substantially away from that will shift the balance towards "full" networking, which is not necessarily the best way to go when you're focused on applications/processes rather than VMs.
Some IaaS providers (like Azure) provide a single network interface by default (which is then NATed to a public IP or tied to a load balancer, etc.), so the underlying transport shouldn't require extra network interfaces to work.
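To illustrate the abstraction described above: a linked container consumes its neighbour's address purely through environment variables. The variable below is hand-set to mimic what Docker injects for a container linked with the alias "db" (the address itself is invented):

```shell
# How an application typically consumes Docker's link environment variables.
# DB_PORT_5432_TCP is set here by hand to imitate Docker's injection.
DB_PORT_5432_TCP="tcp://172.17.0.2:5432"

hostport="${DB_PORT_5432_TCP#tcp://}"   # strip the scheme
DB_HOST="${hostport%:*}"                # 172.17.0.2
DB_PORT="${hostport##*:}"               # 5432
echo "connect to $DB_HOST:$DB_PORT"
```

This is exactly why the port-mapping model keeps application configuration simple: no routing or interface knowledge leaks into the container, only a host/port pair.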
Arguments for Increasing Complexity (Creating Networks)
Docker does not exist in a vacuum. Docker containers invariably have to talk to services hosted in more conventional infrastructure, and Docker is increasingly being used (or at least proposed) by network/datacenter vendors as a way to package and deploy fairly low-level functionality (like traffic inspection, shaping, even routing) using solutions like OpenVSwitch and custom bridges.
Furthermore, containers can already see each other internally to a host - each is provided with a 172.17.0.0/16 IP address, which is accessible from other containers. Allowing users to define networks and bind containers to networks rather than solely ports may greatly simplify establishing connectivity between sets of containers.
However, using Linux kernel plumbing (or OpenVSwitch) to provide Docker containers with what amount to fully-functional network interfaces implies a number of additional considerations (like messing with brctl) that may have unforeseen (and dangerous) consequences in terms of security, not to mention the need to eventually deal with routing and ACLs (which are currently largely the host's concern).
On the other hand, there is an obvious need to restrict container (outbound) traffic to some extent, and a number of additional benefits that stem from providing limited visibility onto a network segment, internal or otherwise.
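The single-host behaviour mentioned above (each container gets a 172.17.0.0/16 address reachable from its neighbours) is easy to verify. A running Docker daemon is needed, so the commands are only collected and printed here; the container names "web" and "db" are invented:

```shell
# Sketch: confirm per-container 172.17.0.0/16 addresses and cross-container
# reachability on one host. Printed, not executed (needs a Docker daemon).
cmds='
docker inspect -f "{{ .NetworkSettings.IPAddress }}" web   # e.g. 172.17.0.2
docker inspect -f "{{ .NetworkSettings.IPAddress }}" db    # e.g. 172.17.0.3
docker exec web ping -c 1 172.17.0.3
'
printf '%s' "$cmds"
```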
There are a few requirements that seem fairly obvious:
Improvements (Step 1):
Further Improvements (Step 2):
Likely Approaches (none favored at this point):
You need to pre-provision each docker0 with a different subnet range. Even
@mk-qi: You can use "arping", which is essentially a utility to discover if an IP is already in use within a network. That's how you can make sure Docker does not use the same set of IPs when it spans multiple hosts.
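A sketch of that duplicate-address check using iputils arping in DAD mode (`-D`): exit status 0 means no host answered, i.e. the address is free. It needs root and a real interface, so here we only build the command; the candidate address is invented:

```shell
# Sketch: wrap arping's duplicate-address detection for a provisioning script.
# arping -D exits 0 when no reply is received (address free), non-zero otherwise.
ip_free_cmd() {
    printf 'arping -D -c 2 -I %s %s' "$1" "$2"
}
cmd=$(ip_free_cmd docker0 172.17.0.5)
echo "$cmd"
# Usage (as root): eval "$cmd" >/dev/null 2>&1 && echo free || echo "in use"
```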
@thockin sorry, I didn't draw the picture clearly. In fact eth0 is a slave of docker0, and as I said before, they can ping each other...
@shykes I saw your fork https://github.com/shykes/docker/tree/extensions/extensions/simplebridge and it looks like it pings an IP before really assigning it, but I am not sure; could you give more information?
@fzansari thanks for the reply. Static IP allocation is OK; in fact we had been using pipework + macvlan (+ DHCP) for some small running clusters, but when running many containers it is very painful to manage the IPs. Of course we can write tools, but I think hacking Docker to directly solve the IP-conflict problem would make things much simpler, if that is possible.
Having just implemented keepalived internally, I think there would be an enormous benefit from simply implementing interoperable VRRP. It would allow Docker to "play nice" without forcing it on every machine in the network.
Host 1 (IP address 10.0.0.1):
Host 2 (IP address 10.0.0.2): backup service
Supporting VRRP gives a very clean failover story and allows you to simply assign an IP to a service. It would take a lot to flesh out the details, but I do think it would be an amazing change.
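For concreteness, a minimal keepalived VRRP pair illustrating that failover story might look like this (the virtual_router_id, priorities, and floating address are invented, and host 2 runs the same stanza with `state BACKUP` and a lower priority):

```
vrrp_instance docker_service {
    state MASTER              # "state BACKUP" on host 2 (10.0.0.2)
    interface eth0
    virtual_router_id 51
    priority 150              # lower on the backup, e.g. 100
    advert_int 1
    virtual_ipaddress {
        10.0.0.100/24         # the service IP that fails over between hosts
    }
}
```

The service is always reached at 10.0.0.100; when the master host dies, the backup claims the address within a few advert intervals, with no client-side changes.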