Rewrite concepts/layer2.md to account for #195 and #257.
danderson committed Jul 21, 2018
1 parent f5268e5 commit 4d621fc
Showing 2 changed files with 92 additions and 83 deletions.
13 changes: 6 additions & 7 deletions website/content/concepts/_index.md
@@ -49,13 +49,12 @@ this: ARP, NDP, or BGP.

### Layer 2 mode (ARP/NDP)

In layer 2 mode, one machine in the cluster takes ownership of the service, and
uses standard address discovery protocols
([ARP](https://en.wikipedia.org/wiki/Address_Resolution_Protocol) for IPv4,
[NDP](https://en.wikipedia.org/wiki/Neighbor_Discovery_Protocol) for IPv6) to
make the service IP reachable on the local network. From the LAN's point of
view, the announcing machine simply has multiple IP addresses.

The [layer 2 mode]({{% relref "layer2.md" %}}) sub-page has more
details on the behavior and limitations of layer 2 mode.
162 changes: 86 additions & 76 deletions website/content/concepts/layer2.md
@@ -3,89 +3,99 @@ title: MetalLB in layer 2 mode
weight: 1
---

In layer 2 mode, one node assumes the responsibility of advertising a service to
the local network. From the network's perspective, it simply looks like that
machine has multiple IP addresses assigned to its network interface.

Under the hood, MetalLB responds to
[ARP](https://en.wikipedia.org/wiki/Address_Resolution_Protocol) requests for
IPv4 services, and
[NDP](https://en.wikipedia.org/wiki/Neighbor_Discovery_Protocol) requests for
IPv6.
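
Layer 2 mode is selected per address pool in MetalLB's configuration. As a
point of reference, here is a minimal configuration sketch in the ConfigMap
format MetalLB reads; the pool name and address range are placeholders for
your own network:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: my-ip-space        # placeholder pool name
      protocol: layer2         # announce this pool's IPs via ARP/NDP
      addresses:
      - 192.168.1.240/28       # example range; must be free on your LAN
```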

The major advantage of the layer 2 mode is its universality: it will work on any
Ethernet network, with no special hardware required, not even fancy routers.

## Load-balancing behavior

In layer 2 mode, all traffic for a service IP goes to one node. From there,
`kube-proxy` spreads the traffic to all the service's pods.
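
The service itself typically needs no MetalLB-specific configuration: any
`LoadBalancer` service is eligible. As a sketch, a hypothetical service like
the following would get an IP from the pool, have one node announce it, and
have `kube-proxy` balance the incoming traffic across the matching pods:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx              # hypothetical example service
spec:
  type: LoadBalancer       # asks MetalLB for an external IP
  selector:
    app: nginx             # kube-proxy spreads traffic across these pods
  ports:
  - port: 80
    targetPort: 80
```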

In that sense, layer 2 does not implement a load-balancer. Rather, it implements
a failover mechanism so that a different node can take over should the current
leader node fail for some reason.

If the leader node fails, failover is automatic: the old leader's lease times
out after 10 seconds, at which point another node becomes the leader and takes
over ownership of the service IP.

## Limitations

Layer 2 mode has two main limitations you should be aware of: single-node
bottlenecking, and potentially slow failover.

As explained above, in layer 2 mode a single leader-elected node receives all
traffic for a service IP. This means that your service's ingress bandwidth is
limited to the bandwidth of a single node. This is a fundamental limitation of
using ARP and NDP to steer traffic.

In the current implementation, failover between nodes depends on cooperation
from the clients. When a failover occurs, MetalLB sends a number of gratuitous
layer 2 packets (a bit of a misnomer - it should really be called "unsolicited
layer 2 packets") to notify clients that the MAC address associated with the
service IP has changed.

Most operating systems handle "gratuitous" packets correctly, and update their
neighbor caches promptly. In that case, failover happens within a few
seconds. However, some systems either don't implement gratuitous handling at
all, or have buggy implementations that delay the cache update.

All modern versions of major OSes (Windows, Mac, Linux) implement layer 2
failover correctly, so the only situation where issues may happen is with older
or less common OSes.

To minimize the impact of planned failover on buggy clients, you should keep the
old leader node up for a couple of minutes after flipping leadership, so that it
can continue forwarding traffic for old clients until their caches refresh.

During an unplanned failover, the service IP will be unreachable until the
buggy clients refresh their cache entries.

If you encounter a situation where layer 2 mode failover is slow (more than
about 10s), please [file a bug](https://github.com/google/metallb/issues/new)!
We can help you investigate and determine if the issue is with the client, or a
bug in MetalLB.

## Comparison to Keepalived

MetalLB's layer 2 mode has a lot of similarities to Keepalived, so if you know
Keepalived, much of this will sound familiar. However, there are also a few
differences worth mentioning. If you aren't familiar with Keepalived, you can
skip this section.

Keepalived uses the Virtual Router Redundancy Protocol (VRRP). Instances of
Keepalived continuously exchange VRRP messages with each other, both to select a
leader and to notice when that leader goes away.

MetalLB, on the other hand, relies on Kubernetes to know when pods and nodes go
up and down. It doesn't need to speak a separate protocol to elect a leader;
instead, it lets Kubernetes do most of the work of deciding which pods are
healthy and which nodes are ready.

Keepalived and MetalLB "look" the same from the client's perspective: the
service IP address seems to migrate from one machine to another when failovers
happen, and the rest of the time it just looks like machines have more than one
IP address.

Because it doesn't use VRRP, MetalLB isn't subject to some of the limitations of
that protocol. For example, the VRRP limit of 255 load-balancers per network
doesn't exist in MetalLB: you can have as many load-balanced IPs as you want, as
long as there are free IPs in your network. MetalLB also requires less
configuration than VRRP; for example, there are no Virtual Router IDs to
configure.

On the flip side, because MetalLB relies on Kubernetes for information instead
of a standard network protocol, it cannot interoperate with third-party
VRRP-aware routers and infrastructure. This is working as intended: MetalLB is
specifically designed to provide load balancing and failover _within_ a
Kubernetes cluster, and in that scenario interoperability with third-party LB
software is out of scope.
