title: VXLAN-to-VXLAN Bridging in DCI Environments
date: 2022-06-20 23:50:00 -0700
tags: VXLAN, data center, WAN

Almost exactly a decade ago I wrote that VXLAN isn't a data center interconnect technology. That's still true, but you can make it a bit better with EVPN -- at the very minimum you'll get an ARP proxy and anycast gateway. Even this combo does not address the other requirements I listed a decade ago, but maybe I'm too demanding and good enough works well enough.

However, there is one other bit that was missing from most VXLAN implementations: LAN-to-WAN VXLAN-to-VXLAN bridging. Sounds weird? Supposedly a picture is worth a thousand words, so here we go.

Most VXLAN-with-EVPN implementations can handle a single unified bridging domain -- an ingress VTEP sends traffic directly to an egress VTEP.

[Figure: a single unified VXLAN bridging domain]

That works well in a data center environment but might result in two challenges when used over WAN links:

  • You're probably using ingress replication (assuming you're not a great fan of enabling large-scale IP multicast), which means that every ingress ToR switch sends a separate copy of a flooded packet over the WAN link to every egress ToR switch in the remote data center. Not exactly what you'd like to see on your expensive WAN link, right?
  • Switching ASICs support a limited number of VXLAN neighbors (usually 256) and a limited number of entries in the ingress replication list (usually 128). You might hit those limits when extending your VXLAN network across multiple sites. [1]
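
To put rough numbers on both bullets, here's a back-of-the-envelope sketch. It assumes a single flat bridging domain stretched across two sites, ingress replication, and every ToR switch participating in the flooded VNI. The function name and the 40-switch site sizes are made up for illustration.

```python
def flat_design_load(local_tors: int, remote_tors: int) -> dict:
    """Per-ToR figures for one flat VXLAN domain stretched across two sites.

    Assumption: every ToR is a VTEP in the flooded VNI, so every other VTEP
    is both a VXLAN neighbor and an ingress-replication target.
    """
    neighbors = (local_tors - 1) + remote_tors
    return {
        "vxlan_neighbors": neighbors,
        "replication_entries": neighbors,
        # Ingress replication: the sending ToR builds one copy per remote
        # ToR, and each of those copies crosses the WAN link.
        "wan_copies_per_flooded_frame": remote_tors,
    }


# Hypothetical example: two sites with 40 ToR switches each.
print(flat_design_load(local_tors=40, remote_tors=40))
# {'vxlan_neighbors': 79, 'replication_entries': 79, 'wan_copies_per_flooded_frame': 40}
```

Compare the first two values with whatever limits your ASIC data sheet quotes. The last one is what lands on the WAN link for every single flooded frame.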

Those challenges have a beautiful solution: VXLAN-to-VXLAN bridging between LAN and WAN bridging domains on the WAN edge switches:

  • WAN edge switches act as the final VTEPs for both LAN and WAN peers. LAN peers do not need to care about VTEPs in remote sites. WAN peers do not need to care about local VTEPs.
  • WAN edge switches receive a single copy of a flooded packet (from LAN or WAN side) and flood it further.

[Figure: VXLAN-to-VXLAN bridging between LAN and WAN bridging domains on the WAN edge switches]
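
Here's a rough sketch of what that design does to the numbers. It assumes each site presents a single logical WAN edge VTEP to both its local ToR switches and the remote sites, and that a flooded frame crosses the WAN once per remote site. Both are simplifications, and the function name and site sizes are made up.

```python
def edge_bridged_load(local_tors: int, remote_sites: int) -> dict:
    """Rough per-device figures when WAN edge switches bridge LAN and WAN domains.

    Assumption: one logical WAN edge VTEP per site.
    """
    return {
        # A ToR sees only the other local ToRs plus its local WAN edge.
        "tor_vxlan_neighbors": (local_tors - 1) + 1,
        # The WAN edge keeps two peer groups: local ToRs and remote edges.
        "edge_lan_peers": local_tors,
        "edge_wan_peers": remote_sites,
        # The edge receives one copy from the LAN side and sends one copy
        # to each remote WAN edge, which floods it within its own site.
        "wan_copies_per_flooded_frame": remote_sites,
    }


# Hypothetical example: 40 local ToR switches, one remote site.
print(edge_bridged_load(local_tors=40, remote_sites=1))
# {'tor_vxlan_neighbors': 40, 'edge_lan_peers': 40, 'edge_wan_peers': 1, 'wan_copies_per_flooded_frame': 1}
```

The ToR neighbor count no longer depends on the size of the remote site, and the number of WAN copies depends on the number of remote sites, not on the number of remote ToR switches.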

For more details, watch the excellent Using VXLAN and EVPN in Multi-Pod and Multi-Site Fabrics presentation by Lukas Krattiger, or read the Multi-Domain EVPN VXLAN document on Arista's web site (warning: regwall).

There's just a tiny little problem – the switching ASIC on the WAN edge devices has to implement VXLAN-to-VXLAN bridging which includes:

  • Split-horizon forwarding: whatever is received from LAN peers must not be sent back to other LAN peers, and whatever is received from WAN peers must not be sent back to other WAN peers.
  • Split-horizon flooding: whatever is received from LAN peers must be flooded to WAN peers and vice versa.
  • No cheating with VXLAN VNI -- identification of LAN and WAN peers must be done based on source IP addresses, not based on different VNIs
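
The flooding decision can be sketched in a few lines of Python. This is a toy model, not how any ASIC implements it: the peer addresses and function names are hypothetical, and locally attached ports (which would also get a copy) are ignored. The point is that the VNI stays the same on both sides and the peer group is derived purely from the outer source IP address.

```python
from ipaddress import ip_address

# Hypothetical peer groups configured on a WAN edge switch.
LAN_PEERS = {ip_address("10.0.1.1"), ip_address("10.0.1.2")}    # local ToR VTEPs
WAN_PEERS = {ip_address("192.0.2.1"), ip_address("192.0.2.2")}  # remote WAN edge VTEPs


def classify(src_vtep: str) -> str:
    """Identify the peer group from the outer source IP, not from the VNI."""
    ip = ip_address(src_vtep)
    if ip in LAN_PEERS:
        return "lan"
    if ip in WAN_PEERS:
        return "wan"
    raise ValueError(f"unknown VTEP {src_vtep}")


def flood_targets(src_vtep: str) -> list[str]:
    """VTEPs that get a copy of a flooded frame received from src_vtep.

    Split horizon: the frame goes to every peer in the opposite group
    and is never sent back into the group it came from.
    """
    targets = WAN_PEERS if classify(src_vtep) == "lan" else LAN_PEERS
    return sorted(str(t) for t in targets)


print(flood_targets("10.0.1.1"))   # ['192.0.2.1', '192.0.2.2']
print(flood_targets("192.0.2.2"))  # ['10.0.1.1', '10.0.1.2']
```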

For years, it looked like the only ASIC capable of doing VXLAN-to-VXLAN bridging was Cisco's Cloud Scale ASIC... until Arista decided that's a problem worth solving and figured out how to do it with the Broadcom Jericho chipset. According to the 2022 EANTC test report, VXLAN-to-VXLAN stitching also works on Juniper QFX10K and Nokia 7750 SR-1.

Thank You

Remi Locherer sent me a nice email after the June 2022 design clinic saying "your information is a bit outdated" and included links to the 2022 EANTC test report and Arista documentation. I solemnly promise to augment those videos with "I was wrong" callouts once I get them back from the editor.

Footnotes

  1. Should that be the case, I'm hoping you're not designing your network based on generic blog posts. I'm trying to be less biased than vendor white papers, but if you have such a large network you're deep in the It Depends territory and need a proper network design.