-
Notifications
You must be signed in to change notification settings - Fork 334
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Signed-off-by: Lei Huang <leih@nvidia.com>
- Loading branch information
Lei Huang
committed
May 15, 2024
1 parent
fc97199
commit 174c7ea
Showing
2 changed files
with
198 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,198 @@ | ||
# Multi-VTEP | ||
|
||
## Introduction | ||
|
||
When a k8s cluser has multiple networks and k8s nodes have multiple SR-IOV | ||
network adapters to provide different overlaying virtual networks over the | ||
physical networks, multple VTEP(Virtual Tunnel EndPoint) devices are | ||
required for encapsulating and de-encapsulating the network traffic from/to | ||
virtual networks. The challenge is in deciding which VTEP to use to | ||
send/receive the packets to/from the wire. This feature instructs OVN to use | ||
the appropriate VTEP IP as the source or destination IP for east-west traffic. | ||
|
||
## Motivation | ||
|
||
For applications with network traffic that can be categorized by purpose and | ||
are sensitive to network latency, it is advantageous to use multiple NICs | ||
designated for different purposes. This enhances the performance, security, | ||
and flexibility of the network. Additionally, with network adapters that | ||
support SR-IOV, traffic can be offloaded to hardware, further reducing | ||
network latency and decreasing host CPU usage. | ||
|
||
|
||
### User-Stories/Use-Cases | ||
|
||
Consider a cloud environment providing online video streaming, | ||
where each node is equipped with three NICs. One NIC could be dedicated to | ||
management traffic, another to video streaming traffic, and the third to | ||
storage. In this setup, each NIC would have its own VTEP IP. This | ||
configuration ensures that overlay traffic from a Pod’s VF is steered through | ||
the specific NIC associated with that VF. For example, streaming traffic must | ||
be routed exclusively through the second NIC, optimizing network performance | ||
and traffic segregation. | ||
|
||
![multi-vtep](../../images/multi-vtep.png) | ||
|
||
## How to enable this feature on an OVN Kubernetes cluster? | ||
|
||
On OVN Kubernetes side, no addtional configuration required to enable this feature. | ||
|
||
This feature depends on a specific underlay network setup; it cannot be turned off | ||
without an adequate underlay network configuration. | ||
|
||
## Implementation Details | ||
|
||
This feature requires an OVN version newer than v24.03.2. | ||
|
||
When the source and destination are overlay IPs, the packets need to be | ||
encapsulated and sent through a tunnel. When there are multiple NICs | ||
used for tunneling traffic, each with a different VTEP IP, OVN needs | ||
to select the right tunnel endpoint to send the traffic, based on the | ||
below criteria: | ||
|
||
* The tunnel/NIC selection should be based on the source & destination | ||
interface of the traffic, which should be aligned with the VF selection | ||
specified by ovn-k8s. | ||
* The source tunnel IP should be selected according to the source | ||
interface of the packet. | ||
* The destination tunnel IP should be selected according to the destination | ||
interface of the packet. | ||
* If an interface is not assigned a VTEP, traffic to/from the interface can | ||
choose any VTEP of the host. | ||
* For BUM(broadcast, unknown-unicast, and multicast traffic) traffic, it | ||
is ok to select any VTEP. | ||
|
||
#### OVN Constructs created in the databases | ||
|
||
OVN Sourthbound Database: | ||
* Port_Binding:encap - Populated by ovn-controller according to VIF | ||
interface’s external_ids:ovn-encap-ip. | ||
* Encap:IP - The IP in the corresponding Encap record will be used | ||
as the VTEP IP. | ||
|
||
OVS Database on chassis: | ||
* Each Chassis may have multiple Encaps, determined by | ||
`open_vswitch:external_ids:ovn-encap-ip` which is a comma separated | ||
string of IPs. | ||
* Each VIF interface (if needs to be assigned to a particular VTEP) | ||
will have `external_ids:ovn-encap-ip`, which must be one of the IPs | ||
configured in open_vswitch:external_ids:ovn-encap-ip of the chassis. | ||
* A option `open_vswitch:external_ids:ovn-pf-encap-ip-mapping` is | ||
added to define the mapping between PF interface name and its VTEP IP. | ||
|
||
|
||
### OVN Kubernetes Implementation Details | ||
|
||
To support this feature, the VTEP interfaces must be configured in advance. | ||
This can be accomplished using a system network management tool, such as | ||
Network Manager. | ||
|
||
Assuming a node has 3 NICs and VTEP interfaces: | ||
| PF | VTEP | VTEP IP | | ||
| :------- | :---- | :----------- | | ||
| enp1s0f0 | vtep0 | 10.0.0.1/23 | | ||
| enp2s0f0 | vtep1 | 10.0.0.2/23 | | ||
| enp3s0f0 | vtep2 | 10.0.0.3/23 | | ||
|
||
|
||
The VTEP interfaces on OVS bridges should be configured as below: | ||
``` | ||
$ ovs-vsctl show | ||
e620a080-c761-44d0-adb3-203bb1df6d67 | ||
Bridge brenp1s0f1 | ||
... | ||
Port vtep0 | ||
tag: 25 # just an example VLAN-ID | ||
Interface vtep0 | ||
type: internal | ||
Bridge brenp2s0f0 | ||
... | ||
Port vtep1 | ||
tag: 25 | ||
Interface vtep1 | ||
type: internal | ||
Bridge brenp3s0f0 | ||
... | ||
Port vtep2 | ||
tag: 25 | ||
Interface vtep2 | ||
type: internal | ||
``` | ||
|
||
Below Open_vSwitch external_ids need to be configured in osvsdb: | ||
- `external_ids:ovn-encap-ip`: set it to a list of vtep IPs, separated by | ||
commas, in any order. | ||
- `external_ids:ovn-pf-encap-ip-mapping`: set it to list of vtep-interface | ||
to IP pairs separated by comma. | ||
|
||
For example: | ||
``` | ||
$ ovs-vsctl --timeout=15 --if-exists get Open_vSwitch . external_ids:ovn-encap-ip | ||
"10.0.0.1,10.0.0.2,10.0.0.3" | ||
$ ovs-vsctl --timeout=15 --if-exists get Open_vSwitch . external_ids:ovn-pf-encap-ip-mapping | ||
"enp1s0f0:10.0.0.1,enp2s0f0:10.0.0.2,enp3s0f0:10.0.0.3" | ||
``` | ||
With above `external_ids:ovn-encap-ip` settings, OVN creates 3 tunnel ports for | ||
each encap-ip on remote node, that's, if remote node has 2 encap-ips in | ||
`external_ids:ovn-encap-ip`, 6 tunel ports will be created on the local node. | ||
For example, if remote node only has one encap-ip "10.0.0.10", OVN creates | ||
below tunnel ports on the local node: | ||
``` | ||
$ ovs-vsctl show | ||
Bridge br-int | ||
Port ovn0-5e3e5-0 | ||
Interface ovn0-5e3e5-0 | ||
type: geneve | ||
options: {csum="true", key=flow, local_ip="10.0.0.1", remote_ip="10.0.0.10", tos=inherit} | ||
Port ovn0-e815c-1 | ||
Interface ovn0-e815c-1 | ||
type: geneve | ||
options: {csum="true", key=flow, local_ip="10.0.0.2", remote_ip="10.0.0.10", tos=inherit} | ||
Port ovn-33f2e5-0 | ||
Interface ovn-33f2e5-0 | ||
type: geneve | ||
options: {csum="true", key=flow, local_ip="10.0.0.3", remote_ip="10.0.0.10", tos=inherit} | ||
``` | ||
|
||
|
||
Ovn-controller programs physical flows that create mappings between | ||
<in_port, out_port> and <tunnel port> according to the aforementioned data | ||
model. The <in_port> information needs to be carried through a global register. | ||
**The bits 16-31 of the reg13 can be used for this purpose**. The reg13 is | ||
currently used to store conntrack zones for each VIF. However, it doesn’t | ||
really require 32 bits, and the first 16 bits (0-15) are sufficient. Its | ||
current usage with assignment of all 32 bits in certain pipelines need | ||
to be updated. | ||
|
||
When ovnkube-node adding a Pod's network interfaces to OVS br-int during the Pod | ||
creation, it identifies the corresponding PF interface for each VF interface, it | ||
then looks up `external_ids:ovn-pf-encap-ip-mapping` to determine the encap IP, | ||
which is subsequently assigned to the OVS Port's `external_ids:ovn-encap-ip`. | ||
This encap IP will be utilized as the source IP in the tunnel header. | ||
|
||
To steer outgoing traffic through the correct physical interface, | ||
the following source-based routing rules are required on k8s node: | ||
|
||
``` | ||
$ ip rule list | ||
... | ||
6081: from 10.0.0.1 lookup 6081 proto static | ||
6082: from 10.0.0.2 lookup 6082 proto static | ||
6083: from 10.0.0.3 lookup 6083 proto static | ||
$ ip route list table 6081 | ||
10.0.0.0/23 dev vtep0 scope link | ||
10.0.0.0/16 via 10.0.0.1 dev vtep0 proto static metric 804 | ||
$ ip route list table 6082 | ||
10.0.0.0/23 dev vtep1 scope link | ||
10.0.0.0/16 via 10.0.0.1 dev vtep1 proto static metric 802 | ||
$ ip route list table 6083 | ||
10.0.0.0/23 dev vtep2 scope link | ||
10.0.0.0/16 via 10.0.0.1 dev vtep2 proto static metric 806 | ||
``` | ||
|
||
## Known Limitations | ||
|
||
This feature only works with network adapters that support SR-IOV, | ||
e.g. NVIDIA CONNECTX-6 or NVIDIA BlueField-2 in NIC mode. |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.