add GUE encap (#60)
Summary:
this is initial work to add a GUE encapsulation option to katran.
the reason behind this is that we currently use IPIP, but for IPIP to
play nice w/ ECMP and RSS we are "faking" the src ip of the outer IP header.
that makes debugging anything related to "from which load balancer have i
received this flow" hard. GUE would give us the benefit of preserving
(or setting to whatever we want) the source IP while still allowing
ECMP/RSS to work (as the network or NICs are going to use the src port
for hashing, and that port varies per flow). the price for this
is 8 bytes of UDP header.
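
a minimal sketch of how such a per-flow source port can be derived, assuming a little-endian host. it follows the arithmetic used by gue_encap_v4 in this diff; `swap16` and `gue_sport_v4` are hypothetical names used only for illustration, not katran functions.

```c
#include <stdint.h>

/* Byte swap; on a little-endian host this is what htons() amounts to. */
static uint16_t swap16(uint16_t v) {
  return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Hypothetical host-side helper: folds the upper 16 bits of the client
 * IPv4 address (as stored in a 32-bit word) into the client L4 source
 * port, following the arithmetic of gue_encap_v4 in this diff.
 */
static uint16_t gue_sport_v4(uint32_t client_addr, uint16_t client_port) {
  uint16_t sport = swap16(client_port);
  sport ^= (uint16_t)((client_addr >> 16) & 0xFFFF);
  return sport;
}
```

two flows from the same client that differ only in their source port get different outer source ports, so ECMP/RSS can hash them differently even though the outer source IP stays fixed. the value is 16 bits wide, so distinct flows can still collide.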

This diff adds gue encapsulation in forwarding plane. we are adding
variant 1 from https://tools.ietf.org/html/draft-ietf-intarea-gue-07
(aka no additional headers w/ metadata)
what is missing for now is:
- gue based encap for healthchecks
- gue decapsulation.

to build katran w/ GUE encapsulation it must be compiled w/ the
-DGUE_ENCAP option.
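
the sketch below illustrates how a single -D flag can switch the encap path at compile time, in the spirit of the PCKT_ENCAP_V4 macro this diff adds to balancer_consts.h. the `*_stub` functions and `selected_encap` are hypothetical stand-ins that only report a name; they are not katran code.

```c
/* Stand-ins for the real encap functions; each one only reports its name. */
static inline const char *encap_v4_stub(void) { return "ipip"; }
static inline const char *gue_encap_v4_stub(void) { return "gue"; }

/* Same selection pattern as in balancer_consts.h, applied to the stubs. */
#ifdef GUE_ENCAP
#define PCKT_ENCAP_V4 gue_encap_v4_stub
#else
#define PCKT_ENCAP_V4 encap_v4_stub
#endif

/* The call site stays the same regardless of which encap was compiled in. */
static const char *selected_encap(void) {
  return PCKT_ENCAP_V4();
}
```

compiled without -DGUE_ENCAP, `selected_encap()` goes through the IPIP stand-in; adding -DGUE_ENCAP to the compiler flags routes the same call site to the GUE one.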

Also, as a part of this diff, a dissector for wireshark was added
(so encapsulated packets would be visible in wireshark).

Tests:
katran_tester:
added new test fixtures for GUE encap + "-gue" option to use em.
to see the output of the test (and to make sure that everything looks
sane) you can run em with -gue and -pcap_output=/tmp/test.pcap
in that case test.pcap would contain all the input and output packets
of the test. for GUE we are forcing udp checksum to be 0 for outer
packet so some "csum failed" output in wireshark is expected.
example of pcap output: https://gist.github.com/tehnerd/50a8ee7c47000941c3e41872d2fa6136
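
a minimal sketch of the outer UDP header described above, written byte by byte so it does not depend on host endianness. it loosely mirrors create_udp_hdr from this diff; `udp_hdr_sketch`, `put_be16` and `fill_gue_udp_hdr` are hypothetical names. per RFC 768, an all-zero UDP checksum over IPv4 means the sender did not compute one.

```c
#include <assert.h>
#include <stdint.h>

#define GUE_DPORT 6080 /* default from balancer_consts.h in this diff */
#define GUE_CSUM 0     /* outer UDP checksum is forced to zero */

/* Hypothetical byte-wise UDP header layout, 8 bytes on the wire. */
struct udp_hdr_sketch {
  uint8_t source[2];
  uint8_t dest[2];
  uint8_t len[2];
  uint8_t check[2];
};

/* Store a 16-bit value in network (big-endian) byte order. */
static void put_be16(uint8_t out[2], uint16_t v) {
  out[0] = (uint8_t)(v >> 8);
  out[1] = (uint8_t)(v & 0xFF);
}

/* inner_len is the size in bytes of the encapsulated IP packet. */
static void fill_gue_udp_hdr(struct udp_hdr_sketch *h, uint16_t sport,
                             uint16_t inner_len) {
  assert(inner_len <= UINT16_MAX - sizeof(*h));
  put_be16(h->source, sport);
  put_be16(h->dest, GUE_DPORT);
  /* the UDP length field covers the header plus the payload */
  put_be16(h->len, (uint16_t)(inner_len + sizeof(*h)));
  put_be16(h->check, GUE_CSUM);
}
```

for a 40 byte inner packet the length field comes out as 48 and the checksum bytes stay zero.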

actual tests w/ katran + test host
(w/ modifications in example grpc server, which are not in this diff):
```
 diff --git a/example_grpc/katran_server.cpp b/example_grpc/katran_server.cpp
index b3c17ad..8b17118 100644
 --- a/example_grpc/katran_server.cpp
+++ b/example_grpc/katran_server.cpp
@@ -129,6 +129,8 @@ int main(int argc, char** argv) {
   config.forwardingCores = forwardingCores;
   config.numaNodes = numaNodes;
   config.hcInterface = FLAGS_hc_intf;
+  config.katranSrcV4 = "10.0.13.37";
+  config.katranSrcV6 = "fc00:2307::1337";
```

katran's config:
```
2019/10/03 17:06:23 vips len 2
VIP:            fc00:1::1 Port:     22 Protocol: tcp
Vip's flags:
 ->fc00::1           weight: 1
VIP:             10.0.0.1 Port:     22 Protocol: tcp
Vip's flags:
 ->fc00::1           weight: 1
 ->192.168.102.129   weight: 1
exiting
```

reals configuration (to support gue):
```
modprobe fou
ip fou add port 6080 gue
ip link add name gue_v4 type ipip external encap gue encap-sport auto encap-dport 6080
ip link set gue_v4 up
sysctl -w net.ipv4.conf.gue_v4.rp_filter=0
sysctl -w net.ipv4.conf.default.rp_filter=0
ip fou add port 6080 gue -6
ip link add name gue_v6 type ip6tnl external encap gue encap-sport auto encap-dport 6080
ip link set up dev gue_v6

```

some outputs:

```

tehnerd@tbox1:~$ ip fou show
port 6080 gue -6
port 6080 gue

tehnerd@tbox1:~$ ip -json link show dev gue_v4
[{
        "ifindex": 7,
        "ifname": "gue_v4",
        "link": null,
        "flags": ["NOARP","UP","LOWER_UP"],
        "mtu": 1480,
        "qdisc": "noqueue",
        "operstate": "UNKNOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "txqlen": 1000,
        "link_type": "ipip",
        "address": "0.0.0.0",
        "broadcast": "0.0.0.0"
    }
]

tehnerd@tbox1:~$ ip -json link show dev gue_v6
[{
        "ifindex": 9,
        "ifname": "gue_v6",
        "link": null,
        "flags": ["NOARP","UP","LOWER_UP"],
        "mtu": 1452,
        "qdisc": "noqueue",
        "operstate": "UNKNOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "txqlen": 1000,
        "link_type": "tunnel6",
        "address": "::",
        "broadcast": "::"
    }
]

```

v4inv4:
vip is 10.0.0.1. in the dump we see the packet which goes to the load
balancer (first one), then the gue packet from the load balancer to the
local server, and then the reply from the local server to the client.
this convo basically means that the local server was able to decap
v4inv4 gue (it passed all the checks, the encapsulated packet was
delivered to the application and the application replied).

```
16:58:51.008903 IP 192.168.102.140.60000 > 10.0.0.1.22: Flags [S], seq 1298498081, win 32768, length 0
16:58:51.010958 IP 10.0.13.37.1638 > 192.168.102.129.6080: UDP, length 40
16:58:51.010995 IP 10.0.0.1.22 > 192.168.102.140.60000: Flags [S.], seq 122077391, ack 1298498082, win 29200, options [mss 1460], length 0
```

v6inv6:
same as above.

```
17:02:55.576750 IP6 fc00::10.60000 > fc00:1::1.22: Flags [S], seq 1298498081, win 32768, length 0
17:02:55.578319 IP6 fc00:2307::1337.60000 > fc00::1.6080: UDP, length 60
17:02:55.578498 IP6 fc00:1::1.22 > fc00::10.60000: Flags [S.], seq 3064362547, ack 1298498082, win 28800, options [mss 1440], length 0
```
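
the "UDP, length" values in the dumps above can be cross-checked with a little arithmetic. this is a sketch under two assumptions: tcpdump reports the UDP payload size (without the 8 byte UDP header), and the client SYN carries no TCP options, as the dumps suggest. the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
  IPV4_HDR_LEN = 20, /* IPv4 header without options */
  IPV6_HDR_LEN = 40, /* fixed IPv6 header */
  TCP_HDR_LEN = 20,  /* TCP header without options */
  UDP_HDR_LEN = 8
};

/* Bytes carried inside the outer UDP datagram: the whole inner IP packet. */
static uint32_t gue_udp_payload_len(uint32_t inner_ip_hdr, uint32_t inner_l4_len) {
  uint32_t total = inner_ip_hdr + inner_l4_len;
  /* the 16-bit UDP length field must also fit the UDP header itself */
  assert(total <= 65535u - UDP_HDR_LEN);
  return total;
}

/* Bytes added in front of the inner packet: outer IP header plus UDP. */
static uint32_t gue_overhead(uint32_t outer_ip_hdr) {
  return outer_ip_hdr + UDP_HDR_LEN;
}
```

an IPv4 SYN gives 20 + 20 = 40 bytes of UDP payload and an IPv6 SYN gives 40 + 20 = 60, matching the two dumps. the added overhead is 28 bytes for an IPv4 outer header and 48 bytes for an IPv6 one.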

v4inv6:

```
17:05:15.541911 IP 192.168.102.140.60004 > 10.0.0.1.22: Flags [S], seq 1298498081, win 32768, length 0
17:05:15.543775 IP6 fc00:2307::1337.36072 > fc00::1.6080: UDP, length 40
17:05:15.543812 IP 10.0.0.1.22 > 192.168.102.140.60004: Flags [S.], seq 173938774, ack 1298498082, win 29200, options [mss 1460], length 0
```

Pull Request resolved: #60

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

canary https://our.intern.facebook.com/intern/tupperware/canary/view/?experiment_id=4836154

Reviewed By: mjoras

Differential Revision: D17741225

Pulled By: udippant

fbshipit-source-id: 539917985f47f688b8b255f78841045843f24743
tehnerd authored and facebook-github-bot committed Nov 5, 2019
1 parent 063f33c commit 74c3338
Showing 12 changed files with 672 additions and 4 deletions.
28 changes: 28 additions & 0 deletions katran/lib/KatranLb.cpp
@@ -40,6 +40,8 @@ constexpr int kError = -1;
constexpr uint32_t kMaxQuicId = 65535; // 2^16-1
constexpr uint32_t kDefaultStatsIndex = 0;
constexpr folly::StringPiece kEmptyString = "";
constexpr uint32_t kSrcV4Pos = 0;
constexpr uint32_t kSrcV6Pos = 1;
} // namespace

KatranLb::KatranLb(const KatranConfig& config)
@@ -304,6 +306,23 @@ void KatranLb::attachLrus() {
}
}

void KatranLb::setupGueEnvironment() {
auto srcv4 = IpHelpers::parseAddrToBe(folly::IPAddress(config_.katranSrcV4));
auto srcv6 = IpHelpers::parseAddrToBe(folly::IPAddress(config_.katranSrcV6));
uint32_t key = kSrcV4Pos;
auto res = bpfAdapter_.bpfUpdateMap(
bpfAdapter_.getMapFdByName("pckt_srcs"), &key, &srcv4);
if (res < 0) {
throw std::runtime_error("can not update src v4 address for GUE packet");
}
key = kSrcV6Pos;
res = bpfAdapter_.bpfUpdateMap(
bpfAdapter_.getMapFdByName("pckt_srcs"), &key, &srcv6);
if (res < 0) {
throw std::runtime_error("can not update src v6 address for GUE packet");
}
}

void KatranLb::featureDiscovering() {
int res;
res = bpfAdapter_.getMapFdByName("lpm_src_v4");
@@ -321,6 +340,11 @@ void KatranLb::featureDiscovering() {
VLOG(2) << "katran introspection is enabled";
features_.introspection = true;
}
res = bpfAdapter_.getMapFdByName("pckt_srcs");
if (res >= 0) {
VLOG(2) << "GUE encapsulation is enabled";
features_.gueEncap = true;
}
}

void KatranLb::startIntrospectionRoutines() {
@@ -351,6 +375,10 @@ void KatranLb::loadBpfProgs() {
initialSanityChecking();
featureDiscovering();

if (features_.gueEncap) {
setupGueEnvironment();
}

// add values to main prog ctl_array
std::vector<uint32_t> balancer_ctl_keys = {kMacAddrPos};

6 changes: 6 additions & 0 deletions katran/lib/KatranLb.h
@@ -640,6 +640,12 @@ class KatranLb {
*/
bool changeKatranMonitorForwardingState(KatranMonitorState state);

/*
* setupGueEnvironment prepares katran to run w/ GUE encap (e.g. setting up
* src addresses for outer packets)
*/
void setupGueEnvironment();

/**
* main configurations of katran
*/
6 changes: 6 additions & 0 deletions katran/lib/KatranLbStructs.h
@@ -40,6 +40,7 @@ constexpr unsigned int kDefaultLruSize = 8000000;
constexpr uint32_t kNoFlags = 0;
std::string kNoExternalMap = "";
std::string kDefaultHcInterface = "";
std::string kAddressNotSpecified = "";
} // namespace

/**
@@ -143,6 +144,8 @@ struct KatranMonitorConfig {
* @param std::string hcInterface interface where we want to attach hc bpf prog
* @param KatranMonitorConfig monitorConfig for katran introspection
* @param memlockUnlimited should katran set memlock to unlimited by default
* @param katranSrcV4 string ipv4 source address for GUE packets
* @param katranSrcV6 string ipv6 source address for GUE packets
*
* note about rootMapPath and rootMapPos:
* katran has two modes of operation.
@@ -186,6 +189,8 @@ struct KatranConfig {
uint32_t xdpAttachFlags = kNoFlags;
struct KatranMonitorConfig monitorConfig;
bool memlockUnlimited = true;
std::string katranSrcV4 = kAddressNotSpecified;
std::string katranSrcV6 = kAddressNotSpecified;
};

/**
@@ -244,6 +249,7 @@ struct KatranFeatures {
bool srcRouting{false};
bool inlineDecap{false};
bool introspection{false};
bool gueEncap{false};
};

/**
24 changes: 24 additions & 0 deletions katran/lib/bpf/balancer_consts.h
@@ -206,6 +206,13 @@
#define COPY_INNER_PACKET_TOS 1
#endif

// default GUE dst port
#ifndef GUE_DPORT
#define GUE_DPORT 6080
#endif

#define GUE_CSUM 0

// initial value for jhash hashing function, used to pick up a real server
#ifndef INIT_JHASH_SEED
#define INIT_JHASH_SEED CH_RINGS_SIZE
@@ -231,13 +238,30 @@
*
* INLINE_DECAP - allow to do inline ipip decapsulation in XDP context
*
* GUE_ENCAP - use GUE (draft-ietf-intarea-gue) as encapsulation method
*
* KATRAN_INTROSPECTION - katran will start to perfpipe packet's header which
* have triggered specific events
*/
#ifdef LPM_SRC_LOOKUP
#define INLINE_DECAP
#endif

#ifdef GUE_ENCAP
#define PCKT_ENCAP_V4 gue_encap_v4
#define PCKT_ENCAP_V6 gue_encap_v6
#else
#define PCKT_ENCAP_V4 encap_v4
#define PCKT_ENCAP_V6 encap_v6
#endif


/**
* positions in pckt_srcs table
*/
#define V4_SRC_INDEX 0
#define V6_SRC_INDEX 1

// maximum size of packets header which we would write to event pipe
// if KATRAN_INTROSPECTION is enabled
#ifndef MAX_EVENT_SIZE
4 changes: 2 additions & 2 deletions katran/lib/bpf/balancer_kern.c
@@ -510,11 +510,11 @@ static inline int process_packet(void *data, __u64 off, void *data_end,
}

if (dst->flags & F_IPV6) {
- if(!encap_v6(xdp, cval, is_ipv6, &pckt, dst, pkt_bytes)) {
+ if(!PCKT_ENCAP_V6(xdp, cval, is_ipv6, &pckt, dst, pkt_bytes)) {
return XDP_DROP;
}
} else {
- if(!encap_v4(xdp, cval, &pckt, dst, pkt_bytes)) {
+ if(!PCKT_ENCAP_V4(xdp, cval, &pckt, dst, pkt_bytes)) {
return XDP_DROP;
}
}
14 changes: 14 additions & 0 deletions katran/lib/bpf/balancer_maps.h
@@ -164,5 +164,19 @@ struct bpf_map_def SEC("maps") event_pipe = {
};
BPF_ANNOTATE_KV_PAIR(event_pipe, int, __u32);

#endif

#ifdef GUE_ENCAP
// map which holds src ip addresses for outer ip packets while using GUE encap
// NOTE: This is not a stable API. This is to be reworked when static
// variables will be available in mainline kernels
struct bpf_map_def SEC("maps") pckt_srcs = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(__u32),
.value_size = sizeof(struct real_definition),
.max_entries = 2,
.map_flags = NO_FLAGS,
};
BPF_ANNOTATE_KV_PAIR(pckt_srcs, __u32, struct real_definition);
#endif
#endif // of _BALANCER_MAPS
139 changes: 139 additions & 0 deletions katran/lib/bpf/pckt_encap.h
@@ -30,6 +30,7 @@
#include "balancer_consts.h"
#include "balancer_helpers.h"
#include "balancer_structs.h"
#include "balancer_maps.h"
#include "bpf.h"
#include "bpf_endian.h"
#include "bpf_helpers.h"
@@ -87,6 +88,15 @@ __attribute__((__always_inline__)) static inline void create_v6_hdr(
memcpy(ip6h->daddr.s6_addr32, daddr, 16);
}

__attribute__((__always_inline__))
static inline void create_udp_hdr(struct udphdr *udph, __u16 sport, __u16 dport,
__u16 len, __u16 csum) {
udph->source = sport;
udph->dest = bpf_htons(dport);
udph->len = bpf_htons(len);
udph->check = csum;
}

__attribute__((__always_inline__)) static inline bool encap_v6(
struct xdp_md* xdp,
struct ctl_value* cval,
@@ -221,4 +231,133 @@ decap_v4(struct xdp_md* xdp, void** data, void** data_end) {
return true;
}

#ifdef GUE_ENCAP

__attribute__((__always_inline__))
static inline bool gue_encap_v4(struct xdp_md *xdp, struct ctl_value *cval,
struct packet_description *pckt,
struct real_definition *dst, __u32 pkt_bytes) {
void *data;
void *data_end;
struct iphdr *iph;
struct udphdr *udph;
struct eth_hdr *new_eth;
struct eth_hdr *old_eth;
struct real_definition *src;

__u16 sport = bpf_htons(pckt->flow.port16[0]);
__u32 ipv4_src = V4_SRC_INDEX;

src = bpf_map_lookup_elem(&pckt_srcs, &ipv4_src);
if (!src) {
return false;
}
ipv4_src = src->dst;

sport ^= ((pckt->flow.src >> 16) & 0xFFFF);
__u64 csum = 0;

if (bpf_xdp_adjust_head(
xdp, 0 - ((int)sizeof(struct iphdr) + (int)sizeof(struct udphdr)))) {
return false;
}
data = (void *)(long)xdp->data;
data_end = (void *)(long)xdp->data_end;
new_eth = data;
iph = data + sizeof(struct eth_hdr);
udph = (void *)iph + sizeof(struct iphdr);
old_eth = data + sizeof(struct iphdr) + sizeof(struct udphdr);
if (new_eth + 1 > data_end ||
old_eth + 1 > data_end ||
iph + 1 > data_end ||
udph + 1 > data_end) {
return false;
}
memcpy(new_eth->eth_dest, cval->mac, sizeof(new_eth->eth_dest));
memcpy(new_eth->eth_source, old_eth->eth_dest, sizeof(new_eth->eth_source));
new_eth->eth_proto = BE_ETH_P_IP;

create_udp_hdr(
udph,
sport,
GUE_DPORT,
pkt_bytes + sizeof(struct udphdr),
GUE_CSUM);

create_v4_hdr(
iph,
pckt,
ipv4_src,
dst->dst,
pkt_bytes + sizeof(struct udphdr),
IPPROTO_UDP);

return true;
}

__attribute__((__always_inline__))
static inline bool gue_encap_v6(struct xdp_md *xdp, struct ctl_value *cval,
bool is_ipv6, struct packet_description *pckt,
struct real_definition *dst, __u32 pkt_bytes) {
void *data;
void *data_end;
struct ipv6hdr *ip6h;
struct eth_hdr *new_eth;
struct eth_hdr *old_eth;
struct udphdr *udph;
__u32 key = V6_SRC_INDEX;
__u16 payload_len;
__u16 sport;
struct real_definition *src;

src = bpf_map_lookup_elem(&pckt_srcs, &key);
if (!src) {
return false;
}

if (bpf_xdp_adjust_head(
xdp, 0 - ((int)sizeof(struct ipv6hdr) + (int)sizeof(struct udphdr)))) {
return false;
}
data = (void *)(long)xdp->data;
data_end = (void *)(long)xdp->data_end;
new_eth = data;
ip6h = data + sizeof(struct eth_hdr);
udph = (void *)ip6h + sizeof(struct ipv6hdr);
old_eth = data + sizeof(struct ipv6hdr) + sizeof(struct udphdr);
if (new_eth + 1 > data_end ||
old_eth + 1 > data_end ||
ip6h + 1 > data_end ||
udph + 1 > data_end) {
return false;
}
memcpy(new_eth->eth_dest, cval->mac, 6);
memcpy(new_eth->eth_source, old_eth->eth_dest, 6);
new_eth->eth_proto = BE_ETH_P_IPV6;


if (is_ipv6) {
sport = (pckt->flow.srcv6[3] & 0xFFFF) ^ pckt->flow.port16[0];
pkt_bytes += (sizeof(struct ipv6hdr) + sizeof(struct udphdr));
} else {
sport = ((pckt->flow.src >> 16) & 0xFFFF) ^ pckt->flow.port16[0];
pkt_bytes += sizeof(struct udphdr);
}

create_udp_hdr(
udph,
sport,
GUE_DPORT,
pkt_bytes,
GUE_CSUM);

create_v6_hdr(ip6h, pckt, src->dstv6, dst->dstv6, pkt_bytes, IPPROTO_UDP);

return true;
}

#endif // of GUE_ENCAP



#endif // of __PCKT_ENCAP_H
1 change: 1 addition & 0 deletions katran/lib/testing/CMakeLists.txt
@@ -40,6 +40,7 @@ add_library(xdptester STATIC
XdpTester.h
XdpTester.cpp
KatranTestFixtures.h
KatranGueTestFixtures.h
KatranOptionalTestFixtures.h
)

