Rewrite linkerd-tcp

linkerd-tcp 0.1.0 constitutes a major rewrite.

Previously, linkerd-tcp did not properly utilize tokio's task model, which led
to a number of performance and correctness problems. Furthermore, linkerd-tcp's
configuration interface was substantially different from linkerd's, which
caused some confusion.

Now, linkerd-tcp has been redesigned:
- to better leverage tokio's reactor;
- to support connection and stream timeouts;
- to provide much richer metrics insight;
- to be structured like a linkerd-style router;
- to make general correctness improvements.

Fixes #26 #40 #49 #50
Depends on linkerd/tacho#20
olix0r committed Jun 10, 2017
1 parent c182382 commit ea507f7
Showing 44 changed files with 4,030 additions and 2,709 deletions.
1 change: 1 addition & 0 deletions .gitignore
```diff
@@ -1 +1,2 @@
 target
+tmp.discovery
```
410 changes: 212 additions & 198 deletions Cargo.lock

Large diffs are not rendered by default.

8 changes: 5 additions & 3 deletions Cargo.toml
```diff
@@ -19,16 +19,18 @@ bytes = "0.4"
 clap = "2.24"
 futures = "0.1"
 # We use not-yet-released tokio integration on master:
-hyper = { git = "https://github.com/hyperium/hyper", rev = "ca22eae" }
+hyper = { git = "https://github.com/hyperium/hyper", rev = "09fe9e6" }
 log = "0.3"
-rand = "0.3"
+ordermap = "0.2.10"
 pretty_env_logger = "0.1"
+rand = "0.3"
 rustls = "0.8"
 serde = "1.0"
 serde_derive = "1.0"
 serde_json = "1.0"
 serde_yaml = "0.7"
-tacho = "0.3"
+tacho = { path = "../tacho" }
+#tacho = "0.4"
 tokio-core = "0.1"
 tokio-io = "0.1"
 tokio-service = "0.1"
```
97 changes: 71 additions & 26 deletions README.md
````diff
@@ -26,7 +26,7 @@ Status: _beta_
 ## Quickstart ##
 
 1. Install [Rust and Cargo][install-rust].
-2. Configure and run [namerd][namerd].
+2. Run [namerd][namerd]. `./namerd.sh` fetches, configures, and runs namerd using a local-fs-backed discovery (in ./tmp.discovery).
 3. From this repository, run: `cargo run -- example.yml`
 
 We :heart: pull requests! See [CONTRIBUTING.md](CONTRIBUTING.md) for info on
@@ -52,34 +52,79 @@ ARGS:
 ### Example configuration ###
 
 ```yaml
-proxies:
+# Administrative control endpoints are exposed on a dedicated HTTP server. Endpoints
+# include:
+# - /metrics -- produces a snapshot of metrics formatted for prometheus.
+# - /shutdown -- POSTing to this endpoint initiates graceful shutdown.
+# - /abort -- POSTing to this terminates the process immediately.
+admin:
+  port: 9989
 
+  # By default, the admin server listens only on localhost. We can force it to bind
+  # on all interfaces by overriding the IP.
+  ip: 0.0.0.0
+
+  # Metrics are snapshot at a fixed interval of 10s.
+  metricsIntervalSecs: 10
+
+# A process exposes one or more 'routers'. Routers connect server traffic to
+# load balancers.
+routers:
+
+  # Each router has a 'label' for reporting purposes.
   - label: default
+
     servers:
-      # Listen on two ports, one using a self-signed TLS certificate.
-      - kind: io.l5d.tcp
-        addr: 0.0.0.0:7474
-      - kind: io.l5d.tls
-        addr: 0.0.0.0:7575
-        defaultIdentity:
-          privateKey: private.pem
-          certs:
-            - cert.pem
-            - ../eg-ca/ca/intermediate/certs/ca-chain.cert.pem
-
-    # Lookup /svc/google in namerd.
-    namerd:
-      url: http://127.0.0.1:4180
-      path: /svc/google
-
-    # Require that the downstream connection be TLS'd, with a `subjectAltName` including
-    # the DNS name _www.google.com_ using either our local CA or the host's default
-    # openssl certificate.
+      # Each router has one or more 'servers' listening for incoming connections.
+      # By default, routers listen on localhost. You need to specify a port.
+      - port: 7474
+        dstName: /svc/default
+        # You can limit the amount of time that a server will wait to obtain a
+        # connection from the router.
+        connectTimeoutMs: 500
+
+      # By default each server listens on 'localhost' to avoid exposing an open
+      # relay by default. Servers may be configured to listen on a specific local
+      # address or all local addresses (0.0.0.0).
+      - port: 7575
+        ip: 0.0.0.0
+        # Note that each server may route to a different destination through a
+        # single router:
+        dstName: /svc/google
+        # Servers may be configured to perform a TLS handshake.
+        tls:
+          defaultIdentity:
+            privateKey: private.pem
+            certs:
+              - cert.pem
+              - ../eg-ca/ca/intermediate/certs/ca-chain.cert.pem
+
+    # Each router is configured to resolve names.
+    # Currently, only namerd's HTTP interface is supported:
+    interpreter:
+      kind: io.l5d.namerd.http
+      baseUrl: http://localhost:4180
+      namespace: default
+      periodSecs: 20
+
+    # Clients may also be configured to perform a TLS handshake.
     client:
-      tls:
-        dnsName: "www.google.com"
-        trustCerts:
-          - ../eg-ca/ca/intermediate/certs/ca-chain.cert.pem
-          - /usr/local/etc/openssl/cert.pem
+      kind: io.l5d.static
+      # We can also apply linkerd-style per-client configuration:
+      configs:
+        - prefix: /svc/google
+          connectTimeoutMs: 400
+          # Require that the downstream connection be TLS'd, with a
+          # `subjectAltName` including the DNS name _www.google.com_
+          # using either our local CA or the host's default openssl
+          # certificate.
+          tls:
+            dnsName: "www.google.com"
+            trustCerts:
+              - ../eg-ca/ca/intermediate/certs/ca-chain.cert.pem
+              - /usr/local/etc/openssl/cert.pem
 ```
 
 ### Logging ###
````
34 changes: 11 additions & 23 deletions example.yml
```diff
@@ -1,30 +1,18 @@
 admin:
-  addr: 0.0.0.0:9989
+  port: 9989
   metricsIntervalSecs: 10
 
-proxies:
+routers:
 
   - label: default
     servers:
-      - kind: io.l5d.tcp
-        addr: 0.0.0.0:7474
-      # - kind: io.l5d.tls
-      #   addr: 0.0.0.0:7575
-      #   identities:
-      #     localhost:
-      #       privateKey: ../eg-ca/localhost.tls/private.pem
-      #       certs:
-      #         - ../eg-ca/localhost.tls/cert.pem
-      #         - ../eg-ca/localhost.tls/ca-chain.cert.pem
+      - port: 7474
+        dstName: /svc/default
+        connectTimeoutMs: 500
+        connectionLifetimeSecs: 60
 
-    namerd:
-      url: http://127.0.0.1:4180
-      path: /svc/default
-      intervalSecs: 5
-
-    # client:
-    #   tls:
-    #     dnsName: "www.google.com"
-    #     trustCerts:
-    #       - ../eg-ca/www.google.com.tls/ca-chain.cert.pem
-    #       #- /usr/local/etc/openssl/cert.pem
+    interpreter:
+      kind: io.l5d.namerd.http
+      baseUrl: http://localhost:4180
+      namespace: default
+      periodSecs: 20
```
51 changes: 51 additions & 0 deletions namerd.sh
```sh
#!/bin/sh

set -e

version="1.0.2"
bin="target/namerd-${version}-exec"
sha="338428a49cbe5f395c01a62e06b23fa492a7a9f89a510ae227b46c915b07569e"
url="https://github.com/linkerd/linkerd/releases/download/${version}/namerd-${version}-exec"

validbin() {
    checksum=$(openssl dgst -sha256 $bin | awk '{ print $2 }')
    [ "$checksum" = $sha ]
}

if [ -f "$bin" ] && ! validbin ; then
    echo "bad $bin" >&2
    mv "$bin" "${bin}.bad"
fi

if [ ! -f "$bin" ]; then
    echo "downloading $bin" >&2
    curl -L --silent --fail -o "$bin" "$url"
    chmod 755 "$bin"
fi

if ! validbin ; then
    echo "bad $bin. delete $bin and run $0 again." >&2
    exit 1
fi

mkdir -p ./tmp.discovery
if [ ! -f ./tmp.discovery/default ]; then
    echo "127.1 9991" > ./tmp.discovery/default
fi

"$bin" -- - <<EOF
admin:
  port: 9991
namers:
- kind: io.l5d.fs
  rootDir: ./tmp.discovery
storage:
  kind: io.l5d.inMemory
  namespaces:
    default: /svc => /#/io.l5d.fs;
interfaces:
- kind: io.l5d.httpController
EOF
```
107 changes: 107 additions & 0 deletions router.md
# Rust Stream Balancer Design

## Prototype

The initial implementation is basically a prototype. It proves the concept, but it has
severe deficiencies that cause performance (and probably correctness) problems.
Specifically, it implements its own polling... poorly.

At startup, the configuration is parsed. For each **proxy**, the namerd and serving
configurations are split and connected by an async channel so that namerd updates are
processed outside of the serving thread. All of the namerd watchers are collected to be
run together with the admin server. Once all of the proxy configurations are processed,
the application is run.

The admin thread is started, initiating all namerd polling and starting the admin server.

Simultaneously, all of the proxies are run in the main thread. For each of these, a
**connector** is created to determine how all downstream connections are established for
the proxy. A **balancer** is created with the connector and a stream of namerd updates. An
**acceptor** is created for each listening interface; each acceptor manifests as a stream
of incoming connections. The balancer is made shareable across servers by creating an
async channel, and each server's connections are streamed into a sink clone. The balancer
is driven to process all of these connections.

The balancer implements a Sink that performs _all_ I/O and connection management. Each
time `Balancer::start_send` or `Balancer::poll_complete` is called, the following work is
done (see the sketch after this list):
- _all_ connection streams are checked for I/O and data is transferred;
- closed connections are reaped;
- service discovery is checked for updates;
- new connections are established;
- stats are recorded.
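
To make the problem concrete, here is a minimal sketch of that Sink shape with a
stand-in `Conn` type; it illustrates the pattern described above and is not the actual
linkerd-tcp source:

```rust
extern crate futures;

use std::io;

use futures::{Async, AsyncSink, Poll, Sink, StartSend};

struct Conn; // stands in for one proxied client<->endpoint connection

struct Balancer {
    active: Vec<Conn>,
}

impl Balancer {
    // The prototype did *all* of this on every call into the Sink:
    fn turn(&mut self) -> io::Result<()> {
        // 1. poll every connection for readability/writability and copy bytes;
        // 2. reap connections that have closed;
        // 3. apply any pending service-discovery updates;
        // 4. establish new connections where needed;
        // 5. record stats.
        Ok(())
    }
}

impl Sink for Balancer {
    type SinkItem = Conn;
    type SinkError = io::Error;

    fn start_send(&mut self, conn: Conn) -> StartSend<Conn, io::Error> {
        self.active.push(conn);
        self.turn()?; // every accepted connection triggers a full sweep
        Ok(AsyncSink::Ready)
    }

    fn poll_complete(&mut self) -> Poll<(), io::Error> {
        self.turn()?; // and so does every poll
        Ok(Async::Ready(()))
    }
}
```

Because `turn` touches every connection on every call, the cost of a single Sink
operation grows with the number of open streams, which is the scheduling problem
described below.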

## Lessons/Problems

### Inflexible

This model doesn't really reflect that of linkerd. We have no mechanism to _route_
connections. All connections are simply forwarded. We cannot, for instance, route based on
client credentials or SNI destination.

### Inefficient

Currently, each balancer is effectively a scheduler, and a pretty poor one at that. I/O
processing should be far more granular and we shouldn't update load balancer endpoints in
the I/O path (unless absolutely necessary).

### Timeouts

We need several types of timeouts that are not currently implemented (a sketch of a
connect timeout follows the list):
- Connection timeout: time from incoming connection to outbound established.
- Stream lifetime: maximum time a stream may stay open.
- Idle timeout: maximum time a connection may stay open without transmitting data.
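
A connect timeout, for example, can be imposed with tokio-core 0.1 by racing the
outbound connect against a reactor `Timeout`. This is a sketch only; the function name
and error mapping are illustrative assumptions, not linkerd-tcp's API:

```rust
extern crate futures;
extern crate tokio_core;

use std::io;
use std::net::SocketAddr;
use std::time::Duration;

use futures::Future;
use tokio_core::net::TcpStream;
use tokio_core::reactor::{Handle, Timeout};

fn connect_with_timeout(
    addr: &SocketAddr,
    limit: Duration,
    handle: &Handle,
) -> Box<Future<Item = TcpStream, Error = io::Error>> {
    let connect = TcpStream::connect(addr, handle);
    // The reactor timeout fires after `limit`; we turn its completion into an error.
    let timeout = Timeout::new(limit, handle)
        .expect("failed to create timeout")
        .and_then(|()| Err(io::Error::new(io::ErrorKind::TimedOut, "connect timed out")));
    // Whichever future finishes first wins; the loser is dropped (canceled).
    Box::new(connect.select(timeout).map(|(s, _)| s).map_err(|(e, _)| e))
}
```

Stream-lifetime and idle timeouts would hang the same kind of `Timeout` off the duplex
copy rather than off the connect.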

## Proposal

linkerd-tcp should become a _stream router_. In the same way that linkerd routes requests,
linkerd-tcp should route connections. The following is a rough, evolving sketch of how
linkerd-tcp should be refactored to accommodate this:

The linkerd-tcp configuration should support one or more **routers**. Each router is
configured with one or more **servers**. A server, which may or may not terminate TLS,
produces a stream of incoming connections comprising an envelope--a source identity (an
address, but maybe more) and a destination name--and a bidirectional data stream. The
server may choose the destination by static configuration or as some function of the
connection (e.g. client credentials, SNI, etc). Each connection envelope may be annotated
with a standard set of metadata including, for example, an optional connect deadline,
stream deadline, etc.
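
A rough sketch of what such an envelope might carry; the field names here are
illustrative assumptions, not shipped types:

```rust
use std::net::SocketAddr;
use std::time::Instant;

/// What a server hands to the router for each accepted connection.
struct Envelope {
    /// Source identity: an address today, possibly richer (e.g. TLS identity) later.
    src: SocketAddr,
    /// Destination name, chosen statically or from the connection (SNI, credentials).
    dst_name: String, // e.g. "/svc/google"
    /// Standard metadata: optional deadlines for connecting and streaming.
    connect_deadline: Option<Instant>,
    stream_deadline: Option<Instant>,
    idle_deadline: Option<Instant>,
}
```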

The streams of all incoming connections for a router are merged into a single stream of
enveloped connections. This stream is forwarded to a **binder**. A binder is responsible
for maintaining a cache of balancers by destination name. When a balancer does not exist
in the cache, a new namerd lookup is initiated and its result stream (and value) is cached
so that future connections may resolve quickly. The binder obtains a **balancer** for each
destination name that maintains a list of endpoints and their load (in terms of
connections, throughput, etc).
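
A minimal sketch of the binder's cache-or-resolve behavior, assuming stand-in
`Balancer` and `resolve` definitions:

```rust
use std::collections::HashMap;

struct Balancer; // maintains a list of endpoints and their load

struct Binder {
    cache: HashMap<String, Balancer>, // keyed by destination name
}

impl Binder {
    fn bind(&mut self, dst_name: &str) -> &mut Balancer {
        // On a miss, kick off a namerd lookup and cache the resulting balancer
        // so that later connections to the same name resolve quickly.
        self.cache
            .entry(dst_name.to_string())
            .or_insert_with(|| resolve(dst_name))
    }
}

fn resolve(_dst_name: &str) -> Balancer {
    // Placeholder for the namerd lookup that seeds a new balancer with a
    // stream of endpoint updates.
    Balancer
}
```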

If the inbound connection has not expired (i.e. due to a timeout), it is dispatched to the
balancer for processing. The balancer maintains a reactor handle and initiates I/O and
balancer state management on the reactor.
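
One way to realize this, sketched with assumed types: each admitted connection becomes
its own task spawned on the reactor handle, so per-connection I/O no longer runs inside
the balancer's Sink:

```rust
extern crate futures;
extern crate tokio_core;

use futures::{Async, Future, Poll};
use tokio_core::reactor::Handle;

// Stand-in for the bidirectional copy between client and endpoint.
struct Duplex;

impl Future for Duplex {
    type Item = ();
    type Error = ();
    fn poll(&mut self) -> Poll<(), ()> {
        // Real implementation: shuttle bytes both ways until both sides close.
        Ok(Async::Ready(()))
    }
}

fn dispatch(handle: &Handle, duplex: Duplex) {
    // Each connection is an independent task on the reactor; the balancer only
    // accounts for load, it no longer drives every stream's I/O itself.
    handle.spawn(duplex);
}
```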

```
 ------         ------
| srv0 |  ...  | srvN |
 ------         ------
    |
    | (Envelope, IoStream)
    V
 -------------------      -------------
|       binder      |----| interpreter |
 -------------------      -------------
    |
    V
 ----------
| balancer |
 ----------
    |
    V
 ----------
| endpoint |
 ----------
    |
    V
 --------
| duplex |
 --------
```
