Rewrite linkerd-tcp

linkerd-tcp 0.1.0 constitutes a major rewrite.

Previously, linkerd-tcp did not properly utilize tokio's task model, which led
to a number of performance and correctness problems. Furthermore, linkerd-tcp's
configuration interface was substantially different from linkerd's, which
caused some confusion.

Now, linkerd-tcp has been redesigned:
- to better leverage tokio's reactor;
- to support connection and stream timeouts;
- to provide much richer metrics insight;
- to be structured like a linkerd-style router;
- to make general correctness improvements.

Fixes #26 #40 #49 #50
Depends on linkerd/tacho#20
olix0r committed Jun 10, 2017
1 parent c182382 commit ea507f7
Showing 44 changed files with 4,030 additions and 2,709 deletions.
1 change: 1 addition & 0 deletions .gitignore
```diff
@@ -1 +1,2 @@
 target
+tmp.discovery
```
410 changes: 212 additions & 198 deletions Cargo.lock

Large diffs are not rendered by default.

8 changes: 5 additions & 3 deletions Cargo.toml
```diff
@@ -19,16 +19,18 @@ bytes = "0.4"
 clap = "2.24"
 futures = "0.1"
 # We use not-yet-released tokio integration on master:
-hyper = { git = "https://github.com/hyperium/hyper", rev = "ca22eae" }
+hyper = { git = "https://github.com/hyperium/hyper", rev = "09fe9e6" }
 log = "0.3"
-rand = "0.3"
+ordermap = "0.2.10"
 pretty_env_logger = "0.1"
+rand = "0.3"
 rustls = "0.8"
 serde = "1.0"
 serde_derive = "1.0"
 serde_json = "1.0"
 serde_yaml = "0.7"
-tacho = "0.3"
+tacho = { path = "../tacho" }
+#tacho = "0.4"
 tokio-core = "0.1"
 tokio-io = "0.1"
 tokio-service = "0.1"
```
97 changes: 71 additions & 26 deletions README.md
````diff
@@ -26,7 +26,7 @@ Status: _beta_
 ## Quickstart ##
 
 1. Install [Rust and Cargo][install-rust].
-2. Configure and run [namerd][namerd].
+2. Run [namerd][namerd]. `./namerd.sh` fetches, configures, and runs namerd using a local-fs-backed discovery (in ./tmp.discovery).
 3. From this repository, run: `cargo run -- example.yml`
 
 We :heart: pull requests! See [CONTRIBUTING.md](CONTRIBUTING.md) for info on
@@ -52,34 +52,79 @@ ARGS:
 ### Example configuration ###
 
 ```yaml
-proxies:
+# Administrative control endpoints are exposed on a dedicated HTTP server. Endpoints
+# include:
+# - /metrics -- produces a snapshot of metrics formatted for prometheus.
+# - /shutdown -- POSTing to this endpoint initiates graceful shutdown.
+# - /abort -- POSTing to this terminates the process immediately.
+admin:
+  port: 9989
 
+  # By default, the admin server listens only on localhost. We can force it to bind
+  # on all interfaces by overriding the IP.
+  ip: 0.0.0.0
+
+  # Metrics are snapshot at a fixed interval of 10s.
+  metricsIntervalSecs: 10
+
+# A process exposes one or more 'routers'. Routers connect server traffic to
+# load balancers.
+routers:
+
+  # Each router has a 'label' for reporting purposes.
   - label: default
+
     servers:
-      # Listen on two ports, one using a self-signed TLS certificate.
-      - kind: io.l5d.tcp
-        addr: 0.0.0.0:7474
-      - kind: io.l5d.tls
-        addr: 0.0.0.0:7575
-        defaultIdentity:
-          privateKey: private.pem
-          certs:
-            - cert.pem
-            - ../eg-ca/ca/intermediate/certs/ca-chain.cert.pem
-
-    # Lookup /svc/google in namerd.
-    namerd:
-      url: http://127.0.0.1:4180
-      path: /svc/google
-
-    # Require that the downstream connection be TLS'd, with a `subjectAltName` including
-    # the DNS name _www.google.com_ using either our local CA or the host's default
-    # openssl certificate.
+      # Each router has one or more 'servers' listening for incoming connections.
+      # By default, routers listen on localhost. You need to specify a port.
+      - port: 7474
+        dstName: /svc/default
+        # You can limit the amount of time that a server will wait to obtain a
+        # connection from the router.
+        connectTimeoutMs: 500
+
+      # By default each server listens on 'localhost' to avoid exposing an open
+      # relay by default. Servers may be configured to listen on a specific local
+      # address or all local addresses (0.0.0.0).
+      - port: 7575
+        ip: 0.0.0.0
+        # Note that each server may route to a different destination through a
+        # single router:
+        dstName: /svc/google
+        # Servers may be configured to perform a TLS handshake.
+        tls:
+          defaultIdentity:
+            privateKey: private.pem
+            certs:
+              - cert.pem
+              - ../eg-ca/ca/intermediate/certs/ca-chain.cert.pem
+
+    # Each router is configured to resolve names.
+    # Currently, only namerd's HTTP interface is supported:
+    interpreter:
+      kind: io.l5d.namerd.http
+      baseUrl: http://localhost:4180
+      namespace: default
+      periodSecs: 20
+
+    # Clients may also be configured to perform a TLS handshake.
     client:
-      tls:
-        dnsName: "www.google.com"
-        trustCerts:
-          - ../eg-ca/ca/intermediate/certs/ca-chain.cert.pem
-          - /usr/local/etc/openssl/cert.pem
+      kind: io.l5d.static
+      # We can also apply linkerd-style per-client configuration:
+      configs:
+        - prefix: /svc/google
+          connectTimeoutMs: 400
+          # Require that the downstream connection be TLS'd, with a
+          # `subjectAltName` including the DNS name _www.google.com_
+          # using either our local CA or the host's default openssl
+          # certificate.
+          tls:
+            dnsName: "www.google.com"
+            trustCerts:
+              - ../eg-ca/ca/intermediate/certs/ca-chain.cert.pem
+              - /usr/local/etc/openssl/cert.pem
 ```
 
 ### Logging ###
````
34 changes: 11 additions & 23 deletions example.yml
```diff
@@ -1,30 +1,18 @@
 admin:
-  addr: 0.0.0.0:9989
+  port: 9989
   metricsIntervalSecs: 10
 
-proxies:
+routers:
 
   - label: default
     servers:
-      - kind: io.l5d.tcp
-        addr: 0.0.0.0:7474
-      # - kind: io.l5d.tls
-      #   addr: 0.0.0.0:7575
-      #   identities:
-      #     localhost:
-      #       privateKey: ../eg-ca/localhost.tls/private.pem
-      #       certs:
-      #         - ../eg-ca/localhost.tls/cert.pem
-      #         - ../eg-ca/localhost.tls/ca-chain.cert.pem
+      - port: 7474
+        dstName: /svc/default
+        connectTimeoutMs: 500
+        connectionLifetimeSecs: 60
 
-    namerd:
-      url: http://127.0.0.1:4180
-      path: /svc/default
-      intervalSecs: 5
-
-    # client:
-    #   tls:
-    #     dnsName: "www.google.com"
-    #     trustCerts:
-    #       - ../eg-ca/www.google.com.tls/ca-chain.cert.pem
-    #       #- /usr/local/etc/openssl/cert.pem
+    interpreter:
+      kind: io.l5d.namerd.http
+      baseUrl: http://localhost:4180
+      namespace: default
+      periodSecs: 20
```
51 changes: 51 additions & 0 deletions namerd.sh
```sh
#!/bin/sh

set -e

version="1.0.2"
bin="target/namerd-${version}-exec"
sha="338428a49cbe5f395c01a62e06b23fa492a7a9f89a510ae227b46c915b07569e"
url="https://github.com/linkerd/linkerd/releases/download/${version}/namerd-${version}-exec"

validbin() {
    checksum=$(openssl dgst -sha256 $bin | awk '{ print $2 }')
    [ "$checksum" = $sha ]
}

if [ -f "$bin" ] && ! validbin ; then
    echo "bad $bin" >&2
    mv "$bin" "${bin}.bad"
fi

if [ ! -f "$bin" ]; then
    echo "downloading $bin" >&2
    curl -L --silent --fail -o "$bin" "$url"
    chmod 755 "$bin"
fi

if ! validbin ; then
    echo "bad $bin. delete $bin and run $0 again." >&2
    exit 1
fi

mkdir -p ./tmp.discovery
if [ ! -f ./tmp.discovery/default ]; then
    echo "127.1 9991" > ./tmp.discovery/default
fi

"$bin" -- - <<EOF
admin:
  port: 9991
namers:
- kind: io.l5d.fs
  rootDir: ./tmp.discovery
storage:
  kind: io.l5d.inMemory
  namespaces:
    default: /svc => /#/io.l5d.fs;
interfaces:
- kind: io.l5d.httpController
EOF
```
107 changes: 107 additions & 0 deletions router.md
# Rust Stream Balancer Design

## Prototype

The initial implementation is basically a prototype. It proves the concept, but it has
severe deficiencies that cause performance (and probably correctness) problems.
Specifically, it implements its own polling... poorly.

At startup, the configuration is parsed. For each **proxy**, the namerd and serving
configurations are split and connected by an async channel so that namerd updates are
processed outside of the serving thread. All of the namerd watchers are collected to be
run together with the admin server. Once all of the proxy configurations are processed,
the application is run.

The admin thread is started, initiating all namerd polling and starting the admin server.

Simultaneously, all of the proxies are run in the main thread. For each of these, a
**connector** is created to determine how all downstream connections are established for
the proxy. A **balancer** is created with the connector and a stream of namerd updates. An
**acceptor** is created for each listening interface; each acceptor manifests as a stream
of incoming connections. The balancer is made shareable across servers by creating an
async channel, and each server's connections are streamed into a sink clone. The balancer
is driven to process all of these connections.

The balancer implements a Sink that performs _all_ I/O and connection management. Each
time `Balancer::start_send` or `Balancer::poll_complete` is called, the following work is
done (see the sketch after this list):
- _all_ connection streams are checked for I/O and data is transferred;
- closed connections are reaped;
- service discovery is checked for updates;
- new connections are established;
- stats are recorded.
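
To make the problem concrete, here is a minimal sketch of that Sink shape with a
stand-in `Conn` type; it illustrates the pattern described above and is not the actual
linkerd-tcp source:

```rust
extern crate futures;

use std::io;

use futures::{Async, AsyncSink, Poll, Sink, StartSend};

struct Conn; // stands in for one proxied client<->endpoint connection

struct Balancer {
    active: Vec<Conn>,
}

impl Balancer {
    // The prototype did *all* of this on every call into the Sink:
    fn turn(&mut self) -> io::Result<()> {
        // 1. poll every connection for readability/writability and copy bytes;
        // 2. reap connections that have closed;
        // 3. apply any pending service-discovery updates;
        // 4. establish new connections where needed;
        // 5. record stats.
        Ok(())
    }
}

impl Sink for Balancer {
    type SinkItem = Conn;
    type SinkError = io::Error;

    fn start_send(&mut self, conn: Conn) -> StartSend<Conn, io::Error> {
        self.active.push(conn);
        self.turn()?; // every accepted connection triggers a full sweep
        Ok(AsyncSink::Ready)
    }

    fn poll_complete(&mut self) -> Poll<(), io::Error> {
        self.turn()?; // and so does every poll
        Ok(Async::Ready(()))
    }
}
```

Because `turn` touches every connection on every call, the cost of a single Sink
operation grows with the number of open streams, which is the scheduling problem
described below.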

## Lessons/Problems

### Inflexible

This model doesn't really reflect that of linkerd. We have no mechanism to _route_
connections. All connections are simply forwarded. We cannot, for instance, route based on
client credentials or SNI destination.

### Inefficient

Currently, each balancer is effectively a scheduler, and a pretty poor one at that. I/O
processing should be far more granular and we shouldn't update load balancer endpoints in
the I/O path (unless absolutely necessary).

### Timeouts

We need several types of timeouts that are not currently implemented (a sketch of a
connect timeout follows the list):
- Connection timeout: time from incoming connection to outbound established.
- Stream lifetime: maximum time a stream may stay open.
- Idle timeout: maximum time a connection may stay open without transmitting data.
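
A connect timeout, for example, can be imposed with tokio-core 0.1 by racing the
outbound connect against a reactor `Timeout`. This is a sketch only; the function name
and error mapping are illustrative assumptions, not linkerd-tcp's API:

```rust
extern crate futures;
extern crate tokio_core;

use std::io;
use std::net::SocketAddr;
use std::time::Duration;

use futures::Future;
use tokio_core::net::TcpStream;
use tokio_core::reactor::{Handle, Timeout};

fn connect_with_timeout(
    addr: &SocketAddr,
    limit: Duration,
    handle: &Handle,
) -> Box<Future<Item = TcpStream, Error = io::Error>> {
    let connect = TcpStream::connect(addr, handle);
    // The reactor timeout fires after `limit`; we turn its completion into an error.
    let timeout = Timeout::new(limit, handle)
        .expect("failed to create timeout")
        .and_then(|()| Err(io::Error::new(io::ErrorKind::TimedOut, "connect timed out")));
    // Whichever future finishes first wins; the loser is dropped (canceled).
    Box::new(connect.select(timeout).map(|(s, _)| s).map_err(|(e, _)| e))
}
```

Stream-lifetime and idle timeouts would hang the same kind of `Timeout` off the duplex
copy rather than off the connect.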

## Proposal

linkerd-tcp should become a _stream router_. In the same way that linkerd routes requests,
linkerd-tcp should route connections. The following is a rough, evolving sketch of how
linkerd-tcp should be refactored to accommodate this:

The linkerd-tcp configuration should support one or more **routers**. Each router is
configured with one or more **servers**. A server, which may or may not terminate TLS,
produces a stream of incoming connections comprising an envelope--a source identity (an
address, but maybe more) and a destination name--and a bidirectional data stream. The
server may choose the destination by static configuration or as some function of the
connection (e.g. client credentials, SNI, etc). Each connection envelope may be annotated
with a standard set of metadata including, for example, an optional connect deadline,
stream deadline, etc.
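
A rough sketch of what such an envelope might carry; the field names here are
illustrative assumptions, not shipped types:

```rust
use std::net::SocketAddr;
use std::time::Instant;

/// What a server hands to the router for each accepted connection.
struct Envelope {
    /// Source identity: an address today, possibly richer (e.g. TLS identity) later.
    src: SocketAddr,
    /// Destination name, chosen statically or from the connection (SNI, credentials).
    dst_name: String, // e.g. "/svc/google"
    /// Standard metadata: optional deadlines for connecting and streaming.
    connect_deadline: Option<Instant>,
    stream_deadline: Option<Instant>,
    idle_deadline: Option<Instant>,
}
```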

The streams of all incoming connections for a router are merged into a single stream of
enveloped connections. This stream is forwarded to a **binder**. A binder is responsible
for maintaining a cache of balancers by destination name. When a balancer does not exist
in the cache, a new namerd lookup is initiated and its result stream (and value) is cached
so that future connections may resolve quickly. The binder obtains a **balancer** for each
destination name that maintains a list of endpoints and their load (in terms of
connections, throughput, etc).
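
A minimal sketch of the binder's cache-or-resolve behavior, assuming stand-in
`Balancer` and `resolve` definitions:

```rust
use std::collections::HashMap;

struct Balancer; // maintains a list of endpoints and their load

struct Binder {
    cache: HashMap<String, Balancer>, // keyed by destination name
}

impl Binder {
    fn bind(&mut self, dst_name: &str) -> &mut Balancer {
        // On a miss, kick off a namerd lookup and cache the resulting balancer
        // so that later connections to the same name resolve quickly.
        self.cache
            .entry(dst_name.to_string())
            .or_insert_with(|| resolve(dst_name))
    }
}

fn resolve(_dst_name: &str) -> Balancer {
    // Placeholder for the namerd lookup that seeds a new balancer with a
    // stream of endpoint updates.
    Balancer
}
```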

If the inbound connection has not expired (i.e. due to a timeout), it is dispatched to the
balancer for processing. The balancer maintains a reactor handle and initiates I/O and
balancer state management on the reactor.
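
One way to realize this, sketched with assumed types: each admitted connection becomes
its own task spawned on the reactor handle, so per-connection I/O no longer runs inside
the balancer's Sink:

```rust
extern crate futures;
extern crate tokio_core;

use futures::{Async, Future, Poll};
use tokio_core::reactor::Handle;

// Stand-in for the bidirectional copy between client and endpoint.
struct Duplex;

impl Future for Duplex {
    type Item = ();
    type Error = ();
    fn poll(&mut self) -> Poll<(), ()> {
        // Real implementation: shuttle bytes both ways until both sides close.
        Ok(Async::Ready(()))
    }
}

fn dispatch(handle: &Handle, duplex: Duplex) {
    // Each connection is an independent task on the reactor; the balancer only
    // accounts for load, it no longer drives every stream's I/O itself.
    handle.spawn(duplex);
}
```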

```
 ------         ------
| srv0 |  ...  | srvN |
 ------         ------
    |
    | (Envelope, IoStream)
    V
 -------------------      -------------
|       binder      |----| interpreter |
 -------------------      -------------
    |
    V
 ----------
| balancer |
 ----------
    |
    V
 ----------
| endpoint |
 ----------
    |
    V
 --------
| duplex |
 --------
```
