From 27026ab12a13651a0ebd5dc674e241a792d7b184 Mon Sep 17 00:00:00 2001 From: Gabor Retvari Date: Fri, 22 Dec 2023 22:20:48 +0100 Subject: [PATCH] doc: Update docs for v1, take 1 --- README.md | 138 +++-- deploy/manifests/default-dataplane.yaml | 2 +- deploy/manifests/static/dataplane.yaml | 2 +- deploy/manifests/stunner-expose-kube-dns.yaml | 13 +- deploy/manifests/stunner-test.yaml | 10 +- docs/AUTH.md | 125 +---- docs/CONCEPTS.md | 22 +- docs/DEPLOYMENT.md | 98 ++-- docs/GATEWAY.md | 203 ++++--- docs/INSTALL.md | 127 ++--- docs/MONITORING.md | 70 +-- docs/OBSOLETE.md | 501 ----------------- docs/README.md | 6 +- docs/SCALING.md | 99 +--- docs/SECURITY.md | 92 +--- docs/WHY.md | 39 +- .../benchmark/performance-stunner.yaml | 8 +- docs/examples/cloudretro/README.md | 4 +- .../cloudretro-stunner-cleanup.yaml | 6 +- docs/examples/cloudretro/stunner-gwcc.yaml | 6 +- docs/examples/direct-one2one-call/README.md | 4 +- .../direct-one2one-call-stunner.yaml | 10 +- docs/examples/jitsi/README.md | 6 +- docs/examples/jitsi/jitsi-call-stunner.yaml | 10 +- .../kurento-magic-mirror-stunner.yaml | 8 +- docs/examples/kurento-one2one-call/README.md | 6 +- .../kurento-one2one-call-stunner.yaml | 8 +- docs/examples/livekit/README.md | 6 +- .../livekit/livekit-call-stunner.yaml | 10 +- docs/examples/mediasoup/README.md | 4 +- .../mediasoup/mediasoup-call-stunner.yaml | 10 +- docs/examples/neko/README.md | 3 +- docs/examples/neko/stunner.yaml | 8 +- docs/examples/simple-tunnel/README.md | 6 +- .../examples/simple-tunnel/iperf-stunner.yaml | 10 +- docs/img/stunner_arch_big.svg | 511 +++++++++++------- 36 files changed, 822 insertions(+), 1369 deletions(-) delete mode 100644 docs/OBSOLETE.md diff --git a/README.md b/README.md index 11ba2bf7..c7b3f20e 100644 --- a/README.md +++ b/README.md @@ -32,7 +32,7 @@

-*Note: The below documents the latest development version of STUNner. See the documentation for the stable version [here](https://docs.l7mp.io/en/stable).* +*This is the documentation for the latest development version of STUNner. See the documentation for the stable version [here](https://docs.l7mp.io/en/stable).* # STUNner: A Kubernetes media gateway for WebRTC @@ -54,7 +54,7 @@ Worry no more! STUNner allows you to deploy *any* WebRTC service into Kubernetes integrating it into the [cloud-native ecosystem](https://landscape.cncf.io). STUNner exposes a standards-compliant STUN/TURN gateway for clients to access your virtualized WebRTC infrastructure running in Kubernetes, maintaining full browser compatibility and requiring minimal or no -modification to your existing WebRTC codebase. STUNner implements the standard [Kubernetes Gateway +modification to your existing WebRTC codebase. STUNner supports the [Kubernetes Gateway API](https://gateway-api.sigs.k8s.io) so you can configure it in the familiar YAML-engineering style via Kubernetes manifests. @@ -83,8 +83,8 @@ features we have come to expect from modern network services. Worse yet, the ent on a handful of [public](https://bloggeek.me/google-free-turn-server/) [STUN servers](https://www.npmjs.com/package/freeice) and [hosted TURN services](https://bloggeek.me/managed-webrtc-turn-speed) to connect clients behind a NAT/firewall, -which may create a useless dependency on externally operated services, introduce a bottleneck, -raise security concerns, and come with a non-trivial price tag. +which may create a useless dependency on externally operated services, introduce a performance +bottleneck, raise security concerns, and come with a non-trivial price tag. The main goal of STUNner is to allow *anyone* to deploy their own WebRTC infrastructure into Kubernetes, without relying on any external service other than the cloud-provider's standard hosted @@ -93,8 +93,8 @@ servers can use as a scalable NAT traversal facility (headless model), or it can for ingesting WebRTC media traffic into the Kubernetes cluster by exposing a public-facing STUN/TURN server that WebRTC clients can connect to (media-plane model). This makes it possible to deploy WebRTC application servers and media servers into ordinary Kubernetes pods, taking advantage -of Kubernetes's excellent tooling to manage, scale, monitor and troubleshoot the WebRTC -infrastructure like any other cloud-bound workload. +of the full cloud native feature set to manage, scale, monitor and troubleshoot the WebRTC +infrastructure like any other Kubernetes workload. ![STUNner media-plane deployment architecture](./docs/img/stunner_arch.svg) @@ -146,9 +146,6 @@ way. potentially malicious access; with STUNner *all* media is received through a single ingress port that you can tightly monitor and control. - - - * **Simple code and extremely small size.** Written in pure Go using the battle-tested [pion/webrtc](https://github.com/pion/webrtc) framework, STUNner is just a couple of hundred lines of fully open-source code. The server is extremely lightweight: the typical STUNner @@ -177,15 +174,13 @@ minutes. The simplest way to deploy STUNner is through [Helm](https://helm.sh). STUNner configuration parameters are available for customization as [Helm -Values](https://helm.sh/docs/chart_template_guide/values_files). We recommend deploying STUNner -into a separate namespace and we usually name this namespace as `stunner`, so as to isolate it from -the rest of the workload. 
+Values](https://helm.sh/docs/chart_template_guide/values_files). ```console helm repo add stunner https://l7mp.io/stunner helm repo update -helm install stunner-gateway-operator stunner/stunner-gateway-operator --create-namespace --namespace=stunner-system -helm install stunner stunner/stunner --create-namespace --namespace=stunner +helm install stunner-gateway-operator stunner/stunner-gateway-operator --create-namespace \ + --namespace=stunner-system ``` Find out more about the charts in the [STUNner-helm repository](https://github.com/l7mp/stunner-helm). @@ -193,10 +188,15 @@ Find out more about the charts in the [STUNner-helm repository](https://github.c ### Configuration The standard way to interact with STUNner is via the standard Kubernetes [Gateway - API](https://gateway-api.sigs.k8s.io). This is much akin to the way you configure *all* - Kubernetes workloads: specify your intents in YAML files and issue a `kubectl apply`, and the - [STUNner gateway operator](https://github.com/l7mp/stunner-gateway-operator) will automatically - reconcile the STUNner dataplane for the new configuration. +API](https://gateway-api.sigs.k8s.io). This is much akin to the way you configure *all* Kubernetes +workloads: specify your intents in YAML files and issue a `kubectl apply`, and the [STUNner gateway +operator](https://github.com/l7mp/stunner-gateway-operator) will automatically create the STUNner +dataplane (that is, the `stunnerd` pods that implement the STUN/TURN service) and downloads the new +configuration to the dataplane pods. + +It is generally a good idea to maintain STUNner configuration into a separate Kubernetes +namespace. Below we will use the `stunner` namespace; create it with `kubectl create namespace +stunner` if it does not exist. 1. Given a fresh STUNner install, the first step is to register STUNner with the Kubernetes Gateway API. This amounts to creating a @@ -212,7 +212,7 @@ The standard way to interact with STUNner is via the standard Kubernetes [Gatewa ``` console kubectl apply -f - < **Warning** -STUNner deviates somewhat from the standard rules Kubernetes uses to handle ports in Services. In +Note that STUNner deviates somewhat from the way Kubernetes handles ports in Services. In Kubernetes each Service is associated with one or more protocol-port pairs and connections via the Service can be made to only these specific protocol-port pairs. WebRTC media servers, however, usually open lots of different ports, typically one per each client connection, and it would be -cumbersome to create a separate backend Service and UDPRoute for each port. In order to simplify +cumbersome to create a separate backend Service and UDPRoute per each port. In order to simplify this, STUNner **ignores the protocol and port specified in the backend service** and allows connections to the backend pods via *any* protocol-port pair. STUNner can therefore use only a -*single* backend Service to reach any port exposed on a WebRTC media server. - - +*single* backend Service to reach any port exposed on a WebRTC media server. And that's all. You don't need to worry about client-side NAT traversal and WebRTC media routing because STUNner has you covered! 
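+
+For reference, the backend Service referenced by the UDPRoute above can be as minimal as the below sketch. The label selector is illustrative and depends on how your media server pods are labeled, and the port is effectively a placeholder since, as explained above, STUNner ignores it. (The quick-start below creates such a Service for you via `deploy/manifests/udp-greeter.yaml`; the sketch is only meant to illustrate the point.)
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: media-plane
+  namespace: default
+spec:
+  type: ClusterIP
+  selector:
+    app: media-plane        # illustrative: use the labels of your media server pods
+  ports:
+    - name: media
+      port: 9001            # any port will do: STUNner routes to every port of the backend pods
+      protocol: UDP
+```
+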
Even better, every time you change a Gateway API resource in -Kubernetes, say, you update the GatewayConfig to reset your STUN/TURN credentials or change the -protocol or port in one of your Gateways, the [STUNner gateway +Kubernetes, say, you update the GatewayConfig to reset the STUN/TURN credentials or change the +protocol or port in a Gateway, the [STUNner gateway operator](https://github.com/l7mp/stunner-gateway-operator) will automatically pick up your modifications and update the underlying dataplane. Kubernetes is beautiful, isn't it? ### Check your config -The current STUNner dataplane configuration is always made available in a convenient ConfigMap -called `stunnerd-config` (you can choose the name in the GatewayConfig). The STUNner dataplane pods -themselves will use the very same ConfigMap to reconcile their internal state, so you can consider -the content to be the ground truth. +The current STUNner dataplane configuration is always made available in a convenient ConfigMap that +has the same name and namespace as the Gateway it belongs to (so this is supposed to be +`stunner/udp-gateway` as per our example). STUNner comes with a small utility to dump the running configuration in human readable format (you -must have [`jq`](https://stedolan.github.io/jq) installed in your PATH to be able to use it). Chdir -into the main STUNner directory and issue. +must have [`jq`](https://stedolan.github.io/jq) installed in your PATH to be able to use it). Issue +the below from the main STUNner directory. ```console -cmd/stunnerctl/stunnerctl running-config stunner/stunnerd-config -STUN/TURN authentication type: plaintext -STUN/TURN username: user-1 -STUN/TURN password: pass-1 -Listener: udp-listener -Protocol: TURN-UDP -Public address: 34.118.36.108 -Public port: 3478 +cmd/stunnerctl/stunnerctl running-config stunner/udp-gateway +STUN/TURN authentication type: static +STUN/TURN username: user-1 +STUN/TURN password: pass-1 +Listener 1 + Name: stunner/udp-gateway/udp-listener + Listener: stunner/udp-gateway/udp-listener + Protocol: TURN-UDP + Public address: 34.34.150.65 + Public port: 3478 ``` As it turns out, STUNner has successfully assigned a public IP and port to our Gateway and set the @@ -379,7 +371,7 @@ STUN/TURN credentials based on the GatewayConfig. You can use the below to dump configuration; `jq` is there just to pretty-print JSON. ```console -kubectl get cm -n stunner stunnerd-config -o jsonpath="{.data.stunnerd\.conf}" | jq . +kubectl get cm -n stunner udp-gateway -o jsonpath="{.data.stunnerd\.conf}" | jq . ``` ### Testing @@ -394,7 +386,7 @@ a heartwarming welcome message. The below manifest spawns the service in the `default` namespace and wraps it in a Kubernetes service called `media-plane`. Recall, this is the target service in our UDPRoute. Note that the type of the `media-plane` service is `ClusterIP`, which means that Kubernetes will *not* expose - it to the Internet: the only way for clients to obtain a response is via STUNner. + it to the outside world: the only way for clients to obtain a response is via STUNner. ```console kubectl apply -f deploy/manifests/udp-greeter.yaml @@ -413,16 +405,16 @@ a heartwarming welcome message. see a nice greeting from your cluster! ```console - ./turncat - k8s://stunner/stunnerd-config:udp-listener udp://${PEER_IP}:9001 + ./turncat - k8s://stunner/udp-gateway:udp-listener udp://${PEER_IP}:9001 Hello STUNner Greetings from STUNner! 
``` -Observe that we haven't specified the public IP address and port: `turncat` is clever enough to -parse the running [STUNner configuration](#check-your-config) from Kubernetes directly. Just -specify the special STUNner URI `k8s://stunner/stunnerd-config:udp-listener`, identifying the -namespace (`stunner` here) and the name for the STUNner ConfigMap (`stunnerd-config`), plus the -listener to connect to (`udp-listener`), and `turncat` will do the heavy lifting. +Note that we haven't specified the public IP address and port: `turncat` is clever enough to parse +the running [STUNner configuration](#check-your-config) from Kubernetes directly. Just specify the +special STUNner URI `k8s://stunner/udp-gateway:udp-listener`, identifying the namespace (`stunner` +here) and the name for the Gateway (`udp-gateway`), and the listener to connect to +(`udp-listener`), and `turncat` will do the heavy lifting. Note that your actual WebRTC clients do *not* need to use `turncat` to reach the cluster: all modern Web browsers and WebRTC clients come with a STUN/TURN client built in. Here, `turncat` is @@ -478,7 +470,7 @@ greeter) by STUNner. ```console kubectl apply -f - < **Warning** - Clients should never query the STUNner authentication service directly to obtain an ICE - config. Instead, the WebRTC application server should retrieve the ICE config in the name of the - client during session establishment and return the generated ICE config to the client. + The ICE configs generated by the [STUNner authentication service](https://github.com/l7mp/stunner-auth-service) are always up to date with the most recent dataplane configuration. This makes sure that whenever you modify the STUNner Gateway API configuration (say, switch from `static` authentication to `ephemeral`), your clients will always receive an ICE config that reflects these changes (that is, the username/password pair will provide a time-windowed ephemeral credential). - The ICE configs generated by the [STUNner authentication - service](https://github.com/l7mp/stunner-auth-service) are always up to date with the most - recent dataplane configuration. This makes sure that whenever you modify the STUNner Gateway API - configuration (say, switch from `static` authentication to `ephemeral`), your clients will - always receive an ICE config that reflects these changes (that is, the username/password pair - will provide a time-windowed ephemeral credential). - - For instance, the below will query the STUnner auth service, which is by default available at - the URL `http://stunner-auth.stunner-system:8088`, for a valid ICE config. + Below is a query to the STUnner auth service, by default available at the URL `http://stunner-auth.stunner-system:8088`, that returns a valid ICE config. ```console curl "http://stunner-auth.stunner-system:8088/ice?service=turn" @@ -65,24 +38,15 @@ The intended authentication workflow in STUNner is as follows. } ``` - Use the below to generate a valid STUNner credential for a user called `my-user` with a lifetime - of one hour (`ttl`, only makes sense when STUNner uses `ephemeral` authentication - credentials). 
In addition, we select the Gateway called `my-gateway` deployed into the - `my-namespace` namespace on which we intend to receive WebRTC media from the user: + Use the below query to generate a valid STUNner credential to access the Gateway called `my-gateway` deployed into the `my-namespace` namespace: ```console curl "http://stunner-auth.stunner-system:8088/ice?service=turn?ttl=3600&username=my-user&namespace=my-namespace&gateway=my-gateway" ``` -2. The clients *receive the ICE configuration* (usually, from the application server) over a secure - channel. This is outside the context of STUNner; our advice is to return the ICE configuration - during the session setup process, say, along with the initial configuration returned for clients - before starting the call. +2. The clients *receive the ICE configuration* (usually, from the application server) over a secure channel. This is outside the context of STUNner. Our advice is to return the ICE configuration during the session setup process, say, along with the initial configuration returned for clients before starting the call. -3. WebRTC clients are *configured with the ICE configuration* obtained above. The below snippet - shows how to initialize a WebRTC - [`PeerConnection`](https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/RTCPeerConnection) - to use the above ICE server configuration in order to use STUNner as the default TURN service. +3. WebRTC clients are *configured with the ICE configuration*. The below snippet shows how to initialize a WebRTC [`PeerConnection`](https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/RTCPeerConnection) to use the above ICE server configuration in order to use STUNner as the default TURN service. ``` var iceConfig = @@ -91,34 +55,21 @@ The intended authentication workflow in STUNner is as follows. ## Static authentication -In STUNner, `static` authentication is the simplest and least secure authentication mode, basically -corresponding to a traditional "log-in" username and password pair given to users. - -> **Note** -STUNner accepts (and sometimes reports) the alias `plaintext` to mean the `static` authentication mode; the use of `plaintext` is deprecated and will be removed in a later release. +In STUNner, `static` authentication is the simplest and least secure authentication mode, basically corresponding to a traditional "log-in" username and password pair given to users. -When STUNner is configured to use `static` authentication only a single username/password pair is -used for *all* clients. This makes configuration easy; e.g., the ICE server configuration can be -hardcoded into the static Javascript code served to clients. At the same time, `static` -authentication is prone to leaking the credentials: once an attacker learns a username/password -pair they can use it without limits to reach STUNner (until the administrator rolls the -credentials, see below). +When STUNner is configured to use `static` authentication only a single username/password pair is used for *all* clients. This makes configuration easy; e.g., the ICE server configuration can be hardcoded into the static Javascript code served to clients. At the same time, `static` authentication is prone to leaking credentials: once an attacker learns a username/password pair they can use it without limits to reach STUNner (until the administrator rolls the credentials, see below). 
-The first step of configuring STUNner for the `static` authentication mode is to create a -Kubernetes Secret to hold the username/password pair. The below will set the username to `my-user` -and the password to `my-password`. If no `type` is set then STUNner defaults to `static` -authentication. +The first step of configuring STUNner for the `static` authentication mode is to create a Kubernetes Secret to hold the username/password pair. The below will set the username to `my-user` and the password to `my-password`. If no `type` is set then STUNner defaults to `static` authentication. ```console kubectl -n stunner create secret generic stunner-auth-secret --from-literal=type=static \ --from-literal=username=my-user --from-literal=password=my-password ``` -Then, we create or update the current [GatewayConfig](REFERENCE.md) to refer STUNner to this secret -for setting the authentication credentials. +Then, we update the [GatewayConfig](REFERENCE.md) to refer STUNner to this Secret for setting authentication credentials. ```yaml -apiVersion: stunner.l7mp.io/v1alpha1 +apiVersion: stunner.l7mp.io/v1 kind: GatewayConfig metadata: name: stunner-gatewayconfig @@ -130,54 +81,28 @@ spec: namespace: stunner ``` -The main use of static authentication is for testing. The reason for this is that static -authentication credentials are easily discoverable: since the WebRTC Javascript API uses the TURN -credentials unencrypted, an attacker can easily extract the STUNner credentials from the -client-side Javascript code. In order to mitigate the risk, it is a good security practice to reset -the username/password pair every once in a while. This can be done by simply updating the Secret -that holds the credentials. +It is a good security practice to reset the username/password pair every once in a while. This can be done by simply updating the Secret that holds the credentials. ```yaml kubectl -n stunner edit secret stunner-auth-secret ``` -> **Warning** -Modifying STUNner's credentials goes *without* restarting the TURN server but may affect existing -sessions, in that active sessions will not be able to refresh the TURN allocation established with -the old credentials. +> [!WARNING] +> +> Modifying STUNner's credentials goes *without* restarting the TURN server but may affect existing sessions, in that active sessions will not be able to refresh their TURN allocation any more. This will result in the disconnection of clients using the old credentials. ## Ephemeral authentication -For production use, STUNner provides the `ephemeral` authentication mode that uses per-client -time-limited STUN/TURN authentication credentials. Ephemeral credentials are dynamically generated -with a pre-configured lifetime and, once the lifetime expires, the credential cannot be used to -authenticate (or refresh) with STUNner any more. This authentication mode is more secure since -credentials are not shared between clients and come with a limited lifetime. Configuring -`ephemeral` authentication may be more complex though, since credentials must be dynamically -generated for each session and properly returned to clients. - -> **Note** -STUNner accepts (and sometimes reports) the alias `longterm` to mean the `ephemeral` authentication -mode; the use of `longterm` is deprecated and will be removed in a later release. The alias -`timewindowed` is also accepted. 
 
-To implement this mode, STUNner adopts the [quasi-standard time-windowed TURN authentication
-credential format](https://datatracker.ietf.org/doc/html/draft-uberti-behave-turn-rest-00). In this
-format, the TURN username consists of a colon-delimited combination of the expiration timestamp and
-the user-id parameter, where the user-id is some application-specific id that is opaque to STUNner
-and the timestamp specifies the date of expiry of the credential as a UNIX timestamp. The TURN
-password is computed from the a secret key shared with the TURN server and the returned username
-value, by performing `base64(HMAC-SHA1(secret key, username))`. STUNner extends this scheme
-somewhat for maximizing interoperability with WebRTC apps, in that it allows the user-id and the
-timestamp to appear in any order in the TURN username and it accepts usernames with a plain
-timestamp, without the colon and/or the user-id.
+STUNner provides the `ephemeral` authentication mode for production use, which uses per-client time-limited STUN/TURN authentication credentials. Ephemeral credentials are dynamically generated with a pre-configured lifetime and, once the lifetime expires, the credential cannot be used to authenticate (or refresh) with STUNner any more. This authentication mode is more secure since credentials are not shared between clients and come with a limited lifetime. Configuring `ephemeral` authentication may be more complex though, since credentials must be dynamically generated for each session and properly returned to clients.
+
+STUNner adopts the [quasi-standard time-windowed TURN authentication credential format](https://datatracker.ietf.org/doc/html/draft-uberti-behave-turn-rest-00) for ephemeral authentication. The TURN username consists of a colon-delimited combination of the expiration timestamp and the user-id parameter, where the user-id is some application-specific id that is opaque to STUNner and the timestamp specifies the date of expiry of the credential as a UNIX timestamp. The TURN password is computed from a secret key shared with the TURN server and the returned username value, by performing `base64(HMAC-SHA1(secret key, username))`. STUNner extends this scheme somewhat to maximize interoperability with WebRTC apps, in that it allows the user-id and the timestamp to appear in any order in the TURN username and it accepts usernames with a plain timestamp, without the colon and/or the user-id.
 
 The advantage of this mechanism is that it is enough to know the shared secret for STUNner to be
 able to check the validity of a credential.
 
-> **Warning**
-The user-id is to ensure that the password generated per user-id is unique, but STUNner in no way
-checks whether it identifies a valid user-id in the system.
+> [!WARNING]
+>
+> The user-id is there to ensure that the password generated per user-id is unique, but STUNner in no way checks whether it identifies a valid user-id in the system.
 
 In order to switch from `static` mode to `ephemeral` authentication, it is enough to update the
 Secret that holds the credentials. The below will set the shared secret `my-shared-secret` for the
diff --git a/docs/CONCEPTS.md b/docs/CONCEPTS.md
index 7fb4f26f..87ed660e 100644
--- a/docs/CONCEPTS.md
+++ b/docs/CONCEPTS.md
@@ -1,27 +1,29 @@
 # Concepts
 
-In this guide we describe STUNner's architecture and the most important components of an operational STUNner installation.
+This guide describes STUNner's architecture and the most important components of an operational installation.
## Architecture -A STUNner installation consists of two parts, a *control plane* and a *dataplane*. The control plane consists of declarative policies specifying the way STUNner should route WebRTC media traffic to the media servers, plus a gateway operator that renders the high-level policies into an actual dataplane configuration. The dataplane in turn comprises one or more `stunnerd` pods, responsible for actually ingesting media traffic into the cluster through a STUN/TURN server. +A STUNner installation consists of two parts, a *control plane* and a *data plane*. The control plane consists of declarative policies specifying the way STUNner should route WebRTC media traffic to the media servers, plus a gateway operator that renders the high-level policies into an actual dataplane configuration. The data plane in turn comprises one or more `stunnerd` pods, which are responsible for actually ingesting media traffic into the cluster. The dataplane pods are automatically provisioned by the gateway operator so they should come and go as you add and remove STUNner gateways. ![STUNner architecture](img/stunner_arch_big.svg) -The unit of the STUNner configuration is a [designated Kubernetes namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces) that holds the control plane configuration and the dataplane pods. You can run multiple STUNner deployments side-by-side by installing a separate dataplane into a each namespace and defining a distinct gateway hierarchy to configure each dataplane separately. - -### Control plane +## Control plane The STUNner control plane consists of the following components: -* **Gateway hierarchy:** A gateway hierarchy is a collection of [Kubernetes Custom Resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources) that together describe the way media traffic should enter the cluster, including public IP addresses and ports clients can use to reach STUNner, TURN credentials, forwarding rules, etc. The anchor of the gateway hierarchy is the GatewayClass object, and the rest of the resources form a complete hierarchy underneath it: the GatewayConfig describes general STUNner configuration, Gateways define the port and transport protocol for each TURN server listener, and UDPRoutes point to the backend services client traffic should be forwarded to. See [here](GATEWAY.md) for a full reference. +* **Gateway API resources:** The high-level STUNner configuration is a collection of [Gateway API](https://gateway-api.sigs.k8s.io) resources that together describe the way media traffic should enter the cluster. The anchor of the configuration hierarchy is the GatewayClass object, and the rest of the resources form a complete hierarchy underneath it: the GatewayConfig describes general STUNner configuration, Gateways define the port and transport protocol per each TURN server listener, and UDPRoutes point to the backend services client traffic should be forwarded to. See [here](GATEWAY.md) for a full reference. -* **Gateway operator:** The main purpose of the gateway operator is to watch gateway hierarchies for change and, once a custom resource is added or modified by the user, render a new dataplane configuration. This configuration is then mapped into the filesystem of the `stunnerd` pods running in the same namespace, so that each `stunnerd` instance will use the most recent configuration. 
The STUNner Helm chart [automatically installs](INSTALL.md) the gateway operator; more information can be found [here](https://github.com/l7mp/stunner-gateway-operator).
+* **Gateway operator:** The main purpose of the gateway operator is to watch Gateway API resources and, once a Gateway API resource is added or modified by the user, update the dataplane accordingly (see below).
 
-* **STUNner ConfigMap:** The STUNner ConfigMap contains the running dataplane configuration. Of course, we could let the `stunnerd` pods themselves to watch the control plane for changes, but this would run into scalability limitations for large deployments. Instead, we separate the control plane and the dataplane, which brings cool [benefits](https://en.wikipedia.org/wiki/Software-defined_networking). The STUNner ConfigMap is usually named as `stunnerd-config`, but you can override this from the GatewayConfig.
+* **STUNner authentication service** (not shown on the figure): The auth service is an ancillary service that can be used to generate TURN credentials and complete [ICE server configurations](https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/RTCPeerConnection#iceservers) to bootstrap clients. See more info [here](AUTH.md).
 
 ## Dataplane
 
-The STUNner dataplane is comprised of a fleet of `stunnerd` pods. These pods actually implement the TURN server, using the configuration available in the STUNner ConfigMap which is mapped into the pods' filesystem dynamically. Then, `stunnerd` will watch for changes in the config file and, once a change is detected, it [reconciles](https://kubernetes.io/docs/concepts/architecture/controller) the dataplane to match the new user policies.
+The STUNner dataplane is comprised of a fleet of `stunnerd` pods implementing the TURN servers that clients use to create WebRTC connections, plus some additional configuration to expose the TURN services to clients. The complete dataplane configuration for each Gateway is as follows:
+
+* **`stunnerd` Deployment:** Once you create a new Gateway, the gateway operator automatically spawns a new dataplane for it: for each Gateway there will be a `stunnerd` Deployment with the same name and namespace. The `stunnerd` daemon itself is a TURN server implemented on top of the [pion/turn](https://github.com/pion/turn) Go WebRTC framework. The daemon instantiates a separate *TURN listener* for each Gateway listener in the gateway configuration to terminate clients' TURN sessions and a *cluster* for each UDPRoute to forward packets to the backend services (e.g., to the media servers), with some ancillary administrative and authentication mechanisms in place to check client credentials, handle logging, etc. Whenever you modify a Gateway (or UDPRoute), the gateway operator renders a new dataplane configuration with the modified listener (or cluster, respectively) specs and downloads it to the `stunnerd` pods, which in turn reconcile their internal state with respect to the new configuration. You are free to scale the dataplane to as many `stunnerd` pods as you wish: Kubernetes will make sure that new client connections are distributed evenly over the scaled-out STUNner dataplane.
+
+* **LoadBalancer Service:** STUNner creates a separate LoadBalancer Service for each Gateway to expose the TURN listeners of the `stunnerd` pods to the outside world. As with the `stunnerd` Deployment, the Service has the same name and namespace as the Gateway it belongs to.
-The `stunnerd` daemon itself is essentially a simple TURN server on top of [pion/turn](https://github.com/pion/turn) written in Go. The daemon will instantiate a separate *TURN listener* for each Gateway listener in the gateway hierarchy to terminate clients' TURN sessions, a *cluster* per each UDPRoute to forward packets to the backend services (e.g., to the media servers), with some ancillary administrative and authentication mechanisms in place to check client credentials before admitting traffic into the cluster, logging, etc. There is a one-to-one mapping between the control-plane Gateway listeners and the `stunnerd` TURN listeners, as well as between the UDPRoute resources and `stunnerd`'s clusters. Whenever you modify a Gateway (UDPRoute), the gateway operator renders a new dataplane configuration with the modified listener (cluster, respectively) specs and the `stunnerd` pods reconcile their internal state to the new configuration. You are free to scale the dataplane to as many `stunnerd` pods as you like: Kubernetes will make sure that new client connections are distributed evenly over the scaled-out STUNner dataplane. +* **STUNner ConfigMap**: In order to simplify troubleshooting a STUNner setup, the dataplane configuration of each Gateway is always made available in a ConfigMap for human inspection. Again, the name and namespace of the ConfigMap is the same as those of the corresponding Gateway. Note that this ConfigMap is no longer used by the dataplane for reconciliation, it is there only fo debugging purposes and may be silently removed in a later release. You can use STUNner's own [Config Discovery Service client](https://pkg.go.dev/github.com/l7mp/stunner@v0.16.2/pkg/config/client) to obtain dataplane configuration right from the gateway operator instead. diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md index 3e0dd317..24da4338 100644 --- a/docs/DEPLOYMENT.md +++ b/docs/DEPLOYMENT.md @@ -6,10 +6,9 @@ can act either as a simple headless STUN/TURN server or a fully fledged ingress an entire Kubernetes-based media server pool. Second, when STUNner is configured as an ingress gateway then there are multiple [ICE models](#ice-models), based on whether only the client connects via STUNner or both clients and media servers use STUNner to set up the media-plane -connection. Third, STUNner can run in one of several [control plane models](#control-plane-models), -based on whether the user manually supplies STUNner configuration or there is a separate STUNner -control plane that automatically reconciles the dataplane based on a high-level [declarative -API](https://gateway-api.sigs.k8s.io). +connection. Third, STUNner can run in one of several [data plane models](#data-plane-models), based +on whether the dataplane is automatically provisioned or the user has to manually supply the +dataplane pods for STUNner. ## Architectural models @@ -26,11 +25,11 @@ this case the STUN/TURN servers are deployed into Kubernetes. ![STUNner headless deployment architecture](img/stunner_standalone_arch.svg) -> **Warning** -For STUNner to be able to connect WebRTC clients and servers in the headless model *all* the -clients and servers *must* use STUNner as the TURN server. This is because STUNner opens the -transport relay connections *inside* the cluster, on a private IP address, and this address is -reachable only to STUNner itself, but not for external STUN/TURN servers. 
+ + + + + ### Media-plane deployment model @@ -52,12 +51,18 @@ for clients' UDP transport streams then STUNner can be scaled freely, otherwise result the [disconnection of a small number of client connections](https://cilium.io/blog/2020/11/10/cilium-19/#maglev). -#### Asymmetric ICE mode +## ICE models -The standard mode to supply an ICE server configuration for clients and media servers in the -media-plane deployment model of STUNner is the *asymmetric ICE mode*. In this model the client is -configured with STUNner as the TURN server and media servers run with no STUN or TURN servers -whatsoever. +The peers willing to create a connection via STUNner (e.g., two clients as per the headless model, +or a client and a media server in the media-plane deployment model) need to decide how to create +ICE candidates. + +### Asymmetric ICE mode + +In *asymmetric ICE mode*, one peer is configured with STUNner as the TURN server and the other peer +runs with no STUN or TURN servers whatsoever. The first peer will create a TURN transport relay +connection via STUNner to which the other peer can directly join. Asymmetric ICE mode is the +recommended way for the media-plane deployment model. ![STUNner asymmetric ICE mode](img/stunner_asymmetric_ice.svg) @@ -71,37 +76,34 @@ connection. In contrast, servers run without any STUN/TURN server whatsoever, so only. Due to servers being deployed into ordinary Kubernetes pods, the server's host candidate will likewise contain a private pod IP address. Then, since in the Kubernetes networking model ["pods can communicate with all other pods on any other node without a -NAT"](https://kubernetes.io/docs/concepts/services-networking), clients' relay candidates and the -servers' host candidates will have direct connectivity in the Kubernetes private container network -and the ICE connectivity check will succeed. See more explanation +NAT"](https://kubernetes.io/docs/concepts/services-networking), the client's relay candidate and +the server's host candidate will have direct connectivity in the Kubernetes private container +network and the ICE connectivity check will succeed. See more explanation [here](examples/kurento-one2one-call/README.md#what-is-going-on-here). -> **Warning** -Refrain from configuring additional public STUN/TURN servers, apart from STUNner itself. The rules -to follow in setting the [ICE server +Refrain from configuring additional public STUN/TURN servers apart from STUNner itself. The rules +to follow for setting the [ICE server configuration](https://github.com/l7mp/stunner#configuring-webrtc-clients) in asymmetric ICE mode are as below: -> - on the client, set STUNner as the *only* TURN server and configure *no* STUN servers, whereas -> - on the server do *not* configure *any* STUN or TURN servers whatsoever. - -Most users will want to deploy STUNner using the asymmetric ICE mode. In the rest of the docs we -assume the asymmetric ICE mode with the media plane deployment model, unless noted otherwise. +- on the client, set STUNner as the *only* TURN server and configure *no* STUN servers, and +- on the server do *not* configure *any* STUN or TURN server whatsoever. -> **Warning** -Deviating from the above rules *might* work in certain cases, but may have uncanny and -hard-to-debug side-effects. For instance, configuring clients and servers with public STUN servers -in certain unlucky situations may allow them to connect via server-reflexive ICE candidates, -completely circumventing STUNner. 
This is on the one hand extremely fragile and, on the other hand, -a security vulnerability; remember, STUNner should be the *only* external access point to your -media plane. It is a good advice to set the `iceTransportPolicy` to `relay` on the clients to avoid -side-effects: this will prevent clients from generating host and server-reflexive ICE candidates, -leaving STUNner as the only option to obtain an ICE candidate from. +Deviating from these rules *might* work in certain cases, but may have uncanny and hard-to-debug +side-effects. For instance, configuring clients and servers with public STUN servers in certain +unlucky situations may allow them to connect via server-reflexive ICE candidates, completely +circumventing STUNner. This is on the one hand extremely fragile and, on the other hand, a security +vulnerability; remember, STUNner should be the *only* external access point to your media plane. It +is a good advice to set the `iceTransportPolicy` to `relay` on the clients to avoid side-effects: +this will prevent clients from generating host and server-reflexive ICE candidates, leaving STUNner +as the only option to obtain an ICE candidate from. -#### Symmetric ICE mode +### Symmetric ICE mode In the symmetric ICE mode both the client and the server obtain an ICE [relay candidate](https://developer.mozilla.org/en-US/docs/Web/API/RTCIceCandidate/type) from STUNner and -the connection occurs directly via STUNner. +the connection occurs directly via STUNner. This is the simplest mode for the headless deployment +model, but symmetric mode can also be used for the media-plane model as well to connect clients to +media servers. ![STUNner symmetric ICE mode](img/stunner_symmetric_ice.svg) @@ -118,7 +120,7 @@ priorities](https://www.ietf.org/rfc/rfc5245.txt) to different connection types) is a good practice to configure the STUNner TURN URI in the server-side ICE server configuration with the *internal* IP address and port used by STUNner (i.e., the ClusterIP of the `stunner` Kubernetes service and the corresponding port), otherwise the server might connect via the external -LoadBalancer IP causing an unnecessary roundtrip. +LoadBalancer IP causing an unnecessary roundtrip (hairpinning). The symmetric mode means more overhead compared to the asymmetric mode, since STUNner now performs TURN encapsulation/decapsulation for both sides. However, the symmetric mode comes with certain @@ -127,20 +129,10 @@ internal IP addresses in the ICE candidates from attackers; note that this is no but feel free to open an issue if [exposing internal IP addresses](SECURITY.md) is blocking you from adopting STUNner. -## Control plane models - -STUNner can run in one of several modes. - -In the default mode STUNner configuration is controlled by a *gateway-operator* component based on -high-level intent encoded in [Kubernetes Gateway API resources](https://gateway-api.sigs.k8s.io), -while in the *standalone model* the user configures STUNner manually. The standalone mode provides -perfect control over the way STUNner ingests media, but at the same time it requires users to deal -with the subtleties of internal STUNner APIs that are subject to change between subsequent -releases. As of v0.16, STUNner's operator-ful mode is feature complete and the standalone model is -considered obsolete. If still interested, comprehensive documentation for the standalone can be -found [here](OBSOLETE.md), but this mode is no longer supported. 
+## Data plane models -In addition, STUNner supports two dataplane provisioning modes. In the *legacy* mode the dataplane -is supposed to be deployed by the user manually (by installing the `stunner/stunner` Helm chart -into the target namespaces) while in the *managed* mode the dataplane pods are provisioned by the -gateway operator automatically. As of STUNner v0.16.0, the default is the *legacy* dataplane mode. +STUNner supports two dataplane provisioning modes. In the default *managed* mode, the dataplane +pods (i.e., the `stunnerd` pods) are provisioned automatically per each Gateway existing in the +cluster. In the *legacy* mode, the dataplane is supposed to be deployed by the user manually by +installing the `stunner/stunner` Helm chart into the target namespaces. Legacy mode is considered +obsolete at this point and it will be removed in a later release. diff --git a/docs/GATEWAY.md b/docs/GATEWAY.md index 16742336..295ef4f1 100644 --- a/docs/GATEWAY.md +++ b/docs/GATEWAY.md @@ -1,23 +1,15 @@ # Reference -The [STUNner gateway operator](https://github.com/l7mp/stunner-gateway-operator) exposes the control plane configuration using the standard [Kubernetes Gateway API](https://gateway-api.sigs.k8s.io). This allows to configure STUNner in the familiar YAML-engineering style via Kubernetes manifests. The below reference gives a quick overview of the Gateway API. Note that STUNner implements only a subset of the full [spec](GATEWAY.md), see [here](https://github.com/l7mp/stunner-gateway-operator#caveats) for a list of the most important simplifications. - -## Overview - -The main unit of the control plane configuration is the *gateway hierarchy*. Here, a Gateway hierarchy is a collection of [Kubernetes Custom Resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources) that together describe the way media traffic should enter the cluster via STUNner, including public IP addresses and ports clients can use to reach STUNner, TURN credentials, routing rules, etc. The anchor of the gateway hierarchy is the GatewayClass object, and the rest of the resources form a complete hierarchy underneath it. - -![Gateway hierarchy](img/gateway_api.svg) - -In general, the scope of a gateway hierarchy is a single namespace, but this is not strictly enforced: e.g., the GatewayClass is [cluster-scoped](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions) so it is outside the namespace, GatewayClasses can refer to GatewayConfigs across namespaces, Routes can attach to Gateways across a namespace boundary (if the Gateway [allows](https://gateway-api.sigs.k8s.io/guides/multiple-ns) this), etc. Still, it is a good practice to keep all control plane configuration, plus the actual dataplane pods, in a single namespace as much as possible. +The [STUNner gateway operator](https://github.com/l7mp/stunner-gateway-operator) exposes the control plane configuration using the standard [Kubernetes Gateway API](https://gateway-api.sigs.k8s.io). This allows to configure STUNner in the familiar YAML-engineering style via Kubernetes manifests. The below reference gives an overview of the subset of the Gateway API supported by STUNner, see [here](https://github.com/l7mp/stunner-gateway-operator#caveats) for a list of the most important simplifications. ## GatewayClass -The GatewayClass resource provides the root of the gateway hierarchy. 
GatewayClass resources are cluster-scoped, so they can be attached to from any namespace, and we usually assume that each namespaced gateway hierarchy will have a separate global GatewayClass as the anchor. +The GatewayClass resource provides the root of a STUNner gateway configuration. GatewayClass resources are cluster-scoped, so they can be attached to from any namespace. -Below is a sample GatewayClass resource. Each GatewayClass must specify a controller that will manage the Gateway objects created under the hierarchy; this must be set to `stunner.l7mp.io/gateway-operator` for the STUNner gateway operator to pick up the GatewayClass. In addition, a GatewayClass can refer to further implementation-specific configuration via a `parametersRef`; in the case of STUNner this will always be a GatewayConfig object (see [below](#gatewayconfig)). +Below is a sample GatewayClass resource. Each GatewayClass specifies a controller that will manage the Gateway objects created under the class; this must be set to `stunner.l7mp.io/gateway-operator` for the STUNner gateway operator to pick up the GatewayClass. In addition, a GatewayClass can refer to further implementation-specific configuration via a `parametersRef`; in the case of STUNner this will always be a GatewayConfig object (see [below](#gatewayconfig)). ```yaml -apiVersion: gateway.networking.k8s.io/v1beta1 +apiVersion: gateway.networking.k8s.io/v1 kind: GatewayClass metadata: name: stunner-gatewayclass @@ -41,12 +33,12 @@ Below is a quick reference of the most important fields of the GatewayClass [`sp ## GatewayConfig -The GatewayConfig resource provides general configuration for STUNner, most importantly the STUN/TURN authentication [credentials](AUTH.md) clients can use to connect to STUNner. GatewayClass resources attach a STUNner configuration to the hierarchy by specifying a particular GatewayConfig in the GatewayClass `parametersRef`. GatewayConfig resources are namespaced, and every hierarchy can contain at most one GatewayConfig. Failing to specify a GatewayConfig is an error because the authentication credentials cannot be learned by the dataplane otherwise. +The GatewayConfig resource provides general configuration for STUNner, most importantly the STUN/TURN authentication [credentials](AUTH.md) clients can use to connect to STUNner. GatewayClass resources attach a STUNner configuration to the hierarchy by specifying a particular GatewayConfig in the GatewayClass `parametersRef`. GatewayConfig resources are namespaced, and every hierarchy can contain at most one GatewayConfig. Failing to specify a GatewayConfig is an error because the authentication credentials cannot be learned otherwise. -The following example takes the [STUNner authentication settings](AUTH.md) from the Secret called `stunner-auth-secret` in the `stunner` namespace, sets the authentication realm to `stunner.l7mp.io`, sets the dataplane loglevel to `all:DEBUG,turn:INFO` (this will set all loggers to `DEBUG` level except the TURN protocol machinery's logger which is set to `INFO`), and sets the default URL for metric scraping. +The following example takes the [STUNner authentication settings](AUTH.md) from the Secret called `stunner-auth-secret` in the `stunner` namespace, sets the authentication realm to `stunner.l7mp.io`, and sets the dataplane loglevel to `all:DEBUG,turn:INFO` (this will set all loggers to `DEBUG` level except the TURN protocol machinery's logger which is set to `INFO`). 
```yaml -apiVersion: stunner.l7mp.io/v1alpha1 +apiVersion: stunner.l7mp.io/v1 kind: GatewayConfig metadata: name: stunner-gatewayconfig @@ -54,42 +46,38 @@ metadata: spec: logLevel: "all:DEBUG,turn:INFO" realm: stunner.l7mp.io - authRef: + authRef: name: stunner-auth-secret namespace: stunner - metricsEndpoint: "http://0.0.0.0:8080/metrics" ``` -Below is a quick reference of the most important fields of the GatewayConfig [`spec`](https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects) +Below is a reference of the most important fields of the GatewayConfig [`spec`](https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects) | Field | Type | Description | Required | | :--- | :---: | :--- | :---: | -| `stunnerConfig` | `string` | The name of the ConfigMap into which the operator renders the `stunnerd` running configuration. Default: `stunnerd-config`. | No | -| `logLevel` | `string` | Logging level for the dataplane daemon pods (`stunnerd`). Default: `all:INFO`. | No | +| `dataplane` | `string` | The name of the Dataplane template to use for provisioning `stunnerd` pods. Default: `default`. | No | +| `logLevel` | `string` | Logging level for the dataplane pods. Default: `all:INFO`. | No | | `realm` | `string` | The STUN/TURN authentication realm to be used for clients to authenticate with STUNner. The realm must consist of lower case alphanumeric characters or `-` and must start and end with an alphanumeric character. Default: `stunner.l7mp.io`. | No | | `authRef` | `reference` | Reference to a Secret (`namespace` and `name`) that defines the STUN/TURN authentication mechanism and the credentials. | No | | `authType` | `string` | Type of the STUN/TURN authentication mechanism. Valid only if `authRef` is not set. Default: `static`. | No | | `username` | `string` | The username for [`static` authentication](AUTH.md). Valid only if `authRef` is not set. | No | | `password` | `string` | The password for [`static` authentication](AUTH.md). Valid only if `authRef` is not set. | No | | `sharedSecret` | `string` | The shared secret for [`ephemeral` authentication](AUTH.md). Valid only if `authRef` is not set. | No | -| `metricsEndpoint` | `string` | The metrics server (Prometheus) endpoint URL for the `stunnerd` pods.| No | -| `healthCheckEndpoint` | `string` | HTTP health-check endpoint exposed by `stunnerd`. Liveness check will be available on path `/live` and readiness check on path `/ready`. Default is to enable health-checking on `http://0.0.0.0:8086/ready` and `http://0.0.0.0:8086/live`, use an empty string to disable.| No | | `authLifetime` | `int` | The lifetime of [`ephemeral` authentication](AUTH.md) credentials in seconds. Not used by STUNner.| No | -| `loadBalancerServiceAnnotations` | `map[string]string` | A list of annotations that will go into the LoadBalancer services created automatically by STUNner to obtain a public IP addresses. See more detail [here](https://github.com/l7mp/stunner/issues/32). | No | +| `loadBalancerServiceAnnotations` | `map[string]string` | A list of annotations that will go into the LoadBalancer services created automatically by STUNner to obtain a public IP address. See more detail [here](https://github.com/l7mp/stunner/issues/32). | No | -> **Warning** At least a valid username/password pair *must* be supplied for `static` authentication, or a `sharedSecret` for the `ephemeral` mode, either via an external Secret or inline in the GatewayConfig. External authentication settings override inline settings. 
Missing both is an error. -Except the TURN authentication realm, all GatewayConfig resources are safe for modification. That is, the `stunnerd` daemons know how to reconcile a change in the GatewayConfig without restarting listeners/TURN servers. Changing the realm, however, induces a *full* TURN server restart (see below). +Except the TURN authentication realm, all GatewayConfig resources are safe for modification. That is, the `stunnerd` daemons know how to reconcile a change in the GatewayConfig without restarting listeners/TURN servers. Changing the realm, however, induces a *full* dataplane restart. ## Gateway Gateways describe the STUN/TURN server listeners exposed to clients. -The below Gateway will configure STUNner to open a STUN/TURN listener over the UDP port 3478 and automatically expose it on a public IP address and port by creating a [LoadBalancer service](https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer). The name and namespace of the automatically provisioned service are the same as those of the Gateway, and the service is automatically updated if the Gateway changes (e.g., a port changes). +The below Gateway resource will configure STUNner to open a STUN/TURN listener over the UDP port 3478 and make it available on a public IP address and port to clients. Each Gateway will have a `stunnerd` Deployment that will run the dataplane, a LoadBalancer Service that will expose the dataplane to the Internet, and an ancillary ConfigMap that will hold the corresponding configuration, each using the same name and namespace as the Gateway. Once the Gateway is removed, the corresponding resources are automatically garbage-collected. ```yaml -apiVersion: gateway.networking.k8s.io/v1beta1 +apiVersion: gateway.networking.k8s.io/v1 kind: Gateway metadata: name: udp-gateway @@ -102,10 +90,10 @@ spec: protocol: TURN-UDP ``` -The below more complex example defines two TURN listeners: a TURN listener at the UDP:3478 port that accepts routes from any namespace, and a TURN listener at port TLS/TCP:443 that accepts routes from all namespaces labeled as `app:dev`. +The below example defines two TURN listeners: a TURN listener at the UDP:3478 port that accepts routes from any namespace (see below), and a TURN listener at port TLS/TCP:443 that accepts routes only from namespaces labeled with `app=dev`. ```yaml -apiVersion: gateway.networking.k8s.io/v1beta1 +apiVersion: gateway.networking.k8s.io/v1 kind: Gateway metadata: name: complex-gateway @@ -142,7 +130,7 @@ spec: app: dev ``` -Below is a quick reference of the most important fields of the Gateway [`spec`](https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects). +Below is a reference of the most important fields of the Gateway [`spec`](https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects). | Field | Type | Description | Required | | :--- | :---: | :--- | :---: | @@ -150,21 +138,35 @@ Below is a quick reference of the most important fields of the Gateway [`spec`]( | `listeners` | `list` | The list of TURN listeners. | Yes | | `addresses` | `list` | The list of manually hinted external IP addresses for the rendered service (only the first one is used). | No | -Each TURN `listener` is defined by a unique name, a transport protocol and a port. In addition, a -`tls` configuration is required for TURN-TLS and TURN-DTLS listeners. +> [!WARNING] +> +> Gateway resources are *not* safe for modification. 
This means that certain changes to a Gateway will restart the underlying TURN server listener, causing all active client sessions to terminate. The particular rules are as follows: +> - adding or removing a listener will start/stop *only* the TURN listener being created/removed, without affecting the rest of the listeners on the same Gateway; +> - changing the transport protocol, port or TLS keys/certs of an *existing* listener will restart the TURN listener but leave the rest of the listeners intact; +> - changing the TURN authentication realm will restart *all* TURN listeners. + +### Listener configuration + +Each TURN `listener` is defined by a unique name, a transport protocol and a port. In addition, a `tls` configuration is required for TURN-TLS and TURN-DTLS listeners. Per-listener configuration is as follows. | Field | Type | Description | Required | | :--- | :---: | :--- | :---: | -| `name` | `string` | Name of the TURN listener. | Yes | +| `name` | `string` | Name of the TURN listener. Must be unique per Gateway. | Yes | | `port` | `int` | Network port for the TURN listener. | Yes | | `protocol` | `string` | Transport protocol for the TURN listener. Either TURN-UDP, TURN-TCP, TURN-TLS or TURN-DTLS. | Yes | | `tls` | `object` | [TLS configuration](https://gateway-api.sigs.k8s.io/references/spec/#gateway.networking.k8s.io%2fv1beta1.GatewayTLSConfig).| Yes (for TURN-TLS/TURN-DTLS) | -| `allowedRoutes.from` | `object` | [Route attachment policy](https://gateway-api.sigs.k8s.io/references/spec/#gateway.networking.k8s.io/v1beta1.AllowedRoutes), either `All`, `Selector`, or `Same` (default is `Same`) | No | +| `allowedRoutes.from` | `object` | [Route attachment policy](https://gateway-api.sigs.k8s.io/references/spec/#gateway.networking.k8s.io/v1beta1.AllowedRoutes), either `All`, `Selector`, or `Same`. Default: `Same`. | No | For TURN-TLS/TURN-DTLS listeners, `tls.mode` must be set to `Terminate` or omitted (`Passthrough` does not make sense for TURN), and `tls.certificateRefs` must be a [reference to a Kubernetes Secret](https://gateway-api.sigs.k8s.io/references/spec/#gateway.networking.k8s.io%2fv1beta1.GatewayTLSConfig) of type `tls` or `opaque` with exactly two keys: `tls.crt` must hold the TLS PEM certificate and `tls.key` must hold the TLS PEM key. +### Load balancer configuration + STUNner will automatically generate a Kubernetes LoadBalancer service to expose each Gateway to clients. All TURN listeners specified in the Gateway are wrapped by a single Service and will be assigned a single externally reachable IP address. If you want multiple TURN listeners on different public IPs, create multiple Gateways. TURN over UDP and TURN over DTLS listeners are exposed as UDP services, TURN-TCP and TURN-TLS listeners are exposed as TCP. +STUNner implements two ways to customize the automatically created Service, both involving certain per-defined [annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations) added to the Service. This is useful to, e.g., specify health-check settings for the Kubernetes load-balancer controller. The special annotation `stunner.l7mp.io/service-type` can be used to customize the type of the Service created by STUNner. The value can be either `ClusterIP`, `NodePort`, or `LoadBalancer` (this is the default); for instance, setting `stunner.l7mp.io/service-type: ClusterIP` will prevent STUNner from exposing a Gateway publicly (useful for testing). 
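+
+As a minimal sketch, the below reuses the `udp-gateway` example from above and asks for a cluster-internal Service only; the annotation is added to the Gateway, and Gateway annotations are copied onto the generated Service, as described in the next paragraph:
+
+```yaml
+apiVersion: gateway.networking.k8s.io/v1
+kind: Gateway
+metadata:
+  name: udp-gateway
+  namespace: stunner
+  annotations:
+    stunner.l7mp.io/service-type: ClusterIP   # expose the TURN listener inside the cluster only
+spec:
+  gatewayClassName: stunner-gatewayclass
+  listeners:
+    - name: udp-listener
+      port: 3478
+      protocol: TURN-UDP
+```
+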
+ +By default, each key-value pair set in the GatewayConfig `loadBalancerServiceAnnotations` field will be copied verbatim into the Service. Service annotations can be customized on a per-Gateway basis as well, by adding the corresponding annotations to a Gateway resource. STUNner copies all annotations from the Gateway into the Service, overwriting the annotations specified in the GatewayConfig on conflict. + Manually hinted external address describes an address that can be bound to a Gateway. It is defined by an address type and an address value. Note that only the first address is used. Setting the `spec.addresses` field in the Gateway will result in the rendered Service's [loadBalancerIP](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#service-v1-core:~:text=non%20%27LoadBalancer%27%20type.-,loadBalancerIP,-string) and [externalIPs](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#service-v1-core:~:text=and%2Dservice%2Dproxies-,externalIPs,-string%20array) fields to be set. | Field | Type | Description | Required | @@ -172,13 +174,14 @@ Manually hinted external address describes an address that can be bound to a Gat | `type` | `string` | Type of the address. Currently only `IPAddress` is supported. | Yes | | `value` | `string` | Address that should be bound to the Gateway's service. | Yes | -> **Warning** -Be careful when using this feature. Since Kubernetes v1.24 the `loadBalancerIP` field is deprecated and it will be ignored if the cloud-provider or your Kubernetes install do not support the feature. In addition, the `externalIPs` field is denied by some cloud-providers. +> [!WARNING] +> +> Be careful when using this feature. Since Kubernetes v1.24 the `loadBalancerIP` field is deprecated and it will be ignored if the cloud-provider or your Kubernetes install do not support the feature. In addition, the `externalIPs` field is denied by some cloud-providers. -Mixed multi-protocol Gateways are supported: this means if you want to expose a UDP and a TCP port on the same LoadBalancer service you can do it with a single Gateway. The below Gateway will expose both ports with their respective protocols. +[Mixed multi-protocol LoadBalancer Services](https://kubernetes.io/docs/concepts/services-networking/service/#load-balancers-with-mixed-protocol-types) are supported: this means if you want to expose a UDP and a TCP port on the same IP you can do it with a single Gateway. The below Gateway will expose both ports with their respective protocols. ```yaml -apiVersion: gateway.networking.k8s.io/v1beta1 +apiVersion: gateway.networking.k8s.io/v1 kind: Gateway metadata: name: mixed-protocol-gateway @@ -195,22 +198,32 @@ spec: protocol: TURN-TCP ``` -> **Warning** -> Since mixed-protocol LB support is not supported in many popular Kubernetes offerings, STUNner currently defaults to disabling this feature for compatibility reasons. You can re-enable mixed-protocol LBs by annotating your Gateway with the `stunner.l7mp.io/enable-mixed-protocol-lb: true` key-value pair. +> [!WARNING] +> +> Since mixed-protocol LB support is not supported in many popular Kubernetes offerings, STUNner currently defaults to disabling this feature. You can enable mixed-protocol LBs by annotating a Gateway with the `stunner.l7mp.io/enable-mixed-protocol-lb: true` key-value pair. 
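+
+As a sketch, re-enabling mixed-protocol exposure is a matter of adding this annotation to the Gateway metadata; the listener names and ports below are illustrative:
+
+```yaml
+apiVersion: gateway.networking.k8s.io/v1
+kind: Gateway
+metadata:
+  name: mixed-protocol-gateway
+  annotations:
+    stunner.l7mp.io/enable-mixed-protocol-lb: "true"
+spec:
+  gatewayClassName: stunner
+  listeners:
+    - name: udp-listener
+      port: 3478
+      protocol: TURN-UDP
+    - name: tcp-listener
+      port: 3479
+      protocol: TURN-TCP
+```
+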
-STUNner implements two ways to customize the automatically created Service, both involving adding certain [annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations) to the Service. First, if any annotation is set in the GatewayConfig `loadBalancerServiceAnnotations` field then those will be copied verbatim into the Service. Note that `loadBalancerServiceAnnotations` affect *all* LoadBalancer Services created by STUNner under the current Gateway hierarchy. Second, Service annotations can be customized on a per-Gateway basis as well by adding the annotations to Gateway resources. STUNner then copies all annotations from the Gateway verbatim into the Service, overwriting the annotations specified in the GatewayConfig on conflict. This is useful to, e.g., specify health-check settings for the Kubernetes load-balancer controller. The special annotation `stunner.l7mp.io/service-type` can be used to customize the type of the Service created by STUNner. The value can be either `ClusterIP`, `NodePort`, or `LoadBalancer` (this is the default); for instance, setting `stunner.l7mp.io/service-type: ClusterIP` will prevent STUNner from exposing a Gateway publicly (useful for testing). +## UDPRoute -> **Warning** -Gateway resources are *not* safe for modification. This means that certain changes to a Gateway will restart the underlying TURN server listener, causing all active client sessions to terminate. The particular rules are as follows: -> - adding or removing a listener will start/stop *only* the TURN listener to be started/stopped, without affecting the rest of the listeners on the same Gateway; -> - changing the transport protocol, port or TLS keys/certs of an *existing* listener will restart the TURN listener but leave the rest of the listeners intact; -> - changing the TURN authentication realm will restart *all* TURN listeners. +UDPRoute resources can be attached to Gateways in order to specify the backend services permitted to be reached via the Gateway. Multiple UDPRoutes can attach to the same Gateway, and each UDPRoute can specify multiple backend services; in this case access to *all* backends in *each* of the attached UDPRoutes is allowed. An UDPRoute can be attached to a Gateway by setting the `parentRef` to the Gateway's name and namespace. This is, however, contingent on whether the Gateway accepts routes from the given namespace: customize the `allowedRoutes` per each Gateway listener to control which namespaces the listener accepts routes from. -## UDPRoute +The below UDPRoute will configure STUNner to route client connections received on the Gateway called `udp-gateway` to *any UDP port* on the pods of the media server pool identified by the Kubernetes service `media-server-pool` in the `media-plane` namespace. -UDPRoute resources can be attached to Gateways in order to specify the backend services permitted to be reached via the Gateway. Multiple UDPRoutes can attach to the same Gateway, and each UDPRoute can specify multiple backend services; in this case access to *all* backends in *each* of the attached UDPRoutes is allowed. An UDPRoute can be attached only to a Gateway in any namespace by setting the `parentRef` to the Gateway's name and namespace. This is, however, contingent on whether the Gateway accepts routes from the given namespace: customize the `allowedRoutes` for each Gateway listener to control which namespaces the listener accepts routes from. 
+```yaml +apiVersion: stunner.l7mp.io/v1 +kind: UDPRoute +metadata: + name: media-plane-route + namespace: stunner +spec: + parentRefs: + - name: udp-gateway + rules: + - backendRefs: + - name: media-server-pool + namespace: media-plane +``` -The below UDPRoute will configure STUNner to route client connections received on the Gateway called `udp-gateway` to the media server pool identified by the Kubernetes service `media-server-pool` in the `media-plane` namespace. +Note that STUNner provides its own UDPRoute resource instead of the official UDPRoute resource available in the Gateway API. In contrast to the official version, still at version v1alpha2, STUNner's UDPRoutes can be considered stable and expected to be supported throughout the entire lifetime of STUNner v1. You can still use the official UDPRoute resource as well, by changing the API version and adding an arbitrary port to the backend references (this is required by the official API). Note that the port will be omitted. ```yaml apiVersion: gateway.networking.k8s.io/v1alpha2 @@ -225,26 +238,37 @@ spec: - backendRefs: - name: media-server-pool namespace: media-plane + port: 1 ``` -Below is a quick reference of the most important fields of the UDPRoute [`spec`](https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects). +Below is a reference of the most important fields of the STUNner UDPRoute `spec`. | Field | Type | Description | Required | | :--- | :---: | :--- | :---: | | `parentRefs` | `list` | Name/namespace of the Gateways to attach the route to. If no namespace is given, then the Gateway will be searched in the UDPRoute's namespace. | Yes | -| `rules.backendRefs` | `list` | A list of `name`/`namespace` pairs specifying the backend Service(s) reachable through the UDPRoute. It is allowed to specify a service from a namespace other than the UDPRoute's own namespace. | No | +| `rules.backendRefs` | `list` | A list of backends (Services or StaticServices) reachable through the UDPRoute. It is allowed to specify a service from a namespace other than the UDPRoute's own namespace. | No | + +Backend reference configuration is as follows: + +| Field | Type | Description | Required | +| :--- | :---: | :--- | :---: | +| `group` | `string` | API group for the backend, either empty string for Service backends or `stunner.l7mp.io` for StaticService backends. Default: `""`. | No | +| `kind` | `string` | The kind of the backend resource, either `Service` or `StaticService`. Default: `Service`. | No | +| `name` | `string` | Name of the backend Service or StaticService. | Yes | +| `namespace` | `string` | Namespace of the backend Service or StaticService. | Yes | +| `port` | `int` | Port to use to reach the backend. If empty, make all ports available on the backend. Default: empty.| No | +| `endPort` | `int` | If port is also specified, then access to the backend is restricted to the port range [port, endPort] inclusive. If port and endPort are empty, make all ports available on the backend. If port is given but endPort is not, admit the singleton port range [port,port]. Default: empty.| No | UDPRoute resources are safe for modification: `stunnerd` knows how to reconcile modified routes without restarting any listeners/TURN servers. ## StaticService -When the target backend of a UDPRoute is running *inside* Kubernetes then the backend is always a proper Kubernetes Service. However, when the target is deployed *outside* Kubernetes then there is no Kubernetes Service that could be configured as a backend. 
This is particularly important when STUNner is used as a public TURN service. The StaticService resource provides a way to assign a routable IP address range to a UDPRoute for these cases.
+When the target backend of a UDPRoute is running *inside* Kubernetes then the backend is always a proper Kubernetes Service. However, when the target is deployed *outside* Kubernetes then there is no Kubernetes Service that could be configured as a backend. This is particularly problematic when STUNner is used as a public TURN service. For such deployments, the StaticService resource provides a way to assign a routable IP address range to a UDPRoute.
 
-The below StaticService represents a hypothetical Kubernetes Service backing a set of pods with IP
-addresses in the range `192.0.2.0/24` or `198.51.100.0/24`.
+The below StaticService represents a hypothetical Kubernetes Service backing a set of pods with IP addresses in the range `192.0.2.0/24` or `198.51.100.0/24`.
 
 ```yaml
-apiVersion: stunner.l7mp.io/v1alpha1
+apiVersion: stunner.l7mp.io/v1
 kind: StaticService
 metadata:
   name: static-svc
@@ -258,7 +282,7 @@ spec:
 Assigning this StaticService to a UDPRoute allows access to *any* IP address in the specified ranges.
 
 ```yaml
-apiVersion: gateway.networking.k8s.io/v1alpha2
+apiVersion: stunner.l7mp.io/v1
 kind: UDPRoute
 metadata:
   name: media-plane-route
@@ -275,16 +299,69 @@ spec:
 The StaticService `spec.prefixes` must be a list of proper IPv4 prefixes: any IP address in any of the listed prefixes will be whitelisted. Use the single prefix `0.0.0.0/0` to provide wildcard access via an UDPRoute.
 
-> **Warning**
-Never use StaticServices to access Services running *inside* Kubernetes, this may open up an unintended backdoor to your cluster. Use StaticServices only with *external* target backends.
+> [!WARNING]
+>
+> Never use StaticServices to access Services running *inside* Kubernetes: this may open up an unintended backdoor to your cluster. Use StaticServices only with *external* target backends.
 
-## Status
+## Dataplane
 
-Most Kubernetes resources contain a `status` subresource that describes the current state of the resource, supplied and updated by the Kubernetes system and its components. The Kubernetes control plane continually and actively manages every object's actual state to match the desired state you supplied and updates the status field to indicate whether any error was encountered during the reconciliation process.
+The Dataplane resource is used as a template for provisioning `stunnerd` pods. This is useful for choosing the image origin and version, setting custom command line arguments and environment variables for the `stunnerd` daemon, configuring resource requests/limits, etc.
 
-If you are not sure about whether the STUNner gateway operator successfully picked up your Gateways or UDPRoutes, it is worth checking the status to see what went wrong.
+Below is the `default` Dataplane installed by STUNner.
 
-```console
-kubectl get -n -o jsonpath='{.status}'
+```yaml
+apiVersion: stunner.l7mp.io/v1
+kind: Dataplane
+metadata:
+  name: default
+spec:
+  command:
+  - stunnerd
+  args:
+  - -w
+  - --udp-thread-num=16
+  image: l7mp/stunnerd:latest
+  resources:
+    limits:
+      cpu: 500m
+      memory: 512Mi
+    requests:
+      cpu: 100m
+      memory: 128Mi
+  terminationGracePeriodSeconds: 3600
 ```
+Below is a reference of the most important fields of the Dataplane `spec` that can be used to customize the provisioning of `stunnerd` pods. 
+ +| Field | Type | Description | Required | +|:--------------------------------|:----------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------:| +| `image` | `string` | The container image. | Yes | +| `imagePullPolicy` | `string` | Policy for if/when to pull an image, can be either `Always`, `Never`, or `IfNotPresent`. Default: `Always` if `:latest` tag is specified on the image, or `IfNotPresent` otherwise. | No | +| `command` | `list` | Entrypoint array. Default: `stunnerd`. | No | +| `args` | `list` | Arguments to the entrypoint. | Yes | +| `envFrom` | `list` | List of sources to populate environment variables in the container. Default: empty. | No | +| `env` | `list` | List of environment variables to set in the container. Default: empty. | No | +| `replicas` | `int` | Number of `stunnerd` pods per Gateway to provision. Not enforced if the `stunnerd` Deployment replica count is overwritten manually or by an autoscaler. Default: 1. | No | +| `hostNetwork` | `bool` | Deploy `stunnerd` into the host network namespace of Kubernetes nodes. Useful for implementing headless TURN services. May require elevated privileges. Default: false. | No | +| `resources` | `object` | Compute resources required by `stunnerd`. Default: whatever Kubernetes assigns. | No | +| `affinity` | `object` | Scheduling constraints. Default: none. | No | +| `tolerations` | `object` | Tolerations. Default: none. | No | +| `disableHealthCheck` | `bool` | Disable health-checking. If true, enable HTTP health-checks on port 8086: liveness probe responder will be exposed on path `/live` and readiness probe on path `/ready`. Default: true. | No | +| `enableMetricsEndpoint` | `bool` | Enable Prometheus metrics scraping. If true, a metrics endpoint will be available at `http://0.0.0.0:8080`. Default: false. | No | +| `terminationGracePeriodSeconds` | `duration` | Optional duration in seconds for `stunnerd` to terminate gracefully. Default: 30 seconds. | No | + +There can be multiple Dataplane resources defined in a cluster, say, one for the production workload and one for development. Use the `spec.dataplane` field in the corresponding GatewayConfig to choose the Dataplane for each STUNner install. + +> [!WARNING] +> +> A Dataplane resource called `default` must always be available in the cluster, otherwise the operator will not know how to provision dataplane pods. Removing the `default` template will break your STUNner installation. + + + + + + + + + + diff --git a/docs/INSTALL.md b/docs/INSTALL.md index 48abb57a..f5f8f28c 100644 --- a/docs/INSTALL.md +++ b/docs/INSTALL.md @@ -2,112 +2,95 @@ ## Prerequisites -You need a Kubernetes cluster (>1.22), and the `kubectl` command-line tool must be installed and -configured to communicate with your cluster. STUNner should be compatible with *any* major hosted -Kubernetes service or any on-prem Kubernetes cluster; if not, please file an issue. +You need a Kubernetes cluster (>1.22), and the `kubectl` command-line tool must be installed and configured to communicate with your cluster. STUNner should be compatible with *any* major hosted Kubernetes service or any on-prem Kubernetes cluster; if not, please file an issue. 
-The simplest way to expose STUNner to clients is through Kubernetes [LoadBalancer
-services](https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer);
-these are automatically managed by STUNner. This depends on a functional LoadBalancer integration
-in your cluster (if using Minikube, try `minikube tunnel` to get an idea of how this
-works). STUNner automatically detects if LoadBalancer service integration is functional and falls
-back to using NodePorts when it is not; however, this may require manual tweaking of the firewall
-rules to admit the UDP NodePort range into the cluster.
+The simplest way to expose STUNner to clients is through Kubernetes [LoadBalancer services](https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer); these are automatically managed by STUNner. This depends on a functional LoadBalancer integration in your cluster (if using Minikube, try `minikube tunnel` to get an idea of how this works). STUNner automatically detects if LoadBalancer service integration is functional and falls back to using NodePorts when it is not; however, this may require manual tweaking of the firewall rules to admit the UDP NodePort range into the cluster.
 
-To recompile STUNner, at least Go v1.19 is required. Building the container images requires
-[Docker](https://docker.io) or [Podman](https://podman.io).
+To compile STUNner, at least Go v1.19 is required. Building the container images requires [Docker](https://docker.io) or [Podman](https://podman.io).
 
-## Basic installation
+## Installation
 
-The simplest way to deploy the full STUNner distro, with the dataplane and the controller
-automatically installed, is through [Helm](https://helm.sh). STUNner configuration parameters are
-available for customization as [Helm
-Values](https://helm.sh/docs/chart_template_guide/values_files). We recommend deploying each
-STUNner dataplane into a separate Kubernetes namespace (e.g., `stunner`), while the gateway
-operator should go into the `stunner-system` namespace (but effectively any namespace would work).
+The simplest way to deploy STUNner is through [Helm](https://helm.sh). STUNner configuration parameters are available for customization as [Helm Values](https://helm.sh/docs/chart_template_guide/values_files); see the [STUNner-helm](https://github.com/l7mp/stunner-helm) repository for a list of the available customizations.
 
-First, register the STUNner repository with Helm.
+The first step is to register the STUNner repository with Helm.
 
 ```console
 helm repo add stunner https://l7mp.io/stunner
 helm repo update
 ```
 
-Install the control plane:
+### Stable version
+
+The below will install the stable version of STUNner. In particular, this will install only the STUNner control plane, i.e., the gateway operator and the authentication service; the dataplane will be automatically provisioned by the operator when needed (but see below). We recommend using the `stunner-system` namespace to keep the full STUNner control plane in a single scope.
 
 ```console
-helm install stunner-gateway-operator stunner/stunner-gateway-operator --create-namespace --namespace=stunner-system
+helm install stunner-gateway-operator stunner/stunner-gateway-operator --create-namespace \
+    --namespace=stunner-system
 ```
 
-Install the dataplane:
+And that's all: you don't need to install the dataplane separately; this is handled automatically by the operator. 
The `stunnerd` pods created by the operator can be customized using the Dataplane custom resource: you can specify the `stunnerd` container image version, provision resources for each `stunnerd` pod, deploy into the host network namespace, etc.; see the documentation [here](https://pkg.go.dev/github.com/l7mp/stunner-gateway-operator/api/v1alpha1#DataplaneSpec).
+
+### Development version
+
+By default, the Helm chart installs the stable version of STUNner. To track the bleeding edge, STUNner provides a `dev` release channel that tracks the latest development version. Use it at your own risk: we do not promise any stability for the dev-channel.
 
 ```console
-helm install stunner stunner/stunner --create-namespace --namespace=stunner
+helm install stunner-gateway-operator stunner/stunner-gateway-operator-dev --create-namespace \
+    --namespace=stunner-system
 ```
 
-## Parallel deployments
+### Legacy mode
+
+In the default *managed dataplane mode*, the STUNner gateway operator automatically provisions the dataplane, which substantially simplifies operations and removes a lot of manual and repetitive work. For compatibility reasons, the traditional operational model, called the *legacy mode*, is still available. In this mode the user is responsible for provisioning both the control plane, by installing the `stunner-gateway-operator` Helm chart, and the dataplane(s), by helm-installing the `stunner` chart possibly multiple times.
+
+```console
+helm install stunner-gateway-operator stunner/stunner-gateway-operator --create-namespace \
+    --namespace=stunner-system --set stunnerGatewayOperator.dataplane.mode=legacy
+helm install stunner stunner/stunner --create-namespace --namespace=stunner
+```
 
-You can install multiple STUNner dataplanes side-by-side, provided that the corresponding
-namespaces are different. For instance, to create a `prod` dataplane installation for your
-production workload and a `dev` installation for experimentation, the below commands will install
-two dataplanes, one into the `stunner-prod` and another one into the `stunner-dev` namespace.
+You can install multiple legacy STUNner dataplanes side-by-side, provided that the corresponding namespaces are different. For instance, to create a `prod` dataplane installation for your production workload and a `dev` installation for experimentation, the below commands will install two dataplanes, one into the `stunner-prod` and another one into the `stunner-dev` namespace.
 
 ```console
 helm install stunner-prod stunner/stunner --create-namespace --namespace=stunner-prod
 helm install stunner-dev stunner/stunner --create-namespace --namespace=stunner-dev
 ```
 
-Now, you can build a separate [gateway hierarchy](CONCEPTS.md) per each namespace to supply a
-distinct ingress gateway configuration per dataplane.
+## Customization
 
-For the list of available customizations, see the
-[STUNner-helm](https://github.com/l7mp/stunner-helm) repository. For installing STUNner in the
-standalone mode, consult the documentation [here](OBSOLETE.md).
+The Helm charts let you fine-tune STUNner features, including [compute resources](#resources) provisioned for `stunnerd` pods, [UDP multithreading](#udp-multithreading), and [graceful shutdown](#graceful-shutdown).
 
-## Development version
+### Resources requests/limits
 
-STUNner provides a `dev` release channel, which allows to track the latest development version. Use
-it at your own risk: we do not promise any stability for STUNner installed from the dev-channel.
+It is important to manage the [amount of CPU and memory resources](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers) available for each `stunnerd` pod. The [default](https://github.com/l7mp/stunner-helm/blob/main/helm/stunner-gateway-operator/values.yaml) resource request and limit is set as follows:
 
-```console
-helm install stunner-gateway-operator stunner/stunner-gateway-operator-dev --create-namespace --namespace=stunner-system
-helm install stunner stunner/stunner-dev --create-namespace --namespace=stunner
-```
+```yaml
+resources:
+  limits:
+    cpu: 2
+    memory: 512Mi
+  requests:
+    cpu: 500m
+    memory: 128Mi
+```
 
-## Managed mode
+This means that every `stunnerd` pod will request 0.5 CPU cores and 128 MiB of memory. Note that the pods will start only if Kubernetes can successfully allocate the given amount of resources. In order to avoid stressing the Kubernetes scheduler, it is advised to keep the limits at the bare minimum and scale out by [increasing the number of running `stunnerd` pods](SCALING.md) if needed.
 
-From v0.16.0 STUNner provides a new way to provision dataplane pods that is called the *managed mode*. In the traditional operational model (called the *legacy mode*), the user was responsible for provisioning both the control plane, by installing the `stunner-gateway-operator` Helm chart, and the dataplane(s), by helm-installing the `stunner` chart [possibly multiple times](#parallel-deployments). In the managed mode the operator *automatically* provisions the necessary dataplanes by creating a separate `stunnerd` Deployment per each Gateway, plus the usual LoadBalancer service to expose it. This substantially simplifies operations and removes lot of manual and repetitive work.
+### UDP multithreading
 
-To install the gateway operator using the new manged mode, start with a clean Kubernetes cluster and install the `stunner-gateway-operator` Helm chart, setting the flag `stunnerGatewayOperator.dataplane.mode` to `managed`. Observe that we do not install the `stunner` Helm chart separately; the operator will readily create the `stunnerd` pods as needed.
+STUNner can run multiple UDP listeners over multiple parallel readloops for load balancing. Namely, each `stunnerd` pod can create a configurable number of UDP server sockets using `SO_REUSEPORT` and then spawn a separate goroutine to run a parallel readloop per each. The kernel will load-balance allocations across the sockets/readloops per the IP 5-tuple, so the same allocation will always stay on the same CPU. This allows UDP listeners to scale to multiple CPUs, improving performance. Note that this is required only for UDP: TCP, TLS and DTLS listeners spawn a per-client readloop anyway. Also note that `SO_REUSEPORT` is not portable, so currently we enable this only for UNIX architectures.
+
+The feature is exposed via the command line flag `--udp-thread-num=` in `stunnerd`. In the Helm chart, it can be enabled or disabled with the `--set stunner.deployment.container.stunnerd.udpMultithreading.enabled=true` flag. By default, UDP multithreading is enabled with 16 separate readloops per UDP listener. 
+ +```yaml +udpMultithreading: + enabled: true + readLoopsPerUDPListener: 16 ``` -The `stunnerd` pods created by the operator can be customized using the Dataplane CR: for instance you can specify the `stunnerd` container image version to be used as the dataplane, provision resources for each `stunenrd` pod, deploy into the host network namespace, etc.; see the documentation [here](https://pkg.go.dev/github.com/l7mp/stunner-gateway-operator/api/v1alpha1#DataplaneSpec). All gateways will use the `default` Dataplane; you can override this by creating a new Dataplane CR and setting the name in the [`spec.dataplane` field](https://pkg.go.dev/github.com/l7mp/stunner-gateway-operator@v0.15.2/api/v1alpha1#GatewayConfigSpec) of the corresponding GatewayConfig. +### Graceful shutdown + +STUNner has full support for [graceful shutdown](SCALING.md). This means that `stunner` pods will remain alive as long as there are active allocations in the embedded TURN server, and a pod will automatically remove itself once all allocations are deleted or time out. This enables the full support for graceful scale-down: the user can scale the number of `stunner` instances up and down and no harm should be made to active client connections meanwhile. + +The default termination period is set to 3600 seconds (1 hour). To modify, use the `--set stunner.deployment.container.terminationGracePeriodSeconds=` flag. -```console -kubectl get dataplanes.stunner.l7mp.io default -o yaml -apiVersion: stunner.l7mp.io/v1alpha1 -kind: Dataplane -metadata: - name: default -spec: - image: l7mp/stunnerd:latest - imagePullPolicy: Always - command: - - stunnerd - args: - - -w - - --udp-thread-num=16 - hostNetwork: false - resources: - limits: - cpu: 2 - memory: 512Mi - requests: - cpu: 500m - memory: 128Mi - terminationGracePeriodSeconds: 3600 -``` diff --git a/docs/MONITORING.md b/docs/MONITORING.md index 9608d94d..d011279a 100644 --- a/docs/MONITORING.md +++ b/docs/MONITORING.md @@ -1,43 +1,18 @@ # Monitoring -STUNner can export various statistics into an external timeseries database like -[Prometheus](https://prometheus.io). This allows one to observe the state of the STUNner media -gateway instances, like CPU or memory use, as well as the amount of data received and sent, in -quasi-real-time. These statistics can then be presented to the operator in easy-to-use monitoring -dashboards in [Grafana](https://grafana.com). +STUNner can export various statistics into an external timeseries database like [Prometheus](https://prometheus.io). This allows one to observe the state of the STUNner media gateway instances, like CPU or memory use or the amount of data received and sent in quasi-real-time. These statistics can then be presented to the operator in a monitoring dashboard using, e.g., [Grafana](https://grafana.com). ## Configuration -Metrics collection is *not* enabled in the default installation. In order to open the -metrics-collection endpoint for a [gateway hierarchy](GATEWAY.md#overview), configure an -appropriate HTTP URL in the `metricsEndpoint` field of corresponding the -[GatewayConfig](GATEWAY.md#gatewayconfig) resource. - -For instance, the below GatewayConfig will expose the metrics-collection server on the URL -`http://:8080/metrics` in all the STUNner media gateway instances of the current gateway hierarchy. 
- -```yaml -apiVersion: stunner.l7mp.io/v1alpha1 -kind: GatewayConfig -metadata: - name: stunner-gatewayconfig - namespace: stunner -spec: - userName: "my-user" - password: "my-password" - metricsEndpoint: "http://:8080/metrics" -``` +Metrics collection is *not* enabled by default. To enable it, set the `enableMetricsEndpoint` field to true in the [Dataplane](GATEWAY.md#dataplane) template. This will configure the `stunnerd` dataplane pods to expose a HTTP metrics endpoint at port 8080 that Prometheus can scrape for metrics. ## Metrics -STUNner exports two types of metrics: the *Go collector metrics* describe the state of the Go -runtime, while the *Connection statistics* expose traffic monitoring data. +STUNner exports two types of metrics: the *Go collector metrics* describe the state of the Go runtime, while the *Connection statistics* expose traffic monitoring data. ### Go collector metrics -Each STUNner gateway instance exports a number of standard metrics that describe the state of the -current Go process runtime. Some notable metrics as listed below, see more in the -[documentation](https://github.com/prometheus/client_golang). +Each STUNner gateway instance exports a number of standard metrics that describe the state of the current Go process. Some notable metrics as listed below, see more in the [documentation](https://github.com/prometheus/client_golang). | Metric | Description | | :--- | :--- | @@ -50,8 +25,7 @@ current Go process runtime. Some notable metrics as listed below, see more in th ### Connection statistics -STUNner provides deep visibility into the amount of traffic sent and received on each listener -(downstream connections) and cluster (upstream connections). The particular metrics are as follows. +STUNner provides deep visibility into the amount of traffic sent and received on each listener (downstream connections) and cluster (upstream connections). The particular metrics are as follows. | Metric | Description | Type | Labels | | :--- | :--- | :--- | :--- | @@ -59,18 +33,16 @@ STUNner provides deep visibility into the amount of traffic sent and received on | `stunner_listener_connections_total` | Number of downstream connections at a listener. | counter | `name=` | | `stunner_listener_packets_total` | Number of datagrams sent or received at a listener. Unreliable for listeners running on a connection-oriented transport protocol (TCP/TLS). | counter | `direction=`, `name=`| | `stunner_listener_bytes_total` | Number of bytes sent or received at a listener. | counter | `direction=`, `name=` | -| `stunner_cluster_connections` | Number of *active* upstream connections on behalf of a listener. | gauge | `name=` | -| `stunner_cluster_connections_total` | Number of upstream connections on behalf of a listener. | counter | `name=` | -| `stunner_cluster_packets_total` | Number of datagrams sent to backends or received from backends on behalf of a listener. Unreliable for clusters running on a connection-oriented transport protocol (TCP/TLS).| counter | `direction=`, `name=` | -| `stunner_cluster_bytes_total` | Number of bytes sent to backends or received from backends on behalf of a listener. | counter | `direction=`, `name=` | +| `stunner_cluster_packets_total` | Number of datagrams sent to backends or received from backends of a cluster. Unreliable for clusters running on a connection-oriented transport protocol (TCP/TLS).| counter | `direction=`, `name=` | +| `stunner_cluster_bytes_total` | Number of bytes sent to backends or received from backends of a cluster. 
| counter | `direction=`, `name=` | ## Integration with Prometheus and Grafana -Collection and visualization of STUNner relies on Prometheus and Grafana services. The STUNer helm repository provides a way to [install](#installation) a ready-to-use Prometheus and Grafana stack. In addition, metrics visualization requires [user input](#configuration-and-usage) on configuring the plots; see below. +Collection and visualization of STUNner relies on Prometheus and Grafana services. The STUNer helm repository provides a way to [install](https://github.com/l7mp/stunner-helm#monitoring) a ready-to-use Prometheus and Grafana stack. In addition, metrics visualization requires [user input](#configuration) on configuring the plots; see below. ### Installation -A full-fledged Prometheus+Grafana helm chart is available in the STUNner helm repo. To use this chart, the installation steps involve enabling monitoring in STUNner, and installing the Prometheus+Grafana stack with helm. +A full-fledged Prometheus+Grafana helm chart is available in the [STUNner helm repo](https://github.com/l7mp/stunner-helm#monitoring). To use this chart, the installation steps involve enabling monitoring in STUNner, and installing the Prometheus+Grafana stack with helm. 1. Install STUNner with Prometheus support: @@ -78,7 +50,7 @@ A full-fledged Prometheus+Grafana helm chart is available in the STUNner helm re helm install stunner stunner/stunner --create-namespace --namespace=stunner --set stunner.deployment.monitoring.enabled=true ``` -2. Configure STUNner to expose the metrics by [exposing the STUNner metrics-collection server in the GatewayConfig](#configuration). +2. Configure STUNner to expose the metrics. 3. Install the Prometheus+Grafana stack with a helm chart. @@ -87,23 +59,14 @@ A full-fledged Prometheus+Grafana helm chart is available in the STUNner helm re ```console helm repo add stunner https://l7mp.io/stunner helm repo update - helm install prometheus stunner/stunner-prometheus ``` -### Configuration and Usage +### Configuration The helm chart deploys a ready-to-use Prometheus and Grafana stack, but leaves the Grafana dashboard empty to let the user pick metrics and configure their visualization. An interactive way to visualize STUNner metrics is to use the Grafana dashboard. -#### Access the Grafana dashboard - -To open the Grafana dashboard navigate a web browser to `grafana` NodePort service IP and port 80. - -The default username is **admin** with the password **admin**. - -At the first login you can change the password or leave as it is (use the *Skip* button). - -#### Visualize STUNner metrics +To open the Grafana dashboard navigate a web browser to `grafana` NodePort service IP and port 80. The default username is **admin** with the password **admin**. At the first login you can change the password or leave as it is (use the *Skip* button). As an example, let us plot the STUNner metric `stunner_listener_connections`. First step is to create a new panel, then to configure the plot parameters. @@ -131,10 +94,7 @@ Below is an example dashboard with data collected from the [simple-tunnel](examp Prometheus and Grafana both provide a dashboard to troubleshoot a running system, and to check the flow of metrics from STUNner to Prometheus, and from Prometheus to Grafana. -### Check Prometheus operations via its dashboard -The Prometheus dashboard is available as the `prometheus` NodePort service (use the node IP and node port to connect with a web browser). 
- -The dashboard enables checking running Prometheus configuration and testing the metrics collection. +The Prometheus dashboard is available as the `prometheus` NodePort service (use the node IP and node port to connect with a web browser). The dashboard enables checking running Prometheus configuration and testing the metrics collection. For example, to observe the `stunner_listener_connections` metric on the Prometheus dashboard: @@ -144,9 +104,7 @@ For example, to observe the `stunner_listener_connections` metric on the Prometh ![Prometheus Dashboard](img/prometheus-dashboard.png) -Note: some STUNner metrics are not available when they are inactive (e.g., there is no active cluster). - -#### Check Prometheus data source in Grafana +Note that some STUNner metrics may not be available when they are inactive (e.g., there is no active cluster). To configure/check the Prometheus data source in Grafana, first click on *Configuration* (1), then *Data sources* (2), as shown here: diff --git a/docs/OBSOLETE.md b/docs/OBSOLETE.md deleted file mode 100644 index 6beaa545..00000000 --- a/docs/OBSOLETE.md +++ /dev/null @@ -1,501 +0,0 @@ -# Standalone mode - -In order to gain full control over media ingestion, STUNner can be deployed without the gateway -operator component. In this standalone mode, the user is fully in charge of creating and -maintaining the configuration of the `stunnerd` pods. With the introduction of the STUNner gateway -operator *the standalone mode is considered obsolete* as of STUNner v0.11. The below documentation -is provided only for historical reference; before the gateway operator existed this was *the* -recommended way to interact with STUNner. - -## Table of contents - -- [Standalone mode](#standalone-mode) - - [Table of contents](#table-of-contents) - - [Prerequisites](#prerequisites) - - [Installation](#installation) - - [Installation with Helm](#installation-with-helm) - - [Manual installation](#manual-installation) - - [Configuration](#configuration) - - [Learning the external IP and port](#learning-the-external-ip-and-port) - - [Configuring WebRTC clients](#configuring-webrtc-clients) - - [Authentication](#authentication) - - [Access control](#access-control) - - [Enabling TURN transport over TCP](#enabling-turn-transport-over-tcp) - - [Enabling TURN transport over TLS and DTLS](#enabling-turn-transport-over-tls-and-dtls) - -## Prerequisites - -The below installation instructions require an operational cluster running a supported version of -Kubernetes (>1.22). Make sure that the cluster comes with a functional [load-balancer -integration](https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer), -otherwise STUNner will not be able to allocate a public IP address for clients to reach your WebRTC -infra. In the standalone mode STUNner relies on Kubernetes ACLs (`NetworkPolicy`) with [port -ranges](https://kubernetes.io/docs/concepts/services-networking/network-policies/#targeting-a-range-of-ports) -to block malicious access; make sure your Kubernetes installation supports these. - -## Installation - -### Installation with Helm - -Use the [Helm charts](https://github.com/l7mp/stunner-helm) for installing STUNner, setting the -`standalone.enabled` feature gate to `true`: - -```console -helm repo add stunner https://l7mp.io/stunner -helm repo update -helm install stunner stunner/stunner --set stunner.standalone.enabled=true -``` - -The below will create a new namespace named `stunner` and install the STUNner dataplane pods into that -namespace. 
- -```console -helm install stunner stunner/stunner --set stunner.standalone.enabled=true --create-namespace --namespace=stunner -``` - -Note that we do not install the usual control plane: in this mode we ourselves need to manually -provide the dataplane configuration for STUNner. - -### Manual installation - -If Helm is not an option, you can perform a manual installation using the static Kubernetes -manifests packaged with STUNner. - -First, clone the STUNner repository. - -```console -git clone https://github.com/l7mp/stunner.git -cd stunner -``` - -Then, customize the default settings in the STUNner service -[manifest](https://github.com/l7mp/stunner/blob/main/deploy/manifests/stunner-standalone.yaml) and deploy it via `kubectl`. - -```console -kubectl apply -f deploy/manifests/stunner-standalone.yaml -``` - -By default, all resources are created in the `default` namespace. - -## Configuration - -The default STUNner installation will create the below Kubernetes resources: - -1. a ConfigMap that stores STUNner local configuration, -2. a Deployment running one or more STUNner daemon replicas, -3. a LoadBalancer service to expose the STUNner deployment on a public IP address and UDP port - (by default, the port is UDP 3478), and finally -4. a NetworkPolicy, i.e., an ACL/firewall policy to control network communication from STUNner to - the rest of the Kubernetes workload. - -The installation scripts packaged with STUNner will use hard-coded configuration defaults that must -be customized prior to deployment. In particular, make absolutely sure to customize the access -tokens (`STUNNER_USERNAME` and `STUNNER_PASSWORD` for `plaintext` authentication, and -`STUNNER_SHARED_SECRET` and possibly `STUNNER_DURATION` for the `longterm` authentication mode), -otherwise STUNner will use hard-coded STUN/TURN credentials. This should not pose a major security -risk (see [here](SECURITY.md) for more info), but it is still safer to customize the access -tokens before exposing STUNner to the Internet. - -The most recent STUNner configuration is always available in the Kubernetes ConfigMap named -`stunnerd-config`. This configuration is made available to the `stunnerd` pods by -[mapping](https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#define-container-environment-variables-using-configmap-data) -the `stunnerd-config` ConfigMap into the pods as environment variables. Note that changes to this -ConfigMap will take effect only once STUNner is restarted. - -The most important STUNner configuration settings are as follows. -* `STUNNER_PUBLIC_ADDR` (no default): The public IP address clients can use to reach STUNner. By - default, the public IP address will be dynamically assigned during installation. The installation - scripts take care of querying the external IP address from Kubernetes and automatically setting - `STUNNER_PUBLIC_ADDR`; for manual installation the external IP must be set by hand (see - [details](#learning-the-external-ip-and-port) below). -* `STUNNER_PUBLIC_PORT` (default: 3478): The public port used by clients to reach STUNner. Note - that the Helm installation scripts may overwrite this configuration if the installation falls - back to the `NodePort` service (i.e., when STUNner fails to obtain an external IP from the - Kubernetes ingress load balancer), see [details](#learning-the-external-ip-and-port) below. -* `STUNNER_PORT` (default: 3478): The internal port used by STUNner for communication inside the - cluster. 
It is safe to set this to the public port. -* `STUNNER_TRANSPORT_UDP_ENABLE` (default: "1", enabled): Enable UDP TURN transport. -* `STUNNER_TRANSPORT_TCP_ENABLE` (default: "", disabled): Enable TCP TURN transport. -* `STUNNER_REALM` (default: `stunner.l7mp.io`): the - [`REALM`](https://www.rfc-editor.org/rfc/rfc8489.html#section-14.9) used to guide the user agent - in authenticating with STUNner. -* `STUNNER_AUTH_TYPE` (default: `plaintext`): the STUN/TURN authentication mode, either `plaintext` - using the username/password pair `$STUNNER_USERNAME`/`$STUNNER_PASSWORD`, or `longterm`, using - the [STUN/TURN long-term credential](https://www.rfc-editor.org/rfc/rfc8489.html#section-9.2) - mechanism with the secret `$STUNNER_SHARED_SECRET`. -* `STUNNER_USERNAME` (default: `user`): the - [username](https://www.rfc-editor.org/rfc/rfc8489.html#section-14.3) attribute clients can use to - authenticate with STUNner over `plaintext` authentication. Make sure to customize! -* `STUNNER_PASSWORD` (default: `pass`): the password clients can use to authenticate with STUNner - in `plaintext` authentication. Make sure to customize! -* `STUNNER_SHARED_SECRET`: the shared secret used for `longterm` authentication mode. Make sure to - customize! -* `STUNNER_DURATION` (default: `86400` sec, i.e., one day): the lifetime of STUNner credentials in - `longterm` authentication. -* `STUNNER_LOGLEVEL` (default: `all:WARN`): the default log level used by the STUNner daemons. -* `STUNNER_MIN_PORT` (default: 10000): smallest relay transport port assigned by STUNner. -* `STUNNER_MAX_PORT` (default: 20000): highest relay transport port assigned by STUNner. - -The default configuration can be overridden by setting custom command line arguments when -[launching the STUNner pods](cmd/stunnerd.md). All examples below assume that STUNner is -deployed into the `default` namespace; see the installation notes below on how to override this. - -Note that changing in the configuration values becomes valid only once STUNner is restarted (see -below). - -## Learning the external IP and port - -There are two ways to expose the STUN/TURN ingress gateway service with STUNner: through a standard -Kubernetes [`LoadBalancer` -service](https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer) (the -default) or as a [`NodePort` -service](https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport), used as a -fallback if an ingress load-balancer is not available. In both cases the external IP address and -port that WebRTC clients can use to reach STUNner may be set dynamically by Kubernetes. (Kubernetes -lets you use your own [fix IP address and domain -name](https://kubernetes.io/docs/concepts/services-networking/service/#choosing-your-own-ip-address), -but the default installation scripts do not support this.) - -In general, WebRTC clients will need to learn STUNner's external IP and port somehow. In order to -simplify the integration of STUNner into the WebRTC application server, STUNner stores the dynamic -IP address/port assigned by Kubernetes into the `stunnerd-config` ConfigMap under the key -`STUNNER_PUBLIC_IP` and `STUNNER_PUBLIC_PORT`. Then, WebRTC application pods can map this ConfigMap -as environment variables and communicate the IP address and port back to the clients (see an -[example](#configuring-webrtc-clients) below). - -The [Helm installation](#helm) scripts should take care of setting the IP address and port -automatically in the ConfigMap during installation. 
However, if later the LoadBalancer services -change for some reason then the new external IP address and port will need to be configured -manually in the ConfigMap. Similar is the case when using the static Kubernetes manifests to deploy -STUNner. The below instructions simplify this process. - -After a successful installation, you should see something similar to the below: - -```console -kubectl get all -NAME READY STATUS RESTARTS AGE -pod/stunner-XXXXXXXXXX-YYYYY 1/1 Running 0 8s - -NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE -service/kubernetes ClusterIP 10.72.128.1 443/TCP 6d4h -service/stunner ClusterIP 10.72.130.61 3478/UDP 81s -service/stunner-standalone-lb LoadBalancer 10.72.128.166 A.B.C.D 3478:30630/UDP 81s - -NAME READY UP-TO-DATE AVAILABLE AGE -deployment.apps/stunner 1/1 1 1 8s -``` - -Note the external IP address allocated by Kubernetes for the `stunner-standalone-lb` service -(`EXTERNAL-IP` marked with a placeholder `A.B.C.D` in the above): this will be the public STUN/TURN -access point that your WebRTC clients will need to use in order to access the WebRTC media service -via STUNner. - -Wait until Kubernetes assigns a valid external IP to STUNner and query the public IP address and -port used by STUNner from Kubernetes. - -```console -until [ -n "$(kubectl get svc stunner-standalone-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}')" ]; do sleep 1; done -export STUNNER_PUBLIC_ADDR=$(kubectl get svc stunner-standalone-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}') -export STUNNER_PUBLIC_PORT=$(kubectl get svc stunner-standalone-lb -o jsonpath='{.spec.ports[0].port}') -``` - -If this hangs for minutes, then your Kubernetes load-balancer integration is not working (if using -[Minikube](https://github.com/kubernetes/minikube), make sure `minikube tunnel` is -[running](https://minikube.sigs.k8s.io/docs/handbook/accessing)). This may still allow STUNner to -be reached externally, using a Kubernetes `NodePort` service (provided that your [Kubernetes -supports -NodePorts](https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview#no_direct_external_inbound_connections_for_private_clusters)). In -this case, but only in this case!, set the IP address and port from the NodePort: - -```console -export STUNNER_PUBLIC_ADDR=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="ExternalIP")].address}') -export STUNNER_PUBLIC_PORT=$(kubectl get svc stunner-standalone-lb -o jsonpath='{.spec.ports[0].nodePort}') -``` - -Check that the IP address/port `${STUNNER_PUBLIC_ADDR}:${STUNNER_PUBLIC_PORT}` is reachable by your -WebRTC clients; some Kubernetes clusters are installed with private node IP addresses that may -prevent NodePort services to be reachable from the Internet. - -If all goes well, the STUNner service is now exposed on the IP address `$STUNNER_PUBLIC_ADDR` and -UDP port `$STUNNER_PUBLIC_PORT`. Finally, store the public IP address and port back into STUNner's -configuration, so that the WebRTC application server can learn this information and forward it to -the clients. - -```console -kubectl patch configmap/stunnerd-config --type merge \ - -p "{\"data\":{\"STUNNER_PUBLIC_ADDR\":\"${STUNNER_PUBLIC_ADDR}\",\"STUNNER_PUBLIC_PORT\":\"${STUNNER_PUBLIC_PORT}\"}}" -``` - -## Configuring WebRTC clients - -The last step is to configure your WebRTC clients to use STUNner as the TURN server. 
The below -JavaScript snippet will direct WebRTC clients to use STUNner; make sure to substitute the -placeholders (like ``) with the correct configuration from the above. - -```javascript -var ICE_config = { - 'iceServers': [ - { - 'url': "turn::?transport=udp', - 'username': , - 'credential': , - }, - ], -}; -var pc = new RTCPeerConnection(ICE_config); -``` - -## Authentication - -STUNner relies on the STUN [long-term credential -mechanism](https://www.rfc-editor.org/rfc/rfc8489.html#page-26) to provide user authentication. See -[here](AUTH.md) for more detail on STUNner's authentication modes. - -The below commands will configure STUNner to use `plaintext` authentication using the -username/password pair `my-user/my-password` and restart STUNner for the new configuration to take -effect. - -```console -kubectl patch configmap/stunnerd-config --type merge \ - -p "{\"data\":{\"STUNNER_AUTH_TYPE\":\"plaintext\",\"STUNNER_USERNAME\":\"my-user\",\"STUNNER_PASSWORD\":\"my-password\"}}" -kubectl rollout restart deployment/stunner -``` - -The below commands will configure STUNner to use `longterm` authentication mode, using the shared -secret `my-secret`. By default, STUNner credentials are valid for one day. - -```console -kubectl patch configmap/stunnerd-config --type merge \ - -p "{\"data\":{\"STUNNER_AUTH_TYPE\":\"longterm\",\"STUNNER_SHARED_SECRET\":\"my-secret\"}}" -kubectl rollout restart deployment/stunner -``` - -## Access control - -The security risks and best practices associated with STUNner are described -[here](SECURITY.md), below we summarize the only step that is specific to the standalone mode: -configuring access control. - -By default, a standalone STUNner installation comes with an open route: this essentially means -that, possessing a valid TURN credential, an attacker can reach *any* UDP service inside the -Kubernetes cluster via STUNner. This is because, without an operator, there is no control plane to -supply [endpoint-discovery -service](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/service_discovery#endpoint-discovery-service-eds) -for the dataplane and therefore `stunnerd` does not know whether the peer address a client wished -to reach belongs to the legitimate backend service or not. In order to prevent open access through -STUNner, the default standalone installation comes with a default-deny Kubernetes NetworkPolicy -that locks down *all* access from the STUNner pods to the rest of the workload. - -```yaml -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - name: stunner-network-policy -spec: - podSelector: - matchLabels: - app: stunner - policyTypes: - - Egress -``` - -In order for clients to reach a media server pod via STUNner the user must explicitly whitelist the -target service in this access control rule. Suppose that we want STUNner to reach the media server -pods labeled as `app=media-server` over the UDP port range `[10000:20000]`, but we don't want -connections via STUNner to succeed to any other pod. This will be enough to support WebRTC media, -but will not allow clients to, e.g., reach the Kubernetes DNS service. - -Assuming that the entire workload is deployed into the `default` namespace, the below -`NetworkPolicy` ensures that all access from any STUNner pod to any media server pod is allowed -over any UDP port between 10000 and 20000, and all other network access from STUNner is denied. 
- -```yaml -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - name: stunner-network-policy -spec: -# Choose the STUNner pods as source - podSelector: - matchLabels: - app: stunner - policyTypes: - - Egress - egress: - # Allow only this rule, everything else is denied - - to: - # Choose the media server pods as destination - - podSelector: - matchLabels: - app: media-server - ports: - # Only UDP ports 10000-20000 are allowed between - # the source-destination pairs - - protocol: UDP - port: 10000 - endPort: 20000 -``` - -If your Kubernetes CNIs does not support [network policies with port -ranges](https://kubernetes.io/docs/concepts/services-networking/network-policies/#targeting-a-range-of-ports), -then the below will provide an access control rule similar to the above, except that it opens up -*all* UDP ports on the media server instead of limiting access to the UDP port range -`[10000:20000]`. - -```yaml -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - name: stunner-network-policy -spec: - podSelector: - matchLabels: - app: stunner - policyTypes: - - Egress - egress: - - to: - - podSelector: - matchLabels: - app: media-server - ports: - - protocol: UDP -``` - -## Enabling TURN transport over TCP - -Some corporate firewalls block all UDP access from the private network, except DNS. To make sure -that clients can still reach STUNner, you can expose STUNner over a [TCP-based TURN -transport](https://www.rfc-editor.org/rfc/rfc6062). To maximize the chances of getting through a -zealous firewall, below we expose STUNner over the default HTTPS port 443. - -First, enable TURN transport over TCP in STUNner. - -```console -kubectl patch configmap/stunnerd-config --type merge -p "{\"data\":{\"STUNNER_TRANSPORT_TCP_ENABLE\":\"1\"}}" -``` - -Then, delete the default Kubernetes service that exposes STUNner over UDP and re-expose it over the -TCP port 443. -```console -kubectl delete service stunner-standalone-lb -kubectl expose deployment stunner-standalone-lb --protocol=TCP --port=443 --type=LoadBalancer -``` - -Wait until Kubernetes assigns a public IP address. -```console -until [ -n "$(kubectl get svc stunner-standalone-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}')" ]; do sleep 1; done -export STUNNER_PUBLIC_ADDR=$(kubectl get svc stunner-standalone-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}') -export STUNNER_PUBLIC_PORT=$(kubectl get svc stunner-standalone-lb -o jsonpath='{.spec.ports[0].port}') -kubectl patch configmap/stunnerd-config --type merge \ - -p "{\"data\":{\"STUNNER_PUBLIC_ADDR\":\"${STUNNER_PUBLIC_ADDR}\",\"STUNNER_PUBLIC_PORT\":\"${STUNNER_PUBLIC_PORT}\"}}" -``` - -Restart STUNner with the new configuration. -```console -kubectl rollout restart deployment/stunner -``` - -Finally, direct your clients to the re-exposed STUNner TCP service with the below `PeerConnection` configuration; don't -forget to rewrite the TURN transport to TCP by adding the query `transport=tcp` to the -STUNner URI. -```javascript -var ICE_config = { - 'iceServers': [ - { - 'url': "turn::?transport=tcp", - 'username': , - 'credential': , - }, - ], -}; -var pc = new RTCPeerConnection(ICE_config); -``` - -## Enabling TURN transport over TLS and DTLS - -The ultimate tool to work around aggressive firewalls and middleboxes is exposing STUNner via TLS -and/or DTLS. 
Fixing the TLS listener port at 443 will make it impossible for the corporate firewall -to block TURN/TLS connections without blocking all external HTTPS access, so most probably at least -the TCP/443 port will be open to encrypted connections. - -Start with a fresh Kubernetes install. Below we create a self-signed certificate for testing; make -sure to replace the cert/key pair below with your own trusted credentials. - -```console -openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/tls.key -out /tmp/tls.crt -subj "/CN=example.domain.com" -kubectl create secret tls stunner-tls --key /tmp/tls.key --cert /tmp/tls.crt -``` - -Patch the TLS cert/key into the pre-configured static manifest and deploy the STUNner gateway. - -```console -cd stunner -cat deploy/manifests/stunner-standalone-tls.yaml.template | \ - perl -pe "s%XXXXXXX%`cat /tmp/tls.key | base64 -w 0`%g" | - perl -pe "s%YYYYYYY%`cat /tmp/tls.crt | base64 -w 0`%g" | - kubectl apply -f - -``` - -This will fire up STUNner with two TURN listeners, a TLS/TCP and a DTLS/UDP listener, both at port -443, and create two LoadBalancer services to expose these to clients. - -Wait until Kubernetes assigns a public IP address and learn the new public addresses. -```console -until [ -n "$(kubectl get svc stunner-tls -o jsonpath='{.status.loadBalancer.ingress[0].ip}')" ]; do sleep 1; done -until [ -n "$(kubectl get svc stunner-dtls -o jsonpath='{.status.loadBalancer.ingress[0].ip}')" ]; do sleep 1; done -export STUNNER_PUBLIC_ADDR_TLS=$(kubectl get svc stunner-tls -o jsonpath='{.status.loadBalancer.ingress[0].ip}') -export STUNNER_PUBLIC_ADDR_DTLS=$(kubectl get svc stunner-dtls -o jsonpath='{.status.loadBalancer.ingress[0].ip}') -``` - -Check your configuration with the handy [`turncat`](cmd/turncat.md) utility and the [UDP -greeter](https://github.com/l7mp/stunner#testing) service. First, query the UDP greeter service via TLS/TCP. Here, the -`turncat` command line argument `-i` puts `turncat` into insecure mode in order to accept our -self-signed TURN sever TLS certificate. - -```console -cd stunner -go build -o turncat cmd/turncat/main.go -kubectl apply -f deploy/manifests/udp-greeter.yaml -export PEER_IP=$(kubectl get svc media-plane -o jsonpath='{.spec.clusterIP}') -export STUNNER_USERNAME=$(kubectl get cm stunner-config -o yaml -o jsonpath='{.data.STUNNER_USERNAME}') -export STUNNER_PASSWORD=$(kubectl get cm stunner-config -o yaml -o jsonpath='{.data.STUNNER_PASSWORD}') -./turncat -i - turn://${STUNNER_USERNAME}:${STUNNER_PASSWORD}@${STUNNER_PUBLIC_ADDR_TLS}:443?transport=tls udp://${PEER_IP}:9001 -Hello STUNner via TLS -Greetings from STUNner! -``` - -Type anything once `turncat` is running to receive a nice greeting from STUNner. DTLS/UDP should -also work fine: - -```console -./turncat -i - turn://${STUNNER_USERNAME}:${STUNNER_PASSWORD}@${STUNNER_PUBLIC_ADDR_DTLS}:443?transport=dtls udp://${PEER_IP}:9001 -Another hello STUNner, now via DTLS! -Greetings from STUNner! -``` - -Remember, you can always direct your clients to your TURN listeners by setting the TURN URIs in the -ICE server configuration on your `PeerConnection`s. 
- -```javascript -var ICE_config = { - 'iceServers': [ - { - 'url': "turn::443?transport=tls", - 'username': , - 'credential': , - }, - { - 'url': "turn::443?transport=dtls", - 'username': , - 'credential': , - }, - ], -}; -var pc = new RTCPeerConnection(ICE_config); -``` - -Note that the default Kubernetes manifest -['stunner-standalone-tls.yaml'](https://github.com/l7mp/stunner/blob/main/deploy/manifests/stunner-standalone-tls.yaml.template) opens up the -NetworkPolicy for the `media-plane/default` service only, make sure to configure this to your own -setup. diff --git a/docs/README.md b/docs/README.md index 33448b4b..9832cce7 100644 --- a/docs/README.md +++ b/docs/README.md @@ -18,11 +18,11 @@ ## User guides -* [Gateway API reference](GATEWAY.md) * [Authentication](AUTH.md) * [Monitoring](MONITORING.md) * [Scaling](SCALING.md) * [Security](SECURITY.md) +* [Reference](GATEWAY.md) ## Tutorials @@ -49,7 +49,3 @@ * [`stunnerd` manual](cmd/stunnerd.md) * [`turncat` manual](cmd/turncat.md) * [`stunnerctl` manual](cmd/stunnerctl.md) - -## Obsolete features - -* [Standalone mode](OBSOLETE.md) diff --git a/docs/SCALING.md b/docs/SCALING.md index b615c501..98e9cfdd 100644 --- a/docs/SCALING.md +++ b/docs/SCALING.md @@ -1,85 +1,33 @@ # Scaling -[Autoscaling](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale) is one of -the key features in Kubernetes. This means that Kubernetes will automatically increase the number -of pods that run a service as the demand for the service increases, and reduce the number of pods -when the demand drops. This improves service quality, simplifies management, and reduces -operational costs by avoiding the need to over-provision services to the peak load. Most -importantly, autoscaling saves you from having to guess the number of nodes or pods needed to run -your workload: Kubernetes will automatically and dynamically resize your workload based on demand. +[Autoscaling](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale) is one of the key features in Kubernetes. This means that Kubernetes will automatically increase the number of pods that run a service as the demand for the service increases, and reduce the number of pods when the demand drops. This improves service quality, simplifies management, and reduces operational costs by avoiding the need to over-provision services to the peak load. Most importantly, autoscaling saves you from having to guess the number of nodes or pods needed to run your workload: Kubernetes will automatically and dynamically resize your workload based on demand. Further factors to autoscale your WebRTC workload are: - smaller load on each instance: this might result in better and more stable performance; - smaller blast radius: less calls will be affected if a pod fails for some reason. -Autoscaling a production service, especially one as sensitive to latency and performance as WebRTC, -can be challenging. This guide will provide the basics on autoscaling; see the [official -docs](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale) for more detail. +Autoscaling a production service, especially one as sensitive to latency and performance as WebRTC, can be challenging. This guide will provide the basics on autoscaling; see the [official docs](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale) for more detail. 
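+
+The rest of this guide refers to the [Dataplane](GATEWAY.md#dataplane) template in several places: this is where the resource requests/limits and the termination grace period of the `stunnerd` pods are set. The below is a minimal sketch for orientation only; the field names follow our reading of the `stunner.l7mp.io/v1` Dataplane API and the defaults may differ in your installation, so consult the [reference](GATEWAY.md#dataplane) for the authoritative schema.
+
+```yaml
+apiVersion: stunner.l7mp.io/v1
+kind: Dataplane
+metadata:
+  name: default
+spec:
+  image: l7mp/stunnerd:latest
+  # Keep the requests low and let the HorizontalPodAutoscaler add pods on demand
+  resources:
+    requests:
+      cpu: 500m
+      memory: 128Mi
+    limits:
+      cpu: 2
+      memory: 512Mi
+  # Upper bound for graceful shutdown (seconds): terminating pods linger at most
+  # this long waiting for active TURN allocations to go away
+  terminationGracePeriodSeconds: 3600
+```
+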
## Horizontal scaling -It is a good practice to scale Kubernetes workloads -[horizontally](https://openmetal.io/docs/edu/openstack/horizontal-scaling-vs-vertical-scaling) -(that is, by adding or removing service pods) instead of vertically (that is, by migrating to a -more powerful server) when demand increases. Correspondingly it is a good advice to set the -[resource limits and -requests](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) to the -bare minimum and let Kubernetes to automatically scale out the service by adding more pods if -needed. Note that that HPA [uses the requested amount of -resources](https://pauldally.medium.com/horizontalpodautoscaler-uses-request-not-limit-to-determine-when-to-scale-97643d808997) -to determine when to scale-up or down the number of instances. - -STUNner comes with a full support for horizontal scaling using the the Kubernetes built-in -[HorizontalPodAutoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale) -(HPA). The triggering event can be based on arbitrary metric, say, the [number of active client -connections](#MONITORING.md) per STUNner dataplane pod. Below we use the CPU utilization for -simplicity. - -Scaling STUNner *up* occurs by Kubernetes adding more pods to the STUNner dataplane deployment and -load-balancing client requests across the running pods. This should (theoretically) never interrupt -existing calls, but new calls should be automatically routed by the cloud load balancer to the new -endpoint(s). Automatic scale-up means that STUNner should never become the bottleneck in the -system. Note that in certain cases scaling STUNner up would require adding new Kubernetes nodes to -your cluster: most modern hosted Kubernetes services provide horizontal node autoscaling out of the -box to support this. - -Scaling STUNner *down*, however, is trickier. Intuitively, when a running STUNner dataplane pod is -terminated on scale-down, all affected clients with active TURN allocations on the terminating pod -would be disconnected. This would then require clients to go through an [ICE -restart](https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/restartIce) to -re-connect, which may cause prolonged connection interruption and may not even be supported by all -browsers. - -In order to avoid client disconnects on scale-down, STUNner supports a feature called [graceful -shutdown](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-terminating-with-grace). This -means that `stunnerd` pods would refuse to terminate as long as there are active TURN allocations -on them, and automatically remove themselves only once all allocations are deleted or timed out. It -is important that *terminating* pods will not be counted by the HorizontalPodAutoscaler towards the -average CPU load, and hence would not affect autoscaling decisions. In addition, new TURN -allocation requests would never be routed by Kubernetes to terminating `stunnerd` pods. - -Graceful shutdown enables full support for scaling STUNner down without affecting active client -connections. As usual, however, some caveats apply: -1. Currently the max lifetime for `stunnerd` to remain alive is 1 hour after being deleted: this - means that `stunnerd` will remain active only for 1 hour after it has been deleted/scaled-down - even if active allocations would last longer. You can always set this by adjusting the - `terminationGracePeriod` on your `stunnerd` pods. -2. 
STUNner pods may remain alive well after the last client connection goes away. This occurs when
-   an TURN-UDP allocation is left open by a client (spontaneous UDP client-side connection closure
-   cannot be reliably detected by the server). As the default TURN refresh lifetime is [10
-   minutes](https://www.rfc-editor.org/rfc/rfc8656#section-3.2-3), it may take 10 minutes until all
-   allocations time out, letting `stunnerd` to finally terminate.
-3. If there are active (or very recent) TURN allocations then the `stunnerd` pod may refuse to be
-   removed after a `kubectl delete`. Use `kubectl delete pod --grace-period=0 --force stunner-XXX`
-   to force removal.
+It is a good practice to scale your STUNner deployment [horizontally](https://openmetal.io/docs/edu/openstack/horizontal-scaling-vs-vertical-scaling) (that is, by adding or removing `stunnerd` pods) instead of vertically (that is, by increasing the resource limits of your pods) when demand increases. We advise setting the [resource limits and requests](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) of the `stunnerd` pods to the bare minimum (this can be set in the [Dataplane](GATEWAY.md#dataplane) template used for provisioning `stunnerd` pods, see the sketch above) and letting Kubernetes automatically scale out the STUNner dataplane by adding more `stunnerd` pods if needed. Note that the HPA uses the [requested amount of resources](https://pauldally.medium.com/horizontalpodautoscaler-uses-request-not-limit-to-determine-when-to-scale-97643d808997) to determine when to scale the number of instances up or down.
+
+STUNner comes with full support for horizontal scaling using the Kubernetes built-in [HorizontalPodAutoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale) (HPA). The triggering event can be based on an arbitrary metric, say, the [number of active client connections](MONITORING.md) per STUNner dataplane pod. Below we use CPU utilization for simplicity.
+
+Scaling STUNner *up* occurs by Kubernetes adding more pods to the STUNner dataplane deployment and load-balancing client requests across the running pods. This should (theoretically) never interrupt existing calls, and new calls will be automatically routed by the cloud load balancer to the new endpoint(s). Automatic scale-up means that STUNner should never become the bottleneck in the system. Note that in certain cases scaling STUNner up would require adding new Kubernetes nodes to your cluster: most modern hosted Kubernetes services provide horizontal node autoscaling out of the box to support this.
+
+Scaling STUNner *down*, however, is trickier. Intuitively, when a running STUNner dataplane pod is terminated on scale-down, all affected clients with active TURN allocations on the terminating pod would be disconnected. This would then require clients to go through an [ICE restart](https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/restartIce) to re-connect, which may cause prolonged connection interruption and may not even be supported by all browsers.
+
+In order to avoid client disconnects on scale-down, STUNner supports a feature called [graceful shutdown](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-terminating-with-grace). This means that `stunnerd` pods refuse to terminate as long as there are active TURN allocations on them, and automatically remove themselves only once all allocations are deleted or timed out.
It is important that *terminating* pods are not counted by the HorizontalPodAutoscaler towards the average CPU load, and hence do not affect autoscaling decisions. In addition, new TURN allocation requests are never routed by Kubernetes to terminating `stunnerd` pods.
+
+Graceful shutdown enables full support for scaling STUNner down without affecting active client connections. As usual, however, some caveats apply:
+1. The default is to provision `stunnerd` pods with at most 2 CPU cores and 16 listener threads; both can be customized in the [Dataplane](GATEWAY.md#dataplane) template used to provision `stunnerd` pods.
+2. Currently, the max lifetime for `stunnerd` to remain alive is 1 hour after being deleted: this means that `stunnerd` will remain active only for 1 hour after it has been deleted/scaled-down even if active allocations would last longer. You can adjust the grace period in the `terminationGracePeriod` setting in the [Dataplane](GATEWAY.md#dataplane) template.
+3. STUNner pods may remain alive well after the last client connection is gone. This occurs when an allocation is left open by a client (e.g., spontaneous UDP client-side connection closure cannot be reliably detected by the server). As the default TURN refresh lifetime is [10 minutes](https://www.rfc-editor.org/rfc/rfc8656#section-3.2-3), it may take 10 minutes until all allocations time out, letting `stunnerd` finally terminate. In such cases `stunnerd` may refuse to stop after a `kubectl delete`. Use `kubectl delete pod --grace-period=0 --force stunner-XXX` to force removal.
 
 ### Example
 
-Below is a simple
-[HorizontalPodAutoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/)
-config for autoscaling `stunnerd`. The example assumes that the [Kubernetes metric
-server](https://github.com/kubernetes-sigs/metrics-server#installation) is available in the
-cluster.
+Below is a simple [HorizontalPodAutoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/) config for autoscaling `stunnerd`. The example assumes that the [Kubernetes metrics server](https://github.com/kubernetes-sigs/metrics-server#installation) is available in the cluster.
 
 ```yaml
 apiVersion: autoscaling/v2
@@ -103,9 +51,7 @@ spec:
         averageUtilization: 300
 ```
 
-Here, `scaleTargetRef` selects the STUNner Deployment named `stunnerd` as the scaling target and
-the deployment will always run at least 1 pod and at most 10 pods. Understanding how Kubernetes
-chooses the number of running pods is, however, a bit tricky.
+Here, `scaleTargetRef` selects the STUNner Deployment named `stunnerd` as the scaling target, and the deployment will always run at least 1 pod and at most 10 pods. Understanding how Kubernetes chooses the number of running pods is, however, a bit tricky.
 
 Suppose that the configured resources in the STUNner deployment are the following.
 
@@ -119,12 +65,5 @@ resources:
   memory: 128Mi
 ```
 
-Suppose that, initially, there is only a single `stunnerd` pod in the cluster. As new calls come
-in, CPU utilization is increasing. Scale out will be triggered when CPU usage of the `stunnerd` pod
-reaches 1500 millicore CPU (three times the requested CPU). If more calls come and the total CPU
-usage of the `stunnerd` pods reaches 3000 millicore, which amounts to 1500 millicore on average,
-scale out would happen again. When users leave, load will drop and the total CPU utilization will
-fall under 3000 millicore.
At this point Kubernetes will automatically scale-in and remove one of -the `stunnerd` instances. Recall, this would never affect existing connections thanks to graceful -shutdown. +Initially, there is only a single `stunnerd` pod in the cluster. As new calls arrive, CPU utilization is increasing. Scale out will be triggered when CPU usage of the `stunnerd` pod reaches 1500 millicore CPU (three times the requested CPU). If more calls come and the total CPU usage of the `stunnerd` pods reaches 3000 millicore, which amounts to 1500 millicore on average, scale out would happen again. When users leave, load will drop and the total CPU utilization will fall under 3000 millicore. At this point Kubernetes will automatically scale-in and remove one of the `stunnerd` instances. Recall, this would never affect existing connections thanks to graceful shutdown. diff --git a/docs/SECURITY.md b/docs/SECURITY.md index 49ba8fb0..2131a6c7 100644 --- a/docs/SECURITY.md +++ b/docs/SECURITY.md @@ -1,30 +1,22 @@ # Security -Like any conventional gateway service, an improperly configured STUNner service may easily end up -exposing sensitive services to the Internet. The below security guidelines will allow to minimize -the risks associated with a misconfigured STUNner gateway service. +Like any conventional gateway service, an improperly configured STUNner service may easily end up exposing sensitive services to the Internet. The below security guidelines will allow to minimize the risks associated with a misconfigured STUNner gateway service. ## Threat -Before deploying STUNner, it is worth evaluating the potential [security -risks](https://www.rtcsec.com/article/slack-webrtc-turn-compromise-and-bug-bounty) a poorly -configured public STUN/TURN server poses. To demonstrate the risks, below we shall use the -[`turncat`](cmd/turncat.md) utility and `dig` to query the Kubernetes DNS service through a -misconfigured STUNner gateway. +Before deploying STUNner, it is worth evaluating the potential [security risks](https://www.rtcsec.com/article/slack-webrtc-turn-compromise-and-bug-bounty) a poorly configured public STUN/TURN server poses. To demonstrate the risks, below we shall use the [`turncat`](cmd/turncat.md) utility and `dig` to query the Kubernetes DNS service through a misconfigured STUNner gateway. -Start with a [fresh STUNner installation](INSTALL.md) into an empty namespace called `stunner` -and apply the below configuration. +Start with a [fresh STUNner installation](INSTALL.md) into an empty namespace called `stunner` and apply the below configuration. ```console cd stunner kubectl apply -f deploy/manifests/stunner-expose-kube-dns.yaml ``` -This will open a STUNner Gateway at port UDP:3478 and add a UDPRoute with the Kubernetes cluster -DNS service as the backend: +This will open a STUNner Gateway called `udp-gateway` at port UDP:3478 and add a UDPRoute with the Kubernetes cluster DNS service as the backend: ```yaml -apiVersion: gateway.networking.k8s.io/v1alpha2 +apiVersion: stunner.l7mp.io/v1 kind: UDPRoute metadata: name: stunner-udproute @@ -44,28 +36,21 @@ Learn the virtual IP address (`ClusterIP`) assigned by Kubernetes to the cluster export KUBE_DNS_IP=$(kubectl get svc -n kube-system -l k8s-app=kube-dns -o jsonpath='{.items[0].spec.clusterIP}') ``` -Build `turncat`, the Swiss-army-knife [testing tool](cmd/turncat.md) for STUNner, fire up a -UDP listener on `localhost:5000`, and forward all received packets to the cluster DNS service -through STUNner. 
+Build `turncat`, the Swiss-army-knife [testing tool](cmd/turncat.md) for STUNner, fire up a UDP listener on `localhost:5000`, and forward all received packets to the cluster DNS service through STUNner. ```console -./turncat --log=all:DEBUG udp://127.0.0.1:5000 k8s://stunner/stunnerd-config:udp-listener udp://${KUBE_DNS_IP}:53 +./turncat --log=all:DEBUG udp://127.0.0.1:5000 k8s://stunner/udp-gateway:udp-listener udp://${KUBE_DNS_IP}:53 ``` Now, in another terminal query the Kubernetes DNS service through the `turncat` tunnel. ```console -dig +short @127.0.0.1 -p 5000 stunner.default.svc.cluster.local +dig +short @127.0.0.1 -p 5000 kubernetes.default.svc.cluster.local ``` -You should see the internal Cluster IP address allocated by Kubernetes for the STUNner dataplane -service. Experiment with other FQDNs, like `kubernetes.default.svc.cluster.local`, etc.; the -Kubernetes cluster DNS service will readily return the the corresponding internal service IP -addresses. +You should see the internal Cluster IP address for the Kubernetes API server. -This little experiment demonstrates the threats associated with a poorly configured STUNner -gateway: it may allow external access to *any* UDP service running inside your cluster. The -prerequisites for this: +This little experiment should demonstrate the threats associated with a poorly configured STUNner gateway: it may allow external access to *any* UDP service running inside your cluster. The prerequisites for this: 1. the target service *must* run over UDP (e.g., `kube-dns`), 2. the target service *must* be wrapped with a UDPRoute @@ -76,7 +61,7 @@ Should any of these prerequisites fail, STUNner will block access to the target Now rewrite the backend service in the UDPRoute to an arbitrary non-existent service. ```yaml -apiVersion: gateway.networking.k8s.io/v1alpha2 +apiVersion: stunner.l7mp.io/v1 kind: UDPRoute metadata: name: stunner-udproute @@ -89,39 +74,23 @@ spec: - name: dummy ``` -Repeat the above `dig` command to query the Kubernetes DNS service again and observe how the query -times out. This demonstrates that a properly locked down STUNner installation blocks all access -outside of the backend services explicitly opened up via a UDPRoute. +Repeat the above `dig` command to query the Kubernetes DNS service again and observe how the query times out. This demonstrates that a properly locked down STUNner installation blocks all access outside of the backend services explicitly opened up via a UDPRoute. ## Locking down STUNner -Unless properly locked down, STUNner may be used maliciously to open a tunnel to any UDP service -running inside a Kubernetes cluster. Accordingly, it is critical to tightly control the pods and -services exposed via STUNner. +Unless properly locked down, STUNner may be used maliciously to open a tunnel to any UDP service running inside a Kubernetes cluster. Accordingly, it is critical to tightly control the pods and services exposed via STUNner. STUNner's basic security model is as follows: -> In a properly configured deployment, STUNner provides the same level of security as a media -server pool exposed to the Internet over a public IP address, protected by a firewall that admits -only UDP access. A malicious attacker, even possessing a valid TURN credential, can reach only the -media servers deployed behind STUNner, but no other services. 
+> In a properly configured deployment, STUNner provides the same level of security as a media server pool exposed to the Internet over a public IP address, protected by a firewall that admits only UDP access. A malicious attacker, even possessing a valid TURN credential, can reach only the media servers deployed behind STUNner, but no other services. -The below security considerations will greatly reduce this attack surface even further. In any -case, use STUNner at your own risk. +The below security considerations will greatly reduce this attack surface even further. In any case, use STUNner at your own risk. ## Authentication -By default, STUNner uses a single static username/password pair for all clients and the password is -available in plain text at the clients (`static` authentication mode). Anyone with access to the -static STUNner credentials can open a UDP tunnel via STUNner, provided that they know the private -IP address of the target service or pod and provided that a UDPRoute exists that specifies the -target service as a backend. This means that a service is exposed only if STUNner is explicitly -configured so. +By default, STUNner uses a single static username/password pair for all clients and the password is available in plain text at the clients (`static` authentication mode). Anyone with access to the static STUNner credentials can open a UDP tunnel via STUNner, provided that they know the private IP address of the target service or pod and provided that a UDPRoute exists that specifies the target service as a backend. This means that a service is exposed only if STUNner is explicitly configured so. -For more security sensitive workloads, we recommend the `ephemeral` authentication mode, which uses -per-client fixed lifetime username/password pairs. This makes it more difficult for attackers to -steal and reuse STUNner's TURN credentials. See the [authentication guide](AUTH.md) for configuring -STUNner with `ephemeral` authentication. +For production deployments we recommend the `ephemeral` authentication mode, which uses per-client fixed lifetime username/password pairs. This makes it more difficult for attackers to steal and reuse STUNner's TURN credentials. See the [authentication guide](AUTH.md) for configuring STUNner with `ephemeral` authentication. ## Access control @@ -130,7 +99,7 @@ a proper UDPRoute. For instance, the below UDPRoute allows access *only* to the service in the `media-plane` namespace, and nothing else. ```yaml -apiVersion: gateway.networking.k8s.io/v1alpha2 +apiVersion: stunner.l7mp.io/v1 kind: UDPRoute metadata: name: stunner-udproute @@ -141,12 +110,9 @@ spec: rules: - backendRefs: - name: media-server - - namespace: media-plane + namespace: media-plane ``` -> **Note** -To avoid potential misuse, STUNner disables open wildcard access to the entire cluster unless explicitly requested to do so by a [specifying an open StaticService as a backend](GATEWAY.md#staticservice). - For hardened deployments, it is possible to add a second level of isolation between STUNner and the rest of the workload using the Kubernetes NetworkPolicy facility. Creating a NetworkPolicy will essentially implement a firewall, blocking all access from the source to the target workload except the services explicitly whitelisted by the user. The below example allows access from STUNner to *any* media server pod labeled as `app=media-server` in the `default` namespace over the UDP port range `[10000:20000]`, but nothing else. 
```yaml @@ -176,24 +142,12 @@ spec: endPort: 20000 ``` -Kubernetes network policies can be easily [tested](https://banzaicloud.com/blog/network-policy) -before exposing STUNner publicly; e.g., the [`turncat` utility](cmd/turncat.md) packaged with -STUNner can be used conveniently for this [purpose](examples/simple-tunnel/README.md). +Kubernetes network policies can be easily [tested](https://banzaicloud.com/blog/network-policy) before exposing STUNner publicly; e.g., the [`turncat` utility](cmd/turncat.md) packaged with STUNner can be used conveniently for this [purpose](examples/simple-tunnel/README.md). ## Exposing internal IP addresses -The trick in STUNner is that both the TURN relay transport address and the media server address are -internal pod IP addresses, and pods in Kubernetes are guaranteed to be able to connect -[directly](https://sookocheff.com/post/kubernetes/understanding-kubernetes-networking-model/#kubernetes-networking-model), -without the involvement of a NAT. This makes it possible to host the entire WebRTC infrastructure -over the private internal pod network and still allow external clients to make connections to the -media servers via STUNner. At the same time, this also has the bitter consequence that internal IP -addresses are now exposed to the WebRTC clients in ICE candidates. - -The threat model is that, possessing the correct credentials, an attacker can scan the *private* IP -address of all STUNner pods and all media server pods. This should not pose a major security risk -though: remember, none of these private IP addresses can be reached externally. The attack surface -can be further reduced to the STUNner pods' private IP addresses by using the [symmetric ICE -mode](DEPLOYMENT.md#symmetric-ice-mode). +The trick in STUNner is that both the TURN relay transport address and the media server address are internal pod IP addresses, and pods in Kubernetes are guaranteed to be able to connect [directly](https://sookocheff.com/post/kubernetes/understanding-kubernetes-networking-model/#kubernetes-networking-model) without the involvement of a NAT. This makes it possible to host the entire WebRTC infrastructure over the private internal pod network and still allow external clients to make connections to the media servers via STUNner. At the same time, this also has the bitter consequence that internal IP addresses are now exposed to the WebRTC clients in ICE candidates. + +The threat model is that, possessing the correct credentials, an attacker can scan the *private* IP address of all STUNner pods and all media server pods. This should pose no major security risk though: remember, none of these private IP addresses can be reached externally. The attack surface can be further reduced to the STUNner pods' private IP addresses by using the [symmetric ICE mode](DEPLOYMENT.md#symmetric-ice-mode). Nevertheless, if worried about information exposure then STUNner may not be the best option at the moment. In later releases, we plan to implement a feature to obscure the relay transport addresses returned by STUNner. Please file an issue if you think this limitation is a blocker for your use case. 
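+
+Note that knowing a private address is not the same as being able to use it: for an extra layer of isolation you can add an *ingress* NetworkPolicy on the media server pods so that, inside the cluster, only the `stunnerd` pods can send them traffic over the media port range. The below is an illustrative sketch assuming the same `app=media-server` and `app=stunner` labels as the egress policy above; it blocks all other ingress to the media servers, so make sure to whitelist any additional sources (e.g., your application servers) your setup needs.
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: media-server-ingress-policy
+spec:
+  # Select the media server pods as the target of the policy
+  podSelector:
+    matchLabels:
+      app: media-server
+  policyTypes:
+    - Ingress
+  ingress:
+    # Admit only the STUNner pods, and only over the media port range
+    - from:
+        - podSelector:
+            matchLabels:
+              app: stunner
+      ports:
+        - protocol: UDP
+          port: 10000
+          endPort: 20000
+```
+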
diff --git a/docs/WHY.md b/docs/WHY.md index ce632d1b..5a663018 100644 --- a/docs/WHY.md +++ b/docs/WHY.md @@ -9,43 +9,40 @@ used outside of this context (e.g., as a regular STUN/TURN server), but this is ## The problem -The main pain points STUNner is trying to solve are all related to that Kubernetes and WebRTC are +The pain points STUNner is trying to solve are all related to that Kubernetes and WebRTC are currently foes, not friends. Kubernetes has been designed and optimized for the typical HTTP/TCP Web workload, which makes streaming workloads, and especially UDP/RTP based WebRTC media, feel like a foreign citizen. Most importantly, Kubernetes runs the media server pods/containers over a private L3 network over a -private IP address and the network dataplane applies several rounds of Network Address Translation -(NAT) steps to ingest media traffic into this private pod network. Most cloud load-balancers apply -a DNAT step to route packets to a node and then an SNAT step to put the packet to the private pod +private IP address and the several rounds of Network Address Translation (NAT) steps are required +to ingest media traffic into this private pod network. Most cloud load-balancers apply a DNAT step +to route packets to a Kubernetes node and then an SNAT step to inject a packet into the private pod network, so that by the time a media packet reaches a pod essentially all header fields in the [IP 5-tuple](https://www.techopedia.com/definition/28190/5-tuple) are modified except the destination port. Then, if any pod sends the packet over to another pod via a Kubernetes service load-balancer then the packet will again undergo a DNAT step, and so on. -The *Kubernetes dataplane teems with NATs*. This is not a big deal for the usual HTTP/TCP web -protocols Kubernetes was designed for, since an HTTP/TCP session contains an HTTP header that fully -describes it. Once an HTTP/TCP session is accepted by a server it does not need to re-identify the -client per each received packet, because it has session context. - -This is not the case with the prominent WebRTC media protocol encapsulation though, RTP over -UDP. RTP does not have anything remotely similar to an HTTP header. Consequently, the only -"semi-stable" connection identifier WebRTC servers can use to identify a client is by expecting the -client's packets to arrive from a negotiated IP source address and source port. When the IP 5-tuple -changes, for instance because there is a NAT in the datapath, then WebRTC media connections -break. Due to reasons which are mostly historical at this point, *UDP/RTP connections do not -survive not even a single NAT step*, let alone the 2-3 rounds of NATs a packet regularly undergoes -in the Kubernetes dataplane. +The *Kubernetes dataplane teems with NATs*. This is not a big deal for the web protocols Kubernetes +was designed for, since each HTTP/TCP connection involves a session context that can be used by a +server to identify clients. This is not the case with WebRTC media protocol stack though, since +UDP/RTP connections do not involve anything remotely similar to an HTTP context. Consequently, the +only "semi-stable" connection identifier WebRTC servers can use to identify a client is by +expecting the client's packets to arrive from a negotiated IP source address and source port. When +the IP 5-tuple changes, for instance because there is a NAT in the datapath, then WebRTC media +connections break. 
Due to reasons which are mostly historical at this point, *UDP/RTP connections +do not survive not even a single NAT step*, let alone the 2-3 rounds of NATs a packet regularly +undergoes in the Kubernetes dataplane. ## The state-of-the-art The current stance is that the only way to deploy a WebRTC media server into Kubernetes is to exploit a [well-documented Kubernetes anti-pattern](https://kubernetes.io/docs/concepts/configuration/overview): *running the media -server pods in the host network namespace* (using the `hostNetwork=true` setting in the pod's -container template). This way the media server shares the network namespace of the host (i.e., the -Kubernetes node) it is running on, inheriting the public address (if any) of the host and -(hopefully) sidestepping the private pod network with the involved NATs. +server pods in the host network namespace* of Kubernetes nodes (using the `hostNetwork=true` +setting in the pod's container template). This way the media server shares the network namespace of +the host (i.e., the Kubernetes node) it is running on, inheriting the public address (if any) of +the host and (hopefully) sidestepping the private pod network with the involved NATs. There are *lots* of reasons why this deployment model is less than ideal: diff --git a/docs/examples/benchmark/performance-stunner.yaml b/docs/examples/benchmark/performance-stunner.yaml index 7f786748..bd6dc97b 100644 --- a/docs/examples/benchmark/performance-stunner.yaml +++ b/docs/examples/benchmark/performance-stunner.yaml @@ -1,4 +1,4 @@ -apiVersion: gateway.networking.k8s.io/v1alpha2 +apiVersion: gateway.networking.k8s.io/v1 kind: GatewayClass metadata: name: stunner-gatewayclass @@ -12,7 +12,7 @@ spec: description: "STUNner is a WebRTC ingress gateway for Kubernetes" --- -apiVersion: stunner.l7mp.io/v1alpha1 +apiVersion: stunner.l7mp.io/v1 kind: GatewayConfig metadata: name: stunner-gatewayconfig @@ -24,7 +24,7 @@ spec: password: "pass-1" --- -apiVersion: gateway.networking.k8s.io/v1alpha2 +apiVersion: gateway.networking.k8s.io/v1 kind: Gateway metadata: name: udp-gateway @@ -36,7 +36,7 @@ spec: port: 9001 protocol: UDP --- -apiVersion: gateway.networking.k8s.io/v1alpha2 +apiVersion: stunner.l7mp.io/v1 kind: UDPRoute metadata: name: iperf-server diff --git a/docs/examples/cloudretro/README.md b/docs/examples/cloudretro/README.md index 15130a57..e7e1b75b 100644 --- a/docs/examples/cloudretro/README.md +++ b/docs/examples/cloudretro/README.md @@ -106,7 +106,7 @@ can connect from behind even the most over-zealous enterprise NAT or firewall. ```console kubectl apply -f - < **Warning** +> [!WARNING] +> > In case of [managed mode](/docs/INSTALL.md), update the `neko-plane` UDPRoute by replacing `stunner` in backendRefs with the generated deployment, e.g., `udp-gateway`. This will expose STUNner on a public IP on UDP port 3478. 
A Kubernetes `LoadBalancer` assigns an diff --git a/docs/examples/neko/stunner.yaml b/docs/examples/neko/stunner.yaml index 98d57d13..d0db2009 100644 --- a/docs/examples/neko/stunner.yaml +++ b/docs/examples/neko/stunner.yaml @@ -1,4 +1,4 @@ -apiVersion: gateway.networking.k8s.io/v1beta1 +apiVersion: gateway.networking.k8s.io/v1 kind: GatewayClass metadata: name: stunner-gatewayclass @@ -12,7 +12,7 @@ spec: description: "STUNner is a WebRTC ingress gateway for Kubernetes" --- -apiVersion: stunner.l7mp.io/v1alpha1 +apiVersion: stunner.l7mp.io/v1 kind: GatewayConfig metadata: name: stunner-gatewayconfig @@ -24,7 +24,7 @@ spec: password: "pass-1" --- -apiVersion: gateway.networking.k8s.io/v1beta1 +apiVersion: gateway.networking.k8s.io/v1 kind: Gateway metadata: name: udp-gateway @@ -37,7 +37,7 @@ spec: protocol: TURN-UDP --- -apiVersion: gateway.networking.k8s.io/v1alpha2 +apiVersion: stunner.l7mp.io/v1 kind: UDPRoute metadata: name: neko-plane diff --git a/docs/examples/simple-tunnel/README.md b/docs/examples/simple-tunnel/README.md index 25337d15..f0b49adf 100644 --- a/docs/examples/simple-tunnel/README.md +++ b/docs/examples/simple-tunnel/README.md @@ -65,7 +65,7 @@ that the UDPRoute specifies the `iperf-server` service as the `backendRef`, whic STUNner will forward the client connections received in any of the Gateways to the iperf server. ```yaml -apiVersion: gateway.networking.k8s.io/v1beta1 +apiVersion: gateway.networking.k8s.io/v1 kind: Gateway metadata: name: udp-gateway @@ -78,7 +78,7 @@ spec: protocol: TURN-UDP --- -apiVersion: gateway.networking.k8s.io/v1beta1 +apiVersion: gateway.networking.k8s.io/v1 kind: Gateway metadata: name: tcp-gateway @@ -91,7 +91,7 @@ spec: protocol: TURN-TCP --- -apiVersion: gateway.networking.k8s.io/v1alpha2 +apiVersion: stunner.l7mp.io/v1 kind: UDPRoute metadata: name: iperf-server diff --git a/docs/examples/simple-tunnel/iperf-stunner.yaml b/docs/examples/simple-tunnel/iperf-stunner.yaml index d30ff989..f21e3ff7 100644 --- a/docs/examples/simple-tunnel/iperf-stunner.yaml +++ b/docs/examples/simple-tunnel/iperf-stunner.yaml @@ -1,4 +1,4 @@ -apiVersion: gateway.networking.k8s.io/v1beta1 +apiVersion: gateway.networking.k8s.io/v1 kind: GatewayClass metadata: name: stunner-gatewayclass @@ -12,7 +12,7 @@ spec: description: "STUNner is a WebRTC ingress gateway for Kubernetes" --- -apiVersion: stunner.l7mp.io/v1alpha1 +apiVersion: stunner.l7mp.io/v1 kind: GatewayConfig metadata: name: stunner-gatewayconfig @@ -24,7 +24,7 @@ spec: password: "pass-1" --- -apiVersion: gateway.networking.k8s.io/v1beta1 +apiVersion: gateway.networking.k8s.io/v1 kind: Gateway metadata: name: udp-gateway @@ -37,7 +37,7 @@ spec: protocol: TURN-UDP --- -apiVersion: gateway.networking.k8s.io/v1beta1 +apiVersion: gateway.networking.k8s.io/v1 kind: Gateway metadata: name: tcp-gateway @@ -50,7 +50,7 @@ spec: protocol: TURN-TCP --- -apiVersion: gateway.networking.k8s.io/v1alpha2 +apiVersion: stunner.l7mp.io/v1 kind: UDPRoute metadata: name: iperf-server diff --git a/docs/img/stunner_arch_big.svg b/docs/img/stunner_arch_big.svg index 9c4f54da..d659fa75 100644 --- a/docs/img/stunner_arch_big.svg +++ b/docs/img/stunner_arch_big.svg @@ -2,9 +2,9 @@ + + + + + + + + + + + + + transform="translate(-101.99988,-74.454498)"> + style="opacity:0.92;fill:#ffffff;stroke-width:0.264999;stroke-miterlimit:4;stroke-dasharray:none" + id="rect17965" + width="145.36844" + height="124.38023" + x="101.99988" + y="74.454498" /> + y="86.053436" /> GatewayClass + y="91.999321">GatewayClass + y="101.32552" /> 
GatewayConfig + y="107.27142">GatewayConfig + y="102.0098" /> Gateway + y="106.36816">Gateway + gw-ns/gw + + Deployment + + LB Service + + ConfigMap + y="118.40133" /> UDPRoute + y="124.34724">UDPRoute + y="134.04221" /> Service + y="139.98804">Service + y="134.11505" /> Service + y="140.06088">Service - STUNner - ConfigMap - - - stunnerd. - conf + x="119.60569" + y="157.10823" /> + y="118.31358" /> UDPRoute + y="124.25938">UDPRoute + y="134.02731" /> Service + y="139.97322">Service @@ -480,24 +543,31 @@ style="fill:none;stroke:#000000;stroke-width:0.2;stroke-linecap:round;stroke-linejoin:miter;stroke-miterlimit:50;stroke-dasharray:none;stroke-opacity:1;paint-order:fill markers stroke" id="rect11705" width="92.906738" - height="66.294807" + height="62.699524" x="145.10664" y="84.031242" /> + STUNner Gateway Hierarchy + x="145.85818" + y="82.809853">Gateway API + transform="matrix(0.7,0,0,0.7,27.506081,50.259234)" + style="opacity:0.92;stroke-width:1.42857"> stunnerd + x="161.94887" + y="160.68379">stunnerd + sodipodi:nodetypes="cc" /> + Render + x="126.29139" + y="101.20586">Control Map + x="134.42651" + y="88.119209">Watch Watch + x="105.49356" + y="173.49677">STUN/TURN + + UDP + + UDP + x="130.05907" + y="172.40094" /> + x="150.91322" + y="166.99174" /> Cluster + x="154.32851" + y="172.00801">Cluster Listener + x="133.41063" + y="177.0959">Listener + x="150.91328" + y="179.03008" /> Cluster + x="154.32864" + y="184.04634">Cluster + d="m 126.88437,161.82693 h 44.42234 a 3.0382178,4.4979167 0 0 1 3.03822,4.49792 v 19.75365 a 3.0382178,4.4979167 0 0 1 -3.03822,4.49792 h -44.42234 a 3.0382178,4.4979167 0 0 1 -3.03822,-4.49792 v -19.75365 a 3.0382178,4.4979167 0 0 1 3.03822,-4.49792 z" + style="opacity:0.92;fill:none;fill-opacity:1;stroke:#000000;stroke-width:0.285714;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" /> + gw-ns/gw + gw-ns/gw + gw-ns/gw