River is a reverse proxy application written in Rust, supported as a Prossimo Initiative, built on top of the Pingora engine from Cloudflare. River began development in Q1 2024, and is currently in an early developer preview state.
This announcement covers the v0.5.0 release of River, the second development release made public. This release was focused on initial development of core functionality necessary for evaluation of River for early experimental users. This release was led by OneVariable UG, and sponsored by Prossimo.
For installation instructions, see the release details on GitHub. This release includes
binaries for x86_64 Linux (GNU and MUSL builds), as well as a release for aarch64 MacOS
(M-series Macs). Binaries are provided, as well as an installation script. Release tooling
is provided by cargo-dist
.
Initially, it was intended to ship three releases for this spike of work, according to our published roadmap. However, as development of these features didn't end up being linear, we instead skipped releases v0.3.0 and v0.4.0 in order to ship these all of the included features in v0.5.0.
Release v0.5.0 is being made directly after v0.2.x releases to better communicate the level of readiness and to remain consistent with the roadmap.
The following are the notable features included in the v0.5.0 release. For a full list of changes, please refer to the release details on GitHub.
This second milestone was about extending core functionality of River, to move it closer to a state where users may begin experimentation and evaluation.
Primary objectives included:
- Development of "multiple upstream" features, including:
- Supporting Load Balancing of upstream servers
- Supporting Health Checks of upstream servers
- Supporting Service Discovery of upstream servers
- Developer and Operator Quality of Life features, including:
- Supporting basic static HTML file serving
- Supporting semi-dynamic observability endpoints, e.g. for Prometheus polling
- Support for hot-reloading of configuration
- CI for build and test checks on pull requests
- Development of initial Robustness features, including:
- Rate limiting of connections and/or requests
- CIDR/API range-based filtering for rejecting connections
The v0.2.0 release shipped with support for only a single upstream. This meant that it was not possible to perform "load balancing" responsibilities, proxying downstream requests to multiple upstream servers.
The v0.5.0 release has added numerous features that now allow for the ability to support multiple upstreams for a single service, laying the foundation necessary for properly supporting this use case.
Many of these features are based on capabilities from the pingora_load_balancing
crate.
River now supports a number of strategies for selecting an upstream server when multiple are provided. These currently include:
- Round Robin - the upstream servers are selected on a rotating basis, proxying an even number of requests to each upstream server
- Random - An upstream server is selected at random, proxying a statistically even number of requests to each upstream server (over a long enough operational time)
- FNV Hashing - FNV hashing uses the Fowler-Noll-Vo Hash Function, a simple and fast non-cryptographic hash on a component of the request, to select an upstream server
- Ketama Hashing - Ketama hashing is a consistent hash used in other services such as NGINX
and
memcached
. It is intended to reduce cache misses when upstream servers are added or removed
For the hashing based load balancing strategies, two data sources are currently supported:
- The URI path of the request
- The Source IP address and URI path of the request, combined
In the future, River intends to support additional Load Balancing Strategies, and additional data sources for hashing, allowing for greater user configuration and flexibility. Please feel free to open an issue if you have requests or suggestions for strategies or data sources you would like to see implemented!
For deployments where upstream servers may come and go, for example when scaling in response to load, it is necessary to support dynamically discovering what servers are available.
Infrastructure from pingora_load_balancing
was introduced during this spike, in order to
support this in the future. At the moment, only the "static" discovery strategy is supported,
which enabled the ability to specify multiple servers at configuration and startup time.
In the future, it is planned to support dynamically discovered servers, using techniques such as DNS Service Discovery. Please feel free to open an issue if you have requests or suggestions for service discovery strategies you would like to see implemented!
Full implementation for Service Discovery is currently planned on our published roadmap for the v0.7.x release.
As part of Service Discovery, it is necessary to monitor whether the current list of upstream
servers is capable of accepting proxied requests. The infrastructure for supporting these
capabilities were also introduced in this release, provided by the pingora_load_balancing
crate.
Health Checks include active observation, ensuring that we can continue to establish connections with potential upstream servers, as well as querying health check specific endpoints to allow upstream servers to report their current status.
Full implementation for Health Checks is currently planned on our published roadmap for the v0.7.x release.
Effort was spent in the v0.5.x release to make River more pleasant to develop and use for operators of River. These features are outside the typical proxying responsibilities of River, but assist in the usage and operation of River as an application.
Many reverse proxy applications also allow for serving of static files, such as images, HTML files, CSS files, or other non-dynamic content. Although River does not currently consider this a "core competency", it is a common expectation of users of reverse proxies such as NGINX.
In v0.5.0, River introduced the ability to serve local static files, which was made possible by integrating components from the Pandora Web Server project, developed by Wladimir Palant (@palant).
Users may select a base filesystem path, where all contents of the path will be available when requested.
Outside of the planned support for Service Discovery, River has generally made the choice that configuration should be static and not modifyable at runtime. However, it is often necessary to change configuration of a reverse proxy, without observable downtime from the perspective of downstream clients.
To support this, the v0.5.0 release of River has added the ability to hot reload River, meaning that a second instance of the application can be launched, which can take over existing services and listening sockets, meaning that after the switchover begins, all new connections will be served by the new instance, and existing connections will be given a grace period to complete before the connection is terminated.
For more information on how to perform a hot reload, please refer to the Hot Reloading Section of the River User Manual.
As this release introduced a number of new configuration options, it became apparent that specifying deeply structured configuration became overly cumbersome in the existing TOML configuration file format.
This implementation experimented with, and has now committed to supporting the KDL Document Language as the primary configuration language for River. Although KDL is not currently widely used, it allows for very expressive and intuitive structuring of data, and has a structure that is familiar to other configuration formats such as NGINX's configuration files.
As a direct comparison, here is an example of the previous TOML file format:
[system]
threads-per-service = 8
[[basic-proxy]]
name = "Example1"
[[basic-proxy.listeners]]
[basic-proxy.listeners.source]
kind = "Tcp"
[basic-proxy.listeners.source.value]
addr = "0.0.0.0:8080"
[[basic-proxy.listeners]]
[basic-proxy.listeners.source]
kind = "Tcp"
[basic-proxy.listeners.source.value]
addr = "0.0.0.0:4443"
[basic-proxy.listeners.source.value.tls]
cert_path = "./assets/test.crt"
key_path = "./assets/test.key"
[basic-proxy.connector]
proxy_addr = "91.107.223.4:443"
tls_sni = "onevariable.com"
[basic-proxy.path-control]
upstream-request-filters = [
{ kind = "remove-header-key-regex", pattern = ".*(secret|SECRET).*" },
{ kind = "upsert-header", key = "x-proxy-friend", value = "river" },
]
upstream-response-filters = [
{ kind = "remove-header-key-regex", pattern = ".*ETag.*" },
{ kind = "upsert-header", key = "x-with-love-from", value = "river" },
]
[[basic-proxy]]
name = "Example2"
listeners = [
{ source = { kind = "Tcp", value = { addr = "0.0.0.0:8000" } } }
]
connector = { proxy_addr = "91.107.223.4:80" }
Whereas in KDL, it looks like this:
system {
threads-per-service 8
}
services {
Example1 {
listeners {
"0.0.0.0:8080"
"0.0.0.0:4443" cert-path="./assets/test.crt" key-path="./assets/test.key"
"0.0.0.0:8443" cert-path="./assets/test.crt" key-path="./assets/test.key"
}
connectors {
"91.107.223.4:443" tls-sni="onevariable.com"
}
path-control {
upstream-request {
filter kind="remove-header-key-regex" pattern=".*SECRET.*"
filter kind="remove-header-key-regex" pattern=".*secret.*"
filter kind="upsert-header" key="x-proxy-friend" value="river"
}
upstream-response {
filter kind="remove-header-key-regex" pattern=".*ETag.*"
filter kind="upsert-header" key="x-with-love-from" value="river"
}
}
}
Example2 {
listeners {
"0.0.0.0:8000"
}
connectors {
"91.107.223.4:80"
}
}
}
Additionally, work as been made to enable helpful diagnostics for the KDL file support, allowing for
helpful feedback when configuration mistakes are made. For example, if a user mis-spells health-check
as
health-cheque
, they will see an error that looks like this:
Error rendering config from KDL file: × Incorrect configuration contents
╭─[83:1]
83 │ discovery "Static"
84 │ health-cheque "None"
· ──────────┬─────────
· ╰── incorrect
85 │ }
╰────
help: Unknown setting: 'health-cheque'
At the moment, River will continue to support both TOML and KDL configuration, however many new features included in this release cannot be specified in the TOML format.
It is planned to deprecate TOML configuration file support in 0.6.x releases, leaving KDL as the only "human oriented" configuration language, though we may support other formats such as JSON in the future, to support machine-generated configuration inputs.
When sharing River development progress with people for feedback, it was realized that River was missing documentation oriented towards users and operators of River, rather than developers of River.
Documentation was added as part of the v0.5.0 release, published as the River User Manual. This includes specification of core concepts, instructions for operational concerns such as hot reloading, and a specification of configuration file formats.
The River User Manual is temporarily hosted on the OneVariable domain, but will likely move to a more permanent home in the future.
It was planned to support semi-dynamic pages for performance counters and logs in this release.
This feature was delayed until a further release.
Basic CI testing was added as part of this release. Formatting, build checks, unit tests, and runtime validation
of example configuration files are now all tested on every pull request and commit to the main
branch.
The 0.5.0 release also added support for "robustness" features, typically around refusal of unwanted requests.
These features are necessary when operating in adverse environments.
This release introduced the concept of rate limiting based on configurable settings, as well as selectable rules.
River now supports configuration of rate limiting, as in the following example:
// Apply Rate limiting to this service
//
// Note that ALL rules are applied, and a request must receive a token from all
// applicable rules.
//
// For example:
//
// A request to URI `/index.html` from IP 1.2.3.4 will only need to get a token from
// the `source-ip` rule.
//
// A request to URI `/static/style.css` from IP 1.2.3.4 will need to get a token from
// BOTH the `source-ip` rule (from the `1.2.3.4` bucket), AND the `specific-uri` rule
// (from the `/static/style.css` bucket)
rate-limiting {
// This rate limiting rule is based on the source IP address
//
// * Up to the last 4000 IP addresses will be remembered
// * Each IP address can make a burst of 10 requests
// * The bucket for each IP will refill at a rate of 1 request per 10 milliseconds
rule kind="source-ip" \
max-buckets=4000 tokens-per-bucket=10 refill-qty=1 refill-rate-ms=10
// This rate limiting is based on the specific URI path
//
// * Up to the last 2000 URI paths will be remembered
// * Each URI path can make a burst of 20 requests
// * The bucket for each URI will refill at a rate of 5 requests per 1 millisecond
rule kind="specific-uri" pattern="static/.*" \
max-buckets=2000 tokens-per-bucket=20 refill-qty=5 refill-rate-ms=1
// This rate limiting is based on ANY URI paths that match the pattern
//
// * A single bucket will be used for all URIs that match the pattern
// * We allow a burst of up to 50 requests for any MP4 files
// * The bucket for all MP4 files will refill at a rate of 2 requests per 3 milliseconds
rule kind="any-matching-uri" pattern=r".*\.mp4" \
tokens-per-bucket=50 refill-qty=2 refill-rate-ms=3
}
For more details regarding implementation details and capabilities, refer to the rate limiting section of the River User Manual.
River now supports rejection of requests from entire CIDR ranges, specified using slash
notation, such as 192.168.0.0/16
. This has been implemented as a path-control
filter
during the early request-filter
stage, meaning that these requests can be rejected
extremely early in the request handling stage.
These may be specified as in this example:
path-control {
request-filters {
filter kind="block-cidr-range" addrs="192.168.0.0/16, 10.0.0.0/8, 2001:0db8::0/32"
}
// ...
}
We had one more miscellaneous feature that landed in this release that is worth mentioning:
Previously, we did not expose a way to specify whether HTTP1.0 or HTTP2.0 would be used for either downstream or upstream connections. This meant that all requests were only handled as HTTP1.0 connections.
Downstream listeners can now be configured to offer HTTP2 support if clients would like to upgrade:
listeners {
"0.0.0.0:8080"
"0.0.0.0:4443" cert-path="./assets/test.crt" key-path="./assets/test.key" offer-h2=true
}
Additionally, upstream connections can be set with any of the following options:
h1-only
: Only HTTP1.0 will be used to connecth2-only
: Only HTTP2.0 will be used to connecth2-or-h1
: HTTP2.0 will be preferred, with fallback to HTTP1.0
For example:
connectors {
"91.107.223.4:443" tls-sni="onevariable.com" proto="h2-or-h1"
}
We'd like to thank @branlwyd for their first contribution to River in this release, as well for their role in advising during these efforts.