Finalize remaining TODOs, add Development Practices

memorysafety · Feb 22, 2024 · 43a8162
1 parent 6682cb3
commit 43a8162
Showing 2 changed files with 184 additions and 20 deletions.
diff --git a/docs/what-is-it.md b/docs/what-is-it.md
@@ -123,13 +123,12 @@ messages from clients.
 2. River MUST support a configurable timeouts for:
     1. Connections
     2. Requests
+    3. Successful health checks
 3. River MUST support pooling of connections, including:
     1. Reuse of TCP sessions for all HTTP versions
     2. Reuse of HTTP2.0 streams for HTTP2.0
 4. River MUST support health checks of upstream servers
-    1. **TODO: “Configurable TTL override & cache drop upon health check failure for backends'
-       hostnames in DNS. (i.e. allow lower TTLs than the DNS standard; re-resolve DNS if health
-       checks fail)”**
+5. River MUST support the disabling of use of an upstream server based on failed health checks
 5. River MUST support load balancing of upstream servers
 6. River MUST support sending information for protocols used for pre-proxying, including:
     1. v1 and v2 of the PROXY protocol
@@ -173,9 +172,8 @@ application.
    given service
 3. River MUST support the use of SRV records to provide a list of upstream servers for a given
    service
-4. **TODO: xDS?**
-5. River MUST have a configurable timeout for re-polling poll-based service discovery mechanisms
-6. River MUST support the use of DNS TTL as timeout value for re-polling poll-based service
+4. River MUST have a configurable timeout for re-polling poll-based service discovery mechanisms
+5. River MUST support the use of DNS TTL as timeout value for re-polling poll-based service
    discovery mechanisms
 
 ### 2.4 - Request Path Control
@@ -217,18 +215,18 @@ direction between downstream client and upstream server
     7. Response Body (partial response fragments)
 2. River MUST support rejecting a connection by returning an error response
 3. River MUST support CIDR/API range-based filtering allow and deny lists
-4. River MUST support rate limiting of requests or response on the basis of one or more of the
-   following:
-    1. TODO
-5. River MUST support removal of HTTP headers on a glob or regex matching basis
-6. River MUST support addition of fixed HTTP headers to a request
-7. TODO: Do we need some kind of metadata/template/context based content matching or filling?
-8. TODO: Normalization of headers/bodies?
-    1. EX: URL/URI normalization using browser rules
-    2. Some kind of OWASP list for this?
-9. TODO: Support External Authentication Requests?
-    * Make subrequest to auth provider - NGINX (free module, maybe 3rd party? - need the name)
-    * <https://nginx.org/en/docs/http/ngx_http_auth_request_module.html>
+4. River MUST support rate limiting of requests or responses on the basis of one or
+   more of the following:
+    1. A fixed rate per second
+    2. A "burst" rate - allowing for short increases above the fixed rate
+5. River MUST support application of rate limiting of requests or responses on a per-endpoint
+   basis.
+6. River MUST support removal of HTTP headers on a glob or regex matching basis
+7. River MUST support addition of fixed HTTP headers to a request
+8. River MUST support the normalization of request and response headers and bodies, including:
+    1. URI normalization
+    2. Text encoding
+
 
 ### 2.5 - Observability
 
@@ -311,3 +309,81 @@ without user interaction.
     2. The number of days until the certificate will expired
 6. The application MUST support API Version 2 of the ACME protocol
 7. The application MAY support API Version 1 of the ACME protocol
+
+## 3 - Development Practices
+
+The following are development practice requirements for initial implementers of River.
+
+### 3.1 - Documentation Practices
+
+These requirements relate to the technical documentation of River.
+
+1. The implementers MUST maintain complete developer-facing documentation, or "doc comments"
+    1. This MAY be achieved using the `#![deny(missing_docs)]` directive or similar flags in CI
+       testing
+2. The implementers MUST maintain separate user-facing documentation, describing usage,
+   configuration, installation, and other details and examples.
+    1. This MAY be achieved using a tool such as `mdBook`, creating a user-facing "Book" for River
+3. The implementers MUST automatically publish the developer- and user-facing documentation for
+   all released versions
+4. The implementers MUST automatically publish the developer- and user-facing documentation for
+   the main development branch
+    1. This MAY be on a per-pull request basis, or on a scheduled basis e.g. once per day.
+5. The implementers MUST document how to build developer- and user-facing documentation
+
+### 3.2 - Benchmarking Practices
+
+These requirements relate to the performance benchmarking of River. No specific performance
+metrics are required or specified here; instead, weight is placed on measurements over time, allowing
+improvements or regressions to be visible and measurable throughout the development process.
+
+1. The implementers MUST maintain a test suite of performance tests, expected to exercise:
+    1. Typical Use Cases
+    2. Unusual or "Worst Case" use cases
+    3. Use cases previously reported as performance regressions
+2. The implementers MUST run and record the results of performance tests on a regular basis, such
+   as on every pull request, or on a scheduled daily/weekly basis.
+3. The performance tests MUST track the following metrics:
+    1. Peak and Average CPU usage during test execution
+    2. Peak and Average Memory usage during test execution
+    3. CPU and Wall Clock time of test execution
+4. The performance tests MAY track the following "perf counter" metrics:
+    1. Branch prediction failures
+    2. Page faults
+    3. Cache Misses
+    4. Context Switches
+5. The implementers MUST document how to build and execute performance tests
+6. The implementers MAY provide a suite of comparison tests, executing a subset of performance tests
+   against contemporary reverse proxy applications, such as NGINX or Apache.
+
+### 3.3 - Continuous Integration Practices
+
+These requirements document tooling practices expected for the development of River.
+
+1. The implementers MUST provide a set of automated checks that are required to pass prior to merges
+   to the main development branch. These automated checks MAY include:
+    1. Code Formatting checks, e.g. `cargo fmt`
+    2. Code linting checks, e.g. `cargo clippy`
+    3. Unit test execution, e.g. `cargo test`
+    4. Documentation build steps (for user- and developer-facing documentation)
+    5. Integration test execution
+    6. Performance test execution
+2. The implementers MUST provide a set of automated checks that are required to run on a periodic
+   basis. These automated checks MAY include:
+    1. Building against the latest stable, beta, or nightly versions of the Rust compiler and
+       toolchain
+    2. Performance test execution
+    3. Documentation build steps
+    4. Documentation publishing steps
+3. The implementers MUST provide and document the process for running all automated checks locally,
+   in order to allow contributors to perform these checks prior to submitting a Pull Request.
+
+### 3.4 - Contribution Practices
+
+1. The implementers MUST provide and enforce a Code of Conduct for contribution
+    1. The implementers MAY use the [Contributor Covenant] to achieve this goal
+2. The implementers MUST provide and maintain a Contribution guide for third party contributions
+3. The implementers MUST provide and maintain a security policy, to allow for private disclosure
+   of vulnerabilities
+
+[Contributor Covenant]: https://www.contributor-covenant.org/version/1/3/0/code-of-conduct/
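
As an illustration of the rate limiting requirements added in section 2.4 above (a fixed rate per second, a "burst" allowance, and per-endpoint application), the following is a minimal token-bucket sketch in Rust. The token bucket is one common technique for this and is not mandated by the requirements. Every name here (`TokenBucket`, `EndpointLimiter`, `try_acquire`, `allow`) is hypothetical and not part of River or `pingora`. Time is passed in explicitly as an offset so the behavior is deterministic.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical token bucket: `rate_per_sec` is the fixed rate,
/// `burst` is the maximum number of tokens that may accumulate.
#[derive(Debug)]
pub struct TokenBucket {
    rate_per_sec: f64,
    burst: f64,
    tokens: f64,
    /// Time of the last refill, as an offset from an arbitrary start.
    last: Duration,
}

impl TokenBucket {
    /// Creates a bucket that starts full, so an initial burst is allowed.
    pub fn new(rate_per_sec: f64, burst: f64) -> Self {
        Self { rate_per_sec, burst, tokens: burst, last: Duration::ZERO }
    }

    /// Refills based on elapsed time, then tries to take one token.
    /// Returns `true` if the request is allowed.
    pub fn try_acquire(&mut self, now: Duration) -> bool {
        let elapsed = now.saturating_sub(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rate_per_sec).min(self.burst);
        // Never move the refill timestamp backwards.
        self.last = self.last.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// Hypothetical per-endpoint limiter: one bucket per endpoint key,
/// all sharing the same rate and burst settings (an assumption).
#[derive(Debug)]
pub struct EndpointLimiter {
    rate_per_sec: f64,
    burst: f64,
    buckets: HashMap<String, TokenBucket>,
}

impl EndpointLimiter {
    pub fn new(rate_per_sec: f64, burst: f64) -> Self {
        Self { rate_per_sec, burst, buckets: HashMap::new() }
    }

    /// Checks the bucket for `endpoint`, creating it on first use.
    pub fn allow(&mut self, endpoint: &str, now: Duration) -> bool {
        let (rate, burst) = (self.rate_per_sec, self.burst);
        self.buckets
            .entry(endpoint.to_string())
            .or_insert_with(|| TokenBucket::new(rate, burst))
            .try_acquire(now)
    }
}
```

With a rate of 2 per second and a burst of 3, three back-to-back requests pass, a fourth is rejected, and two more become available one second later. Keeping a separate bucket per endpoint means that exhausting one endpoint's allowance does not affect another.
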
diff --git a/docs/what-to-build.md b/docs/what-to-build.md
@@ -143,6 +143,9 @@ This work is primarily in two parts:
 1. Adding support for relevant Service Discovery protocols
 2. Making the load balancing algorithm(s) aware of these changes
 
+This work will also need to be designed in tandem with Configuration, making it possible to
+specify the desired service discovery options in a declarative way.
+
 ### 4.4 - Request Path Control
 
 Proxy Customization Options, allowing an operator to specifies customization of behaviors applied
@@ -161,6 +164,10 @@ Implementers are suggested to pick reasonable, safe defaults, with the goal that
 with no configuration effort always being an acceptable (if not ideal) choice with respect to
 security and performance.
 
+It is likely that there will be additional feature requests in this area in the future, beyond the
+initial requirements, including functionality such as checking authentication prior to proxying.
+Care should be taken with respect to future extensibility.
+
 This work will also need to be designed in tandem with Configuration, making it possible to
 specify the desired request path control options in a declarative way.
 
@@ -170,6 +177,18 @@ An observability system, allowing operators to inspect and make observations abo
 system, both in an exploratory way as a human, as well as an automated way as part of a larger
 monitoring system.
 
+Currently, `pingora` uses the `log` ecosystem in Rust. It may be worth investigating switching to
+`tracing`, or using an integration with the `tracing` ecosystem.
+
+There are a number of existing integrations for push-based aggregation systems (e.g. OpenTracing or
+OpenTelemetry), or pull-based aggregation systems (e.g. Prometheus).
+
+Metrics may also be emitted as structured fields via the same infrastructure.
+
+This work will also need to be designed in tandem with Configuration, making it possible to
+specify the desired log/trace level and metrics calculation options in a declarative way.
+
+
 ### 4.6 - Configuration
 
 A configuration system, allowing users to specify all of the options that follow. Likely based on
@@ -179,8 +198,77 @@ options.
 System-wide Performance and Resource Options, describing things like rate limiting, connection
 pooling behaviors, timeouts and back-offs, and other similar parameters.
 
+Together with Request Path Control, the design and implementation of the configuration system is
+likely to be a significant part of the integration work. This is for two main reasons:
+
+1. The configuration system is required to configure quite a bit of complexity, exposing a wide
+   array of dials
+2. The configuration system is largely the "user interface" of the system - meaning people will have
+   strong opinions on how it should function.
+
+In the future, there will likely be a need for a scripting interface, or integrated scripting
+language/runtime, such as Rhai, WASM, or others.
+
+Until then, it's recommended to be as conservative as possible in what can be done with the
+configuration file, while still meeting the necessary feature set.
+
+As configuration is the primary user interface, care should be taken to help users understand
+the impact of their configuration choices.
+
 ### 4.7 - Environmental Requirements
 
+In general, River is intended to be run on a Linux system for production usage. This may be on
+"bare metal", in a virtual machine, or in a containerized environment.
+
+The `pingora` engine allows for a "two stage" start. The first stage runs in whatever user/group
+context was used to launch the program. This can be used to enable a greater level of access,
+such as loading secrets or configuration files from the filesystem. Once this "setup" phase is
+completed, the program is forked, and "steady state" is launched using the user and group that was
+configured.
+
+It is not expected to require any additional work to support this use case - it is already
+supported by `pingora` itself. However, any code that wraps `pingora` may need to keep this
+operational model in mind.
+
+### 4.8 - Graceful Reloading
+
+Graceful reloading allows operators to stop, reconfigure, and restart the River server, with minimal
+or no visible downtime to downstream clients.
+
+This capability is important, as other than Upstream Service Discovery, no other way is provided
+to change configuration of operational River instances. This approach was chosen largely because:
+
+1. This is the model chosen by `pingora`
+2. It greatly simplifies logic - as we don't need to worry about "cache invalidation" of
+   configuration or other settings.
+
+It is not expected to require any additional work to support this use case - it is already
+supported by `pingora` itself. However, any code that wraps `pingora` may need to keep this
+capability/working model in mind.
+
+### 4.9 - Certificate Provisioning and Management
+
+There is a desire for River to be able to automatically provision certificates for domains served
+by it. This presents as two major capabilities:
+
+1. Obtaining a new certificate - on first run, it will be necessary to obtain a certificate before
+   serving any TLS secured traffic
+2. Renewing an existing certificate - in steady state, it will be necessary to periodically (on the
+   order of weeks/months) renew a certificate, and replace old ones with new ones.
+
+Having the reverse proxy perform this step automatically avoids the need for manual or other
+setups in order to deploy or manage the reverse proxy, such as one-shot or scheduled container
+runs.
+
+For new certificates: It is unspecified how this should be achieved. It is likely
+that if configured to obtain/manage certificates automatically, and none exist, this should be
+performed BEFORE serving traffic for the relevant listeners.
+
+For existing certificates: It is unspecified whether renewing certificates is something that should
+be done "in flight", or whether it requires a graceful reload to occur.
+
+In both cases, care should be taken, and documentation should make it clear how these features
+interact with potentially unprivileged "steady state" operational modes.
 
-A Service Discovery System, allowing for runtime updates of the list of potential upstream servers
-to connect to.
+Where it is not possible to handle this "in flight", reference examples should be provided to
+document how users are expected to set up their systems correctly.
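
The two certificate capabilities described in section 4.9 (obtaining a certificate on first run, and renewing one in steady state) can be sketched as a single decision function. This is only an illustration of the decision logic and performs no ACME requests. The names `CertAction` and `plan_cert_action` are hypothetical, and the `renew_before` window is an assumption, since the text only says renewal happens on the order of weeks or months.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical outcome of checking the certificate state for a listener.
#[derive(Debug, PartialEq, Eq)]
pub enum CertAction {
    /// No certificate exists yet: obtain one before serving TLS traffic.
    Obtain,
    /// A certificate exists but is expired or inside the renewal window.
    Renew,
    /// The existing certificate is still comfortably valid.
    Keep,
}

/// Decides what to do given the expiry time of the current certificate
/// (`None` if there is none), the current time, and how long before
/// expiry renewal should start (an assumed, configurable window).
pub fn plan_cert_action(
    not_after: Option<SystemTime>,
    now: SystemTime,
    renew_before: Duration,
) -> CertAction {
    match not_after {
        None => CertAction::Obtain,
        Some(expiry) => match expiry.duration_since(now) {
            // Enough validity remains: nothing to do yet.
            Ok(remaining) if remaining > renew_before => CertAction::Keep,
            // Inside the renewal window, or `expiry` is already in the past.
            _ => CertAction::Renew,
        },
    }
}
```

Whether a `Renew` result is acted on "in flight" or deferred to a graceful reload is left open here, matching the text above.
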