
merge nb4-rc1 to main
jshook committed Dec 11, 2020
2 parents cbe2c51 + d27da81 commit 86624c5
Showing 91 changed files with 5,078 additions and 8,918 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -16,7 +16,7 @@ jobs:
 - name: setup java
   uses: actions/setup-java@v1
   with:
-    java-version: '14'
+    java-version: '15'
     java-package: jdk
     architecture: x64

2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,4 +1,4 @@
-FROM openjdk:14-alpine
+FROM openjdk:15-alpine
 RUN apk --no-cache add curl

 COPY nb/target/nb.jar nb.jar
8 changes: 8 additions & 0 deletions RELEASENOTES.md
@@ -0,0 +1,8 @@
- 526d09cd (HEAD -> nb4-rc1) auto create dirs for grafana_apikey
- b4ec4c9a (origin/nb4-rc1) trigger build
- af87ef9c relaxed requirement for finicky test
- 3436ec61 trigger build
- 17ed4c1e annotator and dashboard fixes
- 4dab9b89 move annotations enums to package
- 6d514cb6 bump middle version number to required java version '15'
- fa78e27f set NB4 to Java 15
244 changes: 244 additions & 0 deletions devdocs/devguide/hybrid_ratelimiter.md
@@ -0,0 +1,244 @@
## RateLimiter Design

The nosqlbench rate limiter is a hybrid design, combining ideas from
well-known algorithms with a heavy dose of mechanical sympathy. The
resulting implementation provides the following:

1. A basic design that can be explained in one page (this page!)
2. High throughput, compared to other rate limiters tested.
3. Graceful degradation with increasing concurrency.
4. Clearly defined behavioral semantics.
5. Efficient burst capability, for tunable catch-up rates.
6. Efficient calculation of wait time.

## Parameters

**rate** - In the simplest case, users need only configure the *rate*.
For example, `rate=12000` specifies an op rate of 12000 ops/second.

**burst rate** - Additionally, users may specify a burst rate which can be
used to recover unused time when a client is able to go faster than the
strict limit. The burst rate is a multiplier applied to the _op rate_ to
arrive at the maximum rate allowed when wait time is available to recover.
For example, `rate=12000,1.1` specifies that a client may operate at
12000 ops/s _when it is caught up_, while allowing it to go at a rate of
up to 13200 ops/s _when it is behind schedule_.
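As a quick illustration of the arithmetic (a sketch only; these variable
names are not part of the actual API):

```java
// Illustrative arithmetic for rate=12000,1.1; names are hypothetical.
double rate = 12000.0;             // ops/s when caught up
double burstRate = 1.1;            // multiplier applied to the base rate
double maxRate = rate * burstRate; // ~13200 ops/s when behind schedule
```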

## Design Principles

The core design of the rate limiter is based on
the [token bucket](https://en.wikipedia.org/wiki/Token_bucket) algorithm
as established in the telecom industry for rate metering. Additional
refinements have been added to allow for flexible and reliable use on
non-realtime systems.

The unit of scheduling used in this design is the token, corresponding
directly to a nanosecond of time. The scheduling time that is made
available to callers is stored in a pool of tokens which is set to a
configured size. The size of the token pool determines how many grants are
allowed to be dispatched before the next one is forced to wait for
available tokens.

At some regular frequency, a filler thread adds tokens (nanoseconds of
time to be distributed to waiting ops) to the pool. The callers which are
waiting for these tokens consume a number of tokens serially. If the pool
does not contain the requested number of tokens, then the caller is
blocked using basic synchronization primitives. When the pool is filled,
any blocked callers are unblocked.
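A minimal sketch of this blocking behavior, assuming a hypothetical
single-pool class in the same implicit-locking style described later in
this document (one token equals one nanosecond):

```java
// Hypothetical single-pool sketch; the real design uses three pools.
public class TokenPoolSketch {
    private long activeTokens;    // tokens currently available to callers
    private final long maxActive; // configured pool size, in nanoseconds

    public TokenPoolSketch(long maxActive) {
        this.maxActive = maxActive;
    }

    // Called by the filler thread; never blocks.
    public synchronized void fill(long nanos) {
        activeTokens = Math.min(maxActive, activeTokens + nanos);
        notifyAll(); // wake any callers blocked in take(...)
    }

    // Called by an op thread; blocks until enough tokens are present.
    public synchronized void take(long nanos) throws InterruptedException {
        while (activeTokens < nanos) {
            wait(); // re-checked after each fill
        }
        activeTokens -= nanos;
    }
}
```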

The hybrid rate limiter tracks and accumulates both the passage of system
time and the usage rate of this time as a measurement of progress. The
delta between these two reference points in time captures a very simple
and empirical value of imposed wait time.

That is, time which was allocated but not used always represents a
slowdown imposed by external factors. When the target rate is taken as
equivalent to user load, this unused time manifests as slower response.

## Design Details

In fact, there are three pools: the _active_ pool, the _bursting_ pool,
and the _waiting_ pool. The active pool has a limited size based on the
number of operations that are allowed to be granted concurrently.

The bursting pool is sized according to the relative burst rate and the
size of the active pool. For example, with an op rate of 1000 ops/s and a
burst rate of 1.1, the active pool can be sized to 1E9 nanos (one second
of nanos), and the burst pool can be sized to 1E8 (1/10 of that), thus
yielding a combined pool size of 1E9 + 1E8 = 1.1E9 ns.
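The same sizing, worked through in code (illustrative names only):

```java
// Worked example of the pool sizing above.
long activePool = 1_000_000_000L; // 1E9 ns: one second of nanos
double burstRate = 1.1;
long burstPool = (long) (activePool * (burstRate - 1.0)); // 1E8 ns
long combined = activePool + burstPool; // 1.1E9 ns total
```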

The waiting pool is where all extra tokens are held in reserve. It is
unlimited except by the size of a long value. The size of the waiting pool
is a direct measure of wait time in nanoseconds.

Within the pools, tokens (time) are neither created nor destroyed. They
are added by the filler based on the passage of time, and consumed by
callers when they become available. In between these operations, the net
sum of tokens is preserved. In short, when time deltas are observed in the
system clock, this time is accumulated into the available scheduling time
of the token pools. In this way, the token pool acts as a metered
dispenser of scheduling time to waiting (or not) consumers.

The filler thread adds tokens to the pool according to the system
real-time clock, at some estimated but unreliable interval. The frequency
of filling is set high enough to give a reliable perception of time
passing smoothly, but low enough to avoid wasting too much thread time in
calling overhead. (It is set to 1K/s by default). Each time filling
occurs, the real-time clock is check-pointed, and the time delta is fed
into the pool filling logic as explained below.
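A sketch of such a filler loop, reusing the hypothetical TokenPoolSketch
from above (the real filler is more refined, but the checkpointing shape
is the same):

```java
// Hypothetical filler: checkpoint the RTC each pass and feed the delta in.
final TokenPoolSketch pool = new TokenPoolSketch(1_000_000_000L);
Thread filler = new Thread(() -> {
    long lastSeen = System.nanoTime();
    while (!Thread.currentThread().isInterrupted()) {
        long now = System.nanoTime();
        pool.fill(now - lastSeen); // elapsed nanos become new tokens
        lastSeen = now;
        try {
            Thread.sleep(1); // ~1ms period => ~1K fills/s on average
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit cleanly
        }
    }
});
filler.setDaemon(true);
filler.start();
```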

## Visual Explanation

The diagram below explains the moving parts of the hybrid rate limiter.
The arrows represent the flow of tokens (ns) as a form of scheduling
currency.

The top box shows an active token filler thread which polls the system
clock and accumulates new time into the token pool.

The bottom boxes represent concurrent readers of the token pool. These are
typically independent threads which do a blocking read for tokens once
they are ready to execute the rate-limited task.

![Hybrid Ratelimiter Schematic](hybrid_ratelimiter.png)

In the middle, the passive component in this diagram is the token pool
itself. When the token filler adds tokens, it never blocks. However, the
token filler can cause any readers of the token pool to unblock so that
they can acquire newly available tokens.

When time is added to the token pool, the following steps are taken (a
code sketch follows the list):

1) New tokens (based on measured time elapsed since the last fill) are
added to the active pool until it is full.
2) Any extra tokens are added to the waiting pool.
3) If the waiting pool has any tokens, and there is room in the bursting
pool, some tokens are moved from the waiting pool to the bursting pool
according to how many will fit.
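In code form, the fill steps might look like the following sketch, which
extends the earlier single-pool example with all three pools (field names
are hypothetical, and the scaling described under "Normalizing for Jitter"
is omitted here):

```java
// Hypothetical three-pool fill, following the numbered steps above.
public synchronized void fill(long newTokens) {
    // 1) Top up the active pool first.
    long toActive = Math.min(newTokens, maxActive - activeTokens);
    activeTokens += toActive;
    newTokens -= toActive;

    // 2) Any extra tokens land in the (effectively unbounded) waiting pool.
    waitingTokens += newTokens;

    // 3) Move what fits from the waiting pool into the bursting pool.
    long toBurst = Math.min(waitingTokens, maxBurst - burstTokens);
    burstTokens += toBurst;
    waitingTokens -= toBurst;

    notifyAll(); // unblock any callers waiting in take(...)
}
```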

When a caller asks for a number of tokens, the combined total from the
active and burst pools is available to that caller. If the number of
tokens needed is not yet available, then the caller will block until
tokens are added.
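The caller side, in the same hypothetical sketch:

```java
// Hypothetical blocking read against the combined active + burst pools.
public synchronized long take(long needed) throws InterruptedException {
    while (activeTokens + burstTokens < needed) {
        wait(); // woken by notifyAll() in fill(...)
    }
    long fromActive = Math.min(needed, activeTokens);
    activeTokens -= fromActive;
    burstTokens -= (needed - fromActive);
    return waitingTokens; // backlog size: a direct measure of wait time
}
```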

## Bursting Logic

Tokens in the waiting pool represent time that has not been claimed by a
caller. Tokens accumulate in the waiting pool as a side-effect of
continuous filling outpacing continuous draining, thus creating a backlog
of operations.

The pool sizes determine both the maximum instantaneously available
operations and the rate at which unclaimed time can be back-filled into
the active or burst pools.

### Normalizing for Jitter

Since it is not possible to schedule the filler thread to trigger on a
strict and reliable schedule (as in a real-time system), the method of
moving tokens from the waiting pool to the bursting pool must account for
differences in timing. Thus, tokens which are activated for bursting are
scaled according to the amount of time added in the last fill, relative to
the maximum active pool. This means that a full pool fill will allow a
full burst pool fill, presuming wait time is positive by that amount. It
also means that the same effect can be achieved by ten consecutive fills
of a tenth the time each. In effect, bursting is normalized to the passage
of time along with the burst rate, with a maximum cap imposed when
operations are unclaimed by callers.
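Under the assumptions of the earlier sketch, step 3 of the fill logic
could apply this normalization roughly as follows, with `fillDelta` being
the nanos added in the current fill:

```java
// Hypothetical jitter-normalized burst refill: the share of the burst pool
// refilled is proportional to the time just added, relative to the active
// pool size. A full fill (fillDelta == maxActive) allows a full burst fill.
long normalized = (long) (maxBurst * ((double) fillDelta / maxActive));
long toBurst = Math.min(waitingTokens,
    Math.min(maxBurst - burstTokens, normalized));
burstTokens += toBurst;
waitingTokens -= toBurst;
```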

## Mechanical Trade-offs

In this implementation, it is relatively easy to explain how accuracy and
performance trade off, since they are competing concerns. Consider these
two extremes of an isochronous configuration:

### Slow Isochronous

For example, the rate limiter could be configured for strict isochronous
behavior by setting the active pool size to *one* op of nanos and the
burst rate to 1.0, thus disabling bursting. If the op rate requested is 1
op/s, this configuration will work relatively well, although *any* caller
which doesn't show up (or isn't already waiting) when the tokens become
available will incur a wait-time penalty. The odds of this are relatively
low for a high-velocity client.

### Fast Isochronous

However, if the op rate for this type of configuration is set to 1E8
operations per second, then each fill will add roughly 1E5 ops worth of
time when there is only *one* op worth of active pool space. This is
because filling can only occur at a maximum frequency, which has been set
to 1K fills/s on average. That will create artificial wait time, since the
token consumers and producers would not have enough pool space to hold the
tokens needed during fill. It is not possible on most systems to fill the
pool at arbitrarily high fill frequencies. Thus, it is important for users
to understand the limits of the machinery when using high rates. In most
scenarios, these limits will not be onerous.
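The arithmetic behind this limit, worked through with illustrative values:

```java
// Worked numbers for the fast isochronous case.
long opRate = 100_000_000L;                // 1E8 ops/s requested
long fillsPerSecond = 1_000L;              // practical fill frequency cap
long opsPerFill = opRate / fillsPerSecond; // 100,000 ops arrive per fill
long nanosPerOp = 1_000_000_000L / opRate; // 10 ns: a one-op active pool
// Nearly all of each ~1ms fill overflows the 10 ns pool, accumulating
// as artificial wait time in the waiting pool.
```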

### Boundary Rules

Taking these effects into account, the default configuration makes some
reasonable trade-offs according to the rules below. These rules should
work well for most rates below 50M ops/s. The net effect of these rules is
to increase work bulking within the token pools as rates go higher. (A
sizing sketch follows the examples below.)

Trying to go above 50M ops/s while also forcing isochronous behavior will
result in artificial wait-time. For this reason, the pool size itself is
not user-configurable at this time.

- The pool size will always be at least as big as two ops. This rule
ensures that there is adequate buffer space for tokens when callers are
accessing the token pools near the rate of the filler thread. If this
were not ensured, then artificial wait time would be injected due to
overflow error.
- The pool size will always be at least as big as 1E6 nanos, or 1/1000 of
a second. This rule ensures that the filler thread has a reasonably
attainable update frequency which will prevent underflow in the active
or burst pools.
- The number of ops that can fit in the pool will determine how many ops
can be dispatched between fills. For example, an op rate of 1E6 will
mean that up to 1000 ops worth of tokens may be present between fills,
and up to 1000 ops may be allowed to start at any time before the next
fill.

- .1 ops/s : 20 seconds worth
- 1 ops/s : 2 seconds worth
- 100 ops/s : .02 seconds worth
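A sizing sketch that satisfies these rules (hypothetical names, derived
directly from the bullets above):

```java
// Hypothetical pool sizing per the boundary rules.
long opRate = 1_000_000L;                             // example: 1E6 ops/s
long nanosPerOp = 1_000_000_000L / opRate;            // 1000 ns per op
long poolSize = Math.max(2 * nanosPerOp, 1_000_000L); // >= 2 ops, >= 1E6 ns
long opsBetweenFills = poolSize / nanosPerOp;         // 1000 ops between fills
```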

In practical terms, this means that rates slower than 1K ops/s will have
their strictness controlled by the burst rate in general, and rates faster
than 1K ops/s will automatically include some op bulking between fills.

## History

A CAS-oriented method which compensated for RTC calling overhead was used
previously. This method afforded very high performance, but it was
difficult to reason about.

This implementation replaces that previous version. Basic synchronization
primitives (implicit locking via synchronized methods) performed
surprisingly well -- well enough to discard the complexity of the previous
implementation.

Further, this version is much easier to study and reason about.

## New Challenges

While the current implementation works well for most basic cases, high CPU
contention has shown that it can become an artificial bottleneck. Based on
observations on higher-end systems with many cores running many threads
and high target rates, it appears that the rate limiter becomes a resource
blocker or forces too much thread management.

Strategies for handling this should be considered:

1) Make callers able to pseudo-randomly (or not randomly) act as a token
filler, such that active consumers can do some work stealing from the
original token filler thread.
2) Analyze the timing and history of a high-contention scenario for
weaknesses in the parameter adjustment rules above.
3) Add internal micro-batching at the consumer interface, such that
contention cost is lower in general.
4) Partition the rate limiter into multiple slices.
Binary file added devdocs/devguide/hybrid_ratelimiter.png
