Adds Rate limiting sampler #819
Conversation
@anuraaga @trustin fun change to look at. We are trying to include a Java 6-compatible, dependency-free, non-blocking rate-limited sampler. For simplicity, this would allow the bucket to empty immediately (ex. at 1000 requests/second we don't partition it into 10 ms buckets; rather one big bucket, so if all requests deplete in the first millisecond, so be it). That said, what thoughts do you have on the impl and where we go from here? This is one of the most requested features and is a gap as folks try to bridge from the AWS SDK, which has a similar impl but using clock time instead.
@devinsba question around license headers.. yeah this can wait. We have packaged the LICENSE file for a long time.
I'd advocate for buckets; it does add some complexity, but I don't know if it's a huge amount. It feels weird to me that a periodic process that always runs towards the end of a second has basically no chance of ever being sampled.
I would say add decrementBy to brave.internal.Platform. In JREs older than 8 it would be a loop, but that's OK as many will use the normal method.
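The loop mentioned above can be sketched as follows: a Java 6 compatible decrement-by-N via compareAndSet retries (a minimal illustration; `AtomicMath` and `decrementBy` are hypothetical names, not Brave code):

```java
import java.util.concurrent.atomic.AtomicLong;

final class AtomicMath {
  private AtomicMath() {}

  /** Atomically subtracts {@code delta} from {@code value} and returns the result. */
  static long decrementBy(AtomicLong value, long delta) {
    for (;;) { // retry until our compare-and-set wins the race
      long current = value.get();
      long next = current - delta;
      if (value.compareAndSet(current, next)) return next;
    }
  }

  public static void main(String[] args) {
    AtomicLong usage = new AtomicLong(10);
    System.out.println(decrementBy(usage, 3)); // prints 7
  }
}
```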
In brave/src/main/java/brave/sampler/RateLimitingSampler.java:
> + this.tracesPerSecond = tracesPerSecond;
+ this.usage = new AtomicLong(0);
+ this.nextUpdate = new AtomicLong(getNextUpdateValue(System.nanoTime()));
+ }
+
+
+ @Override public boolean isSampled(long ignoredTraceId) {
+ long now = System.nanoTime();
+ long updateAt = nextUpdate.get();
+
+ // now is past update time OR
+ // (update time is positive (near Long.MAX_VALUE) AND now is negative (long overflow))
+ if (now > updateAt || (updateAt > 0 && now < 0)) {
+ if (nextUpdate.compareAndSet(updateAt, getNextUpdateValue(updateAt))) {
+ usage.set(1);
+ return true;
So all the decrement-by functionality was added in Java 8, so we only have decrement by one available.
Do we think it is important that the first x requests per second is exactly adhered to? If we lost a single sample in favor of a later one, would that be terrible?
I'm having trouble finding a path without adding a synchronized block.
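As an aside, the wrap-around case in the quoted diff can also be handled with a single subtraction-based comparison, the usual idiom for System.nanoTime() deadlines (an illustration, not the PR's code; `NanoWindow` is a hypothetical name):

```java
// Subtraction-based deadline comparison stays correct across long overflow,
// which "now > deadline" does not.
final class NanoWindow {
  /** True when {@code now} is at or past {@code deadline}, even if nanoTime wrapped. */
  static boolean elapsed(long now, long deadline) {
    return now - deadline >= 0; // overflow-safe thanks to two's-complement wrap
  }

  public static void main(String[] args) {
    long nearMax = Long.MAX_VALUE - 5;
    long wrapped = nearMax + 10; // overflows to a negative value
    System.out.println(elapsed(wrapped, nearMax)); // prints true
  }
}
```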
Gonna pull this locally and try it out.
Added a cleanup commit.
Here are bench results. It does take time to issue nanoTime, so some overhead is normal. Keep in mind this only applies to the first span in a trace, and we are talking about operations per microsecond.
I take back the advice to "add decrementBy to brave.internal.Platform"; it isn't needed.
Was thinking about the main problem, the "all samples at the beginning of the second" problem @devinsba @anuraaga. What if we set up some basic rules with a "rollover plan"? Ex: the rate is 100, and if there are no requests until the last decisecond, there are 100 available still, and 10 again after the next second turns. OTOH, if 13 requests happen in the first 2 deciseconds, the next decisecond has 17 max available. Then some function over that as the rate increases: either a higher magnitude per decisecond, or a smaller bucket than a decisecond. Note: fixed buckets, especially few fixed buckets, can allow for some interesting impls. Ex: our counting sampler is cute due to known bounds https://github.com/openzipkin/brave/blob/master/brave/src/main/java/brave/sampler/CountingSampler.java Thoughts?
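The roll-over arithmetic proposed above can be modeled like this (an illustrative, non-thread-safe sketch; `DecisecondBudget` and its fields are hypothetical names, not the PR's code):

```java
// Each decisecond releases rate/10 new permits, and permits unused earlier
// in the same second roll over to later deciseconds.
final class DecisecondBudget {
  final int ratePerSecond;
  int spent; // permits consumed so far in the current second

  DecisecondBudget(int ratePerSecond) { this.ratePerSecond = ratePerSecond; }

  /** Permits still available in decisecond {@code d} (0-indexed within the second). */
  int available(int d) {
    int released = ratePerSecond / 10 * (d + 1); // cumulative permits released so far
    return Math.max(0, released - spent);
  }

  public static void main(String[] args) {
    DecisecondBudget idle = new DecisecondBudget(100);
    System.out.println(idle.available(9)); // no requests all second: prints 100

    DecisecondBudget busy = new DecisecondBudget(100);
    busy.spent = 13; // 13 requests in the first 2 deciseconds
    System.out.println(busy.available(2)); // 30 released - 13 spent: prints 17
  }
}
```

This matches the worked numbers in the comment: 100 still available after an idle second, and 17 left in the third decisecond after 13 early requests.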
One reminder on the bucket thing: the minimum rate in my suggestion is 1/s. It is easier code and matches Amazon's ReservoirSize https://docs.aws.amazon.com/xray/latest/api/API_SamplingRule.html. Slow rates like per minute or hour are complicated, and I don't think we want to handle that until/unless explicitly requested.
That SGTM - not thinking about per minute / hour seems fine since they would rarely sample less than 100%.
If someone else has the time, energy, and know-how to get this over the finish line, I'd appreciate it. I've been sick and time has been tight.
I have a long flight tomorrow.. my intention is to help.. unless they have 6 Liam Neeson films..
If we do this, we need to synchronize the value and then also re-adjust, since not everyone will set this dynamically, and we could also set it dynamically in a different way... what if instead the sampler itself is replaced?
In brave/src/main/java/brave/sampler/RateLimitingSampler.java:
> + */
+public class RateLimitingSampler extends Sampler {
+ public static Sampler create(int tracesPerSecond) {
+ if (tracesPerSecond < 0) throw new IllegalArgumentException("tracesPerSecond < 0");
+ if (tracesPerSecond == 0) return Sampler.NEVER_SAMPLE;
+ return new RateLimitingSampler(tracesPerSecond);
+ }
+
+ static final long NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);
+ static final int NANOS_PER_DECISECOND = (int) (NANOS_PER_SECOND / 10);
+
+ final MaxFunction maxFunction;
+ final AtomicInteger usage = new AtomicInteger(0);
+ final AtomicLong nextReset;
+
+ RateLimitingSampler(int tracesPerSecond) {
@adriancole personally I may want this variable to be reset-able, which means we could dynamically change it from outside. We may just need an apply function which takes a new tracesPerSecond and replaces maxFunction with a new one, HDYT?
To note: I had intended to do the "replace the sampler" approach for the X-Ray wrapper of this.
The reason I suggest doing dynamic adjustment externally is that in practice it is hard to guess how you will update. Would you pass an AtomicInteger? A callback with a new rate? Or do we expect a synchronous call to setRate? There are multiple ways to update a value. Then as the change is defined, someone also wants to compose a 5% overflow on top! In other words, I would prefer to help solve the dynamic adjustment thing externally until we know how the rate adjustment is actually shared.
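A minimal sketch of the "replace the sampler" idea: callers hold an atomically swappable delegate rather than mutating a rate inside the sampler. Everything here is illustrative; the local `Sampler` interface stands in for `brave.sampler.Sampler`:

```java
import java.util.concurrent.atomic.AtomicReference;

final class ReplaceableSampler {
  // Local stand-in for brave.sampler.Sampler, so the sketch is self-contained.
  interface Sampler {
    boolean isSampled(long traceId);
  }

  private final AtomicReference<Sampler> delegate;

  ReplaceableSampler(Sampler initial) {
    this.delegate = new AtomicReference<Sampler>(initial);
  }

  /** Swap the active sampler; in-flight calls finish against the old one. */
  void replace(Sampler next) {
    delegate.set(next);
  }

  boolean isSampled(long traceId) {
    return delegate.get().isSampled(traceId);
  }
}
```

To change the rate, you would construct a fresh rate-limited sampler and call `replace`, which avoids any synchronization inside the hot `isSampled` path.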
@adriancole replacing the sampler itself sounds better, totally forgot about that. Anyway, the implementation looks good to me.
Are we going to make a builder?
The nice thing about a builder is we can add it late. Right now there is only one parameter, so we can defer until there is more than one and still not break the simple case.
I think all this needs is javadoc polishing and a readme. @anuraaga are you happy with the impl?
Here are benchmark results:
Benchmark (traceId) Mode Cnt Score Error Units
SamplerBenchmarks.sampler_rateLimited_1 -9223372036854775808 sample 779286 0.153 ± 0.001 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p0.00 -9223372036854775808 sample 0.031 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p0.50 -9223372036854775808 sample 0.140 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p0.90 -9223372036854775808 sample 0.151 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p0.95 -9223372036854775808 sample 0.154 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p0.99 -9223372036854775808 sample 0.197 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p0.999 -9223372036854775808 sample 7.321 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p0.9999 -9223372036854775808 sample 14.022 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p1.00 -9223372036854775808 sample 101.120 us/op
SamplerBenchmarks.sampler_rateLimited_1:·gc.alloc.rate -9223372036854775808 sample 15 0.224 ± 0.031 MB/sec
SamplerBenchmarks.sampler_rateLimited_1:·gc.alloc.rate.norm -9223372036854775808 sample 15 0.013 ± 0.002 B/op
SamplerBenchmarks.sampler_rateLimited_1:·gc.count -9223372036854775808 sample 15 ≈ 0 counts
SamplerBenchmarks.sampler_rateLimited_1 1234567890987654321 sample 783515 0.149 ± 0.001 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p0.00 1234567890987654321 sample 0.032 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p0.50 1234567890987654321 sample 0.140 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p0.90 1234567890987654321 sample 0.150 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p0.95 1234567890987654321 sample 0.154 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p0.99 1234567890987654321 sample 0.186 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p0.999 1234567890987654321 sample 3.119 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p0.9999 1234567890987654321 sample 13.150 us/op
SamplerBenchmarks.sampler_rateLimited_1:sampler_rateLimited_1·p1.00 1234567890987654321 sample 130.176 us/op
SamplerBenchmarks.sampler_rateLimited_1:·gc.alloc.rate 1234567890987654321 sample 15 0.220 ± 0.023 MB/sec
SamplerBenchmarks.sampler_rateLimited_1:·gc.alloc.rate.norm 1234567890987654321 sample 15 0.013 ± 0.001 B/op
SamplerBenchmarks.sampler_rateLimited_1:·gc.count 1234567890987654321 sample 15 ≈ 0 counts
SamplerBenchmarks.sampler_rateLimited_100 -9223372036854775808 sample 727518 0.153 ± 0.002 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p0.00 -9223372036854775808 sample 0.030 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p0.50 -9223372036854775808 sample 0.140 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p0.90 -9223372036854775808 sample 0.155 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p0.95 -9223372036854775808 sample 0.165 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p0.99 -9223372036854775808 sample 0.205 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p0.999 -9223372036854775808 sample 8.560 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p0.9999 -9223372036854775808 sample 15.964 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p1.00 -9223372036854775808 sample 110.848 us/op
SamplerBenchmarks.sampler_rateLimited_100:·gc.alloc.rate -9223372036854775808 sample 15 0.240 ± 0.021 MB/sec
SamplerBenchmarks.sampler_rateLimited_100:·gc.alloc.rate.norm -9223372036854775808 sample 15 0.015 ± 0.001 B/op
SamplerBenchmarks.sampler_rateLimited_100:·gc.count -9223372036854775808 sample 15 ≈ 0 counts
SamplerBenchmarks.sampler_rateLimited_100 1234567890987654321 sample 725738 0.147 ± 0.001 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p0.00 1234567890987654321 sample 0.029 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p0.50 1234567890987654321 sample 0.139 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p0.90 1234567890987654321 sample 0.153 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p0.95 1234567890987654321 sample 0.166 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p0.99 1234567890987654321 sample 0.196 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p0.999 1234567890987654321 sample 5.924 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p0.9999 1234567890987654321 sample 13.140 us/op
SamplerBenchmarks.sampler_rateLimited_100:sampler_rateLimited_100·p1.00 1234567890987654321 sample 77.824 us/op
SamplerBenchmarks.sampler_rateLimited_100:·gc.alloc.rate 1234567890987654321 sample 15 0.229 ± 0.035 MB/sec
SamplerBenchmarks.sampler_rateLimited_100:·gc.alloc.rate.norm 1234567890987654321 sample 15 0.015 ± 0.002 B/op
SamplerBenchmarks.sampler_rateLimited_100:·gc.count 1234567890987654321 sample 15 ≈ 0 counts
I will repost benchmarks as I forgot to include GC data.
Ran benchmarks against another tracer which won't be mentioned.. our results are an order of magnitude more efficient in terms of allocation and runtime.
Polished up and ran against X-Ray. Note their perf is expectedly a little quicker than ours at higher rates, as we try to be fair. Also, they have a race condition when resetting which we don't. I will raise a pull request to correct theirs.
Added aws/aws-xray-sdk-java#47 to give back to Amazon.
👍 Thanks for picking this up @adriancole, I'll see if I can find time to fix that X-Ray test today so your PR can get merged.
I will look into the test failure in aws/aws-xray-sdk-java#47 before merge. Thanks!
Amazon's test case found a legit glitch. Fixed in both places.
* Improves reservoir to avoid a race condition and to be more fair. This re-uses the implementation we just made in Brave based on feedback on yours. Enjoy! See openzipkin/brave#819
* Makes rate setting lazy
Whoot, X-Ray now has this impl: aws/aws-xray-sdk-java#47
The rate-limited sampler allows you to choose a number of traces to accept on a per-second interval. The minimum is 0 and the maximum is 2,147,483,647 (max int).
For example, to allow 10 traces per second, you'd initialize the following:
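The code block itself did not survive extraction; a minimal sketch of the likely initialization, assuming the `RateLimitingSampler.create` factory shown in the diff above and Brave's usual `Tracing.newBuilder()` wiring:

```java
import brave.Tracing;
import brave.sampler.RateLimitingSampler;
import brave.sampler.Sampler;

public class RateLimitedSetup {
  public static void main(String[] args) {
    // Accept at most 10 traces per second.
    Sampler sampler = RateLimitingSampler.create(10);

    // Install the sampler on the Tracing component.
    Tracing tracing = Tracing.newBuilder().sampler(sampler).build();
  }
}
```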
Appropriate Usage
If the rate is 10 or more traces per second, an attempt is made to distribute the accept decisions equally across the second. For example, if the rate is 100, 10 will pass every decisecond as opposed to bunching all pass decisions at the beginning of the second.
This sampler is efficient, but not as efficient as the BoundarySampler. However, this sampler is insensitive to the trace ID and will operate correctly even if trace IDs are not perfectly random.
Implementation
The implementation uses System.nanoTime() and tracks how many yes decisions occur across a one-second window. When the rate is at least 10/s, the yes decisions are equally split over 10 deciseconds, allowing unused yes decisions to roll over until the end of the second.
Prior art
RateLimitingSampler was made to allow Amazon X-Ray rules to be expressed in Brave. We considered their Reservoir design. Our implementation differs as it removes a race condition and attempts to be more fair by distributing accept decisions every decisecond. If aws/aws-xray-sdk-java#47 is merged, we'll have the same implementation!
Thanks!
Thanks @devinsba for spiking the implementation and @anuraaga for suggesting how to be more fair when sampling large numbers of requests. Also appreciate review from @huydx and @zeagord.