Added rate limit to handle cache throttling #8226
Conversation
@banks @i0rek Would this mechanism to throttle the cache suit you?
Thank you for implementing this!
I had a look over most of it. I left a few comments and questions.
I think it might be easier to review this as a smaller PR. What do you think about stashing most of the UX changes (config, and docs) in another branch and initially making this PR just the rate limiter and integration with the cache? That way we can focus the discussion on the core implementation, and once that is worked out we can figure out the UX.
* Renamed CacheRateLimitPerEntry into EntryFetchRateLimit
* Renamed CacheConfiguration into Cache
* Use tabs instead of spaces in runtime_test.go
Moved implementation away from cache as well
Hello @dnephin,
Smaller PRs have drawbacks: we sometimes have to wait a long time before getting a review, and having 3 PRs instead of 1 is harder to follow on both sides (ours and yours). Furthermore, we also backport those changes onto your release versions, and having to rebase several separate PRs makes that too hard for us. Finally, it is sometimes hard to deliver a feature that is only partially implemented (what kind of split do you want? A first PR with a hardcoded limit?). In any case, I try to provide PRs with a clear separation into commits that are atomic, work, compile, and pass tests.
Hello @dnephin & @i0rek, using Golang's built-in rate limiter (x/time/rate) reduces the size of the PR quite a lot, so I simplified everything: there are fewer data structures and I am using the same semantics as the current RPC rate limiter. I also added an end-to-end test that validates the behavior in both directions (i.e. a minimum and a maximum number of updates). The test was tricky to implement because of timing issues, but I added enough tolerance for it to work reliably on my machine with concurrency.
This unit test validates that the rate limit works effectively with different values.
Thanks! This is looking good with x/time/rate
Left some more suggestions for testing, and naming.
agent/agent_test.go (Outdated)
test.Run(fmt.Sprintf("rate_limit_at_%v", currentTest.rateLimit), func(t *testing.T) {
    tt := currentTest
    t.Parallel()
    a := NewTestAgent(t, fmt.Sprintf("cache = { rate_limit_per_entry = %v, rate_limit_burst = 1 }", tt.rateLimit))
Do we need a full agent to test this change? It seems like the change is fairly localized to the agent/cache package, so I would expect the test to be in that same package. If we use a fake cache.Type implementation, the test should read better as there will be fewer things that are not related to the behaviour we are trying to test. A fake implementation could report on the number of times it was called, which should make it easy to test that refresh is blocked on a limit.
That should also make it easier to show that different keys do not share a rate limit.
I think it is the best proof that it works end-to-end. What would you suggest exactly?
I am not very familiar with mocks (and to be fair, not always a big fan), but if you point me to an example of what you have in mind...
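For illustration only, here is a rough sketch of the kind of counting fake described above. The names (`countingFetcher`, its `Fetch` signature) are hypothetical and do not match Consul's actual `cache.Type` interface; the point is just that a call counter makes it easy to assert a limiter blocked refreshes:

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"

	"golang.org/x/time/rate"
)

// countingFetcher is a hypothetical stand-in for a cache backend: it only
// counts how many times Fetch was invoked, so a test can assert that the
// rate limiter actually blocked refreshes.
type countingFetcher struct {
	calls int64
}

func (f *countingFetcher) Fetch(ctx context.Context, key string) error {
	atomic.AddInt64(&f.calls, 1)
	return nil
}

func main() {
	fetcher := &countingFetcher{}
	// 2 fetches/s with a bucket of 1 token: roughly what the PR's
	// entry_fetch_rate / entry_fetch_max_burst settings control per entry.
	limiter := rate.NewLimiter(rate.Limit(2), 1)

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// Try to refresh the same entry as fast as possible for ~1 second.
	for {
		if err := limiter.Wait(ctx); err != nil {
			break // deadline reached: no more tokens will be granted in time
		}
		_ = fetcher.Fetch(ctx, "service/web")
	}

	// With a 2/s limit we expect on the order of 2-3 calls, not hundreds.
	fmt.Println("fetch calls:", atomic.LoadInt64(&fetcher.calls))
}
```

A real test in the agent/cache package would instead register such a fake as a cache type and assert on the counter per key, but the mechanics are the same.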
// EntryFetchMaxBurst max burst size of RateLimit for a single cache entry
EntryFetchMaxBurst int
According to https://godoc.org/golang.org/x/time/rate#Limiter.Burst it's the number of tokens per Wait call, which I believe is also going to be 1. Do we need to make this configurable? Or can we remove it and always use 1?
Yes, the burst size is important: it allows the rate limit to be exceeded for up to burst-size fetches (defaults to 2), while globally respecting the rate limit. In our case, we will probably configure it with a value around 4 or 5 and 0.1 qps => you can have quick updates on a stable service, but when too many updates are firing on a service, you get rate-limited to 1 query every 10s. I use the same configuration as the rate limit you already implemented for RPC, which includes a burst size as well.
I think either I have misunderstood the docs I linked, or the docs are misleading, or it doesn't work quite that way. The docs suggest this only impacts Wait(), and we always use Wait() with a value of 1, because we only ever fetch an entry one at a time. I'll need to read the code to better understand how burst works.
@dnephin This is exactly how it works for the RPC max rate limit as well: https://github.com/hashicorp/consul/blob/master/agent/agent.go#L1404
Wait(10) is equivalent to calling Wait(1) 10 times atomically. Thus, it represents the burst that can be applied when the RPC calls start being rate-limited.
Imagine the following scenario with cache.entry_fetch_rate_limit = 0.2 (aka 1 RPC every 5s), so every 5s a token gets added to the bucket:
- imagine you watch a service without any change for 1 min, so the bucket gets full (aka you have n tokens available)
- for some reason, the service becomes unstable at t1 and gets updated every second
Time | X-Consul-Index on server | idx with burst_size = 1 | idx with burst_size = 5 |
---|---|---|---|
0 | 42 | tokens=1, 42 | tokens=5, 42 |
1 | 43 | tokens=0, 43 | tokens=4, 43 |
2 | 44 | tokens=0, 43 | tokens=3, 44 |
3 | 45 | tokens=0, 43 | tokens=2, 45 |
4 | 46 | tokens=0, 43 | tokens=1, 46 |
5 | 47 | tokens=1, 47 | tokens=1, 47 |
6 | 48 | tokens=0, 47 | tokens=0, 47 |
7 | 49 | tokens=0, 47 | tokens=0, 47 |
8 | 50 | tokens=0, 47 | tokens=0, 47 |
9 | 51 | tokens=1, 51 | tokens=1, 51 |
10 | 52 | tokens=0, 51 | tokens=0, 51 |
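For reference, a small standalone sketch (not part of the PR) that reproduces this kind of timeline with golang.org/x/time/rate; it uses AllowN with simulated timestamps, so the effect of the two bucket sizes can be compared without waiting in real time:

```go
package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// simulate prints, second by second, whether a fetch would be allowed for a
// limiter refilling at 0.2 tokens/s (one token every 5s) with the given bucket size.
func simulate(burst int) {
	lim := rate.NewLimiter(rate.Limit(0.2), burst)
	start := time.Now()
	fmt.Printf("bucket size %d:\n", burst)
	for sec := 0; sec <= 10; sec++ {
		now := start.Add(time.Duration(sec) * time.Second)
		// AllowN with a simulated timestamp: true means a fetch goes through at that instant.
		fmt.Printf("  t=%2ds allowed=%v\n", sec, lim.AllowN(now, 1))
	}
}

func main() {
	simulate(1) // one fetch at t=0, then roughly one every 5s
	simulate(5) // the first few updates go through back to back, then roughly one every 5s
}
```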
I looked into this as well and was confused at first, because it seemed to me that burst only determines how many tokens can be taken by WaitN (which we don't use), but there is more. From https://godoc.org/golang.org/x/time/rate#Limiter:
> It implements a "token bucket" of size b, [...] with a maximum burst size of b events.
Being able to configure a bigger token bucket is important in order to allow multiple Wait calls in quick succession (what @pierresouchay means by burst, not what the library refers to as burst when using WaitN or other ...N functions).
Thank you both for the clarification!
It sounds like this setting is actually the bucket size. I wonder if that would have been a better name for the config field. I guess since we already call this burst elsewhere in the config, we probably have to keep it burst here as well.
To me, burst makes sense from a functional point of view: the operator wants to allow a burst of data until the rate limit is reached. Bucket is more about the technical implementation. But as you highlighted, it is already used that way for RPC anyway, so I think it does not make sense to change it.
There are always trade-offs! Larger PRs can take longer to review and require more back and forth, but it is true that if you have to wait a while for an initial review, maybe it is easier to do more at once. Hopefully the ci automation makes it easier to split things up a bit more and get them all cherry-picked into the release branches.
That is very true. It's hard to say exactly where the right line is. Sometimes what I try to do is open one small PR with the core part of the change, and then a second (or sometimes more) draft PRs that help demonstrate how that core feature might be exposed to the user, or integrated in other places. The draft PR doesn't need to be totally ready for review, so it can save some time up front. But maybe that approach won't always work.
Thank you! We really appreciate your PRs!
* Renamed rate_limit_per_entry into entry_fetch_rate_limit
* Renamed rate_limit_burst into entry_fetch_max_burst
* Multi-lines for RateLimiter: rate.NewLimiter(...) to avoid too long lines
… a structure in RuntimeConfig
added comment for panic()
Left a couple of comments and clarifications.
Found a last little thing - the docs.
- `cache` Cache configuration of agent. The configurable values are the following:
  - `entry_fetch_max_burst`: The size of the token bucket used to recharge the rate-limit per
    cache entry. The default value is 2 and means that when cache has not been updated
The default is now rate.Inf.
@i0rek The cache.entry_fetch_max_burst is actually 2 by default (even if it has no real meaning until cache.entry_fetch_rate is set to something other than +Inf...).
I propose to add: "This value has no real meaning unless entry_fetch_rate is also set."
I am sorry, I looked at the wrong thing!
LGTM!
🍒✅ Cherry pick of commit 505de6d onto
This implements a solution for #7863. It does:
- Add a new config `cache.entry_fetch_rate` to limit the number of calls/s for a given cache entry, default value = rate.Inf
- Add `cache.entry_fetch_max_burst`: size of rate limit (default value = 2)

The new configuration now supports the following syntax, for instance to allow 1 query every 3s:
- command line HCL: `-hcl 'cache = { entry_fetch_rate = 0.333}'`
- in JSON: `{ "cache": { "entry_fetch_rate": 0.333 } }`
@i0rek @dnephin it would be great to have a 1.7.X release with this change as well. Thanks
This implements a solution for #7863

It does:
- Add a new config `cache.entry_fetch_rate` to limit the number of calls/s for a given cache entry, default value = 1

The new configuration now supports the following syntax, for instance to allow 1 query every 3s:
- command line HCL: `-hcl 'cache = { entry_fetch_rate = 0.333}'`

So, by default, the cache does not get updates faster than 1/s, even if the service is modified more frequently than that. Having such a value ensures it will not modify the default behavior too much (in our case, we plan to use a value around 0.2, aka 1 modification every 5s). Having a default burst of 2 ensures that on a stable service you still get updates quickly (for instance if you have a probe): if you register a new service to measure the latency at registration, you will see the service appear/disappear almost immediately with default settings.
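For completeness, the kind of agent configuration described above (roughly one fetch every 5s per entry, with a small burst) might look like this in HCL; the values below are illustrative, only the key names come from the PR:

```hcl
cache = {
  # ~1 fetch every 5s per cache entry once the bucket is drained (illustrative value)
  entry_fetch_rate = 0.2
  # allow a few quick consecutive updates before the limit kicks in (illustrative value)
  entry_fetch_max_burst = 4
}
```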