introduce dynamic client reply buffer size - save memory on idle clients #9822
Conversation
Force-pushed from 8970dc0 to ae82f4f
@ranshid thanks. In theory, we can realloc the client from time to time, i.e. create a small one initially and make it grow when we see it normally consumes more than that first buffer.
Thank you @oranagra for your quick reply! Given all that has been said, I would be happy to follow up and consider a better alternative :)
Since the client output buffer is dynamically allocated, with an initial static buffer of 16k and then a linked list of dynamic buffers, I'm guessing that any usage pattern that needs large buffers will cause dynamic allocations anyway, so doing this dynamically doesn't make sense to me. This should be either a compile-time constant (like it is today) or a config (like proposed here). Both are good solutions.
Thank you @yoav-steinberg!
As you said, this buffer is needed for performance (avoiding allocations, and also the indirection and possible cache-locality cost we'd have if it was a pointer). You can proceed to run some performance tests; maybe we'll realize something new.
How about this:
Thank you @yoav-steinberg!
I'm not sure a counter is the right direction here. When would we increase this counter? Different traffic patterns will produce different results, but a higher value doesn't necessarily mean a larger buffer is needed. My thought was that if the buffer was maxed out, it means we would benefit from a larger buffer. This means a simple flag might be enough.
I'm not worried about the cache, because my assumption is that we do the reallocation only periodically, and this interval needs to be large enough to minimize the performance implications, which include both the cache invalidation and then
Excellent! Waiting for the results...
I have run several performance tests.
A few random notes:
I think the benchmark does show us how significant splitting the reply between the static buffer and the dynamic list is: once the reply is split into two buffers, we see a major performance impact. This is enough to convince us that the original idea of having a big preallocated buffer is a good one.
thank you @oranagra |
I agree with you @yoav-steinberg. I did perform a comparative test of dynamic vs. static buffers:
As you can see, the results currently do not reflect any issue with the dynamic buffer (less than 1% TPS degradation).
This is good news. This means we might be able to have a single preallocated
Regarding reallocating many |
I want to stress two points from my past discussions with Salvatore.
Thank you @oranagra!
Hi @oranagra and @yoav-steinberg, in order to support the dynamic reply buffer I have implemented a very simple POC which does the following:
I have performed some benchmark tests to validate the dynamic buffer overhead.
In the first test I compared 3 variations:
In the second test I ran the STATIC vs. DYNAMIC variations and used a pipeline of 5 commands.
IMO the pipeline scenario illustrates a very intensive workload; however, I think there is no special reason not to define a high idle time in order to shrink the buffer. Please share your thoughts, and if we agree to continue I can introduce a more complete version of my POC.
Thank you @ranshid, this looks promising. A few questions:
while for the IDLE-TIME=2:
I think that though the numbers are not high, given that the tests are identical in terms of setup and workload, we should expect no expansions. BTW, I was wrong before when claiming the initial buffer size is 1 KiB; in my POC it is 16 KiB...
According to this design proposal:
You won't need the idle mechanism, because shrinking will be handled in the cron even when there's no traffic. In any case, I still don't understand why you had any idle time in your benchmarks. I also think your benchmark might be shrinking the buffer too often in case both flags contradict. I might be wrong, because I have a feeling I don't totally understand the logic in your POC. But since the results seem promising, I suggest you create a (possibly draft) PR of your code so we can review the algorithm. @oranagra, does this make sense?
Force-pushed from ae82f4f to 9e16df5
@yoav-steinberg @oranagra - I pushed the latest POC implementation of the dynamic reply buffer size.
I have retested the benchmark results compared to the static buffer case:
@ranshid Can you update the PR with your new design (instead of a config option for the buffer size) and update the title accordingly?
@yoav-steinberg done.
some random comments (can be ignored), and small indentation fix.
i think this one is ready to be merged.
Co-authored-by: Oran Agra <oran@redislabs.com>
@redis/core-team please approve.
LGTM, some typos.
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
After introducing redis#9822, we need to prevent the client reply buffer from shrinking in order to maintain correct client memory math.
Since #9822, the static reply buffer is no longer part of the client structure, so we need to dismiss it.
Currently, a simple idle client which serves no traffic still uses ~17 KB of memory. This is mainly due to a fixed-size reply buffer, currently set to 16 KB here: https://github.com/redis/redis/blob/unstable/src/server.h#L157
We have encountered cases in which the server operates in low-memory environments. In such cases, a user who wishes to create large connection pools to support potential burst periods will exhaust a large amount of memory maintaining connected idle clients. Some users may choose to "sacrifice" performance in order to save memory.
This commit introduces a dynamic mechanism to shrink and expand the client reply buffer based on a periodically observed peak.
The algorithm works as follows. Each time the buffer is used, we track the peak written size:
new peak = MAX(last peak, current written size)
Periodically, we check whether the last observed peak was:
a. matching the current buffer size - in which case we expand (resize) the buffer size by 100%
b. less than half the buffer size - in which case we shrink the buffer size by 50%
To avoid resizing too often, the buffer is left untouched when:
a. the current buffer peak is less than the current buffer usable size and higher than 1/2 the current buffer usable size
b. the value of (current buffer usable size / 2) is less than 1 KiB
c. the value of (current buffer usable size * 2) is larger than 16 KiB
Interface changes:
CLIENT LIST - now contains 2 new extra fields:
rbs=<the current size in bytes of the client reply buffer>
rbp=<the current value in bytes of the last observed buffer peak position>
INFO STATS - now contains 2 new statistics:
reply_buffer_shrinks = <total number of buffer shrinks performed>
reply_buffer_expands = <total number of buffer expands performed>
Results:
The main concern in this case was potential performance degradation as a result of buffer dereferencing.
In order to verify the change, we performed multiple benchmarking tests.
All tests were performed on:
m5.2xlarge instance
3 io-threads (main + 2 threads)
468750 keys with 4K values
15 c5n.2xlarge client machines running single-threaded redis-benchmark.
Each redis-benchmark running 50 clients with only GET requests, for 900 seconds.
Comparing the results of the static vs. dynamic case:
Also, looking into cache-miss stats using perf verified that TLB and L1 cache misses increased by ~1%.
sudo perf stat -a -g --pid $(pidof redis-server) -e cache-misses -- sleep 60
Static Buf:
Dynamic Buffer: