Reduce getNodeByQuery overhead #13221

Open · wants to merge 5 commits into base: unstable

Conversation


fcostaoliveira commented on Apr 17, 2024

This PR makes the following changes based on CPU profile data: the getNodeByQuery function accounts for 8.2% of the 12.3% overhead observed when comparing a single-shard cluster against a standalone instance.
Proposed changes (rough sketches of each follow below):

  • inlining keyHashSlot to reduce the overhead of that function call
  • reducing duplicate calls to getCommandFlags within getNodeByQuery
  • moving crc16 to a header file and inlining it to reduce the overhead of that function call
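As a rough sketch of the first and third bullets (the header name, the extern declaration, and the exact signatures here are illustrative, not the actual diff), the idea is to define the hot helper as static inline in a header so callers such as keyHashSlot can have it inlined instead of paying one call per key:

```c
/* crc16_inline.h -- hypothetical header sketching bullets 1 and 3. */
#ifndef CRC16_INLINE_H
#define CRC16_INLINE_H

#include <stdint.h>

/* The 256-entry CRC-16/XMODEM lookup table stays in a single .c file. */
extern const uint16_t crc16tab[256];

/* Same CRC as before, but visible to the compiler at every call site,
 * so it can be inlined into keyHashSlot()/getNodeByQuery(). */
static inline uint16_t crc16_inline(const char *buf, int len) {
    uint16_t crc = 0;
    for (int i = 0; i < len; i++)
        crc = (crc << 8) ^ crc16tab[((crc >> 8) ^ *buf++) & 0x00FF];
    return crc;
}

#endif /* CRC16_INLINE_H */
```

keyHashSlot would get the same treatment, so the whole key-to-slot path runs without per-key function-call overhead.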

The above changes yield an improvement of approximately 5% in achievable ops/sec, as shown in the results below.
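The second bullet is simply hoisting the flag lookup so it runs once per command instead of once per check; a toy, self-contained illustration of that shape (stand-in names, not the Redis source):

```c
/* Toy illustration of the "compute once, reuse" change in
 * getNodeByQuery(): the non-trivial flag lookup is hoisted out of the
 * per-key checks instead of being repeated for each of them. */
#include <stdint.h>
#include <stdio.h>

#define CMD_WRITE (1ULL << 0)

/* Stand-in for getCommandFlags(): imagine it walks command and client
 * state and is not free to call. */
static uint64_t command_flags(int cmd_id) {
    return (cmd_id & 1) ? CMD_WRITE : 0;
}

static int count_write_keys(int cmd_id, int numkeys) {
    uint64_t flags = command_flags(cmd_id); /* looked up once... */
    int writes = 0;
    for (int i = 0; i < numkeys; i++)
        if (flags & CMD_WRITE) writes++;    /* ...reused for every key */
    return writes;
}

int main(void) {
    printf("write keys: %d\n", count_write_keys(1, 3)); /* prints 3 */
    return 0;
}
```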

results

steps to reproduce

2 nodes.
DB node (swap in your own IP):

taskset -c 0 ./src/redis-server --save '' --cluster-announce-ip 192.168.1.200 --bind 192.168.1.200 --protected-mode no --requirepass perf --daemonize yes --cluster-enabled yes --logfile redis.log
redis-cli -h 192.168.1.200 -a perf flushall
redis-cli -h 192.168.1.200 -a perf cluster flushslots
redis-cli -h 192.168.1.200 -a perf cluster addslotsrange 0 16383 
redis-cli -h 192.168.1.200 -a perf cluster info

Client node:

taskset -c 0,1 memtier_benchmark --server 192.168.1.200 --port 6379 --authenticate perf --cluster-mode --pipeline 10 --data-size 100 --ratio 1:0 --key-pattern P:P --key-minimum=1 --key-maximum 1000000 --test-time 180 -c 25 -t 2 --hide-histogram

results unstable (804110a)

root@micro1:~# taskset -c 0,1 memtier_benchmark --server 192.168.1.200 --port 6379 --authenticate perf --cluster-mode --pipeline 10 --data-size 100 --ratio 1:0 --key-pattern P:P --key-minimum=1 --key-maximum 1000000 --test-time 180 -c 25 -t 2 --hide-histogram
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 180 secs]  0 threads:   107876070 ops,  599582 (avg:  599300) ops/sec, 83.99MB/sec (avg: 83.95MB/sec),  0.83 (avg:  0.83) msec latency

2         Threads
25        Connections per thread
180       Seconds


ALL STATS
======================================================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    MOVED/sec      ASK/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
------------------------------------------------------------------------------------------------------------------------------------------------------
Sets       599300.19          ---          ---         0.00         0.00         0.83133         0.82300         1.25500         3.07100     85967.12 
Gets            0.00         0.00         0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---          ---          ---             ---             ---             ---             ---          --- 
Totals     599300.19         0.00         0.00         0.00         0.00         0.83133         0.82300         1.25500         3.07100     85967.12 

results this PR (7e1d2ea)

root@micro1:~# taskset -c 0,1 memtier_benchmark --server 192.168.1.200 --port 6379 --authenticate perf --cluster-mode --pipeline 10 --data-size 100 --ratio 1:0 --key-pattern P:P --key-minimum=1 --key-maximum 1000000 --test-time 180 -c 25 -t 2 --hide-histogram
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 180 secs]  0 threads:   112251930 ops,  618650 (avg:  623611) ops/sec, 86.66MB/sec (avg: 87.36MB/sec),  0.80 (avg:  0.80) msec latency

2         Threads
25        Connections per thread
180       Seconds


ALL STATS
======================================================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    MOVED/sec      ASK/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
------------------------------------------------------------------------------------------------------------------------------------------------------
Sets       623611.19          ---          ---         0.00         0.00         0.79897         0.79100         1.20700         2.76700     89454.70 
Gets            0.00         0.00         0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---          ---          ---             ---             ---             ---             ---          --- 
Totals     623611.19         0.00         0.00         0.00         0.00         0.79897         0.79100         1.20700         2.76700     89454.70 

fcostaoliveira changed the title from "Improve.get node by query" to "Reduce getNodeByQuery overhead" on Apr 17, 2024
Review thread on the keyHashSlot() hash-tag scan:

for (e = s+1; e < keylen; e++)
    if (key[e] == '}') break;

/* No '}' or nothing between {} ? Hash the whole key. */
Contributor commented:

While we're at it, do we want to put an "unlikely" here? I am far from an expert, just asking.
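A hedged sketch of what that suggestion could look like, assuming the usual __builtin_expect-based unlikely() macro; whether the hint (and this particular placement) actually pays off is exactly what the follow-up benchmark needs to confirm:

```c
#include <stdint.h>

#ifndef unlikely
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif

uint16_t crc16(const char *buf, int len); /* implemented elsewhere */

/* keyHashSlot with the branch in question annotated; only the
 * unlikely() hint differs from the code under review. */
unsigned int keyHashSlot_hinted(const char *key, int keylen) {
    int s, e; /* start-end indexes of { and } */

    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;

    /* No '{' ? Hash the whole key. */
    if (s == keylen) return crc16(key, keylen) & 0x3FFF;

    /* '{' found? Scan for the matching '}'. */
    for (e = s+1; e < keylen; e++)
        if (key[e] == '}') break;

    /* No '}' or nothing between {} ? Hash the whole key.
     * Marked unlikely on the assumption that a key which opens a
     * hash tag usually also closes it. */
    if (unlikely(e == keylen || e == s+1))
        return crc16(key, keylen) & 0x3FFF;

    /* Hash only what is between { and }. */
    return crc16(key+s+1, e-s-1) & 0x3FFF;
}
```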

fcostaoliveira (Author) replied:

Will test the impact on the results and reply back.
