[BUG] Master gets failed over for many client connections when enabling FLASH storage option #691

yzhao244 · 2023-07-07T17:42:17Z

Describe the bug

With the Master/Slave FLASH storage option enabled, the Master may get failed over when many client connections write to it. The failover reproduces consistently when a GET benchmark is run right after a SET benchmark with about ten thousand connections.

To reproduce
Please note that we are using 4U8G VMs to host the Master and Slave. Sentinel monitors the Master and triggers a failover if the Master is down for more than 10 seconds.

./memtier_benchmark-12 -s -p 6379 -t 200 -c 50 -n 500 --ratio 1:0 --key-pattern=P:P --data-size-range=2048-4096 --key-minimum=1 --key-maximum=5000000 --hide-histogram

Then run the following GET benchmark immediately after the SET benchmark above:
./memtier_benchmark-12 -s -p 6379 -t 200 -c 50 -n 500 --ratio 0:1 --key-pattern=P:P --data-size-range=2048-4096 --key-minimum=1 --key-maximum=5000000 --hide-histogram

I ran the two benchmarks in a script. Right after the SET benchmark finishes, the GET benchmark reports 0 qps for many intervals, which eventually leads to a failover.
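The script itself is not included in this report. A minimal sketch of such a wrapper might look like the following. It only reuses the flags from the two commands above; `KEYDB_HOST`, `MEMTIER`, and `DRY_RUN` are hypothetical names introduced for illustration, and the host is a placeholder because the report omits it after `-s`. The arithmetic at the top shows where the "ten thousand connections" figure comes from (200 threads with 50 connections each), and that 500 requests per connection adds up to 5,000,000 requests, which appears to match the key range.

```shell
#!/bin/sh
# Hypothetical wrapper: KEYDB_HOST, MEMTIER and DRY_RUN are assumptions,
# not names from the original report (which omits the host after -s).
KEYDB_HOST="${KEYDB_HOST:-127.0.0.1}"
MEMTIER="${MEMTIER:-./memtier_benchmark-12}"

THREADS=200    # -t: benchmark threads
CLIENTS=50     # -c: connections per thread
REQUESTS=500   # -n: requests per connection

total_connections=$((THREADS * CLIENTS))
total_requests=$((total_connections * REQUESTS))
echo "connections=$total_connections requests=$total_requests"

# Build the memtier command for one SET:GET ratio; print it when DRY_RUN=1
# (the default here), execute it otherwise.
run_phase() {
  set -- "$MEMTIER" -s "$KEYDB_HOST" -p 6379 \
    -t "$THREADS" -c "$CLIENTS" -n "$REQUESTS" \
    --ratio "$1" --key-pattern=P:P --data-size-range=2048-4096 \
    --key-minimum=1 --key-maximum=5000000 --hide-histogram
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

run_phase 1:0   # SET-only phase
run_phase 0:1   # GET-only phase, started with no pause in between
```

By default the sketch only prints the two command lines; setting `DRY_RUN=0` would execute them back to back against the placeholder host.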

Expected behavior

It looks like the Master can become unresponsive when reading or updating existing keys right after they have been written. Reading or updating existing keys should not make the Master unresponsive or trigger a failover.

Additional information

If we wait a couple of minutes between the two benchmarks, we do not see any 0 qps intervals in the GET benchmark and no failover happens.

keydb.conf example

protected-mode no
tcp-keepalive 30
timeout 0
maxmemory 4gb
maxclients 10010
save ""
unixsocketperm 600
client-output-buffer-limit slave 143165576 143165576 60
tcp-backlog 10000
lazyfree-lazy-eviction yes
lazyfree-lazy-expire yes
lazyfree-lazy-server-del yes
slave-lazy-flush yes
rdbcompression no
databases 16
appendonly no
appendfilename appendonly.aof
appendfsync no
no-appendfsync-on-rewrite yes
auto-aof-rewrite-percentage 0
lazyfree-lazy-user-flush yes
maxmemory-policy allkeys-lru
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
latency-monitor-threshold 0
repl-backlog-size 1073741824
repl-backlog-ttl 3600
slowlog-log-slower-than 10000
slowlog-max-len 128
lua-time-limit 5000
repl-timeout 60
proto-max-bulk-len 536870912
master-read-only no
maxstorage 68719476736
server-threads 3
min-clients-per-thread 50
server-thread-affinity false
enable-async-commands no
maxmemory-eviction-tenacity 35
storage-provider flash /
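For context, a rough size estimate of the benchmark dataset against the limits in this config can be sketched as below. It assumes value sizes are spread roughly evenly across the `--data-size-range` (so about 3072 bytes on average) and ignores key and per-entry overhead, so treat the numbers as an approximation only. Under those assumptions the values alone come to about 15 GB, which is well above `maxmemory 4gb` but below `maxstorage`, so most keys would have to live on FLASH by the time the GET benchmark starts.

```shell
#!/bin/sh
# Back-of-the-envelope estimate; the uniform-size assumption is ours,
# not something stated in the report.
KEYS=5000000                              # --key-maximum
MIN_SIZE=2048                             # lower bound of --data-size-range
MAX_SIZE=4096                             # upper bound of --data-size-range
MAXMEMORY=$((4 * 1024 * 1024 * 1024))     # maxmemory 4gb
MAXSTORAGE=68719476736                    # maxstorage (64 GiB)

avg_size=$(( (MIN_SIZE + MAX_SIZE) / 2 ))
dataset_bytes=$((KEYS * avg_size))

echo "approx dataset bytes: $dataset_bytes"
echo "maxmemory bytes:      $MAXMEMORY"
echo "maxstorage bytes:     $MAXSTORAGE"

if [ "$dataset_bytes" -gt "$MAXMEMORY" ]; then
  echo "dataset exceeds maxmemory: part of it must be held on FLASH"
fi
```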

msotheeswaran-sc added the Priority 2 label Jul 14, 2023
