[BUG] Master gets failed over for many client connections when enabling FLASH storage option #691

Open
yzhao244 opened this issue Jul 7, 2023 · 0 comments

yzhao244 commented Jul 7, 2023

Describe the bug

When the FLASH storage option is enabled in a Master/Slave setup, the Master may get failed over if many client connections write to it. The failover is consistently reproducible by running a GET benchmark right after a SET benchmark with around ten thousand connections.

To reproduce
Please note that we host the Master and Slave on 4U8G VMs. Sentinel monitors the Master and triggers a failover if the Master is down for more than 10 seconds. Run the following SET benchmark first:

./memtier_benchmark-12 -s -p 6379 -t 200 -c 50 -n 500 --ratio 1:0 --key-pattern=P:P --data-size-range=2048-4096 --key-minimum=1 --key-maximum=5000000 --hide-histogram

Execute the following GET benchmark right after the SET benchmark above:
./memtier_benchmark-12 -s -p 6379 -t 200 -c 50 -n 500 --ratio 0:1 --key-pattern=P:P --data-size-range=2048-4096 --key-minimum=1 --key-maximum=5000000 --hide-histogram

I ran the two benchmarks back to back in a script. As the screenshot shows, right after the SET benchmark finishes, the GET benchmark reports 0 qps many times and eventually leads to a failover.
(screenshot of the benchmark output)
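
For anyone trying to reproduce this, a minimal sketch of such a wrapper script is shown below. It is not the script used for this report: `HOST`, `WAIT_SECONDS`, `RUN_BENCH` and the `run_bench` helper are hypothetical names, and the host defaults to localhost only because the commands above omit it. The benchmark flags are copied from the two commands above. With `-t 200 -c 50` each run opens 200 x 50 = 10000 connections, just under the `maxclients 10010` setting.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction wrapper (a sketch, not the reporter's script).

BENCH="${BENCH:-./memtier_benchmark-12}"  # binary name taken from the report
HOST="${HOST:-127.0.0.1}"                 # assumption: the report omits the host after -s
PORT="${PORT:-6379}"
THREADS=200
CLIENTS_PER_THREAD=50
WAIT_SECONDS="${WAIT_SECONDS:-0}"         # 0 = run GET immediately after SET

# Connections opened by one benchmark run: threads x clients per thread.
total_connections=$((THREADS * CLIENTS_PER_THREAD))

# Run one benchmark pass; $1 is the SET:GET ratio ("1:0" or "0:1").
run_bench() {
  "$BENCH" -s "$HOST" -p "$PORT" -t "$THREADS" -c "$CLIENTS_PER_THREAD" -n 500 \
    --ratio "$1" --key-pattern=P:P --data-size-range=2048-4096 \
    --key-minimum=1 --key-maximum=5000000 --hide-histogram
}

# Only drive load when explicitly asked to, e.g. RUN_BENCH=1 ./repro.sh
if [ "${RUN_BENCH:-0}" = "1" ]; then
  echo "connections per run: $total_connections"
  run_bench 1:0            # SET pass
  sleep "$WAIT_SECONDS"    # optional pause between the passes
  run_bench 0:1            # GET pass
fi
```

Running it with `WAIT_SECONDS=0` corresponds to the failing back-to-back case; a value around 120 corresponds to the "wait a couple of minutes" case described under Additional information.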

Expected behavior

It looks like the Master can become unresponsive when reading or updating existing keys right after they have been written. Reading or updating existing keys should not make the Master unresponsive or trigger a failover.
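
To see how long the Master stays unresponsive relative to the 10-second Sentinel threshold, a simple probe along these lines could run in a second terminal during the GET benchmark. This is only a sketch: it assumes `keydb-cli` and GNU `timeout` are on the PATH, and `HOST`, `master_alive`, `outage_exceeds_threshold` and `probe_loop` are hypothetical names. Sentinel's own failure detection works differently, so the output is only a rough indication.

```shell
#!/usr/bin/env bash
# Hypothetical responsiveness probe (a sketch; all names here are assumptions).

CLI="${CLI:-keydb-cli}"
HOST="${HOST:-127.0.0.1}"   # assumption: the report does not give the Master address
PORT="${PORT:-6379}"
DOWN_AFTER_SECONDS=10       # the 10-second failover threshold mentioned in the report

# Succeeds if the server answers PING with PONG within one second.
master_alive() {
  [ "$(timeout 1 "$CLI" -h "$HOST" -p "$PORT" ping 2>/dev/null)" = "PONG" ]
}

# Succeeds if an outage that started at $1 has lasted more than the
# threshold by time $2 (both in epoch seconds).
outage_exceeds_threshold() {
  [ $(( $2 - $1 )) -gt "$DOWN_AFTER_SECONDS" ]
}

# Poll once per second and report outages longer than the threshold.
probe_loop() {
  down_since=""
  while true; do
    now=$(date +%s)
    if master_alive; then
      down_since=""
    else
      [ -z "$down_since" ] && down_since=$now
      if outage_exceeds_threshold "$down_since" "$now"; then
        echo "$(date) master unresponsive for $((now - down_since))s"
      fi
    fi
    sleep 1
  done
}
```

Calling `probe_loop` after sourcing the file starts the polling. The probe uses one client connection of its own, which fits within the `maxclients 10010` headroom left by a 10000-connection benchmark run.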

Additional information

If we wait a couple of minutes between the two benchmarks, the GET benchmark shows no 0 qps and no failover happens.

keydb.conf example

protected-mode no
tcp-keepalive 30
timeout 0
maxmemory 4gb
maxclients 10010
save ""
unixsocketperm 600
client-output-buffer-limit slave 143165576 143165576 60
tcp-backlog 10000
lazyfree-lazy-eviction yes
lazyfree-lazy-expire yes
lazyfree-lazy-server-del yes
slave-lazy-flush yes
rdbcompression no
databases 16
appendonly no
appendfilename appendonly.aof
appendfsync no
no-appendfsync-on-rewrite yes
auto-aof-rewrite-percentage 0
lazyfree-lazy-user-flush yes
maxmemory-policy allkeys-lru
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
latency-monitor-threshold 0
repl-backlog-size 1073741824
repl-backlog-ttl 3600
slowlog-log-slower-than 10000
slowlog-max-len 128
lua-time-limit 5000
repl-timeout 60
proto-max-bulk-len 536870912
master-read-only no
maxstorage 68719476736
server-threads 3
min-clients-per-thread 50
server-thread-affinity false
enable-async-commands no
maxmemory-eviction-tenacity 35
storage-provider flash /