Test with Redis Sentinel #170

Closed
xiaogaozi opened this issue Feb 3, 2021 · 1 comment

@xiaogaozi (Member) commented Feb 3, 2021

What would you like to be added:
Run adequate tests with Redis Sentinel to verify that Redis high availability works.

Why is this needed:
Redis Sentinel is the official high-availability solution for Redis. It performs automatic failover when the Redis master stops working.

Depends on #169

@xiaogaozi xiaogaozi added kind/feature New feature or request area/metadata Issues or PRs related to metadata labels Feb 3, 2021
@xiaogaozi xiaogaozi added this to the Release 1.0 milestone Feb 3, 2021
@chnliyong (Contributor) commented Apr 29, 2021

Test JuiceFS with Redis Sentinel

Here we test the automatic failover process of Redis Sentinel and whether JuiceFS can automatically switch to the new master as expected.

We use ASCII art to show the configuration in a graphical format, with the same symbols as described in the official Redis Sentinel documentation.

       +----+
       | M1 |
       | S1 |  Box1(10.0.101.68)
       +----+
          |
+----+    |    +----+
| R2 |----+----|JFS |
| S2 |         | S3 |
+----+         +----+
 Box2           Box3
(10.0.101.16)  (10.0.101.214)
Configuration: quorum = 2

We use the 3-box deployment shown above: 2 Redis instances, M1 (master) and R2 (replica), and 3 Sentinel instances, S1, S2, and S3. JFS is the JuiceFS process, which acts as a Redis client in this test.
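
For reference, a minimal sentinel.conf matching this topology could look like the following. This is a sketch only; the timeout values are assumptions and not taken from the actual test setup. The same configuration is used by S1, S2 and S3:

port 26379
# Monitor the master at its initial address; quorum = 2 means two
# sentinels must agree before the master is marked objectively down.
sentinel monitor mymaster 10.0.101.68 6379 2
# How long the master must be unreachable before a sentinel marks it
# subjectively down (assumed value).
sentinel down-after-milliseconds mymaster 30000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1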

Mount JuiceFS using the Redis Sentinel URL on Box3:

# REDIS_PASSWORD={{REDIS_PASSWORD}} ./juicefs mount -d \
redis://mymaster,10.0.101.68,10.0.101.16,10.0.101.214:26379/1 /jfs
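
For illustration, the Sentinel URL above corresponds roughly to the following Sentinel-aware client setup. This is a minimal go-redis sketch, not JuiceFS's actual code; go-redis and the exact options here are assumptions:

package main

import (
	"context"
	"fmt"

	"github.com/go-redis/redis/v8"
)

func main() {
	// Rough equivalent of redis://mymaster,<addrs>:26379/1 — the client
	// asks the sentinels for the current master of "mymaster" and
	// reconnects to the new master automatically after a failover.
	rdb := redis.NewFailoverClient(&redis.FailoverOptions{
		MasterName: "mymaster",
		SentinelAddrs: []string{
			"10.0.101.68:26379",
			"10.0.101.16:26379",
			"10.0.101.214:26379",
		},
		DB: 1, // the "/1" suffix in the mount URL
	})
	fmt.Println(rdb.Ping(context.Background()).Err())
}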

Run juicefs bench on Box3:

# juicefs bench --big-file-size=1 --small-file-count=10000

Here we increase --small-file-count to 10000 to ensure the benchmark does not exit too quickly and keeps accessing Redis continuously (writing small files).
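
Before triggering the failover, the current master can be confirmed through any of the sentinels with a standard redis-cli query (shown here for illustration); it should report M1's address:

# redis-cli -h 10.0.101.214 -p 26379 sentinel get-master-addr-by-name mymaster
1) "10.0.101.68"
2) "6379"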

Then we stop the Redis instance (M1) on Box1. The juicefs bench hangs because it cannot finish writing metadata. The JuiceFS log (/var/log/syslog) shows:

Apr 29 08:55:18 box3 juicefs[2387]: juicefs[2387] <ERROR>: error: redis: Conn is in a bad state: EOF
Apr 29 08:55:18 box3 juicefs[2387]: juicefs[2387] <WARNING>: write inode:9659 error: input/output error
Apr 29 08:55:18 box3 juicefs[2387]: juicefs[2387] <ERROR>: write inode:9659 indx:0  input/output error
Apr 29 08:56:30 box3 juicefs[2387]: juicefs[2387] <INFO>: slow operation: getattr (9489): OK (9489,[drwxr-xr-x:0040755,2,0,0,1619686508,1619686517,1619686517,4096]) <72.112070>

And the S2 Sentinel log (/var/log/redis/redis-sentinel.log) shows:

1884:X 29 Apr 08:55:47.848 # +sdown master mymaster 10.0.101.68 6379
1884:X 29 Apr 08:55:47.906 # +odown master mymaster 10.0.101.68 6379 #quorum 2/2
1884:X 29 Apr 08:55:47.906 # +new-epoch 10
1884:X 29 Apr 08:55:47.906 # +try-failover master mymaster 10.0.101.68 6379
1884:X 29 Apr 08:55:47.909 # +vote-for-leader a2cc7f64dca16cf0f3724c49b90d92ed7fa4cf85 10
1884:X 29 Apr 08:55:47.915 # 91956b594fe1417de6b9ce3dfde9e8ef757cc935 voted for a2cc7f64dca16cf0f3724c49b90d92ed7fa4cf85 10
1884:X 29 Apr 08:55:47.916 # 08e969b58de967234dd7f54e886b1f870f2e1400 voted for a2cc7f64dca16cf0f3724c49b90d92ed7fa4cf85 10
1884:X 29 Apr 08:55:47.992 # +elected-leader master mymaster 10.0.101.68 6379
1884:X 29 Apr 08:55:47.992 # +failover-state-select-slave master mymaster 10.0.101.68 6379
1884:X 29 Apr 08:55:48.082 # +selected-slave slave 10.0.101.16:6379 10.0.101.16 6379 @ mymaster 10.0.101.68 6379
1884:X 29 Apr 08:55:48.082 * +failover-state-send-slaveof-noone slave 10.0.101.16:6379 10.0.101.16 6379 @ mymaster 10.0.101.68 6379
1884:X 29 Apr 08:55:48.172 * +failover-state-wait-promotion slave 10.0.101.16:6379 10.0.101.16 6379 @ mymaster 10.0.101.68 6379
1884:X 29 Apr 08:55:48.858 # +promoted-slave slave 10.0.101.16:6379 10.0.101.16 6379 @ mymaster 10.0.101.68 6379
1884:X 29 Apr 08:55:48.858 # +failover-state-reconf-slaves master mymaster 10.0.101.68 6379
1884:X 29 Apr 08:55:48.910 # +failover-end master mymaster 10.0.101.68 6379
1884:X 29 Apr 08:55:48.910 # +switch-master mymaster 10.0.101.68 6379 10.0.101.16 6379
1884:X 29 Apr 08:55:48.910 * +slave slave 10.0.101.68:6379 10.0.101.68 6379 @ mymaster 10.0.101.16 6379
1884:X 29 Apr 08:56:18.956 # +sdown slave 10.0.101.68:6379 10.0.101.68 6379 @ mymaster 10.0.101.16 6379

The Redis replica (R2) becomes the master, JuiceFS returns to normal, and the juicefs bench continues.
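
At this point the same sentinel query shown earlier should report the new master address, matching the +switch-master line in the log above:

# redis-cli -h 10.0.101.214 -p 26379 sentinel get-master-addr-by-name mymaster
1) "10.0.101.16"
2) "6379"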

Then we restart the stopped Redis instance on Box1. The S2 log shows:

1884:X 29 Apr 09:51:40.512 # -sdown slave 10.0.101.68:6379 10.0.101.68 6379 @ mymaster 10.0.101.16 6379
1884:X 29 Apr 09:51:50.446 * +convert-to-slave slave 10.0.101.68:6379 10.0.101.68 6379 @ mymaster 10.0.101.16 6379

The Redis instance on Box1 is now a slave and syncs data from the new master (Box2). The Redis log (/var/log/redis/redis-server.log) on Box1 shows:

4683:M 29 Apr 09:51:40.120 * DB loaded from disk: 0.001 seconds
4683:M 29 Apr 09:51:40.120 * Ready to accept connections
4683:S 29 Apr 09:51:50.421 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
4683:S 29 Apr 09:51:50.421 * SLAVE OF 10.0.101.16:6379 enabled (user request from 'id=3 addr=10.0.101.16:42040 fd=7 name=sentinel-a2cc7f64-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')
4683:S 29 Apr 09:51:50.422 # CONFIG REWRITE executed with success.
4683:S 29 Apr 09:51:51.145 * Connecting to MASTER 10.0.101.16:6379
4683:S 29 Apr 09:51:51.145 * MASTER <-> SLAVE sync started
4683:S 29 Apr 09:51:51.146 * Non blocking connect for SYNC fired the event.
4683:S 29 Apr 09:51:51.146 * Master replied to PING, replication can continue...
4683:S 29 Apr 09:51:51.147 * Trying a partial resynchronization (request 3276ec3cc8172fe9dcf3ef520345de3846c4c188:1).
4683:S 29 Apr 09:51:51.147 * Full resync from master: 0ff1c4d8380e4fef0923cfdabc17b63962a1fa3b:22991034
4683:S 29 Apr 09:51:51.147 * Discarding previously cached master state.
4683:S 29 Apr 09:51:51.203 * MASTER <-> SLAVE sync: receiving 2882 bytes from master
4683:S 29 Apr 09:51:51.203 * MASTER <-> SLAVE sync: Flushing old data
4683:S 29 Apr 09:51:51.203 * MASTER <-> SLAVE sync: Loading DB in memory
4683:S 29 Apr 09:51:51.203 * MASTER <-> SLAVE sync: Finished with success

In the above test, JuiceFS switched and connected to the new Redis master once the Redis Sentinel failover finished. During the failover, the Redis master is unavailable, so JuiceFS fails to access Redis and returns input/output errors.

JuiceFS works as expected when using Redis Sentinel.

For more details about Redis Sentinel, please read the official documentation.

@juicedata juicedata locked and limited conversation to collaborators Apr 29, 2021

This issue was moved to a discussion.

You can continue the conversation there.
