Redisson fails to recognize a slave that became a master after its master went down in Redis Cluster #476

Closed
ybonda opened this issue Apr 19, 2016 · 23 comments

ybonda commented Apr 19, 2016

Hi Niko, it seems the problem really does exist in the Redisson driver for cluster mode.

Here is our configuration:
aa19c3c2a927384b6a4673539cd7dcdb2efa33e8 10.20.21.113:7001 slave d20242223349a4d51b30da40dad09f0dbc9a2cc8 0 1461081575219 1 connected
d20242223349a4d51b30da40dad09f0dbc9a2cc8 10.20.21.44:7000 myself,master - 0 0 1 connected 0-5461
ba12c571516b130a9973fdc95ba8cebbfaf5ac7e 10.20.21.113:7000 master - 0 1461081576222 3 connected 5462-10922
4a21be5ffb25369bf4243918c34467103fa9c8a4 10.20.21.59:7001 slave ba12c571516b130a9973fdc95ba8cebbfaf5ac7e 0 1461081577222 3 connected
f6b4e5a94409ca397731165acad90f2193154c62 10.20.21.59:7000 master - 0 1461081576422 0 connected 10923-16383
6a7c1e8510d29aed17dc1e0ca7f0e6a7278b4de5 10.20.21.44:7001 slave f6b4e5a94409ca397731165acad90f2193154c62 0 1461081576721 5 connected
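
For reference, a cluster like this is typically wired into Redisson roughly as in the sketch below. It is illustrative only (not the exact client code behind this issue) and uses the current Redisson API; the 2.2.x line discussed here differs slightly in package names and address format.

// Minimal sketch: connecting Redisson to the three-host cluster listed above.
import org.redisson.Redisson;
import org.redisson.api.RBucket;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class ClusterClientSketch {
    public static void main(String[] args) {
        Config config = new Config();
        config.useClusterServers()
              // re-read the cluster topology every 2 seconds so failovers are noticed
              .setScanInterval(2000)
              // seed nodes; the rest of the cluster is discovered from them
              .addNodeAddress("redis://10.20.21.44:7000",
                              "redis://10.20.21.113:7000",
                              "redis://10.20.21.59:7000");

        RedissonClient redisson = Redisson.create(config);
        try {
            // simple smoke test against the cluster
            RBucket<String> bucket = redisson.getBucket("smoke-test-key");
            bucket.set("value");
            System.out.println(bucket.get());
        } finally {
            redisson.shutdown();
        }
    }
}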

I then shut down the Redis instances on 10.20.21.113: ba12c571516b130a9973fdc95ba8cebbfaf5ac7e and aa19c3c2a927384b6a4673539cd7dcdb2efa33e8.

When I did this, the Redis cluster reorganized to the following:
aa19c3c2a927384b6a4673539cd7dcdb2efa33e8 10.20.21.113:7001 slave,fail d20242223349a4d51b30da40dad09f0dbc9a2cc8 1461081896304 1461081894700 1 disconnected
d20242223349a4d51b30da40dad09f0dbc9a2cc8 10.20.21.44:7000 myself,master - 0 0 1 connected 0-5461
ba12c571516b130a9973fdc95ba8cebbfaf5ac7e 10.20.21.113:7000 master,fail - 1461081896204 1461081895202 3 disconnected
4a21be5ffb25369bf4243918c34467103fa9c8a4 10.20.21.59:7001 master - 0 1461081995791 6 connected 5462-10922
f6b4e5a94409ca397731165acad90f2193154c62 10.20.21.59:7000 master - 0 1461081994787 0 connected 10923-16383
6a7c1e8510d29aed17dc1e0ca7f0e6a7278b4de5 10.20.21.44:7001 slave f6b4e5a94409ca397731165acad90f2193154c62 0 1461081996293 5 connected

As you can see, the cluster itself is OK, but requests are failing for some reason.
If I restart Tomcat without touching anything on the Redis side, requests stop failing.
Very strange.

This is the link to the redisson log: https://www.dropbox.com/s/pluyzxtxpd0u23z/redisson.log?dl=0

Please assist,
Thank you!

mrniko commented Apr 19, 2016

As I can see, you have shut down all nodes for the slot range [0-5461], so any objects bound to this range will be unavailable and all requests for them will fail too.
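
For background on why that would happen: Redis Cluster maps every key to one of 16384 hash slots via CRC16(key) mod 16384, so a key whose slot falls in 0-5461 has nowhere to go once every node serving that range is down. A standalone sketch of the slot calculation (not Redisson code; hash tags are ignored for simplicity):

// Sketch of Redis Cluster's key -> slot mapping: slot = CRC16(key) mod 16384.
public class SlotSketch {

    // CRC16/XMODEM, the variant the Redis Cluster spec uses (poly 0x1021, init 0x0000).
    static int crc16(byte[] bytes) {
        int crc = 0x0000;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    static int slot(String key) {
        return crc16(key.getBytes(java.nio.charset.StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        // Any key whose slot lands in 0-5461 cannot be served
        // once the whole node set for that range is down.
        for (String key : new String[] {"user:1", "session:abc", "foo"}) {
            System.out.println(key + " -> slot " + slot(key));
        }
    }
}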

mrniko commented Apr 19, 2016

I didn't review your log before, sorry. It seems like a real bug. I will fix it.

mrniko added the bug label Apr 19, 2016
mrniko added this to the 2.2.12 milestone Apr 19, 2016

mrniko commented Apr 19, 2016

Could you please update to the build from "master" and try it? It contains a fix for a bug with slot migration. Then please share your log.

ybonda commented Apr 19, 2016

Sorry, I didn't understand your request.
What exactly do you want me to do? Retry my scenario with the latest Redisson build?

mrniko commented Apr 20, 2016

Yes

mrniko pushed a commit that referenced this issue Apr 20, 2016

mrniko commented Apr 20, 2016

Fixed. Please try

ybonda commented Apr 20, 2016

This is the log of exactly the same scenario, but with the latest redisson-2.2.12 (which seems to be from before your fix :)
Please see: https://www.dropbox.com/s/7vm1g028vhfz2an/redisson_2.2.12.log?dl=0

ybonda commented Apr 20, 2016

I checked the same scenario again after your fix, with the same results: failing requests.
JMeter output:
summary + 1 in 0s = 35.7/s Avg: 28 Min: 28 Max: 28 Err: 0 (0.00%) Active: 3 Started: 3 Finished: 0
summary + 102 in 7.2s = 14.2/s Avg: 178 Min: 3 Max: 4445 Err: 4 (3.92%) Active: 10 Started: 10 Finished: 0
summary = 103 in 7.2s = 14.3/s Avg: 177 Min: 3 Max: 4445 Err: 4 (3.88%)
summary + 118 in 10s = 11.9/s Avg: 523 Min: 3 Max: 4405 Err: 14 (11.86%) Active: 17 Started: 17 Finished: 0
summary = 221 in 13.2s = 16.7/s Avg: 361 Min: 3 Max: 4445 Err: 18 (8.14%)
summary + 123 in 10.4s = 11.9/s Avg: 894 Min: 2 Max: 4417 Err: 25 (20.33%) Active: 25 Started: 25 Finished: 0
summary = 344 in 19.2s = 17.9/s Avg: 552 Min: 2 Max: 4445 Err: 43 (12.50%)
summary + 136 in 10s = 13.6/s Avg: 1067 Min: 2 Max: 4408 Err: 33 (24.26%) Active: 32 Started: 32 Finished: 0
summary = 480 in 25s = 19.2/s Avg: 698 Min: 2 Max: 4445 Err: 76 (15.83%)

Attaching the Redisson log (after the fix): https://www.dropbox.com/s/eyqllb66qz49w6b/redisson_2.2.12_after_fix.log?dl=0

mrniko commented Apr 20, 2016

It's strange: I don't see any "cluster nodes state from" entries after 20 Apr 2016 14:55:56 in your log. The cluster state update method should try the next cluster node to get the cluster state, but in your case something goes wrong.
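
Conceptually, the fallback described here, asking the next known node for CLUSTER NODES once the current one stops answering, looks like the sketch below. This is only an illustration (written with Jedis, which is mentioned later in this thread), not Redisson's actual update code.

import java.util.Arrays;
import java.util.List;
import redis.clients.jedis.Jedis;

public class ClusterStateProbe {
    // Known nodes of the cluster from this issue; a real client keeps this list
    // up to date from previous CLUSTER NODES replies.
    static final List<String> NODES = Arrays.asList(
            "10.20.21.44:7000", "10.20.21.113:7000", "10.20.21.59:7000");

    public static void main(String[] args) {
        for (String node : NODES) {
            String[] hostPort = node.split(":");
            try (Jedis jedis = new Jedis(hostPort[0], Integer.parseInt(hostPort[1]))) {
                // The first node that answers provides the fresh topology.
                System.out.println("cluster nodes state from " + node + ":");
                System.out.println(jedis.clusterNodes());
                return;
            } catch (Exception e) {
                // Node is down or unreachable: fall through to the next one.
                System.err.println("skipping " + node + ": " + e.getMessage());
            }
        }
        System.err.println("no cluster node reachable");
    }
}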

ybonda commented Apr 20, 2016

We also see the "Connection pool exhausted!" error all the time; maybe these are related?

20 Apr 2016 14:57:12  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
20 Apr 2016 14:57:12  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
20 Apr 2016 14:57:13  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
20 Apr 2016 14:57:13  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
20 Apr 2016 14:57:13  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
20 Apr 2016 14:57:14  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
20 Apr 2016 14:57:14  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
20 Apr 2016 14:57:14  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size

20 Apr 2016 14:57:36  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
org.redisson.client.RedisConnectionException: Connection pool exhausted! All connections are busy. Increase connection pool size.
	at org.redisson.connection.pool.ConnectionPool.get(ConnectionPool.java:164) ~[redisson-2.2.12-SNAPSHOT.jar:?]
	at org.redisson.connection.pool.MasterConnectionPool.get(MasterConnectionPool.java:25) ~[redisson-2.2.12-SNAPSHOT.jar:?]
	at org.redisson.connection.MasterSlaveEntry.connectionWriteOp(MasterSlaveEntry.java:346) ~[redisson-2.2.12-SNAPSHOT.jar:?]
	at org.redisson.connection.MasterSlaveConnectionManager.connectionWriteOp(MasterSlaveConnectionManager.java:564) ~[redisson-2.2.12-SNAPSHOT.jar:?]
	at org.redisson.command.CommandAsyncService.async(CommandAsyncService.java:395) ~[redisson-2.2.12-SNAPSHOT.jar:?]
	at org.redisson.command.CommandAsyncService$7.run(CommandAsyncService.java:444) ~[redisson-2.2.12-SNAPSHOT.jar:?]
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:581) ~[netty-common-4.0.35.Final.jar:4.0.35.Final]
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:655) ~[netty-common-4.0.35.Final.jar:4.0.35.Final]
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:367) ~[netty-common-4.0.35.Final.jar:4.0.35.Final]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_71]
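
For what it's worth, the pool sizes that error message refers to live on the cluster configuration; raising them would look roughly like the sketch below (illustrative values and the current Redisson API, not the actual settings used here).

import org.redisson.Redisson;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class PoolSizeSketch {
    public static void main(String[] args) {
        Config config = new Config();
        config.useClusterServers()
              .addNodeAddress("redis://10.20.21.44:7000")
              // connections used for commands routed to masters (writes)
              .setMasterConnectionPoolSize(128)
              // connections used for reads routed to slaves
              .setSlaveConnectionPoolSize(128);

        RedissonClient redisson = Redisson.create(config);
        redisson.shutdown();
    }
}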

mrniko commented Apr 20, 2016

I'll take a look at this tomorrow

ybonda commented Apr 21, 2016

It's important to note that the problem occurs only if I shut down the Redis instances on 10.20.21.113.
When I shut down the Redis instances on 10.20.21.59, Redisson works well, even though I still see the "Connection pool exhausted!" error in the log.

mrniko commented Apr 21, 2016

Please try again

ybonda commented Apr 21, 2016

I ran the same scenario in different environments, and the problem always occurred when I shut down the Redis instances on the middle machine.

The scenario: https://www.dropbox.com/s/nei0va82j2hzymo/redis-cluster-middle-host-down.jpg?dl=0

@thelamon

+1 for this bug. I get it on staging every time I try to check the Redis Sentinel configuration.
Jedis works fine (the application uses both libraries).

mrniko commented Apr 21, 2016

@ybonda so nothing has changed at all with the new fix?

ybonda commented Apr 21, 2016

Not tested yet.

Will test it in 10 min

mrniko commented Apr 21, 2016

Will wait for your results

ybonda commented Apr 21, 2016

Yes, @mrniko!
This fixes the issue.
I tested not only the middle host but also all possible scenarios, shutting down different instances, and it worked.

Great job man!
Thanks a lot!

mrniko commented Apr 21, 2016

@ybonda At your service!

mrniko commented Apr 21, 2016

@thelamon could you please test it too?

@thelamon

@mrniko Thanks for the service! I can only test it next Monday :(

mrniko commented Apr 21, 2016

@thelamon OK, I'll close it then. If you have any problems, please post a comment.

mrniko closed this as completed Apr 21, 2016