Redisson fails to recognize a slave that became a master after its master went down in Redis Cluster #476

Closed
ybonda opened this issue Apr 19, 2016 · 23 comments

ybonda commented Apr 19, 2016

Hi Niko, it seems the problem really does exist in the Redisson driver for cluster mode.

Here is our configuration:
aa19c3c2a927384b6a4673539cd7dcdb2efa33e8 10.20.21.113:7001 slave d20242223349a4d51b30da40dad09f0dbc9a2cc8 0 1461081575219 1 connected
d20242223349a4d51b30da40dad09f0dbc9a2cc8 10.20.21.44:7000 myself,master - 0 0 1 connected 0-5461
ba12c571516b130a9973fdc95ba8cebbfaf5ac7e 10.20.21.113:7000 master - 0 1461081576222 3 connected 5462-10922
4a21be5ffb25369bf4243918c34467103fa9c8a4 10.20.21.59:7001 slave ba12c571516b130a9973fdc95ba8cebbfaf5ac7e 0 1461081577222 3 connected
f6b4e5a94409ca397731165acad90f2193154c62 10.20.21.59:7000 master - 0 1461081576422 0 connected 10923-16383
6a7c1e8510d29aed17dc1e0ca7f0e6a7278b4de5 10.20.21.44:7001 slave f6b4e5a94409ca397731165acad90f2193154c62 0 1461081576721 5 connected
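
For reference, a cluster like this is typically wired into Redisson roughly as in the sketch below. It is illustrative only (not the exact client code behind this issue) and uses the current Redisson API; the 2.2.x line discussed here differs slightly in package names and address format.

// Minimal sketch: connecting Redisson to the three-host cluster listed above.
import org.redisson.Redisson;
import org.redisson.api.RBucket;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class ClusterClientSketch {
    public static void main(String[] args) {
        Config config = new Config();
        config.useClusterServers()
              // re-read the cluster topology every 2 seconds so failovers are noticed
              .setScanInterval(2000)
              // seed nodes; the rest of the cluster is discovered from them
              .addNodeAddress("redis://10.20.21.44:7000",
                              "redis://10.20.21.113:7000",
                              "redis://10.20.21.59:7000");

        RedissonClient redisson = Redisson.create(config);
        try {
            // simple smoke test against the cluster
            RBucket<String> bucket = redisson.getBucket("smoke-test-key");
            bucket.set("value");
            System.out.println(bucket.get());
        } finally {
            redisson.shutdown();
        }
    }
}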

I then shut down the Redis instances on 10.20.21.113: ba12c571516b130a9973fdc95ba8cebbfaf5ac7e and aa19c3c2a927384b6a4673539cd7dcdb2efa33e8.

When I did this, the Redis cluster reorganized to the following:
aa19c3c2a927384b6a4673539cd7dcdb2efa33e8 10.20.21.113:7001 slave,fail d20242223349a4d51b30da40dad09f0dbc9a2cc8 1461081896304 1461081894700 1 disconnected
d20242223349a4d51b30da40dad09f0dbc9a2cc8 10.20.21.44:7000 myself,master - 0 0 1 connected 0-5461
ba12c571516b130a9973fdc95ba8cebbfaf5ac7e 10.20.21.113:7000 master,fail - 1461081896204 1461081895202 3 disconnected
4a21be5ffb25369bf4243918c34467103fa9c8a4 10.20.21.59:7001 master - 0 1461081995791 6 connected 5462-10922
f6b4e5a94409ca397731165acad90f2193154c62 10.20.21.59:7000 master - 0 1461081994787 0 connected 10923-16383
6a7c1e8510d29aed17dc1e0ca7f0e6a7278b4de5 10.20.21.44:7001 slave f6b4e5a94409ca397731165acad90f2193154c62 0 1461081996293 5 connected

As you can see, the cluster itself is OK, but requests are failing for some reason.
If I restart Tomcat without touching anything on the Redis side, requests stop failing.
Very strange.

This is the link to the redisson log: https://www.dropbox.com/s/pluyzxtxpd0u23z/redisson.log?dl=0

Please assist,
Thank you!

mrniko commented Apr 19, 2016

As I can see, you have shut down all nodes for the slot range [0-5461], so any objects bound to this range will be unavailable and all requests for them will fail too.
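
For background on why that would happen: Redis Cluster maps every key to one of 16384 hash slots via CRC16(key) mod 16384, so a key whose slot falls in 0-5461 has nowhere to go once every node serving that range is down. A standalone sketch of the slot calculation (not Redisson code; hash tags are ignored for simplicity):

// Sketch of Redis Cluster's key -> slot mapping: slot = CRC16(key) mod 16384.
public class SlotSketch {

    // CRC16/XMODEM, the variant the Redis Cluster spec uses (poly 0x1021, init 0x0000).
    static int crc16(byte[] bytes) {
        int crc = 0x0000;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    static int slot(String key) {
        return crc16(key.getBytes(java.nio.charset.StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        // Any key whose slot lands in 0-5461 cannot be served
        // once the whole node set for that range is down.
        for (String key : new String[] {"user:1", "session:abc", "foo"}) {
            System.out.println(key + " -> slot " + slot(key));
        }
    }
}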

mrniko commented Apr 19, 2016

I didn't review your log before, sorry. It seems like a real bug. I will fix it.

mrniko added the bug label Apr 19, 2016
mrniko added this to the 2.2.12 milestone Apr 19, 2016

mrniko commented Apr 19, 2016

Could you please update to the build from "master" and try it? It contains a fix for a bug with slot migration. Then please share your log.

ybonda commented Apr 19, 2016

Sorry, I didn't understand your request.
What exactly do you want me to do? Retry my scenario with the latest Redisson build?

mrniko commented Apr 20, 2016

Yes

mrniko pushed a commit that referenced this issue Apr 20, 2016

mrniko commented Apr 20, 2016

Fixed. Please try

ybonda commented Apr 20, 2016

This is the log of exactly the same scenario, but with the latest redisson-2.2.12 (which seems to be from before your fix :)
Please see: https://www.dropbox.com/s/7vm1g028vhfz2an/redisson_2.2.12.log?dl=0

ybonda commented Apr 20, 2016

I checked the same scenario again after your fix, with the same results: failing requests.
JMeter output:
summary + 1 in 0s = 35.7/s Avg: 28 Min: 28 Max: 28 Err: 0 (0.00%) Active: 3 Started: 3 Finished: 0
summary + 102 in 7.2s = 14.2/s Avg: 178 Min: 3 Max: 4445 Err: 4 (3.92%) Active: 10 Started: 10 Finished: 0
summary = 103 in 7.2s = 14.3/s Avg: 177 Min: 3 Max: 4445 Err: 4 (3.88%)
summary + 118 in 10s = 11.9/s Avg: 523 Min: 3 Max: 4405 Err: 14 (11.86%) Active: 17 Started: 17 Finished: 0
summary = 221 in 13.2s = 16.7/s Avg: 361 Min: 3 Max: 4445 Err: 18 (8.14%)
summary + 123 in 10.4s = 11.9/s Avg: 894 Min: 2 Max: 4417 Err: 25 (20.33%) Active: 25 Started: 25 Finished: 0
summary = 344 in 19.2s = 17.9/s Avg: 552 Min: 2 Max: 4445 Err: 43 (12.50%)
summary + 136 in 10s = 13.6/s Avg: 1067 Min: 2 Max: 4408 Err: 33 (24.26%) Active: 32 Started: 32 Finished: 0
summary = 480 in 25s = 19.2/s Avg: 698 Min: 2 Max: 4445 Err: 76 (15.83%)

Attaching the Redisson log (after the fix): https://www.dropbox.com/s/eyqllb66qz49w6b/redisson_2.2.12_after_fix.log?dl=0

mrniko commented Apr 20, 2016

It's strange: I don't see any "cluster nodes state from" entries after 20 Apr 2016 14:55:56 in your log. The cluster state update method should try the next cluster node to get the cluster state, but in your case something goes wrong.
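
Conceptually, the fallback described here, asking the next known node for CLUSTER NODES once the current one stops answering, looks like the sketch below. This is only an illustration (written with Jedis, which is mentioned later in this thread), not Redisson's actual update code.

import java.util.Arrays;
import java.util.List;
import redis.clients.jedis.Jedis;

public class ClusterStateProbe {
    // Known nodes of the cluster from this issue; a real client keeps this list
    // up to date from previous CLUSTER NODES replies.
    static final List<String> NODES = Arrays.asList(
            "10.20.21.44:7000", "10.20.21.113:7000", "10.20.21.59:7000");

    public static void main(String[] args) {
        for (String node : NODES) {
            String[] hostPort = node.split(":");
            try (Jedis jedis = new Jedis(hostPort[0], Integer.parseInt(hostPort[1]))) {
                // The first node that answers provides the fresh topology.
                System.out.println("cluster nodes state from " + node + ":");
                System.out.println(jedis.clusterNodes());
                return;
            } catch (Exception e) {
                // Node is down or unreachable: fall through to the next one.
                System.err.println("skipping " + node + ": " + e.getMessage());
            }
        }
        System.err.println("no cluster node reachable");
    }
}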

ybonda commented Apr 20, 2016

We also see the "Connection pool exhausted!" error all the time; maybe these are related?

20 Apr 2016 14:57:12  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
20 Apr 2016 14:57:12  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
20 Apr 2016 14:57:13  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
20 Apr 2016 14:57:13  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
20 Apr 2016 14:57:13  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
20 Apr 2016 14:57:14  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
20 Apr 2016 14:57:14  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
20 Apr 2016 14:57:14  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size

20 Apr 2016 14:57:36  ERROR - An error occurred when opening a session. Error: Connection pool exhausted! All connections are busy. Increase connection pool size.
org.redisson.client.RedisConnectionException: Connection pool exhausted! All connections are busy. Increase connection pool size.
	at org.redisson.connection.pool.ConnectionPool.get(ConnectionPool.java:164) ~[redisson-2.2.12-SNAPSHOT.jar:?]
	at org.redisson.connection.pool.MasterConnectionPool.get(MasterConnectionPool.java:25) ~[redisson-2.2.12-SNAPSHOT.jar:?]
	at org.redisson.connection.MasterSlaveEntry.connectionWriteOp(MasterSlaveEntry.java:346) ~[redisson-2.2.12-SNAPSHOT.jar:?]
	at org.redisson.connection.MasterSlaveConnectionManager.connectionWriteOp(MasterSlaveConnectionManager.java:564) ~[redisson-2.2.12-SNAPSHOT.jar:?]
	at org.redisson.command.CommandAsyncService.async(CommandAsyncService.java:395) ~[redisson-2.2.12-SNAPSHOT.jar:?]
	at org.redisson.command.CommandAsyncService$7.run(CommandAsyncService.java:444) ~[redisson-2.2.12-SNAPSHOT.jar:?]
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:581) ~[netty-common-4.0.35.Final.jar:4.0.35.Final]
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:655) ~[netty-common-4.0.35.Final.jar:4.0.35.Final]
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:367) ~[netty-common-4.0.35.Final.jar:4.0.35.Final]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_71]
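
For what it's worth, the pool sizes that error message refers to live on the cluster configuration; raising them would look roughly like the sketch below (illustrative values and the current Redisson API, not the actual settings used here).

import org.redisson.Redisson;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class PoolSizeSketch {
    public static void main(String[] args) {
        Config config = new Config();
        config.useClusterServers()
              .addNodeAddress("redis://10.20.21.44:7000")
              // connections used for commands routed to masters (writes)
              .setMasterConnectionPoolSize(128)
              // connections used for reads routed to slaves
              .setSlaveConnectionPoolSize(128);

        RedissonClient redisson = Redisson.create(config);
        redisson.shutdown();
    }
}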

mrniko commented Apr 20, 2016

I'll take a look at this tomorrow

ybonda commented Apr 21, 2016

It's important to note that the problem occurs only if I shut down the Redis instances on 10.20.21.113.
When I shut down the Redis instances on 10.20.21.59, Redisson works well, even though I still see the "Connection pool exhausted!" error in the log.

mrniko commented Apr 21, 2016

Please try again

ybonda commented Apr 21, 2016

I ran the same scenario in different environments, and the problem always occurred when I shut down the Redis instances on the middle machine.

The scenario: https://www.dropbox.com/s/nei0va82j2hzymo/redis-cluster-middle-host-down.jpg?dl=0

@thelamon

+1 for this bug. I get it on staging every time I try to check the Redis Sentinel configuration.
Jedis works fine (the application uses both libraries).

mrniko commented Apr 21, 2016

@ybonda so nothing has changed at all with the new fix?

ybonda commented Apr 21, 2016

Not tested yet.

Will test it in 10 min

mrniko commented Apr 21, 2016

Will wait for your results

ybonda commented Apr 21, 2016

Yes, @mrniko!
This fixes the issue.
I tested not only the middle host but also all possible scenarios, shutting down different instances, and it worked.

Great job man!
Thanks a lot!

mrniko commented Apr 21, 2016

@ybonda At your service!

mrniko commented Apr 21, 2016

@thelamon could you please test it too?

@thelamon

@mrniko Thanks for the service! I can only test it next Monday :(

mrniko commented Apr 21, 2016

@thelamon OK, I'll close it then. If you have any problems, please post a comment.

mrniko closed this as completed Apr 21, 2016