Description
Version: spring-session-data-redis-2.2.5.RELEASE
In my application, I am using Jedis to connect to an Amazon Redis cluster. Everything worked well until recently, when AWS forced an upgrade of all Redis servers from version 5.0.3 to 5.0.6.
What we observed is that after the upgrade, the application kept working without any restart. However, org.springframework.data.redis.connection.RedisConnectionCommands#ping took much longer than normal. If we restart the application, the ping command is fast again.
We spent a lot of time analyzing this issue because it had a big impact on our application. In the analysis, we found that the problem comes from the org.springframework.data.redis.connection.jedis.JedisClusterConnection.JedisClusterTopologyProvider#getTopology(..) method. During the Redis version upgrade (we don't know exactly what Amazon did), a new cluster was created with new IP addresses and its own slaves. The nodes from the old cluster are still cached, but they are no longer reachable.

The current logic shuffles the list of cached nodes, then loops over it and tries to reach each node in turn. So, if the unreachable nodes end up at the beginning of the list, it takes time to get to a valid node. This explains why the ping command is randomly slow or fast: it depends on the result of the shuffle method.
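To illustrate why the latency looks random, here is a minimal, self-contained simulation of the "shuffle, then try each cached node in order" behavior described above. It is not the actual code of JedisClusterTopologyProvider and does not use Jedis or Spring Data Redis; the node names, the fixed 2,000 ms timeout, and the cost model (each dead node costs one full timeout, a live node answers instantly) are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical simulation of a topology lookup that shuffles the cached
 * nodes and then tries them one by one. Not Spring Data Redis code.
 */
public class TopologyLookupSketch {

    /** A cached cluster node; reachable == false models a node removed by the upgrade. */
    record Node(String name, boolean reachable) {}

    /** Outcome of one simulated lookup: which node answered and the time spent waiting. */
    record Lookup(String answeredBy, long elapsedMs) {}

    /**
     * Shuffles a copy of the cached nodes, then tries them in order.
     * Assumption: every unreachable node costs a full timeout before the
     * next node is tried, and a reachable node answers immediately.
     */
    static Lookup lookup(List<Node> cached, long timeoutMs, Random rnd) {
        List<Node> candidates = new ArrayList<>(cached);
        Collections.shuffle(candidates, rnd);
        long elapsed = 0;
        for (Node node : candidates) {
            if (node.reachable()) {
                return new Lookup(node.name(), elapsed);
            }
            elapsed += timeoutMs; // waited for a stale node to time out
        }
        throw new IllegalStateException("no reachable node in cached topology");
    }

    public static void main(String[] args) {
        // Two stale nodes from the old cluster, two live nodes from the new one.
        List<Node> cached = List.of(
                new Node("old-1", false), new Node("old-2", false),
                new Node("new-1", true), new Node("new-2", true));
        Random rnd = new Random(42);
        for (int i = 0; i < 5; i++) {
            Lookup result = lookup(cached, 2_000, rnd);
            System.out.println("answered by " + result.answeredBy()
                    + " after " + result.elapsedMs() + " ms");
        }
    }
}
```

Across repeated runs, the simulated wait is 0, 2,000 or 4,000 ms depending on how many stale nodes the shuffle happened to place first, which matches the pattern of ping being randomly slow or fast until the cached topology is rebuilt by a restart.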
Could the team take a look and confirm whether this is really a bug?
Thanks and best regards