Not able to read the updated Master when connected Master/Slave through sentinel #1293
Comments
Lettuce listens continuously to the Sentinel Pub/Sub channels for topology updates. This approach is the most elaborate one, as Sentinels actively publish changes in master and replica configuration. Can you provide a simple, reproducible test case, or the logs from the time of the failover until the command failure?
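For reference, the mechanism described above is what Lettuce's Master/Replica API wires up when it is handed a Sentinel `RedisURI`. A minimal connection sketch follows; the sentinel hostnames and the `mymaster` id are placeholders for your own deployment, and this is a configuration sketch rather than the reporters' actual setup:

```java
import java.time.Duration;

import io.lettuce.core.ReadFrom;
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.codec.StringCodec;
import io.lettuce.core.masterreplica.MasterReplica;
import io.lettuce.core.masterreplica.StatefulRedisMasterReplicaConnection;

public class SentinelConnectSketch {
    public static void main(String[] args) {
        // Placeholder Sentinel endpoints and master id -- adjust to your deployment.
        RedisURI sentinelUri = RedisURI.Builder
                .sentinel("sentinel-1", 26379)
                .withSentinel("sentinel-2", 26379)
                .withSentinelMasterId("mymaster")
                .withTimeout(Duration.ofSeconds(10))
                .build();

        RedisClient client = RedisClient.create();

        // MasterReplica.connect subscribes to the Sentinels' Pub/Sub channels
        // and re-routes writes to the new master after a failover.
        StatefulRedisMasterReplicaConnection<String, String> connection =
                MasterReplica.connect(client, StringCodec.UTF8, sentinelUri);
        connection.setReadFrom(ReadFrom.REPLICA_PREFERRED);

        System.out.println(connection.sync().ping());

        connection.close();
        client.shutdown();
    }
}
```

Running this requires a live Sentinel deployment, so it is shown here only to make the Pub/Sub-based topology-update path concrete.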
@mp911de I face this issue intermittently. I tried it a few times, and here are the logs that I see. 1. I failed the Redis master and it recovered fine. I see the errors below after repeating the above step a few times (logs attached). I'm actually unsure whether I'm missing something in the configuration or whether it is a bug.
Hello guys! This is the driver configuration we're using:
We're using version 6.0.1, but we've been facing this issue for a long time across other versions as well. In the application's log we see this kind of message:
Is there some kind of debugging we can do to get more information from the driver itself?
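Lettuce logs through SLF4J, and its loggers are named after its packages, so raising `io.lettuce.core` to `DEBUG` surfaces connection and topology activity. A minimal logback fragment, assuming a Logback-based setup (adapt the logger names to your logging backend):

```xml
<!-- logback.xml: raise Lettuce's internals to DEBUG to see topology lookups -->
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder><pattern>%d %-5level %logger{36} - %msg%n</pattern></encoder>
  </appender>
  <logger name="io.lettuce.core" level="DEBUG"/>
  <root level="INFO"><appender-ref ref="STDOUT"/></root>
</configuration>
```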
I don't know if it's completely related, but we faced a somewhat similar issue some time ago, where newly started applications would sometimes complain about not finding a master and stay in a broken state. The only way to 'fix' them was by sending a SENTINEL RESET to one of the Sentinel servers, so all clients would get the info sent by the Sentinels. We couldn't quite figure out what was going wrong through the debug logging. After more digging, we did stumble upon a change from some time ago where we had reduced the Redis timeout from 10s to 1s. After reverting this change, the problem stopped occurring.
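The timeout the comment above refers to can be set on the `RedisURI` itself. A small sketch of reverting it to 10 seconds (the sentinel host and master id are placeholders, not taken from the thread):

```java
import java.time.Duration;

import io.lettuce.core.RedisURI;

public class TimeoutSketch {
    public static void main(String[] args) {
        // Placeholder endpoint and master id; the relevant part is withTimeout.
        RedisURI uri = RedisURI.Builder
                .sentinel("sentinel-1", 26379)
                .withSentinelMasterId("mymaster")
                .withTimeout(Duration.ofSeconds(10)) // reverted from 1s back to 10s
                .build();

        System.out.println(uri.getTimeout()); // PT10S
    }
}
```

An aggressive 1s timeout can cause commands (including the client's own Sentinel lookups) to fail during a failover window, which is consistent with the symptom described above.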
I am seeing the same (or a similar) issue. I think my scenario was roughly the following (this was on my local dev setup, luckily):
With this in place, Lettuce continuously fails to reconnect to the master node if it was down at the time of the initial connection. Here is the stack trace (up to the first application-level line):
The proper Redis master node is on 192.168.97.12 in this case, but Lettuce never realizes it. @mp911de Is my assumption anywhere near correct that, because the 192.168.97.12 node was unavailable on startup, Lettuce removed it from its list of "potential master nodes", or is this a false assumption on my behalf?

The Sentinel nodes were also down when the connection was made, so... could it be that it never connected to the Sentinel nodes in this case (only to the Redis nodes)? That would be one plausible theory for why it never received the Pub/Sub topology update here. Note, I'm only guessing; I haven't looked at the Lettuce internals in this case. On the other hand, issuing the Sentinel reset manually did indeed make it work, so I guess it must have been connected to Sentinel at that point at least...

Could it be like this: Sentinel was down when Lettuce first tried to connect => Lettuce never got the initial Pub/Sub topology updates. Once Lettuce had managed to connect, all was fine in terms of subsequent topology updates, but the actual updates from when 192.168.97.12 was made the master were gone => it never managed to recover. Distributed systems are indeed hard... 😄
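The manual reset mentioned in several comments is an ordinary Sentinel command issued via `redis-cli`. A sketch, with a placeholder Sentinel host and the `mymaster` name standing in for your configured master id:

```shell
# Forces this Sentinel to discard its state for masters matching the pattern
# and rediscover replicas and other Sentinels from scratch.
redis-cli -h sentinel-1 -p 26379 SENTINEL RESET mymaster
```

The commenters above observed that this nudged stuck clients back into a consistent view of the topology, though it is a workaround rather than a fix.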
Another theory: it could actually be that 192.168.97.12 was already the master. While it was firewalled, no other master could be elected. Once it came back, the Redis slaves would reconnect to it nicely, but no topology update was published, since the topology didn't actually change. (It was more a matter of "the master came back"; the problem was just that Lettuce had never managed to connect to this node.)
We are also facing the same issue with the master-slave sentinel setup. Using lettuce 5.2.2.RELEASE. |
We are also facing the same issue using lettuce 5.1.8.RELEASE. |
I am also facing the same issue, currently using version 6.1.9. We recently introduced a Master/Slave setup; before, we were just connecting to the master node. The exception and some logs are attached. I was not able to identify the cause, but as a temporary fix I had to restart the service, and it went well after that.
Folks, to make any progress on this issue we would need a minimal reproducible example that could help pinpoint the problem.
Bug Report
I'm using a Master/Slave connection through Sentinel, with the configuration below to connect to Redis Sentinel.
Current Behavior
We host our app in Kubernetes. For the first few times the master node fails, Lettuce responds by updating the Sentinel configuration; after that, it returns an error saying "Cannot find the master node".
Here are the logs:
Expected behavior/code
The client should keep refreshing the topology continuously, even after receiving updated master details from Sentinel, if there is some issue with the current details.
Environment