[BUG] Sentinel publishing wrong master switch info on switch-master event, whereas the Sentinel logs show the correct master switch info. #12798

Open
benimohit opened this issue Nov 22, 2023 · 3 comments

Comments

@benimohit

benimohit commented Nov 22, 2023

Describe the bug

Sentinel publishes the wrong master switch info on the +switch-master event, whereas the Sentinel logs show the correct master switch info.

To reproduce

I couldn't reproduce this on demand. It was detected with the Jedis client, which subscribes to +switch-master.
The problem happens when one of the 3 Sentinels is down for patching and the master switches in the meantime. When the patched Sentinel comes back up, it publishes +switch-master (the problem always happened with this particular publish).

Expected behavior

The Jedis client should only switch to the newly promoted master. Instead, on the late event it changes its master and then tries writing to a replica.


@enjoy-binbin
Collaborator

Can you post the event and the Sentinel logs?

@benimohit
Author

Sentinel logs
A Sentinel that stayed up during the failover:
17:X 20 Nov 2023 22:16:31.682 # +switch-master master-6371 ad1-fd1-401 6371 ad3-fd1-401 6371
The other Sentinel, publishing +switch-master once it's back up:
13:X 20 Nov 2023 22:27:53.117 # +switch-master master-6371 ad1-fd1-401 6371 ad3-fd1-401 6371

Client switching the pool to the new master (correctly):
20 Nov 2023 22:16:31.694 Created JedisSentinelPool to master at ad3-fd1-401:6371
Client switching the pool to a new master (incorrectly):
20 Nov 2023 22:27:53.132 Created JedisSentinelPool to master at ad1-fd1-401:6371
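
For reference when reading these lines: the Sentinel documentation gives the +switch-master payload as `<master name> <oldip> <oldport> <newip> <newport>`. Below is a minimal Python sketch that splits the logged payload into those fields. The names `SwitchMaster` and `parse_switch_master` are hypothetical and exist only for this illustration. This is not what Jedis does internally, and it parses the logged text, not the message actually delivered over Pub/Sub, which was not captured here.

```python
from typing import NamedTuple


class SwitchMaster(NamedTuple):
    """Fields of a +switch-master payload (hypothetical helper type)."""
    master_name: str
    old_host: str
    old_port: int
    new_host: str
    new_port: int


def parse_switch_master(payload: str) -> SwitchMaster:
    """Parse '<master name> <oldip> <oldport> <newip> <newport>'."""
    parts = payload.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {payload!r}")
    name, old_host, old_port, new_host, new_port = parts
    return SwitchMaster(name, old_host, int(old_port), new_host, int(new_port))


# Payload copied from the Sentinel log lines above (text after "+switch-master").
event = parse_switch_master("master-6371 ad1-fd1-401 6371 ad3-fd1-401 6371")
print(f"old master: {event.old_host}:{event.old_port}")
print(f"new master: {event.new_host}:{event.new_port}")
```

Read this way, both logged events name ad3-fd1-401:6371 as the new master, which matches the pool created at 22:16:31 but not the one created at 22:27:53.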

@benimohit
Author

benimohit commented Nov 29, 2023

My suspicion is on Sentinel, because we have been using the same Jedis client version for almost 2 years and only started seeing this issue after we made infrastructure changes that allow one Sentinel to be down during a failover.
It is also really hard to reproduce, and when it happens, it affects all 10-15 of our clients that use Jedis.
