zkfarmer wrongly thinks it's successfully connected to zookeeper #42
Comments
OK, a little more insight:
2018-08-22T21:28:37 [INFO]: Connecting to 10.240.82.12:2185
So here is the problem: the max_retries limit has been reached and the session has already been closed, but zkfarmer does nothing about it. It just hangs there. I see no point in that, since it will never attempt to connect again. I think it should exit with a failure, or at least exit so that another join transaction can be initiated. I also tested against a 3-node and a 5-node cluster. In the 5-node cluster kazoo tolerates the death of one node, while in the 3-node cluster it does not tolerate it at all. That seems strange to me, since zookeeper says the cluster should stay fully operational as long as a majority is there. Anyway, it is zookeeper that changes that state, so I guess nothing can be done here.
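For reference, the majority rule mentioned above can be written out as a small sketch. The function names are illustrative only and are not part of zookeeper or kazoo; the arithmetic is the usual strict-majority rule.

```python
def quorum_size(ensemble_size):
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many nodes can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 5):
    print(n, "nodes: quorum", quorum_size(n), "tolerates", tolerated_failures(n))
```

By this rule a 3-node ensemble should survive the loss of one node and a 5-node ensemble the loss of two, which is why the 3-node behavior described above looks unexpected.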
zkfarmer does not account for the transition from SUSPENDED to LOST: Valid State Transitions
Need to account for that.
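A minimal sketch of what accounting for that transition could look like, assuming a kazoo state listener is used. This is not the diff that fixed the issue. `SessionWatcher` and `should_exit` are hypothetical names; the state values are the plain strings that kazoo exposes as `KazooState.CONNECTED`, `KazooState.SUSPENDED` and `KazooState.LOST`, redefined here so the sketch runs without kazoo installed.

```python
import threading

# Same string values as kazoo.client.KazooState (redefined to stay standalone).
CONNECTED, SUSPENDED, LOST = "CONNECTED", "SUSPENDED", "LOST"


class SessionWatcher:
    """Hypothetical helper: records state changes and flags a lost session."""

    def __init__(self):
        self.history = []
        self.session_lost = threading.Event()

    def __call__(self, state):
        # Listeners run on kazoo's connection thread, so only record and signal.
        self.history.append(state)
        if state == LOST:
            # The session is gone, and its ephemeral nodes with it.
            self.session_lost.set()


def should_exit(watcher, timeout=0.0):
    """Return True once the session has been reported as LOST."""
    return watcher.session_lost.wait(timeout)


# With a real client this would be registered as: client.add_listener(watcher)
watcher = SessionWatcher()
for state in (CONNECTED, SUSPENDED, LOST):
    watcher(state)
```

The main loop can then poll `should_exit(watcher)` and either exit with a failure or start a fresh join, instead of hanging after SUSPENDED has turned into LOST.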
Resolved the problem with the following diff. It might be kinda ugly, but it's perfect for my current needs.
Can you please send a PR?
@rs, sure, how do I do that? It seems that I need to give you my key or something, because otherwise it's giving me a 403 :)
You need to fork the project, create a branch, and then send a PR through GitHub. You can find an article that describes this process in more detail here: https://github.com/susam/gitpr.
Fixed by #43
We use zkfarmer for third-party software registration in our project. It works great except for one strange thing, which we have hit a couple of times. The behavior is as follows:
zkfarmer connects to zookeeper successfully and runs for some time without a problem. Then (according to the logs from one of the zookeeper nodes) one node becomes unreachable, which means the quorum is broken since we have only 3 nodes (zookeeper ideally advises 5 at the moment). zookeeper stays reachable overall, since we access it through a load balancer link. The third node comes back quickly (maybe 10 minutes overall). zkfarmer does not report a connection drop or any other problem anywhere in the logs (I'm watching the zkfarmer process). So it's there and it thinks it is still connected to zookeeper, while in fact it's not. If you look in zookeeper you won't find the ephemeral node, which is an obvious indication that zkfarmer is not connected.
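The manual check described above could be automated roughly as follows. This is only a sketch: `node_is_registered` and the example path are made up for illustration, and the `client` argument is assumed to behave like a kazoo `KazooClient`, whose `exists(path)` returns a stat object for an existing node and `None` otherwise. A tiny stand-in client is used so the snippet runs without a zookeeper ensemble.

```python
def node_is_registered(client, path):
    """Return True if the node at `path` currently exists."""
    return client.exists(path) is not None


class FakeClient:
    """Stand-in for a connected client, for illustration only."""

    def __init__(self, existing_paths):
        self._paths = set(existing_paths)

    def exists(self, path):
        # A real client returns a stat object; any non-None value will do here.
        return object() if path in self._paths else None


# Hypothetical paths: one registered node, one that has disappeared.
client = FakeClient({"/services/example/host-a"})
print(node_is_registered(client, "/services/example/host-a"))
print(node_is_registered(client, "/services/example/host-b"))
```

Running such a check from a separate, healthy connection would show the missing ephemeral node even while the zkfarmer process itself reports no problem.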