Commit
cluster: attempt reconnect control conn more often
If the control connection is faulty and a metadata fetch therefore fails, further attempts to reconnect and fetch should take place more frequently. The motivation: if the control connection fails, the node may have changed its IP address, so we need to fetch new metadata as soon as possible to discover the new address. Therefore, ClusterWorker's sleep time is changed from 60 seconds to 1 second once a metadata fetch fails, and is reverted to 60 seconds only after a fetch succeeds.

This is still not good enough: if all nodes change their IPs at once, we will discover their new addresses only when the next metadata fetch is issued, which may happen as late as 60 seconds later (if the previous fetch succeeded). Hence, the next commit introduces immediate signalling that the control connection is broken, so that ClusterWorker instantly begins its phase of retrying every 1 second.
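The interval switch described above can be sketched as follows. This is a minimal illustration, not the actual ClusterWorker code: `RefreshSchedule`, `on_fetch_result`, and both constant names are hypothetical, and only the 60-second and 1-second values come from the commit message.

```rust
use std::time::Duration;

// Values taken from the commit message; the constant names are hypothetical.
pub const NORMAL_REFRESH_INTERVAL: Duration = Duration::from_secs(60);
pub const RETRY_REFRESH_INTERVAL: Duration = Duration::from_secs(1);

/// Hypothetical stand-in for the worker's sleep-time state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RefreshSchedule {
    interval: Duration,
}

impl RefreshSchedule {
    /// Start in the normal phase: one metadata fetch every 60 seconds.
    pub fn new() -> Self {
        Self {
            interval: NORMAL_REFRESH_INTERVAL,
        }
    }

    /// Record the outcome of a metadata fetch and return how long the
    /// worker should sleep before the next attempt.
    ///
    /// A failure switches to the 1-second retry interval; only a success
    /// reverts to the 60-second interval.
    pub fn on_fetch_result<T, E>(&mut self, result: &Result<T, E>) -> Duration {
        self.interval = if result.is_ok() {
            NORMAL_REFRESH_INTERVAL
        } else {
            RETRY_REFRESH_INTERVAL
        };
        self.interval
    }
}
```

In this sketch, a worker loop would call `on_fetch_result` after every fetch and sleep for the returned duration. Note that it does not cover the follow-up commit's behaviour: a broken control connection is noticed only when a fetch fails, not immediately.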