Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved connection/reconnection handling, plus minor fixes #9

Merged
merged 1 commit into from
Apr 12, 2016

Conversation

GlenWalker
Copy link
Contributor

We've been using this patched version in heavy production for a while now, and wanted to contribute our improvements back upstream. The main changes relate to improved handling of DNS round-robin, reconnection / expiry, and fixes to kill_session.

  • Correctly handle connections to ZooKeeper using DNS names that resolve to multiple IP addresses
    • Resolve round-robin DNS names into list of server IP addresses
  • Rotate through server list when connecting, same as Java implementation
    • See StaticHostProvider.java: StaticHostProvider.next(), StaticHostProvider.onConnected()
  • Use 1000ms reconnect timer, like Java implementation does
    • See ClientCnxn.java: ClientCnxn::SendThread.startConnect()
  • Only resolve server DNS names on creation, not for every connection attempt, same as Java implementation
  • Handle reply timeouts when connecting to ZooKeeper
  • Retry connection even if no servers available on initial attempt
  • Ignore message for any sockets that aren't related to the current TCP connection
  • Never try to send data on a closed socket, and don't attempt calls when not connected
  • Improved session expiration handling
    • Reconnect after expire now optional
    • Connection state monitor notified of session expiry as soon as it is detected, and of connection after reconnection
  • Notify callers of error when socket an outstanding call was sent on is closed
  • Recently added kill_session method was broken, it was using a cast but expecting a call
    • Use gen_server:call for kill_session since we expect to handle_call for kill_session
  • Use correct state variable in handle_call for kill_session
  • We don't need a heartbeat watcher on TCP connections used purely for kill_session
  • TCP connection used for kill_session should not be active, we don't want tcp_closed messages from it
  • Use rand module rather than random, so initial seed is not constant and list is shuffled differently each time (18+)
    • random: "The new and improved rand module should be used instead of this module,"
  • Use linger option on TCP sockets to improve chances of connection close message being delivered to server
  • Refactor duplicated connection closing code into function close_connection

Correctly handle connections to ZooKeeper using DNS names that resolve to multiple IP addresses
* Resolve round-robin DNS names into list of server IP addresses
Rotate through server list when connecting, same as Java implementation
* See StaticHostProvider.java:  StaticHostProvider.next(), StaticHostProvider.onConnected()
Use 1000ms reconnect timer, like Java implementation does
* See ClientCnxn.java: ClientCnxn::SendThread.startConnect()
Only resolve server DNS names on creation, not for every connection attempt, same as Java implementation
Handle reply timeouts when connecting to ZooKeeper
Retry connection even if no servers available on initial attempt

Ignore message for any sockets that aren't related to the current TCP connection

Never try to send data on a closed socket, and don't attempt calls when not connected

Improved session expiration handling
* Reconnect after expire now optional
* Connection state monitor notified of session expiry as soon as it is detected, and of connection after reconnection

Notify callers of error when socket an outstanding call was sent on is closed

Recently added kill_session method was broken, it was using a cast but expecting a call
* Use gen_server:call for kill_session since we expect to handle_call for kill_session
Use correct state variable in handle_call for kill_session
We don't need a heartbeat watcher on TCP connections used purely for kill_session
TCP connection used for kill_session should not be active, we don't want tcp_closed messages from it

Use rand module rather than random, so initial seed is not constant and list is shuffled differently each time (18+)
* random: "The new and improved rand module should be used instead of this module,"

Use linger option on TCP sockets to improve chances of connection close message being delivered to server

Refactor duplicated connection closing code into function close_connection
@belltoy
Copy link
Member

belltoy commented Apr 12, 2016

Great improvements!

Thanks for your contribution!

@belltoy belltoy merged commit 38739e7 into huaban:master Apr 12, 2016
@GlenWalker
Copy link
Contributor Author

Awesome, should we give it a new version tag?

@GlenWalker
Copy link
Contributor Author

... and bump the version number in package.exs and erlzk.app.src?

@belltoy
Copy link
Member

belltoy commented Apr 12, 2016

@GlenWalker Yes, of course. It's done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants