New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[API-138, API-394] Set the cluster connection timeout to infinite by default #346
[API-138, API-394] Set the cluster connection timeout to infinite by default #346
Conversation
@@ -69,13 +69,14 @@ The following are configuration element descriptions: | |||
- ``retry_max_backoff``: Specifies the upper limit for the backoff in | |||
seconds. Its default value is ``30``. It must be non-negative. | |||
- ``retry_multiplier``: Factor to multiply the backoff after a failed | |||
retry. Its default value is ``1``. It must be greater than or equal | |||
retry. Its default value is ``1.05``. It must be greater than or equal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened to the old connection_attempt_limit
?
Is there no longer an option to control the number of connection attempts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, this configuration option is removed from the client in the 4.0 release with a more enhanced exponential backoff feature. You can get the almost same behavior by configuring the cluster_connect_timeout
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my use case, I just want to either attempt 1 or 2 connection attempts and then stop (as part of an integration test).
The exponential backoff is a nice feature (and it will be useful in production), but it looks like the logic for limiting the connection attempts is now somewhat convoluted.
Unless I just don't understand.
https://hazelcast.readthedocs.io/en/stable/client.html#hazelcast.client.HazelcastClient
cluster_connect_timeout
seems to relate to a single connection attempt, but I'm trying to shorten the total time (or total attempts) spent attempting to reconnect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can agree on some of your points but let me clarify the cluster_connect_timeout
a little bit.
It is actually an absolute timeout for whole connection attempts.
Let say that you specified 3 possible member addresses in your configuration. Client first will store the time it started trying to connect the cluster. (Let say that it is start_time
)
Then, client tries to connect at least one of the three. If it cannot connect any of them, then client tries to sleep a little bit. But before sleeping, it will check the the time passed since the first connection attempt(time.time() - start_time
). If it is more than the cluster_connect_timeout
, it will throw an exception saying that client could not connect to any of the members. If it is less than that, it will sleep and try to connect possible addresses again. This loop continues until the client reaches the cluster_connect_timeout
or connects one of the members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same logic applies to reconnection process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdumandag thanks for the quick response and clarification, that's very helpful.
Timeout value in seconds for the client to give up
a
connection attempt to the cluster. Must be non-negative. By default, set to 120.0.
Should this description be adjusted? The "a connection attempt" is what threw me off.
I'd be happy to change it.
Sorry for hijacking the PR 😅 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem at all, happy to help you. Let me rephrase that part tomorrow
d29e0ee
to
578b30d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR looks good other than a few nitpicks.
docs/client_connection_strategy.rst
Outdated
When the client is disconnected from the cluster or trying to connect | ||
to a one for the first time, it searches for new connections. You can | ||
configure the frequency of the connection attempts and the client | ||
shutdown behavior using the arguments below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one would read a bit better: "The client searches for new connections when it is trying to connect to the cluster. Both the frequency of connection attempts and the client shutdown behavior can be configured using the arguments below."
docs/client_connection_strategy.rst
Outdated
@@ -86,6 +87,7 @@ A pseudo-code is as follows: | |||
while (try_connect(connection_timeout)) != SUCCESS) { | |||
if (get_current_time() - begin_time >= CLUSTER_CONNECT_TIMEOUT) { | |||
// Give up to connecting to the current cluster and switch to another if exists. | |||
// For the default values, CLUSTER_CONNECT_TIMEOUT is infinite. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about: "CLUSTER_CONNECT_TIMEOUT is infinite by default."?
hazelcast/connection.py
Outdated
# If the no timeout is specified by the | ||
# user, or set to -1 explicitly, set | ||
# the timeout to infinite. | ||
cluster_connect_timeout = six.MAXSIZE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I've checked sys.maxsize
is available on all versions of CPython. Python 2.7 and Python 3.6 both evaluate sys,maxsize
to 9223372036854775807
. So, I think it would be better to replace six.MAXSIZE
with sys.maxsize
.
hazelcast/config.py
Outdated
@@ -531,8 +531,8 @@ def __init__(self): | |||
self._retry_initial_backoff = 1.0 | |||
self._retry_max_backoff = 30.0 | |||
self._retry_jitter = 0.0 | |||
self._retry_multiplier = 1.0 | |||
self._cluster_connect_timeout = 120.0 | |||
self._retry_multiplier = 1.05 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move 1.05
to a global variable, something like DEFAULT_RETRY_MULTIPLIER
and replace instances of it?
hazelcast/config.py
Outdated
self._retry_multiplier = 1.0 | ||
self._cluster_connect_timeout = 120.0 | ||
self._retry_multiplier = 1.05 | ||
self._cluster_connect_timeout = -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move -1
to a global variable, something like DEFAULT_CLUSTER_CONNECT_TIMEOUT
and replace instances of it?
docs/client_connection_strategy.rst
Outdated
to give up to connect to the current cluster. Its default value is | ||
``120``. | ||
to give up connecting to the cluster. Its default value is | ||
``-1``. For the default value, client will not stop trying to connect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can rephrase "For the default value ..." as "By default ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe even better: "The client will continuously try to connect by default."
We decided to set the default cluster connection timeout to infinite to have a better user experience out-of-the-box for most of the users/use cases. Note that, this is a breaking change for the users who rely on the client to shutdown after some time. Affected users might set the `cluster_connect_timeout` to a finite value. Also, updated the default value of the `retry_multiplier` to back off more after the client tries to connect to the target cluster for some time.
578b30d
to
3fccbcc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks for the review @yuce |
We decided to set the default cluster connection timeout to
infinite to have a better user experience out-of-the-box
for most of the users/use cases. Note that, this is a breaking
change for the users who rely on the client to shutdown after
some time. Affected users might set the
cluster_connect_timeout
to a finite value.
Also, updated the default value of the
retry_multiplier
to back offmore after the client tries to connect to the target cluster for
some time.
Closes #335
Closes #306