[API-138, API-394] Set the cluster connection timeout to infinite by default #346

mdumandag · 2021-01-28T11:48:48Z

We decided to set the default cluster connection timeout to
infinite to provide a better out-of-the-box experience for most
users and use cases. Note that this is a breaking change for users
who rely on the client shutting down after some time. Affected
users can set `cluster_connect_timeout` to a finite value.

Also, the default value of `retry_multiplier` is updated so that
the client backs off more as it keeps trying to connect to the
target cluster.
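To make the effect of the new multiplier concrete, here is a small sketch that computes the sleep durations between connection rounds. It assumes the documented defaults from this PR (initial backoff 1.0 s, max backoff 30.0 s, jitter 0.0, multiplier 1.05 instead of 1.0) and the usual exponential update `backoff = min(backoff * multiplier, max_backoff)`; the function `backoff_schedule` is hypothetical and not part of the client.

```python
def backoff_schedule(n, initial=1.0, multiplier=1.05, max_backoff=30.0):
    """Return the first n sleep durations (seconds) between connection rounds.

    Assumes zero jitter and backoff = min(backoff * multiplier, max_backoff).
    """
    schedule = []
    backoff = initial
    for _ in range(n):
        schedule.append(backoff)
        # Grow the backoff for the next round, never exceeding the cap.
        backoff = min(backoff * multiplier, max_backoff)
    return schedule


old_default = backoff_schedule(5, multiplier=1.0)  # constant 1 s between rounds
new_default = backoff_schedule(5)                   # grows by 5% per round
print(old_default)
print([round(b, 4) for b in new_default])
```

Under these assumptions the backoff reaches the 30 s cap after 70 rounds, by which point the client has slept for roughly 588 seconds in total; with the old multiplier of 1.0 it would keep retrying every second indefinitely.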

Closes #335
Closes #306

Kilo59 · 2021-02-02T13:30:19Z

docs/client_connection_strategy.rst

@@ -69,13 +69,14 @@ The following are configuration element descriptions:
 - ``retry_max_backoff``: Specifies the upper limit for the backoff in
  seconds. Its default value is ``30``. It must be non-negative.
 - ``retry_multiplier``: Factor to multiply the backoff after a failed
-  retry. Its default value is ``1``. It must be greater than or equal
+  retry. Its default value is ``1.05``. It must be greater than or equal


What happened to the old `connection_attempt_limit`?

Is there no longer an option to control the number of connection attempts?

Hi, this configuration option was removed from the client in the 4.0 release in favor of the enhanced exponential backoff feature. You can get almost the same behavior by configuring the `cluster_connect_timeout` parameter.

In my use case, I just want to make 1 or 2 connection attempts and then stop (as part of an integration test).

The exponential backoff is a nice feature (and it will be useful in production), but it looks like the logic for limiting the connection attempts is now somewhat convoluted.

Unless I just don't understand.

https://hazelcast.readthedocs.io/en/stable/client.html#hazelcast.client.HazelcastClient

cluster_connect_timeout seems to relate to a single connection attempt, but I'm trying to shorten the total time (or total attempts) spent attempting to reconnect.

I agree with some of your points, but let me clarify `cluster_connect_timeout` a little.

It is actually an absolute timeout for the whole connection process, not for a single attempt.

Let's say you specified three possible member addresses in your configuration. The client first stores the time it started trying to connect to the cluster (call it `start_time`).

Then, the client tries to connect to at least one of the three. If it cannot connect to any of them, it sleeps for a while before trying again. Before sleeping, though, it checks the time elapsed since the first connection attempt (`time.time() - start_time`). If that exceeds `cluster_connect_timeout`, it raises an exception saying that the client could not connect to any of the members. Otherwise, it sleeps and tries the possible addresses again. This loop continues until the client either reaches `cluster_connect_timeout` or connects to one of the members.

The same logic applies to the reconnection process.
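The loop described above can be sketched in a few lines of Python. This is an illustration, not the client's code in `hazelcast/connection.py`: `try_connect(address)` is a hypothetical callable returning `True` on success, `ClusterConnectTimeoutError` is a stand-in exception, jitter is omitted, and the clock and sleep functions are injectable so the loop can be exercised without a real cluster.

```python
import time


class ClusterConnectTimeoutError(Exception):
    """Hypothetical stand-in for the error raised when the client gives up."""


def connect_to_cluster(addresses, try_connect, cluster_connect_timeout,
                       initial_backoff=1.0, max_backoff=30.0, multiplier=1.05,
                       clock=time.time, sleep=time.sleep):
    """Return the first address that connects, or raise once the overall
    cluster_connect_timeout has elapsed. The timeout bounds the whole
    process, not a single attempt. Jitter is omitted for clarity."""
    start_time = clock()
    backoff = initial_backoff
    while True:
        # One round: try each configured member address in turn.
        for address in addresses:
            if try_connect(address):
                return address
        # Before sleeping, check how long we have been trying overall.
        if clock() - start_time >= cluster_connect_timeout:
            raise ClusterConnectTimeoutError(
                "Unable to connect to any of %s" % (addresses,))
        sleep(backoff)
        # Grow the backoff for the next round, capped at max_backoff.
        backoff = min(backoff * multiplier, max_backoff)


class FakeClock(object):
    """Deterministic clock: sleep() advances time instantly."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def count_rounds(timeout):
    """Count connection rounds made against members that never respond."""
    clock = FakeClock()
    round_starts = set()

    def always_fail(address):
        # Every attempt in a round happens at the same fake instant.
        round_starts.add(clock.now)
        return False

    try:
        connect_to_cluster(["10.0.0.1:5701", "10.0.0.2:5701", "10.0.0.3:5701"],
                           always_fail, timeout,
                           clock=clock.time, sleep=clock.sleep)
    except ClusterConnectTimeoutError:
        pass
    return len(round_starts)
```

Under this sketch's assumptions (instant attempts, no jitter), `count_rounds(0)` makes one round, `count_rounds(1.0)` makes two, and `count_rounds(2.0)` makes three. That suggests one way to approximate the old "one or two attempts" behavior in an integration test: a `cluster_connect_timeout` no larger than the initial backoff. In a real client each attempt also takes time, so treat these counts as approximate.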

@mdumandag thanks for the quick response and clarification, that's very helpful.

Timeout value in seconds for the client to give up a connection attempt to the cluster. Must be non-negative. By default, set to 120.0.

Should this description be adjusted? The phrase "a connection attempt" is what threw me off.
I'd be happy to change it.

Sorry for hijacking the PR 😅 .

No problem at all, happy to help. Let me rephrase that part tomorrow.

yuce

The PR looks good other than a few nitpicks.

yuce · 2021-02-04T09:49:15Z

docs/client_connection_strategy.rst

+When the client is disconnected from the cluster or trying to connect
+to a one for the first time, it searches for new connections. You can
+configure the frequency of the connection attempts and the client
+shutdown behavior using the arguments below.


I think this one would read a bit better: "The client searches for new connections when it is trying to connect to the cluster. Both the frequency of connection attempts and the client shutdown behavior can be configured using the arguments below."

yuce · 2021-02-04T10:04:39Z

docs/client_connection_strategy.rst

@@ -86,6 +87,7 @@ A pseudo-code is as follows:
    while (try_connect(connection_timeout)) != SUCCESS) {
        if (get_current_time() - begin_time >= CLUSTER_CONNECT_TIMEOUT) {
            // Give up to connecting to the current cluster and switch to another if exists.
+            // For the default values, CLUSTER_CONNECT_TIMEOUT is infinite.


How about: "CLUSTER_CONNECT_TIMEOUT is infinite by default."?

yuce · 2021-02-04T10:13:05Z

hazelcast/connection.py

+            # If the no timeout is specified by the
+            # user, or set to -1 explicitly, set
+            # the timeout to infinite.
+            cluster_connect_timeout = six.MAXSIZE


As far as I've checked, sys.maxsize is available on all versions of CPython. Python 2.7 and Python 3.6 both evaluate sys.maxsize to 9223372036854775807. So I think it would be better to replace six.MAXSIZE with sys.maxsize.
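A minimal sketch of the suggestion, assuming a hypothetical helper `resolve_cluster_connect_timeout` and the `-1` "no timeout" sentinel used in this PR. Note that `sys.maxsize` is `2**63 - 1` on 64-bit builds and `2**31 - 1` on 32-bit builds; either way it is far beyond any realistic timeout in seconds.

```python
import sys

INFINITE_TIMEOUT_SENTINEL = -1  # hypothetical name for the "no timeout" marker


def resolve_cluster_connect_timeout(value=None):
    """Turn a user-supplied timeout into a finite number of seconds."""
    if value is None or value == INFINITE_TIMEOUT_SENTINEL:
        # sys.maxsize ships with the standard library on Python 2.7 and 3.x,
        # so there is no need to go through six.MAXSIZE.
        return sys.maxsize
    if value < 0:
        raise ValueError("cluster_connect_timeout must be non-negative or -1")
    return value
```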

yuce · 2021-02-04T10:16:11Z

hazelcast/config.py

@@ -531,8 +531,8 @@ def __init__(self):
        self._retry_initial_backoff = 1.0
        self._retry_max_backoff = 30.0
        self._retry_jitter = 0.0
-        self._retry_multiplier = 1.0
-        self._cluster_connect_timeout = 120.0
+        self._retry_multiplier = 1.05


Can we move 1.05 to a global variable, something like DEFAULT_RETRY_MULTIPLIER and replace instances of it?

yuce · 2021-02-04T10:16:45Z

hazelcast/config.py

-        self._retry_multiplier = 1.0
-        self._cluster_connect_timeout = 120.0
+        self._retry_multiplier = 1.05
+        self._cluster_connect_timeout = -1


Can we move -1 to a global variable, something like DEFAULT_CLUSTER_CONNECT_TIMEOUT and replace instances of it?
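One way the two constant suggestions could look, as a sketch: `DEFAULT_RETRY_MULTIPLIER` and `DEFAULT_CLUSTER_CONNECT_TIMEOUT` are the names proposed in the review, while `ConnectionRetrySettings` and the other `DEFAULT_*` names are hypothetical stand-ins for the real object in `hazelcast/config.py`. Defining each default once lets the validation code, docs, and tests refer to the same value.

```python
# Module-level defaults, defined once and reused wherever the value is needed.
DEFAULT_RETRY_INITIAL_BACKOFF = 1.0
DEFAULT_RETRY_MAX_BACKOFF = 30.0
DEFAULT_RETRY_JITTER = 0.0
DEFAULT_RETRY_MULTIPLIER = 1.05
DEFAULT_CLUSTER_CONNECT_TIMEOUT = -1  # -1 means "keep trying forever"


class ConnectionRetrySettings(object):
    """Hypothetical stand-in for the retry part of the client config."""

    def __init__(self):
        self._retry_initial_backoff = DEFAULT_RETRY_INITIAL_BACKOFF
        self._retry_max_backoff = DEFAULT_RETRY_MAX_BACKOFF
        self._retry_jitter = DEFAULT_RETRY_JITTER
        self._retry_multiplier = DEFAULT_RETRY_MULTIPLIER
        self._cluster_connect_timeout = DEFAULT_CLUSTER_CONNECT_TIMEOUT

    @property
    def retry_multiplier(self):
        return self._retry_multiplier

    @property
    def cluster_connect_timeout(self):
        return self._cluster_connect_timeout

    def is_cluster_connect_timeout_infinite(self):
        # Compare against the named constant instead of a bare -1 literal.
        return self._cluster_connect_timeout == DEFAULT_CLUSTER_CONNECT_TIMEOUT
```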

yuce · 2021-02-04T10:21:10Z

docs/client_connection_strategy.rst

-  to give up to connect to the current cluster. Its default value is
-  ``120``.
+  to give up connecting to the cluster. Its default value is
+  ``-1``. For the default value, client will not stop trying to connect


I think we can rephrase "For the default value ..." as "By default ..."

Maybe even better: "The client will continuously try to connect by default."

yuce

LGTM

mdumandag · 2021-02-11T09:41:06Z

Thanks for the review @yuce

mdumandag added the Type: Enhancement, Type: Documentation, and Source: Internal labels Jan 28, 2021

mdumandag added this to the 4.1 milestone Jan 28, 2021

mdumandag self-assigned this Jan 28, 2021

Kilo59 reviewed Feb 2, 2021

mdumandag force-pushed the infinite-cluster-connect-timeout branch 2 times, most recently from d29e0ee to 578b30d on February 4, 2021 09:10

yuce requested changes Feb 4, 2021

mdumandag added 2 commits February 5, 2021 14:38

update the cluster_connect_timeout documentation a bit

134f925

mdumandag mentioned this pull request Feb 9, 2021

connection_attempt_limit is not in latest version #366

Closed

address review comments

3fccbcc

mdumandag force-pushed the infinite-cluster-connect-timeout branch from 578b30d to 3fccbcc on February 11, 2021 09:27

yuce approved these changes Feb 11, 2021

mdumandag merged commit d459ff7 into hazelcast:master Feb 11, 2021

mdumandag deleted the infinite-cluster-connect-timeout branch February 11, 2021 09:41

mdumandag changed the title ~~Set the cluster connection timeout to infinite by default~~ [API-138] Set the cluster connection timeout to infinite by default Apr 15, 2021

degerhz changed the title ~~[API-138] Set the cluster connection timeout to infinite by default~~ [API-138, API-394] Set the cluster connection timeout to infinite by default Apr 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[API-138, API-394] Set the cluster connection timeout to infinite by default #346

[API-138, API-394] Set the cluster connection timeout to infinite by default #346

mdumandag commented Jan 28, 2021 •

edited

Kilo59 Feb 2, 2021

mdumandag Feb 2, 2021

Kilo59 Feb 2, 2021 •

edited

mdumandag Feb 2, 2021 •

edited

mdumandag Feb 2, 2021

Kilo59 Feb 2, 2021 •

edited

mdumandag Feb 2, 2021

yuce left a comment

yuce Feb 4, 2021

yuce Feb 4, 2021

yuce Feb 4, 2021 •

edited

yuce Feb 4, 2021

yuce Feb 4, 2021

yuce Feb 4, 2021

yuce Feb 4, 2021

yuce left a comment

mdumandag commented Feb 11, 2021

[API-138, API-394] Set the cluster connection timeout to infinite by default #346

[API-138, API-394] Set the cluster connection timeout to infinite by default #346

Conversation

mdumandag commented Jan 28, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Kilo59 Feb 2, 2021 • edited

Choose a reason for hiding this comment

mdumandag Feb 2, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Kilo59 Feb 2, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yuce left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yuce Feb 4, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yuce left a comment

Choose a reason for hiding this comment

mdumandag commented Feb 11, 2021

mdumandag commented Jan 28, 2021 •

edited

Kilo59 Feb 2, 2021 •

edited

mdumandag Feb 2, 2021 •

edited

Kilo59 Feb 2, 2021 •

edited

yuce Feb 4, 2021 •

edited