Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API-138, API-394] Set the cluster connection timeout to infinite by default #346

Merged

Conversation

mdumandag
Copy link
Contributor

@mdumandag mdumandag commented Jan 28, 2021

We decided to set the default cluster connection timeout to
infinite to have a better user experience out-of-the-box
for most of the users/use cases. Note that, this is a breaking
change for the users who rely on the client to shutdown after
some time. Affected users might set the cluster_connect_timeout
to a finite value.

Also, updated the default value of the retry_multiplier to back off
more after the client tries to connect to the target cluster for
some time.

Closes #335
Closes #306

@@ -69,13 +69,14 @@ The following are configuration element descriptions:
- ``retry_max_backoff``: Specifies the upper limit for the backoff in
seconds. Its default value is ``30``. It must be non-negative.
- ``retry_multiplier``: Factor to multiply the backoff after a failed
retry. Its default value is ``1``. It must be greater than or equal
retry. Its default value is ``1.05``. It must be greater than or equal
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened to the old connection_attempt_limit ?

Is there no longer an option to control the number of connection attempts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, this configuration option is removed from the client in the 4.0 release with a more enhanced exponential backoff feature. You can get the almost same behavior by configuring the cluster_connect_timeout parameter.

Copy link

@Kilo59 Kilo59 Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my use case, I just want to either attempt 1 or 2 connection attempts and then stop (as part of an integration test).

The exponential backoff is a nice feature (and it will be useful in production), but it looks like the logic for limiting the connection attempts is now somewhat convoluted.

Unless I just don't understand.

https://hazelcast.readthedocs.io/en/stable/client.html#hazelcast.client.HazelcastClient

cluster_connect_timeout seems to relate to a single connection attempt, but I'm trying to shorten the total time (or total attempts) spent attempting to reconnect.

Copy link
Contributor Author

@mdumandag mdumandag Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can agree on some of your points but let me clarify the cluster_connect_timeout a little bit.

It is actually an absolute timeout for whole connection attempts.

Let say that you specified 3 possible member addresses in your configuration. Client first will store the time it started trying to connect the cluster. (Let say that it is start_time)

Then, client tries to connect at least one of the three. If it cannot connect any of them, then client tries to sleep a little bit. But before sleeping, it will check the the time passed since the first connection attempt(time.time() - start_time). If it is more than the cluster_connect_timeout, it will throw an exception saying that client could not connect to any of the members. If it is less than that, it will sleep and try to connect possible addresses again. This loop continues until the client reaches the cluster_connect_timeout or connects one of the members.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same logic applies to reconnection process

Copy link

@Kilo59 Kilo59 Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdumandag thanks for the quick response and clarification, that's very helpful.

Timeout value in seconds for the client to give up a connection attempt to the cluster. Must be non-negative. By default, set to 120.0.

Should this description be adjusted? The "a connection attempt" is what threw me off.
I'd be happy to change it.

Sorry for hijacking the PR 😅 .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem at all, happy to help you. Let me rephrase that part tomorrow

@mdumandag mdumandag force-pushed the infinite-cluster-connect-timeout branch 2 times, most recently from d29e0ee to 578b30d Compare February 4, 2021 09:10
Copy link
Contributor

@yuce yuce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good other than a few nitpicks.

Comment on lines 49 to 52
When the client is disconnected from the cluster or trying to connect
to a one for the first time, it searches for new connections. You can
configure the frequency of the connection attempts and the client
shutdown behavior using the arguments below.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one would read a bit better: "The client searches for new connections when it is trying to connect to the cluster. Both the frequency of connection attempts and the client shutdown behavior can be configured using the arguments below."

@@ -86,6 +87,7 @@ A pseudo-code is as follows:
while (try_connect(connection_timeout)) != SUCCESS) {
if (get_current_time() - begin_time >= CLUSTER_CONNECT_TIMEOUT) {
// Give up to connecting to the current cluster and switch to another if exists.
// For the default values, CLUSTER_CONNECT_TIMEOUT is infinite.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about: "CLUSTER_CONNECT_TIMEOUT is infinite by default."?

# If the no timeout is specified by the
# user, or set to -1 explicitly, set
# the timeout to infinite.
cluster_connect_timeout = six.MAXSIZE
Copy link
Contributor

@yuce yuce Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I've checked sys.maxsize is available on all versions of CPython. Python 2.7 and Python 3.6 both evaluate sys,maxsize to 9223372036854775807. So, I think it would be better to replace six.MAXSIZE with sys.maxsize.

@@ -531,8 +531,8 @@ def __init__(self):
self._retry_initial_backoff = 1.0
self._retry_max_backoff = 30.0
self._retry_jitter = 0.0
self._retry_multiplier = 1.0
self._cluster_connect_timeout = 120.0
self._retry_multiplier = 1.05
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move 1.05 to a global variable, something like DEFAULT_RETRY_MULTIPLIER and replace instances of it?

self._retry_multiplier = 1.0
self._cluster_connect_timeout = 120.0
self._retry_multiplier = 1.05
self._cluster_connect_timeout = -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move -1 to a global variable, something like DEFAULT_CLUSTER_CONNECT_TIMEOUT and replace instances of it?

to give up to connect to the current cluster. Its default value is
``120``.
to give up connecting to the cluster. Its default value is
``-1``. For the default value, client will not stop trying to connect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can rephrase "For the default value ..." as "By default ..."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe even better: "The client will continuously try to connect by default."

We decided to set the default cluster connection timeout to
infinite to have a better user experience out-of-the-box
for most of the users/use cases. Note that, this is a breaking
change for the users who rely on the client to shutdown after
some time. Affected users might set the `cluster_connect_timeout`
to a finite value.

Also, updated the default value of the `retry_multiplier` to back off
more after the client tries to connect to the target cluster for
some time.
@mdumandag mdumandag force-pushed the infinite-cluster-connect-timeout branch from 578b30d to 3fccbcc Compare February 11, 2021 09:27
Copy link
Contributor

@yuce yuce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mdumandag
Copy link
Contributor Author

Thanks for the review @yuce

@mdumandag mdumandag merged commit d459ff7 into hazelcast:master Feb 11, 2021
@mdumandag mdumandag deleted the infinite-cluster-connect-timeout branch February 11, 2021 09:41
@mdumandag mdumandag changed the title Set the cluster connection timeout to infinite by default [API-138] Set the cluster connection timeout to infinite by default Apr 15, 2021
@degerhz degerhz changed the title [API-138] Set the cluster connection timeout to infinite by default [API-138, API-394] Set the cluster connection timeout to infinite by default Apr 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants