Skip to content

Conversation

@mdumandag
Copy link
Contributor

@mdumandag mdumandag commented Sep 7, 2020

This PR contains required changes for client to work with 4.0 clusters. Namely, ownerless client implementation, protocol changes, connection strategy config, and some part of the serialization improvements.
This PR is pretty big so I will try to summarize the changes I have made.

  • Documentation is updated with the recent client implementation, logs, and config changes
  • Code samples are updated. That includes changes required for the client 4.0 and the removal of the boilerplate code that stops the test runner from importing and executing them. With this change, they are easier to read with less indentation. Namely, the following part
if __name__ == "__main__":
    # example code
   pass
  • config names are shortened. For example, to access an element from SSLConfig one had to do something like config.network_config.ssl_config.enabled which is a mouthful. Now that becomes config.network.ssl.enabled. We have also a PRD to improve the configuration. We may also change that in the scope of this PRD.
  • Some public APIs which should be private is converted to private (like client.statistics, client.id etc.)
  • Services over the client objects are renamed with more verbose alternatives. Like client.invoker -> client.invocation_service, client.proxy -> client.proxy_manager
  • Load balancer configuration is added with RoundRobinLB and RandomLB as provided implementations. (Now, defaults to round-robin. It was random before).
  • Support for client labels is added.
  • ConnectionStrategyConfig and ConnectionRetryConfig is added with their respective implementations to provide exponential backoffs over connections.
  • IndexConfig is added with the same validations as the Java client.
  • Ability to shuffle member list is added as a property. Implementation of it also supports primary and secondary addresses to prioritize some addresses. Like prioritizing 5701 over 5702 and 5703 when no address is provided.
  • A client message reader that is backed by io.BytesIO is implemented instead of extending and copying a bytearray while dealing with client messages. It supports batch reads and can handle fragmented messages.
  • hazelcast.exception module is renamed as hazelcast.errors
  • Client message is customized according to the Python client's needs. It has a single buffer for outbound messages and the same framed approach as the Java client for inbound messages.
  • Scripts used in tests (start-rc.sh and run-tests.sh) are updated to close the remote controller on exit.

For more, please refer to the changes.

Protocol PR: hazelcast/hazelcast-client-protocol#341

self.serialization_config = SerializationConfig()
"""Hazelcast serialization configuration"""

self.logger_config = LoggerConfig()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any special reason why we left _config name in logger_config while we have removed it from all others?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly leftover, fixed in #219

self.live = False
if self._connect_all_members_timer:
self._connect_all_members_timer.cancel()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does _connect_all_members_timer wait for already running action to finish?
If not we can end up with an open connection if I am not mistaken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are canceling the timer.

  • If the timer is not started to execute yet, it won't be run.
  • If the timer is executing right now, it will try to finish its execution(in the reactor thread, but we are not waiting for it right there). In the timer, before trying to get_or_connect to a member, we do have a lifecycle.running check. (which will be False after the shutdown call). If we can pass this check in the reactor thread, we can infact create a connection. We do have a cleanup code in the reactor.shutdown but it is cleaning up connection before executing the timers. (it may create connection after entering the loop in which we iterate over connections) Therefore, I moved waiting for timers to finish before cleaning up the connections. That should solve the problem

if error_class:
return error_class(message, _create_error(error_holders, idx + 1))
else:
return UndefinedErrorCodeError(message, error_holder.class_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added cause to UndefinedErrorCodeError on java. You can do the same here as well.


def on_timeout(self):
self.set_exception(TimeoutError("Request timed out after %d seconds." % self._invocation_timeout))
self.set_exception(HazelcastTimeoutError("Request timed out."))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is different than the java side behavior. We are not cancelling the invocation when invocation.timeout.seconds reached.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Right now, client removes the pending connection on timeout. Also, added a test that verifies that there is nothing on the pending list after timeout

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have actually meant that java client is not doing anything when invocation timeout is reached.
It could be the case that invocation is a long-running task. We should check the timeout only when an exception occurs. Otherwise, we should not take any action.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I thought that property means what I did since we had a timer for the timeout before in the Python client. It was added 5 years ago (d962b88) maybe it meant something like this back then 😄

I will remove the timer altogether then

self._member_list_snapshot = _EMPTY_SNAPSHOT
self._initial_list_fetched = threading.Event()

def start(self, membership_listeners):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this leak to public API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should be a private API but since we are using this inside the client.start, I made this public. The convention that the user should follow is, if it is not documented, don't use it.

We could make this protected and still be able to call it inside the client module but that could annoy the linters(we don't have any right now, but I am planning to integrate pylint and black later) due to access to the protected members. Anyways, we could disable that check for this access or remove the method altogether. There are other places in which I did something similar to this. So, I am going to create an issue to track that. There are still to many moving parts, so I am planning to go over it once the structure of the client stabilizes a little bit. Is that OK?

Copy link
Contributor

@sancar sancar Sep 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention that the user should follow is, if it is not documented, don't use it.

If this is a well-known convention for Python users, then it is ok.
Still does not feel good, since what happens if we want to document something but it is a private API.

@mdumandag mdumandag merged commit 00e19fd into hazelcast:master Oct 5, 2020
@mdumandag mdumandag deleted the 4.0 branch October 5, 2020 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants