INDY-1431: implement client stack restart on reached connections limit. #799

Merged
merged 7 commits into hyperledger:master on Jul 11, 2018

Conversation

sergey-shilov
Contributor

The client stack is restarted if the connections threshold is reached and a
specified minimal time (configurable; it should be greater than TIME_WAIT)
has passed since the previous client stack restart, to avoid an overhead of
sockets lingering in the TIME_WAIT or FIN_WAIT state.

Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>
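
For illustration, the described policy could look roughly like the sketch below; all names (ClientStackRestartPolicy, clients_limit, etc.) are illustrative placeholders, not the identifiers used in this PR.

import time
from random import randint

class ClientStackRestartPolicy:
    # Sketch of the described policy; not the PR's actual class.
    def __init__(self, min_restart_timeout, max_deviation, clients_limit):
        self.clients_limit = clients_limit
        self.connected_clients_num = 0
        self.stack_restart_is_needed = False
        self.last_start_time = time.time()
        # The minimal timeout should be greater than TIME_WAIT so that
        # sockets closed on the previous restart are fully released.
        self.next_restart_min_time = self.last_start_time + \
            min_restart_timeout + randint(0, max_deviation)

    def handle_connections_limit(self):
        # Reaching the threshold only marks that a restart is needed.
        if self.connected_clients_num >= self.clients_limit:
            self.stack_restart_is_needed = True

    def can_restart(self):
        # The actual restart happens only after the minimal time has passed.
        return self.stack_restart_is_needed and \
            self.next_restart_min_time < time.time()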

Sergey Shilov added 3 commits July 9, 2018 11:14
The client stack is restarted if the connections threshold is reached and a
specified minimal time (configurable; it should be greater than TIME_WAIT)
has passed since the previous client stack restart, to avoid an overhead of
sockets lingering in the TIME_WAIT or FIN_WAIT state.

Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>
Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>
Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>
def init_stack_restart_params(self):
    self.connected_clients_num = 0
    self.stack_restart_is_needed = False
    self.last_start_time = time.time()
Contributor

Should we use time.perf_counter()?

Contributor Author

time.perf_counter() doesn't seem appropriate here. From the time module reference:
"The reference point of the returned value is undefined, so that only the difference between the results of consecutive calls is valid."
But I'm calculating a time point in the future, so the time.time() function is much more understandable to me, as it returns the time in seconds since the epoch.
Also, regarding time.perf_counter(): "A clock with the highest available resolution to measure a short duration." We don't need the highest resolution here, as the calculated timeout is about tens of minutes.
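
To illustrate the point (the timeout value is made up): an absolute deadline computed with time.time() stays meaningful across calls, while perf_counter() readings are only meaningful as differences.

import time

RESTART_TIMEOUT = 20 * 60  # e.g. tens of minutes, as mentioned above

# time.time() returns seconds since the epoch, so a deadline computed now
# can be compared with any later time.time() reading.
next_restart_min_time = time.time() + RESTART_TIMEOUT
can_restart_now = next_restart_min_time < time.time()

# time.perf_counter() has an undefined reference point: only the difference
# between two readings is valid, which suits measuring short durations.
start = time.perf_counter()
elapsed = time.perf_counter() - start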

if event['event'] == zmq.EVENT_ACCEPTED:
    self.connected_clients_num += 1
if event['event'] == zmq.EVENT_DISCONNECTED:
    assert self.connected_clients_num > 0
Contributor

Asserts will be dropped in production. Should we use an exception here? Or can it be expected that we get a disconnected event without a connected one?

Contributor Author

I've added a condition not to get negative numbers, in addition to the assertion, and also added logging. My concern here is about lost ZMQ events, and I don't think an exception is needed here.
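
Roughly, the guard described above could take this shape (a sketch; the surrounding class and logger setup are assumed, not taken from the PR):

import logging
import zmq

logger = logging.getLogger(__name__)

class ConnectionCounter:
    # Illustrative holder for the counter; the real code lives in the stack.
    def __init__(self):
        self.connected_clients_num = 0

    def process_monitor_event(self, event):
        if event['event'] == zmq.EVENT_ACCEPTED:
            self.connected_clients_num += 1
        elif event['event'] == zmq.EVENT_DISCONNECTED:
            # Guarded decrement: a lost or duplicated ZMQ monitor event must
            # not drive the counter negative; log a warning instead of raising.
            if self.connected_clients_num > 0:
                self.connected_clients_num -= 1
            else:
                logger.warning("Got EVENT_DISCONNECTED without a matching "
                               "EVENT_ACCEPTED; counter stays at 0")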

def restart(self):
    logger.warning("Stopping client stack on node {}".format(self))
    self.stop()
    time.sleep(0.2)
Contributor

magic number 0.2 here

Contributor Author

Just for safety.

@@ -410,6 +410,8 @@ def restart_clientstack(self):
    time.sleep(0.2)
    logger.debug("Starting clientstack on node {}".format(self))
    self.clientstack.start()
    # Sleep to allow disconnected clients to reconnect before sending replies from the server side.
Contributor

Should we use the restart method from clientstack here?

Contributor Author

Of course, done.
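
In other words, the node-level method can now delegate to the stack's own restart; a minimal sketch (the exact body in the PR may differ):

def restart_clientstack(self):
    # Delegate stop/start to the client stack instead of duplicating it here.
    self.clientstack.restart()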

def _can_restart(self):
    return self.next_restart_min_time < time.time()

def check_for_stack_restart(self):
Contributor

Please add unit tests for this method

Contributor Author

Done.

    self.min_stack_restart_timeout + \
    randint(0, self.max_stack_restart_time_deviation)

def check_listener_events(self):
Contributor

Please add tests for this method and the listener monitor

@@ -946,6 +955,20 @@ def prepare_to_send(self, msg: Any):
    self.msgLenVal.validate(msg_bytes)
    return msg_bytes

@staticmethod
def get_monitor_events(monitor_socket, non_block=True):
Contributor

  1. There is a similar method in Remote. Should we get rid of the duplication?
  2. There is an events.append(message['event']) line in Remote's implementation. Should it be the same here?

Contributor Author

I've added a comment for this method. My concern is that the way we work with the monitor socket is not fully correct and should be reviewed. Right now we call get_monitor_socket() every time we want to get it, which is strange; if it is a singleton then that's fine, but I'm not sure. So I'll create a separate ticket and clean up this implementation.
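
For reference, such a helper can be sketched on top of pyzmq's recv_monitor_message; whether it should return whole event dicts or only message['event'] (as in Remote) is exactly the duplication question raised above. This is a sketch, not the PR's exact method.

import zmq
from zmq.utils.monitor import recv_monitor_message

def get_monitor_events(monitor_socket, non_block=True):
    # Drain pending notifications from a ZMQ monitor socket (sketch).
    events = []
    flags = zmq.NOBLOCK if non_block else 0
    while True:
        try:
            # Each notification is a dict with 'event', 'value' and 'endpoint'.
            events.append(recv_monitor_message(monitor_socket, flags))
        except zmq.Again:
            # No more pending notifications in non-blocking mode.
            break
    return events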

@@ -359,6 +365,9 @@ def open(self):
    )

def close(self):
    if self.listener_monitor is not None:
        self.listener.disable_monitor()
Contributor

Disabling the monitor on the non-listener socket is done differently. Is that intended?

Contributor Author

See comment above.

Sergey Shilov added 2 commits July 9, 2018 16:12
Now connections tracking and stack restart are separated and optional.
NOTE: connections tracking must be enabled if stack restart is enabled, as
stack restart uses the connections tracking mechanism to trigger a restart.

Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>
'please check your configuration'.format(self)
raise RuntimeError(error_str)

self.track_connected_clients_num_enabled = config.TRACK_CONNECTED_CLIENTS_NUM_ENABLED
Contributor

Should it be self.track_connected_clients_num_enabled = create_listener_monitor?

Contributor Author

The create_listener_monitor is an input parameter of the ZStack class, while config.TRACK_CONNECTED_CLIENTS_NUM_ENABLED is a configuration parameter; I don't understand why it should be replaced.
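
To make the distinction concrete, a heavily simplified sketch (the flag and parameter names follow the diff context; the structure is assumed): the constructor argument is decided per stack instance by the caller, while the config flag is a node-wide setting.

class ZStack:
    # Simplified sketch, not the actual constructor signature.
    def __init__(self, config, create_listener_monitor=False):
        # Per-instance decision made by whoever creates the stack.
        self.create_listener_monitor = create_listener_monitor
        # Node-wide configuration flag, independent of the constructor call.
        self.track_connected_clients_num_enabled = \
            config.TRACK_CONNECTED_CLIENTS_NUM_ENABLED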

Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>
node.clientstack.connected_clients_num = max_connected_clients_num + 1
node.clientstack.handle_connections_limit()

assert is_restarted is False
Contributor

Please also check that we eventually restart (after the timeout)

Contributor Author

Done.
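
A sketch of such a check, assuming a pytest-style node fixture; the attribute and method names follow the snippets in this PR where visible and are otherwise assumptions, and the monkeypatching is illustrative.

import time

def test_client_stack_restarts_after_timeout(node, monkeypatch):
    restarted = []
    monkeypatch.setattr(node.clientstack, 'restart',
                        lambda: restarted.append(True))

    # Exceeding the limit only marks a restart as needed; it must not fire
    # before the minimal restart timeout has elapsed.
    node.clientstack.connected_clients_num = \
        node.clientstack.max_connected_clients_num + 1
    node.clientstack.handle_connections_limit()
    node.clientstack.check_for_stack_restart()
    assert not restarted

    # Pretend the minimal timeout has already passed: now the restart fires.
    node.clientstack.next_restart_min_time = time.time() - 1
    node.clientstack.check_for_stack_restart()
    assert restarted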

Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>
@ashcherbakov merged commit 7f36aba into hyperledger:master on Jul 11, 2018