INDY-1431: implement client stack restart on reached connections limit. #799
Conversation
The client stack is restarted if the connections threshold is reached and a specified minimal time (configurable; it should be greater than TIME_WAIT) has elapsed since the previous client stack restart, to avoid overhead from sockets stuck in the TIME_WAIT or FIN_WAIT state.
Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>
plenum/common/stacks.py
Outdated
def init_stack_restart_params(self):
    self.connected_clients_num = 0
    self.stack_restart_is_needed = False
    self.last_start_time = time.time()
Should we use time.perf_counter()?
time.perf_counter() doesn't seem appropriate here. From the time module reference:
"The reference point of the returned value is undefined, so that only the difference between the results of consecutive calls is valid."
But I'm calculating a time point in the future, so time.time() is much more understandable here, as it returns the time in seconds since the epoch.
Also, regarding time.perf_counter(): "A clock with the highest available resolution to measure a short duration." We don't need the highest resolution here, as the calculated timeout is on the order of tens of minutes.
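For illustration, a minimal sketch of the epoch-based deadline arithmetic described above. The constant names are hypothetical stand-ins for the configurable restart parameters discussed in this PR, not the PR's actual identifiers:

```python
import time
from random import randint

# Hypothetical stand-ins for the configurable restart parameters.
MIN_RESTART_TIMEOUT = 60.0   # seconds; must exceed the platform's TIME_WAIT
MAX_RESTART_DEVIATION = 10   # random spread, in whole seconds

def next_restart_min_time(last_start_time: float) -> float:
    """Absolute epoch time before which a restart is not allowed."""
    return last_start_time + MIN_RESTART_TIMEOUT + randint(0, MAX_RESTART_DEVIATION)

def can_restart(deadline: float) -> bool:
    # time.time() is epoch-based, so an absolute deadline computed earlier
    # remains meaningful; time.perf_counter()'s reference point is undefined.
    return deadline < time.time()
```

Because both the deadline and the current reading come from the same epoch-based clock, the comparison stays valid across the whole process lifetime.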
if event['event'] == zmq.EVENT_ACCEPTED:
    self.connected_clients_num += 1
if event['event'] == zmq.EVENT_DISCONNECTED:
    assert self.connected_clients_num > 0
Asserts will be dropped in production (when Python runs with -O). Should we use an exception here? Or can it be expected that we get a DISCONNECTED event without a preceding ACCEPTED one?
I've added a condition, in addition to the assertion, so that the counter cannot go negative, and also added logging. My concern here is about lost ZMQ events, and I don't think an exception is needed.
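A sketch of the guarded counter described in this reply; the class and method names are illustrative, not the PR's exact identifiers:

```python
import logging

logger = logging.getLogger(__name__)

class ConnectionCounter:
    """Illustrative guarded client-connection counter."""

    def __init__(self):
        self.connected_clients_num = 0

    def on_accepted(self):
        self.connected_clients_num += 1

    def on_disconnected(self):
        if self.connected_clients_num > 0:
            self.connected_clients_num -= 1
        else:
            # A lost ZMQ event can make counts drift; log rather than raise,
            # so a missed ACCEPTED event cannot crash the stack.
            logger.warning("DISCONNECTED event while tracking zero clients")
```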
def restart(self):
    logger.warning("Stopping client stack on node {}".format(self))
    self.stop()
    time.sleep(0.2)
Magic number 0.2 here.
Just for safety.
plenum/test/test_node.py
Outdated
@@ -410,6 +410,8 @@ def restart_clientstack(self):
    time.sleep(0.2)
    logger.debug("Starting clientstack on node {}".format(self))
    self.clientstack.start()
    # Sleep to allow disconnected clients to reconnect before sending replies from the server side.
Should we use the restart method from clientstack here?
Of course, done.
plenum/common/stacks.py
Outdated
def _can_restart(self):
    return self.next_restart_min_time < time.time()

def check_for_stack_restart(self):
Please add unit tests for this method.
Done.
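For reference, a unit test of the _can_restart logic shown in the diff might look like the sketch below. The stub class and test name are illustrative, not the PR's actual test code:

```python
import time

class _RestartChecker:
    """Minimal stand-in exposing the _can_restart logic from the diff."""
    def __init__(self, next_restart_min_time):
        self.next_restart_min_time = next_restart_min_time

    def _can_restart(self):
        return self.next_restart_min_time < time.time()

def test_can_restart_respects_deadline():
    # A deadline in the past allows a restart...
    assert _RestartChecker(time.time() - 1)._can_restart() is True
    # ...while a deadline far in the future forbids it.
    assert _RestartChecker(time.time() + 3600)._can_restart() is False
```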
plenum/common/stacks.py
Outdated
self.min_stack_restart_timeout + \
    randint(0, self.max_stack_restart_time_deviation)

def check_listener_events(self):
Please add tests for this method and for the listener monitor.
@@ -946,6 +955,20 @@ def prepare_to_send(self, msg: Any):
    self.msgLenVal.validate(msg_bytes)
    return msg_bytes

@staticmethod
def get_monitor_events(monitor_socket, non_block=True):
- There is a similar method in Remote. Should we get rid of the duplication?
- There is an events.append(message['event']) line in Remote's implementation. Should it be the same here?
I've added a comment for this method. My concern is that the way we work with the monitor socket is not fully correct and should be reviewed. Currently we call get_monitor_socket() every time we want to get it, which is strange; if it returns a singleton then it's OK, but I'm not sure. So I'll create a separate ticket and clean up this implementation.
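A self-contained sketch of such a non-blocking drain helper, using pyzmq's recv_monitor_message. This is an illustration of the pattern under discussion, not the PR's actual ZStack/Remote implementations:

```python
import zmq
from zmq.utils.monitor import recv_monitor_message

def get_monitor_events(monitor_socket, non_block=True):
    """Drain all pending events from a ZMQ monitor socket.

    Collects the 'event' field of each pending monitor message, matching
    the events.append(message['event']) behavior noted in Remote.
    """
    events = []
    flags = zmq.NOBLOCK if non_block else 0
    while True:
        try:
            message = recv_monitor_message(monitor_socket, flags=flags)
        except zmq.Again:
            # No more pending events on the monitor socket.
            break
        events.append(message['event'])
    return events
```

With non_block=True the helper returns immediately with whatever events have accumulated, which is what a periodic check_listener_events-style poll wants.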
@@ -359,6 +365,9 @@ def open(self):
)

def close(self):
    if self.listener_monitor is not None:
        self.listener.disable_monitor()
Disabling monitor on the non-listener socket is done differently. Is it intended?
See comment above.
Now connections tracking and stack restart are separated and optional. NOTE: connections tracking must be enabled if stack restart is enabled, as stack restart uses the connections-tracking mechanism to trigger a restart.
Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>
'please check your configuration'.format(self)
raise RuntimeError(error_str)

self.track_connected_clients_num_enabled = config.TRACK_CONNECTED_CLIENTS_NUM_ENABLED
Should it be self.track_connected_clients_num_enabled = create_listener_monitor?
create_listener_monitor is an input parameter of the ZStack class, while config.TRACK_CONNECTED_CLIENTS_NUM_ENABLED is a configuration parameter, so I don't see why one should be replaced by the other.
Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>
node.clientstack.connected_clients_num = max_connected_clients_num + 1
node.clientstack.handle_connections_limit()

assert is_restarted is False
Please also check that we eventually restart (after the timeout).
Done.
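A toy sketch of the eventual-restart check requested above, using a stub with a short timeout instead of a real node. All names here are illustrative:

```python
import time

class RestartableStub:
    """Illustrative stand-in for the client stack's restart throttling."""
    def __init__(self, min_timeout):
        self.min_timeout = min_timeout
        self.last_start_time = time.time()
        self.restarted = False

    def handle_connections_limit(self):
        # Restart only once the minimal interval since the last start passed.
        if self.last_start_time + self.min_timeout < time.time():
            self.restarted = True

stub = RestartableStub(min_timeout=0.05)
stub.handle_connections_limit()
assert stub.restarted is False   # too soon after start: no restart yet
time.sleep(0.06)
stub.handle_connections_limit()
assert stub.restarted is True    # eventually restarts after the timeout
```

A real test would use the node fixture and a shortened configured timeout rather than sleeping against wall-clock time.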
Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>