
Shutdown revamp #68

Merged
merged 4 commits into from
Sep 27, 2018

Conversation

@agis agis (Contributor) commented Sep 17, 2018

This patch aims to minimize the downtime for producers and make the
shutdown process generally more robust. The flow is now the following:

  1. shutdown signal is received
  2. manager: stop accepting new consumers and close existing consumers
  3. server: stop accepting new producers and close existing producers
  4. shutdown and restart

This way, producer downtime is reduced to the time elapsed between (3)
and (4), which should be less than a second. Nevertheless, clients
should still have sane retry defaults configured, because some
downtime will still occur.

Also added a hard timeout of 5 seconds around the shutdown process.
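The orchestration described above can be sketched as follows. This is a minimal, self-contained illustration, not the PR's actual code: the `stopConsumers`/`stopProducers` helpers are hypothetical stand-ins for the manager and server teardown steps, and only the ordered flow plus the 5-second hard timeout come from the patch description.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// server is a stand-in for the real Server type.
type server struct{}

// Hypothetical stand-ins for steps (2) and (3) of the flow above.
func (s *server) stopConsumers() { fmt.Println("consumers closed") }
func (s *server) stopProducers() { fmt.Println("producers closed") }

// shutdown runs the ordered teardown, aborting if it exceeds the hard
// 5-second deadline mentioned in the patch description.
func (s *server) shutdown() error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	done := make(chan struct{})
	go func() {
		s.stopConsumers() // step (2)
		s.stopProducers() // step (3)
		close(done)
	}()

	select {
	case <-done:
		return nil // step (4): safe to shut down and restart
	case <-ctx.Done():
		return fmt.Errorf("shutdown timed out: %w", ctx.Err())
	}
}

func main() {
	if err := (&server{}).shutdown(); err != nil {
		fmt.Println(err)
	}
}
```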

// shutdown closes current clients and also stops accepting new clients in a
// non-blocking manner
func (s *Server) shutdown() {
	// stop accepting new consumers
	s.managerCancel()
Member
I think this should be a blocking call. What about something like s.manager.StopAcceptingConsumers() to push this behavior to the ConsumerManager?

Contributor Author
Done.

@agis agis force-pushed the shutdown-revamp branch 2 times, most recently from e9e5663 to 2b414af on September 27, 2018 10:00
// shutdown closes current clients and also stops accepting new clients in a
// non-blocking manner
func (s *Server) shutdown() {
	s.manager.StopAcceptingConsumers()
Member
This is still racy (not truly blocking). As implemented, the mutex protects the teardown variable but not the map, so we can still end up with a new consumer when the map iteration takes place.
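The fix for the race described here is to hold the same lock for both the teardown flag and the map. A minimal illustrative sketch (the `manager`, `Register`, and `closeAll` names are hypothetical, not the PR's actual identifiers):

```go
package main

import (
	"fmt"
	"sync"
)

// manager guards both the teardown flag and the consumers map with one
// mutex, so a Register cannot race with the map iteration in closeAll.
type manager struct {
	mu        sync.Mutex
	teardown  bool
	consumers map[string]struct{}
}

// Register rejects new consumers once teardown has begun. The flag is
// checked under the same lock that closeAll holds while iterating.
func (m *manager) Register(id string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.teardown {
		return false
	}
	m.consumers[id] = struct{}{}
	return true
}

// closeAll sets teardown and drains the map in one critical section, so
// no Register can sneak in between the flag flip and the iteration.
func (m *manager) closeAll() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.teardown = true
	for id := range m.consumers {
		delete(m.consumers, id)
	}
}

func main() {
	m := &manager{consumers: map[string]struct{}{"c1": {}}}
	m.closeAll()
	fmt.Println(m.Register("c2")) // false: teardown has begun
}
```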

Contributor Author
Fixed.

@agis agis force-pushed the shutdown-revamp branch 2 times, most recently from bc5789c to ea337b1 on September 27, 2018 11:54
@ctrochalakis ctrochalakis (Member) left a comment

All the steps look logical, and having a server.shutdown() that orchestrates & documents the shutdown sequence is definitely a win. I believe the code is easier to follow now.

👍

This patch aims to minimize the downtime for producers during server
upgrades and generally make the shutdown process more robust.

The flow is now the following:

1) shutdown signal is received
2) stop accepting new consumers and close clients with at least 1
consumer
3) stop accepting new producers and close the rest of the clients
4) shutdown and restart

This way, producer downtime is reduced to the time elapsed between (3)
and (4), which should be less than a second. Nevertheless, clients
should still have sane retry defaults configured anyway, because some
downtime will still occur.

This patch also changes the client IDs (Client.ID) to be unique per-client.
These are internal and should not affect the consumer logic, since we
introduce Client.consID and we use that for consumer IDs.
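The ID split described in the commit message could be sketched as below. This is an assumption about the shape of the change, not the PR's actual code: only the field names `Client.ID` and `Client.consID` come from the message, while the atomic counter and `NewClient` constructor are illustrative.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// nextID backs the per-client unique IDs.
var nextID uint64

// Client separates the internal unique ID from the consumer-facing ID,
// as the commit message describes.
type Client struct {
	ID     string // unique per client, internal only
	consID string // consumer ID, used by the consumer logic
}

// NewClient issues a unique internal ID while preserving the
// caller-supplied consumer ID.
func NewClient(consID string) *Client {
	n := atomic.AddUint64(&nextID, 1)
	return &Client{
		ID:     fmt.Sprintf("client-%d", n),
		consID: consID,
	}
}

func main() {
	a := NewClient("group-1")
	b := NewClient("group-1")
	fmt.Println(a.ID != b.ID)         // true: IDs are unique per client
	fmt.Println(a.consID == b.consID) // true: consumer logic is unaffected
}
```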
@agis agis merged commit 5cc8235 into master Sep 27, 2018
agis added a commit that referenced this pull request Sep 27, 2018
@agis agis deleted the shutdown-revamp branch May 15, 2019 11:50