
Shutdown revamp #68

Merged
merged 4 commits into from
Sep 27, 2018

Conversation

@agis agis (Contributor) commented Sep 17, 2018

This patch aims to minimize the downtime for producers and make the
shutdown process generally more robust. The flow is now the following:

  1. shutdown signal is received
  2. manager: stop accepting new consumers and close existing consumers
  3. server: stop accepting new producers and close existing producers
  4. shutdown and restart

This way, producer downtime is reduced to the time elapsed between (3)
and (4), which should be less than a second. Nevertheless, clients
should still have sane retry defaults configured, because some
downtime will still occur.

Also added a hard timeout of 5 seconds around the shutdown process.
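The orchestration described above can be sketched as follows. This is a minimal, self-contained illustration, not the PR's actual code: the `stopConsumers`/`stopProducers` helpers are hypothetical stand-ins for the manager and server teardown steps, and only the ordered flow plus the 5-second hard timeout come from the patch description.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// server is a stand-in for the real Server type.
type server struct{}

// Hypothetical stand-ins for steps (2) and (3) of the flow above.
func (s *server) stopConsumers() { fmt.Println("consumers closed") }
func (s *server) stopProducers() { fmt.Println("producers closed") }

// shutdown runs the ordered teardown, aborting if it exceeds the hard
// 5-second deadline mentioned in the patch description.
func (s *server) shutdown() error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	done := make(chan struct{})
	go func() {
		s.stopConsumers() // step (2)
		s.stopProducers() // step (3)
		close(done)
	}()

	select {
	case <-done:
		return nil // step (4): safe to shut down and restart
	case <-ctx.Done():
		return fmt.Errorf("shutdown timed out: %w", ctx.Err())
	}
}

func main() {
	if err := (&server{}).shutdown(); err != nil {
		fmt.Println(err)
	}
}
```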

// shutdown closes current clients and also stops accepting new clients in a
// non-blocking manner
func (s *Server) shutdown() {
	// stop accepting new consumers
	s.managerCancel()
Member
I think this should be a blocking call. What about something like s.manager.StopAcceptingConsumers() to push this behavior to the ConsumerManager?

Contributor Author
Done.

@agis agis force-pushed the shutdown-revamp branch 2 times, most recently from e9e5663 to 2b414af on September 27, 2018 10:00
// shutdown closes current clients and also stops accepting new clients in a
// non-blocking manner
func (s *Server) shutdown() {
	s.manager.StopAcceptingConsumers()
Member
This is still racy (not truly blocking). As implemented, the mutex protects the teardown variable but not the map, so we can still end up with a new consumer when the map iteration takes place.
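The fix for the race described here is to hold the same lock for both the teardown flag and the map. A minimal illustrative sketch (the `manager`, `Register`, and `closeAll` names are hypothetical, not the PR's actual identifiers):

```go
package main

import (
	"fmt"
	"sync"
)

// manager guards both the teardown flag and the consumers map with one
// mutex, so a Register cannot race with the map iteration in closeAll.
type manager struct {
	mu        sync.Mutex
	teardown  bool
	consumers map[string]struct{}
}

// Register rejects new consumers once teardown has begun. The flag is
// checked under the same lock that closeAll holds while iterating.
func (m *manager) Register(id string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.teardown {
		return false
	}
	m.consumers[id] = struct{}{}
	return true
}

// closeAll sets teardown and drains the map in one critical section, so
// no Register can sneak in between the flag flip and the iteration.
func (m *manager) closeAll() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.teardown = true
	for id := range m.consumers {
		delete(m.consumers, id)
	}
}

func main() {
	m := &manager{consumers: map[string]struct{}{"c1": {}}}
	m.closeAll()
	fmt.Println(m.Register("c2")) // false: teardown has begun
}
```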

Contributor Author
Fixed.

@agis agis force-pushed the shutdown-revamp branch 2 times, most recently from bc5789c to ea337b1 on September 27, 2018 11:54
@ctrochalakis ctrochalakis (Member) left a comment

All the steps look logical, and having a server.shutdown() that orchestrates & documents the shutdown sequence is definitely a win. I believe the code is easier to follow now.

👍

This patch aims to minimize the downtime for producers during server
upgrades and generally make the shutdown process more robust.

The flow is now the following:

1) shutdown signal is received
2) stop accepting new consumers and close clients with at least 1
consumer
3) stop accepting new producers and close the rest of the clients
4) shutdown and restart

This way, producer downtime is reduced to the time elapsed between (3)
and (4), which should be less than a second. Nevertheless, clients
should still have sane retry defaults configured anyway, because some
downtime will still occur.

This patch also changes the client IDs (Client.ID) to be unique per-client.
These are internal and should not affect the consumer logic, since we
introduce Client.consID and we use that for consumer IDs.
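The ID split described in the commit message could be sketched as below. This is an assumption about the shape of the change, not the PR's actual code: only the field names `Client.ID` and `Client.consID` come from the message, while the atomic counter and `NewClient` constructor are illustrative.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// nextID backs the per-client unique IDs.
var nextID uint64

// Client separates the internal unique ID from the consumer-facing ID,
// as the commit message describes.
type Client struct {
	ID     string // unique per client, internal only
	consID string // consumer ID, used by the consumer logic
}

// NewClient issues a unique internal ID while preserving the
// caller-supplied consumer ID.
func NewClient(consID string) *Client {
	n := atomic.AddUint64(&nextID, 1)
	return &Client{
		ID:     fmt.Sprintf("client-%d", n),
		consID: consID,
	}
}

func main() {
	a := NewClient("group-1")
	b := NewClient("group-1")
	fmt.Println(a.ID != b.ID)         // true: IDs are unique per client
	fmt.Println(a.consID == b.consID) // true: consumer logic is unaffected
}
```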
@agis agis merged commit 5cc8235 into master Sep 27, 2018
agis added a commit that referenced this pull request Sep 27, 2018
@agis agis deleted the shutdown-revamp branch May 15, 2019 11:50