Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tkt-38155: fix(middlewared/jail): handle start/stop gracefully using iocage #1591

Merged
merged 3 commits into from Jul 24, 2018

Conversation

william-gr
Copy link
Member

Ticket: #38155

@ghost ghost assigned william-gr Jul 24, 2018
@ghost ghost added the review label Jul 24, 2018
@bugclerk bugclerk changed the title fix(middlewared/jail): handle start/stop gracefully using iocage tkt-38155: fix(middlewared/jail): handle start/stop gracefully using iocage Jul 24, 2018
@william-gr william-gr removed the request for review from themylogin July 24, 2018 13:52
@william-gr william-gr added the WIP label Jul 24, 2018
Copy link
Contributor

@themylogin themylogin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we supposed to wait for jails to terminate properly?

Won't SIGTERM abort our stop_on_shutdown call?

@william-gr
Copy link
Member Author

william-gr commented Jul 24, 2018

@themylogin thats why its still WIP. Any ideas on how to fix that? I was thinking of cancel all tasks only after all terminate() calls?

@themylogin
Copy link
Contributor

I am not familiar with FreeBSD shutdown system. Will a blocking call last in rc.shutdown.local? Then we might just call middleware method (with reasonable timeout) instead of sending event. It would be great to delay middleware shutdown for this time and not to involve terminate here at all, it would be a pain for development when middleware restarts occur frequently.

@william-gr
Copy link
Member Author

william-gr commented Jul 24, 2018

@themylogin it would but I would rather not.

In your approach there is no reasonable timeout we can ever come up with, it needs to be up to the plugin.
Who knows how many VMs/jails there are to wait long enough?

I dont see the problem in involving terminate ? I am handling that pain using the event and the lock, whats the flaw in that?

It has not effect whatsoever in development since the shutdown event is not sent unless you actually shutdown.

@themylogin
Copy link
Contributor

I dont see the problem in involving terminate? I am handling that pain using the event and the lock, whats the flaw in that?

The problem I've imagined is rc sending SIGKILL to middlewared that spent too much time not responding to SIGTERM.

@william-gr
Copy link
Member Author

And why is that a problem? All services are the same way.

We would be just transferring responsibility, same happens on shutdown, rc.shutdown.local is just part of /etc/rc.d/local.

@themylogin
Copy link
Contributor

I see. My concern is that timeout that is reasonable for shutting down one process may not be enough for shutting down a jail (which may consist of several processes) or even several jails, no matter if shutdown is sequential or parallel. If we do it in middleware, we risk unclear middleware shutdown, if we do it in shutdown rc script, we risk everything below it, if we do it in separate rc script we still may fail to shutdown all jails in time. These are my worries if users would run something fragile like PostgreSQL in jails.

@william-gr
Copy link
Member Author

What timeout are you talking about? There is no timeout in waiting pid to die in rc subsystem.

@themylogin
Copy link
Contributor

I've noted I am not so familiar with it :-D no concerns then.
Does it also mean that something can easily block shutdown entirely?

@william-gr
Copy link
Member Author

Yes, in fact recently we had a problem where consul-alerts would block and not shutdown.

Copy link
Contributor

@themylogin themylogin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last concern: if IOC stop won't block, LGTM!

@william-gr william-gr removed the WIP label Jul 24, 2018
@william-gr william-gr merged commit 2fd3aac into master Jul 24, 2018
@ghost ghost removed the review label Jul 24, 2018
@william-gr william-gr deleted the issues/38155 branch July 24, 2018 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants