Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to abort VMM when parent process dies #4172

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jseba
Copy link

@jseba jseba commented Oct 12, 2023

Changes

This adds a flag to set the parent death signal (SIGUSR2 in this instance) that the process will receive when its parent process exits before the VMM does. Receipt of this signal will cause the VMM to abruptly abort, much like the SIGILL signal. While a graceful shutdown would be preferable, since the parent process may have been controlling outside resources for Firecracker (disks, networking, etc.), it's indeterminate whether or not it is safe to continue running the VM.

Reason

If Firecracker is being monitored by a parent process that unexpectedly terminates, it will be abandoned up the process tree, likely to a process that doesn't know what do with it (such as init). This becomes even trickier if the process was running in a mount namespace that was controlled by the parent process, as the API socket is now inaccessible.

If the parent process was also keeping handles on other resources used by the Firecracker VMM, these could be re-used by new processes and cause conflicts with the now orphaned Firecracker.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following
Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet contribution quality standards.

  • This functionality cannot be added in rust-vmm.

@bchalios
Copy link
Contributor

Hi @jseba,

Thanks for the contribution. I have some questions to understand a bit more the context:

You send SIGUSR2 to let Firecracker know that its parent process died. Please, correct me if I'm wrong, but this is not a standard mechanism, i.e. processes do not receive SIGUSR2 when their parent process dies. So, why not just send SIGINT to kill the Firecracker process?

@jseba
Copy link
Author

jseba commented Oct 17, 2023

I used SIGUSR2 as the signal that firecracker gets from the OS on its parent's death in case there is anything that needs to be done differently on this path (e.g. logging that it's exiting due to the parent exiting) compared to the SIGINT handler

@bchalios
Copy link
Contributor

Thanks for the explanation @jseba.

I used SIGUSR2 as the signal that firecracker gets from the OS on its parent's death

Just to clarify, the signal does not arrive from the OS, IOW it is not the kernel that sends the signal, it is some monitoring process (I guess), right?

in case there is anything that needs to be done differently on this path (e.g. logging that it's exiting due to the parent exiting) compared to the SIGINT handler

I don't think that reserving a signal number is worth it for the sole purpose of logging the reason why Firecracker was shutdown. Is there any other cleanup/actions you have in mind?

@jseba
Copy link
Author

jseba commented Oct 25, 2023

Thanks for the explanation @jseba.

I used SIGUSR2 as the signal that firecracker gets from the OS on its parent's death

Just to clarify, the signal does not arrive from the OS, IOW it is not the kernel that sends the signal, it is some monitoring process (I guess), right?

No, the signal comes from the kernel directly, no other management process is involved. That's why I selected a different signal to differentiate it since otherwise it is indistinguishable from the parent process interrupting it.

in case there is anything that needs to be done differently on this path (e.g. logging that it's exiting due to the parent exiting) compared to the SIGINT handler

I don't think that reserving a signal number is worth it for the sole purpose of logging the reason why Firecracker was shutdown. Is there any other cleanup/actions you have in mind?

The log message was the only thing I could think of off hand behavior wise, but my thought process was that if the parent process is still alive and firecracker gets SIGTERM, even from another process, the parent process will still be running to get the result from waitpid, so firecracker can gracefully shut itself down. If the parent process dies and was managing something firecracker needed to run properly, it's no longer safe for firecracker to stay running and it needs to bail immediately.

If Firecracker is being monitored by a parent process that unexpectedly
terminates, it will be abandoned up the process tree, likely to
a process that doesn't know what do with it (such as init). This becomes
even trickier if the process was running in a mount namespace that was
controlled by the parent process, as the API socket is now inaccessible.

If the parent process was also keeping handles on other resources used by the
Firecracker VMM, these could be re-used by new processes and cause
conflicts with the now orphaned Firecracker.

This adds a flag to set the parent death signal (SIGUSR2 in this
instance) that the process will receive when its parent process exits
before the VMM does. Receipt of this signal will cause the VMM to
abruptly abort, much like the SIGILL signal. While a graceful shutdown
would be preferable, since the parent process may have been controlling
outside resources for Firecracker (disks, networking, etc.), it's
indeterminate whether or not it is safe to continue running the VM.

Signed-off-by: Josh Seba <jseba@cloudflare.com>
@jseba
Copy link
Author

jseba commented Nov 8, 2023

I fixed the merge conflict and rebased back onto the tip of main

@bchalios
Copy link
Contributor

bchalios commented Nov 9, 2023

No, the signal comes from the kernel directly, no other management process is involved. That's why I selected a different signal to differentiate it since otherwise it is indistinguishable from the parent process interrupting it.

So, you are saying that when a process dies, Linux sends by itself SIGUSR2 to all its children processes? I was not aware of this mechanism. Could you share a link that documents that?

If the parent process dies and was managing something firecracker needed to run properly, it's no longer safe for firecracker to stay running and it needs to bail immediately.

Like what? What is the worst thing that can happen? And why can this happen without Firecracker's involvement?

@jseba
Copy link
Author

jseba commented Dec 12, 2023

No, the signal comes from the kernel directly, no other management process is involved. That's why I selected a different signal to differentiate it since otherwise it is indistinguishable from the parent process interrupting it.

So, you are saying that when a process dies, Linux sends by itself SIGUSR2 to all its children processes? I was not aware of this mechanism. Could you share a link that documents that?

Linux by default sends nothing. The PR_SET_PDEATHSIG option tells it to send the designated signal to the calling process when the calling process's parent dies.

If the parent process dies and was managing something firecracker needed to run properly, it's no longer safe for firecracker to stay running and it needs to bail immediately.

Like what? What is the worst thing that can happen? And why can this happen without Firecracker's involvement?

The most obvious causes would be either a crash or an OOM kill from the kernel. Either way, the process terminates immediately with no chance for graceful cleanup. In our case, one of things being managed is the network namespace, device and address used for the Firecracker VM. If the parent process dies, the networking configuration used for the VM will be released to be used by another VM while the now-abandoned one continues running. This means another VM can be started on the host with the network configuration that the VM was using, meaning the abandoned VM will no longer have network connectivity. It is not safe to assume that simply terminating the VM gracefully will work in a timely manner, as services running in the VM could attempt to flush data and hang waiting for responses.

@bchalios
Copy link
Contributor

Linux by default sends nothing. The PR_SET_PDEATHSIG option tells it to send the designated signal to the calling process when the calling process's parent dies.

Thanks, I didn't know about this mechanism :)

Either way, the process terminates immediately with no chance for graceful cleanup

The graceful clean-up in this case just logging about the fact we are shutting down due to the parent process dying, other than that it's completely equivalent to just sending a SIGTERM to Firecracker.

It is not safe to assume that simply terminating the VM gracefully will work in a timely manner, as services running in the VM could attempt to flush data and hang waiting for responses.

That is true, but my point is that I don't think we need to change Firecracker itself to achieve this. In fact, I don't think we should change Firecracker at all since there's nothing that the Firecracker process needs to do other than shutting down.

What if, for example, you called prctl after you fork from the current process but before exec'ing in the Firecracker?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants