Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide "close on exec" in Lwt_unix and "close all fds" in Lwt_process #872

Open
vog opened this issue Jul 24, 2021 · 3 comments
Open

Provide "close on exec" in Lwt_unix and "close all fds" in Lwt_process #872

vog opened this issue Jul 24, 2021 · 3 comments

Comments

@vog
Copy link
Contributor

vog commented Jul 24, 2021

I strongly recommend to reopen issue #327 and to consider #847, as I can report that we just ran into issues related to open file descriptors in our company.

The following report is meant to answer the question in #327 whether the missing "close on exec" feature has real-world implications on "normal" Lwt programs, and hopefully it will become clear that the answer is: yes, it does!

In a gross simplification of our real-world situation, only two basic Lwt functions actually play a role:

  • The Lwt program opens a socket to listen on some port using simply Lwt_io.establish_server_with_client_socket.
  • It also starts a long-running sub process with which it communicates through stdin/stdout, using simply Lwt_process.with_process_full.
  • Some simple error handling / timeout mechanism inside the Lwt program is in place, so that connection is reopened in case it dies, and the process is restarted in case it terminates or is killed.

A simple Lwt program like that should not run into any subtle issues, should it? And indeed, it does run flawlessly most of the time.

But it's fragile! One unlucky hiccup, and it breaks down.

Here's our real-life situation, but you can easily think of other situations where this will break down in a similar way:

  • Both Lwt threads are started at the beginning, and as it happens, the socket is opened before the sub process is started
    • We can, without loss of generality, always assume this to be the case. Even if they started the other way around, the sub process might have been terminated or killed over time, being restarted, and then at this later point in time we do have the situation that the sub process was started while the socket is open.
  • Since the socket was started without "close on exec", and since Lwt_process.with_process_full didn't close it after forking, the sub process now also listens on the open socket, even though it is not aware of that fact, and hence won't do anything with that socket.
  • Now something goes wrong with the socket, or has been closed on purpose, and then needs to be reopened.
  • The socket is still held open by the sub process, and since it is a listening socket, it can't be reopened by the Lwt program.
  • The system adminstrator is puzzled as the Lwt program complains about the port being used by another process, and running e.g. ss -lnp4 tells them that some other process (the sub process) is listening on that port, even though that one is not supposed to do anything over the network.

Of course, strange Unix defaults around fork() are to blame here in the first place. But Unix does provide solutions for that, more precisely there are two common ways to deal with that:

  • Open the socket with cloexec (close_on_exec).
  • Close all open file descriptors for the sub process.

These solutions are pretty much the standard solutions, see e.g. the Fedora docs / Preventing file descriptor leaks to child processes. I don't want to argue in favour of the one or the other solution. The point is:

Lwt supports neither of them!

I believe Lwt should support both, so developers can use the best solution for their situation. This means:

  • (1) cloexec (close_on_exec) should be supported everywhere. At the very least it should be supported by Lwt_io.establish_server_with_client_socket and the low-level socket functions.
  • (2) Lwt_process.with_process_full, and all other Lwt_process functions, should support an argument like ?close_all_fds:bool to close all file descriptors right after fork() in the child process.

Regarding (1), the cloexec support should be forwarded by Conduit and Cohttp, perhaps even being enabled by default there. That's of course outside the realm of the Lwt project. But we might consider opening tickets on those projects as soon as this is implemented in Lwt.

Regarding (2), it may be debatable whether to close the file descriptors 0, 1 and 2 (stdin/out/err) as well, but I believe we can just provide sane defaults here. For example, with_process_full and open_process_full should clearly close all file descriptors whatsoever, except of course our pipes to parent. In other with_process_* and open_process_* variants we should close e.g. everything except fd 1 and 2 (stdout/err) if the sub process is clearly meant to be using the Lwt program's stdout and stderr.

Also regarding (2), please note that there might be a slight performance hit running through all fds. However, this is mostly an issue with short-lived sub processes, and those aren't a big deal for file descriptor leaks anyway. The close_all_fds flag is mostly meant to be enabled for long running sub processes, and for those the slightly increased setup costs may be worth it. Also, when used in combination with (1), just to play safe, the list of unwanted open fds should be empty anyway, so in that case we'll iterate only over the list of accidentally still open fds which will be very small and ideally empty.

@vog vog changed the title Provide "close on exec" Provide "close on exec" in Lwt_unix and "close all fds" in Lwt_process Jul 24, 2021
@raphael-proust
Copy link
Collaborator

Thanks for the detailed explanation. I find the arguments compeling and I'll make and MR to provide the necessary additions and changes.

@vog
Copy link
Contributor Author

vog commented Aug 20, 2021

Has there be any progress on this? Is there anything I can do to assist you?

@raphael-proust
Copy link
Collaborator

Not yet. I tend to work on Lwt in a spiky manner: several contributions in a week and then nothing for a couple of months. I will happily review a PR if there is one open. But I cannot offer an ETA.

MisterDA added a commit to MisterDA/lwt that referenced this issue Nov 9, 2021
In the Lwt_unix module, add `?cloexec:bool` optional arguments to
functions that create file descriptors (`dup`, `dup2`, `pipe`,
`pipe_in`, `pipe_out`, `socket`, `socketpair`, `accept`, `accept_n`).
The `?cloexec` argument is simply forwarded to the wrapped Unix
function (with OCaml >= 4.05, see [ocaml/ocaml#650][650]), or emulated
as best-effort with `Unix.set_close_on_exec` on older OCaml versions.

Fix ocsigen#327. Fix ocsigen#847. See also ocsigen#872.

[650]: ocaml/ocaml#650
smorimoto pushed a commit to raphael-proust/lwt that referenced this issue Dec 2, 2021
In the Lwt_unix module, add `?cloexec:bool` optional arguments to
functions that create file descriptors (`dup`, `dup2`, `pipe`,
`pipe_in`, `pipe_out`, `socket`, `socketpair`, `accept`, `accept_n`).
The `?cloexec` argument is simply forwarded to the wrapped Unix
function (with OCaml >= 4.05, see [ocaml/ocaml#650][650]), or emulated
as best-effort with `Unix.set_close_on_exec` on older OCaml versions.

Fix ocsigen#327. Fix ocsigen#847. See also ocsigen#872.

[650]: ocaml/ocaml#650
MisterDA added a commit to MisterDA/lwt that referenced this issue Dec 6, 2021
MisterDA added a commit to MisterDA/lwt that referenced this issue Dec 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants