Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

caml_ml_open_descriptor_out memory leak and file corrupt at_exit #12300

Closed
edwintorok opened this issue Jun 12, 2023 · 9 comments
Closed

caml_ml_open_descriptor_out memory leak and file corrupt at_exit #12300

edwintorok opened this issue Jun 12, 2023 · 9 comments
Assignees

Comments

@edwintorok
Copy link
Contributor

edwintorok commented Jun 12, 2023

Self-contained testcase reproduced on OCaml 4.14.1 (but should also reproduce on 4.08.1):

let ignored = ref 0
let ignore_exn f = try f () with Out_of_memory as e -> raise e | _ -> incr ignored

let wrap ~finally f =
  match f () with
  | () ->
      finally ()
  | exception e ->
      let bt = Printexc.get_raw_backtrace () in
      ignore_exn finally;
      Printexc.raise_with_backtrace e bt

let leaktest fd =
  (* not allowed to close 'fd' here, because it is not the channel that opened it *)
  let ch = Unix.out_channel_of_descr fd in
  wrap
    ~finally:(fun () -> flush ch )
      (* flush can raise here, which leaves unflushed data, causing a leak *)
    (fun () -> output_string ch "socketpairchannel")

let doleak = true

let consume = Thread.create @@ fun fd ->
  let b = Bytes.make 65535 ' ' in
  while Unix.read fd b 0 (Bytes.length b) > 0 do
    ()
  done;
  Unix.close fd

let () =
  let fdsend, fdrecv = Unix.socketpair Unix.PF_UNIX Unix.SOCK_STREAM 0 in
  let (_:Sys.signal_behavior) = Sys.signal Sys.sigpipe Sys.Signal_ignore in
  if doleak then
    Unix.close fdrecv
  else
    consume fdrecv |> ignore;
  (* this will make sending immediately raise *)
  for _ = 1 to 10000 do
    let () = ignore_exn @@ fun () -> leaktest fdsend in
    ()
  done ;
  Unix.close fdsend ;
  (* trigger the corruption bug at exit, now reuse these FD numbers for something completely different *)
  let test_open i =
    Unix.openfile
      (Printf.sprintf "leaktest%d.txt" i)
      [Unix.O_RDWR; Unix.O_TRUNC; Unix.O_CREAT]
      0o755
  in
  let _fd1, _fd2 = test_open 1, test_open 2 in
  (* ok we leaked 2 file descriptors here, but the OCaml runtime shouldn't corrupt the data that is there by flushing another channel's data into it,
     and there might be other reasons why these FDs are open at exit, e.g. some other global channel/etc.
   *)
()
$ ocamlfind ocamlopt leak.ml -o leak -package unix,threads.posix -linkpkg -thread
$ (ulimit -v 123456; ./leak)
Fatal error: exception Out of memory
$ ./leak
$ ls -l leak*
ls -l leak*.txt
-rwxr-xr-x 1 edvint edvint      0 Jun 12 15:28 leaktest1.txt
-rwxr-xr-x 1 edvint edvint 170000 Jun 12 15:28 leaktest2.txt

There are 2 bugs here:

  • './leak' is using a lot of memory (when run under ulimit it gets an OOM exception),
  • when it doesn't OOM and finishes then 'leaktest2.txt' has some data written to it, but looking at the source code we never write data to that file (we just open it).

(Bear with me until the 'Analysis' section below on why I'm reporting 2 bugs in the same issue: they are caused by the same piece of code)

(You can also run the program without ulimit, increase the 'for' loop limit and watch the process eat gigabytes of memory in 'top')

Analysis

Each 'struct channel' allocated when opening a channel (from a fd) contains a 64k buffer:

#define IO_BUFFER_SIZE 65536
char buff[IO_BUFFER_SIZE]

I think the leak is caused by this code in the finalizer of the channels:

if (chan->max == NULL && chan->curr != chan->buff){
    /*
      This is an unclosed out channel (chan->max == NULL) with a
      non-empty buffer: keep it around so the OCaml [at_exit] function
      gets a chance to flush it.  We would want to simply flush the
      channel now, but (i) flushing can raise exceptions, and (ii) it
      is potentially a blocking operation.  Both are forbidden in a
      finalization function.

      Refs:
      http://caml.inria.fr/mantis/view.php?id=6902
      https://github.com/ocaml/ocaml/pull/210
    */
    if (chan->name && caml_runtime_warnings_active())
      fprintf(stderr,
              "[ocaml] (moreover, it has unflushed data)\n"
              );

This is a 64KiB/channel leak, which can happen e.g. if the underlying file descriptor returns an error on flush,
such that flush never succeeds (e.g. it is a socket and the other end has closed the connection).
This can be particularly problematic for a long running daemon using sockets (e.g. a webserver).

The comment says that the reason the channel is kept alive is such that the data may be flushed at exit.
However by that time the underlying file descriptor may have been closed if the channel was constructed by the Unix module for example, and the 'fd' is not owned by the channel itself.
In fact it could be worse than closed: some unrelated file descriptor may have reused its number by the time the process exits, at which point it will flush one channel's data into some other file descriptor's file.... (potentially corrupting whatever data was meant to be in that other file).
Thus this is not merely an OOM bug, it is also a correctness bug.

Suggested fix

  • Remove this 'flush on exit' behaviour for channels which got created by the Unix module: the channel does not have full control of the file descriptor and it may have been closed. Could potentially cause unexpected behaviour for applications that rely on this (unsafe) behaviour

  • Remove this behaviour for channels that are based on Unix sockets (whether the channel fully manages the file descriptor or not): retaining it to be flushed on exit may not work

  • Change 'caml_ml_flush' to set a flag such that the finalizer may discard the channel if flushing was attempted but failed.

The latter would fix the immediate bug reported here, although it might still leave the possibility open that if someone forgets to call 'flush' then 'at_exit' the runtime may write data into the wrong file...

I'm happy to help with creating a patch for these, but would be happy to hear thoughts on how else this could be fixed.

Background

We've been trying to track a leak in a long running OCaml daemon for the past 3 months that only happened in a production environment, and caused that particular OCaml daemon to end up using gigabytes of memory after a while.
Nothing suspicious as far as OCaml heap usage goes, but using 'jemalloc' we got some statistics from the C side that showed that caml_ml_open_descriptor_out -> caml_open_descriptor_in -> caml_stat_alloc was leaking about 800 MiB+ of memory.

The caller was this OCaml code:

let with_out_channel_output fd f =
  let oc = Unix.out_channel_of_descr fd in
  finally
    (fun () ->
      let output = Xmlm.make_output (`Channel oc) in
      f output
    )
    (fun () -> flush oc)

We looked at this code and concluded that because we always call flush then the finalizer should always free the memory and the above codepath should not be entered. The breakthrough came this morning when we realized that these are sockets, and that flushing may indeed fail if the other end has already closed the connection, and with that information I've been able to write the above testcase that immediately reproduced the problem.

There really isn't much this code could do better: it cannot close the channel because that would close the underlying Unix file descriptor which this function does not own (it got passed as a parameter), and there isn't anything other than 'flush' it could call to 'clear' the pending data buffer AFAIK either.

We could refactor the code to use channels instead of Unix file descriptors, such that this function gets a channel passed in (that it can 'close_noerr' on error), but the OCaml runtime leak should still be fixed.

Looking closer at the reason for the OCaml runtime leak (attempting to flush the channel 'at_exit') that is unsafe in general: the underlying Unix file descriptor could've been closed and reused by something else at which point 'at_exit' will "corrupt" this other file by writing some other channel's data into it.

@xavierleroy
Copy link
Contributor

You raise interesting points. However, there's one thing I don't understand in your with_out_channel_output example: since oc is always flushed before becoming garbage, the finalizer should free the buffer on the spot, shouldn't it?

@edwintorok
Copy link
Contributor Author

edwintorok commented Jun 12, 2023

You raise interesting points. However, there's one thing I don't understand in your with_out_channel_output example: since oc is always flushed before becoming garbage, the finalizer should free the buffer on the spot, shouldn't it?

Well that is what we've been thinking for the past 3 months, until we realized that the 'oc' points to a socket, and the other end of that socket has closed the connection, so all these flushes were raising exceptions and never finishing, so the finalizer always sees unflushed data :)

@edwintorok
Copy link
Contributor Author

edwintorok commented Jun 12, 2023

Meanwhile I thought of another way of working this around from OCaml: 'Unix.dup fd' and another wrap that calls 'close_out_noerr' (cannot use close, because that would always call flush which fails), this way the 'channel' would "own" the (now duplicated file descriptor) and we can safely close the channel. And that avoids the leak. Would still be good to have a solution for this in the runtime though.

@xavierleroy
Copy link
Contributor

xavierleroy commented Jun 12, 2023

Right, I see better now, thanks for all the explanations. I was about to suggest dup + close_out, but your idea of dup + close_out_noerr is even better to handle (as gracefully as possible) the case where the other side closes the network connection early.

Concerning the memory leak and FD capture you report, I agree we need to reconsider some of the current design choices.

@xavierleroy
Copy link
Contributor

See also #9786 for an interesting discussion of a somewhat related issue.

@xavierleroy
Copy link
Contributor

My current thoughts on this issue is that, currently, channels returned by {in,out}_channel_of_descr very much want to take ownership of the underlying file descriptor, and strange things happen if the FD is used directly afterwards. #9786 focused on the case where the same FD is used for an in_channel and for an out_channel, but the present report shows another problematic case where the FD is closed, then reused to point to a different file, and a flush that was delayed ends up writing on the wrong FD.

For this reason, I think dup + close_out_noerr is currently the best (maybe the only) solution to the issue described here.

Some ideas for future improvements:

  1. A function in Unix to explicitly detach the file descriptor from a channel.
  2. A flush_noerr function that not only ignores errors, but also discard the buffered data in case of an error, so that the out_channel can be garbage collected later.
  3. flush itself could decide to discard the buffered data if it gets a non-transient, non-recoverable error such as EBADF (bad descriptor) or EPIPE (nobody is here to read). Keeping the data is meaningful only if it makes sense to retry flushing later, i.e. for transient errors like ENOSPC (no space left on device).

@lindig
Copy link

lindig commented Jun 15, 2023

Is this problem rooted in the broken duality between FDs and channels? FDs are R/W while channels are explicitly read or write. This forces (for sockets or anything R/W) to use two channels that now have a complicated relationship with the underlying FD and each other. Could a future improvement be a new channel type (module) that supports R/W and has a 1:1 relationship with the underlying FD? That could mean fading out the current channels over the long run.

@edwintorok
Copy link
Contributor Author

3. flush itself could decide to discard the buffered data if it gets a non-transient, non-recoverable error such as EBADF (bad descriptor) or EPIPE (nobody is here to read). Keeping the data is meaningful only if it makes sense to retry flushing later, i.e. for transient errors like ENOSPC (no space left on device).

I think this is entirely backwards compatible, and should fix most of the memory leaks that I'm seeing.

About in/out channels I've attempted to write wrapper like this in the past (but there are too many ways to "escape" this unless I wrap the entire Unix module with a safer interface), but it will cause the original application to use more file descriptors which can be a problem if you're already close to the 1024 limit impose by select (which is a separate problem that we're solving by using epoll):


let with_ic fd =
  (* A file descriptor cannot be safely shared between an [in] and [out] channel.
   * Unix.open_connection does this but if you close both channels you get EBADF.
   *)
  Safe.within
  @@ Safe.create ~release:close_in_noerr (Unix.in_channel_of_descr (Unix.dup fd))

let with_oc fd =
  Safe.within
  @@ Safe.create ~release:close_out_noerr
       (Unix.out_channel_of_descr (Unix.dup fd))

let with_channels t f =
  let fd = !t in
  with_ic fd @@ fun ic ->
  with_oc fd @@ fun oc ->
  (* the channels are using [dup]-ed FDs, close original now *)
  Safe.safe_release t ;
  f Safe.(borrow_exn ic, borrow_exn oc)

If you're interested here is my attempt to write a library with a kind of borrow semantics from a while ago, but it is not quite as easy to use as I'd like (and because there is an escape hatch for getting the raw Unix fd it is also easy to misuse), and trying to adopt it in a large codebase would result in a lot of perhaps unnecessary churn and incompatibility with external libraries:
https://github.com/xapi-project/xen-api/blob/master/ocaml/libs/resources/unixfd.mli https://github.com/xapi-project/xen-api/blob/master/ocaml/libs/resources/safe.mli

@xavierleroy
Copy link
Contributor

Is this problem rooted in the broken duality between FDs and channels? FDs are R/W while channels are explicitly read or write. This forces (for sockets or anything R/W) to use two channels that now have a complicated relationship with the underlying FD and each other.

#9786 is related to this "broken duality", but not the present PR, it's just about output channels.

Could a future improvement be a new channel type (module) that supports R/W and has a 1:1 relationship with the underlying FD? That could mean fading out the current channels over the long run.

The problem with RW channels is the transitions between R and W. Reading after writing is OK, just have to flush the output buffer so that it can act as input buffer. But writing after reading means discarding the data that was read ahead. That's wrong if you're working with a socket.

I'd much prefer to keep in_channel and out_channel distinct types and suggest using dup to get two unidirectional FDs on a socket, just like on a pipe.

xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Jun 28, 2023
NickBarnes pushed a commit to NickBarnes/ocaml that referenced this issue Jul 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants