Original bug ID: 5421
I have an OCaml process that is leaking fds in multiples of 6. In the logs I have:
Unix.Unix_error(ENOMEM, "fork", "")
Looking at otherlibs/unix/unix.ml in the source distribution, this corresponds to a call to open_process_full (line 882).
let open_process_full cmd env =
open_process_full in turn calls open_proc_full, which calls fork() (line 865). Matching up the timestamps with an strace:
19188 12:39:29 pipe([223, 224]) = 0
clone() is failing in one of the dozen or so ways documented in 'man 2 clone'. This exception propagates out of open_process_full, leaving the six fds from the three pipe() calls hanging around. As far as I can tell these fds are not cleaned up by any garbage collection process.
fds 223-228 are still open a few hours later. Soon the process will hit ulimit -n.
Comment author: hw
From 'man 2 clone':
ENOMEM Cannot allocate sufficient memory to allocate a task structure for the child, or to copy those parts of the caller’s context that need to be copied.
I'm not sure how clone() is implemented, but I've consulted a Unix veteran and he says the call can attempt to allocate up to the same amount of memory as the parent process. In my case the process has 645 MB resident, and there is ~ 64 MB unused on the machine. So the situation is bad but not terrible ;-) In my uninformed opinion a failure of clone() doesn't necessarily indicate your process is shafted beyond repair, so it would be nice if the error was recoverable.
So the fd leak. I'm not an expert in Unix programming and I understand forking is a particularly tricky matter. There are probably conventions in the standard library, but my intuition is that each time an fd is allocated it should be wrapped in a try..finally that closes the fd should anything fail. NB pipe() can also fail (e.g. with EMFILE/ENFILE). Unless of course I'm missing some other mechanism that is cleaning up these fds (but not in the case of my program).
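A minimal sketch of that idea, covering only the case where one of several pipe() calls fails partway through. The names close_noerr and pipes_or_cleanup and the mk_pipe parameter are hypothetical and not part of the Unix module; mk_pipe exists only so a failure can be injected for testing.

```ocaml
(* Hypothetical helper: close an fd and ignore any error from close itself. *)
let close_noerr fd = try Unix.close fd with Unix.Unix_error _ -> ()

(* Allocate [n] pipes. If a pipe() call fails (e.g. EMFILE/ENFILE), close
   every fd opened so far, then re-raise the original exception.
   [mk_pipe] defaults to Unix.pipe and is an assumption of this sketch. *)
let pipes_or_cleanup ?(mk_pipe = fun () -> Unix.pipe ()) n =
  let rec go acc k =
    if k = 0 then List.rev acc
    else
      match mk_pipe () with
      | p -> go (p :: acc) (k - 1)
      | exception e ->
        List.iter (fun (r, w) -> close_noerr r; close_noerr w) acc;
        raise e
  in
  go [] n
```

With n = 3 this corresponds to the six fds discussed above: either all three pipes are returned, or none of them stays open.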
Comment author: till
Fork/Exec is indeed very tricky and should be done in C (you need a level of low-level control on the forked side that OCaml cannot provide).
Comment author: @protz
Your patch is pretty involved and it's going to require a lot of work to make this happen (if we want to, which I'm in no position to declare :)). I suggest we try to fix this issue in a simpler way.
Hw: you wrote « my intuition is that each time an fd is allocated it should be wrapped in a try..finally that closes the fd should anything fail ». Could you possibly write a patch to the Unix module that fixes this? You'd be in a good position to make sure the patch is correct (I could write it myself, but I don't have your setup and I'm unsure how to trigger ENOMEM without bringing my work machine to its knees :)).
I think it's a matter of simply wrapping the call to open_proc_full in a try..with block, cleaning up the file descriptors, and then re-raising the exception if there was a failure. There may be other places in the code of the Unix module that want similar cleanups.
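The shape of that wrapper could look roughly like the sketch below. It is not the actual patch: with_fds_closed_on_error and close_noerr are hypothetical names, and the function f stands in for the fork-based step (open_proc_full in unix.ml), whose real arguments are not shown here.

```ocaml
(* Hypothetical helper: close an fd and ignore any error from close itself. *)
let close_noerr fd = try Unix.close fd with Unix.Unix_error _ -> ()

(* Run [f ()]. If it raises (for example Unix_error (ENOMEM, "fork", "")),
   close every fd in [fds] and re-raise the same exception. On success the
   fds are left open, since the caller still needs them. *)
let with_fds_closed_on_error fds f =
  try f ()
  with e ->
    List.iter close_noerr fds;
    raise e
```

Unlike a try..finally, the cleanup here runs only on failure, because on success the parent keeps its ends of the pipes for the channels it returns.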
Comment author: @protz
The patch looks fine. The only concern I have is that you're sometimes using a mix of close_in, close_out, and close. In some places (but not all) where you could use close_in or close_out, you chose to use close, which indeed makes things easier. Reading the code, it turns out close_in does close the underlying fd, but is there any strong reason why you're not using close everywhere?
Apart from that, no concerns :).
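For reference, the close_in behaviour mentioned above can be checked with a small sketch like the following. The function name close_in_closes_fd is hypothetical; the check relies on Unix.close raising EBADF for a descriptor that is already closed, and assumes no other thread opens an fd in between.

```ocaml
(* Returns true if close_in on a channel also closed the underlying fd. *)
let close_in_closes_fd () =
  let (r, w) = Unix.pipe () in
  let ic = Unix.in_channel_of_descr r in
  close_in ic;
  (* If close_in already closed [r], a second close should fail with EBADF. *)
  let already_closed =
    try Unix.close r; false
    with Unix.Unix_error (Unix.EBADF, _, _) -> true
  in
  Unix.close w;
  already_closed
```

One consequence is that cleanup code should pick one of the two per descriptor: calling both close_in and Unix.close on the same fd risks closing a descriptor number that has since been reused.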