Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fd leak in Unix.ml? #5421

Closed
vicuna opened this issue Dec 12, 2011 · 9 comments

Comments

Projects
None yet
1 participant
@vicuna
Copy link

commented Dec 12, 2011

Original bug ID: 5421
Reporter: hw
Assigned to: @protz
Status: closed (set by @xavierleroy on 2013-08-31T10:44:25Z)
Resolution: fixed
Severity: major
Platform: x86_64
OS: CentOS
OS Version: 5.5
Version: 3.12.1
Fixed in version: 3.13.0+dev
Category: ~DO NOT USE (was: OCaml general)
Monitored by: hw @protz

Bug description

I have an OCaml process that is leaking fds in multiples of 6. In the logs I have:

Unix.Unix_error(ENOMEM, "fork", "")
Raised by primitive operation at file "unix.ml", line 865, characters 8-14
Called from file "unix.ml", line 882, characters 2-141
[...]

Looking at otherlibs/unix/unix.ml in the source distribution, this corresponds to a call to open_process_full (line 882).

let open_process_full cmd env =
let (in_read, in_write) = pipe() in
let (out_read, out_write) = pipe() in
let (err_read, err_write) = pipe() in
let inchan = in_channel_of_descr in_read in
let outchan = out_channel_of_descr out_write in
let errchan = in_channel_of_descr err_read in
open_proc_full cmd env (Process_full(inchan, outchan, errchan))
out_read in_write err_write [in_read; out_write; err_read];
close out_read;
close in_write;
close err_write;
(inchan, outchan, errchan)

open_process_full in turn calls open_proc_full, which calls fork() (line 865). Matching up the timestamps with an strace:

19188 12:39:29 pipe([223, 224]) = 0
19188 12:39:29 pipe([225, 226]) = 0
19188 12:39:29 pipe([227, 228]) = 0
19188 12:39:29 futex(0x9b0174, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9b0170, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
19188 12:39:29 <... futex resumed> ) = 1
19188 12:39:29 lseek(223, 0, SEEK_CUR <unfinished ...>
19188 12:39:29 <... lseek resumed> ) = -1 ESPIPE (Illegal seek)
19188 12:39:29 futex(0x9b0174, FUTEX_WAIT_PRIVATE, 116925, NULL <unfinished ...>
19188 12:39:29 <... futex resumed> ) = 0
19188 12:39:29 futex(0x9b0140, FUTEX_WAKE_PRIVATE, 1) = 0
19188 12:39:29 lseek(226, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
19188 12:39:29 lseek(227, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
19188 12:39:29 fcntl(223, F_GETFD) = 0
19188 12:39:29 fcntl(223, F_SETFD, FD_CLOEXEC) = 0
19188 12:39:29 fcntl(226, F_GETFD) = 0
19188 12:39:29 fcntl(226, F_SETFD, FD_CLOEXEC) = 0
19188 12:39:29 fcntl(227, F_GETFD) = 0
19188 12:39:29 fcntl(227, F_SETFD, FD_CLOEXEC) = 0
19188 12:39:29 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x447169d0) = -1 ENOMEM (Cannot allocate memory)

clone() is failing in one of the dozen or so ways documented in 'man 2 clone'. This exception propagates out of open_process_full, leaving the six fds from the three pipe() calls hanging around. As far as I can tell these fds are not cleaned up by any garbage collection process.

fds 223-228 are still open a few hours later. Soon the process will hit ulimit -n.

File attachments

@vicuna

This comment has been minimized.

Copy link
Author

commented Dec 12, 2011

Comment author: @protz

Hi,

So if I understand correctly, clone is failing because there's no memory left on the device. That's a pretty bad situation already. Can you suggest a way and/or a patch to handle this gracefully?

Thanks,

jonathan

@vicuna

This comment has been minimized.

Copy link
Author

commented Dec 12, 2011

Comment author: hw

From 'man 2 clone':

ENOMEM Cannot allocate sufficient memory to allocate a task structure for the child, or to copy those parts of the caller’s context that need to be copied.

I'm not sure how clone() is implemented, but I've consulted a Unix veteran and he says the call can attempt to allocate up to the same amount of memory as the parent process. In my case the process has 645 MB resident, and there is ~ 64 MB unused on the machine. So the situation is bad but not terrible ;-) In my uninformed opinion a failure of clone() doesn't necessarily indicate your process is shafted beyond repair, so it would be nice if the error was recoverable.

So the fd leak. I'm not an expert in Unix programming and I understand forking is a particularly tricky matter. There are probably conventions in the standard library, but my intuition is that each time an fd is allocated it should be wrapped in a try..finally that closes the fd should anything fail. NB pipe() can also fail (e.g. with EMFILE/ENFILE). Unless of course I'm missing some other mechanism that is cleaning up these fds (but not in the case of my program).

@vicuna

This comment has been minimized.

Copy link
Author

commented Dec 12, 2011

Comment author: till

Fork/Exec is indeed very tricky and should be done in C (you just need a level of low-level control on the forked side that ocaml cannot provide).
This specific issue could probably be fixed easily but I believe that the fork/exec code in:
#5266
is resilient to this problem.

@vicuna

This comment has been minimized.

Copy link
Author

commented Dec 13, 2011

Comment author: @protz

Hi Till,

You patch is pretty involved and it's going to require a lot of work to make this happen (if we want to, which I'm in no position to declare :)). I suggest we try to fix this issue in a simpler way.

Hw: you wrote « my intuition is that each time an fd is allocated it should be wrapped in a try..finally that closes the fd should anything fail ». Could you possibly write a patch to the Unix module that fixes this? You'd be in a good position to make sure the patch is correct (I could write but I don't have your setup and I'm unsure how to trigger ENOMEM without bringing my work machine down to its knees :)).

I think it's a matter of simply wrapping the call to open_proc_full in a try..with block, cleaning up the file descriptors, and then re-raising the exception if there was a failure. There may be other places in the code of the Unix module that want similar cleanups.

Cheers,

jonathan

@vicuna

This comment has been minimized.

Copy link
Author

commented Dec 21, 2011

Comment author: till

I've uploaded a patch that should take care of the fd leak. It is also careful to handle the cases where pipe fails (e.g. running out of available fds.)

@vicuna

This comment has been minimized.

Copy link
Author

commented Dec 21, 2011

Comment author: @protz

The patch looks fine. The only concern I have is that you're sometimes using a mix of close_in, close_out, close. In some places (but not all) where you could use close_in or close_out, you chose to use close, which indeed makes things easier. Reading the code, it turns close_in does close the underlying fd, but is there any strong reason why you're not using close everywhere?

Apart from that, no concerns :).

Thanks,

jonathan

@vicuna

This comment has been minimized.

Copy link
Author

commented Dec 21, 2011

Comment author: @protz

After chatting with Damien, what you're doing (closing the descr, not the channel, as in open_process_full) is definitely safe.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jan 18, 2012

Comment author: @protz

Fixed in svn r12038

@vicuna

This comment has been minimized.

Copy link
Author

commented Jan 18, 2012

Comment author: hw

Nice one, thanks folks.

@vicuna vicuna closed this Aug 31, 2013

@vicuna vicuna added the bug label Mar 20, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.