
Easily reproducible leaking of file descriptors, show stopper. #208

Closed
fxfactorial opened this issue Jan 11, 2016 · 6 comments · Fixed by #258

@fxfactorial

I have boiled down my usage to this code sample, which leaks file descriptors.

Say you have:

#require "lwt.unix"

open Lwt.Infix

(* Relay all characters from [ic] to [oc]. *)
let echo ic oc = Lwt_io.(write_chars oc (read_chars ic))

let program =
  let server_address = Unix.(ADDR_INET (inet_addr_loopback, 2000)) in
  let other_addr = Unix.(ADDR_INET (inet_addr_loopback, 2001)) in

  (* Proxy each connection accepted on port 2000 to port 2001. *)
  let _server = Lwt_io.establish_server server_address begin fun (tcp_ic, tcp_oc) ->
      Lwt_io.with_connection other_addr begin fun (nc_ic, nc_oc) ->
        Lwt_io.printl "Created connection" >>= fun () ->
        echo tcp_ic nc_oc <&> echo nc_ic tcp_oc >>= fun () ->
        Lwt_io.printl "finished"
      end
      |> Lwt.ignore_result
    end
  in
  fst (Lwt.wait ())

let () =
  Lwt_main.run program

Then create a simple upstream server with:

nc -l 2001

and start the OCaml code with

utop example.ml

and then open a client:

nc localhost 2000
blah blah
^c

Then looking at the connections for port 2000 using lsof, we see

ocamlrun 71109 Edgar    6u  IPv4 0x7ff3e309cb80aead      0t0  TCP 127.0.0.1:callbook (LISTEN)
ocamlrun 71109 Edgar    7u  IPv4 0x7ff3e309c9dc8ead      0t0  TCP 127.0.0.1:callbook->127.0.0.1:54872 (CLOSE_WAIT)

In fact, each use of nc localhost 2000 leaves behind another CLOSE_WAIT record in the lsof output.

Eventually the system runs out of file descriptors, which, MOST annoyingly, does not crash the program but causes Lwt to just hang.

I can't tell whether I am doing something wrong or this is a genuine bug; in any case it is serious for me, as I run out of file descriptors within 10 hours...

EDIT: It seems to me that the problem is that one side of the connection is closed but the other isn't; I would have thought that with_connection would clean up/close whenever either side closes.

EDIT II: I have tried every variation of manually closing the descriptors with Lwt_io.close, but I still get the CLOSE_WAIT records.

EDIT III: I even used Lwt_unix.close on a raw fd given to with_connection's optional fd argument, with similarly bad results.

EDIT IV: Most insidious: if I use Lwt_daemon.daemonize, the problem seemingly goes away.

@artemkin

Here is a simplified code snippet that reproduces the issue:

#require "lwt.unix"

let program =
  let server_address = Unix.(ADDR_INET (inet_addr_loopback, 2000)) in

  let _server = Lwt_io.establish_server server_address begin fun (ic, oc) ->
    (* Lwt_io.close oc |> Lwt.ignore_result; *) ()
  end
  in
  fst (Lwt.wait ())

let () =
  Lwt_main.run program

Run nc localhost 2000 several times to get connections stuck in the CLOSE_WAIT state. Uncomment the commented-out close to fix the issue.

The question is why establish_server doesn't close the channels automatically. Is this a design issue?
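
For reference, a minimal sketch of one way to plug the leak under the Lwt 2.x interface used above, where the callback returns unit: run the handler through Lwt.async and close both channels in a Lwt.finalize. The write_line body is only a placeholder for real per-connection work.

#require "lwt.unix"

open Lwt.Infix

(* Closing a channel that is already closed, or whose peer has hung up,
   can raise; for a server we just swallow the exception. *)
let close_quietly channel =
  Lwt.catch (fun () -> Lwt_io.close channel) (fun _ -> Lwt.return_unit)

let program =
  let server_address = Unix.(ADDR_INET (inet_addr_loopback, 2000)) in
  let _server = Lwt_io.establish_server server_address begin fun (ic, oc) ->
    Lwt.async begin fun () ->
      Lwt.finalize
        (fun () -> Lwt_io.write_line oc "hello")  (* placeholder handler *)
        (fun () -> close_quietly ic >>= fun () -> close_quietly oc)
    end
  end
  in
  fst (Lwt.wait ())

let () =
  Lwt_main.run program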

@aantron
Collaborator

aantron commented Apr 28, 2016

I've run into this before as well, and have always done this as a result:

Lwt_io.establish_server address (fun (ic, oc) ->
  Lwt.async (fun () ->
    (* ...close the channels after doing something... *)))

I think it's an issue of the generality of the interface: someone may want to let the channels escape the function, or to handle requests synchronously. So, establish_server corresponds to open_connection, and not with_connection.

A version of establish_server with behavior like with_connection does sound desirable, though. I would appreciate any further comments.
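
For illustration, here is a sketch of what such a convenience function might look like, layered over the existing establish_server; the name establish_server_with_cleanup is made up, and note that f returns a thread here, unlike the current callback:

open Lwt.Infix

(* Hypothetical wrapper: accept each connection, run [f], and close both
   channels once [f]'s thread finishes, whether it succeeded or failed. *)
let establish_server_with_cleanup sockaddr f =
  let close_quietly channel =
    Lwt.catch (fun () -> Lwt_io.close channel) (fun _ -> Lwt.return_unit)
  in
  Lwt_io.establish_server sockaddr begin fun (ic, oc) ->
    Lwt.async begin fun () ->
      Lwt.finalize
        (fun () -> f (ic, oc))
        (fun () -> close_quietly ic >>= fun () -> close_quietly oc)
    end
  end

Silently swallowing close failures is of course a design decision in itself, which is what the next comments get at.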

@aantron
Collaborator

aantron commented Apr 28, 2016

This could also be addressed with better documentation. I'll add a commit later, but let me know if you have arguments for a new function.

aantron added a commit that referenced this issue Apr 29, 2016
@aantron
Collaborator

aantron commented Apr 29, 2016

I suppose there is also the issue of what this hypothetical convenience function should do with exceptions raised by close or f. It seems that handling these exceptions correctly is important for a server, much more so than a typical client.

Lwt can't decide for the application how to log or recover, so it would probably have to treat them the way Lwt.async does. That would make the convenience function much less useful than with_connection: probably more of a toy for experiments than something you'd want to use in robust software. Would it still be worth it?
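
To make "treat them the way Lwt.async does" concrete, here is a sketch in which exceptions from both f and the closes are fed to !Lwt.async_exception_hook instead of being propagated; the names are illustrative only:

open Lwt.Infix

(* Run a thunk, reporting any exception through Lwt's async exception hook
   rather than propagating it. *)
let run_with_hook thunk =
  Lwt.catch thunk
    (fun exn -> !Lwt.async_exception_hook exn; Lwt.return_unit)

(* Hypothetical per-connection handler for the convenience function. *)
let handle f (ic, oc) =
  Lwt.async begin fun () ->
    run_with_hook (fun () -> f (ic, oc)) >>= fun () ->
    run_with_hook (fun () -> Lwt_io.close ic) >>= fun () ->
    run_with_hook (fun () -> Lwt_io.close oc)
  end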

aantron added the docs label May 17, 2016
@aantron
Collaborator

aantron commented May 23, 2016

Also related: #196.

@aantron
Collaborator

aantron commented Jun 23, 2016

EDIT: It seems to me that the problem is that one side of the connection is closed but the other isn't; I would have thought that with_connection would clean up/close whenever either side closes.

EDIT II: I have tried every variation of manually closing the descriptors with Lwt_io.close, but I still get the CLOSE_WAIT records.

EDIT III: I even used Lwt_unix.close on a raw fd given to with_connection's optional fd argument, with similarly bad results.

I believe these symptoms are due to not trying to close tcp_ic and tcp_oc, as opposed to nc_ic and nc_oc. The latter two weren't causing the problem; the former two were. In other words, with_connection is fine; it's establish_server that is unintuitive. Of course, see #258.
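
To make that concrete, here is a sketch of the original proxy handler rewritten so tcp_ic and tcp_oc are closed explicitly once the relay finishes; error handling on close is deliberately reduced to swallowing the exception:

open Lwt.Infix

(* Relay all characters from [ic] to [oc]. *)
let echo ic oc = Lwt_io.(write_chars oc (read_chars ic))

let close_quietly channel =
  Lwt.catch (fun () -> Lwt_io.close channel) (fun _ -> Lwt.return_unit)

(* Per-connection handler: with_connection closes nc_ic/nc_oc for us;
   tcp_ic/tcp_oc we must close ourselves. *)
let handler other_addr (tcp_ic, tcp_oc) =
  Lwt.async begin fun () ->
    Lwt.finalize
      (fun () ->
         Lwt_io.with_connection other_addr begin fun (nc_ic, nc_oc) ->
           echo tcp_ic nc_oc <&> echo nc_ic tcp_oc
         end)
      (fun () -> close_quietly tcp_ic >>= fun () -> close_quietly tcp_oc)
  end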
