Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCaml Unix.select bug impacts Lwt_engine.select #529

Open
gabelevi opened this issue Dec 27, 2017 · 4 comments
Open

OCaml Unix.select bug impacts Lwt_engine.select #529

gabelevi opened this issue Dec 27, 2017 · 4 comments
Milestone

Comments

@gabelevi
Copy link
Contributor

OCaml's Unix.select takes three list of fds. If any fd in any list is < 0 or > FD_SETSIZE then it will raise Unix.Unix_error(EINVAL, "select", ""). This is probably not quite the right behavior, and I've reported it here: https://caml.inria.fr/mantis/view.php?id=7700

Lwt_engine.select_based explicitly handles EBADF, but doesn't handle EINVAL.. Ideally, OCaml will be fixed to only throw EBADF for bad fds, but in the meantime it might make sense to catch EINVAL and filter the fds.

I found this issue via a stack trace. I'm using OCaml 4.05 and Lwt 3.1.0.

Unhandled exception: Unix.Unix_error(Unix.EINVAL, "select", "")
Raised by primitive operation at file "src/unix/lwt_engine.ml", line 410, characters 26-60
Called from file "src/unix/lwt_engine.ml", line 351, characters 8-19
Called from file "src/unix/lwt_main.ml", line 40, characters 4-78
Called from file "flow/src/flow.ml", line 81, characters 6-34
Called from file "flow/src/flow.ml", line 117, characters 4-21
@gabelevi
Copy link
Contributor Author

I'm guessing that my particular issue is just having a more than 1024 fds open, and passing through an fd > 1024 (FD_SETSIZE). I'm guessing this is the most common reason anyone will see Unix.Unix_error(Unix.EINVAL, "select", ""), even though I think that is a non-standard use of EINVAL

I think I'm going to try something like this in the meantime, which should make it clear in my logs what's going on.

class my_unix_select = object
  inherit Lwt_engine.select_based

  method private select fds_r fds_w timeout =
    let fds_r, fds_w, _ =
      try Unix.select fds_r fds_w [] timeout
      with
      | Unix.Unix_error (Unix.EINVAL, fn, params) -> begin
        (* Ok, so either one of the fds is an invalid fd, or maybe it's a valid fd but too large
         * for select *)
        begin try
          let explode_if_bad fd = Unix.fstat fd |> ignore in
          List.iter explode_if_bad fds_r;
          List.iter explode_if_bad fds_w
        with Unix.Unix_error (_, _, _) ->
          raise (Unix.Unix_error (Unix.EBADF, fn, params))
        end;
        (* Oh boy. So it looks like all the fds are valid. This likely means that one fd is larger
         * than FD_SETSIZE (which is probably 1024). select() stops working for large fds like this
         *)
        let string_of_fd fd = string_of_int ((Obj.magic fd): int) in
        let string_of_fds fds = String.concat ";" (List.map string_of_fd fds) in
        let params = spf "[%s] [%s] []" (string_of_fds fds_r) (string_of_fds fds_w) in
        raise (Unix.Unix_error (Unix.EINVAL, "select", params))
      end
    in
    (fds_r, fds_w)
end

@aantron
Copy link
Collaborator

aantron commented Dec 27, 2017

Yes, this is most likely due to an fd over the limit for select. I do prefer EBADF to EINVAL, it seems more consistent with the (still ambiguous) verbiage here: http://pubs.opengroup.org/onlinepubs/9699919799/functions/select.html.

It might be useful to expose a comparison with FD_SETSIZE to OCaml to avoid all those fstat calls, on the other hand this is only done in an exceptional situation, and we already do them for EBADF anyway.

It's normally recommended to use Lwt with libev (opam install conf-libev lwt) to avoid this problem. Is that an option for you?

@aantron
Copy link
Collaborator

aantron commented Dec 27, 2017

We may want to merge code like your check into the select engine, to get nicer error messages. If the Mantis ticket results in Unix.select throwing EBADF instead, we'll have to run this check in both cases to support OCaml down to 4.02.

@gabelevi
Copy link
Contributor Author

gabelevi commented Jan 9, 2018

It's normally recommended to use Lwt with libev (opam install conf-libev lwt) to avoid this problem. Is that an option for you?

My project runs also runs on Windows, so probably can't use libev. For the moment, it's easier for me to try and avoid having too many fds. In the future, though, if I need more fds I may try and build a libuv lwt_engine. If I do, I'll definitely send a PR.

facebook-github-bot pushed a commit to facebook/flow that referenced this issue Jan 12, 2018
Summary:
This is tricky. There's a few complicating issues:

1. OCaml will raise EINVAL in a nonstandard manner. Reported it here: https://caml.inria.fr/mantis/view.php?id=7700
2. Lwt handles EBADF but not EINVAL. Reported it here: ocsigen/lwt#529
3. The `select()` systemcall doesn't work with fds >= 1024

The third point is the tricky one. We can repro it like so:

1. Start up a flow server on some large repo
2. `for i in `seq 2000`; do flowd status --quiet ~/repo > /dev/null & done`

This will create more than 1024 clients connected to the monitor, all
waiting for the server to finish initializing. At some point, one of
their file descriptors (which are ints) will be >= 1024 (FD_SETSIZE).

So ultimately we have two choices:

1. Continue using `select()` and enforce that we never have more than 1024 fds open in the monitor
2. Add libev as a system dependency and use that instead of `select ()`. And figure out what to do for Windows

Reviewed By: samwgoldman

Differential Revision: D6639011

fbshipit-source-id: d9e5f3c560b5a629971cc3d683da813904323ab3
@aantron aantron added this to the 4.2.2 milestone Apr 3, 2019
@aantron aantron modified the milestones: 4.2.2, libuv Jul 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants