Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unix.select fast path misses events if the list of fds is greater than FD_SETSIZE (typically 64) #7276

Closed
vicuna opened this Issue Jun 16, 2016 · 5 comments

Comments

Projects
None yet
2 participants
@vicuna
Copy link
Collaborator

commented Jun 16, 2016

Original bug ID: 7276
Reporter: djs55
Assigned to: @alainfrisch
Status: closed (set by @xavierleroy on 2017-09-24T15:32:59Z)
Resolution: fixed
Priority: normal
Severity: major
Version: 4.03.0
Fixed in version: 4.04.0 +dev / +beta1 / +beta2
Category: platform support (windows, cross-compilation, etc)
Related to: #5563
Monitored by: @gasche @hcarty

Bug description

Hi,

On Windows there's a problem with the Unix.select fast path used when all
the fds are sockets: the MSDN docs for select[1] say

The variable FD_SETSIZE determines the maximum number of descriptors in
a set. (The default value of FD_SETSIZE is 64, which can be modified by
defining FD_SETSIZE to another value before including Winsock2.h.)

With the default FD_SETSIZE of 64, if the sockets list is of length 65+
then later sockets will be silently ignored. The attached program demonstrates
this effect on OCaml 4.03 and 4.02.3.

A possible fix appears to be to fall back to the slow path for lists of fds greater
than FD_SETSIZE (like we do already for list of fds of varying types).
For example the following patch:

diff --git a/otherlibs/win32unix/select.c b/otherlibs/win32unix/select.c
index 0e21db8..b592e3a 100644
--- a/otherlibs/win32unix/select.c
+++ b/otherlibs/win32unix/select.c
@@ -909,9 +909,12 @@ static int fdlist_to_fdset(value fdlist, fd_set *fdset)
{
value l, c;
FD_ZERO(fdset);

  • int used = 0;
    for (l = fdlist; l != Val_int(0); l = Field(l, 1)) {
    c = Field(l, 0);
    if (Descr_kind_val(c) == KIND_SOCKET) {
  •  used++;
    
  •  if (used > FD_SETSIZE) return 0;
     FD_SET(Socket_val(c), fdset);
    
    } else {
    DEBUG_PRINT("Non socket value encountered");

This is similar to "#5563: harden Unix.select against file descriptors
above FD_SETSIZE" except that we can still handle these larger lists using
the more generic (but more complicated) path.

[1] https://msdn.microsoft.com/en-gb/library/windows/desktop/ms740141(v=vs.85).aspx

Steps to reproduce

On Windows (I'm using the cygwin based installer from https://fdopen.github.io/opam-repository-mingw/)

ocamlfind ocamlopt -package unix -linkpkg -o test.exe test.ml
./test.exe

For me the large Unix.select will time out and the program prints "ERROR" and exits with code 1. On OSX and on Windows with the patch above applied, it prints "OK" and exits with code 0.

Additional information

I encountered this when filtering all network connections from a VM through the Mirage TCP/IP stack and out into the Internet via regular sockets. Occasionally the I/O would stop (but other Lwt timer threads continue), and then after some timeout it would wake up again. I believe this is because the list of active socket connections was > 64 and all the activity was on the fds at the end of the list.

File attachments

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 21, 2016

Comment author: @alainfrisch

Can you confirm the following:

  • FD_SETSIZE is supposed to give the maximum number of file descriptors that can go into an fd_set, independently of their numerical values. Hence the fix suggested here, which counts the number of such file descriptors.

  • On some systems, FD_SETSIZE gives an upper bound to fd values that can go into an fd_set. This corresponds to the fix in #5563 (which would be overly restrictive with the other interpretation above). I assume that most Unix-like where OCaml is supported follow this convention. Is that right?

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 21, 2016

Comment author: @alainfrisch

I confirm the behavior (tested with the MSVC 32-bit port), and reversing the list before the final select in the example has the expected effect.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 21, 2016

Comment author: @alainfrisch

Fixed by commit f642817.

It would be nice to include the test in the testsuite, but I'm concerned that its use of the network interface might break it. (E.g. on my machine, it triggered a Windows security popup on first use.)

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 21, 2016

Comment author: @alainfrisch

@djs55: feel free to give your real name for the Changes file if you wish.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 22, 2016

Comment author: djs55

Many thanks for resolving this! For the Changes file could you put my real name "David Scott"?

Regarding your two statements, the Microsoft docs for select say:

The variable FD_SETSIZE determines the maximum number of descriptors in a set.
and
Internally, socket handles in an fd_set structure are not represented as bit flags as in Berkeley Unix. Their data representation is opaque.
so I think this confirms your first statement is correct on Windows.

I'm not enough of a Unix expert to confirm your second statement ("FD_SETSIZE gives an upper bound to fd values that can go into an fd_set") but I think that is correct. I believe Unix-like systems all use low integers as file descriptors (unlike Windows which uses opaque handles, probably pointer values). I believe select on Unix has traditionally used simple bitmaps where including fd "n" means setting bit "n" to 1. Therefore I think on Unix people are more concerned with ensuring the bitmaps are large enough to represent the highest-possible fd value (and the performance implications of frequently scanning large bitmaps).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.