Unix.select blocks under Windows if same socket listed in first and third arguments #5327
Original bug ID: 5327
If the same socket is listed in both the read and error lists (the first and third arguments) of Unix.select in the Windows port, then Unix.select never returns, even if there is normal data to be read from the socket (see attached demonstration).
The attached file can be built with ocamlopt -o select.exe unix.cmxa select.ml and then simply run. If you raw telnet to localhost:400 and type any line of text, the program should simply echo it to the console and then terminate.
Under Windows, the Unix.select call never returns. If you change the second [socket] to [] on line 12 then it works as expected. The code works under Linux without change. It also works with the older version of otherlibs/win32unix/select.c.
Previous testing with a larger file showed that if there's another socket in the read list which has data then the complete list of sockets returned by Unix.select is correct (i.e. it does return the sockets which were included in the error list, if they have data to be read - it's just that for some reason those sockets on their own don't cause Unix.select to return).
The original scenario for the loop had all client sockets in both the first and third lists but also the server socket in the first list - it meant that the server process only found out about pending data in the read socket when a new client connected!
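For comparison, the call pattern from the report (one socket in both the read and the exceptional-condition sets) can be sketched directly against POSIX select in C. This is only a minimal sketch and not the attached select.ml: it assumes a socketpair as a stand-in for the telnet connection, and both function names are hypothetical. On a POSIX system the call should return as soon as ordinary data is pending, which is the behaviour the Windows port is expected to match.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: lists fd in BOTH the read set and the
   exceptional-condition set, then waits up to timeout_sec seconds.
   Returns 1 if fd is reported readable, 0 on timeout, -1 on error. */
int readable_when_in_read_and_except(int fd, int timeout_sec)
{
    fd_set readfds, exceptfds;
    struct timeval tv;

    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&readfds);
    FD_ZERO(&exceptfds);
    FD_SET(fd, &readfds);
    FD_SET(fd, &exceptfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    int n = select(fd + 1, &readfds, NULL, &exceptfds, &tv);
    if (n < 0)
        return -1;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}

/* Hypothetical demo: writes one line to one end of a connected pair
   (assumed stand-in for the telnet client) and checks the other end. */
int demo_same_socket_in_both_sets(void)
{
    int sv[2];
    const char msg[] = "hello\n";

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (write(sv[0], msg, sizeof msg - 1) != (ssize_t)(sizeof msg - 1)) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    int result = readable_when_in_read_and_except(sv[1], 1);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Calling demo_same_socket_in_both_sets() on Linux should report the socket as readable rather than waiting for the timeout.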
Comment author: @dra27
Well, the diagnosis was considerably easier than the fix! But it was worth investigating this a bit further rather than letting PR 5329 blur the issue...
The problem is that WSAEventSelect only allows one event to be associated with a given socket (explicitly described in MSDN, halfway through the remarks: http://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx). The code calls WSAEventSelect separately for readfds and for exceptfds, and the second call overrides the first. The reason select blocks is therefore that it is in effect only waiting for out-of-band data. This can be demonstrated conclusively by changing unix_select to process readfds after exceptfds.
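The override described above can be illustrated with a small portable model in C. This is a sketch only: it does not call Winsock, and every name in it (model_event_select, MODEL_READ, MODEL_OOB and so on) is hypothetical. It assumes nothing beyond the documented rule that a new registration for a socket replaces the previous one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this model only; the real FD_READ and
   FD_OOB constants are defined by winsock2.h. */
enum { MODEL_READ = 1, MODEL_OOB = 2 };

#define MODEL_MAX_SOCKETS 16

struct model_registration {
    int socket;
    int event_id;
    int mask;
};

struct model_table {
    struct model_registration regs[MODEL_MAX_SOCKETS];
    size_t count;
};

/* Models the one-registration-per-socket rule: registering a socket
   that is already present REPLACES its event and mask. */
void model_event_select(struct model_table *t, int socket,
                        int event_id, int mask)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->regs[i].socket == socket) {
            t->regs[i].event_id = event_id;
            t->regs[i].mask = mask;
            return;
        }
    }
    assert(t->count < MODEL_MAX_SOCKETS);
    t->regs[t->count].socket = socket;
    t->regs[t->count].event_id = event_id;
    t->regs[t->count].mask = mask;
    t->count++;
}

/* Returns the mask currently watched for a socket, or 0 if none. */
int model_watched_mask(const struct model_table *t, int socket)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->regs[i].socket == socket)
            return t->regs[i].mask;
    }
    return 0;
}
```

Registering socket 7 first with MODEL_READ and then with MODEL_OOB leaves only MODEL_OOB watched, which mirrors processing readfds before exceptfds.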
The attached patch fixes the problem by hacking the way the worker is allocated - it scans each individual worker to see if the socket itself is already in the lists, rather than just finding a sockets worker. This could be done much more efficiently (by taking advantage of the maps generated to determine if the socket needs to be searched for in unix_select). This minimal patch fixes the problem. I'm not particularly interested in optimising it further as PR 5329 will, for the common case, short circuit this anyway. My impression is that this implementation of select is more about compatibility than speed!
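The idea of scanning every worker for the socket before adding it can be sketched as follows. This is not the attached patch: the structures, names and flag values are hypothetical, and merging the requested conditions into a single entry is an assumption about how one registration per socket could cover both lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags for this sketch. */
enum { WANT_READ = 1, WANT_EXCEPT = 2 };

#define MAX_WORKERS 4
#define MAX_PER_WORKER 8

struct query {
    int socket;
    int wanted;   /* bitwise OR of WANT_* flags */
};

struct worker {
    struct query queries[MAX_PER_WORKER];
    size_t count;
};

/* Scans every worker for an existing query on this socket. */
struct query *find_query(struct worker *workers, size_t nworkers, int socket)
{
    for (size_t w = 0; w < nworkers; w++) {
        for (size_t q = 0; q < workers[w].count; q++) {
            if (workers[w].queries[q].socket == socket)
                return &workers[w].queries[q];
        }
    }
    return NULL;
}

/* Records interest in a socket. If the socket is already queued, the
   new condition is merged into the existing query; otherwise the query
   goes to the first worker with room. Returns 0 on success and -1 if
   every worker is full. */
int add_interest(struct worker *workers, size_t nworkers,
                 int socket, int wanted)
{
    struct query *existing = find_query(workers, nworkers, socket);
    if (existing != NULL) {
        existing->wanted |= wanted;
        return 0;
    }
    for (size_t w = 0; w < nworkers; w++) {
        if (workers[w].count < MAX_PER_WORKER) {
            struct query *slot = &workers[w].queries[workers[w].count++];
            slot->socket = socket;
            slot->wanted = wanted;
            return 0;
        }
    }
    return -1;
}
```

The linear scan is deliberately naive, in the same spirit as the minimal patch; a lookup keyed on the socket would avoid the nested loop.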
Armed with this patch, the attached ML example now runs as it should - it would benefit from more testing (once I submit a patch to PR 5329, none of my own will walk this path...)
The patch also contains a minor correction to windbug.h - the __VA_ARGS__ handling was incorrect for GCC when no arguments are given, which prevented compilation in debugging mode. I haven't tested the debugging mode on MSVC, but according to comments on http://msdn.microsoft.com/en-us/library/ms177415(v=vs.80).aspx the MSVC compiler recognises GCC's ## preprocessor operator correctly (and in fact has a hack so that it wouldn't need it anyway).
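As an illustration of the GCC behaviour involved, here is a minimal sketch of a variadic macro that uses the ## operator so that it still expands correctly when no variadic arguments are given. The macro and function names are hypothetical and are not the ones in windbug.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical debug macro. With GCC's `##` extension, the comma before
   __VA_ARGS__ is dropped when the variadic part is empty; without it,
   the expansion would end in a stray comma and fail to compile. */
#define DEBUG_FORMAT(buf, size, fmt, ...) \
    snprintf((buf), (size), "debug: " fmt, ##__VA_ARGS__)

/* Uses the macro with no variadic arguments. */
int format_without_args(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return DEBUG_FORMAT(buf, size, "select called");
}

/* Uses the macro with one variadic argument. */
int format_with_args(char *buf, size_t size, int nsockets)
{
    assert(buf != NULL && size > 0);
    return DEBUG_FORMAT(buf, size, "%d sockets", nsockets);
}
```

Both functions return the length reported by snprintf, so a caller can check that the no-argument form compiled and produced the expected text.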
Comment author: @xavierleroy
That's a nontrivial fix indeed! Thanks for investigating. I'm willing to apply the patch, but before that let me give Sylvain Le Gall (a.k.a. gildor and the original author of this code) an opportunity to look at it and comment.