Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unix.select blocks under Windows if same socket listed in first and third arguments #5327

Closed
vicuna opened this Issue Aug 3, 2011 · 5 comments

Comments

Projects
None yet
1 participant
@vicuna
Copy link
Collaborator

vicuna commented Aug 3, 2011

Original bug ID: 5327
Reporter: @dra27
Status: closed (set by @xavierleroy on 2013-08-31T10:44:15Z)
Resolution: fixed
Priority: normal
Severity: major
Version: 3.12.0
Fixed in version: 3.13.0+dev
Category: ~DO NOT USE (was: OCaml general)
Related to: #5325 #5329
Monitored by: @ygrek gildor

Bug description

If the same socket is listed in both read and error (first and third lists) for Unix.select in the Windows port then Unix.select never returns even if there is normal data to be read from the socket (see attached demonstration)

Additional information

The attached file can be built with ocamlopt -o select.exe unix.cmxa select.ml and then simply run. If you raw telnet to localhost:400 and type any line of text, the program should simply echo it to the console and then terminate.

Under Windows, the Unix.select call never returns. If you change the second [socket] to [] on line 12 then it works as expected. Code works under linux without change. It also works with the older version of otherlibs/win32unix/select.c

Previous testing with a larger file showed that if there's another socket in the read list which has data then the complete list of sockets returned by Unix.select is correct (i.e. it does return the sockets which were included in the error list, if they have data to be read - it's just that for some reason those sockets on their own don't cause Unix.select to return).

The original scenario for the loop had all client sockets in both the first and third lists but also the server socket in the first list - it meant that the server process only found out about pending data in the read socket when a new client connected!

File attachments

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Dec 21, 2011

Comment author: @xavierleroy

I have a hunch that this problem would go away if any of #5325 and #5329 is solved. Let's continue the discussion in these two PRs.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Dec 21, 2011

Comment author: @dra27

I think fixing 5329 will simply mask the presence of this bug, given that it doesn't seem to be related to 5325

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Dec 22, 2011

Comment author: @dra27

Well, the diagnosis was considerably easier than the fix! But it was worth investigating this a bit further rather than letting PR 5329 blur the issue...

The problem is that WSAEventSelect only allows one event to be associated with a given socket (explicitly described in MSDN halfway through remarks - http://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx). The code given calls WSAEventSelect individually for both the readfds and exceptfds - the second call overrides the first meaning that the reason select is blocking is because it's in effect only looking for out-of-band data (this can be absolutely demonstrated by changing unix_select to process readfds after exceptfds)

The attached patch fixes the problem by hacking the way the worker is allocated - it scans each individual worker to see if the socket itself is already in the lists, rather than just finding a sockets worker. This could be done much more efficiently (by taking advantage of the maps generated to determine if the socket needs to be searched for in unix_select). This minimal patch fixes the problem. I'm not particularly interested in optimising it further as PR 5329 will, for the common case, short circuit this anyway. My impression is that this implementation of select is more about compatibility than speed!

Armed with this patch, the attached ML example now runs as it should - it would benefit from more testing (once I submit a patch to PR 5329, none of my own will walk this path...)

The patch also contains a minor correction to windbug.h - the VA_ARGS handling was incorrect for GCC when no arguments are given which prevented compilation in debugging mode. I haven't tested the debugging mode on MSVC, but according to comments on http://msdn.microsoft.com/en-us/library/ms177415(v=vs.80).aspx the MSVC compiler recognises GCC's ## preprocessor operator correctly (and in fact has a hack so that it wouldn't need it anyway).

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Dec 28, 2011

Comment author: @xavierleroy

That's a nontrivial fix indeed! Thanks for investigating. I'm willing to apply the patch, but before that let me give Sylvain Le Gall (a.k.a. gildor and the original author of this code) an opportunity to look at it and comment.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Jan 14, 2012

Comment author: @xavierleroy

Patch applied in SVN trunk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.