Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blocked Unix.recv in one thread blocks Unix.send in another thread under Windows #5325

Closed
vicuna opened this Issue Aug 2, 2011 · 14 comments

Comments

Projects
None yet
1 participant
@vicuna
Copy link
Collaborator

vicuna commented Aug 2, 2011

Original bug ID: 5325
Reporter: @dra27
Status: closed (set by @xavierleroy on 2017-02-16T14:18:20Z)
Resolution: fixed
Priority: normal
Severity: major
Version: 3.12.0
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: ~DO NOT USE (was: OCaml general)
Tags: patch
Has duplicate: #6771
Related to: #4466 #5327 #5578
Monitored by: @protz

Bug description

It appears not be possible to write to a socket in one thread if another thread is blocked in a [recv] call on the same socket (and presumably vice versa).

This behaviour only seems to affect Windows - tried on both Windows 7 with 3.12.0 and Windows XP with 3.10.1 with the same effect.

Additional information

The attached file should be compiled with ocamlopt -o foo.exe -thread unix.cmxa threads.cmxa Foo.ml

It can then be invoked with, say:

foo www.google.com

and the line

GET / HTTP/1.0

followed by enter twice. On Linux, you'll see an HTTP redirect response. On Windows, the program will hang after the first carriage return eventually crashing with an exception when Google (or whoever) closes the socket at the other end on a timeout.

The equivalent code in C definitely works, so it appears to be a problem with Unix module / OCaml runtime.

File attachments

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Dec 20, 2011

Comment author: @xavierleroy

I checked the Win32 implementation of send and recv, and the master lock is properly released, allowing both system calls to run in parallel. So, I suspect the mutual exclusion occurs in the Win32/Winsock system calls themselves. Concerning your comparison with C, did you keep in mind that OCaml creates sockets in blocking mode, which may not be the case in your C program? That could at least explain the difference.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Dec 21, 2011

Comment author: @dra27

You're correct - I hadn't seen the setsockopt calls in select.c and accept.c. Attached is my hacked up C equivalent code (compiled with gcc -mno-cygwin -o foo.exe foo.c -lws2_32)

According to MSDN (http://msdn.microsoft.com/en-us/library/windows/desktop/ms740532(v=vs.85).aspx), SO_OPENTYPE is deprecated and WSASocket should be used instead with WSA_FLAG_OVERLAPPED not given in dwFlags. If you alter the call to WSASocket in my example to have 0 for the last parameter then the C code exhibits the same problem as the OCaml code - so it's Winsock, not OCaml specifically.

But - why is this Microsoft specific socket option being set in the first place? As far as I can see, all you're doing is creating sockets which can't use overlapping operations (which aren't used anyway) - can't the call simply be removed? Note that overlapped I/O and blocking/non-blocking in Microsoft parlance are unrelated concepts: see first para of http://msdn.microsoft.com/en-us/library/windows/desktop/ms740087(v=VS.85).aspx - ioctlsocket still controls blocking or non-blocking settings.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Dec 21, 2011

Comment author: @xavierleroy

The "SO_OPENTYPE = SO_SYNCHRONOUS_NONALERT" hack probably goes back to the early days of the win32unix library, i.e. Windows NT 4 / Windows 98 (or maybe even 3.5 / 95). At that time, I think it was necessary so that sockets would be created in blocking mode. But many things have changed since then. We'd gladly consider a patch to modernize this aspect of win32unix (and maybe others too), provided you give it some testing on your applications first.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Dec 21, 2011

Comment author: @dra27

I'm just putting a Windows 7 virtual PC together for some serious testing ... I agree that I think all these bugs are related to this. Will get back to you later today!

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Dec 21, 2011

Comment author: @dra27

OK - I've tested two possible ways of fixing it.

The first is simply to remove lines 31-41 & 49 in otherlibs/win32unix/select.c and 29, 34-42 & 48-52 in otherlibs/win32unix/accept.c

The second is to invert the logic of the getsockopt block instead - i.e. ensure that SO_OPENTYPE is zero rather than non-zero before calling socket (or accept)

I don't know which you'd prefer - the first option (deleting the problem code) was my instinct; the second one guards against another linked in C library messing around with Winsock options (I don't know whether this is something you'd want to care about in general or not, though).

Happy to provide a patch for either, if necessary.

Note that neither option fixes PR 5327, surprisingly.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Dec 22, 2011

Comment author: @dra27

Patches against trunk for both options attached...

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Dec 28, 2011

Comment author: @xavierleroy

Thanks for the patches. For the sake of simplicity, I go with the "delete" patch, which is now applied in SVN trunk, commit 11966.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented May 24, 2012

Comment author: @xavierleroy

Because we lack time to fix #5578 in time for release 4.00, I reverted the patch on branch version/4.00 but left it on trunk.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Jun 4, 2013

Comment author: @damiendoligez

Note that this was inadvertently reverted also in trunk when merging the 4.00.1->4.00.2+dev changes.

re-patched in trunk/4.01 (rev 13743) but then we still need to address #5578

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Jul 1, 2013

Comment author: @alainfrisch

Re-opening since the fix for #5578 required to undo the fix for #5325.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Dec 4, 2015

Comment author: @xavierleroy

See proposed fix in #331

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Dec 9, 2015

Comment author: @alainfrisch

I checked that the example works fine with trunk and the MSVC 32-bit port.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Dec 9, 2015

Comment author: @xavierleroy

Fix committed to trunk, will be in 4.03.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Dec 9, 2015

Comment author: enrico

That is awesome! Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.