Unix.select on Windows not handling reads and writes to same socket #4466

Closed
vicuna opened this Issue Dec 12, 2007 · 9 comments

vicuna commented Dec 12, 2007

Original bug ID: 4466
Reporter: omion
Status: closed (set by @xavierleroy on 2017-02-16T14:18:22Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 3.10.0
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: platform support (windows, cross-compilation, etc)
Related to: #5325 #5578 #6771
Monitored by: "Christoph Bauer"

Bug description

I am writing a program that constantly listens on a socket for incoming data in order to check for timeouts, but Unix.select seems to behave strangely on Windows. It does not report a socket as readable if that socket was written to after the select call started. Since that sentence doesn't make sense, here's a layout:

  1. One thread runs "Unix.select [socket] [] [] timeout"
  2. Another thread writes out to the socket
  3. Data is received from a remote program

In this case, the "select" function should return as soon as step 3 occurs. On Linux and OS X, this is exactly what happens. On Windows, however, it always times out.
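The three steps can be sketched in a single process. This is only an illustration, not the attached send3-timeout.ml: the names loopback_pair and run_scenario are hypothetical, and the "remote program" is played by the other end of a loopback TCP connection inside the same process. On a platform where select behaves as expected, the run should report "Selected" shortly after the reply is written.

```ocaml
(* Sketch only: not the attached send3-timeout.ml. Needs the unix and
   threads libraries, for example:
   ocamlfind ocamlopt -package unix,threads.posix -linkpkg sketch.ml *)

(* Hypothetical helper: a connected TCP pair over the loopback interface. *)
let loopback_pair () =
  let listener = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.bind listener (Unix.ADDR_INET (Unix.inet_addr_loopback, 0));
  Unix.listen listener 1;
  let local = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.connect local (Unix.getsockname listener);
  let remote, _ = Unix.accept listener in
  Unix.close listener;
  (local, remote)

(* Run steps 1 to 3 once and return (timed_out, elapsed_seconds). *)
let run_scenario timeout =
  let local, remote = loopback_pair () in
  let start = Unix.gettimeofday () in
  let result = ref None in
  (* Step 1: one thread blocks in select, waiting for [local] to be readable. *)
  let waiter =
    Thread.create
      (fun () ->
        let readable, _, _ = Unix.select [ local ] [] [] timeout in
        let timed_out = match readable with [] -> true | _ -> false in
        result := Some (timed_out, Unix.gettimeofday () -. start))
      ()
  in
  (* Give the waiter time to enter select before writing. *)
  Thread.delay 0.05;
  (* Step 2: another thread (here the main one) writes to the same socket. *)
  ignore (Unix.write_substring local "ping" 0 4);
  (* Step 3: the peer receives the data and answers. *)
  let buf = Bytes.create 4 in
  ignore (Unix.read remote buf 0 4);
  ignore (Unix.write_substring remote "pong" 0 4);
  Thread.join waiter;
  Unix.close local;
  Unix.close remote;
  match !result with Some r -> r | None -> (true, timeout)

let () =
  let timed_out, elapsed = run_scenario 10.0 in
  Printf.printf "%s after %f seconds\n"
    (if timed_out then "TIMEOUT" else "Selected")
    elapsed
```

The 0.05 second delay is an arbitrary choice to make it likely that the select is already in progress when the write happens, which is the condition described in the report.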

Additional information

The attached files demonstrate the problem. The receiver simply responds to all incoming data. The sender sends three "packets":
The first packet is sent before the select occurs
The second and third packets are sent right after a select is started up
Compile the files separately. I used
ocamlopt -o send3-timeout -thread unix.cmxa threads.cmxa send3-timeout.ml
and
ocamlopt -o recv3-timeout -thread unix.cmxa threads.cmxa recv3-timeout.ml

Then run the receiver in one window:
recv3-timeout 45678
And the sender in another:
send3-timeout 127.0.0.1:45678

Under Linux, the sender returns with the following:
Selected after 0.000013 seconds
Selected after 0.052056 seconds
Selected after 0.052001 seconds

However, in Windows it returns the following:
Selected after 0.016000 seconds
TIMEOUT after 10.000000 seconds
Selected after 0.000000 seconds

Under Windows, the second select fails, but the second response is picked up immediately by the third select.

File attachments

vicuna commented Dec 12, 2007

Comment author: omion

And... I just realized I selected "Caml-light" for the submission. This actually occurs in OCaml 3.10.0 for Windows.

vicuna commented Jan 9, 2008

Comment author: omion

In the meantime, I made a nasty hack that gets around this issue. I made a socket pair called (fake_in, fake_out) (not easy on Windows, since Unix.socketpair is not supported there) which I use to "reset" the select.

Whenever I write something to the real socket, I write to fake_in as well. Then I use the following for the select statement:
select [fake_out;real_sock] [] [] timeout

This will return fake_out whenever something was written to the real socket, so I can re-run the select statement and not have the problem.

However, right around the time I added this statement in the program, it started crashing (it comes up with the Windows "do you want to debug?" dialog). I have no idea if it is related, but it looks fishy.
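A minimal sketch of this kind of wake-up workaround follows. It is not the commenter's code: loopback_pair, write_and_wake and wait_readable are hypothetical names, and building the pair from a loopback TCP connection is only one assumed way to stand in for Unix.socketpair.

```ocaml
(* Sketch only. Needs the unix library. *)

(* Hypothetical stand-in for Unix.socketpair: a connected loopback TCP pair,
   returned as (fake_in, fake_out). *)
let loopback_pair () =
  let listener = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.bind listener (Unix.ADDR_INET (Unix.inet_addr_loopback, 0));
  Unix.listen listener 1;
  let fake_in = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.connect fake_in (Unix.getsockname listener);
  let fake_out, _ = Unix.accept listener in
  Unix.close listener;
  (fake_in, fake_out)

(* Write to the real socket, then write one byte to fake_in so that any
   select already in progress returns and can be restarted. *)
let write_and_wake real_sock fake_in data =
  let n = Unix.write_substring real_sock data 0 (String.length data) in
  ignore (Unix.write_substring fake_in "!" 0 1);
  n

(* Wait until real_sock is readable. Whenever only the wake-up socket fires,
   drain it and run select again with the time that is left.
   Returns true if real_sock became readable, false on timeout. *)
let wait_readable real_sock fake_out timeout =
  let deadline = Unix.gettimeofday () +. timeout in
  let drain = Bytes.create 64 in
  let rec loop () =
    let remaining = deadline -. Unix.gettimeofday () in
    if remaining <= 0.0 then false
    else
      let readable, _, _ =
        Unix.select [ fake_out; real_sock ] [] [] remaining
      in
      if List.mem real_sock readable then true
      else if List.mem fake_out readable then begin
        ignore (Unix.read fake_out drain 0 (Bytes.length drain));
        loop ()
      end
      else false
  in
  loop ()
```

In this sketch every write to the real socket goes through write_and_wake, and the listening thread calls wait_readable in place of a bare Unix.select. Tracking a deadline keeps the total wait bounded by the original timeout even when the select is restarted several times.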

vicuna commented Aug 14, 2009

Comment author: Christoph Bauer

Seems to be solved with OCaml >= 3.11.0.

Selected after 0.000000 seconds
Selected after 0.063000 seconds
Selected after 0.047000 seconds

Tested with the patch from #4844.

Update: On the other hand, in a more complex program I still see
similarly strange behaviour. I do something like
select [sock] [] [sock] 1.0
No timeout
select [] [sock] [sock] 1.0
No timeout
select [sock] [] [sock] 1.0
timeout
select [sock] [] [sock] 1.0
No timeout (exactly 0.0s)

Update II:
The problem does indeed seem to be that sock appears twice in my select calls.
The behaviour is different when the third list is empty.

vicuna commented Dec 22, 2013

Comment author: omion

I'm making another program which would run into this bug, but it looks like it's fixed if the new[*] select function (from revision 10467, I think) is used.

For the Unix.select line in send3-timeout.ml:
match Unix.select [sock] [] [] timeout with

I added Unix.stdin:
match Unix.select [sock] [] [Unix.stdin] timeout with

This forces the new function, returning good values:
Selected after 0.000000 seconds
Selected after 0.062000 seconds
Selected after 0.062000 seconds

It's still a bit hackish, since I have no idea if/when Unix.stdin would actually be selected with an exception, but it seems to work...

[*] Well... "new" in relation to this bug report. I guess that function's 3 years old already.

vicuna commented Nov 28, 2015

Comment author: @xavierleroy

Any update on this problem report for the most recent OCaml release (4.02.3) or even the development version? In the absence of additional information, I move to close it.

vicuna commented Nov 28, 2015

Comment author: omion

I just did a 64-bit MSVC compile of commit e22eabe (4.03.0+dev11-2015-10-19), and the issue is still there.

It looks like bypassing the "classic" select function (as win32unix/select.c calls it) fixes the problem. However, I wouldn't be able to say if this is a safe change - most of that file is way above my head.

My fix in the past was to create an alternative to the Unix.socket function which does not set SO_SYNCHRONOUS_NONALERT. That solution works fine for how I use sockets, but it causes problems with some functions (see the attempted fix for #5325 and the resulting issues in #5578).

vicuna commented Nov 29, 2015

Comment author: @xavierleroy

Thank you for the quick retest, and for identifying SO_SYNCHRONOUS_NONALERT as the likely culprit. It's time for us to do something about that.

vicuna commented Dec 4, 2015

Comment author: @xavierleroy

See proposed fix in #331

vicuna commented Dec 9, 2015

Comment author: @xavierleroy

Fix committed to trunk, will be in 4.03
